Stefan Richter [Sun, 26 Oct 2008 11:03:37 +0000 (12:03 +0100)]
ieee1394: raw1394: fix possible deadlock in multithreaded clients
Regression in 2.6.28-rc1: When I added the new state_mutex which
prevents corruption of raw1394's internal state when accessed by
multithreaded client applications, the following possible though
highly unlikely deadlock slipped in:
The simplest fix is to use mutex_trylock() instead of mutex_lock() in
raw1394_mmap(). This changes the behavior under contention in a way
which is visible to userspace clients. However, since multithreaded
access was entirely buggy before state_mutex was added and libraw1394's
documentation advised application programmers to use a handle only in a
single thread, this change in behaviour should not be an issue in
practice at all.
Since we have to use mutex_trylock() in raw1394_mmap() regardless
whether /dev/raw1394 was opened with O_NONBLOCK or not, we now use
mutex_trylock() unconditionally everywhere for state_mutex, just to have
consistent behavior.
Reported-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Jarek Poplawski [Fri, 31 Oct 2008 07:47:01 +0000 (00:47 -0700)]
pkt_sched: Add peek emulation for non-work-conserving qdiscs.
This patch adds qdisc_peek_dequeued() wrapper to emulate peek method
with qdisc->dequeue() and storing "peeked" skb in qdisc->gso_skb until
dequeuing. This is mainly for compatibility reasons not to break some
strange configs because peeking is expected for non-work-conserving
parent qdiscs to query work-conserving child qdiscs.
This implementation requires using qdisc_dequeue_peeked() wrapper
instead of directly calling qdisc->dequeue() for all qdiscs ever
querried with qdisc->ops->peek() or qdisc_peek_dequeued().
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarek Poplawski [Fri, 31 Oct 2008 07:46:19 +0000 (00:46 -0700)]
pkt_sched: Use qdisc->ops->peek() instead of ->dequeue() & ->requeue()
Use qdisc->ops->peek() instead of ->dequeue() & ->requeue() pair.
After this patch the only remaining user of qdisc->ops->requeue() is
netem_enqueue(). Based on ideas of Herbert Xu, Patrick McHardy and
David S. Miller.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Fri, 31 Oct 2008 07:44:18 +0000 (00:44 -0700)]
pkt_sched: Add ->peek() methods for fifo, prio and SFQ qdiscs.
From: Patrick McHardy <kaber@trash.net>
Just as a demonstration how easy adding a peek operation to the
work-conserving qdiscs actually is. It doesn't need to keep or change
any internal state in many cases thanks to the guarantee that the
packet will either be dequeued or, if another packet arrives, the
upper qdisc will immediately ->peek again to reevaluate the state.
(This is only slightly modified Patrick's patch.)
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Fri, 31 Oct 2008 07:40:19 +0000 (00:40 -0700)]
bpa10x: free sk_buff with kfree_skb
Inspired by Sergio Luis' similar patches, I finally found
a case which is trivial enough that spatch won't choke
on it.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>
sh: Fix up the shared IRQ demuxer's control bit testing logic.
Correct the interrupt handler in sh4 serial device, return the correct
value and check for what is anabled in the SCSCR register. The sh7722 is
broken just sending a break using minicom.
Signed-off-by: Michael Trimarchi <trimarchimichael@yahoo.it> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Matt Fleming [Wed, 29 Oct 2008 07:16:02 +0000 (07:16 +0000)]
Define SCSPTR1 for SH 7751R
After the recent commit to kill off SCI/SCIF special casing SH 7751R
fails to compile with CONFIG_SH_RTS7751R2D set. This is because SCSPTR1
is undefined. Take the value for SCSPTR1 from the SH7751R Group Hardware
Manual.
Signed-off-by: Matt Fleming <mjf@gentoo.org> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
I noticed that, under certain conditions, ESRCH can be leaked from the
xfrm layer to user space through sys_connect. In particular, this seems
to happen reliably when the kernel fails to resolve a template either
because the AF_KEY receive buffer being used by racoon is full or
because the SA entry we are trying to use is in XFRM_STATE_EXPIRED
state.
However, since this could be a transient issue it could be argued that
EAGAIN would be more appropriate. Besides this error code is not even
documented in the man page for sys_connect (as of man-pages 3.07).
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
Harvey Harrison [Fri, 31 Oct 2008 07:01:22 +0000 (16:01 +0900)]
sh: use the new byteorder headers.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
David S. Miller [Fri, 31 Oct 2008 07:00:33 +0000 (00:00 -0700)]
net: Really remove all of LOOPBACK_TSO code.
As noticed by Saikiran Madugula, commit 7447ef63cf2dfdc444f4c72ae13f604350b2e25f
("loopback: Remove rest of LOOPBACK_TSO code.") got rid of
emulate_large_send_offload() but didn't get rid of the call
site as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Fri, 31 Oct 2008 06:55:44 +0000 (23:55 -0700)]
netfilter: nf_conntrack_proto_gre: switch to register_pernet_gen_subsys()
register_pernet_gen_device() can't be used is nf_conntrack_pptp module is
also used (compiled in or loaded).
Right now, proto_gre_net_exit() is called before nf_conntrack_pptp_net_exit().
The former shutdowns and frees GRE piece of netns, however the latter
absolutely needs it to flush keymap. Oops is inevitable.
Switch to shiny new register_pernet_gen_subsys() to get correct ordering in
netns ops list.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
netns ops which are registered with register_pernet_gen_device() are
shutdown strictly before those which are registered with
register_pernet_subsys(). Sometimes this leads to opposite (read: buggy)
shutdown ordering between two modules.
Add register_pernet_gen_subsys()/unregister_pernet_gen_subsys() for modules
which aren't elite enough for entry in struct net, and which can't use
register_pernet_gen_device(). PPTP conntracking module is such one.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Fri, 31 Oct 2008 06:54:35 +0000 (23:54 -0700)]
net: delete excess kernel-doc notation
Remove excess kernel-doc function parameters from networking header
& driver files:
Warning(include/net/sock.h:946): Excess function parameter or struct member 'sk' description in 'sk_filter_release'
Warning(include/linux/netdevice.h:1545): Excess function parameter or struct member 'cpu' description in 'netif_tx_lock'
Warning(drivers/net/wan/z85230.c:712): Excess function parameter or struct member 'regs' description in 'z8530_interrupt'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jens Axboe [Wed, 27 Aug 2008 13:23:18 +0000 (15:23 +0200)]
libata: add whitelist for devices with known good pata-sata bridges
libata currently imposes a UDMA5 max transfer rate and 200 sector max
transfer size for SATA devices that sit behind a pata-sata bridge. Lots
of devices have known good bridges that don't need this limit applied.
The MTRON SSD disks are such devices. Transfer rates are increased by
20-30% with the restriction removed.
So add a "blacklist" entry for the MTRON devices, with a flag indicating
that the bridge is known good.
Tejun Heo [Tue, 21 Oct 2008 15:46:36 +0000 (00:46 +0900)]
sata_via: fix support for 5287
5287 used to be treated as vt6420 but it didn't work. It's new family
of controllers called vt8251 which hosts four SATA ports as M/S of the
two ATA ports. This configuration is rather peculiar in that although
the M/S devices are on the same port, each have its own SCR (or
equivalent link status/control) registers which screws up the
port-link-device hierarchy assumed by libata. Another controller
which falls into this category is ata_piix w/ SIDPR access.
libata now has facility to deal with this class of controllers named
slave_link. A low level driver for such controllers can just call
ata_slave_link_init() on the respective ports and libata will handle
all the difficult parts like following up with single SRST after
hardresetting both ports.
This patch creates new controller class vt8251, implements slave_link
aware init sequence and config space based SCR access for it and moves
5287 to the new class.
This patch is based on Joseph Chan's larger patch which was created
before slave_link was implemented in libata.
Roland Dreier [Tue, 28 Oct 2008 23:52:20 +0000 (16:52 -0700)]
libata: Avoid overflow in ata_tf_to_lba48() when tf->hba_lbal > 127
In ata_tf_to_lba48(), when evaluating
(tf->hob_lbal & 0xff) << 24
the expression is promoted to signed int (since int can hold all values
of u8). However, if hob_lbal is 128 or more, then it is treated as a
negative signed value and sign-extended when promoted to u64 to | into
sectors, which leads to the MSB 32 bits of section getting set
incorrectly.
For example, Phillip O'Donnell <phillip.odonnell@gmail.com> reported
that a 1.5GB drive caused:
Randy Dunlap [Thu, 30 Oct 2008 05:35:08 +0000 (22:35 -0700)]
ATA: remove excess kernel-doc notation
Remove excess kernel-doc function parameter notation from drivers/ata/:
Warning(drivers/ata/libata-core.c:1622): Excess function parameter or struct member 'fn' description in 'ata_pio_queue_task'
Warning(drivers/ata/libata-core.c:4655): Excess function parameter or struct member 'err_mask' description in 'ata_qc_complete'
Warning(drivers/ata/ata_piix.c:751): Excess function parameter or struct member 'udma' description in 'do_pata_set_dmamode'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Carl Love [Wed, 29 Oct 2008 05:06:45 +0000 (05:06 +0000)]
powerpc/cell/OProfile: Fix on-stack array size in activate spu profiling function
The size of the pm_signal_local array should be equal to the
number of SPUs being configured in the array. Currently, the
array is of size 4 (NR_PHYS_CTRS) but being indexed by a for
loop from 0 to 7 (NUM_SPUS_PER_NODE). This could potentially
cause an oops or random memory corruption since the pm_signal_local
array is on the stack. This fixes it.
Signed-off-by: Carl Love <carll@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Kumar Gala [Tue, 28 Oct 2008 18:01:39 +0000 (18:01 +0000)]
powerpc/mpic: Fix regression caused by change of default IRQ affinity
The Freescale implementation of MPIC only allows a single CPU destination
for non-IPI interrupts. We add a flag to the mpic_init to distinquish
these variants of MPIC. We pull in the irq_choose_cpu from sparc64 to
select a single CPU as the destination of the interrupt.
This is to deal with the fact that the default smp affinity was
changed by commit 18404756765c713a0be4eb1082920c04822ce588 ("genirq:
Expose default irq affinity mask (take 3)") to be all CPUs.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Mark Nelson [Mon, 27 Oct 2008 20:38:08 +0000 (20:38 +0000)]
powerpc: Update remaining dma_mapping_ops to use map/unmap_page
After the merge of the 32 and 64bit DMA code, dma_direct_ops lost
their map/unmap_single() functions but gained map/unmap_page(). This
caused a problem for Cell because Cell's dma_iommu_fixed_ops called
the dma_direct_ops if the fixed linear mapping was to be used or the
iommu ops if the dynamic window was to be used. So in order to fix
this problem we need to update the 64bit DMA code to use
map/unmap_page.
First, we update the generic IOMMU code so that iommu_map_single()
becomes iommu_map_page() and iommu_unmap_single() becomes
iommu_unmap_page(). Then we propagate these changes up through all
the callers of these two functions and in the process update all the
dma_mapping_ops so that they have map/unmap_page rahter than
map/unmap_single. We can do this because on 64bit there is no HIGHMEM
memory so map/unmap_page ends up performing exactly the same function
as map/unmap_single, just taking different arguments.
This has no affect on drivers because the dma_map_single_attrs() just
ends up calling the map_page() function of the appropriate
dma_mapping_ops and similarly the dma_unmap_single_attrs() calls
unmap_page().
This fixes an oops on Cell blades, which oops on boot without this
because they call dma_direct_ops.map_single, which is NULL.
Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
A typo/thinko made us pass the wrong argument to __flush_hash_table_range
when unplugging bridges, thus not flushing all the translations for
the IO space on unplug. The third parameter to __flush_hash_table_range
is `end', not `size'.
This causes the hypervisor to refuse unplugging slots.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Nathan Fontenot [Mon, 27 Oct 2008 19:48:17 +0000 (19:48 +0000)]
powerpc/pci: Properly allocate bus resources for hotplug PHBs
Resources for PHB's that are dynamically added to a system are not
properly allocated in the resource tree.
Not having these resources allocated causes an oops when removing
the PHB when we try to release them.
The diff appears a bit messy, this is mainly due to moving everything
one tab to the left in the pcibios_allocate_bus_resources routine.
The functionality change in this routine is only that the
list_for_each_entry() loop is pulled out and moved to the necessary
calling routine.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Jeremy Kerr [Sun, 26 Oct 2008 21:51:25 +0000 (21:51 +0000)]
OF-device: Don't overwrite numa_node in device registration
Currently, the numa_node of OF-devices will be overwritten during
device_register, which simply sets the node to -1. On cell machines,
this means that devices can't find their IOMMU, which is referenced
through the device's numa node.
Set the numa node for OF devices with no parent, and use the
lower-level device_initialize and device_add functions, so that the
node is preserved.
We can remove the call to set_dev_node in of_device_alloc, as it
will be overwritten during register.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Michael Neuling [Thu, 23 Oct 2008 00:42:36 +0000 (00:42 +0000)]
powerpc: Fix swapcontext system for VSX + old ucontext size
Since VSX support was added, we now have two sizes of ucontext_t;
the older, smaller size without the extra VSX state, and the new
larger size with the extra VSX state. A program using the
sys_swapcontext system call and supplying smaller ucontext_t
structures will currently get an EINVAL error if the task has
used VSX (e.g. because of calling library code that uses VSX) and
the old_ctx argument is non-NULL (i.e. the program is asking for
its current context to be saved). Thus the program will start
getting EINVAL errors on calls that previously worked.
This commit changes this behaviour so that we don't send an EINVAL in
this case. It will now return the smaller context but the VSX MSR bit
will always be cleared to indicate that the ucontext_t doesn't include
the extra VSX state, even if the task has executed VSX instructions.
Both 32 and 64 bit cases are updated.
[paulus@samba.org - also fix some access_ok() and get_user() calls]
Thanks to Ben Herrenschmidt for noticing this problem.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Paul Mackerras [Wed, 22 Oct 2008 18:43:45 +0000 (18:43 +0000)]
powerpc: Work around ld bug in older binutils
Commit 549e8152de8039506f69c677a4546e5427aa6ae7 ("powerpc: Make the
64-bit kernel as a position-independent executable") added lines to
vmlinux.lds.S to add the extra sections needed to implement a
relocatable kernel. However, those lines seem to trigger a bug in
older versions of GNU ld (such as 2.16.1) when building a
non-relocatable kernel. Since ld 2.16.1 is still a popular choice for
cross-toolchains, this adds an #ifdef to vmlinux.lds.S so the added
lines are only included when building a relocatable kernel.
Milton Miller [Thu, 23 Oct 2008 18:41:09 +0000 (18:41 +0000)]
powerpc/ppc64/kdump: Better flag for running relocatable
The __kdump_flag ABI is overly constraining for future development.
As of 2.6.27, the kernel entry point has 4 constraints: Offset 0 is
the starting point for the master (boot) cpu (entered with r3 pointing
to the device tree structure), offset 0x60 is code for the slave cpus
(entered with r3 set to their device tree physical id), offset 0x20 is
used by the iseries hypervisor, and secondary cpus must be well behaved
when the first 256 bytes are copied to address 0.
Placing the __kdump_flag at 0x18 is bad because:
- It was taking the last 8 bytes before the iseries hypervisor data.
- It was 8 bytes for a boolean flag
- It had no way of identifying that the flag was present
- It does leave any room for the master to add any additional code
before branching, which hurts debug.
- It will be unnecessarily hard for 32 bit code to be common (8 bytes)
Now that we have eliminated the use of __kdump_flag in favor of
the standard is_kdump_kernel(), this flag only controls run without
relocating the kernel to PHYSICAL_START (0), so rename it __run_at_load.
Move the flag to 0x5c, 1 word before the secondary cpu entry point at
0x60. Initialize it with "run0" to say it will run at 0 unless it is
set to 1. It only exists if we are relocatable.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Milton Miller [Wed, 22 Oct 2008 10:39:18 +0000 (10:39 +0000)]
powerpc: Kexec exit should not use magic numbers
Commit 54622f10a6aabb8bb2bdacf3dd070046f03dc246 ("powerpc: Support for
relocatable kdump kernel") added a magic flag value in a register to
tell purgatory that it should be a panic kernel. This part is wrong
and is reverted by this commit.
The kernel gets a list of memory blocks and a entry point from user space.
Its job is to copy the blocks into place and then branch to the designated
entry point (after turning "off" the mmu).
The user space tool inserts a trampoline, called purgatory, that runs
before the user supplied code. Its job is to establish the entry
environment for the new kernel or other application based on the contents
of memory. The purgatory code is compiled and embedded in the tool,
where it is later patched using the elf symbol table using elf symbols.
Since the tool knows it is creating a purgatory that will run after a
kernel crash, it should just patch purgatory (or the kernel directly)
if something needs to happen.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Trent Piepho [Fri, 31 Oct 2008 01:17:07 +0000 (18:17 -0700)]
gianfar: Don't reset TBI<->SerDes link if it's already up
The link may be up already via the chip's reset strapping, or though action
of U-Boot, or from the last time the interface was brought up. Resetting
the link causes it to go down for several seconds. This can significantly
increase the time from power-on to DHCP completion and a device being
accessible to the network.
Signed-off-by: Trent Piepho <tpiepho@freescale.com> Acked-by: Andy Fleming <afleming@freescale.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Trent Piepho [Fri, 31 Oct 2008 01:17:06 +0000 (18:17 -0700)]
gianfar: Fix race in TBI/SerDes configuration
The init_phy() function attaches to the PHY, then configures the
SerDes<->TBI link (in SGMII mode). The TBI is on the MDIO bus with the PHY
(sort of) and is accessed via the gianfar's MDIO registers, using the
functions gfar_local_mdio_read/write(), which don't do any locking.
The previously attached PHY will start a work-queue on a timer, and
probably an irq handler as well, which will talk to the PHY and thus use
the MDIO bus. This uses phy_read/write(), which have locking, but not
against the gfar_local_mdio versions.
The result is that PHY code will try to use the MDIO bus at the same time
as the SerDes setup code, corrupting the transfers.
Setting up the SerDes before attaching to the PHY will insure that there is
no race between the SerDes code and *our* PHY, but doesn't fix everything.
Typically the PHYs for all gianfar devices are on the same MDIO bus, which
is associated with the first gianfar device. This means that the first
gianfar's SerDes code could corrupt the MDIO transfers for a different
gianfar's PHY.
The lock used by phy_read/write() is contained in the mii_bus structure,
which is pointed to by the PHY. This is difficult to access from the
gianfar drivers, as there is no link between a gianfar device and the
mii_bus which shares the same MDIO registers. As far as the device layer
and drivers are concerned they are two unrelated devices (which happen to
share registers).
Generally all gianfar devices' PHYs will be on the bus associated with the
first gianfar. But this might not be the case, so simply locking the
gianfar's PHY's mii bus might not lock the mii bus that the SerDes setup
code is going to use.
We solve this by having the code that creates the gianfar platform device
look in the device tree for an mdio device that shares the gianfar's
registers. If one is found the ID of its platform device is saved in the
gianfar's platform data.
A new function in the gianfar mii code, gfar_get_miibus(), can use the bus
ID to search through the platform devices for a gianfar_mdio device with
the right ID. The platform device's driver data is the mii_bus structure,
which the SerDes setup code can use to lock the current bus.
Signed-off-by: Trent Piepho <tpiepho@freescale.com> CC: Andy Fleming <afleming@freescale.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Arjan van de Ven [Tue, 21 Oct 2008 04:42:39 +0000 (21:42 -0700)]
pci: use pci_ioremap_bar() in drivers/net
Use the newly introduced pci_ioremap_bar() function in drivers/net.
pci_ioremap_bar() just takes a pci device and a bar number, with the goal
of making it really hard to get wrong, while also having a central place
to stick sanity checks.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* Use the observation that it is sufficient to call pci_enable_wake()
once, unless it fails
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Tested-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Jeff Garzik <jeff@garzik.org> Cc: Frans Pop <elendil@planet.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Adrian Bunk [Wed, 29 Oct 2008 21:22:15 +0000 (14:22 -0700)]
The overdue eepro100 removal.
Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* Use device_set_wakeup_enable() and friends as needed
* Remove an open-coded reference to the standard PCI PM registers
* Use pci_prepare_to_sleep() and pci_back_from_sleep() in the
->suspend() and ->resume() callbacks
* Use the observation that it is sufficient to call pci_enable_wake()
once, unless it fails
Tested on Asus L5D (Yukon-Lite rev 7).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Stephen Hemminger <shemminger@linux-foundation.org> Cc: Jeff Garzik <jeff@garzik.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
David Howells [Fri, 31 Oct 2008 04:50:04 +0000 (15:50 +1100)]
CRED: Wrap task credential accesses in the XFS filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.
Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com>
This patch causes the VLAN tag to be stored in its proper location.
Tested-by: Ramon Casellas <ramon.casellas@cttc.es> Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Andy Gospodarek [Fri, 31 Oct 2008 00:41:16 +0000 (17:41 -0700)]
bonding: fix panic when taking bond interface down before removing module
A panic was discovered with bonding when using mode 5 or 6 and trying to
remove the slaves from the bond after the interface was taken down.
When calling 'ifconfig bond0 down' the following happens:
As you might guess we panic when trying to access a few entries into the
table that no longer exists.
I experimented with several options (like moving the calls to
tlb_deinitialize somewhere else), but it really makes the most sense to
be part of the bond_close routine. It also didn't seem logical move
tlb_clear_slave around too much, so the simplest option seems to add a
check in tlb_clear_slave to make sure we haven't already wiped the
tx_hashtbl away before searching for all the non-existent hash-table
entries that used to point to the slave as the output interface.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Jay Vosburgh [Fri, 31 Oct 2008 00:41:15 +0000 (17:41 -0700)]
bonding: Clean up resource leaks
This patch reworks the resource free logic performed at the time
a bonding device is released. This (a) closes two resource leaks, one
for workqueues and one for multicast lists, and (b) improves commonality
of code between the "destroy one" and "destroy all" paths by performing
final free activity via destructor instead of explicitly (and differently)
in each path.
"Sean E. Millichamp" <sean@bruenor.org> reported the workqueue
leak, and included a different patch.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
lguest: fix irq vectors.
lguest: fix early_ioremap.
lguest: fix example launcher compile after moved asm-x86 dir.
Linus Torvalds [Fri, 31 Oct 2008 01:32:03 +0000 (18:32 -0700)]
Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: remove sched-design.txt from 00-INDEX
sched: change sched_debug's mode to 0444
Linus Torvalds [Fri, 31 Oct 2008 01:31:42 +0000 (18:31 -0700)]
Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ftrace: handle archs that do not support irqs_disabled_flags
Rusty Russell [Fri, 31 Oct 2008 16:24:27 +0000 (11:24 -0500)]
lguest: fix early_ioremap.
dmi_scan_machine breaks under lguest:
lguest: unhandled trap 14 at 0xc04edeae (0xffa00000)
This is because we use current_cr3 for the read_cr3() paravirt
function, and it isn't set until the first cr3 change. We got away
with it until this happened.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Ingo Molnar [Thu, 30 Oct 2008 23:43:03 +0000 (00:43 +0100)]
x86: cpu_index build fix
fix:
arch/x86/kernel/cpu/common.c: In function 'early_identify_cpu':
arch/x86/kernel/cpu/common.c:553: error: 'struct cpuinfo_x86' has no member named 'cpu_index'
James Bottomley [Thu, 30 Oct 2008 21:13:37 +0000 (16:13 -0500)]
x86/voyager: fix missing cpu_index initialisation
Impact: fix /proc/cpuinfo output on x86/Voyager
Ever since
| commit 92cb7612aee39642d109b8d935ad265e602c0563
| Author: Mike Travis <travis@sgi.com>
| Date: Fri Oct 19 20:35:04 2007 +0200
|
| x86: convert cpuinfo_x86 array to a per_cpu array
We've had an extra field in cpuinfo_x86 which is cpu_index.
Unfortunately, voyager has never initialised this, although the only
noticeable impact seems to be that /proc/cpuinfo shows all zeros for
the processor ids.
Anyway, fix this by initialising the boot CPU properly and setting the
index when the secondaries update.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| Author: Suresh Siddha <suresh.b.siddha@intel.com>
| Date: Tue Jul 29 10:29:19 2008 -0700
|
| x86, xsave: enable xsave/xrstor on cpus with xsave support
Which deliberately expose boot cpu dependence to pieces of the system,
I think it's time to explicitly have a variable for it to prevent this
continual misassumption that the boot CPU is zero.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Fri, 24 Oct 2008 13:42:59 +0000 (09:42 -0400)]
ftrace: handle archs that do not support irqs_disabled_flags
Impact: build fix on non-lockdep architectures
Some architectures do not support a way to read the irq flags that
is set from "local_irq_save(flags)" to determine if interrupts were
disabled or enabled. Ftrace uses this information to display to the user
if the trace occurred with interrupts enabled or disabled.
Besides the fact that those archs that do not support this will fail to
compile, unless they fix it, we do not want to have the trace simply
say interrupts were not disabled or they were enabled, without knowing
the real answer.
This patch adds a 'X' in the output to let the user know that the
architecture they are running on does not support a way for the tracer
to determine if interrupts were enabled or disabled. It also lets those
same archs compile with tracing enabled.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86: fix /dev/mem mmap breakage when PAT is disabled
Impact: allow /dev/mem mmaps on non-PAT CPUs/platforms
Fix mmap to /dev/mem when CONFIG_X86_PAT is off and CONFIG_STRICT_DEVMEM is
off
mmap to /dev/mem on kernel memory has been failing since the
introduction of PAT (CONFIG_STRICT_DEVMEM=n case). Seems like
the check to avoid cache aliasing with PAT is kicking in even
when PAT is disabled. The bug seems to have crept in 2.6.26.
This patch makes sure that the mmap to regular
kernel memory succeeds if CONFIG_STRICT_DEVMEM=n and
PAT is disabled, and the checks to avoid cache aliasing
still happens if PAT is enabled.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Tested-by: Tim Sirianni <tim@scalemp.com> Cc: <stable@kernel.org> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
prefill_possible_mask() was hidden under CONFIG_HOTPLUG_CPU rendering
it invisitble to voyager. Since this commit it's exposed, but not
provided by the voyager subarch, so add a dummy stub to fix the link
breakage.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
James Bottomley [Thu, 30 Oct 2008 21:05:39 +0000 (16:05 -0500)]
x86: use CONFIG_X86_SMP instead of CONFIG_SMP
Impact: fix x86/Voyager boot
CONFIG_SMP is used for features which work on *all* x86 boxes.
CONFIG_X86_SMP is used for standard PC like x86 boxes (for things like
multi core and apics)
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
James Bottomley [Thu, 30 Oct 2008 21:08:38 +0000 (16:08 -0500)]
x86/voyager: fix boot breakage caused by x86: boot secondary cpus through initial_code
Impact: boot up secondary CPUs as well on x86/Voyager systems
This commit:
| commit 3e9704739daf46a8ba6593d749c67b5f7cd633d2
| Author: Glauber Costa <gcosta@redhat.com>
| Date: Wed May 28 13:01:54 2008 -0300
|
| x86: boot secondary cpus through initial_code
removed the use of initialize_secondary. However, it didn't update
voyager, so the secondary cpus no longer boot. Fix this by adding the
initial_code switch to voyager as well.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Glauber Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Chuck Lever [Thu, 23 Oct 2008 04:50:35 +0000 (00:50 -0400)]
NLM: Set address family before calling nlm_host_rebooted()
The nlm_host_rebooted() function uses nlm_cmp_addr() to find an
nsm_handle that matches the rebooted peer. In order for this to work,
the passed-in address must have a proper address family.
This fixes a post-2.6.28 regression introduced by commit 781b61a6, which
added AF_INET6 support to nlm_cmp_addr(). Before that commit,
nlm_cmp_addr() didn't care about the address family; it compared only
the sin_addr.s_addr field for equality.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
J. Bruce Fields [Thu, 30 Oct 2008 17:48:33 +0000 (13:48 -0400)]
nfsd: fix failure to set eof in readdir in some situations
Before 14f7dd632011bb89c035722edd6ea0d90ca6b078 "[PATCH] Copy XFS
readdir hack into nfsd code", readdir_cd->err was reset to eof before
each call to vfs_readdir; afterwards, it is set only once. Similarly, c002a6c7977320f95b5edede5ce4e0eeecf291ff "[PATCH] Optimise NFS readdir
hack slightly", can cause us to exit without nfserr_eof set. Fix this.
This ensures the "eof" bit is set when needed in readdir replies. (The
particular case I saw was an nfsv4 readdir of an empty directory, which
returned with no entries (the protocol requires "." and ".." to be
filtered out), but with eof unset.)
Cc: David Woodhouse <David.Woodhouse@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Steven Rostedt [Thu, 30 Oct 2008 20:08:32 +0000 (16:08 -0400)]
ftrace: nmi safe code modification
Impact: fix crashes that can occur in NMI handlers, if their code is modified
Modifying code is something that needs special care. On SMP boxes,
if code that is being modified is also being executed on another CPU,
that CPU will have undefined results.
The dynamic ftrace uses kstop_machine to make the system act like a
uniprocessor system. But this does not address NMIs, that can still
run on other CPUs.
One approach to handle this is to make all code that are used by NMIs
not be traced. But NMIs can call notifiers that spread throughout the
kernel and this will be very hard to maintain, and the chance of missing
a function is very high.
The approach that this patch takes is to have the NMIs modify the code
if the modification is taking place. The way this works is that just
writing to code executing on another CPU is not harmful if what is
written is the same as what exists.
Two buffers are used: an IP buffer and a "code" buffer.
The steps that the patcher takes are:
1) Put in the instruction pointer into the IP buffer
and the new code into the "code" buffer.
2) Set a flag that says we are modifying code
3) Wait for any running NMIs to finish.
4) Write the code
5) clear the flag.
6) Wait for any running NMIs to finish.
If an NMI is executed, it will also write the pending code.
Multiple writes are OK, because what is being written is the same.
Then the patcher must wait for all running NMIs to finish before
going to the next line that must be patched.
This is basically the RCU approach to code modification.
Thanks to Ingo Molnar for suggesting the idea, and to Arjan van de Ven
for his guidence on what is safe and what is not.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>