pilppa.org Git - linux-2.6-omap-h63xx.git/log
18 years ago[PATCH] powerpc: avoid timer interrupt replay effect when onlining cpu
Nathan Lynch [Tue, 7 Feb 2006 04:44:23 +0000 (22:44 -0600)]
[PATCH] powerpc: avoid timer interrupt replay effect when onlining cpu

When a cpu is hotplug-onlined, if we don't set per_cpu(last_jiffy) to
something sane, timer_interrupt will execute its while loop for every
tick missed since the cpu was last online (or since the system was
booted, if we're adding a new cpu).  This can cause weird hangs, ssh
sessions dropping, and we can even go xmon if we take a global IPI at
the wrong time.
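
The replay effect described above can be sketched in plain C.  This is only
an illustration, not the kernel code: struct cpu_timer, ticks_to_replay()
and cpu_online_resync() are hypothetical names, and the loop is a simplified
stand-in for the catch-up loop in timer_interrupt.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-cpu last_jiffy bookkeeping. */
struct cpu_timer {
    uint64_t last_jiffy;   /* timebase value of the last tick accounted on this cpu */
};

/*
 * Count how many catch-up iterations a timer_interrupt-style loop would
 * run: one per jiffy that has elapsed since last_jiffy.
 */
static uint64_t ticks_to_replay(const struct cpu_timer *t, uint64_t now,
                                uint64_t tb_ticks_per_jiffy)
{
    uint64_t last = t->last_jiffy;
    uint64_t n = 0;

    while (now - last >= tb_ticks_per_jiffy) {
        last += tb_ticks_per_jiffy;
        n++;
    }
    return n;
}

/* The idea of the fix: resynchronise before the cpu takes its first tick. */
static void cpu_online_resync(struct cpu_timer *t, uint64_t now)
{
    t->last_jiffy = now;
}
```

With a stale last_jiffy the loop count grows with the time the cpu was
offline; after the resync it drops to zero until the next real tick is due.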

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc: hypervisor check in pseries_kexec_cpu_down
Michael Neuling [Mon, 6 Feb 2006 23:58:21 +0000 (10:58 +1100)]
[PATCH] powerpc: hypervisor check in pseries_kexec_cpu_down

We call unregister_vpa but we don't check to see if the hypervisor
supports this.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Anton Blanchard <anton@samba.org>
--
 arch/powerpc/platforms/pseries/setup.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] documentation/powerpc: add bus-frequency property to SOC node
Becky Bruce [Mon, 6 Feb 2006 20:26:31 +0000 (14:26 -0600)]
[PATCH] documentation/powerpc: add bus-frequency property to SOC node

Updated SOC node definition in documentation to include bus-frequency
property. Also extended mdio example to match specification.

Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Kumar Gala <galak@gate.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc: Don't use toc in decrementer_iSeries_masked
Michael Ellerman [Tue, 7 Feb 2006 02:26:14 +0000 (13:26 +1100)]
[PATCH] powerpc: Don't use toc in decrementer_iSeries_masked

Since 404849bbd2bfd62e05b36f4753f6e1af6050a824 we've been using
LOAD_REG_ADDRBASE, which uses the toc pointer, in decrementer_iSeries_masked.

This can explode if we take the decrementer interrupt while we're in a module,
because the toc pointer in r2 will be the module's toc pointer.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc: Cleanup, consolidating icache dirtying logic
David Gibson [Mon, 6 Feb 2006 02:24:53 +0000 (13:24 +1100)]
[PATCH] powerpc: Cleanup, consolidating icache dirtying logic

The code to mark a page as icache dirty (so that it will later be
icache-dcache flushed when we try to execute from it) is duplicated in
three places: flush_dcache_page() does this marking and nothing else,
but clear_user_page() and copy_user_page() duplicate it, since those
functions make the page icache dirty themselves.

This patch makes those other functions call flush_dcache_page()
instead, so the logic's all in one place.  This will make life less
confusing if we ever need to tweak the details of the lazy icache
flush mechanism.

 arch/powerpc/mm/mem.c |   14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] Don't check pointer for NULL before passing it to kfree [arch/powerpc/kernel...
Jesper Juhl [Sat, 4 Feb 2006 19:35:59 +0000 (20:35 +0100)]
[PATCH] Don't check pointer for NULL before passing it to kfree [arch/powerpc/kernel/rtas_flash.c]

Checking a pointer for NULL before passing it to kfree is pointless; kfree
does its own NULL checking of input.
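
A userspace sketch of the same cleanup, using free() as an analogue of kfree
(the C standard defines free(NULL) as a no-op).  struct flash_buf and both
helper names are hypothetical and do not come from rtas_flash.c.

```c
#include <stdlib.h>

/* Hypothetical buffer holder, for illustration only. */
struct flash_buf {
    char *data;
};

/* Before: the NULL test duplicates a check that free() already performs. */
static void release_checked(struct flash_buf *b)
{
    if (b->data)
        free(b->data);
    b->data = NULL;
}

/* After: pass the pointer straight through; free(NULL) does nothing. */
static void release_plain(struct flash_buf *b)
{
    free(b->data);
    b->data = NULL;
}
```

Both helpers behave identically; the second simply has one branch less.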

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc: fix compile warning in udbg_init_maple_realmode
Olaf Hering [Sat, 4 Feb 2006 12:33:46 +0000 (13:33 +0100)]
[PATCH] powerpc: fix compile warning in udbg_init_maple_realmode

arch/powerpc/kernel/udbg_16550.c: In function `udbg_init_maple_realmode':
arch/powerpc/kernel/udbg_16550.c:162: warning: assignment from incompatible pointer type

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc: add refcounting to setup_peg2 and of_get_pci_address
Olaf Hering [Sat, 4 Feb 2006 11:55:41 +0000 (12:55 +0100)]
[PATCH] powerpc: add refcounting to setup_peg2 and of_get_pci_address

setup_peg2 must do some refcounting.
of_get_pci_address may need to drop the node.

Pegasos l2cr : L2 cache was not active, activating
PCI bus 0 controlled by pci at 80000000
Badness in kref_get at /home/olaf/kernel/olh/ppc64/linux-2.6.16-rc2-olh/lib/kref.c:32
Call Trace:
[C037BD00] [C0007934] show_stack+0x5c/0x184 (unreliable)
[C037BD30] [C000E068] program_check_exception+0x184/0x584
[C037BD90] [C000F5F0] ret_from_except_full+0x0/0x4c
--- Exception: 700 at kref_get+0xc/0x24
    LR = of_node_get+0x24/0x3c
[C037BE50] [C004FD94] __pte_alloc_kernel+0x64/0x80 (unreliable)
[C037BE70] [C000CA18] of_get_parent+0x34/0x58
[C037BE90] [C0009B18] of_get_address+0x24/0x174
[C037BED0] [C000A108] of_address_to_resource+0x24/0x68
[C037BF00] [C038B128] chrp_find_bridges+0x114/0x470
[C037BF90] [C038AE48] chrp_setup_arch+0x1fc/0x32c
[C037BFB0] [C03849B0] setup_arch+0x144/0x188
[C037BFD0] [C037C45C] start_kernel+0x34/0x1a8
[C037BFF0] [000037A0] 0x37a0
Badness in kref_get at /home/olaf/kernel/olh/ppc64/linux-2.6.16-rc2-olh/lib/kref.c:32
Call Trace:
[C037BC90] [C0007934] show_stack+0x5c/0x184 (unreliable)
[C037BCC0] [C000E068] program_check_exception+0x184/0x584
[C037BD20] [C000F5F0] ret_from_except_full+0x0/0x4c
--- Exception: 700 at kref_get+0xc/0x24
    LR = of_node_get+0x24/0x3c
[C037BDE0] [00000000] 0x0 (unreliable)
[C037BE00] [C000CA18] of_get_parent+0x34/0x58
[C037BE20] [C0009CE8] of_translate_address+0x2c/0x2fc
[C037BEA0] [C0009FE8] __of_address_to_resource+0x30/0xc4
[C037BED0] [C000A130] of_address_to_resource+0x4c/0x68
[C037BF00] [C038B128] chrp_find_bridges+0x114/0x470
[C037BF90] [C038AE48] chrp_setup_arch+0x1fc/0x32c
[C037BFB0] [C03849B0] setup_arch+0x144/0x188
[C037BFD0] [C037C45C] start_kernel+0x34/0x1a8
[C037BFF0] [000037A0] 0x37a0
PCI bus 0 controlled by pci at c0000000
Top of RAM: 0x10000000, Total RAM: 0x10000000

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc: remove pointer/integer confusion in of_find_node_by_name
Olaf Hering [Sat, 4 Feb 2006 11:44:56 +0000 (12:44 +0100)]
[PATCH] powerpc: remove pointer/integer confusion in of_find_node_by_name

remove pointer/integer confusion

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc: restore clock speed in /proc/cpuinfo
Olaf Hering [Sat, 4 Feb 2006 10:05:33 +0000 (11:05 +0100)]
[PATCH] powerpc: restore clock speed in /proc/cpuinfo

Use generic_calibrate_decr to restore missing clock: speed in /proc/cpuinfo

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc: remove pointer/integer confusion in generic_calibrate_decr
Olaf Hering [Sat, 4 Feb 2006 09:34:56 +0000 (10:34 +0100)]
[PATCH] powerpc: remove pointer/integer confusion in generic_calibrate_decr

remove pointer/integer confusion

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc: Don't overwrite flat device tree with kdump kernel
Michael Ellerman [Fri, 3 Feb 2006 08:05:47 +0000 (19:05 +1100)]
[PATCH] powerpc: Don't overwrite flat device tree with kdump kernel

It's possible for prom_init to allocate the flat device tree inside the
kdump crash kernel region. If this happens, when we load the kdump kernel we
overwrite the flattened device tree, which is bad.

We could make prom_init try and avoid allocating inside the crash kernel
region, but then we run into issues if the crash kernel region uses all the
space inside the RMO. The easiest solution is to move the flat device tree
once we're running in the kernel.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc: remove useless call to touch_softlockup_watchdog
Dave C Boutcher [Fri, 3 Feb 2006 07:18:36 +0000 (01:18 -0600)]
[PATCH] powerpc: remove useless call to touch_softlockup_watchdog

It turns out that we can't stop the watchdog from
triggering here.  If we touch the timer (which just uses the current jiffies
value) before we enable interrupts, it does nothing because jiffies
are not mass-updated until after we enable interrupts.  If we touch the
timer after we enable interrupts, it's too late because the softlockup
watchdog will already have triggered.  The touch_softlockup_watchdog
call removed below does nothing.

Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc: prod all processors after ibm,suspend-me
Dave C Boutcher [Fri, 3 Feb 2006 07:18:39 +0000 (01:18 -0600)]
[PATCH] powerpc: prod all processors after ibm,suspend-me

We need to prod everyone here since this is the only CPU that is
guaranteed to be running after the ibm,suspend-me RTAS call returns.

Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc: return correct rtas status from ibm,suspend-me
Dave C Boutcher [Fri, 3 Feb 2006 07:18:46 +0000 (01:18 -0600)]
[PATCH] powerpc: return correct rtas status from ibm,suspend-me

Correctly return the status from the RTAS call.  rtas_call expects
to return the status as a return value.

Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc: Fix !SMP build of rtas.c
Michael Ellerman [Tue, 31 Jan 2006 06:17:47 +0000 (17:17 +1100)]
[PATCH] powerpc: Fix !SMP build of rtas.c

arch/powerpc/kernel/rtas.c is getting hvcall.h via spinlock.h, but when we're
building for UP we don't include spinlock.h.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc: IOMMU SG paranoia
Jake Moilanen [Tue, 31 Jan 2006 03:51:54 +0000 (21:51 -0600)]
[PATCH] powerpc: IOMMU SG paranoia

This addresses two items, which are unlikely to be hit if we
trust drivers.

The first is moving a memory barrier below where the vmerged SG count
is passed back, but before the list is set to end.  If those
instructions were reordered, there could be an issue in iommu_unmap_sg().

The second is making sure we terminate the list on the failure case of
iommu_map_sg().  If a driver does not look at the failure return code,
it could pass an ill-formed SG list to iommu_unmap_sg().

Signed-off-by: Jake Moilanen <moilanen@austin.ibm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc: Refuse to boot a kdump kernel via OF
Michael Ellerman [Wed, 25 Jan 2006 08:48:48 +0000 (21:48 +1300)]
[PATCH] powerpc: Refuse to boot a kdump kernel via OF

You can't boot a kdump kernel via OF, at least not reliably: the kernel being
at 32 MB conflicts with the zImage wrapper etc., and it blows up.

It's trivial to check in prom_init though, and this is early enough that we can
actually drop back to OF where a reset-all will get you going again, which is
kinda nice. I think this should go in for 2.6.16.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc: Make sure we don't create empty lmb regions
Michael Ellerman [Wed, 25 Jan 2006 08:31:26 +0000 (21:31 +1300)]
[PATCH] powerpc: Make sure we don't create empty lmb regions

To prevent problems later in boot, make sure we don't create zero-size lmb
regions.

I've checked all the callers, and at the moment no one should ever hit this.
All callers use a constant size, or they check the computed size before they
call us.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc: Don't allocate zero bytes in finish_device_tree()
Michael Ellerman [Wed, 25 Jan 2006 08:31:25 +0000 (21:31 +1300)]
[PATCH] powerpc: Don't allocate zero bytes in finish_device_tree()

In prom.c we run finish_node() on allnodes twice. The first time we just
calculate how much memory we'll need, the second time we do the actual work.

If the calculation stage determines that we need 0 bytes, then we should skip
the lmb allocation. Although an alloc of zero will work, it has been seen to
lead to a BUG_ON() in reserve_bootmem() on at least one machine.
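
The measure-then-allocate pattern with a zero-size guard can be sketched as
follows.  This is a userspace illustration only: alloc_for_nodes() is a
hypothetical helper and malloc() stands in for the lmb allocation.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * First pass: add up how much memory the nodes need.
 * Second step: allocate only if the total is non-zero.
 * Returns NULL (and *total_out == 0) when there is nothing to allocate.
 */
static void *alloc_for_nodes(const size_t *node_sizes, size_t count,
                             size_t *total_out)
{
    size_t total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += node_sizes[i];
    *total_out = total;

    if (total == 0)
        return NULL;    /* skip the allocation entirely */

    return malloc(total);
}
```

The caller then has to treat a zero total as "no work to do" rather than as
an allocation failure.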

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc/8xx: last two 8MB D-TLB entries are incorrectly set
Marcelo Tosatti [Mon, 23 Jan 2006 15:57:06 +0000 (13:57 -0200)]
[PATCH] powerpc/8xx: last two 8MB D-TLB entries are incorrectly set

The last two 8MB TLB entries are being incorrectly set by initial_mmu on 8xx.

The first entry is written with the same virtual/physical address, which
renders it invalid:

BDI>rms 792 0x00001e00
BDI>rms 824 1
BDI>rds 824
SPR  824 : 0xc08000c0  -1065353024
BDI>rds 825
SPR  825 : 0xc0800de0  -1065349664
BDI>rds 826
SPR  826 : 0x00000000            0

And the second entry, in addition, does not have its TLB index set
correctly.

Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years ago[PATCH] powerpc: Fix spufs initialization sequence.
Geoff Levand [Tue, 24 Jan 2006 01:37:11 +0000 (17:37 -0800)]
[PATCH] powerpc: Fix spufs initialization sequence.

This is a small fix to get the spufs init sequence right.

init_spu_base() in spu_base.c should be called (via
module_init(init_spu_base)) before spufs_init() (via
module_init(spufs_init)) in spufs/inode.c gets called.

Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years agopowerpc/64: Fix bug in setting floating-point exception mode
Paul Mackerras [Tue, 7 Feb 2006 02:55:30 +0000 (13:55 +1100)]
powerpc/64: Fix bug in setting floating-point exception mode

When loading up the FPU, we were using a 'ld' (load doubleword)
instruction to get the FP exception mode from the thread_struct,
but it's only an int field.  This changes the ld to lwz (load
word and zero-extend).
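
The difference between the two loads can be imitated in C.  This is a
sketch under assumptions: struct thread_sketch and its field names are
made up for illustration, and memcpy is used to model an 8-byte and a
4-byte load from the same address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: an int-sized field followed by unrelated data. */
struct thread_sketch {
    uint32_t fpexc_mode;   /* the field we actually want */
    uint32_t neighbour;    /* whatever happens to follow it */
};

/* Models 'ld': reads 8 bytes, so the neighbouring field leaks in. */
static uint64_t load_doubleword(const struct thread_sketch *t)
{
    uint64_t v;

    memcpy(&v, (const unsigned char *)t +
               offsetof(struct thread_sketch, fpexc_mode), sizeof(v));
    return v;
}

/* Models 'lwz': reads 4 bytes and zero-extends to 64 bits. */
static uint64_t load_word_zero_extend(const struct thread_sketch *t)
{
    uint32_t v;

    memcpy(&v, (const unsigned char *)t +
               offsetof(struct thread_sketch, fpexc_mode), sizeof(v));
    return (uint64_t)v;
}
```

On a big-endian machine the 8-byte load puts the wanted field in the upper
half of the result; on a little-endian machine it lands in the lower half
with the neighbour above it.  Either way the value differs from the field
whenever the neighbouring bytes are non-zero.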

Signed-off-by: Paul Mackerras <paulus@samba.org>
18 years agoMerge ../linux-2.6
Paul Mackerras [Mon, 6 Feb 2006 23:43:36 +0000 (10:43 +1100)]
Merge ../linux-2.6

18 years ago[PATCH] USB: Fix GPL markings on usb core functions.
Greg KH [Sun, 5 Feb 2006 22:16:08 +0000 (14:16 -0800)]
[PATCH] USB: Fix GPL markings on usb core functions.

I thought we had fixed up all non-gpl USB drivers, and was wrong to do
this.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years agomm/slab.c (non-NUMA): Fix compile warning and clean up code
Linus Torvalds [Sun, 5 Feb 2006 19:26:38 +0000 (11:26 -0800)]
mm/slab.c (non-NUMA): Fix compile warning and clean up code

The non-NUMA case would do an unmatched "free_alien_cache()" on an alien
pointer that had never been allocated.

It might not matter from a code generation standpoint (since in the
non-NUMA case, the code doesn't actually _do_ anything), but it not only
results in a compiler warning, it's really really ugly too.

Fix the compiler warning by just having a matching dummy allocation.
That also avoids an unnecessary #ifdef in the code.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
Linus Torvalds [Sun, 5 Feb 2006 19:10:54 +0000 (11:10 -0800)]
Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6

18 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Sun, 5 Feb 2006 19:10:29 +0000 (11:10 -0800)]
Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

18 years ago[PATCH] kconfig: detect if -lintl is needed when linking conf,mconf
Robb, Sam [Sun, 5 Feb 2006 07:28:06 +0000 (23:28 -0800)]
[PATCH] kconfig: detect if -lintl is needed when linking conf,mconf

On a system where libintl.h is present, but the NLS functionality is
supplied by a separate library instead of the system C library, an attempt
to "make config" or "make menuconfig" will fail with link errors, ex:

  scripts/kconfig/mconf.o:mconf.c:(.text+0xf63): undefined reference to
    `_libintl_gettext'

This patch attempts to correct the problem by detecting whether or not NLS
support requires linking with libintl.

Signed-off-by: Samuel J Robb <sam.robb@timesys.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] i386: HIGHMEM64G must depend on X86_CMPXCHG64
Adrian Bunk [Sun, 5 Feb 2006 07:28:05 +0000 (23:28 -0800)]
[PATCH] i386: HIGHMEM64G must depend on X86_CMPXCHG64

Due to the usage of set_64bit in include/asm-i386/pgtable-3level.h,
HIGHMEM64G must depend on X86_CMPXCHG64.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Fix "value computed is not used" compile warnings with gcc-4.1
Takashi Iwai [Sun, 5 Feb 2006 07:28:05 +0000 (23:28 -0800)]
[PATCH] Fix "value computed is not used" compile warnings with gcc-4.1

Fix gcc4.1 compile warnings "value computed is not used" with
set_current_state() and set_task_state() on i386/SMP and x86-64.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] i386: print kernel version in register dumps
Chuck Ebbert [Sun, 5 Feb 2006 07:28:04 +0000 (23:28 -0800)]
[PATCH] i386: print kernel version in register dumps

Show first field of kernel version in register dumps like x86_64 does.

Changes output from e.g.:
(2.6.16-rc1)
to:
(2.6.16-rc1 #12)

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] i386 cpu hotplug: don't access freed memory
Chuck Ebbert [Sun, 5 Feb 2006 07:28:03 +0000 (23:28 -0800)]
[PATCH] i386 cpu hotplug: don't access freed memory

i386 CPU init code accesses freed init memory when booting a newly-started
processor after CPU hotplug.  The cpu_devs array is searched to find the
vendor and it contains pointers to freed data.

Fix that by:

        1. Zeroing entries for freed vendor data after bootup.
        2. Changing Transmeta, NSC and UMC to all __init[data].
        3. Printing a warning (once only) and setting this_cpu
           to a safe default when the vendor is not found.

This does not change behavior for AMD systems.  They were broken already
but no error was reported.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] namei.c: unlock missing in error case
Ulrich Drepper [Sun, 5 Feb 2006 07:28:02 +0000 (23:28 -0800)]
[PATCH] namei.c: unlock missing in error case

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] VFS: Ensure LOOKUP_CONTINUE flag is preserved by link_path_walk()
Trond Myklebust [Sun, 5 Feb 2006 07:28:01 +0000 (23:28 -0800)]
[PATCH] VFS: Ensure LOOKUP_CONTINUE flag is preserved by link_path_walk()

When walking a path, the LOOKUP_CONTINUE flag is used by some filesystems
(for instance NFS) in order to determine whether or not it is looking up
the last component of the path.  If this is the case, it may have to look
at the intent information in order to perform various tasks such as atomic
open.

A problem currently occurs when link_path_walk() hits a symlink.  In this
case LOOKUP_CONTINUE may be cleared prematurely when we hit the end of the
path passed by __vfs_follow_link() (i.e.  the end of the symlink path)
rather than when we hit the end of the path passed by the user.

The solution is to have link_path_walk() clear LOOKUP_CONTINUE if and only
if that flag was unset when we entered the function.
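
A minimal sketch of the "clear only if it was unset on entry" rule.  The
flag value, struct walk_state and walk_components() are hypothetical
simplifications and not the code in fs/namei.c.

```c
/* Hypothetical flag value, chosen only for this sketch. */
#define LOOKUP_CONTINUE_SKETCH 0x4u

struct walk_state {
    unsigned int flags;
};

/*
 * Walk ncomponents path components.  The flag is set while more
 * components follow.  On the last component it is cleared only if the
 * caller did not already have it set on entry, so a nested walk of a
 * symlink body cannot clear the flag of the outer walk.
 */
static void walk_components(struct walk_state *ws, int ncomponents)
{
    unsigned int entry_flags = ws->flags;
    int i;

    for (i = 0; i < ncomponents; i++) {
        if (i + 1 < ncomponents)
            ws->flags |= LOOKUP_CONTINUE_SKETCH;
        else
            ws->flags &= entry_flags | ~LOOKUP_CONTINUE_SKETCH;
    }
}
```

The mask keeps every bit untouched except LOOKUP_CONTINUE_SKETCH, which
survives only when it was present in entry_flags.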

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] NUMA slab locking fixes: fix cpu down and up locking
Ravikiran G Thirumalai [Sun, 5 Feb 2006 07:27:59 +0000 (23:27 -0800)]
[PATCH] NUMA slab locking fixes: fix cpu down and up locking

This fixes locking and bugs in cpu_down and cpu_up paths of the NUMA slab
allocator.  Sonny Rao <sonny@burdell.org> reported problems some time back on
POWER5 boxes, when the last cpu on a node was being offlined.  We could
not reproduce the same on x86_64 because the cpumask (node_to_cpumask) was not
being updated on cpu down.  Since that issue is now fixed, we can reproduce
Sonny's problems on x86_64 NUMA, and here is the fix.

The problem earlier was on CPU_DOWN, if it was the last cpu on the node to go
down, the array_caches (shared, alien) and the kmem_list3 of the node were
being freed (kfree) with the kmem_list3 lock held.  If the l3 or the
array_caches were to come from the same cache being cleared, we hit on
badness.

This patch cleans up the locking in the cpu_up and cpu_down paths.  We cannot
really free l3 on cpu down because there is no node offlining yet, and even
though a cpu is not yet up, node local memory can be allocated for it.  So l3s
are usually allocated at kmem_cache_create and destroyed at
kmem_cache_destroy.  Hence, we don't need cachep->spinlock protection to get
to the cachep->nodelist[nodeid] either.

Patch survived onlining and offlining on a 4 core 2 node Tyan box with a 4
dbench process running all the time.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] NUMA slab locking fixes: irq disabling from cachep->spinlock to l3 lock
Ravikiran G Thirumalai [Sun, 5 Feb 2006 07:27:58 +0000 (23:27 -0800)]
[PATCH] NUMA slab locking fixes: irq disabling from cachep->spinlock to l3 lock

Earlier, we had to disable on-chip interrupts while taking the
cachep->spinlock because, at cache_grow, on every addition of a slab to a slab
cache, we incremented colour_next, which was protected by the cachep->spinlock,
and cache_grow could occur in interrupt context.  Since we now protect the
per-node colour_next with the node's list_lock, we no longer need to disable
on-chip interrupts while taking the per-cache spinlock; we only need to
disable interrupts when taking the per-node kmem_list3 list_lock.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] NUMA slab locking fixes: move color_next to l3
Ravikiran G Thirumalai [Sun, 5 Feb 2006 07:27:56 +0000 (23:27 -0800)]
[PATCH] NUMA slab locking fixes: move color_next to l3

colour_next is used as an index to add a colouring offset to a new slab in the
cache (colour_off * colour_next).  Now with the NUMA aware slab allocator, it
makes sense to colour slabs added on the same node sequentially with
colour_next.

This patch moves the colouring index "colour_next" per-node by placing it on
kmem_list3 rather than kmem_cache.
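
A sketch of per-node colouring, assuming the offset is computed as
colour_off * colour_next as stated above.  The struct and function names
are hypothetical simplifications of kmem_cache and kmem_list3, and the
wrap-around at the number of colours is an assumption of this sketch.

```c
/* Hypothetical per-cache parameters. */
struct cache_sketch {
    unsigned int colour;       /* number of distinct colours */
    unsigned int colour_off;   /* byte offset between colours */
};

/* Hypothetical per-node state; the colouring index now lives here. */
struct node_lists_sketch {
    unsigned int colour_next;
};

/*
 * Return the colouring offset for the next slab added on this node and
 * advance the node's index, wrapping at the number of colours.
 */
static unsigned int next_colour_offset(const struct cache_sketch *c,
                                       struct node_lists_sketch *l3)
{
    unsigned int idx = l3->colour_next;

    l3->colour_next++;
    if (l3->colour_next >= c->colour)
        l3->colour_next = 0;

    return idx * c->colour_off;
}
```

Because each node carries its own index, slabs added on one node cycle
through the colours in sequence regardless of activity on other nodes.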

This also helps simplify locking for CPU up and down paths.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] hugetlb: add comment explaining reasons for Bus Errors
Christoph Lameter [Sun, 5 Feb 2006 07:27:55 +0000 (23:27 -0800)]
[PATCH] hugetlb: add comment explaining reasons for Bus Errors

I just spent some time researching a Bus Error.  Turns out that the huge
page fault handler can return VM_FAULT_SIGBUS for various conditions where
no huge page is available.

Add a note explaining the reasoning in the source.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] jbd: fix transaction batching
Andrew Morton [Sun, 5 Feb 2006 07:27:54 +0000 (23:27 -0800)]
[PATCH] jbd: fix transaction batching

Ben points out that:

  When writing files out using O_SYNC, jbd's 1 jiffy delay results in a
  significant drop in throughput as the disk sits idle.  The patch below
  results in a 4-5x performance improvement (from 6.5MB/s to ~24-30MB/s on my
  IDE test box) when writing out files using O_SYNC.

So optimise the batching code by omitting it entirely if the process which is
doing a sync write is the same as the one which did the most recent sync
write.  If that's true, we're unlikely to get any other processes joining the
transaction.
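
The decision described above can be sketched as a small helper.  The names
journal_sketch and should_wait_for_batch() are hypothetical; the real code
lives in the jbd transaction path and is not reproduced here.

```c
/* Hypothetical journal state: remembers who did the last sync write. */
struct journal_sketch {
    int last_sync_writer;   /* pid of the most recent sync writer */
};

/*
 * Return 1 if the caller should wait briefly so that other processes can
 * join the transaction, 0 if the wait should be skipped because the same
 * process also issued the previous sync write.
 */
static int should_wait_for_batch(struct journal_sketch *j, int pid)
{
    int wait = (j->last_sync_writer != pid);

    j->last_sync_writer = pid;
    return wait;
}
```

A single process doing back-to-back sync writes pays the delay at most
once; alternating writers still get the batching window.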

(Has been in -mm for ages - it took me a long time to get on to performance
testing it)

Numbers, on write-cache-disabled IDE:

/usr/bin/time -p synctest -n 10 -uf -t 1 -p 1 dir-name

Unpatched:
40 seconds
Patched:
35 seconds
Batching disabled:
35 seconds

This is the problematic single-process-doing-fsync case.  With multiple
fsyncing processes the numbers are AFAICT unaltered by the patch.

Aside: performance testing and instrumentation shows that the transaction
batching almost doesn't help (testing with synctest -n 1 -uf -t 100 -p 10
dir-name on non-writeback-caching IDE).  This is because by the time one
process is running a synchronous commit, a bunch of other processes already
have a transaction handle open, so they're all going to batch into the same
transaction anyway.

The batching seems to offer maybe 5-10% speedup with this workload, but I'm
pretty sure it was more important than that when it was first developed 4-odd
years ago...

Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] reiserfs_get_acl() build fix
Andrew Morton [Sun, 5 Feb 2006 07:27:51 +0000 (23:27 -0800)]
[PATCH] reiserfs_get_acl() build fix

With CONFIG_REISERFS_FS_XATTR=y, CONFIG_REISERFS_FS_POSIX_ACL=n:

fs/reiserfs/xattr.c: In function `reiserfs_check_acl':
fs/reiserfs/xattr.c:1330: called object is not a function

Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86: fix stack trace facility level
Hugh Dickins [Sun, 5 Feb 2006 07:27:51 +0000 (23:27 -0800)]
[PATCH] x86: fix stack trace facility level

dump_stack() on page allocation failure presently has an irritating habit
of shouting just "====" at everyone: please stop it.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] selinux: require SECURITY_NETWORK
Stephen Smalley [Sun, 5 Feb 2006 07:27:50 +0000 (23:27 -0800)]
[PATCH] selinux: require SECURITY_NETWORK

Make SELinux depend on SECURITY_NETWORK (which depends on SECURITY), as it
requires the socket hooks for proper operation even in the local case.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] missing license tag in intermodule
Dave Jones [Sun, 5 Feb 2006 07:27:49 +0000 (23:27 -0800)]
[PATCH] missing license tag in intermodule

It may suck something awful, but it shouldn't taint the kernel.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] pktcdvd: Allow larger packets
Phillip Susi [Sun, 5 Feb 2006 07:27:48 +0000 (23:27 -0800)]
[PATCH] pktcdvd: Allow larger packets

The pktcdvd driver uses a compile time macro constant to define the maximum
supported packet length.  I changed this from 32 sectors to 128 sectors
because that allows over 100 MB of additional usable space on a 700 MB cdrw,
and increases throughput.

Note that you need a modified cdrwtool program that can format a CDRW disc
with larger packets to benefit from this change.

Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] pktcdvd: Don't waste kernel memory
Peter Osterlund [Sun, 5 Feb 2006 07:27:47 +0000 (23:27 -0800)]
[PATCH] pktcdvd: Don't waste kernel memory

Allocate memory for read-gathering at open time, when it is known just how
much memory is needed.  This avoids wasting kernel memory when the real packet
size is smaller than the maximum packet size supported by the driver.  This is
always the case when using DVD discs.

Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Let CDROM_PKTCDVD_WCACHE depend on EXPERIMENTAL
Adrian Bunk [Sun, 5 Feb 2006 07:27:45 +0000 (23:27 -0800)]
[PATCH] Let CDROM_PKTCDVD_WCACHE depend on EXPERIMENTAL

Unless the help text is outdated, this seems to be logical.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] pktcdvd: remove version string
Peter Osterlund [Sun, 5 Feb 2006 07:27:45 +0000 (23:27 -0800)]
[PATCH] pktcdvd: remove version string

The version information is not useful for a driver that is maintained in
Linus' kernel tree.

Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] pktcdvd: Fix overflow for discs with large packets
Phillip Susi [Sun, 5 Feb 2006 07:27:44 +0000 (23:27 -0800)]
[PATCH] pktcdvd: Fix overflow for discs with large packets

The pktcdvd driver was using an 8 bit field to store the packet length
obtained from the disc track info.  This causes it to overflow for packet
length values of 128KB or more.  I changed the field to 32 bits to fix this.
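
The truncation can be reproduced in a few lines of C.  This sketch assumes
the length is kept in 512-byte units, which is consistent with the 128KB
threshold above (128KB / 512 = 256, one more than an 8 bit field can hold)
but is not stated in this message.  All names are hypothetical.

```c
#include <stdint.h>

/* Assumption for this sketch: packet length is stored in 512-byte units. */
#define UNIT_BYTES 512u

/* Old behaviour: the value is squeezed into 8 bits and wraps at 256. */
static uint8_t length_units_u8(uint32_t bytes)
{
    return (uint8_t)(bytes / UNIT_BYTES);
}

/* New behaviour: a 32 bit field holds the value without wrapping. */
static uint32_t length_units_u32(uint32_t bytes)
{
    return bytes / UNIT_BYTES;
}
```

Under that assumption a 64KB packet still fits (128 units), while a 128KB
packet wraps to 0 in the 8 bit field and is stored as 256 in the 32 bit one.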

The pktcdvd driver defaulted to its maximum allowed packet length when it
detected a 0 in the track info field.  I changed this to fail the operation
and refuse to access the media.  This seems more sane than attempting to
access it with a value that almost certainly will not work.

Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] sched: only print migration_cost once per boot
Chuck Ebbert [Sun, 5 Feb 2006 07:27:42 +0000 (23:27 -0800)]
[PATCH] sched: only print migration_cost once per boot

migration_cost prints after every CPU hotplug event.  Make it print only
once at boot.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] MAINTAINERS/CREDITS: Update SELinux contact info
Stephen Smalley [Sun, 5 Feb 2006 07:27:42 +0000 (23:27 -0800)]
[PATCH] MAINTAINERS/CREDITS: Update SELinux contact info

Update my contact info.  Please apply.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] fuse: fix request_end() vs fuse_reset_request() race
Miklos Szeredi [Sun, 5 Feb 2006 07:27:40 +0000 (23:27 -0800)]
[PATCH] fuse: fix request_end() vs fuse_reset_request() race

The last fix for this function in fact opened up a race that triggers
much more often.

It was tricky, uncommented code that was buggy.  Add a comment, make it
less tricky and fix the bug.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Fix i2o_scsi oops on abort
Markus Lidel [Sun, 5 Feb 2006 07:27:39 +0000 (23:27 -0800)]
[PATCH] Fix i2o_scsi oops on abort

Fix http://bugzilla.kernel.org/show_bug.cgi?id=5923

When a scsi command failed, an oops would result.

Back-to-back SMART queries would make the Seagate drives unhappy.  The
second SMART query would timeout, and the command would be aborted.

Acked-by: Markus Lidel <Markus.Lidel@shadowconnect.com>
Cc: Kenny Simpson <theonetruekenny@yahoo.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] block: request_queue->ordcolor must not be flipped on SOFTBARRIER
Tejun Heo [Sun, 5 Feb 2006 07:27:38 +0000 (23:27 -0800)]
[PATCH] block: request_queue->ordcolor must not be flipped on SOFTBARRIER

q->ordcolor must not be flipped on SOFTBARRIER.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] fix ordering on requeued request drainage
Jens Axboe [Sun, 5 Feb 2006 07:27:38 +0000 (23:27 -0800)]
[PATCH] fix ordering on requeued request drainage

Previously, if a fs request which was being drained failed and got
requeued, blk_do_ordered() didn't allow it to be reissued, which caused a
queue stall.  This patch makes blk_do_ordered() use the sequence of each
request to determine whether a request can be issued or not.  This fixes
the bug and simplifies code.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] percpu data: only iterate over possible CPUs
Eric Dumazet [Sun, 5 Feb 2006 07:27:36 +0000 (23:27 -0800)]
[PATCH] percpu data: only iterate over possible CPUs

percpu_data blindly allocates bootmem memory to store NR_CPUS instances of
cpudata, instead of allocating memory only for possible cpus.

As a preparation for changing that, we need to convert various 0 -> NR_CPUS
loops to use for_each_cpu().

(The above only applies to users of asm-generic/percpu.h.  powerpc has gone it
alone and is presently only allocating memory for present CPUs, so it's
currently corrupting memory).
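
A userspace sketch of the difference between the two loop styles, assuming
the set of possible CPUs is given as a simple bitmask.  MAX_CPUS and both
function names are hypothetical stand-ins, not the kernel's cpumask API.

```c
#include <assert.h>
#include <limits.h>

#define MAX_CPUS 32u  /* hypothetical; stands in for NR_CPUS */

/* A blind 0 -> MAX_CPUS loop touches every slot, allocated or not. */
unsigned int count_blind(void)
{
    unsigned int cpu, touched = 0;

    for (cpu = 0; cpu < MAX_CPUS; cpu++)
        touched++;
    return touched;
}

/* Visit only the CPUs whose bit is set in the "possible" mask. */
unsigned int count_possible(unsigned long possible_mask)
{
    unsigned int cpu, touched = 0;

    assert(MAX_CPUS <= sizeof(possible_mask) * CHAR_BIT);
    for (cpu = 0; cpu < MAX_CPUS; cpu++) {
        if (!(possible_mask & (1UL << cpu)))
            continue;   /* no per-cpu data behind this slot */
        touched++;
    }
    return touched;
}
```

With four possible CPUs (mask 0xF) the second loop touches 4 slots where the
blind loop touches all 32.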

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <axboe@suse.de>
Cc: Anton Blanchard <anton@samba.org>
Acked-by: William Irwin <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years agoRevert "[PATCH] x86_64: Fix the node cpumask of a cpu going down"
Linus Torvalds [Sun, 5 Feb 2006 18:51:57 +0000 (10:51 -0800)]
Revert "[PATCH] x86_64: Fix the node cpumask of a cpu going down"

This reverts commit 10f4dc8b27ac42f930ac55adb8c521264dc997f8.

Quoth Andi Kleen:
  "Kiran decided that it makes the problem worse than it was before.
   Fixing it fully requires more work which is too much for 2.6.16.  So
   please revert that commit for now."

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[SPARC64]: Update defconfig.
David S. Miller [Sat, 4 Feb 2006 10:49:23 +0000 (02:49 -0800)]
[SPARC64]: Update defconfig.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC64]: Add .gitignore file for sparc64 boot images.
David S. Miller [Sat, 4 Feb 2006 10:49:03 +0000 (02:49 -0800)]
[SPARC64]: Add .gitignore file for sparc64 boot images.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: Fix check whether dst_entry needs to be released after NAT
Patrick McHardy [Sat, 4 Feb 2006 10:19:46 +0000 (02:19 -0800)]
[NETFILTER]: Fix check whether dst_entry needs to be released after NAT

After DNAT the original dst_entry needs to be released if present
so the packet doesn't skip input routing with its new address. The
current check for DNAT in ip_nat_in is reversed and checks for SNAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: Prepare {ipt,ip6t}_policy match for x_tables unification
Patrick McHardy [Sat, 4 Feb 2006 10:19:09 +0000 (02:19 -0800)]
[NETFILTER]: Prepare {ipt,ip6t}_policy match for x_tables unification

The IPv4 and IPv6 version of the policy match are identical besides address
comparison and the data structure used for userspace communication. Unify
the data structures to break compatibility now (before it is released), so
we can port it to x_tables in 2.6.17.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: Fix ip6t_policy address matching
Patrick McHardy [Sat, 4 Feb 2006 10:17:55 +0000 (02:17 -0800)]
[NETFILTER]: Fix ip6t_policy address matching

Fix two bugs in ip6t_policy address matching:
- misordered arguments to ip6_masked_addrcmp, mask must be the second argument
- inversion incorrectly applied to the entire expression instead of just
  the address comparison

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: Check policy length in policy match strict mode
Patrick McHardy [Sat, 4 Feb 2006 10:17:26 +0000 (02:17 -0800)]
[NETFILTER]: Check policy length in policy match strict mode

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: Fix possible overflow in netfilters do_replace()
Kirill Korotaev [Sat, 4 Feb 2006 10:16:56 +0000 (02:16 -0800)]
[NETFILTER]: Fix possible overflow in netfilters do_replace()

netfilter's do_replace() can overflow on addition within SMP_ALIGN()
and/or on multiplication by NR_CPUS, resulting in a buffer overflow on
the copy_from_user().  In practice, the overflow on addition is
triggerable on all systems, whereas the multiplication one might require
much physical memory to be present due to the check above.  Either is
sufficient to overwrite arbitrary amounts of kernel memory.
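
A minimal sketch of the two checks, assuming a 64-byte alignment and 8 CPUs.
ALIGN_BYTES, NCPUS and replace_total_size() are hypothetical stand-ins for
SMP_ALIGN(), NR_CPUS and the size computation in do_replace(); this is not
the code from the patch.

```c
#include <assert.h>
#include <limits.h>

#define ALIGN_BYTES 64u  /* hypothetical; stands in for the SMP_ALIGN() granularity */
#define NCPUS 8u         /* hypothetical; stands in for NR_CPUS */

/*
 * Return the total per-CPU buffer size for a table of `size` bytes,
 * or 0 if the size is zero or either arithmetic step would wrap.
 */
unsigned int replace_total_size(unsigned int size)
{
    unsigned int aligned;

    /* The round-up trick below only works for power-of-two alignments. */
    assert((ALIGN_BYTES & (ALIGN_BYTES - 1)) == 0);

    /* Overflow on addition: size + (ALIGN_BYTES - 1) must not wrap. */
    if (size == 0 || size > UINT_MAX - (ALIGN_BYTES - 1))
        return 0;
    aligned = (size + ALIGN_BYTES - 1) & ~(ALIGN_BYTES - 1);

    /* Overflow on multiplication: aligned * NCPUS must not wrap. */
    if (aligned > UINT_MAX / NCPUS)
        return 0;

    return aligned * NCPUS;
}
```

Both checks run before any arithmetic that could wrap, so a caller never
sees a buffer size smaller than the data it is about to copy.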

I really hate adding the same check to all 4 versions of do_replace(),
but the code is duplicate...

Found by Solar Designer during security audit of OpenVZ.org

Signed-Off-By: Kirill Korotaev <dev@openvz.org>
Signed-Off-By: Solar Designer <solar@openwall.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: nf_conntrack: fix incorrect memset() size in FTP helper
Samir Bellabes [Sat, 4 Feb 2006 10:16:06 +0000 (02:16 -0800)]
[NETFILTER]: nf_conntrack: fix incorrect memset() size in FTP helper

This memset() is executing with a bad size. According to Yasuyuki Kozakai,
this memset() can be deleted, as 'ftp' is declared in global area.

Signed-off-by: Samir Bellabes <sbellabes@mandriva.com>
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: iptables: fix typos in ipt_connbytes.h
Yasuyuki Kozakai [Sat, 4 Feb 2006 10:15:36 +0000 (02:15 -0800)]
[NETFILTER]: iptables: fix typos in ipt_connbytes.h

Fix some typos that make iptables userspace compilation fail.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: Fix missing src port initialization in tftp expectation mask
Patrick McHardy [Sat, 4 Feb 2006 10:14:51 +0000 (02:14 -0800)]
[NETFILTER]: Fix missing src port initialization in tftp expectation mask

Reported by David Ahern <dahern@avaya.com>, netfilter bugzilla #426.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: nfnetlink_queue: fix packet marking over netlink
Patrick McHardy [Sat, 4 Feb 2006 10:14:24 +0000 (02:14 -0800)]
[NETFILTER]: nfnetlink_queue: fix packet marking over netlink

The packet marked is the netlink skb, not the queued skb.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: Fix undersized skb allocation in ipt_ULOG/ebt_ulog/nfnetlink_log
Patrick McHardy [Sat, 4 Feb 2006 10:13:57 +0000 (02:13 -0800)]
[NETFILTER]: Fix undersized skb allocation in ipt_ULOG/ebt_ulog/nfnetlink_log

The skb allocated is always of size nlbufsize, even if that is smaller than
the size needed for the current packet.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: ULOG/nfnetlink_log: Use better default value for 'nlbufsiz'
Holger Eitzenberger [Sat, 4 Feb 2006 10:13:14 +0000 (02:13 -0800)]
[NETFILTER]: ULOG/nfnetlink_log: Use better default value for 'nlbufsiz'

Performance tests showed that ULOG may fail on heavily loaded systems
because of failed order-N allocations (N >= 1).

The default value of 4096 is not optimal in the sense that it actually
allocates _two_ contiguous physical pages.  Reasoning: ULOG uses
alloc_skb(), which adds another ~300 bytes for skb_shared_info.

This patch sets the default value to NLMSG_GOODSIZE and adds some
documentation at the top.
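
The arithmetic behind this can be sketched as follows, assuming 4096-byte
pages and taking the ~300 bytes quoted above as the overhead.  Both constants
and the alloc_order() helper are illustrative assumptions, not kernel values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES   4096u  /* assumed page size */
#define SKB_OVERHEAD 300u   /* rough figure quoted above, not an exact value */

/* Smallest order N such that (PAGE_BYTES << N) holds payload + overhead. */
unsigned int alloc_order(size_t payload)
{
    size_t need;
    unsigned int order = 0;

    assert(payload <= SIZE_MAX / 2);  /* keep the sum and the shift from wrapping */
    need = payload + SKB_OVERHEAD;
    while (((size_t)PAGE_BYTES << order) < need)
        order++;
    return order;
}
```

With these numbers a 4096-byte buffer needs an order-1 allocation (two
pages), while anything up to 3796 bytes still fits in a single page.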

Signed-off-by: Holger Eitzenberger <heitzenberger@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: nf_conntrack: check address family when finding protocol module
Yasuyuki Kozakai [Sat, 4 Feb 2006 10:12:14 +0000 (02:12 -0800)]
[NETFILTER]: nf_conntrack: check address family when finding protocol module

__nf_conntrack_{l3}proto_find() doesn't check the passed protocol family,
so it can index past the end of the array, which has only AF_MAX items.
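
A minimal sketch of such a bounds check.  FAMILY_MAX, the table and the
fallback to a generic entry are assumptions made for illustration; the
actual lookup functions may handle an out-of-range family differently.

```c
#include <assert.h>
#include <stddef.h>

#define FAMILY_MAX 32u  /* hypothetical table size; stands in for AF_MAX */

struct l3proto {
    const char *name;
};

static struct l3proto generic_l3proto = { "generic" };
static struct l3proto *l3proto_table[FAMILY_MAX];

/* Registration may assume a valid family; lookups may not. */
void l3proto_register(unsigned int family, struct l3proto *proto)
{
    assert(family < FAMILY_MAX);
    l3proto_table[family] = proto;
}

/* Check the index before touching the array; fall back to a generic entry. */
struct l3proto *l3proto_find(unsigned int family)
{
    if (family >= FAMILY_MAX || l3proto_table[family] == NULL)
        return &generic_l3proto;
    return l3proto_table[family];
}
```

An out-of-range family such as l3proto_find(1000) then yields the generic
entry instead of reading past the end of the table.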

Spotted by Pablo Neira Ayuso.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: ctnetlink: add MODULE_ALIAS for expectation subsystem
Pablo Neira Ayuso [Sat, 4 Feb 2006 10:11:41 +0000 (02:11 -0800)]
[NETFILTER]: ctnetlink: add MODULE_ALIAS for expectation subsystem

Add load-on-demand support for expectation requests, e.g. conntrack -L expect

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: ctnetlink: Fix subsystem used for expectation events
Marcus Sundberg [Sat, 4 Feb 2006 10:11:09 +0000 (02:11 -0800)]
[NETFILTER]: ctnetlink: Fix subsystem used for expectation events

The ctnetlink expectation events should use the NFNL_SUBSYS_CTNETLINK_EXP
subsystem, not NFNL_SUBSYS_CTNETLINK.

Signed-off-by: Marcus Sundberg <marcus@ingate.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[ICMP]: Fix extra dst release when ip_options_echo fails
Herbert Xu [Sat, 4 Feb 2006 10:09:34 +0000 (02:09 -0800)]
[ICMP]: Fix extra dst release when ip_options_echo fails

When two ip_route_output_key lookups in icmp_send were combined I
forgot to change the error path for ip_options_echo to not drop the
dst reference since it now sits before the dst lookup.  To fix it we
simply jump past the ip_rt_put call.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[PATCH] x86_64: IOMMU printk cleanup
Jon Mason [Fri, 3 Feb 2006 20:51:59 +0000 (21:51 +0100)]
[PATCH] x86_64: IOMMU printk cleanup

This patch contains a printk reorder to remove the current problem of
displaying "PCI-DMA: Disabling IOMMU." and then "PCI-DMA: using GART
IOMMU" 20 lines later in dmesg.

It also contains a printk reorder in swiotlb to state swiotlb
enablement prior to describing the location of the bounce buffers, and a
printk reorder to state gart enablement prior to describing the
aperture.

Also contains a whitespace cleanup in arch/x86_64/kernel/setup.c

Tested (along with patch 2/2) on dual opteron with gart enabled,
iommu=soft, and iommu=off.

Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Let impossible CPUs point to reference per cpu data
Andi Kleen [Fri, 3 Feb 2006 20:51:56 +0000 (21:51 +0100)]
[PATCH] x86_64: Let impossible CPUs point to reference per cpu data

Hack for 2.6.16. In 2.6.17 all code that uses NR_CPUS should
be audited and changed to only touch possible CPUs.

Don't mark the reference per cpu data init data (so it stays
around after boot) and point all impossible CPUs to it. This way
they reference some valid, although shared, memory. Usually
this is only initialization like INIT_LIST_HEADs and there
won't be races because these CPUs never run. Still somewhat hackish.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] i386/x86-64: Don't ack the APIC for bad interrupts when the APIC is not enabled
Andi Kleen [Fri, 3 Feb 2006 20:51:53 +0000 (21:51 +0100)]
[PATCH] i386/x86-64: Don't ack the APIC for bad interrupts when the APIC is not enabled

It's bad juju to touch the APIC when it hasn't been enabled.
I also moved ack_bad_irq for x86-64 out of line following i386.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Don't record local apic ids when they are disabled in MADT
Ashok Raj [Fri, 3 Feb 2006 20:51:50 +0000 (21:51 +0100)]
[PATCH] x86_64: Don't record local apic ids when they are disabled in MADT

Some broken BIOSes had processors that were disabled but had the
same apic id as a valid processor. This causes
acpi_processor_start() to think this disabled
cpu is ok, and croak. So we don't record bad
apic ids anymore.

http://bugzilla.kernel.org/show_bug.cgi?id=5930

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: minor ordering correction to dump_pagetable()
Jan Beulich [Fri, 3 Feb 2006 20:51:47 +0000 (21:51 +0100)]
[PATCH] x86_64: minor ordering correction to dump_pagetable()

Checking of the validity of pointers should be consistently done before
dereferencing the pointer.

Signed-Off-By: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: small fix for CFI annotations
Jan Beulich [Fri, 3 Feb 2006 20:51:44 +0000 (21:51 +0100)]
[PATCH] x86_64: small fix for CFI annotations

Conditionalize two unwind directives to match other similarly
conditional code.

Signed-Off-By: Jan Beulich <jbeulich@novell.com>
Cc: Jim Houston <jim.houston@ccur.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Calibrate APIC timer using PM timer
Andi Kleen [Fri, 3 Feb 2006 20:51:41 +0000 (21:51 +0100)]
[PATCH] x86_64: Calibrate APIC timer using PM timer

On some broken motherboards (at least one NForce3 based AMD64 laptop)
the PIT timer runs at an incorrect frequency.  This patch adds a new
option "apicpmtimer" that allows using the APIC timer and calibrating it
using the PMTimer.  It requires the earlier patch that allows running the
main timer from the APIC.

Specifying apicpmtimer implies apicmaintimer.

The option defaults to off for now.

I tested it on a few systems and the resulting APIC timer frequencies
were usually a bit off, but always <1%, which should be tolerable.
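
The calibration arithmetic can be sketched like this.  The ACPI PM timer
ticks at 3.579545 MHz; the 24-bit counter width and both function names are
assumptions for illustration, not the code in this patch.

```c
#include <assert.h>
#include <stdint.h>

#define PMTMR_HZ   3579545ull  /* ACPI PM timer frequency in Hz */
#define PMTMR_MASK 0xFFFFFFu   /* assumes a 24-bit PM timer counter */

/* Elapsed PM timer ticks between two reads, tolerating one wraparound. */
uint32_t pmtmr_delta(uint32_t start, uint32_t end)
{
    return (end - start) & PMTMR_MASK;
}

/* Scale the APIC ticks counted in the same window to ticks per second. */
uint64_t apic_timer_hz(uint64_t apic_ticks, uint32_t pm_ticks)
{
    assert(pm_ticks != 0);
    return apic_ticks * PMTMR_HZ / pm_ticks;
}
```

For example, 1000000 APIC ticks counted over 3579545 PM timer ticks (one
second) gives an APIC timer frequency of 1000000 Hz.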

TBD: figure out a heuristic to enable this automatically on the affected
systems.  TBD: perhaps do it on all NForce3s or using DMI?

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Don't allow kprobes on __switch_to
Andi Kleen [Fri, 3 Feb 2006 20:51:38 +0000 (21:51 +0100)]
[PATCH] x86_64: Don't allow kprobes on __switch_to

kprobes cannot deal with the funny calling conventions when it
runs on a different stack when it returns. If someone wants
to instrument context switch they can add a probe to schedule()
instead.

Cc: jkenisto@us.ibm.com, prasanna@in.ibm.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: align per-cpu section to configured cache bytes
Zach Brown [Fri, 3 Feb 2006 20:51:35 +0000 (21:51 +0100)]
[PATCH] x86_64: align per-cpu section to configured cache bytes

Align the start of the per-cpu section to the configured number of bytes in a
cache line.  This stops a BUG_ON() from triggering in load_module() when
DEFINE_PER_CPU() is used in a module and the section isn't cacheline-aligned.
Rusty also found this and sent a patch in a while ago
(http://lkml.org/lkml/2004/10/19/17), I don't know what came of that.
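
For illustration only, rounding an address up to a cache line boundary can
be sketched as below.  The align_up() helper and the 64-byte line size used
in the example are assumptions; the patch itself changes the section
alignment, not C code like this.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

With an assumed 64-byte cache line, align_up(0x1001, 64) gives 0x1040 and an
already aligned address is returned unchanged.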

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: When allocation of merged SG lists fails in the IOMMU don't merge
Kevin VanMaren [Fri, 3 Feb 2006 20:51:32 +0000 (21:51 +0100)]
[PATCH] x86_64: When allocation of merged SG lists fails in the IOMMU don't merge

[ AK: I redid Kevin's fix to be simpler, but the idea and original
  analysis of the problem is from Kevin]

This avoids allocation failures on some SATA systems like Nvidia CK8
when the IOMMU gets fragmented. Modern SATA devices have quite large queues
(128 entries) and the FS with ext2/3 is good enough now that it often
passes whole 128 page sg lists down to the driver. These require
512K of contiguous free space in the IOMMU aperture to map when merged.
When the IOMMU is fragmented this could lead to spurious IO errors
due to failing mappings.

Short term fix is to just try to map the SG list again unmerged
page by page - this way fragmentation doesn't matter anymore.
The code for that was already there, but it just wasn't enabled for the
merge case.
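
A toy model of the fallback, assuming a 16-page aperture tracked as a
bitmask (bit set = page in use).  All names here are hypothetical and the
real IOMMU code is considerably more involved.

```c
#include <assert.h>
#include <stdint.h>

#define APERTURE_PAGES 16u  /* hypothetical toy aperture size */

enum map_result { MAP_FAILED, MAP_MERGED, MAP_UNMERGED };

/* Index of the first run of n free pages, or -1 if there is none. */
int find_free_run(uint32_t used, unsigned int n)
{
    unsigned int i, run = 0;

    assert(n > 0);
    for (i = 0; i < APERTURE_PAGES; i++) {
        run = (used & (1u << i)) ? 0 : run + 1;
        if (run == n)
            return (int)(i + 1 - n);
    }
    return -1;
}

/* Number of free pages, regardless of where they are. */
unsigned int count_free(uint32_t used)
{
    unsigned int i, nfree = 0;

    for (i = 0; i < APERTURE_PAGES; i++)
        if (!(used & (1u << i)))
            nfree++;
    return nfree;
}

/* Try one merged mapping first, then fall back to page-by-page. */
enum map_result map_sg(uint32_t used, unsigned int npages)
{
    if (find_free_run(used, npages) >= 0)
        return MAP_MERGED;
    if (count_free(used) >= npages)
        return MAP_UNMERGED;  /* fragmentation no longer matters */
    return MAP_FAILED;
}
```

With every other page in use (mask 0x5555), a 4-page request cannot be
merged but still succeeds page by page.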

According to Kevin at least the Nvidia device doesn't seem to benefit
from merging much anyways, so the only slowdown is from trying
to do an unnecessary merge attempt.

Kevin plans to implement better fragmentation avoidance in the future,
but that wouldn't be 2.6.16 material.

TBD: should add some statistic counters to count how often that really
happens.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Fix zero mcfg entry workaround on x86-64
Andi Kleen [Fri, 3 Feb 2006 20:51:29 +0000 (21:51 +0100)]
[PATCH] x86_64: Fix zero mcfg entry workaround on x86-64

I broke this earlier when moving the patch from i386 to x86-64.
Need to return the virtual address here, not the physical address.
This fixes some boot time crashes on x86-64.

Cc: gregkh@suse.de
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Do more checking in the SRAT header code
Andi Kleen [Fri, 3 Feb 2006 20:51:26 +0000 (21:51 +0100)]
[PATCH] x86_64: Do more checking in the SRAT header code

 - Check if the processor/memory affinity entries are long enough
   according to the ACPI 3.0 spec.
 - Ignore memory affinity entries that define a zero length region.

All based on BIOS issues found in the field @)

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: data/functions wrongly marked as __init with cpu hotplug.
Ashok Raj [Fri, 3 Feb 2006 20:51:23 +0000 (21:51 +0100)]
[PATCH] x86_64: data/functions wrongly marked as __init with cpu hotplug.

The attached patch fixes 2 more cases I found by running the reference_init.pl
script. These were easy to spot just knowing the file names. There is
another one in init/main.c that I can't exactly zero in on (partly
because I don't know how to interpret the data that the tool spews out).

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: mark two routines as __cpuinit
Shaohua Li [Fri, 3 Feb 2006 20:51:20 +0000 (21:51 +0100)]
[PATCH] x86_64: mark two routines as __cpuinit

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Clear more state when ignoring empty node in SRAT parsing
Andi Kleen [Fri, 3 Feb 2006 20:51:17 +0000 (21:51 +0100)]
[PATCH] x86_64: Clear more state when ignoring empty node in SRAT parsing

Might fix boot failures on systems with empty PXMs in SRAT

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Fix memory policy build without CONFIG_HUGETLBFS
Chen, Kenneth W [Fri, 3 Feb 2006 20:51:14 +0000 (21:51 +0100)]
[PATCH] x86_64: Fix memory policy build without CONFIG_HUGETLBFS

> mm/mempolicy.c: In function `huge_zonelist':
> mm/mempolicy.c:1045: error: `HPAGE_SHIFT' undeclared (first use in this function)
> mm/mempolicy.c:1045: error: (Each undeclared identifier is reported only once
> mm/mempolicy.c:1045: error: for each function it appears in.)
> make[1]: *** [mm/mempolicy.o] Error 1

Need to wrap huge_zonelist function with CONFIG_HUGETLBFS.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Remove rogue default y in EDAC Kconfig
Andi Kleen [Fri, 3 Feb 2006 20:51:11 +0000 (21:51 +0100)]
[PATCH] x86_64: Remove rogue default y in EDAC Kconfig

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Remove CONFIG_INIT_DEBUG
Andi Kleen [Fri, 3 Feb 2006 20:51:08 +0000 (21:51 +0100)]
[PATCH] x86_64: Remove CONFIG_INIT_DEBUG

It has been enabled by default for some time now and is cheap enough
so it doesn't matter anyways.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Fix the node cpumask of a cpu going down
Ravikiran G Thirumalai [Fri, 3 Feb 2006 20:51:05 +0000 (21:51 +0100)]
[PATCH] x86_64: Fix the node cpumask of a cpu going down

Currently, x86_64 and ia64 arches do not clear the corresponding bits
in the node's cpumask when a cpu goes down or cpu bring up is cancelled.
This is buggy since there are pieces of common code where the cpumask is
checked in the cpu down code path to decide on things (like in  the slab
down path).  PPC does the right thing, but x86_64 and ia64 don't (This
was the reason Sonny hit upon a slab bug during cpu offline on ppc and
could not reproduce on other arches).  This patch fixes it for x86_64.
I won't attempt ia64 as I cannot test it.

Credit for spotting this should go to Alok.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Undo the earlier changes to remove unrolled copy/memset functions
Andi Kleen [Fri, 3 Feb 2006 20:51:02 +0000 (21:51 +0100)]
[PATCH] x86_64: Undo the earlier changes to remove unrolled copy/memset functions

They cause quite bad performance regressions on Netburst.
This is temporary until we can get new optimized functions
for these CPUs.

This undoes changes that were done in 2.6.15 and in 2.6.16-rc1,
essentially bringing the code back to 2.6.14 level. Only change
is I renamed the X86_FEATURE_K8_C flag to X86_FEATURE_REP_GOOD
and fixed the check for the flag and also fixed some comments.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Fix swiotlb dma_alloc_coherent fallback
Andi Kleen [Fri, 3 Feb 2006 20:50:59 +0000 (21:50 +0100)]
[PATCH] x86_64: Fix swiotlb dma_alloc_coherent fallback

This avoids BUG_ONs in the low level allocator when an illegal
GFP mask is added.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: [PATCH] timer resume
Shaohua Li [Fri, 3 Feb 2006 20:50:56 +0000 (21:50 +0100)]
[PATCH] x86_64: [PATCH] timer resume

At resume time, the TSC value (or something similar) might have changed a lot
since suspend time. This could make the system see a very large number of lost ticks.
See http://bugzilla.kernel.org/show_bug.cgi?id=5825

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Automatically enable apicmaintimer on ATI boards
Andi Kleen [Fri, 3 Feb 2006 20:50:53 +0000 (21:50 +0100)]
[PATCH] x86_64: Automatically enable apicmaintimer on ATI boards

They all have problems with IRQ 0 routing, so just use the APIC on them.

Can be overridden with "noapicmaintimer"

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Allow to run main time keeping from the local APIC interrupt
Andi Kleen [Fri, 3 Feb 2006 20:50:50 +0000 (21:50 +0100)]
[PATCH] x86_64: Allow to run main time keeping from the local APIC interrupt

Another piece from the no-idle-tick patch.

This can be enabled with the "apicmaintimer" option.

This is mainly useful when the PIT/HPET interrupt is unreliable.
Note there are some systems that are known to stop the APIC
timer in C3. For those it will never work, but this case
should be automatically detected.

It also only works with PM timer right now. When HPET is used
the way the main timer handler computes the delay doesn't work.

It should be a bit more efficient because there is one less
regular interrupt to process on the boot processor.

Requires earlier bugfix from Venkatesh

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Only switch to IPI broadcast timer on Intel when C3 is supported
Venkatesh Pallipadi [Fri, 3 Feb 2006 20:50:47 +0000 (21:50 +0100)]
[PATCH] x86_64: Only switch to IPI broadcast timer on Intel when C3 is supported

Bug in apic timer removal on C3 patch. We should switch to IPI from APIC timer
only when C3 state is valid.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Define pmtmr_ioport to 0 when PM_TIMER is not available
Andi Kleen [Fri, 3 Feb 2006 20:50:44 +0000 (21:50 +0100)]
[PATCH] x86_64: Define pmtmr_ioport to 0 when PM_TIMER is not available

Avoids some ifdef mess later.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>