]> pilppa.org Git - linux-2.6-omap-h63xx.git/log
linux-2.6-omap-h63xx.git
17 years agopageflags: get rid of FLAGS_RESERVED
Christoph Lameter [Mon, 28 Apr 2008 09:12:48 +0000 (02:12 -0700)]
pageflags: get rid of FLAGS_RESERVED

NR_PAGEFLAGS specifies the number of page flags we are using.  From that we
can calculate the number of bits leftover that can be used for zone, node (and
maybe the sections id).  There is no need anymore for FLAGS_RESERVED if we use
NR_PAGEFLAGS.

Use the new methods to make NR_PAGEFLAGS available via the preprocessor.
NR_PAGEFLAGS is used to calculate field boundaries in the page flags fields.
These field widths have to be available to the preprocessor.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: David Miller <davem@davemloft.net>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agopageflags: use an enum for the flags
Christoph Lameter [Mon, 28 Apr 2008 09:12:47 +0000 (02:12 -0700)]
pageflags: use an enum for the flags

Use an enum to ease the maintenance of page flags.  This is going to change
the numbering from 0 to 18.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agopageflags: standardize comment inclusion in asm-offsets.h and fix MIPS
Christoph Lameter [Mon, 28 Apr 2008 09:12:45 +0000 (02:12 -0700)]
pageflags: standardize comment inclusion in asm-offsets.h and fix MIPS

Add the ability to pass comments into asm-offsets.h by generating asm
output like

-># comment line

Mips needs this feature to preserve the comments that are in
asm-mips/asm-offsets.h right now.

Then remove the special handling for mips from Kbuild and convert mips to use
the new string to include the comments.

Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agopage_mapping(): add ifdef around reference to swapper_space
Andrew Morton [Mon, 28 Apr 2008 09:12:44 +0000 (02:12 -0700)]
page_mapping(): add ifdef around reference to swapper_space

This fixes the superh build when the pageflags patches are applied.

But it shouldn't unless it's a gcc bug.

Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agokbuild: create a way to create preprocessor constants from C expressions
Christoph Lameter [Mon, 28 Apr 2008 09:12:44 +0000 (02:12 -0700)]
kbuild: create a way to create preprocessor constants from C expressions

The use of enums create constants that are not available to the preprocessor
when building the kernel (f.e.  MAX_NR_ZONES).

Arch code already has a way to export constants calculated to the preprocessor
through the asm-offsets.c file.  Generate something similar for the core
kernel through kbuild.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agosparsemem: vmemmap does not need section bits
Christoph Lameter [Mon, 28 Apr 2008 09:12:43 +0000 (02:12 -0700)]
sparsemem: vmemmap does not need section bits

A set of patches that attempts to improve page flag handling.  First of all a
method is introduced to generate the page flag functions using macros.  Then
the number of page flags used by sparsemem is reduced.  All page flag
operations will no longer be macros.  All flags will use inline function.

Then we add a way to export enum constants to the preprocessor which allows us
to get rid of __ZONE_COUNT and use the NR_PAGEFLAGS for the dynamic
calculation of actually available page flags for fields.

This patch:

Sparsemem vmemmap does not need any section bits.  This patch has the effect
of reducing the number of bits used in page->flags by at least 6.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agovmallocinfo: add caller information
Christoph Lameter [Mon, 28 Apr 2008 09:12:42 +0000 (02:12 -0700)]
vmallocinfo: add caller information

Add caller information so that /proc/vmallocinfo shows where the allocation
request for a slice of vmalloc memory originated.

Results in output like this:

0xffffc20000000000-0xffffc20000801000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages
0xffffc20000801000-0xffffc20000806000   20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc
0xffffc20000806000-0xffffc20000c07000 4198400 alloc_large_system_hash+0x127/0x246 pages=1024 vmalloc vpages
0xffffc20000c07000-0xffffc20000c0a000   12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc
0xffffc20000c0a000-0xffffc20000c0c000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c0c000-0xffffc20000c0f000   12288 acpi_os_map_memory+0x13/0x1c phys=cff64000 ioremap
0xffffc20000c10000-0xffffc20000c15000   20480 acpi_os_map_memory+0x13/0x1c phys=cff65000 ioremap
0xffffc20000c16000-0xffffc20000c18000    8192 acpi_os_map_memory+0x13/0x1c phys=cff69000 ioremap
0xffffc20000c18000-0xffffc20000c1a000    8192 acpi_os_map_memory+0x13/0x1c phys=fed1f000 ioremap
0xffffc20000c1a000-0xffffc20000c1c000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c1c000-0xffffc20000c1e000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c1e000-0xffffc20000c20000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c20000-0xffffc20000c22000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c22000-0xffffc20000c24000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c24000-0xffffc20000c26000    8192 acpi_os_map_memory+0x13/0x1c phys=e0081000 ioremap
0xffffc20000c26000-0xffffc20000c28000    8192 acpi_os_map_memory+0x13/0x1c phys=e0080000 ioremap
0xffffc20000c28000-0xffffc20000c2d000   20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc
0xffffc20000c2d000-0xffffc20000c31000   16384 tcp_init+0xd5/0x31c pages=3 vmalloc
0xffffc20000c31000-0xffffc20000c34000   12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc
0xffffc20000c34000-0xffffc20000c36000    8192 init_vdso_vars+0xde/0x1f1
0xffffc20000c36000-0xffffc20000c38000    8192 pci_iomap+0x8a/0xb4 phys=d8e00000 ioremap
0xffffc20000c38000-0xffffc20000c3a000    8192 usb_hcd_pci_probe+0x139/0x295 [usbcore] phys=d8e00000 ioremap
0xffffc20000c3a000-0xffffc20000c3e000   16384 sys_swapon+0x509/0xa15 pages=3 vmalloc
0xffffc20000c40000-0xffffc20000c61000  135168 e1000_probe+0x1c4/0xa32 phys=d8a20000 ioremap
0xffffc20000c61000-0xffffc20000c6a000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20000c6a000-0xffffc20000c73000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20000c73000-0xffffc20000c7c000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20000c7c000-0xffffc20000c7f000   12288 e1000e_setup_tx_resources+0x29/0xbe pages=2 vmalloc
0xffffc20000c80000-0xffffc20001481000 8392704 pci_mmcfg_arch_init+0x90/0x118 phys=e0000000 ioremap
0xffffc20001481000-0xffffc20001682000 2101248 alloc_large_system_hash+0x127/0x246 pages=512 vmalloc
0xffffc20001682000-0xffffc20001e83000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages
0xffffc20001e83000-0xffffc20002204000 3674112 alloc_large_system_hash+0x127/0x246 pages=896 vmalloc vpages
0xffffc20002204000-0xffffc2000220d000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc2000220d000-0xffffc20002216000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20002216000-0xffffc2000221f000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc2000221f000-0xffffc20002228000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20002228000-0xffffc20002231000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20002231000-0xffffc20002234000   12288 e1000e_setup_rx_resources+0x35/0x122 pages=2 vmalloc
0xffffc20002240000-0xffffc20002261000  135168 e1000_probe+0x1c4/0xa32 phys=d8a60000 ioremap
0xffffc20002261000-0xffffc2000270c000 4894720 sys_swapon+0x509/0xa15 pages=1194 vmalloc vpages
0xffffffffa0000000-0xffffffffa0022000  139264 module_alloc+0x4f/0x55 pages=33 vmalloc
0xffffffffa0022000-0xffffffffa0029000   28672 module_alloc+0x4f/0x55 pages=6 vmalloc
0xffffffffa002b000-0xffffffffa0034000   36864 module_alloc+0x4f/0x55 pages=8 vmalloc
0xffffffffa0034000-0xffffffffa003d000   36864 module_alloc+0x4f/0x55 pages=8 vmalloc
0xffffffffa003d000-0xffffffffa0049000   49152 module_alloc+0x4f/0x55 pages=11 vmalloc
0xffffffffa0049000-0xffffffffa0050000   28672 module_alloc+0x4f/0x55 pages=6 vmalloc

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agovmalloc: show vmalloced areas via /proc/vmallocinfo
Christoph Lameter [Mon, 28 Apr 2008 09:12:40 +0000 (02:12 -0700)]
vmalloc: show vmalloced areas via /proc/vmallocinfo

Implement a new proc file that allows the display of the currently allocated
vmalloc memory.

It allows to see the users of vmalloc.  That is important if vmalloc space is
scarce (i386 for example).

And it's going to be important for the compound page fallback to vmalloc.
Many of the current users can be switched to use compound pages with fallback.
 This means that the number of users of vmalloc is reduced and page tables no
longer necessary to access the memory.  /proc/vmallocinfo allows to review how
that reduction occurs.

If memory becomes fragmented and larger order allocations are no longer
possible then /proc/vmallocinfo allows to see which compound page allocations
fell back to virtual compound pages.  That is important for new users of
virtual compound pages.  Such as order 1 stack allocation etc that may
fallback to virtual compound pages in the future.

/proc/vmallocinfo permissions are made readable-only-by-root to avoid possible
information leakage.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: CONFIG_MMU=n build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomm: make early_pfn_to_nid() a C function
Andrew Morton [Mon, 28 Apr 2008 09:12:39 +0000 (02:12 -0700)]
mm: make early_pfn_to_nid() a C function

Fix this (sparc64)

mm/sparse-vmemmap.c: In function `vmemmap_verify':
mm/sparse-vmemmap.c:64: warning: unused variable `pfn'

by switching to a C function which touches its arg.

(reason 3,555 why macros are bad)

Also, the `nid' arg was misnamed.

Reviewed-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomm: rotate_reclaimable_page() cleanup
Miklos Szeredi [Mon, 28 Apr 2008 09:12:38 +0000 (02:12 -0700)]
mm: rotate_reclaimable_page() cleanup

Clean up messy conditional calling of test_clear_page_writeback() from both
rotate_reclaimable_page() and end_page_writeback().

The only user of rotate_reclaimable_page() is end_page_writeback() so this is
OK.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomm/page_alloc.c: fix indentation
S.Caglar Onur [Mon, 28 Apr 2008 09:12:38 +0000 (02:12 -0700)]
mm/page_alloc.c: fix indentation

zlc_setup(): handle jiffies wraparound
(10ed273f5016c582413dfbc468dd084957d847e1) changes tab with spaces

Signed-off-by: S.Caglar Onur <caglar@pardus.org.tr>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomm: save some bytes in mm_struct by filling holes on 64bit
Andi Kleen [Mon, 28 Apr 2008 09:12:37 +0000 (02:12 -0700)]
mm: save some bytes in mm_struct by filling holes on 64bit

Save some bytes in mm_struct by filling holes

Putting int values together for better packing on 64bit shrinks sizeof(struct
mm_struct) from 776 bytes to 764 bytes.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agodmapool: enable debugging for CONFIG_SLUB_DEBUG_ON too
Andi Kleen [Mon, 28 Apr 2008 09:12:37 +0000 (02:12 -0700)]
dmapool: enable debugging for CONFIG_SLUB_DEBUG_ON too

Previously it was only enabled for CONFIG_DEBUG_SLAB.

Not hooked into the slub runtime debug configuration, so you currently only
get it with CONFIG_SLUB_DEBUG_ON, not plain CONFIG_SLUB_DEBUG

Acked-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomempolicy: fix parsing of tmpfs mpol mount option
Lee Schermerhorn [Mon, 28 Apr 2008 09:12:36 +0000 (02:12 -0700)]
mempolicy: fix parsing of tmpfs mpol mount option

Parsing of new mode flags in the tmpfs mpol mount option is slightly broken:

Setting a valid flag works OK:
#mount -o remount,mpol=bind=static:1-2 /dev/shm
#mount
...
tmpfs on /dev/shm type tmpfs (rw,mpol=bind=static:1-2)
...

However, we can't remove them or change them, once we've
set a valid flag:

#mount -o remount,mpol=bind:1-2 /dev/shm
#mount
...
tmpfs on /dev/shm type tmpfs (rw,mpol=bind:1-2)
...

It SAYS it removed it, but that's just a copy of the input
string.  If we now try to set it to a different flag, we
get:

#mount -o remount,mpol=bind=relative:1-2 /dev/shm
mount: /dev/shm not mounted already, or bad option

And on the console, we see:
tmpfs: Bad value 'bind' for mount option 'mpol'
                      ^ lost remainder of string

Furthermore, bogus flags are accepted with out error.
Granted, they are a no-op:

#mount -o remount,mpol=interleave=foo:0-3 /dev/shm
#mount
...
tmpfs on /dev/shm type tmpfs (rw,mpol=interleave=foo:0-3)

Again, that's just a copy of the input string shown by the mount command.

This patch fixes the behavior by pre-zeroing the flags so that only one of the
mutually exclusive flags can be set at one time.  It also reports an error
when an unrecognized flag is specified.

The check for both flags being set is removed because it can't happen with
this implementation.  If we ever want to support multiple non-exclusive flags,
this area will need rework and we will need to check that any mutually
exclusive flags aren't specified.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Eric Whitney <eric.whitney@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomempolicy: disallow static or relative flags for local preferred mode
David Rientjes [Mon, 28 Apr 2008 09:12:34 +0000 (02:12 -0700)]
mempolicy: disallow static or relative flags for local preferred mode

MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES don't mean anything for
MPOL_PREFERRED policies that were created with an empty nodemask (for purely
local allocations).  They'll never be invalidated because the allowed mems of
a task changes or need to be rebound relative to a cpuset's placement.

Also fixes a bug identified by Lee Schermerhorn that disallowed empty
nodemasks to be passed to MPOL_PREFERRED to specify local allocations.  [A
different, somewhat incomplete, patch already existed in 25-rc5-mm1.]

Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomempolicy: small header file cleanup
David Rientjes [Mon, 28 Apr 2008 09:12:34 +0000 (02:12 -0700)]
mempolicy: small header file cleanup

Removes forward definition of vm_area_struct in linux/mempolicy.h.  We already
get it from the linux/slab.h -> linux/gfp.h include.

Removes the unused mpol_set_vma_default() macro from linux/mempolicy.h.

Removes the extern definition of default_policy since it is only referenced,
as it should be, in mm/mempolicy.c.

Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomempolicy: create mempolicy_operations structure
David Rientjes [Mon, 28 Apr 2008 09:12:33 +0000 (02:12 -0700)]
mempolicy: create mempolicy_operations structure

Create a mempolicy_operations structure that currently points to two
functions[*] for the various modes:

int (*create)(struct mempolicy *, const nodemask_t *);
void (*rebind)(struct mempolicy *, const nodemask_t *);

This splits the implementation for the various modes out of two large
functions, mpol_new() and mpol_rebind_policy().  Eventually it may be
beneficial to add additional functions to accomodate the existing switch()
statements in mm/mempolicy.c.

 [*] The ->create() function for MPOL_DEFAULT is currently NULL since no
     struct mempolicy is dynamically allocated.

[Lee.Schermerhorn@hp.com: fix regression in the package mempolicy regression tests]
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomempolicy: move rebind functions
David Rientjes [Mon, 28 Apr 2008 09:12:32 +0000 (02:12 -0700)]
mempolicy: move rebind functions

Move the mpol_rebind_{policy,task,mm}() functions after mpol_new() to avoid
having to declare function prototypes.

Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomempolicy: update NUMA memory policy documentation
David Rientjes [Mon, 28 Apr 2008 09:12:31 +0000 (02:12 -0700)]
mempolicy: update NUMA memory policy documentation

Updates Documentation/vm/numa_memory_policy.txt and
Documentation/filesystems/tmpfs.txt to describe optional mempolicy mode flags.

Cc: Christoph Lameter <clameter@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomempolicy: add MPOL_F_RELATIVE_NODES flag
David Rientjes [Mon, 28 Apr 2008 09:12:30 +0000 (02:12 -0700)]
mempolicy: add MPOL_F_RELATIVE_NODES flag

Adds another optional mode flag, MPOL_F_RELATIVE_NODES, that specifies
nodemasks passed via set_mempolicy() or mbind() should be considered relative
to the current task's mems_allowed.

When the mempolicy is created, the passed nodemask is folded and mapped onto
the current task's mems_allowed.  For example, consider a task using
set_mempolicy() to pass MPOL_INTERLEAVE | MPOL_F_RELATIVE_NODES with a
nodemask of 1-3.  If current's mems_allowed is 4-7, the effected nodemask is
5-7 (the second, third, and fourth node of mems_allowed).

If the same task is attached to a cpuset, the mempolicy nodemask is rebound
each time the mems are changed.  Some possible rebinds and results are:

mems result
1-3 1-3
1-7 2-4
1,5-6 1,5-6
1,5-7 5-7

Likewise, the zonelist built for MPOL_BIND acts on the set of zones assigned
to the resultant nodemask from the relative remap.

In the MPOL_PREFERRED case, the preferred node is remapped from the currently
effected nodemask to the relative nodemask.

This mempolicy mode flag was conceived of by Paul Jackson <pj@sgi.com>.

Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomempolicy: add bitmap_onto() and bitmap_fold() operations
Paul Jackson [Mon, 28 Apr 2008 09:12:29 +0000 (02:12 -0700)]
mempolicy: add bitmap_onto() and bitmap_fold() operations

The following adds two more bitmap operators, bitmap_onto() and bitmap_fold(),
with the usual cpumask and nodemask wrappers.

The bitmap_onto() operator computes one bitmap relative to another.  If the
n-th bit in the origin mask is set, then the m-th bit of the destination mask
will be set, where m is the position of the n-th set bit in the relative mask.

The bitmap_fold() operator folds a bitmap into a second that has bit m set iff
the input bitmap has some bit n set, where m == n mod sz, for the specified sz
value.

There are two substantive changes between this patch and its
predecessor bitmap_relative:
 1) Renamed bitmap_relative() to be bitmap_onto().
 2) Added bitmap_fold().

The essential motivation for bitmap_onto() is to provide a mechanism for
converting a cpuset-relative CPU or Node mask to an absolute mask.  Cpuset
relative masks are written as if the current task were in a cpuset whose CPUs
or Nodes were just the consecutive ones numbered 0..N-1, for some N.  The
bitmap_onto() operator is provided in anticipation of adding support for the
first such cpuset relative mask, by the mbind() and set_mempolicy() system
calls, using a planned flag of MPOL_F_RELATIVE_NODES.  These bitmap operators
(and their nodemask wrappers, in particular) will be used in code that
converts the user specified cpuset relative memory policy to a specific system
node numbered policy, given the current mems_allowed of the tasks cpuset.

Such cpuset relative mempolicies will address two deficiencies
of the existing interface between cpusets and mempolicies:
 1) A task cannot at present reliably establish a cpuset
    relative mempolicy because there is an essential race
    condition, in that the tasks cpuset may be changed in
    between the time the task can query its cpuset placement,
    and the time the task can issue the applicable mbind or
    set_memplicy system call.
 2) A task cannot at present establish what cpuset relative
    mempolicy it would like to have, if it is in a smaller
    cpuset than it might have mempolicy preferences for,
    because the existing interface only allows specifying
    mempolicies for nodes currently allowed by the cpuset.

Cpuset relative mempolicies are useful for tasks that don't distinguish
particularly between one CPU or Node and another, but only between how many of
each are allowed, and the proper placement of threads and memory pages on the
various CPUs and Nodes available.

The motivation for the added bitmap_fold() can be seen in the following
example.

Let's say an application has specified some mempolicies that presume 16 memory
nodes, including say a mempolicy that specified MPOL_F_RELATIVE_NODES (cpuset
relative) nodes 12-15.  Then lets say that application is crammed into a
cpuset that only has 8 memory nodes, 0-7.  If one just uses bitmap_onto(),
this mempolicy, mapped to that cpuset, would ignore the requested relative
nodes above 7, leaving it empty of nodes.  That's not good; better to fold the
higher nodes down, so that some nodes are included in the resulting mapped
mempolicy.  In this case, the mempolicy nodes 12-15 are taken modulo 8 (the
weight of the mems_allowed of the confining cpuset), resulting in a mempolicy
specifying nodes 4-7.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: <kosaki.motohiro@jp.fujitsu.com>
Cc: <ray-lk@madrabbit.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomempolicy: add MPOL_F_STATIC_NODES flag
David Rientjes [Mon, 28 Apr 2008 09:12:27 +0000 (02:12 -0700)]
mempolicy: add MPOL_F_STATIC_NODES flag

Add an optional mempolicy mode flag, MPOL_F_STATIC_NODES, that suppresses the
node remap when the policy is rebound.

Adds another member to struct mempolicy, nodemask_t user_nodemask, as part of
a union with cpuset_mems_allowed:

struct mempolicy {
...
union {
nodemask_t cpuset_mems_allowed;
nodemask_t user_nodemask;
} w;
}

that stores the the nodemask that the user passed when he or she created the
mempolicy via set_mempolicy() or mbind().  When using MPOL_F_STATIC_NODES,
which is passed with any mempolicy mode, the user's passed nodemask
intersected with the VMA or task's allowed nodes is always used when
determining the preferred node, setting the MPOL_BIND zonelist, or creating
the interleave nodemask.  This happens whenever the policy is rebound,
including when a task's cpuset assignment changes or the cpuset's mems are
changed.

This creates an interesting side-effect in that it allows the mempolicy
"intent" to lie dormant and uneffected until it has access to the node(s) that
it desires.  For example, if you currently ask for an interleaved policy over
a set of nodes that you do not have access to, the mempolicy is not created
and the task continues to use the previous policy.  With this change, however,
it is possible to create the same mempolicy; it is only effected when access
to nodes in the nodemask is acquired.

It is also possible to mount tmpfs with the static nodemask behavior when
specifying a node or nodemask.  To do this, simply add "=static" immediately
following the mempolicy mode at mount time:

mount -o remount mpol=interleave=static:1-3

Also removes mpol_check_policy() and folds its logic into mpol_new() since it
is now obsoleted.  The unused vma_mpol_equal() is also removed.

Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomempolicy: support optional mode flags
David Rientjes [Mon, 28 Apr 2008 09:12:25 +0000 (02:12 -0700)]
mempolicy: support optional mode flags

With the evolution of mempolicies, it is necessary to support mempolicy mode
flags that specify how the policy shall behave in certain circumstances.  The
most immediate need for mode flag support is to suppress remapping the
nodemask of a policy at the time of rebind.

Both the mempolicy mode and flags are passed by the user in the 'int policy'
formal of either the set_mempolicy() or mbind() syscall.  A new constant,
MPOL_MODE_FLAGS, represents the union of legal optional flags that may be
passed as part of this int.  Mempolicies that include illegal flags as part of
their policy are rejected as invalid.

An additional member to struct mempolicy is added to support the mode flags:

struct mempolicy {
...
unsigned short policy;
unsigned short flags;
}

The splitting of the 'int' actual passed by the user is done in
sys_set_mempolicy() and sys_mbind() for their respective syscalls.  This is
done by intersecting the actual with MPOL_MODE_FLAGS, rejecting the syscall of
there are additional flags, and storing it in the new 'flags' member of struct
mempolicy.  The intersection of the actual with ~MPOL_MODE_FLAGS is stored in
the 'policy' member of the struct and all current users of pol->policy remain
unchanged.

The union of the policy mode and optional mode flags is passed back to the
user in get_mempolicy().

This combination of mode and flags within the same actual does not break
userspace code that relies on get_mempolicy(&policy, ...) and either

switch (policy) {
case MPOL_BIND:
...
case MPOL_INTERLEAVE:
...
};

statements or

if (policy == MPOL_INTERLEAVE) {
...
}

statements.  Such applications would need to use optional mode flags when
calling set_mempolicy() or mbind() for these previously implemented statements
to stop working.  If an application does start using optional mode flags, it
will need to mask the optional flags off the policy in switch and conditional
statements that only test mode.

An additional member is also added to struct shmem_sb_info to store the
optional mode flags.

[hugh@veritas.com: shmem mpol: fix build warning]
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomempolicy: convert MPOL constants to enum
David Rientjes [Mon, 28 Apr 2008 09:12:23 +0000 (02:12 -0700)]
mempolicy: convert MPOL constants to enum

The mempolicy mode constants, MPOL_DEFAULT, MPOL_PREFERRED, MPOL_BIND, and
MPOL_INTERLEAVE, are better declared as part of an enum since they are
sequentially numbered and cannot be combined.

The policy member of struct mempolicy is also converted from type short to
type unsigned short.  A negative policy does not have any legitimate meaning,
so it is possible to change its type in preparation for adding optional mode
flags later.

The equivalent member of struct shmem_sb_info is also changed from int to
unsigned short.

For compatibility, the policy formal to get_mempolicy() remains as a pointer
to an int:

int get_mempolicy(int *policy, unsigned long *nmask,
  unsigned long maxnode, unsigned long addr,
  unsigned long flags);

although the only possible values is the range of type unsigned short.

Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomm: move cache_line_size() to <linux/cache.h>
Pekka Enberg [Mon, 28 Apr 2008 09:12:22 +0000 (02:12 -0700)]
mm: move cache_line_size() to <linux/cache.h>

Not all architectures define cache_line_size() so as suggested by Andrew move
the private implementations in mm/slab.c and mm/slob.c to <linux/cache.h>.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agohugetlb: decrease hugetlb_lock cycling in gather_surplus_huge_pages
Adam Litke [Mon, 28 Apr 2008 09:12:20 +0000 (02:12 -0700)]
hugetlb: decrease hugetlb_lock cycling in gather_surplus_huge_pages

To reduce hugetlb_lock acquisitions and releases when freeing excess surplus
pages, scan the page list in two parts.  First, transfer the needed pages to
the hugetlb pool.  Then drop the lock and free the remaining pages back to the
buddy allocator.

In the common case there are zero excess pages and no lock operations are
required.

Thanks Mel Gorman for this improvement.

Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomm: try both endianess when checking for endianess
Chris Dearman [Mon, 28 Apr 2008 09:12:19 +0000 (02:12 -0700)]
mm: try both endianess when checking for endianess

When checking for the swap header try byteswapping the endianess dependent
fields to allow the swap partition to be shared between big & little endian
systems.

Signed-off-by: Chris Dearman <chris@mips.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomm: filter based on a nodemask as well as a gfp_mask
Mel Gorman [Mon, 28 Apr 2008 09:12:18 +0000 (02:12 -0700)]
mm: filter based on a nodemask as well as a gfp_mask

The MPOL_BIND policy creates a zonelist that is used for allocations
controlled by that mempolicy.  As the per-node zonelist is already being
filtered based on a zone id, this patch adds a version of __alloc_pages() that
takes a nodemask for further filtering.  This eliminates the need for
MPOL_BIND to create a custom zonelist.

A positive benefit of this is that allocations using MPOL_BIND now use the
local node's distance-ordered zonelist instead of a custom node-id-ordered
zonelist.  I.e., pages will be allocated from the closest allowed node with
available memory.

[Lee.Schermerhorn@hp.com: Mempolicy: update stale documentation and comments]
[Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask]
[Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask rework]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomm: have zonelist contains structs with both a zone pointer and zone_idx
Mel Gorman [Mon, 28 Apr 2008 09:12:17 +0000 (02:12 -0700)]
mm: have zonelist contains structs with both a zone pointer and zone_idx

Filtering zonelists requires very frequent use of zone_idx().  This is costly
as it involves a lookup of another structure and a substraction operation.  As
the zone_idx is often required, it should be quickly accessible.  The node idx
could also be stored here if it was found that accessing zone->node is
significant which may be the case on workloads where nodemasks are heavily
used.

This patch introduces a struct zoneref to store a zone pointer and a zone
index.  The zonelist then consists of an array of these struct zonerefs which
are looked up as necessary.  Helpers are given for accessing the zone index as
well as the node index.

[kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers]
[hugh@veritas.com: mm-have-zonelist: fix memcg ooms]
[hugh@veritas.com: just return do_try_to_free_pages]
[hugh@veritas.com: do_try_to_free_pages gfp_mask redundant]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Christoph Lameter <clameter@sgi.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomm: use two zonelist that are filtered by GFP mask
Mel Gorman [Mon, 28 Apr 2008 09:12:16 +0000 (02:12 -0700)]
mm: use two zonelist that are filtered by GFP mask

Currently a node has two sets of zonelists, one for each zone type in the
system and a second set for GFP_THISNODE allocations.  Based on the zones
allowed by a gfp mask, one of these zonelists is selected.  All of these
zonelists consume memory and occupy cache lines.

This patch replaces the multiple zonelists per-node with two zonelists.  The
first contains all populated zones in the system, ordered by distance, for
fallback allocations when the target/preferred node has no free pages.  The
second contains all populated zones in the node suitable for GFP_THISNODE
allocations.

An iterator macro is introduced called for_each_zone_zonelist() that interates
through each zone allowed by the GFP flags in the selected zonelist.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomm: remember what the preferred zone is for zone_statistics
Mel Gorman [Mon, 28 Apr 2008 09:12:14 +0000 (02:12 -0700)]
mm: remember what the preferred zone is for zone_statistics

On NUMA, zone_statistics() is used to record events like numa hit, miss and
foreign.  It assumes that the first zone in a zonelist is the preferred zone.
When multiple zonelists are replaced by one that is filtered, this is no
longer the case.

This patch records what the preferred zone is rather than assuming the first
zone in the zonelist is it.  This simplifies the reading of later patches in
this set.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomm: introduce node_zonelist() for accessing the zonelist for a GFP mask
Mel Gorman [Mon, 28 Apr 2008 09:12:14 +0000 (02:12 -0700)]
mm: introduce node_zonelist() for accessing the zonelist for a GFP mask

Introduce a node_zonelist() helper function.  It is used to lookup the
appropriate zonelist given a node and a GFP mask.  The patch on its own is a
cleanup but it helps clarify parts of the two-zonelist-per-node patchset.  If
necessary, it can be merged with the next patch in this set without problems.

Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomm: use zonelists instead of zones when direct reclaiming pages
Mel Gorman [Mon, 28 Apr 2008 09:12:12 +0000 (02:12 -0700)]
mm: use zonelists instead of zones when direct reclaiming pages

The following patches replace multiple zonelists per node with two zonelists
that are filtered based on the GFP flags.  The patches as a set fix a bug with
regard to the use of MPOL_BIND and ZONE_MOVABLE.  With this patchset, the
MPOL_BIND will apply to the two highest zones when the highest zone is
ZONE_MOVABLE.  This should be considered as an alternative fix for the
MPOL_BIND+ZONE_MOVABLE in 2.6.23 to the previously discussed hack that filters
only custom zonelists.

The first patch cleans up an inconsistency where direct reclaim uses
zonelist->zones where other places use zonelist.

The second patch introduces a helper function node_zonelist() for looking up
the appropriate zonelist for a GFP mask which simplifies patches later in the
set.

The third patch defines/remembers the "preferred zone" for numa statistics, as
it is no longer always the first zone in a zonelist.

The forth patch replaces multiple zonelists with two zonelists that are
filtered.  The two zonelists are due to the fact that the memoryless patchset
introduces a second set of zonelists for __GFP_THISNODE.

The fifth patch introduces helper macros for retrieving the zone and node
indices of entries in a zonelist.

The final patch introduces filtering of the zonelists based on a nodemask.
Two zonelists exist per node, one for normal allocations and one for
__GFP_THISNODE.

Performance results varied depending on the machine configuration.  In real
workloads the gain/loss will depend on how much the userspace portion of the
benchmark benefits from having more cache available due to reduced referencing
of zonelists.

These are the range of performance losses/gains when running against
2.6.24-rc4-mm1.  The set and these machines are a mix of i386, x86_64 and
ppc64 both NUMA and non-NUMA.
     loss   to  gain
Total CPU time on Kernbench: -0.86% to  1.13%
Elapsed   time on Kernbench: -0.79% to  0.76%
page_test from aim9:         -4.37% to  0.79%
brk_test  from aim9:         -0.71% to  4.07%
fork_test from aim9:         -1.84% to  4.60%
exec_test from aim9:         -0.71% to  1.08%

This patch:

The allocator deals with zonelists which indicate the order in which zones
should be targeted for an allocation.  Similarly, direct reclaim of pages
iterates over an array of zones.  For consistency, this patch converts direct
reclaim to use a zonelist.  No functionality is changed by this patch.  This
simplifies zonelist iterators in the next patch.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomake swap_pte_to_pagemap_entry() static
Adrian Bunk [Mon, 28 Apr 2008 09:12:11 +0000 (02:12 -0700)]
make swap_pte_to_pagemap_entry() static

Make the needlessly global swap_pte_to_pagemap_entry() static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomm: remove nopage
Nick Piggin [Mon, 28 Apr 2008 09:12:10 +0000 (02:12 -0700)]
mm: remove nopage

Nothing in the tree uses nopage any more.  Remove support for it in the
core mm code and documentation (and a few stray references to it in
comments).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agommap_region: cleanup the final vma_merge() related code
Oleg Nesterov [Mon, 28 Apr 2008 09:12:10 +0000 (02:12 -0700)]
mmap_region: cleanup the final vma_merge() related code

It is not easy to actually understand the "if (!file || !vma_merge())"
code, turn it into "if (file && vma_merge())".  This makes immediately
obvious that the subsequent "if (file)" is superfluous.

As Hugh Dickins pointed out, we can also factor out the ->i_writecount
corrections, and add a small comment about that.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agofix invalidate_inode_pages2_range() to not clear ret
Hisashi Hifumi [Mon, 28 Apr 2008 09:12:08 +0000 (02:12 -0700)]
fix invalidate_inode_pages2_range() to not clear ret

DIO invalidates page cache through invalidate_inode_pages2_range().
invalidate_inode_pages2_range() sets ret=-EIO when
invalidate_complete_page2() fails, but this ret is cleared if
do_launder_page() succeed on a page of next index.

In this case, dio is carried out even if invalidate_complete_page2() fails
on some pages.

This can cause inconsistency between memory and blocks on HDD because the
page cache still exists.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Chuck Lever <cel@citi.umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoremove sparse warning for mmzone.h
Harvey Harrison [Mon, 28 Apr 2008 09:12:07 +0000 (02:12 -0700)]
remove sparse warning for mmzone.h

include/linux/mmzone.h:640:22: warning: potentially expensive pointer subtraction

Calculate the offset into the node_zones array rather than the index
using casts to (char *) and comparing against the index * sizeof(struct zone).

On X86_32 this saves a sar, but code size increases by one byte per
is_highmem() use due to 32-bit cmps rather than 16 bit cmps.

Before:
 207:   2b 80 8c 07 00 00       sub    0x78c(%eax),%eax
 20d:   c1 f8 0b                sar    $0xb,%eax
 210:   83 f8 02                cmp    $0x2,%eax
 213:   74 16                   je     22b <kmap_atomic_prot+0x144>
 215:   83 f8 03                cmp    $0x3,%eax
 218:   0f 85 8f 00 00 00       jne    2ad <kmap_atomic_prot+0x1c6>
 21e:   83 3d 00 00 00 00 02    cmpl   $0x2,0x0
 225:   0f 85 82 00 00 00       jne    2ad <kmap_atomic_prot+0x1c6>
 22b:   64 a1 00 00 00 00       mov    %fs:0x0,%eax

After:
 207:   2b 80 8c 07 00 00       sub    0x78c(%eax),%eax
 20d:   3d 00 10 00 00          cmp    $0x1000,%eax
 212:   74 18                   je     22c <kmap_atomic_prot+0x145>
 214:   3d 00 18 00 00          cmp    $0x1800,%eax
 219:   0f 85 8f 00 00 00       jne    2ae <kmap_atomic_prot+0x1c7>
 21f:   83 3d 00 00 00 00 02    cmpl   $0x2,0x0
 226:   0f 85 82 00 00 00       jne    2ae <kmap_atomic_prot+0x1c7>
 22c:   64 a1 00 00 00 00       mov    %fs:0x0,%eax

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoRemove set_migrateflags()
Christoph Lameter [Mon, 28 Apr 2008 09:12:05 +0000 (02:12 -0700)]
Remove set_migrateflags()

Migrate flags must be set on slab creation as agreed upon when the antifrag
logic was reviewed.  Otherwise some slabs of a slabcache will end up in the
unmovable and others in the reclaimable section depending on which flag was
active when a new slab page was allocated.

This likely slid in somehow when antifrag was merged. Remove it.

The buffer_heads are always allocated with __GFP_RECLAIMABLE because the
SLAB_RECLAIM_ACCOUNT option is set.  The set_migrateflags() never had any
effect there.

Radix tree allocations are not directly reclaimable but they are allocated
with __GFP_RECLAIMABLE set on each allocation.  We now set
SLAB_RECLAIM_ACCOUNT on radix tree slab creation making sure that radix
tree slabs are consistently placed in the reclaimable section.  Radix tree
slabs will also be accounted as such.

There is then no user left of set_migratepages. So remove it.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoaio: io_getevents() should return if io_destroy() is invoked
Jeff Moyer [Mon, 28 Apr 2008 09:12:04 +0000 (02:12 -0700)]
aio: io_getevents() should return if io_destroy() is invoked

This patch wakes up a thread waiting in io_getevents if another thread
destroys the context.  This was tested using a small program that spawns a
thread to wait in io_getevents while the parent thread destroys the io context
and then waits for the getevents thread to exit.  Without this patch, the
program hangs indefinitely.  With the patch, the program exits as expected.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Christopher Smith <x@xman.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agohotplug-memory: make online_page() common
Jeremy Fitzhardinge [Mon, 28 Apr 2008 09:12:03 +0000 (02:12 -0700)]
hotplug-memory: make online_page() common

All architectures use an effectively identical definition of online_page(), so
just make it common code.  x86-64, ia64, powerpc and sh are actually
identical; x86-32 is slightly different.

x86-32's differences arise because it puts its hotplug pages in the highmem
zone.  We can handle this in the generic code by inspecting the page to see if
its in highmem, and update the totalhigh_pages count appropriately.  This
leaves init_32.c:free_new_highpage with a single caller, so I folded it into
add_one_highpage_init.

I also removed an incorrect comment referring to the NUMA case; any NUMA
details have already been dealt with by the time online_page() is called.

[akpm@linux-foundation.org: fix indenting]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com>
Tested-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agohotplug memory remove: generic __remove_pages() support
Badari Pulavarty [Mon, 28 Apr 2008 09:12:01 +0000 (02:12 -0700)]
hotplug memory remove: generic __remove_pages() support

Generic helper function to remove section mappings and sysfs entries for the
section of the memory we are removing.  offline_pages() correctly adjusted
zone and marked the pages reserved.

TODO: Yasunori Goto is working on patches to free up allocations from bootmem.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agortc: replace remaining __FUNCTION__ occurrences
Harvey Harrison [Mon, 28 Apr 2008 09:12:00 +0000 (02:12 -0700)]
rtc: replace remaining __FUNCTION__ occurrences

__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: David Brownell <david-b@pacbell.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agodrivers/char/rtc.c: use time_before, time_before_eq, etc
Julia Lawall [Mon, 28 Apr 2008 09:11:59 +0000 (02:11 -0700)]
drivers/char/rtc.c: use time_before, time_before_eq, etc

The functions time_before, time_before_eq, time_after, and time_after_eq
are more robust for comparing jiffies against other values.

A simplified version of the semantic patch making this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@ change_compare_np @
expression E;
@@

(
- jiffies <= E
+ time_before_eq(jiffies,E)
|
- jiffies >= E
+ time_after_eq(jiffies,E)
|
- jiffies < E
+ time_before(jiffies,E)
|
- jiffies > E
+ time_after(jiffies,E)
)

@ include depends on change_compare_np @
@@

#include <linux/jiffies.h>

@ no_include depends on !include && change_compare_np @
@@

  #include <linux/...>
+ #include <linux/jiffies.h>
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agortc: add the support for alarm time relative to current time in sysfs
Zhao Yakui [Mon, 28 Apr 2008 09:11:58 +0000 (02:11 -0700)]
rtc: add the support for alarm time relative to current time in sysfs

In current kernel if we want to set the alarm time, the absolute time the
seconds relative to 1970-01-01 00:00:00) should be written into
/sys/class/rtc/rtc0/wakealarm.  It is not convenient.

It is more reasonable to add the support for the alarm time relative to
current RTC time.(the unit is second)

For example:
If the RTC is required to generate alarm after 2 minutes, the following
will be OK.
echo +120 > /sys/class/rtc/rtc0/wakealarm
or      echo +0x78 > /sys/class/rtc/rtc0/wakealarm

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agortc: rtc-rs5c372: fix up NULL name in transfer error path
Paul Mundt [Mon, 28 Apr 2008 09:11:57 +0000 (02:11 -0700)]
rtc: rtc-rs5c372: fix up NULL name in transfer error path

rs5c_get_regs() currently uses rs5c->rtc->name for its debug printk when
i2c_transfer() fails, though it is used several times before the rtc dev
has been registered. The earliest we can get at the symbolic name is via
the i2c client's struct device, which can be handled by moving the first
rs5c_get_regs() until after the client pointer is assigned.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Cc: David Brownell <david-b@pacbell.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agokerneldoc for <linux/clk.h>
David Brownell [Mon, 28 Apr 2008 09:11:56 +0000 (02:11 -0700)]
kerneldoc for <linux/clk.h>

Add <linux/clk.h> to the generated kerneldoc, with some overview
to go along with those per-function descriptions.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomake ds1511_rtc_{read,set}_time() static
Adrian Bunk [Mon, 28 Apr 2008 09:11:55 +0000 (02:11 -0700)]
make ds1511_rtc_{read,set}_time() static

Make the needlessly global ds1511_rtc_{read,set}_time() static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: David Brownell <david-b@pacbell.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agortc: silence section mismatch warning in rtc-test
Sam Ravnborg [Mon, 28 Apr 2008 09:11:55 +0000 (02:11 -0700)]
rtc: silence section mismatch warning in rtc-test

Fix following warning:
WARNING: vmlinux.o(.data+0x253e28): Section mismatch in reference from the variable test_drv to the function .devexit.text:test_remove()

Fix by renaming the platfrom_driver variable from *_drv to *_driver
so modpost ignore the reference to an __devexit section.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agortc-x1205: new style conversion
Alessandro Zummo [Mon, 28 Apr 2008 09:11:54 +0000 (02:11 -0700)]
rtc-x1205: new style conversion

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agortc-pcf8563: new style conversion
Alessandro Zummo [Mon, 28 Apr 2008 09:11:54 +0000 (02:11 -0700)]
rtc-pcf8563: new style conversion

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agortc-isl1208: new style conversion and minor bug fixes
Alessandro Zummo [Mon, 28 Apr 2008 09:11:53 +0000 (02:11 -0700)]
rtc-isl1208: new style conversion and minor bug fixes

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: Herbert Valerio Riedel <hvr@gnu.org>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agortc: avoid legacy drivers with generic framework
David Brownell [Mon, 28 Apr 2008 09:11:52 +0000 (02:11 -0700)]
rtc: avoid legacy drivers with generic framework

Kconfig tweaks to help reduce RTC configuration bugs, by avoiding
legacy RTC drivers when the generic RTC framework is enabled:

 - If rtc-cmos is selected, disable the legacy rtc driver;

 - When using generic RTC on x86, enable rtc-cmos by default;

 - In the old "chardev RTC" section of Kconfig, add a comment
   warning people off these (seven) legacy RTC drivers when
   the generic framework is in use.

People can still use the legacy drivers if they want (or need) to.

This doesn't fix the broken dependencies for the legacy "CMOS" RTC driver.
Ideally it would be a full list of platforms where it works, not a partial
list of ones where it won't.  Or better yet, it would depend on a
"HAVE_CMOS_RTC" flag defined by various platforms ...  surely there's a
Kconfig style guideline lurking there.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agortc-pcf8583 build fix
David Brownell [Mon, 28 Apr 2008 09:11:51 +0000 (02:11 -0700)]
rtc-pcf8583 build fix

Fix bogus #include in rtc-pcf8583, so it compiles on platforms that
don't support PC clone RTCs.  (Original issue noted by Adrian Bunk.)

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Adrian Bunk <bunk@kernel.org>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agodz: test after postfix decrement fails in dz_console_putchar()
Roel Kluin [Mon, 28 Apr 2008 09:11:50 +0000 (02:11 -0700)]
dz: test after postfix decrement fails in dz_console_putchar()

When loops reaches 0 the postfix decrement still subtracts, so the subsequent
test fails.

Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Acked-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Johannes Weiner <hannes@saeurebad.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agomm: fix possible off-by-one in walk_pte_range()
Johannes Weiner [Mon, 28 Apr 2008 09:11:47 +0000 (02:11 -0700)]
mm: fix possible off-by-one in walk_pte_range()

After the loop in walk_pte_range() pte might point to the first address after
the pmd it walks.  The pte_unmap() is then applied to something bad.

Spotted by Roel Kluin and Andreas Schwab.

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Cc: Roel Kluin <12o3l@tiscali.nl>
Cc: Andreas Schwab <schwab@suse.de>
Acked-by: Matt Mackall <mpm@selenic.com>
Acked-by: Mikael Pettersson <mikpe@it.uu.se>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agox86: PAT fix
Ingo Molnar [Fri, 21 Mar 2008 14:42:28 +0000 (15:42 +0100)]
x86: PAT fix

Adrian Bunk noticed the following Coverity report:

> Commit e7f260a276f2c9184fe753732d834b1f6fbe9f17
> (x86: PAT use reserve free memtype in mmap of /dev/mem)
> added the following gem to arch/x86/mm/pat.c:
>
> <--  snip  -->
>
> ...
> int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
>                                 unsigned long size, pgprot_t *vma_prot)
> {
>         u64 offset = ((u64) pfn) << PAGE_SHIFT;
>         unsigned long flags = _PAGE_CACHE_UC_MINUS;
>         unsigned long ret_flags;
> ...
> ...  (nothing that touches ret_flags)
> ...
>         if (flags != _PAGE_CACHE_UC_MINUS) {
>                 retval = reserve_memtype(offset, offset + size, flags, NULL);
>         } else {
>                 retval = reserve_memtype(offset, offset + size, -1, &ret_flags);
>         }
>
>         if (retval < 0)
>                 return 0;
>
>         flags = ret_flags;
>
>         if (pfn <= max_pfn_mapped &&
>             ioremap_change_attr((unsigned long)__va(offset), size, flags) < 0) {
>                 free_memtype(offset, offset + size);
>                 printk(KERN_INFO
>                 "%s:%d /dev/mem ioremap_change_attr failed %s for %Lx-%Lx\n",
>                         current->comm, current->pid,
>                         cattr_name(flags),
>                         offset, offset + size);
>                 return 0;
>         }
>
>         *vma_prot = __pgprot((pgprot_val(*vma_prot) & ~_PAGE_CACHE_MASK) |
>                              flags);
>         return 1;
> }
>
> <--  snip  -->
>
> If (flags != _PAGE_CACHE_UC_MINUS) we pass garbage from the stack to
> ioremap_change_attr() and/or __pgprot().
>
> Spotted by the Coverity checker.

the fix simplifies the code as we get rid of the 'ret_flags'
complication.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agosparc64: Kill PIL_RESERVED, unused.
David S. Miller [Mon, 28 Apr 2008 11:03:06 +0000 (04:03 -0700)]
sparc64: Kill PIL_RESERVED, unused.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[PATCH] new predicate - AUDIT_FILETYPE
Al Viro [Mon, 28 Apr 2008 08:15:49 +0000 (04:15 -0400)]
[PATCH] new predicate - AUDIT_FILETYPE

Argument is S_IF... | <index>, where index is normally 0 or 1.
Triggers if chosen element of ctx->names[] is present and the
mode of object in question matches the upper bits of argument.
I.e. for things like "is the argument of that chmod a directory",
etc.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
17 years ago[patch 2/2] Use find_task_by_vpid in audit code
Pavel Emelyanov [Fri, 18 Apr 2008 20:30:15 +0000 (13:30 -0700)]
[patch 2/2] Use find_task_by_vpid in audit code

The pid to lookup a task by is passed inside audit code via netlink message.

Thanks to Denis Lunev, netlink packets are now (since 2.6.24) _always_
processed in the context of the sending task.  So this is correct to lookup
the task with find_task_by_vpid() here.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
17 years ago[patch 1/2] audit: let userspace fully control TTY input auditing
Miloslav Trmac [Fri, 18 Apr 2008 20:30:14 +0000 (13:30 -0700)]
[patch 1/2] audit: let userspace fully control TTY input auditing

Remove the code that automatically disables TTY input auditing in processes
that open TTYs when they have no other TTY open; this heuristic was
intended to automatically handle daemons, but it has false positives (e.g.
with sshd) that make it impossible to control TTY input auditing from a PAM
module.  With this patch, TTY input auditing is controlled from user-space
only.

On the other hand, not even for daemons does it make sense to audit "input"
from PTY masters; this data was produced by a program writing to the PTY
slave, and does not represent data entered by the user.

Signed-off-by: Miloslav Trmac <mitr@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
17 years ago[PATCH 2/2] audit: fix sparse shadowed variable warnings
Harvey Harrison [Sun, 27 Apr 2008 09:39:56 +0000 (02:39 -0700)]
[PATCH 2/2] audit: fix sparse shadowed variable warnings

Use msglen as the identifier.
kernel/audit.c:724:10: warning: symbol 'len' shadows an earlier one
kernel/audit.c:575:8: originally declared here

Don't use ino_f to check the inode field at the end of the functions.
kernel/auditfilter.c:429:22: warning: symbol 'f' shadows an earlier one
kernel/auditfilter.c:420:21: originally declared here
kernel/auditfilter.c:542:22: warning: symbol 'f' shadows an earlier one
kernel/auditfilter.c:529:21: originally declared here

i always used as a counter for a for loop and initialized to zero before
use.  Eliminate the inner i variables.
kernel/auditsc.c:1295:8: warning: symbol 'i' shadows an earlier one
kernel/auditsc.c:1152:6: originally declared here
kernel/auditsc.c:1320:7: warning: symbol 'i' shadows an earlier one
kernel/auditsc.c:1152:6: originally declared here

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
17 years ago[PATCH 1/2] audit: move extern declarations to audit.h
Harvey Harrison [Sun, 27 Apr 2008 09:39:17 +0000 (02:39 -0700)]
[PATCH 1/2] audit: move extern declarations to audit.h

Leave audit_sig_{uid|pid|sid} protected by #ifdef CONFIG_AUDITSYSCALL.

Noticed by sparse:
kernel/audit.c:73:6: warning: symbol 'audit_ever_enabled' was not declared. Should it be static?
kernel/audit.c:100:8: warning: symbol 'audit_sig_uid' was not declared. Should it be static?
kernel/audit.c:101:8: warning: symbol 'audit_sig_pid' was not declared. Should it be static?
kernel/audit.c:102:6: warning: symbol 'audit_sig_sid' was not declared. Should it be static?
kernel/audit.c:117:23: warning: symbol 'audit_ih' was not declared. Should it be static?
kernel/auditfilter.c:78:18: warning: symbol 'audit_filter_list' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
17 years agoAudit: MAINTAINERS update
Eric Paris [Fri, 18 Apr 2008 14:47:32 +0000 (10:47 -0400)]
Audit: MAINTAINERS update

Change maintainers to include me and al viro

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
17 years agoAudit: increase the maximum length of the key field
Eric Paris [Fri, 18 Apr 2008 14:36:22 +0000 (10:36 -0400)]
Audit: increase the maximum length of the key field

Key lengths were arbitrarily limited to 32 characters.  If userspace is going
to start using the single kernel key field as multiple virtual key fields
(example key=key1,key2,key3,key4) we should give them enough room to work.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
17 years agoAudit: standardize string audit interfaces
Eric Paris [Fri, 18 Apr 2008 14:12:59 +0000 (10:12 -0400)]
Audit: standardize string audit interfaces

This patch standardized the string auditing interfaces.  No userspace
changes will be visible and this is all just cleanup and consistancy
work.  We have the following string audit interfaces to use:

void audit_log_n_hex(struct audit_buffer *ab, const unsigned char *buf, size_t len);

void audit_log_n_string(struct audit_buffer *ab, const char *buf, size_t n);
void audit_log_string(struct audit_buffer *ab, const char *buf);

void audit_log_n_untrustedstring(struct audit_buffer *ab, const char *string, size_t n);
void audit_log_untrustedstring(struct audit_buffer *ab, const char *string);

This may be the first step to possibly fixing some of the issues that
people have with the string output from the kernel audit system.  But we
still don't have an agreed upon solution to that problem.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
17 years agoAudit: stop deadlock from signals under load
Eric Paris [Fri, 18 Apr 2008 14:11:04 +0000 (10:11 -0400)]
Audit: stop deadlock from signals under load

A deadlock is possible between kauditd and auditd under load if auditd
receives a signal.  When auditd receives a signal it sends a netlink
message to the kernel asking for information about the sender of the
signal.  In that same context the audit system will attempt to send a
netlink message back to the userspace auditd.  If kauditd has already
filled the socket buffer (see netlink_attachskb()) auditd will now put
itself to sleep waiting for room to send the message.  Since auditd is
responsible for draining that socket we have a deadlock.  The fix, since
the response from the kernel does not need to be synchronous is to send
the signal information back to auditd in a separate thread.  And thus
auditd can continue to drain the audit queue normally.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
17 years agoAudit: save audit_backlog_limit audit messages in case auditd comes back
Eric Paris [Fri, 18 Apr 2008 14:02:28 +0000 (10:02 -0400)]
Audit: save audit_backlog_limit audit messages in case auditd comes back

This patch causes the kernel audit subsystem to store up to
audit_backlog_limit messages for use by auditd if it ever appears
sometime in the future in userspace.  This is useful to collect audit
messages during bootup and even when auditd is stopped.  This is NOT a
reliable mechanism, it does not ever call audit_panic, nor should it.
audit_log_lost()/audit_panic() are called during the normal delivery
mechanism.  The messages are still sent to printk/syslog as usual and if
too many messages appear to be queued they will be silently discarded.

I liked doing it by default, but this patch only uses the queue in
question if it was booted with audit=1 or if the kernel was built
enabling audit by default.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
17 years agoAudit: collect sessionid in netlink messages
Eric Paris [Fri, 18 Apr 2008 14:09:25 +0000 (10:09 -0400)]
Audit: collect sessionid in netlink messages

Previously I added sessionid output to all audit messages where it was
available but we still didn't know the sessionid of the sender of
netlink messages.  This patch adds that information to netlink messages
so we can audit who sent netlink messages.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
17 years agoAudit: end printk with newline
Eric Paris [Fri, 18 Apr 2008 14:01:04 +0000 (10:01 -0400)]
Audit: end printk with newline

A couple of audit printk statements did not have a newline.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
17 years agosparc64: Split entry.S up into seperate files.
David S. Miller [Mon, 28 Apr 2008 07:47:20 +0000 (00:47 -0700)]
sparc64: Split entry.S up into seperate files.

entry.S was a hodge-podge of several totally unrelated
sets of assembler routines, ranging from FPU trap handlers
to hypervisor call functions.

Split it up into topic-sized pieces.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[CIFS] Fix statfs formatting
Steve French [Mon, 28 Apr 2008 04:04:34 +0000 (04:04 +0000)]
[CIFS] Fix statfs formatting

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
17 years agoMerge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6
Steve French [Mon, 28 Apr 2008 04:01:34 +0000 (04:01 +0000)]
Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6

17 years agoSELinux: Fix a RCU free problem with the netport cache
Paul Moore [Fri, 25 Apr 2008 19:03:39 +0000 (15:03 -0400)]
SELinux: Fix a RCU free problem with the netport cache

The netport cache doesn't free resources in a manner which is safe or orderly.
This patch fixes this by adding in a missing call to rcu_dereference() in
sel_netport_insert() as well as some general cleanup throughout the file.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
17 years agoSELinux: Made netnode cache adds faster
Paul Moore [Fri, 25 Apr 2008 19:03:34 +0000 (15:03 -0400)]
SELinux: Made netnode cache adds faster

When adding new entries to the network node cache we would walk the entire
hash bucket to make sure we didn't cross a threshold (done to bound the
cache size).  This isn't a very quick or elegant solution for something
which is supposed to be quick-ish so add a counter to each hash bucket to
track the size of the bucket and eliminate the need to walk the entire
bucket list on each add.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
17 years agoSELinux: include/security.h whitespace, syntax, and other cleanups
Eric Paris [Wed, 23 Apr 2008 18:10:25 +0000 (14:10 -0400)]
SELinux: include/security.h whitespace, syntax, and other cleanups

This patch changes include/security.h to fix whitespace and syntax issues.  Things that
are fixed may include (does not not have to include)

whitespace at end of lines
spaces followed by tabs
spaces used instead of tabs
spacing around parenthesis
location of { around structs and else clauses
location of * in pointer declarations
removal of initialization of static data to keep it in the right section
useless {} in if statemetns
useless checking for NULL before kfree
fixing of the indentation depth of switch statements
no assignments in if statements
include spaces around , in function calls
and any number of other things I forgot to mention

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
17 years agoSELinux: policydb.h whitespace, syntax, and other cleanups
Eric Paris [Tue, 22 Apr 2008 21:46:17 +0000 (17:46 -0400)]
SELinux: policydb.h whitespace, syntax, and other cleanups

This patch changes policydb.h to fix whitespace and syntax issues.  Things that
are fixed may include (does not not have to include)

spaces followed by tabs
spaces used instead of tabs
location of * in pointer declarations

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
17 years agoSELinux: mls_types.h whitespace, syntax, and other cleanups
Eric Paris [Tue, 22 Apr 2008 21:46:16 +0000 (17:46 -0400)]
SELinux: mls_types.h whitespace, syntax, and other cleanups

This patch changes mls_types.h to fix whitespace and syntax issues.  Things that
are fixed may include (does not not have to include)

spaces used instead of tabs

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
17 years agoSELinux: mls.h whitespace, syntax, and other cleanups
Eric Paris [Tue, 22 Apr 2008 21:46:15 +0000 (17:46 -0400)]
SELinux: mls.h whitespace, syntax, and other cleanups

This patch changes mls.h to fix whitespace and syntax issues.  Things that
are fixed may include (does not not have to include)

spaces used instead of tabs

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
17 years agoSELinux: hashtab.h whitespace, syntax, and other cleanups
Eric Paris [Tue, 22 Apr 2008 21:46:14 +0000 (17:46 -0400)]
SELinux: hashtab.h whitespace, syntax, and other cleanups

This patch changes hashtab.h to fix whitespace and syntax issues.  Things that
are fixed may include (does not not have to include)

spaces used instead of tabs

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
17 years agoSELinux: context.h whitespace, syntax, and other cleanups
Eric Paris [Tue, 22 Apr 2008 21:46:13 +0000 (17:46 -0400)]
SELinux: context.h whitespace, syntax, and other cleanups

This patch changes context.h to fix whitespace and syntax issues.  Things that
are fixed may include (does not not have to include)

include spaces around , in function calls

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
17 years agoSELinux: ss/conditional.h whitespace, syntax, and other cleanups
Eric Paris [Tue, 22 Apr 2008 21:46:12 +0000 (17:46 -0400)]
SELinux: ss/conditional.h whitespace, syntax, and other cleanups

This patch changes ss/conditional.h to fix whitespace and syntax issues.  Things that
are fixed may include (does not not have to include)

location of * in pointer declarations

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
17 years agoSELinux: selinux/include/security.h whitespace, syntax, and other cleanups
Eric Paris [Tue, 22 Apr 2008 21:46:11 +0000 (17:46 -0400)]
SELinux: selinux/include/security.h whitespace, syntax, and other cleanups

This patch changes selinux/include/security.h to fix whitespace and syntax issues.  Things that
are fixed may include (does not not have to include)

whitespace at end of lines
spaces followed by tabs
spaces used instead of tabs
spacing around parenthesis
location of { around structs and else clauses
location of * in pointer declarations
removal of initialization of static data to keep it in the right section
useless {} in if statemetns
useless checking for NULL before kfree
fixing of the indentation depth of switch statements
no assignments in if statements
and any number of other things I forgot to mention

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
17 years agoSELinux: objsec.h whitespace, syntax, and other cleanups
Eric Paris [Tue, 22 Apr 2008 21:46:10 +0000 (17:46 -0400)]
SELinux: objsec.h whitespace, syntax, and other cleanups

This patch changes objsec.h to fix whitespace and syntax issues.  Things that
are fixed may include (does not not have to include)

whitespace at end of lines
spaces followed by tabs
spaces used instead of tabs
spacing around parenthesis
location of { around structs and else clauses
location of * in pointer declarations
removal of initialization of static data to keep it in the right section
useless {} in if statemetns
useless checking for NULL before kfree
fixing of the indentation depth of switch statements
no assignments in if statements
and any number of other things I forgot to mention

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
17 years agoSELinux: netlabel.h whitespace, syntax, and other cleanups
Eric Paris [Tue, 22 Apr 2008 21:46:09 +0000 (17:46 -0400)]
SELinux: netlabel.h whitespace, syntax, and other cleanups

This patch changes netlabel.h to fix whitespace and syntax issues.  Things that
are fixed may include (does not not have to include)

spaces used instead of tabs

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
17 years agoSELinux: avc_ss.h whitespace, syntax, and other cleanups
Eric Paris [Tue, 22 Apr 2008 21:46:08 +0000 (17:46 -0400)]
SELinux: avc_ss.h whitespace, syntax, and other cleanups

This patch changes avc_ss.h to fix whitespace and syntax issues.  Things that
are fixed may include (does not not have to include)

whitespace at end of lines
spaces followed by tabs
spaces used instead of tabs
spacing around parenthesis
location of { around structs and else clauses
location of * in pointer declarations
removal of initialization of static data to keep it in the right section
useless {} in if statemetns
useless checking for NULL before kfree
fixing of the indentation depth of switch statements
no assignments in if statements
and any number of other things I forgot to mention

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
17 years agoiwlwifi: Allow building iwl3945 without iwl4965.
Jason Riedy [Sun, 27 Apr 2008 22:38:30 +0000 (15:38 -0700)]
iwlwifi: Allow building iwl3945 without iwl4965.

If IWL3945 ever depends on IWLCORE, the silent, user-invisible
IWLWIFI option can go away.

Signed-off-by: Jason Riedy <jason@acm.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agowireless: Fix compile error with wifi & leds
Luca Tettamanti [Sun, 27 Apr 2008 22:34:55 +0000 (15:34 -0700)]
wireless: Fix compile error with wifi & leds

Fix build error caused by commit
e82404ad612ebabc65d15c3d59b971cb35c3ff36 ("iwlwifi: Select
LEDS_CLASS.") from David Miller:

Since MAC80211_LEDS is selected by wireless drivers it must select its
own dependencies otherwise a build error may occur (kbuild will select
the symbol regardless of "depends" constraints).

Signed-off-By: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agotcp: Fix slab corruption with ipv6 and tcp6fuzz
Evgeniy Polyakov [Sun, 27 Apr 2008 22:27:30 +0000 (15:27 -0700)]
tcp: Fix slab corruption with ipv6 and tcp6fuzz

From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>

This fixes a regression added by ec3c0982a2dd1e671bad8e9d26c28dcba0039d87
("[TCP]: TCP_DEFER_ACCEPT updates - process as established")

tcp_v6_do_rcv()->tcp_rcv_established(), the latter goes to step5, where
eventually skb can be freed via tcp_data_queue() (drop: label), then if
check for tcp_defer_accept_check() returns true and thus
tcp_rcv_established() returns -1, which forces tcp_v6_do_rcv() to jump
to reset: label, which in turn will pass through discard: label and free
the same skb again.

Tested by Eric Sesterhenn.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-By: Patrick McManus <mcmanus@ducksong.com>
17 years agosparc: video drivers: add facility level
Robert Reif [Sun, 27 Apr 2008 22:18:57 +0000 (15:18 -0700)]
sparc: video drivers: add facility level

Add KERN_ facility level to sparc video drivers.

Signed-off-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agosparc: tcx.c make tcx_init and tcx_exit static
Robert Reif [Sun, 27 Apr 2008 22:18:12 +0000 (15:18 -0700)]
sparc: tcx.c make tcx_init and tcx_exit static

Make tcx_init and tcx_exit static.

Signed-off-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agosparc: ffb.c make ffb_init and ffb_exit static
Robert Reif [Sun, 27 Apr 2008 22:17:49 +0000 (15:17 -0700)]
sparc: ffb.c make ffb_init and ffb_exit static

Make ffb_init and ffb_exit static.
Remove unnecessary function prototype.

Signed-off-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agosparc: cg14.c make cg14_init and cg15_exit static
Robert Reif [Sun, 27 Apr 2008 22:17:23 +0000 (15:17 -0700)]
sparc: cg14.c make cg14_init and cg15_exit static

Make cg14_init and cg14_exit static.

Signed-off-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agosparc: bw2.c fix bw2_exit
Robert Reif [Sun, 27 Apr 2008 22:16:59 +0000 (15:16 -0700)]
sparc: bw2.c fix bw2_exit

Fix void function bw2_exit returning value.

Signed-off-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agosparc64: Fix accidental syscall restart on child return from clone/fork/vfork.
David S. Miller [Sun, 27 Apr 2008 21:54:02 +0000 (14:54 -0700)]
sparc64: Fix accidental syscall restart on child return from clone/fork/vfork.

This fixes a regression added by
238468b2ac76020c192a7402c92df5097916bf4a ("[SPARC64]: Use trap type
stored in pt_regs to handle syscall restart.")

Because we now encode the "returning from syscall" status in the
pt_regs area, we have to be mindful to zap it out in the child
of a fork.

During a parallel kernel build I saw an accidental -EINTR return
from vfork() in 'make' because of this bug.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agosparc64: Clean up handling of pt_regs trap type encoding.
David S. Miller [Sun, 27 Apr 2008 21:52:51 +0000 (14:52 -0700)]
sparc64: Clean up handling of pt_regs trap type encoding.

If we use this from more than one place, it's better to
have helpers instead of twiddling magic constants all
over.

Add pt_regs_trap_type(), pt_regs_clear_trap_type(), and
pt_regs_is_syscall().

Use them in do_signal().

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agoipv4/ipv6 compat: Fix SSM applications on 64bit kernels.
David L Stevens [Sun, 27 Apr 2008 08:06:07 +0000 (01:06 -0700)]
ipv4/ipv6 compat: Fix SSM applications on 64bit kernels.

Add support on 64-bit kernels for seting 32-bit compatible MCAST*
socket options.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPSEC]: Use digest_null directly for auth
Herbert Xu [Sun, 27 Apr 2008 07:59:59 +0000 (00:59 -0700)]
[IPSEC]: Use digest_null directly for auth

Previously digest_null had no setkey function which meant that
we used hmac(digest_null) for IPsec since IPsec always calls
setkey.  Now that digest_null has a setkey we no longer need to
do that.

In fact when only confidentiality is specified for ESP we already
use digest_null directly.  However, when the null algorithm is
explicitly specified by the user we still opt for hmac(digest_null).

This patch removes this discrepancy.  I have not added a new compat
name for it because by chance it wasn't actualy possible for the user
to specify the name hmac(digest_null) due to a key length check in
xfrm_user (which I found out when testing that compat name :)

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agosunrpc: fix missing kernel-doc
Randy Dunlap [Sun, 27 Apr 2008 05:59:02 +0000 (22:59 -0700)]
sunrpc: fix missing kernel-doc

Fix missing sunrpc kernel-doc:

Warning(linux-2.6.25-git7//net/sunrpc/xprt.c:451): No description found for parameter 'action'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agocan: Fix copy_from_user() results interpretation
Sam Ravnborg [Sun, 27 Apr 2008 05:57:25 +0000 (22:57 -0700)]
can: Fix copy_from_user() results interpretation

Both copy_to_ and _from_user return the number of bytes, that failed to
reach their destination, not the 0/-EXXX values.

Based on patch from Pavel Emelyanov <xemul@openvz.org>

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>