Revert commit 44dd151d "NFS: Don't mark a written page as uptodate until it
is on disk". While it is true that the write may fail, that is always the
case. There is no reason why we should treat data on pages that are not
already marked as PG_uptodate as being special. The only thing we gain is a
noticeable slowdown when re-reading these pages.
Trond Myklebust [Tue, 10 Jun 2008 22:31:00 +0000 (18:31 -0400)]
NFS: Optimise append writes with holes
If a file is being extended, and we're creating a hole, we might as well
declare the entire page to be up to date.
This patch significantly improves the write performance for sparse files
in the case where lseek(SEEK_END) is used to append several non-contiguous
writes at intervals of < PAGE_SIZE.
Trond Myklebust [Tue, 10 Jun 2008 22:30:11 +0000 (18:30 -0400)]
SUNRPC: An ENOMEM error from call_encode is always fatal
The special 'ENOMEM' case that was previously flagged as non-fatal is
bogus: auth_gss always returns EAGAIN for non-fatal errors, and may in fact
return ENOMEM in the special case where xdr_buf_read_netobj runs out of
preallocated buffer space (invariably a _fatal_ error, since there is no
provision for preallocating larger buffers).
Trond Myklebust [Tue, 20 May 2008 23:34:39 +0000 (19:34 -0400)]
NFS: Add correct bounds checking to NFSv2 locks
NFSv2 file locking currently fails the Connectathon tests, because the
calls to the VFS locking code do not return an EINVAL error if the
struct file_lock overflows the 32-bit boundaries.
The problem is due to the fact that we occasionally call helpers from
fs/locks.c in order to avoid RPC calls to the server when we know that a
local process holds the lock. These helpers are, of course, always
64-bit enabled, so EINVAL is not returned in cases when it would if
the call had gone to the NLM code.
For consistency, we therefore add support for a bounds-checking helper.
Trond Myklebust [Thu, 5 Jun 2008 19:17:39 +0000 (15:17 -0400)]
NFS: Fix a preemption count leak in nfs_update_request
The commit 2785259631697ebb0749a3782cca206e2e542939 (nfs: use GFP_NOFS
preloads for radix-tree insertion) appears to have introduced a bug:
We only want to call radix_tree_preload() once after creating a request.
Calling it every time we loop after we created the request, will cause
preemption count leaks.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Nick Piggin <npiggin@suse.de>
Grant Likely [Wed, 9 Jul 2008 15:41:52 +0000 (09:41 -0600)]
powerpc/bootwrapper: Allow user to specify additional default targets
It is inconvenient to add additional default targets to the bootwrapper
Makefile for each new board supported which just needs a different dts
file. This change allows the defconfig to specify additional build
targets.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Switch copy_user_generic_string(), copy_user_generic_unrolled() and
__copy_user_nocache() from custom tail handlers to generic
copy_user_tail_handle().
This patch removes the ordering dependency. There is now only one
subsys_initcall function that contains subsystem initialization code
with a defined order.
arch/x86/kernel/acpi/boot.c: In function ‘dmi_ignore_irq0_timer_override’:
arch/x86/kernel/acpi/boot.c:1443: error: implicit declaration of function ‘force_mask_ioapic_irq_2’
The problems are that, with the ACPI vs timer overring issue _fixed_,
after using the box for some time (between several seconds and 1 hour, at
random) processes get very high CPU loads (once I've got X using 107% of
the CPU, for example) and the system becomes unresponsive, as though there
were interrupts lost or something similar.
Andreas Herrman reproduced similar problems:
> Ok, now I've reproduced the stability problem.
> - Using tip/master,
> - reverting e38502eb8aa82314d5ab0eba45f50e6790dadd88 and
> - applying your patch from this posting
> http://marc.info/?l=linux-kernel&m=121539354224562&w=4
>
> Starting X, firefox, gimp, tuxpaint and doing some drawing in tuxpaint
> results in a slow system. Drawing is almost not possible anymore --
> Selections of new colors, cursors etc. is performed with huge delay
> if it's performed at all.
>
> BTW, the code sets up timer IRQ as Virtual Wire IRQ:
>
> Jul 8 14:57:58 kodscha IO-APIC (apicid-pin) 2-22, 2-23 not connected.
> Jul 8 14:57:58 kodscha ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> Jul 8 14:57:58 kodscha ...trying to set up timer as Virtual Wire IRQ... works.
>
> and both INT0 and INT2 of IOAPIC are masked:
>
> Jul 8 14:57:58 kodscha NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
> Jul 8 14:57:58 kodscha 00 000 1 0 0 0 0 0 0 00
> Jul 8 14:57:58 kodscha 01 003 0 0 0 0 0 1 1 31
> Jul 8 14:57:58 kodscha 02 003 1 0 0 0 0 0 0 30
>
> I've also seen strange CPU utilization -- with syslog-ng:
>
> top - 15:33:06 up 35 min, 4 users, load average: 1.70, 0.68, 0.37
> Tasks: 64 total, 4 running, 60 sleeping, 0 stopped, 0 zombie
> Cpu0 : 0.0%us,100.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu1 : 6.4%us, 87.2%sy, 0.0%ni, 5.8%id, 0.0%wa, 0.6%hi, 0.0%si, 0.0%st
> Mem: 895384k total, 283568k used, 611816k free, 35492k buffers
> Swap: 1959920k total, 0k used, 1959920k free, 163044k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 4632 root 20 0 17216 800 580 S 104 0.1 0:34.22 syslog-ng
> 28505 root 20 0 205m 11m 4024 S 6 1.3 0:21.16 X
> 28518 root 20 0 56292 5652 4492 S 1 0.6 0:01.80 fluxbox
> 1 root 20 0 3724 608 508 S 0 0.1 0:00.36 init
>
> So far I have no clue why C1E-idle in conjunction with virtual wire
> mode causes this strange behaviour.
>
> ... and I start to think about the root cause of all this.
>
> I've performed similar tests under X with the IRQ0/INT0 configuration and
> I did not see above symptoms.
So lets fall back to the IRQ0/INT0 configuration on this box.
This basically restores the dont-use-the-lapic-timer exception mechanism
that was unconditional on this box prior commit 8750bf5 ("x86: add C1E
aware idle function").
One of the last IOMMU updates covered a bug in the AMD IOMMU code. The early
detection code does not succeed if the GART is already detected. This patch
fixes this.
Glauber Costa [Wed, 25 Jun 2008 15:48:47 +0000 (12:48 -0300)]
x86: merge __get_user_asm and its users.
Move __get_user_asm and __get_user_size and __get_user_nocheck
to uaccess.h. This requires us to define a macro at __get_user_size
for the 64-bit access case.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Wed, 25 Jun 2008 14:48:29 +0000 (11:48 -0300)]
x86: merge __put_user_asm and its user.
Move both __put_user_asm and __put_user_size to
uaccess.h. i386 already had a special function for 64-bit access,
so for x86_64, we just define a macro with the same name.
Note that for X86_64, CONFIG_X86_WP_WORKS_OK will always
be defined, so the #else part will never be even compiled in.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Wed, 25 Jun 2008 14:05:11 +0000 (11:05 -0300)]
x86: merge getuser.
Merge versions of getuser from uaccess_32.h and uaccess_64.h into
uaccess.h. There is a part which is 64-bit only (for now), and for
that, we use a __get_user_8 macro.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Fri, 13 Jun 2008 17:39:25 +0000 (14:39 -0300)]
x86: merge common parts of uaccess.
Common parts of uaccess_32.h and uaccess_64.h
are put in uaccess.h. Bits in uaccess_32.h and
uaccess_64.h that come to this file are equal
except for comments and whitespaces differences.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Mon, 30 Jun 2008 20:07:51 +0000 (17:07 -0300)]
x86: change asm constraint.
Our integration efforts broke a build with this function being used
with i386. Reason is "g" can put the operand in an imm32, which according
to The Book (tm), is invalid as the second operand.
This is actually a bug
in x86_64 too, since the x86_64 instruction set reference does not list
it as valid.
We probably didn't trigger this before due to the ammount of
registers available for 64-bit platforms. But that's just my guess.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Wed, 25 Jun 2008 13:14:13 +0000 (10:14 -0300)]
x86: commonize __range_not_ok.
For i386, __range_not_ok is a better name than __range_ok, since
it returns 0 when it is in fact okay. Other than that,
both versions does not need the word size specifiers, and we remove them.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Tue, 24 Jun 2008 19:51:59 +0000 (16:51 -0300)]
x86: change testing logic in putuser_64.S.
Instead of operating over a register we need to put back
into normal state afterwards (the memory position), just
sub from rbx, which is trashed anyway. We can save a few instructions.
Also, this is the i386 way.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Tue, 24 Jun 2008 18:02:31 +0000 (15:02 -0300)]
x86: user put_user_x instead of all variants.
Follow the pattern, and define a single put_user_x, instead
of defining macros for all available sizes. Exception is
put_user_8, since the "A" constraint does not give us enough
power to specify which register (a or d) to use in the 32-bit
common case.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Tue, 24 Jun 2008 14:37:57 +0000 (11:37 -0300)]
x86: introduce __ASM_REG macro.
There are situations in which the architecture wants to use the
register that represents its word-size, whatever it is. For those,
introduce __ASM_REG in asm.h, along with the first users _ASM_AX
and _ASM_DX. They have users waiting for it, namely the getuser
functions.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Glauber Costa [Fri, 13 Jun 2008 19:35:52 +0000 (16:35 -0300)]
x86: don't clobber r8 nor use rcx.
There's really no reason to clobber r8 or pass the address in rcx.
We can safely use only two registers (which we already have to touch anyway)
to do the job.
Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Mason [Tue, 8 Jul 2008 16:04:35 +0000 (02:04 +1000)]
powerpc/eeh: PERR/SERR bit settings during EEH device recovery
The following patch restores the PERR and SERR bits in the PCI
command register during an EEH device recovery. We have found
at least one case (an Agilent test card) where the PERR/SERR
bits are set to 1 by firmware at boot time, but are not restored
to 1 during EEH recovery. The patch fixes the Agilent card
problem. It has been tested on several other EEH-enabled cards
with no regressions.
Signed-off-by: Mike Mason <mmlnx@us.ibm.com> Acked-by: Linas Vepstas <linasvepstas@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Neuling [Tue, 8 Jul 2008 08:43:41 +0000 (18:43 +1000)]
powerpc: fix swapcontext backwards compat. with VSX ucontext changes
When the ucontext changed to add the VSX context, this broke backwards
compatibly on swapcontext. swapcontext only compares the ucontext size
passed in from the user to the new kernel ucontext size.
This adds a check against the old ucontext size (with VMX but without
VSX). It also adds some sanity check for ucontexts without VSX, but
where VSX is used according the MSR. Fixes for both 32 and 64bit
processes on 64bit kernels
Kudos to Paulus for noticing.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Grant Erickson [Mon, 7 Jul 2008 22:03:11 +0000 (08:03 +1000)]
ibm_newemac: Parameterize EMAC Multicast Match Handling
Various instances of the EMAC core have varying: 1) number of address
match slots, 2) width of the registers for handling address match slots,
3) number of registers for handling address match slots and 4) base
offset for those registers.
As the driver stands today, it assumes that all EMACs have 4 IAHT and
GAHT 32-bit registers, starting at offset 0x30 from the register base,
with only 16-bits of each used for a total of 64 match slots.
The 405EX(r) and 460EX now use the EMAC4SYNC core rather than the EMAC4
core. This core has 8 IAHT and GAHT registers, starting at offset 0x80
from the register base, with ALL 32-bits of each used for a total of
256 match slots.
This adds a new compatible device tree entry "emac4sync" and a new,
related feature flag "EMAC_FTR_EMAC4SYNC" along with a series of macros
and inlines which supply the appropriate parameterized value based on
the presence or absence of the EMAC4SYNC feature.
The code has further been reworked where appropriate to use those macros
and inlines.
In addition, the register size passed to ioremap is now taken from the
device tree:
c4 for EMAC4SYNC cores
74 for EMAC4 cores
70 for EMAC cores
rather than sizeof (emac_regs).
Finally, the device trees have been updated with the appropriate compatible
entries and resource sizes.
This has been tested on an AMCC Haleakala board such that: 1) inbound
ICMP requests to 'haleakala.local' via MDNS from both Mac OS X 10.4.11
and Ubuntu 8.04 systems as well as 2) outbound ICMP requests from
'haleakala.local' to those same systems in the '.local' domain via MDNS
now work.
Signed-off-by: Grant Erickson <gerickson@nuovations.com> Acked-by: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Dave Kleikamp [Mon, 7 Jul 2008 14:28:55 +0000 (00:28 +1000)]
powerpc/mm: Don't clear _PAGE_COHERENT when _PAGE_SAO is set
The current low level hash code on LPAR configurations clears
_PAGE_COHERENT (M) when either _PAGE_GUARDED (G) or _PAGE_NO_CACHE (I)
is set. This conflicts with _PAGE_SAO which has M, I and W bits sets at
once (normally invalid combo) to indicate the new SAO attribute.
This changes the code to allow that case.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Dave Kleikamp [Mon, 7 Jul 2008 14:28:54 +0000 (00:28 +1000)]
powerpc/mm: Add Strong Access Ordering support
Allow an application to enable Strong Access Ordering on specific pages of
memory on Power 7 hardware. Currently, power has a weaker memory model than
x86. Implementing a stronger memory model allows an emulator to more
efficiently translate x86 code into power code, resulting in faster code
execution.
On Power 7 hardware, storing 0b1110 in the WIMG bits of the hpte enables
strong access ordering mode for the memory page. This patchset allows a
user to specify which pages are thus enabled by passing a new protection
bit through mmap() and mprotect(). I have defined PROT_SAO to be 0x10.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Dave Kleikamp [Mon, 7 Jul 2008 14:28:53 +0000 (00:28 +1000)]
powerpc/mm: Add SAO Feature bit to the cputable
Add the CPU feature bit for the new Strong Access Ordering
facility of Power7
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Joel Schopp <jschopp@austin.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Dave Kleikamp [Mon, 7 Jul 2008 14:28:52 +0000 (00:28 +1000)]
powerpc/mm: Define flags for Strong Access Ordering
This patch defines:
- PROT_SAO, which is passed into mmap() and mprotect() in the prot field
- VM_SAO in vma->vm_flags, and
- _PAGE_SAO, the combination of WIMG bits in the pte that enables strong
access ordering for the page.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Dave Kleikamp [Mon, 7 Jul 2008 14:28:51 +0000 (00:28 +1000)]
mm: Allow architectures to define additional protection bits
This patch allows architectures to define functions to deal with
additional protections bits for mmap() and mprotect().
arch_calc_vm_prot_bits() maps additonal protection bits to vm_flags
arch_vm_get_page_prot() maps additional vm_flags to the vma's vm_page_prot
arch_validate_prot() checks for valid values of the protection bits
Note: vm_get_page_prot() is now pretty ugly, but the generated code
should be identical for architectures that don't define additional
protection bits.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Mark Nelson [Fri, 4 Jul 2008 19:05:45 +0000 (05:05 +1000)]
powerpc: move device_to_mask() to dma-mapping.h
Move device_to_mask() to dma-mapping.h because we need to use it from
outside dma_64.c in a later patch.
Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Mark Nelson [Fri, 4 Jul 2008 19:05:44 +0000 (05:05 +1000)]
powerpc/cell: cell_dma_dev_setup_iommu() return the iommu table
Make cell_dma_dev_setup_iommu() return a pointer to the struct iommu_table
(or NULL if no table can be found) rather than putting this pointer into
dev->archdata.dma_data (let the caller do that), and rename this function
to cell_get_iommu_table() to reflect this change.
This will allow us to get the iommu table for a device that doesn't have
the table in the archdata.
Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Mark Nelson [Fri, 4 Jul 2008 19:05:42 +0000 (05:05 +1000)]
powerpc/dma: implement new dma_*map*_attrs() interfaces
Update powerpc to use the new dma_*map*_attrs() interfaces. In doing so
update struct dma_mapping_ops to accept a struct dma_attrs and propagate
these changes through to all users of the code (generic IOMMU and the
64bit DMA code, and the iseries and ps3 platform code).
The old dma_*map_*() interfaces are reimplemented as calls to the
corresponding new interfaces.
Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Geoff Levand <geoffrey.levand@am.sony.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Mark Nelson [Fri, 4 Jul 2008 19:05:41 +0000 (05:05 +1000)]
powerpc/dma: Add struct iommu_table argument to iommu_map_sg()
Make iommu_map_sg take a struct iommu_table. It did so before commit 740c3ce66700640a6e6136ff679b067e92125794 (iommu sg merging: ppc: make
iommu respect the segment size limits).
This stops the function looking in the archdata.dma_data for the iommu
table because in the future it will be called with a device that has
no table there.
This also has the nice side effect of making iommu_map_sg() match the
other map functions.
Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>