Patrick Ohly [Thu, 12 Feb 2009 05:03:35 +0000 (05:03 +0000)]
timecompare: generic infrastructure to map between two time bases
Mapping from a struct timecounter to a time returned by functions like
ktime_get_real() is implemented. This is sufficient to use this code
in a network device driver which wants to support hardware time
stamping and transformation of hardware time stamps to system time.
The interface could have been made more versatile by not depending on
a time counter, but this wasn't done to avoid writing glue code
elsewhere.
The method implemented here is the one used and analyzed under the name
"assisted PTP" in the LCI PTP paper:
http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf
Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick Ohly [Thu, 12 Feb 2009 05:03:34 +0000 (05:03 +0000)]
clocksource: allow usage independent of timekeeping.c
So far struct clocksource acted as the interface between time/timekeeping.c
and hardware. This patch generalizes the concept so that a similar
interface can also be used in other contexts. For that it introduces
new structures and related functions *without* touching the existing
struct clocksource.
The reasons for adding these new structures to clocksource.[ch] are
* the APIs are clearly related
* struct clocksource could be cleaned up to use the new structs
* avoids proliferation of files with similar names (timesource.h?
timecounter.h?)
As outlined in the discussion with John Stultz, this patch adds
* struct cyclecounter: stateless API to hardware which counts clock cycles
* struct timecounter: stateful utility code built on a cyclecounter which
provides a nanosecond counter
* only the function to read the nanosecond counter; deltas are used internally
and not exposed to users of timecounter
The code does no locking of the shared state. It must be called at least
as often as the cycle counter wraps around to detect these wrap arounds.
Both is the responsibility of the timecounter user.
Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Mon, 16 Feb 2009 01:02:19 +0000 (20:02 -0500)]
ext4: Fix NULL dereference in ext4_ext_migrate()'s error handling
This was found through a code checker (http://repo.or.cz/w/smatch.git/).
It looks like you might be able to trigger the error by trying to migrate
a readonly file system.
Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Yinghai Lu [Sun, 15 Feb 2009 22:06:13 +0000 (14:06 -0800)]
[IA64] fix __apci_unmap_table
Impact: fix build error
to fix:
tip/arch/ia64/kernel/acpi.c:203: error: conflicting types for '__acpi_unmap_table'
tip/include/linux/acpi.h:82: error: previous declaration of '__acpi_unmap_table' was here
tip/arch/ia64/kernel/acpi.c:203: error: conflicting types for '__acpi_unmap_table'
tip/include/linux/acpi.h:82: error: previous declaration of '__acpi_unmap_table' was here
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Duane Griffin [Sun, 15 Feb 2009 23:09:20 +0000 (18:09 -0500)]
ext4: tighten restrictions on inode flags
At the moment there are few restrictions on which flags may be set on
which inodes. Specifically DIRSYNC may only be set on directories and
IMMUTABLE and APPEND may not be set on links. Tighten that to disallow
TOPDIR being set on non-directories and only NODUMP and NOATIME to be set
on non-regular file, non-directories.
Introduces a flags masking function which masks flags based on mode and
use it during inode creation and when flags are set via the ioctl to
facilitate future consistency.
Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Andreas Dilger <adilger@sun.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Duane Griffin [Sun, 15 Feb 2009 23:57:26 +0000 (18:57 -0500)]
ext4: don't inherit inappropriate inode flags from parent
At present INDEX and EXTENTS are the only flags that new ext4 inodes do
NOT inherit from their parent. In addition prevent the flags DIRTY,
ECOMPR, IMAGIC, TOPDIR, HUGE_FILE and EXT_MIGRATE from being inherited.
List inheritable flags explicitly to prevent future flags from
accidentally being inherited.
This fixes the TOPDIR flag inheritance bug reported at
http://bugzilla.kernel.org/show_bug.cgi?id=9866.
Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Andreas Dilger <adilger@sun.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Pekka Enberg [Sun, 15 Feb 2009 23:07:52 +0000 (18:07 -0500)]
ext4: allocate ->s_blockgroup_lock separately
As spotted by kmemtrace, struct ext4_sb_info is 17664 bytes on 64-bit
which makes it a very bad fit for SLAB allocators. The culprit of the
wasted memory is ->s_blockgroup_lock which can be as big as 16 KB when
NR_CPUS >= 32.
To fix that, allocate ->s_blockgroup_lock, which fits nicely in a order 2
page in the worst case, separately. This shinks down struct ext4_sb_info
enough to fit a 2 KB slab cache so now we allocate 16 KB + 2 KB instead of
32 KB saving 14 KB of memory.
Acked-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
mm/vmscan.c: In function ‘kswapd’:
mm/vmscan.c:1969: warning: ISO C90 forbids mixed declarations and code
node_to_cpumask_ptr(cpumask, pgdat->node_id), has a side-effect: it
defines the 'cpumask' local variable as well, so it has to go into
the variable definition section.
Sidenote: it might make sense to make this purpose of these macros
more apparent, by naming them the standard way, such as:
Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Mike Travis <travis@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rakib Mullick [Sat, 14 Feb 2009 03:36:00 +0000 (09:36 +0600)]
tracing: fix section mismatch in trace_hw_branches.c
The function bts_trace_init() references a variable
bts_hotcpu_notifier which is marked
as __cpuinitdata. Thus causes section mismatch. This patch fixes it.
LD kernel/trace/built-in.o
WARNING: kernel/trace/built-in.o(.text+0xc90c): Section mismatch in
reference from the function bts_trace_init() to the variable
.cpuinit.data:bts_hotcpu_notifier
The function bts_trace_init() references
the variable __cpuinitdata bts_hotcpu_notifier.
This is often because bts_trace_init lacks a __cpuinitdata
annotation or the annotation of bts_hotcpu_notifier is wrong.
WARNING: kernel/trace/built-in.o(.text+0xc92a): Section mismatch in
reference from the function bts_trace_reset() to the variable
.cpuinit.data:bts_hotcpu_notifier
The function bts_trace_reset() references
the variable __cpuinitdata bts_hotcpu_notifier.
This is often because bts_trace_reset lacks a __cpuinitdata
annotation or the annotation of bts_hotcpu_notifier is wrong.
Pekka Paalanen [Sat, 3 Jan 2009 19:09:27 +0000 (21:09 +0200)]
doc: mmiotrace.txt, buffer size control change
Impact: prevents confusing the user when buffer size is inadequate
The tracing framework offers a resizeable buffer, which mmiotrace uses
to record events. If the buffer is full, the following events will be
lost. Events should not be lost, so the documentation instructs the user
to increase the buffer size. The buffer size is set via a debugfs file.
Mmiotrace documentation was not updated the same time the debugfs file
was changed. The old file was tracing/trace_entries and first contained
the number of entries the buffer had space for, per cpu. Nowadays this
file is replaced with the file tracing/buffer_size_kb, which tells the
amount of memory reserved for the buffer, per cpu, in kilobytes.
Previously, a flag had to be toggled via the debugfs file
tracing/tracing_enabled when the buffer size was changed. This is no
longer necessary.
The mmiotrace documentation is updated to reflect the current state of
the tracing framework.
Signed-off-by: Pekka Paalanen <pq@iki.fi> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pekka Paalanen [Sat, 3 Jan 2009 19:23:51 +0000 (21:23 +0200)]
trace: mmiotrace to the tracer menu in Kconfig
Impact: cosmetic change in Kconfig menu layout
This patch was originally suggested by Peter Zijlstra, but seems it
was forgotten.
CONFIG_MMIOTRACE and CONFIG_MMIOTRACE_TEST were selectable
directly under the Kernel hacking / debugging menu in the kernel
configuration system. They were present only for x86 and x86_64.
Other tracers that use the ftrace tracing framework are in their own
sub-menu. This patch moves the mmiotrace configuration options there.
Since the Kconfig file, where the tracer menu is, is not architecture
specific, HAVE_MMIOTRACE_SUPPORT is introduced and provided only by
x86/x86_64. CONFIG_MMIOTRACE now depends on it.
Signed-off-by: Pekka Paalanen <pq@iki.fi> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pekka Paalanen [Tue, 6 Jan 2009 11:57:11 +0000 (13:57 +0200)]
mmiotrace: count events lost due to not recording
Impact: enhances lost events counting in mmiotrace
The tracing framework, or the ring buffer facility it uses, has a switch
to stop recording data. When recording is off, the trace events will be
lost. The framework does not count these, so mmiotrace has to count them
itself.
Signed-off-by: Pekka Paalanen <pq@iki.fi> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Arjan van de Ven [Sun, 15 Feb 2009 10:30:55 +0000 (11:30 +0100)]
scripts: add x86 64 bit support to the markup_oops.pl script
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Arjan van de Ven [Sun, 15 Feb 2009 10:30:52 +0000 (11:30 +0100)]
scripts: add x86 register parser to markup_oops.pl
An oops dump also contains the register values.
This patch parses these for (32 bit) x86, and then annotates the
disassembly with these values; this helps in analysis of the oops by the
developer, for example, NULL pointer or other pointer bugs show up clearly
this way.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Rabin Vincent [Sun, 25 Jan 2009 13:09:12 +0000 (18:39 +0530)]
kbuild: add sys_* entries for syscalls in tags
Currently, it is no longer possible to use the tags file to jump to
system call function definitions with sys_foo, because the definitions
are obscured by use of the SYSCALL_DEFINE* macros.
This patch adds the appropriate option to ctags to make it see through
the macro. Also, it adds the ENTRY() work already done for Exuberant
to Emacs too.
Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Michael Neuling [Sun, 15 Feb 2009 09:20:30 +0000 (10:20 +0100)]
bootgraph: fix for use with dot symbols
powerpc has dot symbols, so the dmesg output looks like:
<4>[ 0.327310] calling .migration_init+0x0/0x9c @ 1
<4>[ 0.327595] initcall .migration_init+0x0/0x9c returned 1 after 0 usecs
The below fixes bootgraph.pl so it handles this correctly.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Josh Hunt [Thu, 12 Feb 2009 05:10:57 +0000 (21:10 -0800)]
kbuild: add vmlinux to kernel rpm
We are building an automated system to test kernels weekly and need to
provide an rpm to our QA dept. We would like to use the ability to create
kernel rpms already in the kernel's Makefile, but need the vmlinux file
included in the rpm for later debugging.
This patch adds a compressed vmlinux to the kernel rpm when doing a
make rpm-pkg or binrpm-pkg and upon install places the vmlinux file in /boot.
Signed-off-by: Josh Hunt <josh@scalex86.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Thomas Gleixner [Tue, 13 Jan 2009 22:36:34 +0000 (23:36 +0100)]
x86, vm86: fix preemption bug
Commit 3d2a71a596bd9c761c8487a2178e95f8a61da083 ("x86, traps: converge
do_debug handlers") changed the preemption disable logic of do_debug()
so vm86_handle_trap() is called with preemption disabled resulting in:
BUG: sleeping function called from invalid context at include/linux/kernel.h:155
in_atomic(): 1, irqs_disabled(): 0, pid: 3005, name: dosemu.bin
Pid: 3005, comm: dosemu.bin Tainted: G W 2.6.29-rc1 #51
Call Trace:
[<c050d669>] copy_to_user+0x33/0x108
[<c04181f4>] save_v86_state+0x65/0x149
[<c0418531>] handle_vm86_trap+0x20/0x8f
[<c064e345>] do_debug+0x15b/0x1a4
[<c064df1f>] debug_stack_correct+0x27/0x2c
[<c040365b>] sysenter_do_call+0x12/0x2f
BUG: scheduling while atomic: dosemu.bin/3005/0x10000001
Restore the original calling convention and reenable preemption before
calling handle_vm86_trap().
Reported-by: Michal Suchanek <hramrach@centrum.cz> Cc: stable@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
dave graham [Sun, 15 Feb 2009 07:46:10 +0000 (23:46 -0800)]
e1000e: Remove mutex_trylock and associated WARN on failure.
Single-thread access must be ensured for ICH8 NVM and PHY operations.
This synchronization is provided by the nvm_mutex. To assist in
understanding the contexts from which this code could be reached,
a WARN was output if the mutex was not going to be immediately
acquirable (if !mutex_trylock()). The code has now been optimized,
and we have verified that the few remaining mutex contentions are
reasonable and non-blocking, and it is time to remove the
mutex_trylock() and WARN messages.
Signed-off-by: dave graham <david.graham@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Harvey Harrison [Sun, 15 Feb 2009 06:56:56 +0000 (22:56 -0800)]
rndis: remove private wrapper of __constant_cpu_to_le32
Use cpu_to_le32 directly as it handles constant folding now, replace direct
uses of __constant_cpu_to_{endian} as well.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Acked-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Sun, 15 Feb 2009 04:01:36 +0000 (23:01 -0500)]
ext4: New rec_len encoding for very large blocksizes
The rec_len field in the directory entry is 16 bits, so to encode
blocksizes larger than 64k becomes problematic. This patch allows us
to supprot block sizes up to 256k, by using the low 2 bits to extend
the range of rec_len to 2**18-1 (since valid rec_len sizes must be a
multiple of 4). We use the convention that a rec_len of 0 or 65535
means the filesystem block size, for compatibility with older kernels.
It's unlikely we'll see VM pages of up to 256k, but at some point we
might find that the Linux VM has been enhanced to support filesystem
block sizes > than the VM page size, at which point it might be useful
for some applications to allow very large filesystem block sizes.
kvm->slots_lock is outer to kvm->lock, so take slots_lock
in kvm_vm_ioctl_assign_device() before taking kvm->lock,
rather than taking it in kvm_iommu_map_memslots().
Cc: stable@kernel.org Signed-off-by: Mark McLoughlin <markmc@redhat.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Wed, 21 Jan 2009 08:52:16 +0000 (16:52 +0800)]
KVM: MMU: Map device MMIO as UC in EPT
Software are not allow to access device MMIO using cacheable memory type, the
patch limit MMIO region with UC and WC(guest can select WC using PAT and
PCD/PWT).
Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Tue, 6 Jan 2009 02:03:03 +0000 (10:03 +0800)]
KVM: Fix racy in kvm_free_assigned_irq
In the past, kvm_get_kvm() and kvm_put_kvm() was called in assigned device irq
handler and interrupt_work, in order to prevent cancel_work_sync() in
kvm_free_assigned_irq got a illegal state when waiting for interrupt_work done.
But it's tricky and still got two problems:
1. A bug ignored two conditions that cancel_work_sync() would return true result
in a additional kvm_put_kvm().
2. If interrupt type is MSI, we would got a window between cancel_work_sync()
and free_irq(), which interrupt would be injected again...
This patch discard the reference count used for irq handler and interrupt_work,
and ensure the legal state by moving the free function at the very beginning of
kvm_destroy_vm(). And the patch fix the second bug by disable irq before
cancel_work_sync(), which may result in nested disable of irq but OK for we are
going to free it.
Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Tue, 6 Jan 2009 02:03:02 +0000 (10:03 +0800)]
KVM: Add kvm_arch_sync_events to sync with asynchronize events
kvm_arch_sync_events is introduced to quiet down all other events may happen
contemporary with VM destroy process, like IRQ handler and work struct for
assigned device.
For kvm_arch_sync_events is called at the very beginning of kvm_destroy_vm(), so
the state of KVM here is legal and can provide a environment to quiet down other
events.
Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Marcelo Tosatti [Wed, 10 Dec 2008 20:23:26 +0000 (21:23 +0100)]
KVM: mmu_notifiers release method
The destructor for huge pages uses the backing inode for adjusting
hugetlbfs accounting.
Hugepage mappings are destroyed by exit_mmap, after
mmu_notifier_release, so there are no notifications through
unmap_hugepage_range at this point.
The hugetlbfs inode can be freed with pages backed by it referenced
by the shadow. When the shadow releases its reference, the huge page
destructor will access a now freed inode.
Implement the release operation for kvm mmu notifiers to release page
refs before the hugetlbfs inode is gone.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 19 Jan 2009 12:57:52 +0000 (14:57 +0200)]
KVM: Avoid using CONFIG_ in userspace visible headers
Kconfig symbols are not available in userspace, and are not stripped by
headers-install. Avoid their use by adding #defines in <asm/kvm.h> to
suit each architecture.
Yang Zhang [Thu, 8 Jan 2009 07:13:31 +0000 (15:13 +0800)]
KVM: ia64: fix fp fault/trap handler
The floating-point registers f6-f11 is used by vmm and
saved in kvm-pt-regs, so should set the correct bit mask
and the pointer in fp_state, otherwise, fpswa may touch
vmm's fp registers instead of guests'.
In addition, for fp trap handling, since the instruction
which leads to fp trap is completely executed, so can't
use retry machanism to re-execute it, because it may
pollute some registers.
Signed-off-by: Yang Zhang <yang.zhang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Nick Piggin [Wed, 21 Jan 2009 07:12:39 +0000 (08:12 +0100)]
lockdep: annotate reclaim context (__GFP_NOFS)
Here is another version, with the incremental patch rolled up, and
added reclaim context annotation to kswapd, and allocation tracing
to slab allocators (which may only ever reach the page allocator
in rare cases, so it is good to put annotations here too).
Haven't tested this version as such, but it should be getting closer
to merge worthy ;)
--
After noticing some code in mm/filemap.c accidentally perform a __GFP_FS
allocation when it should not have been, I thought it might be a good idea to
try to catch this kind of thing with lockdep.
I coded up a little idea that seems to work. Unfortunately the system has to
actually be in __GFP_FS page reclaim, then take the lock, before it will mark
it. But at least that might still be some orders of magnitude more common
(and more debuggable) than an actual deadlock condition, so we have some
improvement I hope (the concept is no less complete than discovery of a lock's
interrupt contexts).
I guess we could even do the same thing with __GFP_IO (normal reclaim), and
even GFP_NOIO locks too... but filesystems will have the most locks and fiddly
code paths, so let's start there and see how it goes.
It *seems* to work. I did a quick test.
=================================
[ INFO: inconsistent lock state ] 2.6.28-rc6-00007-ged31348-dirty #26
---------------------------------
inconsistent {in-reclaim-W} -> {ov-reclaim-W} usage.
modprobe/8526 [HC0[0]:SC0[0]:HE1:SE1] takes:
(testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
{in-reclaim-W} state was registered at:
[<ffffffff80267bdb>] __lock_acquire+0x75b/0x1a60
[<ffffffff80268f71>] lock_acquire+0x91/0xc0
[<ffffffff8070f0e1>] mutex_lock_nested+0xb1/0x310
[<ffffffffa002002b>] brd_init+0x2b/0x216 [brd]
[<ffffffff8020903b>] _stext+0x3b/0x170
[<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
[<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
irq event stamp: 3929
hardirqs last enabled at (3929): [<ffffffff8070f2b5>] mutex_lock_nested+0x285/0x310
hardirqs last disabled at (3928): [<ffffffff8070f089>] mutex_lock_nested+0x59/0x310
softirqs last enabled at (3732): [<ffffffff8061f623>] sk_filter+0x83/0xe0
softirqs last disabled at (3730): [<ffffffff8061f5b6>] sk_filter+0x16/0xe0
other info that might help us debug this:
1 lock held by modprobe/8526:
#0: (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
Johannes Berg [Thu, 29 Jan 2009 15:03:20 +0000 (16:03 +0100)]
timer: implement lockdep deadlock detection
This modifies the timer code in a way to allow lockdep to detect
deadlocks resulting from a lock being taken in the timer function
as well as around the del_timer_sync() call.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Chris Ball [Sat, 14 Feb 2009 01:56:18 +0000 (20:56 -0500)]
x86, olpc: fix model detection without OFW
Impact: fix "garbled display, laptop is unusable" bug
Commit e51a1ac2dfca9ad869471e88f828281db7e810c0 ("x86, olpc: fix endian
bug in openfirmware workaround") breaks model comparison on OLPC; the value
0xc2 needs to be scaled up by olpc_board().
The pre-patch version was wrong, but accidentally worked anyway
(big-endian 0xc2 is big enough to satisfy all other board revisions,
but little endian 0xc2 is not).
Signed-off-by: Chris Ball <cjb@laptop.org> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Andres Salomon <dilinger@queued.net> Cc: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
David Woodhouse [Fri, 13 Feb 2009 23:18:03 +0000 (23:18 +0000)]
iommu: fix Intel IOMMU write-buffer flushing
This is the cause of the DMA faults and disk corruption that people have
been seeing. Some chipsets neglect to report the RWBF "capability" --
the flag which says that we need to flush the chipset write-buffer when
changing the DMA page tables, to ensure that the change is visible to
the IOMMU.
Override that bit on the affected chipsets, and everything is happy
again.
Thanks to Chris and Bhavesh and others for helping to debug.
4xx chips commonly now have multiple PHBs, there is no reason to not
enable PCI domains on them. The main issue with PCI domains is X but
currently its already somewhat busted for other reasons such as the
36-bit physical address space, which I'm fixing separately.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
powerpc/4xx: Add missing USB and i2c devices to Canyonlands
This adds the device-tree entries for a handful of devices on the
Canyonlands board, such as the EHCI and OHCI controllers, the real
time clock and the AD7414 thermal monitor.
I also updated the defconfig to enable various options related to
these devices.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Yuri Tikhonov [Thu, 29 Jan 2009 01:40:44 +0000 (01:40 +0000)]
powerpc/44x: Support for 256KB PAGE_SIZE
This patch adds support for 256KB pages on ppc44x-based boards.
For simplification of implementation with 256KB pages we still assume
2-level paging. As a side effect this leads to wasting extra memory space
reserved for PTE tables: only 1/4 of pages allocated for PTEs are
actually used. But this may be an acceptable trade-off to achieve the
high performance we have with big PAGE_SIZEs in some applications (e.g.
RAID).
Also with 256KB PAGE_SIZE we increase THREAD_SIZE up to 32KB to minimize
the risk of stack overflows in the cases of on-stack arrays, which size
depends on the page size (e.g. multipage BIOs, NTFS, etc.).
With 256KB PAGE_SIZE we need to decrease the PKMAP_ORDER at least down
to 9, otherwise all high memory (2 ^ 10 * PAGE_SIZE == 256MB) we'll be
occupied by PKMAP addresses leaving no place for vmalloc. We do not
separate PKMAP_ORDER for 256K from 16K/64K PAGE_SIZE here; actually that
value of 10 in support for 16K/64K had been selected rather intuitively.
Thus now for all cases of PAGE_SIZE on ppc44x (including the default, 4KB,
one) we have 512 pages for PKMAP.
Because ELF standard supports only page sizes up to 64K, then you should
use binutils later than 2.17.50.0.3 with '-zmax-page-size' set to 256K
for building applications, which are to be run with the 256KB-page sized
kernel. If using the older binutils, then you should patch them like follows:
One more restriction we currently have with 256KB page sizes is inability
to use shmem safely, so, for now, the 256KB is available only if you turn
the CONFIG_SHMEM option off (another variant is to use BROKEN).
Though, if you need shmem with 256KB pages, you can always remove the !SHMEM
dependency in 'config PPC_256K_PAGES', and use the workaround available here:
http://lkml.org/lkml/2008/12/19/20
Andrew Victor [Wed, 11 Feb 2009 20:39:05 +0000 (21:39 +0100)]
[ARM] 5391/1: AT91: Enable GPIO clocks earlier
Enable the GPIO clocks earlier in the initialization sequence. This
allow the board-setup code to read and set GPIO pins.
Signed-off-by: Marc Pignat <marc.pignat@hevs.ch> Signed-off-by: Andrew Victor <linux@maxim.org.za> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Andrew Victor [Wed, 11 Feb 2009 20:23:10 +0000 (21:23 +0100)]
[ARM] 5390/1: AT91: Watchdog fixes
The recently merged AT91SAM9 watchdog driver uses the
AT91SAM9X_WATCHDOG config variable, whereas the original version of
the driver (and the platform support code) used AT91SAM9_WATCHDOG.
This causes the watchdog platform_device to never be registered, and
therefore the driver not to be initialized.
This patch:
- updates the platform support code to use AT91SAM9X_WATCHDOG.
- includes <linux/io.h> to fix compile error (same fix as was applied
to at91rm9200_wdt.c)
- fixes comment regarding watchdog clock-rates in at91rm9200.
Signed-off-by: Andrew Victor <linux@maxim.org.za> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
ext4: Implement range_cyclic in ext4_da_writepages instead of write_cache_pages
With delayed allocation we lock the page in write_cache_pages() and
try to build an in memory extent of contiguous blocks. This is needed
so that we can get large contiguous blocks request. If range_cyclic
mode is enabled, write_cache_pages() will loop back to the 0 index if
no I/O has been done yet, and try to start writing from the beginning
of the range. That causes an attempt to take the page lock of lower
index page while holding the page lock of higher index page, which can
cause a dead lock with another writeback thread.
The solution is to implement the range_cyclic behavior in
ext4_da_writepages() instead.
When creating a new ext4_prealloc_space structure, we have to
initialize its list_head pointers before we add them to any prealloc
lists. Otherwise, with list debug enabled, we will get list
corruption warnings.
Russell King [Sat, 14 Feb 2009 13:25:38 +0000 (13:25 +0000)]
[ARM] omap: fix _omap2_clksel_get_src_field()
_omap2_clksel_get_src_field() was returning the first entry which was
either the default _or_ applicable to the SoC. This is wrong - we
should be returning the first default which is applicable to the SoC.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Sat, 14 Feb 2009 13:24:10 +0000 (13:24 +0000)]
[ARM] omap: fix omap2_divisor_to_clksel() error return value
The error checks for omap2_divisor_to_clksel() and comment disagree with
the actual value returned on error. Fix this to return the correct error
value.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Thu, 12 Feb 2009 10:12:59 +0000 (10:12 +0000)]
[ARM] omap: arrange for clock recalc methods to return the rate
linux-omap source commit 33d000c99ee393fe2042f93e8422f94976d276ce
introduces a way to "dry run" clock changes before they're committed.
However, this involves putting logic to handle this into each and
every recalc function, and unfortunately due to the caching, led to
some bugs.
Solve both of issues by making the recalc methods always return the
clock rate for the clock, which the caller decides what to do with.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Andres Salomon [Wed, 11 Feb 2009 21:27:02 +0000 (13:27 -0800)]
[JFFS2] force the jffs2 GC daemon to behave a bit better
I've noticed some pretty poor behavior on OLPC machines after bootup, when
gdm/X are starting. The GCD monopolizes the scheduler (which in turns
means it gets to do more nand i/o), which results in processes taking much
much longer than they should to start.
As an example, on an OLPC machine going from OFW to a usable X (via
auto-login gdm) takes 2m 30s. The majority of this time is consumed by
the switch into graphical mode. With this patch, we cut a full 60s off of
bootup time. After bootup, things are much snappier as well.
Note that we have seen a CRC node error with this patch that causes the machine
to fail to boot, but we've also seen that problem without this patch.
Signed-off-by: Andres Salomon <dilinger@debian.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Wim Van Sebroeck [Wed, 28 Jan 2009 20:51:04 +0000 (20:51 +0000)]
[WATCHDOG] iTCO_wdt: fix SMI_EN regression 2
bugzilla: #12363
commit 7cd5b08be3c489df11b559fef210b81133764ad4 added a second regression:
some Dell's and Compaq's lockup on boot. So we revert most of the code.
The ICH9 reboot issue remains in place and will need some more fixing... :-(
I have a SuperMicro C2SBX motherboard with BIOS revision 1.0b. With vt-d
enabled in the BIOS, Linux gets into an endless loop printing
"DMAR:Unknown DMAR structure type" when booting. Here is the DMAR ACPI
table:
DMAR:Host address width 36
DMAR:RMRR base: 0x000000007fe8a000 end: 0x000000007fefffff
DMAR:Unknown DMAR structure type
DMAR:Unknown DMAR structure type
DMAR:Unknown DMAR structure type
...
Although I not very familiar with ACPI, to me it looks like struct
acpi_dmar_header::length == 0x0058 is incorrect, causing
parse_dmar_table() to look at an invalid offset on the next loop. This
offset happens to have struct acpi_dmar_header::length == 0x0000, which
prevents the loop from ever terminating. This patch checks for this
condition and bails out instead of looping forever.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Correct a build error. bfin-async uses complex mappings and so needs it.
Signed-off-by: Mike Frysinger <vapier.adi@gmail.com> Signed-off-by: Bryan Wu <cooloney@kernel.org> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Atsushi Nemoto [Wed, 11 Feb 2009 21:12:17 +0000 (13:12 -0800)]
[MTD] [MAPS] physmap: fix wrong free and del_mtd_{partition,device}
commit 176bf2e0f10ecf1d20a97db3bd5bb2e6ba0b5668 ("physmap: fix leak of
memory returned by parse_mtd_partitions") deals with a memory leak and
frees the pointer array of mtd_partition after the call to
add_mtd_partitions(). the problem is that mtd_table[x]->name still points
to the freed memory.
Aldo physmap_flash_remove() should call del_mtd_partitions() or
del_mtd_device() only once.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Reported-by: Matthias Kaehlcke <matthias@kaehlcke.net> Tested-by: Matthias Kaehlcke <matthias@kaehlcke.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Eric Paris [Thu, 12 Feb 2009 19:51:04 +0000 (14:51 -0500)]
SELinux: convert the avc cache hash list to an hlist
We do not need O(1) access to the tail of the avc cache lists and so we are
wasting lots of space using struct list_head instead of struct hlist_head.
This patch converts the avc cache to use hlists in which there is a single
pointer from the head which saves us about 4k of global memory.
Resulted in about a 1.5% decrease in time spent in avc_has_perm_noaudit based
on oprofile sampling of tbench. Although likely within the noise....
Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>