i386, dumpstack: use x86_64's method to account die_nest_count
oops_begin/oops_end should always be used in pairs. On x86_64
oops_begin increments die_nest_count, and oops_end decrements
die_nest_count. Doing this makes oops_begin and oops_end equal
to the x86_64 versions.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86_64, dumpstack: move kexec_crash from __die to oops_end
oops_end is preceded by either a call to __die, or a conditional
call to crash_kexec. Move the conditional call to crash_kexec
from the end of __die to the start of oops_end and remove
the superfluous call to crash_kexec in die_nmi.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Change oops_end such that signr=0 signals that do_exit
is not to be called.
Currently, each use of __die is soon followed by a call
to oops_end and 'regs' is set to NULL if oops_end is expected
not to call do_exit. Change all such pairs to set signr=0
instead. On x86_64 oops_end is used 'bare' in die_nmi; use
signr=0 instead of regs=NULL there, too.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Neil Horman [Fri, 17 Oct 2008 21:00:13 +0000 (17:00 -0400)]
x86, kexec: fix hang on i386 when panic occurs while console_sem is held
There's a corner case in 32 bit x86 kdump at the moment. When the box
panics via nmi, we call bust_spinlocks(1) to disable sensitivity to the
console_sem (allowing us to print to the console in all cases), but we don't
call crash_kexec, until after we call bust_spinlocks(0), which re-enables
console_sem sensitivity.
The result is that, if we get an nmi while the console_sem is held and
kdump is configured, and we try to print something to the console during
kdump shutdown (which we often do) we deadlock the box. The fix is to
simply do what 64 bit die_nmi does which is to not call bust_spinlocks(0)
until after we call crash_kexec.
Patch below tested successfully by me.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andi Kleen [Tue, 21 Oct 2008 20:45:22 +0000 (22:45 +0200)]
MCE: Don't run 32bit machine checks with interrupts on
Running machine checks with interrupt on is a extremly bad idea. The machine
check handler only runs when the system is broken and needs to finish
as quickly as possible.
Remove the respective bogus post 2.6.27 regression and call
the machine check vector directly again.
This removes only code.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
[Cherry-picked from x86/mce] Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ron Mercer [Mon, 20 Oct 2008 17:30:26 +0000 (10:30 -0700)]
qlge: Fix MSI/legacy single interrupt bug.
The chip can issue spurious interrupts for single interrupt
modes. We use disable to clear the condition and allow processing to
continue. Also got rid of legacy specific code since it now needs to
be done on MSI single irq also.
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Catalin Marinas [Mon, 20 Oct 2008 17:15:37 +0000 (18:15 +0100)]
smc911x: Make the driver safer on SMP
This patch extends the critical regions covered by lp->lock to make it
safer on SMP. The main failure point was the smc911x_hard_start_xmit
function being called from different CPUs. It was tested on the ARM SMP
platforms.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Catalin Marinas [Mon, 20 Oct 2008 17:15:30 +0000 (18:15 +0100)]
smc911x: Add IRQ polarity configuration
Platforms like ARM Ltd's RealView require the IRQ polarity bit to be set
for the SMC9118 chip. This patch allows the dynamic configuration via
the smc911x_platdata structure.
This patch also changes the smc91x_platdata structure name to the
correct smc911x_platdata in the smc911x_drv_probe() function.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The 8139 drivers are a source of error messages that confuse users.
Since this device can not be disambiguated by normal PCI device
id's two drivers match the same info. But the module utilities
seem to correctly handle this overlap, they try one driver, then
if that doesn't load try the other. Therefore there is no need for
a message to be logged with error level severity, just using info
level instead. Can't be completely silent because user might have
configure one driver and forgot the other one.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Jeff Kirsher [Fri, 17 Oct 2008 18:08:31 +0000 (11:08 -0700)]
igb: add IGB_DCA instead of selecting INTEL_IOATDMA
Add a bool IGB_DCA defined to y if IGB and DCA are enabled, but IGB isn't y while DCA=m. And thus remove the need to select INTEL_IOATDMA when IGB is enabled, so that non-x86 architectures can build the igb driver.
Based on work/patch from Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Alexander Duyck [Fri, 17 Oct 2008 04:26:57 +0000 (21:26 -0700)]
igb: fix tx data corruption with transition to L0s on 82575
The 82575 has an issue in which the DMA will go out of sync if the link
partner goes into an L0s state. To prevent this we set the pci-e link
partner capability bits to disable the L0s transition on the hw.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Hannes Hering [Thu, 16 Oct 2008 09:36:42 +0000 (11:36 +0200)]
ehea: Fix memory hotplug support
This patch implements the memory notifier to update the busmap
instantly instead of rebuilding the whole map. This is necessary
because walk_memory_resource provides different information than
required during memory hotplug.
Signed-off-by: Hannes Hering <hering2@de.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Roland Dreier [Fri, 17 Oct 2008 21:18:26 +0000 (14:18 -0700)]
cxgb3: Fix kernel crash caused by uninitialized l2t_entry.arpq
Commit 147e70e6 ("cxgb3: Use SKB list interfaces instead of home-grown
implementation.") causes a crash in t3_l2t_send_slow() when an iWARP
connection request is received. This is because the new l2t_entry.arpq
skb queue is never initialized, and therefore trying to add an skb to
it causes a NULL dereference. With the old code there was no need to
initialize the queues because the l2t_entry structures were zeroed,
and the code used NULL to mean empty.
Fix this by adding __skb_queue_head_init() when all the l2t_entry
structures get allocated.
Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Atsushi Nemoto [Mon, 20 Oct 2008 15:00:29 +0000 (00:00 +0900)]
net: Make SMC91X selectable on other MIPS boards
RBTX4939 board has SMC91X chip and there can be other MIPS boards with
that chip. Make SMC91X selectable on all MIPS board would be better than
enumerating each boards in Kconfig.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: jeff@garzik.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Brian Swetland [Wed, 10 Sep 2008 21:58:25 +0000 (14:58 -0700)]
[ARM] msm: dma: various basic dma improvements and bugfixes
San:
- Propagate DM errors to the originator of the request.
- Implement msm_dmov_stop_cmd()
- Add return value to init code
- Modify msm_dmov_stop_cmd() to support ungraceful flushing
Arve:
- Disable datamover interrupt when not in use.
We turn off the interrrupt to allow power collapse from idle.
Signed-off-by: San Mehat <san@android.com> Signed-off-by: Arve Hjønnevåg <arve@android.com> Signed-off-by: Brian Swetland <swetland@google.com>
Brian Swetland [Wed, 10 Sep 2008 21:00:53 +0000 (14:00 -0700)]
[ARM] msm: clean up iomap and devices
- Add some more peripherals (sdcc, etc) to the iomap.
- Remove virtual base addresses for devices that we should be passing
physical addresses to drivers via resources and ioremap()ing.
- don't try to use uarts for ll debug once the mmu is enabled due to
problems with the peripheral window
- make base addresses void __iomem * and fixup irq.c and timer.c
- Remove common.c and bring in devices.c/devices.h similar to
the PXA architecture.
Signed-off-by: Brian Swetland <swetland@google.com>
Brian Swetland [Tue, 9 Sep 2008 16:36:50 +0000 (09:36 -0700)]
[ARM] msm: add proc_comm support, necessary for clock and power control
The proc_comm protocol is the lowest level protocol available for
communicating with the modem core. It provides access to clock and
power control, among other things, and is safe for use from atomic
contexts, unlike the higher level SMD and RPC transports.
Signed-off-by: Brian Swetland <swetland@google.com>
Brian Swetland [Tue, 9 Sep 2008 14:13:33 +0000 (07:13 -0700)]
[ARM] msm: rename ARCH_MSM7X00A to ARCH_MSM
The MSM architecture covers a wider family of chips than just the MSM7X00A.
Move to a more generic name, in perparation for supporting the specific
SoC variants as sub-architectures (ARCH_MSM7X01A, ARCH_MSM722X, etc). This
gives us ARCH_MSM for the (many) common peripherals.
This also removes the unused/obsolete config item MSM7X00A_IDLE.
Signed-off-by: Brian Swetland <swetland@google.com>
Magnus Damm [Wed, 22 Oct 2008 09:29:17 +0000 (18:29 +0900)]
sh: improve pinmux support for single direction pins
This patch improves the support for gpio pins that are hard wired
to either input or output and lack control register association.
A special force enum id is used to allow use without control
register but still mark the gpio pin as input or output.
Signed-off-by: Magnus Damm <damm@igel.co.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Li Zefan [Wed, 22 Oct 2008 07:25:26 +0000 (15:25 +0800)]
sched: add CONFIG_SMP consistency
a patch from Henrik Austad did this:
>> Do not declare select_task_rq as part of sched_class when CONFIG_SMP is
>> not set.
Peter observed:
> While a proper cleanup, could you do it by re-arranging the methods so
> as to not create an additional ifdef?
Do not declare select_task_rq and some other methods as part of sched_class
when CONFIG_SMP is not set.
Also gather those methods to avoid CONFIG_SMP mess.
Idea-by: Henrik Austad <henrik.austad@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Henrik Austad <henrik@austad.us> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mohan Kumar M [Tue, 21 Oct 2008 17:38:10 +0000 (17:38 +0000)]
powerpc: Support for relocatable kdump kernel
This adds relocatable kernel support for kdump. With this one can
use the same regular kernel to capture the kdump. A signature (0xfeed1234)
is passed in r6 from panic code to the next kernel through kexec_sequence
and purgatory code. The signature is used to differentiate between
kdump kernel and non-kdump kernels.
The purgatory code compares the signature and sets the __kdump_flag in
head_64.S. During the boot up, kernel code checks __kdump_flag and if it
is set, the kernel will behave as relocatable kdump kernel. This kernel
will boot at the address where it was loaded by kexec-tools ie. at the
address reserved through crashkernel boot parameter.
CONFIG_CRASH_DUMP depends on CONFIG_RELOCATABLE option to build kdump
kernel as relocatable. So the same kernel can be used as production and
kdump kernel.
This patch incorporates the changes suggested by Paul Mackerras to avoid
GOT use and to avoid two copies of the code.
Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Mohan Kumar M <mohan@in.ibm.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Mike Ditto [Tue, 21 Oct 2008 11:32:29 +0000 (11:32 +0000)]
powerpc: Add del_node() for early boot code to prune inapplicable devices.
Some platforms have variants that can share most of a flat device tree but need
a few devices selectively pruned at boot time. This adds del_node() to ops.h
to allow access to the existing fdt_del_node().
Signed-off-by: Mike Ditto <mditto@consentry.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Andy Henroid [Thu, 9 Oct 2008 18:45:22 +0000 (11:45 -0700)]
i7300_idle driver v1.55
The Intel 7300 Memory Controller supports dynamic throttling of memory which can
be used to save power when system is idle. This driver does the memory
throttling when all CPUs are idle on such a system.
Refer to "Intel 7300 Memory Controller Hub (MCH)" datasheet
for the config space description.
Signed-off-by: Andy Henroid <andrew.d.henroid@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
David Gibson [Mon, 20 Oct 2008 17:55:29 +0000 (17:55 +0000)]
powerpc: Further compile fixup for STRICT_MM_TYPECHECKS
A patch of mine was recently committed to fix up STRICT_MM_TYPECHECKS
behaviour on powerpc (f5ea64dcbad89875d130596df14c9b25d994a737).
However, something which breaks it again seems to have slipped in
afterwards. So, here's another small fix.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Becky Bruce [Wed, 15 Oct 2008 08:25:28 +0000 (08:25 +0000)]
powerpc: Move memory size print into common show_cpuinfo for 32-bit
Most of the platforms were printing the size of the memory
in their show_cpuinfo implementations. This moves that to
the common show_cpuinfo, so that all 32-bit platforms will
now print the size of memory. I also update the code
to deal with the fact that total_memory is now a phys_addr_t.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
hvc_console: Remove __devexit annotation of hvc_remove()
Removed __devexit annotation of hvc_remove() to avoid a section mismatch
if the backend initialization fails and hvc_remove() must be used to
clean up allocated hvc structs (called in section __init or __devinit).
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch provides the hvc_resize() function to update the terminal
window dimensions (struct winsize) for a specified hvc console.
The function stores the new window size and schedules a function
that finally updates the tty winsize and signals the change to
user space (SIGWINCH).
Because the winsize update must acquire a mutex and might sleep,
the function is scheduled instead of being called from hvc_poll()
or khvcd.
This patch uses the tty_do_resize() routine from the tty layer.
A pending resize work is canceled in hvc_close() and hvc_hangup().
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If put_char() routine of a hvc console backend returns 0, then the
hvc console starts looping in the following scenarios:
1. hvc_console_print()
If put_char() returns 0 then the while loop may loop forever.
I have added the missing check for 0 to throw away console messages.
2. khvcd may loop:
The thread calls hvc_poll() --> hvc_push()... if there are still
buffered data then the HVC_POLL_WRITE bit is set and causes the
khvcd thread to loop (if yield() returns immediately).
However, instead of looping, the khvcd thread could sleep for
MIN_TIMEOUT (doing the same as for get_chars()).
The MIN_TIMEOUT is set if hvc_push() was not able to write
data to the backend. If data has been written, the timeout is
set to 0 to immediately re-schedule hvc_poll().
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> (virtio_console) Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
hvc_console: Add tty driver flag TTY_DRIVER_RESET_TERMIOS
After a tty hangup() or close() operation, processes might not reset the
termio settings to a sane state. In order to reset the termios to its
default settings the tty driver flag TTY_DRIVER_RESET_TERMIOS has been added.
TTY driver flag description from include/linux/tty_driver.h:
TTY_DRIVER_RESET_TERMIOS --- requests the tty layer to reset the
termios setting when the last process has closed the device.
Used for PTY's, in particular.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
I have added a hangup notifier that can be used by hvc console
backends to handle a tty hangup. The default irq hangup notifier
calls the notifier_del_irq() for compatibility.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Ilpo Järvinen [Tue, 21 Oct 2008 23:28:36 +0000 (16:28 -0700)]
tcp: should use number of sack blocks instead of -1
While looking for the recent "sack issue" I also read all eff_sacks
usage that was played around by some relevant commit. I found
out that there's another thing that is asking for a fix (unrelated
to the "sack issue" though).
This feature has probably very little significance in practice.
Opposite direction timeout with bidirectional tcp comes to me as
the most likely scenario though there might be other cases as
well related to non-data segments we send (e.g., response to the
opposite direction segment). Also some ACK losses or option space
wasted for other purposes is necessary to prevent the earlier
SACK feedback getting to the sender.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Arjan van de Ven [Sun, 28 Sep 2008 23:14:52 +0000 (16:14 -0700)]
mfd: use pci_ioremap_bar() in sm501
Use the newly introduced pci_ioremap_bar() function in drivers/mfd.
pci_ioremap_bar() just takes a pci device and a bar number, with the goal
of making it really hard to get wrong, while also having a central place
to stick sanity checks.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
Stephen Rothwell [Mon, 20 Oct 2008 21:55:30 +0000 (23:55 +0200)]
mfd: don't export wm3850 static functions
October 10th linux-next build (powerpc allyesconfig) failed like this:
drivers/mfd/wm8350-core.c:1131: error: __ksymtab_wm8350_create_cache causes a section type conflict
Caused by commit 89b4012befb1abca5e86d232bc0e2a797b0d9825 ("mfd: Core
support for the WM8350 AudioPlus PMIC"). wm8350_create_cache is not used
elsewhere, so remove the EXPORT_SYMBOL.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
David Brownell [Mon, 20 Oct 2008 21:51:46 +0000 (23:51 +0200)]
mfd: twl4030-gpio driver
This adds basic support for the GPIOs in the twl4030 power management
chip. That includes two open drain LED drivers, and the use of GPIO-0
(and GPIO-1) as MMC/SD card detect switches which can control whether
the VMMC1 (and VMMC2) regulators are active.
This version of the code has a debounce call that will probably be
replaced before long, when a more generic interface exists.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
David Brownell [Mon, 20 Oct 2008 21:50:05 +0000 (23:50 +0200)]
mfd: rtc-twl4030 driver
This adds a driver for the RTC inside the TWL4030 multi-function device.
It's a fairly basic RTC, with a wake-capable alarm.
Note that many of the pre-release Overo boards now in circulation can't
effectively use this RTC, because of a wiring error that puts its TWL
chip into "secure" mode. (As in "secure yourself against tampering".)
This isn't an issue on other OMAP3 boards now supported in mainline,
such as Beagle and Labrador.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Tony Lindgren <tony@atomide.com> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
David Brownell [Mon, 20 Oct 2008 21:46:28 +0000 (23:46 +0200)]
mfd: twl4030 IRQ handling update
- Move it into a separate file; clean and streamline it
- Restructure the init code for reuse during secondary dispatch
- Support both levels (primary, secondary) of IRQ dispatch
- Use a workqueue for irq mask/unmask and trigger configuration
Code for two subchips currently share that secondary handler code.
One is the power subchip; its IRQs are now handled by this core,
courtesy of this patch. The other is the GPIO module, which will
be supported through a later patch.
There are also minor changes to the header file, mostly related
to GPIO support; nothing yet in mainline cares about those. A
few references to OMAP-specific symbols are disabled; when they
can all be removed, the TWL4030 support ceases being OMAP-specific.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
Heiko Carstens [Wed, 22 Oct 2008 15:00:26 +0000 (10:00 -0500)]
stop_machine: fix error code handling on multiple cpus
Using |= for updating a value which might be updated on several cpus
concurrently will not always work since we need to make sure that the
update happens atomically.
To fix this just use a write if the called function returns an error
code on a cpu. We end up writing the error code of an arbitrary cpu
if multiple ones fail but that should be sufficient.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Heiko Carstens [Mon, 13 Oct 2008 21:50:10 +0000 (23:50 +0200)]
stop_machine: use workqueues instead of kernel threads
Convert stop_machine to a workqueue based approach. Instead of using kernel
threads for stop_machine we now use a an rt workqueue to synchronize all
cpus.
This has the advantage that all needed per cpu threads are already created
when stop_machine gets called. And therefore a call to stop_machine won't
fail anymore. This is needed for s390 which needs a mechanism to synchronize
all cpus without allocating any memory.
As Rusty pointed out free_module() needs a non-failing stop_machine interface
as well.
As a side effect the stop_machine code gets simplified.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Heiko Carstens [Mon, 13 Oct 2008 21:50:09 +0000 (23:50 +0200)]
workqueue: introduce create_rt_workqueue
create_rt_workqueue will create a real time prioritized workqueue.
This is needed for the conversion of stop_machine to a workqueue based
implementation.
This patch adds yet another parameter to __create_workqueue_key to tell
it that we want an rt workqueue.
However it looks like we rather should have something like "int type"
instead of singlethread, freezable and rt.
Rusty Russell [Wed, 22 Oct 2008 15:00:23 +0000 (10:00 -0500)]
core_param() for genuinely core kernel parameters
There are a lot of one-liner uses of __setup() in the kernel: they're
cumbersome and not queryable (definitely not settable) via /sys. Yet
it's ugly to simplify them to module_param(), because by default that
inserts a prefix of the module name (usually filename).
So, introduce a "core_param". The parameter gets no prefix, but
appears in /sys/module/kernel/parameters/ (if non-zero perms arg). I
thought about using the name "core", but that's more common than
"kernel". And if you create a module called "kernel", you will die
a horrible death.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Wed, 22 Oct 2008 15:00:22 +0000 (10:00 -0500)]
param: Fix duplicate module prefixes
Instead of insisting each new module_param sysfs entry is unique,
handle the case where it already exists (for builtin modules).
The current code assumes that all identical prefixes are together in
the section: true for normal uses, but not necessarily so if someone
overrides MODULE_PARAM_PREFIX. More importantly, it's not true with
the new "core_param()" code which uses "kernel" as a prefix.
This simplifies the caller for the builtin case, at a slight loss of
efficiency (we do the lookup every time to see if the directory
exists).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Greg Kroah-Hartman <gregkh@suse.de>
Rusty Russell [Wed, 22 Oct 2008 15:00:22 +0000 (10:00 -0500)]
module: check kernel param length at compile time, not runtime
The kparam code tries to handle over-length parameter prefixes at
runtime. Not only would I bet this has never been tested, it's not
clear that truncating names is a good idea either.
So let's check at compile time. We need to move the #define to
moduleparam.h to do this, though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Andi Kleen [Sat, 30 Aug 2008 08:09:00 +0000 (10:09 +0200)]
Remove stop_machine during module load v2
Remove stop_machine during module load v2
module loading currently does a stop_machine on each module load to insert
the module into the global module lists. Especially on larger systems this
can be quite expensive.
It does that to handle concurrent lock lessmodule list readers
like kallsyms.
I don't think stop_machine() is actually needed to insert something
into a list though. There are no concurrent writers because the
module mutex is taken. And the RCU list functions know how to insert
a node into a list with the right memory ordering so that concurrent
readers don't go off into the wood.
So remove the stop_machine for the module list insert and just
do a list_add_rcu() instead.
Module removal will still do a stop_machine of course, it needs
that for other reasons.
v2: Revised readers based on Paul's comments. All readers that only
rely on disabled preemption need to be changed to list_for_each_rcu().
Done that. The others are ok because they have the modules mutex.
Also added a possible missing preempt disable for print_modules().
[cc Paul McKenney for review. It's not RCU, but quite similar.]
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Wed, 22 Oct 2008 15:00:13 +0000 (10:00 -0500)]
module: simplify load_module.
Linus' recent catch of stack overflow in load_module lead me to look
at the code. A couple of helpers to get a section address and get
objects from a section can help clean things up a little.
(And in case you're wondering, the stack size also dropped from 328 to
284 bytes).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The real bug behind this issue is the fact that we're not
masking out the SPURIOUSIRQFLAG bits [31:7] from the active
interrupt number. The following patch will replace this
patch with the fix.
This patch fixes the section mismatch warning from
sa1111.o at buildtime.
CC arch/arm/common/sa1111.o
LD arch/arm/common/built-in.o
LD vmlinux.o
MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x87f4): Section mismatch in reference from the function sa1111_probe() to the function .devinit.text:sa1110_mb_enable()
The function sa1111_probe() references
the function __devinit sa1110_mb_enable().
This is often because sa1111_probe lacks a __devinit
annotation or the annotation of sa1110_mb_enable is wrong.
Signed-off-by: Kristoffer Ericson <kristoffer.ericson@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Commit 18c7109289625106cdc810b20b628cd13b46d6dd had #endif leftoff from
compilation. This patch fixes it.
Also, I replaced a misplaced comment by a useful one, that explains why are
here #ifdef and #endif added in compilation.
solves the problem of stale jiffies when long running softirqs happen
in a long idle sleep period, but it has a major thinko in it:
When the interrupt which came in _is_ the timer interrupt which should
expire ts->sched_timer then we cancel and rearm the timer _before_ it
gets expired in hrtimer_interrupt() to the next period. That means the
call back function is not called. This game can go on for ever :(
Prevent this by making sure to only rearm the timer when the expiry
time is more than one tick_period away. Otherwise keep it running as
it is either already expired or will expiry at the right point to
update jiffies.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Venkatesch Pallipadi <venkatesh.pallipadi@intel.com>
Kiyoshi Ueda [Tue, 21 Oct 2008 16:45:07 +0000 (17:45 +0100)]
dm: remove unused flush_all
This patch removes the DM_WQ_FLUSH_ALL state that is unnecessary.
The dm_queue_flush(md, DM_WQ_FLUSH_ALL, NULL) in dm_suspend()
is never invoked because:
- 'goto flush_and_out' is the same as 'goto out' because
the 'goto flush_and_out' is called only when '!noflush'
- If r is non-zero, then the code above will invoke 'goto out'
and skip this code.
No functional change.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Milan Broz [Tue, 21 Oct 2008 16:45:02 +0000 (17:45 +0100)]
dm crypt: fix async split
When writing io, dm-crypt has to allocate a new cloned bio
and encrypt the data into newly-allocated pages attached to this bio.
In rare cases, because of hw restrictions (e.g. physical segment limit)
or memory pressure, sometimes more than one cloned bio has to be used,
each processing a different fragment of the original.
Currently there is one waitqueue which waits for one fragment to finish
and continues processing the next fragment.
But when using asynchronous crypto this doesn't work, because several
fragments may be processed asynchronously or in parallel and there is
only one crypt context that cannot be shared between the bio fragments.
The result may be corruption of the data contained in the encrypted bio.
The patch fixes this by allocating new dm_crypt_io structs (with new
crypto contexts) and running them independently.
The fragments contains a pointer to the base dm_crypt_io struct to
handle reference counting, so the base one is properly deallocated
after all the fragments are finished.
In a low memory situation, this only uses one additional object from the
mempool. If the mempool is empty, the next allocation simple waits for
previous fragments to complete.
Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Tue, 21 Oct 2008 16:44:56 +0000 (17:44 +0100)]
dm exception store: fix misordered writes
We must zero the next chunk on disk *before* writing out the current chunk, not
after. Otherwise if the machine crashes at the wrong time, the "end of metadata"
marker may be missing.