eventually causing init to die and panic the system:
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Not tainted 2.6.26-rc9-tip #13878
after a maratonic bisection session, the bad commit turned out to be:
| b7675791859075418199c7af86a116ea34eaf5bd is first bad commit
| commit b7675791859075418199c7af86a116ea34eaf5bd
| Author: Jeremy Fitzhardinge <jeremy@goop.org>
| Date: Wed Jun 25 00:19:00 2008 -0400
|
| x86: remove open-coded save/load segment operations
|
| This removes a pile of buggy open-coded implementations of savesegment
| and loadsegment.
after some more bisection of this patch itself, it turns out that what
makes the difference are the savesegment() changes to __switch_to().
Taking a look at this portion of arch/x86/kernel/process_64.o revealed
this crutial difference:
note the "m" modifier - it allows GCC to generate the segment move
into a memory operand as well.
But regarding segment operands there's a subtle detail in the x86
instruction set: the above 16-bit moves are zero-extend, but only
if it goes to a register.
If it goes to a memory operand, -0x34(%rbp) in the above case, there's
no zero-extend to 32-bit and the instruction will only save 16 bits
instead of the intended 32-bit.
The other 16 bits is random data - which can cause problems when that
value is used later on.
The solution is to only allow segment operands to go to registers.
This fix allows my test-system to boot up without crashing.
Milton Miller [Thu, 10 Jul 2008 21:14:18 +0000 (16:14 -0500)]
[MTD] [NAND] remove __PPC__ hardcoded address from DiskOnChip drivers
Such a hardcoded address can cause a checkstop or machine check if
the driver is in the kernel but the address is not acknowledged.
Both drivers allow an address to be specified as either a module
parameter or config option. Any future powerpc board should either
use one of these methods or find the address in the device tree.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Anton Vorontsov [Fri, 27 Jun 2008 19:04:20 +0000 (23:04 +0400)]
[MTD] [NAND] fsl_elbc_nand: ecclayout cleanups
This patch deletes oobavail assignments, they're calculated by the nand
core code in nand_scan_tail, plus current oobavail values are wrong for
the LP NANDs.
Also remove mtd->ecclayout and mtd->oobavail assignments, mtd core
handles this all by itself.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Anton Vorontsov [Fri, 27 Jun 2008 19:04:13 +0000 (23:04 +0400)]
[MTD] [NAND] fsl_elbc_nand: implement support for flash-based BBT
This patch implements support for flash-based BBT for chips working
through ELBC NAND controller, so that NAND core will not have to re-scan
for bad blocks on every boot.
Because ELBC controller may provide HW-generated ECCs we should adjust
bbt pattern and bbt version positions in the OOB free area.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Acked-by: Scott Wood <scottwood@freescale.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Anton Vorontsov [Fri, 27 Jun 2008 19:04:04 +0000 (23:04 +0400)]
[MTD] [NAND] fsl_elbc_nand: fix OOB workability for large page NAND chips
For large page chips, nand_bbt is looking into OOB area, and checking
for "0xff 0xff" pattern at OOB offset 0. That is, two bytes should be
reserved for bbt means.
But ELBC driver is specifying ecclayout so that oobfree area starts at
offset 1, so only one byte left for the bbt purposes.
This causes problems with any OOB users, namely JFFS2: after first mount
JFFS2 will fill all OOBs with "erased marker", so OOBs will contain:
And on the next boot, NAND core will rescan for bad blocks, then will
see "0xff 0x19" pattern, and will mark all blocks as bad ones.
To fix the issue we should implement our own bad block pattern: just one
byte at OOB start. Though, this will work only for x8 chips. For x16
chips two bytes must be checked. Since ELBC driver does not support x16
NANDs (yet), we're safe for now.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Acked-by: Scott Wood <scottwood@freescale.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
kernel/trace/ftrace.c:1615: error: 'ftraced_suspend' undeclared (first use in this function)
kernel/trace/ftrace.c:1615: error: (Each undeclared identifier is reported only once
kernel/trace/ftrace.c:1615: error: for each function it appears in.)
Steven Rostedt [Wed, 9 Jul 2008 04:15:33 +0000 (00:15 -0400)]
sched_clock: and multiplier for TSC to gtod drift
The sched_clock code currently tries to keep all CPU clocks of all CPUS
somewhat in sync. At every clock tick it records the gtod clock and
uses that and jiffies and the TSC to calculate a CPU clock that tries to
stay in sync with all the other CPUs.
ftrace depends heavily on this timer and it detects when this timer
"jumps". One problem is that the TSC and the gtod also drift.
When the TSC is 0.1% faster or slower than the gtod it is very noticeable
in ftrace. To help compensate for this, I've added a multiplier that
tries to keep the CPU clock updating at the same rate as the gtod.
I've tried various ways to get it to be in sync and this ended up being
the most reliable. At every scheduler tick we calculate the new multiplier:
multi = delta_gtod / delta_TSC
This means we perform a 64 bit divide at the tick (once a HZ). A shift
is used to handle the accuracy.
Other methods that failed due to dynamic HZ are:
(not used) multi += (gtod - tsc) / delta_gtod
(not used) multi += (gtod - (last_tsc + delta_tsc)) / delta_gtod
as well as other variants.
This code still allows for a slight drift between TSC and gtod, but
it keeps the damage down to a minimum.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Wed, 9 Jul 2008 04:15:32 +0000 (00:15 -0400)]
sched_clock: record TSC after gtod
To read the gtod we need to grab the xtime lock for read. Reading the gtod
before the TSC can cause a bigger gab if the xtime lock is contended.
This patch simply reverses the order to read the TSC after the gtod.
The locking in the reading of the gtod handles any barriers one might
think is needed.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Wed, 9 Jul 2008 04:15:31 +0000 (00:15 -0400)]
sched_clock: only update deltas with local reads.
Reading the CPU clock should try to stay accurate within the CPU.
By reading the CPU clock from another CPU and updating the deltas can
cause unneeded jumps when reading from the local CPU.
This patch changes the code to update the last read TSC only when read
from the local CPU.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Mon, 7 Jul 2008 23:49:41 +0000 (19:49 -0400)]
sched_clock: fix calculation of other CPU
The algorithm to calculate the 'now' of another CPU is not correct.
At each scheduler tick, each CPU records the last sched_clock and
gtod (tick_raw and tick_gtod respectively). If the TSC is somewhat the
same in speed between two clocks the algorithm would be:
This solves most of the rest of the issues I've had with timestamps in
ftace.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Mon, 7 Jul 2008 18:16:52 +0000 (14:16 -0400)]
sched_clock: stop maximum check on NO HZ
Working with ftrace I would get large jumps of 11 millisecs or more with
the clock tracer. This killed the latencing timings of ftrace and also
caused the irqoff self tests to fail.
What was happening is with NO_HZ the idle would stop the jiffy counter and
before the jiffy counter was updated the sched_clock would have a bad
delta jiffies to compare with the gtod with the maximum.
The jiffies would stop and the last sched_tick would record the last gtod.
On wakeup, the sched clock update would compare the gtod + delta jiffies
(which would be zero) and compare it to the TSC. The TSC would have
correctly (with a stable TSC) moved forward several jiffies. But because the
jiffies has not been updated yet the clock would be prevented from moving
forward because it would appear that the TSC jumped too far ahead.
The clock would then virtually stop, until the jiffies are updated. Then
the next sched clock update would see that the clock was very much behind
since the delta jiffies is now correct. This would then jump the clock
forward by several jiffies.
This caused ftrace to report several milliseconds of interrupts off
latency at every resume from NO_HZ idle.
This patch adds hooks into the nohz code to disable the checking of the
maximum clock update when nohz is in effect. It resumes the max check
when nohz has updated the jiffies again.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Mon, 7 Jul 2008 18:16:51 +0000 (14:16 -0400)]
sched_clock: widen the max and min time
With keeping the max and min sched time within one jiffy of the gtod clock
was too tight. Just before a schedule tick the max could easily be hit, as
well as just after a schedule_tick the min could be hit. This caused the
clock to jump around by a jiffy.
This patch widens the minimum to
last gtod + (delta_jiffies ? delta_jiffies - 1 : 0) * TICK_NSECS
and the maximum to
last gtod + (2 + delta_jiffies) * TICK_NSECS
This keeps the minum to gtod or if one jiffy less than delta jiffies
and the maxim 2 jiffies ahead of gtod. This may cause unstable TSCs to be
a bit more sporadic, but it helps keep a clock with a stable TSC working well.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Mon, 7 Jul 2008 18:16:50 +0000 (14:16 -0400)]
sched_clock: record from last tick
The sched_clock code tries to keep within the gtod time by one tick (jiffy).
The current code mistakenly keeps track of the delta jiffies between
updates of the clock, where the the delta is used to compare with the
number of jiffies that have past since an update of the gtod. The gtod is
updated at each schedule tick not each sched_clock update. After one
jiffy passes the clock is updated fine. But the delta is taken from the
last update so if the next update happens before the next tick the delta
jiffies used will be incorrect.
This patch changes the code to check the delta of jiffies between ticks
and not updates to match the comparison of the updates with the gtod.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
David Brownell [Fri, 4 Jul 2008 06:40:19 +0000 (23:40 -0700)]
[MTD] [NAND] atmel_nand can be modular
There's no reason to prevent the Atmel NAND driver from
building as a module.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: HÄvard Skinnemoen <haavard.skinnemoen@atmel.com> Acked-by: Andrew Victor <linux@maxim.org.za> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
[MTD] [NAND] atmel_nand: Work around AT32AP7000 ECC erratum
The ALE signal isn't correctly wired up to the ECC controller on the
AP7000, so it starts calculating ECC during the address cycles.
Work around this by resetting the ECC controller between the address and
data cycles.
Signed-off-by: HÄvard Skinnemoen <haavard.skinnemoen@atmel.com> Acked-by: Andrew Victor <linux@maxim.org.za> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
David Brownell [Fri, 4 Jul 2008 06:40:16 +0000 (23:40 -0700)]
[MTD] [NAND] atmel_nand speedup via {read,write}s{b,w}()
This uses __raw_{read,write}s{b,w}() primitives to access data on NAND
chips for more efficient I/O.
On an arm926 with memory clocked at 100 MHz, this reduced the elapsed time
for a 64 MiB read by 16%. ("dd" /dev/mtd0 to /dev/null, with an 8-bit
NAND using hardware ECC and 128KiB blocksize.)
Also some minor section tweaks:
- Use platform_driver_probe() so no pointer to probe() lingers
after that code has been removed at run-time.
- Use __exit and __exit_p so the remove() code will normally be
removed by the linker.
Since these buffer read/write calls are new, this increases the runtime
code footprint (by 88 bytes on my build, after the section tweaks).
[haavard.skinnemoen@atmel.com: rebase onto atmel_nand rename] Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: HÄvard Skinnemoen <haavard.skinnemoen@atmel.com> Acked-by: Andrew Victor <linux@maxim.org.za> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Steven Rostedt [Fri, 11 Jul 2008 00:58:16 +0000 (20:58 -0400)]
ftrace: separate out the function enabled variable
Currently the function tracer uses the global tracer_enabled variable that
is used to keep track if the tracer is enabled or not. The function tracing
startup needs to be separated out, otherwise the internal happenings of
the tracer startup is also recorded.
This patch creates a ftrace_function_enabled variable to all the starting
of the function traces to happen after everything has been started.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Fri, 11 Jul 2008 00:58:15 +0000 (20:58 -0400)]
ftrace: add ftrace_kill_atomic
It has been suggested that I add a way to disable the function tracer
on an oops. This code adds a ftrace_kill_atomic. It is not meant to be
used in normal situations. It will disable the ftrace tracer, but will
not perform the nice shutdown that requires scheduling.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Fri, 11 Jul 2008 00:58:14 +0000 (20:58 -0400)]
ftrace: use current CPU for function startup
This is more of a clean up. Currently the function tracer initializes the
tracer with which ever CPU was last used for tracing. This value isn't
realy useful for function tracing, but at least it should be something other
than a random number.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Fri, 11 Jul 2008 00:58:13 +0000 (20:58 -0400)]
ftrace: start wakeup tracing after setting function tracer
Enabling the wakeup tracer before enabling the function tracing causes
some strange results due to the dynamic enabling of the functions.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Fri, 11 Jul 2008 00:58:12 +0000 (20:58 -0400)]
ftrace: check proper config for preempt type
There is no CONFIG_PREEMPT_DESKTOP. Use the proper entry CONFIG_PREEMPT.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Fri, 11 Jul 2008 00:58:11 +0000 (20:58 -0400)]
ftrace: trace schedule
After the sched_clock code has been removed from sched.c we can now trace
the scheduler. The scheduler has a lot of functions that would be worth
tracing.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Fri, 11 Jul 2008 00:58:10 +0000 (20:58 -0400)]
ftrace: define function trace nop
When CONFIG_FTRACE is not enabled, the tracing_start_functon_trace
and tracing_stop_function_trace should be nops.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Fri, 11 Jul 2008 00:58:09 +0000 (20:58 -0400)]
ftrace: move sched_switch enable after markers
We have two markers now that are enabled on sched_switch. One that records
the context switching and the other that records task wake ups. Currently
we enable the tracing first and then set the markers. This causes some
confusing traces:
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Harvey Harrison [Fri, 4 Jul 2008 06:40:14 +0000 (23:40 -0700)]
[MTD] mtdchar.c remove shadowed variable warnings
Use einfo, oinfo for the inner erase_info and otp_info structs used in
individual case statements.
drivers/mtd/mtdchar.c:582:26: warning: symbol 'info' shadows an earlier one
drivers/mtd/mtdchar.c:380:23: originally declared here
drivers/mtd/mtdchar.c:596:26: warning: symbol 'info' shadows an earlier one
drivers/mtd/mtdchar.c:380:23: originally declared here
drivers/mtd/mtdchar.c:704:19: warning: symbol 'info' shadows an earlier one
drivers/mtd/mtdchar.c:380:23: originally declared here
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Harvey Harrison [Fri, 4 Jul 2008 06:40:13 +0000 (23:40 -0700)]
[MTD] mtdchar.c silence sparse warning
The copy_to_user was casting away the address space to get the offset of
the length member. Use offsetof() instead and add it to the void __user
*argp.
drivers/mtd/mtdchar.c:527:23: warning: cast removes address space of expression
drivers/mtd/mtdchar.c:527:23: warning: incorrect type in argument 1 (different address spaces)
drivers/mtd/mtdchar.c:527:23: expected void [noderef] <asn:1>*to
drivers/mtd/mtdchar.c:527:23: got unsigned int *<noident>
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
x86_64: add pseudo-features for 32-bit compat syscall
Add pseudo-feature bits to describe whether the CPU supports sysenter
and/or syscall from ia32-compat userspace. This removes a hardcoded
test in vdso32-setup.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
[MTD] m25p80: fix bug - ATmel spi flash fails to be copied to
Atmel serial flash tends to power up with the protection status bits set.
http://blackfin.uclinux.org/gf/project/uclinux-dist/tracker/?action=TrackerItemEdit&tracker_item_id=4089
[michael.hennerich@analog.com: remove duplicate code] Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Bryan Wu <cooloney@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Some BIOSen enable DIPM via _GTF which causes command timeouts under
certain configuration. This didn't occur on 2.6.25 because 2.6.25
defaulted to SRST, so _GTF wasn't executed during boot probe, so ahci
host reset disabled DIPM and as _GTF wasn't executed after SRST, DIPM
wasn't enabled. On 2.6.26, hardreset is used during probe and after
probe _GTF is executed enabling DIPM and thus the failures.
This patch could theoretically disable DIPM on machines which used to
have it enabled on 2.6.25 but AFAIK ahci is currently the only driver
which uses SATA ACPI hierarchy (_SDD) and as the host reset would have
always disabled DIPM, this shouldn't happen.
Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Andre Noll [Fri, 11 Jul 2008 12:02:23 +0000 (22:02 +1000)]
md: Replace calc_dev_size() by calc_num_sectors().
Number of sectors is the preferred unit for sizes of raid devices,
so change calc_dev_size() so that it returns this unit instead of
the number of 1K blocks.
Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
Andre Noll [Fri, 11 Jul 2008 12:02:22 +0000 (22:02 +1000)]
md: Make update_size() take the number of sectors.
Changing the internal representations of sizes of raid devices
from 1K blocks to sector counts (512B units) is desirable because
it allows to get rid of many divisions/multiplications and unnecessary
casts that are present in the current code.
This patch is a first step in this direction. It replaces the old
1K-based "size" argument of update_size() by "num_sectors" and
fixes up its two callers.
Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
Neil Brown [Fri, 11 Jul 2008 12:02:22 +0000 (22:02 +1000)]
md: Better control of when do_md_stop is allowed to stop the array.
do_md_stop check the number of active users before allowing the array
to be stopped.
Two problems:
1/ it assumes the request is coming through an open file descriptor
(via ioctl) so it allows for that. This is not always the case.
2/ it doesn't do the check it the array hasn't been activated.
This is not good for cases when we use an inactive array to hold
some devices in a container.
Andre Noll [Fri, 11 Jul 2008 12:02:21 +0000 (22:02 +1000)]
md: get_disk_info(): Don't convert between signed and unsigned and back.
The current code copies a signed int from user space, converts it to
unsigned and passes the unsigned value to find_rdev_nr() which expects
a signed value. Simply pass the signed value from user space directly.
Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
nohz: don't stop idle tick if softirqs are pending.
In case a cpu goes idle but softirqs are pending only an error message is
printed to the console. It may take a very long time until the pending
softirqs will finally be executed. Worst case would be a hanging system.
With this patch the timer tick just continues and the softirqs will be
executed after the next interrupt. Still a delay but better than a
hanging system.
Currently we have at least two device drivers on s390 which under certain
circumstances schedule a tasklet from process context. This is a reason
why we can end up with pending softirqs when going idle. Fixing these
drivers seems to be non-trivial.
However there is no question that the drivers should be fixed.
This patch shouldn't be considered as a bug fix. It just is intended to
keep a system running even if device drivers are buggy.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jan Glauber <jan.glauber@de.ibm.com> Cc: Stefan Weinhuber <wein@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86: remove ifdef CONFIG_CALGARY_IOMMU in pci-dma.c
asm-x86/calgary.h has dummy calgary_iommu_init() and detect_calgary()
in !CONFIG_CALGARY_IOMMU case. So we don't need ifdef
CONFIG_CALGARY_IOMMU in pci-dma.c.
Our way to handle gart_* functions for CONFIG_GART_IOMMU and
!CONFIG_GART_IOMMU cases is inconsistent.
We have some dummy gart_* functions in !CONFIG_GART_IOMMU case and
also use ifdef CONFIG_GART_IOMMU tricks in pci-dma.c to call some
gart_* functions in only CONFIG_GART_IOMMU case.
This patch removes ifdef CONFIG_GART_IOMMU in pci-dma.c and always use
dummy gart_* functions in iommu.h.
Mark McLoughlin [Tue, 8 Jul 2008 07:10:42 +0000 (17:10 +1000)]
virtio_net: Set VIRTIO_NET_F_GUEST_CSUM feature
We can handle receiving partial csums, so set the
appropriate feature bit.
Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Alexander Duyck [Tue, 8 Jul 2008 22:14:44 +0000 (15:14 -0700)]
igb: Improve multiqueue AIM support
Improve multiqueue performance
Change itr_val to reflect ITR timer value instead of ints/sec
Cleaned up AIM algorithms in general
Based on work by Mitch Williams
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Alexander Duyck [Tue, 8 Jul 2008 22:14:04 +0000 (15:14 -0700)]
igb: unused variable warning in igb remove
Wrap hw variable declaration in DCA flags to prevent unused variable
warning during compilation.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Alexander Duyck [Tue, 8 Jul 2008 22:13:38 +0000 (15:13 -0700)]
igb: update suspend resume
Updates the suspend and resume to better handle the possibility of MSIX
vector changes.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Alexander Duyck [Tue, 8 Jul 2008 22:13:05 +0000 (15:13 -0700)]
net: add netif_napi_del function to allow for removal of napistructs
Adds netif_napi_del function which is used to remove the napi struct from
the netdev napi_list in cases where CONFIG_NETPOLL was enabled.
The motivation for adding this is to handle the case in which the number of
queues on a device changes due to a configuration change. Previously the
napi structs for each queue would be left in the list until the netdev was
freed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Alexander Duyck [Tue, 8 Jul 2008 22:12:13 +0000 (15:12 -0700)]
igb: add support for in kernel LRO
This patch adds support for the use of the inet_lro module to provide
software LRO support.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Alexander Duyck [Tue, 8 Jul 2008 22:11:40 +0000 (15:11 -0700)]
igb: add page recycling support
This patch adds support for page recycling by splitting the page into two
usable portions and tracking the reference count.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Alexander Duyck [Tue, 8 Jul 2008 22:10:46 +0000 (15:10 -0700)]
igb: Add support for quad port WOL and feature flags
Change igb from using a series of boolean operators to using a single flags
value that contains a number of different bit flags for all the different
features of the adapter.
This patch also adds WOL support for quad port adapters.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
We can remove a clunky workaround for not having the hardware
strip the CRC. 82575 silicon as well as the older PCI Express
e1000e hardware all work OK in this respect.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Jeb Cramer [Tue, 8 Jul 2008 22:07:55 +0000 (15:07 -0700)]
igb: add DCA support
Add DCA support in the similar method that it was added to the ixgbe
driver recently. DCA allows the network device to put data in the
CPU cache and notify the chipset of that event. This reduces cache
misses during receives.
Signed-off-by: Jeb Cramer <cramerj@intel.com> Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Alexander Duyck [Tue, 8 Jul 2008 22:07:24 +0000 (15:07 -0700)]
igb: update ethtool stats to support multiqueue
Addesses problems seen earlier with igb driver not correctly reporting rx
and tx stats.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
igb: Introduce multiple TX queues with infrastructure
This code adds multiple Tx queue infrastructure much like we
previously did in ixgbe. The MSI-X vector mapping is the bulk of
the change.
IAM can now be safely enabled and we've verified that it does
work correctly. We can also eliminate the tx ring lock.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
rx cleanup should look more like our other drivers that have evolved
to nicer performance levels over time. Changes consist of refilling
tx buffers to hardware more often, some minor assignment cleanups.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>