pilppa.org Git - linux-2.6-omap-h63xx.git/log

]> pilppa.org Git - linux-2.6-omap-h63xx.git/log

projects / linux-2.6-omap-h63xx.git / log

Avi Kivity [Mon, 16 Jun 2008 04:23:17 +0000 (21:23 -0700)]

KVM: x86 emulator: simplify sib decoding

Instead of using sparse switches, use simpler if/else sequences.

Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Avi Kivity [Mon, 16 Jun 2008 04:13:41 +0000 (21:13 -0700)]

KVM: x86 emulator: handle undecoded rex.b with r/m = 5 in certain cases

x86_64 does not decode rex.b in certain cases, where the r/m field = 5.

Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Mohammed Gamal [Sun, 15 Jun 2008 16:37:38 +0000 (19:37 +0300)]

KVM: x86 emulator: emulate nop and xchg reg, acc (opcodes 0x90 - 0x97)

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Avi Kivity [Fri, 13 Jun 2008 19:45:42 +0000 (22:45 +0300)]

KVM: Use printk_rlimit() instead of reporting emulation failures just once

Emulation failure reports are useful, so allow more than one per the lifetime
of the module.

Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Tan, Li [Fri, 23 May 2008 06:54:09 +0000 (14:54 +0800)]

KVM: Support mixed endian machines

Currently kvmtrace is not portable. This will prevent from copying a
trace file from big-endian target to little-endian workstation for analysis.
In the patch, kernel outputs metadata containing a magic number to trace
log, and changes 64-bit words to be u64 instead of a pair of u32s.

Signed-off-by: Tan Li <li.tan@intel.com>
Acked-by: Jerone Young <jyoung5@us.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Glauber Costa [Tue, 10 Jun 2008 13:46:53 +0000 (10:46 -0300)]

KVM: Do not calculate linear rip in emulation failure report

If we're not gonna do anything (case in which failure is already
reported), we do not need to even bother with calculating the linear rip.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Marcelo Tosatti [Wed, 11 Jun 2008 22:52:53 +0000 (19:52 -0300)]

KVM: only abort guest entry if timer count goes from 0->1

Only abort guest entry if the timer count went from 0->1, since for 1->2
or larger the bit will either be set already or a timer irq will have
been injected.

Using atomic_inc_and_test() for it also introduces an SMP barrier
to the LAPIC version (thought it was unecessary because of timer
migration, but guest can be scheduled to a different pCPU between exit
and kvm_vcpu_block(), so there is the possibility for a race).

Noticed by Avi.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Laurent Vivier [Fri, 30 May 2008 14:05:57 +0000 (16:05 +0200)]

KVM: Add coalesced MMIO support (ia64 part)

This patch enables coalesced MMIO for ia64 architecture.
It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO.
It enables the compilation of coalesced_mmio.c.

[akpm: fix compile error on ia64]

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Laurent Vivier [Fri, 30 May 2008 14:05:56 +0000 (16:05 +0200)]

KVM: Add coalesced MMIO support (powerpc part)

This patch enables coalesced MMIO for powerpc architecture.
It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO.
It enables the compilation of coalesced_mmio.c.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Laurent Vivier [Fri, 30 May 2008 14:05:55 +0000 (16:05 +0200)]

KVM: Add coalesced MMIO support (x86 part)

This patch enables coalesced MMIO for x86 architecture.
It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO.
It enables the compilation of coalesced_mmio.c.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Laurent Vivier [Fri, 30 May 2008 14:05:54 +0000 (16:05 +0200)]

KVM: Add coalesced MMIO support (common part)

This patch adds all needed structures to coalesce MMIOs.
Until an architecture uses it, it is not compiled.

Coalesced MMIO introduces two ioctl() to define where are the MMIO zones that
can be coalesced:

- KVM_REGISTER_COALESCED_MMIO registers a coalesced MMIO zone.
  It requests one parameter (struct kvm_coalesced_mmio_zone) which defines
  a memory area where MMIOs can be coalesced until the next switch to
  user space. The maximum number of MMIO zones is KVM_COALESCED_MMIO_ZONE_MAX.

- KVM_UNREGISTER_COALESCED_MMIO cancels all registered zones inside
  the given bounds (bounds are also given by struct kvm_coalesced_mmio_zone).

The userspace client can check kernel coalesced MMIO availability by asking
ioctl(KVM_CHECK_EXTENSION) for the KVM_CAP_COALESCED_MMIO capability.
The ioctl() call to KVM_CAP_COALESCED_MMIO will return 0 if not supported,
or the page offset where will be stored the ring buffer.
The page offset depends on the architecture.

After an ioctl(KVM_RUN), the first page of the KVM memory mapped points to
a kvm_run structure. The offset given by KVM_CAP_COALESCED_MMIO is
an offset to the coalesced MMIO ring expressed in PAGE_SIZE relatively
to the address of the start of th kvm_run structure. The MMIO ring buffer
is defined by the structure kvm_coalesced_mmio_ring.

[akio: fix oops during guest shutdown]

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Laurent Vivier [Fri, 30 May 2008 14:05:53 +0000 (16:05 +0200)]

KVM: kvm_io_device: extend in_range() to manage len and write attribute

Modify member in_range() of structure kvm_io_device to pass length and the type
of the I/O (write or read).

This modification allows to use kvm_io_device with coalesced MMIO.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Avi Kivity [Thu, 29 May 2008 11:56:28 +0000 (14:56 +0300)]

KVM: MMU: Avoid page prefetch on SVM

SVM cannot benefit from page prefetching since guest page fault bypass
cannot by made to work there. Avoid accessing the guest page table in
this case.

Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Avi Kivity [Thu, 29 May 2008 11:55:03 +0000 (14:55 +0300)]

KVM: MMU: Move nonpaging_prefetch_page()

In preparation for next patch. No code change.

Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Avi Kivity [Thu, 29 May 2008 11:38:38 +0000 (14:38 +0300)]

KVM: x86 emulator: implement 'push imm' (opcode 0x68)

Encountered in FC6 boot sequence, now that we don't force ss.rpl = 0 during
the protected mode transition. Not really necessary, but nice to have.

Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Avi Kivity [Thu, 29 May 2008 11:26:29 +0000 (14:26 +0300)]

KVM: x86 emulator: simplify push imm8 emulation

Instead of fetching the data explicitly, use SrcImmByte.

Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Avi Kivity [Thu, 29 May 2008 11:20:16 +0000 (14:20 +0300)]

KVM: MMU: Optimize prefetch_page()

Instead of reading each pte individually, read 256 bytes worth of ptes and
batch process them.

Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Guillaume Thouvenin [Tue, 27 May 2008 13:13:28 +0000 (15:13 +0200)]

KVM: x86 emulator: Add support for mov r, sreg (0x8c) instruction

Add support for mov r, sreg (0x8c) instruction

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Guillaume Thouvenin [Tue, 27 May 2008 12:49:15 +0000 (14:49 +0200)]

KVM: x86 emulator: Add support for mov seg, r (0x8e) instruction

Add support for mov r, sreg (0x8c) instruction.

[avi: drop the sreg decoding table in favor of 1:1 encoding]

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Guillaume Thouvenin [Tue, 27 May 2008 08:19:16 +0000 (10:19 +0200)]

KVM: x86 emulator: adds support to mov r,imm (opcode 0xb8) instruction

Add support to mov r, imm (0xb8) instruction.

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Guillaume Thouvenin [Tue, 27 May 2008 08:19:08 +0000 (10:19 +0200)]

KVM: x86 emulator: add support for jmp far 0xea

Add support for jmp far (opcode 0xea) instruction.

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Guillaume Thouvenin [Tue, 27 May 2008 08:22:20 +0000 (10:22 +0200)]

KVM: x86 emulator: Update c->dst.bytes in decode instruction

Update c->dst.bytes in decode instruction instead of instruction
itself. It's needed because if c->dst.bytes is equal to 0, the
instruction is not emulated.

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Guillaume Thouvenin [Tue, 27 May 2008 08:18:46 +0000 (10:18 +0200)]

KVM: Prefixes segment functions that will be exported with "kvm_"

Prefixes functions that will be exported with kvm_.
We also prefixed set_segment() even if it still static
to be coherent.

signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Avi Kivity [Mon, 26 May 2008 17:06:35 +0000 (20:06 +0300)]

KVM: MTRR support

Add emulation for the memory type range registers, needed by VMware esx 3.5,
and by pci device assignment.

Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Avi Kivity [Tue, 27 May 2008 13:26:01 +0000 (16:26 +0300)]

KVM: Order segment register constants in the same way as cpu operand encoding

This can be used to simplify the x86 instruction decoder.

Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Sheng Yang [Thu, 15 May 2008 10:23:25 +0000 (18:23 +0800)]

KVM: VMX: Enable NMI with in-kernel irqchip

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Sheng Yang [Thu, 15 May 2008 01:52:48 +0000 (09:52 +0800)]

KVM: IOAPIC/LAPIC: Enable NMI support

[avi: fix ia64 build breakage]

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Avi Kivity [Sun, 25 May 2008 11:38:15 +0000 (14:38 +0300)]

KVM: Remove unnecessary ->decache_regs() call

Since we aren't modifying any register, there's no need to decache
the register state.

Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Avi Kivity [Tue, 13 May 2008 13:29:20 +0000 (16:29 +0300)]

KVM: Remove decache_vcpus_on_cpu() and related callbacks

Obsoleted by the vmx-specific per-cpu list.

Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Avi Kivity [Tue, 13 May 2008 13:22:47 +0000 (16:22 +0300)]

KVM: VMX: Add list of potentially locally cached vcpus

VMX hardware can cache the contents of a vcpu's vmcs.  This cache needs
to be flushed when migrating a vcpu to another cpu, or (which is the case
that interests us here) when disabling hardware virtualization on a cpu.

The current implementation of decaching iterates over the list of all vcpus,
picks the ones that are potentially cached on the cpu that is being offlined,
and flushes the cache.  The problem is that it uses mutex_trylock() to gain
exclusive access to the vcpu, which fires off a (benign) warning about using
the mutex in an interrupt context.

To avoid this, and to make things generally nicer, add a new per-cpu list
of potentially cached vcus.  This makes the decaching code much simpler.  The
list is vmx-specific since other hardware doesn't have this issue.

[andrea: fix crash on suspend/resume]

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Avi Kivity [Tue, 13 May 2008 10:23:38 +0000 (13:23 +0300)]

KVM: Handle virtualization instruction #UD faults during reboot

KVM turns off hardware virtualization extensions during reboot, in order
to disassociate the memory used by the virtualization extensions from the
processor, and in order to have the system in a consistent state.
Unfortunately virtual machines may still be running while this goes on,
and once virtualization extensions are turned off, any virtulization
instruction will #UD on execution.

Fix by adding an exception handler to virtualization instructions; if we get
an exception during reboot, we simply spin waiting for the reset to complete.
If it's a true exception, BUG() so we can have our stack trace.

Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Avi Kivity [Thu, 15 May 2008 10:51:35 +0000 (13:51 +0300)]

KVM: MMU: Fix false flooding when a pte points to page table

The KVM MMU tries to detect when a speculative pte update is not actually
used by demand fault, by checking the accessed bit of the shadow pte. If
the shadow pte has not been accessed, we deem that page table flooded and
remove the shadow page table, allowing further pte updates to proceed
without emulation.

However, if the pte itself points at a page table and only used for write
operations, the accessed bit will never be set since all access will happen
through the emulator.

This is exactly what happens with kscand on old (2.4.x) HIGHMEM kernels.
The kernel points a kmap_atomic() pte at a page table, and then
proceeds with read-modify-write operations to look at the dirty and accessed
bits. We get a false flood trigger on the kmap ptes, which results in the
mmu spending all its time setting up and tearing down shadows.

Fix by setting the shadow accessed bit on emulated accesses.

Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Avi Kivity [Mon, 12 May 2008 16:25:43 +0000 (19:25 +0300)]

KVM: VMX: Trivial vmcs_write64() code simplification

Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Chris Lalancette [Mon, 5 May 2008 17:05:16 +0000 (13:05 -0400)]

KVM: SVM: Fake MSR_K7 performance counters

Attached is a patch that fixes a guest crash when booting older Linux kernels.
The problem stems from the fact that we are currently emulating
MSR_K7_EVNTSEL[0-3], but not emulating MSR_K7_PERFCTR[0-3]. Because of this,
setup_k7_watchdog() in the Linux kernel receives a GPF when it attempts to
write into MSR_K7_PERFCTR, which causes an OOPs.

The patch fixes it by just "fake" emulating the appropriate MSRs, throwing
away the data in the process. This causes the NMI watchdog to not actually
work, but it's not such a big deal in a virtualized environment.

When we get a write to one of these counters, we printk_ratelimit() a warning.
I decided to print it out for all writes, even if the data is 0; it doesn't
seem to make sense to me to special case when data == 0.

Tested by myself on a RHEL-4 guest, and Joerg Roedel on a Windows XP 64-bit
guest.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Aurelien Jarno [Fri, 2 May 2008 15:02:23 +0000 (17:02 +0200)]

KVM: PIT: support mode 3

The in-kernel PIT emulation ignores pending timers if operating
under mode 3, which for example Hurd uses.

This mode should output a square wave, high for (N+1)/2 counts and low
for (N-1)/2 counts. As we only care about the resulting interrupts, the
period is N, and mode 3 is the same as mode 2 with regard to
interrupts.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Anthony Liguori [Wed, 30 Apr 2008 20:37:07 +0000 (15:37 -0500)]

KVM: Handle vma regions with no backing page

This patch allows VMAs that contain no backing page to be used for guest
memory. This is useful for assigning mmio regions to a guest.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Joerg Roedel [Wed, 30 Apr 2008 15:56:04 +0000 (17:56 +0200)]

KVM: SVM: add tracing support for TDP page faults

To distinguish between real page faults and nested page faults they should be
traced as different events. This is implemented by this patch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Joerg Roedel [Wed, 30 Apr 2008 15:56:03 +0000 (17:56 +0200)]

KVM: SVM: add missing kvmtrace markers

This patch adds the missing kvmtrace markers to the svm
module of kvm.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Joerg Roedel [Wed, 30 Apr 2008 15:56:02 +0000 (17:56 +0200)]

KVM: add missing kvmtrace bits

This patch adds some kvmtrace bits to the generic x86 code
where it is instrumented from SVM.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Joerg Roedel [Wed, 30 Apr 2008 15:56:01 +0000 (17:56 +0200)]

KVM: SVM: implement dedicated INTR exit handler

With an exit handler for INTR intercepts its possible to account them using
kvmtrace.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Joerg Roedel [Wed, 30 Apr 2008 15:56:00 +0000 (17:56 +0200)]

KVM: SVM: implement dedicated NMI exit handler

With an exit handler for NMI intercepts its possible to account them using
kvmtrace.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Joerg Roedel [Wed, 30 Apr 2008 15:55:59 +0000 (17:55 +0200)]

KVM: VMX: move APIC_ACCESS trace entry to generic code

This patch moves the trace entry for APIC accesses from the VMX code to the
generic lapic code. This way APIC accesses from SVM will also be traced.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Harvey Harrison [Sun, 27 Apr 2008 19:14:13 +0000 (12:14 -0700)]

KVM: add statics were possible, function definition in lapic.h

Noticed by sparse:
arch/x86/kvm/vmx.c:1583:6: warning: symbol 'vmx_disable_intercept_for_msr' was not declared. Should it be static?
arch/x86/kvm/x86.c:3406:5: warning: symbol 'kvm_task_switch_16' was not declared. Should it be static?
arch/x86/kvm/x86.c:3429:5: warning: symbol 'kvm_task_switch_32' was not declared. Should it be static?
arch/x86/kvm/mmu.c:1968:6: warning: symbol 'kvm_mmu_remove_one_alloc_mmu_page' was not declared. Should it be static?
arch/x86/kvm/mmu.c:2014:6: warning: symbol 'mmu_destroy_caches' was not declared. Should it be static?
arch/x86/kvm/lapic.c:862:5: warning: symbol 'kvm_lapic_get_base' was not declared. Should it be static?
arch/x86/kvm/i8254.c:94:5: warning: symbol 'pit_get_gate' was not declared. Should it be static?
arch/x86/kvm/i8254.c:196:5: warning: symbol '__pit_timer_fn' was not declared. Should it be static?
arch/x86/kvm/i8254.c:561:6: warning: symbol '__inject_pit_timer_intr' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Christian Borntraeger [Mon, 21 Apr 2008 11:48:24 +0000 (13:48 +0200)]

KVM: remove long -> void *user -> long cast

kvm_dev_ioctl casts the arg value to void __user *, just to recast it
again to long. This seems unnecessary.

According to objdump the binary code on x86 is unchanged by this patch.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

commit | commitdiff | tree

Ingo Molnar [Sun, 20 Jul 2008 09:02:06 +0000 (11:02 +0200)]

sched: hrtick_enabled() should use cpu_active()

Peter pointed out that hrtick_enabled() should use cpu_active().

Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Ingo Molnar [Sun, 20 Jul 2008 09:01:29 +0000 (11:01 +0200)]

Merge branch 'sched/urgent' into sched/devel

commit | commitdiff | tree

Peter Zijlstra [Fri, 18 Jul 2008 16:01:23 +0000 (18:01 +0200)]

sched, x86: clean up hrtick implementation

random uvesafb failures were reported against Gentoo:

  http://bugs.gentoo.org/show_bug.cgi?id=222799

and Mihai Moldovan bisected it back to:

> 8f4d37ec073c17e2d4aa8851df5837d798606d6f is first bad commit
> commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f
> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date:   Fri Jan 25 21:08:29 2008 +0100
>
>    sched: high-res preemption tick

Linus suspected it to be hrtick + vm86 interaction and observed:

> Btw, Peter, Ingo: I think that commit is doing bad things. They aren't
> _incorrect_ per se, but they are definitely bad.
>
> Why?
>
> Using random _TIF_WORK_MASK flags is really impolite for doing
> "scheduling" work. There's a reason that arch/x86/kernel/entry_32.S
> special-cases the _TIF_NEED_RESCHED flag: we don't want to exit out of
> vm86 mode unnecessarily.
>
> See the "work_notifysig_v86" label, and how it does that
> "save_v86_state()" thing etc etc.

Right, I never liked having to fiddle with those TIF flags. Initially I
needed it because the hrtimer base lock could not nest in the rq lock.
That however is fixed these days.

Currently the only reason left to fiddle with the TIF flags is remote
wakeups. We cannot program a remote cpu's hrtimer. I've been thinking
about using the new and improved IPI function call stuff to implement
hrtimer_start_on().

However that does require that smp_call_function_single(.wait=0) works
from interrupt context - /me looks at the latest series from Jens - Yes
that does seem to be supported, good.

Here's a stab at cleaning this stuff up ...

Mihai reported test success as well.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Mihai Moldovan <ionic@ionic.de>
Cc: Michal Januszewski <spock@gentoo.org>
Cc: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Mike Travis [Sat, 19 Jul 2008 01:11:34 +0000 (18:11 -0700)]

NR_CPUS: Replace NR_CPUS in speedstep-centrino.c

Some cleanups in speedstep-centrino.c for NR_CPUS=4096.

  * Use new CPUMASK_PTR (instead of old CPUMASK_VAR).

  * Replace arrays sized by NR_CPUS with percpu variables.

  * Cleanup some formatting problems (>80 chars per line)
    and other checkpatch complaints.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Mike Travis [Sat, 19 Jul 2008 01:11:33 +0000 (18:11 -0700)]

cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP

  * Rename CPUMASK_VAR --> CPUMASK_PTR (and simplify)

  * Fix a semantic error in CPUMASK_ALLOC

  * Add a bit of commentry to cpumask.h

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Mike Travis [Sat, 19 Jul 2008 01:11:32 +0000 (18:11 -0700)]

NR_CPUS: Replace NR_CPUS in cpufreq userspace routines

  * Replace arrays sized by NR_CPUS with percpu variables.

    Prior reference: http://marc.info/?l=linux-kernel&m=120251421825989&w=4
    Subject:    [PATCH 1/4] cpufreq: change cpu freq tables to per_cpu variables
    From:       Mike Travis <travis () sgi ! com>
    Date:       2008-02-08 23:37:39

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Mike Travis [Sat, 19 Jul 2008 01:11:31 +0000 (18:11 -0700)]

NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var

* Slight optimization when getting one's own cpu_info percpu data.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Mike Travis [Sat, 19 Jul 2008 01:11:30 +0000 (18:11 -0700)]

NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c

* nr_cpu_ids should be used to determine if a percpu area is
available for a given cpu.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Mike Travis [Sat, 19 Jul 2008 01:11:29 +0000 (18:11 -0700)]

NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c

  * Replace NR_CPUS loop with for_each_possible_cpu().

  * nr_cpu_ids should be used to determine if a percpu area is
    available for a given cpu.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Mike Travis [Sat, 19 Jul 2008 01:11:28 +0000 (18:11 -0700)]

NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c

* Use nr_cpu_ids instead of NR_CPUS to limit traversal of cpu online map.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Mike Travis [Sat, 19 Jul 2008 01:11:27 +0000 (18:11 -0700)]

NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c

* nr_cpu_ids should be used to allocate arrays based on the number of
cpu's present.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Simon Arlott [Sat, 19 Jul 2008 22:32:54 +0000 (23:32 +0100)]

x86: add unknown_nmi_panic kernel parameter

It's not possible to enable the unknown_nmi_panic sysctl option
until init is run. It's useful to be able to panic the kernel
during boot too, this adds a parameter to enable this option.

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Ingo Molnar [Sun, 20 Jul 2008 07:31:24 +0000 (09:31 +0200)]

x86, VisWS: turn into generic arch, eliminate leftover files

remove unused leftovers.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Yinghai Lu [Sun, 20 Jul 2008 01:02:26 +0000 (18:02 -0700)]

x86: add ->pre_time_init to x86_quirks

so NUMAQ can use that to call numaq_pre_time_init()

This allows us to remove a NUMAQ special from arch/x86/kernel/setup.c.

(and paves the way to remove the NUMAQ subarch)

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Yinghai Lu [Sun, 20 Jul 2008 01:01:16 +0000 (18:01 -0700)]

x86: extend and use x86_quirks to clean up NUMAQ code

add these new x86_quirks methods:

int *mpc_record;
int (*mpc_apic_id)(struct mpc_config_processor *m);
void (*mpc_oem_bus_info)(struct mpc_config_bus *m, char *name);
void (*mpc_oem_pci_bus)(struct mpc_config_bus *m);
void (*smp_read_mpc_oem)(struct mp_config_oemtable *oemtable,
unsigned short oemsize);

... and move NUMAQ related mps table handling to numaq_32.c.

also move the call to smp_read_mpc_oem() to smp_read_mpc() directly.

Should not change functionality, albeit it would be nice to get it
tested on real NUMAQ as well ...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Yinghai Lu [Sat, 19 Jul 2008 09:07:25 +0000 (02:07 -0700)]

x86: introduce x86_quirks

introduce x86_quirks array of boot-time quirk methods.

No change in functionality intended.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Yinghai Lu [Fri, 18 Jul 2008 23:16:23 +0000 (16:16 -0700)]

x86: improve debug printout: add target bootmem range in early_res_to_bootmem()

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Jussi Kivilinna [Sun, 20 Jul 2008 07:08:47 +0000 (00:08 -0700)]

net_sched: Add size table for qdiscs

Add size table functions for qdiscs and calculate packet size in
qdisc_enqueue().

Based on patch by Patrick McHardy
http://marc.info/?l=linux-netdev&m=115201979221729&w=2

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jussi Kivilinna [Sun, 20 Jul 2008 07:08:27 +0000 (00:08 -0700)]

net_sched: Add accessor function for packet length for qdiscs

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jussi Kivilinna [Sun, 20 Jul 2008 07:08:04 +0000 (00:08 -0700)]

net_sched: Add qdisc_enqueue wrapper

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yinghai Lu [Fri, 18 Jul 2008 20:22:36 +0000 (13:22 -0700)]

x86, pci: detect end_bus_number according to acpi/e820 reserved, v2

Jack Howarth reported that 2.6.26-rc9-git9 doesn't boot on MacBookPro2.

the reason is a faulty BIOS update that reportes faulty resources.

Nevertheless it's possible for Linux to be more resolent about this
situation (and similar situations) and work around this bug, by
cross-checking the mmconf range against the e820 table and ACPI resources.

Change the mconf bus range from [0,0xff] to to [0, 0x3f]
to match range [0xf0000000, 0xf4000000) in e820 tables.

[ v2, yhlu.kernel@gmail.com:
x86, pci: detect end_bus_number according to acpi/e820 reserved - fix ]

Reported-by: Jack Howarth <howarth@bromo.msbb.uc.edu>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: jbarnes@virtuousgeek.org
Cc: Jack Howarth <howarth@bromo.msbb.uc.edu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Ingo Molnar [Thu, 17 Jul 2008 22:26:59 +0000 (00:26 +0200)]

Subject: devmem, x86: fix rename of CONFIG_NONPROMISC_DEVMEM
From: Arjan van de Ven <arjan@infradead.org>
Date: Sat, 19 Jul 2008 15:47:17 -0700

CONFIG_NONPROMISC_DEVMEM was a rather confusing name - but renaming it
to CONFIG_PROMISC_DEVMEM causes problems on architectures that do not
support this feature; this patch renames it to CONFIG_STRICT_DEVMEM,
so that architectures can opt-in into it.

( the polarity of the option is still the same as it was originally; it
needs to be for now to not break architectures that don't have the
infastructure yet to support this feature)

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: "V.Radhakrishnan" <rk@atr-labs.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---

commit | commitdiff | tree

David S. Miller [Sun, 20 Jul 2008 05:39:46 +0000 (22:39 -0700)]

highmem: Export totalhigh_pages.

Hash et al. sizing code in SCTP wants to make the
calculation totalram_pages - totalhigh_pages, just
like TCP. But this requires an export for the
CONFIG_HIGHMEM case to work.

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

YOSHIFUJI Hideaki [Sun, 20 Jul 2008 05:36:07 +0000 (22:36 -0700)]

ipv6 mcast: Omit redundant address family checks in ip6_mc_source().

The caller has alredy checked for them.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

YOSHIFUJI Hideaki [Sun, 20 Jul 2008 05:35:47 +0000 (22:35 -0700)]

net: Use standard structures for generic socket address structures.

Use sockaddr_storage{} for generic socket address storage
and ensures proper alignment.
Use sockaddr{} for pointers to omit several casts.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

YOSHIFUJI Hideaki [Sun, 20 Jul 2008 05:35:03 +0000 (22:35 -0700)]

ipv6 netns: Make several "global" sysctl variables namespace aware.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

YOSHIFUJI Hideaki [Sun, 20 Jul 2008 05:34:43 +0000 (22:34 -0700)]

netns: Use net_eq() to compare net-namespaces for optimization.

Without CONFIG_NET_NS, namespace is always &init_net.
Compiler will be able to omit namespace comparisons with this patch.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Suresh Siddha [Fri, 18 Jul 2008 22:58:35 +0000 (15:58 -0700)]

x86: APIC: Remove apic_write_around(); use alternatives, merge fix

On Fri, Jul 18, 2008 at 02:03:48PM -0700, Ingo Molnar wrote:
>
> * Yinghai Lu <yhlu.kernel@gmail.com> wrote:
>
> > > git merge tip/x86/x2apic
> > CONFLICT (content): Merge conflict in arch/x86/kernel/Makefile
> > CONFLICT (content): Merge conflict in arch/x86/kernel/paravirt.c
> > CONFLICT (content): Merge conflict in arch/x86/kernel/smpboot.c
> > CONFLICT (content): Merge conflict in arch/x86/kernel/vmi_32.c
> > CONFLICT (content): Merge conflict in arch/x86/xen/enlighten.c
> > CONFLICT (content): Merge conflict in include/asm-x86/apic.h
> > CONFLICT (content): Merge conflict in include/asm-x86/paravirt.h
>
> that's due to the changes in tip/x86/apic and in tip/x86/uv.
>
> ok, i've just merged x86/apic into x86/x2apic and x86/uv as well, and
> pushed out the result.
>
> Note: it's a first raw merge and completely untested. It will now merge
> cleanly into tip/master. There are probably a few details missing.

Ingo, thanks for doing this. While I was testing my merge changes, you posted
yours... anyhow we need this piece, which is missing from your merge.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Zachary Amsden <zach@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Pekka Enberg [Sat, 19 Jul 2008 11:17:22 +0000 (14:17 +0300)]

slub: dump more data on slab corruption

The limit of 128 bytes is too small when debugging slab corruption of the skb
cache, for example. So increase the limit to PAGE_SIZE to make debugging
corruptions easier.

Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

commit | commitdiff | tree

Michael Hennerich [Sat, 19 Jul 2008 09:16:07 +0000 (17:16 +0800)]

Blackfin arch: Apply Bluetechnix CM-BF527 board support patch

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>

commit | commitdiff | tree

Robin Getz [Sat, 26 Jul 2008 11:45:46 +0000 (19:45 +0800)]

Blackfin arch: Add unwinding for stack info, and a little more detail on trace buffer

Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org>
Signed-off-by: Bryan Wu <cooloney@kernel.org>

commit | commitdiff | tree

Michael Hennerich [Sat, 26 Jul 2008 08:14:57 +0000 (16:14 +0800)]

Blackfin arch: Add ISP1760 board resources to BF548-EZKIT

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>

commit | commitdiff | tree

Michael Hennerich [Sat, 19 Jul 2008 08:56:53 +0000 (16:56 +0800)]

Blackfin arch: fix bug - detect 0.1 silicon revision BF527-EZKIT as 0.0 version

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>

commit | commitdiff | tree

Mike Frysinger [Sat, 19 Jul 2008 08:43:51 +0000 (16:43 +0800)]

Blackfin arch: add missing IORESOURCE_MEM flags to UART3

Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>

commit | commitdiff | tree

Graf Yang [Sat, 19 Jul 2008 07:54:10 +0000 (15:54 +0800)]

Blackfin arch: Add return value check in bfin_sir_probe(), remove SSYNC().

Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>

commit | commitdiff | tree

Thomas Gleixner [Sat, 19 Jul 2008 07:33:21 +0000 (09:33 +0200)]

nohz: adjust tick_nohz_stop_sched_tick() call of s390 as well

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

commit | commitdiff | tree

David Howells [Sat, 19 Jul 2008 07:44:32 +0000 (00:44 -0700)]

sparc: Remove Sparc's asm-offsets for sclow.S

Remove Sparc's asm-offsets for sclow.S as the (E)UID/(E)GID size and
offset definitions will cease to be correct if COW credentials are
merged.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sonic Zhang [Sat, 19 Jul 2008 07:42:41 +0000 (15:42 +0800)]

Blackfin arch: Extend sram malloc to handle L2 SRAM.

Extend system call to alloc L2 SRAM in application.
Automatically move following sections to L2 SRAM:
1. kernel built-in l2 attribute section
2. kernel module l2 attribute section
3. elf-fdpic application l2 attribute section

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>

commit | commitdiff | tree

David S. Miller [Sat, 19 Jul 2008 07:30:39 +0000 (00:30 -0700)]

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6

commit | commitdiff | tree

Denis V. Lunev [Sat, 19 Jul 2008 07:29:42 +0000 (00:29 -0700)]

ipv6: remove unused macros from net/ipv6.h

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Denis V. Lunev [Sat, 19 Jul 2008 07:28:58 +0000 (00:28 -0700)]

ipv6: remove unused parameter from ip6_ra_control

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Daniel Lezcano [Sat, 19 Jul 2008 07:15:13 +0000 (00:15 -0700)]

tcp: fix kernel panic with listening_get_next

# BUG: unable to handle kernel NULL pointer dereference at
0000000000000038
IP: [<ffffffff821ed01e>] listening_get_next+0x50/0x1b3
PGD 11e4b9067 PUD 11d16c067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
CPU 3
Modules linked in: bridge ipv6 button battery ac loop dm_mod tg3 ext3
jbd edd fan thermal processor thermal_sys hwmon sg sata_svw libata dock
serverworks sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table]
Pid: 3368, comm: slpd Not tainted 2.6.26-rc2-mm1-lxc4 #1
RIP: 0010:[<ffffffff821ed01e>] [<ffffffff821ed01e>]
listening_get_next+0x50/0x1b3
RSP: 0018:ffff81011e1fbe18 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8100be0ad3c0 RCX: ffff8100619f50c0
RDX: ffffffff82475be0 RSI: ffff81011d9ae6c0 RDI: ffff8100be0ad508
RBP: ffff81011f4f1240 R08: 00000000ffffffff R09: ffff8101185b6780
R10: 000000000000002d R11: ffffffff820fdbfa R12: ffff8100be0ad3c8
R13: ffff8100be0ad6a0 R14: ffff8100be0ad3c0 R15: ffffffff825b8ce0
FS: 00007f6a0ebd16d0(0000) GS:ffff81011f424540(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000038 CR3: 000000011dc20000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process slpd (pid: 3368, threadinfo ffff81011e1fa000, task
ffff81011f4b8660)
Stack: 00000000000002ee ffff81011f5a57c0 ffff81011f4f1240
ffff81011e1fbe90
0000000000001000 0000000000000000 00007fff16bf2590 ffffffff821ed9c8
ffff81011f5a57c0 ffff81011d9ae6c0 000000000000041a ffffffff820b0abd
Call Trace:
[<ffffffff821ed9c8>] ? tcp_seq_next+0x34/0x7e
[<ffffffff820b0abd>] ? seq_read+0x1aa/0x29d
[<ffffffff820d21b4>] ? proc_reg_read+0x73/0x8e
[<ffffffff8209769c>] ? vfs_read+0xaa/0x152
[<ffffffff82097a7d>] ? sys_read+0x45/0x6e
[<ffffffff8200bd2b>] ? system_call_after_swapgs+0x7b/0x80

Code: 31 a9 25 00 e9 b5 00 00 00 ff 45 20 83 7d 0c 01 75 79 4c 8b 75 10
48 8b 0e eb 1d 48 8b 51 20 0f b7 45 08 39 02 75 0e 48 8b 41 28 <4c> 39
78 38 0f 84 93 00 00 00 48 8b 09 48 85 c9 75 de 8b 55 1c
RIP [<ffffffff821ed01e>] listening_get_next+0x50/0x1b3
RSP <ffff81011e1fbe18>
CR2: 0000000000000038

This kernel panic appears with CONFIG_NET_NS=y.

How to reproduce ?

    On the buggy host (host A)
       * ip addr add 1.2.3.4/24 dev eth0

    On a remote host (host B)
       * ip addr add 1.2.3.5/24 dev eth0
       * iptables -A INPUT -p tcp -s 1.2.3.4 -j DROP
       * ssh 1.2.3.4

    On host A:
       * netstat -ta or cat /proc/net/tcp

This bug happens when reading /proc/net/tcp[6] when there is a req_sock
at the SYN_RECV state.

When a SYN is received the minisock is created and the sk field is set to
NULL. In the listening_get_next function, we try to look at the field
req->sk->sk_net.

When looking at how to fix this bug, I noticed that is useless to do
the check for the minisock belonging to the namespace. A minisock belongs
to a listen point and this one is per namespace, so when browsing the
minisock they are always per namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Robin Getz [Sat, 19 Jul 2008 07:11:15 +0000 (15:11 +0800)]

Blackfin arch: Remove useless config option.

Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org>
Signed-off-by: Bryan Wu <cooloney@kernel.org>

commit | commitdiff | tree

Adam Langley [Sat, 19 Jul 2008 07:07:02 +0000 (00:07 -0700)]

tcp: Remove redundant checks when setting eff_sacks

Remove redundant checks when setting eff_sacks and make the number of SACKs a
compile time constant. Now that the options code knows how many SACK blocks can
fit in the header, we don't need to have the SACK code guessing at it.

Signed-off-by: Adam Langley <agl@imperialviolet.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Adam Langley [Sat, 19 Jul 2008 07:04:31 +0000 (00:04 -0700)]

tcp: options clean up

This should fix the following bugs:
  * Connections with MD5 signatures produce invalid packets whenever SACK
    options are included
  * MD5 signatures are counted twice in the MSS calculations

Behaviour changes:
  * A SYN with MD5 + SACK + TS elicits a SYNACK with MD5 + SACK

    This is because we can't fit any SACK blocks in a packet with MD5 + TS
    options. There was discussion about disabling SACK rather than TS in
    order to fit in better with old, buggy kernels, but that was deemed to
    be unnecessary.

  * SYNs with MD5 don't include a TS option

    See above.

Additionally, it removes a bunch of duplicated logic for calculating options,
which should help avoid these sort of issues in the future.

Signed-off-by: Adam Langley <agl@imperialviolet.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Adam Langley [Sat, 19 Jul 2008 07:01:42 +0000 (00:01 -0700)]

tcp: Fix MD5 signatures for non-linear skbs

Currently, the MD5 code assumes that the SKBs are linear and, in the case
that they aren't, happily goes off and hashes off the end of the SKB and
into random memory.

Reported by Stephen Hemminger in [1]. Advice thanks to Stephen and Evgeniy
Polyakov. Also includes a couple of missed route_caps from Stephen's patch
in [2].

[1] http://marc.info/?l=linux-netdev&m=121445989106145&w=2
[2] http://marc.info/?l=linux-netdev&m=121459157816964&w=2

Signed-off-by: Adam Langley <agl@imperialviolet.org>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sonic Zhang [Sat, 19 Jul 2008 06:51:31 +0000 (14:51 +0800)]

Blackfin arch: change L1 malloc to base on slab cache and lists.

Remove the sram piece limitation and improve the performance to
alloc/free sram piece data.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>

commit | commitdiff | tree

Vlad Yasevich [Sat, 19 Jul 2008 06:08:21 +0000 (23:08 -0700)]

sctp: Update sctp global memory limit allocations.

Update sctp global memory limit allocations to be the same as TCP.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Harvey Harrison [Sat, 19 Jul 2008 06:07:09 +0000 (23:07 -0700)]

sctp: remove unnecessary byteshifting, calculate directly in big-endian

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vlad Yasevich [Sat, 19 Jul 2008 06:06:32 +0000 (23:06 -0700)]

sctp: Allow only 1 listening socket with SO_REUSEADDR

When multiple socket bind to the same port with SO_REUSEADDR,
only 1 can be listining.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vlad Yasevich [Sat, 19 Jul 2008 06:06:07 +0000 (23:06 -0700)]

sctp: Do not leak memory on multiple listen() calls

SCTP permits multiple listen call and on subsequent calls
we leak he memory allocated for the crypto transforms.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vlad Yasevich [Sat, 19 Jul 2008 06:05:40 +0000 (23:05 -0700)]

sctp: Support ipv6only AF_INET6 sockets.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Florian Westphal [Sat, 19 Jul 2008 06:04:39 +0000 (23:04 -0700)]

sctp: Prevent uninitialized memory access

valgrind reports uninizialized memory accesses when running
sctp inside the network simulation cradle simulator:

Conditional jump or move depends on uninitialised value(s)
    at 0x570E34A: sctp_assoc_sync_pmtu (associola.c:1324)
    by 0x57427DA: sctp_packet_transmit (output.c:403)
    by 0x5710EFF: sctp_outq_flush (outqueue.c:824)
    by 0x5710B88: sctp_outq_uncork (outqueue.c:701)
    by 0x5745262: sctp_cmd_interpreter (sm_sideeffect.c:1548)
    by 0x57444B7: sctp_side_effects (sm_sideeffect.c:976)
    by 0x5744460: sctp_do_sm (sm_sideeffect.c:945)
    by 0x572157D: sctp_primitive_ASSOCIATE (primitive.c:94)
    by 0x5725C04: __sctp_connect (socket.c:1094)
    by 0x57297DC: sctp_connect (socket.c:3297)

Conditional jump or move depends on uninitialised value(s)
    at 0x575D3A5: mod_timer (timer.c:630)
    by 0x5752B78: sctp_cmd_hb_timers_start (sm_sideeffect.c:555)
    by 0x5754133: sctp_cmd_interpreter (sm_sideeffect.c:1448)
    by 0x5753607: sctp_side_effects (sm_sideeffect.c:976)
    by 0x57535B0: sctp_do_sm (sm_sideeffect.c:945)
    by 0x571E9AE: sctp_endpoint_bh_rcv (endpointola.c:474)
    by 0x573347F: sctp_inq_push (inqueue.c:104)
    by 0x572EF93: sctp_rcv (input.c:256)
    by 0x5689623: ip_local_deliver_finish (ip_input.c:230)
    by 0x5689759: ip_local_deliver (ip_input.c:268)
    by 0x5689CAC: ip_rcv_finish (dst.h:246)

#1 is due to "if (t->pmtu_pending)".
8a4794914f9cf2681235ec2311e189fe307c28c7 "[SCTP] Flag a pmtu change request"
suggests it should be initialized to 0.

#2 is the heartbeat timer 'expires' value, which is uninizialised, but
test by mod_timer().
T3_rtx_timer seems to be affected by the same problem, so initialize it, too.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Florian Westphal [Sat, 19 Jul 2008 06:03:44 +0000 (23:03 -0700)]

sctp: Don't abort initialization when CONFIG_PROC_FS=n

This puts CONFIG_PROC_FS defines around the proc init/exit functions
and also avoids compiling proc.c if procfs is not supported.
Also make SCTP_DBG_OBJCNT depend on procfs.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Stephen Hemminger [Sat, 19 Jul 2008 06:02:15 +0000 (23:02 -0700)]

tcp: RTT metrics scaling

Some of the metrics (RTT, RTTVAR and RTAX_RTO_MIN) are stored in
kernel units (jiffies) and this leaks out through the netlink API to
user space where the units for jiffies are unknown.

This patches changes the kernel to convert to/from milliseconds. This
changes the ABI, but milliseconds seemed like the most natural unit
for these parameters. Values available via syscall in
/proc/net/rt_cache and netlink will be in milliseconds.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Sat, 19 Jul 2008 06:00:11 +0000 (23:00 -0700)]

pkt_sched: Fix noqueue_qdisc initialization.

Like noop_qdisc, it needs a dummy backpointer and
explicit qdisc->q.lock initialization.

Based upon a report by Stephen Hemminger.

Signed-off-by: David S. Miller <davem@davemloft.net>

Unnamed repository; edit this file 'description' to name the repository.

RSS Atom