Artem Bityutskiy [Mon, 16 Jun 2008 10:35:23 +0000 (13:35 +0300)]
UBI: fix LEB locking
leb_read_unlock() may be called simultaniously by several tasks.
The would race at the following code:
up_read(&le->mutex);
if (free)
kfree(le);
And it is possible that one task frees 'le' before the other tasks
do 'up_read()'. Fix this by doing up_read and free inside the
'ubi->ltree' lock. Below it the oops we had because of this:
BUG: spinlock bad magic on CPU#0, integck/7504
BUG: unable to handle kernel paging request at 6b6b6c4f
IP: [<c0211221>] spin_bug+0x5c/0xdb
*pde = 00000000 Oops: 0000 [#1] PREEMPT SMP Modules linked in: ubifs ubi nandsim nand nand_ids nand_ecc video output
Normally UBI volumes are freed in the release function of
the struct device object. However, on error path they may
have to be freed before the struct device objects have been
initialized.
ubi_free_volume() function sets ubi->volumes[] to NULL, so
ubi_eba_close() is useless, it does not free what has to be freed.
So zap it and free vol->eba_tbl at the volume release function.
Pointed-out-by: Adrian Hunter <ext-adrian.hunter@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
x86: split spinlock implementations out into their own files
ftrace requires certain low-level code, like spinlocks and timestamps,
to be compiled without -pg in order to avoid infinite recursion. This
patch splits out the core paravirt spinlocks and the Xen spinlocks
into separate files which can be compiled without -pg.
Also do xen/time.c while we're about it. As a result, we can now use
ftrace within a Xen domain.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Call early_cpu_init() at the same (early) point in setup_arch().
The x86_64 code was calling it relatively late, after when other arch
code need to do cpu-related setup which depends on it.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Mark McLoughlin <markmc@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dump the "flows" number according to the number of active flows
instead of repeating the "limit".
Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mv643xx_eth: enable hardware TX checksumming with vlan tags
Although mv643xx_eth has no hardware support for inserting a vlan
tag by twiddling some bits in the TX descriptor, it does support
hardware TX checksumming on packets where the IP header starts {a
limited set of values other than 14} bytes into the packet.
This patch sets mv643xx_eth's ->vlan_features to NETIF_F_SG |
NETIF_F_IP_CSUM, which prevents the stack from checksumming vlan'ed
packets in software, and if vlan tags are present on a transmitted
packet, notifies the hardware of this fact by toggling the right
bits in the TX descriptor.
mv643xx_eth: use auto phy polling for configuring (R)(G)MII interface
The mv643xx_eth hardware has a provision for polling the PHY's
MII management registers to obtain the (R)(G)MII interface speed
(10/100/1000) and duplex (half/full) and pause (off/symmetric)
settings to use to talk to the PHY.
The driver currently does not make use of this feature. Instead,
whenever there is a link status change event, it reads the current
link parameters from the PHY, and programs those parameters into
the mv643xx_eth MAC by hand.
This patch switches the mv643xx_eth driver to letting the MAC
auto-determine the (R)(G)MII link parameters by PHY polling, if there
is a PHY present. For PHYless ports (when e.g. the (R)(G)MII
interface is connected to a hardware switch), we keep hardcoding the
MII interface parameters.
The mv643xx_eth driver is limiting DMA bursts to 32 bytes, while
using the largest burst size (128 bytes) gives a couple percentage
points performance improvement in throughput tests, and the docs
say that the 128 byte default should not need to be changed, so
use 128 byte bursts instead.
mv643xx_eth: also check TX_IN_PROGRESS when disabling transmit path
The recommended sequence for waiting for the transmit path to clear
after disabling all of the transmit queues is to wait for the
TX_FIFO_EMPTY bit in the Port Status register to become set as well
as the TX_IN_PROGRESS bit to clear.
mv643xx_eth: don't fiddle with maximum receive packet size setting
The maximum receive packet size field in the Port Serial Control
register controls at what size received packets are flagged
overlength in the receive descriptor, but it doesn't prevent
overlength packets from being DMAd to memory and signaled to the
host like other received packets.
mv643xx_eth does not support receiving jumbo frames in 10/100 mode,
but setting the packet threshold to larger than 1522 bytes in 10/100
mode won't cause breakage by itself.
If we really want to enforce maximum packet size on the receiving
end instead of on the sending end where it should be done, we can
always just add a length check to the software receive handler
instead of relying on the hardware to do the comparison for us.
What's more, changing the maximum packet size field requires
temporarily disabling the RX/TX paths. So once the link comes
up in 10/100 Mb/s mode or 1000 Mb/s mode, we'd have to disable it
again just to set the right maximum packet size field (1522 in
10/100 Mb/s mode or 9700 in 1000 Mb/s mode), just so that we can
offload one comparison operation to hardware that we might as well
do in software, assuming that we'd want to do it at all.
Contrary to what the documentation suggests, there is no harm in
just setting a 9700 byte maximum packet size in 10/100 mode, so use
the maximum maximum packet size for all modes.
The mv643xx_eth driver allows doing transmit reclaim from within the
napi poll routine, but after doing reclaim, it would forget to check
the free transmit descriptor count and wake up the transmit queue if
the reclaim caused enough descriptors for a new packet to become
available. This would cause the netdev watchdog to occasionally kick
in during certain workloads with combined receive and transmit traffic.
Fix this by adding a wakeup check identical to the one in the
interrupt handler to the napi poll routine.
mv643xx_eth: prevent breakage when link goes down during transmit
When the ethernet link goes down while mv643xx_eth is transmitting
data, transmit DMA can stop before all queued transmit descriptors
have been processed. But even the descriptors that _have_ been
processed might not be properly marked as done before the transmit
DMA unit shuts down.
Then when the link comes up again, the hardware transmit pointer
might have advanced while not all previous packet descriptors have
been marked as transmitted, causing software transmit reclaim to
hang waiting for the hardware to finish transmitting a descriptor
that it has already skipped.
This patch forcibly reclaims all packets on the transmit ring on a
link down interrupt, and then resyncs the hardware transmit pointer to
what the software's idea of the first free descriptor is. Also, we
need to prevent re-waking the transmit queue if we get a 'transmit
done' interrupt at the same time as a 'link down' interrupt, which
this patch does as well.
The previously merged TX hang erratum workaround ("mv643xx_eth:
work around TX hang hardware issue") assumes that TX_END interrupts
are delivered simultaneously with or after their corresponding TX
interrupts, but this is not always true in practise.
In particular, it appears that TX_END interrupts are issued as soon
as descriptor fetch returns an invalid descriptor, which may happen
before earlier descriptors have been fully transmitted and written
back to memory as being done.
This hardware behavior can lead to a situation where the current
driver code mistakenly assumes that the MAC has given up transmitting
before noticing the packets that it is in fact still currently working
on, causing the driver to re-kick the transmit queue, which will only
cause the MAC to re-fetch the invalid head descriptor, and generate
another TX_END interrupt, et cetera, until the packets in the pipe
finally finish transmitting and have their descriptors written back
to memory, which will then finally break the loop.
Fix this by having the erratum workaround not check the 'number of
unfinished descriptor', but instead, to compare the software's idea
of what the head descriptor pointer should be to the hardware's head
descriptor pointer (which is updated on the same conditions as the
TX_END interupt is generated on, i.e. possibly before all previous
descriptors have been transmitted and written back).
Merge branch 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: hrtick_enabled() should use cpu_active()
sched, x86: clean up hrtick implementation
sched: fix build error, provide partition_sched_domains() unconditionally
sched: fix warning in inc_rt_tasks() to not declare variable 'rq' if it's not needed
cpu hotplug: Make cpu_active_map synchronization dependency clear
cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)
sched: rework of "prioritize non-migratable tasks over migratable ones"
sched: reduce stack size in isolated_cpu_setup()
Revert parts of "ftrace: do not trace scheduler functions"
Fixed up conflicts in include/asm-x86/thread_info.h (due to the
TIF_SINGLESTEP unification vs TIF_HRTICK_RESCHED removal) and
kernel/sched_fair.c (due to cpu_active_map vs for_each_cpu_mask_nr()
introduction).
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
Merge branch 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
NR_CPUS: Replace NR_CPUS in speedstep-centrino.c
cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP
NR_CPUS: Replace NR_CPUS in cpufreq userspace routines
NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c
cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix
cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target
cpumask: Provide a generic set of CPUMASK_ALLOC macros
cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c
cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c
cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c
cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c
cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c
cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr
Revert "cpumask: introduce new APIs"
cpumask: make for_each_cpu_mask a bit smaller
net: Pass reference to cpumask variable in net/sunrpc/svc.c
...
Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually
Merge branch 'core/softlockup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core/softlockup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
softlockup: fix invalid proc_handler for softlockup_panic
softlockup: fix watchdog task wakeup frequency
softlockup: fix watchdog task wakeup frequency
softlockup: show irqtrace
softlockup: print a module list on being stuck
softlockup: fix NMI hangs due to lock race - 2.6.26-rc regression
softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds
softlockup: fix softlockup_thresh fix
softlockup: fix softlockup_thresh unaligned access and disable detection at runtime
softlockup: allow panic on lockup
Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (85 commits)
[ARM] pxa: add base support for PXA930 Handheld Platform (aka SAAR)
[ARM] pxa: add base support for PXA930 Evaluation Board (aka TavorEVB)
[ARM] pxa: add base support for PXA930 (aka Tavor-P)
[ARM] Update mach-types
[ARM] pxa: make littleton to use the new smc91x platform data
[ARM] pxa: make zylonite to use the new smc91x platform data
[ARM] pxa: make mainstone to use the new smc91x platform data
[ARM] pxa: make lubbock to use new smc91x platform data
[NET] smc91x: prepare SMC_USE_PXA_DMA to be specified in platform data
[NET] smc91x: prepare for SMC_IO_SHIFT to be a platform configurable variable
[NET] smc91x: add SMC91X_NOWAIT flag to platform data
[NET] smc91x: favor the use of SMC91X_USE_* instead of SMC_CAN_USE_*
[NET] smc91x: remove "irq_flags" from "struct smc91x_platdata"
[ARM] 5146/1: pxa2xx: convert all boards to call pxa2xx_transceiver_mode helper
Support for LCD on e740 e750 e400 and e800 e-series PDAs
E-series UDC support
PXA UDC - allow use of inverted GPIO for pullup
Add e350 support
Fix broken e-series build
E-series GPIO / IRQ definitions.
...
Harvey Harrison [Thu, 24 Jul 2008 00:45:58 +0000 (17:45 -0700)]
cifs: assorted endian annotations
fs/cifs/cifssmb.c:3917:13: warning: incorrect type in assignment (different base types)
fs/cifs/cifssmb.c:3917:13: expected bool [unsigned] [usertype] is_unicode
fs/cifs/cifssmb.c:3917:13: got restricted __le16
The comment explains why __force is used here.
fs/cifs/connect.c:458:16: warning: cast to restricted __be32
fs/cifs/connect.c:458:16: warning: cast to restricted __be32
fs/cifs/connect.c:458:16: warning: cast to restricted __be32
fs/cifs/connect.c:458:16: warning: cast to restricted __be32
fs/cifs/connect.c:458:16: warning: cast to restricted __be32
fs/cifs/connect.c:458:16: warning: cast to restricted __be32
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
Andre Detsch [Thu, 24 Jul 2008 01:01:54 +0000 (11:01 +1000)]
powerpc/spufs: better placement of spu affinity reference context
This patch adjusts the placement of a reference context from
a spu affinity chain. The reference context can now be placed
only on nodes that have enough spus not intended to be used by
another gang (already running on the node).
Signed-off-by: Andre Detsch <adetsch@br.ibm.com> Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Roland McGrath [Tue, 24 Jun 2008 11:16:52 +0000 (04:16 -0700)]
i386 syscall audit fast-path
This adds fast paths for 32-bit syscall entry and exit when
TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing.
These paths does not need to save and restore all registers as
the general case of tracing does. Avoiding the iret return path
when syscall audit is enabled helps performance a lot.
Andre Detsch [Thu, 24 Jul 2008 00:57:26 +0000 (10:57 +1000)]
powerpc/spufs: fix aff_mutex and cbe_spu_info[n].list_mutex deadlock
Currenlt,, it is possible to lock aff_mutex and
cbe_spu_info[n].list_mutex in different orders, allowing a deadlock to
occur. With this change, aff_mutex is not taken within a list_mutex
critical section anymore.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com> Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Roland McGrath [Tue, 24 Jun 2008 08:13:31 +0000 (01:13 -0700)]
x86_64 ia32 syscall audit fast-path
This adds fast paths for 32-bit syscall entry and exit when
TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing.
These paths does not need to save and restore all registers as
the general case of tracing does. Avoiding the iret return path
when syscall audit is enabled helps performance a lot.
Roland McGrath [Mon, 23 Jun 2008 22:37:04 +0000 (15:37 -0700)]
x86_64 syscall audit fast-path
This adds a fast path for 64-bit syscall entry and exit when
TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing.
This path does not need to save and restore all registers as
the general case of tracing does. Avoiding the iret return path
when syscall audit is enabled helps performance a lot.
Roland McGrath [Tue, 24 Jun 2008 03:41:12 +0000 (20:41 -0700)]
x86_64: remove bogus optimization in sysret_signal
This short-circuit path in sysret_signal looks wrong to me.
AFAICT, in practice the branch is never taken--and if it were,
it would go wrong. To wit, try loading a module whose init
function does set_thread_flag(TIF_IRET), and see insmod crash
(presumably with a wrong user stack pointer).
This is because the FIXUP_TOP_OF_STACK work hasn't been done yet
when we jump around the call to ptregscall_common and get to
int_with_check--where it expects the user RSP,SS,CS and EFLAGS to
have been stored by FIXUP_TOP_OF_STACK.
I don't think it's normally possible to get to sysret_signal with no
_TIF_DO_NOTIFY_MASK bits set anyway, so these two instructions are
already superfluous. If it ever did happen, it is harmless to call
do_notify_resume with nothing for it to do.
David S. Miller [Wed, 23 Jul 2008 23:38:45 +0000 (16:38 -0700)]
tcp: Clear probes_out more aggressively in tcp_ack().
This is based upon an excellent bug report from Eric Dumazet.
tcp_ack() should clear ->icsk_probes_out even if there are packets
outstanding. Otherwise if we get a sequence of ACKs while we do have
packets outstanding over and over again, we'll never clear the
probes_out value and eventually think the connection is too sick and
we'll reset it.
This appears to be some "optimization" added to tcp_ack() in the 2.4.x
timeframe. In 2.2.x, probes_out is pretty much always cleared by
tcp_ack().
Here is Eric's original report:
----------------------------------------
Apparently, we can in some situations reset TCP connections in a couple of seconds when some frames are lost.
In order to reproduce the problem, please try the following program on linux-2.6.25.*
Setup some iptables rules to allow two frames per second sent on loopback interface to tcp destination port 12000
iptables -N SLOWLO
iptables -A SLOWLO -m hashlimit --hashlimit 2 --hashlimit-burst 1 --hashlimit-mode dstip --hashlimit-name slow2 -j ACCEPT
iptables -A SLOWLO -j DROP
iptables -A OUTPUT -o lo -p tcp --dport 12000 -j SLOWLO
Then run the attached program and see the output :
# ./loop
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,1)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,3)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,5)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,7)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,9)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,11)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,201ms,13)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,188ms,15)
write(): Connection timed out
wrote 890 bytes but was interrupted after 9 seconds
ESTAB 0 0 127.0.0.1:12000 127.0.0.1:54455
Exiting read() because no data available (4000 ms timeout).
read 860 bytes
While this tcp session makes progress (sending frames with 50 bytes of payload, every 500ms), linux tcp stack decides to reset it, when tcp_retries 2 is reached (default value : 15)
tcpdump :
15:30:28.856695 IP 127.0.0.1.56554 > 127.0.0.1.12000: S 33788768:33788768(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
15:30:28.856711 IP 127.0.0.1.12000 > 127.0.0.1.56554: S 33899253:33899253(0) ack 33788769 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
15:30:29.356947 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 1:61(60) ack 1 win 257
15:30:29.356966 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 61 win 257
15:30:29.866415 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 61:111(50) ack 1 win 257
15:30:29.866427 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 111 win 257
15:30:30.366516 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 111:161(50) ack 1 win 257
15:30:30.366527 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 161 win 257
15:30:30.876196 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 161:211(50) ack 1 win 257
15:30:30.876207 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 211 win 257
15:30:31.376282 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 211:261(50) ack 1 win 257
15:30:31.376290 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 261 win 257
15:30:31.885619 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 261:311(50) ack 1 win 257
15:30:31.885631 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 311 win 257
15:30:32.385705 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 311:361(50) ack 1 win 257
15:30:32.385715 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 361 win 257
15:30:32.895249 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 361:411(50) ack 1 win 257
15:30:32.895266 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 411 win 257
15:30:33.395341 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 411:461(50) ack 1 win 257
15:30:33.395351 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 461 win 257
15:30:33.918085 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 461:511(50) ack 1 win 257
15:30:33.918096 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 511 win 257
15:30:34.418163 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 511:561(50) ack 1 win 257
15:30:34.418172 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 561 win 257
15:30:34.927685 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 561:611(50) ack 1 win 257
15:30:34.927698 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 611 win 257
15:30:35.427757 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 611:661(50) ack 1 win 257
15:30:35.427766 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 661 win 257
15:30:35.937359 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 661:711(50) ack 1 win 257
15:30:35.937376 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 711 win 257
15:30:36.437451 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 711:761(50) ack 1 win 257
15:30:36.437464 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 761 win 257
15:30:36.947022 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 761:811(50) ack 1 win 257
15:30:36.947039 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 811 win 257
15:30:37.447135 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 811:861(50) ack 1 win 257
15:30:37.447203 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 861 win 257
15:30:41.448171 IP 127.0.0.1.12000 > 127.0.0.1.56554: F 1:1(0) ack 861 win 257
15:30:41.448189 IP 127.0.0.1.56554 > 127.0.0.1.12000: R 33789629:33789629(0) win 0
Source of program :
/*
* small producer/consumer program.
* setup a listener on 127.0.0.1:12000
* Forks a child
* child connect to 127.0.0.1, and sends 10 bytes on this tcp socket every 100 ms
* Father accepts connection, and read all data
*/
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <stdio.h>
#include <time.h>
#include <sys/poll.h>
int port = 12000;
char buffer[4096];
int main(int argc, char *argv[])
{
int lfd = socket(AF_INET, SOCK_STREAM, 0);
struct sockaddr_in socket_address;
time_t t0, t1;
int on = 1, sfd, res;
unsigned long total = 0;
socklen_t alen = sizeof(socket_address);
pid_t pid;
Jan Nikitenko [Wed, 23 Jul 2008 23:27:07 +0000 (01:27 +0200)]
mmc_spi: put signals to low power off fix
The original intention was to write a zero byte to mmc to force spi
signals to low when doing power off. Somehow the spi_w8r8 call got there
so a read followed the write of single zero byte. This patch changes
that to simple write of zero byte without the following read.
This way the power off is more reliable and completely sufficient.
Signed-off-by: Jan Nikitenko <jan.nikitenko@gmail.com> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
posix-timers: fix posix_timer_event() vs dequeue_signal() race
The bug was reported and analysed by Mark McLoughlin <markmc@redhat.com>,
the patch is based on his and Roland's suggestions.
posix_timer_event() always rewrites the pre-allocated siginfo before sending
the signal. Most of the written info is the same all the time, but memset(0)
is very wrong. If ->sigq is queued we can race with collect_signal() which
can fail to find this siginfo looking at .si_signo, or copy_siginfo() can
copy the wrong .si_code/si_tid/etc.
In short, sys_timer_settime() can in fact stop the active timer, or the user
can receive the siginfo with the wrong .si_xxx values.
Move "memset(->info, 0)" from posix_timer_event() to alloc_posix_timer(),
change send_sigqueue() to set .si_overrun = 0 when ->sigq is not queued.
It would be nice to move the whole sigq->info initialization from send to
create path, but this is not easy to do without uglifying timer_create()
further.
As Roland rightly pointed out, we need more cleanups/fixes here, see the
"FIXME" comment in the patch. Hopefully this patch makes sense anyway, and
it can mask the most bad implications.
Reported-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Mark McLoughlin <markmc@redhat.com> Cc: Oliver Pinter <oliver.pntr@gmail.com> Cc: Roland McGrath <roland@redhat.com> Cc: stable@kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
kernel/posix-timers.c | 17 +++++++++++++----
kernel/signal.c | 1 +
2 files changed, 14 insertions(+), 4 deletions(-)
Evgeniy Polyakov noticed that drivers/net/e1000e/netdev.c:e1000_netpoll()
was calling e1000_clean_tx_irq() without taking the TX lock.
David Miller suggested to remove the call altogether: since in this
callpah there's periodic calls to ->poll() anyway which will do
e1000_clean_tx_irq() and will garbage-collect any finished TX ring
descriptors.
This fix solved the e1000e+netconsole crashes i've been seeing:
Oliver Hartkopp [Wed, 23 Jul 2008 21:06:04 +0000 (14:06 -0700)]
net: Update entry in af_family_clock_key_strings
In the merge phase of the CAN subsystem the
af_family_clock_key_strings[] have been added to sock.c in commit 443aef0eddfa44c158d1b94ebb431a70638fcab4
(lockdep: fixup sk_callback_lock annotation). This trivial patch adds
the missing name for address family 29 (AF_CAN).
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Jul 2008 21:01:29 +0000 (14:01 -0700)]
netdev: Remove warning from __netif_schedule().
It isn't helping anything and we aren't going to be able to change all
the drivers that do queue wakeups in strange situations.
Just letting a noop_qdisc get scheduled will work because when
qdisc_run() executes via net_tx_work() it will simply find no packets
pending when it makes the ->dequeue() call in qdisc_restart.
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Williams [Thu, 24 Jul 2008 03:05:34 +0000 (20:05 -0700)]
md: fix merge error
The original STRIPE_OP_IO removal patch had the following hunk:
- for (i = conf->raid_disks; i--; ) {
+ for (i = conf->raid_disks; i--; )
set_bit(R5_Wantwrite, &sh->dev[i].flags);
- if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
- sh->ops.count++;
- }
However it appears the hunk became broken after merging:
- for (i = conf->raid_disks; i--; ) {
+ for (i = conf->raid_disks; i--; )
set_bit(R5_Wantwrite, &sh->dev[i].flags);
set_bit(R5_LOCKED, &dev->flags);
s.locked++;
- if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
- sh->ops.count++;
- }
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In file included from /home/v4l/master/v4l/dw2102.c:14:
/home/v4l/master/v4l/z0194a.h:93: error: 'STV0229_LOCKOUTPUT_1' undeclared here (not in a function)
This is due to some typos that were fixed on stv0299.
Hans Verkuil [Sun, 20 Jul 2008 09:35:02 +0000 (06:35 -0300)]
V4L/DVB (8429): videodev: renamed 'class_dev' to 'dev'
The class_dev field is a normal device, not a class device. This is very
confusing and now that the old 'dev' field has been renamed to 'parent'
we can rename 'class_dev' to just 'dev'.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Hans Verkuil [Sun, 20 Jul 2008 11:43:17 +0000 (08:43 -0300)]
V4L/DVB (8427): videodev: split off the ioctl handling into v4l2-ioctl.c
videodev.c became top-heavy so all the ioctl processing has been split off
into v4l2-ioctl.c. This means videodev.c is back to its original purpose:
creating and registering v4l devices.
Since videodev.c and v4l2-ioctl.c should still remain one module (as least
for now) I also had to rename videodev.c to v4l2-dev.c to prevent a
circular dependency when building a videodev.ko module. This is not a bad
thing, since the source and header now have the same name. And the v4l2-
prefix is useful to see which sources are generic v4l2 support code.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
sdhci: highmem capable PIO routines
sg: reimplement sg mapping iterator
mmc_test: print message when attaching to card
mmc: Remove Russell as primecell mci maintainer
mmc_block: bounce buffer highmem support
sdhci: fix bad warning from commit c8b3e02
sdhci: add warnings for bad buffers in ADMA path
mmc_test: test oversized sg lists
mmc_test: highmem tests
s3cmci: ensure host stopped on machine shutdown
au1xmmc: suspend/resume implementation
s3cmci: fixes for section mismatch warnings
pxamci: trivial fix of DMA alignment register bit clearing
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (24 commits)
I/OAT: I/OAT version 3.0 support
I/OAT: tcp_dma_copybreak default value dependent on I/OAT version
I/OAT: Add watchdog/reset functionality to ioatdma
iop_adma: cleanup iop_chan_xor_slot_count
iop_adma: document how to calculate the minimum descriptor pool size
iop_adma: directly reclaim descriptors on allocation failure
async_tx: make async_tx_test_ack a boolean routine
async_tx: remove depend_tx from async_tx_sync_epilog
async_tx: export async_tx_quiesce
async_tx: fix handling of the "out of descriptor" condition in async_xor
async_tx: ensure the xor destination buffer remains dma-mapped
async_tx: list_for_each_entry_rcu() cleanup
dmaengine: Driver for the Synopsys DesignWare DMA controller
dmaengine: Add slave DMA interface
dmaengine: add DMA_COMPL_SKIP_{SRC,DEST}_UNMAP flags to control dma unmap
dmaengine: Add dma_client parameter to device_alloc_chan_resources
dmatest: Simple DMA memcpy test client
dmaengine: DMA engine driver for Marvell XOR engine
iop-adma: fix platform driver hotplug/coldplug
dmaengine: track the number of clients using a channel
...
Fixed up conflict in drivers/dca/dca-sysfs.c manually
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (60 commits)
ide: small whitespace fixes
ide: ide-cd_ioctl.c fix sparse integer as NULL pointer warnings
ide: ide-cd.c fix sparse endianness warnings
ide-cd: convert to using the new atapi_flags
ide: remove unused PC_FLAG_DRQ_INTERRUPT
ide-scsi: convert to using the new atapi_flags
ide-tape: convert to using the new atapi_flags
ide-floppy: convert to using the new atapi_flags (take 2)
ide: add per-device flags
ide: use rq->cmd instead of pc->c in atapi common code
ide-scsi: pass packet command in rq->cmd
ide-tape: pass packet command in rq->cmd
ide-tape: make room for packet command ids in rq->cmd
ide-floppy: pass packet command in rq->cmd
ide: remove pc->callback member from ide_atapi_pc
ide-scsi: use drive->pc_callback instead of pc->callback
ide-tape: use drive->pc_callback instead of pc->callback
ide-floppy: use drive->pc_callback instead of pc->callback
ide: push pc callback pointer into the ide_drive_t structure
drivers/ide/ide-tape.c: remove double kfree
...
Jeff Layton [Wed, 23 Jul 2008 14:11:19 +0000 (10:11 -0400)]
lockdep: annotate cifs in-kernel sockets
Put CIFS sockets in their own class to avoid some lockdep warnings. CIFS
sockets are not exposed to user-space, and so are not subject to the
same deadlock scenarios.
Shaohua Li [Thu, 3 Jul 2008 14:45:38 +0000 (10:45 -0400)]
Input: serio - offload resume to kseriod
When resuming AUX ports psmouse driver calls psmouse_extensions()
to determine if the attached mouse is still the same, which may take
a while to complete for generic mice. Offload the resume process to
kseriod so the rest of the system may continue resuming without
waiting for the mouse.
Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>