pilppa.org Git - linux-2.6-omap-h63xx.git/log
17 years ago mmc_block: wait for card even on failures
Pierre Ossman [Sun, 29 Jun 2008 10:19:47 +0000 (12:19 +0200)]
mmc_block: wait for card even on failures

Many failures are non-permanent, but the card might need some time to
finish what it is doing before becoming responsive again. Make sure we
wait for it to finish programming before dealing with the error.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago sdhci: scatter-gather (ADMA) support
Pierre Ossman [Sat, 28 Jun 2008 16:28:51 +0000 (18:28 +0200)]
sdhci: scatter-gather (ADMA) support

Add support for the scatter-gather DMA mode present on newer controllers.
As the mode requires 32-bit alignment, non-aligned chunks are handled by
using a bounce buffer.

Also add some new quirks to handle controllers that have bugs in the
ADMA engine.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
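As a rough illustration of the alignment rule described in the ADMA commit above, the sketch below computes how many leading bytes of a buffer would have to go through a bounce buffer before the address reaches a 32-bit boundary. The function name `demo_adma_unaligned_head` is hypothetical and is not taken from the sdhci driver.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the sdhci driver's code: number of leading
 * bytes that sit before the next 32-bit (4-byte) boundary. Returns 0
 * when the address is already aligned.
 */
size_t demo_adma_unaligned_head(uintptr_t addr)
{
    return (size_t)((4u - (addr & 3u)) & 3u);
}
```

For an address ending in 0x1 this yields 3 bytes to bounce; for an aligned address it yields 0, so no bounce buffer would be needed for the head of the chunk.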
17 years ago sdhci-pci: don't penalize newer jmicron chips
Pierre Ossman [Sat, 28 Jun 2008 16:21:41 +0000 (18:21 +0200)]
sdhci-pci: don't penalize newer jmicron chips

The upcoming JMicron chips will have solved all the currently known
bugs, so don't penalize them for older problems.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago mmc_test: only bind to supported cards
Pierre Ossman [Sat, 28 Jun 2008 15:51:27 +0000 (17:51 +0200)]
mmc_test: only bind to supported cards

We can only perform the tests on MMC and SD cards, so avoid binding
to any other type.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago sdio: clean up handling of byte mode transfer size
Pierre Ossman [Sat, 28 Jun 2008 11:22:40 +0000 (13:22 +0200)]
sdio: clean up handling of byte mode transfer size

Make sure that the maximum size for a byte mode transfer is identical
in all places. Also tweak the transfer helper so that a single byte
mode transfer is preferred over (possibly multiple) block mode
request(s).

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
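The preference described in the sdio commit above could be sketched roughly as follows, assuming a single limit `max_byte_size` that every caller shares. The enum, the function name and the limit parameter are illustrative only and are not the SDIO core's actual code.

```c
/* Hypothetical transfer modes; names are illustrative only. */
enum demo_xfer_mode {
    DEMO_XFER_BYTE_MODE,
    DEMO_XFER_BLOCK_MODE
};

/*
 * Prefer one byte mode transfer whenever the whole request fits within
 * the byte mode limit; otherwise fall back to block mode.
 */
enum demo_xfer_mode demo_pick_xfer_mode(unsigned int size,
                                        unsigned int max_byte_size)
{
    if (size <= max_byte_size)
        return DEMO_XFER_BYTE_MODE;
    return DEMO_XFER_BLOCK_MODE;
}
```

Keeping the comparison in one helper is one way to make sure the limit is applied identically everywhere, which is the consistency the commit message asks for.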
17 years ago mmc,sdio: helper function for transfer padding
Pierre Ossman [Sat, 28 Jun 2008 10:52:45 +0000 (12:52 +0200)]
mmc,sdio: helper function for transfer padding

There are a lot of crappy controllers out there that cannot handle
all the request sizes that the MMC/SD/SDIO specifications require.
In case the card driver can pad the data to overcome the problems,
this commit adds a helper that calculates how much that padding
should be.

A corresponding helper is also added for SDIO, but it can also deal
with all the complexities of splitting up a large transfer efficiently.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
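A padding helper of the kind described above might look like the sketch below. It assumes, purely for illustration, that the controller only accepts sizes that are a multiple of 4 bytes; the real helper may apply different or additional rules, and the name `demo_pad_transfer_size` is hypothetical.

```c
/*
 * Hypothetical sketch: round a requested transfer size up to the next
 * multiple of 4 bytes. The 4-byte granularity is an assumption made
 * for this example.
 */
unsigned int demo_pad_transfer_size(unsigned int sz)
{
    return (sz + 3u) & ~3u;
}
```

A card driver would then transfer `demo_pad_transfer_size(len)` bytes and ignore the padding beyond `len`.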
17 years ago au1xmmc: remove custom carddetect poll implementation.
Manuel Lauss [Fri, 27 Jun 2008 16:25:18 +0000 (18:25 +0200)]
au1xmmc: remove custom carddetect poll implementation.

The MMC core provides a carddetect poll feature, time to
remove the driver's own implementation of it.

Signed-off-by: Manuel Lauss <mano@roarinelk.homelinux.net>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago au1xmmc: new maintainer.
Manuel Lauss [Mon, 9 Jun 2008 06:40:35 +0000 (08:40 +0200)]
au1xmmc: new maintainer.

Signed-off-by: Manuel Lauss <mano@roarinelk.homelinux.net>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago au1xmmc: abort requests early if no card is present.
Manuel Lauss [Mon, 9 Jun 2008 06:39:11 +0000 (08:39 +0200)]
au1xmmc: abort requests early if no card is present.

Don't process an MMC request if no card is present.

Signed-off-by: Manuel Lauss <mano@roarinelk.homelinux.net>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago au1xmmc: codingstyle tidying.
Manuel Lauss [Mon, 9 Jun 2008 06:38:35 +0000 (08:38 +0200)]
au1xmmc: codingstyle tidying.

Clean up the codebase, no functional changes.
- merge the au1xmmc.h header contents into the driver file,
- indentation, spelling and style fixes.

Signed-off-by: Manuel Lauss <mano@roarinelk.homelinux.net>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago au1xmmc: SDIO IRQ support.
Manuel Lauss [Mon, 9 Jun 2008 06:38:03 +0000 (08:38 +0200)]
au1xmmc: SDIO IRQ support.

Wire up the SD controllers' SDIO IRQ capability.

Signed-off-by: Manuel Lauss <mano@roarinelk.homelinux.net>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago au1xmmc: enable 4 bit transfer mode
Manuel Lauss [Mon, 9 Jun 2008 06:37:33 +0000 (08:37 +0200)]
au1xmmc: enable 4 bit transfer mode

Signed-off-by: Manuel Lauss <mano@roarinelk.homelinux.net>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago au1xmmc: remove db1200 board code, rewrite probe.
Manuel Lauss [Mon, 9 Jun 2008 06:36:13 +0000 (08:36 +0200)]
au1xmmc: remove db1200 board code, rewrite probe.

Remove the DB1200 board-specific functions (card present, read-only,
activity LED methods) and instead add platform data which is passed
to the driver.  This also allows for platforms to implement other
carddetect schemes (e.g. dedicated irq) without having to pollute the
driver code.  The poll timer (used for pb1200) is kept for compatibility.

With the board-specific stuff gone, the driver's ->probe() code can be
cleaned up considerably.

Signed-off-by: Manuel Lauss <mano@roarinelk.homelinux.net>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago at91_mci: Fix byte mode transitions.
Ville Syrjala [Mon, 9 Jun 2008 19:06:45 +0000 (22:06 +0300)]
at91_mci: Fix byte mode transitions.

The byte mode support fails to clear the byte mode bit in the command
register, possibly leaving byte mode enabled with the counters programmed
in non-byte mode.

Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
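The class of bug fixed by the at91_mci commit above (a mode bit that is set but never cleared) can be avoided with a pattern like the following sketch. The bit position and all names are hypothetical; the actual register layout is not given in the commit message.

```c
#include <stdint.h>

/* Hypothetical bit position; the real register layout is not shown here. */
#define DEMO_CMD_BYTE_MODE (UINT32_C(1) << 5)

/*
 * Always clear the mode bit first, then set it only when byte mode is
 * requested, so a previous byte mode command cannot leak into this one.
 */
uint32_t demo_cmd_set_mode(uint32_t cmd, int byte_mode)
{
    cmd &= ~DEMO_CMD_BYTE_MODE;
    if (byte_mode)
        cmd |= DEMO_CMD_BYTE_MODE;
    return cmd;
}
```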
17 years ago at91_mci: Cover more AT91RM9200 and AT91SAM9261 errata.
Ville Syrjala [Mon, 9 Jun 2008 19:06:44 +0000 (22:06 +0300)]
at91_mci: Cover more AT91RM9200 and AT91SAM9261 errata.

According to the documentation the AT91SAM9261 MCI shares the block size
limitations of the AT91RM9200 MCI. Also, the errata documentation for
AT91RM9200 and AT91SAM9261 states that stream commands are not supported.
This has not been tested on actual hardware.

Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago at91_mci: AT91SAM9260/9263 12 byte write erratum (v2)
Ville Syrjala [Sat, 14 Jun 2008 17:27:20 +0000 (20:27 +0300)]
at91_mci: AT91SAM9260/9263 12 byte write erratum (v2)

AT91SAM926[0/3] PDC must write at least 12 bytes. The code compiles and runs
but the actual condition for this erratum did not trigger in my tests so it's
unclear if it actually works as intended.

Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
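The commit message above does not say how short writes are extended, so the following is only a sketch of the length rule itself: any write shorter than 12 bytes is bumped to 12. The macro and function names are hypothetical.

```c
/* Minimum PDC write length stated in the erratum description above. */
#define DEMO_PDC_MIN_WRITE_BYTES 12u

/*
 * Hypothetical helper: return the number of bytes to program into the
 * PDC so that the 12-byte minimum is always respected.
 */
unsigned int demo_pdc_write_len(unsigned int requested)
{
    if (requested < DEMO_PDC_MIN_WRITE_BYTES)
        return DEMO_PDC_MIN_WRITE_BYTES;
    return requested;
}
```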
17 years ago at91_mci: manage cmd error and data error independently
Nicolas Ferre [Tue, 10 Jun 2008 09:27:29 +0000 (11:27 +0200)]
at91_mci: manage cmd error and data error independently

In the at91_mci_completed_command() function, this patch distinguishes
command errors from data errors and reports each in the corresponding
error field.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago mmc: at91_mci: do not read irq status twice as it will forget some errors
Nicolas Ferre [Fri, 30 May 2008 12:28:45 +0000 (14:28 +0200)]
mmc: at91_mci: do not read irq status twice as it will forget some errors

Reading AT91_MCI_SR again at the end of transfer can corrupt the
error reporting. Some fields in the SR register are read-and-clear.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago mmc: at91_mci: add sdio irq management
Eric Benard [Fri, 30 May 2008 12:26:05 +0000 (14:26 +0200)]
mmc: at91_mci: add sdio irq management

Enable SDIO interrupt handling.

Signed-off-by: Eric Benard <ebenard@free.fr>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago mmc: at91_mci: add multiwrite switch
Nicolas Ferre [Fri, 30 May 2008 12:08:56 +0000 (14:08 +0200)]
mmc: at91_mci: add multiwrite switch

at91_mci is capable of multiwrite. Enable it before it disappears.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago mmc: at91_mci: update bytes_xfered value once xfer done
Nicolas Ferre [Fri, 30 May 2008 12:18:57 +0000 (14:18 +0200)]
mmc: at91_mci: update bytes_xfered value once xfer done

Modify the bytes_xfered value after a write.

This reports, as accurately as possible, the number of
sectors that were actually written.

This update also introduces a check of the busy signal given by
the card.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago mmc: at91_mci: avoid timeouts
Marc Pignat [Fri, 30 May 2008 12:07:47 +0000 (14:07 +0200)]
mmc: at91_mci: avoid timeouts

The at91 mci controller internal state machine seems to often crash. This can
be fixed by resetting the controller after each command for at91rm9200 and by
setting the MCI_BLKR register on at91sam926*.

Signed-off-by: Marc Pignat <marc.pignat@hevs.ch>
Signed-off-by: Hans J Koch <hjk@linutronix.de>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago mmc: at91_mci: show timeouts
Marc Pignat [Fri, 30 May 2008 12:06:32 +0000 (14:06 +0200)]
mmc: at91_mci: show timeouts

Detect command timeout (or mci controller hangs).

Signed-off-by: Marc Pignat <marc.pignat@hevs.ch>
Signed-off-by: Hans J Koch <hjk@linutronix.de>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago mmc: at91_mci: support for block size not modulo 4
Marc Pignat [Fri, 30 May 2008 12:05:24 +0000 (14:05 +0200)]
mmc: at91_mci: support for block size not modulo 4

Implement transfers whose size is not a multiple of 4 for at91sam9*. Please
note that the at91rm9200 simply can't handle this.

Signed-off-by: Marc Pignat <marc.pignat@hevs.ch>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago MMC: Trivial comment cleanup
Deepak Saxena [Tue, 17 Jun 2008 02:20:57 +0000 (19:20 -0700)]
MMC: Trivial comment cleanup

Make the variable name in the comments match the actual name
of the variable.

Signed-off-by: Deepak Saxena <dsaxena@laptop.org>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago mmc: change .get_ro() callback semantics
Anton Vorontsov [Tue, 17 Jun 2008 14:17:39 +0000 (18:17 +0400)]
mmc: change .get_ro() callback semantics

The get_ro() callback must now return 0 or 1 for its logical states, and a
negative errno value in case of error. If a particular host instance doesn't
support an RO/WP switch, it should return -ENOSYS.

This patch changes some hosts in two ways:

1. Functions should now be smart enough not to return negative values in
   the "RO asserted" case (in particular, gpio_ calls could return negative
   values for the outermost GPIOs).

   Also, board code usually passes get_ro() callbacks that directly return
   the gpioreg & bit result, so the at91_mci, imxmmc, pxamci and mmc_spi
   get_ro() handlers need to take special care when returning the
   platform's values to the mmc core.

2. If a host instance does not implement the get_ro() callback, it should
   really return -ENOSYS and let the mmc core decide what to do about it
   (the mmc core thinks the same way as the hosts, so this isn't a
   functional change).

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
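The two points above can be sketched in user-space C as follows. This is an illustration under assumptions, not any host driver's code: `demo_board_get_ro_fn`, `demo_raw_read` and `demo_host_get_ro` are hypothetical names, and the board callback is assumed to return a raw `gpioreg & bit` style value whose sign carries no meaning.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical board callback type returning a raw register value. */
typedef int (*demo_board_get_ro_fn)(void *ctx);

/* Example board callback: hands back the raw value stored in ctx. */
int demo_raw_read(void *ctx)
{
    return *(const int *)ctx;
}

/*
 * Hypothetical host-side wrapper:
 *  - no board callback: report -ENOSYS and let the core decide;
 *  - otherwise squash any non-zero raw value (even a negative one,
 *    e.g. from a top-bit mask) to exactly 1.
 */
int demo_host_get_ro(demo_board_get_ro_fn board_get_ro, void *ctx)
{
    if (board_get_ro == NULL)
        return -ENOSYS;
    return board_get_ro(ctx) ? 1 : 0;
}
```

With this shape, a negative return value can only mean an error, which is what lets the core tell "read-only asserted" apart from "not supported".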
17 years ago mmc_spi: add support for card-detection polling
Anton Vorontsov [Tue, 17 Jun 2008 14:17:21 +0000 (18:17 +0400)]
mmc_spi: add support for card-detection polling

This patch adds a new platform data variable, "caps", so platforms
can pass their capabilities to the MMC core (for example, platforms
without an interrupt on the CD line will most probably want to pass
MMC_CAP_NEEDS_POLL).

A new platform get_cd() callback is provided to optimize polling.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago mmc: add support for card-detection polling
Anton Vorontsov [Tue, 17 Jun 2008 14:17:15 +0000 (18:17 +0400)]
mmc: add support for card-detection polling

Some hosts (and boards that use mmc_spi) do not use interrupts on the CD
line, so they can't trigger mmc_detect_change. We want to poll the card
and see if there was a change. A 1 second poll interval seems reasonable.

This patch also implements the .get_cd() host operation, which can be used
by hosts that are able to report card-detect status without needing to
talk MMC.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago include/linux/mmc/mmc.h: remove CVS tags
Adrian Bunk [Mon, 19 May 2008 21:57:27 +0000 (00:57 +0300)]
include/linux/mmc/mmc.h: remove CVS tags

This patch removes a CVS tag that wasn't updated for a long time.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago sdhci-pci: unaligned data with ricoh controllers
Pierre Ossman [Wed, 28 May 2008 07:54:50 +0000 (09:54 +0200)]
sdhci-pci: unaligned data with ricoh controllers

The Ricoh controllers cannot handle unaligned data blocks.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago mmc_test: add test case control
Pierre Ossman [Sat, 24 May 2008 20:36:31 +0000 (22:36 +0200)]
mmc_test: add test case control

Add the ability to run just a single test case by writing the test
case number into the sysfs "test" file.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago sdhci: handle hot-remove
Pierre Ossman [Wed, 16 Apr 2008 17:13:13 +0000 (19:13 +0200)]
sdhci: handle hot-remove

Gracefully handle when the device is suddenly removed. Do a test read
and avoid any further access if that read returns -1.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
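The "test read returns -1" check mentioned above relies on the fact that reads from a removed device typically come back as all ones. A minimal sketch of that test, with the hypothetical name `demo_device_is_gone` and a 32-bit register assumed for the example:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: a 32-bit register read of all ones (-1 when
 * viewed as a signed value) is taken to mean the device is gone.
 */
bool demo_device_is_gone(uint32_t test_read)
{
    return test_read == UINT32_C(0xFFFFFFFF);
}
```

A caller would perform one such test read first and skip all further register accesses when it reports true.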
17 years ago sdhci: support JMicron secondary interface
Pierre Ossman [Fri, 4 Apr 2008 17:36:59 +0000 (19:36 +0200)]
sdhci: support JMicron secondary interface

JMicron chips sometimes have two interfaces to work around limitations
in Microsoft's sdhci driver. This patch allows us to use either interface.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago sdhci: toggle JMicron PMOS setting
Pierre Ossman [Mon, 24 Mar 2008 12:09:09 +0000 (13:09 +0100)]
sdhci: toggle JMicron PMOS setting

Some of the JMicron chips require us to manually enable the power
output stages of the chip. Add the necessary hooks and functions to
manage this.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago sdhci: make workaround for timeout bug more general
Pierre Ossman [Fri, 4 Jul 2008 22:25:15 +0000 (00:25 +0200)]
sdhci: make workaround for timeout bug more general

Give the quirk for broken timeout handling a better chance of handling
more controllers by simply classifying the system as broken and setting
a fixed value.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago sdhci: more complex quirks handling
Pierre Ossman [Sun, 23 Mar 2008 18:33:23 +0000 (19:33 +0100)]
sdhci: more complex quirks handling

Extend the quirks handling in the PCI driver to be able to have
callbacks and not just flags.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago sdhci: remove forced dma quirks
Pierre Ossman [Sat, 22 Mar 2008 21:05:40 +0000 (22:05 +0100)]
sdhci: remove forced dma quirks

Remove the quirk to force DMA on the Ricoh and TI controllers as it is
no longer needed. The only bug they have is that they use an incorrect
PCI interface value, and that is not respected anymore.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago sdhci: move pci stuff to separate module
Pierre Ossman [Tue, 18 Mar 2008 16:35:49 +0000 (17:35 +0100)]
sdhci: move pci stuff to separate module

The SDHCI interface is not PCI specific, yet the Linux driver was
intimately connected to the PCI bus. This patch properly separates
the PCI specific portion from the bus independent code.

This patch is based on work by Ben Dooks but he did not have time
to complete it.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago sdhci: don't check block count for progress
Pierre Ossman [Fri, 18 Apr 2008 18:41:49 +0000 (20:41 +0200)]
sdhci: don't check block count for progress

The specification is insufficiently strict when it comes to how the
hardware should update the block count register, making it useless
for checking transfer progress.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
17 years ago Merge branch 'linus' into x86/urgent
Ingo Molnar [Tue, 15 Jul 2008 11:45:59 +0000 (13:45 +0200)]
Merge branch 'linus' into x86/urgent

17 years ago Kprobe smoke test lockdep warning
Peter Zijlstra [Tue, 22 Apr 2008 13:09:30 +0000 (15:09 +0200)]
Kprobe smoke test lockdep warning

On Mon, 2008-04-21 at 18:54 -0400, Masami Hiramatsu wrote:
> Thank you for reporting.
>
> Actually, kprobes tries to fixup thread's flags in post_kprobe_handler
> (which is called from kprobe_exceptions_notify) by
> trace_hardirqs_fixup_flags(pt_regs->flags). However, even the irq flag
> is set in pt_regs->flags, true hardirq is still off until returning
> from do_debug. Thus, lockdep assumes that hardirq is off without annotation.
>
> IMHO, one possible solution is that fixing hardirq flags right after
> notify_die in do_debug instead of in post_kprobe_handler.

My reply to BZ 10489:

> [    2.707509] Kprobe smoke test started
> [    2.709300] ------------[ cut here ]------------
> [    2.709420] WARNING: at kernel/lockdep.c:2658 check_flags+0x4d/0x12c()
> [    2.709541] Modules linked in:
> [    2.709588] Pid: 1, comm: swapper Not tainted 2.6.25.jml.057 #1
> [    2.709588]  [<c0126acc>] warn_on_slowpath+0x41/0x51
> [    2.709588]  [<c010bafc>] ? save_stack_trace+0x1d/0x3b
> [    2.709588]  [<c0140a83>] ? save_trace+0x37/0x89
> [    2.709588]  [<c011987d>] ? kernel_map_pages+0x103/0x11c
> [    2.709588]  [<c0109803>] ? native_sched_clock+0xca/0xea
> [    2.709588]  [<c0142958>] ? mark_held_locks+0x41/0x5c
> [    2.709588]  [<c0382580>] ? kprobe_exceptions_notify+0x322/0x3af
> [    2.709588]  [<c0142aff>] ? trace_hardirqs_on+0xf1/0x119
> [    2.709588]  [<c03825b3>] ? kprobe_exceptions_notify+0x355/0x3af
> [    2.709588]  [<c0140823>] check_flags+0x4d/0x12c
> [    2.709588]  [<c0143c9d>] lock_release+0x58/0x195
> [    2.709588]  [<c038347c>] ? __atomic_notifier_call_chain+0x0/0x80
> [    2.709588]  [<c03834d6>] __atomic_notifier_call_chain+0x5a/0x80
> [    2.709588]  [<c0383508>] atomic_notifier_call_chain+0xc/0xe
> [    2.709588]  [<c013b6d4>] notify_die+0x2d/0x2f
> [    2.709588]  [<c038168a>] do_debug+0x67/0xfe
> [    2.709588]  [<c0381287>] debug_stack_correct+0x27/0x30
> [    2.709588]  [<c01564c0>] ? kprobe_target+0x1/0x34
> [    2.709588]  [<c0156572>] ? init_test_probes+0x50/0x186
> [    2.709588]  [<c04fae48>] init_kprobes+0x85/0x8c
> [    2.709588]  [<c04e947b>] kernel_init+0x13d/0x298
> [    2.709588]  [<c04e933e>] ? kernel_init+0x0/0x298
> [    2.709588]  [<c04e933e>] ? kernel_init+0x0/0x298
> [    2.709588]  [<c0105ef7>] kernel_thread_helper+0x7/0x10
> [    2.709588]  =======================
> [    2.709588] ---[ end trace 778e504de7e3b1e3 ]---
> [    2.709588] possible reason: unannotated irqs-off.
> [    2.709588] irq event stamp: 370065
> [    2.709588] hardirqs last  enabled at (370065): [<c0382580>] kprobe_exceptions_notify+0x322/0x3af
> [    2.709588] hardirqs last disabled at (370064): [<c0381bb7>] do_int3+0x1d/0x7d
> [    2.709588] softirqs last  enabled at (370050): [<c012b464>] __do_softirq+0xfa/0x100
> [    2.709588] softirqs last disabled at (370045): [<c0107438>] do_softirq+0x74/0xd9
> [    2.714751] Kprobe smoke test passed successfully

how I love this stuff...

Ok, do_debug() is a trap, this can happen at any time regardless of the
machine's IRQ state. So the first thing we do is fix up the IRQ state.
Then we call this die notifier stuff; and return with messed up IRQ
state... YAY.

So, kprobes fudges it..

  notify_die(DIE_DEBUG)
    kprobe_exceptions_notify()
      post_kprobe_handler()
        modify regs->flags
        trace_hardirqs_fixup_flags(regs->flags);  <--- must be it

So what's the use of modifying flags if they're not meant to take effect
at some point?

/me tries to reproduce issue; enable kprobes test thingy && boot

OK, that reproduces..

So the below makes it work - but I'm not getting this code; at the time
I wrote that stuff I CC'ed each and every kprobe maintainer listed in
the usual places but got no response - can someone please explain this
stuff to me?

Are the saved flags only for the TF bit or are they made in full effect
later (and if so, where) ?

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years ago iucv: fix memory leak in cpu hotplug error path.
Akinobu Mita [Tue, 15 Jul 2008 09:09:53 +0000 (02:09 -0700)]
iucv: fix memory leak in cpu hotplug error path.

Fix memory leak in error path in CPU_UP_PREPARE notifier.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago wireless: fix warnings from QoS patch
Johannes Berg [Tue, 15 Jul 2008 09:08:24 +0000 (02:08 -0700)]
wireless: fix warnings from QoS patch

When I removed the special "default" meaning from the QoS
parameters, I forgot to update drivers and this led to
warnings because some drivers were checking for the special
values and putting in defaults. This fixes that by removing
the default special-casing completely.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago Revert "SELinux: allow fstype unknown to policy to use xattrs if present"
James Morris [Tue, 15 Jul 2008 08:32:49 +0000 (18:32 +1000)]
Revert "SELinux: allow fstype unknown to policy to use xattrs if present"

This reverts commit 811f3799279e567aa354c649ce22688d949ac7a9.

From Eric Paris:

"Please drop this patch for now.  It deadlocks on ntfs-3g.  I need to
rework it to handle fuse filesystems better.  (casey was right)"

17 years ago Blackfin arch: Allow ptrace to peek and poke application data in L1 data SRAM.
Jie Zhang [Tue, 15 Jul 2008 08:15:40 +0000 (16:15 +0800)]
Blackfin arch: Allow ptrace to peek and poke application data in L1 data SRAM.

Signed-off-by: Jie Zhang <jie.zhang@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
17 years ago Blackfin arch: Add ANOMALY_05000368 workaround
Michael Hennerich [Wed, 16 Jul 2008 08:59:05 +0000 (16:59 +0800)]
Blackfin arch: Add ANOMALY_05000368 workaround

Possible RETS Register Corruption when Subroutine Is under 5 Cycles in Duration

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
17 years ago Blackfin arch: Functional power management support
Michael Hennerich [Sat, 19 Jul 2008 08:57:32 +0000 (16:57 +0800)]
Blackfin arch: Functional power management support

Enable: PM_SUSPEND_MEM -> Blackfin Hibernate to SDRAM
This feature requires a special bootloader (u-boot)
supporting return from hibernate.

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
17 years ago Blackfin arch: Fix BUG - JUMP error in kernel (relocation truncated to fit: R_pcrel12...
Michael Hennerich [Tue, 15 Jul 2008 08:38:28 +0000 (16:38 +0800)]
Blackfin arch: Fix BUG - JUMP error in kernel (relocation truncated to fit: R_pcrel12_jump_s)

Use long jump

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
17 years ago bluetooth/hci_bcsp: fix bitrev Kconfig
Randy Dunlap [Tue, 15 Jul 2008 07:51:45 +0000 (00:51 -0700)]
bluetooth/hci_bcsp: fix bitrev Kconfig

Fix bluetooth hci_bcsp Kconfig to avoid build errors:

drivers/built-in.o: In function `bcsp_prepare_pkt':
hci_bcsp.c:(.text+0x7e9ac): undefined reference to `bitrev16'
drivers/built-in.o: In function `bcsp_recv':
hci_bcsp.c:(.text+0x7f276): undefined reference to `bitrev16'
hci_bcsp.c:(.text+0x7f293): undefined reference to `bitrev16'
make[1]: *** [.tmp_vmlinux1] Error 1

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago net: refactor tcp splice receive path to improve readability
Octavian Purdila [Tue, 15 Jul 2008 07:49:11 +0000 (00:49 -0700)]
net: refactor tcp splice receive path to improve readability

- move all of the details on offsets, lengths and buffers into a
single function instead of doing these operations from multiple places

- use a bottom up approach: try to avoid details in the high level
functions, introduce them gradually as we go deeper in the function
call stack

With helpful feedback from Jarek Poplawski.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Acked-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago netdev: Do not use TX lock to protect address lists.
David S. Miller [Tue, 15 Jul 2008 07:15:08 +0000 (00:15 -0700)]
netdev: Do not use TX lock to protect address lists.

Now that we have a specific lock to protect the network
device unicast and multicast lists, remove extraneous
grabs of the TX lock in cases where the code only needs
address list protection.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago netdev: Add netdev->addr_list_lock protection.
David S. Miller [Tue, 15 Jul 2008 07:13:44 +0000 (00:13 -0700)]
netdev: Add netdev->addr_list_lock protection.

Add netif_addr_{lock,unlock}{,_bh}() helpers.

Use them to protect operations that operate on or read
the network device unicast and multicast address lists.

Also use them in cases where the code simply wants to
block calls into the driver's ->set_rx_mode() and
->set_multicast_list() methods.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago netdev: Add addr_list_lock to struct net_device.
David S. Miller [Tue, 15 Jul 2008 07:08:33 +0000 (00:08 -0700)]
netdev: Add addr_list_lock to struct net_device.

This will be used to protect the per-device unicast and multicast
address lists, as well as the callbacks into the drivers which
configure such state, e.g. ->set_rx_mode() and ->set_multicast_list().

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago IB/mlx4: Use kzalloc() for new QPs so flags are initialized to 0
Eli Cohen [Tue, 15 Jul 2008 06:48:53 +0000 (23:48 -0700)]
IB/mlx4: Use kzalloc() for new QPs so flags are initialized to 0

Current code uses kmalloc() and then just does a bitwise OR operation on
qp->flags in create_qp_common(), which means that qp->flags may
potentially have some unintended bits set.  This patch uses kzalloc()
and avoids further explicit clearing of structure members, which also
shrinks the code:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-65 (-65)
function                                     old     new   delta
create_qp_common                            2024    1959     -65

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
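The hazard described in the mlx4 commit above (OR-ing bits into a field of freshly allocated, uninitialized memory) has a direct user-space analogue. The sketch below uses `calloc()` in place of `kzalloc()`; the struct and function names are hypothetical and only illustrate the pattern.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a QP structure with a flags word. */
struct demo_qp {
    unsigned int flags;
    unsigned int other_state;
};

/*
 * Allocate zeroed memory so that the later "|=" starts from 0 and
 * cannot preserve stale bits. Returns NULL on allocation failure.
 */
struct demo_qp *demo_create_qp(unsigned int flag)
{
    struct demo_qp *qp = calloc(1, sizeof(*qp));

    if (qp == NULL)
        return NULL;
    qp->flags |= flag;
    return qp;
}
```

Because every member starts at zero, no per-member clearing is needed after allocation, which matches the code-size reduction reported in the commit message.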
17 years ago mlx4_core: Use MOD_STAT_CFG command to get minimal page size
Vladimir Sokolovsky [Tue, 15 Jul 2008 06:48:53 +0000 (23:48 -0700)]
mlx4_core: Use MOD_STAT_CFG command to get minimal page size

There was a bug in some versions of the mlx4 driver in
mlx4_alloc_fmr(), which hardcoded the minimum acceptable page_shift to
be 12.  However, new ConnectX firmware can support a minimum
page_shift of 9 (log_pg_sz of 9 returned by QUERY_DEV_LIM) -- so with
old drivers, ib_fmr_alloc() would fail for ULPs using the device
minimum when creating FMRs.

To preserve firmware compatibility with released mlx4 drivers, the
firmware will continue to return 12 as before for log_page_sz in
QUERY_DEV_CAP for these drivers.  However, to enable new drivers to
take advantage of the available smaller page size, the mlx4 driver now
first sets the log_pg_sz to the device minimum by setting a
log_page_sz value to 0 via the MOD_STAT_CFG command and then reading
the real minimum via QUERY_DEV_CAP.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years ago RDMA/cma: Simplify locking needed for serialization of callbacks
Or Gerlitz [Tue, 15 Jul 2008 06:48:53 +0000 (23:48 -0700)]
RDMA/cma: Simplify locking needed for serialization of callbacks

The RDMA CM has some logic in place to make sure that callbacks on a
given CM ID are delivered to the consumer in a serialized manner.
Specifically it has code to protect against a device removal racing
with a running callback function.

This patch simplifies this logic by using a mutex per ID instead of a
wait queue and atomic variable.  This means that cma_disable_remove()
is now more properly named cma_disable_callback(), and
cma_enable_remove() can be removed because it would just become a
trivial wrapper around mutex_unlock().

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years ago RDMA/addr: Keep pointer to netdevice in struct rdma_dev_addr
Or Gerlitz [Tue, 15 Jul 2008 06:48:53 +0000 (23:48 -0700)]
RDMA/addr: Keep pointer to netdevice in struct rdma_dev_addr

Keep a pointer to the local (src) netdevice in struct rdma_dev_addr,
and copy it in as part of rdma_copy_addr().  Use rdma_translate_ip()
in cma_new_conn_id() to reduce some code duplication and also make
sure the src_dev member gets set.

In a high-availability configuration the netdevice pointer can be used
by the RDMA CM to align RDMA sessions to use the same links as the IP
stack does under fail-over and route change cases.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years ago RDMA/cxgb3: Fixes for zero STag
Steve Wise [Tue, 15 Jul 2008 06:48:53 +0000 (23:48 -0700)]
RDMA/cxgb3: Fixes for zero STag

Handling the zero STag in receive work requests requires some extra
logic in the driver:

- Only set the QP_PRIV bit for kernel mode QPs.

- Add a zero STag build function for recv wrs. The uP needs a PBL
  allocated and passed down in the recv WR so it can construct a HW
  PBL for the zero STag S/G entries.  Note: we need to place a few
  restrictions on zero STag usage because of this:

  1) all SGEs in a recv WR must either be zero STag or not.  No mixing.

  2) an individual SGE length cannot exceed 128MB for a zero-stag SGE.
     This should be OK since it's not really practical to allocate
     such a large chunk of pinned contiguous DMA mapped memory.

- Add an optimized non-zero-STag recv wr format for kernel users.
  This is needed to optimize both zero and non-zero STag cracking in
  the recv path for kernel users.

- Remove the iwch_ prefix from the static build functions.

- Bump required FW version.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
17 years ago RDMA/core: Add local DMA L_Key support
Steve Wise [Tue, 15 Jul 2008 06:48:53 +0000 (23:48 -0700)]
RDMA/core: Add local DMA L_Key support

- Change the IB_DEVICE_ZERO_STAG flag to the transport-neutral name
  IB_DEVICE_LOCAL_DMA_LKEY, which is used by iWARP RNICs to indicate 0
  STag support and IB HCAs to indicate reserved L_Key support.

- Add a u32 local_dma_lkey member to struct ib_device.  Drivers fill
  this in with the appropriate local DMA L_Key (if they support it).

- Fix up the drivers using this flag.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years ago IB/mthca: Fix check of max_send_sge for special QPs
Roland Dreier [Tue, 15 Jul 2008 06:48:52 +0000 (23:48 -0700)]
IB/mthca: Fix check of max_send_sge for special QPs

The MLX transport requires two extra gather entries for sends (one for
the header and one for the checksum at the end, as the comment says).
However the code checked that max_recv_sge was not too big, instead of
checking max_send_sge as it should have.  Fix the code to check the
correct condition.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/mthca: Use round_jiffies() for catastrophic error polling timer
Roland Dreier [Tue, 15 Jul 2008 06:48:52 +0000 (23:48 -0700)]
IB/mthca: Use round_jiffies() for catastrophic error polling timer

Exactly when the catastrophic error polling timer function runs is not
important, so use round_jiffies() to save unnecessary wakeups.
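
The idea can be illustrated with a simplified userspace model that snaps an
expiry to the nearest whole second, so that timers which do not care about
exact timing tend to fire together.  HZ and round_to_second() are
assumptions of this sketch; the kernel helper's exact rounding rule is not
reproduced here.

```c
#include <assert.h>

#define HZ 250UL   /* assumed tick rate for this sketch */

/* Simplified model: snap an absolute tick count to the nearest whole second. */
unsigned long round_to_second(unsigned long ticks)
{
	unsigned long rem = ticks % HZ;

	if (rem < HZ / 2)
		return ticks - rem;
	return ticks - rem + HZ;
}
```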

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/mthca: Remove "stop" flag for catastrophic error polling timer
Roland Dreier [Tue, 15 Jul 2008 06:48:52 +0000 (23:48 -0700)]
IB/mthca: Remove "stop" flag for catastrophic error polling timer

Since we use del_timer_sync() anyway, there's no need for an
additional flag to tell the timer not to rearm.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIPoIB: Double default RX/TX ring sizes
Eli Cohen [Tue, 15 Jul 2008 06:48:52 +0000 (23:48 -0700)]
IPoIB: Double default RX/TX ring sizes

Increase IPoIB ring sizes to twice their original sizes (RX: 128->256,
TX: 64->128) to act as a shock absorber for high traffic peaks.  With
the current settings, we have seen cases where there are many calls to
netif_stop_queue(), which causes degradation in throughput.  Also,
larger receive buffer sizes help IPoIB in CM mode to avoid experiencing
RNR NAK conditions due to insufficient receive buffers at the SRQ.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIPoIB/cm: Reduce connected mode TX object size
Eli Cohen [Tue, 15 Jul 2008 06:48:52 +0000 (23:48 -0700)]
IPoIB/cm: Reduce connected mode TX object size

Since IPoIB connected mode does not set NETIF_F_SG, we only have one DMA
mapping per send, so we don't need a mapping[] array.  Define a new
struct with a single u64 mapping member and use it for the CM tx_ring.
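
The size difference can be sketched in userspace as follows.  The struct
names, MAX_FRAGS and its value are assumptions made for illustration; the
point is only that a single u64 replaces a per-fragment array.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FRAGS 18   /* assumed fragment limit, for illustration only */

struct sk_buff;        /* opaque here; only a pointer is stored */

/* Scatter/gather-capable ring entry: one mapping per fragment plus the head. */
struct tx_buf {
	struct sk_buff *skb;
	uint64_t mapping[MAX_FRAGS + 1];
};

/* Connected-mode ring entry: exactly one DMA mapping per send. */
struct cm_tx_buf {
	struct sk_buff *skb;
	uint64_t mapping;
};

/* Bytes saved for a ring of the given length by using the smaller entry. */
unsigned long cm_ring_savings(unsigned long entries)
{
	return entries * (sizeof(struct tx_buf) - sizeof(struct cm_tx_buf));
}
```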

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ipath: Use IEEE OUI for vendor_id reported by ibv_query_device()
Ralph Campbell [Tue, 15 Jul 2008 06:48:52 +0000 (23:48 -0700)]
IB/ipath: Use IEEE OUI for vendor_id reported by ibv_query_device()

The IB spec for SubnGet(NodeInfo) and query HCA says that the vendor
ID field should be the IEEE OUI assigned to the vendor.  The ipath
driver was returning the PCI vendor ID instead.  This will affect
applications which call ibv_query_device().  The old value was
0x001fc1 or 0x001077, the new value is 0x001175.

The vendor ID doesn't appear to be exported via /sys so that should
reduce possible compatibility issues.  I'm only aware of Open MPI as a
major application which depends on this change, and they have made
necessary adjustments.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIPoIB: Use dev_set_mtu() to change mtu
Eli Cohen [Tue, 15 Jul 2008 06:48:51 +0000 (23:48 -0700)]
IPoIB: Use dev_set_mtu() to change mtu

When the driver sets the MTU of the net device outside of its
change_mtu method, it should make use of dev_set_mtu() instead of
directly setting the mtu field of struct net_device.  Otherwise
functions registered to be called upon MTU change will not get called
(this is done through call_netdevice_notifiers() in dev_set_mtu()).
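
Why a direct field write is not enough can be shown with a small userspace
model.  Everything here (struct model_dev, model_set_mtu(), record_mtu())
is hypothetical and only mimics the "update, then notify" behaviour
described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a device with an MTU-change listener. */
typedef void (*mtu_listener)(int new_mtu, void *ctx);

struct model_dev {
	int mtu;
	mtu_listener listener;   /* a single listener keeps the sketch short */
	void *ctx;
};

/* Setter that behaves like the path described above: update, then notify. */
void model_set_mtu(struct model_dev *dev, int new_mtu)
{
	assert(dev != NULL);
	if (dev->mtu == new_mtu)
		return;
	dev->mtu = new_mtu;
	if (dev->listener)
		dev->listener(new_mtu, dev->ctx);
}

/* Example listener: records the last MTU it was told about. */
void record_mtu(int new_mtu, void *ctx)
{
	*(int *)ctx = new_mtu;
}
```

Assigning dev.mtu directly changes the value but leaves the listener
unaware of it; going through model_set_mtu() keeps both in sync.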

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIPoIB: Use rtnl lock/unlock when changing device flags
Eli Cohen [Tue, 15 Jul 2008 06:48:51 +0000 (23:48 -0700)]
IPoIB: Use rtnl lock/unlock when changing device flags

Use of this lock is required to synchronize changes to the netdevice's
data structs.  Also move the call to ipoib_flush_paths() after the
modification of the netdevice flags in set_mode().

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIPoIB: Get rid of ipoib_mcast_detach() wrapper
Roland Dreier [Tue, 15 Jul 2008 06:48:50 +0000 (23:48 -0700)]
IPoIB: Get rid of ipoib_mcast_detach() wrapper

ipoib_mcast_detach() does nothing except call ib_detach_mcast(), so just
use the core API in the one place that does a multicast group detach.

add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-105 (-105)
function                                     old     new   delta
ipoib_mcast_leave                            357     319     -38
ipoib_mcast_detach                            67       -     -67

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIPoIB: Only set Q_Key once: after joining broadcast group
Eli Cohen [Tue, 15 Jul 2008 06:48:50 +0000 (23:48 -0700)]
IPoIB: Only set Q_Key once: after joining broadcast group

The current code will set the Q_Key for any join of a non-sendonly
multicast group.  The operation involves a modify QP operation, which
is fairly heavyweight, and is only really required after the join of
the broadcast group.  Fix this by adding a parameter to ipoib_mcast_attach()
to control when the Q_Key is set.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIPoIB: Remove priv->mcast_mutex
Eli Cohen [Tue, 15 Jul 2008 06:48:50 +0000 (23:48 -0700)]
IPoIB: Remove priv->mcast_mutex

No need for a mutex around calls to ib_attach_mcast/ib_detach_mcast
since these operations are synchronized at the HW driver layer.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIPoIB: Remove unused IPOIB_MCAST_STARTED code
Eli Cohen [Tue, 15 Jul 2008 06:48:50 +0000 (23:48 -0700)]
IPoIB: Remove unused IPOIB_MCAST_STARTED code

The IPOIB_MCAST_STARTED flag is not used at all since commit b3e2749b
("IPoIB: Don't drop multicast sends when they can be queued"), so
remove it.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoRDMA/cxgb3: Set rkey field for new memory windows in iwch_alloc_mw()
Steve Wise [Tue, 15 Jul 2008 06:48:49 +0000 (23:48 -0700)]
RDMA/cxgb3: Set rkey field for new memory windows in iwch_alloc_mw()

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoRDMA/nes: Get rid of ring_doorbell parameter of nes_post_cqp_request()
Roland Dreier [Tue, 15 Jul 2008 06:48:49 +0000 (23:48 -0700)]
RDMA/nes: Get rid of ring_doorbell parameter of nes_post_cqp_request()

Every caller of nes_post_cqp_request() passed it NES_CQP_REQUEST_RING_DOORBELL,
so just remove that parameter and always ring the doorbell.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Faisal Latif <flatif@neteffect.com>
17 years agoRDMA/cxgb3: Propagate HW page size capabilities
Jon Mason [Tue, 15 Jul 2008 06:48:49 +0000 (23:48 -0700)]
RDMA/cxgb3: Propagate HW page size capabilities

cxgb3 does not currently report the page size capabilities, and
incorrectly reports them internally.

This version changes the bit-shifting to a static value (per Steve's
request).

Signed-off-by: Jon Mason <jon@opengridcomputing.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoRDMA/nes: Encapsulate logic in nes_put_cqp_request()
Roland Dreier [Tue, 15 Jul 2008 06:48:49 +0000 (23:48 -0700)]
RDMA/nes: Encapsulate logic in nes_put_cqp_request()

The iw_nes driver repeats the logic

	if (atomic_dec_and_test(&cqp_request->refcount)) {
		if (cqp_request->dynamic) {
			kfree(cqp_request);
		} else {
			spin_lock_irqsave(&nesdev->cqp.lock, flags);
			list_add_tail(&cqp_request->list, &nesdev->cqp_avail_reqs);
			spin_unlock_irqrestore(&nesdev->cqp.lock, flags);
		}
	}

over and over.  Wrap this up in functions nes_free_cqp_request() and
nes_put_cqp_request() to simplify such code.
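
The shape of such a pair of helpers can be sketched in userspace with C11
atomics.  This is a model under stated assumptions, not the driver code:
the names are hypothetical, locking is omitted, and a pooled request is
pushed onto the head of a plain linked list rather than added to the tail
of a kernel list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model; the driver uses kernel atomics, a spinlock
 * and list_head instead of the plain singly linked list used here. */
struct request {
	atomic_int refcount;
	bool dynamic;            /* heap-allocated vs. taken from the pool */
	struct request *next;
};

struct pool {
	struct request *avail;   /* free list of reusable requests */
};

/* Release a request: free it if dynamic, otherwise return it to the pool. */
void free_request(struct pool *pool, struct request *req)
{
	if (req->dynamic) {
		free(req);
	} else {
		req->next = pool->avail;
		pool->avail = req;
	}
}

/* Drop one reference and release the request when the last one goes away. */
void put_request(struct pool *pool, struct request *req)
{
	/* atomic_fetch_sub returns the previous value, so 1 means we hit zero. */
	if (atomic_fetch_sub(&req->refcount, 1) == 1)
		free_request(pool, req);
}
```

Each call site then shrinks to a single put_request() call, which is where
the size reduction comes from.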

In addition to making the source smaller and more readable, this shrinks
the compiled code quite a bit:

add/remove: 2/0 grow/shrink: 0/13 up/down: 164/-1692 (-1528)
function                                     old     new   delta
nes_free_cqp_request                           -     147    +147
nes_put_cqp_request                            -      17     +17
nes_modify_qp                               2316    2293     -23
nes_hw_modify_qp                             737     657     -80
nes_dereg_mr                                 945     860     -85
flush_wqes                                   501     416     -85
nes_manage_apbvt                             648     560     -88
nes_reg_mr                                  1117    1026     -91
nes_cqp_ce_handler                           927     769    -158
nes_alloc_mw                                1052     884    -168
nes_create_qp                               5314    5141    -173
nes_alloc_fmr                               2212    2035    -177
nes_destroy_cq                              1097     918    -179
nes_create_cq                               2787    2598    -189
nes_dealloc_mw                               762     566    -196

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Faisal Latif <flatif@neteffect.com>
17 years agoIPoIB: Refresh paths instead of flushing them on SM change events
Moni Shoua [Tue, 15 Jul 2008 06:48:49 +0000 (23:48 -0700)]
IPoIB: Refresh paths instead of flushing them on SM change events

The patch tries to solve the problem of device going down and paths being
flushed on an SM change event. The method is to mark the paths as candidates for
refresh (by setting the new valid flag to 0), and wait for an ARP
probe to trigger a new path record query.

The solution requires a different and less intrusive handling of SM
change event. For that, the second argument of the flush function
changes its meaning from a boolean flag to a level.  In most cases, SM
failover doesn't cause LID change so traffic won't stop.  In the rare
cases of LID change, the remote host (the one that hadn't changed its
LID) will lose connectivity until paths are refreshed. This is no
worse than the current state.  In fact, preventing the device from
going down saves packets that otherwise would be lost.

Signed-off-by: Moni Levy <monil@voltaire.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Make device table externally visible
Joachim Fenkes [Tue, 15 Jul 2008 06:48:49 +0000 (23:48 -0700)]
IB/ehca: Make device table externally visible

This gives ehca an autogenerated modalias and therefore enables automatic loading.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIPoIB: add LRO support
Vladimir Sokolovsky [Tue, 15 Jul 2008 06:48:48 +0000 (23:48 -0700)]
IPoIB: add LRO support

Add "ipoib_use_lro" module parameter to enable LRO and an
"ipoib_lro_max_aggr" module parameter to set the max number of packets
to be aggregated.  Make LRO controllable and LRO statistics accessible
through ethtool.

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIPoIB: Use multicast loopback blocking if available
Ron Livne [Tue, 15 Jul 2008 06:48:48 +0000 (23:48 -0700)]
IPoIB: Use multicast loopback blocking if available

Set IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK for IPoIB's UD QPs if
supported by the underlying device.  This creates an improvement of up
to 39% in bandwidth when sending multicast packets with IPoIB, and an
improvement of 12% in cpu usage.

Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/mlx4: Add support for blocking multicast loopback packets
Ron Livne [Tue, 15 Jul 2008 06:48:48 +0000 (23:48 -0700)]
IB/mlx4: Add support for blocking multicast loopback packets

Add support for handling the IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK
flag by using the per-multicast group loopback blocking feature of
mlx4 hardware.

Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/core: Add support for multicast loopback blocking
Ron Livne [Tue, 15 Jul 2008 06:48:48 +0000 (23:48 -0700)]
IB/core: Add support for multicast loopback blocking

This patch adds a creation flag for QPs,
IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK, which when set means that
multicast sends from the QP to a group that the QP is attached to will
not be looped back to the QP's receive queue.  This can be used to
save receive resources when a consumer does not want a local copy of
multicast traffic; for example IPoIB must waste CPU time throwing away
such local copies of multicast traffic.

This patch also adds a device capability flag that shows whether a
device supports this feature or not.
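
How a consumer might combine the capability flag and the creation flag can
be sketched as below.  The macro names and bit values are assumptions made
for this example and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values are assumptions made for this sketch, not the kernel's. */
#define DEV_CAP_BLOCK_MCAST_LOOPBACK   (1u << 0)   /* device capability */
#define QP_CREATE_BLOCK_MCAST_LOOPBACK (1u << 0)   /* QP creation flag  */

/* Pick QP creation flags: request blocking only if the device offers it. */
uint32_t ud_qp_create_flags(uint32_t device_caps)
{
	uint32_t flags = 0;

	if (device_caps & DEV_CAP_BLOCK_MCAST_LOOPBACK)
		flags |= QP_CREATE_BLOCK_MCAST_LOOPBACK;
	return flags;
}
```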

Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoRDMA/cxgb3: Add support for protocol statistics
Steve Wise [Tue, 15 Jul 2008 06:48:48 +0000 (23:48 -0700)]
RDMA/cxgb3: Add support for protocol statistics

- Add a new rdma ctl command called RDMA_GET_MIB to the cxgb3 low
  level driver to obtain the protocol mib from the rnic hardware.

- Add new iw_cxgb3 provider method to get the MIB from the low level
  driver.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoRDMA/core: Add iWARP protocol statistics attributes in sysfs
Steve Wise [Tue, 15 Jul 2008 06:48:48 +0000 (23:48 -0700)]
RDMA/core: Add iWARP protocol statistics attributes in sysfs

This patch adds a sysfs attribute group called "proto_stats" under
/sys/class/infiniband/$device/ and populates this group with protocol
statistics if they exist for a given device.  Currently, only iWARP
stats are defined, but the code is designed to allow InfiniBand
protocol stats if they become available.  These stats are per-device
and more importantly -not- per port.

Details:

- Add union rdma_protocol_stats in ib_verbs.h.  This union allows
  defining transport-specific stats.  Currently only iwarp stats are
  defined.

- Add struct iw_protocol_stats to define the current set of iwarp
  protocol stats.

- Add new ib_device method called get_proto_stats() to return protocol
  statistics.

- Add logic in core/sysfs.c to create iwarp protocol stats attributes
  if the device is an RNIC and has a get_proto_stats() method.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIPoIB/cm: Fix racy use of receive WR/SGL in ipoib_cm_post_receive_nonsrq()
Roland Dreier [Tue, 15 Jul 2008 06:48:47 +0000 (23:48 -0700)]
IPoIB/cm: Fix racy use of receive WR/SGL in ipoib_cm_post_receive_nonsrq()

For devices that don't support SRQs, ipoib_cm_post_receive_nonsrq() is
called from both ipoib_cm_handle_rx_wc() and ipoib_cm_nonsrq_init_rx(),
and these two callers are not synchronized against each other.
However, ipoib_cm_post_receive_nonsrq() always reuses the same receive
work request and scatter list structures, so multiple callers can end
up stepping on each other, which leads to posting garbled work
requests.

Fix this by having the caller pass in the ib_recv_wr and ib_sge
structures to use, and allocating new local structures in
ipoib_cm_nonsrq_init_rx().

Based on a patch by Pradeep Satyanarayana <pradeep@us.ibm.com> and
David Wilder <dwilder@us.ibm.com>, with debugging help from Hoang-Nam
Nguyen <hnguyen@de.ibm.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoRDMA/cma: Add missing newlines to printk()s
Roland Dreier [Tue, 15 Jul 2008 06:48:47 +0000 (23:48 -0700)]
RDMA/cma: Add missing newlines to printk()s

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
17 years agoRDMA/cxgb3: Remove write-only iwch_rnic_attributes fields
Roland Dreier [Tue, 15 Jul 2008 06:48:47 +0000 (23:48 -0700)]
RDMA/cxgb3: Remove write-only iwch_rnic_attributes fields

The members struct iwch_rnic_attributes.vendor_id and .vendor_part_id
are write-only, so we might as well get rid of them.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
17 years agoRDMA/cxgb3: Fix up some ib_device_attr fields
Steve Wise [Tue, 15 Jul 2008 06:48:47 +0000 (23:48 -0700)]
RDMA/cxgb3: Fix up some ib_device_attr fields

- set fw_ver
- set hw_ver
- set max_qp_wr to something reasonable
- set max_cqe to something reasonable

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts
Stefan Roscher [Tue, 15 Jul 2008 06:48:47 +0000 (23:48 -0700)]
IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts

During corner case testing, we noticed that some versions of ehca do
not properly transition to interrupt done in special load situations.
This can be resolved by periodically triggering EOI through H_EOI, if
EQEs are pending.

Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ehca: Reject receive work requests if QP is in RESET state
Joachim Fenkes [Tue, 15 Jul 2008 06:48:47 +0000 (23:48 -0700)]
IB/ehca: Reject receive work requests if QP is in RESET state

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/mlx4: Remove extra code for RESET->ERR QP state transition
Roland Dreier [Tue, 15 Jul 2008 06:48:46 +0000 (23:48 -0700)]
IB/mlx4: Remove extra code for RESET->ERR QP state transition

Commit 65adfa91 ("IB/mlx4: Fix RESET to RESET and RESET to ERROR
transitions") added some extra code to handle a QP state transition
from RESET to ERROR.  However, the latest 1.2.1 version of the IB spec
has clarified that this transition is actually not allowed, so we can
remove this extra code again.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/mthca: Remove extra code for RESET->ERR QP state transition
Roland Dreier [Tue, 15 Jul 2008 06:48:46 +0000 (23:48 -0700)]
IB/mthca: Remove extra code for RESET->ERR QP state transition

Commit b18aad71 ("IB/mthca: Fix RESET to ERROR transition") added some
extra code to handle a QP state transition from RESET to ERROR.
However, the latest 1.2.1 version of the IB spec has clarified that
this transition is actually not allowed, so we can remove this extra
code again.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/core: Reset to error QP state transition is not allowed
Ralph Campbell [Tue, 15 Jul 2008 06:48:46 +0000 (23:48 -0700)]
IB/core: Reset to error QP state transition is not allowed

I was reviewing the QP state transition diagram in the IB 1.2.1 spec
and the code for qp_state_table[], and noticed that the code allows a
QP to be modified from IB_QPS_RESET to IB_QPS_ERR whereas the notes
for figure 124 (pg 457) specifically say that this transition isn't
allowed.  This is a clarification from earlier versions of the IB
spec, which were ambiguous in this area and suggested that the RESET
to ERR transition was allowed.

Fix up the qp_state_table[] to make RESET->ERR not allowed.
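
A table-driven transition check of this kind can be sketched as follows.
It is an illustrative subset with hypothetical names: only three states are
modelled, and apart from RESET->ERR being disallowed (stated above) the
entries are assumptions of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset of QP states; a full table covers more states. */
enum qp_state { QPS_RESET, QPS_INIT, QPS_ERR, QPS_COUNT };

/* valid[cur][next]: omitted entries default to false (not allowed). */
static const bool valid[QPS_COUNT][QPS_COUNT] = {
	[QPS_RESET] = { [QPS_RESET] = true, [QPS_INIT] = true },
	[QPS_INIT]  = { [QPS_RESET] = true, [QPS_INIT] = true, [QPS_ERR] = true },
	[QPS_ERR]   = { [QPS_RESET] = true, [QPS_ERR] = true },
};

/* Look up whether moving from cur to next is permitted by the table. */
bool qp_transition_ok(enum qp_state cur, enum qp_state next)
{
	assert(cur < QPS_COUNT && next < QPS_COUNT);
	return valid[cur][next];
}
```

Disallowing a transition is then a matter of leaving its table entry
unset, with no special-case code in the modify path.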

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/mlx4: Pass congestion management class MADs to the HCA
Eli Cohen [Tue, 15 Jul 2008 06:48:45 +0000 (23:48 -0700)]
IB/mlx4: Pass congestion management class MADs to the HCA

ConnectX HCAs support the IB_MGMT_CLASS_CONG_MGMT management class, so
process MADs of this class through the MAD_IFC firmware command.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/mlx4: Configure QPs' max message size based on real device capability
Eli Cohen [Tue, 15 Jul 2008 06:48:45 +0000 (23:48 -0700)]
IB/mlx4: Configure QPs' max message size based on real device capability

ConnectX returns the max message size it supports through the
QUERY_DEV_CAP firmware command.  When modifying a QP to RTR, the max
message size for the QP must be specified.  This value must not exceed
the value declared through QUERY_DEV_CAP.  The current code ignores
the max allowed size and unconditionally sets the value to 2^31.  This
patch sets all QPs to the max value allowed as returned from firmware.
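
The change can be sketched as replacing a constant exponent with one derived
from the device limit.  This assumes the QP field holds log2 of the maximum
message size; that encoding and all names below are assumptions of the
sketch, with dev_max_msg_sz standing in for the QUERY_DEV_CAP value.

```c
#include <assert.h>
#include <stdint.h>

/* Integer log2 for a non-zero value (position of the highest set bit). */
unsigned int log2_floor(uint32_t v)
{
	unsigned int n = 0;

	assert(v != 0);
	while (v >>= 1)
		n++;
	return n;
}

/* Old behaviour in this sketch: always advertise 2^31 bytes. */
unsigned int qp_msgmax_old(void)
{
	return 31;
}

/* New behaviour: derive the exponent from the device-reported limit. */
unsigned int qp_msgmax_new(uint32_t dev_max_msg_sz)
{
	return log2_floor(dev_max_msg_sz);
}
```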

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoRDMA/cxgb3: MEM_MGT_EXTENSIONS support
Steve Wise [Tue, 15 Jul 2008 06:48:45 +0000 (23:48 -0700)]
RDMA/cxgb3: MEM_MGT_EXTENSIONS support

- set IB_DEVICE_MEM_MGT_EXTENSIONS capability bit if fw supports it.
- set max_fast_reg_page_list_len device attribute.
- add iwch_alloc_fast_reg_mr function.
- add iwch_alloc_fastreg_pbl
- add iwch_free_fastreg_pbl
- adjust the WQ depth for kernel mode work queues to account for
  fastreg possibly taking 2 WR slots.
- add fastreg_mr work request support.
- add local_inv work request support.
- add send_with_inv and send_with_se_inv work request support.
- removed useless duplicate enums/defines for TPT/MW/MR stuff.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoRDMA/core: Add memory management extensions support
Steve Wise [Tue, 15 Jul 2008 06:48:45 +0000 (23:48 -0700)]
RDMA/core: Add memory management extensions support

This patch adds support for the IB "base memory management extension"
(BMME) and the equivalent iWARP operations (which the iWARP verbs
specification requires all devices to implement).  The new operations are:

 - Allocate an ib_mr for use in fast register work requests.

 - Allocate/free physical buffer lists for use in fast register work
   requests.  This allows device drivers to allocate this memory as
   needed for use in posting send requests (eg via dma_alloc_coherent).

 - New send queue work requests:
   * send with remote invalidate
   * fast register memory region
   * local invalidate memory region
   * RDMA read with invalidate local memory region (iWARP only)

Consumer interface details:

 - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
   to indicate device support for these features.

 - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
   IB_WR_RDMA_READ_WITH_INV are added.

 - A new consumer API function, ib_alloc_mr() is added to allocate
   fast register memory regions.

 - New consumer API functions, ib_alloc_fast_reg_page_list() and
   ib_free_fast_reg_page_list() are added to allocate and free
   device-specific memory for fast registration page lists.

 - A new consumer API function, ib_update_fast_reg_key(), is added to
   allow the key portion of the R_Key and L_Key of a fast registration
   MR to be updated.  Consumers call this if desired before posting
   an IB_WR_FAST_REG_MR work request.

Consumers can use this as follows:

 - MR is allocated with ib_alloc_mr().

 - Page list memory is allocated with ib_alloc_fast_reg_page_list().

 - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().

 - MR made VALID and bound to a specific page list via
   ib_post_send(IB_WR_FAST_REG_MR)

 - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
   ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
   invalidate operation.

 - MR is deallocated with ib_dereg_mr()

 - Page lists are deallocated via ib_free_fast_reg_page_list().

Applications can allocate a fast register MR once, and then can
repeatedly bind the MR to different physical block lists (PBLs) via
posting work requests to a send queue (SQ).  For each outstanding
MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
allocated (the fast_reg_page_list is owned by the low-level driver
from the consumer posting a work request until the request completes).
Thus pipelining can be achieved while still allowing device-specific
page_list processing.

The 32-bit fast register memory key/STag is composed of a 24-bit index
and an 8-bit key.  The application can change the key each time it
fast registers thus allowing more control over the peer's use of the
key/STag (ie it can effectively be changed each time the rkey is
rebound to a page list).
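
The index/key split can be sketched with plain bit operations.  The text
does not say which bits hold the key; this sketch assumes the low 8 bits,
and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Replace the low 8-bit key of a 32-bit STag/rkey, keeping the 24-bit index. */
uint32_t update_key(uint32_t stag, uint8_t newkey)
{
	return (stag & 0xffffff00u) | newkey;
}

/* Extract the 24-bit index (upper bits under the assumption above). */
uint32_t stag_index(uint32_t stag)
{
	return stag >> 8;
}

/* Extract the 8-bit key (lower bits under the assumption above). */
uint8_t stag_key(uint32_t stag)
{
	return (uint8_t)(stag & 0xffu);
}
```

Changing only the key means a peer holding the previous 32-bit value no
longer matches the MR after it is re-registered.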

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIPoIB: Copy small received SKBs in connected mode
Eli Cohen [Tue, 15 Jul 2008 06:48:44 +0000 (23:48 -0700)]
IPoIB: Copy small received SKBs in connected mode

The connected mode implementation in the IPoIB driver has a large
overhead in the way SKBs are handled in the receive flow.  It usually
allocates an SKB as big as the one used for the currently received SKB
and moves unused fragments from the old SKB to the new one.  This
involves a loop on all the remaining fragments and incurs overhead on
the CPU.  This patch, for small SKBs, allocates an SKB just large
enough to contain the received data and copies to it the data from the
received SKB.  The newly allocated SKB is passed to the stack and the
old SKB is reposted.
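
The copy path for small packets can be modelled in userspace as below.
struct rx_buf, copy_small_rx() and the COPY_THRESHOLD value are
hypothetical; the text only says "small" and does not give the cut-off.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define COPY_THRESHOLD 256   /* assumed cut-off; the text only says "small" */

/* Hypothetical receive buffer standing in for an SKB. */
struct rx_buf {
	unsigned char *data;
	size_t len;              /* bytes actually received */
};

/*
 * Returns a newly allocated, exactly sized copy for small packets so the
 * caller can repost the original ring buffer unchanged.  Returns NULL for
 * large packets (caller takes the existing path) or on allocation failure.
 */
unsigned char *copy_small_rx(const struct rx_buf *rx)
{
	unsigned char *copy;

	assert(rx != NULL);
	if (rx->len >= COPY_THRESHOLD)
		return NULL;
	copy = malloc(rx->len ? rx->len : 1);
	if (copy)
		memcpy(copy, rx->data, rx->len);
	return copy;
}
```

The trade-off is one short memcpy per small packet in exchange for
skipping the fragment-moving loop and the large replacement allocation.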

When running netperf, UDP small messages, without this patch I get:

    UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
    14.4.3.178 (14.4.3.178) port 0 AF_INET
    Socket  Message  Elapsed      Messages
    Size    Size     Time         Okay Errors   Throughput
    bytes   bytes    secs            #      #   10^6bits/sec

    114688     128   10.00     5142034      0     526.31
    114688           10.00     1130489            115.71

With this patch I get both send and receive at ~315 Mbps.

The reason that send performance actually slows down is as follows:
When using this patch, the overhead of the CPU for handling RX packets
is dramatically reduced.  As a result, we do not experience RNR NAK
messages from the receiver which cause the connection to be closed and
reopened again; when the patch is not used, the receiver cannot handle
the packets fast enough so there is less time to post new buffers and
hence the mentioned RNR NAKs.  So what happens is that the
application *thinks* it posted a certain number of packets for
transmission but these packets are flushed and do not really get
transmitted.  Since the connection gets opened and closed many times,
each time netperf gets the CPU time that otherwise would have been
given to IPoIB to actually transmit the packets.  This can be verified
when looking at the port counters -- the output of ifconfig and the
output of netperf (this is for the case without the patch):

    tx packets
    ==========
    port counter:   1,543,996
    ifconfig:       1,581,426
    netperf:        5,142,034

    rx packets
    ==========
    netperf:        1,130,489

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
17 years agoRDMA: Remove subversion $Id tags
Roland Dreier [Tue, 15 Jul 2008 06:48:44 +0000 (23:48 -0700)]
RDMA: Remove subversion $Id tags

They don't get updated by git and so they're worse than useless.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoRDMA: Improve include file coding style
Dotan Barak [Tue, 15 Jul 2008 06:48:44 +0000 (23:48 -0700)]
RDMA: Improve include file coding style

Remove subversion $Id lines and improve readability by fixing other
coding style problems pointed out by checkpatch.pl.

Signed-off-by: Dotan Barak <dotanba@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ipath: Simplify code using ARRAY_SIZE() macro
Robert P. J. Day [Tue, 15 Jul 2008 06:48:44 +0000 (23:48 -0700)]
IB/ipath: Simplify code using ARRAY_SIZE() macro

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Roland Dreier <rolandd@cisco.com>