pilppa.org Git - linux-2.6-omap-h63xx.git/log
17 years ago  Btrfs: Fix leaf reference cache miss
Yan Zheng [Thu, 9 Oct 2008 15:46:19 +0000 (11:46 -0400)]
Btrfs: Fix leaf reference cache miss

Due to the optimization for truncate, tree leaves that contain only
checksum items can be deleted without being COW'ed first. This causes
reference cache misses. The fix is to create cache entries for tree
leaves that contain only checksum items.

This patch also fixes a -EEXIST issue in shared reference cache.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
17 years ago  Btrfs: Remove offset field from struct btrfs_extent_ref
Yan Zheng [Thu, 9 Oct 2008 15:46:24 +0000 (11:46 -0400)]
Btrfs: Remove offset field from struct btrfs_extent_ref

The offset field in struct btrfs_extent_ref records the position
inside the file at which the file extent is referenced. In the new back
reference system, tree leaves holding references to a file extent
are recorded explicitly. We can scan these tree leaves very quickly, so the
offset field is not required.

This patch also makes the back reference system check the objectid
when extents are being deleted.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
17 years ago  Btrfs: Count space allocated to file in bytes
Yan Zheng [Thu, 9 Oct 2008 15:46:29 +0000 (11:46 -0400)]
Btrfs: Count space allocated to file in bytes

This patch makes btrfs count the space allocated to a file in bytes
instead of 512-byte sectors.

Everything else in btrfs uses a byte count instead of sector sizes or
block sizes, so this fits better.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
17 years ago  sched debug: add name to sched_domain sysctl entries
Ingo Molnar [Thu, 9 Oct 2008 09:35:51 +0000 (11:35 +0200)]
sched debug: add name to sched_domain sysctl entries

add /proc/sys/kernel/sched_domain/cpu0/domain0/name, to make
it easier to see which specific scheduler domain remained at
that entry.

Since we process the scheduler domain tree and
simplify it, it's not always immediately clear during debugging
which domain came from where.

depends on CONFIG_SCHED_DEBUG=y.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years ago  ARM: OMAP3: Add support for the Gumstix Overo board (rev 3)
Steve Sakoman [Thu, 9 Oct 2008 14:51:43 +0000 (17:51 +0300)]
ARM: OMAP3: Add support for the Gumstix Overo board (rev 3)

This patch adds minimal overo support.

Signed-off-by: Steve Sakoman <steve@sakoman.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
17 years ago  ARM: OMAP3: Add Beagle defconfig
Syed Mohammed, Khasim [Thu, 9 Oct 2008 14:51:42 +0000 (17:51 +0300)]
ARM: OMAP3: Add Beagle defconfig

Add Beagle defconfig

Signed-off-by: Syed Mohammed, Khasim <khasim@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
17 years ago  ARM: OMAP3: Add minimal Beagle board support
Syed Mohammed, Khasim [Thu, 9 Oct 2008 14:51:42 +0000 (17:51 +0300)]
ARM: OMAP3: Add minimal Beagle board support

Add minimal Beagle board support. Based on earlier patches
by Syed Mohammed Khasim with some fixes from linux-omap tree.

Signed-off-by: Syed Mohammed Khasim <khasim@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
17 years ago  ARM: OMAP3: Add minimal omap3430 support
Syed Mohammed, Khasim [Thu, 9 Oct 2008 14:51:41 +0000 (17:51 +0300)]
ARM: OMAP3: Add minimal omap3430 support

Add minimal omap3430 support based on earlier patches from
Syed Mohammed Khasim. Also merge in omap34xx SRAM support
from Karthik Dasu and use consistent naming for sram init
functions.

Also do following changes that make 34xx support usable:

- Remove unused sram.c functions for 34xx

- Rename IRQ_SIR_IRQ to INTCPS_SIR_IRQ and define it locally
  in entry-macro.S

- Update mach-omap2/io.c to support 2420, 2430, and 34xx

- Also merge in 34xx GPMC changes to add fields wr_access and
  wr_data_mux_bus from Adrian Hunter

- Remove memory initialization call omap2_init_memory() until
  more generic memory initialization patches are posted.
  It's OK to rely on bootloader initialization until then.

Signed-off-by: Syed Mohammed, Khasim <khasim@ti.com>
Signed-off-by: Karthik Dasu <karthik-dp@ti.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
17 years ago  ARM: OMAP2: Fix sparse, checkpatch warnings in OMAP2/3 IRQ code
Paul Walmsley [Thu, 9 Oct 2008 14:51:28 +0000 (17:51 +0300)]
ARM: OMAP2: Fix sparse, checkpatch warnings in OMAP2/3 IRQ code

Fix sparse warnings in mach-omap2/irq.c. Fix by defining
intc_bank_write_reg() and intc_bank_read_reg(), and convert INTC module
register access to use them rather than __raw_{read,write}l.

Also clear up some checkpatch warnings involving includes from asm/
rather than linux/.

Signed-off-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
17 years ago  [ARM] 5239/1: Palm Zire 72 power management support
Sergey Lapin [Fri, 29 Aug 2008 14:53:24 +0000 (15:53 +0100)]
[ARM] 5239/1: Palm Zire 72 power management support

This patch contains Palm Zire 72 power
management support.

Depends on #5238/1

Signed-off-by: Sergey Lapin <slapin@ossfans.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
17 years ago  [ARM] 5298/1: Drop desc_handle_irq()
Dmitry Baryshkov [Thu, 9 Oct 2008 12:36:24 +0000 (13:36 +0100)]
[ARM] 5298/1: Drop desc_handle_irq()

desc_handle_irq() was declared obsolete long ago.
Replace it with generic_handle_irq().

Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
17 years ago  hwmon: (abituguru3) Enable DMI probing feature on Abit AT8 32X
Alistair John Strachan [Thu, 9 Oct 2008 13:33:59 +0000 (15:33 +0200)]
hwmon: (abituguru3) Enable DMI probing feature on Abit AT8 32X

Enable driver checking of the DMI product name (when enabled) on
an Abit AT8 32X, instead of falling back to a manual probe. This
eliminates false negatives and eventually will help avoid
unnecessary bus probes on unsupported mainboards.

Signed-off-by: Alistair John Strachan <alistair@devzero.co.uk>
Tested-by: Daniel Exner <dex@dragonslave.de>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
17 years ago  hwmon: (abituguru3) Enable reading from AUX3 fan on Abit AT8 32X
Alistair John Strachan [Thu, 9 Oct 2008 13:33:59 +0000 (15:33 +0200)]
hwmon: (abituguru3) Enable reading from AUX3 fan on Abit AT8 32X

The table for the Abit AT8 32X was incorrectly missing an entry
for the sixth ("AUX3") fan. Add this entry, exporting the fan
reading to userspace.

Closes lm-sensors.org ticket #2339.

Signed-off-by: Alistair John Strachan <alistair@devzero.co.uk>
Tested-by: Daniel Exner <dex@dragonslave.de>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
17 years ago  hwmon: (adt7473) Fix some bogosity in documentation file
Darrick J. Wong [Thu, 9 Oct 2008 13:33:58 +0000 (15:33 +0200)]
hwmon: (adt7473) Fix some bogosity in documentation file

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
17 years ago  hwmon: Define sysfs interface for energy consumption register
Darrick J. Wong [Thu, 9 Oct 2008 13:33:58 +0000 (15:33 +0200)]
hwmon: Define sysfs interface for energy consumption register

Describe the sysfs files that were introduced in the ibmaem driver.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
17 years ago  hwmon: (it87) Prevent power-off on Shuttle SN68PT
Jean Delvare [Thu, 9 Oct 2008 13:33:58 +0000 (15:33 +0200)]
hwmon: (it87) Prevent power-off on Shuttle SN68PT

On the Shuttle SN68PT, FAN_CTL2 is apparently not connected to a fan,
but to something else. One user has reported instant system power-off
when changing the PWM2 duty cycle, so we disable it.

I use the board name string as the trigger in case the same board is
ever used in other systems.

This closes lm-sensors ticket #2349:
pwmconfig causes a hard poweroff
http://www.lm-sensors.org/ticket/2349

Signed-off-by: Jean Delvare <khali@linux-fr.org>
17 years ago  eeepc-laptop: Fix hwmon interface
Corentin Chary [Thu, 9 Oct 2008 13:33:57 +0000 (15:33 +0200)]
eeepc-laptop: Fix hwmon interface

Create a name file in the sysfs directory, which
is needed for the libsensors library to work.
Also rename fan1_pwm to pwm1 and scale its value as needed.

This fixes bug #11520:
http://bugzilla.kernel.org/show_bug.cgi?id=11520

Signed-off-by: Corentin Chary <corentincj@iksaif.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
17 years ago  UBI: print reserved_peb when it is too large
Deepak Saxena [Wed, 8 Oct 2008 19:56:24 +0000 (12:56 -0700)]
UBI: print reserved_peb when it is too large

This patch makes debugging a misconfigured UBI a bit easier
by providing the needed information in the boot log.

Signed-off-by: Deepak Saxena <dsaxena@laptop.org>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
17 years ago  Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux...
Ingo Molnar [Thu, 9 Oct 2008 12:33:00 +0000 (14:33 +0200)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-hrtimer into timers/range-hrtimers

17 years ago  xen: use spin_lock_nest_lock when pinning a pagetable
Jeremy Fitzhardinge [Wed, 8 Oct 2008 20:01:39 +0000 (13:01 -0700)]
xen: use spin_lock_nest_lock when pinning a pagetable

When pinning/unpinning a pagetable with split pte locks, we can end up
holding multiple pte locks at once (we need to hold the locks while
there's a pending batched hypercall affecting the pte page).  Because
all the pte locks are in the same lock class, lockdep thinks that
we're potentially taking a lock recursively.

This warning is spurious because we always take the pte locks while
holding mm->page_table_lock.  lockdep now has spin_lock_nest_lock to
express this kind of dominant lock use, so use it here so that lockdep
knows what's going on.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years ago  Blackfin arch: flash memory map and dm9000 resources updating
Javier Herrero [Thu, 9 Oct 2008 10:06:47 +0000 (18:06 +0800)]
Blackfin arch: flash memory map and dm9000 resources updating

Signed-off-by: Javier Herrero <jherrero@hvsistemas.es>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
17 years ago  Blackfin arch: early printk code still uses uart core console functions to parse and...
Sonic Zhang [Thu, 9 Oct 2008 09:39:37 +0000 (17:39 +0800)]
Blackfin arch: early printk code still uses uart core console functions to parse and set configure option string

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
17 years ago  Blackfin arch: Move all the silicon rev handling to one place
Mike Frysinger [Thu, 9 Oct 2008 09:32:28 +0000 (17:32 +0800)]
Blackfin arch: Move all the silicon rev handling to one place

Move all the silicon rev handling to one place (Kconfig) and
make sure we warn if you are running on silicon that has not been tested.

Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
17 years ago  Blackfin arch: fix end address for parallel flash and increase kernel partition size...
Mike Frysinger [Thu, 9 Oct 2008 09:28:36 +0000 (17:28 +0800)]
Blackfin arch: fix end address for parallel flash and increase kernel partition size to 4meg

Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
17 years ago  [ARM] 5297/1: [KS8695] Fix two compile-time warnings
Andrew Victor [Tue, 7 Oct 2008 19:44:07 +0000 (20:44 +0100)]
[ARM] 5297/1: [KS8695] Fix two compile-time warnings

Fix two warnings when compiling for the KS8695 processor.

arch/arm/include/asm/dma-mapping.h: In function 'dma_to_virt':
arch/arm/include/asm/dma-mapping.h:40: warning: return makes pointer
from integer without a cast

Section mismatch in reference from the function pcibios_fixup_bus() to
the (unknown reference) .devinit.text:(unknown)
The function pcibios_fixup_bus() references
the (unknown reference) __devinit (unknown).
This is often because pcibios_fixup_bus lacks a __devinit
annotation or the annotation of (unknown) is wrong.

Signed-off-by: Andrew Victor <linux@maxim.org.za>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
17 years ago  [ARM] 5296/1: [KS8695] Replace macros with trailing underscores.
Andrew Victor [Tue, 7 Oct 2008 19:20:15 +0000 (20:20 +0100)]
[ARM] 5296/1: [KS8695] Replace macros with trailing underscores.

Replace macro names that have trailing underscores.
Also use the IOPD() macro instead of a hard-coded bit-shift (for
better readability).

Signed-off-by: Andrew Victor <linux@maxim.org.za>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
17 years ago  Blackfin arch: avoid using actual config name in comment
Mike Frysinger [Thu, 9 Oct 2008 09:13:39 +0000 (17:13 +0800)]
Blackfin arch: avoid using actual config name in comment

Avoid using the actual config name in a comment, because a text search
is done to see which files need to be rebuilt.

Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
17 years ago  Blackfin arch: Fix bug - HW Errors never recover on BF548
Robin Getz [Thu, 9 Oct 2008 09:06:32 +0000 (17:06 +0800)]
Blackfin arch: Fix bug - HW Errors never recover on BF548

The kernel does not properly clear the EBIU Error Master (EBIU_ERRMST) Register
on BF548, which causes the kernel to panic.

We need to make sure that we clear the EBIU_ERRMST (necessary on BF54x)

Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
17 years ago  block_dev: fix kernel-doc in new functions
Randy Dunlap [Thu, 9 Oct 2008 08:42:38 +0000 (10:42 +0200)]
block_dev: fix kernel-doc in new functions

Fix kernel-doc in new functions:

Error(mmotm-2008-1002-1617//fs/block_dev.c:895): duplicate section name 'Description'
Error(mmotm-2008-1002-1617//fs/block_dev.c:924): duplicate section name 'Description'
Warning(mmotm-2008-1002-1617//fs/block_dev.c:1282): No description found for parameter 'pathname'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
cc: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  Blackfin arch: correct icache size in show_cpuinfo(), let c_start() return proper...
Graf Yang [Thu, 9 Oct 2008 07:37:47 +0000 (15:37 +0800)]
Blackfin arch: correct icache size in show_cpuinfo(), let c_start() return proper pointer

Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
17 years ago  Blackfin arch: give sys_strace proper entry markings
Mike Frysinger [Thu, 9 Oct 2008 07:32:18 +0000 (15:32 +0800)]
Blackfin arch: give sys_strace proper entry markings

A global _sys_trace causes the assembler to fail; this should first be fixed on the toolchain side.

Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
17 years ago  Blackfin arch: ptrace - make sure PT_ORIG_R0 and PT_ORIG_P0 offsets are declared
Mike Frysinger [Thu, 9 Oct 2008 07:22:56 +0000 (15:22 +0800)]
Blackfin arch: ptrace - make sure PT_ORIG_R0 and PT_ORIG_P0 offsets are declared

Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
17 years ago  Blackfin arch: use existing ptrace_disable() func to clear TRACE_BITS and create...
Mike Frysinger [Thu, 9 Oct 2008 07:21:05 +0000 (15:21 +0800)]
Blackfin arch: use existing ptrace_disable() func to clear TRACE_BITS and create the opposite ptrace_enable()

Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
17 years ago  Blackfin arch: ptrace - cleanup debug messages and style
Mike Frysinger [Thu, 9 Oct 2008 07:19:50 +0000 (15:19 +0800)]
Blackfin arch: ptrace - cleanup debug messages and style

Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
17 years ago  Blackfin arch: fix bug -- PTRACE_PEEKDATA does not seem to work which breaks umoven...
Mike Frysinger [Thu, 9 Oct 2008 07:17:36 +0000 (15:17 +0800)]
Blackfin arch: fix bug -- PTRACE_PEEKDATA does not seem to work which breaks umoven() in strace

Don't add an arbitrary offset when peeking at data.

Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
17 years ago  block: add some comments around the bio read-write flags
Jens Axboe [Thu, 9 Oct 2008 07:01:10 +0000 (09:01 +0200)]
block: add some comments around the bio read-write flags

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  block: mark bio_split_pool static
Denis ChengRq [Thu, 9 Oct 2008 06:57:05 +0000 (08:57 +0200)]
block: mark bio_split_pool static

Since all bio_split calls refer to the same single bio_split_pool, the bio_split
function can use bio_split_pool directly instead of the mempool_t parameter.

The mempool_t parameter can then be removed from the bio_split parameter list, and
since bio_split_pool is only referenced in fs/bio.c, it can be marked static.

Signed-off-by: Denis ChengRq <crquan@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  block: Find bio sector offset given idx and offset
Martin K. Petersen [Thu, 2 Oct 2008 02:42:53 +0000 (22:42 -0400)]
block: Find bio sector offset given idx and offset

Helper function to find the sector offset in a bio given bvec index
and page offset.
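
A simplified userspace sketch of such a helper follows. The struct and
function names are illustrative stand-ins, not the kernel's, and the sector
size is passed in explicitly as an assumption of this sketch. Segments before
the given index contribute their full length; the target segment contributes
only the bytes that precede the page offset.

```c
#include <stddef.h>

/* Simplified stand-in for a bio vector entry; not the kernel's struct. */
struct vec {
	unsigned int len;      /* bytes covered by this segment */
	unsigned int offset;   /* byte offset of the segment within its page */
};

/*
 * Sectors from the start of the I/O to byte `offset` of the page used by
 * segment `index`. Earlier segments count in full; the target segment
 * counts only the part that lies before `offset`.
 */
unsigned long sector_offset(const struct vec *v, size_t count, size_t index,
			    unsigned int offset, unsigned int sector_size)
{
	unsigned long sectors = 0;

	for (size_t i = 0; i < count; i++) {
		if (i == index) {
			if (offset > v[i].offset)
				sectors += (offset - v[i].offset) / sector_size;
			break;
		}
		sectors += v[i].len / sector_size;
	}
	return sectors;
}
```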

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  block: gendisk integrity wrapper
Martin K. Petersen [Thu, 2 Oct 2008 16:47:49 +0000 (18:47 +0200)]
block: gendisk integrity wrapper

This is a wrapper for accessing a gendisk's integrity bits.  It allows
the integrity support in MD to be compiled with BLK_DEV_INTEGRITY off.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  block: Switch blk_integrity_compare from bdev to gendisk
Martin K. Petersen [Wed, 1 Oct 2008 07:38:39 +0000 (03:38 -0400)]
block: Switch blk_integrity_compare from bdev to gendisk

The DM and MD integrity support now depends on being able to use
gendisks instead of block_devices when comparing integrity profiles.
Change function parameters accordingly.

Also update comparison logic so that two NULL profiles are a valid
configuration.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  block: Fix double put in blk_integrity_unregister
Martin K. Petersen [Wed, 1 Oct 2008 07:38:38 +0000 (03:38 -0400)]
block: Fix double put in blk_integrity_unregister

- kobject_del already puts the parent.

- Set integrity profile to NULL to prevent stale data.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  block: Introduce integrity data ownership flag
Martin K. Petersen [Wed, 1 Oct 2008 07:38:37 +0000 (03:38 -0400)]
block: Introduce integrity data ownership flag

A filesystem might supply its own integrity metadata.  Introduce a
flag that indicates whether the filesystem or the block layer owns the
integrity buffer.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  block: revert part of d7533ad0e132f92e75c1b2eb7c26387b25a583c1
Jens Axboe [Thu, 2 Oct 2008 10:53:22 +0000 (12:53 +0200)]
block: revert part of d7533ad0e132f92e75c1b2eb7c26387b25a583c1

We need bdev_get_integrity() to support the pending md/dm patches.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  bio.h: Remove unused conditional code
Alberto Bertogli [Thu, 2 Oct 2008 10:46:53 +0000 (12:46 +0200)]
bio.h: Remove unused conditional code

The whole bio_integrity() definition is inside an #ifdef
CONFIG_BLK_DEV_INTEGRITY, so there's no need for the conditional code.

Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  block: remove end_{queued|dequeued}_request()
Kiyoshi Ueda [Wed, 1 Oct 2008 14:14:46 +0000 (10:14 -0400)]
block: remove end_{queued|dequeued}_request()

This patch removes end_queued_request() and end_dequeued_request(),
which are no longer used.

As a result, end_request() is the only remaining user of __end_request().
So the actual code in __end_request() is moved to end_request()
and __end_request() is removed.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  block: change elevator to use __blk_end_request()
Kiyoshi Ueda [Wed, 1 Oct 2008 14:13:44 +0000 (10:13 -0400)]
block: change elevator to use __blk_end_request()

This patch converts elevator to use __blk_end_request() directly
so that end_{queued|dequeued}_request() can be removed.
The related 'uptodate' arguments are converted to 'error'.
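
A minimal sketch of what such a conversion looks like, assuming the usual
convention that 'uptodate' is 1 for success, 0 for a generic I/O error and a
negative errno for a specific error. The helper name is hypothetical and is
not part of the patch.

```c
#include <errno.h>

/*
 * Map an 'uptodate' style status to an 'error' style status.
 * Assumed convention: uptodate > 0 is success, 0 is a generic I/O error,
 * and a negative value already is the error code.
 */
int uptodate_to_error(int uptodate)
{
	if (uptodate > 0)
		return 0;
	return uptodate ? uptodate : -EIO;
}
```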

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  gdrom: change to use __blk_end_request()
Kiyoshi Ueda [Wed, 1 Oct 2008 14:13:02 +0000 (10:13 -0400)]
gdrom: change to use __blk_end_request()

This patch converts gdrom to use __blk_end_request() directly
so that end_{queued|dequeued}_request() can be removed.

gd.transfer is '1' in error cases and '0' in non-error cases,
so gdrom hasn't been propagating any error code to the block layer.
We can just convert error cases to '-EIO'.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  memstick: change to use __blk_end_request()
Kiyoshi Ueda [Wed, 1 Oct 2008 14:12:15 +0000 (10:12 -0400)]
memstick: change to use __blk_end_request()

This patch converts memstick to use __blk_end_request() directly
so that end_{queued|dequeued}_request() can be removed.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  virtio_blk: change to use __blk_end_request()
Kiyoshi Ueda [Wed, 1 Oct 2008 14:11:20 +0000 (10:11 -0400)]
virtio_blk: change to use __blk_end_request()

This patch converts virtio_blk to use __blk_end_request() directly
so that end_{queued|dequeued}_request() can be removed.
The related 'uptodate' argument is converted to 'error'.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  blktrace: use BLKTRACE_BDEV_SIZE as the name size for setup structure
Jens Axboe [Wed, 1 Oct 2008 14:16:25 +0000 (16:16 +0200)]
blktrace: use BLKTRACE_BDEV_SIZE as the name size for setup structure

Define as 32, which is what BDEVNAME_SIZE is/was as well. This keeps
the user interface the same and gets rid of the difference between
kernel and user api here.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  block: add lld busy state exporting interface
Kiyoshi Ueda [Wed, 1 Oct 2008 14:12:15 +0000 (16:12 +0200)]
block: add lld busy state exporting interface

This patch adds a new interface, blk_lld_busy(), to check the lld's
busy state from the block layer.
blk_lld_busy() calls down into low-level drivers for the check
if the drivers set q->lld_busy_fn() using blk_queue_lld_busy().

This resolves the performance problem on request stacking devices
described below.

Some drivers, like the scsi mid layer, stop dispatching requests when
they detect a busy state on their low-level device (host/target/device).
This allows other requests to stay in the I/O scheduler's queue
for a chance of merging.

Request stacking drivers like request-based dm should follow
the same logic.
However, there is no generic interface for the stacked device
to check if the underlying device(s) are busy.
If the request stacking driver dispatches and submits requests to
the busy underlying device, the requests will stay in
the underlying device's queue without a chance of merging.
This causes a performance problem on burst I/O loads.

With this patch, busy state of the underlying device is exported
via q->lld_busy_fn().  So the request stacking driver can check it
and stop dispatching requests if busy.

The underlying device driver must return the busy state appropriately:
    1: when the device driver can't process requests immediately.
    0: when the device driver can process requests immediately,
       including abnormal situations where the device driver needs
       to kill all requests.
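
The callback contract above can be modelled in userspace roughly as follows.
This is only a sketch: the struct, its fields and the function names are
stand-ins invented for illustration, not the kernel's definitions. A queue
without a callback is treated as never busy, and a stacking driver dispatches
only while the lower queue reports 0.

```c
#include <stddef.h>

struct queue;
typedef int (*lld_busy_fn)(struct queue *q);

/* Minimal stand-in for a request queue; not the kernel's struct. */
struct queue {
	lld_busy_fn busy_fn;   /* NULL if the driver exports no busy state */
	int in_flight;         /* requests currently owned by the device */
	int max_in_flight;     /* hypothetical device limit */
};

/* Example driver callback: 1 = cannot take requests now, 0 = can. */
int example_lld_busy(struct queue *q)
{
	return q->in_flight >= q->max_in_flight;
}

/* Generic check: a queue without a callback is treated as not busy. */
int queue_lld_busy(struct queue *q)
{
	return q->busy_fn ? q->busy_fn(q) : 0;
}

/* A stacking driver would dispatch only while the lower queue is not busy. */
int should_dispatch(struct queue *lower)
{
	return !queue_lld_busy(lower);
}
```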

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  block: Fix blk_start_queueing() to not kick a stopped queue
Elias Oltmanns [Wed, 1 Oct 2008 14:02:33 +0000 (16:02 +0200)]
block: Fix blk_start_queueing() to not kick a stopped queue

blk_start_queueing() should act like the generic queue unplugging
and kicking and ignore a stopped queue. Such a queue may not be
run until after a call to blk_start_queue().
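
The intended behaviour can be sketched with a toy queue model. All names and
fields below are hypothetical and only illustrate the rule that a kick is
ignored while the queue is stopped, and that only an explicit start makes the
queue runnable again.

```c
#include <stdbool.h>

/* Minimal stand-in for a request queue; not the kernel's struct. */
struct queue {
	bool stopped;      /* set while the driver has stopped the queue */
	int pending;       /* requests waiting to be dispatched */
	int dispatched;    /* requests handed to the driver */
};

/* Hand every pending request to the driver. */
void run_queue(struct queue *q)
{
	q->dispatched += q->pending;
	q->pending = 0;
}

/* Kick the queue, but leave a stopped queue alone. */
void start_queueing(struct queue *q)
{
	if (q->stopped)
		return;
	run_queue(q);
}

/* Restart a stopped queue; only this makes it runnable again. */
void start_queue(struct queue *q)
{
	q->stopped = false;
	run_queue(q);
}
```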

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  include blktrace_api.h in headers_install
Sven Schuetz [Fri, 26 Sep 2008 08:58:02 +0000 (10:58 +0200)]
include blktrace_api.h in headers_install

This header file is of interest for user space programming, i.e.
for tools that process blktrace data.

We would like to use it for a tool on top of blktrace which processes
data provided by blktrace. For this purpose, it would be helpful
if the blktrace API made it to /usr/include/linux.

The git tree for the blktrace tools comes with its own copy of this header
file. I didn't manage to replace that copy with the file generated
by the patch below yet. A few more cleanups would be needed.
For example, the blktrace ioctl numbers, which are currently defined in
usr/include/fs.h, might need to be moved. Should be feasible, though.

Signed-off-by: Sven Schuetz <sven@linux.vnet.ibm.com>
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  block: reserve some tags just for sync IO
Jens Axboe [Thu, 25 Sep 2008 09:42:41 +0000 (11:42 +0200)]
block: reserve some tags just for sync IO

By only allowing async IO to consume 3/4 of the tag depth, we
always have slots free to serve sync IO. This is important to avoid
having writes fill the entire tag queue, thus starving reads.
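
As an illustration of the idea only, the sketch below models a tag map in
userspace. The depth, struct and function names are assumptions made for this
example and do not come from the patch. Async requests search only the first
three quarters of the tag space, so the remaining tags can only be taken by
sync requests.

```c
#include <stdbool.h>

#define TAG_DEPTH 8   /* hypothetical queue depth for this sketch */

/* Tag map: in_use[i] is true while tag i is assigned to a request. */
struct tag_map {
	bool in_use[TAG_DEPTH];
};

/*
 * Return a free tag, or -1 if none is available to this kind of request.
 * Async requests may only search the first 3/4 of the tag space, so the
 * remaining tags can only ever be taken by sync requests.
 */
int alloc_tag(struct tag_map *map, bool is_sync)
{
	int limit = is_sync ? TAG_DEPTH : 3 * TAG_DEPTH / 4;

	for (int tag = 0; tag < limit; tag++) {
		if (!map->in_use[tag]) {
			map->in_use[tag] = true;
			return tag;
		}
	}
	return -1;
}
```

With a depth of 8, async requests can hold at most 6 tags, which leaves 2 for
sync requests even when writes are flooding the queue.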

Original patch and idea from Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  block: as/cfq ssd idle check update
Jens Axboe [Thu, 25 Sep 2008 09:37:50 +0000 (11:37 +0200)]
block: as/cfq ssd idle check update

We really need to know about the hardware tagging support as well,
since if the SSD does not do tagging then we still want to idle.
Otherwise we have the same dependent sync IO vs flooding async IO
problem as on rotational media.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  libata: set queue SSD flag for SSD devices
Jens Axboe [Wed, 24 Sep 2008 11:05:10 +0000 (13:05 +0200)]
libata: set queue SSD flag for SSD devices

SSD devices should give an RPM setting of 1 in word 217 of the ID
page. If we see such a device, tell the block layer about it.
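
A small userspace sketch of that check, assuming the ID page is available as
an array of 256 16-bit words. The constant, flag and function names here are
made up for illustration and are not taken from the patch.

```c
#include <stdint.h>

#define ID_WORDS        256   /* an ATA IDENTIFY page is 256 16-bit words */
#define ID_ROT_SPEED    217   /* word holding the RPM setting */

/* Hypothetical queue flag bit for this sketch. */
#define QFLAG_NONROT    (1u << 0)

/* Per the commit message, an RPM setting of 1 marks an SSD. */
int id_is_ssd(const uint16_t *id)
{
	return id[ID_ROT_SPEED] == 1;
}

/* Return the queue flags with the non-rotational bit set for SSDs. */
unsigned int update_queue_flags(unsigned int flags, const uint16_t *id)
{
	if (id_is_ssd(id))
		flags |= QFLAG_NONROT;
	return flags;
}
```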

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  block: add queue flag for SSD/non-rotational devices
Jens Axboe [Wed, 24 Sep 2008 11:03:33 +0000 (13:03 +0200)]
block: add queue flag for SSD/non-rotational devices

We don't want to idle in AS/CFQ if the device doesn't have a seek
penalty. So add a QUEUE_FLAG_NONROT to indicate a non-rotational
device. Low-level drivers should set this flag upon discovery of
an SSD or similar device type.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  floppy: support arbitrary first-sector numbers
Keith Wansbrough [Mon, 22 Sep 2008 21:57:17 +0000 (14:57 -0700)]
floppy: support arbitrary first-sector numbers

The current floppy_struct allows floppies to number sectors starting
from 0 or 1.  This patch allows arbitrary first-sector numbers - for
example, 0xC1 for Amstrad CPC disks.

This extends the existing 1-bit field (FD_ZEROBASED, bit 2 of stretch)
to 8 bits (FD_SECTMASK, bits 2 to 9).

Currently 0x00 denotes a first sector number of 1, and 0x01 denotes a
first sector number of 0.  We extend this by interpreting FD_SECTMASK
as the first sector number with the LSB flipped.
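
As a rough userspace sketch of this encoding (the macro and function names
below are illustrative and not necessarily those used in the patch), the
first sector number is stored in bits 2 to 9 of stretch with its least
significant bit flipped, so the legacy values 0x00 and 0x01 keep meaning
first sectors 1 and 0:

```c
/* Illustrative names; the commit message calls the field FD_SECTMASK. */
#define SECT_SHIFT 2
#define SECT_MASK  (0xFFu << SECT_SHIFT)   /* bits 2 to 9 of stretch */

/* Decode the first sector number from a stretch value. */
unsigned int first_sector(unsigned int stretch)
{
	return ((stretch & SECT_MASK) >> SECT_SHIFT) ^ 1u;
}

/* Encode a first sector number (0..255) into the stretch field bits. */
unsigned int encode_first_sector(unsigned int sector)
{
	return ((sector ^ 1u) << SECT_SHIFT) & SECT_MASK;
}
```

For the Amstrad CPC case, a first sector of 0xC1 is stored as 0xC0 in the
field and decodes back to 0xC1.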

Signed-off-by: Keith Wansbrough <keith@lochan.org>
Cc: Alain Knaff <alain@linux.lu>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Karel Zak <kzak@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  drivers/block: Use DIV_ROUND_UP
Julia Lawall [Mon, 22 Sep 2008 21:57:16 +0000 (14:57 -0700)]
drivers/block: Use DIV_ROUND_UP

The kernel.h macro DIV_ROUND_UP performs the computation (((n) + (d) - 1) /
(d)) but is perhaps more readable.

An extract of the semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@haskernel@
@@

#include <linux/kernel.h>

@depends on haskernel@
expression n,d;
@@

(
- (n + d - 1) / d
+ DIV_ROUND_UP(n,d)
|
- (n + (d - 1)) / d
+ DIV_ROUND_UP(n,d)
)

@depends on haskernel@
expression n,d;
@@

- DIV_ROUND_UP((n),d)
+ DIV_ROUND_UP(n,d)

@depends on haskernel@
expression n,d;
@@

- DIV_ROUND_UP(n,(d))
+ DIV_ROUND_UP(n,d)
// </smpl>
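
For reference, a small standalone illustration of the computation quoted
above. The macro body is the expression given in the commit message; the
helper function and its 512-byte sector size are assumptions made for the
example.

```c
/* Same computation as the kernel.h macro quoted in the commit message. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of 512-byte sectors needed to hold nbytes (hypothetical helper). */
unsigned long sectors_for_bytes(unsigned long nbytes)
{
	return DIV_ROUND_UP(nbytes, 512UL);
}
```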

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  cciss: Fix cciss SCSI rescan code to better notice device changes
scameron@beardog.cca.cpqcorp.net [Sat, 20 Sep 2008 01:27:47 +0000 (18:27 -0700)]
cciss: Fix cciss SCSI rescan code to better notice device changes

Fix cciss SCSI rescan code to better notice device changes.
If you hot-unplug a tape drive, then hot-plug a different
tape drive into the same slot in a storage enclosure,
the cciss driver wouldn't notice anything had changed, as
it was only looking at the LUN address and device type.
Now it looks at the inquiry page 0x83 device identifier,
and vendor and model strings as well.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  fix an example of scatterlists handling in DMA-API.txt
FUJITA Tomonori [Thu, 18 Sep 2008 16:35:28 +0000 (09:35 -0700)]
fix an example of scatterlists handling in DMA-API.txt

This example isn't the proper way to handle scatterlists (can't handle
sg chaining).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years ago  block: add a queue flag for request stacking support
Kiyoshi Ueda [Thu, 18 Sep 2008 14:46:13 +0000 (10:46 -0400)]
block: add a queue flag for request stacking support

This patch adds a queue flag to indicate the block device can be
used for request stacking.

Request stacking drivers need to stack their devices only on top of
devices whose q->request_fn is functional.
Since bio stacking drivers (e.g. md, loop) basically initialize
their queue using blk_alloc_queue() and don't set q->request_fn,
checking for (q->request_fn == NULL) looks sufficient for that purpose.

However, dm will become both types of stacking driver (bio-based and
request-based), and dm will always set q->request_fn even if the dm
device is bio-based, in which case q->request_fn is not actually functional.
So we need something else to distinguish the type of the device.
Adding a queue flag is a solution for that.

The reason why dm always sets q->request_fn is to keep
the compatibility of dm user-space tools.
Currently, all dm user-space tools are using bio-based dm without
specifying the type of the dm device they use.
To use request-based dm without changing such tools, the kernel
must decide the type of the dm device automatically.
The automatic type decision can't be done at the device creation time
and needs to be deferred until such tools load a mapping table,
since the actual type is decided by dm target type included in
the mapping table.

So a dm device has to be initialized using blk_init_queue()
so that we can load either type of table.
As a result, all queue members are set (e.g. q->request_fn) and nothing
is left to tell whether the device is bio-based or request-based,
even after a table is loaded and the type of the device is decided.

By the way, some parts of the queue (e.g. request_list, elevator)
are not needed when the dm device is used as bio-based.
But the memory cost is not large (about 20 KB per queue on ia64),
so I hope the overhead is acceptable for bio-based dm users.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoblock: add request submission interface
Kiyoshi Ueda [Thu, 18 Sep 2008 14:45:38 +0000 (10:45 -0400)]
block: add request submission interface

This patch adds blk_insert_cloned_request(), a generic request
submission interface for request stacking drivers.
Request-based dm will use it to submit their clones to underlying
devices.

blk_rq_check_limits() is also added because it is possible that
the lower queue has stronger limitations than the upper queue
if multiple drivers are stacking at request-level.
Not only for blk_insert_cloned_request()'s internal use, the function
will be used by request-based dm when the queue limitation is
modified (e.g. by replacing dm's table).

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
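
Note on the limit check above: a rough user-space sketch of why a cloned
request has to be re-checked against the lower queue. The structs, the
two limits chosen and the function name are hypothetical; the real
blk_rq_check_limits() works on struct request_queue and struct request
and may check more than this.

```c
#include <stdbool.h>

/* Hypothetical subset of a queue's limits. */
struct model_limits {
    unsigned int max_sectors;   /* largest request size, in sectors */
    unsigned int max_segments;  /* most scatter/gather segments allowed */
};

/* Hypothetical subset of a request's properties. */
struct model_request {
    unsigned int nr_sectors;
    unsigned int nr_segments;
};

/* True when the request fits within the lower queue's limits. */
bool request_fits(const struct model_request *rq,
                  const struct model_limits *lower)
{
    return rq->nr_sectors <= lower->max_sectors &&
           rq->nr_segments <= lower->max_segments;
}
```

A request built against a permissive upper queue can fail this check on
a stricter lower queue, which is the situation the commit message
describes for drivers stacked at request level.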
17 years agoblock: add request update interface
Kiyoshi Ueda [Thu, 18 Sep 2008 14:45:09 +0000 (10:45 -0400)]
block: add request update interface

This patch adds blk_update_request(), which updates struct request
with completing its data part, but doesn't complete the struct
request itself.
Though it looks like end_that_request_first() of older kernels,
blk_update_request() should be used only by request stacking drivers.

Request-based dm will use it in bio->bi_end_io callback to update
the original request when a data part of a cloned request completes.
The following is additional background on why request-based
dm needs this interface.

  - Request stacking drivers can't use blk_end_request() directly from
    the lower driver's completion context (bio->bi_end_io or rq->end_io),
    because some device drivers (e.g. ide) may try to complete
    their request with queue lock held, and it may cause deadlock.
    See below for detailed description of possible deadlock:
    <http://marc.info/?l=linux-kernel&m=120311479108569&w=2>

  - To solve that, request-based dm offloads the completion of
    cloned struct request to softirq context (i.e. using
    blk_complete_request() from rq->end_io).

  - Though it is possible to use the same solution from bio->bi_end_io,
    it will delay the notification of bio completion to the original
    submitter.  Also, it will cause inefficient partial completion,
    because the lower driver can't perform the cloned request anymore
    and request-based dm needs to requeue and redispatch it to
    the lower driver again later.  That's not good.

  - So request-based dm needs blk_update_request() to perform the bio
    completion in the lower driver's completion context, which is more
    efficient.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
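
Note on the update interface above: a small user-space model of the
distinction between completing part of a request's data and completing
the request itself. All names here are hypothetical and the return
convention is an assumption of this sketch; it does not reproduce the
signature of blk_update_request().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request with outstanding data. */
struct model_request {
    size_t bytes_left;  /* data not yet completed */
    bool   completed;   /* set only by whoever owns the request */
};

/*
 * Account for nbytes of finished data. The request itself is never
 * completed here; the function only reports whether data remains.
 */
bool model_update_request(struct model_request *rq, size_t nbytes)
{
    if (nbytes > rq->bytes_left)
        nbytes = rq->bytes_left;
    rq->bytes_left -= nbytes;
    return rq->bytes_left != 0;
}
```

In this model the caller can report progress on the original request
from the lower driver's completion context, while the decision to
complete the request stays with its owner.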
17 years agoblock: blk_cleanup_queue() should call blk_sync_queue()
Jens Axboe [Thu, 18 Sep 2008 16:22:54 +0000 (09:22 -0700)]
block: blk_cleanup_queue() should call blk_sync_queue()

When a driver calls blk_cleanup_queue(), the device should be fully idle.
However, the block layer may have pending plugging timers and the IO
schedulers may have pending work in the work queues. So quiesce the device
by waiting for the timer and flushing the work queues.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoblock: Expand Xen blkfront for > 16 xvd
Chris Lalancette [Wed, 17 Sep 2008 21:30:32 +0000 (14:30 -0700)]
block: Expand Xen blkfront for > 16 xvd

Until recently, the maximum number of xvd block devices you could attach
to a Xen domU was 16. This limitation turned out to be problematic for
some users, so it was expanded to handle a much larger number of disks.
However, this requires a couple of changes in the way that blkfront
scans for disks. This functionality is already present in the Xen
linux-2.6.18-xen.hg tree; the attached patch adds this functionality to
the mainline xen-blkfront implementation. I successfully tested it on a
2.6.25 tree, and build tested it on 2.6.27-rc3.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoblock: cleanup some of the integrity stuff in blkdev.h
Jens Axboe [Thu, 18 Sep 2008 16:31:53 +0000 (09:31 -0700)]
block: cleanup some of the integrity stuff in blkdev.h

Don't put functions that are only used in fs/bio-integrity.c in
blkdev.h, it's much cleaner to just keep it in there. Also kill
completely unused bdev_get_tag_size()

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoblock: use rq complete marking in blk_abort_request()
Jens Axboe [Tue, 16 Sep 2008 16:54:11 +0000 (09:54 -0700)]
block: use rq complete marking in blk_abort_request()

We cannot abort a request if we raced with the timeout handler already,
or with the IO completion. So make blk_abort_request() mark the request
as complete, and only continue if we succeeded.

Found and suggested by Mike Anderson <andmike@linux.vnet.ibm.com>

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
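
Note on the abort race above: a user-space sketch of the "mark complete
first, continue only if we won" pattern, using a C11 atomic flag in
place of the kernel's own primitives. The struct and function names are
hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request carrying only the completion marker. */
struct model_request {
    atomic_flag complete;  /* set by whichever path claims the request */
};

/* Returns true only for the first caller to claim the request. */
bool mark_complete(struct model_request *rq)
{
    return !atomic_flag_test_and_set(&rq->complete);
}

/* Abort proceeds only if it claimed the request before anyone else. */
bool abort_request(struct model_request *rq)
{
    if (!mark_complete(rq))
        return false;  /* completion or timeout handling got there first */
    /* ... the abort handling itself would run here ... */
    return true;
}
```

Because the test-and-set is atomic, at most one of the abort, timeout
and completion paths can observe the flag as previously clear.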
17 years agoblock: add fault injection mechanism for faking request timeouts
Jens Axboe [Sun, 14 Sep 2008 12:56:33 +0000 (05:56 -0700)]
block: add fault injection mechanism for faking request timeouts

Only works for the generic request timer handling. Allows one to
sporadically ignore request completions, thus exercising the timeout
handling.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoblock: add bio_kmalloc()
Jens Axboe [Thu, 11 Sep 2008 11:17:37 +0000 (13:17 +0200)]
block: add bio_kmalloc()

Not all callers need (or want!) the mempool backing guarantee. It
essentially means that you can only use bio_alloc() for short allocations
and not for preallocating some bio's at setup or init time.

So add bio_kmalloc() which does the same thing as bio_alloc(), except
it just uses kmalloc() as the backing instead of the bio mempools.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoblock: adjust blkdev_issue_discard for swap
Hugh Dickins [Thu, 11 Sep 2008 08:57:55 +0000 (10:57 +0200)]
block: adjust blkdev_issue_discard for swap

Two mods to blkdev_issue_discard(), thinking ahead to its use on swap:

1. Add gfp_mask argument, so swap allocation can use it where GFP_KERNEL
   might deadlock but GFP_NOIO is safe.

2. Enlarge nr_sects argument from unsigned to sector_t: unsigned long is
   enough to cover a whole swap area, but sector_t suits any partition.

Change sb_issue_discard()'s nr_blocks to sector_t too; but no need seen
for a gfp_mask there, just pass GFP_KERNEL down to blkdev_issue_discard().

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
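
Note on the nr_sects widening above: a worked arithmetic sketch of what
a 32-bit sector count can and cannot represent. It assumes 512-byte
sectors and a 64-bit wide sector type, which is not true of every kernel
configuration; the function names are hypothetical.

```c
#include <stdint.h>

#define MODEL_SECTOR_SHIFT 9  /* 512-byte sectors (assumption) */

/* Number of 512-byte sectors in a region of the given byte size. */
uint64_t bytes_to_sectors(uint64_t bytes)
{
    return bytes >> MODEL_SECTOR_SHIFT;
}

/* What a 32-bit sector count would hold: the value modulo 2^32. */
uint32_t truncate_to_u32(uint64_t sectors)
{
    return (uint32_t)sectors;
}
```

Under these assumptions a 2 TiB region is exactly 2^32 sectors, so a
32-bit count wraps to 0, and a 3 TiB region (6442450944 sectors) is
seen as 2147483648 sectors.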
17 years agosg: remove unnecessary blk_rq_unmap_user
FUJITA Tomonori [Tue, 2 Sep 2008 13:50:08 +0000 (22:50 +0900)]
sg: remove unnecessary blk_rq_unmap_user

blk_rq_unmap_user in sg_finish_rem_req can take care of all the cases.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agosg: remove sg_read_xfer
FUJITA Tomonori [Tue, 2 Sep 2008 13:50:07 +0000 (22:50 +0900)]
sg: remove sg_read_xfer

sg_read_xfer was used to copy data to user space for READ
commands. blk_rq_unmap_user does the job so sg_read_xfer does nothing
useful.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agosg: remove sg_write_xfer
FUJITA Tomonori [Tue, 2 Sep 2008 13:50:06 +0000 (22:50 +0900)]
sg: remove sg_write_xfer

sg_write_xfer was used to copy data from user space for WRITE
commands. blk_rq_map_user_iov and blk_rq_map_user do the job so
sg_write_xfer does nothing useful.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agosg: incorporate sg_build_direct into sg_start_req
FUJITA Tomonori [Tue, 2 Sep 2008 13:50:05 +0000 (22:50 +0900)]
sg: incorporate sg_build_direct into sg_start_req

Calling blk_rq_map_user() at a single place is better than at
two different places. It makes the code more understandable.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agosg: remove __sg_start_req
FUJITA Tomonori [Tue, 2 Sep 2008 13:50:04 +0000 (22:50 +0900)]
sg: remove __sg_start_req

__sg_start_req() was used temporarily to call blk_get_request() while
converting sg to use the block layer.

Now sg always calls blk_get_request() so we can move blk_get_request()
to sg_start_req(). We don't need __sg_start_req anymore.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agosg: remove b_malloc_len in sg_scatter_hold struct
FUJITA Tomonori [Tue, 2 Sep 2008 13:50:03 +0000 (22:50 +0900)]
sg: remove b_malloc_len in sg_scatter_hold struct

It's not used for anything useful after the block layer conversion.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agosg: remove SG_ALLOW_DIO_CODE define
FUJITA Tomonori [Tue, 2 Sep 2008 13:50:02 +0000 (22:50 +0900)]
sg: remove SG_ALLOW_DIO_CODE define

sg had lots of its own functions for direct IO, but now sg uses the
block layer functions for it. Only five lines remain for direct
IO. The SG_ALLOW_DIO_CODE define was used to compile out the direct IO
code, but we don't need the define. If someone wants to remove the
direct IO code, it can be done easily without the define.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agosg: rename sg_cmd_done sg_rq_end_io
FUJITA Tomonori [Tue, 2 Sep 2008 13:50:01 +0000 (22:50 +0900)]
sg: rename sg_cmd_done sg_rq_end_io

The old sg_rq_end_io() was used to wrap sg_cmd_done while converting sg
to use the block layer (in order to cover the difference between
scsi_execute_async and blk_execute_rq_nowait). Now we don't need it, so
let's remove it.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agodm: Call blk_abort_queue on failed paths
Mike Anderson [Fri, 29 Aug 2008 07:36:09 +0000 (09:36 +0200)]
dm: Call blk_abort_queue on failed paths

Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoblock: Add interface to abort queued requests
Mike Anderson [Sat, 13 Sep 2008 18:31:27 +0000 (20:31 +0200)]
block: Add interface to abort queued requests

Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoblock: unify request timeout handling
Jens Axboe [Sun, 14 Sep 2008 12:55:09 +0000 (05:55 -0700)]
block: unify request timeout handling

Right now SCSI and others do their own command timeout handling.
Move those bits to the block layer.

Instead of having a timer per command, we try to be a bit more clever
and simply have one per-queue. This avoids the overhead of having to
tear down and setup a timer for each command, so it will result in a lot
less timer fiddling.

Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
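
Note on the per-queue timer above: a user-space sketch of why one timer
per queue is enough. The timer only has to fire at the earliest deadline
among the in-flight requests. The data layout and names are
hypothetical, and a flat array stands in for whatever list the block
layer actually keeps.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_DEADLINE UINT64_MAX  /* returned when nothing is in flight */

/* Hypothetical queue: just the deadlines of its in-flight requests. */
struct model_queue {
    const uint64_t *deadlines;
    size_t          count;
};

/* The single queue timer only needs to fire at the earliest deadline. */
uint64_t next_timer_expiry(const struct model_queue *q)
{
    uint64_t earliest = NO_DEADLINE;

    for (size_t i = 0; i < q->count; i++)
        if (q->deadlines[i] < earliest)
            earliest = q->deadlines[i];
    return earliest;
}

/* When the timer fires at time now, count the requests that timed out. */
size_t count_expired(const struct model_queue *q, uint64_t now)
{
    size_t n = 0;

    for (size_t i = 0; i < q->count; i++)
        if (q->deadlines[i] <= now)
            n++;
    return n;
}
```

Starting or finishing a request then changes at most the expiry of that
one timer, instead of setting up and tearing down a timer per command.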
17 years agoCall flush_disk() after detecting an online resize.
Andrew Patterson [Thu, 4 Sep 2008 20:27:45 +0000 (14:27 -0600)]
Call flush_disk() after detecting an online resize.

We call flush_disk() to make sure the buffer cache for the disk is
flushed after a disk resize. There are two resize cases, growing and
shrinking. Given that users can shrink/then grow a disk before
revalidate_disk() is called, we treat the grow case identically to
shrinking. We need to flush the buffer cache after an online shrink
because, as James Bottomley puts it,

     The two use cases for shrinking I can see are

     1. planned: the fs is already shrunk to within the new boundaries
        and all data is relocated, so invalidate is fine (any dirty
        buffers that might exist in the shrunk region are there only
        because they were relocated but not yet written to their
        original location).
     2. unplanned:  In this case, the fs is probably toast, so whether
        we invalidate or not isn't going to make a whole lot of
        difference; it's still going to try to read or write from
        sectors beyond the new size and get I/O errors.

Immediately invalidating shrunk disks will cause errors for outstanding
I/Os for reads/writes beyond the new end of the disk to be generated
earlier than if we waited for the normal buffer cache operation. It also
removes a potential security hole where we might keep old data around
from beyond the end of the shrunk disk if the disk was not invalidated.

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoAdded flush_disk to factor out common buffer cache flushing code.
Andrew Patterson [Thu, 4 Sep 2008 20:27:40 +0000 (14:27 -0600)]
Added flush_disk to factor out common buffer cache flushing code.

We need to be able to flush the buffer cache for more than
just when a disk is changed, so we factor out common cache flush code
in check_disk_change() to an internal flush_disk() routine.  This
routine will then be used for both disk changes and disk resizes (in a
later patch).

Include the disk name in the text indicating that there are busy
inodes on the device and increase the KERN severity of the message.

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoSCSI sd driver calls revalidate_disk wrapper.
Andrew Patterson [Thu, 4 Sep 2008 20:27:35 +0000 (14:27 -0600)]
SCSI sd driver calls revalidate_disk wrapper.

Modify the SCSI disk driver to call the revalidate_disk()
wrapper. This allows us to do some housekeeping such as accounting for
a disk being resized online. The wrapper will call
sd_revalidate_disk() at the appropriate time.

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoCheck for device resize when rescanning partitions
Andrew Patterson [Thu, 4 Sep 2008 20:27:30 +0000 (14:27 -0600)]
Check for device resize when rescanning partitions

Check for device resize in the rescan_partitions() routine. If the device
has been resized, the bdev size is set to match. The rescan_partitions()
routine is called when opening the device and when calling the
BLKRRPART ioctl.

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoAdjust block device size after an online resize of a disk.
Andrew Patterson [Thu, 4 Sep 2008 20:27:25 +0000 (14:27 -0600)]
Adjust block device size after an online resize of a disk.

The revalidate_disk routine now checks if a disk has been resized by
comparing the gendisk capacity to the bdev inode size.  If they are
different (usually because the disk has been resized underneath the kernel)
the bdev inode size is adjusted to match the capacity.

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
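
Note on the size adjustment above: a user-space sketch of the comparison
the commit describes, with plain structs standing in for the gendisk
and the bdev inode. The names are hypothetical, and the capacity is
assumed to be counted in 512-byte sectors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the gendisk: capacity in 512-byte sectors (assumption). */
struct model_disk {
    uint64_t capacity_sectors;
};

/* Stand-in for the block device inode: size in bytes. */
struct model_bdev {
    uint64_t inode_size_bytes;
};

/*
 * Compare the disk capacity with the bdev size and bring the bdev in
 * line if they differ. Returns true when a resize was detected.
 */
bool model_sync_bdev_size(const struct model_disk *disk,
                          struct model_bdev *bdev)
{
    uint64_t disk_bytes = disk->capacity_sectors << 9;

    if (disk_bytes == bdev->inode_size_bytes)
        return false;
    bdev->inode_size_bytes = disk_bytes;
    return true;
}
```

The same check handles growing and shrinking, since it only looks at
whether the two sizes differ.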
17 years agoWrapper for lower-level revalidate_disk routines.
Andrew Patterson [Thu, 4 Sep 2008 20:27:20 +0000 (14:27 -0600)]
Wrapper for lower-level revalidate_disk routines.

This is a wrapper for the lower-level revalidate_disk call-backs such
as sd_revalidate_disk(). It allows us to perform pre and post
operations when calling them.

We will use this wrapper in a later patch to adjust block device sizes
after an online resize (a _post_ operation).

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoblock: fix duplicate headers for /proc/partitions
Tejun Heo [Thu, 4 Sep 2008 07:17:31 +0000 (09:17 +0200)]
block: fix duplicate headers for /proc/partitions

seqf can be started multiple times for a read and the header should be
printed only for the initial one.  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
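
Note on the duplicate header fix above: a user-space sketch of the rule
"print the header only for the initial start". The function name and
the header text are illustrative assumptions, and the real code uses
the seq_file interface rather than a caller-supplied buffer.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Called each time iteration (re)starts at record index pos.
 * Writes a header only when the read begins at the first record and
 * returns the number of characters the header takes (0 if none).
 */
int model_partitions_start(long pos, char *out, size_t out_len)
{
    if (pos != 0)
        return 0;  /* restart in the middle of a read: no header */
    return snprintf(out, out_len, "major minor  #blocks  name\n\n");
}
```

A long read that restarts iteration at a later position then gets the
header once, at the top, rather than once per restart.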
17 years agosg: set dxferp to NULL for READ with the older SG interface
FUJITA Tomonori [Tue, 2 Sep 2008 07:20:20 +0000 (16:20 +0900)]
sg: set dxferp to NULL for READ with the older SG interface

With the older SG interface, we don't know a user-space address to
transfer data to when executing a SCSI command. So we can't pass a
user-space address to blk_rq_map_user.

This patch fixes sg to pass a NULL user-space address to
blk_rq_map_user so that it just sets up a request and bios with page
frames properly without data transfer.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoblock: make blk_rq_map_user take a NULL user-space buffer
FUJITA Tomonori [Tue, 2 Sep 2008 07:20:19 +0000 (16:20 +0900)]
block: make blk_rq_map_user take a NULL user-space buffer

This patch changes blk_rq_map_user to accept a NULL user-space buffer
with a READ command if rq_map_data is not NULL. Thus a caller can pass
page frames to blk_rq_map_user to just set up a request and bios with
page frames properly. bio_uncopy_user (called via blk_rq_unmap_user)
doesn't copy data to user space with such request.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoblock: update comment on end_request()
Jens Axboe [Tue, 2 Sep 2008 07:25:21 +0000 (09:25 +0200)]
block: update comment on end_request()

It refers to functions that no longer exist after the IO completion
changes.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoinit: DEBUG_BLOCK_EXT_DEVT requires explicit root= param
Tejun Heo [Mon, 1 Sep 2008 11:44:35 +0000 (13:44 +0200)]
init: DEBUG_BLOCK_EXT_DEVT requires explicit root= param

DEBUG_BLOCK_EXT_DEVT shuffles SCSI and IDE device numbers, so a root
device number set using rdev becomes meaningless.  Root devices should
be explicitly specified using textual names.  Warn about it if root
can't be found and DEBUG_BLOCK_EXT_DEVT is enabled.  Also, add warning
to the help text.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoblock: don't test for partition size in bdget_disk() and blk_lookup_devt()
Tejun Heo [Fri, 29 Aug 2008 09:41:51 +0000 (11:41 +0200)]
block: don't test for partition size in bdget_disk() and blk_lookup_devt()

bdget_disk() and blk_lookup_devt() never cared whether the specified
partition (or disk) is zero sized or not.  I got confused while
converting those not to depend on consecutive minor numbers in commit
5a6411b1178baf534aa9138052864dfa89d3eada and later when dev0 was added
it broke callers which expected to get valid return for zero sized
disk devices.

So, they never needed nr_sects checks in the first place.  Kill them.

This problem was spotted and debugged by Bartlomiej Zolnierkiewicz.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoChange default value of CONFIG_DEBUG_BLOCK_EXT_DEVT to 'n'
Jens Axboe [Fri, 29 Aug 2008 07:06:29 +0000 (09:06 +0200)]
Change default value of CONFIG_DEBUG_BLOCK_EXT_DEVT to 'n'

It's a debug option that you would explicitly enable to test this
feature, we should default it to 'n' to prevent accidental surprises
for now.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoblock: kmalloc args reversed, small function definition fixes
Harvey Harrison [Thu, 28 Aug 2008 07:27:42 +0000 (09:27 +0200)]
block: kmalloc args reversed, small function definition fixes

Noticed by sparse:
block/blk-softirq.c:156:12: warning: symbol 'blk_softirq_init' was not declared. Should it be static?
block/genhd.c:583:28: warning: function 'bdget_disk' with external linkage has definition
block/genhd.c:659:17: warning: incorrect type in argument 1 (different base types)
block/genhd.c:659:17:    expected unsigned int [unsigned] [usertype] size
block/genhd.c:659:17:    got restricted gfp_t
block/genhd.c:659:29: warning: incorrect type in argument 2 (different base types)
block/genhd.c:659:29:    expected restricted gfp_t [usertype] flags
block/genhd.c:659:29:    got unsigned int
block: kmalloc args reversed

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agosg: use blk_rq_aligned helper function
FUJITA Tomonori [Thu, 28 Aug 2008 06:05:59 +0000 (15:05 +0900)]
sg: use blk_rq_aligned helper function

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agoblock: add blk_rq_aligned helper function
FUJITA Tomonori [Thu, 28 Aug 2008 06:05:58 +0000 (15:05 +0900)]
block: add blk_rq_aligned helper function

This adds blk_rq_aligned helper function to see if alignment and
padding requirement is satisfied for DMA transfer. This also converts
blk_rq_map_kern and __blk_rq_map_user to use the helper function.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
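
Note on the alignment helper above: a user-space sketch of a typical
alignment and padding test for a DMA buffer. The function name and
parameters are hypothetical. The mask is assumed to be (alignment - 1)
for a power-of-two alignment, and how the kernel helper derives its
mask from the queue is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when both the start address and the length are multiples of
 * the alignment. mask is (alignment - 1), e.g. 511 for 512 bytes.
 */
bool buffer_is_aligned(uintptr_t addr, unsigned int len, unsigned int mask)
{
    return ((addr | len) & mask) == 0;
}
```

ORing the address and the length before masking checks both in one
step: any low bit set in either one makes the result non-zero.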
17 years agobio: convert bio_copy_kern to use bio_copy_user
FUJITA Tomonori [Thu, 28 Aug 2008 06:05:57 +0000 (15:05 +0900)]
bio: convert bio_copy_kern to use bio_copy_user

bio_copy_kern and bio_copy_user are very similar. This converts
bio_copy_kern to use bio_copy_user.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
17 years agosg: convert the indirect IO path to use the block layer
FUJITA Tomonori [Fri, 29 Aug 2008 10:32:18 +0000 (12:32 +0200)]
sg: convert the indirect IO path to use the block layer

This patch converts the indirect IO path (including mmap IO and old
struct sg_header) to use the block layer functions (blk_get_request,
blk_execute_rq_nowait, blk_rq_map_user, etc) instead of
scsi_execute_async().

[Jens: fixed compile error with SCSI logging enabled]

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>