Paul Mundt [Tue, 3 Jun 2008 09:54:02 +0000 (18:54 +0900)]
sh: Add support for 16kB PAGE_SIZE.
16kB is a useful size on nommu, while 64kB still tends to be too big to
be useful. Newer MMUs are likely to support this as well, so plug it
in in anticipation of those, too.
Paul Mundt [Mon, 19 May 2008 05:00:44 +0000 (14:00 +0900)]
sh: Make dump_task dependent on ELF core.
Currently this is only linked in for CONFIG_BINFMT_ELF, make it dependent
on CONFIG_ELF_CORE, so it's both selectable there and also linked in for
CONFIG_BINFMT_ELF_FDPIC.
Paul Mundt [Mon, 19 May 2008 04:34:45 +0000 (13:34 +0900)]
binfmt_elf_fdpic: Magical stack pointer index, for NEW_AUX_ENT compat.
While implementing binfmt_elf_fdpic on SH it quickly became apparent
that SH was the first platform to support both binfmt_elf_fdpic and
binfmt_elf, as well as the only of the FDPIC platforms to make use of the
auxvt.
Currently binfmt_elf_fdpic uses a special version of NEW_AUX_ENT() where
the first argument is the entry displacement after csp has been adjusted,
being reset after each adjustment. As we have no ability to sort this out
through the platform's ARCH_DLINFO, this index needs to be managed
entirely in create_elf_fdpic_tables(). Presently none of the platforms
that set their own auxvt entries are able to do so through their
respective ARCH_DLINFOs when using binfmt_elf_fdpic.
In addition to this, binfmt_elf_fdpic has been looking at
DLINFO_ARCH_ITEMS for the number of architecture-specific entries in the
auxvt. This is legacy cruft, and is not defined by any platforms in-tree,
even those that make heavy use of the auxvt. AT_VECTOR_SIZE_ARCH is
always available, and contains the number that is of interest here, so we
switch to using that unconditionally as well.
As this has direct bearing on how much stack is used, platforms that have
configurable (or dynamically adjustable) NEW_AUX_ENT calls need to either
make AT_VECTOR_SIZE_ARCH more fine-grained, or leave it as a worst-case
and live with some lost stack space if those entries aren't pushed (some
platforms may also need to purposely sacrifice some space here for
alignment considerations, as noted in the code -- although not an issue
for any FDPIC-capable platform today).
Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: David Howells <dhowells@redhat.com>
[XFS] Now that xfs_setattr is only used for attributes set from ->setattr
it can be switched to take struct iattr directly and thus simplify the
implementation greatly. Also rename the ATTR_ flags to XFS_ATTR_ to not
conflict with the ATTR_ flags used by the VFS.
SGI-PV: 984565
SGI-Modid: xfs-linux-melb:xfs-kern:31678a
Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
[XFS] xfs_setattr currently doesn't just handle the attributes set through
->setattr but also addition XFS-specific attributes: project id, inode
flags and extent size hint. Having these in a single function makes it
more complicated and forces to have us a bhv_vattr intermediate structure
eating up stackspace.
This patch adds a new xfs_ioctl_setattr helper for the XFS ioctls that set
these attributes and remove the code to set them through xfs_setattr.
SGI-PV: 984564
SGI-Modid: xfs-linux-melb:xfs-kern:31677a
Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Tim Shimmin [Fri, 18 Jul 2008 07:13:04 +0000 (17:13 +1000)]
[XFS] A bug was found in xfs_bmap_add_extent_unwritten_real(). In a
particular case, the delta param which is supposed to describe the region
where extents have changed was not updated appropriately.
SGI-PV: 984030
SGI-Modid: xfs-linux-melb:xfs-kern:31663a
Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Olaf Weber <olaf@sgi.com>
Remount currently happily accept any option thrown at it, although the
only filesystem specific option it actually handles is barrier/nobarrier.
And it actually doesn't handle these correctly either because it only uses
the value it parsed when we're doing a ro->rw transition. In addition to
that there's also a bad bug in xfs_parseargs which doesn't touch the
actual option in the mount point except for a single one,
XFS_MOUNT_SMALL_INUMS and thus forced any filesystem that's every
remounted in some way to not support 64bit inodes with no way to recover
unless unmounted.
This patch changes xfs_fs_remount to use it's own linux/parser.h based
options parse instead of xfs_parseargs and reject all options except for
barrier/nobarrier and to the right thing in general. Eventually I'd like
to have a single big option table used for mount aswell but that can wait
for a while.
SGI-PV: 983964
SGI-Modid: xfs-linux-melb:xfs-kern:31382a
Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Eric Sandeen [Fri, 18 Jul 2008 07:12:18 +0000 (17:12 +1000)]
[XFS] Disable queue flag test in barrier check.
md raid1 can pass down barriers, but does not set an ordered flag on the
queue, so xfs does not even attempt a barrier write, and will never use
barriers on these block devices.
Remove the flag check and just let the barrier write test determine
barrier support.
A possible risk here is that if something does not set an ordered flag and
also does not properly return an error on a barrier write... but if it's
any consolation jbd/ext3/reiserfs never test the flag, and don't even do a
test write, they just disable barriers the first time an actual journal
barrier write fails.
SGI-PV: 983924
SGI-Modid: xfs-linux-melb:xfs-kern:31377a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Currently the xfs module init/exit code is a mess. It's farmed out over a
lot of function with very little error checking. This patch makes sure we
propagate all initialization failures properly and clean up after them.
Various runtime initializations are replaced with compile-time
initializations where possible to make this easier. The exit path is
similarly consolidated.
There's now split out function to create/destroy the kmem zones and
alloc/free the trace buffers. I've also changed the ktrace allocations to
KM_MAYFAIL and handled errors resulting from that.
And yes, we really should replace the XFS_*_TRACE ifdefs with a single
XFS_TRACE..
Tim Shimmin [Fri, 27 Jun 2008 03:34:42 +0000 (13:34 +1000)]
[XFS] Fix up problem when CONFIG_XFS_POSIX_ACL is not set and yet we still
can use the _ACL_TYPE_* definitions in linux-2.6/xfs_xattr.c. The
forthcoming generic acl code will also fix this problem.
SGI-PV: 982343
SGI-Modid: xfs-linux-melb:xfs-kern:31369a
Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Lachlan McIlroy [Fri, 27 Jun 2008 03:34:34 +0000 (13:34 +1000)]
[XFS] Don't assert if trying to mount with blocksize > pagesize
If we don't do the blocksize/PAGESIZE check before calling
xfs_sb_validate_fsb_count() we can assert if we try to mount with a
blocksize > pagesize. The assert is valid so leave it and just move the
blocksize/pagesize check earlier.
SGI-PV: 983734
SGI-Modid: xfs-linux-melb:xfs-kern:31365a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com>
Lachlan McIlroy [Fri, 27 Jun 2008 03:33:11 +0000 (13:33 +1000)]
[XFS] Allow xfs_bmbt_split() to fallback to the lowspace allocator
algorithm
If xfs_bmbt_split() cannot find an AG with sufficient free space to
satisfy a full extent btree split then fall back to the lowspace allocator
algorithm.
SGI-PV: 983338
SGI-Modid: xfs-linux-melb:xfs-kern:31359a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com>
Lachlan McIlroy [Fri, 27 Jun 2008 03:33:03 +0000 (13:33 +1000)]
[XFS] Restore the lowspace extent allocator algorithm
When free space is running low the extent allocator may choose to allocate
an extent from an AG without leaving sufficient space for a btree split
when inserting the new extent (see where xfs_bmap_btalloc() sets minleft
to 0). In this case the allocator will enable the lowspace algorithm which
is supposed to allow further allocations (such as btree splits and
newroots) to allocate from sequential AGs. This algorithm has been broken
for a long time and this patch restores its behaviour.
SGI-PV: 983338
SGI-Modid: xfs-linux-melb:xfs-kern:31358a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com>
Lachlan McIlroy [Fri, 27 Jun 2008 03:32:53 +0000 (13:32 +1000)]
[XFS] use minleft when allocating in xfs_bmbt_split()
The bmap btree split code relies on a previous data extent allocation
(from xfs_bmap_btalloc()) to find an AG that has sufficient space to
perform a full btree split, when inserting the extent. When converting
unwritten extents we don't allocate a data extent so a btree split will be
the first allocation. In this case we need to set minleft so the allocator
will pick an AG that has space to complete the split(s).
SGI-PV: 983338
SGI-Modid: xfs-linux-melb:xfs-kern:31357a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com>
xfs_attrmulti_by_handle currently request the size based on
sizeof(attr_multiop_t) but should be using sizeof(xfs_attr_multiop_t)
because that is what it is dealing with. Despite beeing wrong this
actually harmless in practice because both structures are the same size on
all platforms.
But this sizeof was the only user of struct attr_multiop so we can just
kill it. Also move the ATTR_OP_* defines xfs_attr.h into the struct
xfs_attr_multiop defintion in xfs_fs.h because they are only used with
that structure, and are part of the user ABI for the
XFS_IOC_ATTRMULTI_BY_HANDLE ioctl.
SGI-PV: 983508
SGI-Modid: xfs-linux-melb:xfs-kern:31352a
Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
[XFS] Check for invalid flags in xfs_attrlist_by_handle.
xfs_attrlist_by_handle should only take the ATTR_ flags for the root
namespaces. The ATTR_KERN* flags may change at anytime and expect special
preconditions that can't be guaranteed for userspace-originating requests.
For example passing down ATTR_KERNNOVAL through xfs_attrlist_by_handle
will hit an assert in debug builds currently.
SGI-PV: 983677
SGI-Modid: xfs-linux-melb:xfs-kern:31351a
Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Lachlan McIlroy [Mon, 23 Jun 2008 03:25:53 +0000 (13:25 +1000)]
[XFS] Always reset btree cursor after an insert
After a btree insert operation a cursor can be invalid due to block splits
and a maybe a new root block. We reset the cursor in xfs_bmbt_insert() in
the cases where we think we need to but it isn't enough as we still see
assertions. Just do what we do elsewhere and reset the cursor
unconditionally. Also remove the fix to revalidate the original cursor in
xfs_bmbt_insert().
SGI-PV: 983336
SGI-Modid: xfs-linux-melb:xfs-kern:31342a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com>
Barry Naujok [Mon, 23 Jun 2008 03:25:38 +0000 (13:25 +1000)]
[XFS] Fix returning case-preserved name with CI node form directories
xfs_dir2_node_lookup() calls xfs_da_node_lookup_int() which iterates
through leaf blocks containing the matching hash value for the name being
looked up. Inside xfs_da_node_lookup_int(), it calls the
xfs_dir2_leafn_lookup_for_entry() for each leaf block.
xfs_dir2_leafn_lookup_for_entry() iterates through each matching
hash/offset pair doing a name comparison to find the matching dirent.
For CI mode, the state->extrablk retains the details of the block that has
the CI match so xfs_dir2_node_lookup() can return the case-preserved name.
The original implementation didn't retain the xfs_da_buf_t properly, so
the lookup was returning a bogus name to be stored in the dentry.
In the case of unlink, the bad name was passed and in debug mode, ASSERTed
when it can't find the entry.
SGI-PV: 983284
SGI-Modid: xfs-linux-melb:xfs-kern:31337a
Signed-off-by: Barry Naujok <bnaujok@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
[XFS] Don't update i_size for directories and special files
The core kernel uses vfs_getattr to look at the inode size and similar
attributes, so there is no need to keep i_size uptodate for directories or
special files. This means we can remove xfs_validate_fields because the
I/O path already keeps i_size uptodate for regular files.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31336a
Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Barry Naujok <bnaujok@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
xfs_remove and xfs_rmdir are almost the same with a little more work
performed in xfs_rmdir due to the . and .. entries. This patch merges
xfs_rmdir into xfs_remove and performs these actions conditionally.
Also clean up the error handling which was a nightmare in both versions
before.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31335a
Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Barry Naujok <bnaujok@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Lachlan McIlroy [Mon, 23 Jun 2008 03:25:02 +0000 (13:25 +1000)]
[XFS] fix extent corruption in xfs_iext_irec_compact_full()
This function is used to compact the indirect extent list by moving
extents from one page to the previous to fill them up. After we move some
extents to an earlier page we need to shuffle the remaining extents to the
start of the page. The actual bug here is the second argument to memmove()
needs to index past the extents, that were copied to the previous page,
and move the remaining extents. For pages that are already full (ie
ext_avail == 0) the compaction code has no net effect so don't do it.
SGI-PV: 983337
SGI-Modid: xfs-linux-melb:xfs-kern:31332a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
Lachlan McIlroy [Mon, 23 Jun 2008 03:23:57 +0000 (13:23 +1000)]
[XFS] make inode reclaim wait for log I/O to complete
During a forced shutdown a xfs inode can be destroyed before log I/O
involving that inode is complete. We need to wait for the inode to be
unpinned before tearing it down. Version 2 cleans up the code a bit by
relying on xfs_iflush() to do the unpinning and forced shutdown check.
SGI-PV: 981240
SGI-Modid: xfs-linux-melb:xfs-kern:31326a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com>
Eric Sandeen [Mon, 23 Jun 2008 03:23:32 +0000 (13:23 +1000)]
[XFS] Pack some shortform dir2 structures for the ARM old ABI
architecture.
This should fix the longstanding issues with xfs and old ABI arm boxes,
which lead to various asserts and xfs shutdowns, and for which an
(incorrect) patch has been floating around for years.
I've verified this patch by comparing the on-disk structure layouts using
pahole from the dwarves package, as well as running through a bit of xfsqa
under qemu-arm, modified so that the check/repair phase after each test
actually executes check/repair from the x86 host, on the filesystem
populated by the arm emulator. Thus far it all looks good.
There are 2 other structures with extra padding at the end, but they don't
seem to cause trouble. I suppose they could be packed as well:
xfs_dir2_data_unused_t and xfs_dir2_sf_t.
Note that userspace needs a similar treatment, and any filesystems which
were running with the previous rogue "fix" will now see corruption (either
in the kernel, or during xfs_repair) with this fix properly in place; it
may be worth teaching xfs_repair to identify and fix that specific issue.
SGI-PV: 982930
SGI-Modid: xfs-linux-melb:xfs-kern:31280a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Lachlan McIlroy [Mon, 23 Jun 2008 03:23:01 +0000 (13:23 +1000)]
[XFS] Use the generic xattr methods.
Use the generic set, get and removexattr methods and supply the s_xattr
array with fine-grained handlers. All XFS/Linux highlevel attr handling is
rewritten from scratch and placed into fs/xfs/linux-2.6/xfs_xattr.c so
that it's separated from the generic low-level code.
SGI-PV: 982343
SGI-Modid: xfs-linux-melb:xfs-kern:31234a
Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Barry Naujok [Mon, 16 Jun 2008 02:07:41 +0000 (12:07 +1000)]
[XFS] Invalidate dentry in unlink/rmdir if in case-insensitive mode
The vfs_unlink/d_delete functionality in the Linux VFS make the
dentry negative if it is the only inode being referenced. Case-insensitive
mode doesn't work with negative dentries, so if using CI-mode, invalidate
the dentry on unlink/rmdir.
Barry Naujok [Tue, 3 Jun 2008 01:59:18 +0000 (11:59 +1000)]
[XFS] Zero uninitialised xfs_da_args structure in xfs_dir2.c
Fixes a problem in the xfs_dir2_remove and xfs_dir2_replace paths which
intenally call directory format specific lookup funtions that assume
args->cmpresult is zeroed.
Barry Naujok [Wed, 21 May 2008 06:58:55 +0000 (16:58 +1000)]
[XFS] XFS: ASCII case-insensitive support
Implement ASCII case-insensitive support. It's primary purpose is for
supporting existing filesystems that already use this case-insensitive
mode migrated from IRIX. But, if you only need ASCII-only case-insensitive
support (ie. English only) and will never use another language, then this
mode is perfectly adequate.
ASCII-CI is implemented by generating hashes based on lower-case letters
and doing lower-case compares. It implements a new xfs_nameops vector for
doing the hashes and comparisons for all filename operations.
To create a filesystem with this CI mode, use: # mkfs.xfs -n version=ci
<device>
Barry Naujok [Wed, 21 May 2008 06:58:22 +0000 (16:58 +1000)]
[XFS] Return case-insensitive match for dentry cache
This implements the code to store the actual filename found during a
lookup in the dentry cache and to avoid multiple entries in the dcache
pointing to the same inode.
To avoid polluting the dcache, we implement a new directory inode
operations for lookup. xfs_vn_ci_lookup() stores the correct case name in
the dcache.
The "actual name" is only allocated and returned for a case- insensitive
match and not an actual match.
Another unusual interaction with the dcache is not storing negative
dentries like other filesystems doing a d_add(dentry, NULL) when an ENOENT
is returned. During the VFS lookup, if a dentry returned has no inode,
dput is called and ENOENT is returned. By not doing a d_add, this actually
removes it completely from the dcache to be reused. create/rename have to
be modified to support unhashed dentries being passed in.
Barry Naujok [Wed, 21 May 2008 06:50:46 +0000 (16:50 +1000)]
dcache: Add case-insensitive support d_ci_add() routine
This add a dcache entry to the dcache for lookup, but changing the name
that is associated with the entry rather than the one passed in to the
lookup routine.
First, it sees if the case-exact match already exists in the dcache and
uses it if one exists. Otherwise, it allocates a new node with the new
name and splices it into the dcache.
Original code from ntfs_lookup in fs/ntfs/namei.c by Anton Altaparmakov.
Signed-off-by: Barry Naujok <bnaujok@sgi.com> Signed-off-by: Anton Altaparmakov <aia21@cantab.net> Acked-by: Christoph Hellwig <hch@infradead.org>
Barry Naujok [Wed, 21 May 2008 06:42:05 +0000 (16:42 +1000)]
[XFS] Add op_flags field and helpers to xfs_da_args
The end of the xfs_da_args structure has 4 unsigned char fields for
true/false information on directory and attr operations using the
xfs_da_args structure.
The following converts these 4 into a op_flags field that uses the first 4
bits for these fields and allows expansion for future operation
information (eg. case-insensitive lookup request).
Barry Naujok [Wed, 21 May 2008 06:41:01 +0000 (16:41 +1000)]
[XFS] Name operation vector for hash and compare
Adds two pieces of functionality for the basis of case-insensitive support
in XFS:
1. A comparison result enumerated type: xfs_dacmp. It represents an
exact match, case-insensitive match or no match at all. This patch
only implements different and exact results.
2. xfs_nameops vector for specifying how to perform the hash generation
of filenames and comparision methods. In this patch the hash vector
points to the existing xfs_da_hashname function and the comparison
method does a length compare, and if the same, does a memcmp and
return the xfs_dacmp result.
All filename functions that use the hash (create, lookup remove, rename,
etc) now use the xfs_nameops.hashname function and all directory lookup
functions also use the xfs_nameops.compname function.
The lookup functions also handle case-insensitive results even though the
default comparison function cannot return that. And important aspect of
the lookup functions is that an exact match always has precedence over a
case-insensitive. So while a case-insensitive match is found, we have to
keep looking just in case there is an exact match. In the meantime, the
info for the first case-insensitive match is retained if no exact match is
found.
Eric Sandeen [Tue, 20 May 2008 05:11:17 +0000 (15:11 +1000)]
[XFS]
de-duplicate calls to xfs_attr_trace_enter
Every call to xfs_attr_trace_enter() shares the exact same 16 args in the
middle... just send in the context pointer and let the next level down
split it into the ktrace.
[XFS] kill calls to xfs_binval in the mount error path
xfs_binval aka xfs_flush_buftarg is the first thing done in
xfs_free_buftarg, so there is no need to have duplicated calls just before
xfs_free_buftarg in the mount failure path.
xfs_mount_init is inlined into xfs_fs_fill_super and allocation switched
to kzalloc. Plug a leak of the mount structure for most early mount
failures. Move xfs_icsb_init_counters to as late as possible in the mount
path and make sure to undo it so that no stale hotplug cpu notifiers are
left around on mount failures.
xfs_mount is already pretty linux-specific so merge it into
xfs_fs_fill_super to allow for a more structured mount code in the next
patches. xfs_start_flags and xfs_finish_flags also move to xfs_super.c.
[XFS] merge xfs_unmount into xfs_fs_put_super / xfs_fs_fill_super
xfs_unmount is small and already pretty Linux specific, so merge it into
the callers. The real unmount path is simplified a little by doing a
WARN_ON on the xfs_unmount_flush retval directly instead of propagating
the error back to the caller, and the mout failure case in simplified
significantly by removing the forced shutdown case and all the dmapi
events that shouldn't be sent because the dmapi mount event hasn't been
sent by that time either.
David Chinner [Tue, 20 May 2008 01:30:27 +0000 (11:30 +1000)]
[XFS] Update valid fields in xfs_mount_log_sb()
Recent changes to update the version number during mount (attr2 stuff)
failed to change the assert that checked for calid flags being changed on
mount. Clearly this path hasn't been exercised by the test code....
[XFS] Kill attr_capable checks as already done in xattr_permission.
No need for addition permission checks in the xattr handler,
fs/xattr.c:xattr_permission() already does them, and in fact slightly more
strict then what was in the attr_capable handlers.
Matthew Wilcox [Mon, 19 May 2008 06:34:27 +0000 (16:34 +1000)]
[XFS] Convert l_flushsema to a sv_t
The l_flushsema doesn't exactly have completion semantics, nor mutex
semantics. It's used as a list of tasks which are waiting to be notified
that a flush has completed. It was also being used in a way that was
potentially racy, depending on the semaphore implementation.
By using a sv_t instead of a semaphore we avoid the need for a separate
counter, since we know we just need to wake everything on the queue.
Original waitqueue implementation from Matthew Wilcox. Cleanup and
conversion to sv_t by Christoph Hellwig.
Tim Shimmin [Wed, 30 Apr 2008 08:15:28 +0000 (18:15 +1000)]
[XFS] Fix up noattr2 so that it will properly update the versionnum and
features2 fields.
Previously, mounting with noattr2 failed to achieve anything because
although it cleared the attr2 mount flag, it would set it again as soon as
it processed the superblock fields. The fix now has an explicit noattr2
flag and uses it later to fix up the versionnum and features2 fields.
powerpc: Disable 64K hugetlb support when doing 64K SPU mappings
The 64K SPU local store mapping feature is incompatible with the
64K huge pages support due to the inability of some parts of
the memory management to differenciate between them while they
use a different page table format.
For now, disable 64K huge pages when CONFIG_SPU_FS_64K_LS,
in the long run, this can be fixed by making this feature use
the hugetlb page table format.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc/powermac: Fixup default serial port device for pmac_zilog
This removes the non-working code in legacy_serial that tried to handle
the powermac SCC ports, and instead add a (now working) function to the
powermac platform code to find the default serial console if any.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc/powermac: Use sane default baudrate for SCC debugging
When using the "sccdbg" option to route early kernel messages and
xmon to the SCC serial port on PowerMacs, when this wasn't the
configured output port of Open Firmware, we initialize the baudrate
to 57600bps. This isn't a very good default on some powermacs where
both the FW and pmac_zilog will default to 38400. This fixes it to
use the same logic as pmac_zilog to pick a default speed.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nick Piggin [Mon, 28 Jul 2008 03:28:03 +0000 (20:28 -0700)]
powerpc/mm: Implement _PAGE_SPECIAL & pte_special() for 64-bit
Implement _PAGE_SPECIAL and pte_special() for 64-bit powerpc. This bit will
be used by the fast get_user_pages() to differenciate PTEs that correspond
to a valid struct page from special mappings that don't such as IO mappings
obtained via io_remap_pfn_ranges().
Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Hugh Dickins <hugh@veritas.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: Show processor cache information in sysfs
Collect cache information from the OF device tree and display it in
the cpu hierarchy in sysfs. This is intended to be compatible at the
userspace level with x86's implementation[1], hence some of the funny
attribute names. The arrangement of cache info is not immediately
intuitive, but (again) it's for compatibility's sake.
The cache attributes exposed are:
type (Data, Instruction, or Unified)
level (1, 2, 3...)
size
coherency_line_size
number_of_sets
ways_of_associativity
All of these can be derived on platforms that follow the OF PowerPC
Processor binding. The code "publishes" only those attributes for
which it is able to determine values; attributes for values which
cannot be determined are not created at all.
[1] arch/x86/kernel/cpu/intel_cacheinfo.c
BenH: Turned some printk's into pr_debug, added better NULL checking
in a couple of places.
Signed-off-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: Make core id information available to userspace
Existing Open Firmware practice is to report each processor core as a
separate node in the device tree. Report the value of the "reg" OF
property corresponding to a logical CPU's device node as the core_id
attribute in /sys/devices/system/cpu/cpu*/topology/core_id.
Signed-off-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: Make core sibling information available to userspace
Implement the notion of "core siblings" for powerpc. This makes
/sys/devices/system/cpu/cpu*/topology/core_siblings present sensible
values, indicating online CPUs which share an L2 cache.
BenH: Made cpu_to_l2cache() use of_find_node_by_phandle() instead
of IBM-specific open coded search
Signed-off-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stephen Rothwell [Sun, 27 Jul 2008 16:14:24 +0000 (02:14 +1000)]
ibmveth: Fix multiple errors with dma_mapping_error conversion
The addition of an argument to dma_mapping_error() in commit 8d8bb39b9eba32dd70e87fd5ad5c5dd4ba118e06 "dma-mapping: add the device
argument to dma_mapping_error()" left a bit of fallout:
drivers/net/ibmveth.c:263: error: too few arguments to function 'dma_mapping_error'
drivers/net/ibmveth.c:264: error: expected ')' before 'goto'
drivers/net/ibmveth.c:284: error: expected expression before '}' token
drivers/net/ibmveth.c:297: error: too few arguments to function 'dma_mapping_error'
drivers/net/ibmveth.c:298: error: expected ')' before 'dma_unmap_single'
drivers/net/ibmveth.c:306: error: expected expression before '}' token
drivers/net/ibmveth.c:491: error: too few arguments to function 'dma_mapping_error'
drivers/net/ibmveth.c:927: error: too few arguments to function 'dma_mapping_error'
drivers/net/ibmveth.c:927: error: expected ')' before '{' token
drivers/net/ibmveth.c:974: error: expected expression before '}' token
drivers/net/ibmveth.c:914: error: label 'out' used but not defined m
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stephen Rothwell [Sun, 27 Jul 2008 14:51:02 +0000 (00:51 +1000)]
powerpc/pseries: Fix CMO sysdev attribute API change fallout
Noticed due to these wanings:
arch/powerpc/platforms/pseries/cmm.c:298: warning: initialization from incompatible pointer type
arch/powerpc/platforms/pseries/cmm.c:299: warning: initialization from incompatible pointer type
arch/powerpc/platforms/pseries/cmm.c:320: warning: initialization from incompatible pointer type
arch/powerpc/platforms/pseries/cmm.c:320: warning: initialization from incompatible pointer type
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Roland McGrath [Sun, 27 Jul 2008 06:52:52 +0000 (16:52 +1000)]
powerpc: Add TIF_NOTIFY_RESUME support for tracehook
This adds TIF_NOTIFY_RESUME support for powerpc. When set,
we call tracehook_notify_resume() on the way to user mode.
This overloads do_signal() to do the work, but changes its
arguments to it has the TIF_* bits handy in a register and
drops the useless first argument that was always zero.
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Roland McGrath [Sun, 27 Jul 2008 06:49:50 +0000 (16:49 +1000)]
powerpc: Call tracehook_signal_handler() when setting up signal frames
This makes the powerpc signal handling code call tracehook_signal_handler()
after a handler is set up. This means that using PTRACE_SINGLESTEP to
enter a signal handler will report to ptrace on the first instruction of
the handler, instead of the second. This is consistent with what x86 and
other machines do, and what users and debuggers want.
BenH: Fixed up the test for the trap value.
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Rather doing one initialization pass over all the per-cpu
cpu_sibling_maps at boot, update the maps at cpu online/offline time.
This is a behavior change -- the thread_siblings attribute now
reflects only online siblings, whereas it would display offline
siblings before. The new behavior matches that of x86, and is
arguably more useful.
Signed-off-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This piece of code is broken for >2 threads, and possibly in some
other subtle ways (such as comparing a value obtained from an
"ibm,ppc-interrupt-server#s" property to a value obtained from a
"reg" property) and doesn't seem to have any useful purpose in the
first place other than a dubious warning in case NR_CPUS is too
small, which probably isn't the right place to do so.
Signed-off-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>