ToDo/Notes:
- Find and fix bugs.
- - Checkpoint or disable the user space journal ($UsnJrnl).
- In between ntfs_prepare/commit_write, need exclusion between
simultaneous file extensions. This is given to us by holding i_sem
on the inode. The only places in the kernel when a file is resized
- Enable the code for setting the NT4 compatibility flag when we start
making NTFS 1.2 specific modifications.
-2.1.23-WIP
+2.1.23 - Implement extension of resident files and make writing safe as well as
+ many bug fixes, cleanups, and enhancements...
- Add printk rate limiting for ntfs_warning() and ntfs_error() when
compiled without debug. This avoids a possible denial of service
attack. Thanks to Carl-Daniel Hailfinger from SuSE for pointing this
out.
- - Use i_size_read() in fs/ntfs/attrib.c::ntfs_attr_set().
- - Use i_size_read() in fs/ntfs/logfile.c::ntfs_{check,empty}_logfile().
- - Use i_size_read() once and then use the cached value in
- fs/ntfs/lcnalloc.c::ntfs_cluster_alloc().
- - Use i_size_read() in fs/ntfs/file.c::ntfs_file_open().
+ - Fix compilation warnings on ia64. (Randy Dunlap)
+ - Use i_size_{read,write}() instead of reading i_size by hand and cache
+ the value where appropriate.
- Add size_lock to the ntfs_inode structure. This is an rw spinlock
and it locks against access to the inode sizes. Note, ->size_lock
is also accessed from irq context so you must use the _irqsave and
- _irqrestore lock and unlock functions, respectively.
- - Use i_size_read() in fs/ntfs/compress.c at the start of the read and
- use the cached value afterwards. Cache the initialized_size in the
- same way and protect access to the two sizes using the size_lock.
- - Use i_size_read() in fs/ntfs/dir.c once and then use the cached
- value afterwards.
- - Use i_size_read() in fs/ntfs/super.c once and then use the cached
- value afterwards. Cache the initialized_size in the same way and
- protect access to the two sizes using the size_lock.
+ _irqrestore lock and unlock functions, respectively. Protect all
+ accesses to allocated_size, initialized_size, and compressed_size.
- Minor optimization to fs/ntfs/super.c::ntfs_statfs() and its helpers.
- - Use i_size_read() in fs/ntfs/inode.c once and then use the cached
- value afterwards when reading the size of the bitmap inode.
- - Use i_size_{read,write}() in fs/ntfs/{aops.c,mft.c} and protect
- access to the i_size and other size fields using the size_lock.
- Implement extension of resident files in the regular file write code
paths (fs/ntfs/aops.c::ntfs_{prepare,commit}_write()). At present
this only works until the data attribute becomes too big for the mft
mft record for resident attributes (fs/ntfs/inode.c).
- Small readability cleanup to use "a" instead of "ctx->attr"
everywhere (fs/ntfs/inode.c).
+ - Make fs/ntfs/namei.c::ntfs_get_{parent,dentry} static and move the
+ definition of ntfs_export_ops from fs/ntfs/super.c to namei.c. Also,
+ declare ntfs_export_ops in fs/ntfs/ntfs.h.
+ - Correct sparse file handling. The compressed values need to be
+ checked and set in the ntfs inode as done for compressed files and
+ the compressed size needs to be used for vfs inode->i_blocks instead
+ of the allocated size, again, as done for compressed files.
+ - Add AT_EA in addition to AT_DATA to whitelist for being allowed to be
+ non-resident in fs/ntfs/attrib.c::ntfs_attr_can_be_non_resident().
+ - Add fs/ntfs/attrib.c::ntfs_attr_vcn_to_lcn_nolock() used by the new
+ write code.
+ - Fix bug in fs/ntfs/attrib.c::ntfs_find_vcn_nolock() where after
+ dropping the read lock and taking the write lock we were not checking
+ whether someone else had already done the work we wanted to do.
+ - Rename fs/ntfs/attrib.c::ntfs_find_vcn_nolock() to
+ ntfs_attr_find_vcn_nolock() and update all callers.
+ - Add fs/ntfs/attrib.[hc]::ntfs_attr_make_non_resident().
+ - Fix sign of various error return values to be negative in
+ fs/ntfs/lcnalloc.c.
+ - Modify ->readpage and ->writepage (fs/ntfs/aops.c) so they detect and
+ handle the case where an attribute is converted from resident to
+ non-resident by a concurrent file write.
+ - Remove checks for NULL before calling kfree() since kfree() does the
+ checking itself. (Jesper Juhl)
+ - Some utilities modify the boot sector but do not update the checksum.
+ Thus, relax the checking in fs/ntfs/super.c::is_boot_sector_ntfs() to
+ only emit a warning when the checksum is incorrect rather than
+ refusing the mount. Thanks to Bernd Casimir for pointing this
+ problem out.
+ - Update attribute definition handling.
+ - Add NTFS_MAX_CLUSTER_SIZE and NTFS_MAX_PAGES_PER_CLUSTER constants.
+ - Use NTFS_MAX_CLUSTER_SIZE in super.c instead of hard coding 0x10000.
+ - Use MAX_BUF_PER_PAGE instead of variable sized array allocation for
+ better code generation and one less sparse warning in fs/ntfs/aops.c.
+ - Remove spurious void pointer casts from fs/ntfs/. (Pekka Enberg)
+ - Use C99 style structure initialization after memory allocation where
+ possible (fs/ntfs/{attrib.c,index.c,super.c}). Thanks to Al Viro and
+ Pekka Enberg.
+ - Stamp the transaction log ($UsnJrnl), aka user space journal, if it
+ is active on the volume and we are mounting read-write or remounting
+ from read-only to read-write.
+ - Fix a bug in address space operations error recovery code paths where
+ if the runlist was not mapped at all and a mapping error occurred we
+ would leave the runlist locked on exit to the function so that the
+ next access to the same file would try to take the lock and deadlock.
+ - Detect the case when Windows has been suspended to disk on the volume
+ to be mounted and if this is the case do not allow (re)mounting
+ read-write. This is done by parsing hiberfil.sys if present.
+ - Fix several occurrences of a bug where we would perform 'var & ~const'
+ with a 64-bit variable and an int, i.e. 32-bit, constant. This causes
+ the higher order 32-bits of the 64-bit variable to be zeroed. To fix
+ this cast the 'const' to the same 64-bit type as 'var'.
+ - Change the runlist terminator of the newly allocated cluster(s) to
+ LCN_ENOENT in ntfs_attr_make_non_resident(). Otherwise the runlist
+ code gets confused.
+ - Add an extra parameter @last_vcn to ntfs_get_size_for_mapping_pairs()
+ and ntfs_mapping_pairs_build() to allow the runlist encoding to be
+ partial which is desirable when filling holes in sparse attributes.
+ Update all callers.
+ - Change ntfs_map_runlist_nolock() to only decompress the mapping pairs
+ if the requested vcn is inside it. Otherwise we get into problems
+ when we try to map an out of bounds vcn because we then try to map
+ the already mapped runlist fragment which causes
+ ntfs_mapping_pairs_decompress() to fail and return error. Update
+ ntfs_attr_find_vcn_nolock() accordingly.
+ - Fix a nasty deadlock that appeared in recent kernels.
+ The situation: VFS inode X on a mounted ntfs volume is dirty. For
+ the same inode X, the ntfs_inode is dirty and thus so is the
+ corresponding on-disk inode, i.e. mft record, which is in a dirty
+ PAGE_CACHE_PAGE belonging to the table of inodes, i.e. $MFT, inode 0.
+ What happens:
+ Process 1: sys_sync()/umount()/whatever... calls
+ __sync_single_inode() for $MFT -> do_writepages() -> write_page for
+ the dirty page containing the on-disk inode X, the page is now locked
+ -> ntfs_write_mst_block() which clears PageUptodate() on the page to
+ prevent anyone else getting hold of it whilst it does the write out.
+ This is necessary as the on-disk inode needs "fixups" applied before
+ the write to disk which are removed again after the write and
+ PageUptodate is then set again. It then analyses the page looking
+ for dirty on-disk inodes and when it finds one it calls
+ ntfs_may_write_mft_record() to see if it is safe to write this
+ on-disk inode. This then calls ilookup5() to check if the
+ corresponding VFS inode is in the icache. This in turn calls ifind()
+ which waits on the inode lock via wait_on_inode whilst holding the
+ global inode_lock.
+ Process 2: pdflush results in a call to __sync_single_inode for the
+ same VFS inode X on the ntfs volume. This locks the inode (I_LOCK)
+ then calls write_inode -> ntfs_write_inode -> map_mft_record() ->
+ read_cache_page() for the page (in page cache of table of inodes
+ $MFT, inode 0) containing the on-disk inode. This page has
+ PageUptodate() clear because of Process 1 (see above) so
+ read_cache_page() blocks when it tries to take the page lock for the
+ page so it can call ntfs_readpage().
+ Thus Process 1 is holding the page lock on the page containing the
+ on-disk inode X and it is waiting on the inode X to be unlocked in
+ ifind() so it can write the page out and then unlock the page.
+ And Process 2 is holding the inode lock on inode X and is waiting for
+ the page to be unlocked so it can call ntfs_readpage() or discover
+ that Process 1 set PageUptodate() again and use the page.
+ Thus we have a deadlock due to ifind() waiting on the inode lock.
+ The solution: The fix is to use the newly introduced
+ ilookup5_nowait() which does not wait on the inode's lock and hence
+ avoids the deadlock. This is safe as we do not care about the VFS
+ inode and only use the fact that it is in the VFS inode cache and the
+ fact that the vfs and ntfs inodes are one struct in memory to find
+ the ntfs inode in memory if present. Also, the ntfs inode has its
+ own locking so it does not matter if the vfs inode is locked.
2.1.22 - Many bug and race fixes and error handling improvements.