Commit Graph

16889 Commits

Author SHA1 Message Date
Trond Myklebust
2928db1ffe NFS: Ensure inode is always marked I_DIRTY_DATASYNC, if it has unstable pages
Since nfs_scan_list() doesn't wait for locked pages, we have a race in
which it is possible to end up with an inode that needs to send a COMMIT,
but which does not have the I_DIRTY_DATASYNC flag set.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:54 -05:00
Trond Myklebust
5bad5abec4 NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
2010-03-05 15:44:54 -05:00
Trond Myklebust
420e3646bb NFS: Reduce the number of unnecessary COMMIT calls
If the caller is doing a non-blocking flush, and there are still writebacks
pending on the wire, we can usually defer the COMMIT call until those
writes are done.

Also ensure that we honour the wbc->nonblocking flag.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:54 -05:00
Trond Myklebust
ff778d02bf NFS: Add a count of the number of unstable writes carried by an inode
In order to know when we should do opportunistic commits of the unstable
writes, when the VM is doing a background flush, we add a field to count
the number of unstable writes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:54 -05:00
Trond Myklebust
8fc795f703 NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.c
The sole purpose of nfs_write_inode is to commit unstable writes, so
move it into fs/nfs/write.c, and make nfs_commit_inode static.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:53 -05:00
Linus Torvalds
9467c4fdd6 Merge branch 'write_inode2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'write_inode2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  pass writeback_control to ->write_inode
  make sure data is on disk before calling ->write_inode
2010-03-05 11:53:53 -08:00
Linus Torvalds
35c2e967d0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  Switch !O_CREAT case to use of do_last()
  Get rid of symlink body copying
  Finish pulling of -ESTALE handling to upper level in do_filp_open()
  Turn do_link spaghetty into a normal loop
  Unify exits in O_CREAT handling
  Kill is_link argument of do_last()
  Pull handling of LAST_BIND into do_last(), clean up ok: part in do_filp_open()
  Leave mangled flag only for setting nd.intent.open.flag
  Get rid of passing mangled flag to do_last()
  Don't pass mangled open_flag to finish_open()
  pull more into do_last()
  bail out with ELOOP earlier in do_link loop
  pull the common predecessors into do_last()
  postpone __putname() until after do_last()
  unroll do_last: loop in do_filp_open()
  Shift releasing nd->root from do_last() to its caller
  gut do_filp_open() a bit more (do_last separation)
  beginning to untangle do_filp_open()
2010-03-05 11:46:31 -08:00
Linus Torvalds
1f63b9c15b Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (36 commits)
  ext4: fix up rb_root initializations to use RB_ROOT
  ext4: Code cleanup for EXT4_IOC_MOVE_EXT ioctl
  ext4: Fix the NULL reference in double_down_write_data_sem()
  ext4: Fix insertion point of extent in mext_insert_across_blocks()
  ext4: consolidate in_range() definitions
  ext4: cleanup to use ext4_grp_offs_to_block()
  ext4: cleanup to use ext4_group_first_block_no()
  ext4: Release page references acquired in ext4_da_block_invalidatepages
  ext4: Fix ext4_quota_write cross block boundary behaviour
  ext4: Convert BUG_ON checks to use ext4_error() instead
  ext4: Use direct_IO_no_locking in ext4 dio read
  ext4: use ext4_get_block_write in buffer write
  ext4: mechanical rename some of the direct I/O get_block's identifiers
  ext4: make "offset" consistent in ext4_check_dir_entry()
  ext4: Handle non empty on-disk orphan link
  ext4: explicitly remove inode from orphan list after failed direct io
  ext4: fix error handling in migrate
  ext4: deprecate obsoleted mount options
  ext4: Fix fencepost error in chosing choosing group vs file preallocation.
  jbd2: clean up an assertion in jbd2_journal_commit_transaction()
  ...
2010-03-05 10:47:00 -08:00
Linus Torvalds
b24bc1e61c Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
  Squashfs: get rid of obsolete definition in header file
  Squashfs: get rid of obsolete variable in struct squashfs_sb_info
  Squashfs: add decompressor entries for lzma and lzo
  Squashfs: add a decompressor framework
  Squashfs: factor out remaining zlib dependencies into separate wrapper file
  Squashfs: move zlib decompression wrapper code into a separate file
2010-03-05 10:46:04 -08:00
Christoph Hellwig
a9185b41a4 pass writeback_control to ->write_inode
This gives the filesystem more information about the writeback that
is happening.  Trond requested this for the NFS unstable write handling,
and other filesystems might benefit from this too by beeing able to
distinguish between the different callers in more detail.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 13:25:52 -05:00
Christoph Hellwig
26821ed40b make sure data is on disk before calling ->write_inode
Similar to the fsync issue fixed a while ago in commit
2daea67e96 we need to write for data to
actually hit the disk before writing out the metadata to guarantee
data integrity for filesystems that modify the inode in the data I/O
completion path.  Currently XFS and NFS handle this manually, and AFS
has a write_inode method that does nothing but waiting for data, while
others are possibly missing out on this.

Fortunately this change has a lot less impact than the fsync change
as none of the write_inode methods starts data writeout of any form
by itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 13:25:10 -05:00
Phillip Lougher
06862f884d Squashfs: get rid of obsolete definition in header file
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2010-03-05 15:35:35 +00:00
Phillip Lougher
ae4a3179b1 Squashfs: get rid of obsolete variable in struct squashfs_sb_info
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2010-03-05 15:35:20 +00:00
Al Viro
1f36f774b2 Switch !O_CREAT case to use of do_last()
... and now we have all intents crap well localized

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 09:22:25 -05:00
Al Viro
def4af30cf Get rid of symlink body copying
Now that nd->last stays around until ->put_link() is called, we can
just postpone that ->put_link() in do_filp_open() a bit and don't
bother with copying.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 09:01:40 -05:00
Al Viro
3866248e5f Finish pulling of -ESTALE handling to upper level in do_filp_open()
Don't bother with path_walk() (and its retry loop); link_path_walk()
will do it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 09:01:38 -05:00
Al Viro
806b681cbe Turn do_link spaghetty into a normal loop
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 09:01:36 -05:00
Al Viro
10fa8e62f2 Unify exits in O_CREAT handling
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 09:01:35 -05:00
Al Viro
9e67f36169 Kill is_link argument of do_last()
We set it to 1 iff we return NULL

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 09:01:33 -05:00
Al Viro
67ee3ad21d Pull handling of LAST_BIND into do_last(), clean up ok: part in do_filp_open()
Note that in case of !O_CREAT we know that nd.root has already been given up

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 09:01:31 -05:00
Al Viro
4296e2cbf2 Leave mangled flag only for setting nd.intent.open.flag
Nothing else uses it anymore

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 09:01:29 -05:00
Al Viro
5b369df826 Get rid of passing mangled flag to do_last()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 09:01:27 -05:00
Al Viro
9a66179e13 Don't pass mangled open_flag to finish_open()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 09:01:25 -05:00
Al Viro
a2c36b450e pull more into do_last()
Handling of LAST_DOT/LAST_ROOT/LAST_DOTDOT/terminating slash
can be pulled in as well

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 09:01:24 -05:00
Al Viro
c99658fe97 bail out with ELOOP earlier in do_link loop
If we'd passed through 32 trailing symlinks already, there's
no sense following the 33rd - we'll bail out anyway.  Better
bugger off earlier.

It *does* change behaviour, after a fashion - if the 33rd happens
to be a procfs-style symlink, original code *would* allow it.
This one will not.  Cry me a river if that hurts you.  Please, do.
And post a video of that, while you are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 09:01:22 -05:00
Al Viro
a1e28038df pull the common predecessors into do_last()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 09:01:20 -05:00
Al Viro
c41c140562 postpone __putname() until after do_last()
Since do_last() doesn't mangle nd->last_name, we can safely postpone
__putname() done in handling of trailing symlinks until after the
call of do_last()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 09:01:18 -05:00
Al Viro
27bff34300 unroll do_last: loop in do_filp_open()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 09:01:16 -05:00
Al Viro
3343eb8209 Shift releasing nd->root from do_last() to its caller
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 09:01:15 -05:00
Al Viro
fb1cc555d5 gut do_filp_open() a bit more (do_last separation)
Brute-force separation of stuff reachable from do_last: with
the exception of do_link:; just take all that crap to a helper
function as-is and have it tell the caller if it has to go
to do_link.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 09:01:13 -05:00
Al Viro
648fa8611d beginning to untangle do_filp_open()
That's going to be a long and painful series.  The first step:
take the stuff reachable from 'ok' label in do_filp_open() into
a new helper (finish_open()).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 09:01:11 -05:00
Venkatesh Pallipadi
64e290ec69 ext4: fix up rb_root initializations to use RB_ROOT
ext4 uses rb_node = NULL; to zero rb_root at few places.  Using
RB_ROOT as the initializer is more portable in case the underlying
implementation of rbtrees changes in the future.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Eric Paris <eparis@redhat.com>
2010-03-04 22:25:21 -05:00
Linus Torvalds
64ba992675 Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  exofs: groups support
  exofs: Prepare for groups
  exofs: Error recovery if object is missing from storage
  exofs: convert io_state to use pages array instead of bio at input
  exofs: RAID0 support
  exofs: Define on-disk per-inode optional layout attribute
  exofs: unindent exofs_sbi_read
  exofs: Move layout related members to a layout structure
  exofs: Recover in the case of read-passed-end-of-file
  exofs: Micro-optimize exofs_i_info
  exofs: debug print even less
2010-03-04 08:26:08 -08:00
Linus Torvalds
0f2cc4ecd8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
  init: Open /dev/console from rootfs
  mqueue: fix typo "failues" -> "failures"
  mqueue: only set error codes if they are really necessary
  mqueue: simplify do_open() error handling
  mqueue: apply mathematics distributivity on mq_bytes calculation
  mqueue: remove unneeded info->messages initialization
  mqueue: fix mq_open() file descriptor leak on user-space processes
  fix race in d_splice_alias()
  set S_DEAD on unlink() and non-directory rename() victims
  vfs: add NOFOLLOW flag to umount(2)
  get rid of ->mnt_parent in tomoyo/realpath
  hppfs can use existing proc_mnt, no need for do_kern_mount() in there
  Mirror MS_KERNMOUNT in ->mnt_flags
  get rid of useless vfsmount_lock use in put_mnt_ns()
  Take vfsmount_lock to fs/internal.h
  get rid of insanity with namespace roots in tomoyo
  take check for new events in namespace (guts of mounts_poll()) to namespace.c
  Don't mess with generic_permission() under ->d_lock in hpfs
  sanitize const/signedness for udf
  nilfs: sanitize const/signedness in dealing with ->d_name.name
  ...

Fix up fairly trivial (famous last words...) conflicts in
drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c
2010-03-04 08:15:33 -08:00
Akira Fujita
c437b27335 ext4: Code cleanup for EXT4_IOC_MOVE_EXT ioctl
a) Fix sparse warning in ext4_ioctl()
b) Remove unneeded variable in mext_leaf_block()
c) Fix spelling typo in mext_check_arguments()

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-03-04 00:39:24 -05:00
Akira Fujita
7247c0caa2 ext4: Fix the NULL reference in double_down_write_data_sem()
If EXT4_IOC_MOVE_EXT ioctl is called with NULL donor_fd, fget() in
ext4_ioctl() gets inappropriate file structure for donor; so we need
to do this check earlier, before calling double_down_write_data_sem().

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-03-04 00:34:58 -05:00
Akira Fujita
5fd5249aa3 ext4: Fix insertion point of extent in mext_insert_across_blocks()
If the leaf node has 2 extent space or fewer and EXT4_IOC_MOVE_EXT
ioctl is called with the file offset where after the 2nd extent
covers, mext_insert_across_blocks() always tries to insert extent into
the first extent.  As a result, the file gets corrupted because of
wrong extent order.  The patch fixes this problem.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-03-04 00:31:06 -05:00
Akinobu Mita
731eb1a03a ext4: consolidate in_range() definitions
There are duplicate macro definitions of in_range() in mballoc.h and
balloc.c.  This consolidates these two definitions into ext4.h, and
changes extents.c to use in_range() as well.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger@sun.com>
2010-03-03 23:55:01 -05:00
Akinobu Mita
bda00de7e8 ext4: cleanup to use ext4_grp_offs_to_block()
More cleanup to convert open-coded calculations of the first block
number of a free extent to use ext4_grp_offs_to_block() instead.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger@sun.com>
2010-03-03 23:53:25 -05:00
Akinobu Mita
5661bd6861 ext4: cleanup to use ext4_group_first_block_no()
This is a cleanup and simplification patch which takes some open-coded
calculations to calculate the first block number of a group and
converts them to use the (already defined) ext4_group_first_block_no()
function.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger@sun.com>
2010-03-03 23:53:39 -05:00
Al Viro
9643f5d94a Merge branch 'for-fsnotify' into for-linus 2010-03-03 17:12:40 -05:00
Jan Kara
9b1d0998d2 ext4: Release page references acquired in ext4_da_block_invalidatepages
We forget to release page references we acquire in
ext4_da_block_invalidatepages.  Luckily, this function gets called only if we
are not able to allocate blocks for delay-allocated data so that function
should better never be called.

Also cleanup handling of index variable.

Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-03-03 16:19:32 -05:00
Al Viro
4919c5e45a fix race in d_splice_alias()
rehashing the negative placeholder opens a race with d_lookup();
we unhash it almost immediately (by d_move()), but the race
window is there.  Since d_move() doesn't rely on target being
hashed, we don't need that d_rehash() at all.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 14:13:08 -05:00
Al Viro
bec1052e5b set S_DEAD on unlink() and non-directory rename() victims
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 14:12:08 -05:00
Miklos Szeredi
db1f05bb85 vfs: add NOFOLLOW flag to umount(2)
Add a new UMOUNT_NOFOLLOW flag to umount(2).  This is needed to prevent
symlink attacks in unprivileged unmounts (fuse, samba, ncpfs).

Additionally, return -EINVAL if an unknown flag is used (and specify
an explicitly unused flag: UMOUNT_UNUSED).  This makes it possible for
the caller to determine if a flag is supported or not.

CC: Eugene Teo <eugene@redhat.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 14:08:00 -05:00
Al Viro
0ceeca5a08 hppfs can use existing proc_mnt, no need for do_kern_mount() in there
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 14:08:00 -05:00
Al Viro
8089352a13 Mirror MS_KERNMOUNT in ->mnt_flags
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 14:08:00 -05:00
Al Viro
d498b25a4f get rid of useless vfsmount_lock use in put_mnt_ns()
It hadn't been needed since we'd sanitized the logics in
mark_mounts_for_expiry() (which, in turn, used to be a
rudiment of bad old times when namespace_sem was per-ns).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 14:07:59 -05:00
Al Viro
47cd813f29 Take vfsmount_lock to fs/internal.h
no more users left outside of fs/*.c (and very few outside of
fs/namespace.c, actually)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 14:07:59 -05:00
Al Viro
9f5596af44 take check for new events in namespace (guts of mounts_poll()) to namespace.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 14:07:59 -05:00