* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
[XFS] Fix memory corruption with small buffer reads
[XFS] Fix inode list allocation size in writeback.
[XFS] Don't allow memory reclaim to wait on the filesystem in inode
[XFS] Fix fsync() b0rkage.
[XFS] Include linux/random.h in all builds, not just debug builds.
When we have multiple buffers in a single page for a blocksize == pagesize
filesystem we might overwrite the page contents if two callers hit it
shortly after each other. To prevent that we need to keep the page locked
until I/O is completed and the page marked uptodate.
Thanks to Eric Sandeen for triaging this bug and finding a reproducible
testcase and Dave Chinner for additional advice.
This should fix kernel.org bz #10421.
Tested-by: Eric Sandeen <sandeen@sandeen.net>
SGI-PV: 981813
SGI-Modid: xfs-linux-melb:xfs-kern:31173a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
We only need to allocate space for the number of inodes in the cluster
when writing back inodes, not every byte in the inode cluster. This
reduces the amount of memory needing to be allocated to 256 bytes instead
of 64k.
SGI-PV: 981949
SGI-Modid: xfs-linux-melb:xfs-kern:31182a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
writeback
If we allow memory reclaim to wait on the pages under writeback in inode
cluster writeback we could deadlock because we are currently holding the
ILOCK on the initial writeback inode which is needed in data I/O
completion to change the file size or do unwritten extent conversion
before the pages are taken out of writeback state.
SGI-PV: 981091
SGI-Modid: xfs-linux-melb:xfs-kern:31015a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
xfs_fsync() fails to wait for data I/O completion before checking if the
inode is dirty or clean to decide whether to log the inode or not. This
misses inode size updates when the data flushed by the fsync() is
extending the file.
Hence, like fdatasync(), we need to wait for I/o completion first, then
check the inode for cleanliness. Doing so makes the behaviour of
xfs_fsync() identical for fsync and fdatasync and we *always* use
synchronous semantics if the inode is dirty. Therefore also kill the
differences and remove the unused flags from the xfs_fsync function and
callers.
SGI-PV: 981296
SGI-Modid: xfs-linux-melb:xfs-kern:31033a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
memcpy() from userland pointer is a Bad Thing(tm)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fallout from commit 46d7b522eb ("uml: move
hppfs_kern.c to hppfs.c")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (21 commits)
[CIFS] Remove debug statement
Fix possible access to undefined memory region.
[CIFS] Enable DFS support for Windows query path info
[CIFS] Enable DFS support for Unix query path info
[CIFS] add missing seq_printf to cifs_show_options for hard mount option
[CIFS] add more complete mount options to cifs_show_options
[CIFS] Add missing defines for DFS
CIFSGetDFSRefer cleanup + dfs_referral_level_3 fixed to conform REFERRAL_V3 the MS-DFSC spec.
Fixed DFS code to work with new 'build_path_from_dentry', that returns full path if share in the dfs, now.
[CIFS] enable parsing for transport encryption mount parm
[CIFS] Finishup DFS code
[CIFS] BKL-removal: convert CIFS over to unlocked_ioctl
[CIFS] suppress duplicate warning
[CIFS] Fix paths when share is in DFS to include proper prefix
add function to convert access flags to legacy open mode
clarify return value of cifs_convert_flags()
[CIFS] don't explicitly do a FindClose on rewind when directory search has ended
[CIFS] cleanup old checkpatch warnings
[CIFS] CIFSSMBPosixLock should return -EINVAL on error
fix memory leak in CIFSFindNext
...
* 'for-2.6.26' of git://linux-nfs.org/~bfields/linux: (25 commits)
svcrdma: Verify read-list fits within RPCSVC_MAXPAGES
svcrdma: Change svc_rdma_send_error return type to void
svcrdma: Copy transport address and arm CQ before calling rdma_accept
svcrdma: Set rqstp transport address in rdma_read_complete function
svcrdma: Use ib verbs version of dma_unmap
svcrdma: Cleanup queued, but unprocessed I/O in svc_rdma_free
svcrdma: Move the QP and cm_id destruction to svc_rdma_free
svcrdma: Add reference for each SQ/RQ WR
svcrdma: Move destroy to kernel thread
svcrdma: Shrink scope of spinlock on RQ CQ
svcrdma: Use standard Linux lists for context cache
svcrdma: Simplify RDMA_READ deferral buffer management
svcrdma: Remove unused READ_DONE context flags bit
svcrdma: Return error from rdma_read_xdr so caller knows to free context
svcrdma: Fix error handling during listening endpoint creation
svcrdma: Free context on post_recv error in send_reply
svcrdma: Free context on ib_post_recv error
svcrdma: Add put of connection ESTABLISHED reference in rdma_cma_handler
svcrdma: Fix return value in svc_rdma_send
svcrdma: Fix race with dto_tasklet in svc_rdma_send
...
Final piece for handling DFS in query_path_info, constructing a
fake inode for the junction directory which the submount will cover.
This handles the non-Unix (Windows etc.) code path.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Final piece for handling DFS in unix_query_path_info, constructing a
fake inode for the junction directory which the submount will cover.
Acked-by: Igor Mammedov <niallain@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
[GFS2] Prefer strlcpy() over snprintf()
[GFS2] Fix cast from unsigned int to s64
[GFS2] filesystem consistency error from do_strip
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
[PATCH] return to old errno choice in mkdir() et.al.
[Patch] fs/binfmt_elf.c: fix wrong return values
[PATCH] get rid of leak in compat_execve()
[Patch] fs/binfmt_elf.c: fix a wrong free
[PATCH] avoid multiplication overflows and signedness issues for max_fds
[PATCH] dup_fd() part 4 - race fix
[PATCH] dup_fd() - part 3
[PATCH] dup_fd() part 2
[PATCH] dup_fd() fixes, part 1
[PATCH] take init_files to fs/file.c
Also Kari Hurtta noticed a missing check in the same function which is now fixed.
CC: Kari Hurtta <hurtta+gmane@siilo.fmi.fi>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The return value on writes to the plock device should be
the number of bytes written. It was returning 0 instead
when an nfs lock callback was involved.
Reported-by: Nathan Straz <nstraz@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Removed the section mismatch message:
WARNING: fs/dlm/dlm.o(.init.text+0x132): Section mismatch in reference from the function init_module() to the function .exit.text:dlm_netlink_exit()
Since dlm_netlink_exit() is called in the init_dlm() error handling,
the __exit annotation has been removed.
Signed-off-by: Leonardo Potenza <lpotenza@inwind.it>
Signed-off-by: David Teigland <teigland@redhat.com>
The semaphore connections_lock is used as a mutex. Convert it to the mutex
API.
Signed-off-by: Matthias Kaehlcke <matthias@kaehlcke.net>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Teigland <teigland@redhat.com>
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
SUNRPC: AUTH_SYS "machine creds" shouldn't use negative valued uid/gid
nfs: make nfs4_drop_state_owner() static
nfs: path_{get,put}() cleanups
nfs: replace remaining __FUNCTION__ occurrences
nfs/lsm: make NFSv4 set LSM mount options
NFSv4: Check the return value of decode_compound_hdr_arg()
nfs: fix race in nfs_dirty_request
NFS: Ensure that 'noac' and/or 'actimeo=0' turn off attribute caching
The current permissions on sessionid are a little too restrictive.
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
adds various options to cifs_show_options
(displayed when you cat /proc/mounts with a cifs mount). I limited
the new ones to values that are associated with the mount with the
exception of "seal" (which is a per tree connection property, but I
thought was important enough to show through).
Eventually cifs's parse_mount_options also needs to
be rewritten to use the match_token API but that would be a big enough
change that I would prefer that changing parse_mount_options wait
until next release.
Signed-off-by: Steve French <sfrench@us.ibm.com>
In case when both EEXIST and EROFS would apply we used to
return the former in mkdir(2) and friends. Lest anyone suspects
us of being consistent, in the same situation knfsd gave clients
nfs_erofs...
ro-bind series had switched the syscall side of things to
returning -EROFS and immediately broke an application - namely,
mkdir -p. Patch restores the original behaviour...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
create_elf_tables() returns 0 on success. But when strnlen_user() "fails",
it returns 0 directly. So this is wrong.
Signed-off-by: WANG Cong <wangcong@zeuux.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Even though copy_compat_strings() doesn't cache the pages,
copy_strings_kernel() and stuff indirectly called by e.g.
->load_binary() is doing that, so we need to drop the
cache contents in the end.
[found by WANG Cong <wangcong@zeuux.org>]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In kmalloc failing path, we shouldn't free pointers in 'info',
because the struct 'info' is uninitilized when kmalloc is called.
And when kmalloc returns NULL, it's needless to kfree it.
Signed-off-by: WANG Cong <wangcong@zeuux.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
--
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Limit sysctl_nr_open - we don't want ->max_fds to exceed MAX_INT and
we don't want size calculation for ->fd[] to overflow.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Parent _can_ be a clone task, contrary to the comment. Moreover,
more files could be opened while we allocate a copy, in which case
we end up copying only part into new descriptor table. Since what
we get _is_ affected by all changes in the old range, we can get
rather weird effects - e.g.
dup2(0, 1024); close(0);
in parallel with fork() resulting in child that sees the effect of
close(), but not that of dup2() done just before that close().
What we need is to recalculate the open_count after having reacquired
->file_lock and if external fdtable we'd just allocated is too small for
it, free the sucker and redo allocation.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
use alloc_fdtable() instead of expand_files(), get rid of pointless
grabbing newf->file_lock, kill magic in copy_fdtable() that used to
be there only to skip copying when called from dup_fd().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
nfs4_drop_state_owner() can now become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Here are some more places where path_{get,put}() can be used instead of
dput()/mntput() pair.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv3 get_sb operations call into the LSM layer to set security options passed
from userspace. NFSv4 hooks were not originally added since it was reasonably
late in the merge window and NFSv3 was the only thing that had regressed (v4
has never supported any LSM options)
This patch makes NFSv4 call into the LSM to set security options rather than
just blindly dropping them with no notice to the user as happens today. This
patch was tested in a simple NFSv4 environment with the context= option and
appeared to work as expected.
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If decode_compound_hdr_arg() returns a resource error, then we cannot
proceed to process the callback. Return a 'GARBAGE_ARGS' rpc-level error to
the caller instead.
If, however, the minor version field is incorrect, then we need to
propagate the resulting NFS4ERR_MINOR_VERS_MISMATCH error back as the
compound status field (setting the nops field to 0).
Finally, if encode_compound_hdr_res() returns an error, we need to return
an RPC_SYSTEM_ERR to the caller.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When called from nfs_flush_incompatible, the req is not locked, so
req->wb_page might be set to NULL before it is used by PageWriteback.
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Both the 'noac' and 'actimeo=0' mount options should ensure that attributes
are not cached, however a bug in nfs_attribute_timeout() means that
currently, the attributes may in fact get cached for up to one jiffy. This
has been seen to cause corruption in some applications.
The reason for the bug is that the time_in_range() test returns 'true' as
long as the current time lies between nfsi->read_cache_jiffies and
nfsi->read_cache_jiffies + nfsi->attrtimeo. In other words, if jiffies
equals nfsi->read_cache_jiffies, then we still cache the attribute data.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Updating the current transaction's t_state is protected by j_state_lock. We
need to do the same when updating the t_state to T_COMMIT.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If the block allocator gets blocks out of system zone ext4 calls
ext4_error. But if the file system is mounted with errors=continue
retry block allocation. We need to mark the system zone blocks as
in use to make sure retry don't pick them again
System zone is the block range mapping block bitmap, inode bitmap and inode
table.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Samba now supports transport encryption on particular exports
(mounted tree ids can be encrypted for servers which support the
unix extensions). This adds parsing support to cifs mount
option parsing for this.
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs_ioctl doesn't seem to need the BKL for anything, so convert it over
to use unlocked_ioctl.
Signed-off-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
fs/cifs/dir.c: In function 'cifs_ci_compare':
fs/cifs/dir.c:582: warning: passing argument 1 of 'memcpy' discards
qualifiers from pointer target type
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
It is possible that the entry in sysfs already exists, one case of this is
when a network device is renamed to bonding_masters. Anyway, in this case
the proper error path is for device_rename to return an error code, not to
generate bogus backtrace and errors.
Also, to avoid possible races, the create link should be done before the
remove link. This makes a device rename atomic operation like other renames.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of ssh://master.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: fix error path during early mount
9p: make cryptic unknown error from server less scary
9p: fix flags length in net
9p: Correct fidpool creation failure in p9_client_create
9p: use struct mutex instead of struct semaphore
9p: propagate parse_option changes to client and transports
fs/9p/v9fs.c (v9fs_parse_options): Handle kstrdup and match_strdup failure.
9p: Documentation updates
add match_strlcpy() us it to make v9fs make uname and remotename parsing more robust
This fix the uninitialized bs when we try to replace a xattr entry in
ibody with the new value which require more than free space.
This situation only happens we format ext3/4 with inode size more than 128 and
we have put xattr entries both in ibody and block. The consequences about
this bug is we will lost the xattr block which pointed by i_file_acl with all
xattr entires in it. We will alloc a new xattr block and put that large value
entry in it. The old xattr block will become orphan block.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Andreas Dilger <adilger@sun.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Updating the current transaction's t_state is protected by j_state_lock. We
need to do the same when updating the t_state to T_COMMIT.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Acked-by: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some versions of Samba (3.2-pre e.g.) are stricter about checking to make sure that
paths in DFS name spaces are sent in the form \\server\share\dir\subdir ...
instead of \dir\subdir
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
There was some cleanup issues during early mount which would trigger
a kernel bug for certain types of failure. This patch reorganizes the
cleanup to get rid of the bad behavior.
This also merges the 9pnet and 9pnet_fd modules for the purpose of
configuration and initialization. Keeping the fd transport separate
from the core 9pnet code seemed like a good idea at the time, but in
practice has caused more harm and confusion than good.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Jim Meyering <meyering@redhat.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
The kernel-doc comments of much of the 9p system have been in disarray since
reorganization. This patch fixes those problems, adds additional documentation
and a template book which collects the 9p information.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
match_strcpy() is a somewhat creepy function: the caller needs to make sure
that the destination buffer is big enough, and when he screws up or
forgets, match_strcpy() happily overruns the buffer.
There's exactly one customer: v9fs_parse_options(). I believe it currently
can't overflow its buffer, but that's not exactly obvious.
The source string is a substing of the mount options. The kernel silently
truncates those to PAGE_SIZE bytes, including the terminating zero. See
compat_sys_mount() and do_mount().
The destination buffer is obtained from __getname(), which allocates from
name_cachep, which is initialized by vfs_caches_init() for size PATH_MAX.
We're safe as long as PATH_MAX <= PAGE_SIZE. PATH_MAX is 4096. As far as
I know, the smallest PAGE_SIZE is also 4096.
Here's a patch that makes the code a bit more obviously correct. It
doesn't depend on PATH_MAX <= PAGE_SIZE.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Jim Meyering <meyering@redhat.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
SMBLegacyOpen always opens a file as r/w. This could be problematic
for files with ATTR_READONLY set. Have it interpret the access_mode
into a sane open mode.
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs_convert_flags returns 0x20197 in the default case. It's not
immediately evident where that number comes from, so change it
to be an or'ed set of flags. The compiler will boil it down anyway.
(Thanks to Guenter Kukkukk for clarifying the flags).
Signed-off-by: Steve French <sfrench@us.ibm.com>
In case of inode preallocation, the number of blocks to allocate depends
on the file size and it is calculated in ext4_mb_normalize_request().
Each group in the filesystem is then checked to find one that can be
used for allocation; this is done in ext4_mb_good_group().
When a file bigger than 4MB is created, the requested number of blocks
to preallocate, calculated by ext4_mb_normalize_request is 4096.
However for a filesystem with 1KB block size, the maximum size of the
block buddies used by the multiblock allocator is 2048, so none of
groups in the filesystem satisfies the search criteria in
ext4_mb_good_group(). Scanning all the filesystem groups impacts
performance.
This was demonstrated by using a freshly created, 70GB, 1k block
filesystem, with caches dropped write before the test via
/proc/sys/vm/drop_caches, and with the filesystem mounted with
nodelalloc and nodealloc,nomballoc. The time to write an 8 megabyte
file using "dd if=/dev/zero of=/mnt/test/fo bs=8k count=1k conv=fsync"
took 35.5091 seconds (236kB/s) with nodellaloc, and 0.233754 seconds
(35.9 MB/s) with the nodelloc,nomballoc options. With a 1TB partition,
it took several minutes to write 8MB!
This patch modifies the algorithm in ext4_mb_normalize_group_request to
calculate the number of blocks to allocate by taking into account the
maximum size of free blocks chunks handled by the multiblock allocator.
It has also been tested for filesystems with 2KB and 4KB block sizes to
ensure that those cases don't regress.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Valerie Clement <valerie.clement@bull.net>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In journal=data mode, it is not enough to do write_inode_now as done in
vfs_quota_on() to write all data to their final location (which is
needed for quota_read to work correctly). Calling journal_flush() does
its job.
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When quota is disabled, we should not print 'journaled quota not
supported' when user tried to mount non-journaled quota. Also fix typo
in the message.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We should not allow user to change quota mount options when quota is
just suspended. It would make mount options and internal quota state
inconsistent. Also we should not allow user to change quota format when
quota is turned on. On the other hand we can just silently ignore when
some option is set to the value it already has (mount does this on
remount).
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Do the following series of operations on a CIFS share:
opendir(dir)
readdir(dir)
unlink(file in dir)
rewinddir(dir)
readdir(dir)
If the readdir read all entries in the directory this will make CIFS throw an error like this:
CIFS VFS: Send error in FindClose = -9
CIFS requests "Close at end of search" of the server by setting this bit when issuing FindFirst or FindNext. Therefore when all search entries are returned, the server may return "end of search" and close the search implicitly when this bit is set by the client on the request. We check for this when a readdir is explicitly closed - but when the client notices that a directory has changed after the last operation, we attempt to close the directory before reopening by reissuing a second FindFirst. But, the directory may already been implicitly closed (due to end of search) because the first readdir finished. So we only want to issue a FindClose call in this case when we don't expect it to already be closed.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
bdevname() fills the buffer that it is given as a parameter, so calling
strcpy() or snprintf() on the returned value is redundant (and probably not
guaranteed to work - I don't think strcpy and snprintf support overlapping
buffers.)
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prior to 2.6.26 fuse only supported single page write requests. In theory all
fuse filesystem should be able support bigger than 4k writes, as there's
nothing in the API to prevent it. Unfortunately there's a known case in
NTFS-3G where big writes cause filesystem corruption. There could also be
other filesystems, where the lack of testing with big write requests would
result in bugs.
To prevent such problems on a kernel upgrade, disable big writes by default,
but let filesystems set a flag to turn it on.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Szabolcs Szakacsits <szaka@ntfs-3g.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When mm destruction happens, we should pass mm_update_next_owner() the old mm.
But unfortunately new mm is passed in exec_mmap().
Thus, kernel panic is possible when a multi-threaded process uses exec().
Also, the owner member comment description is wrong. mm->owner does not
necessarily point to the thread group leader.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: "Paul Menage" <menage@google.com>
Cc: "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix an oops with a corrupted hfs+ image.
See http://bugzilla.kernel.org/show_bug.cgi?id=10548 for details.
Problem is that we call hfs_btree_open() from hfsplus_fill_super() to set
HFSPLUS_SB(sb).[ext_tree|cat_tree] Both trees are still NULL at this moment.
If hfs_btree_open() fails for any reason it calls iput() on the page, which
gets to hfsplus_releasepage() which tries to access HFSPLUS_SB(sb).* which is
still NULL and oopses while dereferencing it.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is currently no way to query the bounding set of another task. As there
appears to be no security reason not to, and as Michael Kerrisk points out the
following valid reasons to do so exist:
* consistency (I can see all of the other per-thread/process sets in
/proc/.../status)
* debugging -- I could imagine that it would make the job of debugging an
application that uses capabilities a little simpler.
this patch adds the bounding set to /proc/self/status right after the
effective set.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Andrew G. Morgan <morgan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sometimes, vfs_quota_off() is called on a partially set up super block (for
example when fill_super() fails for some reason). In such cases we cannot
call ->sync_fs() because it can Oops because of not properly filled in super
block. So in case we find there's not quota to turn off, we just skip
everything and return which fixes the above problem.
[akpm@linux-foundation.org: fxi tpyo]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dget(dentry->d_parent) --> dget_parent(dentry)
unlock_parent() is racy and unnecessary. Replace single caller with
unlock_dir().
There are several other suspect uses of ->d_parent in ecryptfs...
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There's no reason for the _kern in hppfs_kern.c, so move it to hppfs.c.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
hppfs tidying and fixes noticed during hch's get_inode work -
style fixes
a copy_to_user got its return value checked
hppfs_write no longer fiddles file->f_pos because it gets and
returns pos in its arguments
hppfs_delete_inode dputs the underlyng procfs dentry stored in
its private data and mntputs the vfsmnt stashed in s_fs_info
hppfs_put_super no longer needs to mntput the s_fs_info, so it
no longer needs to exist
hppfs_readlink and hppfs_follow_link were doing a bunch of stuff
with a struct file which they didn't use
there is now a ->permission which calls generic_permission
get_inode was always returning 0 for some reason - it now
returns an inode if nothing bad happened
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
all other codepaths in this function return negative values on errors
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When CIFSFindNext gets back an -EBADF from a call, it sets the return
code of the function to 0 and eventually exits. Doing this makes the
cleanup at the end of the function skip freeing the SMB buffer, so
we need to make sure we free the buffer explicitly when doing this.
If we don't you end up with errors like this when unplugging the cifs
kernel module:
slab error in kmem_cache_destroy(): cache `cifs_request': Can't free all objects
[<c046bdbf>] kmem_cache_destroy+0x61/0xf3
[<e0f03045>] cifs_destroy_request_bufs+0x14/0x28 [cifs]
[<e0f2016e>] exit_cifs+0x1e/0x80 [cifs]
[<c043aeae>] sys_delete_module+0x192/0x1b8
[<c04451fd>] audit_syscall_entry+0x14b/0x17d
[<c0405413>] syscall_call+0x7/0xb
=======================
Signed-off-by: Jeff Layton <jlayton@redhat.com>
when unix extensions and cifsacl support are disabled. These
permissions changes are "ephemeral" however. They are lost whenever
a share is mounted and unmounted, or when memory pressure forces
the inode out of the cache.
Because of this, we'd like to introduce a behavior change to make
CIFS behave more like local DOS/Windows filesystems. When unix
extensions and cifsacl support aren't enabled, then don't silently
ignore changes to permission bits that can't be reflected on the
server.
Still, there may be people relying on the current behavior for
certain applications. This patch adds a new "dynperm" (and a
corresponding "nodynperm") mount option that will be intended
to make the client fall back to legacy behavior when setting
these modes.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] don't allow demultiplex thread to exit until kthread_stop is called
[CIFS] when not using unix extensions, check for and set ATTR_READONLY on create and mkdir
[CIFS] add local struct inode pointer to cifs_setattr
[CIFS] cifs_find_tcp_session cleanup
strlcpy is faster than snprintf when you don't use the returned value.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This fixes bz 444829 where allocating a new block caused gfs2 file systems to
report 0 bytes used in df. It was caused by a broken cast from an unsigned int
in gfs2_block_alloc() to a negative s64 in gfs2_statfs_change(). This patch
casts the unsigned int to an s64 before the unary minus is applied.
Signed-off-by: Andrew Price <andy@andrewprice.me.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch fixes a GFS2 filesystem consistency error reported from
function do_strip. The problem was caused by a timing window
that allowed two vfs inodes to be created in memory that point
to the same file. The problem is fixed by making the vfs's
iget_test, iget_set mechanism check and set a new bit in the
in-core gfs2_inode structure while the vfs inode spin_lock is held.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
It acts exactly like a regular 'cond_resched()', but will not get
optimized away when CONFIG_PREEMPT is set.
Normal kernel code is already preemptable in the presense of
CONFIG_PREEMPT, so cond_resched() is optimized away (see commit
02b67cc3ba "sched: do not do
cond_resched() when CONFIG_PREEMPT").
But when wanting to conditionally reschedule while holding a lock, you
need to use "cond_sched_lock(lock)", and the new function is the BKL
equivalent of that.
Also make fs/locks.c use it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cifs_demultiplex_thread can exit under several conditions:
1) if it's signaled
2) if there's a problem with session setup
3) if kthread_stop is called on it
The first two are problems. If kthread_stop is called on the thread,
there is no guarantee that it will still be up. We need to have the
thread stay up until kthread_stop is called on it.
One option would be to not even try to tear things down until after
kthread_stop is called. However, in the case where there is a problem
setting up the session, there's no real reason to try continuing the
loop.
This patch allows the thread to clean up and prepare for exit under all
three conditions, but it has the thread go to sleep until kthread_stop
is called. This allows us to simplify the shutdown code somewhat since
we can be reasonably sure that the thread won't exit after being
signaled but before kthread_stop is called.
It also removes the places where the thread itself set the tsk variable
since it appeared that it could have a potential race where the thread
might never be shut down.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When creating a directory on a CIFS share without POSIX extensions,
and the given mode has no write bits set, set the ATTR_READONLY bit.
When creating a file, set ATTR_READONLY if the create mode has no write
bits set and we're not using unix extensions.
There are some comments about this being problematic due to the VFS
splitting creates into 2 parts. I'm not sure what that's actually
talking about, but I'm assuming that it has something to do with how
mknod is implemented. In the simple case where we have no unix
extensions and we're just creating a regular file, there's no reason
we can't set ATTR_READONLY.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Clean up cifs_setattr a bit by adding a local inode pointer, and
changing all of the direntry->d_inode references to it. This also adds a
bit of micro-optimization. d_inode shouldn't change over the life of
this function, so we only need to dereference it once.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This patch cleans up cifs_find_tcp_session so it become
less indented. Also the error of skipping IPv6 matched
addresses fixed.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] fix build warning
[CIFS] Fixed build warning in is_ip
[CIFS] cleanup cifsd completion
[CIFS] Remove over-indented code in find_unc().
[CIFS] fix typo
[CIFS] Remove duplicate call to mode_to_acl
[CIFS] convert usage of implicit booleans to bool
[CIFS] fixed compatibility issue with samba refferal request
[CIFS] Fix statfs formatting
[CIFS] Adds to dns_resolver checking if the server name is an IP addr and skipping upcall in this case.
[CIFS] Fix spelling mistake
[CIFS] Update cifs version number