Could cause hangs on smp systems in i_size_read on a cifs inode
whose size has been previously simultaneously updated from
different processes.
Thanks to Brian Wang for some great testing/debugging on this
hard problem.
Fixes kernel bugzilla #7903
CC: Shirish Pargoankar <shirishp@us.ibm.com>
CC: Shaggy <shaggy@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* master.kernel.org:/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] One line missing from previous commit
[CIFS] mtime bounces from local to remote when cifs nocmtime i_flags overwritten
[CIFS] fix &&/& typo in cifs_setattr()
Pointers to user data should be marked with a __user hint. This one is
missing.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
affs wants to truncate the inode when the last user goes away, currently it
does that through a potentially racy i_count check in ->put_inode. But we
already have a method that's called just after the we dropped the last
reference, ->drop_inode. This patch implements affs_drop_inode to take
advantage of this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This problem was identified and fixed some time ago by Jeff Moyer but it fell
through the cracks somehow.
It is possible that a user space application could remove and re-create a
directory during a request. To avoid returning a failure from lookup
incorrectly when our current dentry is unhashed we need to check if another
positive, hashed dentry matching this one exists and if so return it instead
of a fail.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Moyer has identified a race between mount and expire.
What happens is that during an expire the situation can arise that a directory
is removed and another lookup is done before the expire issues a completion
status to the kernel module. In this case, since the the lookup gets a new
dentry, it doesn't know that there is an expire in progress and when it posts
its mount request, matches the existing expire request and waits for its
completion. ENOENT is then returned to user space from lookup (as the dentry
passed in is now unhashed) without having performed the mount request.
The solution used here is to keep track of dentrys in this unhashed state and
reuse them, if possible, in order to preserve the flags. Additionally, this
infrastructure will provide the framework for the reintroduction of caching of
mount fails removed earlier in development.
Signed-off-by: Ian Kent <raven@themaw.net>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current header file definitions for autofs version 5 have caused a couple
of problems for application builds downstream.
This fixes the problem by separating the definitions.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
nobh_prepare_write leaks data similarly to how simple_prepare_write did. Fix
by not marking the page uptodate until nobh_commit_write time. Again, this
could break weird use-cases, but none appear to exist in the tree.
We can safely remove the set_page_dirty, because as the comment says,
nobh_commit_write does set_page_dirty. If a filesystem wants to allocate
backing store for a page dirtied via mmap, page_mkwrite is the suggested
approach.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
simple_prepare_write leaks uninitialised kernel data. This happens because
the it leaves an uninitialised "hole" over the part of the page that the
write is expected to go to. This is fine, but it then marks the page
uptodate, which means a concurrent read can come in and copy the
uninitialised memory into userspace before it written to.
Fix it by simply marking it uptodate in simple_commit_write instead, after
the hole has been filled in. This could theoretically break an fs that
uses simple_prepare_write and not simple_commit_write, and that relies on
the incorrect simple_prepare_write behaviour. Luckily, none of those
exists in the tree.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the DIO write on FAT is expanding the size, it will be fail by -EINVAL,
because FAT can't handle it now.
This patch fallback it to the normal buffered-write and would return
success.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew noticed that unlocking the page before submitting all buffers for
writeout could cause problems if the IO completes before we've finished
messing around with the page buffers, and they subsequently get freed.
Even if there were no bug, it is a good idea to bring the error case
into line with the common case here.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Several people have reported failures in dynamic major device number handling
due to the recent changes in there to avoid handing out the local/experimental
majors.
Rolf reports that this is due to a gcc-4.1.0 bug.
The patch refactors that code a lot in an attempt to provoke the compiler into
behaving.
Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Due to type confusion, when an nfsacl verison 2 'ACCESS' request
finishes and tries to clean up, it calls fh_put on entiredly the
wrong thing and this can cause an oops.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A user-specified get_nlinks may depend on other inode attributes.
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: implement optional loose read cache
9p: Use kthread_stop instead of sending a SIGKILL.
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
Documentation/kernel-docs.txt update.
arch/cris: typo in KERN_INFO
Storage class should be before const qualifier
kernel/printk.c: comment fix
update I/O sched Kconfig help texts - CFQ is now default, not AS.
Remove duplicate listing of Cris arch from README
kbuild: more doc. cleanups
doc: make doc. for maxcpus= more visible
drivers/net/eexpress.c: remove duplicate comment
add a help text for BLK_DEV_GENERIC
correct a dead URL in the IP_MULTICAST help text
fix the BAYCOM_SER_HDX help text
fix SCSI_SCAN_ASYNC help text
trivial documentation patch for platform.txt
Fix typos concerning hierarchy
Fix comment typo "spin_lock_irqrestore".
Fix misspellings of "agressive".
drivers/scsi/a100u2w.c: trivial typo patch
Correct trivial typo in log2.h.
Remove useless FIND_FIRST_BIT() macro from cardbus.c.
...
fs/jffs2/wbuf.c: In function 'jffs2_check_oob_empty':
fs/jffs2/wbuf.c:993: warning: format '%d' expects type 'int', but argument 3 has type 'size_t'
fs/jffs2/wbuf.c:993: warning: format '%d' expects type 'int', but argument 4 has type 'size_t'
fs/jffs2/wbuf.c: In function 'jffs2_check_nand_cleanmarker':
fs/jffs2/wbuf.c:1036: warning: format '%d' expects type 'int', but argument 3 has type 'size_t'
fs/jffs2/wbuf.c:1036: warning: format '%d' expects type 'int', but argument 4 has type 'size_t'
fs/jffs2/wbuf.c: In function 'jffs2_write_nand_cleanmarker':
fs/jffs2/wbuf.c:1062: warning: format '%d' expects type 'int', but argument 3 has type 'size_t'
fs/jffs2/wbuf.c:1062: warning: format '%d' expects type 'int', but argument 4 has type 'size_t'
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
While cacheing is generally frowned upon in the 9p world, it has its
place -- particularly in situations where the remote file system is
exclusive and/or read-only. The vacfs views of venti content addressable
store are a real-world instance of such a situation. To facilitate higher
performance for these workloads (and eventually use the fscache patches),
we have enabled a "loose" cache mode which does not attempt to maintain
any form of consistency on the page-cache or dcache. This results in over
two orders of magnitude performance improvement for cacheable block reads
in the Bonnie benchmark. The more aggressive use of the dcache also seems
to improve metadata operational performance.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Since the kthread api does not bump the reference count on
processes that tracked it is not safe allow user space to
kill the threads, as I still retain a pointer to the task_struct.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
Provide an audit record of the descriptor pair returned by pipe() and
socketpair(). Rewritten from the original posted to linux-audit by
John D. Ramsdell <ramsdell@mitre.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The C99 specification states in section 6.11.5:
The placement of a storage-class specifier other than at the
beginning of the declaration specifiers in a declaration is an
obsolescent feature.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Fix the various misspellings of "agressive", as well as a couple
other things on the same lines while we're there.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
atime flag was also overwritten. Noticed by Shirish when he was debugging
an atime problem. Should help performance a bit too.
cifs should be getting time stamps from the server (that was the original
intent too)
Signed-off-by: Steve French <sfrench@us.ibm.com>
Just mention which error will be returned if debugfs is disabled. Callers
should be able to figure out themselves what they need to check.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
debugfs: implement symbolic links
Implement a new function debugfs_create_symlink() which can be used
to create symbolic links in debugfs. This function can be useful
for people moving functionality from /proc to debugfs (e.g. the
gcov-kernel patch).
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Here is a patch that removes all redundant kobject_unregister argument checks.
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add format specifier %d for uid in ecryptfs_printk
Signed-off-by: Thomas Hisch <t.hisch@gmail.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
eCryptfs is gobbling a lot of stack in ecryptfs_generate_key_packet_set()
because it allocates a temporary memory-hungry ecryptfs_key_record struct.
This patch introduces a new kmem_cache for that struct and converts
ecryptfs_generate_key_packet_set() to use it.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When setting an ACL that lacks inheritable ACEs on a directory, we should set
a default ACL of zero length, not a default ACL with all bits denied.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We're inserting deny's between some ACEs in order to enforce posix draft acl
semantics which prevent permissions from accumulating across entries in an
acl.
That's fine, but we're doing that by inserting a deny after *every* allow,
which is overkill. We shouldn't be adding them in places where they actually
make no difference.
Also replaced some helper functions for creating acl entries; I prefer just
assigning directly to the struct fields--it takes a few more lines, but the
field names provide some documentation that I think makes the result easier
understand.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Return just the effective permissions, and forget about the mask. It isn't
worth the complexity.
WARNING: This breaks backwards compatibility with overly-picky nfsv4->posix
acl translation, as may has been included in some patched versions of libacl.
To our knowledge no such version was every distributed by anyone outside citi.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We should be returning ATTRNOTSUPP, not NOTSUPP, when acls are unsupported.
Also fix a comment.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The wrong pointer is being kfree'd in savemem() when defer_free returns with
an error.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simplify the memory management and code a bit by representing acls with an
array instead of a linked list.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The code that splits an incoming nfsv4 ACL into inheritable and effective
parts can be combined with the the code that translates each to a posix acl,
resulting in simpler code that requires one less pass through the ACL.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The rfc allows us to be more permissive about the ACL inheritance bits we
accept:
"If the server supports a single "inherit ACE" flag that applies to
both files and directories, the server may reject the request
(i.e., requiring the client to set both the file and directory
inheritance flags). The server may also accept the request and
silently turn on the ACE4_DIRECTORY_INHERIT_ACE flag."
Let's take the latter option--the ACL is a complex attribute that could be
rejected for a wide variety of reasons, and the protocol gives us little
ability to explain the reason for the rejection, so erroring out is a
user-unfriendly last resort.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The server name is expected to be a null-terminated string, so we can't pass
in the raw client identifier.
What's more, the client identifier is just a binary, not necessarily
printable, blob. Let's just use the ip address instead. The server name
appears to exist just to help debugging by making some printk's more
informative.
Note that the string is copies into the rpc client structure, so the pointer
to the local variable does not outlive the function call.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If prepare_write or commit_write return AOP_TRUNCATED_PAGE we jump to
"retry" label and than if find_or_create_page() failed function return
incorrect error code.
Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>