There are race issues around ext[34] xattr block release code.
ext[34]_xattr_release_block() checks the reference count of xattr block
(h_refcount) and frees that xattr block if it is the last one reference it.
Unlike ext2, the check of this counter is unprotected by any lock.
ext[34]_xattr_release_block() will free the mb_cache entry before freeing
that xattr block. There is a small window between the check for the re
h_refcount ==1 and the call to mb_cache_entry_free(). During this small
window another inode might find this xattr block from the mbcache and reuse
it, racing a refcount updates. The xattr block will later be freed by the
first inode without notice other inode is still use it. Later if that
block is reallocated as a datablock for other file, then more serious
problem might happen.
We need put a lock around places checking the refount as well to avoid
racing issue. Another place need this kind of protection is in
ext3_xattr_block_set(), where it will modify the xattr block content in-
the-fly if the refcount is 1 (means it's the only inode reference it).
This will also fix another issue: the xattr block may not get freed at all
if no lock is to protect the refcount check at the release time. It is
possible that the last two inodes could release the shared xattr block at
the same time. But both of them think they are not the last one so only
decreased the h_refcount without freeing xattr block at all.
We need to call lock_buffer() after ext3_journal_get_write_access() to
avoid deadlock (because the later will call lock_buffer()/unlock_buffer
() as well).
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Cc: Andreas Gruenbacher <agruen@suse.de>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitriy Monakhov wrote:
> if path_lookup() return non zero code we don't have to worry about
> 'nd' parameter, but ecryptfs_read_super does path_release(&nd) after
> path_lookup has failed, and dentry counter becomes negative
Do not do a path_release after a path_lookup error.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Dmitriy Monakhov <dmonakhov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
O_LARGEFILE should be set here when opening the lower file.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Dmitriy Monakhov <dmonakhov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
eCryptfs lower file handling code has several issues:
- Retval from prepare_write()/commit_write() wasn't checked to equality
to AOP_TRUNCATED_PAGE.
- In some places page wasn't unmapped and unlocked after error.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6:
Revert "Driver core: let request_module() send a /sys/modules/kmod/-uevent"
Driver core: fix error by cleanup up symlinks properly
make kernel/kmod.c:kmod_mk static
power management: fix struct layout and docs
power management: no valid states w/o pm_ops
Driver core: more fallout from class_device changes for pcmcia
sysfs: move struct sysfs_dirent to private header
driver core: refcounting fix
Driver core: remove class_device_rename
This patch (as860) adds two new sysfs routines:
sysfs_add_file_to_group() and sysfs_remove_file_from_group().
A later patch adds code that uses the new routines.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
struct sysfs_dirent is private to the fs/sysfs/ subtree. It is
not even referenced as an opaque structure outside of that subtree.
The following patch moves the declaration from include/linux/sysfs.h to
fs/sysfs/sysfs.h, making it clearer that nothing else in the kernel
dereferences it.
I have been running this patch for years. Please integrate and forward
upstream if there are no objections.
From: "Adam J. Richter" <adam@yggdrasil.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* master.kernel.org:/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] One line missing from previous commit
[CIFS] mtime bounces from local to remote when cifs nocmtime i_flags overwritten
[CIFS] fix &&/& typo in cifs_setattr()
Pointers to user data should be marked with a __user hint. This one is
missing.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
affs wants to truncate the inode when the last user goes away, currently it
does that through a potentially racy i_count check in ->put_inode. But we
already have a method that's called just after the we dropped the last
reference, ->drop_inode. This patch implements affs_drop_inode to take
advantage of this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This problem was identified and fixed some time ago by Jeff Moyer but it fell
through the cracks somehow.
It is possible that a user space application could remove and re-create a
directory during a request. To avoid returning a failure from lookup
incorrectly when our current dentry is unhashed we need to check if another
positive, hashed dentry matching this one exists and if so return it instead
of a fail.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Moyer has identified a race between mount and expire.
What happens is that during an expire the situation can arise that a directory
is removed and another lookup is done before the expire issues a completion
status to the kernel module. In this case, since the the lookup gets a new
dentry, it doesn't know that there is an expire in progress and when it posts
its mount request, matches the existing expire request and waits for its
completion. ENOENT is then returned to user space from lookup (as the dentry
passed in is now unhashed) without having performed the mount request.
The solution used here is to keep track of dentrys in this unhashed state and
reuse them, if possible, in order to preserve the flags. Additionally, this
infrastructure will provide the framework for the reintroduction of caching of
mount fails removed earlier in development.
Signed-off-by: Ian Kent <raven@themaw.net>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current header file definitions for autofs version 5 have caused a couple
of problems for application builds downstream.
This fixes the problem by separating the definitions.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
nobh_prepare_write leaks data similarly to how simple_prepare_write did. Fix
by not marking the page uptodate until nobh_commit_write time. Again, this
could break weird use-cases, but none appear to exist in the tree.
We can safely remove the set_page_dirty, because as the comment says,
nobh_commit_write does set_page_dirty. If a filesystem wants to allocate
backing store for a page dirtied via mmap, page_mkwrite is the suggested
approach.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
simple_prepare_write leaks uninitialised kernel data. This happens because
the it leaves an uninitialised "hole" over the part of the page that the
write is expected to go to. This is fine, but it then marks the page
uptodate, which means a concurrent read can come in and copy the
uninitialised memory into userspace before it written to.
Fix it by simply marking it uptodate in simple_commit_write instead, after
the hole has been filled in. This could theoretically break an fs that
uses simple_prepare_write and not simple_commit_write, and that relies on
the incorrect simple_prepare_write behaviour. Luckily, none of those
exists in the tree.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the DIO write on FAT is expanding the size, it will be fail by -EINVAL,
because FAT can't handle it now.
This patch fallback it to the normal buffered-write and would return
success.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew noticed that unlocking the page before submitting all buffers for
writeout could cause problems if the IO completes before we've finished
messing around with the page buffers, and they subsequently get freed.
Even if there were no bug, it is a good idea to bring the error case
into line with the common case here.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Several people have reported failures in dynamic major device number handling
due to the recent changes in there to avoid handing out the local/experimental
majors.
Rolf reports that this is due to a gcc-4.1.0 bug.
The patch refactors that code a lot in an attempt to provoke the compiler into
behaving.
Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Due to type confusion, when an nfsacl verison 2 'ACCESS' request
finishes and tries to clean up, it calls fh_put on entiredly the
wrong thing and this can cause an oops.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A user-specified get_nlinks may depend on other inode attributes.
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: implement optional loose read cache
9p: Use kthread_stop instead of sending a SIGKILL.
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
Documentation/kernel-docs.txt update.
arch/cris: typo in KERN_INFO
Storage class should be before const qualifier
kernel/printk.c: comment fix
update I/O sched Kconfig help texts - CFQ is now default, not AS.
Remove duplicate listing of Cris arch from README
kbuild: more doc. cleanups
doc: make doc. for maxcpus= more visible
drivers/net/eexpress.c: remove duplicate comment
add a help text for BLK_DEV_GENERIC
correct a dead URL in the IP_MULTICAST help text
fix the BAYCOM_SER_HDX help text
fix SCSI_SCAN_ASYNC help text
trivial documentation patch for platform.txt
Fix typos concerning hierarchy
Fix comment typo "spin_lock_irqrestore".
Fix misspellings of "agressive".
drivers/scsi/a100u2w.c: trivial typo patch
Correct trivial typo in log2.h.
Remove useless FIND_FIRST_BIT() macro from cardbus.c.
...
fs/jffs2/wbuf.c: In function 'jffs2_check_oob_empty':
fs/jffs2/wbuf.c:993: warning: format '%d' expects type 'int', but argument 3 has type 'size_t'
fs/jffs2/wbuf.c:993: warning: format '%d' expects type 'int', but argument 4 has type 'size_t'
fs/jffs2/wbuf.c: In function 'jffs2_check_nand_cleanmarker':
fs/jffs2/wbuf.c:1036: warning: format '%d' expects type 'int', but argument 3 has type 'size_t'
fs/jffs2/wbuf.c:1036: warning: format '%d' expects type 'int', but argument 4 has type 'size_t'
fs/jffs2/wbuf.c: In function 'jffs2_write_nand_cleanmarker':
fs/jffs2/wbuf.c:1062: warning: format '%d' expects type 'int', but argument 3 has type 'size_t'
fs/jffs2/wbuf.c:1062: warning: format '%d' expects type 'int', but argument 4 has type 'size_t'
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
While cacheing is generally frowned upon in the 9p world, it has its
place -- particularly in situations where the remote file system is
exclusive and/or read-only. The vacfs views of venti content addressable
store are a real-world instance of such a situation. To facilitate higher
performance for these workloads (and eventually use the fscache patches),
we have enabled a "loose" cache mode which does not attempt to maintain
any form of consistency on the page-cache or dcache. This results in over
two orders of magnitude performance improvement for cacheable block reads
in the Bonnie benchmark. The more aggressive use of the dcache also seems
to improve metadata operational performance.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Since the kthread api does not bump the reference count on
processes that tracked it is not safe allow user space to
kill the threads, as I still retain a pointer to the task_struct.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
Provide an audit record of the descriptor pair returned by pipe() and
socketpair(). Rewritten from the original posted to linux-audit by
John D. Ramsdell <ramsdell@mitre.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The C99 specification states in section 6.11.5:
The placement of a storage-class specifier other than at the
beginning of the declaration specifiers in a declaration is an
obsolescent feature.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Fix the various misspellings of "agressive", as well as a couple
other things on the same lines while we're there.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
atime flag was also overwritten. Noticed by Shirish when he was debugging
an atime problem. Should help performance a bit too.
cifs should be getting time stamps from the server (that was the original
intent too)
Signed-off-by: Steve French <sfrench@us.ibm.com>
Just mention which error will be returned if debugfs is disabled. Callers
should be able to figure out themselves what they need to check.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
debugfs: implement symbolic links
Implement a new function debugfs_create_symlink() which can be used
to create symbolic links in debugfs. This function can be useful
for people moving functionality from /proc to debugfs (e.g. the
gcov-kernel patch).
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Here is a patch that removes all redundant kobject_unregister argument checks.
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add format specifier %d for uid in ecryptfs_printk
Signed-off-by: Thomas Hisch <t.hisch@gmail.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
eCryptfs is gobbling a lot of stack in ecryptfs_generate_key_packet_set()
because it allocates a temporary memory-hungry ecryptfs_key_record struct.
This patch introduces a new kmem_cache for that struct and converts
ecryptfs_generate_key_packet_set() to use it.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When setting an ACL that lacks inheritable ACEs on a directory, we should set
a default ACL of zero length, not a default ACL with all bits denied.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We're inserting deny's between some ACEs in order to enforce posix draft acl
semantics which prevent permissions from accumulating across entries in an
acl.
That's fine, but we're doing that by inserting a deny after *every* allow,
which is overkill. We shouldn't be adding them in places where they actually
make no difference.
Also replaced some helper functions for creating acl entries; I prefer just
assigning directly to the struct fields--it takes a few more lines, but the
field names provide some documentation that I think makes the result easier
understand.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>