To handle the earlier bogus ENOSPC error caused by filesystem full of block
reservation, current code falls back to non block reservation, starts to
allocate block(s) from the goal allocation block group as if there is no
block reservation.
Current code needs to re-load the corresponding block group descriptor for
the initial goal block group in this case. The patch fixes this.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mounting an ext2 filesystem with zero s_inodes_per_group will cause a
divide error.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mounting a (corrupt) minix filesystem with zero s_zmap_blocks
gives a spectacular crash on my 2.6.17.8 system, no doubt
because minix/inode.c does an unconditional
minix_set_bit(0,sbi->s_zmap[0]->b_data);
[akpm@osdl.org: make labels conistent while we're there]
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On Wed, 2006-08-09 at 07:57 +0200, Rolf Eike Beer wrote:
> =============================================
> [ INFO: possible recursive locking detected ]
> ---------------------------------------------
> parted/7929 is trying to acquire lock:
> (&bdev->bd_mutex){--..}, at: [<c105eb8d>] __blkdev_put+0x1e/0x13c
>
> but task is already holding lock:
> (&bdev->bd_mutex){--..}, at: [<c105eec6>] do_open+0x72/0x3a8
>
> other info that might help us debug this:
> 1 lock held by parted/7929:
> #0: (&bdev->bd_mutex){--..}, at: [<c105eec6>] do_open+0x72/0x3a8
> stack backtrace:
> [<c1003aad>] show_trace_log_lvl+0x58/0x15b
> [<c100495f>] show_trace+0xd/0x10
> [<c1004979>] dump_stack+0x17/0x1a
> [<c102dee5>] __lock_acquire+0x753/0x99c
> [<c102e3b0>] lock_acquire+0x4a/0x6a
> [<c1204501>] mutex_lock_nested+0xc8/0x20c
> [<c105eb8d>] __blkdev_put+0x1e/0x13c
> [<c105ecc4>] blkdev_put+0xa/0xc
> [<c105f18a>] do_open+0x336/0x3a8
> [<c105f21b>] blkdev_open+0x1f/0x4c
> [<c1057b40>] __dentry_open+0xc7/0x1aa
> [<c1057c91>] nameidata_to_filp+0x1c/0x2e
> [<c1057cd1>] do_filp_open+0x2e/0x35
> [<c1057dd7>] do_sys_open+0x38/0x68
> [<c1057e33>] sys_open+0x16/0x18
> [<c1002845>] sysenter_past_esp+0x56/0x8d
OK, I'm having a look here; its all new to me so bear with me.
blkdev_open() calls
do_open(bdev, ...,BD_MUTEX_NORMAL) and takes
mutex_lock_nested(&bdev->bd_mutex, BD_MUTEX_NORMAL)
then something fails, and we're thrown to:
out_first: where
if (bdev != bdev->bd_contains)
blkdev_put(bdev->bd_contains) which is
__blkdev_put(bdev->bd_contains, BD_MUTEX_NORMAL) which does
mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_NORMAL) <--- lockdep trigger
When going to out_first, dbev->bd_contains is either bdev or whole, and
since we take the branch it must be whole. So it seems to me the
following patch would be the right one:
[akpm@osdl.org: compile fix]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current sun disklabel code uses a signed int for the sector count.
When partitions larger than 1 TB are used, the cast to a sector_t causes
the partition sizes to be invalid:
# cat /proc/paritions | grep sdan
66 112 2146435072 sdan
66 115 9223372036853660736 sdan3
66 120 9223372036853660736 sdan8
This patch switches the sector count to an unsigned int to fix this.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The check in open_exec() for inode->i_mode & 0111 has been made
redundant by the fix to permission().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 1d3741c5d991686699f100b65b9956f7ee7ae0ae commit)
The check in prepare_binfmt() for inode->i_mode & 0111 is redundant,
since open_exec() will already have done that.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 822dec482ced07af32c378cd936d77345786572b commit)
Currently, the access() call will return incorrect information on NFS if
there exists an ACL that grants execute access to the user on a regular
file. The reason the information is incorrect is that the VFS overrides
this execute access in open_exec() by checking (inode->i_mode & 0111).
This patch propagates the VFS execute bit check back into the generic
permission() call.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 64cbae98848c4c99851cb0a405f0b4982cd76c1e commit)
This is needed in order to handle any NFS4ERR_DELAY errors that might be
returned by the server. It also ensures that we map the NFSv4 errors before
they are returned to userland.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 71c12b3f0abc7501f6ed231a6d17bc9c05a238dc commit)
Check the bounds of length specifiers more thoroughly in the XDR decoding of
NFS4 readdir reply data.
Currently, if the server returns a bitmap or attr length that causes the
current decode point pointer to wrap, this could go undetected (consider a
small "negative" length on a 32-bit machine).
Also add a check into the main XDR decode handler to make sure that the amount
of data is a multiple of four bytes (as specified by RFC-1014). This makes
sure that we can do u32* pointer subtraction in the NFS client without risking
an undefined result (the result is undefined if the pointers are not correctly
aligned with respect to one another).
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 5861fddd64a7eaf7e8b1a9997455a24e7f688092 commit)
The problem is that we may be caching writes that would extend the file and
create a hole in the region that we are reading. In this case, we need to
detect the eof from the server, ensure that we zero out the pages that
are part of the hole and mark them as up to date.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 856b603b01b99146918c093969b6cb1b1b0f1c01 commit)
nlm_traverse_files() is not allowed to hold the nlm_file_mutex while calling
nlm_inspect file, since it may end up calling nlm_release_file() when
releaseing the blocks.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from e558d3cde986e04f68afe8c790ad68ef4b94587a commit)
rpc_unlink() and rpc_rmdir() will dput the dentry reference for you.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from a05a57effa71a1f67ccbfc52335c10c8b85f3f6a commit)
I'm trying to speeding up mkdir(2) for network file systems. A typical
mkdir(2) calls two inode_operations: lookup and mkdir. The lookup
operation would fail with ENOENT in common case. I think it is unnecessary
because the subsequent mkdir operation can check it. In case of creat(2),
lookup operation is called with the LOOKUP_CREATE flag, so individual
filesystem can omit real lookup. e.g. nfs_lookup().
Here is a sample patch which uses LOOKUP_CREATE and O_EXCL on mkdir,
symlink and mknod. This uses the gadget for creat(2).
And here is the result of a benchmark on NFSv3.
mkdir(2) 10,000 times:
original 50.5 sec
patched 29.0 sec
Signed-off-by: ASANO Masahiro <masano@tnes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from fab7bf44449b29f9d5572a5dd8adcf7c91d5bf0f commit)
nfs_wb_page() waits on request completion and, as a result, is not safe to be
called from nfs_release_page() invoked by VM scanner as part of GFP_NOFS
allocation. Fix possible deadlock by analyzing gfp mask and refusing to
release page if __GFP_FS is not set.
Signed-off-by: Nikita Danilov <danilov@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 374d969debfb290bafcb41d28918dc6f7e43ce31 commit)
UDF code is not really ready to handle extents larger that 1GB. This is
the easy way to forbid creating those.
Also truncation code did not count with the case when there are no
extents in the file and we are extending the file.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I know nothing about io scheduler, but I suspect set_task_ioprio() is not safe.
current_io_context() initializes "struct io_context", then sets ->io_context.
set_task_ioprio() running on another cpu may see the changes out of order, so
->set_ioprio(ioc) may use io_context which was not initialized properly.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jens Axboe <axboe@suse.de>
From include/linux/sched.h:
* Careful: do_each_thread/while_each_thread is a double loop so
* 'break' will not work as expected - use goto instead.
*/
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jens Axboe <axboe@suse.de>
This patch removes some obvious dead code spotted by the Coverity
checker.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
le16 compared to host-endian constant
u8 fed to le32_to_cpu()
le16 compared to host-endian constant
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <sfrench@us.ibm.com>
fcntl(F_SETSIG) no longer works on leases because
lease_release_private_callback() gets called as the lease is copied in
order to initialise it.
The problem is that lease_alloc() performs an unnecessary initialisation,
which sets the lease_manager_ops. Avoid the problem by allocating the
target lease structure using locks_alloc_lock().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Don't let fuse_readpages leave the @pages list not empty when exiting
on error.
[akpm@osdl.org: kernel-doc fixes]
Signed-off-by: Alexander Zarochentsev <zam@namesys.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Eric says:
> I saw an oops down this path when trying to create a new file on a UDF
> filesystem which was internally marked as readonly, but mounted rw:
>
> udf_create
> udf_new_inode
> new_inode
> alloc_inode
> udf_alloc_inode
> udf_new_block
> returns EIO due to readonlyness
> iput (on error)
I ran into the same issue today, but when listing a directory with
invalid/corrupt entries:
udf_lookup
udf_iget
get_new_inode_fast
alloc_inode
udf_alloc_inode
__udf_read_inode
fails for any reason
iput (on error)
...
The following patch to udf_alloc_inode() should take care of both (and
other similar) cases, but I've only tested it with udf_lookup().
Signed-off-by: Dan Bastone <dan@pwienterprises.com>
Cc: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Don't use NULL as a printf control string. Fixes bug #6889.
Cc: Ralph Corderoy <ralph@inputplus.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Allow Windows blocking locks to be cancelled via a
CANCEL_LOCK call. TODO - restrict this to servers
that support NT_STATUS codes (Win9x will probably
not support this call).
Signed-off-by: Jeremy Allison <jra@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
(cherry picked from 570d4d2d895569825d0d017d4e76b51138f68864 commit)
Make cifsd allow us to suspend if it has lost the connection with a server
Ref: http://bugzilla.kernel.org/show_bug.cgi?id=6811
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Steve French <sfrench@us.ibm.com>
(cherry picked from 27bd6cd87b0ada66515ad49bc346d77d1e9d3e05 commit)
Although harmless, we were sometimes treating midState like it contained
flags but they are exclusive states, and this makes that more clear.
Signed-off-by: Jeremy Allison <jra@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
(cherry picked from 586c057c3a68dd6ae0f3ba94fbf76798b1558074 commit)
Signed-off-by: Jeremy Allison <jra@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
(cherry picked from b33a3f55e54fd210fc043eafcf83728b03bc9e02 commit)
request and do not time out slow requests to a server that is still responding
well to other threads
Suggested by jra of Samba team
Signed-off-by: Steve French <sfrench@us.ibm.com>
(cherry picked from 89b57148115479eef074b8d3f86c4c86c96ac969 commit)
We recently fixed an out-of-space deadlock in XFS, and part of that fix
involved the addition of the XFS_ALLOC_FLAG_FREEING flag to some of the
space allocator calls to indicate they're freeing space, not allocating
it. There was a missed xfs_alloc_fix_freelist condition test that did not
correctly test "flags". The same test would also test an uninitialised
structure field (args->userdata) and depending on its value either would
or would not return early with a critical buffer pointer set to NULL.
This fixes that up, adds asserts to several places to catch future botches
of this nature, and skips sections of xfs_alloc_fix_freelist that are
irrelevent for the space-freeing case.
SGI-PV: 955303
SGI-Modid: xfs-linux-melb:xfs-kern:26743a
Signed-off-by: Nathan Scott <nathans@sgi.com>
Record the most recently used allocation group on the allocation context, so
that subsequent allocations can attempt to optimize for contiguousness.
Local alloc especially should benefit from this as the current chain search
tends to let it spew across the disk.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Try to catch corrupted group descriptors with some stronger checks placed in
a couple of strategic locations. Detect a failed resizefs and refuse to
allocate past what bitmap i_clusters allows.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
We were storing cluster count on the ocfs2_super structure, but never
actually using it so remove that. Also, we don't want to populate the
uptodate cache with the unlocked block read - it is technically safe as is,
but we should change it for correctness.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This patch removes the unused EXPORT_SYMBOL_GPL(dlm_migrate_lockres).
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
If a process requests a lock cancel but the lock has been remotely granted
already then there is no need to send the cancel message.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This can race with other ast notification, which can cause bad status values
to propagate into the unlock ast.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Properly ignore LVB flags during a PR downconvert. This avoids an illegal
lvb update.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
I saw an oops down this path when trying to create a new file on a UDF
filesystem which was internally marked as readonly, but mounted rw:
udf_create
udf_new_inode
new_inode
alloc_inode
udf_alloc_inode
udf_new_block
returns EIO due to readonlyness
iput (on error)
udf_put_inode
udf_discard_prealloc
udf_next_aext
udf_current_aext
udf_get_fileshortad
OOPS
the udf_discard_prealloc() path was examining uninitialized fields of the
udf inode.
udf_discard_prealloc() already has this code to short-circuit the discard
path if no extents are preallocated:
if (UDF_I_ALLOCTYPE(inode) == ICBTAG_FLAG_AD_IN_ICB ||
inode->i_size == UDF_I_LENEXTENTS(inode))
{
return;
}
so if we initialize UDF_I_LENEXTENTS(inode) = 0 earlier in udf_new_inode,
we won't try to free the (not) preallocated blocks, since this will match
the i_size = 0 set when the inode was initialized.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
reiserfs_write_full_page does zero bytes in the file past eof, but it may
call get_block on those buffers as well. On machines where the page size
is larger than the blocksize, this can result in mmaped files incorrectly
growing up to a block boundary during writepage.
The fix is to avoid calling get_block for any blocks that are entirely past
eof
Signed-off-by: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In bugzilla #6941, Jens Kilian reported:
"The function befs_utf2nls (in fs/befs/linuxvfs.c) writes a 0 byte past the
end of a block of memory allocated via kmalloc(), leading to memory
corruption. This happens only for filenames which are pure ASCII and a
multiple of 4 bytes in length. [...]
Without DEBUG_SLAB, this leads to further corruption and hard lockups; I
believe this is the bug which has made kernels later than 2.6.8 unusable
for me. (This must be due to changes in memory management, the bug has
been in the BeFS driver since the time it was introduced (AFAICT).)
Steps to reproduce:
Create a directory (in BeOS, naturally :-) with files named, e.g.,
"1", "22", "333", "4444", ... Mount it in Linux and do an "ls" or "find""
This patch implements the suggested fix. Credits to Jens Kilian for
debugging the problem and finding the right fix.
Signed-off-by: Diego Calleja <diegocg@gmail.com>
Cc: Jens Kilian <jjk@acm.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ufs_get_locked_page is called twice in ufs code, one time in ufs_truncate
path(we allocated last block), and another time when fragments are
reallocated. In ideal world in the second case on allocation/free block
layer we should not know that things like `truncate' exists, but now with
such crutch like ufs_get_locked_page we can (or should?) skip truncated
pages.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As discussed earlier:
http://lkml.org/lkml/2006/6/28/136
this patch fixes such issue:
`ufs_get_locked_page' takes page from cache
after that `vmtruncate' takes page and deletes it from cache
`ufs_get_locked_page' locks page, and reports about EIO error.
Also because of find_lock_page always return valid page or NULL, we have no
need to check it if page not NULL.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We never actually set the b_done field any more; it's always zero.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from af8412d4283ef91356e65e0ed9b025b376aebded commit)
nfs_writedata_free() and nfs_readdata_free() can now become static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 5e1ce40f0c3c8f67591aff17756930d7a18ceb1a commit)
In one of the error paths of nfs_path, it may return with dcache_lock still
held; fix this by adding and using a new error path Elong_unlock which unlocks
dcache_lock.
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from f4b90b43677fb23297c56802c3056fc304f988d9 commit)
When an object is created via a symlink into an audited directory, audit misses
the event due to not having collected the inode data for the directory. Modify
__audit_inode_child() to copy the parent inode data if a parent wasn't found in
audit_names[].
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When the specified path is an existing file or when it is a symlink, audit
collects the wrong inode number, which causes it to miss the open() event.
Adding a second hook to the open() path fixes this.
Also add audit_copy_inode() to consolidate some code.
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Based on a bug report from Russ Ross <russruss@gmail.com>
According to the spec:
"The remove request asks the file server both to remove the file
represented by fid and to clunk the fid, even if the remove fails."
but the Linux client seems to expect the fid to be valid after a failed
remove attempt. Specifically, I'm getting this behavior when attempting to
remove a non-empty directory.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Russ Ross <russross@gmail.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For files other than IFREG, nobh option doesn't make sense. Modifications
to them are journalled and needs buffer heads to do that. Without this
patch, we get kernel oops in page_buffers().
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Commit 7b2fd697427e73c81d5fa659efd91bd07d303b0e in the historical GIT tree
stopped calling the readdir member of a file_operations struct with the big
kernel lock held, and fixed up all the readdir functions to do their own
locking. However, that change added calls to unlock_kernel() in
vxfs_readdir, but no call to lock_kernel(). Fix this by adding a call to
lock_kernel().
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is entirely possible (though rare) that jiffies half-wraps around, while a
dentry/inode remains in the cache. This could mean that the dentry/inode is
not invalidated for another half wraparound-time.
To get around this problem, use 64-bit jiffies. The only problem with this is
that dentry->d_time is 32 bits on 32-bit archs. So use d_fsdata as the high
32 bits. This is an ugly hack, but far simpler, than having to allocate
private data just for this purpose.
Since 64-bit jiffies can be assumed never to wrap around, simple comparison
can be used, and a zero time value can represent "invalid".
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
An attribute and entry timeout of zero should mean, that the entity is
invalidated immediately after the operation. Previously invalidation only
happened at the next clock tick.
Reported and tested by Craig Davies.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ufs_symlink, in one of its error paths, calls unlock_kernel without ever
having called lock_kernel(); fix this by creating and jumping to a new
label out_notlocked rather than the out label used after calling
lock_kernel().
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If efs_symlink_readpage hits the -ENAMETOOLONG error path, it will call
unlock_kernel without ever having called lock_kernel(); fix this by
creating and jumping to a new label fail_notlocked rather than the fail
label used after calling lock_kernel().
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Cc: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Commit 398c53a757702e1e3a7a2c24860c7ad26acb53ed (in the historical GIT
tree) moved the lock_kernel() in coda_open after the allocation of a
coda_file_info struct, but left an unlock_kernel() in the allocation
failure error path; remove it.
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Acked-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a real deadlock, a nice complex one:
(warning: long explanation follows so that Andrew can have a complete
patch description)
it's an ABCDA deadlock:
A iprune_mutex
B inode->inotify_mutex
C ih->mutex
D dev->ev_mutex
The AB relationship comes straight from invalidate_inodes()
int invalidate_inodes(struct super_block * sb)
{
int busy;
LIST_HEAD(throw_away);
mutex_lock(&iprune_mutex);
spin_lock(&inode_lock);
inotify_unmount_inodes(&sb->s_inodes);
where inotify_umount_inodes() takes the
mutex_lock(&inode->inotify_mutex);
The BC relationship comes directly from inotify_find_update_watch():
s32 inotify_find_update_watch(struct inotify_handle *ih, struct inode *inode,
u32 mask)
{
...
mutex_lock(&inode->inotify_mutex);
mutex_lock(&ih->mutex);
The CD relationship comes from inotify_rm_wd:
inotify_rm_wd does
mutex_lock(&inode->inotify_mutex);
mutex_lock(&ih->mutex)
and then calls inotify_remove_watch_locked() which calls
notify_dev_queue_event() which does
mutex_lock(&dev->ev_mutex);
(this strictly is a BCD relationship)
The DA relationship comes from the most interesting part:
[<ffffffff8022d9f2>] shrink_icache_memory+0x42/0x270
[<ffffffff80240dc4>] shrink_slab+0x11d/0x1c9
[<ffffffff802b5104>] try_to_free_pages+0x187/0x244
[<ffffffff8020efed>] __alloc_pages+0x1cd/0x2e0
[<ffffffff8025e1f8>] cache_alloc_refill+0x3f8/0x821
[<ffffffff8020a5e5>] kmem_cache_alloc+0x85/0xcb
[<ffffffff802db027>] kernel_event+0x2e/0x122
[<ffffffff8021d61c>] inotify_dev_queue_event+0xcc/0x140
inotify_dev_queue_event schedules a kernel_event which does a
kmem_cache_alloc( , GFP_KERNEL) which may try to shrink slabs, including
the inode cache .. which then takes iprune_mutex.
And voila, there is an AB, a BC, a CD relationship (even a direct BCD),
and also now a DA relationship -> a circular type AB-BA deadlock but
involving 4 locks.
The solution is simple: kernel_event() is NOT allowed to use GFP_KERNEL,
but must use GFP_NOFS to not cause recursion into the VFS.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Robert Love <rml@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Enable mac partition table support per default also for a powermac config.
Signed-off-by: Olaf Hering <olh@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We can immediately bail from invalidate_bdev() if the blockdev has no
pagecache.
This solves the huge IPI storms which hald is causing on the big ia64
machines when it polls CDROM drives.
Acked-by: Jes Sorensen <jes@sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A recent commit (7fc90ec93a) moved the
call to nfsd_setuser out of the 'find a dentry for a filehandle' branch
of fh_verify so that it would always be called.
This had the unfortunately side-effect of moving *after* the call to
decode_fh, so the prober fsuid was not set when nfsd_acceptable was called,
the 'permission' check did the wrong thing.
This patch moves the nfsd_setuser call back where it was, and add as call
in the other branch of the if.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The inode number out of an NFS file handle gets passed eventually to
ext3_get_inode_block() without any checking. If ext3_get_inode_block()
allows it to trigger an error, then bad filehandles can have unpleasant
effect - ext3_error() will usually cause a forced read-only remount, or a
panic if `errors=panic' was used.
So remove the call to ext3_error there and put a matching check in
ext3/namei.c where inode numbers are read off storage.
[akpm@osdl.org: fix off-by-one error]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: <stable@kernel.org>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Eric Sandeen <esandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
flags from iclog buffers before submitting them for writing.
SGI-PV: 954772
SGI-Modid: xfs-linux-melb:xfs-kern:26605a
Signed-off-by: Nathan Scott <nathans@sgi.com>
Before putting them into struct statfs they should be endian-swapped.
SGI-PV: 954580
SGI-Modid: xfs-linux-melb:xfs-kern:26550a
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nathan Scott <nathans@sgi.com>
jfs_quota_read/write are very near duplicates of ext2_quota_read/write.
Cleaned up jfs_get_block as long as I had to change it to be non-static.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
This just turns off chmod() on the /proc/<pid>/ files, since there is no
good reason to allow it, and had we disallowed it originally, the nasty
/proc race exploit wouldn't have been possible.
The other patches already fixed the problem chmod() could cause, so this
is really just some final mop-up..
This particular version is based off a patch by Eugene and Marcel which
had much better naming than my original equivalent one.
Signed-off-by: Eugene Teo <eteo@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Export I/O delays seen by a task through /proc/<tgid>/stats for use in top
etc.
Note that delays for I/O done for swapping in pages (swapin I/O) is clubbed
together with all other I/O here (this is not the case in the netlink
interface where the swapin I/O is kept distinct)
[akpm@osdl.org: printk warning fix]
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Erich Focht <efocht@ess.nec.de>
Cc: Levent Serinol <lserinol@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On systems with block devices containing a slash (virtual dasd, cciss,
etc), reiserfs will fail to initialize /proc/fs/reiserfs/<dev> due to it
being interpreted as a subdirectory. The generic block device code changes
the / to ! for use in the sysfs tree. This patch uses that convention.
Tested by making dm devices use dm/<number> rather than dm-<number>
[akpm@osdl.org: name variables consistently]
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2.6.16 leaks like hell. While testing, I found massive leakage
(reproduced in openvz) in:
*filp
*size-4096
And 1 object leaks in
*size-32
*size-64
*size-128
It is the fix for the first one. filp leaks in the bowels of namei.c.
Seems, size-4096 is file table leaking in expand_fdtables.
I have no idea what are the rest and why they show only accompanying
another leaks. Some debugging structs?
[akpm@osdl.org, Trond: remove the IS_ERR() check]
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: <stable@kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We have a bad interaction with both the kernel and user space being able
to change some of the /proc file status. This fixes the most obvious
part of it, but I expect we'll also make it harder for users to modify
even their "own" files in /proc.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We're supposed to go the next power of two if nfds==nr.
Of `nr', not of `nfsd'.
Spotted by Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Address a potential 'larger than buffer size' memory access by
clear_user(). Without this patch, this call to clear_user() can attempt to
clear too many (tsz) bytes resulting in a wrong (-EFAULT) return code by
read_kcore().
Signed-off-by: Adam B. Jerome <abj@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sysfs has a different i_mutex lock order behavior for i_mutex than the
other filesystems; sysfs i_mutex is called in many places with subsystem
locks held. At the same time, many of the VFS locking rules do not apply
to sysfs at all (cross directory rename for example). To untangle this
mess (which gives false positives in lockdep), we're giving sysfs inodes
their own class for i_mutex.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When found, it is obvious. nfds calculated when allocating fdsets is
rewritten by calculation of size of fdtable, and when we are unlucky, we
try to free fdsets of wrong size.
Found due to OpenVZ resource management (User Beancounters).
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add an nfs4 operations count array to nfsd_stats structure. The count is
incremented in nfsd4_proc_compound() where all the operations are handled
by the nfsv4 server. This count of individual nfsv4 operations is also
entered into /proc filesystem.
Signed-off-by: Shankar Anand<shanand@novell.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These functions no longer exist; remove their declarations.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's a fairly obvious infinite loop in there.
Also, use roundup_pow_of_two() rather than open-coding stuff.
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add coredump capability for the ELF-FDPIC binfmt.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move the roundup() macro from binfmt_elf.c into linux/kernel.h as it's
generally useful.
[akpm@osdl.org: nuke all the other implementations]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adjust the ELF-FDPIC binfmt driver to conform much more to the CodingStyle,
silly though it may be.
Further changes:
(*) Drop the casts to long for addresses in kdebug() statements (they're
unsigned long already).
(*) Use extra variables to avoid expressions longer than 80 chars by splitting
the statement into multiple statements and letting the compiler optimise
them back together.
(*) Eliminate duplicate call of ksize() when working out how much space was
actually allocated for the stack.
(*) Discard the commented-out load_shlib prototype and op pointer as this will
not be supported in ELF-FDPIC for the foreseeable future.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix execution through the FDPIC binfmt of programs stored on ramfs by
preventing the ramfs mmap() returning successfully on a private mapping of
a ramfs file. This causes NOMMU mmap to make a copy of the mapped portion
of the file and map that instead.
This could be improved by granting direct mapping access to read-only
private mappings for which the data is stored on a contiguous run of pages.
However, this is only likely to be the case if the file was extended with
truncate before being written.
ramfs is left to map the file directly for shared mappings so that SYSV IPC
and POSIX shared memory both still work.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix FDPIC compile errors.
(akpm: we suspect it fixes a warning)
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sometimes, applications need below call to be successful although
"/mnt/hugepages/file1" doesn't exist.
fd = open("/mnt/hugepages/file1", O_CREAT|O_RDWR, 0755);
*addr = mmap(NULL, 0x1024*1024*256, PROT_NONE, 0, fd, 0);
As for regular pages (or files), above call does work, but as for huge
pages, above call would fail because hugetlbfs_file_mmap would fail if
(!(vma->vm_flags & VM_WRITE) && len > inode->i_size).
This capability on huge page is useful on ia64 when the process wants to
protect one area on region 4, so other threads couldn't read/write this
area. A famous JVM (Java Virtual Machine) implementation on IA64 needs the
capability.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Hugh Dickins <hugh@veritas.com>
[ Expand-on-mmap semantics again... this time matching normal fs's. wli ]
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch marks an unused export as EXPORT_UNUSED_SYMBOL.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the partition code in fs/partitions/check.c to initialize a newly
detected partition's policy field with that of the containing block device
(see patch below).
My reasoning is that function set_disk_ro() in block/genhd.c modifies the
policy field (read-only indicator) of a disk and all contained partitions.
When a partition is detected after the call to set_disk_ro(), the policy
field of this partition will currently not inherit the disk's policy field.
This behavior poses a problem in cases where a block device can be
'logically de- and reactivated' like e.g. the s390 DASD driver because
partition detection may run after the policy field has been modified.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Acked-by: Al Viro <viro@ftp.linux.org.uk>
Makes-sense-to: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When write() extends a file(i_size is increased) and fsync() is called,
change of inode must be written to journaling area through fsync().
But,currently the i_trans_id is not correctly updated when i_size is
increased. So fsync() does not kick the journal writer.
Reiserfs_file_write() already updates the transaction when blocks are
allocated, but the case when i_size increases and new blocks are not added
is not correctly treated.
Following patch fix this bug.
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <mason@suse.com>
Cc: Hans Reiser <reiser@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Several issues noticed/fixed:
- We cannot reliably block in link_pipe() while holding both input and output
mutexes. So do preparatory checks before locking down both mutexes and doing
the link.
- The ipipe->nrbufs vs i check was bad, because we could have dropped the
ipipe lock in-between. This causes us to potentially look at unknown
buffers if we were racing with someone else reading this pipe.
Signed-off-by: Jens Axboe <axboe@suse.de>
This patch makes the needlessly global jffs2_obsolete_node_frag()
static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
In the case when compiling via a symlink tree, we want to ensure that the
close-to-open GETATTR call is applied only to the final file, and not to
the symlink.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The introduction of the FLUSH_INVALIDATE argument to nfs_sync_inode_wait()
does not clear the nr_unstable page state counter for pages that are being
released.
Also fix a longstanding similar bug when nfs_commit_list() fails.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Use FL_ACCESS flag to test and/or wait for local locks before we try
requesting a lock from the server
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Use the new behaviour of {flock,posix}_file_lock(F_UNLCK) to determine if
we held a lock, and only send the RPC request to the server if this was the
case.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Change posix_lock_file_conf(), and flock_lock_file() so that if called
with an F_UNLCK argument, and the FL_EXISTS flag they will indicate
whether or not any locks were actually freed by returning 0 or -ENOENT.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
cleanup: remove task_t and convert all the uses to struct task_struct. I
introduced it for the scheduler anno and it was a mistake.
Conversion was mostly scripted, the result was reviewed and all
secondary whitespace and style impact (if any) was fixed up by hand.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Teach special (recursive) locking code to the lock validator.
Effects on non-lockdep kernels:
- the introduction of the following function variants:
extern struct block_device *open_partition_by_devnum(dev_t, unsigned);
extern int blkdev_put_partition(struct block_device *);
static int
blkdev_get_whole(struct block_device *bdev, mode_t mode, unsigned flags);
which on non-lockdep are the same as open_by_devnum(), blkdev_put()
and blkdev_get().
- a subclass parameter to do_open(). [unused on non-lockdep]
- a subclass parameter to __blkdev_put(), which is a new internal
function for the main blkdev_put*() functions. [parameter unused
on non-lockdep kernels, except for two sanity check WARN_ON()s]
these functions carry no semantical difference - they only express
object dependencies towards the lockdep subsystem.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The s_umount rwsem needs to be classified as per-superblock since it's
perfectly legit to keep multiple of those recursively in the VFS locking
rules.
Has no effect on non-lockdep kernels.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Teach special (per-filesystem) locking code to the lock validator.
Minimal effect on non-lockdep kernels: one extra parameter to alloc_super().
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The quota code plays interesting games with the lock ordering; to quote Jan:
| i_mutex of inode containing quota file is acquired after all other
| quota locks. i_mutex of all other inodes is acquired before quota
| locks. Quota code makes sure (by resetting inode operations and
| setting special flag on inode) that noone tries to enter quota code
| while holding i_mutex on a quota file...
The good news is that all of this special case i_mutex grabbing happens in the
(per filesystem) low level quota write function. For this special case we
need a new I_MUTEX_* nesting level, since this just entirely outside any of
the regular VFS locking rules for i_mutex. I trust Jan on his blue eyes that
this is not ever going to deadlock; and based on that the patch below is what
it takes to inform lockdep of these very interesting new locking rules.
The new locking rule for the I_MUTEX_QUOTA nesting level is that this is the
deepest possible level of nesting for i_mutex, and that this only should be
used in quota write (and possibly read) function of filesystems. This makes
the lock ordering of the I_MUTEX_* levels:
I_MUTEX_PARENT -> I_MUTEX_CHILD -> I_MUTEX_NORMAL -> I_MUTEX_QUOTA
Has no effect on non-lockdep kernels.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NTFS uses lots of type-opaque objects which acquire their true identity
runtime - so the lock validator needs to be helped in a couple of places to
figure out object types.
Many thanks to Anton Altaparmakov for giving lots of explanations about NTFS
locking rules.
Has no effect on non-lockdep kernels.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Teach special (recursive) locking code to the lock validator. Has no effect
on non-lockdep kernels.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Teach special (recursive) locking code to the lock validator. Has no effect
on non-lockdep kernels.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Teach special (rwsem-in-irq) locking code to the lock validator. Has no
effect on non-lockdep kernels.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Locking init improvement:
- introduce and use __SPIN_LOCK_UNLOCKED for array initializations,
to pass in the name string of locks, used by debugging
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix check for bad address; use macro instead of open-coding two checks.
Taken from RHEL4 kernel update.
From: Ernie Petrides <petrides@redhat.com>
For background, the BAD_ADDR() macro should return TRUE if the address is
TASK_SIZE, because that's the lowest address that is *not* valid for
user-space mappings. The macro was correct in binfmt_aout.c but was wrong
for the "equal to" case in binfmt_elf.c. There were two in-line validations
of user-space addresses in binfmt_elf.c, which have been appropriately
converted to use the corrected BAD_ADDR() macro in the patch you posted
yesterday. Note that the size checks against TASK_SIZE are okay as coded.
The additional changes that I propose are below. These are in the error
paths for bad ELF entry addresses once load_elf_binary() has already
committed to exec'ing the new image (following the tearing down of the
task's original address space).
The 1st hunk deals with the interp-side of the outer "if". There were two
problems here. The printk() should be removed because this path can be
triggered at will by a bogus interpreter image created and used by a
malicious user. Further, the error code should not be ENOEXEC, because that
causes the loop in search_binary_handler() to continue trying other exec
handlers (twice, in fact). But it's too late for this to work correctly,
because the user address space has already been torn down, and an exec()
failure cannot be returned to the user code because the code no longer
exists. The only recovery is to force a SIGSEGV, but it's best to terminate
the search loop immediately. I somewhat arbitrarily chose EINVAL as a
fallback error code, but any error returned by load_elf_interp() will
override that (but this value will never be seen by user-space).
The 2nd hunk deals with the non-interp-side of the outer "if". There were
two problems here as well. The SIGSEGV needs to be forced, because a prior
sigaction() syscall might have set the associated disposition to SIG_IGN.
And the ENOEXEC should be changed to EINVAL as described above.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Ernie Petrides <petrides@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes a bug in fs/nfs which makes it impossible to build nfs
without having procfs enabled.
Signed-off-by: Dominik Hackl <dominik@hackl.dhs.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
jffs2_clear_acl() which releases acl caches allocated by kmalloc()
was defined but it was never called. Thus, we faced to the risk
of memory leaking.
This patch plugs jffs2_clear_acl() into jffs2_do_clear_inode().
It ensures to release acl cache when inode is cleared.
Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Reiserfs does not update ctime and mtime on expanding truncate via
truncate(). This patch fixes it.
Signed-off-by: Vladimir Saveliev <vs@namesys.com>
Cc: Hans Reiser <reiser@namesys.com>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes buggy behaviour of UFS
in such kind of scenario:
open(, O_TRUNC...)
ftruncate(, 1024)
ftruncate(, 0)
Such a scenario causes ufs_panic and remount read-only. This happen
because of according to specification UFS should always allocate block for
last byte, and many parts of our implementation rely on this, but
`ufs_truncate' doesn't care about this.
To make possible return error code and to know about old size, this patch
removes `truncate' from ufs inode_operations and uses `setattr' method to
call ufs_truncate.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a rq_sendfile_ok flag to svc_rqst which will be cleared in the privacy
case so that the wrapping code will get copies of the read data instead of
real page cache pages. This makes life simpler when we encrypt the response.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since nfsv4 actually keeps around the file descriptors it gets from open
(instead of just using them for a single read or write operation), we need to
make sure that we can do RDWR opens and not just RDONLY/WRONLY.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These tests always returned true; clearly that wasn't what was intended.
In keeping with kernel style, make them functions instead of macros while
we're at it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the event that lookup_one_len() fails in nfsd_link(), fh_unlock() is
skipped and locks are held overlong.
Patch was tested on 2.6.17-rc2 by causing lookup_one_len() to fail and
verifying that fh_unlock() gets called appropriately.
Signed-off-by: David M. Richter <richterd@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We're checking nfs_in_grace here a few times when there isn't really any
reason to--bad_stateid is probably the more sensible return value anyway.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the typical v2/v3 case the only new filehandles used as arguments to
operations are filehandles taken directly off the wire, which don't get
dentries until fh_verify() is called.
But in v4 the filehandles that are arguments to operations were often created
by previous operations (putrootfh, lookup, etc.) using fh_compose, which sets
the dentry in the filehandle without calling nfsd_setuser().
This also means that, for example, if filesystem B is mounted on filesystem A,
and filesystem A is exported without root-squashing, then a client can bypass
the rootsquashing on B using a compound that starts at a filehandle in A,
crosses into B using lookups, and then does stuff in B.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix an improper unlock in an error path.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
nfsd tries to return to a client the same sort of filehandle as was used by
the client. This removes some filehandle aliasing issues and means that a
server upgrade followed by a downgrade will not confused clients not restarted
during that time.
However when crossing a mountpoint, the filehandle used for one filesystem
doesn't provide any useful information on what sort of filehandle should be
used on the other, and can provide misleading information. So if the
reference filehandle is on a different filesystem to the one being generated,
ignore it.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is a perfectly valid situation where fh_update gets called on an already
uptodate filehandle - in nfsd_create_v3 where a CREATE_UNCHECKED finds an
existing file and wants to just set the size.
We could possible optimise out the call in that case, but the only harm
involved is that fh_update prints a warning, so it is easier to remove the
warning.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Type '3' is used for the fsid in filehandles when the device number of the
device holding the filesystem has more than 8 bits in either major or minor.
Unfortunately expkey_parse doesn't recognise type 3. Fix this.
(Slighty modified from Frank's original)
Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>