* git://git.linux-nfs.org/pub/linux/nfs-2.6:
SUNRPC: Dead code in net/sunrpc/auth_gss/auth_gss.c
NFS: remove needless check in nfs_opendir()
NFS: nfs_show_stats; for_each_possible_cpu(), not NR_CPUS
NFS: make 2 functions static
NFS,SUNRPC: Fix compiler warnings if CONFIG_PROC_FS & CONFIG_SYSCTL are unset
NFS: fix PROC_FS=n compile error
VFS: Fix another open intent Oops
RPCSEC_GSS: fix leak in krb5 code caused by superfluous kmalloc
Local variable res was initialized to 0 - no check needed here.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Convert a for-loop that explicitly references "NR_CPUS" into the
potentially more efficient for_each_possible_cpu() construct.
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the call to nfs_intent_set_file() fails to open a file in
nfs4_proc_create(), we should return an error.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] splice: fixup writeout path after ->map changes
[PATCH] splice: offset fixes
[PATCH] tee: link_pipe() must be careful when dropping one of the pipe locks
[PATCH] splice: cleanup the SPLICE_F_NONBLOCK handling
[PATCH] splice: close i_size truncate races on read
There are places in the kernel where we look up files in fd tables and
access the file structure without holding refereces to the file. So, we
need special care to avoid the race between looking up files in the fd
table and tearing down of the file in another CPU. Otherwise, one might
see a NULL f_dentry or such torn down version of the file. This patch
fixes those special places where such a race may happen.
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In kernel bugzilla #6248 (http://bugzilla.kernel.org/show_bug.cgi?id=6248),
Adrian Bunk <bunk@stusta.de> notes that CONFIG_HUGETLBFS is missing Kconfig
help text.
Signed-off-by: Arthur Othieno <apgo@patchbomb.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While we can currently walk through thread groups, process groups, and
sessions with just the rcu_read_lock, this opens the door to walking the
entire task list.
We already have all of the other RCU guarantees so there is no cost in
doing this, this should be enough so that proc can stop taking the
tasklist lock during readdir.
prev_task was killed because it has no users, and using it will miss new
tasks when doing an rcu traversal.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since ->map() no longer locks the page, we need to adjust the handling
of those pages (and stealing) a little. This now passes full regressions
again.
Signed-off-by: Jens Axboe <axboe@suse.de>
- We need to adjust *ppos for writes as well.
- Copy back modified offset value if one was passed in, similar to
what sendfile does.
Signed-off-by: Jens Axboe <axboe@suse.de>
We need to ensure that we only drop a lock that is ordered last, to avoid
ABBA deadlocks with competing processes.
Signed-off-by: Jens Axboe <axboe@suse.de>
- generic_file_splice_read() more readable and correct
- Don't bail on page allocation with NONBLOCK set, just don't allow
direct blocking on IO (eg lock_page).
Signed-off-by: Jens Axboe <axboe@suse.de>
Came up through a quick grep for other cases similar to the ftruncate()
one in commit 0a489cb3b6.
Also, add a comment, so that people who read the code understand why we
do what looks like a no-op.
(Again, this won't actually matter to any sane user, since libc will
save and restore the register gcc stomps on, but it's still wrong to
stomp on it)
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Gcc thinks it owns the incoming argument stack, but that's not true for
"asmlinkage" functions, and it corrupts the caller-set-up argument stack
when it pushes the third argument onto the stack. Which can result in
%ebx getting corrupted in user space.
Now, normally nobody sane would ever notice, since libc will save and
restore %ebx anyway over the system call, but it's still wrong.
I'd much rather have "asmlinkage" tell gcc directly that it doesn't own
the stack, but no such attribute exists, so we're stuck with our hacky
manual "prevent_tail_call()" macro once more (we've had the same issue
before with sys_waitpid() and sys_wait4()).
Thanks to Hans-Werner Hilse <hilse@sub.uni-goettingen.de> for reporting
the issue and testing the fix.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As noted further on the this file, some block devices have a / in their
name, so fix the "block:..." symlink name the same as the /sys/block name.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[BLOCK] delay all uevents until partition table is scanned
Here we delay the annoucement of all block device events until the
disk's partition table is scanned and all partition devices are already
created and sysfs is populated.
We have a bunch of old bugs for removable storage handling where we
probe successfully for a filesystem on the raw disk, but at the
same time the kernel recognizes a partition table and creates partition
devices.
Currently there is no sane way to tell if partitions will show up or not
at the time the disk device is announced to userspace. With the delayed
events we can simply skip any probe for a filesystem on the raw disk when
we find already present partitions.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It works like this:
Open the file
Read all the contents.
Call poll requesting POLLERR or POLLPRI (so select/exceptfds works)
When poll returns,
close the file and go to top of loop.
or lseek to start of file and go back to the 'read'.
Events are signaled by an object manager calling
sysfs_notify(kobj, dir, attr);
If the dir is non-NULL, it is used to find a subdirectory which
contains the attribute (presumably created by sysfs_create_group).
This has a cost of one int per attribute, one wait_queuehead per kobject,
one int per open file.
The name "sysfs_notify" may be confused with the inotify
functionality. Maybe it would be nice to support inotify for sysfs
attributes as well?
This patch also uses sysfs_notify to allow /sys/block/md*/md/sync_action
to be pollable
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/mszeredi/fuse:
[fuse] Direct I/O should not use fuse_reset_request
[fuse] Don't init request twice
[fuse] Fix accounting the number of waiting requests
[fuse] fix deadlock between fuse_put_super() and request_end()
* 'tee' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] splice: add support for sys_tee()
[PATCH] splice: pass offset around for ->splice_read() and ->splice_write()
This is two distinct changes.
- Not changing our real parents.
- Not changing our ptrace parents.
Not changing our real parents is trivially correct because both tasks
have the same real parents as they are part of a thread group. Now that
we demote the leader to a thread there is no longer any reason to change
it's parentage.
Not changing our ptrace parents is a user visible change if someone
looks hard enough. I don't think user space applications will care or
even notice.
In the practical and I think common case a debugger will have attached
to all of the threads using the same ptrace flags. From my quick skim
of strace and gdb that appears to be the case. Which if true means
debuggers will not notice a change.
Before this point we have already generated a ptrace event in do_exit
that reports the leaders pid has died so de_thread is visible to a
debugger. Which means attempting to hide this case by copying flags
around appears excessive.
By not doing anything it avoids all of the weird locking issues between
de_thread and ptrace attach, and removes one case from consideration for
fixing the ptrace locking.
This only addresses Oleg's first concern with ptrace_attach, that of the
problems caused by reparenting. Oleg's second concern is essentially a
race between ptrace_attach and release_task that causes an oops when we
get to force_sig_specific. There is nothing special about de_thread
with respect to that race.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's cleaner to allocate a new request, otherwise the uid/gid/pid
fields of the request won't be filled in.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Properly accounting the number of waiting requests was forgotten in
"clean up request accounting" patch.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
A deadlock was possible, when the last reference to the superblock was
held due to a background request containing a file reference.
Releasing the file would release the vfsmount which in turn would
release the superblock. Since sbput_sem is held during the fput() and
fuse_put_super() tries to acquire this same semaphore, a deadlock
results.
The chosen soltuion is to get rid of sbput_sem, and instead use the
spinlock to ensure the referenced inodes/file are released only once.
Since the actual release may sleep, defer these outside the locked
region, but using local variables instead of the structure members.
This is a much more rubust solution.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Basically an in-kernel implementation of tee, which uses splice and the
pipe buffers as an intelligent way to pass data around by reference.
Where the user space tee consumes the input and produces a stdout and
file output, this syscall merely duplicates the data inside a pipe to
another pipe. No data is copied, the output just grabs a reference to the
input pipe data.
Signed-off-by: Jens Axboe <axboe@suse.de>
We need not use ->f_pos as the offset for the file input/output. If the
user passed an offset pointer in through sys_splice(), just use that and
leave ->f_pos alone.
Signed-off-by: Jens Axboe <axboe@suse.de>
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] vfs: add splice_write and splice_read to documentation
[PATCH] Remove sys_ prefix of new syscalls from __NR_sys_*
[PATCH] splice: warning fix
[PATCH] another round of fs/pipe.c cleanups
[PATCH] splice: comment styles
[PATCH] splice: add Ingo as addition copyright holder
[PATCH] splice: unlikely() optimizations
[PATCH] splice: speedups and optimizations
[PATCH] pipe.c/fifo.c code cleanups
[PATCH] get rid of the PIPE_*() macros
[PATCH] splice: speedup __generic_file_splice_read
[PATCH] splice: add direct fd <-> fd splicing support
[PATCH] splice: add optional input and output offsets
[PATCH] introduce a "kernel-internal pipe object" abstraction
[PATCH] splice: be smarter about calling do_page_cache_readahead()
[PATCH] splice: optimize the splice buffer mapping
[PATCH] splice: cleanup __generic_file_splice_read()
[PATCH] splice: only call wake_up_interruptible() when we really have to
[PATCH] splice: potential !page dereference
[PATCH] splice: mark the io page as accessed
Keep unused openowners around for at least one lease period, to avoid the need
for as many open confirmations and to allow handing out more delegations.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's very easy for the server to DOS itself by just giving out too many
delegations.
For now we just solve the problem with a dumb hard limit. Eventually we'll
want a smarter policy.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We should be shutting down rpciod for the callback channel when we shut down
the server.
Also note that we do rpciod_up() and create the callback client *before*
setting cb_set--the cb_set only determines whether the initial null was
succesful. So cb_set is not a reliable determiner of whether we need to clean
up, only cb_client is.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some obvious cleanup.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We need to make sure the laundromat work doesn't reschedule itself just when
we try to cancel it. Also, we shouldn't be waiting for it to finish running
while holding the state lock, as that's a potential deadlock.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix corruption on readdir encoding with 64k pages.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In v4 we grab an extra page just for the padding of returned data. The
formula that the rpc server uses to allocate pages for the response doesn't
take into account this extra page.
Instead of adjusting those formulae, we adopt the same solution as v2 and v3,
and put the "tail" data in the same page as the "head" data.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since nfsd_setuser() is already called from any operation that uses the
current filehandle (because it's called from fh_verify), there's no reason to
call it from putrootfh.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In addition to setting the processes filesystem id's, nfsd_setuser also
modifies the value of the rq_cred which stores the id's that originally came
from the rpc call, for example to reflect root squashing.
There's no real reason to do that--the only case where rqstp->rq_cred is
actually used later on is in the NFSv4 SETCLIENTID/SETCLIENTID_CONFIRM
operations, and there the results are the opposite of what we want--those two
operations don't deal with the filesystem at all, they only record the
credentials used with the rpc call for later reference (so that we may require
the same credentials be used on later operations), and the credentials
shouldn't vary just because there was or wasn't a previous operation in the
compound that referred to some export
This fixes a bug which caused mounts from Solaris clients to fail.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Export a directory that does not exist:
exportfs -orw,fsid=0,insecure,no_subtree_check client:/home/NFS4
Try to mount from client with nfs4. Mount hangs (I'm not sure why -
that's another issue).
While client is hung, back on server
mkdir /home/NFS4
The server panics in dput. I traced the problem back to svc_export_parse()
calling path_release() even though path_lookup() failed (it happens to fill in
the nameidata structure with a negative dentry - so the test after out:
succeeds).
After patching, an recreating the problem, the client mount still takes some
time before finally exiting with a message "couldn't read superblock".
Here is a simple patch to resolve this issue:
Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We should be using the length from the second vfs_getxattr, in case it
changed. (Note: there's still a small race here; we could end up returning
-ENOMEM if the length increased between the first and second call. I don't
know whether it's worth spending a lot of effort to fix that.)
This makes XFS ACLs usable on NFS exports, which they currently aren't, since
XFS appears to be returning a too-large value for vfs_getxattr() when it's
passed a NULL buffer. So there's probably an XFS bug here too, though since
getxattr with a NULL buffer is usually used to decide how much memory to
allocate, it may be a fairly harmless bug in most cases.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We're returning -1 in a few places in the NFSv4<->POSIX acl translation code
where we could return a reasonable error.
Also allows some minor simplification elsewhere.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
this fixes coverity id #3. Coverity detected dead code, since the == -1
comparison only returns 0 or 1 to error. Therefore the if ( error < 0 )
statement was always false. Seems that this was an if( error = nfs4... )
statement some time ago, which got broken during cleanup.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Marc Eshel <eshel@almaden.ibm.com>
Cc: Andy Adamson <andros@citi.umich.edu>
Cc: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use the fl_lmops field to identify which locks are ours, instead of trying to
look them up in our private hash. This is safer and more efficient.
Earlier versions of this patch used a lock flag instead, but Trond pointed out
that adding a new flag for each lock manager wasn't going to scale well, and
suggested this approach instead; a separate patch converts lockd to using
fl_lmops in the same way.
In the NFSv4 case this looks like a bit of a hack, since the NFSv4 server
isn't currently actually defining a lock_manager_operations struct, so we end
up defining one *just* to serve as a cookie to identify our locks.
But it works, and we actually do expect to start using the
lock_manager_operations at some point anyway.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NFSd makes sure there is enough space to hold the maximum possible reply
before accepting a request. The units for this maximum is (4byte) words.
However in three places, particularly for read request, the number given is
a number of bytes.
This means too much space is reserved which is slightly wasteful.
This is the sort of patch that could uncover a deeper bug, and it is not
critical, so it would be best for it to spend a while in -mm before going
in to mainline.
(akpm: target 2.6.17-rc2, 2.6.16.3 (approx))
Discovered-by: "Eivind Sarto" <ivan@kasenna.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The previous patch removed limiting the number of outstanding requests. This
patch adds a much simpler limiting, that is also compatible with file locking
operations.
A task may have at most one synchronous request allocated. So these requests
need not be otherwise limited.
However the number of background requests (release, forget, asynchronous
reads, interrupted requests) can grow indefinitely. This can be used by a
malicous user to cause FUSE to allocate arbitrary amounts of unswappable
kernel memory, denying service.
For this reason add a limit for the number of background requests, and block
allocations of new requests until the number goes bellow the limit.
Also use this mechanism to block all requests until the INIT reply is
received.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
FUSE allocated most requests from a fixed size pool filled at mount time.
However in some cases (release/forget) non-pool requests were used. File
locking operations aren't well served by the request pool, since they may
block indefinetly thus exhausting the pool.
This patch removes the request pool and always allocates requests on demand.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Return consistent error values for the case when the opened device file has no
mount associated yet.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>