* git://git.linux-nfs.org/pub/linux/nfs-2.6:
SUNRPC: Replace flush_workqueue() with cancel_work_sync() and friends
NFS: Replace flush_scheduled_work with cancel_work_sync() and friends
SUNRPC: Don't call gss_delete_sec_context() from an rcu context
NFSv4: Don't call put_rpccred() from an rcu callback
NFS: Fix NFSv4 open stateid regressions
NFSv4: Fix a locking regression in nfs4_set_mode_locked()
NFS: Fix put_nfs_open_context
SUNRPC: Fix a race in rpciod_down()
Do not allow cached open for O_RDONLY or O_WRONLY unless the file has been
previously opened in these modes.
Also Fix the calculation of the mode in nfs4_close_prepare. We should only
issue an OPEN_DOWNGRADE if we're sure that we will still be holding the
correct open modes. This may not be the case if we've been doing delegated
opens.
Finally, there is no need to adjust the open mode bit flags in
nfs4_close_done(): that has already been done in nfs4_close_prepare().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We don't really need to clear &state->inode_states inside
nfs4_set_mode_locked, and doing so without holding the inode->i_lock would
in any case be a bug...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We need to grab the inode->i_lock atomically with the last reference put in
order to remove the open context that is being freed from the
nfsi->open_files list.
Fix by converting the kref to a standard atomic counter and then using
atomic_dec_and_lock()...
Thanks to Arnd Bergmann for pointing out the problem.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch removes some duplicated wireless ioctl entries in the array
'struct ioctl_trans ioctl_start[]' of fs/compat_ioctl.c
These entries are registered twice like:
COMPATIBLE_IOCTL(SIOCGIWPRIV)
and
HANDLE_IOCTL(SIOCGIWPRIV, do_wireless_ioctl)
Signed-off-by: Masakazu Mokuno <mokuno@sm.sony.co.jp>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Debugging the hardware problems in OLPC trac #1905 would be a whole lot
easier if the correct node offsets were printed for the offending nodes.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
The try_to_freeze() call was in the wrong place; we need it in the
signal-pending loop now that a pending freeze also makes
signal_pending() return true.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
jffs2_add_physical_node_ref() should never really return error -- it's
an internal debugging check which triggered. We really need to work out
why and stop it happening. But in the meantime, let's make the failure
mode a little less nasty.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This patch fix weird behaviour of UDF mounting procedure. To get UID
changed (for now) we have to type
mount -t udf -o uid=some_user,uid=ignore /dev/device /mnt/moun_point
and specifying two uid at once is strange a bit. So with the patch we are
able to mount without additional 'uid=ignore' option. The same for GID
option is done.
This patch will not break current mount scheme (with two option).
Btw this does fix (I hope) the following
[BUG 6124] mount of UDF fs ignores UID and GID options
http://bugzilla.kernel.org/show_bug.cgi?id=6124
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Michael <auslands-kv@gmx.de>
Cc: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make it a little more clear that this is the default implementation for
the setleast operation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is possible that another process could acquire a new file lease right
after break_lease() is called during a truncate, but before lease-granting
is disabled by the subsequent get_write_access(). Merely switching the
order of the break_lease() and get_write_access() calls prevents this race.
Signed-off-by: David M. Richter <richterd@citi.umich.edu>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It turned out that mounting a corrupted ISO image to a regular file may
succeed, e.g. if an image was prepared as follows:
$ dd if=correct.iso of=bad.iso bs=4k count=8
We then can mount it to a regular file:
# mount -o loop -t iso9660 bad.iso /tmp/file
But mounting it to a directory fails with -ENOTDIR, simply because
the root directory inode doesn't have S_IFDIR set and the condition
in graft_tree() is met:
if (S_ISDIR(nd->dentry->d_inode->i_mode) !=
S_ISDIR(mnt->mnt_root->d_inode->i_mode))
return -ENOTDIR
This is because the root directory inode was read from an incorrect
block. It's supposed to be read from sbi->s_firstdatazone, which is
an absolute value and gets messed up in the case of an incorrect image.
In order to somehow circumvent this we have to check that the root
directory inode is actually a directory after all.
Signed-off-by: Kirill Kuvaldin <kuvkir@epsmu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix file locking for AFS:
(*) Start the lock manager thread under a mutex to avoid a race.
(*) Made the locking non-fair: New readlocks will jump pending writelocks if
there's a readlock currently granted on a file. This makes the behaviour
similar to Linux's VFS locking.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A succesful downcall with a negative result (which indicates that the given
filesystem is not exported to the given user) should not return an error.
Currently mountd is depending on stdio to write these downcalls. With some
versions of libc this appears to cause subsequent writes to attempt to write
all accumulated data (for which writes previously failed) along with any new
data. This can prevent the kernel from seeing responses to later downcalls.
Symptoms will be that nfsd fails to respond to certain requests.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We shouldn't be using negative uid's and gid's in the idmap upcalls.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
RFC 3530 says:
If the server uses an attribute to store the exclusive create verifier, it
will signify which attribute by setting the appropriate bit in the attribute
mask that is returned in the results.
Linux uses the atime and mtime to store the verifier, but sends a zeroed out
bitmask back to the client. This patch makes sure that we set the correct
bits in the bitmask in this situation.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yan Zheng wrote:
> I think I found a bug in ext4/extents.c, "ext4_ext_put_in_cache" uses
> "__u32" to receive physical block number. "ext4_ext_put_in_cache" is
> used in "ext4_ext_get_blocks", it sets ext4 inode's extent cache
> according most recently tree lookup (higher 16 bits of saved physical
> block number are always zero). when serving a mapping request,
> "ext4_ext_get_blocks" first check whether the logical block is in
> inode's extent cache. if the logical block is in the cache and the
> cached region isn't a gap, "ext4_ext_get_blocks" gets physical block
> number by using cached region's physical block number and offset in
> the cached region. as described above, "ext4_ext_get_blocks" may
> return wrong result when there are physical block numbers bigger than
> 0xffffffff.
>
You are right. Thanks for reporting this!
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Cc: Yan Zheng <yanzheng@21cn.com>
Cc: <stable@kernel.org>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the SYSV IPC SHM to work with the changes applied by the new fault handler
patches when CONFIG_MMU=n.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
They are handled in a ->compat_ioctl() handler, so it's just noise
when compat_ioctl.c warns which occurs when they are used on non-SBUS
framebuffer devices.
Signed-off-by: David S. Miller <davem@davemloft.net>
Start doing VTOC validation before using its contents.
The validation is adjusted so as not to break existing setups
that do not set the VTOC version, sanity and partition count entries.
VTOC tables with more than 8 partitions will NOT be used.
Signed-off-by: Mark Fortescue <mark@mtfhpc.demon.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correct the Solaris x86 number of partitions (slices) is a way that is
backward compatible with the earlier size.
This works without a new VTOC structure definition as the timestamp
and v_asciilabel fields in the VTOC are not used by the kernel yet.
Signed-off-by: Mark Fortescue <mark@mtfhpc.demon.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is important to only provide the compat_ioctl method
if the downstream de->proc_fops does too, otherwise this
utterly confuses the logic in fs/compat_ioctl.c and we
end up doing the wrong thing.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
docbook: add pipes, other fixes
blktrace: use cpu_clock() instead of sched_clock()
bsg: Fix build for CONFIG_BLOCK=n
[patch] QUEUE_FLAG_READFULL QUEUE_FLAG_WRITEFULL comment fix
b716395e2b added code to handle
a compatability issue with 32bit quota tools, but the new compat
routines are only needed when CONFIG_COMPAT=y (and with this set
to 'n' there are compilation problems since some new typedefs are
not visible).
Reported by Doug Chapman. Fix tuned by a cast of thousands (Andi,
Andreas, Arthur, HPA, Willy)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix some typos in pipe.c and splice.c.
Add pipes API to kernel-api.tmpl.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
ext[234]_check_descriptors sanity checks block group descriptor geometry at
mount time, testing whether the block bitmap, inode bitmap, and inode table
reside wholly within the blockgroup. However, the inode table test is off
by one so that if the last block in the inode table resides on the last
block of the block group, the test incorrectly fails. This is because it
tests the last block as (start + length) rather than (start + length - 1).
This can be seen by trying to mount a filesystem made such as:
mkfs.ext2 -F -b 1024 -m 0 -g 256 -N 3744 fsfile 1024
which yields:
EXT2-fs error (device loop0): ext2_check_descriptors: Inode table for group 0 not in group (block 101)!
EXT2-fs: group descriptors corrupted!
There is a similar bug in e2fsprogs, patch already sent for that.
(I wonder if inside(), outside(), and/or in_range() should someday be
used in this and other tests throughout the ext filesystems...)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davi fixed a missing cast in the __put_user(), that was making timerfd
return a single byte instead of the full value.
Talking with Michael about the timerfd man page, we think it'd be better to
use a u64 for the returned value, to align it with the eventfd
implementation.
This is an ABI change. The timerfd code is new in 2.6.22 and if we merge this
into 2.6.23 then we should also merge it into 2.6.22.x. That will leave a few
early 2.6.22 kernels out in the wild which might misbehave when a future
timerfd-enabled glibc is run on them.
mtk says: The difference would be that read() will only return 4 bytes, while
the application will expect 8. If the application is checking the size of
returned value, as it should, then it will be able to detect the problem (it
could even be sophisticated enough to know that if this is a 4-byte return,
then it is running on an old 2.6.22 kernel). If the application is not
checking the return from read(), then its 8-byte buffer will not be filled --
the contents of the last 4 bytes will be undefined, so the u64 value as a
whole will be junk.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Davi Arnaut <davi@haxent.com.br>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is probably a leftover from a time when the return wasn't there yet.
Now the extra assignment is just irritating.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kunmap_atomic() takes the virtual address, not the mapped page as
argument.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'request-queue-t' of git://git.kernel.dk/linux-2.6-block:
[BLOCK] Add request_queue_t and mark it deprecated
[BLOCK] Get rid of request_queue_t typedef
The fallocate syscall returns ENOSYS in case the filesystem does not support
the operation and expects the userlevel code to fill in. This is good in
concept.
The problem is that the libc code for old kernels should be able to
distinguish the case where the syscall is not at all available vs not
functioning for a specific mount point. As is this is not possible and we
always have to invoke the syscall even if the kernel doesn't support it.
I suggest the following patch. Using EOPNOTSUPP is IMO the right thing to do.
Cc: Amit Arora <aarora@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some of the code has been gradually transitioned to using the proper
struct request_queue, but there's lots left. So do a full sweet of
the kernel and get rid of this typedef and replace its uses with
the proper type.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Obviously broken on little-endian; fortunately, the option is not
frequently used...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ Hey, sparse is wonderful, but even better than sparse is having people
like Al that actually _run_ it and fix bugs using it. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Too many remote cpu references due to /proc/stat.
On x86_64, with newer kernel versions, kstat_irqs is a bit of a problem.
On every call to kstat_irqs, the process brings in per-cpu data from all
online cpus. Doing this for NR_IRQS, which is now 256 + 32 * NR_CPUS
results in (256+32*63) * 63 remote cpu references on a 64 cpu config.
/proc/stat is parsed by common commands like top, who etc, causing lots
of cacheline transfers
This statistic seems useless. Other 'big iron' arches disable this.
AK: changed to remove for all SMP setups
AK: add comment
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For display purposes, treat uid's and gid's as unsigned ints for now.
Also fix a typo.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is an variation on the patch sent by Christoph Hellwig which kills
file_count abuse by the Coda kernel module by moving the coda_flush
functionality into coda_release. However part of reason we were using the
coda_flush callback was to allow Coda to pass errors that occur during
writeback from the userspace cache manager back to close().
As Al Viro explained on linux-fsdevel, it is impossible to guarantee that
such errors can in fact be returned back to the caller. There are many
cases where the last reference to a file is not released by the close
system call and it is also impossible to pick some close as a 'last-close'
and delay it until all other references have been destroyed.
The CODA_STORE/CODA_RELEASE upcall combination is clearly a broken design,
and it is better to remove it completely.
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes up sources after conversion by Lindent.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If add_to_page_cache_lru() fails, the page will not be locked. But
splice jumps to an error path that does a page release and unlock,
causing a BUG() in unlock_page().
Fix this by adding one more label that just releases the page. This bug
was actually triggered on EL5 by gurudas pai <gurudas.pai@oracle.com>
using fio.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix afs_send_simple_reply() to accept a greater-than-zero return value from
rxrpc_kernel_send_data() as being a successful return rather than thinking it
an error and aborting the call.
rxrpc_kernel_send_data() previously returned zero incorrectly when it worked
successfully, but has been patched to return the number of bytes it
transmitted.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix page index to offset conversion overflows in buffer layer, ecryptfs,
and ocfs2.
It would be nice to convert the whole tree to page_offset, but for now
just fix the bugs.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>