Thanks to Adrian Bunk for debugging the problem and to Shaggy for
helping find the solution.
Also added a fix for 64K pages we found in loosely-related testing
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This change reverts the 033b96fd30 commit
from Kay Sievers that removed the mount/umount uevents from the kernel.
Some older versions of HAL still depend on these events to detect when a
new device has been mounted. These events are not correctly emitted,
and are broken by design, and so, should not be relied upon by any
future program. Instead, the /proc/mounts file should be polled to
properly detect this kind of event.
A feature-removal-schedule.txt entry has been added, noting when this
interface will be removed from the kernel.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Minor updates to the documentation to bring them into sync with current
websites and available features. The debug flag was switched back to hex
to match the documentation.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1) it should use nr_processes(), not nr_threads; otherwise we are getting
very confused find(1) and friends, among other things.
2) better do that at stat() time than at every damn lookup in procfs root.
Patch had been sitting in FC4 kernels for many months now...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
I got all of these backwards. We want to return
min(input timeout, new timeout)
to userspace to prevent increasing the time-remaining value.
Thanks to Ernst Herzberg <earny@net4u.de> for reporting and diagnosing.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's a rather theoretical case of the BUG triggering in
fuse_reset_request():
- iget() fails because of OOM after a successful CREATE_OPEN request
- during IO on the resulting RELEASE request the connection is aborted
Fix and add warning to fuse_reset_request().
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a deadlock possible in the ext2 file system implementation. This
deadlock occurs when a file is removed from an ext2 file system which was
mounted with the "sync" mount option.
The problem is that ext2_xattr_delete_inode() was invoking the routine,
sync_dirty_buffer(), using a buffer head which was previously locked via
lock_buffer(). The first thing that sync_dirty_buffer() does is to lock
the buffer head that it was passed. It does this via lock_buffer(). Oops.
The solution is to unlock the buffer head in ext2_xattr_delete_inode()
before invoking sync_dirty_buffer(). This makes the code in
ext2_xattr_delete_inode() obey the same locking rules as all other callers
of sync_dirty_buffer() in the ext2 file system implementation.
Signed-off-by: Peter Staubach <staubach@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Disable automatic checkpointing of the journal - this is a relic from older
ocfs2 days. Worth quite a bit of performance on longer running single node
tests.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* fix a hang in recovery that occurred in dlmlock_remote. the $RECOVERY
lock was never moved to the granted queue even after getting DLM_NORMAL
back from the master node.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* add dlm_wait_for_node_death function to be used after receiving a network
error. this will wait for the given timeout to allow the heartbeat
callbacks to update the domain map. without this, some paths may spin
and consume enough cpu that the heartbeat gets starved and never updates.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* fix a bug in dlm_convert_lock_handler where dlm_lockres_release_ast was
being called even if no ast was ever reserved
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* after successfully taking the $RECOVERY lock in EX mode, recheck to make
sure that recovery has not already begun or completed on another node
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
1. The tracee can go from ptrace_stop() to do_signal_stop()
after __ptrace_unlink(p).
2. It is unsafe to __ptrace_unlink(p) while p->parent may wait
for tasklist_lock in ptrace_detach().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes an oops reported by Adrian Bunk in cifs_user_read when a null
read response is returned on a forcedirectio mount.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If 2 threads attached to the same process are blocking on different locks on
different files (maybe even on different servers) but have the same lock
arguments (i.e. same offset+length - actually quite common, since most
processes try to lock the entire file) then the first GRANTED call that wakes
one up will also wake the other.
Currently when the NLM_GRANTED callback comes in, lockd walks the list of
blocked locks in search of a match to the lock that the NLM server has
granted. Although it checks the lock pid, start and end, it fails to check
the filehandle and the server address.
By checking the filehandle and server IP address, we ensure that this only
happens if the locks truly are referencing the same file.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch reverts commit f93ea411b7:
[PATCH] jbd: split checkpoint lists
This broke journal_flush() for OCFS2, which is its method of being sure
that metadata is sent to disk for another node.
And two related commits 8d3c7fce2d and
43c3e6f5ab with the subjects:
[PATCH] jbd: log_do_checkpoint fix
[PATCH] jbd: remove_transaction fix
These seem to be incremental bugfixes on the original patch and as such are
no longer needed.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes a potential oops if there is an error reported by
posix_acl_from_disk(). This is mostly theoretical due to the use of
magics and checksums in xattrs, but is still possible.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Unfortunately, the reiserfs_attrs_cleared bit in the superblock flag can
lie. File systems have been observed with the bit set, yet still contain
garbage in the stat data field, causing unpredictable results.
This patch backs out the enable-by-default behavior.
It eliminates the changes from: d50a5cd860,
and ef5e5414e7.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With David Woodhouse <dwmw2@infradead.org>
select() presently has a habit of increasing the value of the user's
`timeout' argument on return.
We were writing back a timeout larger than the original. We _deliberately_
round up, since we know we must wait at _least_ as long as the caller asks
us to.
The patch adds a couple of helper functions for magnitude comparison of
timespecs and of timevals, and uses them to prevent the various poll and
select functions from returning a timeout which is larger than the one which
was passed in.
The patch also fixes a bug in compat_sys_pselect7(): it was adding the new
timeout value to the old one and was returning that. It should just return
the new timeout value.
(We have various handy timespec/timeval-to-from-nsec conversion functions in
time.h. But this code open-codes it all).
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@muc.de>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: george anzinger <george@mvista.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The *at patches introduced fstatat and, due to inusfficient research, I
used the newfstat functions generally as the guideline. The result is that
on 32-bit platforms we don't have all the information needed to implement
fstatat64.
This patch modifies the code to pass up 64-bit information if
__ARCH_WANT_STAT64 is defined. I renamed the syscall entry point to make
this clear. Other archs will continue to use the existing code. On x86-64
the compat code is implemented using a new sys32_ function. this is what
is done for the other stat syscalls as well.
This patch might break some other archs (those which define
__ARCH_WANT_STAT64 and which already wired up the syscall). Yet others
might need changes to accomodate the compatibility mode. I really don't
want to do that work because all this stat handling is a mess (more so in
glibc, but the kernel is also affected). It should be done by the arch
maintainers. I'll provide some stand-alone test shortly. Those who are
eager could compile glibc and run 'make check' (no installation needed).
The patch below has been tested on x86 and x86-64.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Direct backport of 2.4 fix that didn't get propagated to 2.6; original
comment follows:
<quote>
When I specify the NFS port for nfsroot (e.g.,
nfsroot=<dir>,port=2049), the
kernel uses the wrong port. In my case it tries to use 264 (0x108)
instead
of 2049 (0x801).
This patch adds the missing htons().
Eric
</quote>
Patch got applied in 2.4.21-pre6. Author: Eric Lammerts (<eric@lammerts.org>,
AFAICS).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
A bunch of asm/bug.h includes are both not needed (since it will get
pulled anyway) and bogus (since they are done too early). Removed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If the namespace structure is being shared, allocate a new one and copy
information from the current, shared, structure.
Signed-off-by: Janak Desai <janak@us.ibm.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We had a user trigger this message on a box that had a lot of different
mounts, all with different options. It might help narrow down wtf happened
if we print out which device failed.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix one-shot support in inotify. We currently drop the IN_ONESHOT flag
during watch addition. Fix is to not do that.
Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix do_path_lookup() to avoid accessing invalid dentry or inode when the
link_path_walk() has failed. This should fix Bugme #5897.
Signed-off-by: Suzuki K P <suzuki@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I just noticed that my patch "don't create on open that fails due to
ERR_GRACE" (recently commited as fb553c0f17)
had an obvious problem that causes a deadlock on reboot recovery. Sending
in this now since it seems like a clear 2.6.16 candidate.--b.
We're returning with a lock held in some error cases.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
the clustering of extra pages in a buffered write.
SGI-PV: 949210
SGI-Modid: xfs-linux-melb:xfs-kern:25130a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
Fix trivial type mixup in the debugfs function comments.
Signed-off-by: Vincent Hanquez <vincent@snarc.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When walking a path, the LOOKUP_CONTINUE flag is used by some filesystems
(for instance NFS) in order to determine whether or not it is looking up
the last component of the path. It this is the case, it may have to look
at the intent information in order to perform various tasks such as atomic
open.
A problem currently occurs when link_path_walk() hits a symlink. In this
case LOOKUP_CONTINUE may be cleared prematurely when we hit the end of the
path passed by __vfs_follow_link() (i.e. the end of the symlink path)
rather than when we hit the end of the path passed by the user.
The solution is to have link_path_walk() clear LOOKUP_CONTINUE if and only
if that flag was unset when we entered the function.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ben points out that:
When writing files out using O_SYNC, jbd's 1 jiffy delay results in a
significant drop in throughput as the disk sits idle. The patch below
results in a 4-5x performance improvement (from 6.5MB/s to ~24-30MB/s on my
IDE test box) when writing out files using O_SYNC.
So optimise the batching code by omitting it entirely if the process which is
doing a sync write is the same as the one which did the most recent sync
write. If that's true, we're unlikely to get any other processes joining the
transaction.
(Has been in -mm for ages - it took me a long time to get on to performance
testing it)
Numbers, on write-cache-disabled IDE:
/usr/bin/time -p synctest -n 10 -uf -t 1 -p 1 dir-name
Unpatched:
40 seconds
Patched:
35 seconds
Batching disabled:
35 seconds
This is the problematic single-process-doing-fsync case. With multiple
fsyncing processes the numbers are AFACIT unaltered by the patch.
Aside: performance testing and instrumentation shows that the transaction
batching almost doesn't help (testing with synctest -n 1 -uf -t 100 -p 10
dir-name on non-writeback-caching IDE). This is because by the time one
process is running a synchronous commit, a bunch of other processes already
have a transaction handle open, so they're all going to batch into the same
transaction anyway.
The batching seems to offer maybe 5-10% speedup with this workload, but I'm
pretty sure it was more important than that when it was first developed 4-odd
years ago...
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The last fix for this function in fact opened up a much more often
triggering race.
It was uncommented tricky code, that was buggy. Add comment, make it less
tricky and fix bug.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
percpu_data blindly allocates bootmem memory to store NR_CPUS instances of
cpudata, instead of allocating memory only for possible cpus.
As a preparation for changing that, we need to convert various 0 -> NR_CPUS
loops to use for_each_cpu().
(The above only applies to users of asm-generic/percpu.h. powerpc has gone it
alone and is presently only allocating memory for present CPUs, so it's
currently corrupting memory).
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <axboe@suse.de>
Cc: Anton Blanchard <anton@samba.org>
Acked-by: William Irwin <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The mount path had incorrectly asked the locking code to wait for recovery
completion, which deadlocks things because recovery waits for mount to
complete first.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
configfs always made item and attribute ownership root.root and
permissions based on a umask of 022. Add ->setattr() to allow
chown(2)/chmod(2), and persist the changes for the lifetime of the
items and attributes.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>