proc_check_chroot() does the check in a very unintuitive way (keeping a
copy of the argument, then modifying the argument), and has uncommented
sideeffects.
Signed-off-by: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sys_flock() currently has a race which can result in a double free in the
multi-thread case.
Thread 1 Thread 2
sys_flock(file, LOCK_EX)
sys_flock(file, LOCK_UN)
If Thread 2 removes the lock from inode->i_lock before Thread 1 tests for
list_empty(&lock->fl_link) at the end of sys_flock, then both threads will
end up calling locks_free_lock for the same lock.
Fix is to make flock_lock_file() do the same as posix_lock_file(), namely
to make a copy of the request, so that the caller can always free the lock.
This also has the side-effect of fixing up a reference problem in the
lockd handling of flock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
IN_DELETE events are no longer generated for the removal of a file from a
watched directory.
This seems to be a result of clearing DCACHE_INOTIFY_PARENT_WATCHED in
d_delete() directly before calling fsnotify_nameremove().
Assuming the flag doesn't need to be cleared before dentry_iput(), this
should do the trick.
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Acked-by: Robert Love <rml@novell.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since these names on old MSDOS is used as device, so, current fat driver
doesn't allow a user to create those names. But many OSes and even Windows
can create those names actually, now.
This patch removes the reserved name check.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT
fadvise() additions, do it in a new sys_sync_file_range() syscall instead.
Reasons:
- It's more flexible. Things which would require two or three syscalls with
fadvise() can be done in a single syscall.
- Using fadvise() in this manner is something not covered by POSIX.
The patch wires up the syscall for x86.
The sycall is implemented in the new fs/sync.c. The intention is that we can
move sys_fsync(), sys_fdatasync() and perhaps sys_sync() into there later.
Documentation for the syscall is in fs/sync.c.
A test app (sync_file_range.c) is in
http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.
The available-to-GPL-modules do_sync_file_range() is for knfsd: "A COMMIT can
say NFS_DATA_SYNC or NFS_FILE_SYNC. I can skip the ->fsync call for
NFS_DATA_SYNC which is hopefully the more common."
Note: the `async' writeout mode SYNC_FILE_RANGE_WRITE will turn synchronous if
the queue is congested. This is trivial to fix: add a new flag bit, set
wbc->nonblocking. But I'm not sure that we want to expose implementation
details down to that level.
Note: it's notable that we can sync an fd which wasn't opened for writing.
Same with fsync() and fdatasync()).
Note: the code takes some care to handle attempts to sync file contents
outside the 16TB offset on 32-bit machines. It makes such attempts appear to
succeed, for best 32-bit/64-bit compatibility. Perhaps it should make such
requests fail...
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make baby-simple the code for /proc/devices. Based on the proven design
for /proc/interrupts.
This also fixes the early-termination regression 2.6.16 introduced, as
demonstrated by:
# dd if=/proc/devices bs=1
Character devices:
1 mem
27+0 records in
27+0 records out
This should also work (but is untested) when /proc/devices >4096 bytes,
which I believe is what the original 2.6.16 rewrite fixed.
[akpm@osdl.org: cleanups, simplifications]
Signed-off-by: Joe Korty <joe.korty@ccur.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
__user annotations (hppfs)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Woe be unto he who builds their filesystems as modules.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
[ Obscure quote from the infamous geek bible? ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This enables the caller to migrate pages from one address space page
cache to another. In buzz word marketing, you can do zero-copy file
copies!
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds support for the sys_splice system call. Using a pipe as a
transport, it can connect to files or sockets (latter as output only).
From the splice.c comments:
"splice": joining two ropes together by interweaving their strands.
This is the "extended pipe" functionality, where a pipe is used as
an arbitrary in-memory buffer. Think of a pipe as a small kernel
buffer that you can use to transfer data from one end to the other.
The traditional unix read/write is extended with a "splice()" operation
that transfers data buffers to or from a pipe buffer.
Named by Larry McVoy, original implementation from Linus, extended by
Jens to support splicing to files and fixing the initial implementation
bugs.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (67 commits)
[PATCH] powerpc: Remove oprofile spinlock backtrace code
[PATCH] powerpc: Add oprofile calltrace support to all powerpc cpus
[PATCH] powerpc: Add oprofile calltrace support
[PATCH] for_each_possible_cpu: ppc
[PATCH] for_each_possible_cpu: powerpc
[PATCH] lock PTE before updating it in 440/BookE page fault handler
[PATCH] powerpc: Kill _machine and hard-coded platform numbers
ppc: Fix compile error in arch/ppc/lib/strcase.c
[PATCH] git-powerpc: WARN was a dumb idea
[PATCH] powerpc: a couple of trivial compile warning fixes
powerpc: remove OCP references
powerpc: Make uImage default build output for MPC8540 ADS
powerpc: move math-emu over to arch/powerpc
powerpc: use memparse() for mem= command line parsing
ppc: fix strncasecmp prototype
[PATCH] powerpc: make ISA floppies work again
[PATCH] powerpc: Fix some initcall return values
[PATCH] powerpc: Workaround for pSeries RTAS bug
[PATCH] spufs: fix __init/__exit annotations
[PATCH] powerpc: add hvc backend for rtas
...
* git://oss.sgi.com:8090/oss/git/xfs-2.6:
[XFS] Cleanup in XFS after recent get_block_t interface tweaks.
[XFS] Remove unused/obsoleted function: xfs_bmap_do_search_extents()
[XFS] A change to inode chunk allocation to try allocating the new chunk
Fixes a regression from the recent "remove ->get_blocks() support"
[XFS] Fix compiler warning and small code inconsistencies in compat
[XFS] We really suck at spulling. Thanks to Chris Pascoe for fixing all
This patch borrows a clever Hugh's 'struct anon_vma' trick.
Without tasklist_lock held we can't trust task->sighand until we locked it
and re-checked that it is still the same.
But this means we don't need to defer 'kmem_cache_free(sighand)'. We can
return the memory to slab immediately, all we need is to be sure that
sighand->siglock can't dissapear inside rcu protected section.
To do so we need to initialize ->siglock inside ctor function,
SLAB_DESTROY_BY_RCU does the rest.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
add_parent(p, parent) is always called with parent == p->parent, and it makes
no sense to do it differently. This patch removes this argument.
No changes in affected .o files.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
switch_exec_pids is only called from de_thread by way of exec, and it is
only called when we are exec'ing from a non thread group leader.
Currently switch_exec_pids gives the leader the pid of the thread and
unhashes and rehashes all of the process groups. The leader is already in
the EXIT_DEAD state so no one cares about it's pids. The only concern for
the leader is that __unhash_process called from release_task will function
correctly. If we don't touch the leader at all we know that
__unhash_process will work fine so there is no need to touch the leader.
For the task becomming the thread group leader, we just need to give it the
pid of the old thread group leader, add it to the task list, and attach it
to the session and the process group of the thread group.
Currently de_thread is also adding the task to the task list which is just
silly.
Currently the only leader of __detach_pid besides detach_pid is
switch_exec_pids because of the ugly extra work that was being
performed.
So this patch removes switch_exec_pids because it is doing too much, it is
creating an unnecessary special case in pid.c, duing work duplicated in
de_thread, and generally obscuring what it is going on.
The necessary work is added to de_thread, and it seems to be a little
clearer there what is going on.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I think it is enough to take tasklist_lock for reading while changing
child_reaper:
Reparenting needs write_lock(tasklist_lock)
Only one thread in a thread group can do exec()
sighand->siglock garantees that get_signal_to_deliver()
will not see a stale value of child_reaper.
This means that we can change child_reaper earlier, without calling
zap_other_threads() twice.
"child_reaper = current" is a NOOP when init does exec from main thread, we
don't care.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
After looking at the problem of init calling exec some more I figured out
an easy way to make the code work.
The actual symptom without out this patch is that all threads will die
except pid == 1, and the thread calling exec. The thread calling exec will
wait forever for pid == 1 to die.
Since pid == 1 does not install a handler for SIGKILL it will never die.
This modifies the tests for init from current->pid == 1 to the equivalent
current == child_reaper. And then it causes exec in the ugly case to
modify child_reaper.
The only weird symptom is that you wind up with an init process that
doesn't have the oldest start time on the box.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
contiguous with the most recently allocated chunk. On a striped
filesystem, this will fill a stripe unit with inodes before allocating new
inodes in another stripe unit.
SGI-PV: 951416
SGI-Modid: xfs-linux-melb:xfs-kern:208488a
Signed-off-by: Glen Overby <overby@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
change. inode->i_blkbits should be used when making a get_block_t
request of a filesystem instead of dio->blkbits, as that does not
indicate the filesystem block size all the time (depends on request
alignment - see start of __blockdev_direct_IO).
Signed-off-by: Nathan Scott <nathans@sgi.com>
Acked-by: Badari Pulavarty <pbadari@us.ibm.com>
Fix a lot of typos. Eyeballed by jmc@ in OpenBSD.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a conversion to make the various file_operations structs in fs/
const. Basically a regexp job, with a few manual fixups
The goal is both to increase correctness (harder to accidentally write to
shared datastructures) and reducing the false sharing of cachelines with
things that get dirty in .data (while .rodata is nicely read only and thus
cache clean)
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mark the f_ops members of inodes as const, as well as fix the
ripple-through this causes by places that copy this f_ops and then "do
stuff" with it.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove an unnecessary level of indirection in allocating and freeing select
bits, as per the select_bits_alloc() and select_bits_free() functions.
Both select.c and compat.c are updated.
Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Optimize select and poll by a using stack space for small fd sets
This brings back an old optimization from Linux 2.0. Using the stack is
faster than kmalloc. On a Intel P4 system it speeds up a select of a
single pty fd by about 13% (~4000 cycles -> ~3500)
It also saves memory because a daemon hanging in select or poll will
usually save one or two less pages. This can add up - e.g. if you have 10
daemons blocking in poll/select you save 40KB of memory.
I did a patch for this long ago, but it was never applied. This version is
a reimplementation of the old patch that tries to be less intrusive. I
only did the minimal changes needed for the stack allocation.
The cut off point before external memory is allocated is currently at
832bytes. The system calls always allocate this much memory on the stack.
These 832 bytes are divided into 256 bytes frontend data (for the select
bitmaps of the pollfds) and the rest of the space for the wait queues used
by the low level drivers. There are some extreme cases where this won't
work out for select and it falls back to allocating memory too early -
especially with very sparse large select bitmaps - but the majority of
processes who only have a small number of file descriptors should be ok.
[TBD: 832/256 might not be the best split for select or poll]
I suspect more optimizations might be possible, but they would be more
complicated. One way would be to cache the select/poll context over
multiple system calls because typically the input values should be similar.
Problem is when to flush the file descriptors out though.
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a proper prototype for autofs4_dentry_release() to autofs_i.h.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add proper prototypes for fat_cache_init() and fat_cache_destroy() in
msdos_fs.h.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This removes statically assigned platform numbers and reworks the
powerpc platform probe code to use a better mechanism. With this,
board support files can simply declare a new machine type with a
macro, and implement a probe() function that uses the flattened
device-tree to detect if they apply for a given machine.
We now have a machine_is() macro that replaces the comparisons of
_machine with the various PLATFORM_* constants. This commit also
changes various drivers to use the new macro instead of looking at
_machine.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Various dodgy firmware might give us nodes and/or properties in the device
tree with conflicting names. That's generally ok, except for when we export
the device tree via /proc, so check when we're creating the proc device tree
and munge names accordingly.
Tested on a faked device tree with kexec, would be good if someone with
actual bogus firmware could try it, but just for completeness.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Adding bd_claim_by_kobject() function which takes kobject as additional
signature of holder device and creates sysfs symlinks between holder device
and claimed device. bd_release_from_kobject() is a counterpart of
bd_claim_by_kobject.
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove all the CONFIG_SYSFS stuff. That's supposed to all be implemented up
in header files.
Yes, the CONFIG_SYSFS=n data structures will be a little larger than
necessary, but that's a tradeoff we can decide to make.
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Creating "slaves" and "holders" directories in /sys/block/<disk> and
creating "holders" directory under /sys/block/<disk>/<partition>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We can now make some code static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
.. it makes some of the code nicer.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These were an unnecessary wart. Also only have one 'DefineSimpleCache..'
instead of two.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Current svc_expkey holds a pointer to the svc_export structure, so updates to
that structure have to be in-place, which is a wart on the whole cache
infrastruct. So we break that linkage and just do a second lookup.
If this became a performance issue, it would be possible to put a direct link
back in which was only used conditionally. i.e. when an object is replaced
in the cache, we set a flag in the old object. When dereferencing the link
from svc_expkey, if the flag is set, we drop the reference and do a fresh
lookup.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The 'auth_domain's are simply handles on internal data structures. They do
not cache information from user-space, and forcing them into the mold of a
'cache' misrepresents their true nature and causes confusion.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix accidental underflow of the atomic counter.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This functionality is also need for operation of autofs v5 direct mounts.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We have to have a valid sbi here, or we'd have oopsed already. (There's a
dereference of sbi->catatonic a few lines above)
Coverity #740
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch define a new autofs packet for autofs v5 and updates the waitq.c
functions to handle the additional packet type.
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds expire logic for autofs direct mounts.
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a follow_link inode method for the root of an autofs direct
mount trigger. It also adds the corresponding mount options and updates the
show_mount method.
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In order to be able to trigger a mount using the follow_link inode method the
nameidata struct that is passed in needs to have the vfsmount of the autofs
trigger not its parent.
During a path walk if an autofs trigger is mounted on a dentry, when the
follow_link method is called, the nameidata struct contains the vfsmount and
mountpoint dentry of the parent mount while the dentry that is passed in is
the root of the autofs trigger mount. I believe it is impossible to get the
vfsmount of the trigger mount, within the follow_link method, when only the
parent vfsmount and the root dentry of the trigger mount are known.
This patch updates the nameidata struct on entry to __do_follow_link if it
detects that it is out of date. It moves the path_to_nameidata to above
__do_follow_link to facilitate calling it from there. The dput_path is moved
as well as that seemed sensible. No changes are made to these two functions.
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the functions may_umount and may_umount_tree to boolean functions to
aid code readability.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rename the function simple_empty_nolock to __simple_empty in line with kernel
naming conventions.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Whitespace and formating changes to waitq code.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add show_options method to display autofs4 mount options in the proc
filesystem.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the update of i_atime from autofs4 in favour of having VFS update it.
i_atime is never used for expire in autofs4.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alter the expire semantics that define how "busyness" is determined.
Currently a last_used counter is updated on every revalidate from processes
other than the mount owner process group.
This patch changes that so that an expire candidate is busy only if it has a
reference count greater than the expected minimum, such as when there is an
open file or working directory in use.
This method is the only way that busyness can be established for direct mounts
within the new implementation. For consistency the expire semantic is made
the same for all mounts.
A side effect of the patch is that mounts which remain mounted unessessarily
in the presence of some GUI programs that scan the filesystem should now
expire.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix the case where an expire returns busy on a tree mount when it is in fact
not busy. This case was overlooked when the patch to prevent the expiring
away of "scaffolding" directories for tree mounts was applied.
The problem arises when a tree of mounts is a member of a map with other keys.
The current logic will not expire the tree if any other mount in the map is
busy. The solution is to maintain a "minimum" use count for each autofs
dentry and compare this to the actual dentry usage count during expire.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Simplify the expire tree traversal code by using a function from namespace.c
to calculate the next entry in the top down tree traversals carried out during
the expire operation.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the names of the boolean functions autofs4_check_mount and
autofs4_check_tree to autofs4_mount_busy and autofs4_tree_busy respectively
and alters their return codes to suit in order to aid code readabilty.
A couple of white space cleanups are included as well.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Addresse a problem where stale dentrys stop mounts from happening.
When a mount point directory is pre-created and a non-existent entry within it
is requested a dentry ends up being created within the mount point directory
which stops future mounts. The problem is solved by ignoring negative,
unhashed dentrys in the mount point d_subdirs list.
Additionally the apparent cacheing of -ENOENT returns from requests is
removed. The test on d_time is a tautology and d_time is not initialised and
has an unexpected value. In short it doesn't do what it's meant to.
The cacheing of failed requests to the daemon is important and will be
followed up later.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change readdir routines to use the cursor based routines in libfs.c. This
removes reliance on old readdir code from 2.4 and should improve efficiency of
readdir in autofs4.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Whitespace and formating changes to lookup code.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Noted by Oleg Drokin:
We initialized an extra slot of struct kstatfs.spare, sometimes
causing stack corruption.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Oleg Drokin <green@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
drivers/char/ftape/lowlevel/fdc-io.c: Correct a comment
Kconfig help: MTD_JEDECPROBE already supports Intel
Remove ugly debugging stuff
do_mounts.c: Minor ROOT_DEV comment cleanup
BUG_ON() Conversion in drivers/s390/block/dasd_devmap.c
BUG_ON() Conversion in mm/mempool.c
BUG_ON() Conversion in mm/memory.c
BUG_ON() Conversion in kernel/fork.c
BUG_ON() Conversion in ipc/sem.c
BUG_ON() Conversion in fs/ext2/
BUG_ON() Conversion in fs/hfs/
BUG_ON() Conversion in fs/dcache.c
BUG_ON() Conversion in fs/buffer.c
BUG_ON() Conversion in input/serio/hp_sdc_mlc.c
BUG_ON() Conversion in md/dm-table.c
BUG_ON() Conversion in md/dm-path-selector.c
BUG_ON() Conversion in drivers/isdn
BUG_ON() Conversion in drivers/char
BUG_ON() Conversion in drivers/mtd/
Now the only user who are using generic_ffs() is ntfs filesystem. This patch
isolates generic_ffs() as ntfs_ffs() for ntfs.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The nanosleep cleanup allows to remove the data field of hrtimer. The
callback function can use container_of() to get it's own data. Since the
hrtimer structure is anyway embedded in other structures, this adds no
overhead.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the it_real_value from /proc/*/stat, during 1.2.x was the last time it
returned useful data (as it was directly maintained by the scheduler), now
it's only a waste of time to calculate it. Return 0 instead.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is no valid reason why we can't support "nobh" option for filesystems
with blocksize != PAGESIZE.
This patch lets them use "nobh" option for writeback mode for blocksize <
pagesize.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mingming Cao recently added multi-block allocation support for ext3,
currently used only by DIO. I added support to map multiple blocks for
mpage_readpages(). This patch add support for ext3_get_block() to deal
with multi-block mapping. Basically it renames ext3_direct_io_get_blocks()
as ext3_get_block().
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Clean up a few little layout things and comments.
- Add a WARN_ON to a case which I was wondering about.
- Tune up some inlines.
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that get_block() can handle mapping multiple disk blocks, no need to have
->get_blocks(). This patch removes fs specific ->get_blocks() added for DIO
and makes it users use get_block() instead.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch changes mpage_readpages() and get_block() to get the disk mapping
information for multiple blocks at the same time.
b_size represents the amount of disk mapping that needs to mapped. On the
successful get_block() b_size indicates the amount of disk mapping thats
actually mapped. Only the filesystems who care to use this information and
provide multiple disk blocks at a time can choose to do so.
No changes are needed for the filesystems who wants to ignore this.
[akpm@osdl.org: cleanups]
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass amount of disk needs to be mapped to get_block(). This way one can
modify the fs ->get_block() functions to map multiple blocks at the same time.
[akpm@osdl.org: performance tweak]
[akpm@osdl.org: remove unneeded assignments]
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Increase the size of the buffer_head b_size field (only) for 64 bit platforms.
Update some old and moldy comments in and around the structure as well.
The b_size increase allows us to perform larger mappings and allocations for
large I/O requests from userspace, which tie in with other changes allowing
the get_block_t() interface to map multiple blocks at once.
Signed-off-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Optimize the block reservation and the multiple block allocation: with the
knowledge of the total number of blocks ahead, set or adjust the reservation
window size properly (based on the number of blocks needed) before block
allocation happens: if there isn't any reservation yet, make sure the
reservation window equals to or greater than the number of blocks needed,
before create an reservation window; if a reservation window is already
exists, try to extends the window size to match the number of blocks to
allocate. This could increase the possibility of completing multiple blocks
allocation in a single request, as blocks are only allocated in the range of
the inode's reservation window.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Update accounting information (quota, boundary checks, free blocks number etc)
in ext3_new_blocks().
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change ext3_try_to_allocate() (called via ext3_new_blocks()) to try to
allocate the requested number of blocks on a best effort basis: After
allocated the first block, it will always attempt to allocate the next few(up
to the requested size and not beyond the reservation window) adjacent blocks
at the same time.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for multiple block allocation in ext3-get-blocks().
Look up the disk block mapping and count the total number of blocks to
allocate, then pass it to ext3_new_block(), where the real block allocation is
performed. Once multiple blocks are allocated, prepare the branch with those
just allocated blocks info and finally splice the whole branch into the block
mapping tree.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently ext3_get_block() only maps or allocates one block at a time. This
is quite inefficient for sequential IO workload.
I have posted a early implements a simply multiple block map and allocation
with current ext3. The basic idea is allocating the 1st block in the existing
way, and attempting to allocate the next adjacent blocks on a best effort
basis. More description about the implementation could be found here:
http://marc.theaimsgroup.com/?l=ext2-devel&m=112162230003522&w=2
The following the latest version of the patch: break the original patch into 5
patches, re-worked some logicals, and fixed some bugs. The break ups are:
[patch 1] Adding map multiple blocks at a time in ext3_get_blocks()
[patch 2] Extend ext3_get_blocks() to support multiple block allocation
[patch 3] Implement multiple block allocation in ext3-try-to-allocate
(called via ext3_new_block()).
[patch 4] Proper accounting updates in ext3_new_blocks()
[patch 5] Adjust reservation window size properly (by the given number
of blocks to allocate) before block allocation to increase the
possibility of allocating multiple blocks in a single call.
Tests done so far includes fsx,tiobench and dbench. The following numbers
collected from Direct IO tests (1G file creation/read) shows the system time
have been greatly reduced (more than 50% on my 8 cpu system) with the patches.
1G file DIO write:
2.6.15 2.6.15+patches
real 0m31.275s 0m31.161s
user 0m0.000s 0m0.000s
sys 0m3.384s 0m0.564s
1G file DIO read:
2.6.15 2.6.15+patches
real 0m30.733s 0m30.624s
user 0m0.000s 0m0.004s
sys 0m0.748s 0m0.380s
Some previous test we did on buffered IO with using multiple blocks allocation
and delayed allocation shows noticeable improvement on throughput and system
time.
This patch:
Add support of mapping multiple blocks in one call.
This is useful for DIO reads and re-writes (where blocks are already
allocated), also is in line with Christoph's proposal of using getblocks() in
mpage_readpage() or mpage_readpages().
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Modify well over a dozen mempool users to call mempool_create_slab_pool()
rather than calling mempool_create() with extra arguments, saving about 30
lines of code and increasing readability.
Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch changes several mempool users, all of which are basically just
wrappers around kmalloc(), to use the common mempool_kmalloc/kfree, rather
than their own wrapper function, removing a bunch of duplicated code.
Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Update the NFSv4 server to use the new posix_lock_file_conf() interface.
Remove unnecessary (and race-prone) posix_test_file() calls.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lockd and the NFSv4 server both exercise a race condition where
posix_test_lock() is called either before or after posix_lock_file() to
deal with a denied lock request due to a conflicting lock.
Remove the race condition for the NFSv4 server by adding a new conflicting
lock parameter to __posix_lock_file() , changing the name to
__posix_lock_file_conf().
Keep posix_lock_file() interface, add posix_lock_conf() interface, both
call __posix_lock_file_conf().
[akpm@osdl.org: Put the EXPORT_SYMBOL() where it belongs]
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>