Commit Graph

3981 Commits

Author SHA1 Message Date
Eric W. Biederman
5e61feafa2 [PATCH] proc: remove the useless SMP-safe comments from /proc
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:25 -07:00
Eric W. Biederman
7bcd6b0efd [PATCH] proc: remove trailing blank entry from pid_entry arrays
It was pointed out that since I am taking ARRAY_SIZE anyway the trailing empty
entry is silly and just wastes space.
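
As a rough illustration of the point above (a user-space sketch, not the fs/proc/base.c code): once the loop bound comes from ARRAY_SIZE, nothing ever reads a trailing terminator entry, so the table can end at its last real element.  The demo_entry type, the table contents and demo_find() are hypothetical names made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Element count of a true array (same idea as the kernel's ARRAY_SIZE). */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, simplified stand-in for a pid_entry. */
struct demo_entry {
    const char *name;
    unsigned int mode;
};

/* No trailing empty terminator entry: ARRAY_SIZE supplies the bound. */
static const struct demo_entry demo_entries[] = {
    { "stat",    0444 },
    { "status",  0444 },
    { "cmdline", 0444 },
};

/* Return the entry called name, or NULL if the table has no such entry. */
static const struct demo_entry *demo_find(const char *name)
{
    size_t i;

    for (i = 0; i < ARRAY_SIZE(demo_entries); i++) {
        if (strcmp(demo_entries[i].name, name) == 0)
            return &demo_entries[i];
    }
    return NULL;
}
```

A sentinel-terminated table would need one extra element and a loop that tests for it; here the compiler knows the size.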

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:25 -07:00
Eric W. Biederman
8e95bd936d [PATCH] proc: properly compute TGID_OFFSET
The value doesn't change but this ensures I will have the proper value when
other files are added to proc_base_stuff.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Oleg Nesterov
b0fa9db6ab [PATCH] proc: drop tasklist lock in task_state()
task_state() needs tasklist_lock to protect ->parent/->real_parent.  However
task->parent points to nowhere only when the actions below happen in order

	1) release_task(task)
	2) release_task(task->parent)
	3) a grace period passed

But 3) implies that the memory ops from 1) should be finished, so pid_alive()
can't be true in such a case.

Otherwise, we don't care if ->parent/->real_parent changes under us.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Oleg Nesterov
a593d6edeb [PATCH] proc: convert do_task_stat() to use lock_task_sighand()
Drop tasklist_lock. ->siglock protects almost all interesting data
(including sub-threads traversal) except:

	->signal->tty
		protected by tty_mutex

	->real_parent
		the task can't be unhashed while we are holding
		->siglock, so ->real_parent can change from under us
		but we can safely dereference it under rcu_read_lock()

	->pgrp/->session
		we can get inconsistent numbers if the task does
		sys_setsid/daemonize at the same time. I hope this
		is acceptable.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Oleg Nesterov
5e6b3f42ed [PATCH] proc: convert task_sig() to use lock_task_sighand()
lock_task_sighand() can take ->siglock without holding tasklist_lock.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Eric W. Biederman
7fbaac005c [PATCH] proc: Use pid_task instead of open coding it
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Eric W. Biederman
72d9dcfc7a [PATCH] proc: Merge proc_tid_attr and proc_tgid_attr
The implementation is exactly the same and there is currently nothing to
distinguish proc_tid_attr and proc_tgid_attr.  So it is pointless to have two
separate implementations.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Eric W. Biederman
61a2878402 [PATCH] proc: Remove the hard coded inode numbers
The hard coded inode numbers in proc currently limit its maintainability,
its flexibility, and what can be done with the rest of the system.  /proc limits
pid-max to 32768 on 32 bit systems, it limits fd-max to 32768 on all systems,
and placing the pid in the inode number really gets in the way of implementing
subdirectories of per process information.

Ever since people started adding to the middle of the file type enumeration we
haven't been maintaining the historical inode numbers, all we have really
succeeded in doing is keeping the pid in the proc inode number.  The pid is
already available in the directory name so no information is lost removing it
from the inode number.

So if something in user space cares if we remove the pid from the
/proc inode number it is almost certainly broken.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Eric W. Biederman
444ceed8d1 [PATCH] proc: Factor out an instantiate method from every lookup method
To remove the hard coded proc inode numbers it is necessary to be able to
create the proc inodes during readdir.  The instantiate methods are the subset
of lookup that is needed to accomplish that.

This first step just splits the lookup methods into 2 functions.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Eric W. Biederman
801199ce80 [PATCH] proc: Make the generation of the self symlink table driven
This patch generalizes the concept of files in /proc that are related to
processes but live in the root directory of /proc.

Ideally this would reuse infrastructure from the rest of the process specific
parts of proc but unfortunately security_task_to_inode must not be called on
files that are not strictly per process.  security_task_to_inode really needs
to be reexamined as the security label can change in important places that we
are not currently catching, but I'm not certain that simplifies this problem.

By at least matching the structure of the rest of proc we get more idiom reuse
and it becomes easier to spot problems in the way things are put together.

Later things like /proc/mounts are likely to be moved into proc_base as well.
If union mounts are ever supported we may be able to make /proc a union mount,
and properly split it into 2 filesystems.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Serge E. Hallyn
96b644bdec [PATCH] namespaces: utsname: use init_utsname when appropriate
In some places, particularly drivers and __init code, the init utsns is the
appropriate one to use.  This patch replaces those with the init_utsname
helper.

Changes: Removed several uses of init_utsname().  Hope I picked all the
	right ones in net/ipv4/ipconfig.c.  These are now changed to
	utsname() (the per-process namespace utsname) in the previous
	patch (2/7)

[akpm@osdl.org: CIFS fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:21 -07:00
Serge E. Hallyn
e9ff3990f0 [PATCH] namespaces: utsname: switch to using uts namespaces
Replace references to system_utsname with the per-process uts namespace
where appropriate.  This includes things like uname.

Changes: Per Eric Biederman's comments, use the per-process uts namespace
	for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c

[jdike@addtoit.com: UML fix]
[clg@fr.ibm.com: cleanup]
[akpm@osdl.org: build fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:21 -07:00
Serge E. Hallyn
1651e14e28 [PATCH] namespaces: incorporate fs namespace into nsproxy
This moves the mount namespace into the nsproxy.  The mount namespace count
now refers to the number of nsproxies pointing to it, rather than the number of
tasks.  As a result, the unshare_namespace() function in kernel/fork.c no
longer checks whether it is being shared.

Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
Peter Zijlstra
12fd352038 [PATCH] nfsd: lockdep annotation
Seen while doing a kernel make modules_install install over an NFS mount:

  =============================================
  [ INFO: possible recursive locking detected ]
  ---------------------------------------------
  nfsd/9550 is trying to acquire lock:
   (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f

  but task is already holding lock:
   (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f

  other info that might help us debug this:
  2 locks held by nfsd/9550:
   #0:  (hash_sem){..--}, at: [<cc895223>] exp_readlock+0xd/0xf [nfsd]
   #1:  (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f

  stack backtrace:
   [<c0103508>] show_trace_log_lvl+0x58/0x152
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa57>] __lock_acquire+0x77a/0x9a3
   [<c012af4a>] lock_acquire+0x60/0x80
   [<c034c6c2>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034c845>] mutex_lock+0x1c/0x1f
   [<c0162edc>] vfs_unlink+0x34/0x8a
   [<cc891d98>] nfsd_unlink+0x18f/0x1e2 [nfsd]
   [<cc89884f>] nfsd3_proc_remove+0x95/0xa2 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033e84d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb
  DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
  Leftover inexact backtrace:
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa57>] __lock_acquire+0x77a/0x9a3
   [<c012af4a>] lock_acquire+0x60/0x80
   [<c034c6c2>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034c845>] mutex_lock+0x1c/0x1f
   [<c0162edc>] vfs_unlink+0x34/0x8a
   [<cc891d98>] nfsd_unlink+0x18f/0x1e2 [nfsd]
   [<cc89884f>] nfsd3_proc_remove+0x95/0xa2 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033e84d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb

  =============================================
  [ INFO: possible recursive locking detected ]
  ---------------------------------------------
  nfsd/9580 is trying to acquire lock:
   (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f

  but task is already holding lock:
   (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f

  other info that might help us debug this:
  2 locks held by nfsd/9580:
   #0:  (hash_sem){..--}, at: [<cc89522b>] exp_readlock+0xd/0xf [nfsd]
   #1:  (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f

  stack backtrace:
   [<c0103508>] show_trace_log_lvl+0x58/0x152
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa63>] __lock_acquire+0x77a/0x9a3
   [<c012af56>] lock_acquire+0x60/0x80
   [<c034ca9a>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034cc1d>] mutex_lock+0x1c/0x1f
   [<cc892ad1>] nfsd_setattr+0x2c8/0x499 [nfsd]
   [<cc893ede>] nfsd_create_v3+0x31b/0x4ac [nfsd]
   [<cc8984a1>] nfsd3_proc_create+0x128/0x138 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033ec1d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb
  DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
  Leftover inexact backtrace:
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa63>] __lock_acquire+0x77a/0x9a3
   [<c012af56>] lock_acquire+0x60/0x80
   [<c034ca9a>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034cc1d>] mutex_lock+0x1c/0x1f
   [<cc892ad1>] nfsd_setattr+0x2c8/0x499 [nfsd]
   [<cc893ede>] nfsd_create_v3+0x31b/0x4ac [nfsd]
   [<cc8984a1>] nfsd3_proc_create+0x128/0x138 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033ec1d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Neil Brown <neilb@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
Greg Banks
eed2965af1 [PATCH] knfsd: allow admin to set nthreads per node
Add /proc/fs/nfsd/pool_threads which allows the sysadmin (or a userspace
daemon) to read and change the number of nfsd threads in each pool.  The
format is a list of space-separated integers, one per pool.
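
The format described above (space-separated integers, one per pool) can be parsed along these lines.  This is only a user-space sketch of the text format; the function name, the "max" limit and the error convention are assumptions, not the nfsd implementation.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse one line of space-separated non-negative integers, one per pool.
 * Returns how many values were stored in out[], or -1 if the line is
 * malformed or holds more than max values.
 */
static int demo_parse_pool_threads(const char *line, int *out, int max)
{
    int n = 0;

    for (;;) {
        char *end;
        long v;

        while (*line == ' ')
            line++;
        if (*line == '\n' || *line == '\0')
            return n;               /* end of the list */
        errno = 0;
        v = strtol(line, &end, 10);
        if (end == line || errno != 0 || v < 0 || v > INT_MAX)
            return -1;              /* not a usable number */
        if (n == max)
            return -1;              /* more pools than the caller allows */
        out[n++] = (int)v;
        line = end;
    }
}
```

For example, the line "4 4 8" yields three per-pool thread counts.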

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
Greg Banks
eec09661dc [PATCH] knfsd: use svc_set_num_threads to manage threads in knfsd
Replace the existing list of all nfsd threads with new code using
svc_create_pooled().

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
Greg Banks
9a24ab5749 [PATCH] knfsd: add svc_get
add svc_get() for those occasions when we need to temporarily bump up
svc_serv->sv_nrthreads as a pseudo refcount.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:19 -07:00
NeilBrown
4a3ae42dc3 [PATCH] knfsd: Correctly handle error condition from lockd_up
If lockd_up fails - what should we expect?  Do we have to later call
lockd_down?

Well the nfs client thinks "no", the nfs server thinks "yes".  lockd thinks
"yes".

The only answer that really makes sense is "no" !!

So:
  Make lockd_up only increment  nlmsvc_users on success.
  Make nfsd handle errors from lockd_up properly.
  Make sure lockd_up(0) never fails when lockd is running
    so that the 'reclaimer' call to lockd_up doesn't need to
    be error checked.
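
The rules listed above can be modelled in user space roughly as follows.  This is a toy sketch, not the lockd code: demo_users stands in for nlmsvc_users, and the start_ok argument is a made-up way to simulate whether starting the daemon succeeds.

```c
/* Toy model of the "count only on success" rule; all names are hypothetical. */
static int demo_users;      /* stands in for nlmsvc_users */
static int demo_running;    /* non-zero while the daemon is up */

/* Take a reference; start_ok simulates whether a needed start succeeds. */
static int demo_up(int start_ok)
{
    if (!demo_running) {
        if (!start_ok)
            return -1;      /* failure: count untouched, no demo_down() owed */
        demo_running = 1;
    }
    demo_users++;           /* only reached on success */
    return 0;
}

/* Drop a reference; the last user stops the daemon. */
static void demo_down(void)
{
    if (demo_users > 0 && --demo_users == 0)
        demo_running = 0;
}
```

Because a failed demo_up() leaves the count alone, the caller's answer to "do we have to call down later?" is "no", and a call made while the daemon is already running cannot fail.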

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown
7dcf91ec66 [PATCH] knfsd: Move makesock failed warning into make_socks.
Thus it is printed for any path that leads to failure (make_socks is called
from two places).

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown
3dfb421053 [PATCH] knfsd: Check return value of lockd_up in write_ports
We should be checking the return value of lockd_up when adding a new socket to
nfsd.  So move the lockd_up before the svc_addsock and check the return value.

The move is because lockd_down is easy, but there is no easy way to remove a
recently added socket.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown
6fb2b47fa1 [PATCH] knfsd: Drop 'serv' option to svc_recv and svc_process
It isn't needed as it is available in rqstp->rq_server, and dropping it allows
some local vars to be dropped.

[akpm@osdl.org: build fix]
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
Josh Triplett
896440d560 [PATCH] nfsd: add lock annotations to e_start and e_stop
e_start acquires svc_export_cache.hash_lock, and e_stop releases it.  Add
lock annotations to these two functions so that sparse can check callers
for lock pairing, and so that sparse will not complain about these
functions since they intentionally use locks in this manner.

Signed-off-by: Josh Triplett <josh@freedesktop.org>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
Greg Banks
bc6f02e516 [PATCH] knfsd: Use SEQ_START_TOKEN instead of hardcoded magic (void*)1
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown
b41b66d63c [PATCH] knfsd: allow sockets to be passed to nfsd via 'portlist'
Userspace should create and bind a socket (but not connect it) and write the
'fd' to portlist.  This will cause the nfs server to listen on that socket.

To close a socket, the name of the socket - as read from 'portlist' - can be
written to 'portlist' with a preceding '-'.
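
A minimal sketch of the two strings described above, built in user space.  The helper names are hypothetical, and whether a trailing newline is wanted is not stated here, so none is added; actually writing the result to the portlist file is left to the caller.

```c
#include <stddef.h>
#include <stdio.h>

/* Text that hands an already bound socket to nfsd: the fd number. */
static int demo_portlist_add(char *buf, size_t len, int fd)
{
    int n = snprintf(buf, len, "%d", fd);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Text that closes a socket: its name as read from portlist, with '-' in front. */
static int demo_portlist_close(char *buf, size_t len, const char *name)
{
    int n = snprintf(buf, len, "-%s", name);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Both helpers return the string length, or -1 if the buffer is too small.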

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown
80212d59e3 [PATCH] knfsd: define new nfsdfs file: portlist - contains list of ports
This file will list all ports that nfsd has open.
Default when TCP enabled will be
   ipv4 udp 0.0.0.0 2049
   ipv4 tcp 0.0.0.0 2049

Later, the list of ports will be settable.

'portlist' chosen rather than 'ports', to avoid unnecessary confusion with
non-mainline patches which created 'ports' with different semantics.

[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown
02a375f0ac [PATCH] knfsd: separate out some parts of nfsd_svc, which start nfs servers
Separate out the code for creating a new service, and for creating initial
sockets.

Some of these new functions will have multiple callers soon.

[akpm@osdl.org: cleanups]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown
6658d3a7bb [PATCH] knfsd: remove nfsd_versbits as intermediate storage for desired versions
We have an array 'nfsd_version' which lists the available versions of nfsd,
and 'nfsd_versions' (poor choice there :-() which lists the currently active
versions.

Then we have a bitmap - nfsd_versbits which says which versions are wanted.
The bits in this bitset cause content to be copied from nfsd_version to
nfsd_versions when nfsd starts.

This patch removes nfsd_versbits and moves information directly from
nfsd_version to nfsd_versions when requests for version changes arrive.

Note that this doesn't make it possible to change versions while the server is
running.  This is because serv->sv_xdrsize is calculated when a service is
created, and used when threads are created, and xdrsize depends on the active
versions.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:17 -07:00
NeilBrown
24e36663c3 [PATCH] knfsd: be more selective in which sockets lockd listens on
Currently lockd listens on UDP always, and TCP if CONFIG_NFSD_TCP is set.

However as lockd performs services for the client as well, this is a problem.
If CONFIG_NFSD_TCP is not set, and a tcp mount is used, the server will not be
able to call back to lockd.

So:
 - add an option to lockd_up saying which protocol is needed
 - Always open sockets for which an explicit port was given, otherwise
   only open a socket of the type required
 - Change nfsd to do one lockd_up per socket rather than one per thread.

This
 - removes the dependency on CONFIG_NFSD_TCP
 - means that lockd may open sockets other than at startup
 - means that lockd will *not* listen on UDP if the only
   mounts are TCP mounts (and nfsd hasn't started).

The latter is the only one that concerns me at all - I don't know if this
might be a problem with some servers.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:17 -07:00
NeilBrown
bc591ccff2 [PATCH] knfsd: add a callback for when last rpc thread finishes
nfsd has some cleanup that it wants to do when the last thread exits, and
there will shortly be some more.  So collect this all into one place and
define a callback for an rpc service to call when the service is about to be
destroyed.

[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:17 -07:00
Greg Banks
b06c7b4333 [PATCH] knfsd: remove an unused variable from e_show()
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:17 -07:00
Greg Banks
3e3b480096 [PATCH] knfsd: add some missing newlines in printks
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:17 -07:00
Eric W. Biederman
43fa1adb93 [PATCH] file: Add locking to f_getown
This has been needed for a long time, but now with the advent of a
reference counted struct pid there are real consequences for getting this
wrong.

Someone, I think it was Oleg Nesterov, pointed out that this construct was
missing locking, when I introduced struct pid.  After taking time to review
the locking construct already present I figured out which lock needs to be
taken.  The other paths that access f_owner.pid take either the f_owner
read or the write lock.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:15 -07:00
Sukadev Bhattiprolu
3fbc964864 [PATCH] Define struct pspace
Define a per-container pid space object.  And create one instance of this
object, init_pspace, to define the entire pid space.  Subsequent patches
will provide/use interfaces to create/destroy pid spaces.

It's a subset/rework of Eric Biederman's patch
http://lkml.org/lkml/2006/2/6/285 .

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:15 -07:00
Andreas Mohr
ed97bd37ef [PATCH] fs/inode.c tweaks
Only touch inode's i_mtime and i_ctime to make them equal to "now" in case
they aren't yet (don't just update timestamp unconditionally).  Uninline
the hash function to save 259 Bytes.
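
The conditional update described above might look like this in a simplified user-space model.  The structures are stand-ins rather than the kernel's struct inode, and the dirty flag is an assumption added to show why skipping redundant stores can matter.

```c
/* Hypothetical, simplified timestamp and inode; not the kernel structures. */
struct demo_time {
    long sec;
    long nsec;
};

struct demo_inode {
    struct demo_time mtime;
    struct demo_time ctime;
    int dirty;              /* assumption: set only when a field really changed */
};

static int demo_time_equal(struct demo_time a, struct demo_time b)
{
    return a.sec == b.sec && a.nsec == b.nsec;
}

/* Store now only in the timestamps that differ from it. */
static void demo_update_time(struct demo_inode *inode, struct demo_time now)
{
    if (!demo_time_equal(inode->mtime, now)) {
        inode->mtime = now;
        inode->dirty = 1;
    }
    if (!demo_time_equal(inode->ctime, now)) {
        inode->ctime = now;
        inode->dirty = 1;
    }
}
```

When both timestamps already equal "now", the function writes nothing, so the inode's memory is only read.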

This tiny inode change which may improve cache behaviour also shaves off 8
Bytes from file_update_time() on i386.

Included a tiny codestyle cleanup, too.

Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:14 -07:00
Alexey Dobriyan
07acaf28d2 [PATCH] Remove NULL check in register_nls()
Everybody passes valid pointer there.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:14 -07:00
Eric W. Biederman
609d7fa956 [PATCH] file: modify struct fown_struct to use a struct pid
File handles can be requested to send sigio and sigurg to processes.  By
tracking the destination processes using struct pid instead of pid_t we make
the interface safe from all potential pid wrap around problems.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:14 -07:00
Eric W. Biederman
f6c7a1f34e [PATCH] proc: give the root directory a task
Helper functions in base.c like proc_pident_readdir and proc_pident_lookup
assume the directories have an associated task, and cannot currently be used
on the /proc root directory because it does not have such a task.

This small change allows for base.c to be simplified and later when multiple
pid spaces are introduced it makes getting the needed context information
trivial.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:13 -07:00
Eric W. Biederman
20cdc894c4 [PATCH] proc: modify proc_pident_lookup to be completely table driven
Currently proc_pident_lookup gets the names and types from a table and then
has a huge switch statement to get the inode and file operations it needs.
That is silly and is becoming increasingly hard to maintain so I just put all
of the information in the table.
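
To show the shape of the change (a user-space sketch with made-up names, not the base.c code): each table row carries its own handler, so the lookup needs no switch statement that has to be kept in sync with the table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handlers standing in for per-file inode/file operations. */
static int demo_show_stat(void)   { return 1; }
static int demo_show_status(void) { return 2; }

/* One row holds everything the lookup needs for that name. */
struct demo_pident {
    const char *name;
    unsigned int mode;
    int (*show)(void);
};

static const struct demo_pident demo_table[] = {
    { "stat",   0444, demo_show_stat },
    { "status", 0444, demo_show_status },
};

/* Find the row for name; the caller uses row->show, no switch required. */
static const struct demo_pident *demo_pident_lookup(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
        if (strcmp(demo_table[i].name, name) == 0)
            return &demo_table[i];
    }
    return NULL;
}
```

Adding a new file then means adding one row to the table and nothing else.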

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:13 -07:00
Eric W. Biederman
28a6d67179 [PATCH] proc: reorder the functions in base.c
There were enough changes in my last round of cleaning up proc I had to break
up the patch series into smaller chunks, and my last chunk never got resent.

This patchset gives proc dynamic inode numbers (the static inode numbers were
a pain to maintain and prevent all kinds of things), and removes the horrible
switch statements that had to be kept in sync with everything else.  Being
fully table driver takes us 90% of the way of being able to register new
process specific attributes in proc.

This patch:

Group the functions by what they implement instead of by type of operation.
As it existed base.c was quickly approaching the point where it could not be
followed.

No functionality or code changes aside from adding/removing forward
declarations are implemented in this patch.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:13 -07:00
Eric W. Biederman
0804ef4b0d [PATCH] proc: readdir race fix (take 3)
The problem: An opendir, readdir, closedir sequence can fail to report
process ids that are continually in use throughout the sequence of system
calls.  For this race to trigger the process that proc_pid_readdir stops at
must exit before readdir is called again.

This can cause ps to fail to report processes, and it is in violation of
posix guarantees and normal application expectations with respect to
readdir.

Currently there is no way to work around this problem in user space short
of providing a gargantuan buffer to user space so the directory read all
happens in one system call.

This patch implements the normal directory semantics for proc, which
guarantee that a directory entry that is neither created nor destroyed
while reading the directory will be returned.  For directory entries that
are either created or destroyed during the readdir you may or may not see
them.  Furthermore you may seek to a directory offset you have previously
seen.

These are the guarantees that ext[23] provides and that posix requires, and
more importantly that user space expects.  Plus it is a simple semantic on
which to implement reliable service.  It is just a matter of calling readdir
a second time if you are wondering if something new has shown up.

These better semantics are implemented by scanning through the pids in
numerical order and by making the file offset a pid plus a fixed offset.

The pid scan happens on the pid bitmap, which when you look at it is
remarkably efficient for a brute force algorithm.  A typical cache line is
64 bytes and thus covers space for 64*8 == 512 (0x200) pids, so there are
only 64 (0x40) cache lines for the entire 32K pid space.  A typical system
will have 100 pids or more so this is actually fewer cache lines than we
would have to look at to scan a linked list, and the worst case of having to
scan the entire pid bitmap is pretty reasonable.
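
The arithmetic and the scan can be sketched in user space as below.  A 64 byte cache line holds 64*8 = 512 bits (0x200 in hex), so a 32768 entry bitmap spans 32768/512 = 64 cache lines (0x40 in hex).  The names, the linear bit-by-bit scan and the DEMO_OFFSET value are illustrative assumptions, not the kernel's pidmap code.

```c
#include <limits.h>

#define DEMO_PID_MAX        32768               /* the 32K pid space */
#define DEMO_CACHE_LINE     64                  /* bytes, "typical" */
#define DEMO_BITS_PER_LINE  (DEMO_CACHE_LINE * CHAR_BIT)
#define DEMO_LINES          (DEMO_PID_MAX / DEMO_BITS_PER_LINE)
#define DEMO_OFFSET         2                   /* hypothetical fixed offset */

static unsigned char demo_pidmap[DEMO_PID_MAX / CHAR_BIT];

static void demo_set_pid(int pid)
{
    demo_pidmap[pid / CHAR_BIT] |= (unsigned char)(1u << (pid % CHAR_BIT));
}

static void demo_clear_pid(int pid)
{
    demo_pidmap[pid / CHAR_BIT] &= (unsigned char)~(1u << (pid % CHAR_BIT));
}

/* Smallest in-use pid >= start, or -1 if there is none. */
static int demo_next_pid(int start)
{
    int pid;

    for (pid = start < 0 ? 0 : start; pid < DEMO_PID_MAX; pid++) {
        if (demo_pidmap[pid / CHAR_BIT] & (1u << (pid % CHAR_BIT)))
            return pid;
    }
    return -1;
}

/* The file offset is a pid plus a fixed offset, so resuming is a plain scan. */
static int demo_readdir_next(long pos)
{
    return demo_next_pid((int)(pos - DEMO_OFFSET));
}
```

Because the position encodes a pid number rather than a pointer to a task, a reader that stopped at a pid which has since exited simply continues with the next pid in numerical order.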

If we need something more efficient we can go to a more efficient data
structure for indexing the pids, but for now what we have should be
sufficient.

In addition this takes no additional locks and is actually less code than
what we are doing now.

Also another very subtle bug in this area has been fixed.  It is possible
to catch a task in the middle of de_thread where a thread is assuming the
identity of its thread group leader.  This patch carefully handles that case
so if we hit it we don't fail to return the pid that is undergoing the
de_thread dance.

Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> for
providing the first fix, pointing this out and working on it.

[oleg@tv-sign.ru: fix it]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:12 -07:00
Dave Kleikamp
63f83c9fcf JFS: White space cleanup
Removed trailing spaces & tabs, and spaces preceding tabs.
Also a couple very minor comment cleanups.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
(cherry picked from f74156539964d7b3d5164fdf8848e6a682f75b97 commit)
2006-10-02 09:55:27 -05:00
Akinobu Mita
087387f90f [PATCH] JFS: return correct error when i-node allocation failed
I have seen confusing behavior on JFS when I injected many intentional
slab allocation errors. The cp command failed with no disk space error
with enough disk space.

This patch makes:

- change the return value in case slab allocation failures happen
  from -ENOSPC to -ENOMEM

- ialloc() return error code so that the caller can know the reason
  of failures
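
The two changes above can be pictured with a small user-space model.  Everything here is hypothetical (the demo_ialloc() signature, the fault argument used to simulate injected failures); it only shows an allocator reporting why it failed so the caller can pass the reason on.

```c
#include <errno.h>
#include <stdlib.h>

struct demo_inode {
    int ino;
};

/* Simulated failure modes, in the spirit of fault injection. */
enum demo_fault { DEMO_OK, DEMO_NO_MEMORY, DEMO_NO_SPACE };

/*
 * Allocate an inode.  Returns 0 and sets *out on success, or a negative
 * errno that tells the caller why the allocation failed.
 */
static int demo_ialloc(enum demo_fault fault, struct demo_inode **out)
{
    struct demo_inode *inode;

    *out = NULL;
    if (fault == DEMO_NO_MEMORY)
        return -ENOMEM;     /* memory allocation failed: not a disk problem */
    if (fault == DEMO_NO_SPACE)
        return -ENOSPC;     /* the disk really has no free inodes */

    inode = malloc(sizeof(*inode));
    if (inode == NULL)
        return -ENOMEM;
    inode->ino = 1;
    *out = inode;
    return 0;
}
```

With a bare NULL return the caller could only guess at the cause; with the error code a cp on a disk with free space no longer reports "no space".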

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
(cherry picked from 2b46f77976f798f3fe800809a1d0ed38763c71c8 commit)
2006-10-02 09:51:01 -05:00
Tony Breeds
2a6968a978 JFS: Remove shadow variable from fs/jfs/jfs_txnmgr.c:xtLog()
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
(cherry picked from bdc3d9e5af7d9c105be734dd7b5c3f1d9425a15a commit)
2006-10-02 09:50:51 -05:00
Steven Whitehouse
d00223f169 [GFS2] Fix code style/indent in ops_file.c
Fix a couple of minor issues.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-02 10:28:05 -04:00
Andrew Morton
930cc237d6 [GFS2] streamline-generic_file_-interfaces-and-filemap gfs fix
Fix GFS for streamline-generic_file_-interfaces-and-filemap.patch

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-02 09:02:54 -04:00
Badari Pulavarty
9c9eb21eee [GFS2] Remove readv/writev methods and use aio_read/aio_write instead (gfs bits)
This patch removes readv() and writev() methods and replaces them with
aio_read()/aio_write() methods.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-02 09:02:12 -04:00
Steven Whitehouse
59458f40e2 Merge branch 'master' into gfs2
2006-10-02 08:45:08 -04:00
Andi Kleen
d025c9db7f [PATCH] Support piping into commands in /proc/sys/kernel/core_pattern
Using the infrastructure created in previous patches implement support to
pipe core dumps into programs.

This is done by overloading the existing core_pattern sysctl
with a new syntax:

|program

When the first character of the pattern is a '|' the kernel will instead
treat the rest of the pattern as a command to run.  The core dump will be
written to the standard input of that program instead of to a file.
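
As a rough illustration of the syntax check described above (a sketch,
not the kernel code; toy_core_pipe_command is a hypothetical name), the
decision can be reduced to looking at the first character of the pattern:

```c
#include <stddef.h>

/*
 * Return the command part of a core pattern that starts with '|',
 * or NULL if the pattern names an ordinary core file.
 */
const char *toy_core_pipe_command(const char *pattern)
{
	if (pattern == NULL || pattern[0] != '|')
		return NULL;
	return pattern + 1;	/* everything after the '|' is the command */
}
```

With this helper, "|program" yields "program", while a plain pattern such
as "core" yields NULL and would be handled as a file name.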

This is useful for having automatic core dump analysis without filling up
disks.  The program can do some simple analysis and save only a summary of
the core dump.

The core dump process will run with the privileges and in the name space of
the process that caused the core dump.

I also increased the core pattern size to 128 bytes so that longer command
lines fit.

Most of the changes come from allowing core dumps without seeks.  They are
fairly straightforward though.

One small incompatibility is that if someone had a core pattern previously
that started with '|' they will suddenly get new behaviour.  I think that's
unlikely to be a real problem though.

Additional background:

> Very nice, do you happen to have a program that can accept this kind of
> input for crash dumps?  I'm guessing that the embedded people will
> really want this functionality.

I had a cheesy demo/prototype.  Basically it wrote the dump to a file again,
ran gdb on it to get a backtrace and wrote the summary to a shared directory.
Then there was a simple CGI script to generate a "top 10" crashes HTML
listing.

Unfortunately this still had the disadvantage of needing full disk space for a
dump except for deleting it afterwards (in fact it was worse because over the
pipe holes didn't work so if you have a holey address map it would require
more space).

Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as
cores (at least it worked with zsh's =(cat core) syntax), so it would be
likely possible to do it without temporary space with a simple wrapper that
calls it in the right way.  I ran out of time before doing that though.

The demo prototype scripts weren't very good.  If there is really interest I
can dig them out (they are currently on a laptop disk on the desk with the
laptop itself being in service), but I would recommend rewriting them for any
serious application of this and fixing the disk space problem.

Also to be really useful it should probably find a way to automatically fetch
the debuginfos (I cheated and just installed them in advance).  If nobody else
does it I can probably do the rewrite myself again at some point.

My hope at some point was that desktops would support it in their builtin
crash reporters, but at least the KDE people I talked to seemed to be happy
with their user space only solution.

Alan sayeth:

  I don't believe that piping as such is necessarily the right model, but
  the ability to intercept and process core dumps from user space is asked
  for by many enterprise users as well.  They want to know about, capture,
  analyse and process core dumps, often centrally and in automated form.

[akpm@osdl.org: loff_t != unsigned long]
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:33 -07:00
Andi Kleen
d6cbd281d1 [PATCH] Some cleanup in the pipe code
Split the big and hard to read do_pipe function into smaller pieces.

This creates new create_write_pipe/free_write_pipe/create_read_pipe
functions.  These functions are made global so that they can be used by
other parts of the kernel.

The resulting code is more generic and easier to read and has cleaner error
handling and fewer gotos.

[akpm@osdl.org: cleanup]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:33 -07:00
Dave Hansen
ce71ec3684 [PATCH] r/o bind mounts: monitor zeroing of i_nlink
Some filesystems, instead of simply decrementing i_nlink, simply zero it
during an unlink operation.  We need to catch these in addition to the
decrement operations.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Mark Fasheh
17ff785691 [PATCH] r/o bind mounts: clean up OCFS2 nlink handling
OCFS2 does some operations on i_nlink, then reverts them if some of its
operations fail to complete.  This does not fit in well with the
drop_nlink() logic where we expect i_nlink to stay at zero once it gets
there.

So, delay all of the nlink operations until we're sure that the operations
have completed.  Also, introduce a small helper to check whether an inode
has proper "unlinkable" i_nlink counts no matter whether it is a directory
or regular inode.

This patch is broken out from the others because it does contain some
logical changes.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Dave Hansen
d8c76e6f45 [PATCH] r/o bind mount prepwork: inc_nlink() helper
This is mostly included for parity with dec_nlink(), where we will have some
more hooks.  This one should stay pretty darn straightforward for now.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Dave Hansen
9a53c3a783 [PATCH] r/o bind mounts: unlink: monitor i_nlink
When a filesystem decrements i_nlink to zero, it means that a write must be
performed in order to drop the inode from the filesystem.

We're shortly going to have to keep filesystems from being remounted r/o
between the time of this i_nlink decrement and the time that write occurs.

So, add a little helper function to do the decrements.  We'll tie into it in a
bit to note when i_nlink hits zero.
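
A toy model of such a helper, written as a sketch under assumptions: the
structure, the nlink_hit_zero field and the toy_drop_nlink name are
illustrative and do not reproduce the kernel's definitions.

```c
/* Hypothetical, cut-down inode with only the fields this sketch needs. */
struct toy_nlink_inode {
	unsigned int i_nlink;	/* link count */
	int nlink_hit_zero;	/* illustrative hook: set once i_nlink reaches 0 */
};

/*
 * Decrement the link count in one place, so that a later patch has a
 * single spot to tie into when the count reaches zero.
 */
void toy_drop_nlink(struct toy_nlink_inode *inode)
{
	inode->i_nlink--;
	if (inode->i_nlink == 0)
		inode->nlink_hit_zero = 1;
}
```

Funnelling every decrement through one function is what makes the
"i_nlink hits zero" event observable without auditing each filesystem.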

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Dave Hansen
aab520e2f6 [PATCH] r/o bind mount prepwork: move open_namei()'s vfs_create()
The code around vfs_create() in open_namei() is getting a bit too complex.
Right now, there is at least the reference count on the dentry, and the
i_mutex to worry about.  Soon, we'll also have mnt_writecount.

So, break the vfs_create() call out of open_namei(), and into a helper
function.  This duplicates the call to may_open(), but that isn't such a bad
thing since the arguments (acc_mode and flag) were being heavily massaged
anyway.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Dave Hansen
6902d925d5 [PATCH] r/o bind mounts: prepare for write access checks: collapse if()
We're shortly going to be adding a bunch more permission checks in these
functions.  That requires adding either a bunch of new if() conditions, or
some gotos.  This patch collapses existing if()s and uses gotos instead to
prepare for the upcoming changes.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Jay Lan
8f0ab51479 [PATCH] csa: convert CONFIG tag for extended accounting routines
There were a few accounting data/macros that are used in CSA but are #ifdef'ed
inside CONFIG_BSD_PROCESS_ACCT.  This patch is to change those ifdef's from
CONFIG_BSD_PROCESS_ACCT to CONFIG_TASK_XACCT.  A few defines are moved from
kernel/acct.c and include/linux/acct.h to kernel/tsacct.c and
include/linux/tsacct_kern.h.

Signed-off-by: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:29 -07:00
Badari Pulavarty
eed4e51fb6 [PATCH] Add vector AIO support
This work was initially done by Zach Brown to add support for vectored aio.
These are the core changes for AIO to support
IOCB_CMD_PREADV/IOCB_CMD_PWRITEV.

[akpm@osdl.org: huge build fix]
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:29 -07:00
Badari Pulavarty
543ade1fc9 [PATCH] Streamline generic_file_* interfaces and filemap cleanups
This patch cleans up generic_file_*_read/write() interfaces.  Christoph
Hellwig gave me the idea for these cleanups.

In a nutshell, all filesystems should set .aio_read/.aio_write methods and use
do_sync_read()/do_sync_write() as their .read/.write methods.  This allows us
to clean up all variants of generic_file_* routines.

Final available interfaces:

generic_file_aio_read() - read handler
generic_file_aio_write() - write handler
generic_file_aio_write_nolock() - no lock write handler

__generic_file_aio_write_nolock() - internal worker routine

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:28 -07:00
Badari Pulavarty
ee0b3e671b [PATCH] Remove readv/writev methods and use aio_read/aio_write instead
This patch removes readv() and writev() methods and replaces them with
aio_read()/aio_write() methods.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:28 -07:00
Badari Pulavarty
027445c372 [PATCH] Vectorize aio_read/aio_write fileop methods
This patch vectorizes aio_read() and aio_write() methods to prepare for
collapsing all aio & vectored operations into one interface - which is
aio_read()/aio_write().

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Michael Holzheu <HOLZHEU@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:28 -07:00
Jeff Mahoney
9ea0f9499d [PATCH] reiserfs: eliminate minimum window size for bitmap searching
When a file system becomes fragmented (using MythTV, for example), the
bigalloc window searching ends up causing huge performance problems.  In a
file system presented by a user experiencing this bug, the file system was
90% free, but no 32-block free windows existed on the entire file system.
This causes the allocator to scan the entire file system for each 128k
write before backing down to searching for individual blocks.

In the end, finding a contiguous window for all the blocks in a write is an
advantageous special case, but one that can be found naturally when such a
window exists anyway.

This patch removes the bigalloc window searching, and has been proven to
fix the test case described above.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:28 -07:00
Jeff Mahoney
5a2618e6a9 [PATCH] reiserfs: use generic_file_open for open() checks
The other common disk-based file systems (I checked ext[23], xfs, jfs)
check to ensure that opens of files > 2 GB fail unless O_LARGEFILE is
specified.  They check via generic_file_open or their own open routine.

ReiserFS doesn't have an f_op->open defined, and as such, it's possible to
open files > 2 GB without O_LARGEFILE.

This patch adds the f_op->open member to conform with the expected
behavior.
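
The check being added can be sketched as follows.  This is a userspace
approximation: the TOY_* constants and the function name are made up for
illustration, and the error code returned is an assumption rather than a
statement about what the kernel helper returns.

```c
#include <errno.h>

#define TOY_O_LARGEFILE	0x1			/* illustrative flag bit */
#define TOY_MAX_NON_LFS	((1LL << 31) - 1)	/* largest size without large file support */

/*
 * Refuse to open a file larger than 2 GB - 1 unless the caller asked
 * for large file support.
 */
int toy_generic_file_open(int flags, long long size)
{
	if (!(flags & TOY_O_LARGEFILE) && size > TOY_MAX_NON_LFS)
		return -EFBIG;	/* assumed error code for this sketch */
	return 0;
}
```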

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:28 -07:00
Jeff Mahoney
5065227b46 [PATCH] reiserfs: on-demand bitmap loading
This is the patch the three previous ones have been leading up to.

It changes the behavior of ReiserFS from loading and caching all the bitmaps
as special, to treating the bitmaps like any other bit of metadata and just
letting the system-wide caches figure out what to hang on to.

Buffer heads are allocated on the fly, so there is no need to retain pointers
to all of them.  The caching of the metadata occurs when the data is read and
updated, and is considered invalid and uncached until then.

I needed to remove the vs-4040 check for performing a duplicate operation on a
particular bit.  The reason is that while the other sites for working with
bitmaps are allowed to schedule, is_reusable() is called from do_balance(),
which will panic if a schedule occurs in certain places.

The benefit of on-demand bitmaps clearly outweighs a sanity check that depends
on a compile-time option that is discouraged.

[akpm@osdl.org: warning fix]
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:28 -07:00
Jeff Mahoney
6f01046b35 [PATCH] reiserfs: reorganize bitmap loading functions
This patch moves the bitmap loading code from super.c to bitmap.c

The code is also restructured somewhat.  The only difference between new
format bitmaps and old format bitmaps is where they are.  That's a two liner
before loading the block to use the correct one.  There's no need for an
entirely separate code path.

The load path is generally the same, with the pattern being to throw out a
bunch of requests and then wait for them, then cache the metadata from the
contents.

Again, like the previous patches, the purpose is to set up for later ones.

Update: There was a bug in the previously posted version of this that resulted
in corruption.  The problem was that bitmap 0 on new format file systems must
be treated specially, and wasn't.  A stupid bug with an easy fix.

This is hopefully the last fix for the disaster that is the reiserfs bitmap
patch set.

If a bitmap block was full, first_zero_hint would end up at zero since it
would never be changed from its zeroed out value.  This just sets it
beyond the end of the bitmap block.  If any bits are freed, it will be
reset to a valid bit.  When info->free_count = 0, then we already know it's
full.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:27 -07:00
Jeff Mahoney
0b3dc17bc0 [PATCH] reiserfs: clean up bitmap block buffer head references
Similar to the SB_JOURNAL cleanup that was accepted a while ago, this patch
uses a temporary variable for buffer head references from the bitmap info
array.

This makes the code much more readable in some areas.

It also uses proper reference counting, doing a get_bh() after using the
pointer from the array and brelse()'ing it later.  This may seem silly, but a
later patch will replace the simple temporary variables with an actual read,
so the reference freeing will be used then.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:27 -07:00
Jeff Mahoney
e1fabd3ccf [PATCH] reiserfs: fix is_reusable bitmap check to not traverse the bitmap info array
There is a check in is_reusable to determine if a particular block is a bitmap
block.  It verifies this by going through the array of bitmap block buffer
heads and comparing the block number to each one.

Bitmap blocks are at defined locations on the disk in both old and current
formats.  Simply checking against the known good values is enough.

This is a trivial optimization for a non-production codepath, but this is the
first in a series of patches that will ultimately remove the buffer heads from
that array.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:27 -07:00
Petr Vandrovec
54f67f631d [PATCH] Move ncpfs 32bit compat ioctl to ncpfs
The ncp specific compat ioctls are clearly local to one file system, so the
code can better live there.

This version of the patch moves everything into the generic ioctl handler
and uses it for both 32 and 64 bit calls.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:23 -07:00
Josef 'Jeff' Sipek
f5579f8c7d [PATCH] VFS: Use SEEK_{SET, CUR, END} instead of hardcoded values
VFS: Use SEEK_{SET,CUR,END} instead of hardcoded values
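
For illustration, a small userspace seek routine written with the named
constants instead of 0, 1 and 2.  This is a sketch; toy_llseek and its
parameters are hypothetical and not taken from the patch.

```c
#include <errno.h>
#include <stdio.h>	/* SEEK_SET, SEEK_CUR, SEEK_END */

/*
 * Compute a new file position from the current position, the file
 * size, a requested offset and a whence value.
 */
long long toy_llseek(long long pos, long long size, long long offset, int whence)
{
	switch (whence) {
	case SEEK_SET:			/* absolute offset */
		break;
	case SEEK_CUR:			/* relative to the current position */
		offset += pos;
		break;
	case SEEK_END:			/* relative to the end of the file */
		offset += size;
		break;
	default:
		return -EINVAL;
	}
	if (offset < 0)
		return -EINVAL;
	return offset;
}
```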

Signed-off-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:21 -07:00
Alexey Dobriyan
82b0547cfa [PATCH] Create fs/utimes.c
* fs/open.c is getting a bit crowded
* preparation for lutimes(2)

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:19 -07:00
Alexey Dobriyan
52978be636 [PATCH] kmemdup: some users
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:19 -07:00
Richard Knutsson
130c6b9898 [PATCH] fs/partitions: Conversion to generic boolean
Conversion of booleans to: generic-boolean.patch (2006-08-23)

Signed-off-by: Richard Knutsson <ricknu-0@student.ltu.se>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:19 -07:00
Richard Knutsson
4d81715fc5 [PATCH] fs/jfs: Conversion to generic boolean
Conversion of booleans to: generic-boolean.patch (2006-08-23)

Signed-off-by: Richard Knutsson <ricknu-0@student.ltu.se>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:19 -07:00
Richard Knutsson
c49c311150 [PATCH] fs/ntfs: Conversion to generic boolean
Conversion of booleans to: generic-boolean.patch (2006-08-23)

Signed-off-by: Richard Knutsson <ricknu-0@student.ltu.se>
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:19 -07:00
Jens Axboe
0fe2347957 [PATCH] Update axboe@suse.de email address
As people often look for the copyright in files to see who to mail,
update the link to a neutral one.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:34 +02:00
Milan Broz
50be345560 [PATCH] fix creating zero sized bio mempools in low memory system
On very low memory systems, the scale parameter in the init_bio call
is set to zero, which leads to creating a
zero sized mempool.

This patch prevents the pool_entries parameter from becoming zero,
so the created pool has at least 1 entry.

A mempool with 0 entries leads to incorrect behaviour
of mempool_free. (Alloc requests are not woken up
and system stalls in mempool_alloc->ioschedule).
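
The fix amounts to clamping the pool size.  A sketch of that clamp,
assuming (for illustration only) that the size is derived by multiplying
a memory-dependent scale by a base count; the function and parameter
names are hypothetical:

```c
/*
 * Derive the number of mempool entries from a memory-dependent scale,
 * never returning fewer than one entry.
 */
int toy_bio_pool_entries(int scale, int base_entries)
{
	int pool_entries = scale * base_entries;

	if (pool_entries < 1)
		pool_entries = 1;	/* a zero sized pool would stall allocators */
	return pool_entries;
}
```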

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:33 +02:00
Andrew Morton
5e6d12b2c8 [PATCH] CONFIG_BLOCK internal.h cleanups
- forward declare struct superblock
- use inlines, not macros

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:32 +02:00
David Howells
9361401eb7 [PATCH] BLOCK: Make it possible to disable the block layer [try #6]
Make it possible to disable the block layer.  Not all embedded devices require
it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
the block layer to be present.

This patch does the following:

 (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
     support.

 (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
     an item that uses the block layer.  This includes:

     (*) Block I/O tracing.

     (*) Disk partition code.

     (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.

     (*) The SCSI layer.  As far as I can tell, even SCSI chardevs use the
         block layer to do scheduling.  Some drivers that use SCSI facilities -
         such as USB storage - end up disabled indirectly from this.

     (*) Various block-based device drivers, such as IDE and the old CDROM
         drivers.

     (*) MTD blockdev handling and FTL.

     (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
         taking a leaf out of JFFS2's book.

 (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
     linux/elevator.h contingent on CONFIG_BLOCK being set.  sector_div() is,
     however, still used in places, and so is still available.

 (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
     parts of linux/fs.h.

 (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.

 (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.

 (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
     is not enabled.

 (*) fs/no-block.c is created to hold out-of-line stubs and things that are
     required when CONFIG_BLOCK is not set:

     (*) Default blockdev file operations (to give error ENODEV on opening).

 (*) Makes some /proc changes:

     (*) /proc/devices does not list any blockdevs.

     (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.

 (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.

 (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
     given a command other than Q_SYNC or if a special device is specified.

 (*) In init/do_mounts.c, no reference is made to the blockdev routines if
     CONFIG_BLOCK is not defined.  This does not prohibit NFS roots or JFFS2.

 (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
     error ENOSYS by way of cond_syscall if so).

 (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
     CONFIG_BLOCK is not set, since they can't then happen.
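
As one concrete reading of the sys_quotactl() item in the list above, here
is a userspace sketch of the "block layer disabled" behaviour.  The
TOY_Q_SYNC value and the function name are placeholders, and the actual
sync work is left out.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_Q_SYNC 1	/* placeholder for the real Q_SYNC command value */

/*
 * Model of quotactl without a block layer: only a Q_SYNC request with
 * no special device can succeed, everything else reports -ENODEV.
 */
int toy_quotactl_noblock(int cmd, const char *special)
{
	if (cmd != TOY_Q_SYNC || special != NULL)
		return -ENODEV;
	return 0;	/* the sync itself is elided in this sketch */
}
```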

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:31 +02:00
David Howells
d366e40a1c [PATCH] BLOCK: Remove no-longer necessary linux/buffer_head.h inclusions [try #6]
Remove inclusions of linux/buffer_head.h that are no longer necessary due to the
transfer of a number of things out of there.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:31 +02:00
David Howells
4cb50dc2ea [PATCH] BLOCK: Remove no-longer necessary linux/mpage.h inclusions [try #6]
Remove inclusions of linux/mpage.h that are no longer necessary due to the
transfer of generic_writepages().

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:30 +02:00
David Howells
188f83dfe0 [PATCH] BLOCK: Move the msdos device ioctl compat stuff to the msdos driver [try #6]
Move the msdos device ioctl compat stuff from fs/compat_ioctl.c to the msdos
driver so that the msdos header file doesn't need to be included.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:30 +02:00
David Howells
52a700c567 [PATCH] BLOCK: Move the Ext3 device ioctl compat stuff to the Ext3 driver [try #6]
Move the Ext3 device ioctl compat stuff from fs/compat_ioctl.c to the Ext3
driver so that the Ext3 header file doesn't need to be included.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:29 +02:00
David Howells
e322ff07fb [PATCH] BLOCK: Move the Ext2 device ioctl compat stuff to the Ext2 driver [try #6]
Move the Ext2 device ioctl compat stuff from fs/compat_ioctl.c to the Ext2
driver so that the Ext2 header file doesn't need to be included.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:29 +02:00
David Howells
52b499c438 [PATCH] BLOCK: Move the ReiserFS device ioctl compat stuff to the ReiserFS driver [try #6]
Move the ReiserFS device ioctl compat stuff from fs/compat_ioctl.c to the
ReiserFS driver so that the ReiserFS header file doesn't need to be included.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:28 +02:00
David Howells
36695673b0 [PATCH] BLOCK: Move common FS-specific ioctls to linux/fs.h [try #6]
Move common FS-specific ioctls from linux/ext2_fs.h to linux/fs.h as FS_IOC_*
and FS_IOC32_* and have the users of them use those as a base.

Also move the GETFLAGS/SETFLAGS flags to linux/fs.h as FS_*_FL macros, and then
have the other users use them as a base.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:28 +02:00
David Howells
863d5b822c [PATCH] BLOCK: Move the loop device ioctl compat stuff to the loop driver [try #6]
Move the loop device ioctl compat stuff from fs/compat_ioctl.c to the loop
driver so that the loop header file doesn't need to be included.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:27 +02:00
David Howells
b71e8a4ce0 [PATCH] BLOCK: Move __invalidate_device() to block_dev.c [try #6]
Move __invalidate_device() from fs/inode.c to fs/block_dev.c so that it can
more easily be disabled when the block layer is disabled.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:27 +02:00
David Howells
811d736f9e [PATCH] BLOCK: Dissociate generic_writepages() from mpage stuff [try #6]
Dissociate the generic_writepages() function from the mpage stuff, moving its
declaration to linux/mm.h and actually emitting a full implementation into
mm/page-writeback.c.

The implementation is a partial duplicate of mpage_writepages() with all BIO
references removed.

It is used by NFS to do writeback.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:26 +02:00
David Howells
7b0de42d7c [PATCH] BLOCK: Remove dependence on existence of blockdev_superblock [try #6]
Move blockdev_superblock extern declaration from fs/fs-writeback.c to a
headerfile and remove the dependence on it by wrapping it in a macro.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:26 +02:00
David Howells
07f3f05c1e [PATCH] BLOCK: Move extern declarations out of fs/*.c into header files [try #6]
Create a new header file, fs/internal.h, for common definitions local to the
sources in the fs/ directory.

Move extern definitions that should be in header files from fs/*.c to
fs/internal.h or other main header files where they span directories.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:18 +02:00
David Howells
65e6f5bc81 [PATCH] BLOCK: Don't call block_sync_page() from AFS [try #6]
The AFS filesystem no longer needs to override its sync_page() op.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:32:12 +02:00
David Howells
cf9a2ae8d4 [PATCH] BLOCK: Move functions out of buffer code [try #6]
Move some functions out of the buffering code that aren't strictly buffering
specific.  This is a precursor to being able to disable the block layer.

 (*) Moved some stuff out of fs/buffer.c:

     (*) The file sync and general sync stuff moved to fs/sync.c.

     (*) The superblock sync stuff moved to fs/super.c.

     (*) do_invalidatepage() moved to mm/truncate.c.

     (*) try_to_release_page() moved to mm/filemap.c.

 (*) Moved some related declarations between header files:

     (*) declarations for do_invalidatepage() and try_to_release_page() moved
         to linux/mm.h.

     (*) __set_page_dirty_buffers() moved to linux/buffer_head.h.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:31:19 +02:00
Oleg Nesterov
cf342e52e3 [PATCH] Don't need to disable interrupts for tasklist_lock
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:31:18 +02:00
Jens Axboe
caa38fb0f4 [PATCH] ext3: make meta data reads use READ_META
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:29:42 +02:00
Jens Axboe
fc46379daf [PATCH] cfq-iosched: kill cfq_exit_lock
cfq_exit_lock is protecting two things now:

- The per-ioc rbtree of cfq_io_contexts

- The per-cfqd linked list of cfq_io_contexts

The per-cfqd linked list can be protected by the queue lock, as it is (by
definition) per cfqd as the queue lock is.

The per-ioc rbtree is mainly used and updated by the process itself only.
The only outside use is the io priority changing. If we move the
priority changing to not browsing the rbtree, we can remove any locking
from the rbtree updates and lookup completely. Let the sys_ioprio syscall
just mark processes as having the iopriority changed and lazily update
the private cfq io contexts the next time io is queued, and we can
remove this locking as well.
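
The "mark now, update lazily" scheme can be sketched in a single-threaded
toy as below.  All names here are hypothetical, and the sketch ignores the
memory ordering and locking questions the real code has to answer.

```c
/* Per-process io context as seen by the priority-setting syscall. */
struct toy_io_context {
	int ioprio;		/* most recently requested priority */
	int ioprio_changed;	/* set by the syscall, cleared on next io */
};

/* Scheduler-private context that used to be found by walking the rbtree. */
struct toy_cfq_io_context {
	int ioprio;		/* priority the scheduler currently uses */
};

/* Syscall side: record the request and flag it, touching nothing else. */
void toy_set_ioprio(struct toy_io_context *ioc, int ioprio)
{
	ioc->ioprio = ioprio;
	ioc->ioprio_changed = 1;
}

/* Queueing side: pick up a pending change the next time io is queued. */
void toy_queue_io(struct toy_io_context *ioc, struct toy_cfq_io_context *cic)
{
	if (ioc->ioprio_changed) {
		cic->ioprio = ioc->ioprio;
		ioc->ioprio_changed = 0;
	}
	/* ... the request would be queued here ... */
}
```

Because only the process itself touches its scheduler-private context in
this scheme, the outside caller never needs to browse the rbtree.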

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:29:36 +02:00
Linus Torvalds
c0341b0f47 Merge git://oss.sgi.com:8090/xfs/xfs-2.6
* git://oss.sgi.com:8090/xfs/xfs-2.6: (49 commits)
  [XFS] Remove v1 dir trace macro - missed in a past commit.
  [XFS] 955947: Infinite loop in xfs_bulkstat() on formatter() error
  [XFS] pv 956241, author: nathans, rv: vapo - make ino validation checks
  [XFS] pv 956240, author: nathans, rv: vapo - Minor fixes in
  [XFS] Really fix use after free in xfs_iunpin.
  [XFS] Collapse sv_init and init_sv into just the one interface.
  [XFS] standardize on one sema init macro
  [XFS] Reduce endian flipping in alloc_btree, same as was done for
  [XFS] Minor cleanup from dio locking fix, remove an extra conditional.
  [XFS] Fix kmem_zalloc_greedy warnings on 64 bit platforms.
  [XFS] pv 955157, rv bnaujok - break the loop on EFAULT formatter() error
  [XFS] pv 955157, rv bnaujok - break the loop on formatter() error
  [XFS] Fixes the leak in reservation space because we weren't ungranting
  [XFS] Add lock annotations to xfs_trans_update_ail and
  [XFS] Fix a porting botch on the realtime subvol growfs code path.
  [XFS] Minor code rearranging and cleanup to prevent some coverity false
  [XFS] Remove a no-longer-correct debug assert from dio completion
  [XFS] Add a greedy allocation interface, allocating within a min/max size
  [XFS] Improve error handling for the zero-fsblock extent detection code.
  [XFS] Be more defensive with page flags (error/private) for metadata
  ...
2006-09-29 09:36:55 -07:00
Vivek Goyal
632dd2053a [PATCH] Kcore elf note namesz field fix
o As per the ELF specification, it appears that the ELF note "namesz" field
  contains the length of "name" including the terminating null character.
  Currently we fill in "namesz" without taking the size of the null
  character into account.

o Kexec-tools performs this check diligently, hence I ran into the issue
  while trying to open /proc/kcore in kexec-tools for some info.
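
A minimal sketch of the namesz rule, in user-space Python rather than kernel
C.  The helper name elf_note_header is hypothetical, and little-endian
packing is an assumption made only for the example.

```python
import struct


def elf_note_header(name, descsz, note_type):
    """Pack an ELF note header: namesz, descsz and type, each a 32-bit word.

    namesz counts the terminating null byte of the name.
    """
    namesz = len(name.encode("ascii")) + 1  # +1 for the trailing '\0'
    return struct.pack("<III", namesz, descsz, note_type)
```

For the name "CORE" this gives a namesz of 5, not 4.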

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:25 -07:00
Andrew Morton
327dcaadc0 [PATCH] expand_fdtable(): remove pointless unlock+lock
This unlock/lock on a super-unlikely path isn't worth the kernel text.

Cc: Vadim Lobanov <vlobanov@speakeasy.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:25 -07:00
Vadim Lobanov
74d392aaab [PATCH] Clean up expand_fdtable() and expand_files()
Perform a code cleanup against the expand_fdtable() and expand_files()
functions inside fs/file.c.  It aims to make the flow of code within these
functions simpler and easier to understand, via added comments and modest
refactoring.

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:25 -07:00
Andreas Gruenbacher
39f0247d38 [PATCH] Access Control Lists for tmpfs
Add access control lists for tmpfs.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:24 -07:00
Andreas Gruenbacher
f0c8bd164e [PATCH] Generic infrastructure for acls
The patches solve the following problem: We want to grant access to devices
based on who is logged in from where, etc.  This includes switching back and
forth between multiple user sessions, etc.

Using ACLs to define device access for logged-in users gives us all the
flexibility we need in order to fully solve the problem.

Device special files nowadays usually live on tmpfs, hence tmpfs ACLs.

Different distros have come up with solutions that solve the problem to
different degrees: SUSE uses a resource manager which tracks login sessions
and sets ACLs on device inodes as appropriate.  RedHat uses pam_console, which
changes the primary file ownership to the logged-in user.  Others use a set of
groups that users must be in in order to be granted the appropriate accesses.

The freedesktop.org project plans to implement a combination of a
console-tracker and a HAL-device-list based solution to grant access to
devices to users, and more distros will likely follow this approach.

These patches have first been posted here on 2 February 2005, and again
on 8 January 2006. We have been shipping them in SLES9 and SLES10 with
no problems reported.  The previous submission is archived here:

   http://lkml.org/lkml/2006/1/8/229
   http://lkml.org/lkml/2006/1/8/230
   http://lkml.org/lkml/2006/1/8/231

This patch:

Add some infrastructure for access control lists on in-memory
filesystems such as tmpfs.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:24 -07:00
Chris Snook
4e6fd33b75 [PATCH] enforce RLIMIT_NOFILE in poll()
POSIX states that poll() shall fail with EINVAL if nfds > OPEN_MAX.  In
this context, POSIX is referring to sysconf(OPEN_MAX), which is the value
of current->signal->rlim[RLIMIT_NOFILE].rlim_cur in the linux kernel, not
the compile-time constant which happens to also be named OPEN_MAX.  In the
current code, an application may poll up to max_fdset file descriptors,
even if this exceeds RLIMIT_NOFILE.  The current code also breaks
applications which poll more than max_fdset descriptors, which worked circa
2.4.18 when the check was against NR_OPEN, which is 1024*1024.  This patch
enforces the limit precisely as POSIX defines, even if RLIMIT_NOFILE has
been changed at run time with ulimit -n.

To elaborate on the rationale for this, there are three cases:

1) RLIMIT_NOFILE is at the default value of 1024

In this (default) case, the patch changes nothing.  Calls with nfds > 1024
fail with EINVAL both before and after the patch, and calls with nfds <=
1024 pass the check both before and after the patch, since 1024 is the
initial value of max_fdset.

2) RLIMIT_NOFILE has been raised above the default

In this case, poll() becomes more permissive, allowing polling up to
RLIMIT_NOFILE file descriptors even if less than 1024 have been opened.
The patch won't introduce new errors here.  If an application somehow
depends on poll() failing when it polls with duplicate or invalid file
descriptors, it's already broken, since this is already allowed below 1024,
and will also work above 1024 if enough file descriptors have been open at
some point to cause max_fdset to have been increased above nfds.

3) RLIMIT_NOFILE has been lowered below the default

In this case, the system administrator or the user has gone out of their
way to protect the system from inefficient (or malicious) applications
wasting kernel memory.  The current code allows polling up to 1024 file
descriptors even if RLIMIT_NOFILE is much lower, which is not what the user
or administrator intended.  Well-written applications which only poll
valid, unique file descriptors will never notice the difference, because
they'll hit the limit on open() first.  If an application gets broken
because of the patch in this case, then it was already poorly/maliciously
designed, and allowing it to work in the past was a violation of POSIX and
a DoS risk on low-resource systems.

With this patch, poll() will permit exactly what POSIX suggests, no more,
no less, and for any run-time value set with ulimit -n, not just 256 or
1024.  There are existing apps which poll a large number of file
descriptors, some of which may be invalid, and if those numbers straddle
1024, they currently fail with or without the patch in -mm, though they
worked fine under 2.4.18.
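
The three cases above can be checked against a small model.  This is a
Python sketch of the limit check only, not the kernel code; old_check and
new_check are hypothetical names for the behaviour before and after the
patch as this message describes it.

```python
import errno


def old_check(nfds, max_fdset):
    """Pre-patch behaviour as described above: the limit is max_fdset."""
    return -errno.EINVAL if nfds > max_fdset else 0


def new_check(nfds, rlimit_nofile):
    """Post-patch behaviour: the limit is the current RLIMIT_NOFILE."""
    return -errno.EINVAL if nfds > rlimit_nofile else 0
```

With max_fdset at its initial 1024: at the default limit both checks agree;
with the limit raised to 4096, polling 2000 descriptors starts to pass; with
the limit lowered to 256, polling 300 descriptors starts to fail.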

Signed-off-by: Chris Snook <csnook@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:23 -07:00
Andreas Mohr
e518ddb7ba [PATCH] fs/namei.c: replace multiple current->fs by shortcut variable
Replace current->fs by fs helper variable to reduce some indirection
overhead and (at least at the moment, before the current_thread_info() %gs
PDA improvement is available) get rid of more costly current references.
Reduces fs/namei.o from 37786 to 37082 Bytes (704 Bytes saved).

[akpm@osdl.org: cleanup]
Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:22 -07:00
Alexey Dobriyan
6ea36ddbd1 [PATCH] Ban register_filesystem(NULL);
Everyone passes valid pointer there.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:20 -07:00
Alexey Dobriyan
d826380b30 [PATCH] 9p: fix leak on error path
If register_filesystem() fails mux workqueue must be killed.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@lanl.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:20 -07:00
Alexey Dobriyan
368bdb3d61 [PATCH] cramfs: make cramfs_uncompress_exit() return void
It always returns 0, so relying on it is useless.  The only caller isn't
checking return value.  In general, un-, de-, -free functions should return
void.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:20 -07:00
Alexey Dobriyan
a4376e13ce [PATCH] freevxfs: fix leak on error path
If register_filesystem() fails, vxfs_inode cache must be destroyed.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:20 -07:00
Alexey Dobriyan
50d44ed009 [PATCH] cramfs: rewrite init_cramfs_fs()
Two lines -- two bugs. :-(

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:20 -07:00
Frederik Deweerdt
f7ca54f486 [PATCH] fix mem_write() return value
At the beginning of the routine, "copied" is set to 0, but it is no good
because in lines 805 and 812 it is set to other values.  Finally, the
routine returns as if it copied 12 (=ENOMEM) bytes less than it actually
did.

Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Acked-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:19 -07:00
Jason Baron
87d7c8aca8 [PATCH] block_dev.c mutex_lock_nested() fix
In the case below we are locking the whole disk not a partition.  This
change simply brings the code in line with the piece above where when we
are the 'first' opener, and we are a partition.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:19 -07:00
Ian Kent
44938af6e0 [PATCH] autofs4: pending flag not cleared on mount fail
During testing I've found that the mount pending flag can be left set at
exit from autofs4_lookup after a failed mount request.  This shouldn't be
allowed to happen and causes incorrect error returns.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:18 -07:00
Ian Kent
be3ca7fecb [PATCH] autofs4: autofs4_follow_link false negative fix
The check for an empty directory in the autofs4_follow_link method fails
occasionally due to old dentries.  We had the same problem in
autofs4_revalidate ages ago.  I thought we wouldn't need this in
autofs4_follow_link, silly me.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:18 -07:00
Jonathan Corbet
cf3e43dbe0 [PATCH] cdev documentation
Add some documentation comments for the cdev interface.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
Acked-by: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:16 -07:00
Alan Cox
3cfd0885fa [PATCH] tty: stop the tty vanishing under procfs access
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:16 -07:00
Joel & Rebecca VanderZee
fb50ae7446 [PATCH] I/O Error attempting to read last partial block of a file in an ISO9660 file system
There was an I/O error that prevented reading the last partial block of
large files in an ISO9660 filesystem.  The error was generated when a file
comprised more than one section and had a size that was not an exact
multiple of the filesystem block size.  This patch removes the check (and
failure) for reading into the last partial block (and possibly beyond) for
multiple-section files.

It worked in my testing to prevent reading beyond the end of the section;
my first patch just incremented the sect_size block count for a partial
block and continued doing the check.  But there is a comment in the source
code about reading beyond the end of the file to fill a page cache.
Failing to access beyond the section would prevent reading beyond the end
of the file.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:15 -07:00
Jan Kara
b525a7e444 [PATCH] dquot: add proper locking when using current->signal->tty
Dquot passes the tty to tty_write_message without locking

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:14 -07:00
Oleg Nesterov
bce9a234ce [PATCH] elf_fdpic_core_dump: don't take tasklist_lock
do_each_thread() is rcu-safe, and all tasks which use this ->mm must sleep
in wait_for_completion(&mm->core_done) at this point, so we can use RCU
locks.

Also, remove unneeded INIT_LIST_HEAD(new) before list_add(new, head).

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:14 -07:00
Oleg Nesterov
486ccb05fd [PATCH] elf_core_dump: don't take tasklist_lock
do_each_thread() is rcu-safe, and all tasks which use this ->mm must sleep
in wait_for_completion(&mm->core_done) at this point, so we can use RCU
locks.

Also, remove unneeded INIT_LIST_HEAD(new) before list_add(new, head).

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:14 -07:00
Ernie Petrides
ee731f4f78 [PATCH] fix wrong error code on interrupted close syscalls
The problem is that close() syscalls can call a file system's flush
handler, which in turn might sleep interruptibly and ultimately pass back
an -ERESTARTSYS return value.  This happens for files backed by an
interruptible NFS mount under nfs_file_flush() when a large file has just
been written and nfs_wait_bit_interruptible() detects that there is a
signal pending.

I have a test case where the "strace" command is used to attach to a
process sleeping in such a close().  Since the SIGSTOP is forced onto the
victim process (removing it from the thread's "blocked" mask in
force_sig_info()), the RPC wait is interrupted and the close() is
terminated early.

But the file table entry has already been cleared before the flush handler
was called.  Thus, when the syscall is restarted, the file descriptor
appears closed and an EBADF error is returned (which is wrong).  What's
worse, there is the hypothetical case where another thread of a
multi-threaded application might have reused the file descriptor, in which
case that file would be mistakenly closed.

The bottom line is that close() syscalls are not restartable, and thus
-ERESTARTSYS return values should be mapped to -EINTR.  This is consistent
with the close(2) manual page.  The fix is below.
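
The mapping can be sketched as follows.  This is a user-space Python model,
not the patch itself; close_retval is a hypothetical name, and 512 is the
kernel-internal value of ERESTARTSYS, which has no counterpart in Python's
errno module.

```python
import errno

ERESTARTSYS = 512  # kernel-internal code, not meant to reach user space


def close_retval(flush_retval):
    """Map the flush handler's result to what close() should return."""
    if flush_retval == -ERESTARTSYS:
        # The file table entry is already cleared, so the call must not
        # be restarted; report an interrupted call instead.
        return -errno.EINTR
    return flush_retval
```

Other results, including success and ordinary errors, pass through
unchanged.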

Signed-off-by: Ernie Petrides <petrides@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:13 -07:00
Amos Waterland
01d553d0fe [PATCH] Chardev checking of overlapping ranges
The code in __register_chrdev_region checks that if the driver wishing to
register has the same major as an existing driver the new minor range is
strictly less than the existing minor range.  However, it does not also
check that the new minor range is strictly greater than the existing minor
range.  That is, if driver X has registered with major=x and minor=0-3,
__register_chrdev_region will allow driver Y to register with major=x and
minor=1-4.
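
A two-sided overlap test, sketched in Python under the assumption that each
registration is described by a first minor and a count.  The function name
is hypothetical and this is not the code in __register_chrdev_region.

```python
def minor_ranges_overlap(new_first, new_count, old_first, old_count):
    """Return True if two minor ranges under the same major overlap."""
    new_last = new_first + new_count - 1
    old_last = old_first + old_count - 1
    # Two closed ranges overlap exactly when each one starts no later
    # than the other one ends.
    return new_first <= old_last and old_first <= new_last
```

With driver X holding minors 0-3, a request for minors 1-4 is reported as
overlapping, while a request for minors 4-7 is not.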

Signed-off-by: Amos Waterland <apw@us.ibm.com>
Cc: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:12 -07:00
Kirill Korotaev
3b9b8ab65d [PATCH] Fix unserialized task->files changing
Fixed race on put_files_struct on exec with proc.  Restoring files on
current on error path may lead to proc having a pointer to already kfree-d
files_struct.

Changing ->files in exit.c and kthread.c is safe, as exit_files() does
everything under the lock.

Found during OpenVZ stress testing.

[akpm@osdl.org: add export]
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:12 -07:00
Chris Mason
ae78bf9c4f [PATCH] add -o flush for fat
Fat is commonly used on removable media.  Mounting with -o flush tells the
FS to write things to disk as quickly as possible.  It is like -o sync, but
much faster (and not as safe).

Signed-off-by: Chris Mason <mason@suse.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:12 -07:00
Alexey Dobriyan
50462062a0 [PATCH] fs.h: ifdef security fields
[assuming BSD security levels are deleted]
The only user of i_security, f_security, s_security fields is SELinux,
however, quite a few security modules are trying to get into kernel.
So, wrap them under CONFIG_SECURITY. Adding config option for each
security field is likely an overkill.

Following Stephen Smalley's suggestion, i_security initialization is
moved to security_inode_alloc() to not clutter core code with ifdefs
and make alloc_inode() codepath tiny little bit smaller and faster.

The user of (highly greppable) struct fown_struct::security field is
still to be found. I've checked every "fown_struct" and every "f_owner"
occurrence. Additionally, its removal doesn't break the i386 allmodconfig
build.

struct inode, struct file, struct super_block, struct fown_struct
become smaller.

P.S. Combined with two reiserfs inode shrinking patches sent to
linux-fsdevel, I can finally suck 12 reiserfs inodes into one page.

		/proc/slabinfo

	-ext2_inode_cache	388	10
	+ext2_inode_cache	384	10
	-inode_cache		280	14
	+inode_cache		276	14
	-proc_inode_cache	296	13
	+proc_inode_cache	292	13
	-reiser_inode_cache	336	11
	+reiser_inode_cache	332	12 <=
	-shmem_inode_cache	372	10
	+shmem_inode_cache	368	10

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:11 -07:00
Alexey Dobriyan
cfe14677f2 [PATCH] reiserfs: ifdef ACL stuff from inode
Shrink reiserfs inode more (by 8 bytes) for ACL non-users:

	-reiser_inode_cache     344     11
	+reiser_inode_cache     336     11

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:11 -07:00
Alexey Dobriyan
068fbb315d [PATCH] reiserfs: ifdef xattr_sem
Shrink reiserfs inode by 12 bytes for xattr non-users (me).

	-reiser_inode_cache     356     11
	+reiser_inode_cache     344     11

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:11 -07:00
Chris Mason
a317202714 [PATCH] Fix reiserfs latencies caused by data=ordered
ReiserFS does periodic cleanup of old transactions in order to limit the
length of time a journal replay may take after a crash.  Sometimes, writing
metadata from an old (already committed) transaction may require committing
a newer transaction, which also requires writing all data=ordered buffers.
This can cause very long stalls on journal_begin.

This patch makes sure new transactions will not need to be committed before
trying a periodic reclaim of an old transaction.  It is low risk because if
a bad decision is made, it just means a slightly longer journal replay
after a crash.

Signed-off-by: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:11 -07:00
Chris Mason
25736b1c69 [PATCH] reiserfs_fsync should only use barriers when they are enabled
make sure that reiserfs_fsync only triggers barriers when mounted with -o
barrier=flush

Signed-off-by: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:11 -07:00
Eric Sandeen
39b3f6d6e9 [PATCH] mount udf UDF_PART_FLAG_READ_ONLY partitions with MS_RDONLY
There's a bug where a UDF_PART_FLAG_READ_ONLY udf partition gets mounted
read-write, then subsequent problems happen; files seem to be able to be
removed, but file creation results in EIO or worse, oops.

EIO is coming from udf_new_block(), which returns EIO if the right flags
aren't set; only UDF_PART_FLAG_READ_ONLY is set in this case.  We probably
should not have gotten this far...

Attached patch seems to fix it - and includes a printk to alert the user
that their "rw" mount request has been converted to "ro."

Here's the testcase I used:

[root@magnesium ~]# mkisofs -R -J -udf -o testiso /tmp/
...
Total translation table size: 0
Total rockridge attributes bytes: 342923
Total directory bytes: 382312
Path table size(bytes): 104
Max brk space used 103000
105059 extents written (205 MB)

[root@magnesium ~]# mount -o loop testiso /mnt/test/
[root@magnesium ~]# ls /mnt/test/fsfile
/mnt/test/fsfile
[root@magnesium ~]# rm /mnt/test/fsfile
[root@magnesium ~]# ls /mnt/test/fsfile
ls: /mnt/test/fsfile: No such file or directory
[root@magnesium ~]# touch /mnt/test/fsfile
touch: cannot touch `/mnt/test/fsfile': Input/output error
[root@magnesium tmp]# grep udf /proc/mounts
/dev/loop1 /mnt/test udf rw 0 0

Force readonly mounts of UDF partitions marked as read-only.
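
The forced conversion can be modelled as below.  This is a Python sketch,
not the patch: effective_mount_flags is a hypothetical helper, MS_RDONLY is
the usual read-only mount flag bit, and the numeric value given for
UDF_PART_FLAG_READ_ONLY is an assumption made for the example.

```python
MS_RDONLY = 1                     # read-only mount flag bit
UDF_PART_FLAG_READ_ONLY = 0x0010  # assumed value, for illustration only


def effective_mount_flags(sb_flags, partition_flags):
    """Force a read-only mount when the partition is marked read-only."""
    if partition_flags & UDF_PART_FLAG_READ_ONLY:
        sb_flags |= MS_RDONLY
    return sb_flags
```

A "rw" request against a read-only partition therefore ends up with
MS_RDONLY set; all other combinations are left alone.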

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:09 -07:00
Olaf Hering
e1dfa92dca [PATCH] ignore partition table on disks with AIX label
The on-disk data structures from AIX are not known, also the filesystem
layout is not known.  There is a msdos partition signature at the end of
the first block, and the kernel recognizes 3 small (and overlapping)
partitions.  But they are not usable.  Maybe the firmware uses it to find
the bootloader for AIX, but AIX boots also if the first block is cleared.

This is the content of the partition table:
 # dd if=/dev/sdb count=$(( 4 * 16 )) bs=1 skip=$(( 0x1be )) | xxd
0000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000010: 80ff ffff 41ff ffff 1b11 0000 381b 0000  ....A.......8...
0000020: 00ff ffff 41ff ffff 0211 0000 1900 0000  ....A...........
0000030: 80ff ffff 41ff ffff 1b11 0000 381b 0000  ....A.......8...
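
The dump above can be decoded with a short Python sketch.  It assumes the
standard 16-byte msdos partition entry layout (boot flag, CHS start, type,
CHS end, little-endian start sector and sector count); the function name is
hypothetical.

```python
import struct

DUMP = """
0000000: 0000 0000 0000 0000 0000 0000 0000 0000
0000010: 80ff ffff 41ff ffff 1b11 0000 381b 0000
0000020: 00ff ffff 41ff ffff 0211 0000 1900 0000
0000030: 80ff ffff 41ff ffff 1b11 0000 381b 0000
"""


def parse_partition_table(dump):
    """Return (boot flag, type, start sector, sector count) per entry."""
    entries = []
    for line in dump.strip().splitlines():
        hex_part = line.split(":", 1)[1]
        raw = bytes.fromhex(hex_part.replace(" ", ""))
        boot, _chs_start, ptype, _chs_end, start, count = struct.unpack(
            "<B3sB3sII", raw
        )
        entries.append((boot, ptype, start, count))
    return entries
```

Decoded this way, the first entry is empty and the other three all have
type 0x41; the second and fourth entries are identical, which matches the
remark about overlapping partitions.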

Handle the whole disk as empty disk.

This also fixes YaST, which compares the output from parted (and formerly
fdisk) with /proc/partitions.  fdisk has recognized the AIX label for a long
time; SuSE has a patch for parted to handle the disk label as unknown.

dmesg will look like this:
 sda: [AIX]  unknown partition table

Tested on an IBM B50 with AIX V4.3.3.

Signed-off-by: Olaf Hering <olh@suse.de>
Cc: Albert Cahalan <acahalan@gmail.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:09 -07:00
Miklos Szeredi
650a898342 [PATCH] vfs: define new lookup flag for chdir
In the "operation does permission checking" model used by fuse, chdir
permission is not checked, since there's no chdir method.

For this case set a lookup flag, which will be passed to ->permission(), so
fuse can distinguish it from permission checks for other operations.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:08 -07:00
Miklos Szeredi
5b35e8e58a [PATCH] fuse: use dentry in statfs
Some filesystems may want to report different values depending on the path
within the filesystem, i.e.  one mount is actually several filesystems.  This
can be the case for a network filesystem exported by an unprivileged server
(e.g.  sshfs).

This is now possible, thanks to David Howells "VFS: Permit filesystem to
perform statfs with a known root dentry" patch.

This change is backward compatible, so no need to change interface version.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:08 -07:00
Eugene Teo
8454aeef6f [PATCH] Require mmap handler for a.out executables
Files supported by fs/proc/base.c, i.e.  /proc/<pid>/*, are not capable of
meeting the validity checks in ELF load_elf_*() handling because they have
no mmap handler which is required by ELF.  In order to stop a.out
executables being used as part of an exploit attack against /proc-related
vulnerabilities, we make a.out executables depend on ->mmap() existing.

Signed-off-by: Eugene Teo <eteo@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:08 -07:00
Josh Triplett
9c4dbee79d [PATCH] fs: add lock annotation to grab_super
grab_super gets called with sb_lock held, and releases it.  Add a lock
annotation to this function so that sparse can check callers for lock
pairing, and so that sparse will not complain about this function since it
intentionally uses the lock in this manner.

Signed-off-by: Josh Triplett <josh@freedesktop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:08 -07:00
Josh Triplett
ddc0a51d2e [PATCH] hugetlbfs: add lock annotation to hugetlbfs_forget_inode()
hugetlbfs_forget_inode releases inode_lock.  Add a lock annotation to this
function so that sparse can check callers for lock pairing, and so that
sparse will not complain about this function since it intentionally uses
the lock in this manner.

Signed-off-by: Josh Triplett <josh@freedesktop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:08 -07:00
Josh Triplett
105f4d7a81 [PATCH] fuse: add lock annotations to request_end and fuse_read_interrupt
request_end and fuse_read_interrupt release fc->lock.  Add lock annotations
to these two functions so that sparse can check callers for lock pairing,
and so that sparse will not complain about these functions since they
intentionally use locks in this manner.

Signed-off-by: Josh Triplett <josh@freedesktop.org>
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:08 -07:00
Josh Triplett
99fc705996 [PATCH] afs: add lock annotations to afs_proc_cell_servers_{start,stop}
afs_proc_cell_servers_start acquires a lock, and afs_proc_cell_servers_stop
releases that lock.  Add lock annotations to these two functions so that
sparse can check callers for lock pairing, and so that sparse will not
complain about these functions since they intentionally use locks in this
manner.

Signed-off-by: Josh Triplett <josh@freedesktop.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:07 -07:00
Josh Triplett
58f555e5f6 [PATCH] mbcache: add lock annotation for __mb_cache_entry_release_unlock()
__mb_cache_entry_release_unlock releases mb_cache_spinlock, so annotate it
accordingly.

Signed-off-by: Josh Triplett <josh@freedesktop.org>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:07 -07:00
Olaf Hering
42012cc4a2 [PATCH] use gcc -O1 in fs/reiserfs only for ancient gcc versions
Only compile with -O1 if the (very old) compiler is broken.  We use
reiserfs a lot since SLES9 on ppc64, and it was never seen with gcc33.
Assume the broken gcc is gcc-3.4 or older.

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:07 -07:00
Pekka J Enberg
c0d92cbc58 [PATCH] libfs: remove page up-to-date check from simple_readpage
Remove the unnecessary PageUptodate check from simple_readpage.  The only
two callers for ->readpage that don't have explicit PageUptodate check are
read_cache_pages and page_cache_read which operate on newly allocated pages
which don't have the flag set.

[akpm: use the allegedly-faster clear_page(), too]
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:06 -07:00
Randy Dunlap
15a67dd8cc [PATCH] fs/namespace: handle init/registration errors
Check and handle init errors.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:05 -07:00
Andrew Morton
4d7dd8fd95 [PATCH] blockdev.c: check driver layer errors
Check driver layer errors.

Fix from: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>

In blockdevc-check-errors.patch, add_bd_holder() is modified to return error
values when some of its operations fail.  Among them, it returns -EEXIST when
a given bd_holder object already exists in the list.

However, in this case, the function has completed its work successfully and
needs no action by its caller other than freeing the unused bd_holder object.
So I think it's better to return success after freeing it by itself.

Otherwise, bd_claim-ing with the same claim pointer will fail.
Typically, lvresize will fail with the following message:
  device-mapper: reload ioctl failed: Invalid argument
and you'll see messages like below in kernel log:
  device-mapper: table: 254:13: linear: dm-linear: Device lookup failed
  device-mapper: ioctl: error adding target to table

Similarly, it should not add the bd_holder to the list if either of the
symlink creations fails.  I don't have a test case that triggers this, but it
should cause a dereference of a freed pointer.

If a matching bd_holder is found in bd_holder_list, add_bd_holder() completes
its job by just incrementing the reference count.  In this case, it should be
considered as success but it used to return 'fail' to let the caller free
temporary bd_holder.  Fixed it to return success and free given object by
itself.

Also, if either one of symlinking fails, the bd_holder should not be added to
the list so that it can be discarded later.  Otherwise, the caller will free
bd_holder which is in the list.

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:04 -07:00
Zoltan Menyhart
d1807793e1 [PATCH] JBD: memory leak in "journal_init_dev()"
We leak a bh ref in "journal_init_dev()" in case of failure.

Signed-off-by: Zoltan Menyhart <Zoltan.Menyhart@bull.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:03 -07:00
Dave Kleikamp
f71b2f10f5 [PATCH] JBD: Make journal_brelse_array() static
It's always good to make symbols static when we can, and this also eliminates
the need to rename the function in jbd2

Suggested by Eric Sandeen.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:03 -07:00
Tim Shimmin
65e8697a12 [XFS] Remove v1 dir trace macro - missed in a past commit.
Signed-off-by: Tim Shimmin <tes@sgi.com>
2006-09-29 15:23:02 +10:00
Theodore Ts'o
825f9075d7 [GFS2] inode-diet: Eliminate i_blksize from the inode structure
This eliminates the i_blksize field from struct inode.  Filesystems that want
to provide a per-inode st_blksize can do so by providing their own getattr
routine instead of using the generic_fillattr() function.

Note that some filesystems were providing pretty much random (and incorrect)
values for i_blksize.
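
As a rough user-space sketch of the pattern described above: a filesystem
that wants its own per-inode st_blksize wraps the generic fill routine in its
own getattr and overrides the one field.  All sketch_* names and struct
layouts here are hypothetical and are not the kernel API; deriving the
default block size from a block-size shift is an assumption made for the
illustration.

```c
#include <stdint.h>

/* Hypothetical, simplified inode: note there is no per-inode blksize field. */
struct sketch_inode {
	unsigned int blkbits;	/* log2 of the block size */
	uint64_t size;
};

/* Hypothetical, simplified stat result. */
struct sketch_stat {
	uint64_t size;
	unsigned long blksize;
};

/* Generic fill: derives a default blksize from the block-size shift. */
static void sketch_generic_fillattr(const struct sketch_inode *inode,
				    struct sketch_stat *st)
{
	st->size = inode->size;
	st->blksize = 1UL << inode->blkbits;
}

/* A filesystem-private inode that embeds the generic one. */
struct sketch_fs_inode {
	struct sketch_inode vfs;
	unsigned long io_size;	/* preferred I/O size; 0 means "no preference" */
};

/* The filesystem's own getattr: generic fill first, then the override. */
static int sketch_fs_getattr(const struct sketch_fs_inode *fi,
			     struct sketch_stat *st)
{
	sketch_generic_fillattr(&fi->vfs, st);
	if (fi->io_size)
		st->blksize = fi->io_size;
	return 0;
}
```

Filesystems with no preference keep the generic default and pay no per-inode
storage cost for it.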

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-28 08:32:51 -04:00
Theodore Ts'o
bba9dfd835 [GFS2] inode_diet: Replace inode.u.generic_ip with inode.i_private (gfs)
The following patches reduce the size of the VFS inode structure by 28 bytes
on a UP x86.  (It would be more on an x86_64 system).  This is a 10% reduction
in the inode size on a UP kernel that is configured in a production mode
(i.e., with no spinlock or other debugging functions enabled; if you want to
save memory taken up by in-core inodes, the first thing you should do is
disable the debugging options; they are responsible for a huge amount of bloat
in the VFS inode structure).

This patch:

The filesystem or device-specific pointer in the inode is inside a union,
which is pretty pointless given that all 30+ users of this field have been
using the void pointer.  Get rid of the union and rename it to i_private, with
a comment to explain who is allowed to use the void pointer.  This is just a
cleanup, but it allows us to reuse the union 'u' for something where
the union will actually be used.
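
The shape of the rename can be sketched as follows.  These are hypothetical,
heavily trimmed stand-ins for struct inode (only the field names generic_ip,
u and i_private come from the commit message); the point is that a union
with a single void pointer member carries no more information than the bare
pointer.

```c
/* Before: the private pointer sits inside a one-member union. */
struct sketch_inode_before {
	unsigned long i_ino;
	union {
		void *generic_ip;
	} u;
};

/* After: a plain pointer for filesystem- or device-private data. */
struct sketch_inode_after {
	unsigned long i_ino;
	void *i_private;
};

/* Users change from inode->u.generic_ip ... */
static void *get_private_before(const struct sketch_inode_before *inode)
{
	return inode->u.generic_ip;
}

/* ... to inode->i_private, with no change in behaviour. */
static void *get_private_after(const struct sketch_inode_after *inode)
{
	return inode->i_private;
}
```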

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-28 08:32:24 -04:00
Steven Whitehouse
185a257f2f Merge branch 'master' into gfs2 2006-09-28 08:29:59 -04:00
Vlad Apostolov
6e73b41888 [XFS] 955947: Infinite loop in xfs_bulkstat() on formatter() error
SGI-PV: 955947
SGI-Modid: xfs-linux-melb:xfs-kern:26986a

Signed-off-by: Vlad Apostolov <vapo@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2006-09-28 11:06:21 +10:00
Vlad Apostolov
6f1f216840 [XFS] pv 956241, author: nathans, rv: vapo - make ino validation checks consistent in bulkstat

SGI-PV: 956241
SGI-Modid: xfs-linux-melb:xfs-kern:26984a

Signed-off-by: Vlad Apostolov <vapo@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2006-09-28 11:06:15 +10:00
Vlad Apostolov
6216ff1883 [XFS] pv 956240, author: nathans, rv: vapo - Minor fixes in kmem_zalloc_greedy()

SGI-PV: 956240
SGI-Modid: xfs-linux-melb:xfs-kern:26983a

Signed-off-by: Vlad Apostolov <vapo@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2006-09-28 11:06:10 +10:00