Commit Graph

1029 Commits

Author SHA1 Message Date
Kostik Belousov
411b67b4b6 [PATCH] readv/writev syscalls are not checked by lsm
it seems that readv(2)/writev(2) syscalls do not call
file_permission callback. Looks like this is overlook.

I have filled the issue into redhat bugzilla as
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=169433
and got the recommendation to post this on lsm mailing list.

The following trivial patch solves the problem.

Signed-off-by: Kostik Belousov <kostikbel@gmail.com>
Signed-off-by: Chris Wright <chrisw@osdl.org>
2005-09-29 15:42:08 -07:00
Paul Mackerras
ab11d1ea28 Merge by hand from Linus' tree.
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-09-29 13:13:36 +10:00
Davide Libenzi
e3306dd5f7 [PATCH] epoll: handle timeout overflow
Handle the timeout upper boundary for epoll.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28 07:46:41 -07:00
Latchesar Ionkov
0b8dd17762 [PATCH] v9fs: fix races in fid allocation
Fid management cleanup.  The patch attempts to fix the races in dentry's
fid management.

Dentries don't keep the opened fids anymore, they are moved to the file
structs.  Ideally there should be no more than one fid with fidcreate equal
to zero in the dentry's list of fids.

v9fs_fid_create initializes the important fields (fid, fidcreated) before
v9fs_fid is added to the list.  v9fs_fid_lookup returns only fids that are
not created by v9fs_create.  v9fs_fid_get_created returns the fid created
by the same process by v9fs_create (if any) and removes it from dentry's
list

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28 07:46:40 -07:00
Chris Sykes
dc7b5fd6b0 [PATCH] Fix ext3_new_inode() failure paths
Fix failure paths in ext3_new_inode() and clean up duplicated code: -
DQUOT_DROP() was not being called if ext3_init_security() failed.

Signed-off-by: Chris Sykes <chris@sigsegv.plus.com>
Cc: Stephen Smalley <sds@epoch.ncsc.mil>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28 07:46:40 -07:00
Chris Sykes
9ed6c2fb34 [PATCH] Fix ext2_new_inode() failure paths
Fix failure paths in ext2_new_inode() and clean up duplicated code: -
DQUOT_DROP() was not being called if ext2_init_security() failed.

Signed-off-by: Chris Sykes <chris@sigsegv.plus.com>
Cc: Stephen Smalley <sds@epoch.ncsc.mil>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28 07:46:40 -07:00
Miklos Szeredi
ee4e52719c [PATCH] fuse: check reserved node ID values
This patch checks reserved node ID values returned by lookup and creation
operations.  In case one of the reserved values is sent, return -EIO.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28 07:46:40 -07:00
Miklos Szeredi
909021ea7a [PATCH] fuse: add required version info
Add information about required version of the userspace library/utilities
to Documentation/Changes.  Also add pointer to this and to FUSE
documentation from Kconfig.

Thanks to Anton Altaparmakov for the reminder.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28 07:46:40 -07:00
Anton Altaparmakov
e2fcc61ef0 NTFS: Re-fix sparse warnings in a more correct way, i.e. don't use an enum with
different types in it but #define the two constants instead.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-26 17:02:41 +01:00
Anton Altaparmakov
e8c2cd99a3 Merge branch 'master' of /home/src/linux-2.6/ 2005-09-26 10:50:29 +01:00
Anton Altaparmakov
5a8c0cc32b NTFS: More $LogFile handling fixes: when chkdsk has been run, it can leave the
restart pages in the journal without multi sector transfer protection
      fixups (i.e. the update sequence array is empty and in fact does not
      exist).

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-26 10:48:54 +01:00
Anton Altaparmakov
838bf9675a NTFS: Fix the definition of the CHKD ntfs record magic. It had an off by
two error causing it to be CHKB instead of CHKD.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-26 10:45:46 +01:00
Paul Mackerras
14cf11af6c powerpc: Merge enough to start building in arch/powerpc.
This creates the directory structure under arch/powerpc and a bunch
of Kconfig files.  It does a first-cut merge of arch/powerpc/mm,
arch/powerpc/lib and arch/powerpc/platforms/powermac.  This is enough
to build a 32-bit powermac kernel with ARCH=powerpc.

For now we are getting some unmerged files from arch/ppc/kernel and
arch/ppc/syslib, or arch/ppc64/kernel.  This makes some minor changes
to files in those directories and files outside arch/powerpc.

The boot directory is still not merged.  That's going to be interesting.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-09-26 16:04:21 +10:00
Steve French
ede1327ea4 [PATCH] cifs: Add support for suspend
cifsd had been preventing software suspend from completing.

Signed-off-by: pavel@suse.de
Signed-off-by: Steve French <sfrench@us.ibm.com>  lightly modified
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-23 11:37:53 -07:00
Trond Myklebust
f134585a73 Revert "[PATCH] RPC,NFS: new rpc_pipefs patch"
This reverts 17f4e6febca160a9f9dd4bdece9784577a2f4524 commit.
2005-09-23 12:39:00 -04:00
Trond Myklebust
3063d8a166 NFS: Make /proc/mounts display the protocol used by NFSv4
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:59 -04:00
Christoph Hellwig
278c995c8a [PATCH] RPC,NFS: new rpc_pipefs patch
Currently rpc_mkdir/rpc_rmdir and rpc_mkpipe/mk_unlink have an API that's
 a little unfortunate.  They take a path relative to the rpc_pipefs root and
 thus need to perform a full lookup.  If you look at debugfs or usbfs they
 always store the dentry for directories they created and thus can pass in
 a dentry + single pathname component pair into their equivalents of the
 above functions.

 And in fact rpc_pipefs actually stores a dentry for all but one component so
 this change not only simplifies the core rpc_pipe code but also the callers.

 Unfortuntately this code path is only used by the NFS4 idmapper and
 AUTH_GSSAPI for which I don't have a test enviroment.  Could someone give
 it a spin?  It's the last bit needed before we can rework the
 lookup_hash API

 Signed-off-by: Christoph Hellwig <hch@lst.de>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:57 -04:00
Chuck Lever
03bf4b707e [PATCH] RPC: parametrize various transport connect timeouts
Each transport implementation can now set unique bind, connect,
 reestablishment, and idle timeout values.  These are variables,
 allowing the values to be modified dynamically.  This permits
 exponential backoff of any of these values, for instance.

 As an example, we implement exponential backoff for the connection
 reestablishment timeout.

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon
 with UDP and TCP.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:53 -04:00
Chuck Lever
ed63c00370 [PATCH] RPC: remove xprt->nocong
Get rid of the "xprt->nocong" variable.

 Test-plan:
 Use WAN simulation to cause sporadic bursty packet loss with UDP mounts.
 Look for significant regression in performance or client stability.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:47 -04:00
Chuck Lever
43118c29de [PATCH] RPC: get rid of xprt->stream
Now we can fix up the last few places that use the "xprt->stream"
 variable, and get rid of it from the rpc_xprt structure.

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon
 with UDP and TCP.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:35 -04:00
Chuck Lever
eab5c084b8 [PATCH] NFS: use a constant value for TCP retransmit timeouts
Implement a best practice: don't use exponential backoff when computing
 retransmit timeout values on TCP connections, but simply retransmit
 at regular intervals.

 This also fixes a bug introduced when xprt_reset_majortimeo() was added.

 Test-plan:
 Enable RPC debugging and watch timeout behavior on a NFS/TCP mount.

 Version: Thu, 11 Aug 2005 16:02:19 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:06 -04:00
Trond Myklebust
9aa48b7e27 NFS: Don't expose internal READDIR errors to userspace
Fixes a condition whereby the kernel is returning the non-POSIX error
 EBADCOOKIE to userspace.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:01 -04:00
Olaf Kirch
449231d6dd From: Olaf Kirch <okir@suse.de>
[PATCH] Fix miscompare in __posix_lock_file

 If an application requests the same lock twice, the
 kernel should just leave the existing lock in place.
 Currently, it will install a second lock of the same type.

 Signed-off-by: Olaf Kirch <okir@suse.de>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:37:59 -04:00
Trond Myklebust
20509f1bc5 NFS: Drop inode after rename
When doing a rename on top of an existing file that is not in use,
 the inode of the overwritten file will remain in the icache.

 The fix is to decrement i_nlink of the overwritten inode, like we
 do for unlink, rmdir etc already.

 Problem diagnosed by Olaf Kirch. This patch is a slight variation
 on his fix.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:37:58 -04:00
Linus Torvalds
bfab08c097 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/shaggy/jfs-2.6 2005-09-23 07:40:53 -07:00
Anton Altaparmakov
715dc636b6 NTFS: Change ntfs_cluster_free() to require a write locked runlist on entry
since we otherwise get into a lock reversal deadlock if a read locked
      runlist is passed in. In the process also change it to take an ntfs
      inode instead of a vfs inode as parameter.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-23 11:24:28 +01:00
Nick Wilson
10d2c46f94 [PATCH] NFS: fix client oops when debugging is on
nfs_readpage_release() causes an oops while accessing a file with NFS
debugging turned on (echo 32767 > /proc/sys/sunrpc/nfs_debug) and a kernel
built with CONFIG_DEBUG_SLAB.

This patch moves the debugging statement above nfs_release_request() to
avoid accessing freed memory.

Signed-off-by: Nick Wilson <njw@osdl.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:37 -07:00
Glauber de Oliveira Costa
8bdac5d1ed [PATCH] ext3: EXT3_DEBUG build fixes
Fix some warnings and a build error when EXT3_DEBUG is enabled.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:37 -07:00
OGAWA Hirofumi
275abf5b06 [PATCH] ext3: ext3_show_options fix
EXT3_MOUNT_DATA_FLAGS is not a boolean. This fixes it.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:35 -07:00
Latchesar Ionkov
f71626a461 [PATCH] v9fs: don't free root dentry & inode if error occurs in v9fs_get_sb
If error occurs while in v9fs_get_sb after it calles sget, the dentry object
of the root and its inode may be freed twice -- once while handling the error
in v9fs_get_sb, and second time when v9fs_get_sb calles deactivate_super
(which in turn calls v9fs_kill_super)

The patch removes the unnecessary code that frees the root dentry and its
inode.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:33 -07:00
Latchesar Ionkov
a1f9d8d23f [PATCH] v9fs: replace strlen on newly allocated by __getname buffers to PATH_MAX
v9fs_vfs_readlink allocates space for the link using __getname and
errorneously uses strlen on the newly allocated buffer to check if the buffer
passed by the user is bigger than the one returned by __getname.

The patch replaces the strlen usage to PATH_MAX, which is the actual size of
the buffers returned by __getname.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:33 -07:00
Latchesar Ionkov
a8e63bff52 [PATCH] v9fs: make copy of the transport prototype instead of using it directly
When a new session is created it uses a template object of the specified
transport type to instantiate its own copy.  The code for the making a copy of
the template object was lost, and the object itself is attached to the v9fs
session.  This leads to many sessions using the same transport instead of
having their own copy.

The patch puts back the code that makes a copy of the template object.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:33 -07:00
Latchesar Ionkov
5b06767623 [PATCH] v9fs: allocate the Rwalk qid array from the right conv buffer
When v9fs_deserealize_fcall deserializes a Rwalk message, it incorrectly
allocates space for the qid array in the source instead of the destination
buffer.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:33 -07:00
Latchesar Ionkov
d06a8fb130 [PATCH] v9fs: make conv functions to check for conv buffer overflow
buf_check_size function checks if the conv buffer has enough space for the
performed operation, but it doesn't return the result back to the calling
function, only logs an error in the log.

The report-back-error functionality was lost when buf_check_size was
converted from macro to inline function. The return in the macro used to
exit from the functions that include it, after the conversion it just exits
from the inline function itself.

The patch makes buf_check_size to return flag and all functions that use
it check if they should perform the operation, or exit.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:33 -07:00
Andrew Morton
0678e5feaa [PATCH] proc_task_root_link c99 fix
fs/proc/base.c: In function `proc_task_root_link':
fs/proc/base.c:364: warning: ISO C90 forbids mixed declarations and code

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:33 -07:00
Anton Altaparmakov
91fbc6edfa NTFS: Fix sparse warnings that have crept in over time.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-22 13:26:44 +01:00
Stephane Kardas
62a36c43c8 [PATCH] fat: fix adate
During a forensic analysis on the fat file system, I found than the result for
the last access date on this file system was different between the stat
command and the istat command (package tct-utils).

The istat command display a true date (the right windows date) but the stat
primitive (so stat, find, ls command) displays a wrong date.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-21 10:12:18 -07:00
Sripathi Kodi
66dcca0628 [PATCH] Fix invisible threads problem
When the main thread of a thread group has done pthread_exit() and died,
the other threads are still happily running, but will not be visible
under /proc because their leader is no longer accessible.

This fixes the access control so that we can see the sub-threads again.

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Acked-by: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-21 09:15:34 -07:00
Dave Kleikamp
438282d85d JFS: don't dereference tlck->ip from txUpdateMap
The inode pointer may no longer be valid

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-09-20 14:58:11 -05:00
Anton Altaparmakov
eed8b2dee7 NTFS: More runlist handling fixes from Richard Russon and myself.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-20 14:19:30 +01:00
Linus Torvalds
f805fbdaac Make fsnotify possibly work better for the inode removal case
Checking i_nlink is dubious, but the alternatives look even
less appetizing.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-19 19:54:29 -07:00
Anton Altaparmakov
044a500e46 Merge branch 'master' of /home/src/linux-2.6/ 2005-09-19 09:47:49 +01:00
Anton Altaparmakov
f6098cf449 NTFS: Fix ntfs_{read,write}page() to cope with concurrent truncates better.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-19 09:41:39 +01:00
Anton Altaparmakov
4e64c88693 NTFS: Fix handling of compressed directories that I broke in earlier changeset.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-19 09:38:41 +01:00
Anton Altaparmakov
5c9f6de3b8 NTFS: Fix various bugs in the runlist merging code. (Based on libntfs
changes by Richard Russon.)

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-19 09:33:40 +01:00
OGAWA Hirofumi
ef402268f7 [PATCH] FAT: miss-sync issues on sync mount (miss-sync on write)
This patch fixes miss-sync issue on write() system call.  This updates
inode attrs flags, mtime and ctime on every comit_write call, due to
locking.

Signed-off-by: Hiroyuki Machida <machida@sm.sony.co.jp>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-17 11:50:02 -07:00
Dipankar Sarma
4fb3a53860 [PATCH] files: fix preemption issues
With the new fdtable locking rules, you have to protect fdtable with either
->file_lock or rcu_read_lock/unlock().  There are some places where we
aren't doing either.  This patch fixes those places.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-17 11:50:02 -07:00
Zach Brown
a464adeb7e [PATCH] Add smp_mb__after_clear_bit() to unlock_kiocb()
Add smp_mb__after_clear_bit() to unlock_kiocb()

AIO's use of wait_on_bit_lock()/wake_up_bit() forgot to add a barrier
between clearing its lock bit and calling wake_up_bit() so wake_up_bit()'s
unlocked waitqueue_active() can race.  This puts AIO's use in line with the
others and the comment above wake_up_bit().

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Benjamin LaHaise <bcrl@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-17 11:50:02 -07:00
Davide Libenzi
53d2be79d5 [PATCH] epoll: fix delayed initialization bug
Al found a potential problem in epoll_create(), where the
file->private_data member was set after fd_install().  This is obviously
wrong since another thread might do a close() on that fd# before we set the
file->private_data member.  This goes over 2.6.13 and passes a few basic
tests I've done here.

(akpm: snuck in a kzalloc() cleanup too)

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-17 11:50:02 -07:00
Dave Kleikamp
6cb1269b96 JFS: Fix sparse warnings, including endian error
The fix in inode.c is a real bug.  It could result in undeleted, yet
unconnected files on big-endian hardware.

The others are trivial.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-09-15 23:25:41 -05:00
David S. Miller
4a805e863d [COMPAT]: Fixup compat_do_execve()
Missing acct_update_integrals() and update_mem_hiwater() calls
compared to it's native counterpart.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-14 21:40:00 -07:00
Dipankar Sarma
0b175a7e68 [PATCH] Fix the fdtable freeing in the case of vmalloced fdset/arrays
Noted by David Miller:

  "The bug is that free_fd_array() takes a "num" argument, but when
   calling it from __free_fdtable() we're instead passing in the size in
   bytes (ie.  "num * sizeof(struct file *)")."

Yes it is a bug. I think I messed it up while merging newer
changes with an older version where I was using size in bytes
to optimize.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-14 12:38:26 -07:00
Hugh Dickins
2fd4ef85e0 [PATCH] error path in setup_arg_pages() misses vm_unacct_memory()
Pavel Emelianov and Kirill Korotaev observe that fs and arch users of
security_vm_enough_memory tend to forget to vm_unacct_memory when a
failure occurs further down (typically in setup_arg_pages variants).

These are all users of insert_vm_struct, and that reservation will only
be unaccounted on exit if the vma is marked VM_ACCOUNT: which in some
cases it is (hidden inside VM_STACK_FLAGS) and in some cases it isn't.

So x86_64 32-bit and ppc64 vDSO ELFs have been leaking memory into
Committed_AS each time they're run.  But don't add VM_ACCOUNT to them,
it's inappropriate to reserve against the very unlikely case that gdb
be used to COW a vDSO page - we ought to do something about that in
do_wp_page, but there are yet other inconsistencies to be resolved.

The safe and economical way to fix this is to let insert_vm_struct do
the security_vm_enough_memory check when it finds VM_ACCOUNT is set.

And the MIPS irix_brk has been calling security_vm_enough_memory before
calling do_brk which repeats it, doubly accounting and so also leaking.
Remove that, and all the fs and arch calls to security_vm_enough_memory:
give it a less misleading name later on.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-Off-By: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-14 11:18:13 -07:00
Alexander Nyberg
fb085cf1d4 [PATCH] Fix fs/exec.c:788 (de_thread()) BUG_ON
It turns out that the BUG_ON() in fs/exec.c: de_thread() is unreliable
and can trigger due to the test itself being racy.

de_thread() does
 	while (atomic_read(&sig->count) > count) {
	}
	.....
	.....
	BUG_ON(!thread_group_empty(current));

but release_task does
	write_lock_irq(&tasklist_lock)
	__exit_signal
		(this is where atomic_dec(&sig->count) is run)
	__exit_sighand
	__unhash_process
		takes write lock on tasklist_lock
		remove itself out of PIDTYPE_TGID list
	write_unlock_irq(&tasklist_lock)

so there's a clear (although small) window between the
atomic_dec(&sig->count) and the actual PIDTYPE_TGID unhashing of the
thread.

And actually there is no need for all threads to have exited at this
point, so we simply kill the BUG_ON.

Big thanks to Marc Lehmann who provided the test-case.

Fixes Bug 5170 (http://bugme.osdl.org/show_bug.cgi?id=5170)

Signed-off-by: Alexander Nyberg <alexn@telia.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-14 10:26:34 -07:00
Linus Torvalds
5d54e69c68 Merge master.kernel.org:/pub/scm/linux/kernel/git/dwmw2/audit-2.6 2005-09-13 09:47:30 -07:00
Neil Brown
73aea4ecd3 [PATCH] nfsd4: fix setclientid unlock of unlocked state lock
We could try to unlock the state lock here without having first locked it.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:32 -07:00
Neil Brown
b59e3c0e17 [PATCH] nfsd4: fix open seqid incrementing in lock
In the case of a lock which introduces a new lockowner, the openowner's
sequence id should be incremented, even when the operation fails, if the
error is a sequence-id-mutating error.  The current code fails to do that
in some cases.  Fix this by using the same sequence-id-incrementing
mechanism that all other such operations use.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:32 -07:00
Neil Brown
f2327d9adb [PATCH] nfsd4: move replay_owner
It seems more natural to move the setting of the replay_owner into the
relevant procedure instead of doing it in nfsv4_proc_compound.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:31 -07:00
Neil Brown
849823c52d [PATCH] nfsd4: printk reduction
Demote some printk's that look like they could be triggered by non-buggy
clients to dprintk's.  (For example, stale clientid's are normal
occurrences on reboot, and on a server with a lot of clients these messages
could become annoying.)

Also remove some redundant dprintk's (e.g. no need for both STALE_CLIENTID
and its callers to do dprintks).

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:31 -07:00
Chris Mason
9f03783ce5 [PATCH] reiserfs: use mark_inode_dirty instead of reiserfs_update_sd
reiserfs should use mark_inode_dirty during reiserfs_file_write and
reiserfs_commit_write.  This makes sure the inode is properly flagged as
dirty, which is used during O_SYNC to decide when to trigger log commits.

This patch also removes the O_SYNC check from reiserfs_commit_write, since
that gets dealt with properly at higher layers once we start using
mark_inode_dirty.

Thanks to Hifumi Hisashi <hifumi.hisashi@lab.ntt.co.jp> for catching this.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:29 -07:00
Peter Staubach
a1a5b3d93c [PATCH] open returns ENFILE but creates file anyway
When open(O_CREAT) is called and the error, ENFILE, is returned, the file
may be created anyway.  This is counter intuitive, against the SUS V3
specification, and may cause applications to misbehave if they are not
coded correctly to handle this semantic.  The SUS V3 specification
explicitly states "No files shall be created or modified if the function
returns -1.".

The error, ENFILE, is used to indicate the system wide open file table is
full and no more file structs can be allocated.

This is due to an ordering problem.  The entry in the directory is created
before the file struct is allocated.  If the allocation for the file struct
fails, then the system call must return an error, but the directory entry
was already created and can not be safely removed.

The solution to this situation is relatively easy.  The file struct should
be allocated before the directory entry is created.  If the allocation
fails, then the error can be returned directly.  If the creation of the
directory entry fails, then the file struct can be easily freed.

Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:28 -07:00
Anton Altaparmakov
89ecf38c7a NTFS: Mask out __GFP_HIGHMEM when doing kmalloc() in __ntfs_malloc() as it
otherwise causes a BUG().

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-12 15:43:03 +01:00
Anton Altaparmakov
5d46770f5f NTFS: Change the mount options {u,f,d}mask to always parse the number as
an octal number to conform to how chmod(1) works, too.  Thanks to
      Giuseppe Bilotta and Horst von Brand for pointing out the errors of
      my ways.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-12 14:33:47 +01:00
Linus Torvalds
32983696a4 Merge branch 'for-linus' from kernel.org:/.../shaggy/jfs-2.6 manually
Clash due to new delete_inode behavior (the filesystem now needs to do
the truncate_inode_pages() call itself).

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-11 10:14:54 -07:00
Nishanth Aravamudan
041e0e3b19 [PATCH] fs: fix-up schedule_timeout() usage
Use schedule_timeout_{,un}interruptible() instead of
set_current_state()/schedule_timeout() to reduce kernel size.  Also use helper
functions to convert between human time units and jiffies rather than constant
HZ division to avoid rounding errors.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:36 -07:00
Adrian Bunk
e711700a0e [PATCH] fs/cramfs/uncompress.c should #include <linux/cramfs_fs.h>
Every file should #include the header with the prototypes of the global
functions it is offering.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:35 -07:00
James Lamanna
ea0e0a4f53 [PATCH] janitor: reiserfs: super.c - vfree() checking cleanups
super.c vfree() checking cleanups.

Signed-off by: James Lamanna <jlamanna@gmail.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:34 -07:00
Domen Puncer
0cdca3f980 [PATCH] janitor: fs/dcache.c: list_for_each*
First one is list_for_each_entry (thanks maks), second 2 list_for_each_safe.

Signed-off-by: Maximilian Attems <janitor@sternwelten.at>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:32 -07:00
Domen Puncer
fdadd65fbc [PATCH] janitor: fs/namespace.c: list_for_each_entry
Make code more readable with list_for_each_entry.

Signed-off-by: Maximilian Attems <janitor@sternwelten.at>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:32 -07:00
Domen Puncer
216d81bb35 [PATCH] janitor: jffs/intrep: list_for_each_entry
Use list_for_each_entry to make code more readable.

Signed-off-by: Maximilian Attems <janitor@sternwelten.at>
Signed-off-by: Domen Puncer <domen@coderock.org>
Cc: <jffs-dev@axis.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:32 -07:00
Ingo Molnar
d79fc0fc66 [PATCH] sched: TASK_NONINTERACTIVE
This patch implements a task state bit (TASK_NONINTERACTIVE), which can be
used by blocking points to mark the task's wait as "non-interactive".  This
does not mean the task will be considered a CPU-hog - the wait will simply
not have an effect on the waiting task's priority - positive or negative
alike.  Right now only pipe_wait() will make use of it, because it's a
common source of not-so-interactive waits (kernel compilation jobs, etc.).

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:22 -07:00
Ingo Molnar
fb1c8f93d8 [PATCH] spinlock consolidation
This patch (written by me and also containing many suggestions of Arjan van
de Ven) does a major cleanup of the spinlock code.  It does the following
things:

 - consolidates and enhances the spinlock/rwlock debugging code

 - simplifies the asm/spinlock.h files

 - encapsulates the raw spinlock type and moves generic spinlock
   features (such as ->break_lock) into the generic code.

 - cleans up the spinlock code hierarchy to get rid of the spaghetti.

Most notably there's now only a single variant of the debugging code,
located in lib/spinlock_debug.c.  (previously we had one SMP debugging
variant per architecture, plus a separate generic one for UP builds)

Also, i've enhanced the rwlock debugging facility, it will now track
write-owners.  There is new spinlock-owner/CPU-tracking on SMP builds too.
All locks have lockup detection now, which will work for both soft and hard
spin/rwlock lockups.

The arch-level include files now only contain the minimally necessary
subset of the spinlock code - all the rest that can be generalized now
lives in the generic headers:

 include/asm-i386/spinlock_types.h       |   16
 include/asm-x86_64/spinlock_types.h     |   16

I have also split up the various spinlock variants into separate files,
making it easier to see which does what. The new layout is:

   SMP                         |  UP
   ----------------------------|-----------------------------------
   asm/spinlock_types_smp.h    |  linux/spinlock_types_up.h
   linux/spinlock_types.h      |  linux/spinlock_types.h
   asm/spinlock_smp.h          |  linux/spinlock_up.h
   linux/spinlock_api_smp.h    |  linux/spinlock_api_up.h
   linux/spinlock.h            |  linux/spinlock.h

/*
 * here's the role of the various spinlock/rwlock related include files:
 *
 * on SMP builds:
 *
 *  asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the
 *                        initializers
 *
 *  linux/spinlock_types.h:
 *                        defines the generic type and initializers
 *
 *  asm/spinlock.h:       contains the __raw_spin_*()/etc. lowlevel
 *                        implementations, mostly inline assembly code
 *
 *   (also included on UP-debug builds:)
 *
 *  linux/spinlock_api_smp.h:
 *                        contains the prototypes for the _spin_*() APIs.
 *
 *  linux/spinlock.h:     builds the final spin_*() APIs.
 *
 * on UP builds:
 *
 *  linux/spinlock_type_up.h:
 *                        contains the generic, simplified UP spinlock type.
 *                        (which is an empty structure on non-debug builds)
 *
 *  linux/spinlock_types.h:
 *                        defines the generic type and initializers
 *
 *  linux/spinlock_up.h:
 *                        contains the __raw_spin_*()/etc. version of UP
 *                        builds. (which are NOPs on non-debug, non-preempt
 *                        builds)
 *
 *   (included on UP-non-debug builds:)
 *
 *  linux/spinlock_api_up.h:
 *                        builds the _spin_*() APIs.
 *
 *  linux/spinlock.h:     builds the final spin_*() APIs.
 */

All SMP and UP architectures are converted by this patch.

arm, i386, ia64, ppc, ppc64, s390/s390x, x64 was build-tested via
crosscompilers.  m32r, mips, sh, sparc, have not been tested yet, but should
be mostly fine.

From: Grant Grundler <grundler@parisc-linux.org>

  Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU).
  Builds 32-bit SMP kernel (not booted or tested).  I did not try to build
  non-SMP kernels.  That should be trivial to fix up later if necessary.

  I converted bit ops atomic_hash lock to raw_spinlock_t.  Doing so avoids
  some ugly nesting of linux/*.h and asm/*.h files.  Those particular locks
  are well tested and contained entirely inside arch specific code.  I do NOT
  expect any new issues to arise with them.

 If someone does ever need to use debug/metrics with them, then they will
  need to unravel this hairball between spinlocks, atomic ops, and bit ops
  that exist only because parisc has exactly one atomic instruction: LDCW
  (load and clear word).

From: "Luck, Tony" <tony.luck@intel.com>

   ia64 fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjanv@infradead.org>
Signed-off-by: Grant Grundler <grundler@parisc-linux.org>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Mikael Pettersson <mikpe@csd.uu.se>
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:21 -07:00
Andrew Morton
b4012a9895 [PATCH] ntfs build fix
*** Warning: "bit_spin_lock" [fs/ntfs/ntfs.ko] undefined!
*** Warning: "bit_spin_unlock" [fs/ntfs/ntfs.ko] undefined!

Cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:20 -07:00
Linus Torvalds
ac5b8b6f22 Preempt-safe RCU file usage
Fix up fs/compat.c fixes.
2005-09-09 15:42:34 -07:00
Linus Torvalds
a4531edd75 Fix up lost patch in compat_sys_select() for new RCU files world order
Andrew lost this in patch reject resolution, and never noticed, since
the compat code isn't in use on x86.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 15:10:52 -07:00
Kirill Korotaev
d99901d6fd [PATCH] Lost sockfd_put() in routing_ioctl()
This patch adds lost sockfd_put() in 32bit compat rounting_ioctl() on
64bit platforms

Signed-Off-By: Kirill Korotaev <dev@sw.ru>
Signed-Off-By: Maxim Giryaev <gem@sw.ru>
Signed-off-By: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:24:05 -07:00
Ingo Molnar
a9f6a0dd54 [PATCH] more SPIN_LOCK_UNLOCKED -> DEFINE_SPINLOCK conversions
This converts the final 20 DEFINE_SPINLOCK holdouts.  (another 580 places
are already using DEFINE_SPINLOCK).  Build tested on x86.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:48 -07:00
Miklos Szeredi
7c352bdf04 [PATCH] FUSE: don't allow restarting of system calls
This patch removes ability to interrupt and restart operations while there
hasn't been any side-effect.

The reason: applications.  There are some apps it seems that generate
signals at a fast rate.  This means, that if the operation cannot make
enough progress between two signals, it will be restarted for ever.  This
bug actually manifested itself with 'krusader' trying to open a file for
writing under sshfs.  Thanks to Eduard Czimbalmos for the report.

The problem can be solved just by making open() uninterruptible, because in
this case it was the truncate operation that slowed down the progress.  But
it's better to solve this by simply not allowing interrupts at all (except
SIGKILL), because applications don't expect file operations to be
interruptible anyway.  As an added bonus the code is simplified somewhat.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:48 -07:00
Miklos Szeredi
8254798199 [PATCH] FUSE: add fsync operation for directories
This patch adds a new FSYNCDIR request, which is sent when fsync is called
on directories.  This operation is available in libfuse 2.3-pre1 or
greater.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:47 -07:00
Miklos Szeredi
b36c31ba95 [PATCH] fuse: don't update file times
Don't change mtime/ctime/atime to local time on read/write.  Rather invalidate
file attributes, so next stat() will force a GETATTR call.  Bug reported by
Ben Grimm.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:47 -07:00
Miklos Szeredi
45323fb764 [PATCH] fuse: more flexible caching
Make data caching behavior selectable on a per-open basis instead of
per-mount.  Compatibility for the old mount options 'kernel_cache' and
'direct_io' is retained in the userspace library (version 2.4.0-pre1 or
later).

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:47 -07:00
Miklos Szeredi
04730fef1f [PATCH] fuse: transfer readdir data through device
This patch removes a long lasting "hack" in FUSE, which used a separate
channel (a file descriptor refering to a disk-file) to transfer directory
contents from userspace to the kernel.

The patch adds three new operations (OPENDIR, READDIR, RELEASEDIR), which
have semantics and implementation exactly maching the respective file
operations (OPEN, READ, RELEASE).

This simplifies the directory reading code.  Also disk space is not
necessary, which can be important in embedded systems.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:47 -07:00
Miklos Szeredi
413ef8cb30 [PATCH] FUSE - direct I/O
This patch adds support for the "direct_io" mount option of FUSE.

When this mount option is specified, the page cache is bypassed for
read and write operations.  This is useful for example, if the
filesystem doesn't know the size of files before reading them, or when
any kind of caching is harmful.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:46 -07:00
Miklos Szeredi
5a53368277 [PATCH] fuse: stricter mount option checking
Check for the presence of all mandatory mount options.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:46 -07:00
Miklos Szeredi
87729a5514 [PATCH] FUSE: tighten check for processes allowed access
This patch tightens the check for allowing processes to access non-privileged
mounts.  The rational is that the filesystem implementation can control the
behavior or get otherwise unavailable information of the filesystem user.  If
the filesystem user process has the same uid, gid, and is not suid or sgid
application, then access is safe.  Otherwise access is not allowed unless the
"allow_other" mount option is given (for which policy is controlled by the
userspace mount utility).

Thanks to everyone linux-fsdevel, especially Martin Mares who helped uncover
problems with the previous approach.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:46 -07:00
Miklos Szeredi
db50b96c0f [PATCH] FUSE - readpages operation
This patch adds readpages support to FUSE.

With the help of the readpages() operation multiple reads are bundled
together and sent as a single request to userspace.  This can improve
reading performace.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:46 -07:00
Miklos Szeredi
92a8780e11 [PATCH] FUSE - extended attribute operations
This patch adds the extended attribute operations to FUSE.

The following operations are added:

 o getxattr
 o setxattr
 o listxattr
 o removexattr

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:45 -07:00
Miklos Szeredi
1e9a4ed939 [PATCH] FUSE - mount options
This patch adds miscellaneous mount options to the FUSE filesystem.

The following mount options are added:

 o default_permissions:  check permissions with generic_permission()
 o allow_other:          allow other users to access files
 o allow_root:           allow root to access files
 o kernel_cache:         don't invalidate page cache on open

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:45 -07:00
Miklos Szeredi
b6aeadeda2 [PATCH] FUSE - file operations
This patch adds the file operations of FUSE.

The following operations are added:

 o open
 o flush
 o release
 o fsync
 o readpage
 o commit_write

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:45 -07:00
Miklos Szeredi
9e6268db49 [PATCH] FUSE - read-write operations
This patch adds the write filesystem operations of FUSE.

The following operations are added:

 o setattr
 o symlink
 o mknod
 o mkdir
 o create
 o unlink
 o rmdir
 o rename
 o link

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:45 -07:00
Miklos Szeredi
e5e5558e92 [PATCH] FUSE - read-only operations
This patch adds the read-only filesystem operations of FUSE.

This contains the following files:

 o dir.c
    - directory, symlink and file-inode operations

The following operations are added:

 o lookup
 o getattr
 o readlink
 o follow_link
 o directory open
 o readdir
 o directory release
 o permission
 o dentry revalidate
 o statfs

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:45 -07:00
Miklos Szeredi
334f485df8 [PATCH] FUSE - device functions
This adds the FUSE device handling functions.

This contains the following files:

 o dev.c
    - fuse device operations (read, write, release, poll)
    - registers misc device
    - support for sending requests to userspace

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:44 -07:00
Miklos Szeredi
d8a5ba4545 [PATCH] FUSE - core
This patch adds FUSE core.

This contains the following files:

 o inode.c
    - superblock operations (alloc_inode, destroy_inode, read_inode,
      clear_inode, put_super, show_options)
    - registers FUSE filesystem

 o fuse_i.h
    - private header file

Requirements
============

 The most important difference between orinary filesystems and FUSE is
 the fact, that the filesystem data/metadata is provided by a userspace
 process run with the privileges of the mount "owner" instead of the
 kernel, or some remote entity usually running with elevated
 privileges.

 The security implication of this is that a non-privileged user must
 not be able to use this capability to compromise the system.  Obvious
 requirements arising from this are:

  - mount owner should not be able to get elevated privileges with the
    help of the mounted filesystem

  - mount owner should not be able to induce undesired behavior in
    other users' or the super user's processes

  - mount owner should not get illegitimate access to information from
    other users' and the super user's processes

 These are currently ensured with the following constraints:

  1) mount is only allowed to directory or file which the mount owner
    can modify without limitation (write access + no sticky bit for
    directories)

  2) nosuid,nodev mount options are forced

  3) any process running with fsuid different from the owner is denied
     all access to the filesystem

 1) and 2) are ensured by the "fusermount" mount utility which is a
    setuid root application doing the actual mount operation.

 3) is ensured by a check in the permission() method in kernel

 I started thinking about doing 3) in a different way because Christoph
 H. made a big deal out of it, saying that FUSE is unacceptable into
 mainline in this form.

 The suggested use of private namespaces would be OK, but in their
 current form have many limitations that make their use impractical (as
 discussed in this thread).

 Suggested improvements that would address these limitations:

   - implement shared subtrees

   - allow a process to join an existing namespace (make namespaces
     first-class objects)

   - implement the namespace creation/joining in a PAM module

 With all that in place the check of owner against current->fsuid may
 be removed from the FUSE kernel module, without compromising the
 security requirements.

 Suid programs still interesting questions, since they get access even
 to the private namespace causing some information leak (exact
 order/timing of filesystem operations performed), giving some
 ptrace-like capabilities to unprivileged users.  BTW this problem is
 not strictly limited to the namespace approach, since suid programs
 setting fsuid and accessing users' files will succeed with the current
 approach too.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:44 -07:00
Miklos Szeredi
04578f174f [PATCH] FUSE - MAINTAINERS, Kconfig and Makefile changes
This patch adds FUSE filesystem to MAINTAINERS, fs/Kconfig and
fs/Makefile.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:44 -07:00
Eric Van Hensbergen
cb2e87a65d [PATCH] v9fs: fix handling of malformed 9P messages
This patch attempts to do a better job of cleaning up after detecting errors
on the transport.  This should also improve error reporting on broken
connections to servers.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:58 -07:00
Eric Van Hensbergen
b501611a6f [PATCH] v9fs: readlink extended mode check
LANL reported some issues with random crashes during mount of legacy protocol
servers (9P2000 versus 9P2000.u) -- crash was always happening in readlink
(which should never happen in legacy mode).  Added some sanity conditionals to
the get_inode code which should prevent the errors LANL was seeing.  Code
tested benign through regression.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:58 -07:00
Eric Van Hensbergen
5d58bec5b7 [PATCH] v9fs: Fix support for special files (devices, named pipes, etc.)
Fix v9fs special files (block, char devices) support.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:58 -07:00
Eric Van Hensbergen
73c592b9b8 [PATCH] v9fs: Clean-up vfs_inode and setattr functions
Cleanup code in v9fs vfs_inode as suggested by Alexey Dobriyan.  Did some
major revamping of the v9fs setattr code to remove unnecessary allocations and
clean up some dead-code.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:57 -07:00
Eric Van Hensbergen
1346f51ede [PATCH] v9fs: Change error magic numbers to defined constants
Change magic error numbers to system defined constants in v9fs error.h As
suggested by Jan-Benedict Glaw.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:57 -07:00
Eric Van Hensbergen
3ed8491c8a [PATCH] v9fs: debug and support routines
This part of the patch contains debug and other misc routines.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:57 -07:00
Eric Van Hensbergen
322b329ab7 [PATCH] v9fs: Support to force umount
Support for force umount

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:57 -07:00
Eric Van Hensbergen
426cc91aa6 [PATCH] v9fs: transport modules
This part of the patch contains transport routines.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:57 -07:00
Eric Van Hensbergen
b8cf945b31 [PATCH] v9fs: 9P protocol implementation
This part of the patch contains the 9P protocol functions.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:56 -07:00
Eric Van Hensbergen
9e82cf6a80 [PATCH] v9fs: VFS superblock operations and glue
This part of the patch contains VFS superblock and mapping code.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:56 -07:00
Eric Van Hensbergen
2bad847151 [PATCH] v9fs: VFS inode operations
This part of the patch contains the VFS inode interfaces.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:56 -07:00
Eric Van Hensbergen
e69e7fe5b0 [PATCH] v9fs: VFS file, dentry, and directory operations
This part of the patch contains the VFS file, dentry & directory interfaces.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:56 -07:00
Eric Van Hensbergen
93fa58cb83 [PATCH] v9fs: Documentation, Makefiles, Configuration
OVERVIEW

V9FS is a distributed file system for Linux which provides an
implementation of the Plan 9 resource sharing protocol 9P.  It can be
used to share all sorts of resources: static files, synthetic file servers
(such as /proc or /sys), devices, and application file servers (such as
FUSE).

BACKGROUND

Plan 9 (http://plan9.bell-labs.com/plan9) is a research operating
system and associated applications suite developed by the Computing
Science Research Center of AT&T Bell Laboratories (now a part of
Lucent Technologies), the same group that developed UNIX , C, and C++.
Plan 9 was initially released in 1993 to universities, and then made
generally available in 1995. Its core operating systems code laid the
foundation for the Inferno Operating System released as a product by
Lucent Bell-Labs in 1997. The Inferno venture was the only commercial
embodiment of Plan 9 and is currently maintained as a product by Vita
Nuova (http://www.vitanuova.com). After updated releases in 2000 and
2002, Plan 9 was open-sourced under the OSI approved Lucent Public
License in 2003.

The Plan 9 project was started by Ken Thompson and Rob Pike in 1985.
Their intent was to explore potential solutions to some of the
shortcomings of UNIX in the face of the widespread use of high-speed
networks to connect machines. In UNIX, networking was an afterthought
and UNIX clusters became little more than a network of stand-alone
systems. Plan 9 was designed from first principles as a seamless
distributed system with integrated secure network resource sharing.
Applications and services were architected in such a way as to allow
for implicit distribution across a cluster of systems. Configuring an
environment to use remote application components or services in place
of their local equivalent could be achieved with a few simple command
line instructions. For the most part, application implementations
operated independent of the location of their actual resources.

Commercial operating systems haven't changed much in the 20 years
since Plan 9 was conceived. Network and distributed systems support is
provided by a patchwork of middle-ware, with an endless number of
packages supplying pieces of the puzzle. Matters are complicated by
the use of different complicated protocols for individual services,
and separate implementations for kernel and application resources.
The V9FS project (http://v9fs.sourceforge.net) is an attempt to bring
Plan 9's unified approach to resource sharing to Linux and other
operating systems via support for the 9P2000 resource sharing
protocol.

V9FS HISTORY

V9FS was originally developed by Ron Minnich and Maya Gokhale at Los
Alamos National Labs (LANL) in 1997.  In November of 2001, Greg Watson
setup a SourceForge project as a public repository for the code which
supported the Linux 2.4 kernel.

About a year ago, I picked up the initial attempt Ron Minnich had
made to provide 2.6 support and got the code integrated into a 2.6.5
kernel.   I then went through a line-for-line re-write attempting to
clean-up the code while more closely following the Linux Kernel style
guidelines.  I co-authored a paper with Ron Minnich on the V9FS Linux
support including performance comparisons to NFSv3 using Bonnie and
PostMark - this paper appeared at the USENIX/FREENIX 2005
conference in April 2005:
( http://www.usenix.org/events/usenix05/tech/freenix/hensbergen.html ).

CALL FOR PARTICIPATION/REQUEST FOR COMMENTS

Our 2.6 kernel support is stabilizing and we'd like to begin pursuing
its integration into the official kernel tree.  We would appreciate any
review, comments, critiques, and additions from this community and are
actively seeking people to join our project and help us produce
something that would be acceptable and useful to the Linux community.

STATUS

The code is reasonably stable, although there are no doubt corner cases
our regression tests haven't discovered yet.  It is in regular use by several
of the developers and has been tested on x86 and PowerPC
(32-bit and 64-bit) in both small and large (LANL cluster) deployments.
Our current regression tests include fsx, bonnie, and postmark.

It was our intention to keep things as simple as possible for this
release -- trying to focus on correctness within the core of the
protocol support versus a rich set of features.  For example: a more
complete security model and cache layer are in the road map, but
excluded from this release.   Additionally, we have removed support for
mmap operations at Al Viro's request.

PERFORMANCE

Detailed performance numbers and analysis are included in the FREENIX
paper, but we show comparable performance to NFSv3 for large file
operations based on the Bonnie benchmark, and superior performance for
many small file operations based on the PostMark benchmark.   Somewhat
preliminary graphs (from the FREENIX paper) are available
(http://v9fs.sourceforge.net/perf/index.html).

RESOURCES

The source code is available in a few different forms:

tarballs: http://v9fs.sf.net
CVSweb: http://cvs.sourceforge.net/viewcvs.py/v9fs/linux-9p/
CVS: :pserver:anonymous@cvs.sourceforge.net:/cvsroot/v9fs/linux-9p
Git: rsync://v9fs.graverobber.org/v9fs (webgit: http://v9fs.graverobber.org)
9P: tcp!v9fs.graverobber.org!6564

The user-level server is available from either the Plan 9 distribution
or from http://v9fs.sf.net
Other support applications are still being developed, but preliminary
version can be downloaded from sourceforge.

Documentation on the protocol has historically been the Plan 9 Man
pages (http://plan9.bell-labs.com/sys/man/5/INDEX.html), but there is
an effort under way to write a more complete Internet-Draft style
specification (http://v9fs.sf.net/rfc).

There are a couple of mailing lists supporting v9fs, but the most used
is v9fs-developer@lists.sourceforge.net -- please direct/cc your
comments there so the other v9fs contibutors can participate in the
conversation.  There is also an IRC channel: irc://freenode.net/#v9fs

This part of the patch contains Documentation, Makefiles, and configuration
file changes.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:56 -07:00
Dipankar Sarma
b835996f62 [PATCH] files: lock-free fd look-up
With the use of RCU in files structure, the look-up of files using fds can now
be lock-free.  The lookup is protected by rcu_read_lock()/rcu_read_unlock().
This patch changes the readers to use lock-free lookup.

Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Ravikiran Thirumalai <kiran_th@gmail.com>
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:55 -07:00
Dipankar Sarma
ab2af1f500 [PATCH] files: files struct with RCU
Patch to eliminate struct files_struct.file_lock spinlock on the reader side
and use rcu refcounting rcuref_xxx api for the f_count refcounter.  The
updates to the fdtable are done by allocating a new fdtable structure and
setting files->fdt to point to the new structure.  The fdtable structure is
protected by RCU thereby allowing lock-free lookup.  For fd arrays/sets that
are vmalloced, we use keventd to free them since RCU callbacks can't sleep.  A
global list of fdtable to be freed is not scalable, so we use a per-cpu list.
If keventd is already handling the current cpu's work, we use a timer to defer
queueing of that work.

Since the last publication, this patch has been re-written to avoid using
explicit memory barriers and use rcu_assign_pointer(), rcu_dereference()
premitives instead.  This required that the fd information is kept in a
separate structure (fdtable) and updated atomically.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:55 -07:00
Dipankar Sarma
badf16621c [PATCH] files: break up files struct
In order for the RCU to work, the file table array, sets and their sizes must
be updated atomically.  Instead of ensuring this through too many memory
barriers, we put the arrays and their sizes in a separate structure.  This
patch takes the first step of putting the file table elements in a separate
structure fdtable that is embedded withing files_struct.  It also changes all
the users to refer to the file table using files_fdtable() macro.  Subsequent
applciation of RCU becomes easier after this.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:55 -07:00
Benjamin LaHaise
ac0b1bc1ed [PATCH] aio: kiocb locking to serialise retry and cancel
Implement a per-kiocb lock to serialise retry operations and cancel.  This
is done using wait_on_bit_lock() on the KIF_LOCKED bit of kiocb->ki_flags.
Also, make the cancellation path lock the kiocb and subsequently release
all references to it if the cancel was successful.  This version includes a
fix for the deadlock with __aio_run_iocbs.

Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:32 -07:00
Wendy Cheng
8f58202bf6 [PATCH] change io_cancel return code for no cancel case
Note that other than few exceptions, most of the current filesystem and/or
drivers do not have aio cancel specifically defined (kiob->ki_cancel field
is mostly NULL).  However, sys_io_cancel system call universally sets
return code to -EAGAIN.  This gives applications a wrong impression that
this call is implemented but just never works.  We have customer inquires
about this issue.

Changed by Benjamin LaHaise to EINVAL instead of ENOSYS

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:32 -07:00
Andrew Stribblehill
fac92becda [PATCH] bfs: fix endianness, signedness; add trivial bugfix
* Makes BFS code endianness-clean.

* Fixes some signedness warnings.

* Fixes a problem in fs/bfs/inode.c:164 where inodes not synced to disk
  don't get fully marked as clean.  Here's how to reproduce it:

# mount -o loop -t bfs /bfs.img /mnt
# df -i /mnt
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/bfs.img                  48       1      47    3% /mnt
# df -k /mnt
Filesystem           1K-blocks      Used Available Use% Mounted on
/bfs.img                   512         5       508   1% /mnt
# cp 60k-archive.zip /mnt/mt.zip
# df -k /mnt
Filesystem           1K-blocks      Used Available Use% Mounted on
/bfs.img                   512        65       447  13% /mnt
# df -i /mnt
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/bfs.img                  48       2      46    5% /mnt
# rm /mnt/mt.zip
# echo $?
0

 [If the unlink happens before the buffers flush, the following happens:]

# df -i /mnt
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/bfs.img                  48       2      46    5% /mnt
# df -k /mnt
Filesystem           1K-blocks      Used Available Use% Mounted on
/bfs.img                   512        65       447  13% /mnt

 fs/bfs/bfs.h           |    1

Signed-off-by: Andrew Stribblehill <ads@wompom.org>
Cc: <tigran@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:32 -07:00
Alexander Krizhanovsky
f76baf9365 [PATCH] autofs: fix "busy inodes after umount..."
This patch for old autofs (version 3) cleans dentries which are not putted
after killing the automount daemon (it's analogue of recent patch for
autofs4).

Signed-off-by: Alexander Krizhanovsky <klx@yandex.ru>
Cc: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:31 -07:00
Stephen Smalley
e31e14ec35 [PATCH] remove the inode_post_link and inode_post_rename LSM hooks
This patch removes the inode_post_link and inode_post_rename LSM hooks as
they are unused (and likely useless).

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:28 -07:00
Stephen Smalley
a74574aafe [PATCH] Remove security_inode_post_create/mkdir/symlink/mknod hooks
This patch removes the inode_post_create/mkdir/mknod/symlink LSM hooks as
they are obsoleted by the new inode_init_security hook that enables atomic
inode security labeling.

If anyone sees any reason to retain these hooks, please speak now.  Also,
is anyone using the post_rename/link hooks; if not, those could also be
removed.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:28 -07:00
Stephen Smalley
ac50960afa [PATCH] ext3: Enable atomic inode security labeling
This patch modifies ext3 to call the inode_init_security LSM hook to obtain
the security attribute for a newly created inode and to set the resulting
attribute on the new inode as part of the same transaction.  This parallels
the existing processing for setting ACLs on newly created inodes.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:28 -07:00
Stephen Smalley
10f47e6a1b [PATCH] ext2: Enable atomic inode security labeling
This patch modifies ext2 to call the inode_init_security LSM hook to obtain
the security attribute for a newly created inode and to set the resulting
attribute on the new inode.  This parallels the existing processing for
setting ACLs on newly created inodes.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:27 -07:00
Mark Fasheh
fef266580e [PATCH] update filesystems for new delete_inode behavior
Update the file systems in fs/ implementing a delete_inode() callback to
call truncate_inode_pages().  One implementation note: In developing this
patch I put the calls to truncate_inode_pages() at the very top of those
filesystems delete_inode() callbacks in order to retain the previous
behavior.  I'm guessing that some of those could probably be optimized.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:27 -07:00
Mark Fasheh
e85b565233 [PATCH] move truncate_inode_pages() into ->delete_inode()
Allow file systems supporting ->delete_inode() to call
truncate_inode_pages() on their own.  OCFS2 wants this so it can query the
cluster before making a final decision on whether to wipe an inode from
disk or not.  In some corner cases an inode marked on the local node via
voting may not actually get orphaned.  A good example is node death before
the transaction moving the inode to the orphan dir commits to the journal.
Without this patch, the truncate_inode_pages() call in
generic_delete_inode() would discard valid data for such inodes.

During earlier discussion in the 2.6.13 merge plan thread, Christoph
Hellwig indicated that other file systems might also find this useful.

IMHO, the best solution would be to just allow ->drop_inode() to do the
cluster query but it seems that would require a substantial reworking of
that section of the code.  Assuming it is safe to call write_inode_now() in
ocfs2_delete_inode() for those inodes which won't actually get wiped, this
solution should get us by for now.

Trivial testing of this patch (and a related OCFS2 update) has shown this
to avoid the corruption I'm seeing.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:26 -07:00
viro@ZenIV.linux.org.uk
3f70353ea9 [PATCH] bogus cast in bio.c
<qualifier> void * is not the same as void <qualifier> *...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 10:31:58 -07:00
Anton Altaparmakov
ddcc959634 Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 2005-09-09 08:19:10 -07:00
Nathan Scott
c9fc0d6a69 [XFS] Revert recent quota Makefile change, not in a fit state for merging.
Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-09-09 11:38:09 +10:00
Anton Altaparmakov
223176bc72 Merge branch 'master' of /usr/src/linux-2.6 2005-09-08 23:03:30 +01:00
Anton Altaparmakov
7d333d6c73 NTFS: 2.1.24 release and some minor final fixes.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 23:01:16 +01:00
Anton Altaparmakov
e604635c8b NTFS: Improve scalability by changing the driver global spin lock in
fs/ntfs/aops.c::ntfs_end_buffer_async_read() to a bit spin lock
      in the first buffer head of a page.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 22:13:02 +01:00
Anton Altaparmakov
a01ac532b5 NTFS: Fix page_has_buffers()/page_buffers() handling in fs/ntfs/aops.c.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 22:08:11 +01:00
Anton Altaparmakov
311120eca0 NTFS: Fixup handling of sparse, compressed, and encrypted attributes in
fs/ntfs/aops.c::ntfs_readpage().

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 22:04:20 +01:00
Anton Altaparmakov
8273d5d4c2 NTFS: Fix fs/ntfs/aops.c::ntfs_{read,write}_block() to handle the case
where a concurrent truncate has truncated the runlist under our feet.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 22:00:33 +01:00
Anton Altaparmakov
54b02eb01c NTFS: Optimize fs/ntfs/aops.c::ntfs_write_block() by extending the page
lock protection over the buffer submission for i/o which allows the
      removal of the get_bh()/put_bh() pairs for each buffer.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 21:43:47 +01:00
Anton Altaparmakov
bd45fdd209 NTFS: Fixup handling of sparse, compressed, and encrypted attributes in
fs/ntfs/aops.c::ntfs_writepage().

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 21:38:05 +01:00
Anton Altaparmakov
8dcdebafb8 NTFS: Make ntfs_write_block() not instantiate sparse blocks if they are zero.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 21:25:48 +01:00
Anton Altaparmakov
67bb103725 NTFS: Fixup handling of sparse, compressed, and encrypted attributes in
fs/ntfs/inode.c::ntfs_read_locked_{,attr_,index_}inode().

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 21:19:45 +01:00
Anton Altaparmakov
1c7d469d47 NTFS: Truncate {a,c,m}time to the ntfs supported time granularity when
updating the times in the inode in ntfs_setattr().

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 21:15:09 +01:00
Anton Altaparmakov
bbf1813fb8 NTFS: Fix cluster (de)allocators to work when the runlist is NULL and more
importantly to take a locked runlist rather than them locking it
      which leads to lock reversal.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 21:09:06 +01:00
Anton Altaparmakov
807c453de7 NTFS: Fix handling of sparse attributes in ntfs_attr_make_non_resident().
Also, add BUG() checks to ntfs_attr_make_non_resident() and
      ntfs_attr_set() to ensure that these functions are never called
      for compressed or encrypted attributes.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 21:01:17 +01:00
Anton Altaparmakov
2983d1bd1a NTFS: Fix several bugs in fs/ntfs/attrib.c.
- Fix a bug in ntfs_map_runlist_nolock() where we forgot to protect
  access to the allocated size in the ntfs inode with the size lock.
- Fix ntfs_attr_vcn_to_lcn_nolock() and ntfs_attr_find_vcn_nolock() to
  return LCN_ENOENT when there is no runlist and the allocated size is
  zero.
- Fix load_attribute_list() to handle the case of a NULL runlist.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 20:56:09 +01:00
Anton Altaparmakov
0aacceacf3 NTFS: Add fs/ntfs/attrib.[hc]::ntfs_resident_attr_value_resize().
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 20:40:32 +01:00
Anton Altaparmakov
f25dfb5e44 NTFS: Remove bogus setting of PageError in ntfs_read_compressed_block().
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 20:35:33 +01:00
Anton Altaparmakov
8e08ceaeac NTFS: Fix a bug in fs/ntfs/index.c::ntfs_index_lookup(). When the returned
index entry is in the index root, we forgot to set the @ir pointer in
      the index context.  Thanks for Yura Pakhuchiy for finding this bug.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 20:29:50 +01:00
Anton Altaparmakov
6e48321a40 NTFS: Add ntfs_rl_punch_nolock() which punches a caller specified hole into a runlist.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 20:26:34 +01:00
Anton Altaparmakov
3ffc5a4438 NTFS: Change ntfs_rl_truncate_nolock() to throw away the runlist if the new
length is zero.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 20:23:06 +01:00
Anton Altaparmakov
f94ad38e68 NTFS: Report unrepresentable inodes during ntfs_readdir() as KERN_WARNING
messages and include the inode number.  Thanks to Yura Pakhuchiy for
      pointing this out.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 17:04:11 +01:00
Anton Altaparmakov
2b0ada2b8e NTFS: Fix handling of valid but empty mapping pairs array in
fs/ntfs/runlist.c::ntfs_mapping_pairs_decompress().

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 16:52:31 +01:00
Anton Altaparmakov
8bb735216a NTFS: Remove two bogus BUG_ON()s from fs/ntfs/mft.c.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 16:48:28 +01:00
Anton Altaparmakov
84d6ebe63f NTFS: Fix two nasty runlist merging bugs that had gone unnoticed so far.
Thanks to Stefano Picerno for the bug report.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 16:46:55 +01:00
Anton Altaparmakov
9529d461d0 NTFS: Use ntfs_malloc_nofs_nofail() in runlist.c::ntfs_runlists_merge()
in the two critical regions.  This means we no longer need to
      panic() when the allocation fails as it now cannot fail.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 16:33:12 +01:00
Anton Altaparmakov
06d0e3cf3d NTFS: Allow highmem kmalloc() in ntfs_malloc_nofs() and add _nofail() version.
- Modify fs/ntfs/malloc.h::ntfs_malloc_nofs() to do the kmalloc() based
  allocations with __GFP_HIGHMEM, analogous to how the vmalloc() based
  allocations are done.
- Add fs/ntfs/malloc.h::ntfs_malloc_nofs_nofail() which is analogous to
  ntfs_malloc_nofs() but it performs allocations with __GFP_NOFAIL and
  hence cannot fail.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 16:28:25 +01:00
Anton Altaparmakov
e7a1033b94 NTFS: Support more clean journal ($LogFile) states.
- Support journals ($LogFile) which have been modified by chkdsk.  This
        means users can boot into Windows after we marked the volume dirty.
        The Windows boot will run chkdsk and then reboot.  The user can then
        immediately boot into Linux rather than having to do a full Windows
        boot first before rebooting into Linux and we will recognize such a
        journal and empty it as it is clean by definition.
      - Support journals ($LogFile) with only one restart page as well as
        journals with two different restart pages.  We sanity check both and
        either use the only sane one or the more recent one of the two in the
        case that both are valid.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 16:12:28 +01:00
Nathan Scott
eccdfcd6f8 [XFS] Fix modular XFS builds (Makefile botch).
Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-09-08 15:38:52 +10:00
Nathan Scott
20ba02879b [XFS] Remove special Kconfig XFS menu, make XFS options "inline".
Signed-off-by: Eric Sandeen <sandeen@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-09-08 15:34:58 +10:00
Nathan Scott
f016bad6be [XFS] Cleanup some -Wundef flag warnings in the endian macros (thanks
Christoph).

SGI-PV: 942400
SGI-Modid: xfs-linux-melb:xfs-kern:23771a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-09-08 15:30:05 +10:00
Linus Torvalds
0481990b75 Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-for-linus-2.6 2005-09-07 17:31:27 -07:00
Linus Torvalds
87129d9626 Merge git://oss.sgi.com:8090/oss/git/xfs-2.6 2005-09-07 17:23:52 -07:00
Miklos Szeredi
0bb6fcc13a [PATCH] pivot_root() circular reference fix
Fix http://bugzilla.kernel.org/show_bug.cgi?id=4857

When pivot_root is called from an init script in an initramfs environment,
it causes a circular reference in the mount tree.

The cause of this is that pivot_root() is not prepared to handle pivoting
an unattached mount.  In an initramfs environment, rootfs is the root of
the namespace, and so it is not attached.

This patch fixes this and related problems, by returning -EINVAL if either
the current root or the new root is detached.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Cc: <bigfish@asmallpond.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:58:01 -07:00
Jan Kara
4407c2b6b2 [PATCH] Fix race in do_get_write_access()
attached patch should fix the following race:
     Proc 1                               Proc 2

     __flush_batch()
       ll_rw_block()
                                        do_get_write_access()
					   lock_buffer
                                             jh is only waiting for checkpoint
					     -> b_transaction == NULL ->
					     do nothing
                                           unlock_buffer
    test_set_buffer_locked()
    test_clear_buffer_dirty()
                                           __journal_file_buffer()
                                        change the data
    submit_bh()

and we have sent wrong data to disk...  We now clean the dirty buffer flag
under buffer lock in all cases and hence we know that whenever a buffer is
starting to be journaled we either finish the pending write-out before
attaching a buffer to a transaction or we won't write the buffer until the
transaction is going to be committed.

The test in jbd_unexpected_dirty_buffer() is redundant - remove it.
Furthermore we have to clear the buffer dirty bit under the buffer lock to
prevent races with buffer write-out (and hence prevent returning a buffer with
IO happening).

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:57 -07:00
Jan Kara
e39f07c83b [PATCH] Change HFS+ to not use ll_rw_block()
Use block layer predefined function.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:56 -07:00
Jan Kara
096125f31a [PATCH] Change ll_rw_block() calls in UFS
We need to be sure that current data are sent to disk.  Hence we call
ll_rw_block() with SWRITE.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:56 -07:00
Jan Kara
53778ffde6 [PATCH] Change ll_rw_block() calls in Reiser
We need to be sure that current data in buffer are sent to disk.  Hence we
need to call ll_rw_block() with SWRITE.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:56 -07:00
Jan Kara
26707699b5 [PATCH] Change ll_rw_block() calls in JBD
We must be sure that the current data in buffer are sent to disk.  Hence we
have to call ll_rw_block() with SWRITE.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:55 -07:00
Jan Kara
a766223625 [PATCH] Make ll_rw_block() wait for buffer lock
Introduce new ll_rw_block() operation SWRITE meaning that block layer should
wait for the buffer lock and write-out afterwards.  Hence data in buffers at
the time of call are guaranteed to be submitted to the disk.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:55 -07:00
Jan Kara
e6c9f5c188 [PATCH] Fix JBD race in t_forget list handling
Fix race between journal_commit_transaction() and other places as
journal_unmap_buffer() that are adding buffers to transaction's t_forget list.
 We have to protect against such places by holding j_list_lock even when
traversing the t_forget list.  The fact that other places can only add buffers
to the list makes the locking easier.  OTOH the lock ranking complicates the
stuff...

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:54 -07:00
Mark Fasheh
cbf0d27a13 [PATCH] kjournald: missing JFS_UNMOUNT check
It seems that kjournald() may be missing a check of the JFS_UNMOUNT flag
before calling schedule().  This showed up in testing of OCFS2 recovery
where our recovery thread would hang in journal_kill_thread() called from
journal_destroy() because kjournald never got a chance to read the flag to
shut down before the schedule().

Zach pointed out the missing check which led me to hack up this trivial
patch.  It's been tested many times now and I have yet to reproduce the
hang, which was happening very regularly before.

<mild rant>
I'm guessing that we could really use some wait_event() calls with helper
functions in, well, most of jbd these days which would make a ton of the
wait code there vastly cleaner.
</mild rant>

As for why this doesn't happen in ext3 (or OCFS2 during normal
mount/unmount of the local nodes journal), I think it may that the specific
timing of events in the ocfs2 recovery thread exposes a race there.
Because ocfs2_replay_journal() is only interested in playing back the
journal, initialization and shutdown happen very quicky with no other
metadata put into that specific journal.

Acked-by: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:54 -07:00
Roman Zippel
328b922786 [PATCH] hfs: NLS support
This adds NLS support to HFS.  Using the kernel options iocharset and codepage
it's possible to map the disk encoding to a local mapping.  If these options
are not used, it falls back to the old direct mapping.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:50 -07:00
Roman Zippel
717dd80e99 [PATCH] hfs: show_options support
This adds support for show_options.  It also fixes some namespace polution in
the hfsplus driver.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:50 -07:00
Roman Zippel
a5e3985fa0 [PATCH] hfs: remove debug code
This removes some old debug code, which is no longer needed.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:50 -07:00
Pekka Enberg
e915fc497a [PATCH] fs: convert kcalloc to kzalloc
This patch converts kcalloc(1, ...) calls to use the new kzalloc() function.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:46 -07:00
Miklos Szeredi
e08fc0457a [PATCH] cifs_create() fix
cifs_create() did totally the wrong thing with nd->intent.open.flags:
it interpreted nd->intent.open.flags as the original open flags, not
the one transformed for open_namei().  Also it used the intent data
even if it was not filled in (if called from sys_mknod()).

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:44 -07:00
Miklos Szeredi
e922efc342 [PATCH] remove duplicated sys_open32() code from 64bit archs
64 bit architectures all implement their own compatibility sys_open(),
when in fact the difference is simply not forcing the O_LARGEFILE
flag.  So use the a common function instead.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:43 -07:00
Miklos Szeredi
ab8d11beb4 [PATCH] remove duplicated code from proc and ptrace
Extract common code used by ptrace_attach() and may_ptrace_attach()
into a separate function.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:43 -07:00
Miklos Szeredi
5e21ccb136 [PATCH] fix enum pid_directory_inos in proc/base.c
This patch fixes wrongly placed elements in the pid_directory_inos
enum.  Also add comment so this mistake is not repeated.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:43 -07:00
Miklos Szeredi
0494f6ec5d [PATCH] use get_fs_struct() in proc
This patch cleans up proc_cwd_link() and proc_root_link() by factoring
out common code into get_fs_struct().

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:43 -07:00
Miklos Szeredi
09dd17d3e5 [PATCH] namei cleanup
Extract common code into inline functions to make reading easier.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:42 -07:00
Miklos Szeredi
e89bbd3a0b [PATCH] remove iattr.ia_attr_flags
Remove unused ia_attr_flags from struct iattr, and related defines.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:42 -07:00
Coywolf Qi Hunt
736c7b808f [PATCH] alloc_buffer_head() and free_buffer_head() cleanup
Signed-off-by: Coywolf Qi Hunt <qiyong@fc-cn.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:41 -07:00
John McCutchan
7ea6040b0e [PATCH] inotify: fix event loss on hardlinked files
People have run into a problem when they do this:

watch (file1, all_events);
watch (file2, some_events);

if file2 is a hard link to file1, some events will be missed because by
default we replace the mask.  The patch below adds a flag IN_MASK_ADD which
will cause inotify to add to the existing mask if present.

Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:39 -07:00
Jesper Juhl
82a25b5670 [PATCH] remove verify_area(): remove fs/umsdos/notes as it only contain a verify_area related note
The file `fs/umsdos/notes' contains only a small note about a possible bug
involving verify_area().  Since umsdos is no longer in the kernel and
verify_area() is also gone, it seems to make sense that this file goes the way
of the Dodo.

After applying this patch the `fs/umsdos/' directory will be empty and can be
removed entirely.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:35 -07:00
Pekka Enberg
5e5d7a2229 [PATCH] pipe: remove redundant fifo_poll abstraction
Remove a redundant fifo_poll() abstraction from fs/pipe.c and adds a big
fat comment stating we set POLLERR for FIFOs too on Linux unlike most
Unices.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:35 -07:00
Ravikiran G Thirumalai
6c231b7bab [PATCH] Additions to .data.read_mostly section
Mark variables which are usually accessed for reads with __readmostly.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:33 -07:00
Dave Johnson
a97c9bf33f [PATCH] fix cramfs making duplicate entries in inode cache
Every time cramfs_lookup() is called to lookup and inode for a dentry,
get_cramfs_inode() will allocate a new inode without checking to see if that
inode already exists in the inode cache.

This is fine the first time, but if the dentry cache entry(ies) associated
with that inode are aged out, but the inode entry is not aged out (which can
be quite common if the inode has buffer cache linked to it), cramfs_lookup()
will be called again and another inode will be allocated and added to the
inode cache creating a duplicate in the inode cache.

The big issue here is that the buffers associated with each inode cache entry
are not shared between the duplicates!

The older inode entries are now orphaned as no dentry points to it and won't
be freed until the buffer cache assoicated with them are first freed.  The
newest entry will have to create all new buffer cache for each part of its
file as the old buffer cache is now orphaned as well.

Patch below fixes this by making get_cramfs_inode() use the inode cache before
blindly creating a new entry every time.  This eliminates the duplicate inodes
and duplicate buffer cache.

Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:33 -07:00
Eric Dumazet
2832e9366a [PATCH] remove file.f_maxcount
struct file cleanup: f_maxcount has an unique value (INT_MAX).  Just use
the hard-wired value.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:32 -07:00
Tejun Heo
5acd57936c [PATCH] fs: remove redundant timespec_equal test in update_atime()
In update_atime(), timespec_equal() test is done twice in succession and
the second is always false.  This patch removes the second test.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:31 -07:00
Adrian Bunk
5e1efe4931 [PATCH] jffs/jffs2: remove wrong function prototypes
This patch removes prototypes for the generic_file_open and
generic_file_llseek functions.

Besides being superfluous because they are already present in fs.h, they
were also wrong because the actual functions aren't weak functions.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:29 -07:00
Adrian Bunk
919532a545 [PATCH] fs/Kconfig: quota help text updates
This patch contains the following updates to the help texts:
- QUOTA: most people will get the quota utilities from their
         distribution, and if not the mini-HOWTO will tell them
- QFMT_V2: quota utilities 3.01 are no longer recent, they are now
           ancient
           and 3.01 is lower than the minimal version documented in
           Documentation/Changes

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:29 -07:00
Miklos Szeredi
2b579beec2 [PATCH] proc: link count fix
This patch fixes bug titled "sunrpc as module and bad proc/sys link count"
reported by Jiri Slaby.

The problem was, that only proc_dir_entry->nlink was updated and the
corresponding inode->i_nlink was not.  The fix is to implement the
inode->getattr() method, and update i_nlink (if necessary).

A quick audit of proc code shows that no other attribute changes after
creation.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:28 -07:00
Robert Love
b800685437 [PATCH] fsnotify: hook on removexattr, too
Add fsnotify_xattr() hook to removexattr().

Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: John McCtuchan <ttb@tentacle.dhs.org>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:27 -07:00
Karsten Wiese
f3ef6f63e5 [PATCH] Speedup FAT filesystem directory reads
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

This speeds up directory reads for large FAT partitions, if the buffercache
has to be filled from the drive. Following values were taken from:

        $ time find path_to_freshly_mounted_fat > /dev/null

on an otherwise idle system.

FAT with 16KB Clusters on IDE attached drive:   Factor  2
FAT with 32KB Clusters on USB2 attached drive:  Factor 10 (!)
Its less than 1/10 slower, if the buffercache is uptodate.

The patch introduces the new function fat_dir_readahead().

fat_dir_readahead() calls sb_breadahead() to readahead a whole cluster,
if the requested sector is the first one in a cluster.
It is usefull to do this, because on FAT directories occupy whole
clusters, with the exception of FAT12/FAT16 root dirs.

Readahead is only done, if the cluster's first sector is not uptodate
to avoid overhead, when the buffer cache is already uptodate.
Note that under memory pressure, the maximal byte count wasted
(read: has to be red from disk twice) is 1 cluster's size.  Thats 64KB.

fat_dir_readahead() is called from fat__get_entry().

There is also an unrelated cleanup at one spot:

        if (bh)
                brelse(bh);

is replaced with:

        brelse(bh);

brelse() can handle NULL pointer arguments by itself.

Signed-off-by: Karsten Wiese <annabellesgarden@yahoo.de>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:26 -07:00
Bruce Allan
f35279d3f7 [PATCH] sunrpc: cache_register can use wrong module reference
When registering an RPC cache, cache_register() always sets the owner as the
sunrpc module.  However, there are RPC caches owned by other modules.  With
the incorrect owner setting, the real owning module can be removed potentially
with an open reference to the cache from userspace.

For example, if one were to stop the nfs server and unmount the nfsd
filesystem, the nfsd module could be removed eventhough rpc.idmapd had
references to the idtoname and nametoid caches (i.e.
/proc/net/rpc/nfs4.<cachename>/channel is still open).  This resulted in a
system panic on one of our machines when attempting to restart the nfs
services after reloading the nfsd module.

The following patch adds a 'struct module *owner' field in struct
cache_detail.  The owner is further assigned to the struct proc_dir_entry
in cache_register() so that the module cannot be unloaded while user-space
daemons have an open reference on the associated file under /proc.

Signed-off-by: Bruce Allan <bwa@us.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:25 -07:00
Mark Bellon
8fc2751beb [PATCH] disk quotas fail when /etc/mtab is symlinked to /proc/mounts
If /etc/mtab is a regular file all of the mount options (of a file system)
are written to /etc/mtab by the mount command.  The quota tools look there
for the quota strings for their operation.  If, however, /etc/mtab is a
symlink to /proc/mounts (a "good thing" in some environments) the tools
don't write anything - they assume the kernel will take care of things.

While the quota options are sent down to the kernel via the mount system
call and the file system codes handle them properly unfortunately there is
no code to echo the quota strings into /proc/mounts and the quota tools
fail in the symlink case.

The attached patchs modify the EXT[2|3] and JFS codes to add the necessary
hooks.  The show_options function of each file system in these patches
currently deal with only those things that seemed related to quotas;
especially in the EXT3 case more can be done (later?).

Jan Kara also noted the difficulty in moving these changes above the FS
codes responding similarly to myself to Andrew's comment about possible
VFS migration. Issue summary:

 - FS codes have to process the entire string of options anyway.

 - Only FS codes that use quotas must have a show_options function (for
   quotas to work properly) however quotas are only used in a small number
   of FS.

 - Since most of the quota using FS support other options these FS codes
   should have the a show_options function to show those options - and the
   quota echoing becomes virtually negligible.

Based on feedback I have modified my patches from the original:

   JFS a missing patch has been restored to the posting
   EXT[2|3] and JFS always use the show_options function
       - Each FS has at least one FS specific option displayed
       - QUOTA output is under a CONFIG_QUOTA ifdef
       - a follow-on patch will add a multitude of options for each FS
   EXT[2|3] and JFS "quota" is treated as "usrquota"
   EXT3 journalled data check for journalled quota removed
   EXT[2|3] mount when quota specified but not compiled in

 - no changes from my original patch.  I tested the patch and the codes
   warn but

 - still mount.  With all due respection I believe the comments
   otherwise were a

 - misread of the patch.  Please reread/test and comment.  XFS patch
   removed - the XFS team already made the necessary changes EXT3 mixing
   old and new quotas are handled differently (not purely exclusive)

 - if old and new quotas for the same type are used together the old
   type is silently depricated for compatability (e.g.  usrquota and
   usrjquota)

 - mixing of old and new quotas is an error (e.g.  usrjquota and
   grpquota)

Signed-off-by: Mark Bellon <mbellon@mvista.com>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:23 -07:00
Adrian Bunk
5dd42c262b [PATCH] remove register_ioctl32_conversion and unregister_ioctl32_conversion
All users have been converted.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:20 -07:00
Peter Osterlund
3676347a5e [PATCH] kill bio->bi_set
Jens:

->bi_set is totally unnecessary bloat of struct bio.  Just define a proper
destructor for the bio and it already knows what bio_set it belongs too.

Peter:

Fixed the bugs.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:20 -07:00
Adrian Bunk
022a4a7bbd [PATCH] fs/jbd/: cleanups
This patch contains the following cleanups:
- make needlessly global functions static
- journal.c: remove the unused global function __journal_internal_check
             and move the check to journal_init
- remove the following write-only global variable:
  - journal.c: current_journal
- remove the following unneeded EXPORT_SYMBOL:
  - journal.c: journal_recover

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:19 -07:00
Stephen Rothwell
202e5979af [PATCH] compat: be more consistent about [ug]id_t
When I first wrote the compat layer patches, I was somewhat cavalier about
the definition of compat_uid_t and compat_gid_t (or maybe I just
misunderstood :-)).  This patch makes the compat types much more consistent
with the types we are being compatible with and hopefully will fix a few
bugs along the way.

	compat type		type in compat arch
	__compat_[ug]id_t	__kernel_[ug]id_t
	__compat_[ug]id32_t	__kernel_[ug]id32_t
	compat_[ug]id_t		[ug]id_t

The difference is that compat_uid_t is always 32 bits (for the archs we
care about) but __compat_uid_t may be 16 bits on some.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:19 -07:00
John McCutchan
820249bafe [PATCH] inotify speedup
Bypass an inotify-related fastpath spinlock and several function calls on
systems which have no inotify watches registered.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:19 -07:00
Tom Zanussi
e82894f84d [PATCH] relayfs
Here's the latest version of relayfs, against linux-2.6.11-mm2.  I'm hoping
you'll consider putting this version back into your tree - the previous
rounds of comment seem to have shaken out all the API issues and the number
of comments on the code itself have also steadily dwindled.

This patch is essentially the same as the relayfs redux part 5 patch, with
some minor changes based on reviewer comments.  Thanks again to Pekka
Enberg for those.  The patch size without documentation is now a little
smaller at just over 40k.  Here's a detailed list of the changes:

- removed the attribute_flags in relay open and changed it to a
  boolean specifying either overwrite or no-overwrite mode, and removed
  everything referencing the attribute flags.
- added a check for NULL names in relayfs_create_entry()
- got rid of the unnecessary multiple labels in relay_create_buf()
- some minor simplification of relay_alloc_buf() which got rid of a
  couple params
- updated the Documentation

In addition, this version (through code contained in the relay-apps tarball
linked to below, not as part of the relayfs patch) tries to make it as easy
as possible to create the cooperating kernel/user pieces of a typical and
common type of logging application, one where kernel logging is kicked off
when a user space data collection app starts and stops when the collection
app exits, with the data being automatically logged to disk in between.  To
create this type of application, you basically just include a header file
(relay-app.h, included in the relay-apps tarball) in your kernel module,
define a couple of callbacks and call an initialization function, and on
the user side call a single function that sets up and continuously monitors
the buffers, and writes data to files as it becomes available.  Channels
are created when the collection app is started and destroyed when it exits,
not when the kernel module is inserted, so different channel buffer sizes
can be specified for each separate run via command-line options.  See the
README in the relay-apps tarball for details.

Also included in the relay-apps tarball are a couple examples
demonstrating how you can use this to create quick and dirty kernel
logging/debugging applications.  They are:

- tprintk, short for 'tee printk', which temporarily puts a kprobe on
  printk() and writes a duplicate stream of printk output to a relayfs
  channel.  This could be used anywhere there's printk() debugging code
  in the kernel which you'd like to exercise, but would rather not have
  your system logs cluttered with debugging junk.  You'd probably want
  to kill klogd while you do this, otherwise there wouldn't be much
  point (since putting a kprobe on printk() doesn't change the output
  of printk()).  I've used this method to temporarily divert the packet
  logging output of the iptables LOG target from the system logs to
  relayfs files instead, for instance.

- klog, which just provides a printk-like formatted logging function
  on top of relayfs.  Again, you can use this to keep stuff out of your
  system logs if used in place of printk.

The example applications can be found here:

http://prdownloads.sourceforge.net/dprobes/relay-apps.tar.gz?download

From: Christoph Hellwig <hch@lst.de>

  avoid lookup_hash usage in relayfs

Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:18 -07:00
Linus Torvalds
48467641bc Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2005-09-05 00:11:50 -07:00
Paolo 'Blaisorblade' Giarrusso
1e40cd383c [PATCH] uml: fixes performance regression in activate_mm and thus exec()
Normally, activate_mm() is called from exec(), and thus it used to be a
no-op because we use a completely new "MM context" on the host (for
instance, a new process), and so we didn't need to flush any "TLB entries"
(which for us are the set of memory mappings for the host process from the
virtual "RAM" file).

Kernel threads, instead, are usually handled in a different way.  So, when
for AIO we call use_mm(), things used to break and so Benjamin implemented
activate_mm().  However, that is only needed for AIO, and could slow down
exec() inside UML, so be smart: detect being called for AIO (via
PF_BORROWED_MM) and do the full flush only in that situation.

Comment also the caller so that people won't go breaking UML without
noticing.  I also rely on the caller's locks for testing current->flags.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
CC: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:21 -07:00
Stephen Smalley
f549d6c18c [PATCH] Generic VFS fallback for security xattrs
This patch modifies the VFS setxattr, getxattr, and listxattr code to fall
back to the security module for security xattrs if the filesystem does not
support xattrs natively.  This allows security modules to export the incore
inode security label information to userspace even if the filesystem does
not provide xattr storage, and eliminates the need to individually patch
various pseudo filesystem types to provide such access.  The patch removes
the existing xattr code from devpts and tmpfs as it is then no longer
needed.

The patch restructures the code flow slightly to reduce duplication between
the normal path and the fallback path, but this should only have one
user-visible side effect - a program may get -EACCES rather than
-EOPNOTSUPP if policy denied access but the filesystem didn't support the
operation anyway.  Note that the post_setxattr hook call is not needed in
the fallback case, as the inode_setsecurity hook call handles the incore
inode security state update directly.  In contrast, we do call fsnotify in
both cases.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:05:52 -07:00
Mauricio Lin
e070ad49f3 [PATCH] add /proc/pid/smaps
Add a "smaps" entry to /proc/pid: show howmuch memory is resident in each
mapping.

People that want to perform a memory consumption analysing can use it
mainly if someone needs to figure out which libraries can be reduced for
embedded systems.  So the new features are the physical size of shared and
clean [or dirty]; private and clean [or dirty].

Take a look the example below:

# cat /proc/4576/smaps

08048000-080dc000 r-xp /bin/bash
Size:               592 KB
Rss:                500 KB
Shared_Clean:       500 KB
Shared_Dirty:         0 KB
Private_Clean:        0 KB
Private_Dirty:        0 KB
080dc000-080e2000 rw-p /bin/bash
Size:                24 KB
Rss:                 24 KB
Shared_Clean:         0 KB
Shared_Dirty:         0 KB
Private_Clean:        0 KB
Private_Dirty:       24 KB
080e2000-08116000 rw-p
Size:               208 KB
Rss:                208 KB
Shared_Clean:         0 KB
Shared_Dirty:         0 KB
Private_Clean:        0 KB
Private_Dirty:      208 KB
b7e2b000-b7e34000 r-xp /lib/tls/libnss_files-2.3.2.so
Size:                36 KB
Rss:                 12 KB
Shared_Clean:        12 KB
Shared_Dirty:         0 KB
Private_Clean:        0 KB
Private_Dirty:        0 KB
...

(Includes a cleanup from "Richard Purdie" <rpurdie@rpsys.net>)

From: Torsten Foertsch <torsten.foertsch@gmx.net>

show_smap calls first show_map and then prints its additional information to
the seq_file.  show_map checks if all it has to print fits into the buffer and
if yes marks the current vma as written.  While that is correct for show_map
it is not for show_smap.  Here the vma should be marked as written only after
the additional information is also written.

The attached patch cures the problem.  It moves the functionality of the
show_map function to a new function show_map_internal that is called with an
additional struct mem_size_stats* argument.  Then show_map calls
show_map_internal with NULL as struct mem_size_stats* whereas show_smap calls
it with a real pointer.  Now the final

	if (m->count < m->size)  /* vma is copied successfully */
		m->version = (vma != get_gate_vma(task))? vma->vm_start: 0;

is done only if the whole entry fits into the buffer.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:05:49 -07:00
Christoph Lameter
6e21c8f145 [PATCH] /proc/<pid>/numa_maps to show on which nodes pages reside
This patch was recently discussed on linux-mm:
http://marc.theaimsgroup.com/?t=112085728500002&r=1&w=2

I inherited a large code base from Ray for page migration.  There was a
small patch in there that I find to be very useful since it allows the
display of the locality of the pages in use by a process.  I reworked that
patch and came up with a /proc/<pid>/numa_maps that gives more information
about the vma's of a process.  numa_maps is indexes by the start address
found in /proc/<pid>/maps.  F.e.  with this patch you can see the page use
of the "getty" process:

margin:/proc/12008 # cat maps
00000000-00004000 r--p 00000000 00:00 0
2000000000000000-200000000002c000 r-xp 00000000 08:04 516                /lib/ld-2.3.3.so
2000000000038000-2000000000040000 rw-p 00028000 08:04 516                /lib/ld-2.3.3.so
2000000000040000-2000000000044000 rw-p 2000000000040000 00:00 0
2000000000058000-2000000000260000 r-xp 00000000 08:04 54707842           /lib/tls/libc.so.6.1
2000000000260000-2000000000268000 ---p 00208000 08:04 54707842           /lib/tls/libc.so.6.1
2000000000268000-2000000000274000 rw-p 00200000 08:04 54707842           /lib/tls/libc.so.6.1
2000000000274000-2000000000280000 rw-p 2000000000274000 00:00 0
2000000000280000-20000000002b4000 r--p 00000000 08:04 9126923            /usr/lib/locale/en_US.utf8/LC_CTYPE
2000000000300000-2000000000308000 r--s 00000000 08:04 60071467           /usr/lib/gconv/gconv-modules.cache
2000000000318000-2000000000328000 rw-p 2000000000318000 00:00 0
4000000000000000-4000000000008000 r-xp 00000000 08:04 29576399           /sbin/mingetty
6000000000004000-6000000000008000 rw-p 00004000 08:04 29576399           /sbin/mingetty
6000000000008000-600000000002c000 rw-p 6000000000008000 00:00 0          [heap]
60000fff7fffc000-60000fff80000000 rw-p 60000fff7fffc000 00:00 0
60000ffffff44000-60000ffffff98000 rw-p 60000ffffff44000 00:00 0          [stack]
a000000000000000-a000000000020000 ---p 00000000 00:00 0                  [vdso]

cat numa_maps
2000000000000000 default MaxRef=43 Pages=11 Mapped=11 N0=4 N1=3 N2=2 N3=2
2000000000038000 default MaxRef=1 Pages=2 Mapped=2 Anon=2 N0=2
2000000000040000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
2000000000058000 default MaxRef=43 Pages=61 Mapped=61 N0=14 N1=15 N2=16 N3=16
2000000000268000 default MaxRef=1 Pages=2 Mapped=2 Anon=2 N0=2
2000000000274000 default MaxRef=1 Pages=3 Mapped=3 Anon=3 N0=3
2000000000280000 default MaxRef=8 Pages=3 Mapped=3 N0=3
2000000000300000 default MaxRef=8 Pages=2 Mapped=2 N0=2
2000000000318000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N2=1
4000000000000000 default MaxRef=6 Pages=2 Mapped=2 N1=2
6000000000004000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
6000000000008000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
60000fff7fffc000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
60000ffffff44000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1

getty uses ld.so.  The first vma is the code segment which is used by 43
other processes and the pages are evenly distributed over the 4 nodes.

The second vma is the process specific data portion for ld.so.  This is
only one page.

The display format is:

<startaddress>	 Links to information in /proc/<pid>/map
<memory policy>  This can be "default" "interleave={}", "prefer=<node>" or "bind={<zones>}"
MaxRef=		<maximum reference to a page in this vma>
Pages=		<Nr of pages in use>
Mapped=		<Nr of pages with mapcount >
Anon=		<nr of anonymous pages>
Nx=		<Nr of pages on Node x>

The content of the proc-file is self-evident.  If this would be tied into
the sparsemem system then the contents of this file would not be too
useful.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:05:43 -07:00