In the early days of NFS, there was no duplicate reply cache on the server.
Thus retransmitted non-idempotent requests often found that the request had
already completed on the server. To avoid passing an unanticipated return
code to unsuspecting applications, NFS clients would often shunt error
codes that implied the request had been retried but already completed.
Thanks to NFS over TCP, duplicate reply caches on the server, and network
performance and reliability improvements, it is safe to remove such checks.
Test plan:
None.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Replace xprt_create_proto/rpc_create_client call in NFS server callback
functions to use new rpc_create() API.
Test plan:
NFSv4 delegation functionality tests.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Convert NFS client mount logic to use rpc_create() instead of the old
xprt_create_proto/rpc_create_client API.
Test plan:
Mount stress tests.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Replace xprt_create_proto/rpc_create_client with new rpc_create()
interface in the Network Lock Manager.
Note that the semantics of NLM transports is now "hard" instead of "soft"
to provide a better guarantee that lock requests will get to the server.
Test plan:
Repeated runs of Connectathon locking suite. Check network trace to ensure
NLM requests are working correctly.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Remove some unused macros related to accessing an RPC peer address
Test plan:
Compile kernel with CONFIG_NFS option enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
include/linux/sunrpc/clnt.h already includes include/linux/sunrpc/xprt.h.
We can remove xprt.h from source files that already include clnt.h.
Likewise include/linux/sunrpc/timer.h.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Hide the details of how the RPC client stores remote peer addresses from
the Network Lock Manager.
Test plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Invoke security_d_instantiate() on root dentries after allocating them with
dentry_alloc_anon(). Normally dentry_alloc_root() would do that, but we don't
call that as we don't want to assign a name to the root dentry at this point
(we may discover the real name later).
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix an error handling problem: nfs_put_client() can be given a NULL pointer if
nfs_free_server() is asked to destroy a partially initialised record.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make two new proc files available:
/proc/fs/nfsfs/servers
/proc/fs/nfsfs/volumes
The first lists the servers with which we are currently dealing (struct
nfs_client), and the second lists the volumes we have on those servers (struct
nfs_server).
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The attached patch makes NFS share superblocks between mounts from the same
server and FSID over the same protocol.
It does this by creating each superblock with a false root and returning the
real root dentry in the vfsmount presented by get_sb(). The root dentry set
starts off as an anonymous dentry if we don't already have the dentry for its
inode, otherwise it simply returns the dentry we already have.
We may thus end up with several trees of dentries in the superblock, and if at
some later point one of anonymous tree roots is discovered by normal filesystem
activity to be located in another tree within the superblock, the anonymous
root is named and materialises attached to the second tree at the appropriate
point.
Why do it this way? Why not pass an extra argument to the mount() syscall to
indicate the subpath and then pathwalk from the server root to the desired
directory? You can't guarantee this will work for two reasons:
(1) The root and intervening nodes may not be accessible to the client.
With NFS2 and NFS3, for instance, mountd is called on the server to get
the filehandle for the tip of a path. mountd won't give us handles for
anything we don't have permission to access, and so we can't set up NFS
inodes for such nodes, and so can't easily set up dentries (we'd have to
have ghost inodes or something).
With this patch we don't actually create dentries until we get handles
from the server that we can use to set up their inodes, and we don't
actually bind them into the tree until we know for sure where they go.
(2) Inaccessible symbolic links.
If we're asked to mount two exports from the server, eg:
mount warthog:/warthog/aaa/xxx /mmm
mount warthog:/warthog/bbb/yyy /nnn
We may not be able to access anything nearer the root than xxx and yyy,
but we may find out later that /mmm/www/yyy, say, is actually the same
directory as the one mounted on /nnn. What we might then find out, for
example, is that /warthog/bbb was actually a symbolic link to
/warthog/aaa/xxx/www, but we can't actually determine that by talking to
the server until /warthog is made available by NFS.
This would lead to having constructed an errneous dentry tree which we
can't easily fix. We can end up with a dentry marked as a directory when
it should actually be a symlink, or we could end up with an apparently
hardlinked directory.
With this patch we need not make assumptions about the type of a dentry
for which we can't retrieve information, nor need we assume we know its
place in the grand scheme of things until we actually see that place.
This patch reduces the possibility of aliasing in the inode and page caches for
inodes that may be accessed by more than one NFS export. It also reduces the
number of superblocks required for NFS where there are many NFS exports being
used from a server (home directory server + autofs for example).
This in turn makes it simpler to do local caching of network filesystems, as it
can then be guaranteed that there won't be links from multiple inodes in
separate superblocks to the same cache file.
Obviously, cache aliasing between different levels of NFS protocol could still
be a problem, but at least that gives us another key to use when indexing the
cache.
This patch makes the following changes:
(1) The server record construction/destruction has been abstracted out into
its own set of functions to make things easier to get right. These have
been moved into fs/nfs/client.c.
All the code in fs/nfs/client.c has to do with the management of
connections to servers, and doesn't touch superblocks in any way; the
remaining code in fs/nfs/super.c has to do with VFS superblock management.
(2) The sequence of events undertaken by NFS mount is now reordered:
(a) A volume representation (struct nfs_server) is allocated.
(b) A server representation (struct nfs_client) is acquired. This may be
allocated or shared, and is keyed on server address, port and NFS
version.
(c) If allocated, the client representation is initialised. The state
member variable of nfs_client is used to prevent a race during
initialisation from two mounts.
(d) For NFS4 a simple pathwalk is performed, walking from FH to FH to find
the root filehandle for the mount (fs/nfs/getroot.c). For NFS2/3 we
are given the root FH in advance.
(e) The volume FSID is probed for on the root FH.
(f) The volume representation is initialised from the FSINFO record
retrieved on the root FH.
(g) sget() is called to acquire a superblock. This may be allocated or
shared, keyed on client pointer and FSID.
(h) If allocated, the superblock is initialised.
(i) If the superblock is shared, then the new nfs_server record is
discarded.
(j) The root dentry for this mount is looked up from the root FH.
(k) The root dentry for this mount is assigned to the vfsmount.
(3) nfs_readdir_lookup() creates dentries for each of the entries readdir()
returns; this function now attaches disconnected trees from alternate
roots that happen to be discovered attached to a directory being read (in
the same way nfs_lookup() is made to do for lookup ops).
The new d_materialise_unique() function is now used to do this, thus
permitting the whole thing to be done under one set of locks, and thus
avoiding any race between mount and lookup operations on the same
directory.
(4) The client management code uses a new debug facility: NFSDBG_CLIENT which
is set by echoing 1024 to /proc/net/sunrpc/nfs_debug.
(5) Clone mounts are now called xdev mounts.
(6) Use the dentry passed to the statfs() op as the handle for retrieving fs
statistics rather than the root dentry of the superblock (which is now a
dummy).
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Start rpciod in the server common (nfs_client struct) management code rather
than in the superblock management code. This means we only need to "start" it
once per server instead of once per superblock.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Eliminate nfs_server::client_sys in favour of nfs_client::cl_rpcclient as we
only really need one per server that we're talking to since it doesn't have any
security on it.
The retransmission management variables are also moved to the common struct as
they're required to set up the cl_rpcclient connection.
The NFS2/3 client and client_acl connections are thenceforth derived by cloning
the cl_rpcclient connection and post-applying the authorisation flavour.
The code for setting up the initial common connection has been moved to
client.c as nfs_create_rpc_client(). All the NFS program definition tables are
also moved there as that's where they're now required rather than super.c.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Move the rpc_ops from the nfs_server struct to the nfs_client struct as they're
common to all server records of a particular NFS protocol version.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make better use of inode* dereferencing macros to hide dereferencing chains
(including NFS_PROTO and NFS_CLIENT).
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Maintain a common server record for NFS2/3 as well as for NFS4 so that common
stuff can be moved there from struct nfs_server.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add some extra const qualifiers into NFS.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Use the nominated dentry's superblock directly in the NFS statfs() op to get a
file handle, rather than using s_root (which will become a dummy dentry in a
future patch).
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Generalise the nfs_client structure by:
(1) Moving nfs_client to a more general place (nfs_fs_sb.h).
(2) Renaming its maintenance routines to be non-NFS4 specific.
(3) Move those maintenance routines to a new non-NFS4 specific file (client.c)
and move the declarations to internal.h.
(4) Make nfs_find/get_client() take a full sockaddr_in to include the port
number (will be required for NFS2/3).
(5) Make nfs_find/get_client() take the NFS protocol version (again will be
required to differentiate NFS2, 3 & 4 client records).
Also:
(6) Make nfs_client construction proceed akin to inodes, marking them as under
construction and providing a function to indicate completion.
(7) Make nfs_get_client() wait interruptibly if it finds a client that it can
share, but that client is currently being constructed.
(8) Make nfs4_create_client() use (6) and (7) instead of locking cl_sem.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a set_capabilities NFS RPC op so that the server capabilities can be set.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a lookup filehandle NFS RPC op so that a file handle can be looked up
without requiring dentries and inodes and other VFS stuff when doing an NFS4
pathwalk during mounting.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Return an error when starting the idmapping pipe so that we can detect it
failing.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Rename nfs_server::nfs4_state to nfs_client as it will be used to represent the
client state for NFS2 and NFS3 also.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Rename struct nfs4_client to struct nfs_client so that it can become the basis
for a general client record for NFS2 and NFS3 in addition to NFS4.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make the nfs_callback_up()/down() prototypes just do nothing if NFS4 is not
enabled. Also make the down function void type since we can't really do
anything if it fails.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Rename the NFS4 version of nfs_stat_to_errno() so that it doesn't conflict with
the common one used by NFS2 and NFS3.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix ups for the splitting of the superblock stuff out of fs/nfs/inode.c,
including:
(*) Move the callback tcpport module param into callback.c.
(*) Move the idmap cache timeout module param into idmap.c.
(*) Changes to internal.h:
(*) namespace-nfs4.c was renamed to nfs4namespace.c.
(*) nfs_stat_to_errno() is in nfs2xdr.c, not nfs4xdr.c.
(*) nfs4xdr.c is contingent on CONFIG_NFS_V4.
(*) nfs4_path() is only uses if CONFIG_NFS_V4 is set.
Plus also:
(*) The sec_flavours[] table should really be const.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The attached patch adds a new directory cache management function that prepares
a disconnected anonymous function to be connected into the dentry tree. The
anonymous dentry is transferred the name and parentage from another dentry.
The following changes were made in [try #2]:
(*) d_materialise_dentry() now switches the parentage of the two nodes around
correctly when one or other of them is self-referential.
The following changes were made in [try #7]:
(*) d_instantiate_unique() has had the interior part split out as function
__d_instantiate_unique(). Callers of this latter function must be holding
the appropriate locks.
(*) _d_rehash() has been added as a wrapper around __d_rehash() to call it
with the most obvious hash list (the one from the name). d_rehash() now
calls _d_rehash().
(*) d_materialise_dentry() is now __d_materialise_dentry() and is static.
(*) d_materialise_unique() added to perform the combination of d_find_alias(),
d_materialise_dentry() and d_add_unique() that the NFS client was doing
twice, all within a single dcache_lock critical section. This reduces the
number of times two different spinlocks were being accessed.
The following further changes were made:
(*) Add the dentries onto their parents d_subdirs lists.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
A pinned inode may in theory end up filling memory with cached ACCESS
calls. This patch ensures that the VM may shrink away the cache in these
particular cases.
The shrinker works by iterating through the list of inodes on the global
nfs_access_lru_list, and removing the least recently used access
cache entry until it is done (or until the entire cache is empty).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The current access cache only allows one entry at a time to be cached for each
inode. Add a per-inode red-black tree in order to allow more than one to
be cached at a time.
Should significantly cut down the time spent in path traversal for shared
directories such as ${PATH}, /usr/share, etc.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] statfs for cifs unix extensions no longer experimental
[CIFS] New POSIX locking code not setting rc properly to zero on successful
[CIFS] Support deep tree mounts (e.g. mounts to //server/share/path)
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (64 commits)
[BLOCK] dm-crypt: trivial comment improvements
[CRYPTO] api: Deprecate crypto_digest_* and crypto_alg_available
[CRYPTO] padlock: Convert padlock-sha to use crypto_hash
[CRYPTO] users: Use crypto_comp and crypto_has_*
[CRYPTO] api: Add crypto_comp and crypto_has_*
[CRYPTO] users: Use crypto_hash interface instead of crypto_digest
[SCSI] iscsi: Use crypto_hash interface instead of crypto_digest
[CRYPTO] digest: Remove old HMAC implementation
[CRYPTO] doc: Update documentation for hash and me
[SCTP]: Use HMAC template and hash interface
[IPSEC]: Use HMAC template and hash interface
[CRYPTO] tcrypt: Use HMAC template and hash interface
[CRYPTO] hmac: Add crypto template implementation
[CRYPTO] digest: Added user API for new hash type
[CRYPTO] api: Mark parts of cipher interface as deprecated
[PATCH] scatterlist: Add const to sg_set_buf/sg_init_one pointer argument
[CRYPTO] drivers: Remove obsolete block cipher operations
[CRYPTO] users: Use block ciphers where applicable
[SUNRPC] GSS: Use block ciphers where applicable
[IPSEC] ESP: Use block ciphers where applicable
...
We certainly don't need the check for Linux version > 2.5.2, and in fact
we can also live without the __ECOS check, since we can just add it back
in the eCos git tree which is automatically derived from the Linux fs/jffs2
subdirectory in the upstream git tree.
Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Fix a bug in the directory reading code, where we might have dereferenced
a NULL pointer in case of OOM. Updated the directory code to use the new
& improved version of gfs2_meta_ra() which now returns the first block
that was being read. Previously it was releasing it requiring following
code to grab the block again at each point it was called.
Also turned off readahead on directory lookups since we are reading a
hash table, and therefore reading the entries in order is very
unlikely. Readahead is still used for all other calls to the
directory reading function (e.g. when growing the hash table).
Removed the DIO_START constant. Everywhere this was used, it was
used to unconditionally start i/o aside from a couple of places, so
I've removed it and made the couple of exceptions to this rule into
separate functions.
Also hunted through the other DIO flags and removed them as arguments
from functions which were always called with the same combination of
arguments.
Updated gfs2_meta_indirect_buffer to be a bit more efficient and
hopefully also be a bit easier to read.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The truncate code was never supposed to BUG() on an allocator it doesn't
know about, but rather to ignore it. Right now, this does nothing, but when
we change our allocation paths to use all suballocator files, this will
allow current versions of the fs module to work fine.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Things have been working pretty well for a while now.
We should've probably done this at least one kernel
revision ago, but it doesn't hurt to be paranoid.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Uptodate.c now knows about read-ahead buffers. Use some more aggressive
logic in ocfs2_readdir().
The two functions which currently use directory read-ahead are
ocfs2_find_entry() and ocfs2_readdir().
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
We weren't always updating i_mtime on writes, so fix ocfs2_commit_write() to
handle this.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Acked-by: Zach Brown <zach.brown@oracle.com>
Remove the redundant "i_nlink >= OCFS2_LINK_MAX" check and adds an unlinked
directory check in ocfs2_link().
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The dir nlink check in ocfs2_mknod() was being done outside of the cluster
lock, which means we could have been checking against a stale version of the
inode. Fix this by doing the check after the cluster lock instead.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Every file should #include the headers containing the prototypes for its
global functions.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Support immutable, and other attributes.
Some renaming and other minor fixes done by myself.
Signed-off-by: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
For all child objects, creation comes through mkdir(2), so duplicate names
are prevented.
Subsystems, though, are registered by client drivers at init_module()/__init
time. This patch prevents duplicate subsystem names.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* 'fixes' of git://git.linux-nfs.org/pub/linux/nfs-2.6:
NFS: Fix nfs_page use after free issues in fs/nfs/write.c
NFSv4: Fix incorrect semaphore release in _nfs4_do_open()
NFS: Fix Oopsable condition in nfs_readpage_sync()
This is an attempt to fix Red Hat bz 204364. I don't hit it all
the time, but with these changes, running postmark which used to
trigger it on a regular basis no longer appears to. So I'm not
saying that its 100% certain that its fixed, but it does look
promising at the moment.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* git://git.infradead.org/mtd-2.6:
[MTD] Use SEEK_{SET,CUR,END} instead of hardcoded values in mtdchar lseek()
MTD: Fix bug in fixup_convert_atmel_pri
[JFFS2][SUMMARY] Fix a summary collecting bug.
[PATCH] [MTD] DEVICES: Fill more device IDs in the structure of m25p80
MTD: Add lock/unlock operations for Atmel AT49BV6416
MTD: Convert Atmel PRI information to AMD format
fs/jffs2/xattr.c: remove dead code
[PATCH] [MTD] Maps: Add dependency on alternate probe methods to physmap
[PATCH] MTD: Add Macronix MX29F040 to JEDEC
[MTD] Fixes of performance and stability issues in CFI driver.
block2mtd.c: Make kernel boot command line arguments work (try 4)
[MTD NAND] Fix lookup error in nand_get_flash_type()
remove #error on !PCI from pmc551.c
MTD: [NAND] Fix the sharpsl driver after breakage from a core conversion
[MTD] NAND: OOB buffer offset fixups
make fs/jffs2/nodelist.c:jffs2_obsolete_node_frag() static
[PATCH] [MTD] NAND: fix dead URL in Kconfig
Fix a performance degradation introduced in 2.6.17. (30% degradation
running dbench with 16 threads)
Commit 21730eed11, which claims to make
EXT2_DEBUG work again, moves the taking of the kernel lock out of
debug-only code in ext2_count_free_inodes and ext2_count_free_blocks and
into ext2_statfs.
The same problem was fixed in ext3 by removing the lock completely (commit
5b11687924)
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
lm_interface.h has a few out of the tree clients such as GFS1
and userland tools.
Right now, these clients keeps a copy of the file in their build tree
that can go out of sync.
Move lm_interface.h to include/linux, export it to userland and
clean up fs/gfs2 to use the new location.
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
i_blksize got removed in -mm.
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fix for Red Hat bz 205307. Don't need to lock in readpage if
the higher level code has already grabbed the lock.
Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is a tidy up of the GFS2 bmap code. The main change is that the
bh is passed to gfs2_block_map allowing the flags to be set directly
rather than having to repeat that code several times in ops_address.c.
At the same time, the extent mapping code from gfs2_extent_map has
been moved into gfs2_block_map. This allows all calls to gfs2_block_map
to map extents in the case that no allocation is taking place. As a
result reads and non-allocating writes should be faster. A quick test
with postmark appears to support this.
There is a limit on the number of blocks mapped in a single bmap
call in that it will only ever map blocks which are pointed to
from a single pointer block. So in other words, it will never try
to do additional i/o in order to satisfy read-ahead. The maximum
number of blocks is thus somewhat less than 512 (the GFS2 4k block
size minus the header divided by sizeof(u64)). I've further limited
the mapping of "normal" blocks to 32 blocks (to avoid extra work)
since readpages() will currently read a maximum of 32 blocks ahead (128k).
Some further work will probably be needed to set a suitable value
for DIO as well, but for now thats left at the maximum 512 (see
ops_address.c:gfs2_get_block_direct).
There is probably a lot more that can be done to improve bmap for GFS2,
but this is a good first step.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Print an error message if mount fails in setting up the sysfs files.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
In some special case (padding because of sync or umount) it can be possible
that summary information is not fit to the end of the erase block. In
these cases the collecting of summary is disabled for this erase block.
The problem was that this was not respected by jffs2_sum_add_kvec(). This
patch fix this bug.
Signed-off-by: Ferenc Havasi <havasi@inf.u-szeged.hu>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ext3-get-blocks support caused ~20% degrade in Sequential read
performance (tiobench). Problem is with marking the buffer boundary
so IO can be submitted right away. Here is the patch to fix it.
2.6.18-rc6:
-----------
# ./iotest
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 75.2726 seconds, 57.1 MB/s
real 1m15.285s
user 0m0.276s
sys 0m3.884s
2.6.18-rc6 + fix:
-----------------
[root@elm3a241 ~]# ./iotest
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 62.9356 seconds, 68.2 MB/s
The boundary block check in ext3_get_blocks_handle needs to be adjusted
against the count of blocks mapped in this call, now that it can map
more than one block.
Signed-off-by: Suparna Bhattacharya <suparna@in.ibm.com>
Tested-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Inodes earlier than the 'first' inode (e.g. journal, resize) should be
rejected early - except the root inode. Also inode numbers that are too
big should be rejected early.
[akpm@osdl.org: cleanup]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This prevents bad inode numbers from triggering errors in ext2_get_inode.
[akpm@osdl.org: speedup, cleanup]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In some special case (padding because of sync
or umount) it can be possible that summary
information is not fit to the end of the erase
block. In these cases the collecting of summary
is disabled for this erase block.
The problem was that this was not respected
by jffs2_sum_add_kvec(). This patch fix this
bug.
From: Zoltan Sogor <weth@inf.u-szeged.hu>
Signed-off-by: Ferenc Havasi <havasi@inf.u-szeged.hu>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
A one liner bug fix to prevent the return value being
wrong when more than one superblock is mounted.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Based upon previous feedback from lkml and also removing some
commented out debugging which is no longer needed.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Use atomic_t as the ref count in glocks rather than a kref.
This is another step towards using RCU for the glock hash.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* git://oss.sgi.com:8090/xfs/xfs-2.6:
[XFS] Fix a bad pointer dereference in the quota statvfs handling.
[XFS] Fix xfs_splice_write() so appended data gets to disk.
[XFS] Fix ABBA deadlock between i_mutex and iolock. Avoid calling
[XFS] Prevent free space oversubscription and xfssyncd looping.
This results in smaller list heads, so that we can have more chains
in the same amount of memory (twice as many). I've multiplied the
size of the table by four though - this is because we are saving
memory by not having one lock per chain any more. So we land up
using about the same amount of memory for the hash table as we
did before I started these changes, the difference being that we
now have four times as many hash chains.
The reason that I say "about the same amount of memory" is that the
actual amount now depends upon the NR_CPUS and some of the config
variables, so that its not exact and in some cases we do use more
memory. Eventually we might want to scale the hash table size
according to the size of physical ram as measured on module load.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The existing implementation of this function in glock.c was not
very efficient as it relied upon keeping a cursor element upon the
hash chain in question and moving it along. This new version improves
upon this by using the current element as a cursor. This is possible
since we only look at the "next" element in the list after we've
taken the read_lock() subsequent to calling the examiner function.
Obviously we have to eventually drop the ref count that we are then
left with and we cannot do that while holding the read_lock, so we
do that next time we drop the lock. That means either just before
we examine another glock, or when the loop has terminated.
The new implementation has several advantages: it uses only a
read_lock() rather than a write_lock(), so it can run simnultaneously
with other code, it doesn't need a "plug" element, so that it removes
a test not only from this list iterator, but from all the other glock
list iterators too. So it makes things faster and smaller.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Add back the consts which were casted away in the glock sorting
function. Also add early exit code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Make the number of locks used for hash chains in glock.c
proportional to NR_CPUS. Also move constants for the number
of hash chains into glock.c from incore.h since they are
not used outside of glock.c.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fixing the following scenario:
- A request is on the waiters list waiting for a reply from a remote node.
- The request is the first one on the resource, so first_lkid is set.
- The remote node fails causing recovery.
- During recovery the requesting node becomes master.
- The request is now processed locally instead of being a remote operation.
- At this point we need to call confirm_master() on the resource since
we're certain we're now the master node. This will clear first_lkid.
- We weren't calling confirm_master(), so first_lkid was not being cleared
causing subsequent requests on that resource to get stuck.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This splits the rwlocks guarding the hash chains of the glock hash
table into their own array. This will reduce memory usage in some
cases due to better alignment, although the real reason for doing it
is to allow the two tables to be different sizes in future (i.e.
the locks will be sized proportionally with the max number of CPUs
and the hash chains sized proportinally with the size of physical memory)
In order to allow this, the gl_bucket member of struct gfs2_glock has
now become gl_hash, so we record the hash rather than a pointer to the
bucket itself.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The logic in nfs_direct_read_schedule and nfs_direct_write_schedule can
allow data->npages to be one larger than rpages. This causes a page
pointer to be written beyond the end of the pagevec in nfs_read_data (or
nfs_write_data).
Fix this by making nfs_(read|write)_alloc() calculate the size of the
pagevec array, and initialise data->npages.
Also get rid of the redundant argument to nfs_commit_alloc().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It has been reported that ext3_getblk() is not doing the right thing and
triggering following WARN():
BUG: warning at fs/ext3/inode.c:1016/ext3_getblk()
<c01c5140> ext3_getblk+0x98/0x2a6 <c03b2806> md_wakeup_thread+0x26/0x2a
<c01c536d> ext3_bread+0x1f/0x88 <c01cedf9> ext3_quota_read+0x136/0x1ae
<c018b683> v1_read_dqblk+0x61/0xac <c0188f32> dquot_acquire+0xf6/0x107
<c01ceaba> ext3_acquire_dquot+0x46/0x68 <c01897d4> dqget+0x155/0x1e7
<c018a97b> dquot_transfer+0x3e0/0x3e9 <c016fe52> dput+0x23/0x13e
<c01c7986> ext3_setattr+0xc3/0x240 <c0120f66> current_fs_time+0x52/0x6a
<c017320e> notify_change+0x2bd/0x30d <c0159246> chown_common+0x9c/0xc5
<c02a222c> strncpy_from_user+0x3b/0x68 <c0167fe6> do_path_lookup+0xdf/0x266
<c016841b> __user_walk_fd+0x44/0x5a <c01592b9> sys_chown+0x4a/0x55
<c015a43c> vfs_write+0xe7/0x13c <c01695d4> sys_mkdir+0x1f/0x23
<c0102a97> syscall_call+0x7/0xb
Looking at the code, it looks like it's not handle HOLE correctly. It ends
up returning -EIO. Here is the patch to fix it.
If we really want to be paranoid, we can allow return values 0 (HOLE), 1
(we asked for one block) and return -EIO for more than 1 block. But I
really don't see a reason for doing it - all we need is the block# here.
(doesn't matter how many blocks are mapped).
ext3_get_blocks_handle() returns number of blocks it mapped. It returns 0
in case of HOLE. ext3_getblk() should handle HOLE properly (currently its
dumping warning stack and returning -EIO).
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As requested by Jan Engelhardt, this removes the typedefs in the
locking module interface and replaces them with void *. Also
since we are changing the interface, I've added a few consts
as well.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Cc: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This was missed in an earlier patch when changing over from vmalloc
to kmalloc for the superblock.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This code is no longer used for anything and can be removed
from the locking modules. The sync_lvb function is not required
as this happens automatically with the current locking system.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This removes one of the typedefs from the locking interface. It
is replaced by a forward declaration of the gfs2 superblock. The
other two are not so easy to solve since in their case, they
can refer to one of two possible structures.
Cc: David Teigland <teigland@redhat.com>
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
There are several reasons why we want to do this:
- Firstly its large and thus we'll scale better with multiple
GFS2 fs mounted at the same time
- Secondly its easier to scale its size as required (thats a plan
for later patches)
- Thirdly, we can use kzalloc rather than vmalloc when allocating
the superblock (its now only 4888 bytes)
- Fourth its all part of my plan to eventually be able to use RCU
with the glock hash.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is another patch preparing for sharing of the glock hash
table between different gfs2 mounts.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Use snprintf(buf, PAGE_SIZE, ...) instead of sprintf in sysfs show
methods.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Use snprintf(buf, PAGE_SIZE, ...) instead of sprintf for sysfs show
methods. Per instructions in Documentation/filesystems/sysfs.txt
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
xfs_splice_write() failed to update the on disk inode size when extending
the so when the file was closed the range extended by splice was truncated
off. Hence any region of a file written to by splice would end up as a
hole full of zeros.
SGI-PV: 955939
SGI-Modid: xfs-linux-melb:xfs-kern:26920a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: David Chatterton <chatz@sgi.com>
__blockdev_direct_IO for the DIO_OWN_LOCKING case for direct I/O reads
since it drops and reacquires the i_mutex while holding the iolock and
this violates the locking order.
SGI-PV: 955696
SGI-Modid: xfs-linux-melb:xfs-kern:26898a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chatterton <chatz@sgi.com>
The fix for recent ENOSPC deadlocks introduced certain limitations on
allocations. The fix could cause xfssyncd to loop endlessly if we did not
leave some space free for the allocator to work correctly. Basically, we
needed to ensure that we had at least 4 blocks free for an AG free list
and a block for the inode bmap btree at all times.
However, this did not take into account the fact that each AG has a free
list that needs 4 blocks. Hence any filesystem with more than one AG could
cause oversubscription of free space and make xfssyncd spin forever trying
to allocate space needed for AG freelists that was not available in the
AG.
The following patch reserves space for the free lists in all AGs plus the
inode bmap btree which prevents oversubscription. It also prevents those
blocks from being reported as free space (as they can never be used) and
makes the SMP in-core superblock accounting code and the reserved block
ioctl respect this requirement.
SGI-PV: 955674
SGI-Modid: xfs-linux-melb:xfs-kern:26894a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: David Chatterton <chatz@sgi.com>
CIFS had one path in which dentry was instantiated before the corresponding
inode metadata was filled in.
Fixes Redhat bugzilla bug #163493
Signed-off-by: Steve French <sfrench@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Adds kernel-doc for alloc_super() type in fs/super.c.
Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ass a comment explaining the slightly odd construct used to
pass error values back.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's follow up emails, here are a few small
fixes which were missed earlier.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's request, some unused code is removed and
some consts added in the quota code.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's comments, removed some unused code and
removed some brackets which were not required.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's request and also a few of my own. It has
been possible to add a few most const to the code as a result of
the change in gfs2_ea_name2type.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's request, I've added a ',' to the end of
each of the multi-line structures which didn't already have
one (most already did).
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's comments, this should make all the headers
compile on their own by including and/or declaring structures
early.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per comments from Jan Engelhardt, remove redundant casts, redundant
endian conversions, add a smattering of const and rewrite the
dirent_next function in order to avoid as many casts as possible.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Introduce a couple of new constants which make the NFS filehandle
sizes that GFS2 uses a bit clearer. Also fix one or two minor
issues as per Jan Engelhardt's sixth email.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's fifth email. This has most of the changes
recommended, which is the removal of casts which are not required,
some indenting fixes and similar.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per the remainder of Jan Engelhardt's fourth email comments,
remove an cast thats not required. Also tidy up the "limit" code
in stuck_releasepage().
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Use const in endian conversion and printing of on-disk structures.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's fourth email, this is the first part of the
change set with a few minor style points.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The remains of the changes for Jan Engelhardt's third email. Remove
a cast and tidy up gfs2_inode_attr_in.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This makes all fixed size types have consistent names.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's third set of comments, this make various
code style changes and moves the structures from format.h into
super.c, which was the only place that format.h was actually used.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's second email, this removes some unused code,
and fixes up indenting in various places.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Missed a place where I forgot to convert kfree() to kmem_cache_free() as
part of jbd-manage-its-own-slab changes.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As per comments from Jan Engelhardt <jengelh@linux01.gwdg.de> this
updates the copyright message to say "version" in full rather than
"v.2". Also incore.h has been updated to remove forward structure
declarations which are not required.
The gfs2_quota_lvb structure has now had endianess annotations added
to it. Also quota.c has been updated so that we now store the
lvb data locally in endian independant format to avoid needing
a structure in host endianess too. As a result the endianess
conversions are done as required at various points and thus the
conversion routines in lvb.[ch] are no longer required. I've
moved the one remaining constant in lvb.h thats used into lm.h
and removed the unused lvb.[ch].
I have not changed the HIF_ constants. That is left to a later patch
which I hope will unify the gh_flags and gh_iflags fields of the
struct gfs2_holder.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Check if the FORCEFREE flag has been provided from user space. If so, set
the force option to dlm_release_lockspace() so that any remaining locks
will be freed.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch fixes three main bugs. Firstly the direct i/o get_block
was returning the wrong return code in certain cases. Secondly, the
GFS2's releasepage function was not dealing with cases when clean,
ordered buffers were found still queued on a transaction (which can
happen depending on the ordering of journal flushes). Thirdly, the
journaling code itself needed altering to take account of the
after effects of removing the clean ordered buffers from the transactions
before a journal flush.
The releasepage bug did also show up under "normal" buffered i/o
as well, so its not just a fix for direct i/o. In fact its not
normally used in the direct i/o path at all, except when flushing
existing buffers after performing a direct i/o write, but that was
the code path that led us to spot this.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This adds the superblock as a key for glock lookups. Since the glocks
are already stored in a per-superblock table, this has no effect at
the moment. Later on this will change though.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We can take advantage of the slab allocator to ensure that all the list
heads and the spinlock (plus one or two other fields) are initialised
by slab to speed up allocation of glocks.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Remove the unused sync feature from glocks. This is currently done by
calling the required functions to sync pages/blocks directly so this
code isn't needed.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
For all the usual reasons of enforcing correctness and potentially
reducing code size, this patch makes the glock operations const.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
conversion.
Since bma.conv is a char and XFS_BMAPI_CONVERT is 0x1000, bma.conv was
always assigned zero. Spotted by the GNU C compiler (SVN version).
SGI-PV: 947312
SGI-Modid: xfs-linux-melb:xfs-kern:26887a
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Nathan Scott <nathans@sgi.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] Do not send Query All EAs SMB when mount option nouser_xattr
[CIFS] endian errors in lanman protocol support
[CIFS] Fix oops in cifs_close due to unitialized lock sem and list in
[CIFS] Fix oops when negotiating lanman and no password specified
[CIFS]
[CIFS] Allow cifsd to suspend if connection is lost
[CIFS] Make midState usage more consistent
[CIFS] spinlock protect read of last srv response time in timeout path
[CIFS] Do not time out posix brl requests when using new posix setfileinfo
None of the other /proc/meminfo lines have a space in the identifier. This
post-2.6.17 addition has the potential to break existing parsers, so use an
underscore instead (like Committed_AS).
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes the locking error noticed by lockdep:
=============================================
[ INFO: possible recursive locking detected ]
---------------------------------------------
init/1 is trying to acquire lock:
(&sighand->siglock){....}, at: [<c047a78a>] flush_old_exec+0x3ae/0x859
but task is already holding lock:
(&sighand->siglock){....}, at: [<c047a77a>] flush_old_exec+0x39e/0x859
other info that might help us debug this:
2 locks held by init/1:
#0: (tasklist_lock){..--}, at: [<c047a76a>] flush_old_exec+0x38e/0x859
#1: (&sighand->siglock){....}, at: [<c047a77a>] flush_old_exec+0x39e/0x859
stack backtrace:
[<c04051e1>] show_trace_log_lvl+0x54/0xfd
[<c040579d>] show_trace+0xd/0x10
[<c04058b6>] dump_stack+0x19/0x1b
[<c043b33a>] __lock_acquire+0x773/0x997
[<c043bacf>] lock_acquire+0x4b/0x6c
[<c060630b>] _spin_lock+0x19/0x28
[<c047a78a>] flush_old_exec+0x3ae/0x859
[<c0498053>] load_elf_binary+0x4aa/0x1628
[<c0479cab>] search_binary_handler+0xa7/0x24e
[<c047b577>] do_execve+0x15b/0x1f9
[<c04022b4>] sys_execve+0x29/0x4d
[<c0403faf>] syscall_call+0x7/0xb
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
reiserfs seems to have another locking level layer for the i_mutex due to the
xattrs-are-a-directory thing.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
JBD currently allocates commit and frozen buffers from slabs. With
CONFIG_SLAB_DEBUG, its possible for an allocation to cross the page
boundary causing IO problems.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=200127
So, instead of allocating these from regular slabs - manage allocation from
its own slabs and disable slab debug for these slabs.
[akpm@osdl.org: cleanups]
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix two compile failures in eventpoll.c code which would happen if
DEBUG_EPOLL is bigger than zero.
Signed-off-by: Masoud Sharbiani <masouds@google.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1) When we allocated last fragment in ufs_truncate, we read page, check
if block mapped to address, and if not trying to allocate it. This is
wrong behaviour, fragment may be NOT allocated, but mapped, this
happened because of "block map" function not checked allocated fragment
or not, it just take address of the first fragment in the block, add
offset of fragment and return result, this is correct behaviour in
almost all situation except call from ufs_truncate.
2) Almost all implementation of UFS, which I can investigate have such
"defect": if you have full disk, and try truncate file, for example 3GB
to 2MB, and have hole in this region, truncate return -ENOSPC. I tried
evade from this problem, but "block allocation" algorithm is tied to
right value of i_lastfrag, and fix of this corner case may slow down of
ordinaries scenarios, so this patch makes behavior of "truncate"
operations similar to what other UFS implementations do.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On UFS, this scenario:
open(O_TRUNC)
lseek(1024 * 1024 * 80)
write("A")
lseek(1024 * 2)
write("A")
may cause access to invalid address.
This happened because of "goal" is calculated in wrong way in block
allocation path, as I see this problem exists also in 2.4.
We use construction like this i_data[lastfrag], i_data array of pointers to
direct blocks, indirect and so on, it has ceratain size ~20 elements, and
lastfrag may have value for example 40000.
Also this patch fixes related to handling such scenario issues, wrong
zeroing metadata, in case of block(not fragment) allocation, and wrong goal
calculation, when we allocate block
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To handle the earlier bogus ENOSPC error caused by filesystem full of block
reservation, current code falls back to non block reservation, starts to
allocate block(s) from the goal allocation block group as if there is no
block reservation.
Current code needs to re-load the corresponding block group descriptor for
the initial goal block group in this case. The patch fixes this.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mounting an ext2 filesystem with zero s_inodes_per_group will cause a
divide error.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>