* git://oss.sgi.com:8090/xfs/xfs-2.6:
[XFS] Fix a bad pointer dereference in the quota statvfs handling.
[XFS] Fix xfs_splice_write() so appended data gets to disk.
[XFS] Fix ABBA deadlock between i_mutex and iolock. Avoid calling
[XFS] Prevent free space oversubscription and xfssyncd looping.
This results in smaller list heads, so that we can have more chains
in the same amount of memory (twice as many). I've multiplied the
size of the table by four though - this is because we are saving
memory by not having one lock per chain any more. So we land up
using about the same amount of memory for the hash table as we
did before I started these changes, the difference being that we
now have four times as many hash chains.
The reason that I say "about the same amount of memory" is that the
actual amount now depends upon the NR_CPUS and some of the config
variables, so that its not exact and in some cases we do use more
memory. Eventually we might want to scale the hash table size
according to the size of physical ram as measured on module load.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The existing implementation of this function in glock.c was not
very efficient as it relied upon keeping a cursor element upon the
hash chain in question and moving it along. This new version improves
upon this by using the current element as a cursor. This is possible
since we only look at the "next" element in the list after we've
taken the read_lock() subsequent to calling the examiner function.
Obviously we have to eventually drop the ref count that we are then
left with and we cannot do that while holding the read_lock, so we
do that next time we drop the lock. That means either just before
we examine another glock, or when the loop has terminated.
The new implementation has several advantages: it uses only a
read_lock() rather than a write_lock(), so it can run simnultaneously
with other code, it doesn't need a "plug" element, so that it removes
a test not only from this list iterator, but from all the other glock
list iterators too. So it makes things faster and smaller.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Add back the consts which were casted away in the glock sorting
function. Also add early exit code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Make the number of locks used for hash chains in glock.c
proportional to NR_CPUS. Also move constants for the number
of hash chains into glock.c from incore.h since they are
not used outside of glock.c.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fixing the following scenario:
- A request is on the waiters list waiting for a reply from a remote node.
- The request is the first one on the resource, so first_lkid is set.
- The remote node fails causing recovery.
- During recovery the requesting node becomes master.
- The request is now processed locally instead of being a remote operation.
- At this point we need to call confirm_master() on the resource since
we're certain we're now the master node. This will clear first_lkid.
- We weren't calling confirm_master(), so first_lkid was not being cleared
causing subsequent requests on that resource to get stuck.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This splits the rwlocks guarding the hash chains of the glock hash
table into their own array. This will reduce memory usage in some
cases due to better alignment, although the real reason for doing it
is to allow the two tables to be different sizes in future (i.e.
the locks will be sized proportionally with the max number of CPUs
and the hash chains sized proportinally with the size of physical memory)
In order to allow this, the gl_bucket member of struct gfs2_glock has
now become gl_hash, so we record the hash rather than a pointer to the
bucket itself.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The logic in nfs_direct_read_schedule and nfs_direct_write_schedule can
allow data->npages to be one larger than rpages. This causes a page
pointer to be written beyond the end of the pagevec in nfs_read_data (or
nfs_write_data).
Fix this by making nfs_(read|write)_alloc() calculate the size of the
pagevec array, and initialise data->npages.
Also get rid of the redundant argument to nfs_commit_alloc().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It has been reported that ext3_getblk() is not doing the right thing and
triggering following WARN():
BUG: warning at fs/ext3/inode.c:1016/ext3_getblk()
<c01c5140> ext3_getblk+0x98/0x2a6 <c03b2806> md_wakeup_thread+0x26/0x2a
<c01c536d> ext3_bread+0x1f/0x88 <c01cedf9> ext3_quota_read+0x136/0x1ae
<c018b683> v1_read_dqblk+0x61/0xac <c0188f32> dquot_acquire+0xf6/0x107
<c01ceaba> ext3_acquire_dquot+0x46/0x68 <c01897d4> dqget+0x155/0x1e7
<c018a97b> dquot_transfer+0x3e0/0x3e9 <c016fe52> dput+0x23/0x13e
<c01c7986> ext3_setattr+0xc3/0x240 <c0120f66> current_fs_time+0x52/0x6a
<c017320e> notify_change+0x2bd/0x30d <c0159246> chown_common+0x9c/0xc5
<c02a222c> strncpy_from_user+0x3b/0x68 <c0167fe6> do_path_lookup+0xdf/0x266
<c016841b> __user_walk_fd+0x44/0x5a <c01592b9> sys_chown+0x4a/0x55
<c015a43c> vfs_write+0xe7/0x13c <c01695d4> sys_mkdir+0x1f/0x23
<c0102a97> syscall_call+0x7/0xb
Looking at the code, it looks like it's not handle HOLE correctly. It ends
up returning -EIO. Here is the patch to fix it.
If we really want to be paranoid, we can allow return values 0 (HOLE), 1
(we asked for one block) and return -EIO for more than 1 block. But I
really don't see a reason for doing it - all we need is the block# here.
(doesn't matter how many blocks are mapped).
ext3_get_blocks_handle() returns number of blocks it mapped. It returns 0
in case of HOLE. ext3_getblk() should handle HOLE properly (currently its
dumping warning stack and returning -EIO).
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As requested by Jan Engelhardt, this removes the typedefs in the
locking module interface and replaces them with void *. Also
since we are changing the interface, I've added a few consts
as well.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Cc: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This was missed in an earlier patch when changing over from vmalloc
to kmalloc for the superblock.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This code is no longer used for anything and can be removed
from the locking modules. The sync_lvb function is not required
as this happens automatically with the current locking system.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This removes one of the typedefs from the locking interface. It
is replaced by a forward declaration of the gfs2 superblock. The
other two are not so easy to solve since in their case, they
can refer to one of two possible structures.
Cc: David Teigland <teigland@redhat.com>
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
There are several reasons why we want to do this:
- Firstly its large and thus we'll scale better with multiple
GFS2 fs mounted at the same time
- Secondly its easier to scale its size as required (thats a plan
for later patches)
- Thirdly, we can use kzalloc rather than vmalloc when allocating
the superblock (its now only 4888 bytes)
- Fourth its all part of my plan to eventually be able to use RCU
with the glock hash.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is another patch preparing for sharing of the glock hash
table between different gfs2 mounts.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Use snprintf(buf, PAGE_SIZE, ...) instead of sprintf in sysfs show
methods.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Use snprintf(buf, PAGE_SIZE, ...) instead of sprintf for sysfs show
methods. Per instructions in Documentation/filesystems/sysfs.txt
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
xfs_splice_write() failed to update the on disk inode size when extending
the so when the file was closed the range extended by splice was truncated
off. Hence any region of a file written to by splice would end up as a
hole full of zeros.
SGI-PV: 955939
SGI-Modid: xfs-linux-melb:xfs-kern:26920a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: David Chatterton <chatz@sgi.com>
__blockdev_direct_IO for the DIO_OWN_LOCKING case for direct I/O reads
since it drops and reacquires the i_mutex while holding the iolock and
this violates the locking order.
SGI-PV: 955696
SGI-Modid: xfs-linux-melb:xfs-kern:26898a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chatterton <chatz@sgi.com>
The fix for recent ENOSPC deadlocks introduced certain limitations on
allocations. The fix could cause xfssyncd to loop endlessly if we did not
leave some space free for the allocator to work correctly. Basically, we
needed to ensure that we had at least 4 blocks free for an AG free list
and a block for the inode bmap btree at all times.
However, this did not take into account the fact that each AG has a free
list that needs 4 blocks. Hence any filesystem with more than one AG could
cause oversubscription of free space and make xfssyncd spin forever trying
to allocate space needed for AG freelists that was not available in the
AG.
The following patch reserves space for the free lists in all AGs plus the
inode bmap btree which prevents oversubscription. It also prevents those
blocks from being reported as free space (as they can never be used) and
makes the SMP in-core superblock accounting code and the reserved block
ioctl respect this requirement.
SGI-PV: 955674
SGI-Modid: xfs-linux-melb:xfs-kern:26894a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: David Chatterton <chatz@sgi.com>
CIFS had one path in which dentry was instantiated before the corresponding
inode metadata was filled in.
Fixes Redhat bugzilla bug #163493
Signed-off-by: Steve French <sfrench@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Adds kernel-doc for alloc_super() type in fs/super.c.
Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ass a comment explaining the slightly odd construct used to
pass error values back.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's follow up emails, here are a few small
fixes which were missed earlier.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's request, some unused code is removed and
some consts added in the quota code.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's comments, removed some unused code and
removed some brackets which were not required.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's request and also a few of my own. It has
been possible to add a few most const to the code as a result of
the change in gfs2_ea_name2type.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's request, I've added a ',' to the end of
each of the multi-line structures which didn't already have
one (most already did).
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's comments, this should make all the headers
compile on their own by including and/or declaring structures
early.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per comments from Jan Engelhardt, remove redundant casts, redundant
endian conversions, add a smattering of const and rewrite the
dirent_next function in order to avoid as many casts as possible.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Introduce a couple of new constants which make the NFS filehandle
sizes that GFS2 uses a bit clearer. Also fix one or two minor
issues as per Jan Engelhardt's sixth email.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's fifth email. This has most of the changes
recommended, which is the removal of casts which are not required,
some indenting fixes and similar.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per the remainder of Jan Engelhardt's fourth email comments,
remove an cast thats not required. Also tidy up the "limit" code
in stuck_releasepage().
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Use const in endian conversion and printing of on-disk structures.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's fourth email, this is the first part of the
change set with a few minor style points.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The remains of the changes for Jan Engelhardt's third email. Remove
a cast and tidy up gfs2_inode_attr_in.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This makes all fixed size types have consistent names.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's third set of comments, this make various
code style changes and moves the structures from format.h into
super.c, which was the only place that format.h was actually used.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>