Fix GFS2's need_sync()'s use of do_div() on an s64 by using div_s64() instead.
This does assume that gt_quota_scale_den can be cast to an s32.
This was introduced by patch b3b94faa5f.
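As a rough userspace illustration of why the helper matters: do_div() works on an unsigned 64-bit dividend, whereas div_s64() takes an s64 dividend and an s32 divisor. The sketch below uses plain C division and two hypothetical stand-ins (sketch_div_s64() and sketch_unsigned_div()); neither is the kernel implementation.

```c
#include <stdint.h>

/* Hypothetical stand-in for div_s64(): signed 64-bit dividend,
 * signed 32-bit divisor, result truncated toward zero. */
static int64_t sketch_div_s64(int64_t dividend, int32_t divisor)
{
    return dividend / divisor;
}

/* What happens if a negative s64 is handed to an unsigned-only
 * division helper: the dividend is reinterpreted as a huge
 * positive number before the division takes place. */
static uint64_t sketch_unsigned_div(int64_t dividend, uint32_t divisor)
{
    return (uint64_t)dividend / divisor;
}
```

With a dividend of -10 and a divisor of 3, the signed helper gives -3, while the unsigned one gives a very large positive value.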
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The implementation of gfs2_inode_attr_in is removed.
So remove its declaration.
Signed-off-by: Li Xiaodong <lixd@cn.fujitsu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The ability to mark files for direct i/o access when opened
normally is both unused and pointless, so this patch removes
support for that feature.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch removes the "recent list" which is used during allocation
and replaces it with the (already existing) mru list used during
deletion. The "recent list" was not a true mru list leading to a number
of inefficiencies including a "next" function which made scanning the
list an order N^2 operation wrt the number of list elements.
This should increase allocation performance with large numbers of rgrps.
It's also a useful preparation and cleanup before some further changes
which are planned in this area.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We already allow local SH locks while we hold a cached EX glock, so here
we allow DF locks as well. This works only because we rely on the VFS's
invalidation for locally cached data, and because if we hold an EX lock,
then we know that no other node can be caching data relating to this
file.
It dramatically speeds up initial writes to O_DIRECT files since we fall
back to buffered I/O for this and would otherwise bounce between DF and
EX modes on each and every write call. The lessons to be learned from
that are to ensure that (for the time being anyway) O_DIRECT files are
preallocated and that they are written to using reasonably large I/O
sizes. Even so, this change fixes that corner case nicely.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
There is a race in the delayed demote code where it does the wrong thing
if a demotion to UN has occurred for other reasons before the delay has
expired. This patch adds an assert to catch that condition as well as
fixing the root cause by adding an additional check for the UN state.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
GFS2 calls permission() to verify permissions after locks on the files
have been taken.
For this it's sufficient to call gfs2_permission() instead. This
results in the following changes:
- IS_RDONLY() check is not performed
- IS_IMMUTABLE() check is not performed
- devcgroup_inode_permission() is not called
- security_inode_permission() is not called
IS_RDONLY() should be unnecessary anyway, as the per-mount read-only
flag should provide protection against read-only remounts during
operations. do_gfs2_set_flags() has been fixed to perform
mnt_want_write()/mnt_drop_write() to protect against remounting
read-only.
An IS_IMMUTABLE() check has been added to gfs2_permission().
Repeating the security checks seems to be pointless, as they don't
normally change, and if they do, it's independent of the filesystem
state.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
- Replace remote_llseek with generic_file_llseek_unlocked (to force compilation
failures in all users)
- Change all users to either use generic_file_llseek_unlocked directly or
take the BKL around it. I changed the file systems that don't use the BKL
for anything (CIFS, GFS) to call it directly. NCPFS, SMBFS and NFS
take the BKL, but explicitly in their own source now.
I moved them all over in a single patch to avoid unbisectable sections.
Open problem: 32bit kernels can corrupt fpos because its modification
is not atomic, but they can do that anyway because there are other paths that
modify it without the BKL.
Do we need a special lock for the pos/f_version = 0 checks?
Trond says the NFS BKL is likely not needed, but keep it for now
until his full audit.
v2: Use generic_file_llseek_unlocked instead of remote_llseek_unlocked
and factor duplicated code (suggested by hch)
Cc: Trond.Myklebust@netapp.com
Cc: swhiteho@redhat.com
Cc: sfrench@samba.org
Cc: vandrove@vc.cvut.cz
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
I discovered that we had a list onto which every lock_dlm
lock was being put. Its only function was to discover whether
we'd got any locks left after umount. Since there was already
a counter for that purpose as well, I removed the list. The
saving is sizeof(struct list_head) per glock - well worth
having.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
There are several reasons why this is undesirable:
1. It never happens during normal operation anyway
2. If it does happen it causes performance to be very, very poor
3. It isn't likely to solve the original problem (memory shortage
on remote DLM node) it was supposed to solve
4. It uses a bunch of arbitrary constants which are unlikely to be
correct for any particular situation and for which the tuning seems
to be a black art.
5. In an N node cluster, only 1/N of the dropped locks will actually
contribute to solving the problem on average.
So all in all we are better off without it. This also makes merging
the lock_dlm module into GFS2 a bit easier.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch fixes Red Hat bugzilla bug 450156.
This started with a not-too-improbable mount failure because the
locking protocol was never set back to its proper "lock_dlm" after the
system was rebooted in the middle of a gfs2_fsck. That left a
(purposely) invalid locking protocol in the superblock, which caused an
error when the file system was mounted the next time.
When there's an error mounting, vfs calls DQUOT_OFF, which calls
vfs_quota_off which calls gfs2_sync_fs. Next, gfs2_sync_fs calls
gfs2_log_flush passing s_fs_info. But due to the error, s_fs_info
had been previously set to NULL, and so we have the kernel oops.
My solution in this patch is to test for the NULL value before passing
it. I tested this patch and it fixes the problem.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The previous attempt to fix the locking in readpage failed due
to the use of a "try lock" which resulted in occasional high
cpu usage during testing (due to repeated tries) and also it
did not resolve all the ordering problems wrt the transaction
lock (although it did solve all the inode lock ordering problems).
This patch avoids the problem by unlocking the page and getting the
locks in the correct order. This means that we have to retest the
page to ensure that it hasn't changed when we relock the page.
This now passes the tests which were previously failing.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The patch to remove lock_nolock managed to get the arguments
of this list_add backwards. This fixes it.
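For reference, the kernel's list_add() takes the new entry first and the list head second. The following is a minimal userspace sketch of that convention; the sketch_ names are invented for illustration and this is not the kernel's list.h.

```c
#include <stddef.h>

/* Circular doubly linked list node, modelled on the kernel's idea. */
struct sketch_list_head {
    struct sketch_list_head *next, *prev;
};

static void sketch_init_list(struct sketch_list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert new_entry directly after head: new entry first, head second. */
static void sketch_list_add(struct sketch_list_head *new_entry,
                            struct sketch_list_head *head)
{
    new_entry->next = head->next;
    new_entry->prev = head;
    head->next->prev = new_entry;
    head->next = new_entry;
}

/* Count the entries on the list, excluding the head itself. */
static size_t sketch_list_count(const struct sketch_list_head *head)
{
    const struct sketch_list_head *pos;
    size_t n = 0;

    for (pos = head->next; pos != head; pos = pos->next)
        n++;
    return n;
}
```

Swapping the two arguments would splice the list head into whatever list the new entry's pointers describe, which is the kind of mistake fixed here.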
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch merges the lock_nolock module into GFS2 itself. As well as removing
some of the overhead of the module, it also means that it's now impossible to
build GFS2 without a lock module (which would be a pointless thing to do
anyway).
We also plan to merge lock_dlm into GFS2 in the future, but that is a more
tricky task, and will therefore be a separate patch.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: David Teigland <teigland@redhat.com>
This looks like a lot of change, but in fact it's not. Mostly it's
things moving from one file to another. The change is just that
instead of queuing lock completions and callbacks from the DLM
we now pass them directly to GFS2.
This gives us a net loss of two list heads per glock (a fair
saving in memory) plus a reduction in the latency of delivering
the messages to GFS2, plus we now have one thread fewer as well.
There was a bug where callbacks and completions could be delivered
in the wrong order due to this unnecessary queuing which is fixed
by this patch.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
This patch implements a number of cleanups to the core of the
GFS2 glock code. As a result a lot of code is removed. It looks
like a really big change, but actually a large part of this patch
is either removing or moving existing code.
There are some new bits too though, such as the new run_queue()
function which is considerably streamlined. Highlights of this
patch include:
o Fixes a cluster coherency bug during SH -> EX lock conversions
o Removes the "glmutex" code in favour of a single bit lock
o Removes the ->go_xmote_bh() for inodes since it was duplicating
->go_lock()
o We now only use the ->lm_lock() function for both locks and
unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED)
o The fast path is considerably shorter, giving performance gains
especially with lock_nolock
o The glock_workqueue is now used for all the callbacks from the DLM
which allows us to simplify the lock_dlm module (see following patch)
o The way is now open to make further changes such as eliminating the two
threads (gfs2_glockd and gfs2_scand) in favour of a more efficient
scheme.
This patch has undergone extensive testing with various test suites
so it should be pretty stable by now.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
This patch fixes bz 450641.
This patch changes the computation for zero_metapath_length(), which it
renames to metapath_branch_start(). When you are extending the metadata
tree, the indirect blocks that point to the new data block must
diverge from the existing tree either at the inode or at the first
indirect block. They can diverge at the first indirect block because the
inode has room for 483 pointers while the indirect blocks have room for
509 pointers, so when the tree is grown, there is some free space in the
first indirect block. What metapath_branch_start() now computes is the
height where the first indirect block for the new data block is located.
It can either be 1 (if the indirect block diverges from the inode) or 2
(if it diverges from the first indirect block).
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch fixes bugzilla bug bz448866: gfs2: BUG: unable to
handle kernel paging request at ffff81002690e000.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
strlcpy is faster than snprintf when you don't use the returned value.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This fixes bz 444829 where allocating a new block caused gfs2 file systems to
report 0 bytes used in df. It was caused by a broken cast from an unsigned int
in gfs2_block_alloc() to a negative s64 in gfs2_statfs_change(). This patch
casts the unsigned int to an s64 before the unary minus is applied.
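The effect of the cast order can be shown in a few lines of userspace C. This is only a sketch of the arithmetic; the function names are hypothetical and are not taken from the GFS2 source.

```c
#include <stdint.h>

/* Broken: the unary minus is applied while the value is still an
 * unsigned int, so it wraps around before being widened to 64 bits. */
static int64_t sketch_broken_change(unsigned int nblocks)
{
    return (int64_t)(-nblocks);
}

/* Fixed: widen to a signed 64-bit type first, then negate. */
static int64_t sketch_fixed_change(unsigned int nblocks)
{
    return -(int64_t)nblocks;
}
```

For one allocated block the broken form yields a large positive number, while the fixed form yields -1.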
Signed-off-by: Andrew Price <andy@andrewprice.me.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch fixes a GFS2 filesystem consistency error reported from
function do_strip. The problem was caused by a timing window
that allowed two vfs inodes to be created in memory that point
to the same file. The problem is fixed by making the vfs's
iget_test, iget_set mechanism check and set a new bit in the
in-core gfs2_inode structure while the vfs inode spin_lock is held.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nothing in the tree uses nopage any more. Remove support for it in the
core mm code and documentation (and a few stray references to it in
comments).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the code that handles cluster posix locks from gfs2 into the dlm
so that it can be used by both gfs2 and ocfs2.
Signed-off-by: David Teigland <teigland@redhat.com>
There are several places where GFP_KERNEL allocations happen under a glock,
which will result in hangs if we're under memory pressure and go to re-enter the
fs in order to flush stuff out. This patch changes the culprits to GFP_NOFS to
keep this problem from happening.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The function gfs2_inode_lookup always returns either a valid pointer or a
value made with ERR_PTR, so its result should be tested with IS_ERR, not
with a test for 0.
The problem was found using the following semantic match.
(http://www.emn.fr/x-info/coccinelle/)
//<smpl>
@a@
expression E, E1;
statement S,S1;
position p;
@@
E = gfs2_inode_lookup(...)
... when != E = E1
if@p (E) S else S1
@n@
position a.p;
expression E,E1;
statement S,S1;
@@
E = NULL
... when != E = E1
if@p (E) S else S1
@depends on !n@
expression E;
statement S,S1;
position a.p;
@@
* if@p (E)
S else S1
//</smpl>
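To illustrate why a test against 0 is not enough for a function that returns ERR_PTR values, here is a simplified userspace sketch of the convention. The sketch_ helpers only mimic the idea behind ERR_PTR(), IS_ERR() and PTR_ERR(), and sketch_lookup() is a hypothetical function standing in for a lookup that can fail.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *sketch_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* True if the pointer lies in the top SKETCH_MAX_ERRNO addresses. */
static int sketch_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Recover the errno value from an error pointer. */
static long sketch_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static int sketch_object = 42;

/* Hypothetical lookup: never returns NULL, only a valid pointer
 * or an encoded error. */
static void *sketch_lookup(int exists)
{
    if (!exists)
        return sketch_err_ptr(-ENOENT);
    return &sketch_object;
}
```

A failed lookup is therefore non-NULL, so "if (!ptr)" never fires; only the IS_ERR-style test catches it.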
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
GFS2 wasn't invalidating its cache before it called into the lock manager
with a request that could potentially drop a lock. This was leaving a
window where the lock could actually be held by another node, but the
file's page cache would still appear valid, causing coherency problems.
This patch moves the cache invalidation to before the lock manager call
when dropping a lock. It also adds the option to the lock_dlm lock
manager to not use conversion mode deadlock avoidance, which, on a
conversion from shared to exclusive, could internally drop the lock, and
then reacquire it. GFS2 now asks lock_dlm to not do this. Instead, GFS2
manually drops the lock and reacquires it.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
fs/gfs2/recovery.c: In function 'get_log_header':
fs/gfs2/recovery.c:152: warning: 'lh.lh_sequence' may be used uninitialized in this function
fs/gfs2/recovery.c:152: warning: 'lh.lh_flags' may be used uninitialized in this function
fs/gfs2/recovery.c:152: warning: 'lh.lh_tail' may be used uninitialized in this function
fs/gfs2/recovery.c:152: warning: 'lh.lh_blkno' may be used uninitialized in this function
fs/gfs2/recovery.c:152: warning: 'lh.lh_hash' may be used uninitialized in this function
Cc: David Teigland <teigland@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This version of the gfs2_bitfit algorithm includes the latest
suggestions from Steve Whitehouse. It is typically eight to
ten times faster than the version we're using today. If there
is a lot of metadata mixed in (lots of small files) the
algorithm is often 15 times faster, and given the right
conditions, I've seen peaks of 20 times faster.
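For readers unfamiliar with what a bitfit search does, the following is a deliberately naive reference scan, not the optimised algorithm from this patch. It assumes two bits of state per block, four blocks per byte, lowest-order bits first; those layout details and all names are assumptions made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_BIT_SIZE  2u          /* bits per block state (assumed) */
#define SK_NBBY      4u          /* block states per byte (assumed) */
#define SK_BIT_MASK  3u
#define SK_NOT_FOUND UINT32_MAX

/* Return the first block at or after 'goal' whose two-bit state
 * equals 'state', or SK_NOT_FOUND if there is none in the buffer. */
static uint32_t sketch_bitfit(const uint8_t *buf, size_t len,
                              uint32_t goal, uint8_t state)
{
    uint32_t nblocks = (uint32_t)(len * SK_NBBY);
    uint32_t blk;

    for (blk = goal; blk < nblocks; blk++) {
        unsigned int shift = (blk % SK_NBBY) * SK_BIT_SIZE;

        if (((buf[blk / SK_NBBY] >> shift) & SK_BIT_MASK) == state)
            return blk;
    }
    return SK_NOT_FOUND;
}
```

This version tests one block per iteration; a faster variant would typically examine a whole machine word at a time and skip words that cannot contain the wanted state.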
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch streamlines the quota checking in the "no quota" case by
making the check inline in the calling function, thus reducing the
number of function calls. Eventually we might be able to remove the
checks from the gfs2_quota_lock() and gfs2_quota_check() functions, but
currently we can't as there are a very few places in the code which need
to call these functions directly still.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
In an earlier patch "[GFS2] fix file_system_type leak on gfs2meta mount"
we removed the code to grab a ref to the module which was not needed
(since we know that the module cannot be unloaded at that time) so
this patch removes the code to drop that reference.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch combines the 2 patches in bug 434736 to correct the lock
ordering in the unstuffing of the quota inode in gfs2_adjust_quota and
adjusting the number of revokes in gfs2_write_jdata_pagevec
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
gfs2_alloc_get may fail so we have to check it to prevent
NULL pointer dereference.
Signed-off-by: Cyrill Gorcunov <gorcunov@gamil.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We need to ensure that sector_t is 64 bits for GFS2, so we need to
depend on LBD as well as LSF.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
A previous commit removed the call to
init_special_inode from the inode lookup path. This causes problems such as:
# mknod /mnt/gfs2/dev/null c 1 3
# cat /mnt/gfs2/dev/null
cat: /mnt/gfs2/dev/null: Invalid argument
Without special inode support, GFS2 cannot handle character device files,
block device files, FIFOs, or sockets, and so loses features that are
expected of a general-purpose file system.
This one-line patch re-adds special inode support.
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
struct inode_operations gfs2_dev_iops is always the same as gfs2_file_iops,
since Jan 2006, when GFS2 was merged into the mainline kernel.
So one of them could be removed.
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
get_gfs2_sb does a get_fs_type without doing a put_filesystem and
thus leaks a file_system_type reference every time it's called.
Just use gfs2_fs_type directly instead of doing the lookup and thus
fix the problem.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We've supported mapping of extents when no block allocation is required
for some time. This patch extends that to mapping of extents when an
allocation has been requested. In that case we try to allocate as many
blocks as are requested, but we might return fewer in case there is
something preventing us from returning the complete amount (e.g. an
already allocated block is in the way).
Currently the only code path which can actually request multiple data
blocks in a single bmap call is the page_mkwrite path and even then it
only happens if there are multiple blocks per page. What this patch does
do however, is merge the allocation requests for metadata (growing the
metadata tree in either height or depth) with the allocation of the data
blocks in the case that both are needed. This results in lower overheads
even in the single block allocation case.
The one thing which we can't handle here at the moment is unstuffing. I
would like to be able to do that, but the problem which arises is that
in order to unstuff one has to get a locked page from the page cache
which results in locking problems in the (usual) case that the caller is
holding the page lock on the page it wishes to map. So that case will
have to be addressed in future patches.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We've previously been using a "try lock" in readpage on the basis that
it would prevent deadlocks due to the inverted lock ordering (our normal
lock ordering is glock first and then page lock). Unfortunately tests
have shown that this isn't enough. If the glock has a demote request
queued such that run_queue() in the glock code tries to do a demote when
it's called under readpage then it will try and write out all the dirty
pages which requires locking them. This then deadlocks with the page
locked by readpage.
The solution is to always require two calls into readpage. The first
unlocks the page, gets the glock and returns AOP_TRUNCATED_PAGE, the
second does the actual readpage and unlocks the glock & page as
required.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch adds a proper extern declaration for gdlm_ops in
fs/gfs2/locking/dlm/lock_dlm.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Every file should include the headers containing the prototypes for
its global functions (in this case for gfs2_set_inode_flags()).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As a result of an earlier patch, drop_bh was being called in cases
when it shouldn't have been. Since we never have a gh in the drop
case and we always have a gh in the promote case, we can use that
extra information to tell which case has been seen.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
In the case that we needed to grow the height of the metadata tree
we were looking up the inode buffer and then brelse()ing it despite
the fact that it is needed later in the block map process.
This patch ensures that we look up the inode's buffer once and only
once during the block map process.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The blocks counter is almost a duplicate of the i_blocks
field in the VFS inode. The only difference is that i_blocks
can be only 32bits long for 32bit arch without large single file
support. Since GFS2 doesn't handle the non-large single file
case (for 32 bit anyway) this adds a new config dependency on
64BIT || LSF. This has always been the case, however we've never
explicitly said so before.
Even if we do add support for the non-LSF case, we will still
not require this field to be duplicated since we will not be
able to access oversized files anyway.
So the net result of all this is that we shave 8 bytes from a gfs2_inode
and get our config deps correct.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This adds a function (currently the only use is during mapping
of already allocated blocks, but watch this space) which iterates
over a number of pointers in a block and returns the extent length.
If the initial pointer is 0 (i.e. unallocated) it will return the
number of unallocated blocks in the extent. If the initial pointer
is allocated, then it returns the number of contiguously allocated
blocks in the extent.
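The behaviour described above can be sketched in userspace as follows. The function name, the use of host-endian uint64_t pointers and the 'limit' parameter are assumptions for illustration; the real code operates on on-disk block pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the length of the extent starting at ptrs[0], looking at
 * no more than 'limit' pointers.  A zero first pointer counts the
 * run of unallocated entries; a non-zero first pointer counts the
 * run of consecutively numbered blocks. */
static unsigned int sketch_extent_length(const uint64_t *ptrs, size_t limit)
{
    uint64_t first;
    size_t i;

    if (limit == 0)
        return 0;

    first = ptrs[0];
    for (i = 1; i < limit; i++) {
        uint64_t expect = first ? first + i : 0;

        if (ptrs[i] != expect)
            break;
    }
    return (unsigned int)i;
}
```

For example, the pointers {100, 101, 102, 200} give an extent of 3 allocated blocks, and {0, 0, 0, 7} give an extent of 3 unallocated blocks.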
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Rather than having to allocate a single block at a time, this patch
allows the block allocator to allocate an extent. Since there is
no difference (so far as the block allocator is concerned) between
data blocks and indirect blocks, it is possible to allocate a single
extent and for the caller to unrevoke just the blocks required
for indirect blocks.
Currently the only bit of GFS2 to make use of this feature is the
build height function. The intention is that gfs2_block_map will
be changed to make use of this feature in future patches.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Thanks to the preceding patches, the only difference between
these two functions is their name. We can thus merge them
and call the new function gfs2_alloc_block to reflect the
fact that it can allocate either kind of block.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
By adding an extra argument to gfs2_trans_add_unrevoke we can now
specify an extent length of blocks to unrevoke. This means that
we only need to make one pass through the list for each extent
rather than each block. Currently the only extent length which
is used is 1, but that will change in the future.
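The saving from passing an extent length can be sketched like this. The real code walks a kernel list of revoke entries; the array, the function name and its parameters below are simplifications invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Drop every pending revoke that falls inside the extent
 * [blkno, blkno + len) in a single pass over the entries.
 * Returns the number of entries that remain. */
static size_t sketch_unrevoke(uint64_t *revoked, size_t count,
                              uint64_t blkno, unsigned int len)
{
    size_t in, out = 0;

    for (in = 0; in < count; in++) {
        if (revoked[in] >= blkno && revoked[in] - blkno < len)
            continue;           /* inside the extent: unrevoke it */
        revoked[out++] = revoked[in];
    }
    return out;
}
```

Calling this once with len = 3 visits each entry once, whereas three separate single-block calls would visit the remaining entries three times.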
Also gfs2_trans_add_unrevoke is removed from gfs2_alloc_meta
since it's the only difference between this and gfs2_alloc_data
which is left. This will allow a future patch to merge these
two functions into one (i.e. one call to allocate both data
and metadata in a single extent in the future).
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We don't need to keep track of when we last allocated data
and metadata separately since the only thing that's important
when searching for a free block is whether it's free or not,
which is independent from what type of block it is.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
There were three fields being used to keep track of the location
of the most recently allocated block for each inode. These have
been merged into a single field in order to better keep the
data and metadata for an inode close on disk, and also to reduce
the space required for storage.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is kind of trivial in the greater scheme of things, but
this removes three counters that AFAICT are never used.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch forms a pair with the previous patch which shrunk
di_height. Like that patch di_depth is renamed i_depth and moved
into struct gfs2_inode directly. Also the field goes from 16 bits
to 8 bits since it is also limited to a max value which is rather
small (17 in this case). In addition we also now validate the field
against this maximum value when it's read in.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch further reduces GFS2's memory requirements by
eliminating the 64-bit version number fields in lieu of
a couple bits.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The functions in lm.c were just wrappers which were mostly
only used in one other file. By moving the functions to
the files where they are being used, they can be marked
static and also this will usually result in them being inlined
since they are often only used from one point in the code.
A couple of really trivial functions have been inlined by hand
into the function which called them as it makes the code clearer
to do that.
We also gain from one fewer function call in the glock lock and
unlock paths.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch further reduces the memory needs of GFS2 by
eliminating the gl_req_bh variable from struct gfs2_glock.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
There are a couple of routines which scan bitmaps where we can
mark the bitmaps const, plus a couple of call sites that can
be updated too.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The reason for doing this is to allow all the block mapping code
to share the same array. As a result we can remove two arguments
from lookup_metapath since they are now returned via the array.
We also add a function to drop all refs to buffer heads when we
are done with the metapath. The build_height function shares the
struct metapath, but currently still frees its own buffers, and
this will change in a future patch.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch reduces memory by replacing the int variable
gl_waiters2 by a single bit in the gl_flags.
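The general technique of folding a boolean-like int into an existing flags word looks roughly like the sketch below. The structure, the bit number and the helper names are hypothetical, and unlike the kernel's set_bit()/test_bit() these helpers are not atomic.

```c
#include <stdbool.h>

#define SKETCH_FLAG_WAITERS 3   /* hypothetical bit number */

struct sketch_lock {
    unsigned long flags;        /* one bit per boolean state */
};

static void sketch_set_flag(struct sketch_lock *l, unsigned int bit)
{
    l->flags |= 1UL << bit;
}

static void sketch_clear_flag(struct sketch_lock *l, unsigned int bit)
{
    l->flags &= ~(1UL << bit);
}

static bool sketch_test_flag(const struct sketch_lock *l, unsigned int bit)
{
    return (l->flags >> bit) & 1UL;
}
```

The saving comes from dropping the separate int field from every lock structure while reusing a word that is already there.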
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch reduces the memory required by GFS2 by combining
the rd_flags and rg_flags (in core only).
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch moves the gfs2_rgrpd structure to its own slab
memory. This makes it easier to control and monitor, and
yields less memory fragmentation.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch only wakes up the glock reclaim daemon if there is
actually something to be reclaimed.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch contains two small fixups that didn't fit elsewhere.
They are: (1) get rid of temp variable in find_metapath.
(2) Remove vestigial "ret" variable from gfs2_writepage_common.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch is performance related. When we're doing a log flush,
I noticed we were calling buf_lo_incore_commit twice: once for
data bufs and once for metadata bufs. Since this is the same
function and does the same thing in both cases, there should be
no reason to call it twice. Since we only need to call it once,
we can also make it faster by removing it from the generic "lops"
code and making it a stand-alone static function.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
I noticed that the latest change to i_height got rid of the
value from the inode dump. This patch adds it back.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch removes the unnecessary parameter from function
gfs2_rlist_alloc. The parameter was always passed in as 0.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch improves the calculation of the tree height in order to reduce
the number of operations which are carried out on each call to gfs2_block_map.
In the common case, we now make a single comparison, rather than calculating
the required tree height from scratch each time. Also in the case that the
tree does need some extra height, we start from the current height rather than from
zero when we work out what the new height ought to be.
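A minimal sketch of that idea, assuming a precomputed table of the maximum file size addressable at each tree height; the table, the function name and its parameters are illustrative and not taken from the GFS2 source.

```c
#include <stdint.h>

/* cap[h] is the largest file size that a tree of height h can map.
 * Returns the height needed for 'size', never less than cur_height
 * and never more than nheights - 1. */
static unsigned int sketch_required_height(const uint64_t *cap,
                                           unsigned int nheights,
                                           unsigned int cur_height,
                                           uint64_t size)
{
    unsigned int h = cur_height;

    /* Common case: one comparison, the current height is enough. */
    if (size <= cap[h])
        return h;

    /* Otherwise grow from the current height, not from zero. */
    while (h + 1 < nheights && size > cap[h])
        h++;
    return h;
}
```

With capacities {100, 1000, 10000} and a current height of 1, a size of 50 costs a single comparison, and a size of 5000 needs only one step up to height 2.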
In addition the di_height field is moved into the inode proper and reduced
in size to a u8 since the value must be between 0 and GFS2_MAX_META_HEIGHT (10).
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch removes the call to gfs2_extent_map from gfs2_write_alloc_required,
instead we call gfs2_block_map directly. This results in fewer overall calls
to gfs2_block_map in the multi-block case.
Also, gfs2_extent_map is marked as deprecated so that people know that it's
going away as soon as all the callers have been converted.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* Add path_put() functions for releasing a reference to the dentry and
vfsmount of a struct path in the right order
* Switch from path_release(nd) to path_put(&nd->path)
* Rename dput_path() to path_put_conditional()
[akpm@linux-foundation.org: fix cifs]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the central patch of a cleanup series. In most cases there is no good
reason why someone would want to use a dentry for itself. This series reflects
that fact and embeds a struct path into nameidata.
Together with the other patches of this series
- it enforces the correct order of getting/releasing the reference count on
<dentry,vfsmount> pairs
- it prepares the VFS for stacking support since it is essential to have a
struct path in every place where the stack can be traversed
- it reduces the overall code size:
without patch series:
text data bss dec hex filename
5321639 858418 715768 6895825 6938d1 vmlinux
with patch series:
text data bss dec hex filename
5320026 858418 715768 6894212 693284 vmlinux
This patch:
Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix cifs]
[akpm@linux-foundation.org: fix smack]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use iget_failed() in GFS2 to kill a failed inode.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The gl_owner_pid field is used to look up the lock-owning task by its pid,
so do this properly, i.e. by storing a struct pid pointer and using the
pid_task() function.
To allow this, pid_task() is now exported for the gfs2 module.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The gl_owner_pid field is used to get the holder task by its pid and to
check whether the current task is a holder, so do this properly, i.e. via
struct pid * manipulations.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simplify page cache zeroing of segments of pages through 3 functions:
zero_user_segments(page, start1, end1, start2, end2)
	Zeros two segments of the page. It takes the position where to
	start and end the zeroing which avoids length calculations and
	makes code clearer.
zero_user_segment(page, start, end)
	Same for a single segment.
zero_user(page, start, length)
	Length variant for the case where we know the length.
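A minimal user-space sketch of the three helpers, assuming the page is a plain byte buffer. In the kernel they take a struct page, map it and flush caches, and all of that is omitted here. SKETCH_PAGE_SIZE is an assumed constant for this illustration. Segments are half-open: [start, end).

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Zero [start1, end1) and [start2, end2) of a page-sized buffer. */
void zero_user_segments(unsigned char *page,
			unsigned start1, unsigned end1,
			unsigned start2, unsigned end2)
{
	assert(end1 <= SKETCH_PAGE_SIZE && end2 <= SKETCH_PAGE_SIZE);

	if (end1 > start1)
		memset(page + start1, 0, end1 - start1);
	if (end2 > start2)
		memset(page + start2, 0, end2 - start2);
}

/* Single-segment variant: the second segment is empty. */
void zero_user_segment(unsigned char *page, unsigned start, unsigned end)
{
	zero_user_segments(page, start, end, 0, 0);
}

/* Length variant for callers that know the size rather than the end. */
void zero_user(unsigned char *page, unsigned start, unsigned size)
{
	zero_user_segments(page, start, start + size, 0, 0);
}
```

Passing start and end positions lets a caller zero "everything before the data" and "everything after the data" in one call without computing two lengths.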
We remove the zero_user_page macro. Issues:
1. It's a macro. Inline functions are preferable.
2. The KM_USER0 macro is only defined for HIGHMEM.
Having to treat this special case everywhere makes the
code needlessly complex. The parameter for zeroing is always
KM_USER0 except in one single case that we open code.
Avoiding KM_USER0 means that a lot of code no longer has to deal
with the special casing for HIGHMEM. Dealing with
kmap is only necessary for HIGHMEM configurations. In those
configurations we use KM_USER0 like we do for a series of other
functions defined in highmem.h.
Since KM_USER0 depends on HIGHMEM, the existing zero_user_page
could not be an inline function and had to be a macro. The zero_user_*
functions introduced here can be inline because that constant is not
used when these functions are called.
Also extract the flushing of the caches to be outside of the kmap.
[akpm@linux-foundation.org: fix nfs and ntfs build]
[akpm@linux-foundation.org: fix ntfs build some more]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch allows gfs2 to perform journal recovery even if it is mounted
read-only. Strictly speaking, a read-only mount should not be writing to
the filesystem, but we do this only to perform journal recovery. A
read-only mount will fail if we don't recover the dirty journal. Also,
when gfs2 is used as a root filesystem, it will be mounted read-only
before being mounted read-write during the boot sequence. A failed
read-only mount will panic the machine during bootup.
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
I spotted this bug while I was digging around. Looks like it could cause
a lockup in some rare error condition.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
There was a bug in the truncation/invalidation race path for
->page_mkwrite for gfs2. It ought to return 0 so that the effect is the
same as if the page was truncated at any of the other points at which
the page_lock is dropped. This will result in the restart of the whole
page fault path. If it was due to a real truncation (as opposed to an
invalidate because we let a glock go) then the ->fault path will pick
that up when it gets called again.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch fixes a minor typo. Surprisingly, it still compiled.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is a small I/O performance enhancement to gfs2. (Actually, it is a rework
of an earlier version I got wrong.) The idea here is to check whether the write
extends past the last block in the file. If so, the function can save itself a
lot of time and trouble because it knows an allocation will be required.
Benchmarks like iozone should see better performance.
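The check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the GFS2 code: the function name and parameters are hypothetical, the block size is assumed to be a power of two given as bsize_shift, and len is assumed to be non-zero. A true result means allocation is certainly needed. A false result only means the shortcut does not apply, and the caller still has to examine the individual blocks (for example to find holes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when a write of len bytes at offset reaches past the last
 * block the file currently occupies, so an allocation will be required.
 * bsize_shift is log2 of the block size (hypothetical parameter).
 */
bool write_extends_past_eof(uint64_t file_size, uint64_t offset,
			    uint32_t len, unsigned bsize_shift)
{
	uint64_t bsize = UINT64_C(1) << bsize_shift;
	/* Number of blocks the file occupies, rounded up. */
	uint64_t blocks_in_file = (file_size + bsize - 1) >> bsize_shift;
	/* Index one past the last block touched by the write. */
	uint64_t write_end_block = (offset + len + bsize - 1) >> bsize_shift;

	return write_end_block > blocks_in_file;
}
```

With 4096-byte blocks and a 5000-byte file, a 100-byte write at offset 6000 stays inside the second block and returns false, while a 500-byte write at offset 8000 spills into a third block and returns true.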
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch removes a vestigial variable "i_spin" from the gfs2_inode
structure. This not only saves us memory (>300000 of these in memory
for the oom test), it also saves us time because we don't have to
initialize it (i.e. slightly better performance).
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
It is possible to reduce the size of GFS2 inodes by taking the i_alloc
structure out of the gfs2_inode. This patch allocates the i_alloc
structure whenever it's needed, and frees it afterward. This decreases
the amount of low memory we use at the expense of requiring a memory
allocation for each page or partial page that we write. A quick test
with postmark shows that the overhead is not measurable and I also note
that OCFS2 uses the same approach.
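The allocate-on-demand pattern can be sketched in user space like this. All names here (inode_sketch, alloc_state, alloc_get, alloc_put) and the single field are invented for the illustration, with calloc() and free() standing in for the kernel allocator. The point is that the inode carries only a pointer, which is non-NULL only while a write is in progress.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the allocation bookkeeping; the field is invented here. */
struct alloc_state { unsigned requested; };

/* Stand-in inode: it carries only a pointer, not the structure itself. */
struct inode_sketch { struct alloc_state *i_alloc; };

/* Allocate the bookkeeping just before it is needed; NULL on failure. */
struct alloc_state *alloc_get(struct inode_sketch *ip)
{
	assert(ip->i_alloc == NULL);
	ip->i_alloc = calloc(1, sizeof(*ip->i_alloc));
	return ip->i_alloc;
}

/* Free it again as soon as the operation has finished. */
void alloc_put(struct inode_sketch *ip)
{
	free(ip->i_alloc);
	ip->i_alloc = NULL;
}
```

The trade-off is the one stated above: every idle inode shrinks by the size of the structure, and each write pays for one allocation and one free.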
In the future I'd like to solve the problem by shrinking down the size
of the members of the i_alloc structure, but for now, this reduces the
immediate problem of using too much low-memory on x86 and doesn't add
too much overhead.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Although the values were all being calculated correctly, there was a
race in the assert due to the way it was using atomic variables. This
changes the value we assert on so that we get the same effect by testing
a different variable. This prevents the assert triggering when it shouldn't.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch fixes a couple of problems which affected the execution of files
on GFS2. The first is that there was a corner case where inodes were not
always uptodate at the point at which permissions checks were being carried
out. This resulted in refusal of execute permission, but only on the
first lookup; subsequent requests worked correctly. The second was a problem
relating to incorrect updating of file sizes which was introduced with the
write_begin/end code for GFS2 a little while back.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
The journal extent map needs to be initialized sooner than it
currently is. Otherwise failed mount attempts (e.g. not enough
journals, etc.) may panic trying to access the uninitialized list.
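Why the ordering matters can be sketched with a small user-space model. The list head mirrors the shape of the kernel's struct list_head, but journal_sketch, journal_setup() and count_extents() are hypothetical names for this illustration. Because the list head is initialised before the first failure exit, cleanup code that walks the list after a failed setup sees an empty list instead of uninitialised pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular list head, modelled on the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Hypothetical journal descriptor holding an extent list. */
struct journal_sketch { struct list_head extents; };

/* Count the entries; only safe once the head has been initialised. */
size_t count_extents(const struct journal_sketch *jd)
{
	size_t n = 0;
	const struct list_head *p;

	for (p = jd->extents.next; p != &jd->extents; p = p->next)
		n++;
	return n;
}

/* Set up the journal; fails when too few journals were found. */
int journal_setup(struct journal_sketch *jd, int found, int needed)
{
	init_list_head(&jd->extents);	/* before any failure exit */
	if (found < needed)
		return -1;
	return 0;
}
```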
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
To improve performance on NUMA, we use the VM's standard page
migration for writeback and ordered pages. Probably we could
also do the same for journaled data, but that would need a
careful audit of the code, so will be the subject of a later
patch.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is a small correction to my previously posted patch1.
It just changes a divide to a shift. It's faster and doesn't
introduce odd dependencies on 32-bit compiles.
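As a rough illustration of the divide-to-shift change (the function names are hypothetical and the block size is assumed to be a power of two): on 32-bit builds a plain 64-bit division is typically compiled into a call to a compiler support routine that the kernel does not provide, whereas a shift needs no helper.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up block count using a 64-bit division. */
uint64_t blocks_by_divide(uint64_t size, uint32_t bsize)
{
	return (size + bsize - 1) / bsize;
}

/* Same result using a shift; valid only when bsize == 1 << bsize_shift. */
uint64_t blocks_by_shift(uint64_t size, unsigned bsize_shift)
{
	uint64_t bsize = UINT64_C(1) << bsize_shift;

	return (size + bsize - 1) >> bsize_shift;
}
```

For a 4096-byte block size (shift of 12) both functions return the same block count for any size, so the shift is a drop-in replacement.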
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>