Commit Graph

4870 Commits

Author SHA1 Message Date
Lachlan McIlroy
5180602e6f [XFS] remove unused filp from ioctl functions
SGI-PV: 959140
SGI-Modid: xfs-linux-melb:xfs-kern:27712a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:35:46 +11:00
Lachlan McIlroy
a3227fb996 [XFS] mraccessf & mrupdatef are supposed to be the "flags" versions of the
functions, but they

a) ignore the flags parameter completely, and b) are never called
directly, only via the flag-less defines anyway

So, drop the #define indirection, and rename mraccessf to mraccess, etc.

SGI-PV: 959138
SGI-Modid: xfs-linux-melb:xfs-kern:27711a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:35:40 +11:00
Lachlan McIlroy
1f9b3b64d4 [XFS] remove unused xflags parameter from sync routines
SGI-PV: 959137
SGI-Modid: xfs-linux-melb:xfs-kern:27710a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:35:33 +11:00
Lachlan McIlroy
1c91ad3aed [XFS] fix sparse warning in xfs_da_btree.c
SGI-PV: 954580
SGI-Modid: xfs-linux-melb:xfs-kern:27702a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:35:27 +11:00
Lachlan McIlroy
e5eb7f202b [XFS] use struct kvec in struct uio
SGI-PV: 954580
SGI-Modid: xfs-linux-melb:xfs-kern:27701a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:35:21 +11:00
David Chinner
03135cf726 [XFS] Fix UP build breakage due to undefined m_icsb_mutex.
SGI-PV: 952227
SGI-Modid: xfs-linux-melb:xfs-kern:27692a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:35:15 +11:00
David Chinner
20b642858b [XFS] Reduction global superblock lock contention near ENOSPC.
The existing per-cpu superblock counter code uses the global superblock
spin lock when we approach ENOSPC for global synchronisation. On larger
machines than this code was originally tested on this can still get
catastrophic spinlock contention due increasing rebalance frequency near
ENOSPC.

By introducing a sleeping lock that is used to serialise balances and
modifications near ENOSPC we prevent contention from needlessly from
wasting the CPU time of potentially hundreds of CPUs.

To reduce the number of balances occuring, we separate the need rebalance
case from the slow allocate case. Now, a counter running dry will trigger
a rebalance during which counters are disabled. Any thread that sees a
disabled counter enters a different path where it waits on the new mutex.
When it gets the new mutex, it checks if the counter is disabled. If the
counter is disabled, then we _know_ that we have to use the global counter
and lock and it is safe to do so immediately. Otherwise, we drop the mutex
and go back to trying the per-cpu counters which we know were re-enabled.

SGI-PV: 952227
SGI-Modid: xfs-linux-melb:xfs-kern:27612a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:35:09 +11:00
Eric Sandeen
804195b63a [XFS] Get rid of old 5.3/6.1 v1 log items. Cleanup patch sent in by Eric
Sandeen.

SGI-PV: 958736
SGI-Modid: xfs-linux-melb:xfs-kern:27596a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:35:02 +11:00
David Chinner
7989cb8ef5 [XFS] Keep stack usage down for 4k stacks by using noinline.
gcc-4.1 and more recent aggressively inline static functions which
increases XFS stack usage by ~15% in critical paths. Prevent this from
occurring by adding noinline to the STATIC definition.

Also uninline some functions that are too large to be inlined and were
causing problems with CONFIG_FORCED_INLINING=y.

Finally, clean up all the different users of inline, __inline and
__inline__ and put them under one STATIC_INLINE macro. For debug kernels
the STATIC_INLINE macro uninlines those functions.

SGI-PV: 957159
SGI-Modid: xfs-linux-melb:xfs-kern:27585a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: David Chatterton <chatz@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:34:56 +11:00
David Chinner
5e6a07dfe4 [XFS] Current usage of buftarg flags is incorrect.
The {test,set,clear}_bit() operations take a bit index for the bit to
operate on. The XBT_* flags are defined as bit fields which is incorrect,
not to mention the way the bit fields are enumerated is broken too. This
was only working by chance.

Fix the definitions of the flags and make the code using them use the
{test,set,clear}_bit() operations correctly.

SGI-PV: 958639
SGI-Modid: xfs-linux-melb:xfs-kern:27565a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:34:49 +11:00
Lachlan McIlroy
dc74eaad8c [XFS] Prevent buffer overrun in cmn_err().
The message buffer used by cmn_err() is only 256 bytes and some CXFS
messages were exceeding this length. Since we were using vsprintf() and
not checking for buffer overruns we were clobbering memory beyond the
buffer. The size of the buffer has been increased to 1024 bytes so we can
capture these larger messages and we are now using vsnprintf() to prevent
overrunning the buffer size.

SGI-PV: 958599
SGI-Modid: xfs-linux-melb:xfs-kern:27561a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Geoffrey Wehrman <gwehrman@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:34:38 +11:00
David Chinner
585e6d8856 [XFS] Fix a synchronous buftarg flush deadlock when freezing.
At the last stage of a freeze, we flush the buftarg synchronously over and
over again until it succeeds twice without skipping any buffers.

The delwri list flush skips pinned buffers, but tries to flush all others.
It removes the buffers from the delwri list, then tries to lock them one
at a time as it traverses the list to issue the I/O. It holds them locked
until we issue all of the I/O and then unlocks them once we've waited for
it to complete.

The problem is that during a freeze, the filesystem may still be doing
stuff - like flushing delalloc data buffers - in the background and hence
we can be trying to lock buffers that were on the delwri list at the same
time. Hence we can get ABBA deadlocks between threads doing allocation and
the buftarg flush (freeze) thread.

Fix it by skipping locked (and pinned) buffers as we traverse the delwri
buffer list.

SGI-PV: 957195
SGI-Modid: xfs-linux-melb:xfs-kern:27535a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:32:29 +11:00
David Chinner
dac61f521b [XFS] Make quiet mounts quiet
The XFS quiet mount logic was inverted making quiet mounts noisy and vice
versa. Fix it.

SGI-PV: 958469
SGI-Modid: xfs-linux-melb:xfs-kern:27520a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:27:56 +11:00
Greg Ungerer
72613e5f44 [PATCH] uclinux: correctly remap bin_fmtflat exe allocated mem regions
remap() the region we get from mmap() to mark the fact that we are
using all of the available slack space. Any slack space is used
to form a simple brk region, and potentially more stack space than
requested at load time.

Any searches of the vma chain may well fail looking for
stack (and especially arg) addresses if the remaping is not done.
The simplest example is /proc/<pid>/cmdline, since the args
are pretty much always at the top of the data/bss/stack region.

Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-09 10:45:33 -08:00
Adrian Bunk
835d90c421 [PATCH] v9fs_vfs_mkdir(): fix a double free
Fix a double free of "dfid" introduced by commit
da977b2c7e and spotted by the Coverity
checker.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-09 09:25:47 -08:00
Ken Chen
6649a38632 [PATCH] hugetlb: preserve hugetlb pte dirty state
__unmap_hugepage_range() is buggy that it does not preserve dirty state of
huge_pte when unmapping hugepage range.  It causes data corruption in the
event of dop_caches being used by sys admin.  For example, an application
creates a hugetlb file, modify pages, then unmap it.  While leaving the
hugetlb file alive, comes along sys admin doing a "echo 3 >
/proc/sys/vm/drop_caches".

drop_pagecache_sb() will happily free all pages that aren't marked dirty if
there are no active mapping.  Later when application remaps the hugetlb
file back and all data are gone, triggering catastrophic flip over on
application.

Not only that, the internal resv_huge_pages count will also get all messed
up.  Fix it up by marking page dirty appropriately.

Signed-off-by: Ken Chen <kenchen@google.com>
Cc: "Nish Aravamudan" <nish.aravamudan@gmail.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: <stable@kernel.org>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-09 09:25:46 -08:00
Evgeniy Dushistov
f336953bfd [PATCH] ufs: restore back support of openstep
This is a fix of regression, which triggered by ~2.6.16.

Patch with name ufs-directory-and-page-cache-from-blocks-to-pages.patch: in
additional to conversation from block to page cache mechanism added new
checks of directory integrity, one of them that directory entry do not
across directory chunks.

But some kinds of UFS: OpenStep UFS and Apple UFS (looks like these are the
same filesystems) have different directory chunk size, then common
UFSes(BSD and Solaris UFS).

So this patch adds ability to works with variable size of directory chunks,
and set it for ufstype=openstep to right size.

Tested on darwin ufs.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-09 09:25:46 -08:00
Al Viro
58addbffdd [PATCH] dlm: use kern_recvmsg()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-09 09:14:06 -08:00
David S. Miller
9783e1df7a Merge branch 'HEAD' of master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6
Conflicts:

	crypto/Kconfig
2007-02-08 15:25:18 -08:00
Linus Torvalds
5986a2ec35 Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/mfasheh/ocfs2: (22 commits)
  configfs: Zero terminate data in configfs attribute writes.
  [PATCH] ocfs2 heartbeat: clean up bio submission code
  ocfs2: introduce sc->sc_send_lock to protect outbound outbound messages
  [PATCH] ocfs2: drop INET from Kconfig, not needed
  ocfs2_dlm: Add timeout to dlm join domain
  ocfs2_dlm: Silence some messages during join domain
  ocfs2_dlm: disallow a domain join if node maps mismatch
  ocfs2_dlm: Ensure correct ordering of set/clear refmap bit on lockres
  ocfs2: Binds listener to the configured ip address
  ocfs2_dlm: Calling post handler function in assert master handler
  ocfs2: Added post handler callable function in o2net message handler
  ocfs2_dlm: Cookies in locks not being printed correctly in error messages
  ocfs2_dlm: Silence a failed convert
  ocfs2_dlm: wake up sleepers on the lockres waitqueue
  ocfs2_dlm: Dlm dispatch was stopping too early
  ocfs2_dlm: Drop inflight refmap even if no locks found on the lockres
  ocfs2_dlm: Flush dlm workqueue before starting to migrate
  ocfs2_dlm: Fix migrate lockres handler queue scanning
  ocfs2_dlm: Make dlmunlock() wait for migration to complete
  ocfs2_dlm: Fixes race between migrate and dirty
  ...
2007-02-08 10:37:22 -08:00
Joel Becker
ff05d1c464 configfs: Zero terminate data in configfs attribute writes.
Attributes in configfs are text files.  As such, most handlers expect to be
able to call functions like simple_strtoul() without checking the bounds
of the buffer.  Change the call to zero terminate the buffer before calling
the client's ->store() method.  This does reduce the attribute size from
PAGE_SIZE to PAGE_SIZE-1.

Also, change get_zeroed_page() to alloc_page(), as we are handling the
termination.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:17:08 -08:00
Philipp Reisner
b559292e06 [PATCH] ocfs2 heartbeat: clean up bio submission code
As was already pointed out Mathieu Avila on Thu, 07 Sep 2006 03:15:25 -0700
that OCFS2 is expecting bio_add_page() to add pages to BIOs in an easily
predictable manner.

That is not true, especially for devices with own merge_bvec_fn().

Therefore OCFS2's heartbeat code is very likely to fail on such devices.

Move the bio_put() call into the bio's bi_end_io() function. This makes the
whole idea of trying to predict the behaviour of bio_add_page() unnecessary.
Removed compute_max_sectors() and o2hb_compute_request_limits().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:15:58 -08:00
Zhen Wei
925037bcba ocfs2: introduce sc->sc_send_lock to protect outbound outbound messages
When there is a lot of multithreaded I/O usage, two threads can collide
while sending out a message to the other nodes. This is due to the lack of
locking between threads while sending out the messages.

When a connected TCP send(), sendto(), or sendmsg() arrives in the Linux
kernel, it eventually comes through tcp_sendmsg(). tcp_sendmsg() protects
itself by acquiring a lock at invocation by calling lock_sock().
tcp_sendmsg() then loops over the buffers in the iovec, allocating
associated sk_buff's and cache pages for use in the actual send. As it does
so, it pushes the data out to tcp for actual transmission. However, if one
of those allocation fails (because a large number of large sends is being
processed, for example), it must wait for memory to become available. It
does so by jumping to wait_for_sndbuf or wait_for_memory, both of which
eventually cause a call to sk_stream_wait_memory(). sk_stream_wait_memory()
contains a code path that calls sk_wait_event(). Finally, sk_wait_event()
contains the call to release_sock().

The following patch adds a lock to the socket container in order to
properly serialize outbound requests.

From: Zhen Wei <zwei@novell.com>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:15:11 -08:00
Randy Dunlap
f71aa8a55a [PATCH] ocfs2: drop INET from Kconfig, not needed
OCFS2: drop 'depends on INET' since local mounts are now allowed.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:14:27 -08:00
Sunil Mushran
0dd82141b2 ocfs2_dlm: Add timeout to dlm join domain
Currently the ocfs2 dlm has no timeout during dlm join domain. While this is
not a problem in normal operation, this does become an issue if, say, the
other node is refusing to let the node join the domain because of a stuck
recovery. This patch adds a 90 sec timeout.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:10:39 -08:00
Sunil Mushran
e4968476a9 ocfs2_dlm: Silence some messages during join domain
These messages can easily be activated using the mlog infrastructure
and don't need to be enabled by default.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:10:14 -08:00
Srinivas Eeda
1faf289454 ocfs2_dlm: disallow a domain join if node maps mismatch
There is a small window where a joining node may not see the node(s) that
just died but are still part of the domain. To fix this, we must disallow
join requests if the joining node has a different node map.

A new field node_map is added to dlm_query_join_request to send the current
nodes nodemap along with join request. On the receiving end the nodes that
are part of the cluster verifies if this new node sees all the nodes that
are still part of the cluster. They disallow the join if the maps mismatch.

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:09:14 -08:00
Sunil Mushran
f3f854648d ocfs2_dlm: Ensure correct ordering of set/clear refmap bit on lockres
Eventhough the set refmap bit message is sent before the clear refmap
message, currently there is no guarentee that the set message will be
handled before the clear. This patch prevents the clear refmap to be
processed while the node is sending assert master messages to other
nodes. (The set refmap message is sent as a response to the assert
master request).

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:08:13 -08:00
Sunil Mushran
ab81afd30b ocfs2: Binds listener to the configured ip address
This patch binds the o2net listener to the configured ip address
instead of INADDR_ANY for security. Fixes oss.oracle.com bugzilla#814.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:07:49 -08:00
Kurt Hackel
3b8118cffa ocfs2_dlm: Calling post handler function in assert master handler
This patch prevents the dlm from sending the clear refmap message
before the set refmap. We use the newly created post function handler
routine to accomplish the task.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:07:24 -08:00
Kurt Hackel
d74c9803a9 ocfs2: Added post handler callable function in o2net message handler
Currently o2net allows one handler function per message type. This
patch adds the ability to call another function to be called after
the handler has returned the message to the other node.

Handlers are now given the option of returning a context (in the form of a
void **) which will be passed back into the post message handler function.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:06:56 -08:00
Kurt Hackel
74aa25856c ocfs2_dlm: Cookies in locks not being printed correctly in error messages
The dlm encodes the node number and a sequence number in the lock cookie.
It also stores the cookie in the lockres in the big endian format to avoid
swapping 8 bytes on each lock request. The bug here was that it was assuming
the cookie to be in the cpu format when decoding it for printing the error
message. This patch swaps the bytes before the print.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:06:24 -08:00
Kurt Hackel
90aaaf1c23 ocfs2_dlm: Silence a failed convert
When the lockres is in migrate or recovery state, all convert requests
are denied with the appropriate error status that is handled on the
requester node. This patch silences the erroneous error message printed
on the master node.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:05:48 -08:00
Kurt Hackel
a6fa36402a ocfs2_dlm: wake up sleepers on the lockres waitqueue
The dlm was not waking up threads waiting on the lockres wait queue,
waiting for the lockres to be no longer be in the DLM_LOCK_RES_IN_PROGRESS
and the DLM_LOCK_RES_MIGRATING states.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:05:19 -08:00
Kurt Hackel
28b72d9c92 ocfs2_dlm: Dlm dispatch was stopping too early
dlm_dispatch_work was not processing the queued up tasks at
the first sign of the node leaving the domain leading to not
only incompleted tasks but also a mismatch in the dlm refcnt.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:04:27 -08:00
Kurt Hackel
50635f15b3 ocfs2_dlm: Drop inflight refmap even if no locks found on the lockres
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:03:42 -08:00
Kurt Hackel
1cd04dbe33 ocfs2_dlm: Flush dlm workqueue before starting to migrate
This is to prevent the condition in which a previously queued
up assert master asserts after we start the migration. Now
migration ensures the workqueue is flushed before proceeding
with migrating the lock to another node. This condition is
typically encountered during parallel umounts.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:03:02 -08:00
Kurt Hackel
e17e75ecb8 ocfs2_dlm: Fix migrate lockres handler queue scanning
The migrate lockres handler was only searching for its lock on
migrated lockres on the expected queue. This could be problematic
as the new master could have also issued a convert request
during the migration and thus moved the lock to the convert queue.
We now search for the lock on all three queues.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:02:40 -08:00
Kurt Hackel
71ac106243 ocfs2_dlm: Make dlmunlock() wait for migration to complete
dlmunlock() was not waiting for migration to complete before releasing locks
on locally mastered locks.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:01:49 -08:00
Kurt Hackel
ddc09c8dda ocfs2_dlm: Fixes race between migrate and dirty
dlmthread was removing lockres' from the dirty list
and resetting the dirty flag before shuffling the list.
This patch retains the dirty state flag until the lists
are shuffled.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:00:57 -08:00
Adrian Bunk
faf0ec9f13 [PATCH] fs/ocfs2/dlm/: make functions static
This patch makes some needlessly global functions static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 11:53:31 -08:00
Kurt Hackel
ba2bf21851 ocfs2_dlm: fix cluster-wide refcounting of lock resources
This was previously broken and migration of some locks had to be temporarily
disabled. We use a new (and backward-incompatible) set of network messages
to account for all references to a lock resources held across the cluster.
once these are all freed, the master node may then free the lock resource
memory once its local references are dropped.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 11:53:07 -08:00
Eric W. Biederman
b592fcfe7f sysfs: Shadow directory support
The problem.  When implementing a network namespace I need to be able
to have multiple network devices with the same name.  Currently this
is a problem for /sys/class/net/*. 

What I want is a separate /sys/class/net directory in sysfs for each
network namespace, and I want to name each of them /sys/class/net.

I looked and the VFS actually allows that.  All that is needed is
for /sys/class/net to implement a follow link method to redirect
lookups to the real directory you want. 

Implementing a follow link method that is sensitive to the current
network namespace turns out to be 3 lines of code so it looks like a
clean approach.  Modifying sysfs so it doesn't get in my was is a bit
trickier. 

I am calling the concept of multiple directories all at the same path
in the filesystem shadow directories.  With the directory entry really
at that location the shadow master. 

The following patch modifies sysfs so it can handle a directory
structure slightly different from the kobject tree so I can implement
the shadow directories for handling /sys/class/net/.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-02-07 10:37:14 -08:00
Oliver Neukum
82244b169e sysfs: error handling in sysfs, fill_read_buffer()
if a driver returns an error in fill_read_buffer(), the buffer will be
marked as filled. Subsequent reads will return eof. But there is
no data because of an error, not because it has been read.
Not marking the buffer filled is the obvious fix.

Signed-off-by: Oliver Neukum <oliver@neukum.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-02-07 10:37:13 -08:00
Mariusz Kozlowski
f750653670 sysfs: kobject_put cleanup
This patch removes redundant argument checks for kobject_put().

Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-02-07 10:37:13 -08:00
Frederik Deweerdt
d3fc373ac5 sysfs: suppress lockdep warnings
Lockdep issues the following warning:
[    9.064000] =============================================
[    9.064000] [ INFO: possible recursive locking detected ]
[    9.064000] 2.6.20-rc3-mm1 #3
[    9.064000] ---------------------------------------------
[    9.064000] init/1 is trying to acquire lock:
[    9.064000]  (&sysfs_inode_imutex_key){--..}, at: [<c03e6afc>] mutex_lock+0x1c/0x1f
[    9.064000]
[    9.064000] but task is already holding lock:
[    9.064000]  (&sysfs_inode_imutex_key){--..}, at: [<c03e6afc>] mutex_lock+0x1c/0x1f
[    9.065000]
[    9.065000] other info that might help us debug this:
[    9.065000] 2 locks held by init/1:
[    9.065000]  #0:  (tty_mutex){--..}, at: [<c03e6afc>] mutex_lock+0x1c/0x1f
[    9.065000]  #1:  (&sysfs_inode_imutex_key){--..}, at: [<c03e6afc>] mutex_lock+0x1c/0x1f
[    9.065000]
[    9.065000] stack backtrace:
[    9.065000]  [<c010390d>] show_trace_log_lvl+0x1a/0x30
[    9.066000]  [<c0103935>] show_trace+0x12/0x14
[    9.066000]  [<c0103a2f>] dump_stack+0x16/0x18
[    9.066000]  [<c0138cb8>] print_deadlock_bug+0xb9/0xc3
[    9.066000]  [<c0138d17>] check_deadlock+0x55/0x5a
[    9.066000]  [<c013a953>] __lock_acquire+0x371/0xbf0
[    9.066000]  [<c013b7a9>] lock_acquire+0x69/0x83
[    9.066000]  [<c03e6b7e>] __mutex_lock_slowpath+0x75/0x2d1
[    9.066000]  [<c03e6afc>] mutex_lock+0x1c/0x1f
[    9.066000]  [<c01b249c>] sysfs_drop_dentry+0xb1/0x133
[    9.066000]  [<c01b25d1>] sysfs_hash_and_remove+0xb3/0x142
[    9.066000]  [<c01b30ed>] sysfs_remove_file+0xd/0x10
[    9.067000]  [<c02849e0>] device_remove_file+0x23/0x2e
[    9.067000]  [<c02850b2>] device_del+0x188/0x1e6
[    9.067000]  [<c028511b>] device_unregister+0xb/0x15
[    9.067000]  [<c0285318>] device_destroy+0x9c/0xa9
[    9.067000]  [<c0261431>] vcs_remove_sysfs+0x1c/0x3b
[    9.067000]  [<c0267a08>] con_close+0x5e/0x6b
[    9.067000]  [<c02598f2>] release_dev+0x4c4/0x6e5
[    9.067000]  [<c0259faa>] tty_release+0x12/0x1c
[    9.067000]  [<c0174872>] __fput+0x177/0x1a0
[    9.067000]  [<c01746f5>] fput+0x3b/0x41
[    9.068000]  [<c0172ee1>] filp_close+0x36/0x65
[    9.068000]  [<c0172f73>] sys_close+0x63/0xa4
[    9.068000]  [<c0102a96>] sysenter_past_esp+0x5f/0x99
[    9.068000]  =======================

This is due to sysfs_hash_and_remove() holding dir->d_inode->i_mutex
before calling sysfs_drop_dentry() which calls orphan_all_buffers()
which in turn takes node->i_mutex.

Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-02-07 10:37:13 -08:00
Oliver Neukum
94bebf4d1b Driver core: fix race in sysfs between sysfs_remove_file() and read()/write()
This patch prevents a race between IO and removing a file from sysfs.
It introduces a list of sysfs_buffers associated with a file at the inode.
Upon removal of a file the list is walked and the buffers marked orphaned.
IO to orphaned buffers fails with -ENODEV. The driver can safely free
associated data structures or be unloaded.

Signed-off-by: Oliver Neukum <oliver@neukum.name>
Acked-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-02-07 10:37:13 -08:00
Cornelia Huck
c744aeae9d driver core: Allow device_move(dev, NULL).
If we allow NULL as the new parent in device_move(), we need to make sure
that the device is placed into the same place as it would if it was
newly registered:

- Consider the device virtual tree. In order to be able to reuse code,
  setup_parent() has been tweaked a bit.
- kobject_move() can fall back to the kset's kobject.
- sysfs_move_dir() uses the sysfs root dir as fallback.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-02-07 10:37:11 -08:00
Linus Torvalds
5331be0905 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
  JFS: Remove incorrect kgdb define
  JFS: call io_schedule() instead of schedule() to avoid deadlock
  JFS: Add lockdep annotations
  JFS: Avoid BUG() on a damaged file system
2007-02-07 08:10:48 -08:00
Linus Torvalds
d3f8fd765e Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (57 commits)
  [GFS2] make gfs2_writepages() static
  [GFS2] Unlock page on prepare_write try lock failure
  [GFS2] nfsd readdirplus assertion failure
  [DLM] fix softlockup in dlm_recv
  [DLM] zero new user lvbs
  [DLM/GFS2] indent help text
  [GFS2] Fix unlink deadlocks
  [GFS2] Put back semaphore to avoid umount problem
  [GFS2] more CURRENT_TIME_SEC
  [GFS2/DLM] fix GFS2 circular dependency
  [GFS2/DLM] use sysfs
  [GFS2] make lock_dlm drop_count tunable in sysfs
  [GFS2] increase default lock limit
  [GFS2] Fix list corruption in lops.c
  [GFS2] Fix recursive locking attempt with NFS
  [DLM] can miss clearing resend flag
  [DLM] saved dlm message can be dropped
  [DLM] Make sock_sem into a mutex
  [GFS2] Fix typo in glock.c
  [GFS2] use CURRENT_TIME_SEC instead of get_seconds in gfs2
  ...
2007-02-07 08:09:00 -08:00
Adrian Bunk
a2cf822274 [GFS2] make gfs2_writepages() static
On Mon, Jan 29, 2007 at 08:45:28PM -0800, Andrew Morton wrote:
>...
> Changes since 2.6.20-rc6-mm2:
>...
>  git-gfs2-nmw.patch
>...
>  git trees
>...

This patch makes the needlessly global gfs2_writepages() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-07 10:48:48 -05:00
Steven Whitehouse
2d72e7101c [GFS2] Unlock page on prepare_write try lock failure
When the try lock of the glock failed in prepare_write we were
incorrectly exiting this function with the page still locked.
This was resulting in further I/O to this page hanging.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-07 10:25:59 -05:00
Linus Torvalds
dda2ac15d2 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: ocfs2_link() journal credits update
2007-02-06 14:59:27 -08:00
Linus Torvalds
768c242b30 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] Minor cleanup
  [CIFS] Missing free in error path
  [CIFS] Reduce cifs stack space usage
  [CIFS] lseek polling returned stale EOF
2007-02-06 14:42:20 -08:00
Herbert Xu
f1ddcaf339 [CRYPTO] api: Remove deprecated interface
This patch removes the old cipher interface and related code.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-02-07 09:21:00 +11:00
Steve French
a850790f6c [CIFS] Minor cleanup
Missing tab.  Missing entry in changelog

Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-02-06 20:43:30 +00:00
Wendy Cheng
549ae0ac3d [GFS2] nfsd readdirplus assertion failure
Glock assertion failure found in '07 NFS connectathon. One of the NFSDs
is doing a "readdirplus" procedure call. It passes the logic into
gfs2_readdir() where it obtains its directory inode glock. This is then
followed by filehandle construction that invokes lookup code. It hits
the assertion failure while trying to obtain the inode glock again
inside gfs2_drevalidate().

This patch bypasses the recursive glock call if caller already holds the
lock.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-06 11:36:01 -05:00
Patrick Caulfield
a34fbc6363 [DLM] fix softlockup in dlm_recv
This patch stops the dlm_recv workqueue from busy-waiting when a node
disconnects. This can cause soft lockup errors on debug systems and bad
performance generally.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:38:27 -05:00
David Teigland
62a0f62369 [DLM] zero new user lvbs
A new lvb for a userland lock wasn't being initialized to zero.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:38:24 -05:00
Randy Dunlap
9beeb9f3c5 [DLM/GFS2] indent help text
Indent help text as expected.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:38:20 -05:00
Russell Cattelan
ddee76089c [GFS2] Fix unlink deadlocks
Move the glock acquisition to outside of the transactions.

Lock odering must be preserved in order to prevent ABBA
deadlocks. The current gfs2_change_nlink code would tries
to grab the glock after having started a transaction and thus is holding
the log lock. This is inconsistent with other code paths in
gfs that grab the resource group glock prior to staring
a tranactions.

One problem with this fix is that the resource group
lock is always grabbed now even if the inode still has
ref count and can not be marked for unlink.

Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:38:17 -05:00
Steven Whitehouse
61be084efc [GFS2] Put back semaphore to avoid umount problem
Dave Teigland fixed this bug a while back, but I managed to mistakenly
remove the semaphore during later development. It is required to avoid
the list of inodes changing during an invalidate_inodes call. I have
made it an rwsem since the read side will be taken frequently during
normal filesystem operation. The write site will only happen during
umount of the file system.

Also the bug only triggers when using the DLM lock manager and only then
under certain conditions as its timing related.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: David Teigland <teigland@redhat.com>
2007-02-05 13:38:14 -05:00
Eric Sandeen
bbb28ab759 [GFS2] more CURRENT_TIME_SEC
Whoops, quilt user error, missed this one in the previous patch.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:38:11 -05:00
Adrian Bunk
0011727785 [GFS2/DLM] fix GFS2 circular dependency
On Sun, Jan 28, 2007 at 11:08:18AM +0100, Jiri Slaby wrote:
> Andrew Morton napsal(a):
> >Temporarily at
> >
> >	http://userweb.kernel.org/~akpm/2.6.20-rc6-mm1/
>
> Unable to select IPV6. Menuconfig doesn't offer it when INET is selected.
> When it's not it appears in the menu, but after state change it gets away.
> The same behaviour in xconfig, gconfig.
>
> $ mkdir ../a/tst
> $ make O=../a/tst menuconfig
>   HOSTCC  scripts/basic/fixdep
> [...]
>   HOSTLD  scripts/kconfig/mconf
> scripts/kconfig/mconf arch/i386/Kconfig
> Warning! Found recursive dependency: INET GFS2_FS_LOCKING_DLM SYSFS
> OCFS2_FS INET
>
> Maybe this is the problem?

Yes, patch below.

> regards,

cu
Adrian

<--  snip  -->

This patch fixes a circular dependency by letting GFS2_FS_LOCKING_DLM
and DLM depend on instead of select SYSFS.

Since SYSFS depends on EMBEDDED this change shouldn't cause any problems
for users.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:38:08 -05:00
Randy Dunlap
67f55897ee [GFS2/DLM] use sysfs
With CONFIG_DLM=m, CONFIG_PROC_FS=n, and CONFIG_SYSFS=n, kernel build
fails with:

WARNING: "kernel_subsys" [fs/gfs2/locking/dlm/lock_dlm.ko] undefined!
WARNING: "kernel_subsys" [fs/dlm/dlm.ko] undefined!
WARNING: "kernel_subsys" [fs/configfs/configfs.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

Since fs/dlm/lockspace.c and fs/gfs2/locking/dlm/sysfs.c use
kernel_subsys, they should either DEPEND on it or SELECT it.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:38:05 -05:00
David Teigland
ee32e4f3d3 [GFS2] make lock_dlm drop_count tunable in sysfs
We want to be able to change or disable the default drop_count (number at
which the dlm asks gfs to limit the the number of locks it's holding).
Add it to the collection of sysfs tunables for an fs.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:38:01 -05:00
David Teigland
2f708649ba [GFS2] increase default lock limit
Increase the number of locks at which point the dlm begins asking gfs to
reduce its lock usage.  The default value is largely arbitrary, but the
current value of 50,000 ends up limiting performance unnecessarily for too
many users.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:59 -05:00
Steven Whitehouse
8bd9572769 [GFS2] Fix list corruption in lops.c
The patch below appears to fix the list corruption that we are seeing on
occasion. Although the transaction structure is private to a single
thread, when the queued structures are dismantled during an in-core
commit, its possible for a different thread to be trying to add the same
structure to another, new, transaction at the same time.

To avoid this, this patch takes the log spinlock during this operation.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:56 -05:00
Steven Whitehouse
d7c103d0bd [GFS2] Fix recursive locking attempt with NFS
In certain cases, its possible for NFS to call the lookup code while
holding the glock (when doing a readdirplus operation) so we need to
check for that and not try and lock the glock twice. This also fixes a
typo in a previous NFS related GFS2 patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:53 -05:00
David Teigland
b790c3b7c3 [DLM] can miss clearing resend flag
A long, complicated sequence of events, beginning with the RESEND flag not
being cleared on an lkb, can result in an unlock never completing.

- lkb on waiters list for remote lookup
- the remote node is both the dir node and the master node, so
  it optimizes the lookup into a request and sends a request
  reply back
- the request reply is saved on the requestqueue to be processed
  after recovery
- recovery runs dlm_recover_waiters_pre() which sets RESEND flag
  so the lookup will be resent after recovery
- end of recovery: process_requestqueue takes saved request reply
  which removes the lkb off the waitesr list, _without_ clearing
  the RESEND flag
- end of recovery: dlm_recover_waiters_post() doesn't do anything
  with the now completed lookup lkb (would usually clear RESEND)
- later, the node unmounts, unlocks this lkb that still has RESEND
  flag set
- the lkb is on the waiters list again, now for unlock, when recovery
  occurs, dlm_recover_waiters_pre() shows the lkb for unlock with RESEND
  set, doesn't do anything since the master still exists
- end of recovery: dlm_recover_waiters_post() takes this lkb off
  the waiters list because it has the RESEND flag set, then reports
  an error because unlocks are never supposed to be handled in
  recover_waiters_post().
- later, the unlock reply is received, doesn't find the lkb on
  the waiters list because recover_waiters_post() has wrongly
  removed it.
- the unlock operation has been lost, and we're left with a
  stray granted lock
- unmount spins waiting for the unlock to complete

The visible evidence of this problem will be a node where gfs umount is
spinning, the dlm waiters list will be empty, and the dlm locks list will
show a granted lock.

The fix is simply to clear the RESEND flag when taking an lkb off the
waiters list.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:50 -05:00
David Teigland
8fd3a98f2c [DLM] saved dlm message can be dropped
dlm_receive_message() returns 0 instead of returning 'error'.  What would
happen is that process_requestqueue would take a saved message off the
requestqueue and call receive_message on it.  receive_message would then
see that recovery had been aborted, set error to EINTR, and 'goto out',
expecting that the error would be returned.  Instead, 0 was always
returned, so process_requestqueue would think that the message had been
processed and delete it instead of saving it to process next time.  This
means the message (usually an unlock in my tests) would be lost.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:47 -05:00
Patrick Caulfield
f1f1c1ccf7 [DLM] Make sock_sem into a mutex
Now that there can be multiple dlm_recv threads running we need to prevent two
recvs running for the same connection - it's unlikely but it can happen and it
causes message corruption.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:44 -05:00
Steven Whitehouse
d043e1900c [GFS2] Fix typo in glock.c
This is a one letter typo fix in glock.c, spotted by Rob Kenna.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:41 -05:00
Eric Sandeen
ddfe062783 [GFS2] use CURRENT_TIME_SEC instead of get_seconds in gfs2
I was looking something else up and came across this...

I don't honestly have a good reason to change it other than to make it
like every other Linux filesystem in this regard.  ;-)  It doesn't
functionally change anything, but makes some lines shorter. :)

I'm also curious; why does gfs2 have 64-bits of on-disk timestamps, but
not in timespec_t format, and only stores second resolutions?  Seems like
you're halfway to sub-second resolutions already.

I suppose if that gets implemented then all of the below should
instead be CURRENT_TIME not CURRENT_TIME_SEC.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:38 -05:00
Steven Whitehouse
90101c3186 [GFS2] Compile fix for glock.c
This one liner got missed from the previous patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:35 -05:00
Steven Whitehouse
12132933c4 [GFS2] Remove queue_empty() function
This function is not longer required since we do not do recursive
locking in the glock layer. As a result all its callers can be
replaceed with list_empty() calls.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:32 -05:00
Patrick Caulfield
bd44e2b007 [DLM] fix lowcomms receiving
This patch fixes a bug whereby data on a newly accepted connection would be
ignored if it arrived soon after the accept.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:29 -05:00
Steven Whitehouse
b5d32bead1 [GFS2] Tidy up glops calls
This patch doesn't make any changes to the ordering of the various
operations related to glocking, but it does tidy up the calls to the
glops.c functions to make the structure more obvious.

The two functions: gfs2_glock_xmote_th() and gfs2_glock_drop_th() can be
made static within glock.c since they are called by every set of glock
operations. The xmote_th and drop_th glock operations are then made
conditional upon those two routines existing and called from the
previously mentioned functions in glock.c respectively.

Also it can be seen that the go_sync operation isn't needed since it can
easily be replaced by calls to xmote_bh and drop_bh respectively. This
results in no longer (confusingly) calling back into routines in glock.c
from glops.c and also reducing the glock operations by one member.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:26 -05:00
Patrick Caulfield
f2f5095f9e [DLM] lowcomms tidy
This patch removes some redundant fields from the connection structure and adds
some lockdep annotation to remove spurious warnings.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:23 -05:00
Steven Whitehouse
1c0f4872dc [GFS2] Remove local exclusive glock mode
Here is a patch for GFS2 to remove the local exclusive flag. In
the places it was used, mutex's are always held earlier in the
call path, so it appears redundant in the LM_ST_SHARED case.

Also, the GFS2 holders were setting local exclusive in any case where
the requested lock was LM_ST_EXCLUSIVE. So the other places in the glock
code where the flag was tested have been replaced with tests for the
lock state being LM_ST_EXCLUSIVE in order to ensure the logic is the
same as before (i.e. LM_ST_EXCLUSIVE is always locally exclusive as well
as globally exclusive).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:20 -05:00
Steven Whitehouse
6bd9c8c2fb [GFS2] Remove unused go_callback operation
This is never used, so we might as well remove it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:17 -05:00
Steven Whitehouse
e5dab552c8 [GFS2] Remove the "greedy" function from glock.[ch]
The "greedy" code was an attempt to retain glocks for a minimum length
of time when they relate to mmap()ed files. The current implementation
of this feature is not, however, ideal in that it required allocating
memory in order to do this and its overly complicated.

It also misses the mark by ignoring the other I/O operations which are
just as likely to suffer from the same problem. So the plan is to remove
this now and then add the functionality back as part of the glock state
machine at a later date (and thus take into account all the possible
users of this feature)

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:14 -05:00
Steven Whitehouse
fee852e374 [GFS2] Shrink gfs2_inode memory by half
Here is something I spotted (while looking for something entirely
different) the other day.

Rather than using a completion in each and every struct gfs2_holder,
this removes it in favour of hashed wait queues, thus saving a
considerable amount of memory both on the stack (where a number of
gfs2_holder structures are allocated) and in particular in the
gfs2_inode which has 8 gfs2_holder structures embedded within it.

As a result on x86_64 the gfs2_inode shrinks from 2488 bytes to
1912 bytes, a saving of 576 bytes per inode (no thats not a typo!).
In actual practice we get a much better result than that since
now that a gfs2_inode is under the 2048 byte barrier, we get two
per 4k slab page effectively halving the amount of memory required
to store gfs2_inodes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:11 -05:00
Steven Whitehouse
330005c2b2 [GFS2] Remove max_atomic_write tunable
This removes an unused sysfs tunable parameter.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:08 -05:00
Steven Whitehouse
3699e3a44b [GFS2] Clean up/speed up readdir
This removes the extra filldir callback which gfs2 was using to
enclose an attempt at readahead for inodes during readdir. The
code was too complicated and also hurts performance badly in the
case that the getdents64/readdir call isn't being followed by
stat() and it wasn't even getting it right all the time when it
was.

As a result, on my test box an "ls" of a directory containing 250000
files fell from about 7mins (freshly mounted, so nothing cached) to
between about 15 to 25 seconds. When the directory content was cached,
the time taken fell from about 3mins to about 4 or 5 seconds.

Interestingly in the cached case, running "ls -l" once reduced the time
taken for subsequent runs of "ls" to about 6 secs even without this
patch. Now it turns out that there was a special case of glocks being
used for prefetching the metadata, but because of the timeouts for these
locks (set to 10 secs) the metadata was being timed out before it was
being used and this the prefetch code was constantly trying to prefetch
the same data over and over.

Calling "ls -l" meant that the inodes were brought into memory and once
the inodes are cached, the glocks are not disposed of until the inodes
are pushed out of the cache, thus extending the lifetime of the glocks,
and thus bringing down the time for subsequent runs of "ls"
considerably.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:04 -05:00
Steven Whitehouse
a8d638e30e [GFS2] Add writepages for "data=writeback" mounts
It occurred to me that although a gfs2 specific writepages for ordered
writes and journaled data would be tricky, by hooking writepages only
for "data=writeback" mounts we could take advantage of not needing
buffer heads (we don't use them on the read side, nor have we for some
time) and create much larger I/Os for the block layer.

Using blktrace both before and after, its possible to see that for large
I/Os, most of the requests generated through writepages are now 1024
sectors after this patch is applied as opposed to 8 sectors before.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:01 -05:00
David Teigland
222d396092 [DLM] fix master recovery
If master recovery happens on an rsb in one recovery sequence, then that
sequence is aborted before lock recovery happens, then in the next
sequence, we rely on the previous master recovery (which may now be
invalid due to another node ignoring a lookup result) and go on do to the
lock recovery where we get stuck due to an invalid master value.

 recovery cycle begins: master of rsb X has left
 nodes A and B send node C an rcom lookup for X to find the new master
 C gets lookup from B first, sets B as new master, and sends reply back to B
 C gets lookup from A next, and sends reply back to A saying B is master
 A gets lookup reply from C and sets B as the new master in the rsb
 recovery cycle on A, B and C is aborted to start a new recovery
 B gets lookup reply from C and ignores it since there's a new recovery
 recovery cycle begins: some other node has joined
 B doesn't think it's the master of X so it doesn't rebuild it in the directory
 C looks up the master of X, no one is master, so it becomes new master
 B looks up the master of X, finds it's C
 A believes that B is the master of X, so it sends its lock to B
 B sends an error back to A
 A resends
 this repeats forever, the incorrect master value on A is never corrected

The fix is to do master recovery on an rsb that still has the NEW_MASTER
flag set from an earlier recovery sequence, and therefore didn't complete
lock recovery.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:58 -05:00
David Teigland
a1bc86e6bd [DLM] fix user unlocking
When a user process exits, we clear all the locks it holds.  There is a
problem, though, with locks that the process had begun unlocking before it
exited.  We couldn't find the lkb's that were in the process of being
unlocked remotely, to flag that they are DEAD.  To solve this, we move
lkb's being unlocked onto a new list in the per-process structure that
tracks what locks the process is holding.  We can then go through this
list to flag the necessary lkb's when clearing locks for a process when it
exits.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:55 -05:00
Patrick Caulfield
1d6e8131cf [DLM] Use workqueues for dlm lowcomms
This patch converts the DLM TCP lowcomms to use workqueues rather than using its
own daemon functions. Simultaneously removing a lot of code and making it more
scalable on multi-processor machines.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:52 -05:00
Adrian Bunk
03dc6a538e [GFS2] make gfs2_change_nlink_i() static
On Thu, Jan 11, 2007 at 10:26:27PM -0800, Andrew Morton wrote:
>...
> Changes since 2.6.20-rc3-mm1:
>...
>  git-gfs2-nmw.patch
>...
>  git trees
>...

This patch makes the needlessly globlal gfs2_change_nlink_i() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:49 -05:00
Robert Peterson
7083146564 [GFS2] gfs2 knows of directories which it chooses not to display
This is for Red Hat bugzilla bug bz #222302:

Moving a virtual IP from node to node between two NFS-over-GFS2
servers was causing one of the GFS2 servers to become confused and
reference a deleted inode.  The problem was due to vfs dentries that did
not reference the gfs2_dops and therefore didn't call the gfs2 revalidate
code to revalidate a dentry after a directory had been deleted & recreated.
This patch is a crosswrite from a RHEL4 bug found in GFS1 as
bz #190756 and it is against the latest -nmw git tree.

Signed-off-by: Robert Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:46 -05:00
David Teigland
d200778e12 [DLM] expose dlm_config_info fields in configfs
Make the dlm_config_info values readable and writeable via configfs
entries.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:43 -05:00
David Teigland
99fc64874a [DLM] add config entry to enable log_debug
Add a new dlm_config_info field to enable log_debug output and change
log_debug() to use it.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:40 -05:00
David Teigland
68c817a1c4 [DLM] rename dlm_config_info fields
Add a "ci_" prefix to the fields in the dlm_config_info struct so that we
can use macros to add configfs functions to access them (in a later
patch).  No functional changes in this patch, just naming changes.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:37 -05:00
David Teigland
8ec6886748 [DLM] change some log_error to log_debug
Some common, non-error messages should use log_debug instead of log_error
so they can be turned off.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:34 -05:00
S. Wendy Cheng
87d21e07f3 [GFS2] Fix gfs2_rename deadlock
Second round of gfs2_rename lock re-ordering to allow Anaconda adding
root partition on top of gfs2. Previous to this patch the recursive
lock detector in glock.c can be triggered due to attempting to lock
the rgrp twice. This fixes it by checking to see whether the rgrp
is already locked.

This fixes Red Hat bugzilla #221237

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:31 -05:00
Russell Cattelan
6c93fd1e57 [GFS2] BZ 217008 fsfuzzer fix.
Update the quilt header comments to match the
code changes.

Change gfs2_lookup_simple to return an error in the case
of a NULL inode.
The callers of gfs2_lookup_simple do not check for NULL
in the no entry case and such would end up dereferencing a NULL ptr.

This fixes:
http://projects.info-pull.com/mokb/MOKB-15-11-2006.html

Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:28 -05:00
Steven Whitehouse
49686f7106 [GFS2] Fix ordering of page disposal vs. glock_dq
In case of unlinked files with dirty pages GFS2 wasn't clearing
the pages in quite the right order. This patch clears the pages
earlier (before the qlock_dq) to avoid the situation that the
release of the glock results in attempting to write back data that
has already been deallocated.

This fixes Red Hat bugzilla: #220117

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:24 -05:00
Patrick Caulfield
4edde74eed [DLM] Fix spin lock already unlocked bug
I just noticed this message when testing some other changes I'd made to
lowcomms (to use workqueues) but the problem seems to be in the current
git trees too. I'm amazed no-one has seen it.

    BUG: spinlock already unlocked on CPU#1, dlm_recoverd/16868

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:21 -05:00
Patrick Caulfield
3fb4a251fe [DLM] Fix schedule() calls
I was a little over-enthusiastic turning schedule() calls int cond_sched() when fixing the DLM for Andrew Morton.

These four should really be calls to schedule() or the dlm can busy-wait.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:18 -05:00