Both lockd and (in the nfsv4 case) nfsd enforce a "grace period" after reboot,
during which clients may reclaim locks from the previous server instance, but
may not acquire new locks.
Currently the lockd and nfsd enforce grace periods of different lengths. This
may cause problems when we reboot a server with both v2/v3 and v4 clients.
For example, if the lockd grace period is shorter (as is likely the case),
then a v3 client might acquire a new lock that conflicts with a lock already
held (but not yet reclaimed) by a v4 client.
This patch calculates a lease time that lockd and nfsd can both use.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the destination address of the original NLM request as the
source address in callbacks to the client.
Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
- fs/lockd/xdr4.c:140:27: warning: incorrect type in argument 2 (different
explicit signedness)
- fs/lockd/xdr4.c:141:27: warning: incorrect type in argument 2 (different
explicit signedness)
- fs/lockd/xdr4.c:432:28: warning: incorrect type in argument 2 (different
explicit signedness)
- fs/lockd/xdr4.c:433:28: warning: incorrect type in argument 2 (different
explicit signedness)
- fs/lockd/xdr4.c:587:20: warning: symbol 'nlm_version4' was not declared.
Should it be static?
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux:
gfs2: nfs lock support for gfs2
lockd: add code to handle deferred lock requests
lockd: always preallocate block in nlmsvc_lock()
lockd: handle test_lock deferrals
lockd: pass cookie in nlmsvc_testlock
lockd: handle fl_grant callbacks
lockd: save lock state on deferral
locks: add fl_grant callback for asynchronous lock return
nfsd4: Convert NFSv4 to new lock interface
locks: add lock cancel command
locks: allow {vfs,posix}_lock_file to return conflicting lock
locks: factor out generic/filesystem switch from setlock code
locks: factor out generic/filesystem switch from test_lock
locks: give posix_test_lock same interface as ->lock
locks: make ->lock release private data before returning in GETLK case
locks: create posix-to-flock helper functions
locks: trivial removal of unnecessary parentheses
Change NLM internal interface to pass more information for test lock; we
need this to make sure the cookie information is pushed down to the place
where we do request deferral, which is handled for testlock by the
following patch.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
We need to keep some state for a pending asynchronous lock request, so this
patch adds that state to struct nlm_block.
This also adds a function which defers the request, by calling
rqstp->rq_chandle.defer and storing the resulting deferred request in a
nlm_block structure which we insert into lockd's global block list. That
new function isn't called yet, so it's dead code until a later patch.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
NLM version 4 requests estimate the call and reply header sizes rather
conservatively, using the very maximum size allowed in the protocol even
though Linux always uses only a small fraction of the allowable space.
Reduce the size of caller and lock arguments to conserve RPC buffer space
while XDR encoding NLM4 arguments. Add compile-time checks to ensure the
hostname string won't overflow NLM protocol maximums.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Annotated, all places switched to keeping status net-endian.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make the following needlessly global functions static:
- nlm_lookup_host()
- nsm_find()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is possible for the ->fopen callback from lockd into nfsd to find that an
answer cannot be given straight away (an upcall is needed) and so the request
has to be 'dropped', to be retried later. That error status is not currently
propagated back.
So:
Change nlm_fopen to return nlm error codes (rather than a private
protocol) and define a new nlm_drop_reply code.
Cause nlm_drop_reply to cause the rpc request to get rpc_drop_reply
when this error comes back.
Cause svc_process to drop a request which returns a status of
rpc_drop_reply.
[akpm@osdl.org: fix warning storm]
Cc: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Both the (recently introduces) nsm_sema and the older f_sema are converted
over.
Cc: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Every NLM call includes the client's NSM state. Currently, the Linux client
always reports 0 - which seems not to cause any problems, but is not what the
protocol says.
This patch exposes the kernel's internal variable to user space via a sysctl,
which can be set at system boot time by statd.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When we send a GRANTED_MSG call, we current copy the NLM cookie provided in
the original LOCK call - because in 1996, some broken clients seemed to rely
on this bug. However, this means the cookies are not unique, so that when the
client's GRANTED_RES message comes back, we cannot simply match it based on
the cookie, but have to use the client's IP address in addition. Which breaks
when you have a multi-homed NFS client.
The X/Open spec explicitly mentions that clients should not expect the same
cookie; so one may hope that any clients that were broken in 1996 have either
been fixed or rendered obsolete.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The way we incremented the NLM cookie in nlmclnt_next_cookie was not thread
safe. This patch changes the counter to an atomic_t
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the nsm_use_hostnames sysctl and module param. If set, lockd
will use the client's name (as given in the NLM arguments) to find the NSM
handle. This makes recovery work when the NFS peer is multi-homed, and the
reboot notification arrives from a different IP than the original lock calls.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As a result of previous patches, the loop in nlmsvc_invalidate_all just sets
h_expires for all client/hosts to 0 (though does it in a very complicated
way).
This was possibly meant to trigger early garbage collection but half the time
'0' is in the future and so it infact delays garbage collection.
Pre-aging the 'hosts' is not really needed at this point anyway so we throw
out the loop and nlm_find_client which is no longer needed.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes nlm_traverse{locks,blocks,shares} and friends use a function
pointer rather than a "action" enum.
This function pointer is given two nlm_hosts (one given by the caller, the
other taken from the lock/block/share currently visited), and is free to do
with them as it wants. If it returns a non-zero value, the lockd/block/share
is released.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This changes struct nlm_file and the nlm_files hash table to use a hlist
instead of the home-grown lists.
This allows us to remove f_hash which was only used to find the right hash
chain to delete an entry from.
It also increases the size of the nlm_files hash table from 32 to 128.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch changes the nlm_blocked list to use a list_node instead of
homegrown linked list handling.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Get rid of the home-grown singly linked lists for the nlm_host hash table.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This converts the statd upcalls to use the nsm_handle
This means that we only register each host once with statd, rather than
registering each host/vers/protocol triple.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes the SM_NOTIFY handling understand and use the nsm_handle.
To make it a bit clear what is happening:
nlmclent_prepare_reclaim and nlmclnt_finish_reclaim
get open-coded into 'reclaimer'
The result is tidied up.
Then some of that functionality is moved out into nlm_host_rebooted (which
calls nlmclnt_recovery which starts a thread which runs reclaimer).
Also host_rebooted now finds an nsm_handle rather than a host, then then
iterates over all hosts and deals with each host that shares that nsm_handle.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch introduces the nsm_handle, which is shared by all nlm_host objects
referring to the same client.
With this patch applied, all nlm_hosts from the same address will share the
same nsm_handle. A future patch will add sharing by name.
Note: this patch changes h_name so that it is no longer guaranteed to be an IP
address of the host. When the host represents an NFS server, h_name will be
the name passed in the mount call. When the host represents a client, h_name
will be the name presented in the lock request received from the client. A
h_name is only used for printing informational messages, this change should
not be significant.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the peer's hostname (and name length) to all calls to
nlm*_lookup_host functions. A subsequent patch will make use of these (is
requested by a sysctl).
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Common code from nlm4svc_proc_sm_notify and nlmsvc_proc_sm_notify is moved
into a new nlm_host_rebooted.
This is in preparation of a patch that will change the reboot notification
handling entirely.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Many files include the filename at the beginning, serveral used a wrong one.
Signed-off-by: Uwe Zeisberger <Uwe_Zeisberger@digi.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Replace references to system_utsname to the per-process uts namespace
where appropriate. This includes things like uname.
Changes: Per Eric Biederman's comments, use the per-process uts namespace
for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c
[jdike@addtoit.com: UML fix]
[clg@fr.ibm.com: cleanup]
[akpm@osdl.org: build fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently lockd listens on UDP always, and TCP if CONFIG_NFSD_TCP is set.
However as lockd performs services of the client as well, this is a problem.
If CONFIG_NfSD_TCP is not set, and a tcp mount is used, the server will not be
able to call back to lockd.
So:
- add an option to lockd_up saying which protocol is needed
- Always open sockets for which an explicit port was given, otherwise
only open a socket of the type required
- Change nfsd to do one lockd_up per socket rather than one per thread.
This
- removes the dependancy on CONFIG_NFSD_TCP
- means that lockd may open sockets other than at startup
- means that lockd will *not* listen on UDP if the only
mounts are TCP mount (and nfsd hasn't started).
The latter is the only one that concerns me at all - I don't know if this
might be a problem with some servers.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We never actually set the b_done field any more; it's always zero.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from af8412d4283ef91356e65e0ed9b025b376aebded commit)
Currently it is possible for a task to remove its locks at the same time as
the NLM recovery thread is trying to recover them. This quickly leads to an
Oops.
Protect the locks using an rw semaphore while they are being recovered.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add the fl_owner to NLM compare locks. Since two different client can
present the same pid to the server it is not enough to distinguish locks
from different clients. The fl_owner field is a pointer to the struct
nlm_host which is unique for each client.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The nlmsvc_traverse_shares return value is always zero, hence useless.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Note that we never return non-zero.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently lockd directly access the file_lock_list from fs/locks.c.
It does so to mark locks granted or reclaimable. This is very
suboptimal, because a) lockd needs to poke into locks.c internals, and
b) it needs to iterate over all locks in the system for marking locks
granted or reclaimable.
This patch adds lists for granted and reclaimable locks to the nlm_host
structure instead, and adds locks to those.
nlmclnt_lock:
now adds the lock to h_granted instead of setting the
NFS_LCK_GRANTED, still O(1)
nlmclnt_mark_reclaim:
goes away completely, replaced by a list_splice_init.
Complexity reduced from O(locks in the system) to O(1)
reclaimer:
iterates over h_reclaim now, complexity reduced from
O(locks in the system) to O(locks per nlm_host)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If 2 threads attached to the same process are blocking on different locks on
different files (maybe even on different servers) but have the same lock
arguments (i.e. same offset+length - actually quite common, since most
processes try to lock the entire file) then the first GRANTED call that wakes
one up will also wake the other.
Currently when the NLM_GRANTED callback comes in, lockd walks the list of
blocked locks in search of a match to the lock that the NLM server has
granted. Although it checks the lock pid, start and end, it fails to check
the filehandle and the server address.
By checking the filehandle and server IP address, we ensure that this only
happens if the locks truly are referencing the same file.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If the server returns NLM_LCK_DENIED_NOLOCKS, we currently retry the
entire NLM_CANCEL request. This may end up looping forever unless the
server changes its mind (why would it do that, though?).
Ensure that we limit the number of retries (to 3).
See bug# 5957 in bugzilla.kernel.org.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The OpenGroup docs state that the arguments "block", "exclusive" and
"alock" must exactly match the arguments for the lock call that we are
trying to cancel.
Currently, "block" is always set to false, which is wrong.
See bug# 5956 on bugzilla.kernel.org.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Shrink the RPC task structure. Instead of storing separate pointers
for task->tk_exit and task->tk_release, put them in a structure.
Also pass the user data pointer as a parameter instead of passing it via
task->tk_calldata. This enables us to nest callbacks.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the lock blocks, the server may send us a GRANTED message that
races with the reply to our LOCK request. Make sure that we catch
the GRANTED by queueing up our request on the nlm_blocked list
before we send off the first LOCK rpc call.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!