Commit Graph

322 Commits

Author SHA1 Message Date
Linus Torvalds
6e188240eb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (59 commits)
  ceph: reuse mon subscribe message instead of allocated anew
  ceph: avoid resending queued message to monitor
  ceph: Storage class should be before const qualifier
  ceph: all allocation functions should get gfp_mask
  ceph: specify max_bytes on readdir replies
  ceph: cleanup pool op strings
  ceph: Use kzalloc
  ceph: use common helper for aborted dir request invalidation
  ceph: cope with out of order (unsafe after safe) mds reply
  ceph: save peer feature bits in connection structure
  ceph: resync headers with userland
  ceph: use ceph. prefix for virtual xattrs
  ceph: throw out dirty caps metadata, data on session teardown
  ceph: attempt mds reconnect if mds closes our session
  ceph: clean up send_mds_reconnect interface
  ceph: wait for mds OPEN reply to indicate reconnect success
  ceph: only send cap releases when mds is OPEN|HUNG
  ceph: dicard cap releases on mds restart
  ceph: make mon client statfs handling more generic
  ceph: drop src address(es) from message header [new protocol feature]
  ...
2010-05-24 07:37:52 -07:00
Sage Weil
240ed68eb5 ceph: reuse mon subscribe message instead of allocated anew
Use the same message, allocated during startup.  No need to reallocate a
new one each time around (and potentially ENOMEM).

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-21 16:26:11 -07:00
Christoph Hellwig
8018ab0574 sanitize vfs_fsync calling conventions
Now that the last user passing a NULL file pointer is gone, we can remove
the redundant dentry argument and associated hacks inside vfs_fsync_range.

The next step will be removing the dentry argument from ->fsync, but given
the luck with the last round of method prototype changes I'd rather
defer this until after the main merge window.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21 18:31:21 -04:00
Al Viro
3981f2e2a0 ceph: should use deactivate_locked_super() on failure exits
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21 18:31:13 -04:00
Sage Weil
970690012c ceph: avoid resending queued message to monitor
The auth_reply handler will (re)send any pending requests.  For the
initial mon authenticate phase, that's correct, but when an auth ticket
renewal races with an in-flight request, we may resend a request message
that is already in flight.  Avoid this by revoking the message before
sending it.

We should also avoid resending requests at all during ticket renewal; that
will come soon.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-21 15:01:22 -07:00
Tobias Klauser
9e32789f63 ceph: Storage class should be before const qualifier
The C99 specification states in section 6.11.5:

The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-21 15:01:21 -07:00
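The specifier ordering that the commit above (9e32789f63) refers to can be illustrated with a small sketch. The table and helper names here are hypothetical and do not come from the ceph sources; the only point is that `static` leads the declaration specifiers.

```c
#include <stddef.h>

/*
 * Obsolescent ordering (it still compiles, but C99 6.11.5 discourages it):
 *
 *     const static int example_table[] = { 1, 2, 3 };
 */

/* Preferred ordering: the storage-class specifier comes first. */
static const int example_table[] = { 1, 2, 3 };

/* Number of entries in the hypothetical table. */
size_t example_table_len(void)
{
	return sizeof(example_table) / sizeof(example_table[0]);
}

/* Sum of the entries, so the table is actually used. */
int example_table_sum(void)
{
	int sum = 0;
	size_t i;

	for (i = 0; i < example_table_len(); i++)
		sum += example_table[i];
	return sum;
}
```

Both orderings declare the same object; the change is purely about following the recommended form.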
Yehuda Sadeh
34d23762d9 ceph: all allocation functions should get gfp_mask
This is essential, as for the rados block device we'll need
to run in different contexts that would need flags that
are other than GFP_NOFS.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:42 -07:00
Sage Weil
23804d91f1 ceph: specify max_bytes on readdir replies
Specify max bytes in request to bound size of reply.  Add associated
mount option with default value of 512 KB.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:41 -07:00
Sage Weil
366837706b ceph: cleanup pool op strings
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:41 -07:00
Julia Lawall
cffe7b6d8c ceph: Use kzalloc
Use kzalloc rather than the combination of kmalloc and memset.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,size,flags;
statement S;
@@

-x = kmalloc(size,flags);
+x = kzalloc(size,flags);
 if (x == NULL) S
-memset(x, 0, size);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:40 -07:00
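As a rough userspace analogue of the kzalloc change above (cffe7b6d8c), the sketch below contrasts allocate-then-memset with a single zeroing allocation. It assumes `calloc()` as a stand-in for `kzalloc()`, which is a kernel-only API; the struct and function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure, used only for illustration. */
struct example_state {
	int refs;
	char name[16];
};

/* Before: allocate, check for failure, then zero by hand. */
struct example_state *example_alloc_then_memset(void)
{
	struct example_state *s = malloc(sizeof(*s));

	if (s == NULL)
		return NULL;
	memset(s, 0, sizeof(*s));
	return s;
}

/* After: one zeroing allocation (calloc stands in for kzalloc here). */
struct example_state *example_alloc_zeroed(void)
{
	return calloc(1, sizeof(struct example_state));
}
```

Both helpers hand back zero-filled memory or NULL; the second form simply removes the chance of forgetting the memset or zeroing the wrong size.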
Sage Weil
167c9e352d ceph: use common helper for aborted dir request invalidation
We invalidate I_COMPLETE and dentry leases in two places: on aborted mds
request and on request replay.  Use common helper to avoid duplicate code.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:40 -07:00
Sage Weil
85792d0dd6 ceph: cope with out of order (unsafe after safe) mds reply
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:39 -07:00
Sage Weil
aba558e28a ceph: save peer feature bits in connection structure
These are used for adjusting behavior, such as conditionally encoding a
newer message format.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:38 -07:00
Sage Weil
ca9d93a292 ceph: resync headers with userland
Notable changes include pool op defines and types, FLOCK feature bit, and
new CMPXATTR osd ops.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:38 -07:00
Sage Weil
1a75627896 ceph: use ceph. prefix for virtual xattrs
Drop the 'user.' prefix and use just 'ceph.' for fs virtual xattrs.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:37 -07:00
Sage Weil
6c99f2545d ceph: throw out dirty caps metadata, data on session teardown
The remove_session_caps() helper is called when an MDS closes out our
session (either normally, or as a result of a failed reconnect), and when
we tear down state for umount.  If we remove the last cap, and there are
no cap migrations in progress, then there is little hope of us flushing
out that data to the mds (without heroic efforts to reconnect and flush).

So, to avoid leaving inodes pinned (due to dirty state) and crashing after
umount, throw out dirty caps state and unpin the inodes.  Print a warning
to the console so we know something was lost.

NOTE: Although we drop wrbuffer refs, we don't actually mark pages clean;
maybe a truncate should be queued?

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:37 -07:00
Sage Weil
7e70f0ed9f ceph: attempt mds reconnect if mds closes our session
Currently, if our session is closed (due to a timeout, or explicit close,
or whatever), we just sit there doing nothing unless/until the MDS
restarts, at which point we try to reconnect.

Change client to attempt an immediate reconnect if our session is closed.

Note that currently the MDS doesn't support this, and our attempt will
fail.  We'll get a session CLOSE, our caps and dirty cap state will be
dropped, and the client will be free to attempt to reconnect.  That's
clearly not as nice as a successful reconnect, but it at least allows us
to try to carry on, and in the future the MDS will support a reconnect
and we will fare better.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:36 -07:00
Sage Weil
34b6c855fa ceph: clean up send_mds_reconnect interface
Pass a ceph_mds_session, since the caller has it.

Remove the dead code for sending empty reconnects.  It used to be used
when the MDS contacted _us_ to solicit a reconnect, and we could reply
saying "go away, I have no session."  Now we only send reconnects based
on the mds map, and only when we do in fact have an open session.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:35 -07:00
Sage Weil
29790f26ab ceph: wait for mds OPEN reply to indicate reconnect success
We used to infer reconnect success by watching the MDS state, essentially
assuming that hearing nothing meant things were ok.  That wasn't
particularly reliable.  Instead, the MDS replies with an explicit OPEN
message to indicate success.

Strictly speaking, this is a protocol change, but it is a backwards
compatible one that does not break new clients + old servers or old
clients + new servers.  At least not yet.

Drop unused @all argument from kick_requests while we're at it.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:35 -07:00
Sage Weil
aab53dd9e8 ceph: only send cap releases when mds is OPEN|HUNG
On OPENING we shouldn't have any caps (or releases).
On CLOSING, we should wait until we succeed (and throw it all out), or
don't (and are OPEN again).
On RECONNECTING we can wait until we are OPEN.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:34 -07:00
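The rule described in the commit above (aab53dd9e8) amounts to a simple state predicate. The following is a minimal sketch, assuming an enum whose members mirror the state names in the message; the identifiers are hypothetical and are not the kernel's own definitions.

```c
/* Hypothetical session states, named after those in the commit message. */
enum ex_session_state {
	EX_SESSION_OPENING,
	EX_SESSION_OPEN,
	EX_SESSION_HUNG,
	EX_SESSION_CLOSING,
	EX_SESSION_RECONNECTING
};

/*
 * Cap releases are sent only while the session is OPEN or HUNG.
 * In every other state they are held back until the session
 * settles (or they are thrown out).
 */
int ex_may_send_cap_releases(enum ex_session_state state)
{
	return state == EX_SESSION_OPEN || state == EX_SESSION_HUNG;
}
```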
Sage Weil
e01a594646 ceph: dicard cap releases on mds restart
If the MDS restarts, the expire caps state is no longer shared, and can be
thrown out.  Caps state will be rebuilt on the MDS during the reconnect
process that follows.  Zero out any release messages and adjust the
release counter accordingly.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:33 -07:00
Yehuda Sadeh
f8c76f6f25 ceph: make mon client statfs handling more generic
This is being done so that we could reuse the statfs
infrastructure with other requests that return values.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:33 -07:00
Sage Weil
dbad185d49 ceph: drop src address(es) from message header [new protocol feature]
The CEPH_FEATURE_NOSRCADDR protocol feature avoids putting the full source
address in each message header (twice).  This patch switches the client to
the new scheme, and _requires_ this feature on the server.  The server
will support both the old and new schemes.  That means an old client will
work with a new server, but a new client will not work with an old server.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:32 -07:00
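Requiring a protocol feature on the peer, as the commit above (dbad185d49) does, is commonly expressed as a bitmask test. This is a sketch under stated assumptions: the bit position chosen for the feature and the helper name are hypothetical, not values taken from the ceph headers.

```c
#include <stdint.h>

/* Hypothetical bit position; the real value lives in the ceph headers. */
#define EX_FEATURE_NOSRCADDR (UINT64_C(1) << 1)

/*
 * A peer is acceptable only if it advertises every feature we require.
 * Extra features advertised by the peer are ignored.
 */
int ex_peer_has_required(uint64_t peer_features, uint64_t required)
{
	return (peer_features & required) == required;
}
```

With a check like this, a new client that requires the bit rejects an old server that does not advertise it, while a server that requires nothing accepts clients either way, matching the compatibility matrix described in the message.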
Dan Carpenter
a5ee751c15 ceph: cleanup: remove unused assignement
We don't ever use "dirty" so we can remove it.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:32 -07:00
Sage Weil
0f8605f2bd ceph: clean up cap release loop vs spinlock
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:31 -07:00
Sage Weil
31e0cf8f6a ceph: name bdi ceph-%d instead of major:minor
The bdi_setup_and_register() helper doesn't help us since we bdi_init() in
create_client() and bdi_register() only when sget() succeeds.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:30 -07:00
Sage Weil
56b7cf9581 ceph: skip mds sync on forced unmount
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:30 -07:00
Sage Weil
b736b3d9d0 ceph: adjust masked struct_v variable names
Reported-by: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:29 -07:00
Sage Weil
6e19a16ef2 ceph: clean up mount options, ->show_options()
Ensure all options are included in /proc/mounts.  Some cleanup.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:29 -07:00
Sage Weil
1cd3935bed ceph: set dn offset when spliced
We want to assign an offset when the dentry goes from null to linked, which
is always done by splice_dentry().  Notably, we should NOT assign an
offset when a dentry is first created and is still null.

BUG if we try to splice a non-null dentry (we shouldn't).

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:28 -07:00
Sage Weil
1b7facc41b ceph: don't clobber i_max_offset on already complete dir
This can screw up offsets assigned to new dentries and break dcache
readdir results.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:27 -07:00
Sage Weil
e8a7498715 ceph: skip set_dentry_offset work if directory not I_COMPLETE
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:27 -07:00
Sage Weil
f1f2765fae ceph: set next_offset on readdir finish
Set next_offset to 2 (always 2!), not 0, on readdir finish.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:26 -07:00
Henry C Chang
bddfa3cc18 ceph: listxattr should compare version by >=
If the version hasn't changed, don't rebuild the index.

Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:26 -07:00
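The version comparison in the commit above (bddfa3cc18) can be sketched as follows. The struct layout and field names are assumptions made for illustration; only the idea of comparing with `>=` so that an unchanged version skips the rebuild comes from the message.

```c
/* Hypothetical bookkeeping for an xattr index. */
struct ex_xattrs {
	unsigned long blob_version;   /* version of the raw xattr data */
	unsigned long index_version;  /* version the index was built from */
	int rebuilds;                 /* counts rebuilds, for demonstration */
};

/* Rebuild the index only when the raw data is newer than the index. */
void ex_build_index(struct ex_xattrs *x)
{
	if (x->index_version >= x->blob_version)
		return;               /* unchanged: nothing to do */

	x->rebuilds++;                /* stands in for the real rebuild work */
	x->index_version = x->blob_version;
}
```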
Sage Weil
a6424e48c8 ceph: fix xattr dangling pointer / double free
If we use the xattr_blob, clear the pointer so we don't release the memory
at the bottom of the function.

Reported-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:25 -07:00
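The fix in the commit above (a6424e48c8) follows a common ownership-transfer idiom: once a buffer has been handed off, clear the local pointer so the shared cleanup path does not free it again. A minimal userspace sketch, with hypothetical names and `malloc`/`free` standing in for the kernel allocators:

```c
#include <stdlib.h>

/* Hypothetical inode-like holder for an xattr blob. */
struct ex_inode {
	char *xattr_blob;
};

/* Returns 0 on success, -1 if the allocation fails. */
int ex_update_xattrs(struct ex_inode *in, size_t len, int use_blob)
{
	char *blob = malloc(len);

	if (blob == NULL)
		return -1;

	if (use_blob) {
		free(in->xattr_blob);   /* drop the old blob, if any */
		in->xattr_blob = blob;  /* the inode now owns the buffer */
		blob = NULL;            /* so the cleanup below must not free it */
	}

	free(blob);                     /* free(NULL) is a no-op */
	return 0;
}
```

Without the `blob = NULL` line, the final `free()` would release memory the inode still points to, leaving a dangling pointer and a later double free.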
Sage Weil
9dd4658db1 ceph: close messenger race
Simplify messenger locking, and close race between ceph_con_close() setting
the CLOSED bit and con_work() checking the bit, then taking the mutex.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:25 -07:00
Sage Weil
4f48280ee1 ceph: name msgpools; useful error messages
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:24 -07:00
Sage Weil
8c6efb58a5 ceph: fix memory leak due to possible dentry init race
Free dentry_info in error path.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:23 -07:00
Sage Weil
559c1e0073 ceph: include auth method in error messages
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:23 -07:00
Sage Weil
f26e681d52 ceph: osdtimeout=0 for now timeout
Allow the osd reset timeout to be disabled.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:22 -07:00
Dan Carpenter
0d509c949a ceph: d_obtain_alias() returns ERR_PTR()
d_obtain_alias() doesn't return NULL, it returns an ERR_PTR().

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:22 -07:00
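The bug class fixed in the commit above (0d509c949a) is easy to reproduce outside the kernel. The sketch below defines simplified userspace stand-ins for the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() helpers; the `ex_` names and the lookup function are hypothetical, and the real macros live in the kernel headers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_ERRNO 4095

/* Simplified stand-ins for the kernel's error-pointer helpers. */
static inline void *ex_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long ex_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int ex_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-EX_MAX_ERRNO;
}

static int ex_object;

/* Hypothetical lookup that, like d_obtain_alias(), never returns NULL. */
void *ex_lookup(int want_failure)
{
	if (want_failure)
		return ex_err_ptr(-ENOMEM);
	return &ex_object;
}

/* Wrong: a NULL test never sees the encoded error. */
int ex_check_null(const void *p)
{
	return p == NULL ? -1 : 0;
}

/* Right: decode the error with the IS_ERR-style test. */
long ex_check_err(const void *p)
{
	return ex_is_err(p) ? ex_ptr_err(p) : 0;
}
```

A failed `ex_lookup()` passes the NULL check unnoticed, whereas `ex_check_err()` recovers the original `-ENOMEM`.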
Yehuda Sadeh
c473ad927e ceph: wake up mount thread when getting osdmap
Now that the mount thread waits for the osdmap, it needs
to be awakened.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-05-17 15:25:21 -07:00
Huang Weiyi
1bb71637d0 ceph: remove unused #includes
Remove unused #include's in
  fs/ceph/super.c

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:21 -07:00
Sage Weil
6822d00b54 ceph: wait for both monmap and osdmap when opening session
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-05-17 15:25:20 -07:00
Sage Weil
6f2bc3ff4c ceph: clean up connection reset
Reset out_keepalive_pending and peer_global_seq, and drop unused var.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:20 -07:00
Sage Weil
bb257664f7 ceph: simplify ceph_msg_new
We only need to pass in front_len.  Callers can attach any other payload
pieces (middle, data) as they see fit.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:19 -07:00
Sage Weil
a79832f26b ceph: make ceph_msg_new return NULL on failure; clean up, fix callers
Returning ERR_PTR(-ENOMEM) is useless extra work.  Return NULL on failure
instead, and fix up the callers (about half of which were wrong anyway).

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:18 -07:00
Sage Weil
d52f847a84 ceph: rewrite msgpool using mempool_t
Since we don't need to maintain large pools of messages, we can just
use the standard mempool_t.  We maintain a msgpool 'wrapper' because we
need the mempool_t* in the alloc function, and mempool gives us only
pool_data.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:18 -07:00
Cheng Renquan
640ef79d27 ceph: use ceph_sb_to_client instead of ceph_client
ceph_sb_to_client and ceph_client are identical, so one of them should be
dropped.  The function name ceph_client is easily confused with
"struct ceph_client", and ceph_sb_to_client's definition is clearer, so
switch all callers to ceph_sb_to_client.

  -static inline struct ceph_client *ceph_client(struct super_block *sb)
  -{
  -	return sb->s_fs_info;
  -}

Signed-off-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:17 -07:00
Cheng Renquan
2d06eeb877 ceph: handle kzalloc() failure
Signed-off-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:16 -07:00