9f10523f89
Fix the state management of internal fscache operations and the accounting of what operations are in what states. This is done by: (1) Give struct fscache_operation a enum variable that directly represents the state it's currently in, rather than spreading this knowledge over a bunch of flags, who's processing the operation at the moment and whether it is queued or not. This makes it easier to write assertions to check the state at various points and to prevent invalid state transitions. (2) Add an 'operation complete' state and supply a function to indicate the completion of an operation (fscache_op_complete()) and make things call it. The final call to fscache_put_operation() can then check that an op in the appropriate state (complete or cancelled). (3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better govern the state of an object: (a) The ->n_ops is now the number of extant operations on the object and is now decremented by fscache_put_operation() only. (b) The ->n_in_progress is simply the number of objects that have been taken off of the object's pending queue for the purposes of being run. This is decremented by fscache_op_complete() only. (c) The ->n_exclusive is the number of exclusive ops that have been submitted and queued or are in progress. It is decremented by fscache_op_complete() and by fscache_cancel_op(). fscache_put_operation() and fscache_operation_gc() now no longer try to clean up ->n_exclusive and ->n_in_progress. That was leading to double decrements against fscache_cancel_op(). fscache_cancel_op() now no longer decrements ->n_ops. That was leading to double decrements against fscache_put_operation(). fscache_submit_exclusive_op() now decides whether it has to queue an op based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter will persist in being true even after all preceding operations have been cancelled or completed. Furthermore, if an object is active and there are runnable ops against it, there must be at least one op running. (4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and provide a function to record completion of the pages as they complete. When n_pages reaches 0, the operation is deemed to be complete and fscache_op_complete() is called. Add calls to fscache_retrieval_complete() anywhere we've finished with a page we've been given to read or allocate for. This includes places where we just return pages to the netfs for reading from the server and where accessing the cache fails and we discard the proposed netfs page. The bugs in the unfixed state management manifest themselves as oopses like the following where the operation completion gets out of sync with return of the cookie by the netfs. This is possible because the cache unlocks and returns all the netfs pages before recording its completion - which means that there's nothing to stop the netfs discarding them and returning the cookie. FS-Cache: Cookie 'NFS.fh' still has outstanding reads ------------[ cut here ]------------ kernel BUG at fs/fscache/cookie.c:519! invalid opcode: 0000 [#1] SMP CPU 1 Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090 /DG965RY RIP: 0010:[<ffffffffa007050a>] [<ffffffffa007050a>] __fscache_relinquish_cookie+0x170/0x343 [fscache] RSP: 0018:ffff8800368cfb00 EFLAGS: 00010282 RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000 RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000 R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98 R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370 FS: 0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040) Stack: ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0 ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0 ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91 Call Trace: [<ffffffffa00b2c91>] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs] [<ffffffffa008f25f>] nfs_clear_inode+0x3c/0x41 [nfs] [<ffffffffa0090df1>] nfs4_evict_inode+0x2f/0x33 [nfs] [<ffffffff810d8d47>] evict+0xa1/0x15c [<ffffffff810d8e2e>] dispose_list+0x2c/0x38 [<ffffffff810d9ebd>] prune_icache_sb+0x28c/0x29b [<ffffffff810c56b7>] prune_super+0xd5/0x140 [<ffffffff8109b615>] shrink_slab+0x102/0x1ab [<ffffffff8109d690>] balance_pgdat+0x2f2/0x595 [<ffffffff8103e009>] ? process_timeout+0xb/0xb [<ffffffff8109dba3>] kswapd+0x270/0x289 [<ffffffff8104c5ea>] ? __init_waitqueue_head+0x46/0x46 [<ffffffff8109d933>] ? balance_pgdat+0x595/0x595 [<ffffffff8104bf7a>] kthread+0x7f/0x87 [<ffffffff813ad6b4>] kernel_thread_helper+0x4/0x10 [<ffffffff81026b98>] ? finish_task_switch+0x45/0xc0 [<ffffffff813abcdd>] ? retint_restore_args+0xe/0xe [<ffffffff8104befb>] ? __init_kthread_worker+0x53/0x53 [<ffffffff813ad6b0>] ? gs_change+0xb/0xb Signed-off-by: David Howells <dhowells@redhat.com>
214 lines
7.2 KiB
Plaintext
214 lines
7.2 KiB
Plaintext
================================
|
|
ASYNCHRONOUS OPERATIONS HANDLING
|
|
================================
|
|
|
|
By: David Howells <dhowells@redhat.com>
|
|
|
|
Contents:
|
|
|
|
(*) Overview.
|
|
|
|
(*) Operation record initialisation.
|
|
|
|
(*) Parameters.
|
|
|
|
(*) Procedure.
|
|
|
|
(*) Asynchronous callback.
|
|
|
|
|
|
========
|
|
OVERVIEW
|
|
========
|
|
|
|
FS-Cache has an asynchronous operations handling facility that it uses for its
|
|
data storage and retrieval routines. Its operations are represented by
|
|
fscache_operation structs, though these are usually embedded into some other
|
|
structure.
|
|
|
|
This facility is available to and expected to be be used by the cache backends,
|
|
and FS-Cache will create operations and pass them off to the appropriate cache
|
|
backend for completion.
|
|
|
|
To make use of this facility, <linux/fscache-cache.h> should be #included.
|
|
|
|
|
|
===============================
|
|
OPERATION RECORD INITIALISATION
|
|
===============================
|
|
|
|
An operation is recorded in an fscache_operation struct:
|
|
|
|
struct fscache_operation {
|
|
union {
|
|
struct work_struct fast_work;
|
|
struct slow_work slow_work;
|
|
};
|
|
unsigned long flags;
|
|
fscache_operation_processor_t processor;
|
|
...
|
|
};
|
|
|
|
Someone wanting to issue an operation should allocate something with this
|
|
struct embedded in it. They should initialise it by calling:
|
|
|
|
void fscache_operation_init(struct fscache_operation *op,
|
|
fscache_operation_release_t release);
|
|
|
|
with the operation to be initialised and the release function to use.
|
|
|
|
The op->flags parameter should be set to indicate the CPU time provision and
|
|
the exclusivity (see the Parameters section).
|
|
|
|
The op->fast_work, op->slow_work and op->processor flags should be set as
|
|
appropriate for the CPU time provision (see the Parameters section).
|
|
|
|
FSCACHE_OP_WAITING may be set in op->flags prior to each submission of the
|
|
operation and waited for afterwards.
|
|
|
|
|
|
==========
|
|
PARAMETERS
|
|
==========
|
|
|
|
There are a number of parameters that can be set in the operation record's flag
|
|
parameter. There are three options for the provision of CPU time in these
|
|
operations:
|
|
|
|
(1) The operation may be done synchronously (FSCACHE_OP_MYTHREAD). A thread
|
|
may decide it wants to handle an operation itself without deferring it to
|
|
another thread.
|
|
|
|
This is, for example, used in read operations for calling readpages() on
|
|
the backing filesystem in CacheFiles. Although readpages() does an
|
|
asynchronous data fetch, the determination of whether pages exist is done
|
|
synchronously - and the netfs does not proceed until this has been
|
|
determined.
|
|
|
|
If this option is to be used, FSCACHE_OP_WAITING must be set in op->flags
|
|
before submitting the operation, and the operating thread must wait for it
|
|
to be cleared before proceeding:
|
|
|
|
wait_on_bit(&op->flags, FSCACHE_OP_WAITING,
|
|
fscache_wait_bit, TASK_UNINTERRUPTIBLE);
|
|
|
|
|
|
(2) The operation may be fast asynchronous (FSCACHE_OP_FAST), in which case it
|
|
will be given to keventd to process. Such an operation is not permitted
|
|
to sleep on I/O.
|
|
|
|
This is, for example, used by CacheFiles to copy data from a backing fs
|
|
page to a netfs page after the backing fs has read the page in.
|
|
|
|
If this option is used, op->fast_work and op->processor must be
|
|
initialised before submitting the operation:
|
|
|
|
INIT_WORK(&op->fast_work, do_some_work);
|
|
|
|
|
|
(3) The operation may be slow asynchronous (FSCACHE_OP_SLOW), in which case it
|
|
will be given to the slow work facility to process. Such an operation is
|
|
permitted to sleep on I/O.
|
|
|
|
This is, for example, used by FS-Cache to handle background writes of
|
|
pages that have just been fetched from a remote server.
|
|
|
|
If this option is used, op->slow_work and op->processor must be
|
|
initialised before submitting the operation:
|
|
|
|
fscache_operation_init_slow(op, processor)
|
|
|
|
|
|
Furthermore, operations may be one of two types:
|
|
|
|
(1) Exclusive (FSCACHE_OP_EXCLUSIVE). Operations of this type may not run in
|
|
conjunction with any other operation on the object being operated upon.
|
|
|
|
An example of this is the attribute change operation, in which the file
|
|
being written to may need truncation.
|
|
|
|
(2) Shareable. Operations of this type may be running simultaneously. It's
|
|
up to the operation implementation to prevent interference between other
|
|
operations running at the same time.
|
|
|
|
|
|
=========
|
|
PROCEDURE
|
|
=========
|
|
|
|
Operations are used through the following procedure:
|
|
|
|
(1) The submitting thread must allocate the operation and initialise it
|
|
itself. Normally this would be part of a more specific structure with the
|
|
generic op embedded within.
|
|
|
|
(2) The submitting thread must then submit the operation for processing using
|
|
one of the following two functions:
|
|
|
|
int fscache_submit_op(struct fscache_object *object,
|
|
struct fscache_operation *op);
|
|
|
|
int fscache_submit_exclusive_op(struct fscache_object *object,
|
|
struct fscache_operation *op);
|
|
|
|
The first function should be used to submit non-exclusive ops and the
|
|
second to submit exclusive ones. The caller must still set the
|
|
FSCACHE_OP_EXCLUSIVE flag.
|
|
|
|
If successful, both functions will assign the operation to the specified
|
|
object and return 0. -ENOBUFS will be returned if the object specified is
|
|
permanently unavailable.
|
|
|
|
The operation manager will defer operations on an object that is still
|
|
undergoing lookup or creation. The operation will also be deferred if an
|
|
operation of conflicting exclusivity is in progress on the object.
|
|
|
|
If the operation is asynchronous, the manager will retain a reference to
|
|
it, so the caller should put their reference to it by passing it to:
|
|
|
|
void fscache_put_operation(struct fscache_operation *op);
|
|
|
|
(3) If the submitting thread wants to do the work itself, and has marked the
|
|
operation with FSCACHE_OP_MYTHREAD, then it should monitor
|
|
FSCACHE_OP_WAITING as described above and check the state of the object if
|
|
necessary (the object might have died whilst the thread was waiting).
|
|
|
|
When it has finished doing its processing, it should call
|
|
fscache_op_complete() and fscache_put_operation() on it.
|
|
|
|
(4) The operation holds an effective lock upon the object, preventing other
|
|
exclusive ops conflicting until it is released. The operation can be
|
|
enqueued for further immediate asynchronous processing by adjusting the
|
|
CPU time provisioning option if necessary, eg:
|
|
|
|
op->flags &= ~FSCACHE_OP_TYPE;
|
|
op->flags |= ~FSCACHE_OP_FAST;
|
|
|
|
and calling:
|
|
|
|
void fscache_enqueue_operation(struct fscache_operation *op)
|
|
|
|
This can be used to allow other things to have use of the worker thread
|
|
pools.
|
|
|
|
|
|
=====================
|
|
ASYNCHRONOUS CALLBACK
|
|
=====================
|
|
|
|
When used in asynchronous mode, the worker thread pool will invoke the
|
|
processor method with a pointer to the operation. This should then get at the
|
|
container struct by using container_of():
|
|
|
|
static void fscache_write_op(struct fscache_operation *_op)
|
|
{
|
|
struct fscache_storage *op =
|
|
container_of(_op, struct fscache_storage, op);
|
|
...
|
|
}
|
|
|
|
The caller holds a reference on the operation, and will invoke
|
|
fscache_put_operation() when the processor function returns. The processor
|
|
function is at liberty to call fscache_enqueue_operation() or to take extra
|
|
references.
|