android_kernel_xiaomi_sm8350

Author	SHA1	Message	Date
Joel Becker	12462f1d9f	ocfs2: Add the 'inode64' mount option. Now that ocfs2 limits inode numbers to 32bits, add a mount option to disable the limit. This parallels XFS. 64bit systems can handle the larger inode numbers. [ Added description of inode64 mount option in ocfs2.txt. --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:08 -07:00
Joel Becker	1187c96885	ocfs2: Limit inode allocation to 32bits. ocfs2 inode numbers are block numbers. For any filesystem with less than 2^32 blocks, this is not a problem. However, when ocfs2 starts using JDB2, it will be able to support filesystems with more than 2^32 blocks. This would result in inode numbers higher than 2^32. The problem is that stat(2) can't handle those numbers on 32bit machines. The simple solution is to have ocfs2 allocate all inodes below that boundary. The suballoc code is changed to honor an optional block limit. Only the inode suballocator sets that limit - all other allocations stay unlimited. The biggest trick is to grow the inode suballocator beneath that limit. There's no point in allocating block groups that are above the limit, then rejecting their elements later on. We want to prevent the inode allocator from ever having block groups above the limit. This involves a little gyration with the local alloc code. If the local alloc window is above the limit, it signals the caller to try the global bitmap but does not disable the local alloc file (which can be used for other allocations). [ Minor cleanup - removed an ML_NOTICE comment. --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:07 -07:00
Tao Ma	08413899db	ocfs2: Resolve deadlock in ocfs2_xattr_free_block. In ocfs2_xattr_free_block, we take a cluster lock on xb_alloc_inode while we have a transaction open. This will deadlock the downconvert thread, so fix it. We can clean up how xattr blocks are removed while here - this patch also moves the mechanism of releasing xattr block (including both value, xattr tree and xattr block) into this function. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:06 -07:00
Tao Ma	28b8ca0b7f	ocfs2: bug-fix for journal extend in xattr. In ocfs2_extend_trans, when we can't extend the current transaction, it will commit current transaction and restart a new one. So if the previous credits we have allocated aren't used(the block isn't dirtied before our extend), we will not have enough credits for any future operation(it will cause jbd complain and bug out). So check this and re-extend it. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:06 -07:00
Joel Becker	8d6220d6a7	ocfs2: Change ocfs2_get__extent_tree() to ocfs2_init__extent_tree() The original get/put_extent_tree() functions held a reference on et_root_bh. However, every single caller already has a safe reference, making the get/put cycle irrelevant. We change ocfs2_get__extent_tree() to ocfs2_init__extent_tree(). It no longer gets a reference on et_root_bh. ocfs2_put_extent_tree() is removed. Callers now have a simpler init+use pattern. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:05 -07:00
Joel Becker	1625f8ac15	ocfs2: Comment struct ocfs2_extent_tree_operations. struct ocfs2_extent_tree_operations provides methods for the different on-disk btrees in ocfs2. Describing what those methods do is probably a good idea. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:05 -07:00
Joel Becker	f99b9b7ccf	ocfs2: Make ocfs2_extent_tree the first-class representation of a tree. We now have three different kinds of extent trees in ocfs2: inode data (dinode), extended attributes (xattr_tree), and extended attribute values (xattr_value). There is a nice abstraction for them, ocfs2_extent_tree, but it is hidden in alloc.c. All the calling functions have to pick amongst a varied API and pass in type bits and often extraneous pointers. A better way is to make ocfs2_extent_tree a first-class object. Everyone converts their object to an ocfs2_extent_tree() via the ocfs2_get_*_extent_tree() calls, then uses the ocfs2_extent_tree for all tree calls to alloc.c. This simplifies a lot of callers, making for readability. It also provides an easy way to add additional extent tree types, as they only need to be defined in alloc.c with a ocfs2_get_<new>_extent_tree() function. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:05 -07:00
Joel Becker	1e61ee79e2	ocfs2: Add an insertion check to ocfs2_extent_tree_operations. A couple places check an extent_tree for a valid inode. We move that out to add an eo_insert_check() operation. It can be called from ocfs2_insert_extent() and elsewhere. We also have the wrapper calls ocfs2_et_insert_check() and ocfs2_et_sanity_check() ignore NULL ops. That way we don't have to provide useless operations for xattr types. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:05 -07:00
Joel Becker	1a09f556e5	ocfs2: Create specific get_extent_tree functions. A caller knows what kind of extent tree they have. There's no reason they have to call ocfs2_get_extent_tree() with a NULL when they could just as easily call a specific function to their type of extent tree. Introduce ocfs2_dinode_get_extent_tree(), ocfs2_xattr_tree_get_extent_tree(), and ocfs2_xattr_value_get_extent_tree(). They only take the necessary arguments, calling into the underlying __ocfs2_get_extent_tree() to do the real work. __ocfs2_get_extent_tree() is the old ocfs2_get_extent_tree(), but without needing any switch-by-type logic. ocfs2_get_extent_tree() is now a wrapper around the specific calls. It exists because a couple alloc.c functions can take et_type. This will go later. Another benefit is that ocfs2_xattr_value_get_extent_tree() can take a struct ocfs2_xattr_value_root* instead of void*. This gives us typechecking where we didn't have it before. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:05 -07:00
Joel Becker	943cced39e	ocfs2: Determine an extent tree's max_leaf_clusters in an et_op. Provide an optional extent_tree_operation to specify the max_leaf_clusters of an ocfs2_extent_tree. If not provided, the value is 0 (unlimited). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:04 -07:00
Joel Becker	1c25d93a4a	ocfs2: Use struct ocfs2_extent_tree in ocfs2_num_free_extents(). ocfs2_num_free_extents() re-implements the logic of ocfs2_get_extent_tree(). Now that ocfs2_get_extent_tree() does not allocate, let's use it in ocfs2_num_free_extents() to simplify the code. The inode validation code in ocfs2_num_free_extents() is not needed. All callers are passing in pre-validated inodes. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:04 -07:00
Joel Becker	0ce1010f1a	ocfs2: Provide the get_root_el() method to ocfs2_extent_tree_operations. The root_el of an ocfs2_extent_tree needs to be calculated from et->et_object. Make it an operation on et->et_ops. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:04 -07:00
Joel Becker	ea5efa1512	ocfs2: Make 'private' into 'object' on ocfs2_extent_tree. The 'private' pointer was a way to store off xattr values, which don't live at a set place in the bh. But the concept of "the object containing the extent tree" is much more generic. For an inode it's the struct ocfs2_dinode, for an xattr value its the value. Let's save off the 'object' at all times. If NULL is passed to ocfs2_get_extent_tree(), 'object' is set to bh->b_data; Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:04 -07:00
Joel Becker	dc0ce61af4	ocfs2: Make ocfs2_extent_tree get/put instead of alloc. Rather than allocating a struct ocfs2_extent_tree, just put it on the stack. Fill it with ocfs2_get_extent_tree() and drop it with ocfs2_put_extent_tree(). Now the callers don't have to ENOMEM, yet still safely ref the root_bh. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:04 -07:00
Joel Becker	ce1d9ea621	ocfs2: Prefix the ocfs2_extent_tree structure. The members of the ocfs2_extent_tree structure gain a prefix of 'et_'. All users are updated. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:04 -07:00
Joel Becker	35dc0aa3c5	ocfs2: Prefix the extent tree operations structure. The ocfs2_extent_tree_operations structure gains a field prefix on its members. The ->eo_sanity_check() operation gains a wrapper function for completeness. All of the extent tree operation wrappers gain a consistent name (ocfs2_et_*()). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:04 -07:00
Mark Fasheh	ff1ec20ef6	ocfs2: fix printk format warnings This patch fixes the following build warnings: fs/ocfs2/xattr.c: In function 'ocfs2_half_xattr_bucket': fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int' fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int' fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int' fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int' fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int' fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int' fs/ocfs2/xattr.c: In function 'ocfs2_xattr_set_entry_in_bucket': fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t' fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t' fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t' Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:03 -07:00
Tiger Yang	8154da3d21	ocfs2: Add incompatible flag for extended attribute This patch adds the s_incompat flag for extended attribute support. This helps us ensure that older versions of Ocfs2 or ocfs2-tools will not be able to mount a volume with xattr support. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:03 -07:00
Tao Ma	a394425643	ocfs2: Delete all xattr buckets during inode removal In inode removal, we need to iterate all the buckets, remove any externally-stored EA values and delete the xattr buckets. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:03 -07:00
Tao Ma	012255961c	ocfs2: Enable xattr set in index btree Where the previous patches added the ability of list/get xattr in buckets for ocfs2, this patch enables ocfs2 to store large numbers of EAs. The original design doc is written by Mark Fasheh, and it can be found in http://oss.oracle.com/osswiki/OCFS2/DesignDocs/IndexedEATrees. I only had to make small modifications to it. First, because the bucket size is 4K, a new field named xh_free_start is added in ocfs2_xattr_header to indicate the next valid name/value offset in a bucket. It is used when we store new EA name/value. With this field, we can find the place more quickly and what's more, we don't need to sort the name/value every time to let the last entry indicate the next unused space. This makes the insert operation more efficient for blocksizes smaller than 4k. Because of the new xh_free_start, another field named as xh_name_value_len is also added in ocfs2_xattr_header. It records the total length of all the name/values in the bucket. We need this so that we can check it and defragment the bucket if there is not enough contiguous free space. An xattr insertion looks like this: 1. xattr_index_block_find: find the right bucket by the name_hash, say bucketA. 2. check whether there is enough space in bucketA. If yes, insert it directly and modify xh_free_start and xh_name_value_len accordingly. If not, check xh_name_value_len to see whether we can store this by defragment the bucket. If yes, defragment it and go on insertion. 3. If defragement doesn't work, check whether there is new empty bucket in the clusters within this extent record. If yes, init the new bucket and move all the buckets after bucketA one by one to the next bucket. Move half of the entries in bucketA to the next bucket and go on insertion. 4. If there is no new bucket, grow the extent tree. As for xattr deletion, we will delete an xattr bucket when all it's xattrs are removed and move all the buckets after it to the previous one. When all the xattr buckets in an extend record are freed, free this extend records from ocfs2_xattr_tree. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:03 -07:00
Tao Ma	ca12b7c489	ocfs2: Optionally limit extent size in ocfs2_insert_extent() In xattr bucket, we want to limit the maximum size of a btree leaf, otherwise we'll lose the benefits of hashing because we'll have to search large leaves. So add a new field in ocfs2_extent_tree which indicates the maximum leaf cluster size we want so that we can prevent ocfs2_insert_extent() from merging the leaf record even if it is contiguous with an adjacent record. Other btree types are not affected by this change. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:03 -07:00
Tao Ma	589dc2602f	ocfs2: Add xattr lookup code xattr btrees Add code to lookup a given extended attribute in the xattr btree. Lookup follows this general scheme: 1. Use ocfs2_xattr_get_rec to find the xattr extent record 2. Find the xattr bucket within the extent which may contain this xattr 3. Iterate the bucket to find the xattr. In ocfs2_xattr_block_get(), we need to recalcuate the block offset and name offset for the right position of name/value. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:03 -07:00
Tao Ma	0c044f0b24	ocfs2: Add xattr bucket iteration for large numbers of EAs Ocfs2 breaks up xattr index tree leaves into 4k regions, called buckets. Attributes are stored within a given bucket, depending on hash value. After a discussion with Mark, we decided that the per-bucket index (xe_entry[]) would only exist in the 1st block of a bucket. Likewise, name/value pairs will not straddle more than one block. This allows the majority of operations to work directly on the buffer heads in a leaf block. This patch adds code to iterate the buckets in an EA. A new abstration of ocfs2_xattr_bucket is added. It records the bhs in this bucket and ocfs2_xattr_header. This keeps the code neat, improving readibility. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:03 -07:00
Tao Ma	ba492615f0	ocfs2: Add xattr index tree operations When necessary, an ocfs2_xattr_block will embed an ocfs2_extent_list to store large numbers of EAs. This patch adds a new type in ocfs2_extent_tree_type and adds the implementation so that we can re-use the b-tree code to handle the storage of many EAs. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:02 -07:00
Tiger Yang	cf1d6c763f	ocfs2: Add extended attribute support This patch implements storing extended attributes both in inode or a single external block. We only store EA's in-inode when blocksize > 512 or that inode block has free space for it. When an EA's value is larger than 80 bytes, we will store the value via b-tree outside inode or block. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:02 -07:00
Tiger Yang	fdd77704a8	ocfs2: reserve inline space for extended attribute Add the structures and helper functions we want for handling inline extended attributes. We also update the inline-data handlers so that they properly function in the event that we have both inline data and inline attributes sharing an inode block. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:02 -07:00
Tao Ma	f56654c435	ocfs2: Add extent tree operation for xattr value btrees Add some thin wrappers around ocfs2_insert_extent() for each of the 3 different btree types, ocfs2_inode_insert_extent(), ocfs2_xattr_value_insert_extent() and ocfs2_xattr_tree_insert_extent(). The last is for the xattr index btree, which will be used in a followup patch. All the old callers in file.c etc will call ocfs2_dinode_insert_extent(), while the other two handle the xattr issue. And the init of extent tree are handled by these functions. When storing xattr value which is too large, we will allocate some clusters for it and here ocfs2_extent_list and ocfs2_extent_rec will also be used. In order to re-use the b-tree operation code, a new parameter named "private" is added into ocfs2_extent_tree and it is used to indicate the root of ocfs2_exent_list. The reason is that we can't deduce the root from the buffer_head now. It may be in an inode, an ocfs2_xattr_block or even worse, in any place in an ocfs2_xattr_bucket. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:01 -07:00
Tao Ma	ac11c82719	ocfs2: Add helper function in uptodate.c for removing xattr clusters The old uptodate only handles the issue of removing one buffer_head from ocfs2 inode's buffer cache. With xattr clusters, we may need to remove multiple buffer_head's at a time. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 13:57:59 -07:00
Tao Ma	5a7bc8eb29	ocfs2: Add the basic xattr disk layout in ocfs2_fs.h Ocfs2 uses a very flexible structure for storing extended attributes on disk. Small amount of attributes are stored directly in the inode block - up to 256 bytes worth. If that fills up, attributes are also stored in an external block, linked to from the inode block. That block can in turn expand to a btree, capable of storing large numbers of attributes. Individual attribute values are stored inline if they're small enough (currently about 80 bytes, this can be changed though), and otherwise are expanded to a btree. The theoretical limit to the size of an individual attribute is about the same as an inode, though the kernel's upper bound on the size of an attributes data is far smaller. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 13:57:59 -07:00
Tao Ma	0eb8d47e69	ocfs2: Make high level btree extend code generic Factor out the non-inode specifics of ocfs2_do_extend_allocation() into a more generic function, ocfs2_do_cluster_allocation(). ocfs2_do_extend_allocation calls ocfs2_do_cluster_allocation() now, but the latter can be used for other btree types as well. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 13:57:59 -07:00
Tao Ma	e7d4cb6bc1	ocfs2: Abstract ocfs2_extent_tree in b-tree operations. In the old extent tree operation, we take the hypothesis that we are using the ocfs2_extent_list in ocfs2_dinode as the tree root. As xattr will also use ocfs2_extent_list to store large value for a xattr entry, we refactor the tree operation so that xattr can use it directly. The refactoring includes 4 steps: 1. Abstract set/get of last_eb_blk and update_clusters since they may be stored in different location for dinode and xattr. 2. Add a new structure named ocfs2_extent_tree to indicate the extent tree the operation will work on. 3. Remove all the use of fe_bh and di, use root_bh and root_el in extent tree instead. So now all the fe_bh is replaced with et->root_bh, el with root_el accordingly. 4. Make ocfs2_lock_allocators generic. Now it is limited to be only used in file extend allocation. But the whole function is useful when we want to store large EAs. Note: This patch doesn't touch ocfs2_commit_truncate() since it is not used for anything other than truncate inode data btrees. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 13:57:58 -07:00
Tao Ma	811f933df1	ocfs2: Use ocfs2_extent_list instead of ocfs2_dinode. ocfs2_extend_meta_needed(), ocfs2_calc_extend_credits() and ocfs2_reserve_new_metadata() are all useful for extent tree operations. But they are all limited to an inode btree because they use a struct ocfs2_dinode parameter. Change their parameter to struct ocfs2_extent_list (the part of an ocfs2_dinode they actually use) so that the xattr btree code can use these functions. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 13:57:58 -07:00
Tao Ma	231b87d109	ocfs2: Modify ocfs2_num_free_extents for future xattr usage. ocfs2_num_free_extents() is used to find the number of free extent records in an inode btree. Hence, it takes an "ocfs2_dinode" parameter. We want to use this for extended attribute trees in the future, so genericize the interface the take a buffer head. A future patch will allow that buffer_head to contain any structure rooting an ocfs2 btree. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 13:57:58 -07:00
Mark Fasheh	9a8ff578fb	ocfs2: track local alloc state via debugfs A per-mount debugfs file, "local_alloc" is created which when read will expose live state of the nodes local alloc file. Performance impact is minimal, only a bit of memory overhead per mount point. Still, the code is hidden behind CONFIG_OCFS2_FS_STATS. This feature will help us debug local alloc performance problems on a live system. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 13:57:58 -07:00
Mark Fasheh	9c7af40b21	ocfs2: throttle back local alloc when low on disk space Ocfs2's local allocator disables itself for the duration of a mount point when it has trouble allocating a large enough area from the primary bitmap. That can cause performance problems, especially for disks which were only temporarily full or fragmented. This patch allows for the allocator to shrink it's window first, before being disabled. Later, it can also be re-enabled so that any performance drop is minimized. To do this, we allow the value of osb->local_alloc_bits to be shrunk when needed. The default value is recorded in a mostly read-only variable so that we can re-initialize when required. Locking had to be updated so that we could protect changes to local_alloc_bits. Mostly this involves protecting various local alloc values with the osb spinlock. A new state is also added, OCFS2_LA_THROTTLED, which is used when the local allocator is has shrunk, but is not disabled. If the available space dips below 1 megabyte, the local alloc file is disabled. In either case, local alloc is re-enabled 30 seconds after the event, or when an appropriate amount of bits is seen in the primary bitmap. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 13:57:57 -07:00
Mark Fasheh	ebcee4b5c9	ocfs2: Track local alloc bits internally Do this instead of tracking absolute local alloc size. This avoids needless re-calculatiion of bits from bytes in localalloc.c. Additionally, the value is now in a more natural unit for internal file system bitmap work. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 13:57:57 -07:00
Mark Fasheh	53da4939f3	ocfs2: POSIX file locks support This is actually pretty easy since fs/dlm already handles the bulk of the work. The Ocfs2 userspace cluster stack module already uses fs/dlm as the underlying lock manager, so I only had to add the right calls. Cluster-aware POSIX locks ("plocks") can be turned off by the same means at UNIX locks - mount with 'noflocks', or create a local-only Ocfs2 volume. Internally, the file system uses two sets of file_operations, depending on whether cluster aware plocks is required. This turns out to be easier than implementing local-only versions of ->lock. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 13:57:57 -07:00
Steven Whitehouse	a447c09324	vfs: Use const for kernel parser table This is a much better version of a previous patch to make the parser tables constant. Rather than changing the typedef, we put the "const" in all the various places where its required, allowing the __initconst exception for nfsroot which was the cause of the previous trouble. This was posted for review some time ago and I believe its been in -mm since then. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Alexander Viro <aviro@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-13 10:10:37 -07:00
Mark Fasheh	00dc417fa3	ocfs2: fiemap support Plug ocfs2 into ->fiemap. Some portions of ocfs2_get_clusters() had to be refactored so that the extent cache can be skipped in favor of going directly to the on-disk records. This makes it easier for us to determine which extent is the last one in the btree. Also, I'm not sure we want to be caching fiemap lookups anyway as they're not directly related to data read/write. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: ocfs2-devel@oss.oracle.com Cc: linux-fsdevel@vger.kernel.org	2008-10-03 17:32:11 -04:00
Tao Ma	0e116227a0	ocfs2: Fix a bug in direct IO read. ocfs2 will become read-only if we try to read the bytes which pass the end of i_size. This can be easily reproduced by following steps: 1. mkfs a ocfs2 volume with bs=4k cs=4k and nosparse. 2. create a small file(say less than 100 bytes) and we will create the file which is allocated 1 cluster. 3. read 8196 bytes from the kernel using O_DIRECT which exceeds the limit. 4. The ocfs2 volume becomes read-only and dmesg shows: OCFS2: ERROR (device sda13): ocfs2_direct_IO_get_blocks: Inode 66010 has a hole at block 1 File system is now read-only due to the potential of on-disk corruption. Please run fsck.ocfs2 once the file system is unmounted. So suppress the ERROR message. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-09-10 01:44:08 -07:00
Joel Becker	d6817cdbd1	ocfs2: Increment the reference count of an already-active stack. The ocfs2_stack_driver_request() function failed to increment the refcount of an already-active stack. It only did the increment on the first reference. Whoops. Signed-off-by: Joel Becker <joel.becker@oracle.com> Tested-by: Marcos Matsunaga <marcos.matsunaga@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-08-25 07:29:47 -07:00
Mark Fasheh	9780eb6cfa	ocfs2: correctly set i_blocks after inline dir gets expanded We were setting i_blocks based on allocation before the extent insert, which is wrong as the value is a calculation based on ip_clusters which gets updated as a result of the insert. This patch moves the line in question to just after the call to ocfs2_insert_extent(). Without this fix, inline directories were temporarily having an i_blocks value of zero immediately after expansion to extents. Reported-and-tested-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-08-22 11:09:02 -07:00
Tao Ma	83cab5338f	ocfs2: Jump to correct label in ocfs2_expand_inline_dir() When we fail to insert extent in ocfs2_expand_inline_dir(), we should go to out_commit, not out. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-08-22 11:09:02 -07:00
Mark Fasheh	a1af7d15a1	ocfs2: Fix sleep-with-spinlock recovery regression This fixes a bug introduced with `539d826409`: [PATCH 2/2] ocfs2: Fix race between mount and recovery ocfs2_mark_dead_nodes() was reading journal inodes while holding the spinlock protecting our in-memory recovery state. The fix is very simple - the disk state is protected by a cluster lock that's already held, so we just move the spinlock down past the read. Reviewed-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-08-22 11:08:38 -07:00
Alexander Beregalov	a57a874b04	[PATCH] ocfs2/cluster/netdebug.c: fix warning ocfs2/cluster/netdebug.c: fix warning fs/ocfs2/cluster/netdebug.c:154: warning: format '%lu' expects type 'long unsigned int', but argument 17 has type 'suseconds_t' Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-08-22 10:56:57 -07:00
Adrian Bunk	18496e80f7	[PATCH] ocfs2/cluster/tcp.c: make some functions static Commit `0f475b2abe` (ocfs2/net: Silence build warnings) made sense as far as it fixed compile warnings, but it was not required that it made the functions global. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-08-22 10:56:40 -07:00
Julia Lawall	c259ae52e2	[PATCH] ocfs2: Release mutex in error handling code The mutex is released on a successful return, so it would seem that it should be released on an error return as well. The semantic patch finds this problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ expression l; @@ mutex_lock(l); ... when != mutex_unlock(l) when any when strict ( if (...) { ... when != mutex_unlock(l) + mutex_unlock(l); return ...; } \| mutex_unlock(l); ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-07-31 16:21:14 -07:00
Sunil Mushran	961cecbee6	[PATCH] ocfs2: Fix oops when racing files truncates with writes into an mmap region This patch fixes an oops that is reproduced when one races writes to a mmap-ed region with another process truncating the file. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-07-31 16:21:14 -07:00
Sunil Mushran	539d826409	[PATCH 2/2] ocfs2: Fix race between mount and recovery As the fs recovery is asynchronous, there is a small chance that another node can mount (and thus recover) the slot before the recovery thread gets to it. If this happens, the recovery thread will block indefinitely on the journal/slot lock as that lock will be held for the duration of the mount (by design) by the node assigned to that slot. The solution implemented is to keep track of the journal replays using a recovery generation in the journal inode, which will be incremented by the thread replaying that journal. The recovery thread, before attempting the blocking lock on the journal/slot lock, will compare the generation on disk with what it has cached and skip recovery if it does not match. This bug appears to have been inadvertently introduced during the mount/umount vote removal by mainline commit `34d024f843`. In the mount voting scheme, the messaging would indirectly indicate that the slot was being recovered. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-07-31 16:21:14 -07:00
Sunil Mushran	c69991aac7	[PATCH 1/2] ocfs2: Add counter in struct ocfs2_dinode to track journal replays This patch renames the ij_pad to ij_recovery_generation in struct ocfs2_dinode. This will be used to keep count of journal replays after an unclean shutdown. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-07-31 16:21:13 -07:00

1 2 3 4 5 ...

640 Commits