When doing a migrate from ext3 to ext4 inode we need to make sure the test
for inode type and walking inode data happens inside lock. To make this
happen move truncate_mutex early before checking the i_flags.
This actually should enable us to remove the verify_chain().
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
The unused code found in ext3_find_entry() is also present (and still
unused) in the ext4_find_entry() code. This patch removes it.
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Before we start committing a transaction, we call
__journal_clean_checkpoint_list() to cleanup transaction's written-back
buffers.
If this call happens to remove all of them (and there were already some
buffers), __journal_remove_checkpoint() will decide to free the transaction
because it isn't (yet) a committing transaction and soon we fail some
assertion - the transaction really isn't ready to be freed :).
We change the check in __journal_remove_checkpoint() to free only a
transaction in T_FINISHED state. The locking there is subtle though (as
everywhere in JBD ;(). We use j_list_lock to protect the check and a
subsequent call to __journal_drop_transaction() and do the same in the end
of journal_commit_transaction() which is the only place where a transaction
can get to T_FINISHED state.
Probably I'm too paranoid here and such locking is not really necessary -
checkpoint lists are processed only from log_do_checkpoint() where a
transaction must be already committed to be processed or from
__journal_clean_checkpoint_list() where kjournald itself calls it and thus
transaction cannot change state either. Better be safe if something
changes in future...
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When a new block bitmap is read from disk in read_block_bitmap()
there are a few bits that should ALWAYS be set. In particular,
the blocks given corresponding to block bitmap, inode bitmap and inode tables.
Validate the block bitmap against these blocks.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Add buffer head related helper function bh_uptodate_or_lock and
bh_submit_read which can be used by file system
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
ext4 file system was by default ignoring errors and continuing. This
is not a good default as continuing on error could lead to file system
corruption. Change the default to mark the file system
readonly. Debian and ubuntu already does this as the default in their
fstab.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
When mounting an ext4 filesystem with corrupted s_first_data_block, things
can go very wrong and oops.
Because blocks_count in ext4_fill_super is a u64, and we must use do_div,
the calculation of db_count is done differently than on ext4. If
first_data_block is corrupted such that it is larger than ext4_blocks_count,
for example, then the intermediate blocks_count value may go negative,
but sign-extend to a very large value:
blocks_count = (ext4_blocks_count(es) -
le32_to_cpu(es->s_first_data_block) +
EXT4_BLOCKS_PER_GROUP(sb) - 1);
This is then assigned to s_groups_count which is an unsigned long:
sbi->s_groups_count = blocks_count;
This may result in a value of 0xFFFFFFFF which is then used to compute
db_count:
db_count = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) /
EXT4_DESC_PER_BLOCK(sb);
and in this case db_count will wind up as 0 because the addition overflows
32 bits. This in turn causes the kmalloc for group_desc to be of 0 size:
sbi->s_group_desc = kmalloc(db_count * sizeof (struct buffer_head *),
GFP_KERNEL);
and eventually in ext4_check_descriptors, dereferencing
sbi->s_group_desc[desc_block] will result in a NULL pointer dereference.
The simplest test seems to be to sanity check s_first_data_block,
EXT4_BLOCKS_PER_GROUP, and ext4_blocks_count values to be sure
their combination won't result in a bad intermediate value for
blocks_count. We could just check for db_count == 0, but
catching it at the root cause seems like it provides more info.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mingming Cao <cmm@us.ibm.com>
This fix some instances where we were continuing after calling
ext4_error. ext4_error call panic only if errors=panic mount option is
set. So we need to make sure we return correctly after ext4_error call
Reported by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
This patch extends bg_itable_unused of ext4 group descriptor
from 16bit into 32bit. In order to add bg_itable_unused_hi into
struct ext4_group_desc, some extra fields which are already introduced into
e2fsprogs are also added in for consistency.
Signed-off-by: Coly Li <coyli@suse.de>
Cc: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
The max file size for ext3 file system is now calculated
with hardcoded 4K block size. The patch fixes it to be
calculated with the right block size.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
The max file size for ext2 file system is now calculated
with hardcoded 4K block size. The patch fixes it to be
calculated with the right block size.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Calculate & store the max offset for bitmapped files, and
catch too-large seeks, truncates, and writes in ext4, shortening
or rejecting as appropriate.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
This patch converts ext4_inode i_blocks to represent total
blocks occupied by the inode in file system block size.
Earlier the variable used to represent this in 512 byte
block size. This actually limited the total size of the file.
The feature is enabled transparently when we write an inode
whose i_blocks cannot be represnted as 512 byte units in a
48 bit variable.
inode flag EXT4_HUGE_FILE_FL
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Use the __le16 l_i_reserved1 field of the linux2 struct of ext4_inode
to represet the higher 16 bits for i_blocks. With this change max_file
size becomes (2**48 -1 )* 512 bytes.
We add a RO_COMPAT feature to the super block to indicate that inode
have i_blocks represented as a split 48 bits. Super block with this
feature set cannot be mounted read write on a kernel with CONFIG_LSF
disabled.
Super block flag EXT4_FEATURE_RO_COMPAT_HUGE_FILE
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Rename ext4_inode.i_dir_acl to i_size_high
drop ext4_inode_info.i_dir_acl as it is not used
Rename ext4_inode.i_size to ext4_inode.i_size_lo
Add helper function for accessing the ext4_inode combined i_size.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Rename i_file_acl to i_file_acl_lo. This helps
in finding bugs where we use i_file_acl instead
of the combined i_file_acl_lo and i_file_acl_high
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
This patch fixes various places where the group number is set to a negative
value.
Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In many places variables for block group are of type int, which limits the
maximum number of block groups to 2^31. Each block group can have up to
2^15 blocks, with a 4K block size, and the max filesystem size is limited to
2^31 * (2^15 * 2^12) = 2^58 -- or 256 PB
This patch introduces a new type ext4_group_t, of type unsigned long, to
represent block group numbers in ext4.
All occurrences of block group variables are converted to type ext4_group_t.
Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
There are many casts in extents.c which are not needed,
as the variables are already the type of the cast, or
are being promoted for no particular reason in printk's.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
This patch adds a new data type ext4_lblk_t to represent
the logical file blocks.
This is the preparatory patch to support large files in ext4
The follow up patch with convert the ext4_inode i_blocks to
represent the number of blocks in file system block size. This
changes makes it possible to have a block number 2**32 -1 which
will result in overflow if the block number is represented by
signed long. This patch convert all the block number to type
ext4_lblk_t which is typedef to __u32
Also remove dead code ext4_ext_walk_space
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
With 64KB blocksize, a directory entry can have size 64KB which does not fit
into 16 bits we have for entry lenght. So we store 0xffff instead and convert
value when read from / written to disk. The patch also converts some places
to use ext4_next_entry() when we are changing them anyway.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
This patch set supports large block size(>4k, <=64k) in ext4,
just enlarging the block size limit. But it is NOT possible to have 64kB
blocksize on ext4 without some changes to the directory handling
code. The reason is that an empty 64kB directory block would have a
rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in
the filesystem. The proposed solution is treat 64k rec_len
with a an impossible value like rec_len = 0xffff to handle this.
The Patch-set consists of the following 2 patches.
[1/2] ext4: enlarge blocksize
- Allow blocksize up to pagesize
[2/2] ext4: fix rec_len overflow
- prevent rec_len from overflow with 64KB blocksize
Now on 64k page ppc64 box runs with this patch set we could create a 64k
block size ext4dev, and able to handle empty directory block.
Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
The userspace ABI of ocfs2's internal cluster stack (o2cb) was broken by
commit c60b717879 "kset: convert ocfs2 to
use kset_create". Specifically, the '/sys/o2cb' kset was moved to
'/sys/fs/o2cb'. This breaks all ocfs2 tools and renders the
filesystem unmountable.
This fix moves '/sys/o2cb' back where it belongs.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
If an entry is being deleted because it has only one reference,
we immediately delete it and blindly register the rcu handler for it,
This results in oops by double freeing that object.
This patch fixes it by consolidating the code paths for the deletion;
let its rcu handler delete the object if it has no more reference.
Bug was found by Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a bunch of warnings in PPP and related drivers. Mostly because
sparse doesn't like it when the the function is only marked private in
the forward declaration.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since fib_route_seq_show now uses hlist_for_each_entry(), the leaf
info can not be NULL.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use nla_nest_start/nla_nest_end for dumping nested attributes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
nla_parse() returns more detailed errno codes, propagate them back on
error.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert open-coded nlmsg_parse to use the real function.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix two invalid attribute accesses, indices start at 1 with the new
netlink API.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace open coded equivalent of nla_parse_nested_compat().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix format string warning introduces by the netlink API conversion:
net/sched/sch_atm.c:250: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'int'.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
All needed API is done, the namespace is available when required from
the device on the DST entry from the incoming packet. So, just replace
init_net with proper namespace.
Other protocols will follow.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>