On huge machines, delayed allocation may try to allocate massive extents.
This change allows btrfs_alloc_extent to return something smaller than
the caller asked for, and the data allocation routines will loop over
the allocations until it fills the whole delayed alloc.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This adds basic O_DIRECT read and write support. In the write case, we
just do a normal buffered write followed by a cache flush. O_DIRECT +
O_SYNC are required to trigger metadata syncs.
In the read case, there is a basic btrfs_get_block call for use by
the generic O_DIRECT code. This does honor multi-volume mapping rules
but it skips all checksumming.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Before it was done by the bio end_io routine, the work queue code is able
to scale much better with faster IO subsystems.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When we checkum file data during writepage, the checksumming is done one
page at a time, making it difficult to do bulk metadata modifications
to insert checksums for large ranges of the file at once.
This patch changes btrfs to checksum on a per-bio basis instead. The
bios are checksummed before they are handed off to the block layer, so
each bio is contiguous and only has pages from the same inode.
Checksumming on a bio basis allows us to insert and modify the file
checksum items in large groups. It also allows the checksumming to
be done more easily by async worker threads.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Now that delayed allocation accounting works, i_blocks accounting is changed
to only modify i_blocks when extents inserted or removed.
The fillattr call is changed to include the delayed allocation byte count
in the i_blocks result.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The end_bio routines are changed to take a pointer to the extent state
struct, and the state tree is walked in order to set/clear appropriate
bits as IO completes. This greatly reduces the number of rbtree searches
done by the end_bio handlers, and reduces lock contention.
The extent_io releasepage function is changed to avoid expensive searches
for locked state.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
There is now extent_map for mapping offsets in the file to disk and
extent_io for state tracking, IO submission and extent_bufers.
The new extent_map code shifts from [start,end] pairs to [start,len], and
pushes the locking out into the caller. This allows a few performance
optimizations and is easier to use.
A number of extent_map usage bugs were fixed, mostly with failing
to remove extent_map entries when changing the file.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
There were a few places that could cause duplicate extent insertion,
this adjusts the code that creates holes to avoid it.
lookup_extent_map is changed to correctly return all of the extents in a
range, even when there are none matching at the start of the range.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This patch adds readonly inode flag support. A file with this flag
can't be modified, but can be deleted.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
While shrinking the FS, the allocation functions need to make sure
they don't try to allocate bytes past the end of the FS.
nodatacow needed an extra check to force cows when the existing extents are
past the end of the FS.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
It is very difficult to create a consistent snapshot of the btree when
other writers may update the btree before the commit is done.
This changes the snapshot creation to happen during the commit, while
no other updates are possible.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This forces file data extents down the disk along with the metadata that
references them. The current implementation is fairly simple, and just
writes out all of the dirty pages in an inode before the commit.
Signed-off-by: Chris Mason <chris.mason@oracle.com>