This patch add EXPORT_SYMBOL(filemap_write_and_wait) and use it.
See mm/filemap.c:
And changes the filemap_write_and_wait() and filemap_write_and_wait_range().
Current filemap_write_and_wait() doesn't wait if filemap_fdatawrite()
returns error. However, even if filemap_fdatawrite() returned an
error, it may have submitted the partially data pages to the device.
(e.g. in the case of -ENOSPC)
<quotation>
Andrew Morton writes,
If filemap_fdatawrite() returns an error, this might be due to some
I/O problem: dead disk, unplugged cable, etc. Given the generally
crappy quality of the kernel's handling of such exceptions, there's a
good chance that the filemap_fdatawait() will get stuck in D state
forever.
</quotation>
So, this patch doesn't wait if filemap_fdatawrite() returns the -EIO.
Trond, could you please review the nfs part? Especially I'm not sure,
nfs must use the "filemap_fdatawrite(inode->i_mapping) == 0", or not.
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This exports/changes the sync_page_range/_nolock(). The fatfs needs
sync_page_range/_nolock() for expanding truncate, and changes "size_t count"
to "loff_t count".
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As find_lock_page() already checks with TestSetPageLocked() that page is
locked, there is no need to call lock_page() that will try-lock page again
(chances of page being unlocked in between are small). Call __lock_page()
directly, this saves one atomic operation.
Also, mark truncate-while-slept path as unlikely while we are here.
(akpm: ug. But this is actually a common path for normal old read()s against
a page which is under readahead I/O so ho-hum.)
Signed-off-by: Nikita Danilov <danilov@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
readpage(), prepare_write(), and commit_write() callers are updated to
understand the special return code AOP_TRUNCATED_PAGE in the style of
writepage() and WRITEPAGE_ACTIVATE. AOP_TRUNCATED_PAGE tells the caller that
the callee has unlocked the page and that the operation should be tried again
with a new page. OCFS2 uses this to detect and work around a lock inversion in
its aop methods. There should be no change in behaviour for methods that don't
return AOP_TRUNCATED_PAGE.
WRITEPAGE_ACTIVATE is also prepended with AOP_ for consistency and they are
made enums so that kerneldoc can be used to document their semantics.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Has been introduced for x86-64 at some point to save memory
in struct page, but has been obsolete for some time. Just
remove it.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When __generic_file_aio_read() hits an error during reading, it reports the
error iff nothing has successfully been read yet. This is condition - when
an error occurs, if nothing has been read/written, report the error code;
otherwise, report the amount of bytes successfully transferred upto that
point.
This corner case can be exposed by performing readv(2) with the following
iov.
iov[0] = len0 @ ptr0
iov[1] = len1 @ NULL (or any other invalid pointer)
iov[2] = len2 @ ptr2
When file size is enough, performing above readv(2) results in
len0 bytes from file_pos @ ptr0
len2 bytes from file_pos + len0 @ ptr2
And the return value is len0 + len2. Test program is attached to this
mail.
This patch makes __generic_file_aio_read()'s error handling identical to
other functions.
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/uio.h>
#include <errno.h>
#include <string.h>
int main(int argc, char **argv)
{
const char *path;
struct stat stbuf;
size_t len0, len1;
void *buf0, *buf1;
struct iovec iov[3];
int fd, i;
ssize_t ret;
if (argc < 2) {
fprintf(stderr, "Usage: testreadv path (better be a "
"small text file)\n");
return 1;
}
path = argv[1];
if (stat(path, &stbuf) < 0) {
perror("stat");
return 1;
}
len0 = stbuf.st_size / 2;
len1 = stbuf.st_size - len0;
if (!len0 || !len1) {
fprintf(stderr, "Dude, file is too small\n");
return 1;
}
if ((fd = open(path, O_RDONLY)) < 0) {
perror("open");
return 1;
}
if (!(buf0 = malloc(len0)) || !(buf1 = malloc(len1))) {
perror("malloc");
return 1;
}
memset(buf0, 0, len0);
memset(buf1, 0, len1);
iov[0].iov_base = buf0;
iov[0].iov_len = len0;
iov[1].iov_base = NULL;
iov[1].iov_len = len1;
iov[2].iov_base = buf1;
iov[2].iov_len = len1;
printf("vector ");
for (i = 0; i < 3; i++)
printf("%p:%zu ", iov[i].iov_base, iov[i].iov_len);
printf("\n");
ret = readv(fd, iov, 3);
if (ret < 0)
perror("readv");
printf("readv returned %zd\nbuf0 = [%s]\nbuf1 = [%s]\n",
ret, (char *)buf0, (char *)buf1);
return 0;
}
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
move EXPORT_SYMBOL(filemap_populate) to the proper place: just after
function itself: it's easy to miss that function is exported otherwise.
Signed-off-by: Nikita Danilov <nikita@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Updated several references to page_table_lock in common code comments.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
a many-threaded application which concurrently initializes different parts of
a large anonymous area.
This patch corrects that, by using a separate spinlock per page table page, to
guard the page table entries in that page, instead of using the mm's single
page_table_lock. (But even then, page_table_lock is still used to guard page
table allocation, and anon_vma allocation.)
In this implementation, the spinlock is tucked inside the struct page of the
page table page: with a BUILD_BUG_ON in case it overflows - which it would in
the case of 32-bit PA-RISC with spinlock debugging enabled.
Splitting the lock is not quite for free: another cacheline access. Ideally,
I suppose we would use split ptlock only for multi-threaded processes on
multi-cpu machines; but deciding that dynamically would have its own costs.
So for now enable it by config, at some number of cpus - since the Kconfig
language doesn't support inequalities, let preprocessor compare that with
NR_CPUS. But I don't think it's worth being user-configurable: for good
testing of both split and unsplit configs, split now at 4 cpus, and perhaps
change that to 8 later.
There is a benefit even for singly threaded processes: kswapd can be attacking
one part of the mm while another part is busy faulting.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Impose a little more consistency on the page fault handlers do_wp_page,
do_swap_page, do_anonymous_page, do_no_page, do_file_page: why not pass their
arguments in the same order, called the same names?
break_cow is all very well, but what it did was inlined elsewhere: easier to
compare if it's brought back into do_wp_page.
do_file_page's fallback to do_no_page dates from a time when we were testing
pte_file by using it wherever possible: currently it's peculiar to nonlinear
vmas, so just check that. BUG_ON if not? Better not, it's probably page
table corruption, so just show the pte: hmm, there's a pte_ERROR macro, let's
use that for do_wp_page's invalid pfn too.
Hah! Someone in the ppc64 world noticed pte_ERROR was unused so removed it:
restored (and say "pud" not "pmd" in its pud_ERROR).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With Nick Piggin <npiggin@suse.de>
Give some things static scope.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Either shmem_getpage returns a failure, or it found a page, or it was told
it couldn't do any I/O. So it's useless to check nonblock in the else
branch. We could add a BUG() there but I preferred to comment the
offending function.
This was taken out from one Ingo Molnar's old patch I'm resurrecting.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The idea of a swap_device_lock per device, and a swap_list_lock over them all,
is appealing; but in practice almost every holder of swap_device_lock must
already hold swap_list_lock, which defeats the purpose of the split.
The only exceptions have been swap_duplicate, valid_swaphandles and an
untrodden path in try_to_unuse (plus a few places added in this series).
valid_swaphandles doesn't show up high in profiles, but swap_duplicate does
demand attention. However, with the hold time in get_swap_pages so much
reduced, I've not yet found a load and set of swap device priorities to show
even swap_duplicate benefitting from the split. Certainly the split is mere
overhead in the common case of a single swap device.
So, replace swap_list_lock and swap_device_lock by spinlock_t swap_lock
(generally we seem to prefer an _ in the name, and not hide in a macro).
If someone can show a regression in swap_duplicate, then probably we should
add a hashlock for the swap_map entries alone (shorts being anatomic), so as
to help the case of the single swap device too.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here is the fix for the problem described in
http://bugzilla.kernel.org/show_bug.cgi?id=4721
Basically, problem is generic_file_buffered_write() is accessing beyond end
of the iov[] vector after handling the last vector. If we happen to cross
page boundary, we get a fault.
I think this simple patch is good enough. If we really don't want to
depend on the "count", then we need pass nr_segs to
filemap_set_next_iovec() and decrement it and check it.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a bug on error handling in the direct I/O function.
Currently, if a file is opened with the O_DIRECT|O_SYNC flag, the write()
syscall cannot receive the EIO error after an I/O error (SCSI cable is
disconnected etc.).
Return values of other points that call generic_osync_inode() are treated
appropriately.
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- generic_file* file operations do no longer have a xip/non-xip split
- filemap_xip.c implements a new set of fops that require get_xip_page
aop to work proper. all new fops are exported GPL-only (don't like to
see whatever code use those except GPL modules)
- __xip_unmap now uses page_check_address, which is no longer static
in rmap.c, and defined in linux/rmap.h
- mm/filemap.h is now much more clean, plainly having just Linus'
inline funcs moved here from filemap.c
- fix includes in filemap_xip to make it build cleanly on i386
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patch removes the f_error field and all checks of f_error.
Trond said:
f_error was introduced for NFS, and made sense when we were guaranteed
always to have a file pointer around when write errors occurred. Since
then, we have (for various reasons) had to introduce the nfs_open_context in
order to track the file read/write state, and it made sense to move our
f_error tracking there too.
Signed-off-by: Christoph Lameter <christoph@lameter.com>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fault_in_pages_readable() is being passed an incorrect `end' address, which
can result in writes accidentally faulting in pages which will not be affected
by the write() call.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I came across the following problem while running ltp-aiodio testcases from
ltp-full-20050405 on linux-2.6.12-rc3-mm3. I tried running the tests with
EXT3 as well as JFS filesystems.
One or two fsx-linux testcases were hung after some time. These testcases
were hanging at wait_for_all_aios().
Debugging shows that there were some iocbs which were not getting completed
eventhough the last retry for those returned -EIOCBQUEUED. Also all such
pending iocbs represented READ operation.
Further debugging revealed that all such iocbs hit EOF in the DIO layer.
To be more precise, the "pos" from which they were trying to read was
greater than the "size" of the file. So the generic_file_direct_IO
returned 0.
This happens rarely as there is already a check in
__generic_file_aio_read(), for whether "pos" < "size" before calling direct
IO routine.
>size = i_size_read(inode);
>if (pos < size) {
> retval = generic_file_direct_IO(READ, iocb,
> iov, pos, nr_segs);
But for READ, we are taking the inode->i_sem only in the DIO layer. So it
is possible that some other process can change the size of the file before
we take the i_sem. In such a case ( when "pos" > "size"), the
__generic_file_aio_read() would return -EIOCBQUEUED even though there were
no I/O requests submitted by the DIO layer. This would cause the AIO layer
to expect aio_complete() for THE iocb, which doesnot happen. And thus the
test hangs forever, waiting for an I/O completion, where there are no
requests submitted at all.
The following patch makes __generic_file_aio_read() return 0 (instead of
returning -EIOCBQUEUED), on getting 0 from generic_file_direct_IO(), so
that the AIO layer does the aio_complete().
Testing:
I have tested the patch on a SMP machine(with 2 Pentium 4 (HT)) running
linux-2.6.12-rc3-mm3. I ran the ltp-aiodio testcases and none of the
fsx-linux tests hung. Also the aio-stress tests ran without any problem.
Signed-off-by: Suzuki K P <suzuki@in.ibm.com>
Signed-off-by: Suparna Bhattacharya <suparna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some KernelDoc descriptions are updated to match the current code.
No code changes.
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove PAGE_BUG - repalce it with BUG and BUG_ON.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The smp_mb() is becaus sync_page() doesn't have PG_locked while it accesses
page_mapping(page). The comments in the patch (the entire patch is the
addition of this comment) try to explain further how and why smp_mb() is
used.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Anton Altaparmakov <aia21@cam.ac.uk> points out:
- It calls fault_in_pages_readable() which is completely bogus if @nr_segs >
1. It needs to be replaced by a to be written
"fault_in_pages_readable_iovec()".
- It increments @buf even in the iovec case thus @buf can point to random
memory really quickly (in the iovec case) and then it calls
fault_in_pages_readable() on this random memory.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We will return NULL from filemap_getpage when a page does not exist in the
page cache and MAP_NONBLOCK is specified, here:
page = find_get_page(mapping, pgoff);
if (!page) {
if (nonblock)
return NULL;
goto no_cached_page;
}
But we forget to do so when the page in the cache is not uptodate. The
following could result in a blocking call:
/*
* Ok, found a page in the page cache, now we need to check
* that it's up-to-date.
*/
if (!PageUptodate(page))
goto page_not_uptodate;
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!