Use ARRAY_SIZE macro already defined in kernel.h
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix for inotify read bug (bugzilla.kernel.org #6999)
Problem Description:
When reading from an inotify device with an insufficient sized buffer, read(2)
will return 0 with no errno set. This is because of an logically incorrect
action from the user program thus should return an more logical value. My
suggestion is return -EINVAL as for bind(2).
This patch is based on the proposal from Ryan <wolf0403@hotmail.com>, and
feedback from John McCutchan <john@johnmccutchan.com>.
Return -EINVAL if we have not passed in enough buffer space to read a single
inotify event, rather than 0 which indicates that there is nothing to read.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: "John McCutchan" <john@johnmccutchan.com>
Cc: Ryan <wolf0403@hotmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Iterate over sb->s_inodes instead of sb->s_files in add_dquot_ref. This
reduces list search and lock hold time aswell as getting rid of one of the
few uses of file_list_lock which Ingo identified as a scalability problem.
Previously we called dq_op->initialize for every inode handing of a
writeable file that wasn't initialized before. Now we're calling it for
every inode that has a non-zero i_writecount, aka a writeable file
descriptor refering to it.
Thanks a lot to Jan Kara for running this patch through his quota test
harness.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove_dquot_ref can move to dqout.c instead of beeing in inode.c under
#ifdef CONFIG_QUOTA. Also clean the resulting code up a tiny little bit by
testing sb->dq_op earlier - it's constant over a filesystems lifetime.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here is a bugfix to d_path.
First, when d_path() hits a lazily unmounted mount point, it tries to
prepend the name of the lazily unmounted dentry to the path name. It gets
this wrong, and also overwrites the slash that separates the name from the
following pathname component. This is demonstrated by the attached test
case, which prints "getcwd returned d_path-bugsubdir" with the bug. The
correct result would be "getcwd returned d_path-bug/subdir".
It could be argued that the name of the root dentry should not be part of
the result of d_path in the first place. On the other hand, what the
unconnected namespace was once reachable as may provide some useful hints
to users, and so that seems okay.
Second, it isn't always possible to tell from the __d_path result whether
the specified root and rootmnt (i.e., the chroot) was reached: lazy
unmounts of bind mounts will produce a path that does start with a
non-slash so we can tell from that, but other lazy unmounts will produce a
path that starts with a slash, just like "ordinary" paths.
The attached patch cleans up __d_path() to fix the bug with overlapping
pathname components. It also adds a @fail_deleted argument, which allows
to get rid of some of the mess in sys_getcwd(). Grabbing the dcache_lock
can then also be moved into __d_path(). The patch also makes sure that
paths will only start with a slash for paths which are connected to the
root and rootmnt.
The @fail_deleted argument could be added to d_path() as well: this would
allow callers to recognize deleted files, without having to resort to the
ambiguous check for the " (deleted)" string at the end of the pathnames.
This is not currently done, but it might be worthwhile.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace the incorrect debugging check of "#ifdef NTFS_DEBUG" with
just "#ifdef DEBUG".
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As pointed out in http://bugzilla.kernel.org/show_bug.cgi?id=7922, dynamic
chardev major allocation can hand out majors which LANANA has defined as being
for local/experimental use.
Cc: Torben Mathiasen <device@lanana.org>
Cc: Greg KH <greg@kroah.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tomas Klas <tomas.klas@mepatek.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Don't hide buffer_unwritten behind buffer_delay() and remove the hack that
clears unexpected buffer_unwritten() states now that it can't happen.
Signed-off-by: Dave Chinner <dgc@sgi.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Timothy Shimmin <tes@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, XFS uses BH_PrivateStart for flagging unwritten extent state in a
bufferhead. Recently, I found the long standing mmap/unwritten extent
conversion bug, and it was to do with partial page invalidation not clearing
the unwritten flag from bufferheads attached to the page but beyond EOF. See
here for a full explaination:
http://oss.sgi.com/archives/xfs/2006-12/msg00196.html
The solution I have checked into the XFS dev tree involves duplicating code
from block_invalidatepage to clear the unwritten flag from the bufferhead(s),
and then calling block_invalidatepage() to do the rest.
Christoph suggested that this would be better solved by pushing the unwritten
flag into the common buffer head flags and just adding the call to
discard_buffer():
http://oss.sgi.com/archives/xfs/2006-12/msg00239.html
The following patch makes BH_Unwritten a first class citizen.
Signed-off-by: Dave Chinner <dgc@sgi.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://oss.sgi.com:8090/xfs/xfs-2.6: (33 commits)
[XFS] Don't use kmap in xfs_iozero.
[XFS] Remove a bunch of unused functions from XFS.
[XFS] Remove unused arguments from the XFS_BTREE_*_ADDR macros.
[XFS] Remove unused header files for MAC and CAP checking functionality.
[XFS] Make freeze code a little cleaner.
[XFS] Remove unused argument to xfs_bmap_finish
[XFS] Clean up use of VFS attr flags
[XFS] Remove useless memory barrier
[XFS] XFS sysctl cleanups
[XFS] Fix assertion in xfs_attr_shortform_remove().
[XFS] Fix callers of xfs_iozero() to zero the correct range.
[XFS] Ensure a frozen filesystem has a clean log before writing the dummy
[XFS] Fix sub-block zeroing for buffered writes into unwritten extents.
[XFS] Re-initialize the per-cpu superblock counters after recovery.
[XFS] Fix block reservation changes for non-SMP systems.
[XFS] Fix block reservation mechanism.
[XFS] Make growfs work for amounts greater than 2TB
[XFS] Fix inode log item use-after-free on forced shutdown
[XFS] Fix attr2 corruption with btree data extents
[XFS] Workaround log space issue by increasing XFS_TRANS_PUSH_AIL_RESTARTS
...
They are fat: 4x8 bytes in task_struct.
They are uncoditionally updated in every fork, read, write and sendfile.
They are used only if you have some "extended acct fields feature".
And please, please, please, read(2) knows about bytes, not characters,
why it is called "rchar"?
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
jbd function called instead of fs specific one.
Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the kernel config option ZISOFS_FS, since it appears that the actual
option is simply ZISOFS.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
unlock_buffer(), like unlock_page(), must not clear the lock without
ensuring that the critical section is closed.
Mingming later sent the same patch, saying:
We are running SDET benchmark and saw double free issue for ext3 extended
attributes block, which complains the same xattr block already being freed (in
ext3_xattr_release_block()). The problem could also been triggered by
multiple threads loop untar/rm a kernel tree.
The race is caused by missing a memory barrier at unlock_buffer() before the
lock bit being cleared, resulting in possible concurrent h_refcounter update.
That causes a reference counter leak, then later leads to the double free that
we have seen.
Inside unlock_buffer(), there is a memory barrier is placed *after* the lock
bit is being cleared, however, there is no memory barrier *before* the bit is
cleared. On some arch the h_refcount update instruction and the clear bit
instruction could be reordered, thus leave the critical section re-entered.
The race is like this: For example, if the h_refcount is initialized as 1,
cpu 0: cpu1
-------------------------------------- -----------------------------------
lock_buffer() /* test_and_set_bit */
clear_buffer_locked(bh);
lock_buffer() /* test_and_set_bit */
h_refcount = h_refcount+1; /* = 2*/ h_refcount = h_refcount + 1; /*= 2 */
clear_buffer_locked(bh);
.... ......
We lost a h_refcount here. We need a memory barrier before the buffer head lock
bit being cleared to force the order of the two writes. Please apply.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Extend the set of "__attribute__" shortcut macros, and remove identical
(and now superfluous) definitions from a couple of source files.
based on a page at robert love's blog:
http://rlove.org/log/2005102601
extend the set of shortcut macros defined in compiler-gcc.h with the
following:
#define __packed __attribute__((packed))
#define __weak __attribute__((weak))
#define __naked __attribute__((naked))
#define __noreturn __attribute__((noreturn))
#define __pure __attribute__((pure))
#define __aligned(x) __attribute__((aligned(x)))
#define __printf(a,b) __attribute__((format(printf,a,b)))
Once these are in place, it's up to subsystem maintainers to decide if they
want to take advantage of them. there is already a strong precedent for
using shortcuts like this in the source tree.
The ones that might give people pause are "__aligned" and "__printf", but
shortcuts for both of those are already in use, and in some ways very
confusingly. note the two very different definitions for a macro named
"ALIGNED":
drivers/net/sgiseeq.c:#define ALIGNED(x) ((((unsigned long)(x)) + 0xf) & ~(0xf))
drivers/scsi/ultrastor.c:#define ALIGNED(x) __attribute__((aligned(x)))
also:
include/acpi/platform/acgcc.h:
#define ACPI_PRINTF_LIKE(c) __attribute__ ((__format__ (__printf__, c, c+1)))
Given the precedent, then, it seems logical to at least standardize on a
consistent set of these macros.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Naming is confusing, ext3_inc_count manipulates i_nlink not i_count
- handle argument passed in is not used
- ext3 and ext4 already call inc_nlink and dec_nlink directly in other places
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Return -ENOENT from ext[34]_link if we've raced with unlink and i_nlink is
0. Doing otherwise has the potential to corrupt the orphan inode list,
because we'd wind up with an inode with a non-zero link count on the list,
and it will never get properly cleaned up & removed from the orphan list
before it is freed.
[akpm@osdl.org: build fix]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix insecure default behaviour reported by Tigran Aivazian: if an ext2 or
ext3 or ext4 filesystem is tuned to mount with "acl", but mounted by a
kernel built without ACL support, then umask was ignored when creating
inodes - though root or user has umask 022, touch creates files as 0666,
and mkdir creates directories as 0777.
This appears to have worked right until 2.6.11, when a fix to the default
mode on symlinks (always 0777) assumed VFS applies umask: which it does,
unless the mount is marked for ACLs; but ext[234] set MS_POSIXACL in
s_flags according to s_mount_opt set according to def_mount_opts.
We could revert to the 2.6.10 ext[234]_init_acl (adding an S_ISLNK test);
but other filesystems only set MS_POSIXACL when ACLs are configured. We
could fix this at another level; but it seems most robust to avoid setting
the s_mount_opt flag in the first place (at the expense of more ifdefs).
Likewise don't set the XATTR_USER flag when built without XATTR support.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: <linux-ext4@vger.kernel.org>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the rare case where we have skipped orphan inode processing due to a
readonly block device, and the block device subsequently changes back to
read-write, disallow a remount,rw transition of the filesystem when we have an
unprocessed orphan inodes as this would corrupt the list.
Ideally we should process the orphan inode list during the remount, but that's
trickier, and this plugs the hole for now.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the rare case where we have skipped orphan inode processing due to a
readonly block device, and the block device subsequently changes back to
read-write, disallow a remount,rw transition of the filesystem when we have an
unprocessed orphan inodes as this would corrupt the list.
Ideally we should process the orphan inode list during the remount, but that's
trickier, and this plugs the hole for now.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The patch to identify AIX disks and ignore them has caused at least one
machine to fail to find the root partition on 2.6.19. The patch is:
http://lkml.org/lkml/2006/7/31/117
The problem is some disk formatters do not blow away the first 4 bytes
of the disk. If the disk we are installing to used to have AIX on it,
then the first 4 bytes will still have IBMA in EBCDIC.
The install in question was debian etch. Im not sure what the best fix
is, perhaps the AIX detection code could check more than the first 4
bytes.
The whole partition info for primary partitions is in this block:
dd if=/dev/sdb count=$(( 4 * 16 )) bs=1 skip=$(( 0x1be ))
All other data do not matter, beside the 0x55aa marker at the end of the
first block.
Signed-off-by: Olaf Hering <olh@suse.de>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert all calls to invalidate_inode_pages() into open-coded calls to
invalidate_mapping_pages().
Leave the invalidate_inode_pages() wrapper in place for now, marked as
deprecated.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This one was pointed out on the MOKB site:
http://kernelfun.blogspot.com/2006/11/mokb-09-11-2006-linux-26x-ext2checkpage.html
If a directory's i_size is corrupted, ext2_find_entry() will keep
processing pages until the i_size is reached, even if there are no more
blocks associated with the directory inode. This patch puts in some
minimal sanity-checking so that we don't keep checking pages (and issuing
errors) if we know there can be no more data to read, based on the block
count of the directory inode.
This is somewhat similar in approach to the ext3 patch I sent earlier this
year.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
corresponding "kmem_cache_zalloc()" call.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Roland McGrath <roland@redhat.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When igrab() is calling __iget() on an inode it should check if
clear_inode() has been called on the inode already. Otherwise there is a
race window between clear_inode() and destroy_inode() where igrab() calls
__iget() which leads to already free inodes on the inode lists.
Signed-off-by: Vandana Rungta <vandana@novell.com>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I added IS_NOATIME(inode) macro definition in include/linux/fs.h, true if
the inode superblock is marked readonly or noatime.
This new macro is then used in touch_atime() instead of separatly testing
MS_RDONLY and MS_NOATIME
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As pointed out by Hugh, ramfs would also benefit from using the new
set_page_dirty aop method for memory backed file systems.
Signed-off-by: Ken Chen <kenchen@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Values are available via ZVC sums.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the last vestiges of the long-deprecated "MAP_ANON" page protection
flag: use "MAP_ANONYMOUS" instead.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some partitioning systems create special partitions that
span the entire disk. One example are Sun partitions, and
this whole-disk partition exists to tell the firmware the
extent of the entire device so it can load the boot block
and do other things.
Such partitions should not be treated as normal partitions,
because all the other partitions overlap this whole-disk one.
So we'd see multiple instances of the same UUID etc. which
we do not want. udev and friends can thus search for this
'whole_disk' attribute and use it to decide to ignore the
partition.
Signed-off-by: Fabio Massimo Di Nitto <fabbione@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kmap() is inefficient and does not scale well. kmap_atomic() is a better
choice. Use the generic wrapper function instead of open coding the
kmap-memset-dcache flush-kunmap stuff.
SGI-PV: 960904
SGI-Modid: xfs-linux-melb:xfs-kern:28041a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Patch provided by Eric Sandeen (sandeen@sandeen.net).
SGI-PV: 960897
SGI-Modid: xfs-linux-melb:xfs-kern:28038a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
It makes it incrementally clearer to read the code when the top of a macro
spaghetti-pile only receives the 3 arguments it uses, rather than 2 extra
ones which are not used. Also when you start pulling this thread out of
the sweater (i.e. remove unused args from XFS_BTREE_*_ADDR), a couple
other third arms etc fall off too. If they're not used in the macro, then
they sometimes don't need to be passed to the function calling the macro
either, etc....
Patch provided by Eric Sandeen (sandeen@sandeen.net).
SGI-PV: 960197
SGI-Modid: xfs-linux-melb:xfs-kern:28037a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
xfs_mac.h and xfs_cap.h provide definitions and macros that aren't used
anywhere in XFS at all. They are left-overs from "to be implement at some
point in the future" functionality that Irix XFS has. If this
functionality ever goes into Linux, it will be provided at a different
layer, most likely through the security hooks in the kernel so we will
never need this functionality in XFS.
Patch provided by Eric Sandeen (sandeen@sandeen.net).
SGI-PV: 960895
SGI-Modid: xfs-linux-melb:xfs-kern:28036a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Fixes a few small issues (mostly cosmetic) that were picked up during the
review cycle for the last set of freeze path changes.
SGI-PV: 959267
SGI-Modid: xfs-linux-melb:xfs-kern:28035a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
The firstblock argument to xfs_bmap_finish is not used by that function.
Remove it and cleanup the code a bit.
Patch provided by Eric Sandeen.
SGI-PV: 960196
SGI-Modid: xfs-linux-melb:xfs-kern:28034a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Use the the generic VFS attr flags where appropriate instead of open
coding them to the same values.
Patch provided by Eric Sandeen.
SGI-PV: 960868
SGI-Modid: xfs-linux-melb:xfs-kern:28033a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
wake_up's implementation does an implicit memory barrier so the explicit
memory barrier is not needed in vfs_sync_worker.
Patch provided by Ralf Baechle.
SGI-PV: 960867
SGI-Modid: xfs-linux-melb:xfs-kern:28032a
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Removes unneeded sysctl insert at head behaviour. Cleans up sysctl
definitions to use C99 initialisers. Patch provided by Eric W. Biederman.
SGI-PV: 960192
SGI-Modid: xfs-linux-melb:xfs-kern:28031a
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
The problem is the two callers of xfs_iozero() are rounding out the range
to be zeroed to the end of a fsb and in some cases this extends past the
new eof. The call to commit_write() in xfs_iozero() will cause the Linux
inode's file size to be set too high.
SGI-PV: 960788
SGI-Modid: xfs-linux-melb:xfs-kern:28013a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
record.
The current Linux XFS freeze code is a mess. We flush the metadata buffers
out while we are still allowing new transactions to start and then fail to
flush the dirty buffers back out before writing the unmount and dummy
records to the log.
This leads to problems when the frozen filesystem is used for snapshots -
we do log recovery on a readonly image and often it appears that the log
image in the snapshot is not correct. Hence we end up with hangs, oops and
mount failures when trying to mount a snapshot image that has been created
when the filesystem has not been correctly frozen.
To fix this, we need to move th metadata flush to after we wait for all
current transactions to complete in teh second stage of the freeze. This
means that when we write the final log records, the log should be clean
and recovery should never occur on a snapshot image created from a frozen
filesystem.
SGI-PV: 959267
SGI-Modid: xfs-linux-melb:xfs-kern:28010a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
When writing less than a filesystem block of data into an unwritten extent
via buffered I/O, __xfs_get_blocks fails to set the buffer new flag. As a
result, the generic code will not zero either edge of the block resulting
in garbage being written to disk either side of the real data. Set the
buffer new state on bufferd writes to unwritten extents to ensure that
zeroing occurs.
SGI-PV: 960328
SGI-Modid: xfs-linux-melb:xfs-kern:28000a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
After filesystem recovery the superblock is re-read to bring in any
changes. If the per-cpu superblock counters are not re-initialized from
the superblock then the next time the per-cpu counters are disabled they
might overwrite the global counter with a bogus value.
SGI-PV: 957348
SGI-Modid: xfs-linux-melb:xfs-kern:27999a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
SGI-PV: 956323
SGI-Modid: xfs-linux-melb:xfs-kern:27940a
Signed-off-by: Kevin Jamieson <kjamieson@bycast.com>
Signed-off-by: David Chatterton <chatz@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
The block reservation mechanism has been broken since the per-cpu
superblock counters were introduced. Make the block reservation code work
with the per-cpu counters by syncing the counters, snapshotting the amount
of available space and then doing a modifcation of the counter state
according to the result. Continue in a loop until we either have no space
available or we reserve some space.
SGI-PV: 956323
SGI-Modid: xfs-linux-melb:xfs-kern:27895a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
The free block modification code has a 32bit interface, limiting the size
the filesystem can be grown even on 64 bit machines. On 32 bit machines,
there are other 32bit variables in transaction structures and interfaces
that need to be expanded to allow this to work.
SGI-PV: 959978
SGI-Modid: xfs-linux-melb:xfs-kern:27894a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
functions, but they
a) ignore the flags parameter completely, and b) are never called
directly, only via the flag-less defines anyway
So, drop the #define indirection, and rename mraccessf to mraccess, etc.
SGI-PV: 959138
SGI-Modid: xfs-linux-melb:xfs-kern:27711a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Tim Shimmin <tes@sgi.com>
The existing per-cpu superblock counter code uses the global superblock
spin lock when we approach ENOSPC for global synchronisation. On larger
machines than this code was originally tested on this can still get
catastrophic spinlock contention due increasing rebalance frequency near
ENOSPC.
By introducing a sleeping lock that is used to serialise balances and
modifications near ENOSPC we prevent contention from needlessly from
wasting the CPU time of potentially hundreds of CPUs.
To reduce the number of balances occuring, we separate the need rebalance
case from the slow allocate case. Now, a counter running dry will trigger
a rebalance during which counters are disabled. Any thread that sees a
disabled counter enters a different path where it waits on the new mutex.
When it gets the new mutex, it checks if the counter is disabled. If the
counter is disabled, then we _know_ that we have to use the global counter
and lock and it is safe to do so immediately. Otherwise, we drop the mutex
and go back to trying the per-cpu counters which we know were re-enabled.
SGI-PV: 952227
SGI-Modid: xfs-linux-melb:xfs-kern:27612a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
gcc-4.1 and more recent aggressively inline static functions which
increases XFS stack usage by ~15% in critical paths. Prevent this from
occurring by adding noinline to the STATIC definition.
Also uninline some functions that are too large to be inlined and were
causing problems with CONFIG_FORCED_INLINING=y.
Finally, clean up all the different users of inline, __inline and
__inline__ and put them under one STATIC_INLINE macro. For debug kernels
the STATIC_INLINE macro uninlines those functions.
SGI-PV: 957159
SGI-Modid: xfs-linux-melb:xfs-kern:27585a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: David Chatterton <chatz@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
The {test,set,clear}_bit() operations take a bit index for the bit to
operate on. The XBT_* flags are defined as bit fields which is incorrect,
not to mention the way the bit fields are enumerated is broken too. This
was only working by chance.
Fix the definitions of the flags and make the code using them use the
{test,set,clear}_bit() operations correctly.
SGI-PV: 958639
SGI-Modid: xfs-linux-melb:xfs-kern:27565a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
The message buffer used by cmn_err() is only 256 bytes and some CXFS
messages were exceeding this length. Since we were using vsprintf() and
not checking for buffer overruns we were clobbering memory beyond the
buffer. The size of the buffer has been increased to 1024 bytes so we can
capture these larger messages and we are now using vsnprintf() to prevent
overrunning the buffer size.
SGI-PV: 958599
SGI-Modid: xfs-linux-melb:xfs-kern:27561a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Geoffrey Wehrman <gwehrman@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
At the last stage of a freeze, we flush the buftarg synchronously over and
over again until it succeeds twice without skipping any buffers.
The delwri list flush skips pinned buffers, but tries to flush all others.
It removes the buffers from the delwri list, then tries to lock them one
at a time as it traverses the list to issue the I/O. It holds them locked
until we issue all of the I/O and then unlocks them once we've waited for
it to complete.
The problem is that during a freeze, the filesystem may still be doing
stuff - like flushing delalloc data buffers - in the background and hence
we can be trying to lock buffers that were on the delwri list at the same
time. Hence we can get ABBA deadlocks between threads doing allocation and
the buftarg flush (freeze) thread.
Fix it by skipping locked (and pinned) buffers as we traverse the delwri
buffer list.
SGI-PV: 957195
SGI-Modid: xfs-linux-melb:xfs-kern:27535a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
The XFS quiet mount logic was inverted making quiet mounts noisy and vice
versa. Fix it.
SGI-PV: 958469
SGI-Modid: xfs-linux-melb:xfs-kern:27520a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Tim Shimmin <tes@sgi.com>
remap() the region we get from mmap() to mark the fact that we are
using all of the available slack space. Any slack space is used
to form a simple brk region, and potentially more stack space than
requested at load time.
Any searches of the vma chain may well fail looking for
stack (and especially arg) addresses if the remaping is not done.
The simplest example is /proc/<pid>/cmdline, since the args
are pretty much always at the top of the data/bss/stack region.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a double free of "dfid" introduced by commit
da977b2c7e and spotted by the Coverity
checker.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__unmap_hugepage_range() is buggy that it does not preserve dirty state of
huge_pte when unmapping hugepage range. It causes data corruption in the
event of dop_caches being used by sys admin. For example, an application
creates a hugetlb file, modify pages, then unmap it. While leaving the
hugetlb file alive, comes along sys admin doing a "echo 3 >
/proc/sys/vm/drop_caches".
drop_pagecache_sb() will happily free all pages that aren't marked dirty if
there are no active mapping. Later when application remaps the hugetlb
file back and all data are gone, triggering catastrophic flip over on
application.
Not only that, the internal resv_huge_pages count will also get all messed
up. Fix it up by marking page dirty appropriately.
Signed-off-by: Ken Chen <kenchen@google.com>
Cc: "Nish Aravamudan" <nish.aravamudan@gmail.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: <stable@kernel.org>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a fix of regression, which triggered by ~2.6.16.
Patch with name ufs-directory-and-page-cache-from-blocks-to-pages.patch: in
additional to conversation from block to page cache mechanism added new
checks of directory integrity, one of them that directory entry do not
across directory chunks.
But some kinds of UFS: OpenStep UFS and Apple UFS (looks like these are the
same filesystems) have different directory chunk size, then common
UFSes(BSD and Solaris UFS).
So this patch adds ability to works with variable size of directory chunks,
and set it for ufstype=openstep to right size.
Tested on darwin ufs.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/mfasheh/ocfs2: (22 commits)
configfs: Zero terminate data in configfs attribute writes.
[PATCH] ocfs2 heartbeat: clean up bio submission code
ocfs2: introduce sc->sc_send_lock to protect outbound outbound messages
[PATCH] ocfs2: drop INET from Kconfig, not needed
ocfs2_dlm: Add timeout to dlm join domain
ocfs2_dlm: Silence some messages during join domain
ocfs2_dlm: disallow a domain join if node maps mismatch
ocfs2_dlm: Ensure correct ordering of set/clear refmap bit on lockres
ocfs2: Binds listener to the configured ip address
ocfs2_dlm: Calling post handler function in assert master handler
ocfs2: Added post handler callable function in o2net message handler
ocfs2_dlm: Cookies in locks not being printed correctly in error messages
ocfs2_dlm: Silence a failed convert
ocfs2_dlm: wake up sleepers on the lockres waitqueue
ocfs2_dlm: Dlm dispatch was stopping too early
ocfs2_dlm: Drop inflight refmap even if no locks found on the lockres
ocfs2_dlm: Flush dlm workqueue before starting to migrate
ocfs2_dlm: Fix migrate lockres handler queue scanning
ocfs2_dlm: Make dlmunlock() wait for migration to complete
ocfs2_dlm: Fixes race between migrate and dirty
...
Attributes in configfs are text files. As such, most handlers expect to be
able to call functions like simple_strtoul() without checking the bounds
of the buffer. Change the call to zero terminate the buffer before calling
the client's ->store() method. This does reduce the attribute size from
PAGE_SIZE to PAGE_SIZE-1.
Also, change get_zeroed_page() to alloc_page(), as we are handling the
termination.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
As was already pointed out Mathieu Avila on Thu, 07 Sep 2006 03:15:25 -0700
that OCFS2 is expecting bio_add_page() to add pages to BIOs in an easily
predictable manner.
That is not true, especially for devices with own merge_bvec_fn().
Therefore OCFS2's heartbeat code is very likely to fail on such devices.
Move the bio_put() call into the bio's bi_end_io() function. This makes the
whole idea of trying to predict the behaviour of bio_add_page() unnecessary.
Removed compute_max_sectors() and o2hb_compute_request_limits().
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
When there is a lot of multithreaded I/O usage, two threads can collide
while sending out a message to the other nodes. This is due to the lack of
locking between threads while sending out the messages.
When a connected TCP send(), sendto(), or sendmsg() arrives in the Linux
kernel, it eventually comes through tcp_sendmsg(). tcp_sendmsg() protects
itself by acquiring a lock at invocation by calling lock_sock().
tcp_sendmsg() then loops over the buffers in the iovec, allocating
associated sk_buff's and cache pages for use in the actual send. As it does
so, it pushes the data out to tcp for actual transmission. However, if one
of those allocation fails (because a large number of large sends is being
processed, for example), it must wait for memory to become available. It
does so by jumping to wait_for_sndbuf or wait_for_memory, both of which
eventually cause a call to sk_stream_wait_memory(). sk_stream_wait_memory()
contains a code path that calls sk_wait_event(). Finally, sk_wait_event()
contains the call to release_sock().
The following patch adds a lock to the socket container in order to
properly serialize outbound requests.
From: Zhen Wei <zwei@novell.com>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
OCFS2: drop 'depends on INET' since local mounts are now allowed.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Currently the ocfs2 dlm has no timeout during dlm join domain. While this is
not a problem in normal operation, this does become an issue if, say, the
other node is refusing to let the node join the domain because of a stuck
recovery. This patch adds a 90 sec timeout.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
These messages can easily be activated using the mlog infrastructure
and don't need to be enabled by default.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
There is a small window where a joining node may not see the node(s) that
just died but are still part of the domain. To fix this, we must disallow
join requests if the joining node has a different node map.
A new field node_map is added to dlm_query_join_request to send the current
nodes nodemap along with join request. On the receiving end the nodes that
are part of the cluster verifies if this new node sees all the nodes that
are still part of the cluster. They disallow the join if the maps mismatch.
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Eventhough the set refmap bit message is sent before the clear refmap
message, currently there is no guarentee that the set message will be
handled before the clear. This patch prevents the clear refmap to be
processed while the node is sending assert master messages to other
nodes. (The set refmap message is sent as a response to the assert
master request).
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This patch binds the o2net listener to the configured ip address
instead of INADDR_ANY for security. Fixes oss.oracle.com bugzilla#814.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This patch prevents the dlm from sending the clear refmap message
before the set refmap. We use the newly created post function handler
routine to accomplish the task.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Currently o2net allows one handler function per message type. This
patch adds the ability to call another function to be called after
the handler has returned the message to the other node.
Handlers are now given the option of returning a context (in the form of a
void **) which will be passed back into the post message handler function.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The dlm encodes the node number and a sequence number in the lock cookie.
It also stores the cookie in the lockres in the big endian format to avoid
swapping 8 bytes on each lock request. The bug here was that it was assuming
the cookie to be in the cpu format when decoding it for printing the error
message. This patch swaps the bytes before the print.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
When the lockres is in migrate or recovery state, all convert requests
are denied with the appropriate error status that is handled on the
requester node. This patch silences the erroneous error message printed
on the master node.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The dlm was not waking up threads waiting on the lockres wait queue,
waiting for the lockres to be no longer be in the DLM_LOCK_RES_IN_PROGRESS
and the DLM_LOCK_RES_MIGRATING states.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
dlm_dispatch_work was not processing the queued up tasks at
the first sign of the node leaving the domain leading to not
only incompleted tasks but also a mismatch in the dlm refcnt.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This is to prevent the condition in which a previously queued
up assert master asserts after we start the migration. Now
migration ensures the workqueue is flushed before proceeding
with migrating the lock to another node. This condition is
typically encountered during parallel umounts.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The migrate lockres handler was only searching for its lock on
migrated lockres on the expected queue. This could be problematic
as the new master could have also issued a convert request
during the migration and thus moved the lock to the convert queue.
We now search for the lock on all three queues.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
dlmunlock() was not waiting for migration to complete before releasing locks
on locally mastered locks.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
dlmthread was removing lockres' from the dirty list
and resetting the dirty flag before shuffling the list.
This patch retains the dirty state flag until the lists
are shuffled.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This patch makes some needlessly global functions static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This was previously broken and migration of some locks had to be temporarily
disabled. We use a new (and backward-incompatible) set of network messages
to account for all references to a lock resources held across the cluster.
once these are all freed, the master node may then free the lock resource
memory once its local references are dropped.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The problem. When implementing a network namespace I need to be able
to have multiple network devices with the same name. Currently this
is a problem for /sys/class/net/*.
What I want is a separate /sys/class/net directory in sysfs for each
network namespace, and I want to name each of them /sys/class/net.
I looked and the VFS actually allows that. All that is needed is
for /sys/class/net to implement a follow link method to redirect
lookups to the real directory you want.
Implementing a follow link method that is sensitive to the current
network namespace turns out to be 3 lines of code so it looks like a
clean approach. Modifying sysfs so it doesn't get in my was is a bit
trickier.
I am calling the concept of multiple directories all at the same path
in the filesystem shadow directories. With the directory entry really
at that location the shadow master.
The following patch modifies sysfs so it can handle a directory
structure slightly different from the kobject tree so I can implement
the shadow directories for handling /sys/class/net/.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
if a driver returns an error in fill_read_buffer(), the buffer will be
marked as filled. Subsequent reads will return eof. But there is
no data because of an error, not because it has been read.
Not marking the buffer filled is the obvious fix.
Signed-off-by: Oliver Neukum <oliver@neukum.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>