Use the special_file() macro to check whether an inode is special instead of
open-coding it.
Acked-by: Mike Halcrow <mhalcrow@us.ibm.com>
Cc: Phillip Hellewell <phillip@hellewell.homeip.net>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the "if (extlen)" case, "bh" was used uninitialized.
This patch changes the code to what seems to have been intended.
Spotted by the Coverity checker.
This patch also removes a pointless "bh = NULL" asignment (the variable
is never accessed again after this point).
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Don't dereference new->s_root when we do know it's NULL.
Spotted by the Coverity checker.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
In the "if (extlen)" case, "new" might be used uninitialized.
Looking at the code, it should be initialized to 0.
Spotted by the Coverity checker.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The Coverity checker spotted this obviously dead code.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This fix means that bmap will map extents of the length requested
by the VFS rather than guessing at it, or just mapping one block
at a time. The other callers of gfs2_block_map are audited to ensure
they send the correct max extent lengths (i.e. set bh->b_size correctly).
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
I didn't spot that the msg_iovlen was set to 2 if there
were two elements in the iovec but left at zero if not :(
I think this might be why bob was still seeing trouble.
Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Originally from Mark Fasheh <mark.fasheh@oracle.com>
generic_file_splice_write() does not remove S_ISUID or S_ISGID. This is
inconsistent with the way we generally write to files.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This allows file systems to manage their own i_mutex locking while
still re-using the generic_file_splice_write() logic.
OCFS2 in particular wants this so that it can order cluster locks within
i_mutex.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The splice_actor may be calling ->prepare_write() and ->commit_write(). We
want i_mutex on the inode being written to before calling those so that we
don't race i_size changes.
The double locking behavior is done elsewhere in splice.c, and if we
eventually want _nolock variants of generic_file_splice_write(), fs modules
might have to replicate the nasty locking code. We introduce
inode_double_lock() and inode_double_unlock() to consolidate the locking
rules into one set of functions.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
And the obsolete comment should be updated (or totally removed).
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Following function can drops d_count twice against one reference
by lookup_one_len.
<SOURCE>
/**
* sysfs_update_file - update the modified timestamp on an object attribute.
* @kobj: object we're acting for.
* @attr: attribute descriptor.
*/
int sysfs_update_file(struct kobject * kobj, const struct attribute * attr)
{
struct dentry * dir = kobj->dentry;
struct dentry * victim;
int res = -ENOENT;
mutex_lock(&dir->d_inode->i_mutex);
victim = lookup_one_len(attr->name, dir, strlen(attr->name));
if (!IS_ERR(victim)) {
/* make sure dentry is really there */
if (victim->d_inode &&
(victim->d_parent->d_inode == dir->d_inode)) {
victim->d_inode->i_mtime = CURRENT_TIME;
fsnotify_modify(victim);
/**
* Drop reference from initial sysfs_get_dentry().
*/
dput(victim);
res = 0;
} else
d_drop(victim);
/**
* Drop the reference acquired from sysfs_get_dentry() above.
*/
dput(victim);
}
mutex_unlock(&dir->d_inode->i_mutex);
return res;
}
</SOURCE>
PCI-hotplug (drivers/pci/hotplug/pci_hotplug_core.c) is only user of
this function. I confirmed that dentry of /sys/bus/pci/slots/XXX/*
have negative d_count value.
This patch removes unnecessary dput().
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Handle errors thrown in disk_sysfs_symlinks(), and propagate back to
caller.
The callers and associated functions don't do a real good job of handling
kobject errors anyway (add_partition, register_disk, rescan_partitions), so
this should do until something better comes along.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When IO error happens on metadata buffer, buffer is freed from memory and
later fsync() is called, filesystems like ext2 fail to report EIO. We
solve the problem by introducing a pointer to associated address space into
the buffer_head. When a buffer is removed from a list of metadata buffers
associated with an address space, IO error is transferred from the buffer to
the address space, so that fsync can later report it.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is possible for the ->fopen callback from lockd into nfsd to find that an
answer cannot be given straight away (an upcall is needed) and so the request
has to be 'dropped', to be retried later. That error status is not currently
propagated back.
So:
Change nlm_fopen to return nlm error codes (rather than a private
protocol) and define a new nlm_drop_reply code.
Cause nlm_drop_reply to cause the rpc request to get rpc_drop_reply
when this error comes back.
Cause svc_process to drop a request which returns a status of
rpc_drop_reply.
[akpm@osdl.org: fix warning storm]
Cc: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When an nfs server shuts down, lockd needs to release all the locks even
though the client still holds them.
It should therefore not 'unmonitor' the clients, so that the files in nfs/sm
will still be there when the nfs server restarts, so that those clients will
be told to reclaim their locks.
However the hosts are fully unmonitored, so statd may well remove the files.
lockd has a test for 'sm_sticky' and avoid the unmonitor call if it is set,
but it is currently not set.
So set it when tearing down lockd.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Coverity noticed that the error handling code in the NFSv4 callback client
sets cb->cb_client to NULL, then calls rpc_shutdown_client with the NULL
pointer.
Coverity: #cid 1397
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We weren't actually checking for SHARE_ACCESS_WRITE, with the result that the
owner could open a non-writeable file for write!
Continue to allow DENY_WRITE only with write access.
Thanks to Jim Rees for reporting the bug.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If a client creates a file using an open which sets the mode to 000, or if a
chmod changes permissions after a file is opened, then situations may arise
where an NFS client knows that some IO is permitted (because a process holds
the file open), but the NFS server does not (because it doesn't know about the
open, and only sees that the IO conflicts with the current mode of the file).
As a hack to solve this problem, NFS servers normally allow the owner to
override permissions on IO. The client can still enforce correct
permissions-checking on open by performing an explicit access check.
In NFSv4 the client can rely on the explicit on-the-wire open instead of an
access check.
Therefore we should not be allowing the owner to override permissions on an
over-the-wire open!
However, we should still allow the owner to override permissions in the case
where the client is claiming an open that it already made either before a
reboot, or while it was holding a delegation.
Thanks to Jim Rees for reporting the bug.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's no locking for ->d_revalidate, so fuse_dentry_revalidate() should use
dget_parent() instead of simply dereferencing ->d_parent.
Due to topology changes in the directory tree the parent could become negative
or be destroyed while being used. There hasn't been any reports about this
yet.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fuse considered it an error (EIO) if lookup returned a directory inode, to
which a dentry already refered. This is because directory aliases are not
allowed.
But in a network filesystem this could happen legitimately, if a directory is
moved on a remote client. This patch attempts to relax the restriction by
trying to first evict the offending alias from the cache. If this fails, it
still returns an error (EBUSY).
A rarer situation is if an mkdir races with an indenpendent lookup, which
finds the newly created directory already moved. In this situation the mkdir
should return success, but that would be incorrect, since the dentry cannot be
instantiated, so return EBUSY.
Previously checking for a directory alias and instantiation of the dentry
weren't done atomically in lookup/mkdir, hence two such calls racing with each
other could create aliased directories. To prevent this introduce a new
per-connection mutex: fuse_conn->inst_mutex, which is taken for instantiations
with a directory inode.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a spurious BUG in an unlikely race, where at least three parallel lookups
return the same inode, but with different file type. This has not yet been
observed in real life.
Allowing unlimited retries could delay fuse_iget() indefinitely, but this is
really for the broken userspace filesystem to worry about.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
An inode could be returned by independent parallel lookups, in this case an
update of the lookup counter could be lost resulting in a memory leak in
userspace.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fuse didn't always call i_size_write() with i_mutex held which caused rare
hangs on SMP/32bit. This bug has been present since fuse-2.2, well before
being merged into mainline.
The simplest solution is to protect i_size_write() with the per-connection
spinlock. Using i_mutex for this purpose would require some restructuring of
the code and I'm not even sure it's always safe to acquire i_mutex in all
places i_size needs to be set.
Since most of vmtruncate is already duplicated for other reasons, duplicate
the remaining part as well, making all i_size_write() calls internal to fuse.
Using i_size_write() was unnecessary in fuse_init_inode(), since this function
is only called on a newly created locked inode.
Reported by a few people over the years, but special thanks to Dana Henriksen
who was persistent enough in helping me debug it.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Actually, the decimal representation of a 32-bit signed number can take 12
bytes, including the \0.
And then some code adds a \n as well, so let's give it 13 bytes.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is Eric Sesterhenn's jbd patch applied to jbd2.
Commit: 41716c7c21
His words:
Since commit d1807793e1 we dereference a NULL
pointer. Coverity id #1432. We set journal to NULL, and use it directly
afterwards.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is silly to use non-static variable for writting zeroes to the file.
And more seriously, foffset in core dump file dump function was incremented
too much, so some parts of core dump were shifted by size of few phdrs and
notes down, so although gdb was able to load that file, it did not make lot
of sense - in my test case data pages were shifted down by about 900 bytes.
Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* missing cpu_to_le64() for ChangeTime (introduced by
[CIFS] Legacy time handling for Win9x and OS/2 part 1)
* missing le16_to_cpu() for DialectIndex (introduced by
[CIFS] Do not send newer QFSInfo to legacy servers which can not support it)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diRead and diWrite are representing the page number as an unsigned int.
This causes file system corruption on volumes larger than 16TB.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
[GFS2] Update git tree name/location
[DLM] fix iovec length in recvmsg
[GFS2] Pass the correct value to kunmap_atomic
[GFS2] Fix bug where lock not held
[DLM] Kconfig: don't show an empty DLM menu
[GFS2] Fix uninitialised variable
[GFS2] Fix a size calculation error
The file based core dump code was broken by pipe changes - a relative
llseek returns the absolute file position on success, not the relative
one, so dump_seek() always failed when invoked with non-zero current
position.
Only success/failure can be tested with relative lseek, we have to trust
kernel that on success we've got right file offset. With this fix in
place I have finally real core files instead of 1KB fragments...
Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
[ Cleaned it up a bit while here - use SEEK_CUR instead of hardcoding 1 ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (27 commits)
[CIFS] Missing flags2 for DFS
[CIFS] Workaround incomplete byte length returned by some
[CIFS] cifs Kconfig: don't select CONNECTOR
[CIFS] Level 1 QPathInfo needed for proper OS2 support
[CIFS] fix typo in previous patch
[CIFS] Fix old DOS time conversion to handle timezone
[CIFS] Do not need to adjust for Jan/Feb for leap day
[CIFS] Fix leaps year calculation for years after 2100
[CIFS] readdir (ffirst) enablement of accurate timestamps from legacy servers
[CIFS] Fix compiler warning with previous patch
[CIFS] Fix typo
[CIFS] Allow for 15 minute TZs (e.g. Nepal) and be more explicit about
[CIFS] Fix readdir of large directories for backlevel servers
[CIFS] Allow LANMAN21 support even in both POSIX non-POSIX path
[CIFS] Make use of newer QFSInfo dependent on capability bit instead of
[CIFS] Do not send newer QFSInfo to legacy servers which can not support it
[CIFS] Fix typo in name of new cifs_show_stats
[CIFS] Rename server time zone field
[CIFS] Handle legacy servers which return undefined time zone
[CIFS] CIFS support for /proc/<pid>/mountstats part 1
...
Manual conflict resolution in fs/cifs/connect.c
The DLM always passes the iovec length as 1, this is wrong when the circular
buffer wraps round.
Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Pass kaddr rather than (incorrect) struct page to kunmap_atomic.
Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The log lock needs to be held when manipulating the counter
for the number of free journal blocks.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Don't show an empty "Distributed Lock Manager" menu if IP_SCTP=n.
Reported by Dmytro Bagrii in kernel Bugzilla #7268.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This fixes a bug where, in certain cases an uninitialised variable
could cause a dereference of a NULL pointer in gfs2_commit_write().
Also a typo in a comment is fixed at the same time.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fix a size calculation error.
The size was incorrect being computed as a
negative length and then being passed to an
unsigned parameter.
This in turn would cause the allocator to
think it needed enough meta data to store
a gigabyte file for every file created.
Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
`select' is a bit obnoxious: the option keeps on coming back
and it's hard to work out what to do to make it go away again.
The use of `depends on' is preferred (although it has
usability problems too..)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Currently ioprio_best function first checks wethere aioprio or bioprio equals
IOPRIO_CLASS_NONE (ioprio_valid() macros does that) and if it is so it returns
bioprio/aioprio appropriately. Thus the next four lines, that set aclass/bclass
to IOPRIO_CLASS_BE, if aclass/bclass == IOPRIO_CLASS_NONE, are never executed.
The second problem: if aioprio from class IOPRIO_CLASS_NONE and bioprio from
class IOPRIO_CLASS_IDLE are passed to ioprio_best function, it will return
IOPRIO_CLASS_IDLE. It means that during __make_request we can merge two
requests and set the priority of merged request to IDLE, while one of
the initial requests originates from a process with NONE (default) priority.
So we can get a situation when a process with default ioprio will experience
IO starvation, while there is no process from real-time class in the system.
Just removing ioprio_valid check should correct situation.
Signed-off-by: Vasily Tarasov <vtaras@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
calculation in 2100 (year divisible by 100)
Signed-off-by: Yehuda Sadeh Weinraub <Yehuda.Sadeh@expand.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
- Calculate a variable in bvec_alloc_bs() only once needed, not earlier
(bio.o down from 18408 to 18376 Bytes, 32 Bytes saved, probably due to
data locality improvements).
- Init variable idx to silence a gcc warning which already existed in the
unmodified original base file (bvec_alloc_bs() handles idx correctly, so
there's no need for the warning):
fs/bio.c: In function `bio_alloc_bioset':
fs/bio.c:169: warning: `idx' may be used uninitialized in this function
Signed-off-by: Andreas Mohr <andi@lisas.de>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch destroys all the dentries attached to a superblock in one go
by:
(1) Destroying the tree rooted at s_root.
(2) Destroying every entry in the anon list, one at a time.
(3) Each entry in the anon list has its subtree consumed from the leaves
inwards.
This reduces the amount of work generic_shutdown_super() does, and avoids
iterating through the dentry_unused list.
Note that locking is almost entirely absent in the shrink_dcache_for_umount*()
functions added by this patch. This is because:
(1) at the point the filesystem calls generic_shutdown_super(), it is not
permitted to further touch the superblock's set of dentries, and nor may
it remove aliases from inodes;
(2) the dcache memory shrinker now skips dentries that are being unmounted;
and
(3) the superblock no longer has any external references through which the VFS
can reach it.
Given these points, the only locking we need to do is when we remove dentries
from the unused list and the name hashes, which we do a directory's worth at a
time.
We also don't need to guard against reference counts going to zero unexpectedly
and removing bits of the tree we're working on as nothing else can call dput().
A cut down version of dentry_iput() has been folded into
shrink_dcache_for_umount_subtree() function. Apart from not needing to unlock
things, it also doesn't need to check for inotify watches.
In this version of the patch, the complaint about a dentry still being in use
has been expanded from a single BUG_ON() and now gives much more information.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: NeilBrown <neilb@suse.de>
Acked-by: Ian Kent <raven@themaw.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make sure all dentries refs are released before calling kill_anon_super() so
that the assumption that generic_shutdown_super() can completely destroy the
dentry tree for there will be no external references holds true.
What was being done in the put_super() superblock op, is now done in the
kill_sb() filesystem op instead, prior to calling kill_anon_super().
This makes the struct autofs_sb_info::root member variable redundant (since
sb->s_root is still available), and so that is removed. The calls to
shrink_dcache_sb() are also removed since they're also redundant as
shrink_dcache_for_umount() will now be called after the cleanup routine.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Ian Kent <raven@themaw.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make sure all dentries refs are released before calling kill_block_super()
so that the assumption that generic_shutdown_super() can completely destroy
the dentry tree for there will be no external references holds true.
What was being done in the put_super() superblock op, is now done in the
kill_sb() filesystem op instead, prior to calling kill_block_super().
Changes made in [try #2]:
(*) reiserfs_kill_sb() now checks that the superblock FS info pointer is set
before trying to dereference it.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A couple of flush_dcache_page()s are missing on the I/O-error paths.
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Aince all callers dereference sb, and this function does so earlier too, we
dont need the check.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A couple of HDIO IOCTLs are not yet handled and a few others are marked
as using a pointer rather than an unsigned long. The formers include:
HDIO_GET_WCACHE, HDIO_GET_ACOUSTIC, HDIO_GET_ADDRESS and
HDIO_GET_BUSSTATE. The latters are: HDIO_SET_MULTCOUNT,
HDIO_SET_UNMASKINTR, HDIO_SET_KEEPSETTINGS, HDIO_SET_32BIT,
HDIO_SET_NOWERR, HDIO_SET_DMA, HDIO_SET_PIO_MODE and HDIO_SET_NICE.
Additionally 0x330 used to be HDIO_GETGEO_BIG and may be issued by 32-bit
`hdparm' run on a 64-bit kernel making Linux complain loudly.
This is a fix for these issues.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Current error behaviour for ext2 and ext3 filesystems does not fully
correspond to the documentation and should be fixed.
According to man 8 mount, ext2 and ext3 file systems allow to set one of 3
different on-errors behaviours:
---- start of quote man 8 mount ----
errors=continue / errors=remount-ro / errors=panic
Define the behaviour when an error is encountered. (Either ignore
errors and just mark the file system erroneous and continue, or remount
the file system read-only, or panic and halt the system.) The default is
set in the filesystem superblock, and can be changed using tune2fs(8).
---- end of quote ----
However EXT3_ERRORS_CONTINUE is not read from the superblock, and thus
ERRORS_CONT is not saved on the sbi->s_mount_opt. It leads to the incorrect
handle of errors on ext3.
Then we've checked corresponding code in ext2 and discovered that it is buggy
as well:
- EXT2_ERRORS_CONTINUE is not read from the superblock (the same);
- parse_option() does not clean the alternative values and thus something
like (ERRORS_CONT|ERRORS_RO) can be set;
- if options are omitted, parse_option() does not set any of these options.
Therefore it is possible to set any combination of these options on the ext2:
- none of them may be set: EXT2_ERRORS_CONTINUE on superblock / empty mount
options;
- any of them may be set using mount options;
- 2 any options may be set: by using EXT2_ERRORS_RO/EXT2_ERRORS_PANIC on the
superblock and other value in mount options;
- and finally all three options may be set by adding third option in remount.
Currently ext2 uses these values only in ext2_error() and it is not leading to
any noticeable troubles. However somebody may be discouraged when he will try
to workaround EXT2_ERRORS_PANIC on the superblock by using errors=continue in
mount options.
This patch:
EXT2_ERRORS_CONTINUE should be read from the superblock as default value for
error behaviour. parse_option() should clean the alternative options and
should not change default value taken from the superblock.
Signed-off-by: Vasily Averin <vvs@sw.ru>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Current error behaviour for ext2 and ext3 filesystems does not fully
correspond to the documentation and should be fixed.
According to man 8 mount, ext2 and ext3 file systems allow to set one of 3
different on-errors behaviours:
---- start of quote man 8 mount ----
errors=continue / errors=remount-ro / errors=panic
Define the behaviour when an error is encountered. (Either ignore
errors and just mark the file system erroneous and continue, or remount
the file system read-only, or panic and halt the system.) The default is
set in the filesystem superblock, and can be changed using tune2fs(8).
---- end of quote ----
However EXT3_ERRORS_CONTINUE is not read from the superblock, and thus
ERRORS_CONT is not saved on the sbi->s_mount_opt. It leads to the incorrect
handle of errors on ext3.
Then we've checked corresponding code in ext2 and discovered that it is buggy
as well:
- EXT2_ERRORS_CONTINUE is not read from the superblock (the same);
- parse_option() does not clean the alternative values and thus something
like (ERRORS_CONT|ERRORS_RO) can be set;
- if options are omitted, parse_option() does not set any of these options.
Therefore it is possible to set any combination of these options on the ext2:
- none of them may be set: EXT2_ERRORS_CONTINUE on superblock / empty mount
options;
- any of them may be set using mount options;
- 2 any options may be set: by using EXT2_ERRORS_RO/EXT2_ERRORS_PANIC on the
superblock and other value in mount options;
- and finally all three options may be set by adding third option in remount.
Currently ext2 uses these values only in ext2_error() and it is not leading to
any noticeable troubles. However somebody may be discouraged when he will try
to workaround EXT2_ERRORS_PANIC on the superblock by using errors=continue in
mount options.
This patch:
EXT3_ERRORS_CONTINUE should be taken from the superblock as default value for
error behaviour.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-by: Vasily Averin <vvs@sw.ru>
Acked-by: Kirill Korotaev <dev@openvz.org>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If grow_buffers() is for some reason passed a block number which wants to lie
outside the maximum-addressable pagecache range (PAGE_SIZE * 4G bytes) then it
will accidentally truncate `index' and will then instnatiate a page at the
wrong pagecache offset. This causes __getblk_slow() to go into an infinite
loop.
This can happen with corrupted disks, or with software errors elsewhere.
Detect that, and handle it.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement the epoll_pwait system call, that extend the event wait mechanism
with the same logic ppoll and pselect do. The definition of epoll_pwait
is:
int epoll_pwait(int epfd, struct epoll_event *events, int maxevents,
int timeout, const sigset_t *sigmask, size_t sigsetsize);
The difference between the vanilla epoll_wait and epoll_pwait is that the
latter allows the caller to specify a signal mask to be set while waiting
for events. Hence epoll_pwait will wait until either one monitored event,
or an unmasked signal happen. If sigmask is NULL, the epoll_pwait system
call will act exactly like epoll_wait. For the POSIX definition of
pselect, information is available here:
http://www.opengroup.org/onlinepubs/009695399/functions/select.html
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Someone's tab key is emitting spaces. Attempt to repair some of the damage.
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Current error behaviour for ext2 and ext3 filesystems does not fully
correspond to the documentation and should be fixed.
According to man 8 mount, ext2 and ext3 file systems allow to set one of 3
different on-errors behaviours:
---- start of quote man 8 mount ----
errors=continue / errors=remount-ro / errors=panic
Define the behaviour when an error is encountered. (Either ignore
errors and just mark the file system erroneous and continue, or remount
the file system read-only, or panic and halt the system.) The default is
set in the filesystem superblock, and can be changed using tune2fs(8).
---- end of quote ----
However EXT3_ERRORS_CONTINUE is not read from the superblock, and thus
ERRORS_CONT is not saved on the sbi->s_mount_opt. It leads to the incorrect
handle of errors on ext3.
Then we've checked corresponding code in ext2 and discovered that it is buggy
as well:
- EXT2_ERRORS_CONTINUE is not read from the superblock (the same);
- parse_option() does not clean the alternative values and thus something
like (ERRORS_CONT|ERRORS_RO) can be set;
- if options are omitted, parse_option() does not set any of these options.
Therefore it is possible to set any combination of these options on the ext2:
- none of them may be set: EXT2_ERRORS_CONTINUE on superblock / empty mount
options;
- any of them may be set using mount options;
- 2 any options may be set: by using EXT2_ERRORS_RO/EXT2_ERRORS_PANIC on the
superblock and other value in mount options;
- and finally all three options may be set by adding third option in remount.
Currently ext2 uses these values only in ext2_error() and it is not leading to
any noticeable troubles. However somebody may be discouraged when he will try
to workaround EXT2_ERRORS_PANIC on the superblock by using errors=continue in
mount options.
This patch:
EXT4_ERRORS_CONTINUE should be taken from the superblock as default value for
error behaviour.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-by: Vasily Averin <vvs@sw.ru>
Acked-by: Kirill Korotaev <dev@openvz.org>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I assume this means "logical sb block". So call it that.
I still don't understand the name though. A block is a block. What's
different about this one?
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With CONFIG_LBD=n, sector_div() expands to a plain old divide. But ext4 is
_not_ passing in a sector_t as the first argument, so...
fs/built-in.o: In function `ext4_get_group_no_and_offset':
fs/ext4/balloc.c:39: undefined reference to `__umoddi3'
fs/ext4/balloc.c:41: undefined reference to `__udivdi3'
fs/built-in.o: In function `find_group_orlov':
fs/ext4/ialloc.c:278: undefined reference to `__udivdi3'
fs/built-in.o: In function `ext4_fill_super':
fs/ext4/super.c:1488: undefined reference to `__udivdi3'
fs/ext4/super.c:1488: undefined reference to `__umoddi3'
fs/ext4/super.c:1594: undefined reference to `__udivdi3'
fs/ext4/super.c:1601: undefined reference to `__umoddi3'
Fix that up by calling do_div() directly.
Also cast the arg to u64. do_div() is only defined on u64, and ext4_fsblk_t
is supposed to be opaque.
Note especially the changes to find_group_orlov(). It was attempting to do
do_div(int, unsigned long long);
which is royally screwed up. Switched it to plain old divide.
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
move '_hi' bits of block numbers in the larger part of the
block group descriptor structure
Signed-off-by: Alexandre Ratchov <alexandre.ratchov@bull.net>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Similar to ext4, change blocks in JBD2 from sector_t to unsigned long long.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Previously when in-kernel ext4 block type is sector_t, it's only 4 bits long
under some 32bit arch (when CONFIG_LBD is not on). So we need to check the
size of sector_t before we read 48bit long on-disk blocks to in-kernel blocks.
These checks are unnecessary now as we changed the in-kernel blocks to
unsigned longlong.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change ext4 in-kernel block type (ext4_fsblk_t) from sector_t to unsigned
long long. Remove ext4 block type string micro E3FSBLK, replaced with "%llu"
[akpm@osdl.org: build fix]
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As we are planning to support 48-bit block numbers for ext4, we need to
support 48-bit block numbers for extended attributes. In the short term, we
can do this by reuse (on-disk) 16-bit padding (linux2.i_pad1 currently used
only by "hurd") as high order bits for xattr. This patch basically does that.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
JBD layer in-kernel block varibles type fixes to support >32 bit block number
and convert to sector_t type.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here is the patch to JBD to handle 64 bit block numbers, originally from Zach
Brown. This patch is useful only after adding support for 64-bit block
numbers in the filesystem.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make it possible to add file preallocation support in future as an RO_COMPAT
feature by recognizing uninitialized extents as holes and limiting extent
length to keep the top bit of ee_len free for marking uninitialized extents.
Signed-off-by: Suparna Bhattacharya <suparna@in.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Redefine ext3 in-kernel filesystem block type (ext3_fsblk_t) from unsigned
long to sector_t, to allow kernel to handle >32 bit ext3 blocks.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On disk extents format:
/*
* this is extent on-disk structure
* it's used at the bottom of the tree
*/
struct ext3_extent {
__le32 ee_block; /* first logical block extent covers */
__le16 ee_len; /* number of blocks covered by extent */
__le16 ee_start_hi; /* high 16 bits of physical block */
__le32 ee_start; /* low 32 bigs of physical block */
};
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reworked from a patch by Mingming Cao and Randy Dunlap
Signed-off-By: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
jbd and jbd2 currently use the same slab names which must be unique. The
patch below just renames jbd2's slabs.
Signed-off-by: Johann Lombardi <johann.lombardi@bull.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mingming Cao originally did this work, and Shaggy reproduced it using some
scripts from her.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a simple copy of the files in fs/jbd to fs/jbd2 and
/usr/incude/linux/[ext4_]jbd.h to /usr/include/[ext4_]jbd2.h
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Originally part of a patch from Mingming Cao and Randy Dunlap. Reorganized
by Shaggy.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Mingming Cao<cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mingming Cao originally did this work, and Shaggy reproduced it using some
scripts from her.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Start of the ext4 patch series. See Documentation/filesystems/ext4.txt for
details.
This is a simple copy of the files in fs/ext3 to fs/ext4 and
/usr/incude/linux/ext3* to /usr/include/ex4*
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
commit fe1668ae5b causes kernel to oops with
libhugetlbfs test suite. The problem is that hugetlb pages can be shared
by multiple mappings. Multiple threads can fight over page->lru in the
unmap path and bad things happen. We now serialize __unmap_hugepage_range
to void concurrent linked list manipulation. Such serialization is also
needed for shared page table page on hugetlb area. This patch will fixed
the bug and also serve as a prepatch for shared page table.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since commit d1807793e1 we dereference a NULL
pointer. Coverity id #1432. We set journal to NULL, and use it directly
afterwards.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
split the data structures that exist in host- and disk-endian variants,
annotate the fields of disk-endian ones, propagate changes.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
pulled includes of endian.h from fs/befs/*.c to befs.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>