Commit Graph

9994 Commits

Author SHA1 Message Date
Jeff Layton
838726c475 cifs: fix O_APPEND on directio mounts
The direct I/O write codepath for CIFS is done through
cifs_user_write(). That function does not currently call
generic_write_checks() so the file position isn't being properly set
when the file is opened with O_APPEND.  It's also not doing the other
"normal" checks that should be done for a write call.

The problem is currently that when you open a file with O_APPEND on a
mount with the directio mount option, the file position is set to the
beginning of the file. This makes any subsequent writes clobber the data
in the file starting at the beginning.

This seems to fix the problem in cursory testing. It is, however,
important to note that NFS disallows the combination of
(O_DIRECT|O_APPEND). If my understanding is correct, the concern is
races with multiple clients appending to a file clobbering each other's
data. Since the write model for CIFS and NFS is pretty similar in this
regard, CIFS is probably subject to the same sort of races. What's
unclear to me is why this is a particular problem with O_DIRECT and not
with buffered writes...

Regardless, disallowing O_APPEND on an entire mount is probably not
reasonable, so we'll probably just have to deal with it and reevaluate
this flag combination when we get proper support for O_DIRECT. In the
meantime this patch at least fixes the existing problem.
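
As a rough userspace illustration of the position fix-up described above (this is not the kernel's generic_write_checks(); the helper name and its parameters are hypothetical), the rule for O_APPEND can be sketched as:

```c
#include <assert.h>
#include <fcntl.h>      /* O_APPEND */
#include <sys/types.h>  /* off_t */

/*
 * Hypothetical helper: pick the offset a write should start at.
 * With O_APPEND every write starts at the current end of file,
 * regardless of the position stored for the open file.
 */
static off_t effective_write_pos(int open_flags, off_t file_pos, off_t file_size)
{
	assert(file_pos >= 0 && file_size >= 0);
	if (open_flags & O_APPEND)
		return file_size;
	return file_pos;
}
```

Skipping this step is what leaves the position at the beginning of the file, so the first write overwrites existing data.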

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-28 14:15:32 +00:00
Steve French
6405c9cd9b Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-08-28 02:47:00 +00:00
Linus Torvalds
325a9a3d39 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] Add destroy routine for dns_resolver
  [CIFS] Reorder cifs config item for better clarity
  [CIFS] Correct keys dependency for cifs kerberos support
2008-08-27 14:34:49 -07:00
Linus Torvalds
5b51a7e9d8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  [PATCH] deal with the first call of ->show() generating no output
  [PATCH] fix ->llseek() for a bunch of directories
  [PATCH] fix regular readdir() and friends
  [PATCH] fix hpux_getdents()
  [PATCH] fix osf_getdirents()
  [PATCH] ntfs: use d_add_ci
  [PATCH] change d_add_ci argument ordering
  [PATCH] fix efs_lookup()
  [PATCH] proc: inode number fixlet
2008-08-27 14:31:44 -07:00
Steve French
bcc55c6664 [CIFS] Fix plaintext authentication
The last eight bytes of the password field were not cleared when doing lanman plaintext password authentication. This patch fixes that.

I tested it with Samba by setting password
encryption to no in the server's smb.conf.  Other servers also can be
configured to force plaintext authentication.  Note that plaintext
authentication requires setting /proc/fs/cifs/SecurityFlags to 0x30030
on the client (enabling both LANMAN and also plaintext password support).
Also note that LANMAN support (and thus plaintext password support) requires
CONFIG_CIFS_WEAK_PW_HASH to be enabled in menuconfig.

CC: Jeff Layton <jlayton@redhat.com>
CC: Stable Kernel <stable@vger.kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-27 21:30:22 +00:00
Jeff Layton
87ed1d65fb [CIFS] Add destroy routine for dns_resolver
Otherwise, we're leaking the payload memory.

CC: Stable Kernel <stable@vger.kernel.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-27 21:17:41 +00:00
Linus Torvalds
0559bc8e9b Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: remove blk_queue_tag_depth() and blk_queue_tag_queue()
  block: remove unused ->busy part of the block queue tag map
  bio: fix __bio_copy_iov() handling of bio->bv_len
  bio: fix bio_copy_kern() handling of bio->bv_len
  block: submit_bh() inadvertently discards barrier flag on a sync write
  block: clean up cmdfilter sysfs interface
  block: rename blk_scsi_cmd_filter to blk_cmd_filter
  sg: restore command permission for TYPE_SCANNER
  block: move cmdfilter from gendisk to request_queue
2008-08-27 13:55:35 -07:00
Linus Torvalds
e472233fc5 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: Increment the reference count of an already-active stack.
  [PATCH] configfs: Consolidate locking around configfs_detach_prep() in configfs_rmdir()
  ocfs2: correctly set i_blocks after inline dir gets expanded
  ocfs2: Jump to correct label in ocfs2_expand_inline_dir()
  ocfs2: Fix sleep-with-spinlock recovery regression
  [PATCH] ocfs2/cluster/netdebug.c: fix warning
  [PATCH] ocfs2/cluster/tcp.c: make some functions static
2008-08-27 13:54:55 -07:00
FUJITA Tomonori
aefcc28a3a bio: fix __bio_copy_iov() handling of bio->bv_len
The commit c5dec1c303 introduced
__bio_copy_iov() to add bounce support to blk_rq_map_user_iov.

__bio_copy_iov() uses bio->bv_len to copy data for READ commands after
the completion but it doesn't work with a request that partially
completed. SCSI always completes a PC request as a whole, but it seems
some don't.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-08-27 09:50:19 +02:00
FUJITA Tomonori
76029ff37f bio: fix bio_copy_kern() handling of bio->bv_len
The commit 68154e90c9 introduced
bio_copy_kern() to add bounce support to blk_rq_map_kern.

bio_copy_kern() uses bio->bv_len to copy data for READ commands after
the completion but it doesn't work with a request that partially
completed. SCSI always completes a PC request as a whole, but it seems
some don't.

This patch fixes bio_copy_kern to handle the above case. As
bio_copy_user does, bio_copy_kern uses struct bio_map_data to store
struct bio_vec.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reported-by: Nix <nix@esperi.org.uk>
Tested-by: Nix <nix@esperi.org.uk>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-08-27 09:50:19 +02:00
Jens Axboe
48fd4f93a0 block: submit_bh() inadvertently discards barrier flag on a sync write
Reported by Milan Broz <mbroz@redhat.com>, commit 18ce3751 inadvertently
made submit_bh() discard the barrier bit for a WRITE_SYNC request. Fix
that up.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-08-27 09:50:19 +02:00
Steve French
96c2a1137b [CIFS] Reorder cifs config item for better clarity
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-26 18:32:28 +00:00
Steve French
e9775843ec [CIFS] Correct keys dependency for cifs kerberos support
Must also depend on CIFS ...

Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-26 18:22:50 +00:00
Steve French
3dae49abef Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-08-26 16:56:05 +00:00
Steve French
6ce5eecb9c [CIFS] check version in spnego upcall response
Currently, we don't check the version in the SPNEGO upcall response
even though one is provided. Jeff and Q have made the corresponding
change to the Samba client (cifs.upcall).

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-26 00:37:14 +00:00
Joel Becker
d6817cdbd1 ocfs2: Increment the reference count of an already-active stack.
The ocfs2_stack_driver_request() function failed to increment the
refcount of an already-active stack.  It only did the increment on the
first reference.  Whoops.
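
A minimal sketch of the intended counting rule, using a hypothetical userspace model rather than the ocfs2 structures (all names here are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a stack driver; not the ocfs2 structure. */
struct stack_model {
	const char *name;
	unsigned int refcount;
};

static struct stack_model *active_stack;

/* Take a reference; every successful call must bump the count. */
static int stack_request(struct stack_model *s)
{
	assert(s != NULL);
	if (active_stack && active_stack != s)
		return -1;	/* a different stack is already active */
	active_stack = s;	/* the first reference activates the stack */
	s->refcount++;		/* counted on first and later references alike */
	return 0;
}

/* Drop a reference; the stack is deactivated when the last one goes. */
static void stack_release(struct stack_model *s)
{
	assert(s != NULL && s->refcount > 0);
	if (--s->refcount == 0 && active_stack == s)
		active_stack = NULL;
}
```

If the increment happened only on the first request, the first release would deactivate a stack that still has users.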

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Tested-by: Marcos Matsunaga <marcos.matsunaga@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-08-25 07:29:47 -07:00
Adrian Hunter
601c0bc467 UBIFS: allow for racing between GC and TNC
The TNC mutex is unlocked prematurely when reading leaf nodes
with non-hashed keys.  This is unsafe because the node may be
moved by garbage collection and the eraseblock unmapped, although
that has never actually happened during stress testing.

This patch fixes the flaw by detecting the race and retrying with
the TNC mutex locked.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-08-25 14:34:02 +03:00
Adrian Hunter
761e29f3bb UBIFS: always read hashed-key nodes under TNC mutex
Leaf-nodes that have a hashed key are stored in the
leaf-node-cache (LNC) which is protected by the TNC
mutex.  Consequently, when reading a leaf node with
a hashed key (i.e. directory entries, xattr entries)
the TNC mutex is always required.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-08-25 14:33:41 +03:00
Al Viro
4cdfe84b51 [PATCH] deal with the first call of ->show() generating no output
seq_read() has a subtle bug - we want the first loop there to go
until at least one *non-empty* record had fit entirely into buffer.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-25 01:18:10 -04:00
Al Viro
59af1584bf [PATCH] fix ->llseek() for a bunch of directories
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-25 01:18:09 -04:00
Al Viro
8f3f655da7 [PATCH] fix regular readdir() and friends
Handling of -EOVERFLOW.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-25 01:18:08 -04:00
Christoph Hellwig
2690421743 [PATCH] ntfs: use d_add_ci
d_add_ci was lifted 1:1 from ntfs.  Change ntfs to use the common
version.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-25 01:18:06 -04:00
Christoph Hellwig
e45b590b97 [PATCH] change d_add_ci argument ordering
As pointed out during review, d_add_ci argument order should match d_add,
so switch the dentry and inode arguments.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-25 01:18:05 -04:00
Al Viro
2d8a10cd17 [PATCH] fix efs_lookup()
it needs to use d_splice_alias(), not d_add()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-25 01:18:04 -04:00
Alexey Dobriyan
cc99609917 [PATCH] proc: inode number fixlet
Ouch, if number taken from IDA is too big, the intent was to signal an
error, not check for overflow and still do overflowing addition.

One still needs 2^28 proc entries to notice this.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-25 01:18:03 -04:00
Adrian Bunk
7a8fc9b248 removed unused #include <linux/version.h>'s
This patch lets the files using linux/version.h match the files that
#include it.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-23 12:14:12 -07:00
Louis Rilling
de6bf18e9c [PATCH] configfs: Consolidate locking around configfs_detach_prep() in configfs_rmdir()
It appears that configfs_rmdir() can protect configfs_detach_prep() retries with
fewer calls to {spin,mutex}_{lock,unlock}, and cleaner code.

This patch does not change any behavior, except that it removes two useless
lock/unlock pairs having nothing inside to protect and providing a useless
barrier.

Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Signed-off-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-08-22 11:09:02 -07:00
Mark Fasheh
9780eb6cfa ocfs2: correctly set i_blocks after inline dir gets expanded
We were setting i_blocks based on allocation before the extent insert, which
is wrong as the value is a calculation based on ip_clusters which gets
updated as a result of the insert. This patch moves the line in question
to just after the call to ocfs2_insert_extent().

Without this fix, inline directories were temporarily having an i_blocks
value of zero immediately after expansion to extents.

Reported-and-tested-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-08-22 11:09:02 -07:00
Tao Ma
83cab5338f ocfs2: Jump to correct label in ocfs2_expand_inline_dir()
When we fail to insert extent in ocfs2_expand_inline_dir(), we should go to
out_commit, not out.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-08-22 11:09:02 -07:00
Mark Fasheh
a1af7d15a1 ocfs2: Fix sleep-with-spinlock recovery regression
This fixes a bug introduced with 539d826409:
    [PATCH 2/2] ocfs2: Fix race between mount and recovery

ocfs2_mark_dead_nodes() was reading journal inodes while holding the
spinlock protecting our in-memory recovery state. The fix is very simple -
the disk state is protected by a cluster lock that's already held, so we
just move the spinlock down past the read.

Reviewed-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-08-22 11:08:38 -07:00
Alexander Beregalov
a57a874b04 [PATCH] ocfs2/cluster/netdebug.c: fix warning
ocfs2/cluster/netdebug.c: fix warning

fs/ocfs2/cluster/netdebug.c:154: warning: format '%lu' expects
     type 'long unsigned int', but argument 17 has type 'suseconds_t'
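
One common way to silence this class of warning is an explicit cast that matches the conversion specifier. The sketch below shows that technique in userspace; it is not necessarily what the patch itself does, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>	/* struct timeval, suseconds_t */

/*
 * Hypothetical formatter: suseconds_t is not guaranteed to be
 * unsigned long, so cast explicitly to the type the format expects.
 */
static int format_usec(char *buf, size_t len, const struct timeval *tv)
{
	assert(buf != NULL && tv != NULL);
	return snprintf(buf, len, "%ld.%06ld",
			(long)tv->tv_sec, (long)tv->tv_usec);
}
```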

Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-08-22 10:56:57 -07:00
Adrian Bunk
18496e80f7 [PATCH] ocfs2/cluster/tcp.c: make some functions static
Commit 0f475b2abe (ocfs2/net: Silence build
warnings) made sense as far as it fixed compile warnings, but it was not
required that it made the functions global.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-08-22 10:56:40 -07:00
Linus Torvalds
ee26562772 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Update documentation to remind users to update mke2fs.conf
  ext4: Fix small file fragmentation
  ext4: Initialize writeback_index to 0 when allocating a new inode
  ext4: make sure ext4_has_free_blocks returns 0 for ENOSPC
  ext4: journal credit fix for the delayed allocation's writepages() function
  ext4: Rework the ext4_da_writepages() function
  ext4: journal credits reservation fixes for DIO, fallocate
  ext4: journal credits reservation fixes for extent file writepage
  ext4: journal credits calulation cleanup and fix for non-extent writepage
  ext4: Fix bug where we return ENOSPC even though we have plenty of inodes
  ext4: don't try to resize if there are no reserved gdt blocks left
  ext4: Use ext4_discard_reservations instead of mballoc-specific call
  ext4: Fix ext4_dx_readdir hash collision handling
  ext4: Fix delalloc release block reservation for truncate
  ext4: Fix potential truncate BUG due to i_prealloc_list being non-empty
  ext4: Handle unwritten extent properly with delayed allocation
2008-08-22 08:37:07 -07:00
Artem Bityutskiy
04da11bfcf UBIFS: fix zero-length truncations
Always allow truncations to zero, even if budgeting thinks there
is no space. UBIFS reserves some space for deletions anyway.

Otherwise, the following happens:
1. create a file, and write as much as possible there, until ENOSPC
2. truncate the file, which fails with ENOSPC, which is not good.
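
The rule can be sketched with a small budgeting model. This is an assumption-laden illustration, not UBIFS code: the structure, the helper name and the byte-based accounting are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical budget: bytes the budgeting code believes are free. */
struct budget_model {
	unsigned long long free_bytes;
};

/* Decide whether a truncation to new_size may proceed. */
static int truncate_budget(const struct budget_model *b,
			   unsigned long long new_size,
			   unsigned long long cost_bytes)
{
	assert(b != NULL);
	if (new_size == 0)
		return 0;	/* truncation to zero is always allowed */
	if (cost_bytes > b->free_bytes)
		return -ENOSPC;
	return 0;
}
```

With this rule the two-step scenario above ends with the file emptied instead of a second ENOSPC.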

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-21 16:48:52 +03:00
Al Viro
82d63fc9e3 cramfs: fix named-pipe handling
After commit a97c9bf33f (fix cramfs
making duplicate entries in inode cache) in kernel 2.6.14, named pipes
on cramfs do not work properly.

It seems the commit makes all named pipes on cramfs share their inode
(and named-pipe buffer).

Make ..._test() refuse to merge inodes with ->i_ino == 1, take inode setup
back to get_cramfs_inode() and make ->drop_inode() evict ones with ->i_ino
== 1 immediately.

Reported-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@kernel.org>		[2.6.14 and later]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-20 15:40:32 -07:00
Ken Chen
2d70b68d42 fix setpriority(PRIO_PGRP) thread iterator breakage
When a user calls sys_setpriority(PRIO_PGRP ...) on an NPTL-style multi-LWP
process, only the task leader of the process is affected; the other
sibling LWP threads do not receive the setting.  The problem was that the
iterator used in sys_setpriority() only iterates over one task for each
process, ignoring all other sibling threads.

Introduce a new macro do_each_pid_thread / while_each_pid_thread to walk
each thread of a process.  Convert 4 call sites in {set/get}priority and
ioprio_{set/get}.
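
The difference between the two iteration styles can be sketched in userspace. The task structure and function below are hypothetical models, not the kernel's task_struct or the new macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a task. */
struct task_model {
	int pgid;	/* process group id */
	int tgid;	/* thread group (process) id */
	int tid;	/* thread id; equals tgid for the group leader */
	int nice;
};

/* Apply nice to every thread whose process belongs to pgid. */
static size_t set_pgrp_nice(struct task_model *tasks, size_t n,
			    int pgid, int nice)
{
	size_t i, changed = 0;

	assert(tasks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tasks[i].pgid != pgid)
			continue;
		/* no "tid == tgid" filter here: sibling threads count too */
		tasks[i].nice = nice;
		changed++;
	}
	return changed;
}
```

An iterator that visited only entries with tid == tgid would reproduce the reported behaviour, where only the leader picks up the new priority.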

Signed-off-by: Ken Chen <kenchen@google.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-20 15:40:32 -07:00
Pavel Emelyanov
ff9bc512f1 binfmt_misc: fix false -ENOEXEC when coupled with other binary handlers
In case the binfmt_misc binary handler is registered *before* the e.g.
script one (when for example being compiled as a module) the following
situation may occur:

1. user launches a script, whose interpreter is a misc binary;
2. the load_misc_binary sets the misc_bang and returns -ENOEXEC,
   since the binary is a script;
3. the load_script_binary loads one and calls for search_binary_handler
   to run the interpreter;
4. the load_misc_binary is called again, but refuses to load the
   binary due to misc_bang bit set.

The fix is to move the misc_bang setting lower - prior to the actual
call to the search_binary_handler.
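
A much-simplified model of the reordering, assuming a single flag and a boolean "format matches" input (both the structure and the function are hypothetical, not the binfmt_misc code):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-exec state carrying the recursion guard. */
struct exec_model {
	int misc_bang;
};

/* Returns 0 if this handler takes the binary, -ENOEXEC otherwise. */
static int load_misc_model(struct exec_model *e, int matches_misc_format)
{
	assert(e != NULL);
	if (e->misc_bang)
		return -ENOEXEC;	/* already handled once: refuse to recurse */
	if (!matches_misc_format)
		return -ENOEXEC;	/* not ours: the flag stays untouched */
	e->misc_bang = 1;		/* set only right before handing off */
	return 0;
}
```

Because the flag is left clear when the handler declines a script, the later call for the script's interpreter is still accepted.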

Caused by the commit 3a2e7f47 (binfmt_misc.c: avoid potential kernel
stack overflow)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Tested-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: <stable@kernel.org>		[2.6.26.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-20 15:40:31 -07:00
Clement Calmels
1804dc6e14 /proc/self/maps doesn't display the real file offset
This addresses

	http://bugzilla.kernel.org/show_bug.cgi?id=11318

In function show_map (file: fs/proc/task_mmu.c), if vma->vm_pgoff > 2^20
then (vma->vm_pgoff << PAGE_SHIFT) is greater than 2^32 (with PAGE_SIZE
equal to 4096, i.e. 2^12).  The next seq_printf uses an unsigned long for
the conversion of (vma->vm_pgoff << PAGE_SHIFT); as a result, the offset
value displayed in /proc/self/maps is truncated if the page offset is
greater than 2^20.
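
The arithmetic can be reproduced in isolation. The sketch below models a 32-bit unsigned long with uint32_t; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* 4096-byte pages, as in the text */

/*
 * Compute the byte offset for a page offset in two ways:
 * narrow - shift performed in 32 bits, high bits are lost;
 * wide   - widened to 64 bits first, full offset preserved.
 */
static void page_offsets(uint32_t pgoff, uint32_t *narrow, uint64_t *wide)
{
	assert(narrow != NULL && wide != NULL);
	*narrow = pgoff << MODEL_PAGE_SHIFT;
	*wide = (uint64_t)pgoff << MODEL_PAGE_SHIFT;
}
```

For a page offset of 2^20 the narrow result is 0 while the wide result is 0x100000000, which is the truncation the bug report describes.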

A test that shows this issue:

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

#define PAGE_SIZE (getpagesize())

#if __i386__
#   define U64_STR "%llx"
#elif __x86_64
#   define U64_STR "%lx"
#else
#   error "Architecture Unsupported"
#endif

int main(int argc, char *argv[])
{
	int fd;
	char *addr;
	off64_t offset = 0x10000000;
	char *filename = "/dev/zero";

	fd = open(filename, O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	offset *= 0x10;
	printf("offset = " U64_STR "\n", offset);

	addr = (char*)mmap64(NULL, PAGE_SIZE, PROT_READ, MAP_PRIVATE, fd,
			     offset);
	if ((void*)addr == MAP_FAILED) {
		perror("mmap64");
		return 1;
	}

	{
		FILE *fmaps;
		char *line = NULL;
		size_t len = 0;
		ssize_t read;
		size_t filename_len = strlen(filename);

		fmaps = fopen("/proc/self/maps", "r");
		if (!fmaps) {
			perror("fopen");
			return 1;
		}
		while ((read = getline(&line, &len, fmaps)) != -1) {
			if ((read > filename_len + 1)
			    && (strncmp(&line[read - filename_len - 1], filename, filename_len) == 0))
				printf("%s", line);
		}

		if (line)
			free(line);

		fclose(fmaps);
	}

	close(fd);
	return 0;
}

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Clement Calmels <cboulte@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-20 15:40:30 -07:00
Linus Torvalds
1bbe44f69d Merge branch 'sh/for-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
* 'sh/for-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  sh: Provide a FLAT_PLAT_INIT() definition.
  binfmt_flat: Stub in a FLAT_PLAT_INIT().
  video: export sh_mobile_lcdc panel size
  sh: select memchunk size using kernel cmdline
  sh: export sh7723 VEU as VEU2H
  input: migor_ts compile and detection fix
  sh: remove MSTPCR defines from Migo-R header file
  sh: Update sh7763rdp defconfig
  sh: Add support sh7760fb to sh7763rdp board
  sh: Add support sh_eth to sh7763rdp board
  sh: Disable 64kB hugetlbpage size when using 64kB PAGE_SIZE.
  sh: Don't export __{s,u}divsi3_i4i from SH-2 libgcc.
  fix SH7705_CACHE_32KB compilation
  sh: mach-x3proto: Fix up smc91x platform data.
2008-08-20 08:46:11 -07:00
Linus Torvalds
5f22ca9b13 vfat: fix 'sync' mount deadlock due to BKL->lock_super conversion
There was another FAT BKL conversion deadlock reported by Bart
Trojanowski due to the BKL being used as a recursive lock by FAT, which
was missed because it only triggers with 'sync' (or 'dirsync') mounts.

The recursion worked for the BKL, but after the conversion to lock_super
(which uses a mutex), it just deadlocks.

Thanks to Bart for debugging this and testing the fix.  The lock
debugging information from the original report:

  =============================================
  [ INFO: possible recursive locking detected ]
  2.6.27-rc3-bisect-00448-ga7f5aaf #16
  ---------------------------------------------
  mv/4020 is trying to acquire lock:
   (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20

  but task is already holding lock:
   (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20

  other info that might help us debug this:
  3 locks held by mv/4020:
   #0:  (&sb->s_type->i_mutex_key#9/1){--..}, at: [<c01b2336>] do_unlinkat+0x66/0x140
   #1:  (&sb->s_type->i_mutex_key#9){--..}, at: [<c01b0954>] vfs_unlink+0x84/0x110
   #2:  (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20

  stack backtrace:
  Pid: 4020, comm: mv Not tainted 2.6.27-rc3-bisect-00448-ga7f5aaf #16
   [<c014e694>] validate_chain+0x984/0xea0
   [<c0108d70>] ? native_sched_clock+0x0/0xf0
   [<c014ee9c>] __lock_acquire+0x2ec/0x9b0
   [<c014f5cf>] lock_acquire+0x6f/0x90
   [<c01a90fe>] ? lock_super+0x1e/0x20
   [<c044e5fd>] mutex_lock_nested+0xad/0x300
   [<c01a90fe>] ? lock_super+0x1e/0x20
   [<c01a90fe>] ? lock_super+0x1e/0x20
   [<c01a90fe>] lock_super+0x1e/0x20
   [<f8b3a700>] fat_write_inode+0x60/0x2b0 [fat]
   [<c0450878>] ? _spin_unlock_irqrestore+0x48/0x80
   [<f8b3a953>] ? fat_sync_inode+0x3/0x20 [fat]
   [<f8b3a962>] fat_sync_inode+0x12/0x20 [fat]
   [<f8b37c7e>] fat_remove_entries+0xbe/0x120 [fat]
   [<f8b422ef>] vfat_unlink+0x5f/0x90 [vfat]
   [<f8b42290>] ? vfat_unlink+0x0/0x90 [vfat]
   [<c01b0968>] vfs_unlink+0x98/0x110
   [<c01b2400>] do_unlinkat+0x130/0x140
   [<c016a8f5>] ? audit_syscall_entry+0x105/0x150
   [<c01b253b>] sys_unlinkat+0x3b/0x40
   [<c01040d3>] sysenter_do_call+0x12/0x3f
   =======================

where the deadlock is due to the nesting of lock_super from vfat_unlink
to fat_write_inode:

 - do_unlinkat
   - vfs_unlink
     - vfat_unlink
       * lock_super
       - fat_remove_entries
         - fat_sync_inode
           - fat_write_inode
             * lock_super

and the fix is to simply remove the use of lock_super() in fat_write_inode.
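
The nesting can be modelled with a toy non-recursive lock. Everything below is a hypothetical userspace sketch: a real mutex would block forever where this model returns -EDEADLK, and the function names only loosely mirror the call chain above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical non-recursive lock. */
struct lock_model {
	int held;
};

static int model_lock(struct lock_model *l)
{
	assert(l != NULL);
	if (l->held)
		return -EDEADLK;	/* a real mutex would hang here */
	l->held = 1;
	return 0;
}

static void model_unlock(struct lock_model *l)
{
	assert(l != NULL && l->held);
	l->held = 0;
}

/* Inner helper; take_lock models the redundant locking. */
static int write_inode_model(struct lock_model *sb, int take_lock)
{
	if (take_lock) {
		int err = model_lock(sb);
		if (err)
			return err;
	}
	/* ... the inode update would happen here ... */
	if (take_lock)
		model_unlock(sb);
	return 0;
}

/* Outer path: holds the lock across the call to the inner helper. */
static int unlink_model(struct lock_model *sb, int inner_takes_lock)
{
	int err = model_lock(sb);
	if (err)
		return err;
	err = write_inode_model(sb, inner_takes_lock);
	model_unlock(sb);
	return err;
}
```

Calling unlink_model() with inner_takes_lock set reproduces the self-deadlock; with it cleared, the call completes.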

The lock_super() there had been just an automatic conversion of the
kernel lock to the superblock lock, but no locking was actually needed
there, since the code in fat_write_inode already protected all relevant
accesses with a spinlock (sbi->inode_hash_lock to be exact).  The only
code inside the BKL (and thus the superblock lock) was accesses to local
variables or calls to functions that have long been SMP-safe (i.e.
sb_bread, mark_buffer_dirty and brelse).

Bart reports:
 "Looks good.  I ran 10 parallel processes creating 1M files truncating
  them, writing to them again and then deleting them.  This patch fixes
  the issue I ran into.

  Signed-off-by: Bart Trojanowski <bart@jukie.net>"

Reported-and-tested-by: Bart Trojanowski <bart@jukie.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-20 08:31:19 -07:00
Steve French
3d2af3465e [CIFS] Kerberos support not considered experimental anymore
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-19 20:51:09 +00:00
Steve French
c16fefa563 [CIFS] distinguish between Kerberos and MSKerberos in upcall
Properly handle MSKRB5 by passing sec=mskrb5 to the upcall so that the
spnego blob can be generated appropriately. Also, make
decode_negTokenInit prefer whichever mechanism is first in the list.

Needed for some NetApp servers, and possibly some older
versions of Windows which treat the two KRB5 mechanisms differently.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-19 19:35:33 +00:00
Jeff Layton
cb7691b648 cifs: add local server pointer to cifs_setup_session
cifs_setup_session references pSesInfo->server several times. That
pointer shouldn't change during the life of the function so grab it
once and store it in a local var. This makes the code look a little
cleaner too.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-19 17:11:35 +00:00
Ilpo Järvinen
aab3a8c7a3 [CIFS] reindent misindented statement
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-19 14:23:37 +00:00
Jan Kara
97e1cfb086 udf: Fix error paths in udf_new_inode()
In case we failed to allocate memory for the inode when creating it, we did not
properly free the block already allocated for this inode. Move memory allocation
before the block allocation, which fixes this issue (thanks for the idea go to
Ingo Oeser <ioe-lkml@rameria.de>). Also remove a few superfluous
initializations already done in udf_alloc_inode().
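
The ordering argument can be sketched in userspace. The structures and the simulate_nomem switch are hypothetical and exist only to make the failure path visible; this is not the UDF code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical block allocator state. */
struct blocks_model {
	unsigned int in_use;
};

struct inode_model {
	void *data;
	unsigned int block;
};

/*
 * Allocate memory first, the block second. If the memory allocation
 * fails, no block has been taken yet, so there is nothing to undo.
 */
static int new_inode_model(struct blocks_model *b, struct inode_model *ino,
			   size_t size, int simulate_nomem)
{
	assert(b != NULL && ino != NULL);
	ino->data = simulate_nomem ? NULL : calloc(1, size);
	if (!ino->data)
		return -1;
	ino->block = ++b->in_use;
	return 0;
}
```

With the opposite order, the error path would need an explicit step to give the block back, which is the step that was missing.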

Reviewed-by: Ingo Oeser <ioe-lkml@rameria.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2008-08-19 11:05:05 +02:00
Jan Kara
db0badc58e udf: Fix lock inversion between iprune_mutex and alloc_mutex (v2)
A memory allocation inside alloc_mutex must not recurse back into the
filesystem itself because that leads to lock inversion between iprune_mutex and
alloc_mutex (and thus to deadlocks - see traces below). alloc_mutex is actually
needed only to update allocation statistics in the superblock so we can drop it
before we start allocating memory for the inode.

tar           D ffff81015b9c8c90     0  6614   6612
 ffff8100d5a21a20 0000000000000086 0000000000000000 00000000ffff0000
 ffff81015b9c8c90 ffff81015b8f0cd0 ffff81015b9c8ee0 0000000000000000
 0000000000000003 0000000000000000 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff803c1d8a>] __mutex_lock_slowpath+0x64/0x9b
 [<ffffffff803c1bef>] mutex_lock+0xa/0xb
 [<ffffffff8027f8c2>] shrink_icache_memory+0x38/0x200
 [<ffffffff80257742>] shrink_slab+0xe3/0x15b
 [<ffffffff802579db>] try_to_free_pages+0x221/0x30d
 [<ffffffff8025657e>] isolate_pages_global+0x0/0x31
 [<ffffffff8025324b>] __alloc_pages_internal+0x252/0x3ab
 [<ffffffff8026b08b>] cache_alloc_refill+0x22e/0x47b
 [<ffffffff8026ae37>] kmem_cache_alloc+0x3b/0x61
 [<ffffffff8026b15b>] cache_alloc_refill+0x2fe/0x47b
 [<ffffffff8026b34e>] __kmalloc+0x76/0x9c
 [<ffffffffa00751f2>] :udf:udf_new_inode+0x202/0x2e2
 [<ffffffffa007ae5e>] :udf:udf_create+0x2f/0x16d
 [<ffffffffa0078f27>] :udf:udf_lookup+0xa6/0xad
...
kswapd0       D ffff81015b9d9270     0   125      2
 ffff81015b903c28 0000000000000046 ffffffff8028cbb0 00000000fffffffb
 ffff81015b9d9270 ffff81015b8f0cd0 ffff81015b9d94c0 000000000271b490
 ffffe2000271b458 ffffe2000271b420 ffffe20002728dc8 ffffe20002728d90
Call Trace:
 [<ffffffff8028cbb0>] __set_page_dirty+0xeb/0xf5
 [<ffffffff8025403a>] get_dirty_limits+0x1d/0x22f
 [<ffffffff803c1d8a>] __mutex_lock_slowpath+0x64/0x9b
 [<ffffffff803c1bef>] mutex_lock+0xa/0xb
 [<ffffffffa0073f58>] :udf:udf_bitmap_free_blocks+0x47/0x1eb
 [<ffffffffa007df31>] :udf:udf_discard_prealloc+0xc6/0x172
 [<ffffffffa007875a>] :udf:udf_clear_inode+0x1e/0x48
 [<ffffffff8027f121>] clear_inode+0x6d/0xc4
 [<ffffffff8027f7f2>] dispose_list+0x56/0xee
 [<ffffffff8027fa5a>] shrink_icache_memory+0x1d0/0x200
 [<ffffffff80257742>] shrink_slab+0xe3/0x15b
 [<ffffffff80257e93>] kswapd+0x346/0x447
...

Reported-by: Tibor Tajti <tibor.tajti@gmail.com>
Reviewed-by: Ingo Oeser <ioe-lkml@rameria.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2008-08-19 11:04:36 +02:00
Linus Torvalds
45edb89ffd Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] mount of IPC$ breaks with iget patch
  [CIFS] remove trailing whitespace
  [CIFS] if get root inode fails during mount, cleanup tree connection
2008-08-15 11:02:35 -07:00
Linus Torvalds
21d3bdb160 Merge branch 'linux-next' of git://git.infradead.org/~dedekind/ubifs-2.6
* 'linux-next' of git://git.infradead.org/~dedekind/ubifs-2.6: (29 commits)
  UBIFS: xattr bugfixes
  UBIFS: remove unneeded check
  UBIFS: few commentary fixes
  UBIFS: fix budgeting request alignment in xattr code
  UBIFS: improve arguments checking in debugging messages
  UBIFS: always set i_generation to 0
  UBIFS: correct spelling of "thrice".
  UBIFS: support splice_write
  UBIFS: minor tweaks in commit
  UBIFS: reserve more space for index
  UBIFS: print pid in dump function
  UBIFS: align inode data to eight
  UBIFS: improve budgeting checks
  UBIFS: correct orphan deletion order
  UBIFS: fix typos in comments
  UBIFS: do not union creat_sqnum and del_cmtno
  UBIFS: optimize deletions
  UBIFS: increment commit number earlier
  UBIFS: remove another unneeded function parameter
  UBIFS: remove unneeded function parameter
  ...
2008-08-15 10:33:07 -07:00
Bob Copeland
9419fc1c95 omfs: fix oops when file metadata is corrupted
A fuzzed filesystem image failed with OMFS when the extent count was
used in a loop without being checked against the max number of extents.
It also provoked a signed division for an array index that was checked
as if unsigned, leading to index by -1.

omfsck will be updated to fix these cases; in the meantime, bail out
gracefully.
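
The kind of check described above can be sketched as follows. The table layout, the limit and the function name are hypothetical and do not reflect the OMFS on-disk format.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define MODEL_MAX_EXTENTS 8	/* illustrative limit, not the OMFS value */

/* Hypothetical extent table read from disk. */
struct extent_table_model {
	int32_t count;				/* untrusted value */
	uint64_t blocks[MODEL_MAX_EXTENTS];
};

/* Sum the extents, refusing a count that is negative or too large. */
static int sum_extents(const struct extent_table_model *t, uint64_t *total)
{
	int32_t i;

	assert(t != NULL && total != NULL);
	if (t->count < 0 || t->count > MODEL_MAX_EXTENTS)
		return -1;	/* corrupted metadata: bail out */
	*total = 0;
	for (i = 0; i < t->count; i++)
		*total += t->blocks[i];
	return 0;
}
```

Checking the signed value against both bounds avoids the case where a negative number passes an unsigned-style comparison and is then used as an index.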

Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15 08:35:44 -07:00
Bob Copeland
c963343a11 omfs: fix potential oops when directory size is corrupted
Testing with a modified fsfuzzer reveals a couple of locations in omfs
where filesystem variables are ultimately used as loop counters with
insufficient sanity checking.  In this case, dir->i_size is used to
compute the number of buckets in the directory hash.  If too large,
readdir will overrun a buffer.

Since it's an invariant that dir->i_size is equal to the sysblock
size, and we already sanity check that, just use that value instead.
This fixes the following oops:

BUG: unable to handle kernel paging request at c978e004
IP: [<c032298e>] omfs_readdir+0x18e/0x32f
Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
Modules linked in:

Pid: 4796, comm: ls Not tainted (2.6.27-rc2 #12)
EIP: 0060:[<c032298e>] EFLAGS: 00010287 CPU: 0
EIP is at omfs_readdir+0x18e/0x32f
EAX: c978d000 EBX: 00000000 ECX: cbfcfaf8 EDX: cb2cf100
ESI: 00001000 EDI: 00000800 EBP: cb2d3f68 ESP: cb2d3f0c
 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process ls (pid: 4796, ti=cb2d3000 task=cb175f40 task.ti=cb2d3000)
Stack: 00000002 00000000 00000000 c018a820 cb2d3f94 cb2cf100 cbfb0000 ffffff10
       cbfb3b80 cbfcfaf8 000001c9 00000a09 00000000 00000000 00000000 cbfcfbc8
       c9697000 cbfb3b80 22222222 00001000 c08e6cd0 cb2cf100 cbfb3b80 cb2d3f88
Call Trace:
 [<c018a820>] ? filldir64+0x0/0xcd
 [<c018a9f2>] ? vfs_readdir+0x56/0x82
 [<c018a820>] ? filldir64+0x0/0xcd
 [<c018aa7c>] ? sys_getdents64+0x5e/0xa0
 [<c01038bd>] ? sysenter_do_call+0x12/0x31
 =======================
Code: 00 89 f0 89 f3 0f ac f8 14 81 e3 ff ff 0f 00 48 8d
14 c5 b8 01 00 00 89 45 cc 89 55 f0 e9 8c 01 00 00 8b 4d c8 8b 75 f0 8b
41 18 <8b> 54 30 04 8b 04 30 31 f6 89 5d dc 89 d1 8b 55 b8 0f c8 0f c9

Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15 08:35:44 -07:00
Chris Mason
7d455e0030 fs/inode.c: properly init address_space->writeback_index
write_cache_pages() uses i_mapping->writeback_index to pick up where it
left off the last time a given inode was found by pdflush or
balance_dirty_pages (or anyone else who sets wbc->range_cyclic).

alloc_inode() should set it to a sane value so that writeback doesn't
start in the middle of a file.  It is somewhat difficult to notice the bug
since write_cache_pages will loop around to the start of the file and the
elevator helps hide the resulting seeks.
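
The effect of a stale starting index can be pictured with a small userspace model. This is only a sketch: the struct and function below are hypothetical stand-ins, not kernel code, and it assumes that 0 is the "sane value" for a freshly allocated inode.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-file writeback cursor. */
struct toy_mapping {
    size_t nr_pages;          /* pages in the file */
    size_t writeback_index;   /* where a cyclic pass starts */
};

/*
 * Record the order in which a cyclic pass visits pages: from the cursor
 * to the end of the file, then wrapping around to the start.
 * Returns the number of entries written to order[].
 */
size_t toy_cyclic_visit_order(const struct toy_mapping *m, size_t *order)
{
    size_t n = 0;
    /* A cursor past the end of the file simply starts at page 0. */
    size_t start = m->writeback_index < m->nr_pages ? m->writeback_index : 0;

    for (size_t i = start; i < m->nr_pages; i++)
        order[n++] = i;
    for (size_t i = 0; i < start; i++)
        order[n++] = i;
    return n;
}
```

With the cursor at 0, pages go out in file order; with a leftover cursor of 2 on a four-page file, the order becomes 2, 3, 0, 1, which is the extra seek the message refers to.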

For whatever reason, Btrfs hits this often.  Unpatched, untarring 30
copies of the linux kernel in series runs at 47MB/s on a single sata
drive.  With this fix, it jumps to 62MB/s.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15 08:35:44 -07:00
Artem Bityutskiy
c78c7e35a4 UBIFS: xattr bugfixes
The xattr code has not been tested for a while and there were
several bugs. One of them is using the wrong inode in
'ubifs_jnl_change_xattr()'. The other is a deadlock in
'ubifs_setxattr()': the i_mutex is locked in the
'cap_inode_need_killpriv()' path, so a deadlock happens when
'ubifs_setxattr()' tries to lock it again.

Thanks to Zoltan Sogor for finding these bugs.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-14 12:46:20 +03:00
Steve French
ad661334b8 [CIFS] mount of IPC$ breaks with iget patch
In looking at network named pipe support on cifs, I noticed that
Dave Howells' iget patch:

    iget: stop CIFS from using iget() and read_inode()

broke mounts to IPC$ (the interprocess communication share), and doesn't
handle the error case (when getting info on the root inode fails).

Thanks to Gunter who noted a typo in a debug line in the original
version of this patch.

CC: David Howells <dhowells@redhat.com>
CC: Gunter Kukkukk <linux@kukkukk.com>
CC: Stable Kernel <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-14 03:55:14 +00:00
David Howells
9e2b2dc413 CRED: Introduce credential access wrappers
The patches that are intended to introduce copy-on-write credentials for 2.6.28
require abstraction of access to some fields of the task structure,
particularly for the case of one task accessing another's credentials where RCU
will have to be observed.

Introduced here are trivial no-op versions of the desired accessors for current
and other tasks so that other subsystems can start to be converted over more
easily.

Wrappers are introduced into a new header (linux/cred.h) for UID/GID,
EUID/EGID, SUID/SGID, FSUID/FSGID, cap_effective and current's subscribed
user_struct.  These wrappers are macros because the ordering between header
files militates against making them inline functions.

linux/cred.h is #included from linux/sched.h.

Further, XFS is modified such that it no longer defines and uses parameterised
versions of current_fs[ug]id(), thus getting rid of the namespace collision
otherwise incurred.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-08-14 09:35:23 +10:00
Linus Torvalds
9ea319b616 Merge git://oss.sgi.com:8090/xfs/linux-2.6
* git://oss.sgi.com:8090/xfs/linux-2.6: (45 commits)
  [XFS] Fix use after free in xfs_log_done().
  [XFS] Make xfs_bmap_*_count_leaves void.
  [XFS] Use KM_NOFS for debug trace buffers
  [XFS] use KM_MAYFAIL in xfs_mountfs
  [XFS] refactor xfs_mount_free
  [XFS] don't call xfs_freesb from xfs_unmountfs
  [XFS] xfs_unmountfs should return void
  [XFS] cleanup xfs_mountfs
  [XFS] move root inode IRELE into xfs_unmountfs
  [XFS] stop using file_update_time
  [XFS] optimize xfs_ichgtime
  [XFS] update timestamp in xfs_ialloc manually
  [XFS] remove the sema_t from XFS.
  [XFS] replace dquot flush semaphore with a completion
  [XFS] replace inode flush semaphore with a completion
  [XFS] extend completions to provide XFS object flush requirements
  [XFS] replace the XFS buf iodone semaphore with a completion
  [XFS] clean up stale references to semaphores
  [XFS] use get_unaligned_* helpers
  [XFS] Fix compile failure in xfs_buf_trace()
  ...
2008-08-13 15:17:49 -07:00
David Teigland
51409340d2 dlm: rename structs
Add a dlm_ prefix to the struct names in config.c.  This resolves a
conflict with struct node in particular, when include/linux/node.h
happens to be included.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-08-13 12:47:36 -05:00
David Teigland
cb980d9a3e dlm: add missing kfrees
A couple of unlikely error conditions were missing a kfree on the error
exit path.
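
For readers unfamiliar with the pattern, the sketch below shows the usual shape of such a fix in userspace C, with malloc/free standing in for kmalloc/kfree. The struct, the function, and the fail_second switch are hypothetical and exist only to make the error path reachable.

```c
#include <stdlib.h>

/* Hypothetical object that owns two buffers. */
struct toy_pair {
    char *a;
    char *b;
};

/*
 * Allocate both buffers or neither. fail_second simulates an error that
 * occurs after the first allocation has already succeeded.
 */
int toy_pair_init(struct toy_pair *p, size_t len, int fail_second)
{
    p->a = NULL;
    p->b = NULL;

    p->a = malloc(len);
    if (!p->a)
        return -1;

    p->b = fail_second ? NULL : malloc(len);
    if (!p->b)
        goto err_free_a;

    return 0;

err_free_a:
    /* Without this free, the first buffer leaks on the error path. */
    free(p->a);
    p->a = NULL;
    return -1;
}
```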

Reported-by: Juha Leppanen <juha_motorsportcom@luukku.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-08-13 12:47:36 -05:00
Artem Bityutskiy
720b499c80 UBIFS: remove unneeded check
Commit d70b67c8bc fixed the VFS so that it no longer calls the
FS lookup function in deleted directories, so the corresponding
UBIFS check can be removed.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 18:59:09 +03:00
Artem Bityutskiy
0a883a05c5 UBIFS: few commentary fixes
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 18:59:02 +03:00
Zoltan Sogor
5acd6ff8ac UBIFS: fix budgeting request alignment in xattr code
Data length has to be aligned in the budgeting request. Code
in xattr.c did not do this.

Signed-off-by: Zoltan Sogor <weth@inf.u-szeged.hu>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:43:56 +03:00
Artem Bityutskiy
840dc6b891 UBIFS: improve arguments checking in debugging messages
Use "if (0) printk()" construct in debugging print macros to
make the debugging messages be checked even if debugging is
off.
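
The construct can be sketched in plain C as follows. This is an illustration only: the macro name and the TOY_DEBUG switch are hypothetical, and printf stands in for printk.

```c
#include <stdio.h>

/* Hypothetical switch; a real build would set this from its configuration. */
#ifndef TOY_DEBUG
#define TOY_DEBUG 0
#endif

#if TOY_DEBUG
/* Debugging on: the message is printed. */
#define toy_dbg(...) do { printf(__VA_ARGS__); } while (0)
#else
/*
 * Debugging off: the call sits in a dead branch, so the arguments are
 * never evaluated at run time, but the compiler still parses the call
 * and can still check the format string against the arguments.
 */
#define toy_dbg(...) do { if (0) printf(__VA_ARGS__); } while (0)
#endif
```

Compared with defining the macro to nothing, a mistake such as a misspelled variable in a debugging message now breaks the build even when debugging is off.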

This patch also removes some unneeded spaces and blank lines.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:42:47 +03:00
Adrian Hunter
81ffa38e15 UBIFS: always set i_generation to 0
UBIFS does not presently re-use inode numbers, so leaving
i_generation zero is most appropriate for now.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:39:53 +03:00
Adrian Hunter
3a13252c6f UBIFS: correct spelling of "thrice".
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-08-13 11:39:20 +03:00
Zoltan Sogor
22bc7fa8c5 UBIFS: support splice_write
Signed-off-by: Zoltan Sogor <weth@inf.u-szeged.hu>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:38:43 +03:00
Artem Bityutskiy
0010f18afc UBIFS: minor tweaks in commit
No functional changes, just lessen the amount of indentations.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:38:19 +03:00
Artem Bityutskiy
b364b41aeb UBIFS: reserve more space for index
At the moment UBIFS reserves twice the old index size for the
index. But this is not enough in some cases: if the indexing
nodes are very fragmented and there are many small gaps, while the
dirty index has big znodes, the in-the-gaps method would fail.

Thus, reserve thrice as much, in which case we are guaranteed that
we can commit in any case.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:37:28 +03:00
Artem Bityutskiy
1de9415906 UBIFS: print pid in dump function
Useful when something fails and there are many processes
racing.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:35:58 +03:00
Artem Bityutskiy
dab4b4d2f9 UBIFS: align inode data to eight
UBIFS aligns node lengths to 8, so budgeting has to do the
same. Well, direntry, inode, and page budgets are already
aligned, but not inode data budget (e.g., data in special
devices or symlinks). Do this for inode data as well.
Also, add corresponding debugging checks.
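
A minimal sketch of rounding a length up to a multiple of 8, the operation the budget has to apply. The helper name is hypothetical; the kernel has its own ALIGN() macro for this.

```c
#include <stddef.h>

/* Round len up to the next multiple of 8 (hypothetical helper name). */
static inline size_t toy_align8(size_t len)
{
    /* Adding 7 and clearing the low three bits rounds up to 8. */
    return (len + 7) & ~(size_t)7;
}
```

For example, a 13-byte symlink target would be budgeted as 16 bytes, matching the aligned length of the node that is eventually written.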

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:35:16 +03:00
Artem Bityutskiy
547000da64 UBIFS: improve budgeting checks
Budgeting is a crucial UBIFS subsystem - add more assertions
to improve requests checking. This is not compiled in when
UBIFS debugging is disabled.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:34:27 +03:00
Adrian Hunter
f769108424 UBIFS: correct orphan deletion order
The debug function that checks orphans does so using the
TNC mutex. That means it will not see a correct picture
if the inode is removed from the orphan tree before it is
removed from TNC.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-08-13 11:32:53 +03:00
Adrian Hunter
7d62ff2c39 UBIFS: fix typos in comments
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-08-13 11:32:21 +03:00
Adrian Hunter
bc813355c7 UBIFS: do not union creat_sqnum and del_cmtno
The values in these two fields need to be preserved independently
and so a union cannot be used.
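
The problem with the union can be shown in a few lines. The structs below are hypothetical reductions that reuse the two field names from the subject line; they are not the real UBIFS inode structure.

```c
#include <stdint.h>

/* Both fields share storage: writing one overwrites the other. */
struct toy_overlapped {
    union {
        uint64_t creat_sqnum;
        uint64_t del_cmtno;
    } u;
};

/* Separate storage: both values survive independently. */
struct toy_separate {
    uint64_t creat_sqnum;
    uint64_t del_cmtno;
};

/* Store both values, then read creat_sqnum back through the union. */
uint64_t toy_overlapped_creat(uint64_t creat, uint64_t del)
{
    struct toy_overlapped o;

    o.u.creat_sqnum = creat;
    o.u.del_cmtno = del;      /* clobbers creat_sqnum */
    return o.u.creat_sqnum;
}

/* The same sequence with separate fields. */
uint64_t toy_separate_creat(uint64_t creat, uint64_t del)
{
    struct toy_separate s;

    s.creat_sqnum = creat;
    s.del_cmtno = del;
    return s.creat_sqnum;
}
```

With the union, the creation sequence number read back is whatever was last stored as the deletion commit number.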

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-08-13 11:30:04 +03:00
Artem Bityutskiy
de94eb558b UBIFS: optimize deletions
Every time anything is deleted, UBIFS writes the deletion inode
node twice - once in 'ubifs_jnl_update()' and the second time in
'ubifs_jnl_write_inode()'. However, the second write is not needed
if no commit happened after 'ubifs_jnl_update()'. This patch
checks that condition and avoids writing the deletion inode for
the second time.
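
One way to picture the condition is sketched below. It assumes that the commit number in effect at the first write is remembered in a per-inode field (del_cmtno, the name used in the union fix listed above) and compared with the current commit number; the actual patch may express the test differently, and the struct names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reductions of the filesystem and per-inode state. */
struct toy_fs {
    uint64_t cmt_no;      /* current commit number */
};

struct toy_inode {
    uint64_t del_cmtno;   /* commit number when the deletion inode was written */
};

/*
 * The second write is only needed if a commit happened since the first
 * one, i.e. if the commit number has moved on.
 */
bool toy_need_second_write(const struct toy_fs *c, const struct toy_inode *ui)
{
    return c->cmt_no != ui->del_cmtno;
}
```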

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:28:44 +03:00
Artem Bityutskiy
014eb04b03 UBIFS: increment commit number earlier
Increment the commit number at the beginning of the commit, instead
of doing this after the commit. This is needed for further
optimizations.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:27:47 +03:00
Artem Bityutskiy
fd6c6b51e3 UBIFS: remove another unneeded function parameter
The 'last_reference' parameter of 'pack_inode()' is not really
needed because 'inode->i_nlink' may be tested instead. Zap it.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:27:10 +03:00
Artem Bityutskiy
1f28681ad3 UBIFS: remove unneeded function parameter
Simplify 'ubifs_jnl_write_inode()' by removing the 'deletion'
parameter which is not really needed because we may test
inode->i_nlink and check whether this is a deletion or not.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:26:25 +03:00
Artem Bityutskiy
fbfa6c884a UBIFS: do not write orphans back
Orphan inodes are deleted inodes which will disappear after FS
re-mount. There is no need to write orphan inodes back, because
they are not needed on the flash media.

So optimize orphans a little by not writing them back. Just mark
them as clean, free the budget, and report success to VFS.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:25:27 +03:00
Adrian Hunter
ff46d7b3e0 UBIFS: make ubifs_ro_mode() not inline
We use ubifs_ro_mode() quite a lot, and not in fast-path, so
there is no reason to blow the code up by having it inlined.
Also, we usually want R/O mode change to be seen by other
CPUs as soon as possible, so when we make this a function
call, we will automatically have a memory barrier.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:24:26 +03:00
Adrian Hunter
2fb42b11f6 UBIFS: ensure UBIFS switches to read-only on error
UBI transparently handles write errors by automatically copying
and remapping the affected eraseblock. If UBI is unable to do
that, for example its pool of eraseblocks reserved for bad block
handling is empty, then the error is propagated to UBIFS. UBIFS
must protect the media from falling into an inconsistent state
by immediately switching to read-only mode. In the case of log
updates, this was not being done.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-08-13 11:24:00 +03:00
Adrian Hunter
16dfd804b4 UBIFS: fix error return in failure mode
UBIFS recovery testing debug facility simulates media failures.
When simulating an IO error, the error code returned must be
-EIO, but that was not always the case if the user switched off
the debug recovery testing option at the same time.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-08-13 11:22:41 +03:00
Artem Bityutskiy
1e0f358e29 UBIFS: free budget in delete_inode as well
Although the inode is marked as clean when it is being deleted,
it might stay and be used as orphan, and be marked as dirty.
So we have to free the budget when we delete it.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:22:09 +03:00
Artem Bityutskiy
7d32c2bb14 UBIFS: improve debugging
1. Print inode mode in some of the debugging messages
2. Add a few more useful assertions

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:20:07 +03:00
Artem Bityutskiy
182854b46f UBIFS: fix budgeting calculations
The 'ubifs_release_dirty_inode_budget()' was buggy and incorrectly
freed the budget, which led to not freeing all dirty data budget.
This patch fixes that.

Also, this patch fixes ubifs_mkdir() which passed 1 in dirty_ino_d,
which makes no sense. Well, it is harmless though.

Also, add a few more useful assertions and improve a few debugging
messages.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:20:05 +03:00
Artem Bityutskiy
ce769caa50 UBIFS: print volume name as well
We encourage people to mount using volume name, not device
numbers. So print the name of the mounted UBI volume, not just
IDs.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:15:50 +03:00
Lachlan McIlroy
c6a7b0f8a4 [XFS] Fix use after free in xfs_log_done().
The ticket allocation code got reworked in 2.6.26 and we now free tickets
whereas before we used to cache them so the use-after-free went
undetected.

SGI-PV: 985525

SGI-Modid: xfs-linux-melb:xfs-kern:31877a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-08-13 16:52:50 +10:00
Ruben Porras
c94312de22 [XFS] Make xfs_bmap_*_count_leaves void.
xfs_bmap_count_leaves and xfs_bmap_disk_count_leaves always return 0,
so make them void.

SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31844a

Signed-off-by: Ruben Porras <ruben.porras@linworks.de>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13 16:52:25 +10:00
Lachlan McIlroy
5695ef46ef [XFS] Use KM_NOFS for debug trace buffers
Use KM_NOFS to prevent recursion back into the filesystem which can cause
deadlocks.

In the case of xfs_iread() we hold the lock on the inode cluster buffer
while allocating memory for the trace buffers. If we recurse back into XFS
to flush data, that may require a transaction to allocate extents, which
needs log space. This can deadlock with the xfsaild thread, which can't
push the tail of the log because it is trying to get the inode cluster
buffer lock.

SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31838a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-08-13 16:51:57 +10:00
Christoph Hellwig
d62c251fe4 [XFS] use KM_MAYFAIL in xfs_mountfs
Use KM_MAYFAIL for the m_perag allocation; we can deal with the error
easily and blocking forever during mount is not a good idea either.

SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31837a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13 16:51:29 +10:00
Christoph Hellwig
ff4f038c6b [XFS] refactor xfs_mount_free
xfs_mount_free mostly frees the perag data, which is something that is
duplicated in the mount error path.

Move the XFS_QM_DONE call to the caller and remove the useless
mutex_destroy/spinlock_destroy calls so that we can re-use it for the
mount error path. Also rename it to xfs_free_perag to reflect what it
does.

SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31836a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13 16:50:47 +10:00
Christoph Hellwig
6203300e5e [XFS] don't call xfs_freesb from xfs_unmountfs
xfs_readsb is called before xfs_mount so xfs_freesb should be called after
xfs_unmountfs, too. This means it now happens after a few things during
xfs_unmount which all have nothing to do with the superblock.

SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31835a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13 16:50:21 +10:00
Christoph Hellwig
41b5c2e77a [XFS] xfs_unmountfs should return void
xfs_unmountfs can't and shouldn't return errors so declare it as returning
void.

SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31833a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13 16:49:57 +10:00
Christoph Hellwig
4249023a5d [XFS] cleanup xfs_mountfs
Remove all the useless flags and code keyed off them in xfs_mountfs.

SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31831a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13 16:49:32 +10:00
Christoph Hellwig
77508ec8e6 [XFS] move root inode IRELE into xfs_unmountfs
The root inode is allocated in xfs_mountfs so it should be released in
xfs_unmountfs. For the unmount case that means we do it after the
xfs_sync(mp, SYNC_WAIT | SYNC_CLOSE) in the forced shutdown case and the
dmapi unmount event. Note that both reference the rip variable which might
be freed by that time in case inode flushing has kicked in, so strictly
speaking this might count as a bug fix.

SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31830a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13 16:49:04 +10:00
Christoph Hellwig
3a76c1ea07 [XFS] stop using file_update_time
xfs_ichgtime updates the xfs_inode and Linux inode timestamps just fine, no
need to call file_update_time and then copy the values over to the XFS
inode. The only additional things in file_update_time are checks not
applicable to the write path.

SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31829a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-08-13 16:48:12 +10:00
Christoph Hellwig
8e5975c82f [XFS] optimize xfs_ichgtime
Port a little optimization from file_update_time to xfs_ichgtime, and only
update the timestamp and mark the inode dirty if the timestamp actually
changes in the timer tick resolution supported by the running kernel.
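
The idea can be sketched in userspace C as below. The struct and function are hypothetical and handle a single timestamp; the real xfs_ichgtime deals with several timestamps selected by flags.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical inode with one timestamp and a dirty flag. */
struct toy_inode {
    struct timespec mtime;
    bool dirty;
};

/*
 * Update the timestamp only if it differs from the current value.
 * Returns true if the inode was modified and marked dirty.
 */
bool toy_update_mtime(struct toy_inode *ip, struct timespec now)
{
    if (ip->mtime.tv_sec == now.tv_sec &&
        ip->mtime.tv_nsec == now.tv_nsec)
        return false;       /* same tick: nothing to write back */

    ip->mtime = now;
    ip->dirty = true;
    return true;
}
```

Repeated writes within one timer tick then leave the inode clean after the first update instead of dirtying it every time.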

SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31827a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13 16:45:13 +10:00
Christoph Hellwig
dff35fd41f [XFS] update timestamp in xfs_ialloc manually
In xfs_ialloc we just want to set all timestamps to the current time. We
don't need to mark the inode dirty like xfs_ichgtime does, and we don't
need nor want the optimizations in xfs_ichgtime that I will introduce in
the next patch.

So just opencode the timestamp update in xfs_ialloc, and remove the now
unused XFS_ICHGTIME_ACC case in xfs_ichgtime.

SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31825a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13 16:44:15 +10:00
David Chinner
ab4a9b04a3 [XFS] remove the sema_t from XFS.
Now that all users of the sema_t are gone from XFS we can finally kill it.

SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31823a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13 16:42:10 +10:00
David Chinner
e1f49cf20c [XFS] replace dquot flush semaphore with a completion
Use the new completion flush code to implement the dquot flush lock.
Removes one of the final users of semaphores in the XFS code base.

SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31822a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13 16:41:43 +10:00
David Chinner
c63942d3ee [XFS] replace inode flush semaphore with a completion
Use the new completion flush code to implement the inode flush lock.
Removes one of the final users of semaphores in the XFS code base.

SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31817a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13 16:41:16 +10:00
David Chinner
b4dd330b9e [XFS] replace the XFS buf iodone semaphore with a completion
The xfs_buf_t b_iodonesema is really just a semaphore that wants to be a
completion. Change it to a completion and remove the last user of the
sema_t from XFS.

SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31815a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-08-13 16:36:11 +10:00