Commit Graph

12542 Commits

Author SHA1 Message Date
Jan Kara
cc33412fb1 quota: Improve locking
We implement dqget() and dqput() that need neither dqonoff_mutex nor dqptr_sem.
Then move dqget() and dqput() calls so that they are not called from under
dqptr_sem. This is important because filesystem callbacks aren't called from
under dqptr_sem which used to cause *lots* of problems with lock ranking
(and with OCFS2 they became close to unsolvable).

The patch also removes two functions which were introduced solely because OCFS2
needed them to cope with the old locking scheme. As time showed, they were not
enough for OCFS2 anyway and it would be unnecessary work to adapt them to the
new locking scheme in which they aren't needed.  As a result OCFS2 needs the
following patch to compile properly with quotas.  Sorry to any bisecters which
hit this in advance.

Signed-off-by: Jan Kara <jack@suse.cz>
2009-01-16 18:02:10 +01:00
Jan Kara
6b7021ef7e ext2: also update the inode on disk when dir is IS_DIRSYNC
We used to just write changed page for IS_DIRSYNC inodes.  But we also
have to update the directory inode itself just for the case that we've
allocated a new block and changed i_size.

[akpm@linux-foundation.org: still sync the data page]
Signed-off-by: Jan Kara <jack@suse.cz>
Tested-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-15 16:39:42 -08:00
Qinghuang Feng
1bcbf31337 btrfs & squashfs: Move btrfs and squashfsto's magic number to <linux/magic.h>
Use the standard magic.h for btrfs and squashfs.

Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-15 16:39:38 -08:00
Linus Torvalds
bca268565f Merge branch 'syscalls' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'syscalls' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: (44 commits)
  [CVE-2009-0029] s390 specific system call wrappers
  [CVE-2009-0029] System call wrappers part 33
  [CVE-2009-0029] System call wrappers part 32
  [CVE-2009-0029] System call wrappers part 31
  [CVE-2009-0029] System call wrappers part 30
  [CVE-2009-0029] System call wrappers part 29
  [CVE-2009-0029] System call wrappers part 28
  [CVE-2009-0029] System call wrappers part 27
  [CVE-2009-0029] System call wrappers part 26
  [CVE-2009-0029] System call wrappers part 25
  [CVE-2009-0029] System call wrappers part 24
  [CVE-2009-0029] System call wrappers part 23
  [CVE-2009-0029] System call wrappers part 22
  [CVE-2009-0029] System call wrappers part 21
  [CVE-2009-0029] System call wrappers part 20
  [CVE-2009-0029] System call wrappers part 19
  [CVE-2009-0029] System call wrappers part 18
  [CVE-2009-0029] System call wrappers part 17
  [CVE-2009-0029] System call wrappers part 16
  [CVE-2009-0029] System call wrappers part 15
  ...
2009-01-14 19:58:40 -08:00
Heiko Carstens
2b66421995 [CVE-2009-0029] System call wrappers part 33
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:32 +01:00
Heiko Carstens
d4e82042c4 [CVE-2009-0029] System call wrappers part 32
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:31 +01:00
Heiko Carstens
836f92adf1 [CVE-2009-0029] System call wrappers part 31
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:31 +01:00
Heiko Carstens
6559eed8ca [CVE-2009-0029] System call wrappers part 30
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:30 +01:00
Heiko Carstens
2e4d0924eb [CVE-2009-0029] System call wrappers part 29
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:30 +01:00
Heiko Carstens
938bb9f5e8 [CVE-2009-0029] System call wrappers part 28
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:30 +01:00
Heiko Carstens
1e7bfb2134 [CVE-2009-0029] System call wrappers part 27
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:29 +01:00
Heiko Carstens
5a8a82b1d3 [CVE-2009-0029] System call wrappers part 23
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:28 +01:00
Heiko Carstens
20f37034fb [CVE-2009-0029] System call wrappers part 21
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:26 +01:00
Heiko Carstens
3cdad42884 [CVE-2009-0029] System call wrappers part 20
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:26 +01:00
Heiko Carstens
003d7ab479 [CVE-2009-0029] System call wrappers part 19
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:26 +01:00
Heiko Carstens
ca013e945b [CVE-2009-0029] System call wrappers part 17
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:25 +01:00
Heiko Carstens
002c8976ee [CVE-2009-0029] System call wrappers part 16
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:25 +01:00
Heiko Carstens
a26eab2400 [CVE-2009-0029] System call wrappers part 15
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:24 +01:00
Heiko Carstens
3480b25743 [CVE-2009-0029] System call wrappers part 14
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:24 +01:00
Heiko Carstens
6a6160a7b5 [CVE-2009-0029] System call wrappers part 13
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:23 +01:00
Heiko Carstens
64fd1de3d8 [CVE-2009-0029] System call wrappers part 12
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:23 +01:00
Heiko Carstens
257ac264d6 [CVE-2009-0029] System call wrappers part 11
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:23 +01:00
Heiko Carstens
bdc480e3be [CVE-2009-0029] System call wrappers part 10
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:22 +01:00
Heiko Carstens
a5f8fa9e9b [CVE-2009-0029] System call wrappers part 09
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:21 +01:00
Heiko Carstens
6673e0c3fb [CVE-2009-0029] System call wrapper special cases
System calls with an unsigned long long argument can't be converted with
the standard wrappers since that would include a cast to long, which in
turn means that we would lose the upper 32 bit on 32 bit architectures.
Also semctl can't use the standard wrapper since it has a 'union'
parameter.

So we handle them as special case and add some extra wrappers instead.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:18 +01:00
Heiko Carstens
c9da9f2129 [CVE-2009-0029] Make sys_pselect7 static
Not a single architecture has wired up sys_pselect7 plus it is the
only system call with seven parameters. Just make it static and
rename it to do_pselect which will do the work for sys_pselect6.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:16 +01:00
Heiko Carstens
1134723e96 [CVE-2009-0029] Remove __attribute__((weak)) from sys_pipe/sys_pipe2
Remove __attribute__((weak)) from common code sys_pipe implemantation.
IA64, ALPHA, SUPERH (32bit) and SPARC (32bit) have own implemantations
with the same name. Just rename them.
For sys_pipe2 there is no architecture specific implementation.

Cc: Richard Henderson <rth@twiddle.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:15 +01:00
Heiko Carstens
e55380edf6 [CVE-2009-0029] Rename old_readdir to sys_old_readdir
This way it matches the generic system call name convention.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:15 +01:00
Heiko Carstens
2ed7c03ec1 [CVE-2009-0029] Convert all system calls to return a long
Convert all system calls to return a long. This should be a NOP since all
converted types should have the same size anyway.
With the exception of sys_exit_group which returned void. But that doesn't
matter since the system call doesn't return.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:14 +01:00
Lachlan McIlroy
cb7a97d015 Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into for-linus 2009-01-14 16:29:51 +11:00
Bernd Schmidt
62568510b8 Fix timeouts in sys_pselect7
Since we (Analog Devices) updated our Blackfin kernel to 2.6.28, we've
seen occasional 5-second hangs from telnet.  telnetd calls select with a
NULL timeout, but with the new kernel, the system call occasionally
returns 0, which causes telnet to call sleep (5).  This did not happen
with earlier kernels.

The code in sys_pselect7 looks a bit strange, in particular the variable
"to" is initialized to NULL, then changed if a non-null timeout was
passed in, but not used further.  It needs to be passed to
core_sys_select instead of &end_time.

This bug was introduced by 8ff3e8e85f
("select: switch select() and poll() over to hrtimers").

Signed-off-by: Bernd Schmidt <bernd.schmidt@analog.com>
Reviewed-by: Ulrich Drepper <drepper@redhat.com>
Tested-by: Robin Getz <rgetz@blackfin.uclinux.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-13 14:45:17 -08:00
Linus Torvalds
c69e8839c2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
  dlm: change rsbtbl rwlock to spinlock
  dlm: fix seq_file usage in debugfs lock dump
2009-01-12 15:54:27 -08:00
Linus Torvalds
0176260fc3 btrfs: fix for write_super_lockfs/unlockfs error handling
Commit c4be0c1dc4 added the ability for
write_super_lockfs to return errors, and renamed them to match.  But
btrfs didn't get converted.

Do the minimal conversion to make it compile again.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-10 06:09:52 -08:00
Takashi Sato
8e961870bb filesystem freeze: remove XFS specific ioctl interfaces for freeze feature
It removes XFS specific ioctl interfaces and request codes
for freeze feature.

This patch has been supplied by David Chinner.

Signed-off-by: Dave Chinner <dgc@sgi.com>
Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: <xfs-masters@oss.sgi.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-09 16:54:42 -08:00
Takashi Sato
fcccf50254 filesystem freeze: implement generic freeze feature
The ioctls for the generic freeze feature are below.
o Freeze the filesystem
  int ioctl(int fd, int FIFREEZE, arg)
    fd: The file descriptor of the mountpoint
    FIFREEZE: request code for the freeze
    arg: Ignored
    Return value: 0 if the operation succeeds. Otherwise, -1

o Unfreeze the filesystem
  int ioctl(int fd, int FITHAW, arg)
    fd: The file descriptor of the mountpoint
    FITHAW: request code for unfreeze
    arg: Ignored
    Return value: 0 if the operation succeeds. Otherwise, -1
    Error number: If the filesystem has already been unfrozen,
                  errno is set to EINVAL.

[akpm@linux-foundation.org: fix CONFIG_BLOCK=n]
Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com>
Signed-off-by: Masayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com>
Cc: <xfs-masters@oss.sgi.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-09 16:54:42 -08:00
Takashi Sato
c4be0c1dc4 filesystem freeze: add error handling of write_super_lockfs/unlockfs
Currently, ext3 in mainline Linux doesn't have the freeze feature which
suspends write requests.  So, we cannot take a backup which keeps the
filesystem's consistency with the storage device's features (snapshot and
replication) while it is mounted.

In many case, a commercial filesystem (e.g.  VxFS) has the freeze feature
and it would be used to get the consistent backup.

If Linux's standard filesystem ext3 has the freeze feature, we can do it
without a commercial filesystem.

So I have implemented the ioctls of the freeze feature.
I think we can take the consistent backup with the following steps.
1. Freeze the filesystem with the freeze ioctl.
2. Separate the replication volume or create the snapshot
   with the storage device's feature.
3. Unfreeze the filesystem with the unfreeze ioctl.
4. Take the backup from the separated replication volume
   or the snapshot.

This patch:

VFS:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they can return an error.
Rename write_super_lockfs and unlockfs of the super block operation
freeze_fs and unfreeze_fs to avoid a confusion.

ext3, ext4, xfs, gfs2, jfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that write_super_lockfs returns an error if needed,
and unlockfs always returns 0.

reiserfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they always return 0 (success) to keep a current behavior.

Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com>
Signed-off-by: Masayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com>
Cc: <xfs-masters@oss.sgi.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-09 16:54:42 -08:00
David Brownell
2d96d1053d CORE_DUMP_DEFAULT_ELF_HEADERS depends on ELF_CORE
Kernels that don't support ELF coredumps at all surely can't be supporting
new partial-segment flavored ELF coredumps ...  don't make folk answer
Kconfig questions about that flavor.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-09 16:54:41 -08:00
Linus Torvalds
9a100a4464 Merge git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async-2
* git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async-2:
  async: make async a command line option for now
  partial revert of asynchronous inode delete
2009-01-09 15:32:26 -08:00
Linus Torvalds
32b838b8cf Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6:
  [JFFS2] remove junk prototypes
2009-01-09 15:29:04 -08:00
Linus Torvalds
31aeb6c815 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
  MAINTAINERS: squashfs entry
  Squashfs: documentation
  Squashfs: initrd support
  Squashfs: Kconfig entry
  Squashfs: Makefiles
  Squashfs: header files
  Squashfs: block operations
  Squashfs: cache operations
  Squashfs: uid/gid lookup operations
  Squashfs: fragment block operations
  Squashfs: export operations
  Squashfs: super block operations
  Squashfs: symlink operations
  Squashfs: regular file operations
  Squashfs: directory readdir operations
  Squashfs: directory lookup operations
  Squashfs: inode operations
2009-01-09 15:18:49 -08:00
Linus Torvalds
c40f6f8bbc Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-nommu
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-nommu:
  NOMMU: Support XIP on initramfs
  NOMMU: Teach kobjsize() about VMA regions.
  FLAT: Don't attempt to expand the userspace stack to fill the space allocated
  FDPIC: Don't attempt to expand the userspace stack to fill the space allocated
  NOMMU: Improve procfs output using per-MM VMAs
  NOMMU: Make mmap allocation page trimming behaviour configurable.
  NOMMU: Make VMAs per MM as for MMU-mode linux
  NOMMU: Delete askedalloc and realalloc variables
  NOMMU: Rename ARM's struct vm_region
  NOMMU: Fix cleanup handling in ramfs_nommu_get_umapped_area()
2009-01-09 14:00:58 -08:00
Arjan van de Ven
b32714ba29 partial revert of asynchronous inode delete
let the core of this one bake in -next as well, but leave
some of the infrastructure in place.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2009-01-09 13:15:49 -08:00
Artem Bityutskiy
ab5610b434 [JFFS2] remove junk prototypes
'rb_prev()', 'rb_next()' and 'rb_replace_node()' are declared in
include/linux/rbtree.h, no need for JFFS2 to re-declare them. I
believe these are left-overs from the old days when the common
RB tree code did not have those call and JFFS2 had private
implementation.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-09 21:05:21 +00:00
Linus Torvalds
73d59314e6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (864 commits)
  Btrfs: explicitly mark the tree log root for writeback
  Btrfs: Drop the hardware crc32c asm code
  Btrfs: Add Documentation/filesystem/btrfs.txt, remove old COPYING
  Btrfs: kmap_atomic(KM_USER0) is safe for btrfs_readpage_end_io_hook
  Btrfs: Don't use kmap_atomic(..., KM_IRQ0) during checksum verifies
  Btrfs: tree logging checksum fixes
  Btrfs: don't change file extent's ram_bytes in btrfs_drop_extents
  Btrfs: Use btrfs_join_transaction to avoid deadlocks during snapshot creation
  Btrfs: drop remaining LINUX_KERNEL_VERSION checks and compat code
  Btrfs: drop EXPORT symbols from extent_io.c
  Btrfs: Fix checkpatch.pl warnings
  Btrfs: Fix free block discard calls down to the block layer
  Btrfs: avoid orphan inode caused by log replay
  Btrfs: avoid potential super block corruption
  Btrfs: do not call kfree if kmalloc failed in btrfs_sysfs_add_super
  Btrfs: fix a memory leak in btrfs_get_sb
  Btrfs: Fix typo in clear_state_cb
  Btrfs: Fix memset length in btrfs_file_write
  Btrfs: update directory's size when creating subvol/snapshot
  Btrfs: add permission checks to the ioctls
  ...
2009-01-09 13:01:38 -08:00
Linus Torvalds
6ddaab20c3 Merge branch 'for-2.6.29' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.29' of git://git.kernel.dk/linux-2.6-block:
  block: fix bug in ptbl lookup cache
2009-01-09 12:57:34 -08:00
Neil Brown
54b0d12769 block: fix bug in ptbl lookup cache
Neil writes:

   Hi Jens,

    I've found a little bug for you.  It was introduced by
        a6f23657d3

        block: add one-hit cache for disk partition lookup

    and has the effect of killing my machine whenever I try to assemble
    an md array :-(
    One of the devices in the array has partitions, and mdadm always
    deletes partitions before putting a whole-device in an array (as it
    can cause confusion).  The next IO to that device locks the machine.
    I don't really understand exactly why it locks up, but it happens in
    disk_map_sector_rcu().  This patch fixes it.

Which is due to a missing clear of the (now) stale partition lookup
data. So clear that when we delete a partition.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-01-09 21:46:13 +01:00
Linus Torvalds
7c51d57e9d Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6: (67 commits)
  [MTD] [MAPS] Fix printk format warning in nettel.c
  [MTD] [NAND] add cmdline parsing (mtdparts=) support to cafe_nand
  [MTD] CFI: remove major/minor version check for command set 0x0002
  [MTD] [NAND] ndfc driver
  [MTD] [TESTS] Fix some size_t printk format warnings
  [MTD] LPDDR Makefile and KConfig
  [MTD] LPDDR extended physmap driver to support LPDDR flash
  [MTD] LPDDR added new pfow_base parameter
  [MTD] LPDDR Command set driver
  [MTD] LPDDR PFOW definition
  [MTD] LPDDR QINFO records definitions
  [MTD] LPDDR qinfo probing.
  [MTD] [NAND] pxa3xx: convert from ns to clock ticks more accurately
  [MTD] [NAND] pxa3xx: fix non-page-aligned reads
  [MTD] [NAND] fix nandsim sched.h references
  [MTD] [NAND] alauda: use USB API functions rather than constants
  [MTD] struct device - replace bus_id with dev_name(), dev_set_name()
  [MTD] fix m25p80 64-bit divisions
  [MTD] fix dataflash 64-bit divisions
  [MTD] [NAND] Set the fsl elbc ECCM according the settings in bootloader.
  ...

Fixed up trivial debug conflicts in drivers/mtd/devices/{m25p80.c,mtd_dataflash.c}
2009-01-09 12:37:15 -08:00
Chris Mason
e293e97e36 Btrfs: explicitly mark the tree log root for writeback
Each subvolume has an extent_state_tree used to mark metadata
that needs to be sent to disk while syncing the tree.  This is
used in addition to the dirty bits on the pages themselves so that
a single subvolume can be sent to disk efficiently in disk order.

Normally this marking happens in btrfs_alloc_free_block, which also does
special recording of dirty tree blocks for the tree log roots.

Yan Zheng noticed that when the root of the log tree is allocated, it is added
to the wrong writeback list.  The fix used here is to explicitly set
it dirty as part of tree log creation.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-09 13:14:17 -05:00
Nick Piggin
0087167c9d [XFS] use scalable vmap API
Implement XFS's large buffer support with the new vmap APIs. See the vmap
rewrite (db64fe02) for some numbers. The biggest improvement that comes from
using the new APIs is avoiding the global KVA allocation lock on every call.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2009-01-09 17:09:47 +11:00
Nick Piggin
958f8c0e4f [XFS] remove old vmap cache
XFS's vmap batching simply defers a number (up to 64) of vunmaps, and keeps
track of them in a list. To purge the batch, it just goes through the list and
calls vunamp on each one. This is pretty poor: a global TLB flush is generally
still performed on each vunmap, with the most expensive parts of the operation
being the broadcast IPIs and locking involved in the SMP callouts, and the
locking involved in the vmap management -- none of these are avoided by just
batching up the calls. I'm actually surprised it ever made much difference.
(Now that the lazy vmap allocator is upstream, this description is not quite
right, but the vunmap batching still doesn't seem to do much)

Rip all this logic out of XFS completely. I will improve vmap performance
and scalability directly in subsequent patch.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2009-01-09 17:09:25 +11:00