Commit Graph

463 Commits

Author SHA1 Message Date
Jens Axboe
4aff5e2333 [PATCH] Split struct request ->flags into two parts
Right now ->flags is a bit of a mess: some are request types, and
others are just modifiers. Clean this up by splitting it into
->cmd_type and ->cmd_flags. This allows introduction of generic
Linux block message types, useful for sending generic Linux commands
to block devices.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:23:37 +02:00
Rik Snel
3c164bd815 [BLOCK] dm-crypt: trivial comment improvements
Just some minor comment nits.

- little-endian is better than low-endian
- and since it is called essiv everywere it should also be essiv
  in the comments (and not ess_iv)

Signed-off-by: Rik Snel <rsnel@cube.dyndns.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-09-21 11:46:27 +10:00
Herbert Xu
3505868791 [CRYPTO] users: Use crypto_hash interface instead of crypto_digest
This patch converts all remaining crypto_digest users to use the new
crypto_hash interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-09-21 11:46:21 +10:00
Herbert Xu
d1806f6a97 [BLOCK] dm-crypt: Use block ciphers where applicable
This patch converts dm-crypt to use the new block cipher type where
applicable.  It also changes simple cipher operations to use the new
encrypt_one/decrypt_one interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-09-21 11:46:13 +10:00
NeilBrown
ddac7c7e3a [PATCH] md: Fix issues with referencing rdev in md/raid1
We need to be careful when referencing mirrors[i].rdev.  It can disappear
under us at various times.

So:
  fix a couple of problem places.
  comment a couple of non-problem places
  move an 'atomic_add' which deferences rdev down a little
    way to some where where it is sure to not be NULL.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-01 11:39:08 -07:00
NeilBrown
6394cca548 [PATCH] md: fix recent breakage of md/raid1 array checking
A recent patch broke the ability to do a user-request check of a raid1.
This patch fixes the breakage and also moves a comment that was dislocated
by the same patch.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27 11:01:31 -07:00
NeilBrown
8469219596 [PATCH] md: avoid backward event updates in md superblock when degraded.
If we
  - shut down a clean array,
  - restart with one (or more) drive(s) missing
  - make some changes
  - pause, so that they array gets marked 'clean',
the event count on the superblock of included drives
will be the same as that of the removed drives.
So adding the removed drive back in will cause it
to be included with no resync.

To avoid this, we only update the eventcount backwards when the array
is not degraded.  In this case there can (should) be no non-connected
drives that we can get confused with, and this is the particular case
where updating-backwards is valuable.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27 11:01:31 -07:00
Daniel Kobras
c06aad854f [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
On an nForce4-equipped machine with two SATA disk in raid1 setup using dmraid,
we experienced frequent deadlock of the system under high i/o load.  'cat
/dev/zero > ~/zero' was the most reliable way to reproduce them: Randomly
after a few GB, 'cp' would be left in 'D' state along with kjournald and
kmirrord.  The functions cp and kjournald were blocked in did vary, but
kmirrord's wchan always pointed to 'mempool_alloc()'.  We've seen this pattern
on 2.6.15 and 2.6.17 kernels.  http://lkml.org/lkml/2005/4/20/142 indicates
that this problem has been around even before.

So much for the facts, here's my interpretation: mempool_alloc() first tries
to atomically allocate the requested memory, or falls back to hand out
preallocated chunks from the mempool.  If both fail, it puts the calling
process (kmirrord in this case) on a private waitqueue until somebody refills
the pool.  Where the only 'somebody' is kmirrord itself, so we have a
deadlock.

I worked around this problem by falling back to a (blocking) kmalloc when
before kmirrord would have ended up on the waitqueue.  This defeats part of
the benefits of using the mempool, but at least keeps the system running.  And
it could be done with a two-line change.  Note that mempool_alloc() clears the
GFP_NOIO flag internally, and only uses it to decide whether to wait or return
an error if immediate allocation fails, so the attached patch doesn't change
behaviour in the non-deadlocking case.  Path is against current git
(2.6.18-rc4), but should apply to earlier versions as well.  I've tested on
2.6.15, where this patch makes the difference between random lockup and a
stable system.

Signed-off-by: Daniel Kobras <kobras@linux.de>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27 11:01:28 -07:00
Michal Miroslaw
485311a23c [PATCH] dm: BUG/OOPS fix
Fix BUG I tripped on while testing failover and multipathing.

BUG shows up on error path in multipath_ctr() when parse_priority_group()
fails after returning at least once without error.  The fix is to
initialize m->ti early - just after alloc()ing it.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c027c3d2
*pde = 00000000
Oops: 0000 [#3]
Modules linked in: qla2xxx ext3 jbd mbcache sg ide_cd cdrom floppy
CPU:    0
EIP:    0060:[<c027c3d2>]    Not tainted VLI
EFLAGS: 00010202   (2.6.17.3 #1)
EIP is at dm_put_device+0xf/0x3b
eax: 00000001   ebx: ee4fcac0   ecx: 00000000   edx: ee4fcac0
esi: ee4fc4e0   edi: ee4fc4e0   ebp: 00000000   esp: c5db3e78
ds: 007b   es: 007b   ss: 0068
Process multipathd (pid: 15912, threadinfo=c5db2000 task=ef485a90)
Stack: ec4eda40 c02816bd ee4fc4c0 00000000 f7e89498 f883e0bc c02816f6 f7e89480
       f7e8948c c0281801 ffffffea f7e89480 f883e080 c0281ffe 00000001 00000000
       00000004 dfe9cab8 f7a693c0 f883e080 f883e0c0 ca4b99c0 c027c6ee 01400000
Call Trace:
 <c02816bd> free_pgpaths+0x31/0x45  <c02816f6> free_priority_group+0x25/0x2e
 <c0281801> free_multipath+0x35/0x67  <c0281ffe> multipath_ctr+0x123/0x12d
 <c027c6ee> dm_table_add_target+0x11e/0x18b  <c027e5b4> populate_table+0x8a/0xaf
 <c027e62b> table_load+0x52/0xf9  <c027ec23> ctl_ioctl+0xca/0xfc
 <c027e5d9> table_load+0x0/0xf9  <c0152146> do_ioctl+0x3e/0x43
 <c0152360> vfs_ioctl+0x16c/0x178  <c01523b4> sys_ioctl+0x48/0x60
 <c01029b3> syscall_call+0x7/0xb
Code: 97 f0 00 00 00 89 c1 83 c9 01 80 e2 01 0f 44 c1 88 43 14 8b 04 24 59 5b 5e 5f 5d c3 53 89 c1 89 d3 ff 4a 08 0f 94 c0 84 c0 74 2a <8b> 01 8b 10 89 d8 e8 f6 fb ff ff 8b 03 8b 53 04 89 50 04 89 02
EIP: [<c027c3d2>] dm_put_device+0xf/0x3b SS:ESP 0068:c5db3e78

Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-08-14 12:54:29 -07:00
NeilBrown
f9abd1ace4 [PATCH] md: Fix a bug that recently crept into md/linear
A recent patch that allowed linear arrays to be reconfigured on-line
allowed in a bug which results in divide by zero - not all
mddev->array_size were converted to conf->array_size.

This patch finished the conversion and fixed the bug.

The offending patch was commit 7c7546ccf6.

Thanks to Simon Kirby <sim@netnation.com> for the bug report.

Cc: Simon Kirby <sim@netnation.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-06 08:57:46 -07:00
Andrew Morton
d0a0a5ee7a [PATCH] md: fix oops in error-handling
During early MD setup (superblock reading), we don't have a personality yet.
But the error-handling code tries to dereference mddev->pers.  Fix.

Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:17 -07:00
NeilBrown
d695043259 [PATCH] md: include sector number in messages about corrected read errors
This is generally useful, but particularly helps see if it is the same sector
that always needs correcting, or different ones.

[akpm@osdl.org: fix printk warnings]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:17 -07:00
NeilBrown
67463acb64 [PATCH] md: require CAP_SYS_ADMIN for (re-)configuring md devices via sysfs
The ioctl requires CAP_SYS_ADMIN, so sysfs should too.  Note that we don't
require CAP_SYS_ADMIN for reading attributes even though the ioctl does.
There is no reason to limit the read access, and much of the information is
already available via /proc/mdstat

Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:17 -07:00
NeilBrown
80ca3a44f5 [PATCH] md: unify usage of symbolic names for perms
Some places we use number (0660) someplaces names (S_IRUGO).  Change all
numbers to be names, and change 0655 to be what it should be.

Also make some formatting more consistent.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:17 -07:00
NeilBrown
5e3db645f8 [PATCH] md: fix usage of wrong variable in raid1
Though it rarely matters, we should be using 's' rather than r1_bio->sector
here.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:17 -07:00
NeilBrown
ae3c20ccf8 [PATCH] md: fix some small races in bitmap plugging in raid5
The comment gives more details, but I didn't quite have the sequencing write,
so there was room for races to leave bits unset in the on-disk bitmap for
short periods of time.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:17 -07:00
NeilBrown
7c785b7a18 [PATCH] md: fix a plug/unplug race in raid5
When a device is unplugged, requests are moved from one or two (depending on
whether a bitmap is in use) queues to the main request queue.

So whenever requests are put on either of those queues, we should make sure
the raid5 array is 'plugged'.  However we don't.  We currently plug the raid5
queue just before putting requests on queues, so there is room for a race.  If
something unplugs the queue at just the wrong time, requests will be left on
the queue and nothing will want to unplug them.  Normally something else will
plug and unplug the queue fairly soon, but there is a risk that nothing will.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:16 -07:00
NeilBrown
ff4e8d9a9f [PATCH] md: fix resync speed calculation for restarted resyncs
We introduced 'io_sectors' recently so we could count the sectors that causes
io during resync separate from sectors which didn't cause IO - there can be a
difference if a bitmap is being used to accelerate resync.

However when a speed is reported, we find the number of sectors processed
recently by subtracting an oldish io_sectors count from a current
'curr_resync' count.  This is wrong because curr_resync counts all sectors,
not just io sectors.

So, add a field to mddev to store the curren io_sectors separately from
curr_resync, and use that in the calculations.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:16 -07:00
NeilBrown
0b8c9de05c [PATCH] md: delay starting md threads until array is completely setup
When an array is started we start one or two threads (two if there is a
reshape or recovery that needs to be completed).

We currently start these *before* the array is completely set up and in
particular before queue->queuedata is set.  If the thread actually starts
very quickly on another CPU, we can end up dereferencing queue->queuedata
and oops.

This patch also makes sure we don't try to start a recovery if a reshape is
being restarted.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:16 -07:00
NeilBrown
31b65a0d38 [PATCH] md: set desc_nr correctly for version-1 superblocks
This has to be done in ->load_super, not ->validate_super

Without this, hot-adding devices to an array doesn't always
work right - though there is a work around in mdadm-2.5.2 to
make this less of an issue.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:16 -07:00
NeilBrown
f4370781d8 [PATCH] md: possible fix for unplug problem
I have reports of a problem with raid5 which turns out to be because the raid5
device gets stuck in a 'plugged' state.  This shouldn't be able to happen as
3msec after it gets plugged it should get unplugged.  However it happens
none-the-less.  This patch fixes the problem and is a reasonable thing to do,
though it might hurt performance slightly in some cases.

Until I can find the real problem, we should probably have this workaround in
place.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:16 -07:00
Ingo Molnar
663d440eaa [PATCH] lockdep: annotate blkdev nesting
Teach special (recursive) locking code to the lock validator.

Effects on non-lockdep kernels:

- the introduction of the following function variants:

  extern struct block_device *open_partition_by_devnum(dev_t, unsigned);

  extern int blkdev_put_partition(struct block_device *);

  static int
  blkdev_get_whole(struct block_device *bdev, mode_t mode, unsigned flags);

 which on non-lockdep are the same as open_by_devnum(), blkdev_put()
 and blkdev_get().

- a subclass parameter to do_open(). [unused on non-lockdep]

- a subclass parameter to __blkdev_put(), which is a new internal
  function for the main blkdev_put*() functions. [parameter unused
  on non-lockdep kernels, except for two sanity check WARN_ON()s]

these functions carry no semantical difference - they only express
object dependencies towards the lockdep subsystem.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:10 -07:00
Jörn Engel
6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Linus Torvalds
602cada851 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/devfs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/devfs-2.6: (22 commits)
  [PATCH] devfs: Remove it from the feature_removal.txt file
  [PATCH] devfs: Last little devfs cleanups throughout the kernel tree.
  [PATCH] devfs: Rename TTY_DRIVER_NO_DEVFS to TTY_DRIVER_DYNAMIC_DEV
  [PATCH] devfs: Remove the tty_driver devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the line_driver devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the videodevice devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the gendisk devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the miscdevice devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree
  [PATCH] devfs: Remove devfs_remove() function from the kernel tree
  [PATCH] devfs: Remove devfs_mk_cdev() function from the kernel tree
  [PATCH] devfs: Remove devfs_mk_bdev() function from the kernel tree
  [PATCH] devfs: Remove devfs_mk_symlink() function from the kernel tree
  [PATCH] devfs: Remove devfs_mk_dir() function from the kernel tree
  [PATCH] devfs: Remove devfs_*_tape() functions from the kernel tree
  [PATCH] devfs: Remove devfs support from the sound subsystem
  [PATCH] devfs: Remove devfs support from the ide subsystem.
  [PATCH] devfs: Remove devfs support from the serial subsystem
  [PATCH] devfs: Remove devfs from the init code
  [PATCH] devfs: Remove devfs from the partition code
  ...
2006-06-29 14:19:21 -07:00
Adrian Bunk
cfb9e32f2f [PATCH] drivers/md/raid5.c: remove an unused variable
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29 10:26:21 -07:00
Greg Kroah-Hartman
890fbae281 [PATCH] devfs: Last little devfs cleanups throughout the kernel tree.
Just removes a few unused #defines and fixes some comments due to
devfs now being gone.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:09 -07:00
Greg Kroah-Hartman
ce7b0f46bb [PATCH] devfs: Remove the gendisk devfs_name field as it's no longer needed
And remove the now unneeded number field.
Also fixes all drivers that set these fields.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:08 -07:00
Greg Kroah-Hartman
96192ff1a9 [PATCH] devfs: Remove the miscdevice devfs_name field as it's no longer needed
Also fixes all drivers that set this field.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:08 -07:00
Greg Kroah-Hartman
ff23eca3e8 [PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree
Also fixes up all files that #include it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:08 -07:00
Greg Kroah-Hartman
8ab5e4c15b [PATCH] devfs: Remove devfs_remove() function from the kernel tree
Removes the devfs_remove() function and all callers of it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:07 -07:00
Greg Kroah-Hartman
1a715c5cf9 [PATCH] devfs: Remove devfs_mk_bdev() function from the kernel tree
Removes the devfs_mk_bdev() function and all callers of it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:06 -07:00
Greg Kroah-Hartman
95dc112a57 [PATCH] devfs: Remove devfs_mk_dir() function from the kernel tree
Removes the devfs_mk_dir() function and all callers of it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:06 -07:00
Adrian Bunk
0538195424 [PATCH] drivers/md/md.c: make code static
Make needlessly global code static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:40 -07:00
NeilBrown
f655675b3f [PATCH] md: Allow the write_mostly flag to be set via sysfs
It appears in /sys/mdX/md/dev-YYY/state
and can be set or cleared by writing 'writemostly' or '-writemostly'
respectively.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:40 -07:00
NeilBrown
a94213b1fa [PATCH] md: Allow resync_start to be set and queried via sysfs
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:40 -07:00
NeilBrown
d4dbd0250e [PATCH] md: Allow raid 'layout' to be read and set via sysfs
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:39 -07:00
NeilBrown
45dc2de1e5 [PATCH] md: Allow rdev state to be set via sysfs
The md/dev-XXX/state file can now be written:

 "faulty" simulates an error on the device
 "remove" removes the device from the array (if it is not busy)

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:39 -07:00
NeilBrown
9e653b6342 [PATCH] md: Set/get state of array via sysfs
This allows the state of an md/array to be directly controlled via sysfs and
adds the ability to stop and array without tearing it down.

Array states/settings:

 clear
     No devices, no size, no level
     Equivalent to STOP_ARRAY ioctl
 inactive
     May have some settings, but array is not active
        all IO results in error
     When written, doesn't tear down array, but just stops it
 suspended (not supported yet)
     All IO requests will block. The array can be reconfigured.
     Writing this, if accepted, will block until array is quiescent
 readonly
     no resync can happen.  no superblocks get written.
     write requests fail
 read-auto
     like readonly, but behaves like 'clean' on a write request.

 clean - no pending writes, but otherwise active.
     When written to inactive array, starts without resync
     If a write request arrives then
       if metadata is known, mark 'dirty' and switch to 'active'.
       if not known, block and switch to write-pending
     If written to an active array that has pending writes, then fails.
 active
     fully active: IO and resync can be happening.
     When written to inactive array, starts with resync

 write-pending (not supported yet)
     clean, but writes are blocked waiting for 'active' to be written.

 active-idle
     like active, but no writes have been seen for a while (100msec).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:39 -07:00
NeilBrown
4254376914 [PATCH] md: Don't write dirty/clean update to spares - leave them alone
- record the 'event' count on each individual device (they
  might sometimes be slightly different now)
- add a new value for 'sb_dirty': '3' means that the super
  block only needs to be updated to record a clean<->dirty
  transition.
- Prefer odd event numbers for dirty states and even numbers
  for clean states
- Using all the above, don't update the superblock on
  a spare device if the update is just doing a clean-dirty
  transition.  To accomodate this, a transition from
  dirty back to clean might now decrement the events counter
  if nothing else has changed.

The net effect of this is that spare drives will not see any IO requests
during normal running of the array, so they can go to sleep if that is what
they want to do.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:39 -07:00
NeilBrown
07d84d109d [PATCH] md: Allow re-add to work on array without bitmaps
When an array has a bitmap, a device can be removed and re-added and only
blocks changes since the removal (as recorded in the bitmap) will be resynced.

It should be possible to do a similar thing to arrays without bitmaps.  i.e.
if a device is removed and re-added and *no* changes have been made in the
interim, then the add should not require a resync.

This patch allows that option.  This means that when assembling an array one
device at a time (e.g.  during device discovery) the array can be enabled
read-only as soon as enough devices are available, but extra devices can still
be added without causing a resync.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:39 -07:00
NeilBrown
3285edf152 [PATCH] md: Fix bug that stops raid5 resync from happening
As data_disks is *less* than raid_disks, the current test here is obviously
wrong.  And as the difference is already available in conf->max_degraded, it
makes much more sense to use that.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:39 -07:00
akpm@osdl.org
b3cc9ec76b [PATCH] md: Fix Kconfig error
RAID5 recently changed to RAID456

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:39 -07:00
Justin Piszcz
4d2554d045 [PATCH] md: md Kconfig speeling feex
I was experimenting with Linux SW raid today and found a spelling error when
reading the help menus...  (and fly spell found more).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:39 -07:00
NeilBrown
8838832830 [PATCH] md: Calculate correct array size for raid10 in new offset mode
The size calculation made assumtion which the new offset mode didn't
follow.  This gets the size right in all cases.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:39 -07:00
NeilBrown
ce25c31bdd [PATCH] md: Change md/bitmap file handling to use bmap to file blocks-fix
Fix problems with new bmap based access to bitmap files.

1/ When not using a file based bitmap, attach a NULL list of buffers
   to each page so the common free_buffer routine can cope.
2/ Use submit_bh to read as well as write, rather than vfs_read.
   This makes read and write more symetric.
3/ sync the file before reading, to ensure that the page cache has no
   dirty pages that might get written out later.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:38 -07:00
NeilBrown
d785a06a0b [PATCH] md/bitmap: change md/bitmap file handling to use bmap to file blocks
If md is asked to store a bitmap in a file, it tries to hold onto the page
cache pages for that file, manipulate them directly, and call a cocktail of
operations to write the file out.  I don't believe this is a supportable
approach.

This patch changes the approach to use the same approach as swap files.  i.e.
bmap is used to enumerate all the block address of parts of the file and we
write directly to those blocks of the device.

swapfile only uses parts of the file that provide a full pages at contiguous
addresses.  We don't have that luxury so we have to cope with pages that are
non-contiguous in storage.  To handle this we attach buffers to each page, and
store the addresses in those buffers.

With this approach the pagecache may contain data which is inconsistent with
what is on disk.  To alleviate the problems this can cause, md invalidates the
pagecache when releasing the file.  If the file is to be examined while the
array is active (a non-critical but occasionally useful function), O_DIRECT io
must be used.  And new version of mdadm will have support for this.

This approach simplifies a lot of code:
 - we no longer need to keep a list of pages which we need to wait for,
   as the b_endio function can keep track of how many outstanding
   writes there are.  This saves a mempool.
 - -EAGAIN returns from write_page are no longer possible (not sure if
    they ever were actually).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:38 -07:00
NeilBrown
acc55e2201 [PATCH] md/bitmap: tidy up i_writecount handling in md/bitmap
md/bitmap modifies i_writecount of a bitmap file to make sure that no-one else
writes to it.  The reverting of the change is sometimes done twice, and there
is one error path where it is omitted.

This patch tidies that up.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:38 -07:00
NeilBrown
0cdd02cabd [PATCH] md/bitmap: remove dead code from md/bitmap
bitmap_active is never called, and the BITMAP_ACTIVE flag is never users or
tested, so discard them both.

Also remove some out-of-date 'todo' comments.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:38 -07:00
NeilBrown
a647e4bc5c [PATCH] md/bitmap: remove unnecessary page reference manipulations from md/bitmap code
md/bitmap gets a collection of pages representing the bitmap when it
initialises the bitmap, and puts all the references when discarding the
bitmap.

It also occasionally takes extra references without any good reason, and
sometimes drops them ...  though it doesn't always drop them, which can result
in a memory leak.

This patch removes the unnecessary 'get_page' calls, and the corresponding
'put_page' calls.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:38 -07:00
NeilBrown
e16b68b6e4 [PATCH] md/bitmap: use set_bit etc for bitmap page attributes
In particular, this means that we use 4 bits per page instead of a whole
unsigned long.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:38 -07:00
NeilBrown
ec7a3197f4 [PATCH] md/bitmap: cleaner separation of page attribute handlers in md/bitmap
md/bitmap has some attributes per-page.  Handling of these attributes in
largely abstracted in set_page_attr and clear_page_attr.  However
get_page_attr exposes the format used to store them.  So prior to changing
that format, introduce test_page_attr instead of get_page_attr, and make
appropriate usage changes.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:38 -07:00
NeilBrown
0b79ccf0cd [PATCH] md/bitmap: remove bitmap writeback daemon
md/bitmap currently has a separate thread to wait for writes to the bitmap
file to complete (as we cannot get a callback on that action).

However this isn't needed as bitmap_unplug is called from process context and
waits for the writeback thread to do it's work.  The same result can be
achieved by doing the waiting directly in bitmap_unplug.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:38 -07:00
NeilBrown
d7375ab324 [PATCH] md/bitmap: fix online removal of file-backed bitmaps
When "mdadm --grow /dev/mdX --bitmap=none" is used to remove a filebacked
bitmap, the bitmap was disconnected from the array, but the file wasn't closed
(until the array was stopped).

The file also wasn't closed if adding the bitmap file failed.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:38 -07:00
NeilBrown
52c03291a8 [PATCH] md: split reshape portion of raid5 sync_request into a separate function
... as raid5 sync_request is WAY too big.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:37 -07:00
Adrian Bunk
5e56341d02 [PATCH] md: make md_print_devices() static
This patch makes the needlessly global md_print_devices() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:37 -07:00
NeilBrown
c93983bf51 [PATCH] md: support stripe/offset mode in raid10
The "industry standard" DDF format allows for a stripe/offset layout where
data is duplicated on different stripes.  e.g.

  A  B  C  D
  D  A  B  C
  E  F  G  H
  H  E  F  G

(columns are drives, rows are stripes, LETTERS are chunks of data).

This is similar to raid10's 'far' mode, but not quite the same.  So enhance
'far' mode with a 'far/offset' option which follows the layout of DDFs
stripe/offset.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:37 -07:00
NeilBrown
7c7546ccf6 [PATCH] md: allow a linear array to have drives added while active
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:37 -07:00
NeilBrown
5fd6c1dce0 [PATCH] md: allow checkpoint of recovery with version-1 superblock
For a while we have had checkpointing of resync.  The version-1 superblock
allows recovery to be checkpointed as well, and this patch implements that.

Due to early carelessness we need to add a feature flag to signal that the
recovery_offset field is in use, otherwise older kernels would assume that a
partially recovered array is in fact fully recovered.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:37 -07:00
NeilBrown
a8a55c387d [PATCH] md: remove nuisance message at shutdown
At shutdown, we switch all arrays to read-only, which creates a message for
every instantiated array, even those which aren't actually active.

So remove the message for non-active arrays.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:37 -07:00
NeilBrown
16a53ecc35 [PATCH] md: merge raid5 and raid6 code
There is a lot of commonality between raid5.c and raid6main.c.  This patches
merges both into one module called raid456.  This saves a lot of code, and
paves the way for online raid5->raid6 migrations.

There is still duplication, e.g.  between handle_stripe5 and handle_stripe6.
This will probably be cleaned up later.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:37 -07:00
NeilBrown
16f17b39f3 [PATCH] md: increase the delay before marking metadata clean, and make it configurable
When a md array has been idle (no writes) for 20msecs it is marked as 'clean'.
 This delay turns out to be too short for some real workloads.  So increase it
to 200msec (the time to update the metadata should be a tiny fraction of that)
and make it sysfs-configurable.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:37 -07:00
NeilBrown
9443a1d1f7 [PATCH] md: remove useless ioctl warning
This warning was slightly useful back in 2.2 days, but is more an annoyance
now.  It makes it awkward to add new ioctls (that we we are likely to do that
in the current climate, but it is possible).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:36 -07:00
NeilBrown
8932c2e0dc [PATCH] md: remove arbitrary limit on chunk size
The largest chunk size the code can support without substantial surgery is
2^30 bytes, so make that the limit instead of an arbitrary 4Meg.  Some day,
the 'chunksize' should change to a sector-shift instead of a byte-count.  Then
no limit would be needed.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:36 -07:00
NeilBrown
c70810b327 [PATCH] md: reformat code in raid1_end_write_request to avoid goto
A recent change made this goto unnecessary, so reformat the code to make it
clearer what is happening.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:36 -07:00
Alasdair G Kergon
72d9486169 [PATCH] dm: improve error message consistency
Tidy device-mapper error messages to include context information
automatically.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:36 -07:00
Alasdair G Kergon
5c6bd75d06 [PATCH] dm: prevent removal if open
If you misuse the device-mapper interface (or there's a bug in your userspace
tools) it's possible to end up with 'unlinked' mapped devices that cannot be
removed until you reboot (along with uninterruptible processes).

This patch prevents you from removing a device that is still open.

It introduces dm_lock_for_deletion() which is called when a device is about to
be removed to ensure that nothing has it open and nothing further can open it.
 It uses a private open_count for this which also lets us remove one of the
problematic bdget_disk() calls elsewhere.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:36 -07:00
David Teigland
c2ade42dd3 [PATCH] dm: create error table
Add a library function dm_create_error_table() to create a table that rejects
any I/O sent to a device with EIO.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:36 -07:00
Alasdair G Kergon
17b2f66f2a [PATCH] dm: add exports
Move definitions of core device-mapper functions for manipulating mapped
devices and their tables to <linux/device-mapper.h> advertising their
availability for use elsewhere in the kernel.

Protect the contents of device-mapper.h with ifdef __KERNEL__.  And throw
in a few formatting clean-ups and extra comments.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:36 -07:00
Alasdair G Kergon
2b06cfff12 [PATCH] dm: consolidate creation functions
Merge dm_create() and dm_create_with_minor() by introducing the special value
DM_ANY_MINOR to request the allocation of the next available minor number.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:36 -07:00
David Teigland
814d68629b [PATCH] dm table split_args: handle no input
Return sense if dm_split_args is called with a NULL input parameter.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:36 -07:00
Jonathan Brassow
ce503f59ae [PATCH] dm kcopyd: error accumulation fix
kcopyd should accumulate errors - otherwise I/O failures may be ignored
unintentionally.

And invert 'success' (used in a future patch), using a more intuitive
!(read_err || write_err).

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:35 -07:00
Alasdair G Kergon
8a835f11bc [PATCH] dm mirror log: sync_count fix
When a mirror is reduced in size, clear the part of the bitmap that is no
longer used.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:35 -07:00
Alasdair G Kergon
29121bd0b0 [PATCH] dm mirror log: bitset_size fix
Fix the 'sizeof' in the region log bitmap size calculation: it's uint32_t, not
unsigned long - this breaks on some archs.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:35 -07:00
Alasdair G Kergon
b7cca195c4 [PATCH] dm mirror log: refactor context
Refactor the code that creates the core and disk log contexts to avoid the
repeated allocation of clean_bits introduced by the last patch.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:35 -07:00
Kevin Corry
702ca6f0be [PATCH] dm mirror log: sector size fix
On-disk logs for dm-mirror devices are currently hard-coded to use 512 byte
hard-sector-sizes.  This patch fixes dm-log so it will work with devices with
non-512-byte hard-sector-sizes.

To maintain full compatibility, instead of moving the clean-bits bitset to a
bitset, and enlarges the disk-header buffer to encompass both the header and
the bitset.  The I/O routines for the bitset are removed, and the I/O routines
for the disk-header now also read/write the bitset.

Signed-off-by: Kevin Corry <kevcorry@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:35 -07:00
Milan Broz
143535396c [PATCH] dm table: get_target: fix last index
The table is indexed from 0, so an index equal to t->num_targets should be
rejected.

(There is no code in the current tree that would exercise this bug.)

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:35 -07:00
Neil Brown
e4c8b3ba34 [PATCH] dm: mirror sector offset fix
The device-mapper core does not perform any remapping of bios before passing
them to the targets.  If a particular mapping begins part-way into a device,
targets obtain the sector relative to the start of the mapping by subtracting
ti->begin.

The dm-raid1 target didn't do this everywhere: this patch fixes it, taking
care to subtract ti->begin exactly once for each bio.

[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:35 -07:00
Jeff Mahoney
f0b0411536 [PATCH] dm: fix block device initialisation
In alloc_dev(), we register the device with the block layer and then continue
to initialize the device.  But register_disk() makes the device available to
be opened before we have completed initialising it.

This patch moves the final bits of the initialization above the disk
registration.

[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:35 -07:00
Jeff Mahoney
10da4f795f [PATCH] dm: add module ref counting
The reference counting on dm-mod is zero if no mapped devices are open.  This
is incorrect, and can lead to an oops if the module is unloaded while mapped
devices exist.

This patch claims a reference to the module whenever a device is created, and
drops it again when the device is freed.

Devices must be removed before dm-mod is unloaded.

[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:35 -07:00
Jeff Mahoney
7ec75f2547 [PATCH] dm: fix mapped device ref counting
To avoid races, _minor_lock must be held while changing mapped device
reference counts.

There are a few paths where a mapped_device pointer is returned before a
reference is taken.  This patch fixes them.

[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:35 -07:00
Jeff Mahoney
fba9f90e56 [PATCH] dm: add DMF_FREEING
There is a chicken and egg problem between the block layer and dm in which the
gendisk associated with a mapping keeps a reference-less pointer to the
mapped_device.

This patch uses a new flag DMF_FREEING to indicate when the mapped_device is
no longer valid.  This is checked to prevent any attempt to open the device
from succeeding while the device is being destroyed.

[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:34 -07:00
Jeff Mahoney
f32c10b099 [PATCH] dm: change minor_lock to spinlock
While removing a device, another another thread might attempt to resurrect it.

This patch replaces the _minor_lock mutex with a spinlock and uses
atomic_dec_and_lock() to serialize reference counting in dm_put().

[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:34 -07:00
Jeff Mahoney
62f75c2f32 [PATCH] dm: move idr_pre_get
idr_pre_get() can sleep while allocating memory.

The next patch will change _minor_lock into a spinlock, so this patch moves
idr_pre_get() outside the lock in preparation.

[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:34 -07:00
Jeff Mahoney
ba61fdd17d [PATCH] dm: fix idr minor allocation
One part of the system can attempt to use a mapped device before another has
finished initialising it or while it is being freed.

This patch introduces a place holder value, MINOR_ALLOCED, to mark the minor
as allocated but in a state where it can't be used, such as mid-allocation or
mid-free.  At the end of the initialization, it replaces the place holder with
the pointer to the mapped_device, making it available to the rest of the dm
subsystem.

[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:34 -07:00
Alasdair G Kergon
c51c275249 [PATCH] dm snapshot: unify chunk_size
Persistent snapshots currently store a private copy of the chunk size.
Userspace also supplies the chunk size when loading a snapshot.  Ensure
consistency by only storing the chunk_size in one place instead of two.

Currently the two sizes will differ if the chunk size supplied by userspace
does not match the chunk size an existing snapshot actually uses.  Amongst
other problems, this causes an incorrect 'percentage full' to be reported.

The patch ensures consistency by only storing the chunk_size in one place,
removing it from struct pstore.  Some initialisation is delayed until the
correct chunk_size is known.  If read_header() discovers that the wrong chunk
size was supplied, the 'area' buffer (which the header already got read into)
is reinitialised to the correct size.

[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:34 -07:00
Akinobu Mita
179e09172a [PATCH] drivers: use list_move()
This patch converts the combination of list_del(A) and list_add(A, B) to
list_move(A, B) under drivers/.

Acked-by: Corey Minyard <minyard@mvista.com>
Cc: Ben Collins <bcollins@debian.org>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Alasdair Kergon <dm-devel@redhat.com>
Cc: Gerd Knorr <kraxel@bytesex.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Pavlic <fpavlic@de.ibm.com>
Acked-by: Matthew Wilcox <matthew@wil.cx>
Cc: Andrew Vasquez <linux-driver@qlogic.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:18 -07:00
Adrian Bunk
a5d6839b75 [PATCH] drivers/md/raid6algos.c: fix a NULL dereference
This patch fixes a NULL dereference spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:08 -07:00
NeilBrown
c331eb04b9 [PATCH] md: Fix badness in sysfs_notify caused by md_new_event
From: NeilBrown <neilb@suse.de>

If an error is reported by a drive in a RAID array (which is done via
bi_end_io - in interrupt context), we call md_error and md_new_event which
calls sysfs_notify.  However sysfs_notify grabs a mutex and so cannot be
called in interrupt context.

This patch just creates a variant of md_new_event which avoids the sysfs
call, and uses that.  A better fix for later is to arrange for the event to
be called from user-context.

Note: avoiding the sysfs call isn't a problem as an error will not, by
itself, modify the sync_action attribute.  (We do still need to
wake_up(&md_event_waiters) as an error by itself will modify /proc/mdstat).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-31 16:27:11 -07:00
Neil Brown
c71d48877e [PATCH] Unlock md devices when stopping them on reboot.
otherwise we get nasty messages about locks not being released.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-26 11:52:11 -07:00
NeilBrown
5c4c33318d [PATCH] md: fix possible oops when starting a raid0 array
This loop that sets up the hash_table has problems.

Careful examination will show that the last time through, everything but
the first line is pointless.  This is because all it does is change 'cur'
and 'size' and neither of these are used after the loop.  This should ring
warning bells...  That last time through the loop,

        size += conf->strip_zone[cur].size

can index off the end of the strip_zone array.  Depending on what it finds
there, it might exit the loop cleanly, or it might spin going further and
further beyond the array until it hits an unmapped address.

This patch rearranges the code so that the last, pointless, iteration of
the loop never happens.  i.e.  the one statement of the last loop that is
needed is moved the the end of the previous loop - or to before the loop
starts - and the loop counter starts from 1 instead of 0.

Cc: "Don Dupuis" <dondster@gmail.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-23 10:35:31 -07:00
NeilBrown
2adc7d47c4 [PATCH] md: Fix inverted test for 'repair' directive.
We should be able to write 'repair' to /sys/block/mdX/md/sync_action,
however due to and inverted test, that always given EINVAL.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-21 12:59:17 -07:00
NeilBrown
5e7dd2ab6b [PATCH] md: Fix 'rdev->nr_pending' count when retrying barrier requests
When retrying a failed BIO_RW_BARRIER request, we need to keep the reference
in ->nr_pending over the whole retry.  Currently, we only hold the reference
if the failed request is the *last* one to finish - which is silly, because it
would normally be the first to finish.

So move the rdev_dec_pending call up into the didn't-fail branch.  As the rdev
isn't used in the later code, calling rdev_dec_pending earlier doesn't hurt.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-01 18:17:42 -07:00
NeilBrown
62de608da0 [PATCH] md: Improve detection of lack of barrier support in raid1
Move the test for 'do barrier work' down a bit so that if the first write to a
raid1 is a BIO_RW_BARRIER write, the checking done by superblock writes will
cause the right thing to happen.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-01 18:17:42 -07:00
NeilBrown
bea2771871 [PATCH] md: Change ENOTSUPP to EOPNOTSUPP
Because that is what you get if a BIO_RW_BARRIER isn't supported!

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-01 18:17:42 -07:00
NeilBrown
e0a33270ed [PATCH] md: Fixed refcounting/locking when attempting read error correction in raid10
We need to hold a reference to rdevs while reading and writing to attempt to
correct read errors.  This reference must be taken under an rcu lock.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-01 18:17:42 -07:00
NeilBrown
df30d0f4ca [PATCH] md: Avoid oops when attempting to fix read errors on raid10
We should add to the counter for the rdev *after* checking if the rdev is
NULL!!!

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-01 18:17:42 -07:00
Ingo Molnar
5dc5cf7dd2 [PATCH] md: locking fix
- fix mddev_lock() usage bugs in md_attr_show() and md_attr_store().
  [they did not anticipate the possibility of getting a signal]

- remove mddev_lock_uninterruptible() [unused]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-20 07:54:04 -07:00
NeilBrown
4508a7a734 [PATCH] sysfs: Allow sysfs attribute files to be pollable
It works like this:
  Open the file
  Read all the contents.
  Call poll requesting POLLERR or POLLPRI (so select/exceptfds works)
  When poll returns,
     close the file and go to top of loop.
   or lseek to start of file and go back to the 'read'.

Events are signaled by an object manager calling
   sysfs_notify(kobj, dir, attr);

If the dir is non-NULL, it is used to find a subdirectory which
contains the attribute (presumably created by sysfs_create_group).

This has a cost of one int  per attribute, one wait_queuehead per kobject,
one int per open file.

The name "sysfs_notify" may be confused with the inotify
functionality.  Maybe it would be nice to support inotify for sysfs
attributes as well?

This patch also uses sysfs_notify to allow /sys/block/md*/md/sync_action
to be pollable

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-14 11:41:24 -07:00
NeilBrown
6f91fe88e4 [PATCH] md: make sure 64bit fields in version-1 metadata are 64-bit aligned
reshape_position is a 64bit field that was not 64bit aligned.  So swap with
new_level.

NOTE: this is a user-visible change.  However:
  - The bad code has not appeared in a released kernel
  - This code is still marked 'experimental'
  - This only affects version-1 superblock, which are not in wide use
  - These field are only used (rather than simply reported) by user-space
    tools in extemely rare circumstances : after a reshape crashes in the
    first second of the reshape process.

So I believe that, at this stage, the change is safe.  Especially if people
heed the 'help' message on use mdadm-2.4.1.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:30 -07:00
Eric Sesterhenn
b638548384 BUG_ON() Conversion in md/raid10.c
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-02 13:34:29 +02:00