This is a direct port of the raid5 patch.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Most awkward part of this is delaying write requests until bitmap updates have
been flushed.
To achieve this, we have a sequence number (seq_flush) which is incremented
each time the raid5 is unplugged.
If the raid thread notices that this has changed, it flushes bitmap changes,
and assigned the value of seq_flush to seq_write.
When a write request arrives, it is given the number from seq_write, and that
write request may not complete until seq_flush is larger than the saved seq
number.
We have a new queue for storing stripes which are waiting for a bitmap flush
and an extra flag for stripes to record if the write was 'degraded' and so
should not clear the a bit in the bitmap.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
version-1 superblocks are not (normally) 4K long, and can be of variable size.
Writing the full 4K can cause corruption (but only in non-default
configurations).
With this patch the super-block-flavour can choose a size to read, and set a
size to write based on what it finds.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
read_sb_page() assumed that if sync_page_io fails, the device would be marked
faultly. However it isn't. So in the face of error, read_sb_page would loop
forever.
Redo the logic so that this cannot happen.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As this is used to flag an internal bitmap.
Also, introduce symbolic names for feature bits.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is possibly (and occasionally useful) to have a raid1 without persistent
superblocks. The code in add_new_disk for adding a device to such an array
always tries to read a superblock.
This will obviously fail.
So do the appropriate test and call md_import_device with
appropriate args.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When hot-adding a bitmap, bitmap_daemon_work could get called while the bitmap
is being created, so don't set mddev->bitmap until the bitmap is ready.
This requires freeing the bitmap inside bitmap_create if creation failed
part-way through.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The 'lastrun' time wasn't being initialised, so it could be half a
jiffie-cycle before it seemed to be time to do work again.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A state of 0 mean 'not quiesced'
A state of 1 means 'is quiesced'
The original code got this wrong.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
linear currently uses division by the size of the smallest componenet device
to find which device a request goes to. If that smallest device is larger
than 2 terabytes, then the division will not work on some systems.
So we introduce a pre-shift, and take care not to make the hash table too
large, much like the code in raid0.
Also get rid of conf->nr_zones, which is not needed.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If a device is flagged 'WriteMostly' and the array has a bitmap, and the
bitmap superblock indicates that write_behind is allowed, then write_behind is
enabled for WriteMostly devices.
Write requests will be acknowledges as complete to the caller (via b_end_io)
when all non-WriteMostly devices have completed the write, but will not be
cleared from the bitmap until all devices complete.
This requires memory allocation to make a local copy of the data being
written. If there is insufficient memory, then we fall-back on normal write
semantics.
Signed-Off-By: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This allows a device in a raid1 to be marked as "write mostly". Read requests
will only be sent if there is no other option.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Both file-bitmaps and superblock bitmaps are supported.
If you add a bitmap file on the array device, you lose.
This introduces a 'default_bitmap_offset' field in mddev, as the ioctl used
for adding a superblock bitmap doesn't have room for giving an offset. Later,
this value will be setable via sysfs.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When we find a 'stale' bitmap, possibly because it is new, we should just
assume every bit needs to be set, but rather base the setting of bits on the
current state of the array (degraded and recovery_cp).
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
... otherwise we loose a reference and can never free the file.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix another bug in dm-raid1.c that the dirty region may stay in or be moved
to clean list and freed while in use.
It happens as follows:
CPU0 CPU1
------------------------------------------------------------------------------
rh_dec()
if (atomic_dec_and_test(pending))
<the region is still marked dirty>
rh_inc()
if the region is clean
mark the region dirty
and remove from clean list
mark the region clean
and move to clean list
atomic_inc(pending)
At this stage, the region is in clean list and will be mistakenly reclaimed
by rh_update_states() later.
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
md does not yet support BIO_RW_BARRIER, so be honest about it and fail
(-EOPNOTSUPP) any such requests.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
'this_sector' is a virtual (array) address while 'head_position' is a physical
(device) address, so substraction doesn't make any sense. devs[slot].addr
should be used instead of this_sector.
However, this patch doesn't make much practical different to the read
balancing due to the effects of later code.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jens:
->bi_set is totally unnecessary bloat of struct bio. Just define a proper
destructor for the bio and it already knows what bio_set it belongs too.
Peter:
Fixed the bugs.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch goes through the current users of the crypto layer and sets
CRYPTO_TFM_REQ_MAY_SLEEP at crypto_alloc_tfm() where all crypto operations
are performed in process context.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's possible for this to still have flags in it and a previous instance
has been stopped, and that confused the new array using the same mddev.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I just discovered this is needed for module auto-loading.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We weren't actually waking up the md thread after setting
MD_RECOVERY_NEEDED when assembling an array, so it is possible to lose a
race and not actually start resync.
So add a call to md_wakeup_thread, and while we are at it, remove all the
"if (mddev->thread)" guards as md_wake_thread does its own checking.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
... otherwise we might try to load a bitmap from an array which hasn't one.
The bug is that if you create an array with an internal bitmap, shut it down,
and then create an array with the same md device, the md drive will assume it
should have a bitmap too. As the array can be created with a different md
device, it is mostly an inconvenience. I'm pretty sure there is no risk of
data corruption.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This code was never designed to handle more than one instance of do_work()
running at once.
Signed-Off-By: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The recent change to never ignore the bitmap, revealed that the bitmap isn't
begin flushed properly when an array is stopped.
We call bitmap_daemon_work three times as there is a three-stage pipeline for
flushing updates to the bitmap file.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Firstly, R1BIO_Degraded was being set in a number of places in the resync
code, but is never used there, so get rid of those settings.
Then: When doing a resync, we want to clear the bit in the bitmap iff the
array will be non-degraded when the sync has completed. However the current
code would clear the bitmap if the array was non-degraded when the resync
*started*, which obviously isn't right (it is for 'resync' but not for
'recovery' - i.e. rebuilding a failed drive).
This patch calculated 'still_degraded' and uses the to tell bitmap_start_sync
whether this sync should clear the corresponding bit.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The code currently will ignore the bitmap if the array seem to be in-sync.
This is wrong if the array is degraded, and probably wrong anyway. If the
bitmap says some chunks are not in in-sync, and the superblock says everything
IS in sync, then something is clearly wrong, and it is safer to trust the
bitmap.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Until the bitmap code was added,
modprobe md
would load the md module. But now the md module is called 'md-mod', so we
really need an alias for backwards comparability.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The default resync_max_sector is set to "mddev->size << 1". If the
raid-personality-module updates mddev->size, it must update
resync_max_sectors too.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is an attempt to fix deadlocks discovered in the core dm.
The problems boil down to md->lock having to be held in too many places, so
I've split it into two: md->suspend_lock and md->io_lock.
suspend_lock is now held throughout dm_suspended() as well as dm_resume()
and dm_swap_table() so that these functions cannot run concurrently:
there's no requirement for that and it added complexity.
DMF_FS_LOCKED becomes redundant: DMF_SUSPENDED provides adequate
protection.
Signed-Off-By: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Avoid another bdget_disk which can deadlock.
Signed-Off-By: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some code tidy-ups in preparation for the next patches. Change
dm_table_pre/postsuspend_targets to accept NULL. Use dm_suspended()
throughout.
Signed-Off-By: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
`gcc -W' likes to complain if the static keyword is not at the beginning of
the declaration. This patch fixes all remaining occurrences of "inline
static" up with "static inline" in the entire kernel tree (140 occurrences in
47 files).
While making this change I came across a few lines with trailing whitespace
that I also fixed up, I have also added or removed a blank line or two here
and there, but there are no functional changes in the patch.
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
turn many #if $undefined_string into #ifdef $undefined_string to fix some
warnings after -Wno-def was added to global CFLAGS
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Without this, and attempt to 'grow' an array will claim to have synced the
extra part without actually having done anything.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We need to be careful differentiating between a resync of a complete array,
in which we can clear the bitmap, and a resync of a degraded array, in
which we cannot.
This patch cleans all that up.
Cc: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Apparently sector_div is only guaranteed to work with a 32bit divisor, even
on 64bit architectures. So allow for this in raid0.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Handle writes to a snapshot-origin device that has been extended since the
snapshot was taken.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Prevent more than one priority group initialisation function from being
outstanding at once. Otherwise the completion functions interfere with each
other. Also, reloading the table could reference a freed pointer.
Only reset queue_io in pg_init_complete if another pg_init isn't required.
Skip process_queued_ios if the queue is empty so that we only trigger a
pg_init if there's I/O.
Signed-off-by: Lars Marowsky-Bree <lmb@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To avoid deadlock when suspending a multipath device after all its paths have
failed, stop queueing any I/O that is about to fail *before* calling
freeze_bdev instead of after.
Instead of setting a multipath 'suspended' flag which would have to be reset
if an error occurs during the process, save the previous queueing state and
leave userspace to restore if it wishes.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The multipath destructor must flush its workqueue. Otherwise items that
reference the destroyed object could remain.
From: "goggin, edward" <egoggin@emc.com>
Signed-off-by: Lars Marowsky-Bree <lmb@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
dm multipath will report barriers as not supported with this patch.
Signed-off-by: Lars Marowsky-Bree <lmb@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Set the target's split_io field when building a dm-mirror device so
incoming bios won't span the mirror's internal regions. Without this,
regions can be accessed while not holding correct locks and data corruption
is possible.
Reported-By: "Zhao Qian" <zhaoqian@aaastor.com>
From: Kevin Corry <kevcorry@us.ibm.com>
Signed-Off-By: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
insert a missing bio_put when writting the md superblock.
Without this we have a steady growth in the "bio" slab.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1. Establish a simple API for process freezing defined in linux/include/sched.h:
frozen(process) Check for frozen process
freezing(process) Check if a process is being frozen
freeze(process) Tell a process to freeze (go to refrigerator)
thaw_process(process) Restart process
frozen_process(process) Process is frozen now
2. Remove all references to PF_FREEZE and PF_FROZEN from all
kernel sources except sched.h
3. Fix numerous locations where try_to_freeze is manually done by a driver
4. Remove the argument that is no longer necessary from two function calls.
5. Some whitespace cleanup
6. Clear potential race in refrigerator (provides an open window of PF_FREEZE
cleared before setting PF_FROZEN, recalc_sigpending does not check
PF_FROZEN).
This patch does not address the problem of freeze_processes() violating the rule
that a task may only modify its own flags by setting PF_FREEZE. This is not clean
in an SMP environment. freeze(process) is therefore not SMP safe!
Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch creates a new kstrdup library function and changes the "local"
implementations in several places to use this function.
Most of the changes come from the sound and net subsystems. The sound part
had already been acknowledged by Takashi Iwai and the net part by David S.
Miller.
I left UML alone for now because I would need more time to read the code
carefully before making changes there.
Signed-off-by: Paulo Marques <pmarques@grupopie.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>