The oops is characteristic of the underlying device being removed from
visibility before the class device, and sure enough we do device_del()
before transport_unregister() in the scsi_target_reap() routines. I've
no idea why this is suddenly showing up, since the code has been in
there since that function was first invented. However, I've confirmed
this fixes Andrew Vasquez's boot oops.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patch prevents libata from incorrectly modifying inquiry
VPD pages and command support data from ATAPI devices. I have tested
the patch with a SATA ATAPI tape drive on an AHCI controller.
Patch is against kernel 2.4.32 with 2.4.32-libata1.patch applied.
Anthony J. Battersby
Cybernetics
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Miquel van Smoorenburg <miquels@cistron.nl> forwarded me this fix to
resolve a deadlock condition that occurs due to the API change in
2.6.13+ kernels dropping the host locking when entering the error
handling. They all end up calling adpt_i2o_post_wait(), which if you
call it unlocked, might return with host_lock locked anyway and that
causes a deadlock.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
scsi_reap_target() was desgined to be called from any context.
However it must do a device_del() of the target device, which may only
be called from user context. Thus we have to reimplement
scsi_reap_target() via a workqueue.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
In the scenario that a link was broken, the devloss timer for each
rport was expire at roughly the same time, causing lots of "delete"
workqueue items being queued. Depth is dependent upon the number of
rports that were on the link.
The rport target remove calls were calling flush_scheduled_work(),
which would interrupt the stream, and start the next workqueue item,
which did the same thing, and so on until recursion depth was large.
This fix stops the recursion in the initial delete path, and pushes it
off to a host-level work item that reaps the dead rports.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This follows on from Jens' patch and consolidates all of the ULD
separate handlers for REQ_BLOCK_PC into a single call which has his
fix for our direction bug.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
When we got a device only capable of async, we would zero out goal->period
which would cause us to try PPR negotiations. Leave goal->period alone,
and check goal->offset before doing PPR. Kudos to Daniel Forsgren for
figuring this out.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Some hardware does not support the PACKET command at all.
Other hardware supports ATAPI, but the driver does something nasty such
as calling BUG() when an ATAPI command is issued.
For these such cases, we mark them with a new flag, ATA_FLAG_NO_ATAPI.
Initial version contributed by Ben Collins.
Fix incorrect pointer usage on two calls to kunmap_atomic().
This seems to happen a lot, because kunmap() wants the struct page *,
whereas kunmap_atomic() instead wants the mapped virtual address.
Signed-off-by: Mark Lord <liml@rtr.ca>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
There is a double free in the scsi scan code if a LLDD's slave_alloc()
call fails. There is a direct call to scsi_free_queue and then the
following put_device calls the release function, which also frees the
queue.
Remove the redundant scsi_free_queue.
Signed-off-by: Brian King <brking@us.ibm.com>
Tested-by: Nathan Lynch <ntl@pobox.com>
[ Also removed some strange whitespace artifacts in that area ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Current scsi scanning code appears to have a use after free
bug is a LLDD's slave_alloc fails. Remove the redundant
scsi_free_queue.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This reverts commit 1b0997f561, which in
turn reverted 34ea80ec6a (which is thus
re-instated).
Quoth James Bottomley:
"All it's doing is deferring the device_put() from the
scsi_put_command() to after the scsi_run_queue(), which doesn't fix
the sleep while atomic problem of the device release method. In both
cases we still get the semaphore in atomic context problem which is
caused by scsi_reap_target() doing a device_del(), which I assumed
(wrongly) was valid from atomic context."
who also promised to fix scsi_reap_target().
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The scsi_library routines don't correctly set DMA_NONE when
req->data_len is zero (instead they check the command type first, so
if it's write, we end up with req->data_len == 0 and direction as
DMA_TO_DEVICE which confuses some drivers)
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The eh_action semaphore in scsi_eh_send_command is cleared after a
command timeout. The command is subsequently aborted and the abort
will try to call scsi_done() on it. Unfortunately, the scsi_eh_done()
routine unconditinally completes the semaphore (which is now null).
Fix this race by makiong the scsi_eh_done() routine check that the
semaphore is non null before completing it (mirroring the ordinary
command done/timeout logic).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The SCSI megaraid drive goes to great effort to kmap
the scatterlist buffer (if used), but then uses the
wrong pointer when copying to it afterward.
Signed-off-by: Mark Lord <lkml@rtr.ca>
Acked by: Ju, Seokmann <Seokmann.Ju@engenio.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Properly check FC_RESID for any non-transfered bytes
regardless of firmware completion status.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
A regression in a recent change
33135aa2a5 caused the driver
to mistakenly drop handling of AENs. Due to the incorrect
handling, ports would not reappear after RSCNs and LIPs.
Drops unused/incorrect compound #define from qla_def.h.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This makes ibmvscsi work correctly with the recent set of kexec
patches that went in. This is based on work by Michael Ellerman, who
chased this initially. He validated that it works during kexec.
Handle kexec correctly in ibmvscsi. During kexec the adapter
will not get cleaned up correctly, so we may need to reset it
to make it sane again.
Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch makes ata_scsi_pass_thru() properly set result code and
sense data on translation failures.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This reverts commit 34ea80ec6a.
It does a put_device() from softirq context, which is bad since it gets
a semaphore for reading.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sg's st_map_user_pages is modelled on an earlier version of st's
sgl_map_user_pages, and has the same bug: if get_user_pages got some but
not all of the pages, then those got were released, but the positive res
code returned implied that they were still to be freed.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2.6.15-rc1 made sg's st_unmap_user_pages and st's sgl_unmap_user_pages
BUG on a PageReserved page. But that's wrong: they could be unmapping
the ZERO_PAGE, which is marked PG_reserved; and perhaps others (while
get_user_pages is still permitted on VM_PFNMAP areas - that may change).
More change is needed here: sg claims to dirty even pages written from,
and st claims not to dirty even pages read into; and SetPageDirty is not
adequate for this nowadays. Fixes to those follow in a later patch: for
the moment just fix the 2.6.15 regression.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Nick and I had already been looking at drivers/scsi/{sg.c,st.c},
brought there by __put_page in sg.c's peculiar sg_rb_correct4mmap,
which we'd like to remove. But that's irrelevant to your pain, except...
One extract from the patches I'd like to send Doug and Kai for 2.6.15
or 2.6.16 is this below: since the incomplete get_user_pages path omits
to reset res, but has already released all the pages, it will result in
premature freeing of user pages, and behaviour just like you've seen.
Though I'd have thought incomplete get_user_pages was an exceptional
case, and a bit surprised you'd encounter it. Perhaps there's some
other premature freeing in the driver, and this instance has nothing
whatever to do with it.
If the problem were easily reproducible, it'd be great if you could
try this patch; but I think you've said it's not :-(
Signed-off-by: Kai Makisara <kai.makisara@kolumbus.fi>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Enabling these features causes problems with some drives, so disable
them until they're debugged
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn.
scsi_bios_ptable return value is not being checked in aac_biosparm.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Some SCSI devices apparently get very confused if we try to use the
echo buffer on a non-DT negotiated bus (this mirrors the problems of
using PPR on non-LVD for some devices). The fix is to be far more
conservative about when we use an echo buffer. With this patch, we'll
now see what parameters are negotiated by the read only test, and only
look for an echo buffer if DT is negotiated.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This fixes locking in megaraid.c, namely:
(1) make sure megaraid_queue release the adapter lock by changing the
code to have a single return
(2) remove the errornous scsi_assign_lock call
Testing by Burton Windle.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Burton Windle <bwindle@fint.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To transport scsi reset command to device aic7xxx reset handler looks
at the driver's pending_list and searches any proper command. However
the search condition has been inverted: ahc_match_scb() returns TRUE
if a matched command is found. As a result the reset on required
devices did not turn out well, a correctly working neighbour device
may be surprised by the reset. aic7xxx reset handler reports about the
success, but really the original situation is not corrected yet.
Signed-off-by: Vasily Averin <vvs@sw.ru>
Naturally, there's a corresponding problem in the aic79xx driver, so
I've also added the same fix for that.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_get_command() attempts to write into a structure that may not have
been successfully allocated. Move this write inside the if statement that
ensures we won't panic the kernel with a NULL pointer dereference.
Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The structure ide_driver_t have a .owner field which is a duplicate
of .gendriver.owner field (.gen_driver is a struct device_driver).
This patch removes ide_driver_t's owner field.
Signed-off-by: Laurent Riffard <laurent.riffard@free.fr>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
sil24_error_intr logs all error interrupts. ATAPI devices generates
many harmless errors which can be ignored and all serious ones are
reported via sense data by SCSI layer. Don't log device errors from
ATAPI devices.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This patch implements ATAPI support for sil24 and bumps driver version
to 0.23.
Signed-off-by: Tejun Heo <htejun@gmail.com>
--
Jeff, it has been converted to use ->dev_config as pointed out.
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
There seems to be no way to obtain device signature from sil24 after
SATA phy reset and SRST is needed anyway for later port multiplier
suppport. This patch converts sil24_phy_reset to use SRST instaed.
Signed-off-by: Tejun Heo <htejun@gmail.com>
--
Jeff, I didn't remove the 10ms sleep just to be on the safe side. I
think we can live with 10ms sleep on SRST.
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
When an error condition is raised by device via D2H FIS or SDB. sil24
controller should be restarted by setting PORT_CS_INIT and waiting
until PORT_CS_RDY is asserted instead of resetting the controller.
This patch implements sil24_restart_controller for those cases. This
patch also makes sure that PORT_CS_RDY is asserted on
sil24_reset_controller completion.
Signed-off-by: Tejun Heo <htejun@gmail.com>
--
Jeff, delay is reduced to 1us and cnt increased to 10k. My sil3124
turns on PORT_CS_RDY on the second iteration even without any delay.
I think 10k * 1us should be more than enough.
I tried to convert both restart and reset to use msleep's with work
queue, but if we do that, host_set lock should be released after
initiating restart or reset, leading to race condition among
reset/restart, other interrupts and timeout. Implementing
synchronization among those in low-level driver doesn't seem right.
Well, reduced timeout should work for the time being.
Thanks.
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Handle errata (it was unintentional on this h/w, whereas its intentional
on others) whereby the nIEN bit in Device Control is ignored, leading to
a situation where a hardware interrupt completes the qc before the
polling code has a chance to.
This will get fixed The Right Way(tm) once Albert Lee's irq-pio
branch is merged, as the more natural PIO method on this hardware is
interrupt-driven.
- DMA boundary was being handled incorrectly. Copied the code from
ata_fill_sg(), since Marvell has the same DMA boundary needs.
(we can't use ata_fill_sg directly since we have different hardware
descriptors)
- cleaned up the SATA phy reset code, to deal with various errata
ATA devices don't generate many errors, so the preferred method is to
printk() when they occur.
ATAPI devices generate tons of exceptions during the normal course
of operation, so this change skips logging the most common class of
errors.
The following code segment is not functional because the transfer cycle time speficied by
the EIDE device is later overwritten by ata_timing_quantize():
/*
* If the drive is an EIDE drive, it can tell us it needs extended
* PIO/MW_DMA cycle timing.
*/
if (adev->id[ATA_ID_FIELD_VALID] & 2) { /* EIDE drive */
memset(&p, 0, sizeof(p));
(snip)
ata_timing_merge(&p, t, t, ATA_TIMING_CYCLE | ATA_TIMING_CYC8B);
<== uninitialized "t" is used here
}
/*
* Convert the timing to bus clock counts.
*/
ata_timing_quantize(s, t, T, UT); <== t is overwritten by quantized s
The patch has been submitted for ide-timing.h before:
http://marc.theaimsgroup.com/?l=linux-ide&m=110820013425454&w=2
Resubmitted for libata.
Changes:
- Minor fix to honor the following transfer cycle time speficied by the device
- id[65]: Minimum Multiword DMA transfer cycle time per word
- id[67]: Minimum PIO transfer cycle time without flow control
- id[68]: Minimum PIO transfer cycle time with IORDY
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
=======
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Adds constants for ATAPI support to sata_sil24. This patch is
originally from Jeff Garzik <jgarzik@pobox.com>.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>