The problem lies in the way the error handler uses TEST UNIT READY to
tell whether error recovery has succeeded. The scsi_eh_tur function
gives up after one round of retrying; after that it decides that more
error recovery is needed.
However TUR is liable to report sense data indicating a retry is needed
when in fact error recovery has succeeded. A typical example might be
SK=2, ASC=4, ASCQ=1 (Logical unit in process of becoming ready). The mere
fact that we were able to get a sensible reply to the TUR should indicate
that the device is working well enough to stop error recovery.
I ran across a case back in January where this happened. A CD-ROM drive
timed out the INQUIRY command, and a device reset fixed the blockage.
But then the drive kept responding with 2/4/1 -- because it was spinning
up I suppose -- until the error handler gave up and placed it offline.
If the initial INQUIRY had received the 2/4/1 instead, everything would
have worked okay. It doesn't seem reasonable for things to fail just
because the error handler had started running.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The maximum size of a scatter-gather list that the current IBM VSCSI
Client can handle is 10. This patch adds large scatter-gather support
to the client so that it is capable of handling up to SG_ALL(255)
number of requests in the scatter-gather list.
Signed-off-by: Linda Xie <lxie@us.ibm.com>
Acked by: Dave C Boutcher <sleddog@us.ibm.com>
Rejections fixed up and
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
On ISP24xx parts, stop execution of firmware during ISP
tear-down.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From: Nishanth Aravamudan <nacc@us.ibm.com>
Replace schedule_timeout() with
msleep()/msleep_interruptible() as appropriate, to guarantee the task
delays as expected.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
fc_remove_host() should only be called after a scsi_host has
been successfully added via scsi_add_host() -- any failures
while qla2xxx probing would result in an incorrect call to
fc_remove_host() during cleanup.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Export additional host information via the shost_attrs member in
the scsi_host template. Attributes include: driver version,
firmware version, ISP serial number, ISP type, ISP product ID,
HBA model name, HBA model description, PCI interconnect
information, and HBA port state.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
In a corner-case failure where the request-q does not
contain enough entries for a given request, pci_unmap_sg()
would be called twice. Remove direct call and let the
failure-path logic handle the unmapping.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Remove unnecessary RISC pause/release barriers during
ISP24xx flash manipulation. The ISP24xx can arbitrate flash
access requests during RISC executions.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Original implementation used an overloaded bit in the EFI
parameters. The correct bit is BIT_4 of the special_options
section of NVRAM.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Remove redundant qla2x00_target_reset() function in favour of
the equivalent qla2x00_device_reset(). Update callers of
old function.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
In an FL topology, limit port recognition to those devices
not within the same area and domain of the ISP. The
firmware will recogonize such devices during local-loop
discovery.
Some devices may respond to a PLOGI before they have
completed their fabric login or they may not be a public
device. In this case they will report:
domain == 00
area == 00
alpa == <XX>
which is valid. Exclude such devices from local loop
discovery.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Export COS information for the fc_host and fc_remote_port
objects added by the driver.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
In order to efficiently utilise the ISP's IOCB
request-queue, use the dma_get_required_mask() function to
determine the use of command-type 2 or 3 IOCBs when queueing
SCSI commands. This applies to ISP2[123]xx chips only, as
the ISP24xx uses command-type 7 IOCBs which use 64bit DSDs.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Assorted endianess fixes. I'll work on full endianess annotations
later.
Acked by: Moore, Eric Dean <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
#include of C files and macro tricks to rename symbols are evil and just
cause trouble. Let's doublicate the two functions as they're going to
go away soon enough anyway.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This was noticed by Doug Bazamic and the fix found by Mark Salyzyn at
Adaptec.
There was an error in the BUG_ON() statement that validated the
calculated fib size which can cause the driver to panic.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch adopts the same solution as proposed by Kai M. in
a post titled: "[PATCH] SCSI tape signed/unsigned fix".
The fix is in a function that the sg driver borrowed from
the st driver so its maintenance is a little easier if
the functions remain the same after the fix.
- change nr_pages type from unsigned to signed so errors
from get_user_pages() call are properly handled
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
reported by Doug Gilbert and fixed by him in sg.c (see [PATCH] sg direct
io/mmap oops). Doug fixed the comparison in sg.c. This fix for st.c does not
touch the comparison but makes both arguments signed to remove the
problem. The new code is adapted from linux/fs/bio.c.
Signed-off-by: Kai Makisara <kai.makisara@kolumbus.fi>
Rejections fixed up and
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The idea behind a RAID class is to provide a uniform interface to all
RAID subsystems (both hardware and software) in the kernel.
To do that, I've made this class a transport class that's entirely
subsystem independent (although the matching routines have to match per
subsystem, as you'll see looking at the code). I put it in the scsi
subdirectory purely because I needed somewhere to play with it, but it's
not a scsi specific module.
I used a fusion raid card as the test bed for this; with that kind of
card, this is the type of class output you get:
jejb@titanic> ls -l /sys/class/raid_devices/20\:0\:0\:0/
total 0
lrwxrwxrwx 1 root root 0 Aug 16 17:21 component-0 -> ../../../devices/pci0000:80/0000:80:04.0/host20/target20:1:0/20:1:0:0/
lrwxrwxrwx 1 root root 0 Aug 16 17:21 component-1 -> ../../../devices/pci0000:80/0000:80:04.0/host20/target20:1:1/20:1:1:0/
lrwxrwxrwx 1 root root 0 Aug 16 17:21 device -> ../../../devices/pci0000:80/0000:80:04.0/host20/target20:0:0/20:0:0:0/
-r--r--r-- 1 root root 16384 Aug 16 17:21 level
-r--r--r-- 1 root root 16384 Aug 16 17:21 resync
-r--r--r-- 1 root root 16384 Aug 16 17:21 state
So it's really simple: for a SCSI device representing a hardware raid,
it shows the raid level, the array state, the resync % complete (if the
state is resyncing) and the underlying components of the RAID (these are
exposed in fusion on the virtual channel 1).
As you can see, this type of information can be exported by almost
anything, including software raid.
The more difficult trick, of course, is going to be getting it to
perform configuration type actions with writable attributes.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Since the attribute container deletes from a klist while it's walking
it, it is vulnerable to the problem (and fix) here:
http://marc.theaimsgroup.com/?l=linux-scsi&m=112485448830217
The attached fixes this (but won't compile without the above).
It also fixes the logical reversal in the traversal loop which meant
that we were never actually traversing the loop to hit this bug in the
first place.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
One of the changes in the attribute_container code in the scsi-misc tree
was to add a lock to protect the list of devices per container. This,
unfortunately, leads to potential scheduling while atomic problems if
there's a sleep in the function called by a trigger.
The correct solution is to use the kernel klist infrastructure instead
which allows lockless traversal of a list.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Here's the problem. Try to do this on 2.6.12:
- Kill udev and HAL
- Insert a CD-ROM into a SCSI or USB CD-ROM drive
- Run dd if=/dev/scd0
- cat /sys/block/sr0/size
- Eject the CD, insert a different one
- Run dd if=/dev/scd0
This is likely to do "access beyond the end of device", if you let it
- cat /sys/block/sr0/size
This shows the size of a previous CD, even though dd was supposed
to revalidate the device.
- Run dd if=/dev/scd0
The second run of dd works correctly!
The bug was introduced in 2.5.31, when Al fixes the recursive opens
in partitioning. Before, the code worked like this:
- Block layer called cdrom_open directly
- cdrom_open called sr_open
- sr_open called check_disk_change
- check_disk_change called sr_media_change
- sr_media_change did cd->needs_disk_change=1
- before returning sr_open tested cd->needs_disk_change
and called get_sector_size.
In 2.6.12, the check_disk_change is called from cdrom_open only. Thus:
- Block layer calls sr_bd_open
- sr_bd_open calls cdrom_open
- cdrom_open calls sr_open
- sr_open tests cd->needs_disk_change, which wasn't set yet; returns
- cdrom_open calls check_disk_change
- check_disk_change calls sr_media_change
- sr_media_change does cd->needs_disk_change=1, but nobody cares
Acked by: Alexander Viro <aviro@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch fixes a long term borkenness in
ibmvscsi where we were using the wrong timeout
field from the scsi command (and using the
wrong units.) Now broken by the fact that the
scsi_cmnd timeout field is gone entirely.
This only worked before because all the SCSI
targets assumed that 0 was default.
Signed-off-by: Dave Boutcher <boutcher@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
C files should include the files with the prototypes for their global
functions.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
With the removal of the spinlocking around eh calls, we need to add a
little more locking back in, otherwise we do some naked list
manipulation.
Signed-off-by: Dave Boutcher <boutcher@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch fixes the bad assumption of the aacraid driver with use_sg.
I used the 3w-xxxx driver fix as a guide for this.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
If your transport class sets the ATTRIBUTE_CONTAINER_NO_CLASSDEVS flag,
then its configure method never gets called. This patch fixes that so
that the configure method is called with a NULL classdev.
Also remove a spurious inverted comma in the transport_class comments.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
remove lots of completely dead code from aiclib, there's not a lot left
and even what's left is rather useless.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
remove ahd_tailq and do sane pci probing. ported over from aic7xxx.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
remove some dead cruft, as done already in aic7xxx
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
I recently tried to construct a totally generic transport class and
found there were certain features missing from the current abstract
transport class. Most notable is that you have to hang the data on the
class_device but most of the API is framed in terms of the generic
device, not the class_device.
These changes are two fold
- Provide the class_device to all of the setup and configure APIs
- Provide and extra API to take the device and the attribute class and
return the corresponding class_device
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch is necessary if we begin exposing underlying physical disks
(which can attach to the SPI transport class) of the hardware RAID
cards, since we don't want any SPI parameters binding to the RAID
devices.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From: Christoph Hellwig <hch@lst.de>
Multi-function cards need to inherit the PCI flags from the master PCI
device.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From: "Martin J. Bligh" <mbligh@mbligh.org>
drivers/scsi/aic7xxx/aic7770.c: In function `aic7770_config':
drivers/scsi/aic7xxx/aic7770.c:129: warning: unused variable `l'
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From: Andrew Morton <akpm@osdl.org>
drivers/scsi/scsi.c: In function `scsi_softirq':
drivers/scsi/scsi.c:814: warning: int format, long int arg (arg 4)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Replace use of lpfc_put_lun with midlayer's int_to_scsilun
Remove driver's local definition of lpfc_put_lun (which converts an
int back to a 64-bit LUN) and replace it's use with the recently added
int_to_scsilun function provided by the midlayer.
Note: Embedding midlayer structure in our structure caused
need for more files to include midlayer headers.
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Fix handling of the dev_loss and nodev timeouts.
Symptoms: when remote port disappears for a period of time longer then
either nodev_tmo or dev_loss_tmo, the lpfc driver worker thread will
stall removing that remote port.
Cause: removing remote port involves un-blocking and sync-ing
corresponding block device queue. But corresponding node in the lpfc
driver is still in the NPR(?node port recovery?) state and mid-layer
gets SCSI_MLQUEUE_HOST_BUSY as a return value when it is trying to call
queuecommand() with command for that node (AKA remote port)
Fix: Instead of returning SCSI_MLQUEUE_HOST_BUS from queuecommand() for
nodes in NPR states complete it with retry-able error code DID_BUS_BUSY
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Fix panic in lpfc_get_stats()
Symptoms: Panic on sysfs stats access
Cause: In lpfc_get_stats() we are writing to memory that we do not
own.
Fix: Fix our stats structure allocation. Embed phba->link_stats in
struct lpfc_hba and stop treating it like rogue structure.
Note: Embedding midlayer/transport structure in our structure caused
need for more files to include midlayer/transport headers.
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Clear task management bits when preparing SCSI commands
In lpfc_scsi_prep_cmnd, clear the task management bits (fcpCntl2 member
in the fcp_cmd structure) when preparing regular SCSI commands.
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Fix panic on lip and cable pull
Symptoms: Panic on lip or cable pull
Cause: Use after free of nlp in lpfc_nlp_remove()
Fix: Do not make FC transport calls after a node is removed. Transport
calls are disabled by ignoring the initial delete transition.
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
IOCB BDE not getting fully initialized during reuse
Symptoms: Driver gets Status 3 and Reason 0x13 on IOCB completions.
Cause: The IOCB bpl.bdeSize and bdeFlags are not getting initialized on reuse.
Fix: Reinitialize these fields in prep_dma each time an IOCB is used.
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From: Steve Wilcox <spwilcox@att.com>
In order to properly report LUN's > 7, the DEC HSG80 definition in
scsi_devinfo.c needs to include BLIST_REPORTLUN2 rather than
BLIST_SPARSELUN. I've tested this change with several HSG firmware
revisions and with both Emulex and Qlogic HBA's.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
There's a spurious (and illegal since it's marked __exit) call to
ahc_linux_exit() in ahc_linux_init() which causes a double list
deletion of the transport class; remove it.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>