libata implemented a feature to schedule EH without an associated scmd
by manipulating shost->host_eh_scheduled in ata_scsi_schedule_eh()
directly. Move this function to scsi_error.c and rename it to
scsi_schedule_eh(). It is now an exported API for SCSI transports and
is exported via the new header file drivers/scsi/scsi_transport_api.h.
This patch also de-exports scsi_eh_wakeup(), which was exported
specifically for ata_scsi_schedule_eh().
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
LLDDs rely on libata to automatically take certain EH actions on some
errors. If the port is frozen or one or more qc's have failed with
HSM violation or timeout, softreset is enforced (LLDD can ask for
stronger EH action at will). If any other error condition exists,
libata EH always revalidates.
This behavior existed in earlier revisions of new EH but was lost
during the development process. This patch restores it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix the HSM error_mask mapping.
Changes:
- Better mapping in ac_err_mask()
- In HSM_ST_FIRST and HSM_ST state, check ATA_ERR|ATA_DF and map it to AC_ERR_DEV instead of AC_ERR_HSM.
- In HSM_ST_FIRST and HSM_ST state, map DRQ=1 ERR=1 to AC_ERR_HSM.
- For PIO data in and DRQ=1 ERR=1, add check after the junk data block is read.
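
The mapping in the list above can be sketched roughly as follows. This
is an illustration only, not the patch's code: the status bit values
are the standard ATA ones, but the AC_ERR_* values and the helper name
hsm_drq_err_mask() are placeholders chosen for this example.

```c
#include <stdint.h>

/* Standard ATA status register bits. */
#define ATA_DF   0x20
#define ATA_DRQ  0x08
#define ATA_ERR  0x01

/* Illustrative error-mask values; the real ones live in libata.h. */
#define AC_ERR_DEV 0x1u
#define AC_ERR_HSM 0x2u

/*
 * Hypothetical helper: classify the status seen in HSM_ST_FIRST or
 * HSM_ST, where the host is waiting for the device to raise DRQ.
 */
unsigned int hsm_drq_err_mask(uint8_t status)
{
	if (!(status & ATA_DRQ)) {
		/* Device reported an error instead of asking for data. */
		if (status & (ATA_ERR | ATA_DF))
			return AC_ERR_DEV;
		/* No DRQ and no error: the device broke the protocol. */
		return AC_ERR_HSM;
	}
	/* DRQ=1 together with ERR=1 is a state machine violation. */
	if (status & ATA_ERR)
		return AC_ERR_HSM;
	return 0;
}
```

A status with ERR set and DRQ clear is treated as a device error,
while DRQ=1 ERR=1 is treated as an HSM violation.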
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This fixes a byte-swap issue found by Zang Roy-r61911
on the powerpc platform. His original patch also had some other
platform-specific changes in #ifdef's, but I'm not sure yet how to
incorporate them. Look for another patch for those (soon).
Signed-off-by: Mark Lord <liml@rtr.ca>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The driver currently keeps local copies of the hardware request/response queue indexes.
But it expends significant effort ensuring consistency between the two views,
and still gets it wrong after an error or reset occurs.
This patch removes the local copies, in favour of just accessing the hardware
whenever we need them. Eventually this may need to be tweaked again for NCQ,
but for now this works and solves problems some users were seeing.
Signed-off-by: Mark Lord <liml@rtr.ca>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The 60xx chips, and possibly others, incorrectly assert DEV_IRQ interrupts
on a regular basis. The cause of this is under investigation (by me and
in theory by Marvell also), but regardless we do need to deal with these events.
This patch tidies up some interrupt handler code, and ensures that we ignore
DEV_IRQ interrupts when the drive still has ATA_BUSY asserted.
Signed-off-by: Mark Lord <liml@rtr.ca>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The interface control register of the 60xx (and later) Marvell chip
requires certain bits to always be set when writing to it. These bits
incorrectly read-back as zeros, so the pattern must be ORed in
with each write of the register. Also, bit 12 should NOT be set
(note that Marvell's own driver also had bit-12 wrong here).
While we're at it, we also now do pci_set_master() in the init code.
Signed-off-by: Mark Lord <liml@rtr.ca>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
In some systems, it is possible that the BIOS may have enabled interrupt coalescing
for the Marvell controllers which support it. This patch adds code to detect/ack
interrupts from the chip's coalescing (combing) logic.
Signed-off-by: Mark Lord <liml@rtr.ca>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The mv_err_intr() function is invoked from the driver's interrupt handler,
as well as from the timeout function. This patch prevents it from triggering
a one-after-the-other double reset of the controller when invoked
from the timeout function.
This also adds a check for a timeout race condition that has been observed
to occur with this driver in earlier kernels. This should not be needed,
in theory, but in practice it has caught bugs. Maybe nuke it at a later date.
Signed-off-by: Mark Lord <liml@rtr.ca>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Implement NCQ support. Sil24 has 31 command slots and all of them are
used for NCQ command queueing. libata guarantees that no other
command is in progress when it issues an internal command, so always
use tag 0 for internal commands.
Signed-off-by: Tejun Heo <htejun@gmail.com>
With NCQ, there are multiple sg tables, so pp->cmd_tbl_sg doesn't cut
it. Directly calculate sg table address from pp->cmd_tbl.
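
A minimal sketch of such a calculation, assuming one command table per
tag laid out back to back, each starting with a fixed-size header
followed by its sg entries. The sizes below and the helper name
sg_table_offset() are assumptions for illustration; the driver's own
constants are authoritative.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes for illustration; check ahci.c for the real values. */
enum {
	MAX_CMDS       = 32,
	MAX_SG         = 168,
	SG_ENTRY_SZ    = 16,
	CMD_TBL_HDR_SZ = 0x80,
	CMD_TBL_SZ     = CMD_TBL_HDR_SZ + MAX_SG * SG_ENTRY_SZ,
};

/* Byte offset of the sg table for @tag, relative to the start of cmd_tbl. */
size_t sg_table_offset(unsigned int tag)
{
	assert(tag < MAX_CMDS);
	return (size_t)tag * CMD_TBL_SZ + CMD_TBL_HDR_SZ;
}
```

Adding this offset to pp->cmd_tbl (and to its DMA address) would give
the per-tag sg table without a dedicated pointer.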
Signed-off-by: Tejun Heo <htejun@gmail.com>
* Rename CMD_TBL_HDR to CMD_TBL_HDR_SZ as it's size not offset.
* Define MAX_CMDS and CMD_SZ and use them in calculation of other
constants.
* Define CMD_TBL_AR_SZ as product of CMD_TBL_SZ and MAX_CMDS, and use
it when calculating PRIV_DMA_SZ.
* CMD_SLOT_SZ is also dependent on MAX_CMDS but hasn't been changed
because I didn't want to change the value used by the original code
(32 commands). Later NCQ change will bump MAX_CMDS to 32 anyway and
the hard coded 32 can be changed to MAX_CMDS then.
* Reorder HOST_CAP_* flags.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Now that all NCQ related stuff is in place, implement NCQ device
configuration and bump ATA_MAX_QUEUE to 32, thus activating NCQ
support.
Original implementation is from Jens Axboe.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Update EH to handle NCQ. ata_eh_autopsy() is updated to call
ata_eh_analyze_ncq_error() which reads log page 10h on NCQ device
error and updates eh_context accordingly. ata_eh_report() is updated
to report SActive.
Signed-off-by: Tejun Heo <htejun@gmail.com>
This patch implements NCQ command translation and exclusion. Note
that NCQ commands don't use ata_rwcmd_protocol() to choose ATA
command. This is because, unlike non-NCQ RW commands, NCQ commands
can only be used for NCQ protocol and FUA handling is done with a flag
rather than separate command.
NCQ enabled device will have queue depth larger than one but no two
non-NCQ commands can be issued simultaneously, neither can a non-NCQ
command and NCQ commands. This patch makes ata_scsi_translate()
return SCSI_MLQUEUE_DEVICE_BUSY if such exclusion is necessary. SCSI
midlayer will retry the command later.
As SCSI midlayer always retries once a command completes, this doesn't
incur unnecessary delays, and as most commands will be NCQ ones for an
NCQ device, the overhead should be negligible.
Initial implementation is from Jens Axboe and using
SCSI_MLQUEUE_DEVICE_BUSY for exclusion is suggested by Jeff Garzik.
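
The exclusion rule can be sketched as a small predicate. The struct
and the name need_defer() are hypothetical and only model the rule
stated above; when it returns true, the translation path would return
SCSI_MLQUEUE_DEVICE_BUSY.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of the port state. */
struct port_state {
	bool     nonncq_active; /* a non-NCQ command is in flight */
	uint32_t sactive;       /* one bit per in-flight NCQ command */
};

/* Return true if the new command must wait for in-flight ones. */
bool need_defer(const struct port_state *p, bool is_ncq)
{
	if (is_ncq)
		/* NCQ commands may overlap each other, but not non-NCQ. */
		return p->nonncq_active;
	/* A non-NCQ command needs the port completely idle. */
	return p->nonncq_active || p->sactive != 0;
}
```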
Signed-off-by: Tejun Heo <htejun@gmail.com>
Add ap->qc_active and ap->sactive, mask of all active qcs and libata's
view of the SActive register, respectively. Also, implement
ata_qc_complete_multiple() which takes new qc_active mask and completes
multiple qcs according to the mask.
These will be used to track NCQ commands and complete them. The
distinction between ap->qc_active and ap->sactive is also useful for
later PM implementation.
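
One way to picture mask-based completion is the sketch below. It is
not the libata implementation: struct port, complete_multiple() and
the callback are stand-ins, and the check for a bit going from 0 to 1
is an assumption about what a sane caller would want rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical port with just the mask this example needs. */
struct port {
	uint32_t qc_active; /* bit N set: tag N is in flight */
};

/*
 * Complete every tag that is set in p->qc_active but clear in
 * @new_active.  @complete may be NULL.  Returns the number of
 * completed tags, or -1 if @new_active claims a tag that was not
 * active before.
 */
int complete_multiple(struct port *p, uint32_t new_active,
		      void (*complete)(unsigned int tag))
{
	uint32_t done_mask = p->qc_active ^ new_active;
	int nr_done = 0;

	if (done_mask & new_active)
		return -1; /* illegal transition: a bit went 0 -> 1 */

	for (unsigned int tag = 0; tag < 32; tag++) {
		if (!(done_mask & (UINT32_C(1) << tag)))
			continue;
		p->qc_active &= ~(UINT32_C(1) << tag);
		if (complete)
			complete(tag);
		nr_done++;
	}
	return nr_done;
}
```

An NCQ-capable driver would typically pass the value it just read from
its hardware's active-command register as the new mask.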
Signed-off-by: Tejun Heo <htejun@gmail.com>
Rename ap->qactive to ap->qc_allocated. This is to accommodate the
addition of ap->qc_active, mask of active qcs.
Signed-off-by: Tejun Heo <htejun@gmail.com>
ata_scsi_translate() will need to return SCSI_MLQUEUE_DEVICE_BUSY to
achieve exclusion between NCQ and non-NCQ commands or among non-NCQ
commands. Pass its return value upward to SCSI midlayer.
Signed-off-by: Tejun Heo <htejun@gmail.com>
* kill ata_poll_qc_complete() and implement/use ata_hsm_qc_complete()
which completes qcs in new EH compliant manner from HSM
* don't print error message from ata_hsm_move(). it's the
responsibility of EH.
* kill ATA_FLAG_NOINTR usage in bmdma EH
Signed-off-by: Tejun Heo <htejun@gmail.com>
Convert sata_sil24 to new EH.
* When port is frozen, IRQ for the port is masked.
* sil24_softreset() doesn't need to meddle with IRQ mask anymore.
libata ensures that the port is frozen during reset.
* Only turn on interrupts which are handled by interrupt handler and
EH. As we don't handle SDB notify yet, turn it off. DEV_XCHG and
UNK_FIS are handled by EH and thus turned on.
* sil24_softreset() usually fails to recover the port after DEV_XCHG.
ATA_PORT_HARDRESET is used as recovery action for DEV_XCHG.
* sil24 may be invoked without any active command. e.g. DEV_XCHG irq
occurring while no qc is in progress still triggers EH and will reset
the port and revalidate attached device.
Signed-off-by: Tejun Heo <htejun@gmail.com>
During multiblock PIO, multiple PIOS interrupts are generated before
qc completion. Current code prints unnecessary messages for such cases.
This is exposed when new EH slows down attached device into PIO mode.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Convert AHCI to new EH. Unfortunately, ICH7 AHCI reacts badly if IRQ
mask is diddled during operation. So, freezing is implemented by
unconditionally clearing interrupt conditions while frozen.
* Interrupts are categorized according to required action.
e.g. Connection status or unknown FIS error requires freezing the
port while TF or HBUS_DATA don't.
* Only CONNECT (reflects SErr.X) interrupt is taken into account not
PHYRDY (SErr.N), as CONNECT is better cue for starting EH.
* AHCI may be invoked without any active command. e.g. CONNECT irq
occurring while no qc is in progress still triggers EH and will reset
the port and revalidate attached device.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Convert sata_sil to new EH. As these controllers have hardware
interrupt mask and are known to have screaming interrupts issues, use
hardware IRQ masking for freezing. sil_freeze() masks interrupts for
the port and sil_thaw() unmasks them. As ports are automatically
frozen before probing reset, there is no need to initialize interrupt
masks in sil_init_one(). Remove related code.
Other than freezing, sata_sil uses stock BMDMA EH routines.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Implement new EH. The exported interface is ata_do_eh() which is to
be called from ->error_handler and performs the following steps to
recover the failed port.
  ata_eh_autopsy() : analyze SError/TF, determine the cause of failure
                     and required recovery actions and record it in
                     ap->eh_context
  ata_eh_report()  : report the failure to user
  ata_eh_recover() : perform recovery actions described in ap->eh_context
  ata_eh_finish()  : finish failed qcs
LLDDs can customize error handling by modifying eh_context before
calling ata_do_eh() or, if necessary, doing so in between the major
steps by calling each step explicitly.
Signed-off-by: Tejun Heo <htejun@gmail.com>
struct ata_eh_info serves as the communication channel between
execution path and EH. Execution path describes detected error
condition in ap->eh_info and EH recovers the port using it. To avoid
missing error conditions detected during EH, EH makes its own copy of
eh_info and clears it on entry allowing error info to accumulate
during EH.
Most EH states including EH's copy of eh_info are stored in
ap->eh_context (struct ata_eh_context) which is owned by EH and thus
doesn't require any synchronization to access and alter. This
standardized context makes it easy to integrate various parts of EH
and extend EH to handle multiple links (for PM).
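
The copy-and-clear step on EH entry can be sketched as follows. The
structures are heavily trimmed, hypothetical versions of the ones
named above, and eh_enter() is a name made up for this example.

```c
#include <string.h>

/* Hypothetical, minimal versions of the two structures. */
struct eh_info {
	unsigned int err_mask; /* accumulated error bits */
	unsigned int action;   /* requested recovery actions */
};

struct eh_context {
	struct eh_info i;      /* EH's private snapshot */
};

struct port {
	struct eh_info    eh_info;    /* written by the execution path */
	struct eh_context eh_context; /* owned by EH */
};

/*
 * On EH entry: snapshot eh_info, then clear it so that errors
 * reported while EH runs accumulate for the next round.  A real
 * implementation would need to do this under whatever lock
 * protects eh_info.
 */
void eh_enter(struct port *ap)
{
	ap->eh_context.i = ap->eh_info;
	memset(&ap->eh_info, 0, sizeof(ap->eh_info));
}
```

After eh_enter(), EH works only on its snapshot, so the execution path
can keep writing to eh_info without racing against recovery.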
Signed-off-by: Tejun Heo <htejun@gmail.com>
This patch implements ata_ering and uses it to define dev->ering.
ata_ering is a ring buffer which records libata errors - whether a
command was for normal IO request, err_mask and timestamp. Errors are
recorded per-device in dev->ering. This will be used by EH to
determine recovery actions.
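
A ring buffer of this kind might look like the sketch below. The
capacity, field names and the ering_count_since() query are
assumptions made for illustration; a zero err_mask marks an unused
slot, so callers are expected to record only non-zero masks.

```c
#include <stdbool.h>

#define ERING_SIZE 32 /* assumed capacity for this sketch */

struct ering_entry {
	bool          is_io;     /* error hit a normal I/O request */
	unsigned int  err_mask;  /* 0 marks an unused slot */
	unsigned long timestamp; /* caller-supplied time, e.g. jiffies */
};

struct ering {
	int                cursor; /* index of the most recent entry */
	struct ering_entry ring[ERING_SIZE];
};

/* Record an error, overwriting the oldest entry once the ring is full. */
void ering_record(struct ering *er, bool is_io, unsigned int err_mask,
		  unsigned long now)
{
	er->cursor = (er->cursor + 1) % ERING_SIZE;
	er->ring[er->cursor] = (struct ering_entry){ is_io, err_mask, now };
}

/* Count recorded errors no older than @since, walking newest to oldest. */
int ering_count_since(const struct ering *er, unsigned long since)
{
	int idx = er->cursor, n = 0;

	do {
		const struct ering_entry *ent = &er->ring[idx];

		if (!ent->err_mask || ent->timestamp < since)
			break;
		n++;
		idx = (idx + ERING_SIZE - 1) % ERING_SIZE;
	} while (idx != er->cursor);
	return n;
}
```

A query like ering_count_since() is the sort of thing EH could use to
decide whether a device has been failing often enough recently to
justify a stronger recovery action.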
Signed-off-by: Tejun Heo <htejun@gmail.com>
SCSI command completion path used to do some part of EH including
printing messages and obtaining sense data. With new EH, all these
are responsibilities of the EH. Update SCSI command completion path
to reflect this.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Update ata_exec_internal() such that it uses new EH framework.
->post_internal_cmd() is always invoked regardless of completion
status. Also, when ata_exec_internal() detects a timeout condition
and new EH is in place, it freezes the port as timeout for normal
commands would do.
Note that ata_port_flush_task() is called regardless of
wait_for_completion status. This is necessary as exceptions unrelated
to the qc can abort the qc, in which case PIO task could still be
running after the wait for completion returns.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Update ata_scsi_error() for new EH. ata_scsi_error() is responsible
for claiming timed out qcs and invoking ->error_handler in safe and
synchronized manner. As the state of the controller is unknown if a
qc has timed out, the port is frozen in such cases.
Note that ata_scsi_timed_out() isn't used for new EH. This is because
a timed out qc cannot be claimed by EH without freezing the port and
freezing the port in ata_scsi_timed_out() results in unnecessary
abortion of other active qcs. ata_scsi_timed_out() can be removed
once all drivers are converted to new EH.
While at it, add 'TODO: kill' comments to old EH functions.
Signed-off-by: Tejun Heo <htejun@gmail.com>
PIO executes without holding host_set lock, so it cannot be
synchronized using the same mechanism as interrupt driven execution.
port_task framework makes sure that EH is not entered until PIO task
is flushed, so PIO task can be sure the qc in progress won't go away
underneath it. One thing it cannot be sure of is whether the qc has
already been scheduled for EH by another exception condition while
host_set lock was released.
This patch makes ata_poll_qc_complete() handle such conditions
properly and makes it freeze the port if HSM violation is detected
during PIO execution.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Freezing is performed atomically w.r.t. host_set->lock and, once
frozen, LLDD is not allowed to access the port or any qc on it. Also, libata
makes sure that no new qc gets issued to a frozen port.
A frozen port is thawed after a reset operation completes
successfully, so reset methods must do their job while the port is
frozen. During initialization all ports get frozen before requesting
IRQ, so reset methods are always invoked on a frozen port.
Optional ->freeze and ->thaw operations notify LLDD that the port is
being frozen and thawed, respectively. LLDD can disable/enable
hardware interrupt in these callbacks if the controller's IRQ mask can
be changed dynamically. If the controller doesn't allow such
operation, LLDD can check for frozen state in the interrupt handler
and ack/clear interrupts unconditionally while frozen.
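
For a controller whose IRQ mask cannot be changed dynamically, the
second approach could look roughly like this. Everything here is a
stand-in: irq_stat models a hardware status register and the handled
counter models normal interrupt processing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical port: irq_stat stands in for a hardware status register. */
struct port {
	bool     frozen;
	uint32_t irq_stat;
	int      handled; /* number of events passed to normal handling */
};

/* Returns true if the interrupt belonged to this port. */
bool port_interrupt(struct port *ap)
{
	uint32_t stat = ap->irq_stat;

	if (!stat)
		return false;
	ap->irq_stat = 0; /* ack/clear everything that was pending */
	if (ap->frozen)
		return true; /* frozen: swallow the event, touch no qc */
	ap->handled++;
	return true;
}
```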
Signed-off-by: Tejun Heo <htejun@gmail.com>
ata_port_schedule_eh() directly schedules EH for @ap without
associated qc. Once EH is scheduled, no further qc is allowed and EH
kicks in as soon as all currently active qc's are drained.
ata_port_abort() schedules all currently active commands for EH by
qc_completing them with ATA_QCFLAG_FAILED set. If ata_port_abort()
doesn't find any qc to abort, it directly schedules EH using
ata_port_schedule_eh().
These two functions provide ways to invoke EH for conditions which
aren't directly related to any specific qc.
Signed-off-by: Tejun Heo <htejun@gmail.com>
There are several ways a qc can get scheduled for EH in new EH. This
patch implements one of them - completing a qc with ATA_QCFLAG_FAILED
set or with non-zero qc->err_mask. ALL such qc's are examined by EH.
New EH schedules a qc for EH from completion iff ->error_handler is
implemented, qc is marked as failed or qc->err_mask is non-zero and
the command is not an internal command (internal cmd is handled via
->post_internal_cmd). The EH scheduling itself is performed by asking
SCSI midlayer to schedule EH for the specified scmd.
For drivers implementing old-EH, nothing changes. As this change
makes ata_qc_complete() rather large, it's not inlined anymore and
__ata_qc_complete() is exported to other parts of libata for later
use.
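
The scheduling condition can be written as a predicate. The struct,
the flag value and the name qc_needs_eh() are hypothetical; only the
condition itself comes from the description above.

```c
#include <stdbool.h>

#define QCFLAG_FAILED 0x1u /* stand-in for ATA_QCFLAG_FAILED */

struct qc {
	unsigned int flags;
	unsigned int err_mask;
	bool         internal; /* issued via ata_exec_internal() */
};

/* Should completing @qc hand it over to EH instead of finishing it? */
bool qc_needs_eh(const struct qc *qc, bool has_error_handler)
{
	if (!has_error_handler)
		return false; /* old EH: behaviour unchanged */
	if (qc->internal)
		return false; /* handled via ->post_internal_cmd */
	return (qc->flags & QCFLAG_FAILED) || qc->err_mask != 0;
}
```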
Signed-off-by: Tejun Heo <htejun@gmail.com>
New EH framework has clear distinction about who owns a qc. Every qc
starts owned by normal execution path - PIO, interrupt or whatever.
When an exception condition occurs which affects the qc, the qc gets
scheduled for EH. Note that some events (say, link lost and regained,
command timeout) may schedule qc's which are not directly related but
could have been affected for EH too. Scheduling for EH is atomic
w.r.t. ap->host_set->lock and once scheduled for EH, normal execution
path is not allowed to access the qc in any way. (PIO
synchronization acts a bit differently and will be dealt with later.)
This patch makes ata_qc_from_tag() check whether a qc is active and
owned by normal path before returning it. If conditions don't match,
NULL is returned and thus access to the qc is denied.
__ata_qc_from_tag() is the original ata_qc_from_tag() and is used by
libata core/EH layers to access inactive/failed qc's.
This change is applied only if the associated LLDD implements new EH
as indicated by non-NULL ->error_handler.
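
A sketch of the two lookups, under the assumption that a qc counts as
owned by the normal path when it is active and not marked failed. The
flag values, queue size and struct layout are placeholders, and the
check for ->error_handler is left out for brevity.

```c
#include <stddef.h>

#define MAX_QUEUE     32   /* assumed queue size */
#define QCFLAG_ACTIVE 0x1u /* stand-ins for the ATA_QCFLAG_* bits */
#define QCFLAG_FAILED 0x2u

struct qc { unsigned int flags; };
struct port { struct qc qcmd[MAX_QUEUE]; };

/* Raw lookup, for core/EH code that may touch any qc. */
struct qc *raw_qc_from_tag(struct port *ap, unsigned int tag)
{
	return tag < MAX_QUEUE ? &ap->qcmd[tag] : NULL;
}

/* Lookup for the normal path: only active, non-failed qcs are visible. */
struct qc *qc_from_tag(struct port *ap, unsigned int tag)
{
	struct qc *qc = raw_qc_from_tag(ap, tag);

	if (!qc)
		return NULL;
	if ((qc->flags & (QCFLAG_ACTIVE | QCFLAG_FAILED)) == QCFLAG_ACTIVE)
		return qc;
	return NULL;
}
```

An interrupt handler using the filtered lookup simply sees NULL for a
qc that EH has claimed and can skip it without extra checks.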
Signed-off-by: Tejun Heo <htejun@gmail.com>
New EH may issue internal commands to recover from error while failed
qc's are still hanging around. To allow such usage, reserve tag
ATA_MAX_QUEUE-1 for internal command. This also makes it easy to tell
whether a qc is for internal command or not. ata_tag_internal()
implements this test.
To avoid breaking existing drivers, ata_exec_internal() uses
ATA_TAG_INTERNAL only for drivers which implement ->error_handler.
For drivers using old EH, tag 0 is used. Note that this makes
ata_tag_internal() test valid only when ->error_handler is
implemented. This is okay as drivers on old EH should not and do
not have any reason to use ata_tag_internal().
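
The tag selection described above amounts to the following sketch.
The queue size of 32 is an assumption for illustration, and the
function names are made up; only the "last tag is reserved" rule and
the tag 0 fallback come from the text.

```c
#include <stdbool.h>

#define MAX_QUEUE    32              /* assumed value of ATA_MAX_QUEUE */
#define TAG_INTERNAL (MAX_QUEUE - 1) /* reserved for internal commands */

/* Is @tag the one reserved for internal commands? */
bool tag_is_internal(unsigned int tag)
{
	return tag == TAG_INTERNAL;
}

/* Tag to use for an internal command, depending on the EH flavour. */
unsigned int internal_cmd_tag(bool has_error_handler)
{
	return has_error_handler ? TAG_INTERNAL : 0;
}
```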
Signed-off-by: Tejun Heo <htejun@gmail.com>