Fix a (rare) race condition in mv_interrupt() when using MSI.
The value of hpriv->main_irq_mask_addr can change on on the fly,
and without this patch we could end up writing back a stale copy
to the hardware.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
For some reason, sata_mv doesn't clear interrupt status during init
when it's running on an SoC host adapter. If the bootloader has
touched the SATA controller before starting Linux, Linux can end up
enabling the SATA interrupt with events pending, which will cause the
interrupt to be marked as spurious and then be disabled, which then
breaks all further accesses to the controller.
This patch makes the SoC path clear interrupt status on init like in
the non-SoC case.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Fix chip type for the Highpoint RocketRAID 1740 and 1742 PCI cards.
These really do have Marvell 6042 chips on them, rather than the 5081 chip.
Confirmed by multiple (two) users (for the 1740), and by examining
the product photographs from Highpoint's web site.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Enable reliable use of Message-Signaled Interrupts (MSI) in sata_mv
by masking further chip interrupts within the main interrupt handler.
Based upon a suggestion by Grant Grundler.
MSI is working reliably in all of my test systems here now.
Signed-off-by: Mark Lord <mlord@pobox.com>
Reviewed-by: Grant Grundler <grundler@google.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
I noticed that during initialization sata_mv.c assumes that the main
interrupt mask has its default value of 0. The function
mv_platform_probe(..) initializes a shadow irq mask with 0 assuming
that's the value of the controller's register. Now
mv_set_main_irq_mask(..) only writes the controller's register if the
new value differs from the "shadowed" value. This is fatal when trying
to disable all interrupts in mv_init_host(..), i.e. the following
function call does not write anything to the main irq mask register:
mv_set_main_irq_mask(host, ~0, 0);
The effect I see on my machine (QNAP TS-109 II) with booting via kexec
(with Linux as a 2nd-stage boot loader) is that if the sata_mv module
was still loaded when performing kexec, then the new kernel's sata_mv
module starts up with interrupts enabled. This results in an unhandled
IRQ and breaks the boot process.
The unhandled interrupt itself might also be fixed by Lennert's patch
proposed at http://markmail.org/message/kwvzxstnlsa3s26w which I did not
try yet.
However I still propose to additionally initialize the shadow variable
with the current contents of the main irq mask register to get both in
sync and allow proper disabling the main irq mask. This fixes the
unhandled irq on my machine.
Signed-off-by: Thomas Reitmayr <treitmayr@devbase.at>
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Remove unneeded nsect restriction from GenII NCQ path,
and improve comments to explain why this is not a problem.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Remove silly read-modify-write sequences when clearing interrupts
in hc_irq_cause. This gets rid of unneeded MMIO reads, resulting in
a slight performance boost when switching between EDMA and non-EDMA
modes (eg. for cache flushes).
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Fix a longstanding bug for the 8-port Marvell Sata controllers (508x/6081),
where accesses to the upper 4 ports would cause lost-interrupts / timeouts
for the lower 4-ports. With this patch, the 6081 boards should finally be
reliable enough for mainstream use with Linux.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Logically, SCR access ops should take @link; however, there was no
compelling reason to convert all SCR access ops when adding @link
abstraction as there's one-to-one mapping between a port and a non-PMP
link. However, that assumption won't hold anymore with the scheduled
addition of slave link.
Make SCR access ops per-link.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The SoC sata port is based on the 7042/6042 devices (Gen IIE). This patch
will fix various issues when working with PMP and/or NCQ.
Signed-off-by: Saeed Bishara <saeed@marvell.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
sata_mv allowed issuing two DMA commands concurrently which the
hardware allows. Unfortunately, libata core layer isn't ready for
this yet and spews ugly warning message and malfunctions on this.
Don't allow concurrent DMA commands for now.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
There is a miniscule chance that two separate host controllers
might be in sata_mv at the same time and manage to decrement
the static limit_warnings variable below zero.
Fix the comparison to deal with it.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Chip errata sometimes prevents reliable use of PIO commands which involve
more than a single DRQ (data request). In normal operation, libata should
not generate such PIO commands (uses DMA instead), but they could be sent
in via SG_IO from userspace.
A full workaround might be to break up such commands into sequences
of single DRQ ones, but that's just way too complex for something
that doesn't normally happen in real life.
So, allow the attempt (it often works, despite the errata),
but log the event for reference when somebody screams.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The early chipsets cannot safely handle Async Notification (AN),
but 6041/6081 chip revision "C0" (and newer) can handle it.
So allow AN for "C0" and higher.
This enables use of hotplug on PMP ports for the 6041/6081 PCI Rev.9 chips.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The handling for PHY_MODE4 was originally just cloned from the
Marvell proprietary driver (with their blessing).
But we can do better than that.
Tidy things up with some judicious mask definitions, to improve maintainability.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The 5182 System-On-Chip (SOC) variant wants certain lower
bits to be cleared on any write to the PHY_MODE3 register.
If/when support is added for other SOC variants, we'll need
some way to uniquely identify the 5182, and not perform this
workaround for the others.
But for now, it is the only SOC variant we support here.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The "B2" variant of the 6041/6081 (genII) chips requires
that the PHY_MODE3 register be rewritten after any write
to PHY_MODE4.
This fixes a regression introduced by an earlier patch.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The only public release of the 6042/7042 chips was/is revision "B0".
Remove code that attempted to deal with earlier, non-released revs.
This matches the logic of the current Marvell "proprietary" driver.
Also, bump up the sata_mv version number, to reflect this batch of erratas.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Fix and update the errata handling for the PHY_MODEx registers.
This improves receiver noise tolerance, among other things.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Convert the System-on-Chip flag from a host flag to an hpriv flag,
for better consistency with other chip-rev flags, and for easier use
in errata fixes etc.
Also change the related "HAS_PCI()" into "!IS_SOC()" for better consistency
of naming/use (everything else SOC-related already uses "SOC").
There are no functionality changes in this patch.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Check for an empty request queue before stopping EDMA after a FBS-NCQ error,
as per recommendation from the Marvell datasheet.
This ensures that the EDMA won't suddenly become active again
just after our subsequent check of the empty/idle bits.
Also bump DRV_VERSION.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Part five of simplifying/fixing handling of the main_irq_mask register
to resolve unexpected interrupt issues observed in 2.6.26-rc*.
Keep a cached copy of the main_irq_mask so that we don't have
to stall the CPU to read it on every pass through mv_interrupt.
This significantly speeds up interrupt handling, both for sata_mv,
and for any other driver/device sharing the same PCI IRQ line.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Part four of simplifying/fixing handling of the main_irq_mask register
to resolve unexpected interrupt issues observed in 2.6.26-rc*.
Ignore masked IRQs in mv_interrupt().
This prevents "unexpected device interrupt while idle" messages.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Part three of simplifying/fixing handling of the main_irq_mask register
to resolve unexpected interrupt issues observed in 2.6.26-rc*.
Partially fix a reported bug whereby we sometimes miss seeing drives on
a port-multiplier, as reported by Gwendal Grignou <gwendal@google.com>.
The problem was that we were receiving unexpected interrupts
during EH from POLLed commands while accessing port-multiplier registers.
These unexpected interrupts can be prevented by masking the DONE_IRQ bit
for the port whenever not operating in EDMA mode.
Also fix port_stop() to mask all port interrupts.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Part two of simplifying/fixing handling of the main_irq_mask register
to resolve unexpected interrupt issues observed in 2.6.26-rc*.
Consolidate all updates of the host main_irq_mask register
into a single function. This simplifies maintenance,
and also prepares the way for caching it (later).
No functionality changes in this update.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Part one of simplifying/fixing handling of the main_irq_mask register
to resolve unexpected interrupt issues observed in 2.6.26-rc*.
Don't blindly enable port IRQs at host init time.
Instead, enable only the bits that we want,
which in this case is simply the PCI_ERR bit.
The per-port bits can wait until the ports are reset/probed for devices.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Now that we handle the FIS_IRQ_CAUSE register correctly,
we can also now handle SATA asynchronous notification events.
So enable them, but only for the more modern GenIIe chips.
(older chips have unaddressed errata issues related to this).
This fixes hot plug/unplug for port-muliplier ports.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Group all of the flags for GenIIe devices into a common definition,
to ensure that any updates to them are shared by all GenIIe devices.
This will help make future maintenance somewhat simpler.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Fix handling of the FIS_IRQ_CAUSE register in sata_mv.
This register exists *only* on GenIIe devices, so don't bother
writing to it on older chips. Also, it has to be read/cleared
in mv_err_intr() before clearing the main ERR_IRQ_CAUSE register.
This keeps sata_mv from getting stuck forever on certain error types.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Always request a softreset after hardreset succeeds.
This fixes a regression reported by Martin Michlmayr <tbm@cyrius.com>.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Some tidying as suggested by Grant Grundler.
Nuke local bit-counting function from sata_mv in favour of using hweight16().
Also add a short explanation for the 15msec timeout used when waiting for empty/idle.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Convert sata_mv's EH for FIS-based switching (FBS) over to the
sequence recommended by Marvell. This enables us to catch/analyze
multiple failed links on a port-multiplier when using NCQ.
To do this, we clear the ERR_DEV bit in the EDMA Halt-Conditions register,
so that the EDMA engine doesn't self-disable on the first NCQ error.
Our EH code sets the MV_PP_FLAG_DELAYED_EH flag to prevent new commands
being queued while we await completion of all outstanding NCQ commands
on all links of the failed PM.
The SATA Test Control register tells us which links have failed,
so we must only wait for any other active links to finish up
before we stop the EDMA and run the .error_handler afterward.
The patch also includes skeleton code for handling of non-NCQ FBS operation.
This is more for documentation purposes right now, as that mode is not yet
enabled in sata_mv.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Introduce a new "delayed error handling" mechanism in sata_mv,
to enable us to eventually deal with multiple simultaneous NCQ
failures on a single host link when a PM is present.
This involves a port flag (MV_PP_FLAG_DELAYED_EH) to prevent new
commands being queued, and a pmp bitmap to indicate which pmp links
had NCQ errors.
The new mv_pmp_error_handler() uses those values to invoke
ata_eh_analyze_ncq_error() on each failed link, prior to freezing
the port and passing control to sata_pmp_error_handler().
This is based upon a strategy suggested by Tejun.
For now, we just implement the delayed mechanism.
The next patch in this series will add the multiple-NCQ EH code
to take advantage of it.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Separate out the inner loop body of mv_host_intr()
into it's own function called mv_port_intr().
This should help maintainabilty.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Remove the unwanted reads of hc_irq_cause from mv_host_intr(),
thereby removing a bug whereby we were not always reading it when needed..
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Sigh. Undo some earlier changes to mv_port_intr(),
so that we now read/clear SError again in all cases.
Arrange the top of the function to be as close as possible
to what we need for a later update (in this series) for ERR_DEV handling.
Fix things so that libata-eh can attempt a READ_LOG_EXT_10H
in response to a failed NCQ command, by just doing a local
mv_eh_freeze() rather than ata_port_freeze().
This will now fully handle NCQ errors much of the time,
but more fixes are needed for FBS/PMP, and for certain chip errata.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Rearrange mv_config_fbs() to more closely follow the (corrected) datasheet
recommendations for NCQ and FIS-based switching (FBS).
Also, maintain a port flag to let us know when FBS is enabled.
We will make more use of that flag later in this patch series.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Part 1 of workaround for errata "sata#25" for the 60x1 series
(the second half of this errata workaround is still in development.
Bit22 of the GPIO port has to be set "on" when in NCQ mode.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The EDMA engine cannot tolerate a mix of NCQ/non-NCQ commands,
and cannot be used for PIO at all. So we need to prevent libata
from trying to feed us such mixtures.
Introduce mv_qc_defer() for this purpose, and use it for all chip versions.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
When performing EH, it is recommended to wait for the EDMA engine
to empty out requests-in-progress before disabling EDMA.
Introduce code to poll the EDMA_STATUS register for idle/empty bits
before disabling EDMA. For non-EH operation, this will normally exit
without delay, other than the register read.
A later series of patches may focus on eliminating this and various
other register reads (when possible) throughout the driver,
but for now we're focussing on solid reliablity.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Some of the GenIIe EDMA optimizations should not be used
for non-PCI (SOC) devices, and nor for certain configurations
of conventional PCI (non PCI-X, PCIe) buses.
Logic taken/simplified from that in the Marvell proprietary driver.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
More cosmetic changes; no code changes.
-- try and improve consistency of naming.
-- add missing _OFS to tails of register offset definitions.
-- rename mv_setup_ifctl() to mv_setup_ifcfg(), since that's what it really does.
-- remove/move some dead comments
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Tidy up naming of things associated with the PCI / SOC chip
"main irq cause/mask" registers, as inspired by Jeff.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Re-enable hotplug, now that the interrupt/error handling are mostly sane.
Also update the TODO list at the top.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Here it is again, minus the checkpatch.pl complaint:
Rework mv_err_intr() to leave the SError bits as-is,
so that libata-eh has a chance to see/use them.
We originally thought that clearing them here was necessary
before writing back to edma_err_cause (per the Marvell datasheets),
but we will end up reseting the chip regardless in those cases.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Continue fixing the interrupt handling logic.
Get rid of mv_intr_pio(), by using ata_sff_host_intr() for PIO..
Add a mv_unexpected_intr() catch-all for "impossible" scenarios,
where we get an interrupt that shouldn't have happened
(never seen in testing, but just in case..).
Rearrange the logic so that we always process completed
response queue entries before looking for other events,
This avoids having to re-issue commands that had already succeeded.
As part of this, we split out some duplicated functionality
into a new function, mv_get_active_qc().
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Tidy up host controller interrupt handling, by moving the weirdo
bit shifting from mv_interrupt() to mv_host_intr().
This lets us take advantage of the MV_PORT_TO_SHIFT_AND_HARDPORT() macro
from an earlier patch to greatly simplify the port numbering logic.
Also, defer reading the hc_irq_cause (one per hc) until it is
actually proven to be needed. This may save a microsecond or
so per interrupt, on average (a later patchset will further reduce
unnecessary register reads throughout the driver).
Apart from that, we still leave the actual IRQ handling logic alone.
Subsequent patches in this series will address that.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Try and simplify handling of the request/response queues.
Maintain the cached copies of queue indexes in a fully-masked state,
rather than having each use of them have to do the masking.
Split off handling of a single crpb response into a separate function,
to reduce complexity in the main mv_process_crpb_entries() routine.
Ignore the rarely-valid error bits from the crpb status field,
as we already handle that information in mv_err_intr().
For now, preserve the rest of the original logic.
A later patch will deal with fixing that separately.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>