When checking the states passed in, mlx4_qp_modify() accidentally checks
cur_state twice rather than checking cur_state and new_state. Fix this
to make sure that both values are in-bounds.
Since these values may be passed in from userspace, this bug results in
userspace being able to trigger an oops.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: stable <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix thinko in commit eaf559bf ("mlx4_core: Don't free special QPs in
QP number bitmap"). The old commit had the logic exactly backwards
and ended up freeing *only* special QPs, which not only left the
original bug in place but also introduced the problem that the QP
number bitmap would get full after a while.
Found by Dotan Barak of Mellanox.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When mlx4_buf_free() is called from the error path of
mlx4_buf_alloc(), it may be passed a buffer structure that does not
have all pages filled in. Add a check for NULL to mlx4_buf_free() so
we avoid passing NULL to dma_free_coherent() (which will crash).
Signed-off-by: Ali Ayoub <ali@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Most drivers need to set length and offset as well, so may as well fold
those three lines into one.
Add sg_assign_page() for those two locations that only needed to set
the page, where the offset/length is set outside of the function context.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
mlx4_core: Increase command timeout for INIT_HCA to 10 seconds
IPoIB/cm: Use common CQ for CM send completions
IB/uverbs: Fix checking of userspace object ownership
IB/mlx4: Sanity check userspace send queue sizes
IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))"
IB/ehca: Enable large page MRs by default
IB/ehca: Change meaning of hca_cap_mr_pgsize
IB/ehca: Fix ehca_encode_hwpage_size() and alloc_fmr()
IB/ehca: Fix masking error in {,re}reg_phys_mr()
IB/ehca: Supply QP token for SRQ base QPs
IPoIB: Use round_jiffies() for ah_reap_task
RDMA/cma: Fix deadlock destroying listen requests
RDMA/cma: Add locking around QP accesses
IB/mthca: Avoid alignment traps when writing doorbells
mlx4_core: Kill mlx4_write64_raw()
The current INIT_HCA firmware command timeout is sufficient for the
default number of resources (QPs, CQs, etc) being allocated, but if
the HCA profile is modified to increase the amount of resources, then
a spurious timeout is detected and HCA initialization fails.
Increase the timeout for the INIT_HCA command to 10 seconds, which
also brings it into line with all the other command timeouts.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 3d73c288 ("mlx4_core: Fix section mismatches") introduced a
stupid bug in device init: when some of mlx4_init_one() was split off
into __mlx4_init_one(), the call from the main mlx4_init_one()
function was back to mlx4_init_one() rather than to __mlx4_init_one(),
which leads to an obvious infinite loop if the function is every
called.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit ee49bd93 ("mlx4_core: Reset device when internal error is
detected") introduced some section mismatch problems when
CONFIG_HOTPLUG=n, because the error recovery code tears down and
reinitializes the device after everything is loaded, which ends up
calling into lots of code marked __devinit and __devexit from regular
.text. Fix this by getting rid of these now-incorrect section
markers.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Firmware commands are sent to the HCA by writing multiple words to a
command register block. Access to this block of registers is
serialized with a mutex. However, on large SGI systems writes to the
register block may be reordered within the system interconnect and
reach the HCA in a different order than they were issued (even with
the mutex). Fix this by adding an mmiowb() before dropping the mutex.
This bug was observed with real workloads with the similar FW command
code in the mthca driver, and adding the mmiowb() as in commit
66547550 ("IB/mthca: Use mmiowb() to avoid firmware commands getting
jumbled up") was confirmed to fix the problems, so we should add the
same fix to mlx4.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Increase the number of QPs allowed per multicast group from 8 to 56.
This allows for one QP per core on 16-core systems, which are now
quite common, and allows some space for future growth.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Implement FMRs for mlx4. This is an adaptation of code from mthca.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Write MTT entries directly to ICM from the driver (eliminating use of
WRITE_MTT command). This reduces the number of FW commands needed to
register an MR by at least a factor of 2 and speeds up memory
registration significantly. This code will also be used to implement
FMRs.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Everything that uses caps.reserved_mtts expects it to be a count of MTT
segments, not MTT entries. So convert the value that the FW gives us to
a count of segments.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Taking ilog2(dev->caps.reserved_mtts) to find out the order to pass to
the MTT buddy allocator will do the wrong thing if reserved_mtts is ever
not a power of 2. Be safe and use fls(dev->caps.reserved_mtts - 1).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Enable having ICM tables in coherent memory, and use coherent memory
for the dMPT table. This will allow writing MPT entries for MRs both
via the SW2HW_MPT command and also directly by the driver for FMR
remapping without needing to flush or worry about cacheline boundaries.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
display the following device information under /sys/class/infiniband/mlx4_X:
board_id, fw_ver, hw_rev, hca_type.
This patch makes this information available to userspace utilities
such as ibstat and ibv_devinfo.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The SRC ("scalable RC") transport has been renamed to XRC ("extended
RC"), to avoid having an abbreviation that is so easily confused with an
abbreviation for "source." Update the HCA capability decoding output to
use the new name.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
mlx4_srq_query() returns a big-endian 16-bit value through an int *,
which screws up sparse checking. Fix this so that a CPU-endian value
is returned.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Recover from MSI-X errors by automatically falling back on regular
interrupt, instead of asking the user to do this manually. This makes
it possible to enable MSI-X by default, and will make it possible to
get rid of the msi_x module option in the future.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Special QPs are not allocated using the regular QP number bitmap, so
when they are destroyed, their QP number should not be freed in the
bitmap.
Found by Dotan Barak of Mellanox.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Rename GO_BIT_TIMEOUT to GO_BIT_TIMEOUT_MSECS for clarity, and
actually use it as the go bit timeout (instead of having the define
but then ignoring it and using a hard-coded 10 * HZ for the actual
timeout).
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Put a 1000 msec delay after resetting the device before attempting to
do config cycles on it. Not waiting causes system hangs on some
chipsets, e.g. Intel E7520, when the driver is loaded.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
mlx4_mr_alloc() doesn't actually allocate mr (it just initializes the
pointer that the caller passes in), so it shouldn't free it if an
error occurs.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The FW command token is currently only updated on a command completion
event. This means that on command timeout, the same token will be
reused for new command, which results in a mess if the timed out
command *does* eventually complete.
This is the same change as the patch for mthca from Michael
S. Tsirkin <mst@dev.mellanox.co.il> that was just merged. It seems
sensible to avoid gratuitous differences in FW command processing
between mthca and mlx4.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Change the maximum number of outstanding RDMA reads allowed as a
target from 4 to 16 to per QP. This allows RDMA read operations to
pipeline better.
Pointed out by Dotan Barak and Sagi Rotem.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Reset the device when an internal error is detected.
Also, detect errors by polling the error buffer rather than using
interrupts. This is more robust and doesn't depend on MSI-X. Remove
the old interrupt handler entirely, since we don't want to support two
mechanisms for detecting internal errors.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Get the maximum message size from the device capabilities returned
from the QUERY_DEV_CAP firmware command, rather than hard-coding 2 GB.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
mlx4.h uses struct mutex, so although <linux/mutex.h> seems to be pulled in
indirectly by one of the headers it includes, the right thing to do is
to include <linux/mutex.h> directly.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Recent gcc versions emit warnings when unsigned variables are compared < 0 or >= 0.
Signed-off-by: Bill Nottingham <notting@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Upcoming firmware introduces command interface revision 3, which
changes the way port capabilities are queried and set. Update the
driver to handle both the new and old command interfaces by adding a
new MLX4_FLAG_OLD_PORT_CMDS that it is set after querying the firmware
interface revision and then using the correct interface based on the
setting of the flag.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
New ConnectX firmware introduces FW command interface revision 2,
which requires that for each QP, a chunk of send queue entries (the
"headroom") is kept marked as invalid, so that the HCA doesn't get
confused if it prefetches entries that haven't been posted yet. Add
code to the driver to do this, and also update the user ABI so that
userspace can request that the prefetcher be turned off for userspace
QPs (we just leave the prefetcher on for all kernel QPs).
Unfortunately, marking send queue entries this way is confuses older
firmware, so we change the driver to allow only FW command interface
revisions 2. This means that users will have to update their firmware
to work with the new driver, but the firmware is changing quickly and
the old firmware has lots of other bugs anyway, so this shouldn't be too
big a deal.
Based on a patch from Jack Morgenstein <jackm@dev.mellanox.co.il>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If a dMPT entry has the PA flag (direct physical address) set, then
the (unused) MTT base address field has to be set to 0.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
HCA firmware with incompatible changes to the FW commmand interface is
coming soon. Add a check of the interface revision during
initialization and bail out if the firmware advertises a revision that
the driver doesn't know about. This will avoid strange failures later
if the driver goes on using the wrong interface revision.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We need to pass the same dev_id to free_irq() and request_irq(). When
using MSI-X, the MLX4_EQ_CATAS interrupt uses a different dev_id from
the other interrupts.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We may call mlx4_dispatch_event() before mlx4_register_device() is
called for a device, because for example a catastrophic error happens
immediately after we enable interrupts. Therefore priv->ctx_list and
priv->ctx_lock need to be initialized earlier.
This bug was actually exposed by the MSI-X bug that returned IRQ numbers
to drivers in reverse order, so that the first FW command
interrupt looked to mlx4 like a catastrophic error.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The reserved6 field should be 64 bits, not just 16 bits. Without
this, the structure does not match the hardware layout on 32-bit
architectures: the db_rec_addr field ends up at offset 52 instead of
offset 56. The bug slipped by because the alignment of __be64 members
ends up putting it in the right place on x86-64.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Set last allocated object to the object after the one just allocated
before ORing in the extra top bits. Also handle the case where this
wraps around.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Don't overrun fname[] array when decoding device flags.
This was spotted by the Coverity checker (CID 1642).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
IPoIB/cm: Optimize stale connection detection
IB/mthca: Set cleaned CQEs back to HW ownership when cleaning CQ
IB/mthca: Fix posting >255 recv WRs for Tavor
RDMA/cma: Add check to validate that cm_id is bound to a device
RDMA/cma: Fix synchronization with device removal in cma_iw_handler
RDMA/cma: Simplify device removal handling code
IB/ehca: Disable scaling code by default, bump version number
IB/ehca: Beautify sysfs attribute code and fix compiler warnings
IB/ehca: Remove _irqsave, move #ifdef
IB/ehca: Fix AQP0/1 QP number
IB/ehca: Correctly set GRH mask bit in ehca_modify_qp()
IB/ehca: Serialize hypervisor calls in ehca_register_mr()
IB/ipath: Shadow the gpio_mask register
IB/mlx4: Fix uninitialized spinlock for 32-bit archs
mlx4_core: Remove unused doorbell_lock
net: Trivial MLX4_DEBUG dependency fix.
Add an InfiniBand driver for Mellanox ConnectX adapters. Because
these adapters can also be used as ethernet NICs and Fibre Channel
HBAs, the driver is split into two modules:
mlx4_core: Handles low-level things like device initialization and
processing firmware commands. Also controls resource allocation
so that the InfiniBand, ethernet and FC functions can share a
device without stepping on each other.
mlx4_ib: Handles InfiniBand-specific things; plugs into the
InfiniBand midlayer.
Signed-off-by: Roland Dreier <rolandd@cisco.com>