last pointer is not updated when QP is modified to reset state. This
causes data corruption if WQEs are already posted on the queue.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Calculation of QP capabilities still isn't exactly right in mthca:
max_send_sge/max_recv_sge fields returned in create_qp can exceed the
handware supported limits.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Responder resources are only required to handle RDMA reads and atomic
operations, not RDMA writes. So the driver should allow RDMA writes
even if responder resources are set to 0. This is especially
important for the UC transport -- with the old code, it was impossible
to enable RDMA writes for UC QPs.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In Tavor mode, when posting a long list of receive work requests, a
doorbell must be rung every 256 requests. Add code to do this when
required.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Handle case where prod_index has wrapped around and become less than
cq->cons_index by checking that their difference as a signed int is
positive rather than comparing directly.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The size of work requests for atomic operations was computed
incorrectly in mthca: all sizeofs need to be divided by 16.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Move the computation of QP capabilities (max scatter/gather entries,
max inline data, etc) into the kernel, and have the uverbs module
return the values as part of the create QP response. This keeps
precise knowledge of device limits in the low-level kernel driver.
This requires an ABI bump, so while we're making changes, get rid of
the max_sge parameter for the modify SRQ command -- it's not used and
shouldn't be there.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix a typo in the rearming of the catastrophic error polling timer: we
should rearm the timer as long as the stop flag is _not_ set.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix more include file problems that surfaced since I submitted the previous
fix-missing-includes.patch. This should now allow not to include sched.h
from module.h, which is done by a followup patch.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Report the device's real page size capability in mthca_query_device().
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make sure that the P_Key index passed into mthca_modify_qp() is
within the device's P_Key table.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Mellanox has decided that the components of the firmware version are
really meant to be displayed in decimal, e.g. 0x000400070190 is
version 4.7.400. Change the format we use from "%x.%x.%x" to
"%d.%d.%d" to match this convention.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Replace kmalloc()+memset(,0,) with kzalloc(), for a net savings of 35
source lines and about 500 bytes of text.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix wqe_to_link() to use a structure field that we know is definitely
always unused for receive work requests, so that it really avoids the
free list corruption bug that the comment claims it does.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
I recently picked up my older work to remove unnecessary #includes of
sched.h, starting from a patch by Dave Jones to not include sched.h
from module.h. This reduces the number of indirect includes of sched.h
by ~300. Another ~400 pointless direct includes can be removed after
this disentangling (patch to follow later).
However, quite a few indirect includes need to be fixed up for this.
In order to feed the patches through -mm with as little disturbance as
possible, I've split out the fixes I accumulated up to now (complete for
i386 and x86_64, more archs to follow later) and post them before the real
patch. This way this large part of the patch is kept simple with only
adding #includes, and all hunks are independent of each other. So if any
hunk rejects or gets in the way of other patches, just drop it. My scripts
will pick it up again in the next round.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement reporting asynchronous CQ events in Mellanox HCA driver.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add some initial support for detecting and reporting catastrophic
errors reported by Mellanox HCAs. We start a periodic timer which
polls the catastrophic error reporting buffer in device memory. If an
error is detected, we dump the contents of the buffer for port-mortem
debugging, and report a fatal asynchronous error to higher levels.
In the future we can try to recover from these errors by resetting the
device, but this will require some work in higher-level code as well.
Let's get this in now, so that we at least get catastrophic errors
reported in logs.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The UC transport does not support RDMA reads or atomic operations, so
we shouldn't require or even allow the consumer to set attributes
relating to these operations for UC QPs.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The MAD layer was violating the DMA API by touching data buffers used
for sends after the DMA mapping was done. This causes problems on
non-cache-coherent architectures, because the device doing DMA won't
see updates to the payload buffers that exist only in the CPU cache.
Fix this by having all MAD consumers use ib_create_send_mad() to
allocate their send buffers, and moving the DMA mapping into the MAD
layer so it can be done just before calling send (and after any
modifications of the send buffer by the MAD layer).
Tested on a non-cache-coherent PowerPC 440SPe system.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We should always re-arm an event queue's interrupt in
mthca_tavor_interrupt() if the corresponding bit is set in the event cause
register (ECR), even if we didn't find any entries in the EQ. If we don't,
then there's a window where we miss an EQ entry and then get stuck because
we don't get another EQ event.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We should always re-arm an event queue's interrupt in
mthca_tavor_interrupt() if the corresponding bit is set in the event
cause register (ECR), even if we didn't find any entries in the EQ.
If we don't, then there's a window where we miss an EQ entry and then
get stuck because we don't get another EQ event.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Avoid entering a QP as member of a multicast group multiple times.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make the type parameter of mthca_alloc_db() be an enum mthca_db_type
instead of an int. This doesn't have any practical effect but
documents the functions a little better.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Give each device a uverbs_cmd_mask, so that a low-level driver can
control which methods may be called on behalf of userspace.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Check the sizes of CQs, QPs and SRQs when creating objects, and fail
instead of creating too-big queues. Also return real limits instead
of just plausible-sounding values from mthca_query_device().
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The hardware relies on us keeping one extra work request that never
gets used in SRQs. Add checks to the SRQ work request posting
functions so that they fail when someone is about to use up that extra
work request, rather than when someone uses the very last work request.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Our hardware supports generating an event when the number of receives
posted to a shared receive queue (SRQ) falls below a user-specified
limit. Implement mthca_modify_srq() to arm the limit, and add code to
handle dispatching SRQ events when they occur.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add code to fill in the bad_pkey_cntr, max_mtu, active_mtu and
subnet_timeout fields in mthca_query_port().
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add abi_version attribute to uverbs class devices to allow for
ABI versioning of device-specific interfaces.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Return correct atomic capability flag from mthca query function.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remember to free the multicast group context memory table.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The IB spec defines the field to be 32 bits, not 16 bits.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When allocating a table for mem-free HCA context, don't assume that
obj_size * nobj is an even multiple of MTHCA_TABLE_CHUNK_SIZE. In
particular, make sure we allocate at least one slot even if the table
is smaller than MTHCA_TABLE_CHUNK_SIZE.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The loop in mthca_map_cmd() would fill one entry past the end of the
mailbox buffer before calling the firmware command.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We should use the first word of the clear interrupt register if
the bit we're after is < 32, not < 31.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If we allocate a bunch of doorbell records and then free them, we'll
end up with completely empty pages, which we then free. However, when
we come back to allocate more doorbell pages, we have to reallocate
those empty pages rather than always trying to take a slot that we've
never used. If we don't, we eventually use up every slot and fail to
allocate a doorbell record, even though we have plenty of free space.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Userspace SRQs don't have a buffer allocated for them in the kernel, so
it doesn't make sense to set srq->last during initialization. In fact,
this can crash trying to follow a nonexistent buffer pointer.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The error handling paths in mthca_tavor_post_srq_recv() and
mthca_arbel_post_srq_recv() are quite bogus, the result of a
screwed up merge. Fix them so they work as intended.
Pointed out by Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In mthca_create_eq(), we call get_eqe() before setting eq->nent. This
is wrong, because get_eqe() uses eq->nent. Fix this, and clean up the
code a little while we're at it. (We got lucky with the current code,
because eq->nent was cleared to 0, which get_eqe() made happen to do
the right thing)
Pointed out by Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix posting first WQE for mem-free HCAs: we need to link to previous
WQE even in that case. While we're at it, simplify code for
Tavor-mode HCAs. We don't really need the conditional test there
either; we can similarly always link to the previous WQE.
Based on Michael S. Tsirkin's analogous fix for userspace libmthca.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The hardware reads the ACK timeout field from the most significant 5
bits of struct mthca_qp_path's ackto field, not the least significant
bits. This fix has the driver put the timeout in the right place.
Without this, we get a timeout that is 2^8 times too small.
Signed-off-by: Roland Dreier <rolandd@cisco.com>