In case of error, the function ucma_alloc_multicast() returns a NULL
pointer, but never returns an ERR pointer. So after a call to this
function, an IS_ERR test should be replaced by a NULL test.
The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@match bad_is_err_test@
expression x, E;
@@
x = ucma_alloc_multicast(...)
... when != x = E
IS_ERR(x)
// </smpl>
Signed-off-by: Julien Brunel <brunel@diku.dk>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Referencing cm_node after it is freed via rem_ref_cm_node() causes a
slab corruption. There is no need to set cm_node->cm_id to NULL in
mini_cm_close().
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Set RLKEY bit in the HW context for kernel QPs so that kernel QPs can
use the reserved L_Key for memory reference.
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix routed RDMA connections to destinations where the next hop is not
the final destination. Use neigh_*() to properly locate neighbor.
Signed-off-by: Bob Sharp <bsharp@neteffect.com>
Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com>
Signed-off-by: Chien Tung <ctung@neteffect.com>
Mask off a critical error after 100 critical error interrupts to
keep the system "sane".
Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com>
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Mask off MAC interrupts on netdev_stop to prevent spurious MAC interrupts
on unload/reload of iw_nes.
Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com>
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fill in firmware version for ethtool_drvinfo.
Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com>
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use timer value set via ethtool intead of #defines.
Signed-off-by: John Lacombe <jlacombe@neteffect.com>
Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com>
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use correct define for max TSO fragments.
Signed-off-by: Bob Sharp <bsharp@neteffect.com>
Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com>
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Clear MDC bits before setting them to a new value. Adjust MDC value
for 10G.
Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com>
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a module parameter wqm_quanta. It controls the number of segments
transmitted at a time.
Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com>
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Change permission to 0644 so root can set mpa_version, disable_mpa_crc,
send_first, and nes_drv_opt at runtime.
Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com>
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When running ibv_devinfo, the active_mtu returned is garbage. This is
due to the field not being populated in the query_port function in the
driver. The patch below populates the active_mtu field with a MTU of
2k. It also zeros the struct, so that any new additions to it will
return 0.
Signed-off-by: Jon Mason <jon@opengridcomputing.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for NetEffect 4 port 1G HP blade card. The mapping
between physical port and MAC is different from the standup card.
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
commit 110cf374 ("infiniband: make cm_device use a struct device and
not a kobject.") introduced a memory leak, since it deleted
cm_release_dev_obj(), which was where cm_dev was freed. Fix this by
freeing the leaked structure after calling device_unregister().
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling
tx_lock. Not only do we want to get rid of LLTX, this actually causes
problems because of the skb_orphan() done with this tx_lock held: some
skb destructors expect to be run with interrupts enabled.
The simplest fix for this is to get rid of the driver-private tx_lock
and stop using LLTX. We kill off priv->tx_lock and use
netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit
tricky because we need to update places that take priv->lock inside
the tx_lock to disable IRQs, rather than relying on tx_lock having
already disabled IRQs.
Also, there are a couple of places where we need to disable BHs to
make sure we have a consistent context to call netif_tx_lock() (since
we no longer can use _irqsave() variants), and we also have to change
ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather
than directly, because ipoib_send_comp_handler() runs in interrupt
context and drain_tx_cq() must run in BH context so it can call
netif_tx_lock().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Back in prehistoric (pre-git!) days, the kernel's MSI-X support did
request_mem_region() on a device's MSI-X tables, which meant that a
driver that enabled MSI-X couldn't use pci_request_regions() (since
that would clash with the PCI layer's MSI-X request).
However, that was removed (by me!) years ago, so mthca can just use
pci_request_regions() and pci_release_regions() instead of its own
much more complicated code that avoids requesting the MSI-X tables.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Handle the case where posting a send is requested when the link is
down. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1117>.
Signed-off-by: Yannick Cote <yannick.cote@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM
change events") changed how paths are flushed on an SM event. This
change introduces a problem if the path record query triggered by
fails, causing path->ah to become NULL. A later successful path query
will then trigger WARN_ON() in path_rec_completion(), and crash
because path->ah has already been freed, so the ipoib_put_ah() inside
the lock in path_rec_completion() may actually drop the last reference
(contrary to the comment that claims this is safe).
Fix this by updating path->ah and freeing old_ah only when the path
record query is successful. This prevents the neighbour AH and that
path AH from getting out of sync.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1194>
Reported-by: Rabah Salem <ravah@mellanox.com>
Debugged-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
A break after a return serves no purpose, remove it.
Signed-off-by: Julia Lawall <julia@diku.dk>
Reviewed-by: Richard Genoud <richard.genoud@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This fixes the problem of incoming BMA responses being dropped due to
a bad "is response" check. Fix the test to use the ib_response_mad()
predicate, which correctly handles BMA MADs.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=988>.
Signed-off-by: Michael Brooks <michael.brooks@qlogic.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The code to set the source LID in the sent LRH was not setting the low
bits if LMC != 0 for RC/UC QPs.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When a QP goes into error state, it is required that CQ entries with a
flush error status are delivered to the application for any
outstanding work requests. eHCA does not do this in hardware, so this
patch adds software flush CQE generation to the ehca driver.
Whenever a QP gets into error state, it is added to the QP error list
of its respective CQ. If the error QP list of a CQ is not empty,
poll_cq() generates flush CQEs before polling the actual CQ.
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop()
RDMA/nes: Fix client side QP destroy
IB/mlx4: Fix up fast register page list format
mlx4_core: Set RAE and init mtt_sz field in FRMR MPT entries
Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with
ipoib_stop(). We avoid it by scheduling the piece of code that takes
the lock on ipoib_workqueue instead of executing it directly. This
works because we only flush the ipoib_workqueue with the RTNL not held.
The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down()
which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(),
which calls ipoib_mcast_leave(). The latter calls
ib_sa_free_multicast(), and this waits until the multicast completion
handler finishes. This handler is ipoib_mcast_join_complete(), which
waits for the rtnl_lock(), which was already taken by ipoib_stop().
This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on
RTNL in ipoib_stop()").
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix QP not being destroyed properly on the client, which leads to
userspace programs hanging on exit. This is a missing chunk from the
connection management rewrite in commit 6492cdf3 ("RDMA/nes: CM
connection setup/teardown rework").
Signed-off-by: Faisal Latif <flatif@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Byte swap the addresses in the page list for fast register work requests
to big endian to match what the HCA expectx. Also, the addresses must
have the "present" bit set so that the HCA knows it can access them.
Otherwise the HCA will fault the first time it accesses the memory
region.
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Initialize the L_Key and R_Key for memory regions returned from
mlx4_ib_alloc_fast_reg_mr(). Otherwise callers just get garbage for
the memory keys and can't do anything useful with these MRs.
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch lets the files using linux/version.h match the files that
#include it.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue. However, ipoib_stop() (which is run
inside rtnl_lock()) flushes this workqueue, which leads to a deadlock
if the join task is pending.
Fix this by simply not flushing the workqueue from ipoib_stop(). It
turns out that we really don't care about workqueue tasks running
during or after ipoib_stop(), as long as we make sure to flush the
workqueue before unregistering a netdev.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1114>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The check for max physical address was incorrect, thus limiting the
range of allowed physical addresses.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If a UD QP has some work requests queued to be sent by the DMA engine
followed by a local loopback work request, we have to wait for the
previous work requests to finish or the completion for the local
loopback work request would be generated out of order. The problem
was that the work request queue pointer was already updated so that
the request would not be processed when the DMA queue drained.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Under rare circumstances, the ehca hardware might erroneously generate
two CQEs for the same WQE, which is not compliant to the IB spec and
will cause unpredictable errors like memory being freed twice. To
avoid this problem, the driver needs to detect the second CQE and
discard it.
For this purpose, introduce an array holding as many elements as the
SQ of the QP, called sq_map. Each sq_map entry stores a "reported"
flag for one WQE in the SQ. When a work request is posted to the SQ,
the respective "reported" flag is set to zero. After the arrival of a
CQE, the flag is set to 1, which allows to detect the occurence of a
second CQE.
The mapping between WQE / CQE and the corresponding sq_map element is
implemented by replacing the lowest 16 Bits of the wr_id with the
index in the queue map. The original 16 Bits are stored in the sq_map
entry and are restored when the CQE is passed to the application.
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The idr_find() function may fail when trying to get the QP that is
associated with a CQE, e.g. when a QP has been destroyed between the
generation of a CQE and the poll request for it. In consequence, the
return value of idr_find() must be checked and the CQE must be
discarded when the QP cannot be found.
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When the ehca driver detects an invalid opcode in a CQE, it currently
passes the CQE to the application and returns with success. This patch
changes the CQE handling to discard CQEs with invalid opcodes and to
continue reading the next CQE from the CQ.
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Rename the "poll_cq_one_read_cqe" goto label to what it actually does,
namely "repoll".
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Since the introduction of the port auto-detect mode for ehca, calls to
modify_qp() may be cached in the device driver when the ports are not
activated yet. When a modify_qp() call is cached, the qp state remains
untouched until the port is activated, which will leave the qp in the
reset state. In the reset state, however, it is not allowed to post SQ
WQEs, which confuses applications like ib_mad.
The solution for this problem is to immediately set the qp state as
requested by modify_qp(), even when the call is cached.
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There are users that are running UDP applications that require a large
receive queue size in order to get good performance. To prevent
allocation failures for rx_rings when using non-SRQ mode and large
recv_queue_size (1K or larger), use vmalloc() instead of kcalloc() to
alocate rx_rings.
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/mad: Test ib_create_send_mad() return with IS_ERR(), not == NULL
IB/mlx4: Allow 4K messages for UD QPs
mlx4_core: Add ethernet fields to CQE struct
IB/ipath: Fix printk format warnings
RDMA/cxgb3: Fix deadlock initializing iw_cxgb3 device
RDMA/cxgb3: Fix up MW access rights
RDMA/cxgb3: Fix QP capabilities
RDMA/cma: Remove padding arrays by using struct sockaddr_storage
IB/ipath: Use unsigned long for irq flags
IPoIB/cm: Set correct SG list in ipoib_cm_init_rx_wr()