- MWs don't have local read/write permissions.
- Set the MW_BIND enabled bit if a MR has MW_BIND access.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
- set IB_DEVICE_MEM_MGT_EXTENSIONS capability bit if fw supports it.
- set max_fast_reg_page_list_len device attribute.
- add iwch_alloc_fast_reg_mr function.
- add iwch_alloc_fastreg_pbl
- add iwch_free_fastreg_pbl
- adjust the WQ depth for kernel mode work queues to account for
fastreg possibly taking 2 WR slots.
- add fastreg_mr work request support.
- add local_inv work request support.
- add send_with_inv and send_with_se_inv work request support.
- removed useless duplicate enums/defines for TPT/MW/MR stuff.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Currently, iw_cxgb3 is severely limited on the amount of userspace
memory that can be registered in in a single memory region, which
causes big problems for applications that expect to be able to
register 100s of MB.
The problem is that the driver uses a single kmalloc()ed buffer to
hold the physical buffer list (PBL) for the entire memory region
during registration, which means that 8 bytes of contiguous memory are
required for each page of memory being registered. For example, a 64
MB registration will require 128 KB of contiguous memory with 4 KB
pages, and it unlikely that such an allocation will succeed on a busy
system.
This is purely a driver problem: the temporary page list buffer is not
needed by the hardware, so we can fix this by writing the PBL to the
hardware in page-sized chunks rather than all at once. We do this by
splitting the memory registration operation up into several steps:
- Allocate PBL space in adapter memory for the full registration
- Copy PBL to adapter memory in chunks
- Allocate STag and enable memory region
This also allows several other cleanups to the __cxio_tpt_op()
interface and related parts of the driver.
This change leaves the reregister memory region and memory window
operations broken, but they already didn't work due to other
longstanding bugs, so fixing them will be left to a later patch.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Open MPI, Intel MPI and other applications don't respect the iWARP
requirement that the client (active) side of the connection send the
first RDMA message. This class of application connection setup is
called peer-to-peer. Typically once the connection is setup, _both_
sides want to send data.
This patch enables supporting peer-to-peer over the chelsio RNIC by
enforcing this iWARP requirement in the driver itself as part of RDMA
connection setup.
Connection setup is extended, when the peer2peer module option is 1,
such that the MPA initiator will send a 0B Read (the RTR) just after
connection setup. The MPA responder will suspend SQ processing until
the RTR message is received and reply-to.
In the longer term, this will be handled in a standardized way by
enhancing the MPA negotiation so peers can indicate whether they
want/need the RTR and what type of RTR (0B read, 0B write, or 0B send)
should be sent. This will be done by standardizing a few bits of the
private data in order to negotiate all this. However this patch
enables peer-to-peer applications now and allows most of the required
firmware and driver changes to be done and tested now.
Design:
- Add a module option, peer2peer, to enable this mode.
- New firmware support for peer-to-peer mode:
- a new bit in the rdma_init WR to tell it to do peer-2-peer
and what form of RTR message to send or expect.
- process _all_ preposted recvs before moving the connection
into rdma mode.
- passive side: defer completing the rdma_init WR until all
pre-posted recvs are processed. Suspend SQ processing until
the RTR is received.
- active side: expect and process the 0B read WR on offload TX
queue. Defer completing the rdma_init WR until all
pre-posted recvs are processed. Suspend SQ processing until
the 0B read WR is processed from the offload TX queue.
- If peer2peer is set, driver posts 0B read request on offload TX
queue just after posting the rdma_init WR to the offload TX queue.
- Add CQ poll logic to ignore unsolicitied read responses.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
__FUNCTION__ is gcc-specific, use __func__ instead.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Export ib_umem_get()/ib_umem_release() and put low-level drivers in
control of when to call ib_umem_get() to pin and DMA map userspace,
rather than always calling it in ib_uverbs_reg_mr() before calling the
low-level driver's reg_user_mr method.
Also move these functions to be in the ib_core module instead of
ib_uverbs, so that driver modules using them do not depend on
ib_uverbs.
This has a number of advantages:
- It is better design from the standpoint of making generic code a
library that can be used or overridden by device-specific code as
the details of specific devices dictate.
- Drivers that do not need to pin userspace memory regions do not
need to take the performance hit of calling ib_mem_get(). For
example, although I have not tried to implement it in this patch,
the ipath driver should be able to avoid pinning memory and just
use copy_{to,from}_user() to access userspace memory regions.
- Buffers that need special mapping treatment can be identified by
the low-level driver. For example, it may be possible to solve
some Altix-specific memory ordering issues with mthca CQs in
userspace by mapping CQ buffers with extra flags.
- Drivers that need to pin and DMA map userspace memory for things
other than memory regions can use ib_umem_get() directly, instead
of hacks using extra parameters to their reg_phys_mr method. For
example, the mlx4 driver that is pending being merged needs to pin
and DMA map QP and CQ buffers, but it does not need to create a
memory key for these buffers. So the cleanest solution is for mlx4
to call ib_umem_get() in the create_qp and create_cq methods.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix memory region permission problems:
- remove useless and redundant iwch_mem_perms enum.
- create ib_to_tpt_access_rights() for mapping ib access rights
to T3 TPT permissions.
- create ib_to_mwbind_access_rights() for mapping ib access rights
to T3 MWBIND WR permissions.
- fix up the mem reg code to utilize the new functions.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
- don't mark static functions in C files as inline - gcc should know
best whether inlining makes sense
- never compile the unused cxio_dbg.c
- make the following needlessly global functions static:
- cxio_hal.c: cxio_hal_clear_qp_ctx()
- iwch_provider.c: iwch_get_qp()
- remove the following unused global functions:
- cxio_hal.c: cxio_allocate_stag()
- cxio_resource.: cxio_hal_get_rhdl()
- cxio_resource.: cxio_hal_put_rhdl()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove the Open Grid Computing copyright. It shouldn't be there.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add an RDMA/iWARP driver for the Chelsio T3 1GbE and 10GbE adapters.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>