Commit Graph

317 Commits

Author SHA1 Message Date
Roland Dreier
b7f9c112a5 IB/mthca: Free correct MPT on error exit from mthca_fmr_alloc()
When mthca_fmr_alloc() returns an error, it should free the MPT at the
index key, not mr->ibmr.lkey, since the lkey has been mangled by
hw_index_to_key() and no longer is the real index.  This bug causes
corruption of the MPT table free bitmap when mthca_fmr_alloc() fails.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-19 10:42:50 -08:00
Marcin Slusarz
5163dc1a64 IB/mthca: Convert to use be16_add_cpu()
replace:

	big_endian_variable = cpu_to_beX(beX_to_cpu(big_endian_variable) +
						expression_in_cpu_byteorder);

with:

	beX_add_cpu(&big_endian_variable, expression_in_cpu_byteorder);

Generated with a semantic patch.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-13 07:47:47 -08:00
Roland Dreier
fe174357eb IB/mthca: Add missing sg_init_table() in mthca_map_user_db()
Usually harmless, since the scatterlist is always hard-coded to a length
of 1, but it triggers a BUG() if CONFIG_DEBUG_SG=y, so we better fix it.
This fixes <http://bugzilla.kernel.org/show_bug.cgi?id=9934>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-12 14:38:22 -08:00
Olaf Kirch
2c78853472 IB/mthca: Return proper error codes from mthca_fmr_alloc()
If the allocation of the MTT or the mailbox failed, mthca_fmr_alloc()
would return 0 (success) no matter what. This leads to crashes a
little down the road, when we try to dereference eg mr->mtt, which was
really ERR_PTR(-Ewhatever).

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
Roland Dreier
f33afc26dc IB: Avoid marking __devinitdata as const
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
Eli Cohen
1d368c5465 IB/ib_mthca: Pre-link receive WQEs in Tavor mode
We have recently discovered that Tavor mode requires each WQE in a
posted list of receive WQEs to have a valid NDA field at all times.
This requirement holds true for regular QPs as well as for SRQs.  This
patch prelinks the receive queue in a regular QP and keeps the free
list in SRQ always properly linked.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
Eli Cohen
1203c42e7b IB/mthca: Remove checks for srq->first_free < 0
The SRQ receive posting functions make sure that srq->first_free never
becomes negative, so we can remove tests of whether it is negative.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
Jack Morgenstein
6ccef1de2c IB/mthca: Don't read reserved fields in mthca_QUERY_ADAPTER()
For memfree devices, the firmware QUERY_ADAPTER command does not
return vendor_id, device_id, and revision_id; do not return these
fields in the QUERY_ADAPTER function for memfree devices.

Instead, for memfree devices, initialize the rev_id field of the mthca
device via init_node_data (MAD IFC query), as is done in the
query_device verb implementation.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:43 -08:00
Roland Dreier
0d89fe2c0c IB/mthca: Fix and simplify page size calculation in mthca_reg_phys_mr()
In mthca_reg_phys_mr(), we calculate the page size for the HCA
hardware to use to map the buffer list passed in by the consumer.
For example, if the consumer passes in

    [0] addr 0x1000, size 0x1000
    [1] addr 0x2000, size 0x1000

then the algorithm would come up with a page size of 0x2000 and a list
of two pages, at 0x0000 and 0x2000.  Usually, this would work fine
since the memory region would start at an offset of 0x1000 and have a
length of 0x2000.

However, the old code did not take into account the alignment of the
IO virtual address passed in.  For example, if the consumer passed in
a virtual address of 0x6000 for the above, then the offset of 0x1000
would not be used correctly because the page mask of 0x1fff would
result in an offset of 0.

We can fix this quite neatly by making sure that the page shift we use
is no bigger than the first bit where the start of the first buffer
and the IO virtual address differ.  Also, we can further simplify the
code by removing the special case for a single buffer by noticing that
it doesn't matter if we use a page size that is too big.  This allows
the loop to compute the page shift to be replaced with __ffs().

Thanks to Bryan S Rosenburg <rosnbrg@us.ibm.com> for pointing out the
original bug and suggesting several ways to improve this patch.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:42 -08:00
Roland Dreier
950529e5c6 IB/mthca: Update latest "native Arbel" firmware revision
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:44 -08:00
Adrian Bunk
e57895d389 IB/mthca: Remove MSI support as scheduled
Remove MSI support from the mthca driver, as scheduled.  There is no
reason to use MSI instead of MSI-X, since MSI-X performs better.  No
one has spoken up since MSI support was deprecated in commit f6be6fbe
("IB/mthca: Schedule MSI support for removal"), so apparently the MSI
support is unused.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:33 -08:00
Jens Axboe
642f149031 SG: Change sg_set_page() to take length and offset argument
Most drivers need to set length and offset as well, so may as well fold
those three lines into one.

Add sg_assign_page() for those two locations that only needed to set
the page, where the offset/length is set outside of the function context.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-24 11:20:47 +02:00
Linus Torvalds
0b776eb542 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  mlx4_core: Increase command timeout for INIT_HCA to 10 seconds
  IPoIB/cm: Use common CQ for CM send completions
  IB/uverbs: Fix checking of userspace object ownership
  IB/mlx4: Sanity check userspace send queue sizes
  IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))"
  IB/ehca: Enable large page MRs by default
  IB/ehca: Change meaning of hca_cap_mr_pgsize
  IB/ehca: Fix ehca_encode_hwpage_size() and alloc_fmr()
  IB/ehca: Fix masking error in {,re}reg_phys_mr()
  IB/ehca: Supply QP token for SRQ base QPs
  IPoIB: Use round_jiffies() for ah_reap_task
  RDMA/cma: Fix deadlock destroying listen requests
  RDMA/cma: Add locking around QP accesses
  IB/mthca: Avoid alignment traps when writing doorbells
  mlx4_core: Kill mlx4_write64_raw()
2007-10-23 09:56:11 -07:00
Jens Axboe
45711f1af6 [SG] Update drivers to use sg helpers
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-22 21:19:53 +02:00
Roland Dreier
ab8403c424 IB/mthca: Avoid alignment traps when writing doorbells
Architectures such as ia64 see alignment traps when doing a 64-bit 
read from __be32 doorbell[2] arrays to do doorbell writes in 
mthca_write64().  Fix this by just passing the two halves of the 
doorbell value into mthca_write64().  This actually improves the 
generated code by allowing the compiler to see what's going on better.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-15 20:17:27 -07:00
Eli Cohen
55a98e955c IB/mthca: Mark error paths as unlikely() in post_srq_recv functions
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-10 11:24:33 -07:00
Roland Dreier
76d7cc0345 IB/mthca: Use mmiowb() to avoid firmware commands getting jumbled up
Firmware commands are sent to the HCA by writing multiple words to a
command register block.  Access to this block of registers is
serialized with a mutex.  However, on large SGI systems, problems were
seen with multiple CPUs issuing FW commands at the same time, because
the writes to the register block may be reordered within the system
interconnect and reach the HCA in a different order than they were
issued (even with the mutex).  Fix this by adding an mmiowb() before
dropping the mutex.

Tested-by: Arthur Kepner <akepner@sgi.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:17 -07:00
Roland Dreier
1a1eb6a646 IB/mthca: Increase max number of QPs per multicast group to 56
Increase the number of QPs allowed per multicast group from 8 to 56.
This allows for one QP per core on 16-core systems, which are now
quite common, and allows some space for future growth.

This is basically the same patch that Jack Morgenstein
<jackm@dev.mellanox.co.il> just supplied for mlx4.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:17 -07:00
Peter Oruba
a855b1a742 IB/mthca: Use PCI-X/PCI-Express read control interfaces
These driver changes incorporate the proposed PCI-X / PCI-Express read
byte count interface.  Reading and setting those values doesn't take
place "manually", instead wrapping functions are called to allow
quirks for some PCI bridges.

Signed-off by: Peter Oruba <peter.oruba@amd.com>
Based on work by Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:07 -07:00
Michael S. Tsirkin
017aadc4b5 IB/mthca: Enable MSI-X by default
Recover from MSI-X errors by automatically falling back on regular
interrupt, instead of asking the user to do this manually.  This makes
it possible to enable MSI-X by default, and will make it possible to
get rid of the msi_x module option in the future.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:06 -07:00
Michael S. Tsirkin
c1f74958db IB/mthca: Change command token on timeout
The FW command token is currently only updated on a command completion
event. This means that on command timeout, the same token will be
reused for new command, which results in a mess if the timed out
command *does* eventually complete.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-20 21:19:43 -07:00
Roland Dreier
43509d1fec IB/mthca: Simplify use of size0 in work request posting
Current code sets size0 to 0 at the start of work request posting
functions and then handles size0 == 0 specially within the loop over
work requests.  Change this so size0 is set along with f0 the first
time through the loop (when nreq == 0).  This makes the code easier to
understand by making it clearer that f0 and size0 are always
initialized if nreq != 0 without having to know that size0 == 0
implies nreq == 0.

Also annotate size0 with uninitialized_var() so that this doesn't
introduce a new compiler warning.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18 13:28:29 -07:00
Roland Dreier
e535c699bf IB/mthca: Factor out setting WQE UD segment entries
Factor code to set UD entries out of the work request posting
functions into inline functions set_tavor_ud_seg() and
set_arbel_ud_seg().  This doesn't change the generated code in any
significant way, and makes the source easier on the eyes.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18 13:21:14 -07:00
Roland Dreier
400ddc11eb IB/mthca: Factor out setting WQE remote address and atomic segment entries
Factor code to set remote address and atomic segment entries out of the
work request posting functions into inline functions set_raddr_seg()
and set_atomic_seg().  This doesn't change the generated code in any
significant way, and makes the source easier on the eyes.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18 12:55:42 -07:00
Roland Dreier
80885456e8 IB/mthca: Factor out setting WQE data segment entries
Factor code to set data segment entries out of the work request
posting functions into inline functions mthca_set_data_seg() and
mthca_set_data_seg_inval().  This makes the code more readable and
also allows the compiler to do a better job -- on x86_64:

add/remove: 0/0 grow/shrink: 0/6 up/down: 0/-69 (-69)
function                                     old     new   delta
mthca_arbel_post_srq_recv                    373     369      -4
mthca_arbel_post_receive                     570     562      -8
mthca_tavor_post_srq_recv                    520     508     -12
mthca_tavor_post_send                       1344    1330     -14
mthca_arbel_post_send                       1481    1467     -14
mthca_tavor_post_receive                     792     775     -17

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18 11:30:34 -07:00
Roland Dreier
6d7d080e9f IB/mthca: Use uninitialized_var() for f0
Commit 9db48926 ("drivers/infiniband/hw/mthca/mthca_qp: kill uninit'd
var warning") added "= 0" to the declarations of f0 to shut up gcc
warnings.  However, there's no point in making the code bigger by
initializing f0 to a random value just to get rid of a warning;
setting f0 to 0 is no safer than just using uninitialized_var(), which
documents the situation better and gives smaller code too.  For example, 
on x86_64:

add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-16 (-16)
function                                     old     new   delta
mthca_tavor_post_send                       1352    1344      -8
mthca_arbel_post_send                       1489    1481      -8

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 19:30:51 -07:00
Roland Dreier
e4daf73868 IB/mthca: Fix printk format used for firmware version in warning
When warning about out-of-date firmware, current mthca code messes up
the formatting of the version if the subminor doesn't have three
digits.  It doesn't fill the field with 0s so we end up with:

    ib_mthca 0000:0b:00.0: HCA FW version 1.1.  0 is old (1.2.  0 is current).

Change the format from "%3d" to "%03d" to get the right thing printed.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:42 -07:00
Roland Dreier
f6be6fbe26 IB/mthca: Schedule MSI support for removal
The mthca driver supports both MSI and MSI-X.  However, MSI-X works with
all hardware that the driver handles, and provides a superset of what
MSI does, so there's no point in having code for both.  Schedule MSI
support for removal in 2008 to give anyone who actually needs MSI and
who can't use MSI time to speak up.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:41 -07:00
Jeff Garzik
9db4892620 drivers/infiniband/hw/mthca/mthca_qp: kill uninit'd var warning
drivers/infiniband/hw/mthca/mthca_qp.c: In function
  ‘mthca_tavor_post_send’:
drivers/infiniband/hw/mthca/mthca_qp.c:1594: warning: ‘f0’ may be used
  uninitialized in this function
drivers/infiniband/hw/mthca/mthca_qp.c: In function
  ‘mthca_arbel_post_send’:
drivers/infiniband/hw/mthca/mthca_qp.c:1949: warning: ‘f0’ may be used
  uninitialized in this function

Initializing 'f0' is not strictly necessary in either case, AFAICS.

I was considering use of uninitialized_var(), but looking at the
complex flow of control in each function, I feel it is wiser and
safer to simply zero the var and be certain of ourselves.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-17 16:18:00 -04:00
Shani Moideen
8909c571fa IB/mthca: Replace memset(<addr>, 0, PAGE_SIZE) with clear_page(<addr>)
Signed-off-by: Shani Moideen <shani.moideen@wipro.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
----
2007-07-10 12:28:05 -07:00
Jan Engelhardt
06cc85086e IB: Use menuconfig for InfiniBand menu
Change Kconfig objects from "menu, config" into "menuconfig" so
that the user can disable the whole feature without having to
enter the menu first.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Roland Dreier
3e1db334dc IB/mthca, mlx4_core: Fix typo in comment
s/signifant/significant/

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-07 11:51:59 -07:00
Michael S. Tsirkin
8b7e15772a IB/mthca: Fix handling of send CQE with error for QPs connected to SRQ
mthca_free_err_wqe() currently treats both send and receive CQEs
identically if a QP is using an SRQ.  But for Tavor hardware, send
CQEs with error can be chained together even if the RQ is part of SRQ,
so we may miss some CQEs.

Fix by following the WQE chain for all send CQEs even for non-SRQ QPs.

This fixes crashes in IPoIB CM:
<https://bugs.openfabrics.org//show_bug.cgi?id=604>

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-29 16:07:09 -07:00
Linus Torvalds
8aee74c8ee Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/cm: Improve local id allocation
  IPoIB/cm: Fix SRQ WR leak
  IB/ipoib: Fix typos in error messages
  IB/mlx4: Check if SRQ is full when posting receive
  IB/mlx4: Pass send queue sizes from userspace to kernel
  IB/mlx4: Fix check of opcode in mlx4_ib_post_send()
  mlx4_core: Fix array overrun in dump_dev_cap_flags()
  IB/mlx4: Fix RESET to RESET and RESET to ERROR transitions
  IB/mthca: Fix RESET to ERROR transition
  IB/mlx4: Set GRH:HopLimit when sending globally routed MADs
  IB/mthca: Set GRH:HopLimit when building MLX headers
  IB/mlx4: Fix check of max_qp_dest_rdma in modify QP
  IB/mthca: Fix use-after-free on device restart
  IB/ehca: Return proper error code if register_mr fails
  IPoIB: Handle P_Key table reordering
  IB/core: Use start_port() and end_port()
  IB/core: Add helpers for uncached GID and P_Key searches
  IB/ipath: Fix potential deadlock with multicast spinlocks
  IB/core: Free umem when mm is already gone
2007-05-21 16:19:32 -07:00
Alexey Dobriyan
e8edc6e03a Detach sched.h from mm.h
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.

This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
   getting them indirectly

Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
   they don't need sched.h
b) sched.h stops being dependency for significant number of files:
   on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
   after patch it's only 3744 (-8.3%).

Cross-compile tested on

	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
	alpha alpha-up
	arm
	i386 i386-up i386-defconfig i386-allnoconfig
	ia64 ia64-up
	m68k
	mips
	parisc parisc-up
	powerpc powerpc-up
	s390 s390-up
	sparc sparc-up
	sparc64 sparc64-up
	um-x86_64
	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

as well as my two usual configs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:18:19 -07:00
Michael S. Tsirkin
b18aad7150 IB/mthca: Fix RESET to ERROR transition
According to the IB spec, a QP can be moved from RESET to the ERROR 
state, but mthca firmware does not support this and returns an error if 
we try.  Work around this FW limitation by moving the QP from RESET to
INIT with dummy parameters and then transitioning from INIT to ERROR.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:57 -07:00
Rolf Manderscheid
3f37cae694 IB/mthca: Set GRH:HopLimit when building MLX headers
Global CM packets used by rmda_cm were being sent with a GRH:hopLimit
of zero, causing them to be dropped by the router.  The problem is a
missing initialization of the hop_limit field in mthca_read_ah(),
which was called by build_mlx_header() when sending a MAD on QP1.

Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:56 -07:00
Ali Ayoub
de57c9f102 IB/mthca: Fix use-after-free on device restart
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:56 -07:00
Michael S. Tsirkin
bd18c11277 IB/mthca: Set cleaned CQEs back to HW ownership when cleaning CQ
mthca_cq_clean() updates the CQ consumer index without moving CQEs
back to HW ownership.  As a result, the same WRID might get reported
twice, resulting in a use-after-free.  This was observed in IPoIB CM.
Fix by moving all freed CQEs to HW ownership.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=617>

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 14:10:34 -07:00
Michael S. Tsirkin
3e28c56b9b IB/mthca: Fix posting >255 recv WRs for Tavor
Fix posting lists of > 255 receive WRs for Tavor: rq.next_ind must
be updated each doorbell, otherwise the next doorbell will use an
incorrect index.

Found by Ronni Zimmermann at Mellanox.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 14:10:34 -07:00
Roland Dreier
f7c6a7b5d5 IB/uverbs: Export ib_umem_get()/ib_umem_release() to modules
Export ib_umem_get()/ib_umem_release() and put low-level drivers in
control of when to call ib_umem_get() to pin and DMA map userspace,
rather than always calling it in ib_uverbs_reg_mr() before calling the
low-level driver's reg_user_mr method.

Also move these functions to be in the ib_core module instead of
ib_uverbs, so that driver modules using them do not depend on
ib_uverbs.

This has a number of advantages:
 - It is better design from the standpoint of making generic code a
   library that can be used or overridden by device-specific code as
   the details of specific devices dictate.
 - Drivers that do not need to pin userspace memory regions do not
   need to take the performance hit of calling ib_mem_get().  For
   example, although I have not tried to implement it in this patch,
   the ipath driver should be able to avoid pinning memory and just
   use copy_{to,from}_user() to access userspace memory regions.
 - Buffers that need special mapping treatment can be identified by
   the low-level driver.  For example, it may be possible to solve
   some Altix-specific memory ordering issues with mthca CQs in
   userspace by mapping CQ buffers with extra flags.
 - Drivers that need to pin and DMA map userspace memory for things
   other than memory regions can use ib_umem_get() directly, instead
   of hacks using extra parameters to their reg_phys_mr method.  For
   example, the mlx4 driver that is pending being merged needs to pin
   and DMA map QP and CQ buffers, but it does not need to create a
   memory key for these buffers.  So the cleanest solution is for mlx4
   to call ib_umem_get() in the create_qp and create_cq methods.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-08 18:00:37 -07:00
Linus Torvalds
972d45fb43 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IPoIB: Convert to NAPI
  IB: Return "maybe missed event" hint from ib_req_notify_cq()
  IB: Add CQ comp_vector support
  IB/ipath: Fix a race condition when generating ACKs
  IB/ipath: Fix two more spin lock problems
  IB/fmr_pool: Add prefix to all printks
  IB/srp: Set proc_name
  IB/srp: Add orig_dgid sysfs attribute to scsi_host
  IPoIB/cm: Don't crash if remote side uses one QP for both directions
  RDMA/cxgb3: Support for new abort logic
  RDMA/cxgb3: Initialize cpu_idx field in cpl_close_listserv_req message
  RDMA/cxgb3: Fail qp creation if the requested max_inline is too large
  RDMA/cxgb3: Fix TERM codes
  IPoIB/cm: Fix error handling in ipoib_cm_dev_open()
  IB/ipath: Don't corrupt pending mmap list when unmapped objects are freed
  IB/mthca: Work around kernel QP starvation
  IB/ipath: Don't put QP in timeout queue if waiting to send
  IB/ipath: Don't call spin_lock_irq() from interrupt context
2007-05-07 12:18:21 -07:00
Roland Dreier
ed23a72778 IB: Return "maybe missed event" hint from ib_req_notify_cq()
The semantics defined by the InfiniBand specification say that
completion events are only generated when a completions is added to a
completion queue (CQ) after completion notification is requested.  In
other words, this means that the following race is possible:

	while (CQ is not empty)
		ib_poll_cq(CQ);
	// new completion is added after while loop is exited
	ib_req_notify_cq(CQ);
	// no event is generated for the existing completion

To close this race, the IB spec recommends doing another poll of the
CQ after requesting notification.

However, it is not always possible to arrange code this way (for
example, we have found that NAPI for IPoIB cannot poll after
requesting notification).  Also, some hardware (eg Mellanox HCAs)
actually will generate an event for completions added before the call
to ib_req_notify_cq() -- which is allowed by the spec, since there's
no way for any upper-layer consumer to know exactly when a completion
was really added -- so the extra poll of the CQ is just a waste.

Motivated by this, we add a new flag "IB_CQ_REPORT_MISSED_EVENTS" for
ib_req_notify_cq() so that it can return a hint about whether the a
completion may have been added before the request for notification.
The return value of ib_req_notify_cq() is extended so:

	 < 0	means an error occurred while requesting notification
	== 0	means notification was requested successfully, and if
		IB_CQ_REPORT_MISSED_EVENTS was passed in, then no
		events were missed and it is safe to wait for another
		event.
	 > 0	is only returned if IB_CQ_REPORT_MISSED_EVENTS was
		passed in.  It means that the consumer must poll the
		CQ again to make sure it is empty to avoid the race
		described above.

We add a flag to enable this behavior rather than turning it on
unconditionally, because checking for missed events may incur
significant overhead for some low-level drivers, and consumers that
don't care about the results of this test shouldn't be forced to pay
for the test.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Michael S. Tsirkin
f4fd0b224d IB: Add CQ comp_vector support
Add a num_comp_vectors member to struct ib_device and extend
ib_create_cq() to pass in a comp_vector parameter -- this parallels
the userspace libibverbs API.  Update all hardware drivers to set
num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
value.  Pass the value of num_comp_vectors to userspace rather than
hard-coding a value of 1.

We want multiple CQ event vector support (via MSI-X or similar for
adapters that can generate multiple interrupts), but it's not clear
how many vectors we want, or how we want to deal with policy issues
such as how to decide which vector to use or how to set up interrupt
affinity.  This patch is useful for experimenting, since no core
changes will be necessary when updating a driver to support multiple
vectors, and we know that we want to make at least these changes
anyway.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Jean Delvare
6473d160b4 PCI: Cleanup the includes of <linux/pci.h>
I noticed that many source files include <linux/pci.h> while they do
not appear to need it. Here is an attempt to clean it all up.

In order to find all possibly affected files, I searched for all
files including <linux/pci.h> but without any other occurence of "pci"
or "PCI". I removed the include statement from all of these, then I
compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
false positives manually.

My tests covered 66% of the affected files, so there could be false
positives remaining. Untested files are:

arch/alpha/kernel/err_common.c
arch/alpha/kernel/err_ev6.c
arch/alpha/kernel/err_ev7.c
arch/ia64/sn/kernel/huberror.c
arch/ia64/sn/kernel/xpnet.c
arch/m68knommu/kernel/dma.c
arch/mips/lib/iomap.c
arch/powerpc/platforms/pseries/ras.c
arch/ppc/8260_io/enet.c
arch/ppc/8260_io/fcc_enet.c
arch/ppc/8xx_io/enet.c
arch/ppc/syslib/ppc4xx_sgdma.c
arch/sh64/mach-cayman/iomap.c
arch/xtensa/kernel/xtensa_ksyms.c
arch/xtensa/platform-iss/setup.c
drivers/i2c/busses/i2c-at91.c
drivers/i2c/busses/i2c-mpc.c
drivers/media/video/saa711x.c
drivers/misc/hdpuftrs/hdpu_cpustate.c
drivers/misc/hdpuftrs/hdpu_nexus.c
drivers/net/au1000_eth.c
drivers/net/fec_8xx/fec_main.c
drivers/net/fec_8xx/fec_mii.c
drivers/net/fs_enet/fs_enet-main.c
drivers/net/fs_enet/mac-fcc.c
drivers/net/fs_enet/mac-fec.c
drivers/net/fs_enet/mac-scc.c
drivers/net/fs_enet/mii-bitbang.c
drivers/net/fs_enet/mii-fec.c
drivers/net/ibm_emac/ibm_emac_core.c
drivers/net/lasi_82596.c
drivers/parisc/hppb.c
drivers/sbus/sbus.c
drivers/video/g364fb.c
drivers/video/platinumfb.c
drivers/video/stifb.c
drivers/video/valkyriefb.c
include/asm-arm/arch-ixp4xx/dma.h
sound/oss/au1550_ac97.c

I would welcome test reports for these files. I am fine with removing
the untested files from the patch if the general opinion is that these
changes aren't safe. The tested part would still be nice to have.

Note that this patch depends on another header fixup patch I submitted
to LKML yesterday:
  [PATCH] scatterlist.h needs types.h
  http://lkml.org/lkml/2007/3/01/141

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:35 -07:00
Michael S. Tsirkin
9ba6d5529d IB/mthca: Work around kernel QP starvation
With mthca, RC QPs can starve each other and even UD QPs on the same
hardware schedule queue.  As a result, userspace MPI can starve
e.g. IPoIB traffic, with netdev watchdog warnings getting printed out,
and TCP connections getting stuck or failing.

Reduce the chance of this happening by using three separate hardware
schedule queues: one for userspace RC QPs, one for kernel RC QPs, and
one for all other QPs.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-30 17:30:28 -07:00
Joachim Fenkes
1912ffbb88 IB: Set class_dev->dev in core for nice device symlink
All RDMA drivers except ehca set class_dev->dev to their dma_device
value (ehca leaves this unset).  dma_device is the only value that
makes any sense, so move this assignment to core/sysfs.c.  This reduce
the duplicated code in the rest of the drivers and gives ehca a nice
/sys/class/infiniband/ehcaX/device symlink.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 21:30:38 -07:00
Roland Dreier
30c00986f3 IB/mthca: Simplify CQ cleaning in mthca_free_qp()
mthca_free_qp() already has local variables to hold the QP's send_cq
and recv_cq, so we can slightly clean up the calls to mthca_cq_clean()
by using those local variables instead of expressions like
to_mcq(qp->ibqp.send_cq).

Also, by cleaning the recv_cq first, we can avoid worrying about
whether the QP is attached to an SRQ for the second call, because we
would only clean send_cq if send_cq is not equal to recv_cq, and that
means send_cq cannot have any receive completions from the QP being
destroyed.

All this work even improves the generated code a bit:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-5 (-5)
function                                     old     new   delta
mthca_free_qp                                510     505      -5

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 16:31:11 -07:00
Roland Dreier
532c3b5817 IB/mthca: Fix mthca_write_mtt() on HCAs with hidden memory
Commit b2875d4c ("IB/mthca: Always fill MTTs from CPU") causes a crash
in mthca_write_mtt() with non-memfree HCAs that have their memory
hidden (that is, have only two PCI BARs instead of having a third BAR
that allows access to the RAM attached to the HCA) on 64-bit
architectures.  This is because the commit just before, c20e20ab
("IB/mthca: Merge MR and FMR space on 64-bit systems") makes
dev->mr_table.fmr_mtt_buddy equal to &dev->mr_table.mtt_buddy and
hence mthca_write_mtt() tries to write directly into the HCA's MTT
table.  However, since that table is in the HCA's memory, this is
impossible without the PCI BAR that gives access to that memory.

This causes a crash because mthca_tavor_write_mtt_seg() basically
tries to dereference some offset of a NULL pointer.  Fix this by
adding a test of MTHCA_FLAG_FMR in mthca_write_mtt() so that we always
use the WRITE_MTT firmware command rather than writing directly if
FMRs are not enabled.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 16:31:04 -07:00
Roland Dreier
3f114853d4 IB/mthca: Update HCA firmware revisions
Update the driver's list of current firmware versions with Mellanox's
latest releases.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:21:02 -07:00