Commit Graph

1067 Commits

Author SHA1 Message Date
Arnaldo Carvalho de Melo
badff6d01a [SK_BUFF]: Introduce skb_reset_transport_header(skb)
For the common, open coded 'skb->h.raw = skb->data' operation, so that we can
later turn skb->h.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple cases:

skb->h.raw = skb->data;
skb->h.raw = {skb_push|[__]skb_pull}()

The next ones will handle the slightly more "complex" cases.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:15 -07:00
Arnaldo Carvalho de Melo
459a98ed88 [SK_BUFF]: Introduce skb_reset_mac_header(skb)
For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can
later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple case, next will handle the slightly more
"complex" cases.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:32 -07:00
Arnaldo Carvalho de Melo
4c13eb6657 [ETH]: Make eth_type_trans set skb->dev like the other *_type_trans
One less thing for drivers writers to worry about.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:30 -07:00
Joachim Fenkes
1912ffbb88 IB: Set class_dev->dev in core for nice device symlink
All RDMA drivers except ehca set class_dev->dev to their dma_device
value (ehca leaves this unset).  dma_device is the only value that
makes any sense, so move this assignment to core/sysfs.c.  This reduce
the duplicated code in the rest of the drivers and gives ehca a nice
/sys/class/infiniband/ehcaX/device symlink.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 21:30:38 -07:00
Joachim Fenkes
c4ed790dfd IB/ehca: Implement modify_port
Add "Modify Port" verb support to eHCA driver.  The IB communication
manager needs this to set the IsCM port capability bit when
initializing.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 21:30:38 -07:00
Roland Dreier
37aebbde70 IPoIB/cm: spin_lock_irqsave() -> spin_lock_irq() replacements
There are quite a few places in ipoib_cm.c where we know IRQs are
enabled because we do something that sleeps in the same function, so
we can convert several occurrences of spin_lock_irqsave() to a plain
spin_lock_irq().  This cleans up the source a little and makes the
code smaller too:

add/remove: 0/0 grow/shrink: 1/5 up/down: 3/-51 (-48)
function                                     old     new   delta
ipoib_cm_tx_reap                             403     406      +3
ipoib_cm_stale_task                          146     145      -1
ipoib_cm_dev_stop                            173     172      -1
ipoib_cm_tx_handler                          964     956      -8
ipoib_cm_rx_handler                          956     937     -19
ipoib_cm_skb_reap                            212     190     -22

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 21:30:37 -07:00
Hal Rosenstock
de493d47d8 IB/mad: Change SMI to use enums rather than magic return codes
Clarify code by changing return values from SMI functions to named
enum values instead of magic 0/1 values.

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 16:31:12 -07:00
Sean Hefty
aeba84a925 IB/umad: Implement GRH handling for sent/received MADs
We need to set the SGID index for routed MADs and pass received
GRH information to userspace when a MAD is received.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:12 -07:00
Sean Hefty
46f1b3d7af IB/ipoib: Use ib_init_ah_from_path to initialize ah_attr
To support destinations that are not on the local IB subnet, IPoIB
should include the GRH information when constructing an address
handle.  Using the existing ib_init_ah_from_path() call will do this
for us.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:12 -07:00
Sean Hefty
d0e7bb1418 IB/sa: Set src_path_bits correctly in ib_init_ah_from_path()
src_path_bits needs to mask off the base LID value.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:12 -07:00
Sean Hefty
9d41b7fdea IB/ucm: Simplify ib_ucm_event()
Use wait_event_interruptible() instead of a more complicated
open-coded equivalent.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:11 -07:00
Sean Hefty
d92f76448c RDMA/ucma: Simplify ucma_get_event()
Use wait_event_interruptible() instead of a more complicated
open-coded equivalent.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:11 -07:00
Roland Dreier
30c00986f3 IB/mthca: Simplify CQ cleaning in mthca_free_qp()
mthca_free_qp() already has local variables to hold the QP's send_cq
and recv_cq, so we can slightly clean up the calls to mthca_cq_clean()
by using those local variables instead of expressions like
to_mcq(qp->ibqp.send_cq).

Also, by cleaning the recv_cq first, we can avoid worrying about
whether the QP is attached to an SRQ for the second call, because we
would only clean send_cq if send_cq is not equal to recv_cq, and that
means send_cq cannot have any receive completions from the QP being
destroyed.

All this work even improves the generated code a bit:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-5 (-5)
function                                     old     new   delta
mthca_free_qp                                510     505      -5

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 16:31:11 -07:00
Roland Dreier
532c3b5817 IB/mthca: Fix mthca_write_mtt() on HCAs with hidden memory
Commit b2875d4c ("IB/mthca: Always fill MTTs from CPU") causes a crash
in mthca_write_mtt() with non-memfree HCAs that have their memory
hidden (that is, have only two PCI BARs instead of having a third BAR
that allows access to the RAM attached to the HCA) on 64-bit
architectures.  This is because the commit just before, c20e20ab
("IB/mthca: Merge MR and FMR space on 64-bit systems") makes
dev->mr_table.fmr_mtt_buddy equal to &dev->mr_table.mtt_buddy and
hence mthca_write_mtt() tries to write directly into the HCA's MTT
table.  However, since that table is in the HCA's memory, this is
impossible without the PCI BAR that gives access to that memory.

This causes a crash because mthca_tavor_write_mtt_seg() basically
tries to dereference some offset of a NULL pointer.  Fix this by
adding a test of MTHCA_FLAG_FMR in mthca_write_mtt() so that we always
use the WRITE_MTT firmware command rather than writing directly if
FMRs are not enabled.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 16:31:04 -07:00
Roland Dreier
3f114853d4 IB/mthca: Update HCA firmware revisions
Update the driver's list of current firmware versions with Mellanox's
latest releases.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:21:02 -07:00
Robert Walsh
40b90430ec IB/ipath: Fix WC format drift between user and kernel space
The kernel ib_wc structure now uses a QP pointer, but the user space
equivalent uses a QP number instead.  This means we can no longer use
a simple structure copy to copy stuff into user space.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:21:01 -07:00
Robert Walsh
6ce73b07db IB/ipath: Check that a UD work request's address handle is valid
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:21:00 -07:00
Robert Walsh
0d6172a428 IB/ipath: Remove duplicate stuff from ipath_verbs.h
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:21:00 -07:00
Robert Walsh
253fb39020 IB/ipath: Check reserved memory keys
Don't let userspace use the direct-physical-map L_key or R_key.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:21:00 -07:00
Bryan O'Sullivan
f0810daf74 IB/ipath: Fix unit selection when all CPU affinity bits set
At some point things changed so that all the affinity bits can be set,
but cpus_full() macro is not true.  This caused problems with the unit
selection logic on multi-unit (board) configurations.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:59 -07:00
Bryan O'Sullivan
662af5813b IB/ipath: Don't allow QPs 0 and 1 to be opened multiple times
Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:59 -07:00
Bryan O'Sullivan
53c1d2c943 IB/ipath: Disable IB link earlier in shutdown sequence
Move the code that shuts down the IB link earlier in the unload
process, to be sure no new packets can arrive while we are unloading.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:59 -07:00
Bryan O'Sullivan
490462c268 IB/ipath: Prevent random program use of diags interface
To prevent random utility reads and writes of the diag interface to the
chip, we first require a handshake of reading from offset 0 and writing
to offset 0 before any other reads or writes can be done through the
diags device.   Otherwise chip errors can be triggered.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:59 -07:00
Bryan O'Sullivan
f5408ac7cc IB/ipath: On unrecoverable errors, force link down, LEDs off
If the chip is no longer usable, LEDs should be turned off so system
can be found easily in the cluster.

Also some minor reorganizing so both chips print hardware error
message at same point and only if there were unrecovered errors

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:59 -07:00
Michael Albaugh
27b044a815 IB/ipath: Fix driver crash (in interrupt or during unload) after chip reset
Re-init of the kernel structures after a chip reset was leaving the
portdata structure for port zero in an inconsistent state, and a
pointer to it either stale (in re-init code) or NULL (in devdata)
Fixing the order of operations on this struct, and the condition for
interrupt access, prevents the crashes.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:58 -07:00
Bryan O'Sullivan
9783ab4058 IB/ipath: Improve handling and reporting of parity errors
Mostly cleanup.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:58 -07:00
Bryan O'Sullivan
820054b7ca IB/ipath: Print better error messages if kernel is misconfigured
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:58 -07:00
Arthur Jones
569b87b47f IB/ipath: Force PIOAvail update entry point
Due to a chip bug, the PIOAvail register is not always updated to
memory.  This patch allows userspace to force an update.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:58 -07:00
Arthur Jones
7b196e2ff3 IB/ipath: Call free_irq() on chip specific initialization failure
In initialization, if we bailed at chip specific initialization, we
forgot to clean up the irq we had requested.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:58 -07:00
Bryan O'Sullivan
5a7d4eea91 IB/ipath: Discard multicast packets without a GRH
This patch fixes a bug where multicast packets without a GRH were not
being dropped as per the IB spec.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:57 -07:00
Bryan O'Sullivan
0ed3c594e3 IB/ipath: Fix calculation for number of kernel PIO buffers
If the module parameter "kpiobufs" is set too high, the calculation to
reset it to a sane value was incorrect.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:57 -07:00
Bryan O'Sullivan
c8c6f5d496 IB/ipath: Remove unused ipath_read_kreg64_port()
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:57 -07:00
Ralph Campbell
dd5190b6be IB/ipath: Fix RDMA reads of length zero and error handling
Fix RDMA read response length checking for RDMA_READ_RESPONSE_ONLY to
allow a zero length response.  RDMA read responses which don't match
the expected length or occur in response to some other operation
should generate a completion queue error (see table 56, ch. 9.9.2.3 in
the IB spec).

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:57 -07:00
Mark Debbage
c7e29ff11f IB/ipath: Allow receive ports mapped into userspace to be shared
Improve port-sharing performance by allowing any process to receive
packets from the shared hardware port under a spin lock for mutual
exclusion. Previously, one process was nominated as the master and
that process was responsible for receiving all packets from the shared
hardware port and either consuming them or forwarding them to their
destination. This led to starvation problems for other processes when
the master process was busy in computation phases.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:57 -07:00
Ralph Campbell
0a5a83cffc IB/ipath: Fix port sharing on powerpc
The port sharing feature mixed kernel virtual addresses as well as
physical addresses for the offset used to describe the mmap address to
map the InfiniPath hardware into user space.  This had a conflict on
powerpc.  The new scheme converts it to a physical address so it
doesn't conflict with chip addresses and yet still fits in 40/44 bits
so it isn't truncated by 32-bit applications calling mmap64().

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:56 -07:00
Bryan O'Sullivan
041eab9136 IB/ipath: Fix CQ flushing when QP is modified to error state
If a receive work request has been removed from the queue but has not
had a CQ entry generated for it and the QP is modified to the error
state, the completion entry generated is incorrect.  This patch fixes
the problem.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:56 -07:00
Bryan O'Sullivan
614d49a21e IB/ipath: Fix bad argument to clear_bit()
Code was converted from a &= ~mask to clear_bit, but the bit was left
shifted instead of being used directly, so we were either trashing
memory several pages away, or sometimes taking a kernel page fault on
an invalid page.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:56 -07:00
Bryan O'Sullivan
8ec1077b35 IB/ipath: Change packet problems vs chip errors handling and reporting
Some types of packet errors are moderately common with longer IB
cables and large clusters, and are not reported with prints by other
IB HCA drivers.  This suppresses those messages unless the new
__IPATH_ERRPKTDBG bit is set in ipath_debug.  Reporting of temporarily
disabled frequent error interrupts was also made clearer

We also distinguish between chip errors, and bad packets sent or
received in the wording of the messages.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:55 -07:00
Ralph Campbell
6f5c407460 IB/ipath: Fix PSN update for RC retries
This patch fixes a number of bugs with updating the PSN for retries of
RC requests.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:55 -07:00
Ralph Campbell
0434d271fd IB/ipath: Fix QP error completion queue entries
When switching to the QP error state, the completion queue entries
(error or flush) were not being generated correctly.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:55 -07:00
Bryan O'Sullivan
39c0d0b919 IB/ipath: Fix up some debug messages
ipath_dbg doesn't need the same prefixes that printk does.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:55 -07:00
Ralph Campbell
3859e39d75 IB/ipath: Support larger IB_QP_MAX_DEST_RD_ATOMIC and IB_QP_MAX_QP_RD_ATOMIC
This patch adds support for multiple RDMA reads and atomics to be sent
before an ACK is required to be seen by the requester.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:55 -07:00
Ralph Campbell
7b21d26dda IB/ipath: NMI cpu lockup if local loopback used
If a post send is done in loopback and there is no receive queue
entry, the sending QP is put on a timeout list for a while so the
receiver has a chance to post a receive buffer. If the another post
send is done, the code incorrectly tried to put the QP on the timeout
list again an corrupted the timeout list. This eventually leads to a
spin lock deadlock NMI due to the timer function looping forever with
the lock held.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:54 -07:00
Ralph Campbell
9f9630d5e1 IB/ipath: Fix SRQ limit event causing dropped CQ entry
A silly programming error causes a CQ entry to not be generated if a
SRQ limit event is generated.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:54 -07:00
Ralph Campbell
947d7617a1 IB/ipath: Don't initialize port memory for subports
A recent change was made to allocate memory for a port after CPU
affinity is set. That change didn't account for subports and was
trying to allocate memory for the port twice.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:54 -07:00
Bryan O'Sullivan
1908574559 IB/ipath: Definitions of two RXE parity err bits were reversed
The chip documentation on the expected TID vs eager TID parity error
bits was reversed from what was implemented in the RTL, for both
chips.  This corrects the definitions.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:54 -07:00
Bryan O'Sullivan
165c552c35 IB/ipath: Fix user memory region creation when IOMMU present
The loop which initializes the user memory region from an array of
pages was using the wrong limit for the array.  This worked OK when
dma_map_sg() returned the same number as the number of pages.  This
patch fixes the problem.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:54 -07:00
Bryan O'Sullivan
946db67fbf IB/ipath: Add ability to set and clear IB local loopback
This is a sticky state.  It is useful for diagnosing problems with
boards versus cable/switch problems.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:53 -07:00
Roland Dreier
a89875fc7e IPoIB: Remove pointless opcode field from debugging output
There's no point in printing the opcode field in the completion
handling debugging output, since the type of completion is already
printed at the beginning of the line.  In fact the opcode field is not
even defined for completions with a status other than success.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:53 -07:00
Hal Rosenstock
9a4b65e357 IB/umad: Fix declaration of dev_map[]
The current ib_umad code never accesses bits past IB_UMAD_MAX_PORTS in
dev_map[].  We shouldn't declare it to be twice as big.

Pointed-out-by: Roland Dreier <rolandd@cisco.com>

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
2007-04-18 20:20:53 -07:00
Michael S. Tsirkin
608d8268be IB/mthca: Fix data corruption after FMR unmap on Sinai
In mthca_arbel_fmr_unmap(), the high bits of the key are masked off.
This gets rid of the effect of adjust_key(), which makes sure that
bits 3 and 23 of the key are equal when the Sinai throughput
optimization is enabled, and so it may happen that an FMR will end up
with bits 3 and 23 in the key being different.  This causes data
corruption, because when enabling the throughput optimization, the
driver promises the HCA firmware that bits 3 and 23 of all memory keys
will always be equal.

Fix by re-applying adjust_key() after masking the key.

Thanks to Or Gerlitz for reproducing the problem, and Ariel Shahar for
help in debug.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-16 14:10:55 -07:00
Stephen Rothwell
d05c7a80cf [POWERPC] Rename get_property to of_get_property: drivers
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13 03:55:19 +10:00
Stephen Rothwell
a7edd0e676 [POWERPC] get_property returns const
This just tidies up some of the remains.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13 03:55:17 +10:00
Steve Wise
1ca19770c5 RDMA/cxgb3: Add set_tcb_rpl_handler
As of commit 6cdbd77e ("cxgb3 - missing CPL hanler and register
setting."), the cxgb3 ethernet NIC driver no longer handles SET_TCB
replies, so we need to do it in the iWARP driver.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-12 10:37:11 -07:00
Michael S. Tsirkin
6371ea3d48 IPoIB/cm: Fix DMA direction typo
Receive buffers need to be mapped with DMA_FROM_DEVICE.  Incorrectly
mapping with DMA_TO_DEVICE causes a hard lock on ppc64 machines with
an IOMMU.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=431>

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-10 08:58:30 -07:00
Erez Zilber
1d426d6418 IB/iser: Don't defer connection failure notification to workqueue
When a connection is terminated asynchronously from the iSCSI layer's
perspective, iSER needs to notify the iSCSI layer that the connection
has failed.  This is done using a workqueue (switched to from the iSER
tasklet context).  Meanwhile, the connection object (that holds the
work struct) is released.  If the workqueue function wasn't called
yet, it will be called later with a NULL pointer, which will crash the
kernel.

The context switch (tasklet to workqueue) is not required, and
everything can be done from the iSER tasklet. This eliminates the NULL
work struct bug (and simplifies the code).

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-05 09:46:04 -07:00
Linus Torvalds
a26b5fce06 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/iser: Handle aborting a command after it is sent
  IB/mthca: Fix thinko in init_mr_table()
  RDMA/cxgb3: Fix resource leak in cxio_hal_init_ctrl_qp()
2007-03-28 14:00:01 -07:00
Erez Zilber
3104a2175d IB/iser: Handle aborting a command after it is sent
The SCSI midlayer may abort a command that was already sent.  If the
initiator is still trying to send the command (or data-out PDUs for
that command), the QP may time out after the midlayer times
out. Therefore, when aborting the command, iSER may still have
references for the command's buffers.  When sending these PDUs, the
sends will complete with an error and their resources will be released
then.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-26 16:35:09 -07:00
Michael S. Tsirkin
0264d88531 IB/mthca: Fix thinko in init_mr_table()
Commit c20e20ab ("IB/mthca: Merge MR and FMR space on 64-bit systems")
swapped the number of MTTs and MPTs when initializing the MR table. As
a result, we get a kernel oops when the number of MTT segments
allocated exceeds 0x20000.

Noted by Troy Benjegerdes <troy@scl.ameslab.gov>, and reproduced by
Dotan Barak <dotanb@mellanox.co.il>.  This fixes
https://bugs.openfabrics.org/show_bug.cgi?id=490

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-26 15:59:32 -07:00
Steve Wise
ed6ee5178e RDMA/cxgb3: Fix resource leak in cxio_hal_init_ctrl_qp()
This was spotted by the Coverity checker (CID 1554).

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-26 15:54:40 -07:00
Alexey Kuznetsov
ecbb416939 [NET]: Fix neighbour destructor handling.
->neigh_destructor() is killed (not used), replaced with
->neigh_cleanup(), which is called when neighbor entry goes to dead
state. At this point everything is still valid: neigh->dev,
neigh->parms etc.

The device should guarantee that dead neighbor entries (neigh->dead !=
0) do not get private part initialized, otherwise nobody will cleanup
it.

I think this is enough for ipoib which is the only user of this thing.
Initialization private part of neighbor entries happens in ipib
start_xmit routine, which is not reached when device is down.  But it
would be better to add explicit test for neigh->dead in any case.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-25 18:48:01 -07:00
Michael S. Tsirkin
77d8e1efea IB/ipoib: Fix thinko in packet length checks
The packet length checks in ipoib are broken: we add 4 bytes (IPoIB
encapsulation header) when sending a packet, not 20 bytes (hardware
address length) to each packet.  Therefore, if connected mode is
enabled so that the interface MTU is larger than the multicast MTU,
IPoIB may end up trying to send too-long multicast packets.  For
example, multicast is broken if a message of size 2048 bytes is sent
on an interface with UD MTU 2048, because 2048 is bigger than the real
limit of 2044 but the code tests against the wrong limit of 2060.

This patch fixes <https://bugs.openfabrics.org/show_bug.cgi?id=418>,
submitted by Scott Weitzenkamp <sweitzen@cisco.com>.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:40:16 -07:00
Michael S. Tsirkin
d04d01b113 IPoIB: Fix use-after-free in path_rec_completion()
The connected mode code added the possibility that an neigh struct
gets freed in the list_for_each_entry() loop in path_rec_completion(),
which causes a use-after-free.  Fix this by changing to the _safe
variant of the list walking macro.

This was spotted by the Coverity checker (CID 1567).

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:40:16 -07:00
Joachim Fenkes
73b9e9870f IB/ehca: Make scaling code work without CPU hotplug
eHCA scaling code must not depend on register_cpu_notifier() if
CONFIG_HOTPLUG_CPU is not set, so put all related code into #ifdefs.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:40:16 -07:00
Steve Wise
d601347188 RDMA/cxgb3: Handle build_phys_page_list() failure in iwch_reregister_phys_mem()
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:40:16 -07:00
Bryan O'Sullivan
fae8773b73 IB/ipath: Check return value of lookup_one_len
This fixes kernel.org bug 8003.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:40:15 -07:00
Sean Hefty
e07832b662 IPoIB: Fix race in detaching from mcast group before attaching
There's a race between ipoib_mcast_leave() and ipoib_mcast_join_finish()
where we can try to detach from a multicast group before we've
attached to it.  Fix this by reordering the code in ipoib_mcast_leave
to free the multicast group first, which waits for the multicast
callback thread (which calls ipoib_mcast_join_finish()) to complete
before detaching from the group.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:32:09 -07:00
Michael S. Tsirkin
60a596dab7 IPoIB/cm: Fix reaping of stale connections
The sense of the time_after_eq() test in ipoib_cm_stale_task() is
reversed so that only non-stale connections are reaped.  Fix this by
changing to time_before_eq().

Noticed by Pradeep Satyanarayana <pradeep@us.ibm.com>.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:32:09 -07:00
Al Viro
62577fa324 [PATCH] fix ipath_dma_free_coherent() prototype
method gets u64, not dma_addr_t

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-14 15:27:49 -07:00
Mike Christie
bf32ed33e9 [SCSI] iscsi: rename DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH
This patch renames DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH to avoid
confusion with the drivers default values (DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH
is the iscsi RFC specific default).

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-03-11 11:26:50 -05:00
Shirley Ma
55c9adde13 IPoIB: Turn on interface's carrier after broadcast group is joined
Do netif_carrier_on() right after the IPv4 broadcast multicast group
is joined, rather than waiting for all of the initial set of multicast
group joins to finish.  This allows at least IPv4 traffic to limp
along on broken fabrics where not all multicast groups can be joined.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-08 14:59:30 -08:00
Sean Hefty
3492856e33 RDMA/ucma: Avoid sending reject if backlog is full
Change the returned error code to ENOMEM if the connection event
backlog is full.  This prevents the ib_cm from issuing a reject
on the connection, which can allow retries to succeed.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 14:58:11 -08:00
Steve Wise
e64518f373 RDMA/cxgb3: Fix MR permission problems
Fix memory region permission problems:

- remove useless and redundant iwch_mem_perms enum.

- create ib_to_tpt_access_rights() for mapping ib access rights
  to T3 TPT permissions.

- create ib_to_mwbind_access_rights() for mapping ib access rights
  to T3 MWBIND WR permissions.

- fix up the mem reg code to utilize the new functions.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:51:02 -08:00
Steve Wise
1f6a849b7c RDMA/cxgb3: Don't reuse skbs that are non-linear or cloned
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:50:57 -08:00
Steve Wise
8cfccf02bb RDMA/cxgb3: Squelch logging AE errors
Only print one AE error for a given connection in the kernel log.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:50:53 -08:00
Steve Wise
adf376b370 RDMA/cxgb3: Stop EP timer when MPA exchange is aborted by peer
Stop the endpoint timer when the MPA exchange is aborted by the peer.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:50:49 -08:00
Steve Wise
2df50da00e RDMA/cxgb3: Move QP to error on destroy if the state is IDLE
Change iwch_destroy_qp() to always move the QP to ERROR and let
iwch_modify_qp() decide what to do.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:50:45 -08:00
Steve Wise
42e3175354 RDMA/cxgb3: Fixes for "normal close" failures
Fixes for "normal close" failures:

- Start normal close timer when moving to CLOSING state.
- Handle ABORTING state in close_con_rpl().
- Stop timer correctly on abort during a normal close.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:50:37 -08:00
David Miller
c3bb1092c8 RDMA/cxgb3: Fix build on sparc64
cxgb3 uses dma_alloc_coherent() et al. thus needs linux/dma-mapping.h
include in order to build reliably.

Noticed on sparc64.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:45:57 -08:00
Sean Hefty
cb164b8c6a RDMA/cma: Initialize rdma_bind_list in cma_alloc_any_port()
The struct rdma_bind_list fields for hlist are not being initialized,
resulting in a corrupted list.  Fix this by using kzalloc() to make
sure all pointers are NULL.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:41:44 -08:00
Steve Wise
aeb100e246 RDMA/cxgb3: Don't use mm after it's freed in iwch_mmap()
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 11:48:17 -08:00
Steve Wise
7d526e6b2c RDMA/cxgb3: Start ep timer on a MPA reject
If the consumer rejects the connection we end up under-referencing the
endpoint structure.  The fix is to call iwch_ep_disconnect() instead
of the low level disconnect functions so that the endpoint close timer
is started correctly.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 11:47:05 -08:00
Roland Dreier
88171cfed5 IB/mthca: Fix error path in mthca_alloc_memfree()
The garbled logic in mthca_alloc_memfree() causes it to return 0, even
if it fails to allocate all doorbell records.  Fix it to return -ENOMEM
when it fails.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-01 13:17:14 -08:00
Hoang-Nam Nguyen
31726798bd IB/ehca: Fix sync between completion handler and destroy cq
This patch fixes two issues reported by Roland Dreier and Christoph Hellwig:

- Mismatched sync/locking between completion handler and destroy cq We
  introduced a counter nr_events per cq to track number of irq events
  seen. This counter is incremented when an event queue entry is seen
  and decremented after completion handler has been called regardless
  if scaling code is active or not. Note that nr_callbacks tracks
  number of events assigned to a cpu and both counters can potentially
  diverge.

  The sync between running completion handler and destroy cq is done
  by using the global spin lock ehca_cq_idr_lock.

- Replace yield by wait_event on the counter above to become zero.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-01 13:04:05 -08:00
Roland Dreier
a27cbe8782 IPoIB: Only handle async events for one port
An asynchronous event carries the port number that the event occurred
on, so there's no reason for an IPoIB interface to process an event
associated with a different local HCA port.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-27 07:37:49 -08:00
Roland Dreier
843613b047 IPoIB: Correct debugging output when path record lookup fails
If path_rec_completion() is passed a non-NULL path record pointer
along with an unsuccessful status value, the tracing code incorrectly
prints the (invalid) DLID from the path record rather than the more
interesting status code.  The actual logic of the function correctly
uses the path record only if the status indicates a successful lookup.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-26 12:57:08 -08:00
Steve Wise
2f236735fd RDMA/cxgb3: Stop the EP Timer on BAD CLOSE
Stop the ep timer in ec_status() if the status indicates a
bad close.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-23 13:12:48 -08:00
Adrian Bunk
2b540355cd RDMA/cxgb3: cleanups
- don't mark static functions in C files as inline - gcc should know
  best whether inlining makes sense
- never compile the unused cxio_dbg.c
- make the following needlessly global functions static:
  - cxio_hal.c: cxio_hal_clear_qp_ctx()
  - iwch_provider.c: iwch_get_qp()
- remove the following unused global functions:
  - cxio_hal.c: cxio_allocate_stag()
  - cxio_resource.: cxio_hal_get_rhdl()
  - cxio_resource.: cxio_hal_put_rhdl()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-23 13:10:43 -08:00
Sean Hefty
1836854f25 RDMA/cma: Remove unused node_guid from cma_device structure
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 17:54:35 -08:00
Sean Hefty
e971b8cd19 IB/cm: Remove ca_guid from cm_device structure
The cm_device references an ib_device, which already contains the node_guid.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 17:54:33 -08:00
Sean Hefty
962063e64b RDMA/cma: Request reversible paths only
The rdma_cm requires that path records be reversible.  Set the
reversible bit when issuing an path record query.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 17:54:07 -08:00
Sean Hefty
47645d8d25 IB/core: Set hop limit in ib_init_ah_from_wc correctly
The hop_limit value in the ah_attr should be 0xFF, not the value read
from the received GRH (which should be 0).  See 13.5.4.4 in the 1.2 IB
spec.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 17:54:02 -08:00
Roland Dreier
aaf1aef55f IB/uverbs: Return correct error for invalid PD in register MR
If no matching PD is found in ib_uverbs_reg_mr(), then the function
jumps to err_release without setting the return value ret.  This means
that ret will hold the return value of the call to ib_umem_get() a few
lines earlier; if the function reaches the point where it looks for
the PD, we know that ib_umem_get() must have returned 0, so
ib_uverbs_reg_mr() ends up return 0 for a bad PD ID.  Fix this by
setting ret to -EINVAL before jumping to the exit path when no PD is
found.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 13:16:51 -08:00
Roland Dreier
658bcef619 IPoIB: Remove unused local_rate tracking
Now that low-level drivers handle the conversion from an absolute rate
to a relative rate, there's no need for the IPoIB driver to keep track
of the local port's data rate.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-21 20:28:05 -08:00
Michael S. Tsirkin
1812063ba3 IPoIB/cm: Improve small message bandwidth
Avoid the overhead of freeing/reallocating and mapping/unmapping for
DMA pages that have not been written to by hardware.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-20 20:16:14 -08:00
Adrian Bunk
c9add6ec56 IB/mthca: Make 2 functions static
This patch makes the needlessly global functions mthca_tavor_write_mtt_seg()
and mthca_arbel_write_mtt_seg() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-20 20:12:02 -08:00
Linus Torvalds
874ff01bd9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
  Documentation/kernel-docs.txt update.
  arch/cris: typo in KERN_INFO
  Storage class should be before const qualifier
  kernel/printk.c: comment fix
  update I/O sched Kconfig help texts - CFQ is now default, not AS.
  Remove duplicate listing of Cris arch from README
  kbuild: more doc. cleanups
  doc: make doc. for maxcpus= more visible
  drivers/net/eexpress.c: remove duplicate comment
  add a help text for BLK_DEV_GENERIC
  correct a dead URL in the IP_MULTICAST help text
  fix the BAYCOM_SER_HDX help text
  fix SCSI_SCAN_ASYNC help text
  trivial documentation patch for platform.txt
  Fix typos concerning hierarchy
  Fix comment typo "spin_lock_irqrestore".
  Fix misspellings of "agressive".
  drivers/scsi/a100u2w.c: trivial typo patch
  Correct trivial typo in log2.h.
  Remove useless FIND_FIRST_BIT() macro from cardbus.c.
  ...
2007-02-19 13:29:02 -08:00
Robert P. J. Day
d08df601a3 Various typo fixes.
Correct mis-spellings of "algorithm", "appear", "consistent" and
(shame, shame) "kernel".

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-17 19:07:33 +01:00
Roland Dreier
7084f8429c IB/core: Set static rate in ib_init_ah_from_path()
The static rate from the path record should be put into the address
vector -- a long time ago the rate in the address attributes needed to
be a relative rate, which required more munging, but now that the
conversion from absolute to relative is done in the low-level driver,
it's easy for ib_init_ah_from_path() to put the absolute rate in.

Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 15:31:24 -08:00
Roland Dreier
630e61f2fa IB/ipath: Make ipath_map_sg() static
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 14:58:08 -08:00
Roland Dreier
38abaa63bf IB/core: Fix sparse warnings about shadowed declarations
Change a couple of variable names to avoid sparse warnings about
symbols being shadowed.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 14:41:14 -08:00
Sean Hefty
c8f6a362bf RDMA/cma: Add multicast communication support
Extend rdma_cm to support multicast communication.  Multicast support
is added to the existing RDMA_PS_UDP port space, as well as a new
RDMA_PS_IPOIB port space.  The latter port space allows joining the
multicast groups used by IPoIB, which enables offloading IPoIB traffic
to a separate QP.  The port space determines the signature used in the
MGID when joining the group.  The newly added RDMA_PS_IPOIB also
allows for unicast operations, similar to RDMA_PS_UDP.

Supporting the RDMA_PS_IPOIB requires changing how UD QPs are initialized,
since we can no longer assume that the qkey is constant.  This requires
saving the Q_Key to use when attaching to a device, so that it is
available when creating the QP.  The Q_Key information is exported to
the user through the existing rdma_init_qp_attr() interface.

Multicast support is also exported to userspace through the rdma_ucm.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 14:29:07 -08:00
Sean Hefty
faec2f7b96 IB/sa: Track multicast join/leave requests
The IB SA tracks multicast join/leave requests on a per port basis and
does not do any reference counting: if two users of the same port join
the same group, and one leaves that group, then the SA will remove the
port from the group even though there is one user who wants to stay a
member left.  Therefore, in order to support multiple users of the
same multicast group from the same port, we need to perform reference
counting locally.

To do this, add an multicast submodule to ib_sa to perform reference
counting of multicast join/leave operations.  Modify ib_ipoib (the
only in-kernel user of multicast) to use the new interface.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 14:20:02 -08:00
Michael S. Tsirkin
8a2e65f87c IPoIB: CM error handling thinko fix
ipoib_cm_alloc_rx_skb() might be called from IRQ context, so it must
use dev_kfree_skb_any(), not kfree_skb().

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:35 -08:00
Steve Wise
c52daa2976 RDMA/cxgb3: Remove Open Grid Computing copyrights in iw_cxgb3 driver
Remove the Open Grid Computing copyright.  It shouldn't be there.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:35 -08:00
Steve Wise
a1a750523b RDMA/cxgb3: Fail posts synchronously when in TERMINATE state
For T3B devices, mark user QP in error once we transition
to TERMINATE.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:35 -08:00
Steve Wise
ebb90986e1 RDMA/iwcm: iw_cm_id destruction race fixes
iwcm iw_cm_id destruction race condition fixes:

- iwcm_deref_id() always wakes up if there's another reference.
- clean up race condition in cm_work_handler().
- create static void free_cm_id() which deallocs the work entries and then
  kfrees the cm_id memory.  This reduces code replication.
- rem_ref() if this is the last reference -and- the IWCM owns freeing the
  cm_id, then free it.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:35 -08:00
Hoang-Nam Nguyen
6bbcea0d42 IB/ehca: Change query_port() to return LINK_UP instead UNKNOWN
Set the port phys state as returned from ehca_query_port() to LINK_UP.
ehca actually represents a logical HCA, whose phys/link state always
is LINK_UP.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:34 -08:00
Hoang-Nam Nguyen
4fd3006032 IB/ehca: Allow en/disabling scaling code via module parameter
Allow users to en/disable scaling code when loading ib_ehca module,
rather than requiring the module to be rebuilt to change the setting.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:34 -08:00
Hoang-Nam Nguyen
8b16cef3df IB/ehca: Fix race condition/locking issues in scaling code
Fix a race condition in find_next_cpu_online() and some other locking
issues in ehca scaling code.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:34 -08:00
Hoang-Nam Nguyen
78d8d5f9ef IB/ehca: Rework irq handler
Rework ehca interrupt handling to avoid/reduce missed irq events.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:34 -08:00
Roland Dreier
551fd6122d IPoIB: Only allow root to change between datagram and connected mode
Change the permissions of the "mode" sysfs attribute to be S_IWUSR
instead of S_IWUGO.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:33 -08:00
Roland Dreier
11282b32a4 IB/mthca: Fix allocation of ICM chunks in coherent memory
The change to allow allocating ICM chunks from coherent memory did not
increment the count of sg entries properly, so a chunk that required
more than allocation would not be mapped properly by the HCA.

Fix this by adding the missing increment of chunk->nsg.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:33 -08:00
Dotan Barak
fc89afce34 IB/mthca: Allow the QP state transition RESET->RESET
RESET->RESET is an allowed QP state transition, so mthca should handle
it correctly, by just returning success without involving the firmware.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:32 -08:00
Thomas Gleixner
38515e908b [PATCH] Scheduled removal of SA_xxx interrupt flags fixups
The obsolete SA_xxx interrupt flags have been used despite the scheduled
removal.  Fixup the remaining users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Wim Van Sebroeck <wim@iguana.be>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Cc: Dave Airlie <airlied@linux.ie>
Cc: James Simmons <jsimmons@infradead.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:54 -08:00
Tim Schmielau
cd354f1ae7 [PATCH] remove many unneeded #includes of sched.h
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there.  Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm.  I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:54 -08:00
Linus Torvalds
93bbad8fe1 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mthca: Always fill MTTs from CPU
  IB/mthca: Merge MR and FMR space on 64-bit systems
  IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUs
  IB/mthca: Give reserved MTTs a separate cache line
  IB/mthca: Fix reserved MTTs calculation on mem-free HCAs
  RDMA/cxgb3: Add driver for Chelsio T3 RNIC
  IB: Remove redundant "_wq" from workqueue names
  RDMA/cma: Increment port number after close to avoid re-use
  IB/ehca: Fix memleak on module unloading
  IB/mthca: Work around gcc bug on sparc64
  IPoIB: Connected mode experimental support
  IB/core: Use ARRAY_SIZE macro for mandatory_table
  IB/mthca: Use correct structure size in call to memset()
2007-02-13 21:16:39 -08:00
Michael S. Tsirkin
b2875d4c39 IB/mthca: Always fill MTTs from CPU
Speed up memory registration by filling in MTTs directly when the CPU
can write directly to the whole table (all mem-free cards, and to
Tavor mode on 64-bit systems with the patch I posted earlier).  This
reduces the number of FW commands needed to register an MR by at least
a factor of 2 and speeds up memory registration significantly.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12 16:16:29 -08:00
Michael S. Tsirkin
c20e20ab0f IB/mthca: Merge MR and FMR space on 64-bit systems
For Tavor, we currently reserve separate MPT and MTT space for FMRs to
avoid abusing the vmalloc space on 32 bit kernels. No such problem
exists on 64 bit kernels so let's not do it there.

This way we have a shared pool for MR and FMR resources, used on
demand.  This will also make it possible to write MTTs for regular
regions directly from driver.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12 16:16:29 -08:00
Michael S. Tsirkin
391e4dea71 IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUs
We allocate the MTT table with alloc_pages() and then do pci_map_sg(),
so we must call pci_dma_sync_sg() after the CPU writes to the MTT
table.  This works since the device will never write MTTs on mem-free
HCAs, once we get rid of the use of the WRITE_MTT firmware command.
This change is needed to make that work, and is an improvement for
now, since it gives FMRs a chance at working.

For MPTs, both the device and CPU might write there, so we must
allocate DMA coherent memory for these.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12 16:16:29 -08:00
Michael S. Tsirkin
1d1f19cfce IB/mthca: Give reserved MTTs a separate cache line
MTTs are allocated in non-cache-coherent memory, so we must give
reserved MTTs their own cache line, to prevent both device and
CPU from writing into the same cache line at the same time.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12 16:16:29 -08:00
Michael S. Tsirkin
c7d204e8fd IB/mthca: Fix reserved MTTs calculation on mem-free HCAs
The reserved_mtts field has different meaning in Tavor and Arbel, so
we are wasting mtt entries on memfree. Fix the Arbel case to match
Tavor semantics.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12 16:16:29 -08:00
Steve Wise
b038ced7b3 RDMA/cxgb3: Add driver for Chelsio T3 RNIC
Add an RDMA/iWARP driver for the Chelsio T3 1GbE and 10GbE adapters.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12 16:16:18 -08:00
Arjan van de Ven
2b8693c061 [PATCH] mark struct file_operations const 3
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:45 -08:00
Robert P. J. Day
c376222960 [PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc().
Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
corresponding "kmem_cache_zalloc()" call.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Roland McGrath <roland@redhat.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:27 -08:00
Sean Hefty
c7f743a669 IB: Remove redundant "_wq" from workqueue names
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:50 -08:00
Sean Hefty
aedec08050 RDMA/cma: Increment port number after close to avoid re-use
Randomize the starting port number and avoid re-using port values
immediately after they are closed.  Instead keep track of the last
port value used and increment it every time a new port number is
assigned, to better replicate other port spaces.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:50 -08:00
Akinobu Mita
65e5c02621 IB/ehca: Fix memleak on module unloading
Percpu data is not freed on module unloading.

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christoph Raisch <raisch@de.ibm.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:49 -08:00
David Howells
6bdd61d876 IB/mthca: Work around gcc bug on sparc64
For some reason gcc-3.4.5 on sparc64 does:

 WARNING: "____ilog2_NaN" [drivers/infiniband/hw/mthca/ib_mthca.ko] undefined!

Points to note:

 (1) The asm volatile flush/flushw are just markers for viewing what comes out
     in the assembly; removing them has no effect on the result.

 (2) Changing almost anything else in dwh__mthca_arbel_init_srq_context() or
     dwh__mthca_alloc_srq() causes the problem to go away.

The compiler command line issued by the kernel build is:

/opt/crosstool/gcc-3.4.5-glibc-2.3.6/sparc64-unknown-linux-gnu/bin/sparc64-unknown-linux-gnu-gcc -fno-strict-aliasing -fno-common -Os -m64 -mno-fpu -mcpu=ultrasparc -mcmodel=medlow -ffixed-g4 -ffixed-g5 -fcall-used-g7 -Wa,--undeclared-regs -pg -fno-omit-frame-pointer -fno-optimize-sibling-calls -fasynchronous-unwind-tables -g  -c -o drivers/infiniband/hw/mthca/.tmp_mthca_srq.o drivers/infiniband/hw/mthca/mthca_srq.c

This can be reduced to this whilst still retaining the problem:

/opt/crosstool/gcc-3.4.5-glibc-2.3.6/sparc64-unknown-linux-gnu/bin/sparc64-unknown-linux-gnu-gcc -m64 -c -o drivers/infiniband/hw/mthca/mthca_srq.o drivers/infiniband/hw/mthca/mthca_srq.c -Os

Removing -Os or changing it to -O or -O0 thru -O6 gets rid of the problem.

This patch to the kernel code fixes the problem:

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:49 -08:00
Michael S. Tsirkin
839fcaba35 IPoIB: Connected mode experimental support
The following patch adds experimental support for IPoIB connected
mode, as defined by the draft from the IETF ipoib working group.  The
idea is to increase performance by increasing the MTU from the maximum
of 2K (theoretically 4K) supported by IPoIB on top of UD.  With this
code, I'm able to get 800MByte/sec or more with netperf without
options on a Mellanox 4x back-to-back DDR system.

Some notes on code:
1. SRQ is used for scalability to large cluster sizes
2. Only RC connections are used (UC does not support SRQ now)
3. Retry count is set to 0 since spec draft warns against retries
4. Each connection is used for data transfers in only 1 direction, so
   each connection is either active(TX) or passive (RX).  2 sides that
   want to communicate create 2 connections.
5. Each active (TX) connection has a separate CQ for send completions -
   this keeps the code simple without CQ resize and other tricks
6. To detect stale passive side connections (where the remote side is
   down), we keep an LRU list of passive connections (updated once per
   second per connection) and destroy a connection after it has been
   unused for several seconds. The LRU rule makes it possible to avoid
   scanning connections that have recently been active.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:48 -08:00
Ahmed S. Darwish
9a6b090c0d IB/core: Use ARRAY_SIZE macro for mandatory_table
Use ARRAY_SIZE() macro already defined in kernel.h instead of open
coding equivalent code.

Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:47 -08:00
Roland Dreier
99d4f22e91 IB/mthca: Use correct structure size in call to memset()
When clearing the ib_ah_attr parameter in to_ib_ah_attr(), use sizeof
*ib_ah_attr instead of sizeof *path.

Pointed out by Jack Morgenstein <jackm@mellanox.co.il>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:47 -08:00
Al Viro
b437735645 [PATCH] iscsi endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-09 09:14:07 -08:00
Greg Kroah-Hartman
43cb76d91e Network: convert network devices to use struct device instead of class_device
This lets the network core have the ability to handle suspend/resume
issues, if it wants to.

Thanks to Frederik Deweerdt <frederik.deweerdt@gmail.com> for the arm
driver fixes.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-02-07 10:37:11 -08:00
Hoang-Nam Nguyen
b45bfcc1ae IB/ehca: Remove obsolete prototypes
Remove prototypes for functions that don't exist.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-04 14:11:58 -08:00
Hoang-Nam Nguyen
4c34bdf58c IB/ehca: Remove use of do_mmap()
This patch removes do_mmap() from ehca:
 - Call remap_pfn_range() for hardware register block
 - Use vm_insert_page() to register memory allocated for completion
   queues and queue pairs
 - The actual mmap() call/trigger is now controlled by user space,
   ie. libehca

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-04 14:11:57 -08:00
Steve Wise
1f12667021 RDMA/addr: Handle ethernet neighbour updates during route resolution
The iWARP connection manager uses the ib_addr services to do route
resolution (neighbour discovery in the IP world).  The ib_addr
netevent callback routine, however, currently only acts on InfiniBand
neighbour updates.  It needs to act on ethernet neighbour updates as
well.

This patch just removes filtering on device type altogether and will
trigger on any neighour updates where the nud_type is valid.  This
simplifies the code some.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-04 14:11:57 -08:00
Ishai Rabinovitz
1033ff670d IB/srp: Don't wait for response when QP is in error state.
When there is a call to send_tsk_mgmt SRP posts a send and waits for 5
seconds to get a response.

When the QP is in the error state it is obvious that there will be no
response so it is quite useless to wait.  In fact, the timeout causes
SRP to wait a long time to reconnect when a QP error occurs. (Each
abort and each reset_device calls send_tsk_mgmt, which waits for the
timeout).  The following patch solves this problem by identifying the
failure and returning an immediate error code.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-04 14:11:56 -08:00
Michael S. Tsirkin
062dbb69f3 IB: Return qp pointer as part of ib_wc
struct ib_wc currently only includes the local QP number: this matches
the IB spec, but seems mostly useless. The following patch replaces
this with the pointer to qp itself, and updates all low level drivers
and all users.

This has the following advantages:
- Ability to get a per-qp context through wc->qp->qp_context
- Existing drivers already have the qp pointer ready in poll cq, so
  this change actually saves a tiny bit (extra memory read) on data path
  (for ehca it would actually be expensive to find the QP pointer when
  polling a CQ, but ehca does not support SRQ so we can leave wc->qp as
  NULL for ehca)
- Users that need the QP number can still get it through wc->qp->qp_num

Use case:

In IPoIB connected mode code, I have a common CQ shared by multiple
QPs.  To track connection usage, I need a way to get at some per-QP
context upon the completion, and I would like to avoid allocating
context object per work request just to stick a QP pointer into it.
With this code, I can just use wc->qp->qp_context.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-04 14:11:55 -08:00
Hoang-Nam Nguyen
cea9ea67e9 IB/ehca: Fix mismatched spin_unlock in irq handler
The lock is taken with _irqsave and hence must be released with
_irqrestore on all paths.

Signed-off-by Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-22 17:03:55 -08:00
Hoang-Nam Nguyen
ce29d72cc7 IB/ehca: Fix improper use of yield() with spinlock held
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-22 17:03:55 -08:00
Ishai Rabinovitz
a20f3a6d7e IB/srp: Check match_strdup() return
Checks if the kmalloc in match_strdup() was successful, and bail out
on looking at the token if it failed.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-22 17:03:54 -08:00
Dotan Barak
f5e10529a9 IB/mthca: Don't execute QUERY_QP firmware command for QP in RESET state
If a QP being queried is in the RESET state, don't execute the
QUERY_QP firmware command (because it will fail).

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-09 14:14:28 -08:00
Hoang-Nam Nguyen
f2d9136133 IB/ehca: Use proper GFP_ flags for get_zeroed_page()
Here is a patch for ehca to use proper flag, ie. GFP_ATOMIC
resp. GFP_KERNEL, when calling get_zeroed_page() to prevent "Bug:
scheduling while atomic...". This error does not cause a kernel panic
but makes ipoib un-usable afterwards.  It is reproducible on
2.6.20-rc4 if one does ifconfig down during a flood ping test.  I have
not observed this error in earlier releases incl. 2.6.20-rc1.

This error occurs when a qp event/irq is received and ehca event
handler allocates a control block/page to obtain HCA error data block.
Use of GFP_ATOMIC when in interrupt context prevents this issue.

Signed-off-by Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-09 14:14:24 -08:00
Jack Morgenstein
98714cb161 IB/mthca: Fix PRM compliance problem in atomic-send completions
According to the Tavor and Arbel programmer's reference manuals, the
number of bytes transferred is not provided in the byte_cnt field of
the CQ entry for atomic operation completions.  For atomic operations,
the number of bytes transferred is always 8 (when the status is
"success"), and this constant value should always be used by the
driver in the ib_wc entry returned, rather than using the CQE.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07 20:25:24 -08:00
Sean Hefty
0cefcf0bbc RDMA/ucma: Don't report events with invalid user context
There's a problem with how rdma cm events are reported to userspace
that can lead to application crashes.

When a new connection request arrives, a context for the connection is
allocated in the kernel.  The connection event is then reported to
userspace.  The userspace library retrieves the event and allocates
its own context for the connection.  The userspace context is
associated with the kernel's context when accepting.  This allows the
kernel to give userspace context with other events.

A problem occurs if a second event for the same connection occurs
before the user has had a chance to call accept.  The userspace
context has not yet been set, which causes the librdmacm to crash.
(This has been seen when the app takes too long to call accept,
resulting in the remote side timing out and rejecting the connection)

Fix this by ignoring events for new connections until userspace has
set their context.  This can only happen if an error occurs on a new
connection before the user accepts it.  This is okay, since the accept
will just fail later.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07 20:20:08 -08:00
Sean Hefty
30a5ec982e RDMA/ucma: Fix struct ucma_event leak when backlog is full
We discard new connection requests while the listen backlog is full,
but leak a struct ucma_event in the process.  Free the structure in
this case.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07 20:17:34 -08:00
Steve Wise
881a045fc5 RDMA/iwcm: iWARP connection timeouts shouldn't be reported as rejects
The iWARP CM should report timeouts as event RDMA_CM_EVENT_UNREACHABLE,
not event RDMA_CM_EVENT_REJECTED.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07 20:15:58 -08:00
Erez Zilber
f0938401f2 IB/iser: Return error code when PDUs may not be sent
iSER limits the number of outstanding PDUs to send. When this threshold
is reached, it should return an error code (-ENOBUFS) instead of setting
the suspend_tx bit (which should be used only by libiscsi).

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07 10:15:15 -08:00
Michael S. Tsirkin
46707e96b7 IB/mthca: Fix off-by-one in FMR handling on memfree
mthca_table_find() will return the wrong address when the table entry
being searched for is exactly at the beginning of a sglist entry
(other than the first), because it uses >= when it should use >.

Example: assume we have 2 entries in scatterlist, 4K each, offset is
4K.  The current code will return first entry + 4K when we really want
the second entry.

In particular this means mapping an FMR on a memfree HCA may end up
writing the page table into the wrong place, leading to memory
corruption and also causing the HCA to use an incorrect address
translation table.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-04 19:46:32 -08:00