* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (48 commits)
IB/srp: Clean up error path in srp_create_target_ib()
IB/srp: Split send and recieve CQs to reduce number of interrupts
RDMA/nes: Add support for KR device id 0x0110
IB/uverbs: Use anon_inodes instead of private infinibandeventfs
IB/core: Fix and clean up ib_ud_header_init()
RDMA/cxgb3: Mark RDMA device with CXIO_ERROR_FATAL when removing
RDMA/cxgb3: Don't allocate the SW queue for user mode CQs
RDMA/cxgb3: Increase the max CQ depth
RDMA/cxgb3: Doorbell overflow avoidance and recovery
IB/core: Pack struct ib_device a little tighter
IB/ucm: Clean whitespace errors
IB/ucm: Increase maximum devices supported
IB/ucm: Use stack variable 'base' in ib_ucm_add_one
IB/ucm: Use stack variable 'devnum' in ib_ucm_add_one
IB/umad: Clean whitespace
IB/umad: Increase maximum devices supported
IB/umad: Use stack variable 'base' in ib_umad_init_port
IB/umad: Use stack variable 'devnum' in ib_umad_init_port
IB/umad: Remove port_table[]
IB/umad: Convert *cdev to cdev in struct ib_umad_port
...
T3 hardware doorbell FIFO overflows can cause application stalls due
to lost doorbell ring events. This has been seen when running large
NP IMB alltoall MPI jobs. The T3 hardware supports an xon/xoff-type
flow control mechanism to help avoid overflowing the HW doorbell FIFO.
This patch uses these interrupts to disable RDMA QP doorbell rings
when we near an overflow condition, and then turn them back on (and
ring all the active QP doorbells) when when the doorbell FIFO empties
out. In addition if an doorbell ring is dropped by the hardware, the
code will now recover.
Design:
cxgb3:
- enable these DB interrupts
- in the interrupt handler, schedule work tasks to call the ULPs event
handlers with the new events.
- ring all the qset txqs when an overflow is detected.
iw_cxgb3:
- disable db ringing on all active qps when we get the DB_FULL event
- enable db ringing on all active qps and ring all active dbs when we get
the DB_EMPTY event
- On DB_DROP event:
- disable db rings in the event handler
- delay-schedule a work task which rings and enables the dbs on
all active qps.
- in post_send and post_recv logic, don't ring the db if it's disabled.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Verify the HW checksum state for frames handed to GRO processing.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace platfrom -> platform.
This is a frequent spelling bug.
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Add memory barriers to fix crashes observed on newest PowerPC platforms.
The HW and driver state of the receive rings were getting out of sync.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
unmap Rx page only when guaranteed that this page won't be
used anymore to allocate rx page chunks.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the rxq# for LRO when processing the last fragment of a
frame. This helps in fast txq selection for routing workloads.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The xmit handler doesn't need to wake the queue after stopping
it temporarily (some other drivers are doing the same).
Patch on net-next-2.6, multiple netperf sessions tested.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch added support of private MAC address per port and provisioning
packet handler for iSCSI traffic only.
The above changes are isolated to the cxgb3 driver, independent of any scsi or iscsi driver changes.
Acked-by: Karen Xie <kxie@chelsio.com>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Rakesh Ranjan <rakesh@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a couple of cases collapse some extra code like:
int retval = NETDEV_TX_OK;
...
return retval;
into
return NETDEV_TX_OK;
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 5e68b772e6
cxgb3: map entire Rx page, feed map+offset to Rx ring.
introduced a regression on platforms defining DECLARE_PCI_UNMAP_ADDR()
and related macros as no-ops.
Rx descriptors are fed with the a page buffer bus address + page chunk offset.
The page buffer bus address is set and retrieved through
pci_unamp_addr_set(), pci_unmap_addr().
These functions being meaningless on x86 (if CONFIG_DMA_API_DEBUG is not set).
The HW ends up with a bogus bus address.
This patch saves the page buffer bus address for all plaftorms.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Followup of commits 9d21493b4b
and 08baf56108
(net: tx scalability works : trans_start)
(net: txq_trans_update() helper)
Now that core network takes care of trans_start updates, dont do it
in drivers themselves, if possible. Multi queue drivers can
avoid one cache miss (on dev->trans_start) in their start_xmit()
handler.
Exceptions are NETIF_F_LLTX drivers (vxge & tehuti)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It turns out that copying a 16-byte area at ~800k times a second
can be really expensive :) This patch redesigns the frags GRO
interface to avoid copying that area twice.
The two disciples of the frags interface have been converted.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
DMA mapping can be expensive in the presence of iommus.
Reduce the Rx iommu activity by mapping an entire page, and provide the H/W
the mapped address + offset of the current page chunk.
Reserve bits at the end of the page to track mapping references, so the page
can be unmapped.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable timestamps, update delayed ack threshold for iSCSI/iWARP traffic
Remove the len flag in Tx requests. It might corrupt offload trace packets.
Update SGE context setup to avoid potential H/W misprogrammation.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Start queue set reclaim timers after the queue sets have been
allocated successfully.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Under RX pressure, The HW might generate a high load of interrupts
to signal mac fifo or free lists overflow.
Disable the interrupts, and poll the relevant status bits
to maintain stats.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Separate TX and RX reclaim handlers
Don't disable interrupts in RX reclaim handler.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elmininate a cache miss when accessing the CPL header within
the first aggregated buffer.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update skb truesize correctly for the 2nd buffer from a Jumbo frame
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Release page chunk reference in case we fail to map it.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ring free lists door bell less frequently,
specifically every quarter of the active FL
size.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The LRO switch is always set to 1 in the rx processing loop.
It breaks the accelerated iSCSI receive traffic.
Fix its computation.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes cxgb3 invoke the GRO hooks instead of LRO. As
GRO has a compatible external interface to LRO this is a very
straightforward replacement.
I've kept the ioctl controls for per-queue LRO switches. However,
we should not encourage anyone to use these.
Because of that, I've also kept the skb construction code in
cxgb3. Hopefully we can phase out those per-queue switches
and then kill this too.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The lro manager's frag_align_pad setting was missing,
leading to misaligned access to the skb passed up
to the stack.
Tested-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I have a system with a Chelsio adapter (driven by cxgb3) whose ports are
part of a Linux bridge. Recently I updated the kernel and discovered
that things stopped working because cxgb3 was doing LRO on packets that
were passed into the bridge code for forwarding. (Incidentally, this
problem manifested itself in a strange way that made debugging a bit
interesting -- for some reason, the skb_warn_if_lro() check in bridge
didn't trigger and these LROed packets were forwarded out a forcedeth
interface, and caused the forcedeth transmit path to get stuck)
This is because cxgb3 has no way of keeping state for the LRO flag until
the interface is brought up, so if the bridging code disables LRO while
the interface is down, then cxgb3_up() will just reenable LRO, and on my
Debian system at least, the init scripts add interfaces to a bridge
before bringing the interfaces up.
Fix this by keeping track of each interface's LRO state in cxgb3 so that
when bridge disables LRO, it stays disabled in cxgb3_up() when the
interface is brought up. I did this by changing the rx_csum_offload
flag into a pair of bit flags; the effect of this on the rx_eth() fast
path is miniscule enough that it should be fine (eg on x86, a cmpb
instruction becomes a testb instruction).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The accelerated iSCSI traffic could use a private IP address unknown to the OS:
- The IP address is required in both drivers to manage ARP requests and connection set up.
- Added an control call to retrieve the ip address.
- Reply to ARP requests dedicated to the private IP address.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Karen Xie <kxie@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement NIC Tx multiqueue.
Bump up driver version.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function is_pure_response() does "ntohl(var) & const" and then
essentially just tests whether the result is 0 or not; this can be done
more efficiently by computing "var & htonl(const)" instead and doing the
byte swap at compile time instead of run time.
This change slightly shrinks the compiled code; eg on x86-64 we save a
couple of bswapl instructions:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-8 (-8)
function old new delta
t3_sge_intr_msix_napi 544 536 -8
and this also has the pleasant side effect of fixing a sparse warning:
drivers/net/cxgb3/sge.c:2313:15: warning: restricted degrades to integer
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add consistency in alloc_ring() parameter checking
to avoid potential memory leaks.
alloc_ring() callers are correct fo far.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The generic packet receive code takes care of setting
netdev->last_rx when necessary, for the sake of the
bonding ARP monitor.
Drivers need not do it any more.
Some cases had to be skipped over because the drivers
were making use of the ->last_rx value themselves.
Signed-off-by: David S. Miller <davem@davemloft.net>
when a fatal error occurs, bring ports down, reset the chip,
and bring ports back up.
Factorize code used for both EEH and fatal error recovery.
Fix timer usage when bringing up/resetting sge queue sets.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A SGE queue set timer might access registers while in EEH recovery,
triggering an EEH error loop. Stop all timers early in EEH process.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The generic lro code checks TCP flags/options.
Remove duplicate tests done in the driver.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:
This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).
I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread). So I
CC'ed this to KVM camp. Comments are appreciated.
A pointer to dma_mapping_ops to struct dev_archdata is added. If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's
NULL, the system-wide dma_ops pointer is used as before.
If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging). It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.
The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations. So x86 can't have dma_mapping_ops per
device. Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.
The first patch adds the device argument to dma_mapping_error. The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.
This patch:
dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations. So we can't have dma_mapping_ops per device.
Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device
argument.
[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>