The Ammasso driver needs to use dma_alloc_coherent() for
allocating memory that will be used by the HW for dma.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The eHCA driver does not compile for a uniprocessor configuration
(CONFIG_SMP=n), due to H_SUCCESS and other symbols being undefined.
This fixes it.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Hoang-Nam Nguyen <HNGUYEN@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We discovered a problem when running IPoIB applications on multiple
CPUs on an Altix system. Many messages such as:
ib_mthca 0002:01:00.0: SQ 000014 full (19941644 head, 19941707 tail, 64 max, 0 nreq)
appear in syslog, and the driver wedges up.
Apparently this is because writes to the doorbells from different CPUs
reach the device out of order. The following patch adds mmiowb() calls
after doorbell rings to ensure the doorbell writes are ordered.
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Don't attempt to set up the diagpkt device in the module init code.
Instead, wait until a piece of hardware is initialized. Fixes a
problem when loading the ib_ipath module when no InfiniPath hardware
is present: modprobe would go into the D state and stay there.
Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch fixes a NULL dereference spotted by the Coverity checker.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
pci_module_init() convertion in amso1100 driver.
Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
All HCAs (not just mem-free) need a spare SRQ entry, so bump srq->max
by 1 in all cases.
Noted by Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Since pr_debug() has changed from a macro to an inline function when
DEBUG is not defined, its arguments now need to be defined even when
debugging is off. Therefore to_event_str() and to_qp_state_str() need
to be moved out of #ifdef DEBUG. The compiler will throw the
definitions away if DEBUG is not defined, but it needs to be able to
see that the functions exist.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fill in "max_vl_num" (encoded according to VLCap field in the PortInfo MAD)
and "init_type_reply" values in the ib_query_port() verb.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The AMSO driver was not thread-safe in the post WR code and had
code that would sleep if the WR post FIFO was full. Since these
functions can be called on interrupt level I changed the sleep to a
udelay.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.
The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around. On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).
Where appropriate, an arch may override the generic storage facility and do
something different with the variable. On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.
Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions. Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller. A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.
I've build this code with allyesconfig for x86_64 and i386. I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.
This will affect all archs. Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
struct pt_regs *old_regs = set_irq_regs(regs);
And put the old one back at the end:
set_irq_regs(old_regs);
Don't pass regs through to generic_handle_irq() or __do_IRQ().
In timer_interrupt(), this sort of change will be necessary:
- update_process_times(user_mode(regs));
- profile_tick(CPU_PROFILING, regs);
+ update_process_times(user_mode(get_irq_regs()));
+ profile_tick(CPU_PROFILING);
I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().
Some notes on the interrupt handling in the drivers:
(*) input_dev() is now gone entirely. The regs pointer is no longer stored in
the input_dev struct.
(*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does
something different depending on whether it's been supplied with a regs
pointer or not.
(*) Various IRQ handler function pointers have been moved to type
irq_handler_t.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
Move the call to ib_register_device() later, since a device should not
be registered until it is completely read to be used. This fixes
crashes that occur if an upper-layer driver such as IPoIB is loaded
before the ehca module.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The PSN used to generate the request following a RDMA read was
incorrect and some state booking wasn't maintained correctly. This
patch fixes that.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
In some places, particularly drivers and __init code, the init utsns is the
appropriate one to use. This patch replaces those with a the init_utsname
helper.
Changes: Removed several uses of init_utsname(). Hope I picked all the
right ones in net/ipv4/ipconfig.c. These are now changed to
utsname() (the per-process namespace utsname) in the previous
patch (2/7)
[akpm@osdl.org: CIFS fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is mostly included for parity with dec_nlink(), where we will have some
more hooks. This one should stay pretty darn straightforward for now.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The resize CQ function changes the memory used to store the queue.
Other routines need to honor the lock before accessing the pointer
to the queue and verify that the head and tail are in range.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This change moves around port assignment so that it happens before any
memory is allocated. This allows memory to be allocated on an appropriate
CPU, which improves performance for users of /dev/ipath.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The EEPROM is read via programmable I/O pins. When the driver
is compiled -Os, the CPU can speculatively read the I/O
value before it is valid. This patch fixes the problem.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We can sometimes trigger parity errors due to processor speculative
reads to our write-combined memory (mostly seen on Woodcrest). Add a
stats counter for these.
Factored out the sendbuffererror buffer cancellation code so it can be
used in the new handling; suppress likely subsequent error messages if
within two jiffies of the cancellation.
Also restore 2 dropped TXE lines on hwe_bitsextant noticed while
debugging.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The system must be powercycled to clear a HT CRC error; reloading the
driver is not enough.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We were passing 0 for base and length, which worked on older kernels,
but it doesn't seem to any longer.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the receiver goes into the error state, we need to flush the
posted receive WQEs.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Prior to this change, the driver was not able to support a HT and PCIE
card simultaneously present in the same machine.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fixed mismatch in linkstate/trainingstate shifts and masks in the
IPATH_IBSTATE_MASK macro. It kept some linktrainingstates
from being printed correctly in debug; no functionality issue unless
I misread the code.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This is required for IB conformance (spec ch. 9.6.1.5).
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Don't allow a write to the eeprom from ipathfs unless the write is exactly
128 bytes and starts at offset 0.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Respond with an error to the SM if our GUID is 0, and don't allow the
user to set our GUID to 0.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This gives upper-level protocols a chance to unregister while the device
is still usable.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This also entailed a little GPIO-interrupt general cleanup.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This allows multiple userspace processes to share a single hardware
context in a master/slave arrangement. It is backwards binary compatible
with existing userspace.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the second allocation failed, the first structure allocated in this
routine was not freed.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The sender requests an ACK every 1/2 MB to avoid retransmit timeouts that
were causing MVAPICH mod_bw to fail after a predictable number of sends.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the allocation of mr fails, then c2_reg_phys_mr() leaks the
page_list array it allocated earlier.
This was Coverity CID #1413.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Another NULL dereference spotted by the Coverity checker (cid #1395):
In case we can't alloc the vq_req, we goto bail1, where we call
vq_req_free(c2dev, vq_req); which then dereferences vq_req.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make sure all 64-bit quantities are cast to unsigned long long
when printed with "%ll" printk formats.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This eliminates the i_blksize field from struct inode. Filesystems that want
to provide a per-inode st_blksize can do so by providing their own getattr
routine instead of using the generic_fillattr() function.
Note that some filesystems were providing pretty much random (and incorrect)
values for i_blksize.
[bunk@stusta.de: cleanup]
[akpm@osdl.org: generic_fillattr() fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patches reduce the size of the VFS inode structure by 28 bytes
on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction
in the inode size on a UP kernel that is configured in a production mode
(i.e., with no spinlock or other debugging functions enabled; if you want to
save memory taken up by in-core inodes, the first thing you should do is
disable the debugging options; they are responsible for a huge amount of bloat
in the VFS inode structure).
This patch:
The filesystem or device-specific pointer in the inode is inside a union,
which is pretty pointless given that all 30+ users of this field have been
using the void pointer. Get rid of the union and rename it to i_private, with
a comment to explain who is allowed to use the void pointer. This is just a
cleanup, but it allows us to reuse the union 'u' for something something where
the union will actually be used.
[judith@osdl.org: powerpc build fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Judith Lebzelter <judith@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
0x08 is the HT capability, while PCI_CAP_ID_HT_IRQCONF would be
the subtype 0x80 that mpic_scan_ht_pic() uses.
Rename PCI_CAP_ID_HT_IRQCONF into PCI_CAP_ID_HT.
And by the way, use it in the ipath driver instead of defining its
own HT_CAPABILITY_ID.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
indirect chains of includes are arch-specific and can't
be relied upon... (hell, even attempt to build it for
itanic would trigger vmalloc.h ones; err.h triggers
on e.g. alpha).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If a QP has separate send and receive CQs, then the send CQ will never
have receive completions from that QP in it. So when cleaning the
send CQ, there's no need to pass in an SRQ pointer, even if the QP is
attached to an SRQ.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Trigger device remove and then add when a catastrophic error is
detected in hardware. This, in turn, will cause a device reset, which
we hope will recover from the catastrophic condition.
Since this might interefere with debugging the root cause, add a
module option to suppress this behaviour.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a driver for the Ammasso 1100 gigabit ethernet RNIC.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Modifications to the existing rdma header files, core files, drivers,
and ulp files to support iWARP, including:
- Hook iWARP CM into the build system and use it in rdma_cm.
- Convert enum ib_node_type to enum rdma_node_type, which includes
the possibility of RDMA_NODE_RNIC, and update everything for this.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove some trailing whitespace that has snuck in despite the best
efforts of whitespace=error-all. Also fix a few other whitespace
bogosities.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Incorrect number of bits was taken for static_rate field.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
port_num was not being returned for unconnected QPs.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When default static rate is returned for Tavor, need to translate it
to an ib rate value.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This stops the generic poll code from waiting for a timeout.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Also added the word "Hardware" after "Fatal" to make it more obvious
that it's hardware, not software.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch only renames files, fixes product names, and updates
comments.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When we first submitted a userspace subnet management agent, it was
rejected, so we left it out of the final driver submission. This patch
removes a number of vestigial references to it.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
A lot of ipath layer code was only called in one place. Now that the
ipath_core and ib_ipath drivers are merged, it's more sensible to simply
inline the simple stuff that the layer code was doing.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There is little point in keeping the two drivers separate, so we are
merging them.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some userlands try to mmap these pages read-write, so accommodate them.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Change comment: no longer imply that user can set ipath_kpiobufs to zero.
Actually set ipath_kpiobufs from parameter. Previously only altered
per-device ipath_lastport_piobuf, which was over-written in chip init.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Allocate enough pointers for all possible ports, to avoid problems in
cleanup/unload.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Ordering of writethrough store buffers needs to be forced, and we need
to use ifdef to get writethrough behavior to InfiniPath buffers, because
there is no generic way to specify that at this time (similar to code
in char/drm/drm_vm.c and block/z2ram.c).
Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Improve performance of userspace post receive, post SRQ receive, and
poll CQ operations for ipath by allowing userspace to directly mmap()
receive queues and completion queues. This eliminates the copying
between userspace and the kernel in the data path.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Pass a struct ib_udata to the low-level driver's ->modify_srq() and
->modify_qp() methods, so that it can get to the device-specific data
passed in by the userspace driver.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a driver for IBM GX bus InfiniBand adapters, which are usable with
some pSeries/System p systems.
Signed-off-by: Heiko J Schick <schickhj.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make kernel use UAR2 instead of UAR1 for hardware access: this adds
sanity checking from the hardware side, without any performance cost.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The SM LID used to send traps to is incorrectly set to port LID. This
is a regression from 2.6.17 -- after a PortInfo MAD is received, no
traps are sent to the SM LID. The traps go to the loopback interface
instead, and are dropped there. The SM LID should be taken from the
sm_lid of the PortInfo response.
The bug was introduced by commit 12bbb2b7be:
IB/mthca: Add client reregister event generation
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
It is supposed to be OK to call mthca_create_ah() and mthca_destroy_ah()
from any context. However, for mem-full HCAs, these functions use the
mthca_alloc() and mthca_free() bitmap helpers, and those helpers use
non-IRQ-safe spin_lock() internally. Lockdep correctly warns that
this could lead to a deadlock. Fix this by changing mthca_alloc() and
mthca_free() to use spin_lock_irqsave().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Update the driver's list of HCA firmware revisions to make sure people
running Sinai firmware older than 1.1.0 get a message suggesting a
firmware upgrade. Update the Arbel versions as well while we are at it.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Leave all SRQ methods out of the device's uverbs_cmd_mask if the
device doesn't have SRQ support (because of ancient firmware) so that
we don't allow userspace to call the driver's create_srq method. This
fixes a userspace-triggerable oops caused by ib_uverbs_create_srq()
following the device's ->create_srq function pointer, which will be
NULL if the device doesn't support SRQs.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When destroying a QP, mthca locks both the QP's send CQ and receive
CQ. However, the following scenario is perfectly valid:
QP_a: send_cq == CQ_x, recv_cq == CQ_y
QP_b: send_cq == CQ_y, recv_cq == CQ_x
The old mthca code simply locked send_cq and then recv_cq, which in
this case could lead to an AB-BA deadlock if QP_a and QP_b were
destroyed simultaneously.
We can fix this by changing the locking code to lock the CQ with the
lower CQ number first, which will create a consistent lock ordering.
Also, the second CQ is locked with spin_lock_nested() to tell lockdep
that we know what we're doing with the lock nesting.
This bug was found by lockdep.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The fence bit needs to be set in the doorbell too, not just the WQE.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Define a constant MTHCA_ARRAY_MASK to replace repeated uses of
(PAGE_SIZE / sizeof (void *) - 1) in mthca array code.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
mthca_array_clear() does not clear the slot if the used count is
positive. This leads to crashes in mthca_qp_event() since that uses
mthca_array_get() to check that the qp is valid.
Discovered by Ali Ayoub.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Read the max_cmds value from the response to the QUERY_FW command
before printing out the value, so that the real value goes into the
debug output.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ipath_skip_sge() doesn't exactly duplicate the side effects of
ipath_copy_sge() if num_sge > 1 since it doesn't decrement ss->num_sge.
This could result in the sg_list being accessed out of bounds.
Since ipath_skip_sge() is almost always called with num_sge == 1,
the original "optimization" is almost never used.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
I am still working on a proposal to remove the phys_to_virt() calls
in the ib_ipath driver. In the mean time, this patch allows SRP
to work by fixing the R_Key check and conversion from IB address
to kernel virtual address. It also returns the correct page size
for FMRs.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch fixes a problem where certain error packets are passed
to the InfiniBand layer for processing even though the packet
actually was received with an error.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Mem-free HCAs always keep one spare SRQ WQE, so the SRQ limit cannot
be set beyond srq->max - 1.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
After recent changes, mthca_wq_init does not actually initialize the WQ as it
used to - it simply resets all index fields to their initial values. So,
let's rename it to mthca_wq_reset.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Acked-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
mthca_ah_query returs the static rate of the address handle in internal mthc
format. fix it to use rate encoding from enum ib_rate, which is what users
expect.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
mthca: initialize send and receive queue locks separately
lockdep identifies a lock by the call site of its initialization. By
initializing the send and receive queue locks in mthca_wq_init() we confuse
lockdep. It warns that that the ordered acquiry of both locks in
mthca_modify_qp() is recursive acquiry of one lock:
=============================================
[ INFO: possible recursive locking detected ]
---------------------------------------------
modprobe/1192 is trying to acquire lock:
(&wq->lock){....}, at: [<f892b4db>] mthca_modify_qp+0x60/0xa7b [ib_mthca]
but task is already holding lock:
(&wq->lock){....}, at: [<f892b4ce>] mthca_modify_qp+0x53/0xa7b [ib_mthca]
Initializing the locks separately in mthca_alloc_qp_common() stops the
warning and will let lockdep enforce proper ordering on paths that acquire
both locks.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove ips namespace from infinipath drivers. This renames ips_common.h to
ipath_common.h. Definitions, data structures, etc. that were not used by
kernel modules have moved to user-only headers. All names including ips have
been renamed to ipath. Some names have had an ipath prefix added.
Signed-off-by: Christian Bell <christian.bell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The receive work queue size should be ignored if the QP is created to use a
shared receive queue according to the IB spec.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We can't tell for sure if any packets are in the infinipath receive buffer
when we shut down a chip port. Normally this is taken care of by orderly
shutdown, but when processes are terminated, or sending process has a bug, we
can continue to receive packets. So rather than writing zero to the address
registers for the closing port, we point it at a dummy memory.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We must increment uaddr by size we are reading or writing, since it's passed
as a char *, not a pointer to the appropriate size.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We do a few more explicit checks for specific models, and now also support the
old PathScale serial number style, or new QLogic style.
This is backwards compatible with previous versions of software and hardware.
That is, older software will see a plausible serial number and correct GUID
when used with a new board, while newer software will correctly handle an
older board.
Signed-off-by: Mike Albaugh <mike.albaugh@qlogic.com>
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This attribute group made it into the original driver, but should not have.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The two arrays only had space for 4 units.
Also changed from ipath_set_sps_lid() to ipath_set_lid(); the sps was
leftover.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch separates QP state used for sending and receiving RC packets so the
processing in the receive interrupt handler can be done mostly without locks
being held. ACK packets are now sent without requiring synchronization with
the send tasklet.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes some problems uncovered during IB compliance testing to
return the right values for error counters returned by the Performance Get
Counters packet.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The tail register read became redundant as the result of earlier receive
interrupt bug fixes.
Drop another unneeded register read.
And another line that got duplicated.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Do an extra check to see if in-memory tail changed while processing packets,
and if so, going back through the loop again (but only once per call to
ipath_kreceive()). In practice, this seems to be enough to guarantee that if
we crossed the clearing of an interrupt at start of ipath_intr with a
scheduled tail register update, that we'll process the "extra" packet that
lost the interrupt because we cleared it just as it was about to arrive.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The problem was that I was updating the head register multiple times in the
rcvhdrq processing loop, and setting the counter on each update. Since that
meant that the tail register was ahead of head for all but the last update, we
would get extra interrupts. The fix was to not write the counter value except
on the last update.
I also changed to update rcvhdrhead and rcvegrindexhead at most every 16
packets, if there were lots of packets in the queue (and of course, on the
last packet, regardless).
I also made some small cleanups while debugging this.
With these changes, xeon/monty typically sees two openib packets per interrupt
on sdp and ipoib, opteron/monty is about 1.25 pkts/intr.
I'm seeing about 3800 Mbit/s monty/xeon, and 5000-5100 opteron/monty with
netperf sdp. Netpipe doesn't show as good as that, peaking at about 4400 on
opteron/monty sdp. Plain ipoib xeon is about 2100+ netperf, opteron 2900+, at
128KB
Signed-off-by: olson@eng-12.pathscale.com
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Also count the number of interrupts where that works (fastrcvint). On any
interrupt where the port0 head and tail registers are not equal, just call the
ipath_kreceive code without reading the interrupt status, thus saving the
approximately 0.25usec processor stall waiting for the read to return. If any
other interrupt bits are set, or head==tail, take the normal path, but that
has been reordered to handle read ahead of pioavail. Also no longer call
ipath_kreceive() from ipath_qcheck(), because that just seems to make things
worse, and isn't really buying us anything, these days.
Also no longer loop in ipath_kreceive(); better to not hold things off too
long (I saw many cases where we would loop 4-8 times, and handle thousands (up
to 3500) in a single call).
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Made in-memory rcvhdrq tail update be in dma_alloc'ed memory, not random user
or special kernel (needed for ppc, also "just the right thing to do").
Some cleanups to make unexpected link transitions less likely to produce
complaints about packet errors, and also to not leave SMA packets stuck and
unable to go out.
A few other random debug and comment cleanups.
Always init rcvhdrq head/tail registers to 0, to avoid race conditions (should
have been that way some time ago).
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is not a DMA target, so no need to use dma_alloc_coherent on it.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This helps us to survive better when memory is fragmented.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These limits are somewhat artificial in that we don't actually have any
device limits. However, the verbs layer expects that such limits exist
and are enforced, so we make up arbitrary (but sensible) limits.
Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is no longer a /dev/ipath_diag file; instead, there's
/dev/ipath_diag0, 1, etc.
It's still not possible to have diags run on more than one unit at a time,
but that's easy to fix at some point.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is needed if we wish to change the size of the resource structures.
Based on an original patch from Vivek Goyal <vgoyal@in.ibm.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.
The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).
The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.
This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.
The patch also makes the following changes:
(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little.
(*) If one of the convenience function is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount. This will
always return 0, and so can be tail-called from get_sb().
(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().
This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.
However, with the way NFS superblock sharing are currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.
[*] Anonymous until discovered from another tree.
(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.
[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Documentation/infiniband/core_locking.txt says:
All of the methods in struct ib_device exported by a low-level
driver must be fully reentrant. The low-level driver is required to
perform all synchronization necessary to maintain consistency, even
if multiple function calls using the same object are run
simultaneously.
However, mthca's modify_qp, modify_srq and resize_cq methods are
currently not reentrant. Add a mutex to the QP, SRQ and CQ structures
so that these calls can be properly serialized.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some error paths after the mthca_alloc_mailbox() call in mthca_modify_qp()
just do a "return -EINVAL" without freeing the mailbox. Convert these
returns to "goto out" to avoid leaking the mailbox storage.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Report the true max_map_per_fmr value from mthca_query_device(),
taking into account the change in FMR remapping introduced by the
Sinai performance optimization.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Change the mthca snoop of MADs that set PortInfo to check if the SM
has set the client reregister bit, and if it has, generate a client
reregister event. If the bit is not set, just generate a LID change
event as usual.
Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Move ipath's struct port_info into <rdma/ib_smi.h>, so that it can be
used by mthca to implement client reregister support.
Remove the __attribute__((packed)) because all the members of the struct
are naturally aligned anyway.
Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The kernel has had wait_for_completion_timeout() for a long time now.
mthca should use it to handle FW commands timing out, instead of
implementing the same thing in a much more complicated way by using
wait_for_completion() along with a timer that does complete().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Memfree firmware is in rare cases reporting WQE index == base - 1 in
receive completion with error, instead of (rq size - 1); base is 0 in
mthca. Here is a patch to avoid kernel crash and report a correct WR
id in this case.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
mthca does not restore the following PCI-X/PCI Express registers after reset:
PCI-X device: PCI-X command register
PCI-X bridge: upstream and downstream split transaction registers
PCI Express : PCI Express device control and link control registers
This causes instability and/or bad performance on systems where one of
these registers is set to a non-default value by BIOS.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If we post a list of length exactly a multiple of 256, nreq in
doorbell gets set to 256 which is wrong: it should be encoded by 0.
This is because we only zero it out on the next WR, which may not be
there. The solution is to ring the doorbell after posting a WQE, not
before posting the next one.
This is the same bug that we just fixed for QPs with non-shared RQ.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
At this point, the core QP structure hasn't been initialized, so what's
in there isn't valid. Get the same information elsewhere.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The problem was that node A's sending thread, which handles sending RDMA
read response data, would write the trigger word, the last packet would
be sent, node B would send a new RDMA read request, node A's interrupt
handler would initialize s_rdma_sge, then node A's sending thread would
update s_rdma_sge. This didn't happen very often naturally but was more
frequent with 1 byte RDMA reads. Rather than adding more locking or
increasing the QP structure size and copying sge data, I modified the
copy routine to update the pointers before writing the trigger word to
avoid the update race.
Signed-off-by: Ralph Campbell <ralphc@pathscale.com>
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fixed so it works on the PE-800. It had not previously been updated to
match PE-800 receive interrupt differences from HT-400.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This is required for even semi-decent performance on OpenIB.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix NULL deref due to pcidev being clobbered before dd->ipath_f_cleanup()
was called.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix the interface version that gets exported to userspace.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make sure modify_qp won't modify the QP if any of the changes failed.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The local loopback path for RC can lock the rkey table lock without
blocking interrupts. The receive interrupt path can then call
ipath_rkey_ok() and deadlock. Remove the redundant lock.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If we post a list of length 256 exactly, nreq in doorbell gets set to
256 which is wrong: it should be encoded by 0. This is because we
only zero it out on the next WR, which may not be there. The solution
is to ring the doorbell after posting a WQE, not before posting the
next one.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Setting fw_cmd_doorbell allows FW command to be queued using posted
writes instead of requiring polling on a "go" bit, so it should be a
performance boost. However, the option causes problems with at least
some device/firmware combinations, so set the default to 0 until we
understand what's going on better.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The ipath driver's table of PCI IDs needs a { 0, } entry at the end.
This makes all of the device aliases visible to userspace so hotplug
loads the module for all supported devices. Without the patch,
modinfo ipath_core only shows:
alias: pci:v00001FC1d0000000Dsv*sd*bc*sc*i*
instead of the correct:
alias: pci:v00001FC1d00000010sv*sd*bc*sc*i*
alias: pci:v00001FC1d0000000Dsv*sd*bc*sc*i*
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Addresses for ioremap must be calculated off of pci_resource_start;
we can't directly use the bus address as seen by the HCA. Fix the
code that remaps device memory for FMR access.
Based on patch by Klaus Smolin.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix races in in destroying various objects. If a destroy routine
waits for an object to become free by doing
wait_event(&obj->wait, !atomic_read(&obj->refcount));
/* now clean up and destroy the object */
and another place drops a reference to the object by doing
if (atomic_dec_and_test(&obj->refcount))
wake_up(&obj->wait);
then this is susceptible to a race where the wait_event() and final
freeing of the object occur between the atomic_dec_and_test() and the
wake_up(). And this is a use-after-free, since wake_up() will be
called on part of the already-freed object.
Fix this in mthca by replacing the atomic_t refcounts with plain old
integers protected by a spinlock. This makes it possible to do the
decrement of the reference count and the wake_up() so that it appears
as a single atomic operation to the code waiting on the wait queue.
While touching this code, also simplify mthca_cq_clean(): the CQ being
cleaned cannot go away, because it still has a QP attached to it. So
there's no reason to be paranoid and look up the CQ by number; it's
perfectly safe to use the pointer that the callers already have.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Names that are the opposite of their intended meanings are not so helpful.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The reset code now turns off the PRESENT flag during a reset, so that
other code won't attempt to access a device that's in mid-reset.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remember when the verbs layer unregisters from the lower-level code.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Different ipath hardware types have different numbers of buffers
available, so we decide on the counts ourselves unless we are specifically
overridden with a module parameter.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some systems do not set up 64-bit maps on systems with 2GB or less of
memory installed, so we have to fall back to trying a 32-bit setup.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We were accidentally exposing the "reset" sysfs file more than once
per device.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
GuidInfo records have 8 byte GUIDs in them, so an index should be
multiplied by 8 to get an offset. mthca_query_gid() was incorrectly
multiplying by 16.
Noticed by Leonid Keller <leonid@mellanox.co.il>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch makes the needlessly global mthca_update_rate() static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The driver allocates SRQ WQEs size with a power of 2 size both for
Tavor and for memfree. For Tavor, however, the hardware only requires
the WQE size to be a multiple of 16, not a power of 2, and the max
number of scatter-gather allowed is reported accordingly by the
firmware (and this is the value currently returned by
ib_query_device() and ibv_query_device()).
If the max number of scatter/gather entries reported by the FW is used
when creating an SRQ, the creation will fail for Tavor, since the
required WQE size will be increased to the next power of 2, which
turns out to be larger than the device permitted max WQE size (which
is not a power of 2).
This patch reduces the reported SRQ max wqe size so that it can be used
successfully in creating an SRQ on Tavor HCAs.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The PCI spec recommends against drivers playing with a device's PCI
read burst size, and says that systems software should configure it.
And we actually have users that report that changing it from the
default set by BIOS hurts performance and/or stability for them. On
the other hand, the Mellanox Programmer's Reference Manual recommends
turning it up all the way to the maximum value. Some tests conducted
here in the lab do not show performance improvement from this tuning,
but this might be just me.
As a work-around, make this tuning an option, off by default (safe
value), with an eye towards removing it completely one day if no one
complains.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Push translation of static rate to HCA format into low-level drivers,
where it belongs. For static rate encoding, use encoding of rate
field from IB standard PathRecord, with addition of value 0, for
backwards compatibility with current usage. The changes are:
- Add enum ib_rate to midlayer includes.
- Get rid of static rate translation in IPoIB; just use static rate
directly from Path and MulticastGroup records.
- Update mthca driver to translate absolute static rate into the
format used by hardware. This also fixes mthca's static rate
handling for HCAs that are capable of 4X DDR.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Change the mthca debugging trace output code so that it can enabled
and disabled at runtime with the debug_level module parameter in
sysfs. Also, don't allow CONFIG_INFINIBAND_MTHCA_DEBUG to be disabled
unless CONFIG_EMBEDDED is selected. We want users (and especially
distros) to have this turned on unless they really need to save space,
because by the time we want debugging output, it's usually too late to
rebuild a kernel.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Integrate the ipath core and OpenIB drivers into the kernel build
infrastructure. Add entry to MAINTAINERS.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The ipath_verbs.c file implements the driver-specific components of the
kernel's Infiniband verbs layer.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Completion queues, local and remote memory keys, and memory region
support.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This is an implementation of the Infiniband RC ("reliable connection")
protocol.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
These files implement the Infiniband UC ("unreliable connection") and UD
("unreliable datagram") protocols.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
These header files are used by the layered Infiniband driver.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The layering interfaces are used to implement the Infiniband protocols
and the ethernet emulation driver.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
These files introduce a char device that userspace apps use to gain
direct memory-mapped access to the InfiniPath hardware, and routines for
pinning and unpinning user memory in cases where the hardware needs to
DMA into the user address space.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The ipathfs filesystem contains files that are not appropriate for
sysfs, because they contain binary data. The hierarchy is simple; the
top-level directory contains driver-wide attribute files, while numbered
subdirectories contain per-device attribute files.
Our userspace code currently expects this filesystem to be mounted on
/ipathfs, but a final location has not yet been chosen.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
EEPROM support, interrupt handling, statistics gathering, and write
combining management for x86_64.
A note regarding i2c: The Atmel EEPROM hardware we use looks like an
i2c device electrically, but is not i2c compliant at all from a
functional perspective. We tried using the kernel's i2c support to
talk to it, but failed.
Normal i2c devices have a single 7-bit or 10-bit i2c address that they
respond to. Valid 7-bit addresses range from 0x03 to 0x77. Addresses
0x00 to 0x02 and 0x78 to 0x7F are special reserved addresses
(e.g. 0x00 is the "general call" address.) The Atmel device, on the
other hand, responds to ALL addresses. It's designed to be the only
device on a given i2c bus. A given i2c device address corresponds to
the memory address within the i2c device itself.
At least one reason why the linux core i2c stuff won't work for this
is that it prohibits access to reserved addresses like 0x00, which are
really valid addresses on the Atmel devices.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ipath_init_chip.c sets up an InfiniPath device for use.
ipath_diag.c permits userspace diagnostic tools to read and write a
chip's registers. It is different in purpose from the mmap interfaces
to the /sys/bus/pci resource files.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>