Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.
The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around. On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).
Where appropriate, an arch may override the generic storage facility and do
something different with the variable. On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.
Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions. Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller. A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.
I've build this code with allyesconfig for x86_64 and i386. I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.
This will affect all archs. Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
struct pt_regs *old_regs = set_irq_regs(regs);
And put the old one back at the end:
set_irq_regs(old_regs);
Don't pass regs through to generic_handle_irq() or __do_IRQ().
In timer_interrupt(), this sort of change will be necessary:
- update_process_times(user_mode(regs));
- profile_tick(CPU_PROFILING, regs);
+ update_process_times(user_mode(get_irq_regs()));
+ profile_tick(CPU_PROFILING);
I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().
Some notes on the interrupt handling in the drivers:
(*) input_dev() is now gone entirely. The regs pointer is no longer stored in
the input_dev struct.
(*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does
something different depending on whether it's been supplied with a regs
pointer or not.
(*) Various IRQ handler function pointers have been moved to type
irq_handler_t.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (217 commits)
net/ieee80211: fix more crypto-related build breakage
[PATCH] Spidernet: add ethtool -S (show statistics)
[NET] GT96100: Delete bitrotting ethernet driver
[PATCH] mv643xx_eth: restrict to 32-bit PPC_MULTIPLATFORM
[PATCH] Cirrus Logic ep93xx ethernet driver
r8169: the MMIO region of the 8167 stands behin BAR#1
e1000, ixgb: Remove pointless wrappers
[PATCH] Remove powerpc specific parts of 3c509 driver
[PATCH] s2io: Switch to pci_get_device
[PATCH] gt96100: move to pci_get_device API
[PATCH] ehea: bugfix for register access functions
[PATCH] e1000 disable device on PCI error
drivers/net/phy/fixed: #if 0 some incomplete code
drivers/net: const-ify ethtool_ops declarations
[PATCH] ethtool: allow const ethtool_ops
[PATCH] sky2: big endian
[PATCH] sky2: fiber support
[PATCH] sky2: tx pause bug fix
drivers/net: Trim trailing whitespace
[PATCH] ehea: IBM eHEA Ethernet Device Driver
...
Manually resolved conflicts in drivers/net/ixgb/ixgb_main.c and
drivers/net/sky2.c related to CHECKSUM_HW/CHECKSUM_PARTIAL changes by
commit 84fa7933a3 that just happened to be
next to unrelated changes in this update.
Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose
checksum still needs to be completed) and CHECKSUM_COMPLETE (for
incoming packets, device supplied full checksum).
Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Improve the firmware selection by adding 2 cases where we should use the
optimized firmware:
* when the actual PCIe link width is lower than 8x.
* when the board is plugged to one of the new Intel PCIe chipsets that
are known to provide aligned PCIe completions.
The patch actually raises two concerns:
* We might want to add a generic PCI function to get the PCIe link width since
some other drivers (at least ipath) do the same. But we probably do not want
to add a new function for every PCIe capability. I will probably look at it
and discuss it on linux-pci in the future.
* As requested during the submission, the PCI ids of chipsets that are known to
provided aligned completion are defined in the myri10ge code. If we keep adding
new ones, it might become better to move them to pciids.h.
But, this sort of quirk to detect these chipsets are very specific to our NIC,
I don't think it is worth moving it to the PCI core until somebody else really
needs it.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Some recent myri10ge firmwares support multicast filtering as well
as an extended mcp_irq_data structure (64 instead of 40 bytes).
The new command MXGEFW_CMD_SET_STATS_DMA_V2 is used to check
whether the firmware support those. mgp->fw_multicast_support
is defined accordingly.
When fw_multicast_support is set, some new multicast filtering
commands is passed to the board in myri10ge_set_multicast_list().
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add msg_enable and use netif_msg_link to enable/disable reporting
of link status changes since some Ethernet switches seem to generate
a lot of status changes under some circumstances and some people
want to avoid useless flooding in the logs.
Also add a counter for link status changes to statistics.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Convert the myri10ge driver to use netdev_alloc_skb instead of dev_alloc_skb,
which requires to propagate the net_device across several functions.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Dummy RDMA are always enabled on device startup since commit
9a71db721a (to work around buggy
PCIe chipsets which do not implement resending properly). But,
we also need to always re-enable them when resuming the device.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
When writing the firmware to the NIC, the FIFO is 256-bytes long,
so we use 256-bytes chunks and a read to wait until the previous
write is done.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Always do a dummy RDMA after loading the firmware to work around
buggy PCIe chipsets which do not implement resending properly.
This is so cheap as to be almost free, and should never have been
conditional on the tx boundary != 4096.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Andrew Morton wrote:
> All these functions return error codes, and we're not checking them. We
> should. So there's a patch which marks all these things as __must_check,
> which causes around 1,500 new warnings.
>
The following patch fixes such a warning in myri10ge.
Check pci_enable_device() return value in myri10ge_resume().
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch adds the wrapper function skb_is_gso which can be used instead
of directly testing skb_shinfo(skb)->gso_size. This makes things a little
nicer and allows us to change the primary key for indicating whether an skb
is GSO (if we ever want to do that).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the IRQ line, the tx_boundary, and whether Write-combining and MSI
are enabled to the list of parameters that are exported to ethtool.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Displaying the interface name when listing the device parameters
at the end of myri10ge_probe is not a good idea since udev might
rename the interface soon afterwards.
Print the bus id instead, using dev_info().
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The workaround for the AER capability of the nVidia chipset has been
removed, we don't need this PCI id anymore. Drop it.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The pm_state field in the myri10ge_priv structure is unused. Drop it.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
going to scale if we add any more segmentation methods (e.g., DCCP). So
let's merge them.
They were used to tell the protocol of a packet. This function has been
subsumed by the new gso_type field. This is essentially a set of netdev
feature bits (shifted by 16 bits) that are required to process a specific
skb. As such it's easy to tell whether a given device can process a GSO
skb: you just have to and the gso_type field and the netdev's features
field.
I've made gso_type a conjunction. The idea is that you have a base type
(e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
For example, if we add a hardware TSO type that supports ECN, they would
declare NETIF_F_TSO | NETIF_F_TSO_ECN. All TSO packets with CWR set would
have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
packets would be SKB_GSO_TCPV4. This means that only the CWR packets need
to be emulated in software.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
First of all it is unnecessary to allocate a new skb in skb_pad since
the existing one is not shared. More importantly, our hard_start_xmit
interface does not allow a new skb to be allocated since that breaks
requeueing.
This patch uses pskb_expand_head to expand the existing skb and linearize
it if needed. Actually, someone should sift through every instance of
skb_pad on a non-linear skb as they do not fit the reasons why this was
originally created.
Incidentally, this fixes a minor bug when the skb is cloned (tcpdump,
TCP, etc.). As it is skb_pad will simply write over a cloned skb. Because
of the position of the write it is unlikely to cause problems but still
it's best if we don't do it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't need to restore the state right after saving it for later recovery
since commit 99dc804d9b (PCI: disable msi mode
in pci_disable_device) now prevents pci_save_state() from disabling MSI.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
We don't need to hardcode the AER capability of the nVidia CK804 chipset
anymore since commit cf34a8e07f (PCI: nVidia
quirk to make AER PCI-E extended capability visible) now makes sure that
this cap will be available to pci_find_ext_capability().
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The following patch updates the myri10ge to 1.0.0, with the following changes:
* Switch to dma_alloc_coherent API.
* Avoid PCI burst when writing the firmware on chipset with unaligned completions.
* Use ethtool_op_set_tx_hw_csum instead of ethtool_op_set_tx_csum.
* Include linux/dma-mapping.h to bring DMA_32/64BIT_MASK on all architectures
(was missing at least on alpha).
* Some typo and warning fixes.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Andrew J. Gallatin <gallatin@myri.com>
drivers/net/myri10ge/myri10ge.c | 57 +++++++++++++++++++-----------
1 file changed, 37 insertions(+), 20 deletions(-)
Signed-off-by: Jeff Garzik <jeff@garzik.org>