Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.
The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around. On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).
Where appropriate, an arch may override the generic storage facility and do
something different with the variable. On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.
Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions. Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller. A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.
I've build this code with allyesconfig for x86_64 and i386. I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.
This will affect all archs. Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
struct pt_regs *old_regs = set_irq_regs(regs);
And put the old one back at the end:
set_irq_regs(old_regs);
Don't pass regs through to generic_handle_irq() or __do_IRQ().
In timer_interrupt(), this sort of change will be necessary:
- update_process_times(user_mode(regs));
- profile_tick(CPU_PROFILING, regs);
+ update_process_times(user_mode(get_irq_regs()));
+ profile_tick(CPU_PROFILING);
I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().
Some notes on the interrupt handling in the drivers:
(*) input_dev() is now gone entirely. The regs pointer is no longer stored in
the input_dev struct.
(*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does
something different depending on whether it's been supplied with a regs
pointer or not.
(*) Various IRQ handler function pointers have been moved to type
irq_handler_t.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
MSI is defined to be 32-bit write. The 5706 does 64-bit MSI writes
with byte enables disabled on the unused 32-bit word. This is legal
but causes problems on the AMD 8132 which will eventually stop
responding after a while.
Without this patch, the MSI test done by the driver during open will
pass, but MSI will eventually stop working after a few MSIs are
written by the device.
AMD believes this incompatibility is unique to the 5706, and
prefers to locally disable MSI rather than globally disabling it
using pci_msi_quirk.
Update version to 1.4.45.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (217 commits)
net/ieee80211: fix more crypto-related build breakage
[PATCH] Spidernet: add ethtool -S (show statistics)
[NET] GT96100: Delete bitrotting ethernet driver
[PATCH] mv643xx_eth: restrict to 32-bit PPC_MULTIPLATFORM
[PATCH] Cirrus Logic ep93xx ethernet driver
r8169: the MMIO region of the 8167 stands behin BAR#1
e1000, ixgb: Remove pointless wrappers
[PATCH] Remove powerpc specific parts of 3c509 driver
[PATCH] s2io: Switch to pci_get_device
[PATCH] gt96100: move to pci_get_device API
[PATCH] ehea: bugfix for register access functions
[PATCH] e1000 disable device on PCI error
drivers/net/phy/fixed: #if 0 some incomplete code
drivers/net: const-ify ethtool_ops declarations
[PATCH] ethtool: allow const ethtool_ops
[PATCH] sky2: big endian
[PATCH] sky2: fiber support
[PATCH] sky2: tx pause bug fix
drivers/net: Trim trailing whitespace
[PATCH] ehea: IBM eHEA Ethernet Device Driver
...
Manually resolved conflicts in drivers/net/ixgb/ixgb_main.c and
drivers/net/sky2.c related to CHECKSUM_HW/CHECKSUM_PARTIAL changes by
commit 84fa7933a3 that just happened to be
next to unrelated changes in this update.
Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose
checksum still needs to be completed) and CHECKSUM_COMPLETE (for
incoming packets, device supplied full checksum).
Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Convert dev_alloc_skb() to netdev_alloc_skb() and increase default
rx ring size to 255. The old ring size of 100 was too small.
Update version to 1.4.44.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a subtle race condition between bnx2_start_xmit() and bnx2_tx_int()
similar to the one in tg3 discovered by Herbert Xu:
CPU0 CPU1
bnx2_start_xmit()
if (tx_ring_full) {
tx_lock
bnx2_tx()
if (!netif_queue_stopped)
netif_stop_queue()
if (!tx_ring_full)
update_tx_ring
netif_wake_queue()
tx_unlock
}
Even though tx_ring is updated before the if statement in bnx2_tx_int() in
program order, it can be re-ordered by the CPU as shown above. This
scenario can cause the tx queue to be stopped forever if bnx2_tx_int() has
just freed up the entire tx_ring. The possibility of this happening
should be very rare though.
The following changes are made, very much identical to the tg3 fix:
1. Add memory barrier to fix the above race condition.
2. Eliminate the private tx_lock altogether and rely solely on
netif_tx_lock. This eliminates one spinlock in bnx2_start_xmit()
when the ring is full.
3. Because of 2, use netif_tx_lock in bnx2_tx_int() before calling
netif_wake_queue().
4. Add memory barrier to bnx2_tx_avail().
5. Add bp->tx_wake_thresh which is set to half the tx ring size.
6. Check for the full wake queue condition before getting
netif_tx_lock in tg3_tx(). This reduces the number of unnecessary
spinlocks when the tx ring is full in a steady-state condition.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the wrapper function skb_is_gso which can be used instead
of directly testing skb_shinfo(skb)->gso_size. This makes things a little
nicer and allows us to change the primary key for indicating whether an skb
is GSO (if we ever want to do that).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
For messages prior to register_netdev(), prefer dev_printk() because
that prints out both our driver name and our [PCI | whatever] bus id.
Updates: 8139{cp,too}, b44, bnx2, cassini, {eepro,epic}100, fealnx,
hamachi, ne2k-pci, ns83820, pci-skeleton, r8169.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Minor change in shutdown logic to effect a link down.
Update version to 1.4.43.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change all dev_kfree_skb_irq() and dev_kfree_skb_any() to
dev_kfree_skb(). These calls are never used in irq context.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add NETIF_F_TSO_ECN feature for all bnx2 hardware.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
going to scale if we add any more segmentation methods (e.g., DCCP). So
let's merge them.
They were used to tell the protocol of a packet. This function has been
subsumed by the new gso_type field. This is essentially a set of netdev
feature bits (shifted by 16 bits) that are required to process a specific
skb. As such it's easy to tell whether a given device can process a GSO
skb: you just have to and the gso_type field and the netdev's features
field.
I've made gso_type a conjunction. The idea is that you have a base type
(e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
For example, if we add a hardware TSO type that supports ECN, they would
declare NETIF_F_TSO | NETIF_F_TSO_ECN. All TSO packets with CWR set would
have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
packets would be SKB_GSO_TCPV4. This means that only the CWR packets need
to be emulated in software.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use CPU native page size to determine various ring sizes. This allows
order-0 memory allocations on all systems.
Added check to limit the page size to 16K since that's the maximum rx
ring size that will be used. This will prevent using unnecessarily
large page sizes on some architectures with large page sizes.
[Suggested by David Miller]
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add functions to decompress firmware before loading to the internal
CPUs. Compressing the firmware reduces the driver size significantly.
Added file name length sanity check in the gzip header to prevent
going past the end of buffer [suggested by DaveM].
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow WOL settings on 5708 B2 and newer chips that have the problem
fixed.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Various drivers use xmit_lock internally to synchronise with their
transmission routines. They do so without setting xmit_lock_owner.
This is fine as long as netpoll is not in use.
With netpoll it is possible for deadlocks to occur if xmit_lock_owner
isn't set. This is because if a printk occurs while xmit_lock is held
and xmit_lock_owner is not set can cause netpoll to attempt to take
xmit_lock recursively.
While it is possible to resolve this by getting netpoll to use
trylock, it is suboptimal because netpoll's sole objective is to
maximise the chance of getting the printk out on the wire. So
delaying or dropping the message is to be avoided as much as possible.
So the only alternative is to always set xmit_lock_owner. The
following patch does this by introducing the netif_tx_lock family of
functions that take care of setting/unsetting xmit_lock_owner.
I renamed xmit_lock to _xmit_lock to indicate that it should not be
used directly. I didn't provide irq versions of the netif_tx_lock
functions since xmit_lock is meant to be a BH-disabling lock.
This is pretty much a straight text substitution except for a small
bug fix in winbond. It currently uses
netif_stop_queue/spin_unlock_wait to stop transmission. This is
unsafe as an IRQ can potentially wake up the queue. So it is safer to
use netif_tx_disable.
The hamradio bits used spin_lock_irq but it is unnecessary as
xmit_lock must never be taken in an IRQ handler.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use kmalloc() instead of a local array in bnx2_nvram_write().
Update version to 1.4.40.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a bug in bnx2_nvram_write() caused by a counter variable not
correctly incremented by 4.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If CONFIG_HOTPLUG=n, gcc doesn't like some __initdata to be const (rodata)
and other __initdata not const, so make the non-const __initdata const.
gcc errors:
drivers/net/bnx2.c:66: error: version causes a section type conflict
drivers/net/starfire.c:338: error: version causes a section type conflict
drivers/net/typhoon.c:137: error: version causes a section type conflict
drivers/net/natsemi.c:241: error: version causes a section type conflict
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Put the tx producer and consumer fields in separate cache lines in
the device structure, similar to the VJ net channel queue structure.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Combine two small (56 byte and 320 byte) pci consistent memory
allocations into one allocation. Jeff Garzik suggested to store
the combined size in the bp structure for later use when freeing
the memory.
Use kzalloc() instead of kmalloc() + memset().
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix some link-related problems by doing a coalesce_now after link
change interrupt to flush out the transient link status.
To facilitate this, the host coalesce cmd register value is cached in
the device structure.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Include <linux/vmalloc.h> so that it compiles properly on all archs.
Update version to 1.4.38.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update version to 1.4.37.
Add missing flush_scheduled_work() in bnx2_suspend as noted by Jeff
Garzik.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support bigger rx ring sizes (up to 1020) in the rx fast path.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Increase maximum receive ring size from 255 to 1020 by supporting
up to 4 linked pages of receive descriptors. To accomodate the
higher memory usage, each physical descriptor page is allocated
separately and the software ring that keeps track of the SKBs and the
DMA addresses is allocated using vmalloc.
Some of the receive-related fields in the bp structure are re-
organized a bit for better locality of reference.
The max. was reduced to 1020 from 4080 after discussion with David
Miller.
This patch contains ring init code changes only. This next patch
contains rx data path code changes.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the rx code path that does not handle the full rx ring correctly.
When the rx ring is set to the max. size (i.e. 255), the consumer and
producer indices will be the same when completing an rx packet. Fix
the rx code to handle this condition properly.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate some of the registers in ethtool register test to reduce
driver size.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update version to 1.4.31 and add 2006 copyright.
Skip the last digit when reporting the firmware version.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enhance the ethtool loopback test with PHY loopback test.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add workaround for a hardware interrupt issue. When using INTA,
unmasking of the interrupt and the tag update should be done
separately to avoid some spurious interrupts,
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix TCP/UDP checksum verification. Use status bits in the buffer
descriptor instead of the checksum value to verify rx checksum.
Using the checksum value will be incorrect if the UDP packet has
zero in the UDP checksum field.
Firmware update required for this fix.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some misc. fixes for WoL, 5708 B1, and a typo '=' instead of '=='.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Improve handshake with bootcode with the following changes:
1. Increase timeout to 100msec and use msleep instead of udelay.
2. Add more error checking for timeouts and errors.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Always set up the device to strip incoming VLAN tags when ASF is
enabled. ASF firmware will not parse packets correctly if VLAN tags
are not stripped.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the superfluous parameter checking in bnx2_{get,set}_eeprom.
The parameters are already validated in ethtool_{get,set}_eeprom.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check return of dev_alloc_skb in bnx2_test_loopback, and handle
appropriately.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Output driver name as prefix to "Unknown flash/EEPROM type." message.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refine bnx2_poll() logic to write back the most up-to-date status tag
when all work has been processed. This eliminates some occasional
extra interrupts when a older status tag is written even though all
work has been processed.
The idea is to read the status tag just before exiting bnx2_poll() and
then check again for any new work. If no new work is pending, the
status tag written back will not generate any extra interrupt. This
logic is similar to the changes David Miller did to tg3_poll().
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Dynamically determine the shared memory location where eeprom
parameters are stored instead of using a fixed location.
Add speed reporting to management firmware. This allows management
firmware to know the current speed without contending for MII
registers.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Update bnx2 nvram code with support for 5708.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add 5708 copper and serdes basic support, including 2.5 Gbps support
on 5708 serdes. SPEED_2500 is also added to ethtool.h
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Fix bug in bnx2_interrupt() that caused an unnecessary register read.
The BNX2_PCICFG_MISC_STATUS should only be read when the status tag
has not changed.
Add prefetch of the status block in bnx2_msi() similar to tg3_msi().
The status block is not touched in bnx2_msi() and prefetching it will
speed up bnx2_poll() that will run on the same CPU that received the
MSI.
Update version.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix remaining bits of u32 vs. pm_message confusion. Should not break
anything.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Update version and add 4 minor fixes, the last 2 were suggested by
Jeff Garzik:
1. check for a valid ethernet address before setting it
2. zero out bp->regview if init_one encounters an error and unmaps
the IO address. This prevents remove_one from unmapping again.
3. use netif_rx_schedule() instead of hand coding the same.
4. use IRQ_HANDLED and IRQ_NONE.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change all locks from spin_lock_irqsave() to spin_lock_bh(). All
places that require spinlocks are in BH context.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove atomic operations in the fast tx path. Expensive atomic
operations were used to keep track of the number of available tx
descriptors. The new code uses the difference between the consumer
and producer index to determine the number of free tx descriptors.
As suggested by Jeff Garzik, the name of the inline function is
changed to all lower case.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This speeds up link-up time on 5706 SerDes if the link partner does
not autoneg, a rather common scenario in blade servers. Some blade
servers use IPMI for keyboard input and it's important to minimize
link disruptions.
The speedup is achieved by shortening the timer to (HZ / 3) during
the transient period right after initiating a SerDes autoneg. If
autoneg does not complete, parallel detect can be done sooner. After
the transient period is over, the timer goes back to its normal HZ
interval.
As suggested by Jeff Garzik, the timer initialization is moved to
bnx2_init_board() from bnx2_open().
An eeprom bit is also added to allow default forced SerDes speed for
even faster link-up time.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes an rtnl deadlock problem when flush_scheduled_work() is
called from bnx2_close(). In rare cases, linkwatch_event() may be on
the workqueue from a previous close of a different device and it will
try to get the rtnl lock which is already held by dev_close().
The fix is to set a flag if we are in the reset task which is run
from the workqueue. bnx2_close() will loop until the flag is cleared.
As suggested by Jeff Garzik, the loop is changed to call msleep(1)
instead of yield() in the original patch.
flush_scheduled_work() is also moved to bnx2_remove_one() before the
netdev is freed.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch contains the following possible cleanups/fixes:
- use C99 struct initializers
- make a few arrays and structs static
- remove a few uses of literal 0 as NULL pointer
- use convenience function instead of cast+dereference in bnx2_ioctl()
- remove superfluous casts to u8 * in calls to readl/writel
Signed-off-by: Peter Hagervall <hager@cs.umu.se>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A new driver bnx2 for Broadcom bcm5706 is available.
The patch also includes new 1000BASE-X advertisement bit definitions in
mii.h
Thanks to David Miller and Jeff Garzik for reviewing and their valuable
feedback.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>