The iseries_veth driver tells sysfs that it's called 'iseries_veth', but if
you ask it via ethtool it thinks it's called 'veth'. I think this comes from
2.4 when the driver was called 'veth', but it's definitely called
'iseries_veth' now, so fix it.
To make sure we don't do it again define DRV_NAME and use it everywhere.
While we're at it, change the version number to 2.0, to reflect the changes
made in this patch series.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Having merged iseries_veth.h, let's remove some of the studly caps that came
with it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
iseries_veth.h is only used by iseries_veth.c, so merge the former into
the latter.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Also to aid debugging, add sysfs support for iseries_veth's port structures.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
To aid in field debugging, add sysfs support for iseries_veth's connection
structures. At the moment this is all read-only, however we could think about
adding write support for some attributes in future.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
There's a number of problems with the way iseries_veth counts TX errors.
Firstly it counts conditions which aren't really errors as TX errors. This
includes if we don't have a connection struct for the other LPAR, or if the
other LPAR is currently down (or just doesn't want to talk to us). Neither
of these should count as TX errors.
Secondly, it counts one TX error for each LPAR that fails to accept the packet.
This can lead to TX error counts higher than the total number of packets sent
through the interface. This is confusing for users.
This patch fixes that behaviour. The non-error conditions are no longer
counted, and we introduce a new and I think saner meaning to the TX counts.
If a packet is successfully transmitted to any LPAR then it is transmitted
and tx_packets is incremented by 1.
If there is an error transmitting a packet to any LPAR then that is counted
as one error, ie. tx_errors is incremented by 1.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The iseries_veth driver often has multiple netdevices sending packets over
a single connection to another LPAR. If the bandwidth to the other LPAR is
exceeded, all the netdevices must have their queues stopped.
The current code achieves this by queueing one incoming skb on the
per-netdevice port structure. When the connection is able to send more packets
we iterate through the port structs and flush any packet that is queued,
as well as restarting the associated netdevice's queue.
This arrangement makes less sense now that we have per-connection TX timers,
rather than the per-netdevice generic TX timer.
The new code simply detects when one of the connections is full, and stops
the queue of all associated netdevices. Then when a packet is acked on that
connection (ie. there is space again) all the queues are woken up.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Currently the iseries_veth driver contravenes the specification in
Documentation/networking/driver.txt, in that if packets are not acked by
the other LPAR they will sit around forever.
This patch adds a per-connection timer which fires if we've had no acks for
five seconds. This is superior to the generic TX timer because it catches
the case of a small number of packets being sent and never acked.
This fixes a bug we were seeing on real systems, where some IPv6 neighbour
discovery packets would not be acked and then prevent the module from being
removed, due to skbs lying around.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The iseries_veth driver uses the generic TX timeout watchdog, however a better
solution is in the works, so remove this code.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The iseries_veth driver can attach to multiple vlans, which correspond to
multiple net devices. However there is only 1 connection between each LPAR,
so the connection structure may be shared by multiple net devices.
This makes module removal messy, because we can't deallocate the connections
until we know there are no net devices still using them. The solution is to
use ref counts on the connections, so we can delete them (actually stop) as
soon as the ref count hits zero.
This patch fixes (part of) a bug we were seeing with IPv6 sending probes to
a dead LPAR, which would then hang us forever due to leftover skbs.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This patch makes veth_init_connection() and veth_destroy_connection()
symmetrical in that they allocate/deallocate the same data.
Currently if there's an error while initialising connections (ie. ENOMEM)
we call veth_module_cleanup(), however this will oops because we call
driver_unregister() before we've called driver_register(). I've never seen
this actually happen though.
So instead we explicitly call veth_destroy_connection() for each connection,
any that have been set up will be deallocated.
We also fix a potential leak if vio_register_driver() fails.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The iseries_veth driver unconditionally calls dma_unmap_single() even
when the corresponding dma_map_single() may have failed.
Rework the code a bit to keep the return value from dma_unmap_single()
around, and then check if it's a dma_mapping_error() before we do
the dma_unmap_single().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The iseries_veth driver uses atomic ops to manipulate the in_use field of
one of its per-connection structures. However all references to the
flag occur while the connection's lock is held, so the atomic ops aren't
necessary.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The iseries_veth driver keeps a stack of messages for each connection
and a lock to protect the stack. However there is also a per-connection lock
which makes the message stack lock redundant.
Remove the message stack lock and document the fact that callers of the
stack-manipulation functions must hold the connection's lock.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Due to a logic bug, once promiscuous mode is enabled in the iseries_veth
driver it is never disabled.
The driver keeps two flags, promiscuous and all_mcast which have exactly the
same effect. This is because we only ever receive packets destined for us,
or multicast packets. So consolidate them into one promiscuous flag for
simplicity.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The iseries_veth driver contains a state machine which is used to manage
how connections are setup and neogotiated between LPARs.
If one side of a connection resets for some reason, the two LPARs can get
stuck in a race to re-setup the connection. This can lead to the connection
being declared dead by one or both ends. In practice the connection is
declared dead by one or both ends approximately 8/10 times a connection is
reset, although it is rare for connections to be reset.
(an example here: http://michael.ellerman.id.au/files/misc/veth-trace.html)
The core of the problem is that the end that resets the connection doesn't
wait for the other end to become aware of the reset. So the resetting end
starts setting the connection back up, and then receives a reset from the
other end (which is the response to the initial reset). And so on.
We're severely limited in what we can do to fix this. The protocol between
LPARs is essentially fixed, as we have to interoperate with both OS/400
and old Linux drivers. Which also means we need a fix that only changes the
code on one end.
The only fix I've found given that, is to just blindly sleep for a bit when
resetting the connection, in the hope that the other end will get itself
sorted. Needless to say I'd love it if someone has a better idea.
This does work, I've so far been unable to get it to break, whereas without
the fix a reset of one end will lead to a dead connection ~8/10 times.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The iseries_veth driver has a timer which we use to send acks. When the
connection is reset or stopped we need to delete the timer.
Currently we only call del_timer() when resetting a connection, which means
the timer might run again while the connection is being re-setup. As it turns
out that's ok, because the flags the timer consults have been reset.
It's cleaner though to call del_timer_sync() once we've dropped the lock,
although the timer may still run between us dropping the lock and calling
del_timer_sync(), but as above that's ok.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Currently the iseries_veth driver prints the file name and line number in its
error messages. This isn't very useful for most users, so just print
"iseries_veth: message" instead.
- convert uses of veth_printk() to veth_debug()/veth_error()/veth_info()
- make terminology consistent, ie. always refer to LPAR not lpar
- be consistent about printing return codes as %d not %x
- make format strings fit in 80 columns
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
I had some time to think about PCI assign issues in 2.6.13-rc series.
The major problem here is that we call pci_assign_unassigned_resources()
way too early - at subsys_initcall level. Therefore we give no chances
to ACPI and PnP routines (called at fs_initcall level) to reserve their
respective resources properly, as the comments in drivers/pnp/system.c
and drivers/acpi/motherboard.c suggest:
/**
* Reserve motherboard resources after PCI claim BARs,
* but before PCI assign resources for uninitialized PCI devices
*/
So I moved the pci_assign_unassigned_resources() call to
pcibios_assign_resources() (fs_initcall), which should hopefully fix a
lot of problems and make PCIBIOS_MIN_IO tweaks unnecessary.
Other changes:
- remove resource assignment code from pcibios_assign_resources(), since
it duplicates pci_assign_unassigned_resources() functionality and
actually does nothing in 2.6.13;
- modify ROM assignment code as per Ben's suggestion: try to use firmware
settings by default (if PCI_ASSIGN_ROMS is not set);
- set CARDBUS_IO_SIZE back to 4K as it's a wonderful stress test for
various setups.
Confirmed by Tero Roponen <teanropo@cc.jyu.fi> (who had problems with
the 4kB CardBus IO size previously).
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Damir Perisa <damir.perisa@solnet.ch> reports:
drivers/net/s2io.h:765: error: invalid lvalue in assignment
drivers/net/s2io.h:766: error: invalid lvalue in assignment
That's a gcc4 error. I don't see why the casts are there anyway..
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Create vio_bus_ops so that we just pass a structure to vio_bus_init
instead of three separate function pointers.
Rearrange vio.h to avoid forward references. vio.h only needs
struct device_node from prom.h so remove the include and just
declare it.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Update version and add 4 minor fixes, the last 2 were suggested by
Jeff Garzik:
1. check for a valid ethernet address before setting it
2. zero out bp->regview if init_one encounters an error and unmaps
the IO address. This prevents remove_one from unmapping again.
3. use netif_rx_schedule() instead of hand coding the same.
4. use IRQ_HANDLED and IRQ_NONE.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change all locks from spin_lock_irqsave() to spin_lock_bh(). All
places that require spinlocks are in BH context.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove atomic operations in the fast tx path. Expensive atomic
operations were used to keep track of the number of available tx
descriptors. The new code uses the difference between the consumer
and producer index to determine the number of free tx descriptors.
As suggested by Jeff Garzik, the name of the inline function is
changed to all lower case.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This speeds up link-up time on 5706 SerDes if the link partner does
not autoneg, a rather common scenario in blade servers. Some blade
servers use IPMI for keyboard input and it's important to minimize
link disruptions.
The speedup is achieved by shortening the timer to (HZ / 3) during
the transient period right after initiating a SerDes autoneg. If
autoneg does not complete, parallel detect can be done sooner. After
the transient period is over, the timer goes back to its normal HZ
interval.
As suggested by Jeff Garzik, the timer initialization is moved to
bnx2_init_board() from bnx2_open().
An eeprom bit is also added to allow default forced SerDes speed for
even faster link-up time.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes an rtnl deadlock problem when flush_scheduled_work() is
called from bnx2_close(). In rare cases, linkwatch_event() may be on
the workqueue from a previous close of a different device and it will
try to get the rtnl lock which is already held by dev_close().
The fix is to set a flag if we are in the reset task which is run
from the workqueue. bnx2_close() will loop until the flag is cleared.
As suggested by Jeff Garzik, the loop is changed to call msleep(1)
instead of yield() in the original patch.
flush_scheduled_work() is also moved to bnx2_remove_one() before the
netdev is freed.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
w1 does not need to multicast its state to several groups at once,
and upcoming netlink changes will not allow bitmask for groups anyway.
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch contains the following possible cleanups/fixes:
- use C99 struct initializers
- make a few arrays and structs static
- remove a few uses of literal 0 as NULL pointer
- use convenience function instead of cast+dereference in bnx2_ioctl()
- remove superfluous casts to u8 * in calls to readl/writel
Signed-off-by: Peter Hagervall <hager@cs.umu.se>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the usage of packet type into the SKB control
buffer. After this patch it is now possible to shrink the sk_buff
structure and redefine its pkt_type.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the sparse warnings "implicit cast to nocast type"
for the priority or gfp_mask parameters of the memory allocations.
Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch cleans up the virtual HCI driver. It also adds support for
the dynamic minor device number allocation.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Found a bug while reviewing the patches the second time.
The TG3_FLAG_TXD_MBOX_HWBUG flag is set after the register access
methods have been determined. This patch fixes it by moving it up before
the various access methods are assigned.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The register write to register 0x68 to restart interrupts is unnecessary
as the interrupt wasn't masked in that register by the irq handler. This
will save one register write in the fast path.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the new workaround for 5703 A1/A2 if it is behind
certain ICH bridges. The workaround disables memory and uses config.
cycles only to access all registers. The 5702/03 chips can mistakenly
decode the special cycles from the ICH chipsets as memory write cycles,
causing corruption of register and memory space. Only certain ICH
bridges will drive special cycles with non-zero data during the address
phase which can fall within the 5703's address range. This is not an ICH
bug as the PCI spec allows non-zero address during special cycles.
However, only these ICH bridges are known to drive non-zero addresses
during special cycles.
The indirect_lock is also changed to spin_lock_irqsave from spin_lock_bh
because it is used in irq handler when using the indirect method to
disable interrupts.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the mailbox read method and also adds an inline function
tw32_mailbox_f() for mailbox writes that require read flush.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds various dedicated register read/write methods for the
existing workarounds, including PCIX target workaround, write with read
flush, etc. The chips that require these workarounds will use these
dedicated access functions.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the basic function pointers to do register accesses in
the fast path. This was suggested by David Miller. The idea is that
various register access methods for different hardware errata can easily
be implemented with these function pointers and performance will not be
degraded on chips that use normal register access methods.
The various register read write macros (e.g. tw32, tr32, tw32_mailbox)
are redefined to call the function pointers.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>