Noticed by Jarod Wilson: The bus manager work was unnecessarily delayed
each time the bus generation counter rolled over.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
The whole topology code only works if the old and new topologies which
are compared come from immediately successive self ID complete events.
If there happened bus resets without self ID complete events in the
meantime, or self ID complete events with invalid selfIDs, the topology
comparison could identify nodes wrongly, or more likely just corrupt
kernel memory or panic right away.
We now discard all nodes of the old topology and treat all current nodes
as new ones if the current self ID generation is not the previous one
plus 1.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
An earlier change, maybe long ago, removed the copying of self_id_count
into card->self_id_count. Since then each bus reset cleared
card->bm_retries even when it shouldn't.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Take a reference to the card whenever fw_card_bm_work() is scheduled on
that card and release it when the work is done. This allows us to
remove the cancel_delayed_work_sync() in fw_core_remove_card().
Signed-off-by: Jay Fenlason <fenlason@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (patch update)
With the bus_resets patch applied, it is easy to see this memory leak
by repeatedly resetting the firewire bus while running slabtop in
another window. Just watch kmalloc-32 grow and grow...
Signed-off-by: Jay Fenlason <fenlason@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Reported by Jay Fenlason: A bus reset tasklet may call
fw_flush_transactions and touch transactions (call their callback which
will free them) while the context which submitted the transaction is
still inserting it into the transmission queue.
A simple solution to this problem is to _not_ "flush" the transactions
because of a bus reset (complete the transcations as 'cancelled'). They
will now simply time out (completed as 'cancelled' by the split-timeout
timer).
Jay Fenlason thought of this fix too but I was quicker to type it out.
:-)
Background:
Contexts which access an instance of struct fw_transaction are:
1. the submitter, until it inserted the packet which is embedded in the
transaction into the AT req DMA,
2. the AsReqTrContext tasklet when the request packet was acked by the
responder node or transmission to the responder failed,
3. the AsRspRcvContext tasklet when it found a request which matched
an incoming response,
4. the card->flush_timer when it picks up timed-out transactions to
cancel them,
5. the bus reset tasklet when it cancels transactions (this access is
eliminated by this patch),
6. a process which shuts down an fw_card (unregisters it from fw-core
when the controller is unbound from fw-ohci) --- although in this
case there shouldn't really be any transactions anymore because we
wait until all card users finished their business with the card.
All of these contexts run concurrently (except for the 6th, presumably).
The 1st is safe against the 2nd and 3rd because of the way how a request
packet is carefully submitted to the hardware. A race between 2nd and
3rd has been fixed a while ago (bug 9617). The 4th is almost safe
against 1st, 2nd, 3rd; there are issues with it if huge scheduling
latencies occur, to be fixed separately. The 5th looks safe against
2nd, 3rd, and 4th but is unsafe against 1st. Maybe this could be fixed
with an explicit state variable in struct fw_transaction. But this
would require fw_transaction to be rewritten as only dynamically
allocatable object with reference counting --- not a good solution if we
also can simply kill this 5th accessing context (replace it by the 4th).
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Fix: The fact that nodes had different gap counts would be overlooked
if the bus manager code would pick gap count 63 because of beta
repeaters or because of very large hop counts. In this case, the bus
manager code would miss that it actually has to send the PHY config
packet with gap count 63.
Related trivial changes: Use bool for an int used as bool, touch up
some comments.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
When a device changes its configuration ROM, it announces this with a
bus reset. firewire-core has to check which node initiated a bus reset
and whether any unit directories went away or were added on this node.
Tested with an IOI FWB-IDE01AB which has its link-on bit set if bus
power is available but does not respond to ROM read requests if self
power is off. This implements
- recognition of the units if self power is switched on after fw-core
gave up the initial attempt to read the config ROM,
- shutdown of the units when self power is switched off.
Also tested with a second PC running Linux/ieee1394. When the eth1394
driver is inserted and removed on that node, fw-core now notices the
addition and removal of the IPv4 unit on the ieee1394 node.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
The bus management workqueue job was in danger to dereference NULL
pointers. Also, after having temporarily lifted card->lock, a few node
pointers and a device pointer may have become invalid.
Add NULL pointer checks and get the necessary references. Also, move
card->local_node out of fw_card_bm_work's sight during shutdown of the
card.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
fw_device.node_id and fw_device.generation are accessed without mutexes.
We have to ensure that all readers will get to see node_id updates
before generation updates.
Fixes an inability to recognize devices after "giving up on config rom",
https://bugzilla.redhat.com/show_bug.cgi?id=429950
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Reviewed by Nick Piggin <nickpiggin@yahoo.com.au>.
Verified to fix 'giving up on config rom' issues on multiple system and
drive combinations that were previously affected.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Kristian Høgsberg <krh@redhat.com>
cleanup after "firewire: support S100B...S400B and link slower than PHY"
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Table-based gap count optimization cannot be used if 1394b repeater PHYs
are present. But it does work with 1394b leaf nodes.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Drop filenames from file preamble, drop editor annotations and
use standard indent style for block comments.
Signed-off-by: Kristian Hoegsberg <krh@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (fixed typo)
With the CRC ITU-T implementation available in lib/ we can use that instead.
This also fixes a bug in the topology map crc computation.
Signed-off-by: Kristian Hoegsberg <krh@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (fixed Kconfig)
In case the topology build fails, we want to retain the old topology
info until another reset finishes and results in a valid new tree. If
we clear card->irm_node to NULL and the topology build fails, we end up
dereferencing a NULL pointer in a few places.
Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
This prevents superfluous bus traffic as fw-sbp2 logs in only to
get kicked off the device by another bus reset as the driver core
does bus management. Scheduling it this way lets the driver core
finish bus management before higher level drivers get the update
callback.
Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
drivers/firewire/fw-topology.c: In function `report_found_node':
drivers/firewire/fw-topology.c:345: error: `typeof' applied to a bit-field
drivers/firewire/fw-topology.c:345: error: `typeof' applied to a bit-field
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Definitions as per IEEE 1212 and IEEE 1394:
Node ID: Concatenation of bus ID and local ID. 16 bits long.
Bus ID: Identifies a particular bus within a group of buses
interconnected by bus bridges.
Local ID: Identifies a particular node on a bus.
PHY ID: Local ID of IEEE 1394 nodes. 6 bits long.
Never ever use a variable called node_id for anything else than a node ID.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
This patch contains the following cleanups:
- "extern inline" -> "static inline"
- fw-topology.c: make struct fw_node_create static
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>