Fix a typo in the rearming of the catastrophic error polling timer: we
should rearm the timer as long as the stop flag is _not_ set.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix more include file problems that surfaced since I submitted the previous
fix-missing-includes.patch. This should now allow not to include sched.h
from module.h, which is done by a followup patch.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add some initial support for detecting and reporting catastrophic
errors reported by Mellanox HCAs. We start a periodic timer which
polls the catastrophic error reporting buffer in device memory. If an
error is detected, we dump the contents of the buffer for port-mortem
debugging, and report a fatal asynchronous error to higher levels.
In the future we can try to recover from these errors by resetting the
device, but this will require some work in higher-level code as well.
Let's get this in now, so that we at least get catastrophic errors
reported in logs.
Signed-off-by: Roland Dreier <rolandd@cisco.com>