3d155f8cd0
Add some initial support for detecting and reporting catastrophic errors reported by Mellanox HCAs. We start a periodic timer which polls the catastrophic error reporting buffer in device memory. If an error is detected, we dump the contents of the buffer for port-mortem debugging, and report a fatal asynchronous error to higher levels. In the future we can try to recover from these errors by resetting the device, but this will require some work in higher-level code as well. Let's get this in now, so that we at least get catastrophic errors reported in logs. Signed-off-by: Roland Dreier <rolandd@cisco.com> |
||
---|---|---|
.. | ||
mthca |