backtrace_mask has been used under the code context of
ARCH_HAS_NMI_WATCHDOG. So put it into that context.
We were warned by the following warning:
arch/x86/kernel/apic/hw_nmi.c:21: warning: ‘backtrace_mask’ defined but not used
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
LKML-Reference: <1289573455-3410-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On some configs the following build error triggers:
arch/x86/kernel/apic/hw_nmi.c:35: error: 'apic' undeclared (first use in this function)
arch/x86/kernel/apic/hw_nmi.c:35: error: (Each undeclared identifier is reported only once
arch/x86/kernel/apic/hw_nmi.c:35: error: for each function it appears in.)
Because asm/apic.h was only included implicitly. Include it explicitly.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
LKML-Reference: <1273713674-8434-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The design of the hardlockup watchdog has changed and cruft was left
behind in the hw_nmi.c file. Just remove the code that isn't used
anymore.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <1273266711-18706-7-git-send-email-dzickus@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
As part of the transition of the nmi watchdog to something more
generic, the trigger_all_cpu_backtrace code is getting left behind.
Put it in its own die_notifier so it can still be used.
V2:
- use arch_spin_locks
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <1273266711-18706-6-git-send-email-dzickus@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
The new nmi_watchdog (which uses the perf event subsystem) is very
similar in structure to the softlockup detector. Using Ingo's
suggestion, I combined the two functionalities into one file:
kernel/watchdog.c.
Now both the nmi_watchdog (or hardlockup detector) and softlockup
detector sit on top of the perf event subsystem, which is run every
60 seconds or so to see if there are any lockups.
To detect hardlockups, cpus not responding to interrupts, I
implemented an hrtimer that runs 5 times for every perf event
overflow event. If that stops counting on a cpu, then the cpu is
most likely in trouble.
To detect softlockups, tasks not yielding to the scheduler, I used the
previous kthread idea that now gets kicked every time the hrtimer fires.
If the kthread isn't being scheduled neither is anyone else and the
warning is printed to the console.
I tested this on x86_64 and both the softlockup and hardlockup paths
work.
V2:
- cleaned up the Kconfig and softlockup combination
- surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI
- seperated out the softlockup case from perf event subsystem
- re-arranged the enabling/disabling nmi watchdog from proc space
- added cpumasks for hardlockup failure cases
- removed fallback to soft events if no PMU exists for hard events
V3:
- comment cleanups
- drop support for older softlockup code
- per_cpu cleanups
- completely remove software clock base hardlockup detector
- use per_cpu masking on hard/soft lockup detection
- #ifdef cleanups
- rename config option NMI_WATCHDOG to LOCKUP_DETECTOR
- documentation additions
V4:
- documentation fixes
- convert per_cpu to __get_cpu_var
- powerpc compile fixes
V5:
- split apart warn flags for hard and soft lockups
TODO:
- figure out how to make an arch-agnostic clock2cycles call
(if possible) to feed into perf events as a sample period
[fweisbec: merged conflict patch]
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <1273266711-18706-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Ingo provided me a config that fails to compile with:
arch/x86/built-in.o: In function
`arch_trigger_all_cpu_backtrace': (.text+0x17e78): undefined
reference to `apic' make: *** [.tmp_vmlinux1] Error 1
I realized I changed the compile behaviour of the nmi code by
not wrapping it with CONFIG_LOCAL_APIC. To fix this I add a
compile check for ARCH_HAS_NMI_WATCHDOG around
arch_trigger_all_cpu_backtrace.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: a.p.zijlstra@chello.nl
Cc: gorcunov@gmail.com
Cc: aris@redhat.com
LKML-Reference: <1266548212-24243-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The original patch was x86_64 centric. Changed the code to make
it less so.
ested by building and running on a powerpc.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: peterz@infradead.org
Cc: gorcunov@gmail.com
Cc: aris@redhat.com
LKML-Reference: <1266013161-31197-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is a new generic nmi_watchdog implementation using the perf
events infrastructure as suggested by Ingo.
The implementation is simple, just create an in-kernel perf
event and register an overflow handler to check for cpu lockups.
I created a generic implementation that lives in kernel/ and
the hardware specific part that for now lives in arch/x86.
This approach has a number of advantages:
- It simplifies the x86 PMU implementation in the long run,
in that it removes the hardcoded low-level PMU implementation
that was the NMI watchdog before.
- It allows new NMI watchdog features to be added in a central
place.
- It allows other architectures to enable the NMI watchdog,
as long as they have perf events (that provide NMIs)
implemented.
- It also allows for more graceful co-existence of existing
perf events apps and the NMI watchdog - before these changes
the relationship was exclusive. (The NMI watchdog will 'spend'
a perf event when enabled. In later iterations we might be
able to piggyback from an existing NMI event without having
to allocate a hardware event for the NMI watchdog - turning
this into a no-hardware-cost feature.)
As for compatibility, we'll keep the old NMI watchdog code as
well until the new one can 100% replace it on all CPUs, old and
new alike. That might take some time as the NMI watchdog has
been ported to many CPU models.
I have done light testing to make sure the framework works
correctly and it does.
v2: Set the correct timeout values based on the old nmi
watchdog
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: gorcunov@gmail.com
Cc: aris@redhat.com
Cc: peterz@infradead.org
LKML-Reference: <1265424425-31562-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>