Move initialization of all memory end variables to as early as
possible, so that dependent code doesn't need to check whether these
variables have already been set.
Change the range check in kunmap_atomic to actually make use of this
so that the no-mapping-estabished path (under CONFIG_DEBUG_HIGHMEM)
gets used only when the address is inside the lowmem area (and BUG()
otherwise).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Remove some unlinuxy ways to write function parameter definitions.
Remove some stray "return;"s
No functional change.
Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
The IO APIC code had lots of duplicated code to read/write 64bit
routing entries into the IO-APIC. Factor this out int common read/write
functions
In a few cases the IO APIC lock is taken more often now, but this
isn't a problem because it's all initialization/shutdown only
slow path code.
Similar to earlier x86-64 patch.
Includes a fix by Jiri Slaby for a mistake that broke resume
Signed-off-by: Andi Kleen <ak@suse.de>
- Move them to a pure assembly file. Previously they were in
a C file that only consisted of inline assembly. Doing it in pure
assembler is much nicer.
- Add a frame.i include with FRAME/ENDFRAME macros to easily
add frame pointers to assembly functions
- Add dwarf2 annotation to them so that the new dwarf2 unwinder
doesn't get stuck on them
- Random cleanups
Includes feedback from Jan Beulich and a UML build fix from Andrew
Morton.
Cc: jbeulich@novell.com
Cc: jdike@addtoit.com
Signed-off-by: Andi Kleen <ak@suse.de>
This ports the algorithm from x86-64 (with improvements) to i386.
Previously this only worked for frame pointer enabled kernels.
But spinlocks have a very simple stack frame that can be manually
analyzed. Do this.
Signed-off-by: Andi Kleen <ak@suse.de>
For NUMA optimization and some other algorithms it is useful to have a fast
to get the current CPU and node numbers in user space.
x86-64 added a fast way to do this in a vsyscall. This adds a generic
syscall for other architectures to make it a generic portable facility.
I expect some of them will also implement it as a faster vsyscall.
The cache is an optimization for the x86-64 vsyscall optimization. Since
what the syscall returns is an approximation anyways and user space
often wants very fast results it can be cached for some time. The norma
methods to get this information in user space are relatively slow
The vsyscall is in a better position to manage the cache because it has direct
access to a fast time stamp (jiffies). For the generic syscall optimization
it doesn't help much, but enforce a valid argument to keep programs
portable
I only added an i386 syscall entry for now. Other architectures can follow
as needed.
AK: Also added some cleanups from Andrew Morton
Signed-off-by: Andi Kleen <ak@suse.de>
AK: This redoes the changes I temporarily reverted.
Intel now has support for Architectural Performance Monitoring Counters
( Refer to IA-32 Intel Architecture Software Developer's Manual
http://www.intel.com/design/pentium4/manuals/253669.htm ). This
feature is present starting from Intel Core Duo and Intel Core Solo processors.
What this means is, the performance monitoring counters and some performance
monitoring events are now defined in an architectural way (using cpuid).
And there will be no need to check for family/model etc for these architectural
events.
Below is the patch to use this performance counters in nmi watchdog driver.
Patch handles both i386 and x86-64 kernels.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
I've had good experiences with having this on by default on x86-64.
It turns nasty hangs into easier to debug oopses.
Enable the local APIC wdog by default for systems newer than 2004.
This comes from a strange compromise: according to arjan the reason
it was off by default was some old IBM systems that corrupted
registered when NMI happened in SMI. Can't remember more specific,
but >= 2004 should avoid these. It's probably overly broad
because most older systems should be ok (and the really old systems
won't be supported by the local apic watchdog anyways)
Signed-off-by: Andi Kleen <ak@suse.de>
After a crash we should wait for NMI IPI event and not for external NMI or
NMI watchdog tick.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Making NMI suspend/resume work with SMP. We use CPU hotplug to offline
APs in SMP suspend/resume. Only BSP executes sysdev's .suspend/.resume
method. APs should follow CPU hotplug code path.
And:
+From: Don Zickus <dzickus@redhat.com>
Makes the start/stop paths of nmi watchdog more robust to handle the
suspend/resume cases more gracefully.
AK: I merged the two patches together
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Clean up some of the output messages on the nmi error paths to make more
sense when they are displayed. This is mainly a cosmetic fix and
shouldn't impact any normal code path.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
To quote Alan Cox:
The default Linux behaviour on an NMI of either memory or unknown is to
continue operation. For many environments such as scientific computing
it is preferable that the box is taken out and the error dealt with than
an uncorrected parity/ECC error get propogated.
A small number of systems do generate NMI's for bizarre random reasons
such as power management so the default is unchanged. In other respects
the new proc/sys entry works like the existing panic controls already in
that directory.
This is separate to the edac support - EDAC allows supported chipsets to
handle ECC errors well, this change allows unsupported cases to at least
panic rather than cause problems further down the line.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Adds a new /proc/sys/kernel/nmi_watchdog call that will enable/disable the
nmi watchdog.
By entering a non-zero value here, a user can enable the nmi watchdog to
monitor the online cpus in the system. By entering a zero value here, a
user can disable the nmi watchdog and free up a performance counter which
could then be utilized by the oprofile subsystem, otherwise oprofile may be
short a counter when in use.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi
watchdog.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Removes the un/set_nmi_callback and reserve/release_lapic_nmi functions as
they are no longer needed. The various subsystems are modified to register
with the die_notifier instead.
Also includes compile fixes by Andrew Morton.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch cleans up the NMI interrupt path. Instead of being gated by if
the 'nmi callback' is set, the interrupt handler now calls everyone who is
registered on the die_chain and additionally checks the nmi watchdog,
reseting it if enabled. This allows more subsystems to hook into the NMI if
they need to (without being block by set_nmi_callback).
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch includes the changes to make the nmi watchdog on i386 SMP aware.
A bunch of code was moved around to make it simpler to read. In addition,
it is now possible to determine if a particular NMI was the result of the
watchdog or not. This feature allows the kernel to filter out unknown NMIs
easier.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Adds basic infrastructure to allow subsystems to reserve performance
counters on the x86 chips. Only UP kernels are supported in this patch to
make reviewing easier. The SMP portion makes a lot more changes.
Think of this as a locking mechanism where each bit represents a different
counter. In addition, each subsystem should also reserve an appropriate
event selection register that will correspond to the performance counter it
will be using (this is mainly neccessary for the Pentium 4 chips as they
break the 1:1 relationship to performance counters).
This will help prevent subsystems like oprofile from interfering with the
nmi watchdog.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
There are some machines around (large xSeries or Unisys ES7000) that
need physical IO-APIC destination mode to access all of their IO
devices. This currently doesn't work in UP kernels as used in
distribution installers.
This patch allows to compile even UP kernels as GENERICARCH which
allows to use physical or clustered APIC mode.
Signed-off-by: Andi Kleen <ak@suse.de>
If there is only 1 node in the system cpus should think they are apart of
some other node.
If cases where a real numa system boots the Flat numa option make sure the
cpus don't claim to be apart on a non-existent node.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] sw_any_bug_dmi_table can be used on resume, so it isn't initdata
[CPUFREQ] Fix some more CPU hotplug locking.
[CPUFREQ] Workaround for BIOS bug in software coordination of frequency
[CPUFREQ] Longhaul - Add voltage scaling to driver
[CPUFREQ] Fix sparse warning in ondemand
[CPUFREQ] make drivers/cpufreq/cpufreq_ondemand.c:powersave_bias_target() static
[CPUFREQ] Longhaul - Add ignore_latency option
[CPUFREQ] Longhaul - Disable arbiter
[CPUFREQ][2/2] ondemand: updated add powersave_bias tunable
[CPUFREQ][1/2] ondemand: updated tune for hardware coordination
[CPUFREQ] Fix typo.
sw_any_bug_dmi_table can be used on resume, so it isn't initdata.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Dave Jones <davej@redhat.com>
This reverts commits 11012d419c and
40dd2d20f2, which allowed us to use the
MMIO accesses for PCI config cycles even without the area being marked
reserved in the e820 memory tables.
Those changes were needed for EFI-environment Intel macs, but broke some
newer Intel 965 boards, so for now it's better to revert to our old
2.6.17 behaviour and at least avoid introducing any new breakage.
Andi Kleen has a set of patches that work with both EFI and the broken
Intel 965 boards, which will be applied once they get wider testing.
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Edgar Hucek <hostmaster@ed-soft.at>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
(And reset it on new thread creation)
It turns out that eflags is important to save and restore not just
because of iopl, but due to the magic bits like the NT bit, which we
don't want leaking between different threads.
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Take default arch/*/kernel/audit.c to lib/, have those with special
needs (== biarch) define AUDIT_ARCH in their Kconfig.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Some buggy BIOSes do a "software any" kind of coordination without telling
about it to OS. So, when OS sets frequency on one CPU on these platforms,
it will also impact all the other logical CPUs that are in the same power
domain. Attached patch is a workaround for those buggy BIOSes.
Patch should be a noop on the normal non-buggy platforms.
Applies over previously sent acpi-cpufreq and software coordination
bug fix patch
Signed-off-by: Denis Sadykov <denis.m.sadykov@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Rename option "dont_scale_voltage" to "scale_voltage" because
don't will be default.
Use "pos" for calculating voltage. In this way driver don't need
to know mV value or low level value. Simply min U is one pos and
max U is second pos. All pos between these two are used.
Assume that min U is for min f and max U for max f. For frequency
between min and max calculate pos based on difference between
current frequency and min f.
Values in mobile VRM table changed to values from
C3-M datasheet.
Signed-off-by: Rafa³ Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Apparently some systems export valid HPET addresses, but hpet_enable()
fails. Then when the HPET clocksource starts up, it only checks for a
valid HPET address, and the result is a system where time does not advance.
See http://bugme.osdl.org/show_bug.cgi?id=7062 for details.
This patch just makes sure we better check that the HPET is functional
before registering the HPET clocksource.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There was a bogus hunk from the genirq merge that essentially
broke stack switching for hard interrupts. Remove it since it isn't
needed.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The unwinder fallback logic still had potential for falling through to
the legacy stack trace code without printing an indication (at once
serving as a separator) of this.
Further, the stack pointer retrieval for the fallback should be as
restrictive as possible (in order to avoid having the legacy stack
tracer try to access invalid memory). The patch tightens that, but
this could certainly be further improved.
Also making the call_trace command line option now conditional upon
CONFIG_STACK_UNWIND (as it's meaningless otherwise).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
One open question: Should this added push perhaps be made conditional
upon CONFIG_STACK_UNWIND or CONFIG_UNWIND_INFO?
[AK: not needed, these are all very slow paths]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The check for the MCFG table being reserved in the e820 map was originally
added to detect a broken BIOS in a preproduction Intel SDV. However it also
breaks the Apple x86 Macs, which can't supply this properly, but need
a working MCFG. With this patch they wouldn't use the MCFG and not work.
After some discussion I think it's best to remove the heuristic again.
It also failed on some other boxes (although it didn't cause much
problems there because old style port access for PCI config space
still works as fallback), but the preproduction SDVs can just use
pci=nommcfg. Supporting production machines properly is more
important.
Edgar Hucek did all the debugging work.
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Edgar Hucek <hostmaster@ed-soft.at>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ignore the return value of early_init_acpi(), as it can give false error
messages. If there is something really wrong, then register_driver will
fail cleanly with EINVAL later.
[ background: modprobe acpi-cpufreq on systems not capable of speed-scaling
started failing with 'invalid argument', where previously it would only
ever -ENODEV
I'm not 100% happy with the solution. It'd be better to handle
failure properly, but this is a low-impact change for 2.6.18
We can always revisit doing this better in .19 --davej.]
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ACPI 3.0 appended a variable length UID string to the LAPIC structure
as part of support for > 256 processors. So the BAD_MADT_ENTRY() sanity
check can no longer compare for equality with a fixed structure length.
Signed-off-by: Alexey Y Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
A BIOS has been found that resumes from S3 to the routine that invoked suspend,
ignoring the resume vector. This appears to the OS as a failed S3 attempt.
This same system suspend/resume's properly with Windows.
It is possible to invoke the protected mode register restore routine (which
would normally restore the sysenter registers) when the BIOS returns from
S3. This has no effect on a correctly running system and repairs the
damage from the deviant BIOS.
Signed-off-by: William Morrow <william.morrow@amd.com>
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Previously the message was "Fatal exception: panic_on_oops", as introduced
in a recent patch whith removed a somewhat dangerous call to ssleep() in
the panic_on_oops path. However, Paul Mackerras suggested that this was
somewhat confusing, leadind people to believe that it was panic_on_oops
that was the root cause of the fatal exception. On his suggestion, this
patch changes the message to simply "Fatal exception". A suitable oops
message should already have been displayed.
Signed-off-by: Simon Horman <horms@verge.net.au>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Some laptops with VIA C3 processor, CLE266 chipset and
AMI BIOS have incorrect latency values in FADT table. These
laptops seems to be C3 capable, but latency values are to
big: 101 for C2 and 1017 for C3. This option will allow
user to skip C3 latency test but not C3 address test. AMI
BIOS is setting C3 address to correct value in DSDT table.
Signed-off-by: Rafa³ Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
ACPI C3 works for "Powersaver" processors, so use it only for them.
Older CPU will change frequency on "halt" only. But we can protect transition
in two ways:
- by ACPI PM2 register, there is "bus master arbiter disable" bit.
This isn't tested because VIA mainboards don't have PM2 register,
- by PLE133 PCI/AGP arbiter disable register.
There are two bits in this register. First is "PCI arbiter disable",
second "AGP arbiter disable". This is working on VIA Epia 800 mainboards.
Test on bm_control is more proper because this is true
when PM2 register exist.
Signed-off-by: Rafa³ Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Note how any error from acpi_processor_preregister_performance is ignored.
From: bert hubert <bert.hubert@netherlabs.nl>
Signed-off-by: Dave Jones <davej@redhat.com>
This table is only used by Ezra-T CPUs currently, and has values
for some other CPU. Fix them to match the values used by that CPU,
and for now make it clearer by renaming the variable.
Signed-off-by: Rafa³ Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
This is changing "always true" test to something usefull.
Signed-off-by: Rafa³ Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
This patch makes the needlessly global longhaul_walk_callback() static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Dave Jones <davej@redhat.com>