Timer overrides are normally disabled on Nvidia boards because
they are commonly wrong, except on new ones with HPET support.
Unfortunately there are quite a few Asus boards around that
don't have HPET, but need a timer override.
We don't know yet how to handle this transparently,
but at least add a command line option to force the timer override
and let them boot.
Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
This refactoring actually optimizes the code a little by caching the value
that we think the device is programmed with instead of reading it back from
the hardware. This simplifies the code a little and should speed things up a
bit.
This patch introduces the concept of a ht_irq_msg and modifies the
architecture read/write routines to update this code.
There is a minor consistency fix here as well as x86_64 forgot to initialize
the htirq as masked.
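
The caching idea can be sketched in user space as follows. This is only a
model under stated assumptions: the struct layout, the function names, the
hw_regs[] stand-in for the device, and the position of the mask bit are all
hypothetical here and are not taken from the patch itself.

```c
#include <stdint.h>

struct ht_irq_msg {
    uint32_t address_lo;
    uint32_t address_hi;
};

/* Hypothetical per-irq state: a cached message plus a fake device. */
struct ht_irq_cfg {
    struct ht_irq_msg msg;   /* what we believe the device is programmed with */
    uint32_t hw_regs[2];     /* stand-in for the device registers */
    int hw_writes;           /* counts simulated hardware writes */
};

/* Program the (simulated) device and remember what was written. */
static void write_ht_irq_msg(struct ht_irq_cfg *cfg, const struct ht_irq_msg *msg)
{
    cfg->hw_regs[0] = msg->address_lo;
    cfg->hw_regs[1] = msg->address_hi;
    cfg->hw_writes++;
    cfg->msg = *msg;
}

/* Return the cached value; the device is never read back. */
static void fetch_ht_irq_msg(const struct ht_irq_cfg *cfg, struct ht_irq_msg *msg)
{
    *msg = cfg->msg;
}

/* Masking starts from the cached message rather than a hardware read. */
static void mask_ht_irq(struct ht_irq_cfg *cfg)
{
    struct ht_irq_msg msg = cfg->msg;

    msg.address_lo |= 1u;    /* assumed: bit 0 of the low word masks the irq */
    write_ht_irq_msg(cfg, &msg);
}
```

In this model every read-modify-write costs one device write and no device
read, which is where the small speedup would come from.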
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Acked-by: Bryan O'Sullivan <bos@pathscale.com>
Cc: <olson@pathscale.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are two bugs in the kretprobe-booster.
1) It doesn't make room for gs registers.
2) It doesn't change status of the current kprobe. This status will
affect the fault handling.
This patch fixes these bugs and, additionally, saves skipped registers for
compatibility with the original kretprobe.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
o Currently there is no specific alignment restriction for the data segment
in the linker script, and in some cases it can be placed at non-4K-aligned
addresses. This breaks kexec, which checks that a segment to be loaded is
page aligned.
o I guess it does no harm for the data segment to be 4K aligned.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If the microcode driver is built in (rather than module) there are some,
ehm, interesting effects happening due to the new "call out to userspace"
behavior that is introduced.. and which runs too early. The result is a
boot hang; which is really nasty.
The patch below is a minimally safe patch to fix this regression for 2.6.19
by just not requesting actual microcode updates during early boot. (That
is a good idea in general anyway)
The "real" fix is a lot more complex given the entire cpu hotplug scenario
(during cpu hotplug you normally need to load the microcode as well); but
the interactions for that are just really messy at this point; this fix at
least makes it work and avoids a full detangle of hotplug.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since the "mask" bit is in the low word, when we write a new entry, we
need to write the high word first, before we potentially unmask it.
The exception is when we actually want to mask the interrupt, in which
case we want to write the low word first to make sure that the high word
doesn't change while the interrupt routing is still active.
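
The ordering rule can be modelled in plain C as below. This is a user-space
sketch, not the io_apic.c code: struct route_entry, hw_write(), entry_write()
and entry_mask() are hypothetical names, and the two array slots merely stand
in for the low and high words of a routing entry.

```c
#include <stdint.h>

#define MASK_BIT (1u << 16)   /* assumed position of the mask bit in the low word */

/* Hypothetical stand-in for one two-word routing entry. */
struct route_entry {
    uint32_t word[2];   /* word[0] = low (holds the mask bit), word[1] = high */
    int order[2];       /* which word was written first and second */
    int nwrites;
};

static void hw_write(struct route_entry *e, int idx, uint32_t val)
{
    e->word[idx] = val;
    if (e->nwrites < 2)
        e->order[e->nwrites] = idx;
    e->nwrites++;
}

/* New entry: high word first, so the entry is complete before the
 * low-word write can unmask it. */
static void entry_write(struct route_entry *e, uint32_t low, uint32_t high)
{
    e->nwrites = 0;
    hw_write(e, 1, high);
    hw_write(e, 0, low);
}

/* Masking: low word first, so routing is inactive before the high
 * word changes. */
static void entry_mask(struct route_entry *e, uint32_t high)
{
    e->nwrites = 0;
    hw_write(e, 0, e->word[0] | MASK_BIT);
    hw_write(e, 1, high);
}
```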
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is preparation for fixing the ordering of the accesses that
got broken by the commit cf4c6a2f27 when
factoring out the "common" io apic routing entry accesses.
Move the accessor functions (that were only used by io_apic.c) out
of a header file, and use proper memory-mapped accesses rather than
making up our own "volatile" pointers.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
APM BIOS Interface Specification can now be found at
http://www.microsoft.com/whdc/archive/amp_12.mspx
Signed-off-by: Kristian Mueller <Kristian-M@Kristian-M.de>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The start/end parameters of efi_memory_present_wrapper() are physical
addresses, but memory_present() takes PFNs. This patch converts the physical
addresses to PFNs.
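
For illustration, the conversion amounts to a shift by the page size. The
sketch below assumes 4 KiB pages and uses hypothetical helper names pfn_up()
and pfn_down(); whether the start is rounded up and the end rounded down is a
choice made in this example, not something stated above.

```c
#include <stdint.h>

#define PAGE_SHIFT 12                            /* assumes 4 KiB pages */
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)

/* First page frame that starts at or above the physical address. */
static uint64_t pfn_up(uint64_t phys)
{
    return (phys + PAGE_SIZE - 1) >> PAGE_SHIFT;
}

/* Page frame that contains the physical address (rounds down). */
static uint64_t pfn_down(uint64_t phys)
{
    return phys >> PAGE_SHIFT;
}
```

A range given as physical [start, end) would then be handed on as
[pfn_up(start), pfn_down(end)) so that only whole pages are reported.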
Signed-off-by: bibo, mao <bibo.mao@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a vmlinux.lds.h helper macro for defining the eight-level initcall table,
teach all the architectures to use it.
This is a prerequisite for a patch which performs initcall synchronisation for
multithreaded-probing.
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
[ Added AVR32 as well ]
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jan convinced me that it was unnecessary because the assembly stubs do
this already on the stack.
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
The fake return address was being set to __KERNEL_PDA, rather than 0.
Push it earlier while %eax still equals 0.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Andrew Morton <akpm@osdl.org>
Interrupts must be disabled during alternative instruction patching. On
systems with high timer IRQ rates, or when running in an emulator, timing
differences can result in random kernel panics because of running partially
patched instructions. This doesn't yet fix NMIs, which requires extricating
the patch code from the late bug checking and is logically separate (and also
less likely to cause problems).
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce desc->name and eliminate the handle_irq_name() hack. Add
set_irq_chip_and_handler_name() to set the flow type and name at once.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Avoid possible PIT livelock issues seen on SMP systems (and reported by
Andi), by not allowing it as a clocksource on SMP boxes.
However, since the PIT may no longer be present, we have to properly handle
the cases where SMP systems have TSC skew and fall back from the TSC.
Since the PIT isn't there, it would "fall back" to the TSC again. So this
changes the jiffies rating to 1, and the TSC-bad rating value to 0.
Thus you will get the following behavior priority on i386 systems:
tsc [if present & stable]
hpet [if present]
cyclone [if present]
acpi_pm [if present]
pit [if UP]
jiffies
Rather than the current, more complicated:
tsc [if present & stable]
hpet [if present]
cyclone [if present]
acpi_pm [if present]
pit [if cpus < 4]
tsc [if present & unstable]
jiffies
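
How the ratings produce this ordering can be sketched as a "highest usable
rating wins" selection. The struct and function below are a simplified
user-space model with hypothetical names; only the jiffies rating of 1 and
the unstable-TSC rating of 0 come from the text above, any other rating
value used with it is made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

struct clocksource {
    const char *name;
    int rating;     /* higher rating wins */
    bool usable;    /* present on this machine and allowed in this config */
};

/* Return the usable clocksource with the highest rating, or NULL. */
static const struct clocksource *
pick_clocksource(const struct clocksource *cs, size_t n)
{
    const struct clocksource *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!cs[i].usable)
            continue;
        if (best == NULL || cs[i].rating > best->rating)
            best = &cs[i];
    }
    return best;
}
```

With an unstable TSC at rating 0 and jiffies at rating 1, an SMP box with no
other usable source ends up on jiffies instead of falling back to the TSC
again.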
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The Linux group at Stratus Technologies has come across an issue with SCI
routing under ACPI. We were bitten by this when we made an x86_64 platform
whose BIOS provides an Interrupt Source Override for the SCI itself.
Apparently the override has no effect for the System Control Interrupt, and
this appears to be because of the way the SCI is set up in the ACPI code.
It does not handle the case where busirq != gsi.
The code that sets up the SCI routing assumes that bus irq == global irq.
So there is simply no provision for telling it otherwise. The attached
patch provides this mechanism.
This patch, provided by David Bulkow, was tested on an i386 platform, which
does not use the SCI override, and also on an x86_64 platform which does
use an override.
Signed-off-by: David Bulkow <david.bulkow@stratus.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Intel processors starting with the Core Duo support
processor native C-states using the MWAIT instruction.
Refer: Intel Architecture Software Developer's Manual
http://www.intel.com/design/Pentium4/manuals/253668.htm
Platform firmware exports the support for Native C-state to OS using
ACPI _PDC and _CST methods.
Refer: Intel Processor Vendor-Specific ACPI: Interface Specification
http://www.intel.com/technology/iapc/acpi/downloads/302223.htm
With Processor Native C-state, we use the 'MWAIT' instruction on the processor
to enter different C-states (C1, C2, C3). We won't use the special IO
ports to enter a C-state, and no SMM mode etc. is required to enter one.
Overall this will mean better C-state support.
One major advantage of using MWAIT for all C-states is, with this and
"treat interrupt as break event" feature of MWAIT, we can now get accurate
timing for the time spent in C1, C2, .. states.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Get rid of warning in the thermal throttling code about not checking
sysfs return values.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement the epoll_pwait system call, which extends the event wait mechanism
with the same logic that ppoll and pselect use. The definition of epoll_pwait
is:
int epoll_pwait(int epfd, struct epoll_event *events, int maxevents,
int timeout, const sigset_t *sigmask, size_t sigsetsize);
The difference between the vanilla epoll_wait and epoll_pwait is that the
latter allows the caller to specify a signal mask to be set while waiting
for events. Hence epoll_pwait will wait until either a monitored event
or an unmasked signal occurs. If sigmask is NULL, the epoll_pwait system
call will act exactly like epoll_wait. For the POSIX definition of
pselect, information is available here:
http://www.opengroup.org/onlinepubs/009695399/functions/select.html
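
A minimal user-space sketch of a caller follows. It assumes a C library that
provides an epoll_pwait() wrapper taking five arguments (the sigsetsize
argument of the raw system call being filled in by the library); the helper
name wait_readable_blocking_sigint() is hypothetical.

```c
#include <signal.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable. While waiting, the
 * signal mask is temporarily replaced by one that blocks only SIGINT.
 * Returns the number of ready events (0 on timeout) or -1 on error. */
static int wait_readable_blocking_sigint(int fd, int timeout_ms)
{
    struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = fd } };
    struct epoll_event out;
    sigset_t mask;
    int epfd, n;

    epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }

    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);

    n = epoll_pwait(epfd, &out, 1, timeout_ms, &mask);
    close(epfd);
    return n;
}
```

Passing NULL instead of &mask would make the call behave like a plain
epoll_wait().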
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
hw_interrupt_type is deprecated in favour of struct irq_chip.
[mingo@elte.hu: do x86_64 too]
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Arch-independent zone-sizing is using indices instead of symbolic names to
offset within an array related to zones (max_zone_pfns). The unintended
impact is that ZONE_DMA and ZONE_NORMAL are initialised on powerpc instead
of ZONE_DMA and ZONE_HIGHMEM when CONFIG_HIGHMEM is set. As a result, the
machine fails to boot but will boot with CONFIG_HIGHMEM turned off.
The following patch properly initialises the max_zone_pfns[] array and uses
symbolic names instead of indices in each architecture using
arch-independent zone-sizing. Two users have successfully booted their
powerpcs with it (one an ibook G4). It has also been boot tested on x86,
x86_64, ppc64 and ia64. Please merge for 2.6.19-rc2.
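
The difference between indices and symbolic names can be shown with a small
model. The enum below is a simplified stand-in, HAVE_ZONE_DMA32 is a
hypothetical switch that plays the role of a config option, and
fill_max_zone_pfns() is an illustrative helper rather than the code from the
patch.

```c
#include <string.h>

/* The numeric value of ZONE_NORMAL and ZONE_HIGHMEM depends on whether
 * the optional zone is compiled in, so hard-coded indices go stale. */
enum zone_type {
    ZONE_DMA,
#ifdef HAVE_ZONE_DMA32
    ZONE_DMA32,
#endif
    ZONE_NORMAL,
    ZONE_HIGHMEM,
    MAX_NR_ZONES
};

/* Clear the whole array, then set entries by name, never by number. */
static void fill_max_zone_pfns(unsigned long max_zone_pfns[MAX_NR_ZONES],
                               unsigned long max_low_pfn,
                               unsigned long highend_pfn)
{
    memset(max_zone_pfns, 0, MAX_NR_ZONES * sizeof(max_zone_pfns[0]));
    max_zone_pfns[ZONE_DMA] = max_low_pfn;
    max_zone_pfns[ZONE_HIGHMEM] = highend_pfn;
}
```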
Credit to Benjamin Herrenschmidt for identifying the bug and rolling the
first fix. Additional credit to Johannes Berg and Andreas Schwab for
reporting the problem and testing on powerpc.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Which vector an irq is assigned to now varies dynamically and is
not needed outside of io_apic.c. So remove the possibility
of accessing the information outside of io_apic.c and remove
the silly macro that makes looking for users of irq_vector
difficult.
The fact this compiles ensures there aren't any more pieces
of the old CONFIG_PCI_MSI weirdness that I failed to remove.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.infradead.org/~dhowells/irq-2.6:
IRQ: Maintain regs pointer globally rather than passing to IRQ handlers
IRQ: Typedef the IRQ handler function type
IRQ: Typedef the IRQ flow handler function type
Always make sure RIP/EIP is 0 in the registers stored on the top
of the stack of a kernel thread. This makes sure the unwinder code
won't try a fallback but knows the stack has ended.
AK: this patch is a bit mysterious. in theory they should be terminated
anyways, but it seems to fix at least one crash. Anyways double termination
probably doesn't hurt.
Signed-off-by: Andi Kleen <ak@suse.de>
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.
The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around. On the FRV arch, removing the regs parameter
from all the genirq functions results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).
Where appropriate, an arch may override the generic storage facility and do
something different with the variable. On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.
Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions. Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller. A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.
I've built this code with allyesconfig for x86_64 and i386. I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.
This will affect all archs. Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
struct pt_regs *old_regs = set_irq_regs(regs);
And put the old one back at the end:
set_irq_regs(old_regs);
Don't pass regs through to generic_handle_irq() or __do_IRQ().
In timer_interrupt(), this sort of change will be necessary:
- update_process_times(user_mode(regs));
- profile_tick(CPU_PROFILING, regs);
+ update_process_times(user_mode(get_irq_regs()));
+ profile_tick(CPU_PROFILING);
I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().
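
The save/restore pattern can be modelled in user space like this. It is a
single-CPU sketch: a plain static pointer stands in for the per-CPU variable,
struct pt_regs is reduced to one field, and timer_handler() and last_ip_seen
are hypothetical names used only to show a handler that no longer receives
regs as a parameter.

```c
#include <stddef.h>

struct pt_regs {
    unsigned long ip;   /* reduced to one field for the example */
};

/* Single-CPU stand-in for the per-CPU pointer. */
static struct pt_regs *irq_regs;

static struct pt_regs *get_irq_regs(void)
{
    return irq_regs;
}

/* Install a new pointer and hand back the previous one. */
static struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
{
    struct pt_regs *old_regs = irq_regs;

    irq_regs = new_regs;
    return old_regs;
}

static unsigned long last_ip_seen;

/* A handler asks for the regs only if it actually needs them. */
static void timer_handler(void)
{
    last_ip_seen = get_irq_regs()->ip;
}

static void do_IRQ(struct pt_regs *regs)
{
    struct pt_regs *old_regs = set_irq_regs(regs);

    timer_handler();
    set_irq_regs(old_regs);   /* restore, so nested interrupts unwind cleanly */
}
```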
Some notes on the interrupt handling in the drivers:
(*) input_dev() is now gone entirely. The regs pointer is no longer stored in
the input_dev struct.
(*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does
something different depending on whether it's been supplied with a regs
pointer or not.
(*) Various IRQ handler function pointers have been moved to type
irq_handler_t.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
This moves the declarations for the architecture helpers into
include/linux/htirq.h from the generic include/linux/pci.h. Hopefully this
will make this distinction clearer.
htirq.h is included where it is needed.
The dependency on the msi code is fixed and removed.
The Makefile is tidied up.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <greg@kroah.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It turns out msi_ops was simply not enough to abstract the architecture
specific details of msi. So I have moved the responsibility of constructing
the struct irq_chip to the architectures, and have two architecture specific
functions arch_setup_msi_irq, and arch_teardown_msi_irq.
For simple architectures those functions can do all of the work. For
architectures with platform dependencies they can call into the appropriate
platform code.
With this msi.c is finally free of assuming you have an apic, and this
actually takes less code.
The helpers for the architecture specific code are declared in the linux/msi.h
to keep them separate from the msi functions used by drivers in linux/pci.h
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <greg@kroah.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch implements two functions ht_create_irq and ht_destroy_irq for
use by drivers. Several other functions are implemented as helpers for
arch specific irq_chip handlers.
The driver for the card I tested this on isn't yet ready to be merged.
However this code is and hypertransport irqs are in use in a few other
places in the kernel. Not that any of this will get merged before 2.6.19.
Because the ipath-ht400 is slightly out of spec this code will need to be
generalized to work there.
I think all of the powerpc uses are for a plain interrupt controller in a
chipset so support for native hypertransport devices is a little less
interesting.
However I think this is a half way decent model on how to separate arch
specific and generic helper code, and I think this is a functional model of
how to get the architecture dependencies out of the msi code.
[akpm@osdl.org: Kconfig fix]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg KH <greg@kroah.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
After raising the number of irqs the system supports this function is no
longer necessary.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes the change in behavior of the irq allocation code when
CONFIG_PCI_MSI is defined, removing all instances of the assumption that irq
== vector.
create_irq is rewritten to first allocate a free irq and then to assign that
irq a vector.
assign_irq_vector is made static and the AUTO_ASSIGN case which allocates a
vector not bound to an irq is removed.
The ioapic vector methods are removed, and everything now works with irqs.
The definition of NR_IRQS no longer depends on CONFIG_PCI_MSI
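
The two-step allocation can be pictured with a toy allocator. Everything in
the sketch is illustrative: the table sizes, the starting vector, and the
bookkeeping arrays are made up, and only the order of operations (find a free
irq first, then give it a vector) reflects the description above.

```c
#define NR_IRQS       8
#define FIRST_VECTOR  0x31   /* illustrative starting vector */

static unsigned char irq_vector[NR_IRQS];   /* 0 means no vector assigned */
static unsigned char irq_in_use[NR_IRQS];
static unsigned char next_vector = FIRST_VECTOR;

/* Give an already allocated irq a vector; returns the vector or -1. */
static int assign_irq_vector(int irq)
{
    if (irq < 0 || irq >= NR_IRQS || !irq_in_use[irq])
        return -1;
    if (!irq_vector[irq])
        irq_vector[irq] = next_vector++;
    return irq_vector[irq];
}

/* Step 1: find a free irq. Step 2: bind a vector to it. */
static int create_irq(void)
{
    for (int irq = 0; irq < NR_IRQS; irq++) {
        if (irq_in_use[irq])
            continue;
        irq_in_use[irq] = 1;
        if (assign_irq_vector(irq) < 0) {
            irq_in_use[irq] = 0;
            return -1;
        }
        return irq;
    }
    return -1;   /* no free irq left */
}
```

The irq number returned and the vector stored for it are unrelated values,
which is the point of dropping the irq == vector assumption.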
[akpm@osdl.org: cleanup]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This removes the hardcoded assumption that irq == vector in the msi
composition code, and it allows the msi message composition to set up logical
mode, or lowest priority delivery mode as we do for other apic interrupts,
and with the same selection criteria.
Basically this moves the problem of what is in the msi message into the
architecture irq management code where it belongs. Not in a generic layer
that doesn't have enough information to compose msi messages properly.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current implementation of create_irq() is a hack but it is the current
hack that msi.c uses, and unfortunately the ``generic'' apic msi ops depend on
this hack. Thus we are stuck with this hack of assuming irq == vector until the
dependencies in the generic msi code are removed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch converts all the i386 PIC controllers (except VisWS and Voyager,
which I could not test - but which should still work as old-style IRQ layers)
to the new and simpler irq-chip interrupt handling layer.
[akpm@osdl.org: build fix]
[mingo@elte.hu: enable fasteoi handler for i386 level-triggered IO-APIC irqs]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This allows numaq to properly align cpus to their given node during
boot. Pass logical apicid to apicid_to_node and allow the summit
sub-arch to use physical apicid (hard_smp_processor_id()).
Tested against numaq and summit based systems with no issues.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This changes a couple of if() BUG(); constructs to
BUG_ON(); so it can be safely optimized away.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Many files include the filename at the beginning; several used a wrong one.
Signed-off-by: Uwe Zeisberger <Uwe_Zeisberger@digi.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Jesper Juhl reported that testing the software math-emulation by forcing
"no387" doesn't work on modern CPUs.
The reason was two-fold:
- you also need to pass in "nofxsr" to make sure that we not only don't
touch the old i387 legacy hardware, but also disable the
modern XMM/FXSR sequences
- "nofxsr" didn't actually clear the capability bits immediately,
leaving the early boot sequence still using FXSR until we got to
the identify_cpu() stage.
This fixes the "nofxsr" flag to take effect immediately on the boot CPU.
Debugging by Randy Dunlap
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Jesper Juhl <jesper.juhl@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds the new kernel_execve function on all architectures that were using
_syscall3() to implement execve.
The implementation uses code from the _syscall3 macros provided in the
unistd.h header file. I don't have cross-compilers for any of these
architectures, so the patch is untested with the exception of i386.
Most architectures can probably implement this in a nicer way in assembly or
by combining it with the sys_execve implementation itself, but this should do
it for now.
[bunk@stusta.de: m68knommu build fix]
[markh@osdl.org: build fix]
[bero@arklinux.org: build fix]
[ralf@linux-mips.org: mips fix]
[schwidefsky@de.ibm.com: s390 fix]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In some places, particularly drivers and __init code, the init utsns is the
appropriate one to use. This patch replaces those with the init_utsname
helper.
Changes: Removed several uses of init_utsname(). Hope I picked all the
right ones in net/ipv4/ipconfig.c. These are now changed to
utsname() (the per-process namespace utsname) in the previous
patch (2/7)
[akpm@osdl.org: CIFS fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace references to system_utsname with the per-process uts namespace
where appropriate. This includes things like uname.
Changes: Per Eric Biederman's comments, use the per-process uts namespace
for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c
[jdike@addtoit.com: UML fix]
[clg@fr.ibm.com: cleanup]
[akpm@osdl.org: build fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move the init_nsproxy definition out of arch/ into kernel/nsproxy.c. This
avoids all arches having to be updated. Compiles and boots on s390.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a nsproxy structure to the task struct. Later patches will
move the fs namespace pointer into this structure, and introduce a new utsname
namespace into the nsproxy.
The vserver and openvz functionality, then, would be implemented in large part
by virtualizing/isolating more and more resources into namespaces, each
contained in the nsproxy.
[akpm@osdl.org: build fix]
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cpumask: ensure that node_to_cpumask() is available to modules for all
supported combinations of architecture and CONFIG_NUMA.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
kprobe_flush_task() may call kfree() while holding the kretprobe_lock
spinlock. If kfree() is itself probed by a kretprobe, this causes a
spinlock deadlock. This patch moves the kfree() call out of the scope of
kretprobe_lock.
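
The general shape of such a fix is to detach the objects while holding the
lock and free them only after it has been dropped. The sketch below is a
user-space model with hypothetical names: a boolean stands in for the
spinlock, and an assertion in free_node() checks that nothing is freed while
the lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct node {
    struct node *next;
};

static bool lock_held;        /* toy lock, stands in for a spinlock */
static struct node *active;   /* list protected by the lock */
static int freed_count;

static void free_node(struct node *n)
{
    assert(!lock_held);       /* freeing must not happen under the lock */
    free(n);
    freed_count++;
}

/* Add one node to the protected list; returns 0 on success. */
static int add_node(void)
{
    struct node *n = malloc(sizeof(*n));

    if (n == NULL)
        return -1;
    lock_held = true;
    n->next = active;
    active = n;
    lock_held = false;
    return 0;
}

static void flush_nodes(void)
{
    struct node *to_free, *next;

    lock_held = true;
    to_free = active;         /* detach everything while holding the lock */
    active = NULL;
    lock_held = false;

    for (; to_free != NULL; to_free = next) {   /* free after unlocking */
        next = to_free->next;
        free_node(to_free);
    }
}
```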
Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Whitespace is used to indent; this patch cleans up the affected lines to
follow the kernel coding style.
Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While tracking down a PAE compile failure, I found that config.h was being
included in a bunch of places in i386 code. It is no longer necessary, so
drop it.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Re-implement smp_send_nmi_allbutself() so that calls to smp_processor_id
(through send_IPI_allbutself) can be replaced with safe_smp_processor_id
without affecting other parts of the kernel (as suggested by Eric Biederman).
Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Looks-reasonable-to: Andi Kleen <ak@muc.de>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Substitute "smp_processor_id" with the stack overflow-safe
"safe_smp_processor_id" in the reboot path to the second kernel.
[akpm@osdl.org: build fix]
Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Looks-reasonable-to: Andi Kleen <ak@muc.de>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is the first of a series of patch-sets aiming at making kdump more
robust against stack overflows.
This patch set does the following:
* Add safe_smp_processor_id function to i386 architecture (this function was
inspired by the x86_64 function of the same name).
* Substitute "smp_processor_id" with the stack overflow-safe
"safe_smp_processor_id" in the reboot path to the second kernel.
This patch:
In the event of a stack overflow, critical data that usually resides at the
bottom of the stack is likely to be stomped and, consequently, its use should
be avoided.
In particular, in the i386 and IA64 architectures the macro smp_processor_id
ultimately makes use of the "cpu" member of struct thread_info which resides
at the bottom of the stack. x86_64, on the other hand, is not affected by
this problem because it benefits from the use of the PDA infrastructure.
To circumvent this problem I suggest implementing "safe_smp_processor_id()"
(it already exists in x86_64) for i386 and IA64 and use it as a replacement
for smp_processor_id in the reboot path to the dump capture kernel. This is a
possible implementation for i386.
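
To see why data at the bottom of the stack is at risk, consider how the base
of a stack area is found from a stack pointer. The sketch assumes a
power-of-two, naturally aligned stack of THREAD_SIZE bytes (8192 here, which
is configuration dependent); stack_base() and stack_room() are hypothetical
helpers, not functions from this patch.

```c
#include <stdint.h>

#define THREAD_SIZE 8192u   /* assumed stack size; configuration dependent */

/* Lowest address of the stack area containing sp. In the layout described
 * above, struct thread_info and its "cpu" member live at this address. */
static uintptr_t stack_base(uintptr_t sp)
{
    return sp & ~(uintptr_t)(THREAD_SIZE - 1);
}

/* Bytes between sp and the base. The stack grows downward, so this value
 * shrinks toward zero as the stack closes in on thread_info (the size of
 * thread_info itself is ignored here). */
static uintptr_t stack_room(uintptr_t sp)
{
    return sp - stack_base(sp);
}
```

Once the stack has grown down into that region, anything read from
thread_info, including the CPU number, can no longer be trusted.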
Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Looks-reasonable-to: Andi Kleen <ak@muc.de>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With 2.6.18-rc4-mm2, now wall_jiffies will always be the same as jiffies.
So we can kill wall_jiffies completely.
This is just a cleanup and logically should not change any real behavior
except for one thing: RTC updating code in (old) ppc and xtensa use a
condition "jiffies - wall_jiffies == 1". This condition is never met so I
suppose it is just a bug. I removed only that condition instead of
killing the whole "if" block.
[heiko.carstens@de.ibm.com: s390 build fix and cleanup]
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
All on stack DECLARE_COMPLETIONs should be replaced by:
DECLARE_COMPLETION_ONSTACK
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clarify my (Pierre's) position on which GPL versions apply. The patch only
touches the source files where I am the only major author. The people who
have made the minor commits to the files have been contacted and have no
issues with this change.
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert i386 apm.c from kernel_thread(), whose export is deprecated, to
kthread API.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The functions efi_call_phys_prelog and efi_call_phys_epilog in
arch/i386/kernel/efi.c wrap the spinlock efi_rt_lock: efi_call_phys_prelog
returns with the lock held, and efi_call_phys_epilog releases the lock
without acquiring it. Add lock annotations to these two functions so that
sparse can check callers for lock pairing, and so that sparse will not
complain about these functions since they intentionally use locks in this
manner.
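A small sketch of what such annotations look like, assuming the usual
pattern where the macros expand to a sparse context attribute under
__CHECKER__ and to nothing for a normal compiler. The toy lock and the
demo_* names are hypothetical stand-ins for the real spinlock and
functions.

```c
#include <assert.h>

/* Under sparse, describe the lock context change; otherwise expand to nothing. */
#ifdef __CHECKER__
# define __acquires(x) __attribute__((context(x, 0, 1)))
# define __releases(x) __attribute__((context(x, 1, 0)))
#else
# define __acquires(x)
# define __releases(x)
#endif

/* Toy lock standing in for a spinlock; held == 1 means locked. */
struct toy_lock { int held; };
static struct toy_lock demo_lock;

/* Returns with demo_lock held, like a "prelog" helper. */
static void demo_prelog(void) __acquires(demo_lock)
{
	assert(!demo_lock.held);
	demo_lock.held = 1;
}

/* Releases demo_lock without having acquired it in this function. */
static void demo_epilog(void) __releases(demo_lock)
{
	assert(demo_lock.held);
	demo_lock.held = 0;
}

static int demo_lock_is_held(void)
{
	return demo_lock.held;
}
```

With the annotations in place, a checker can treat the imbalance inside
each function as intentional and verify instead that callers pair the two.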
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert the i386 summit subarch apicid_to_node to use node information
provided by the SRAT. It was discussed a little on LKML a few weeks ago
and was seen as an acceptable fix. The current way of obtaining the nodeid
	static inline int apicid_to_node(int logical_apicid)
	{
		return logical_apicid >> 5;
	}
is just not correct for all summit systems/BIOSes. Assuming the apicid
matches the Linux node number requires a leap of faith that the BIOS mapped
out the apicids in a set way. Modern summit HW (IBM x460) does not lay out its
BIOS in this manner for various reasons and is unable to boot i386 numa.
The best way to get the correct apicid to node information is from the SRAT
table during boot. It lays out what apicid belongs to what node. I use
this information to create a table for use at run time.
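A rough sketch of the table-based approach, assuming the firmware data has
already been reduced to (apicid, node) pairs. The struct, the table name,
and the helper names are hypothetical; the shift-based version is kept only
for comparison.

```c
#define MAX_APICID_SKETCH 256

/* Hypothetical run-time table, filled once at boot from SRAT-style data. */
static unsigned char apicid_node_table[MAX_APICID_SKETCH];

struct srat_cpu_entry {
	unsigned char apicid;
	unsigned char node;
};

/* Record which node each listed apicid belongs to. */
static void build_apicid_table(const struct srat_cpu_entry *e, int n)
{
	int i;

	for (i = 0; i < n; i++)
		apicid_node_table[e[i].apicid] = e[i].node;
}

/* Table lookup: no assumption about how the BIOS numbered the apicids. */
static int apicid_to_node_lookup(int apicid)
{
	return apicid_node_table[apicid & (MAX_APICID_SKETCH - 1)];
}

/* The old guess: node number encoded in the upper bits of the apicid. */
static int apicid_to_node_shift(int apicid)
{
	return apicid >> 5;
}
```

For an apicid such as 0x10 that the firmware places on node 1, the lookup
returns 1 while the shift returns 0, which is the kind of mismatch the
patch description refers to.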
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Avoid possible deadlock on a BUG() inside down_write(mmap_sem). The deadlock
can only occur if something has gone horridly wrong, because a fault here
shouldn't happen.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The IA32 manual says that if a microcode update's size is 0, then the size
is the default size (2048 bytes). But to me this doesn't suggest that every
microcode update's size should be at least 2048 bytes. We actually had a
microcode update whose size is 1024 bytes. The patch just removes the
check.
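A minimal sketch of the size rule described above. The struct holds only
the one field needed here and its name, like the helper's, is
hypothetical.

```c
#define DEFAULT_UCODE_TOTALSIZE 2048u

/* Only the field used in this sketch; a real header carries more. */
struct ucode_header_sketch {
	unsigned int totalsize;
};

/* A stored size of 0 means "default size"; any other value is taken as is. */
static unsigned int ucode_total_size(const struct ucode_header_sketch *h)
{
	return h->totalsize ? h->totalsize : DEFAULT_UCODE_TOTALSIZE;
}
```

Note that a 1024-byte update is accepted unchanged: the default applies
only when the field is 0, and it is not a lower bound.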
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Tigran Aivazian <tigran@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add sysfs support. Currently each CPU has three microcode related
attributes. One is 'version' which shows current ucode version of CPU.
Tools can use the attribute to do validation or show CPU ucode status. One is
'reload' which allows manually reloading ucode. Another is
'processor_flags', which exports processor flags, so we can write tools to
check if CPU has latest ucode. Also add suspend/resume and CPU hotplug
support.
[akpm@osdl.org: cleanups, build fix]
[bunk@stusta.de: Kconfig fixes]
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Tigran Aivazian <tigran@veritas.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use request_firmware to pull ucode from userspace, so we don't need the
application 'microcode_ctl' to assist. We name each ucode file according
to the CPU's info as intel-ucode/family-model-stepping. In this way we can
split the ucode file into small ones. This has a lot of advantages, such as
selectively updating and validating microcode for specific models, better
management of microcode files, easier tool writing for administrators, and
so on. With the changes, we should put all intel-ucode/xx-xx-xx microcode
files into the firmware dir (I have a tool to split the previous big data
file into small ones, and later we will release a new-style data file). The
init script should be changed to just load the driver without unloading it.
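A small sketch of how such a per-CPU file name could be built. The
two-digit lowercase hex formatting is an assumption read from the "xx-xx-xx"
pattern above, and ucode_fw_name is a hypothetical helper.

```c
#include <stdio.h>

/*
 * Build "intel-ucode/ff-mm-ss" from family, model and stepping.
 * Returns the number of characters the full name needs, as snprintf does.
 */
static int ucode_fw_name(char *buf, size_t len, unsigned int family,
			 unsigned int model, unsigned int stepping)
{
	return snprintf(buf, len, "intel-ucode/%02x-%02x-%02x",
			family, model, stepping);
}
```

A caller would pass the resulting name to the firmware loader, which then
looks for that file in the firmware directory.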
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Tigran Aivazian <tigran@veritas.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean up microcode update driver and make it more readable.
[akpm@osdl.org: cleanups]
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Tigran Aivazian <tigran@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds support for CPUs connected to the CLE266
chipset. For older CPUs this is the only way. For "Powersaver"
processors this way will be used if ACPI C3 isn't supported.
I have tested it. It seems to work exactly like ACPI.
But it is less safe. On the CLE266 chipset, port 0x22
blocks processor access to the PCI bus too.
Signed-off-by: Rafał Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Make the sections proper and get rid of section mismatch warnings.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
[PATCH] Don't set calgary iommu as default y
[PATCH] i386/x86-64: New Intel feature flags
[PATCH] x86: Add a cumulative thermal throttle event counter.
[PATCH] i386: Make the jiffies compares use the 64bit safe macros.
[PATCH] x86: Refactor thermal throttle processing
[PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
[PATCH] Fix unwinder warning in traps.c
[PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
[PATCH] x86: Move direct PCI scanning functions out of line
[PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
[PATCH] Don't leak NT bit into next task
[PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
[PATCH] Fix some broken white space in ia32_signal.c
[PATCH] Initialize argument registers for 32bit signal handlers.
[PATCH] Remove all traces of signal number conversion
[PATCH] Don't synchronize time reading on single core AMD systems
[PATCH] Remove outdated comment in x86-64 mmconfig code
[PATCH] Use string instructions for Core2 copy/clear
[PATCH] x86: - restore i8259A eoi status on resume
[PATCH] i386: Split multi-line printk in oops output.
...
Detect the situations in which the time after a resume from disk would be
earlier than the time before the suspend and prevent them from happening on
i386.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: John Stultz <johnstul@us.ibm.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The functions prepare_set and post_set in kernel/cpu/mtrr/generic.c wrap
the spinlock set_atomicity_lock: prepare_set returns with the lock held,
and post_set releases the lock without acquiring it. Add lock annotations
to these two functions so that sparse can check callers for lock pairing,
and so that sparse will not complain about these functions since they
intentionally use locks in this manner.
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove all references to xtime in i386 and replace them with
get/set_timeofday(). Requires some ugly and uncertain changes to APM, but
has been lightly tested to work.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If we're going to implement smp_call_function_single() on three architectures
with the same prototype then it should have a declaration in a
non-arch-specific header file.
Move it into <linux/smp.h>.
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Continuing the series of small patches necessary for the perfmon subsystem,
here is a patch that adds support for the smp_call_function_single()
function for i386. It exists for almost all other architectures but i386.
The perfmon subsystem needs it in one case to free some state on a
designated remote CPU.
Signed-off-by: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch will pack any .note.* section into a PT_NOTE segment in the output
file.
To do this, we tell ld that we need a PT_NOTE segment. This requires us to
start explicitly mapping sections to segments, so we also need to explicitly
create PT_LOAD segments for text and data, and map the sections to them
appropriately. Fortunately, each section will default to its previous
section's segment, so it doesn't take many changes to vmlinux.lds.S.
This only changes i386 for now, but I presume the corresponding changes for
other architectures will be as simple.
This change also adds <linux/elfnote.h>, which defines C and Assembler macros
for actually creating ELF notes.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a boot parameter to reserve high linear address space for hypervisors.
This is necessary to allow dynamically loaded hypervisor modules, which might
not happen until userspace is already running, and also provides a useful tool
to benchmark the performance impact of reduced lowmem address space.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
arch/i386/kernel/reboot.c defines its own struct to describe an ldt entry: it
should use struct Xgt_desc_struct (currently load_ldt is a macro, so doesn't
complain: paravirt patches make it warn).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean up module initialization for apm.c. I had started by auditing for
proper return code checks in misc_register, but I found that in the event
of an initialization failure, a proc file and a kernel thread were left
hanging out. This patch properly cleans up those loose ends on any
initialization failure.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
show_registers() tries to dump failing code starting 43 bytes before the
offending instruction, but this address can be bad, for example in a device
driver where the failing instruction is less than 43 bytes from the start
of the driver's code. When that happens, try to dump code starting at the
failing instruction instead of printing no code at all.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Keith Owens <kaos@ocs.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To prevent the emulated RTC timer from stopping when interrupts are delayed
for too long, disable interrupts around all of the register initialization,
and check that the interrupt handler did not schedule the next interrupt in
the past.
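A sketch of the "not in the past" check, assuming a free-running 32-bit
counter and a comparator register. next_compare is a hypothetical helper;
the signed difference is what keeps the comparison correct across counter
wraparound.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Advance the comparator by whole periods until it lies ahead of the
 * counter. Returns how many periods had to be skipped because they were
 * already in the past.
 */
static unsigned int next_compare(uint32_t counter, uint32_t period,
				 uint32_t *cmp)
{
	unsigned int skipped = 0;

	assert(period > 0);	/* a zero period would never catch up */

	*cmp += period;
	while ((int32_t)(counter - *cmp) >= 0) {
		*cmp += period;
		skipped++;
	}
	return skipped;
}
```

If the comparator were left behind the counter, the next match would only
occur after a full wrap of the counter, which is how a delayed interrupt
can make the emulated timer appear to stop.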
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Vojtech Pavlik <vojtech@suse.cz>
Cc: Robert Picco <Robert.Picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We cannot check MAX_NR_ZONES since it is not defined in the preprocessor
anymore.
So remove the check.
The maximum number of zones per node for i386 is 3 since i386 does not
support ZONE_DMA32.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix array initialization in lots of arches
The number of zones may now be reduced from 4 to 2 for many arches. Fix the
array initialization for the zones array for all architectures so that it is
not initializing a fixed number of elements.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Address a long standing issue of booting with an initrd on an i386 numa
system. Currently (and always) the numa kva area is mapped into low memory
by finding the end of low memory and moving that mark down (thus creating
space for the kva). The issue with this is that Grub loads initrds into
this similar space so when the kernel checks the initrd it finds it outside
max_low_pfn and disables it (it thinks the initrd is not mapped into usable
memory) thus initrd enabled kernels can't boot i386 numa :(
My solution to the problem just converts the numa kva area to use the
bootmem allocator to save its area (instead of moving the end of low
memory). Using bootmem allows the kva area to be mapped into more diverse
addresses (not just the end of low memory) and enables the kva area to be
mapped below the initrd if present.
I have tested this patch on numaq(no initrd) and summit(initrd) i386 numa
based systems.
[akpm@osdl.org: cleanups]
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add supplemental SSE3 instructions flag, and Direct Cache Access flag.
As described in "Intel Processor Identification and the CPUID Instruction
AP485 Sept 2006"
AK: also added for x86-64
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
A counter that keeps track of the number of thermal events is exported
to /sys, so that the user knows how bad the thermal problem might be
(since the logging to syslog and mcelog is rate limited).
AK: Fixed cpu hotplug locking
Signed-off-by: Dmitriy Zavin <dmitriyz@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Refactor the event processing (syslog messaging and rate limiting)
into separate file therm_throt.c. This allows consistent reporting
of CPU thermal throttle events.
After ACK'ing the interrupt, if the event is current, the user
(p4.c/mce_intel.c) calls therm_throt_process to log (and rate limit)
the event. If that function returns 1, the user has the option to log
things further (such as to mce_log in x86_64).
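A simplified sketch of the split between counting and rate-limited
logging. The struct, the interval value, and throttle_process are
hypothetical, and time is passed in explicitly so the logic can be followed
without kernel timekeeping.

```c
#define CHECK_INTERVAL_SKETCH 300UL	/* hypothetical: time units between logs */

struct throttle_state {
	unsigned long count;	/* cumulative events, never rate limited */
	unsigned long next_log;	/* earliest time the next log is allowed */
};

/*
 * Count every current event, but report "logged" at most once per
 * interval. Returns 1 when the caller may log further, 0 otherwise.
 */
static int throttle_process(struct throttle_state *s, int curr,
			    unsigned long now)
{
	if (curr)
		s->count++;

	if (now < s->next_log)
		return 0;

	s->next_log = now + CHECK_INTERVAL_SKETCH;
	return 1;
}
```

Keeping the counter outside the rate limit is the point: the log may show
one event per interval while the counter still reflects every event.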
AK: minor cleanup
Signed-off-by: Dmitriy Zavin <dmitriyz@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Some buggy systems can machine check when config space accesses
happen for some non existent devices. i386/x86-64 do some early
device scans that might trigger this. Allow pci=noearly to disable
this. Also, when type 1 is disabled, don't do any early
accesses, which are always type 1.
This moves the pci= configuration parsing to be an early parameter.
I don't think this can break anything because it only changes
a single global that is only used by PCI.
Cc: gregkh@suse.de
Cc: Trammell Hudson <hudson@osresearch.net>
Signed-off-by: Andi Kleen <ak@suse.de>
This is useful on systems with broken PCI bus. Affects various
scans in x86-64 and i386's early ACPI quirk scan.
Cc: gregkh@suse.de
Cc: len.brown@intel.com
Cc: Trammell Hudson <hudson@osresearch.net>
Signed-off-by: Andi Kleen <ak@suse.de>
Current gcc generates calls not jumps to noreturn functions. When that happens the
return address can point to the next function, which confuses the unwinder.
This patch works around it by marking asynchronous exception
frames, in contrast to normal call frames, in the unwind information. Then teach
the unwinder to decode this.
For normal call frames the unwinder now subtracts one from the address which avoids
this problem. The standard libgcc unwinder uses the same trick.
It doesn't include adjustment of the printed address (i.e. for the original
example, it'd still be kernel_math_error+0 that gets displayed, but the
unwinder wouldn't get confused anymore).
This only works with binutils 2.6.17+ and some versions of H.J.Lu's 2.6.16
unfortunately because earlier binutils don't support .cfi_signal_frame.
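A toy illustration of the subtract-one trick, assuming a table of function
address ranges. The addresses, names, and helpers are hypothetical; the
first function is imagined to end with a call to a noreturn function, so
its return address equals the start of the next function.

```c
struct func_range {
	unsigned long start, end;	/* covers [start, end) */
	const char *name;
};

static const struct func_range funcs[] = {
	{ 0x1000, 0x1040, "caller_ending_in_noreturn_call" },
	{ 0x1040, 0x1080, "next_function" },
};

/* Return the index of the function containing pc, or -1 if none does. */
static int find_func(unsigned long pc)
{
	int i;

	for (i = 0; i < (int)(sizeof(funcs) / sizeof(funcs[0])); i++)
		if (pc >= funcs[i].start && pc < funcs[i].end)
			return i;
	return -1;
}

/*
 * For a signal (asynchronous exception) frame, pc is the exact
 * interrupted address. For a normal call frame, pc is a return address,
 * so look up pc - 1 to land inside the calling instruction.
 */
static int unwind_lookup(unsigned long pc, int signal_frame)
{
	return find_func(signal_frame ? pc : pc - 1);
}
```

Looking up 0x1040 as a normal call frame resolves to the caller (index 0),
while the same address in a signal frame resolves to next_function
(index 1).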
[AK: added automatic detection of the new binutils and wrote description]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Got it. i8259A_resume calls init_8259A(0) unconditionally, even if
auto_eoi has been set. Keep track of the current status and restore that
on resume. This fixes it for AMD64 and i386.
Signed-off-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Sometimes, bug reports come in where we've had an oops, and the
only record we have is what the reporter saw on screen shortly
before the system locked up completely. Unfortunately, syslog
only prints lines beginning with KERN_EMERG to the console, so
some lines get lost.
An example of this can be seen at https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=203723
Some of this information isn't vital to diagnosis, but some parts
are useful, such as the tainted flag.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Add HPET(s) into resource map. This will allow for the HPET(s) to be
visible within /proc/iomem.
Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Add early i386 fault handlers with debug information for common faults.
Handles:
divide error
invalid opcode
protection fault
page fault
Also adds code to detect early recursive/multiple faults and halt the
system when they happen (taken from x86_64.)
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
We allow for the fact that the guest kernel may not run in ring 0. This
requires some abstraction in a few places when setting %cs or checking
privilege level (user vs kernel).
This is Chris' [RFC PATCH 15/33] move segment checks to subarch, except rather
than using #define USER_MODE_MASK which depends on a config option, we use
Zach's more flexible approach of assuming ring 3 == userspace. I also used
"get_kernel_rpl()" over "get_kernel_cs()" because I think it reads better in
the code...
1) Remove the hardcoded 3 and introduce #define SEGMENT_RPL_MASK 3 2) Add a
get_kernel_rpl() macro, and don't assume it's zero.
And:
Clean up of patch for letting kernel run other than ring 0:
a. Add some comments about the SEGMENT_IS_*_CODE() macros.
b. Add a USER_RPL macro. (Code was comparing a value to a mask
in some places and to the magic number 3 in other places.)
c. Add macros for table indicator field and use them.
d. Change the entry.S tests for LDT stack segment to use the macros
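A sketch of the kind of macros and helpers the list above describes.
SEGMENT_RPL_MASK and the ring 3 == userspace convention come from the text;
the table indicator bit (bit 2 of a selector) is standard x86, but the
macro and helper names below, and the guest_kernel_rpl variable, are
illustrative assumptions.

```c
#define SEGMENT_RPL_MASK	0x3	/* low two bits: requested privilege level */
#define USER_RPL		0x3	/* ring 3 is always userspace */
#define SEGMENT_TI_MASK		0x4	/* bit 2: table indicator */
#define SEGMENT_LDT		0x4
#define SEGMENT_GDT		0x0

/* Hypothetical: 0 on native hardware, possibly 1 for a guest kernel. */
static int guest_kernel_rpl = 0;

static int get_kernel_rpl_sketch(void)
{
	return guest_kernel_rpl;
}

/* Compare against USER_RPL instead of a bare 3. */
static int selector_is_user(unsigned int sel)
{
	return (sel & SEGMENT_RPL_MASK) == USER_RPL;
}

/* Do not assume the kernel runs in ring 0. */
static int selector_is_kernel(unsigned int sel)
{
	return (int)(sel & SEGMENT_RPL_MASK) == get_kernel_rpl_sketch();
}

static int selector_in_ldt(unsigned int sel)
{
	return (sel & SEGMENT_TI_MASK) == SEGMENT_LDT;
}
```

Testing for "is user" rather than "is not ring 0" is what lets the same
code work when the kernel itself runs at a higher ring number.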
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Abstract sensitive instructions in assembler code, replacing them with macros
(which currently are #defined to the native versions). We use long names:
assembler is case-insensitive, so if something goes wrong and macros do not
expand, it would assemble anyway.
Resulting object files are exactly the same as before.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
kexec: Avoid overwriting the current pgd (V4, i386)
This patch upgrades the i386-specific kexec code to avoid overwriting the
current pgd. Overwriting the current pgd is bad when CONFIG_CRASH_DUMP is used
to start a secondary kernel that dumps the memory of the previous kernel.
The code introduces a new set of page tables. These tables are used to provide
an executable identity mapping without overwriting the current pgd.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
In i386's entry.S, FIX_STACK() needs annotation because it
replaces the stack pointer. And the rest of nmi() needs
annotation in order to compile with these new annotations.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
A kprobe executes IRET early and that could cause NMI recursion and stack
corruption.
Note: This problem was originally spotted and solved by Andi Kleen in the
x86_64 architecture. This patch is an adaption of his patch for i386.
AK: Merged with current code which was a bit different.
AK: Removed printk in nmi handler that shouldn't be there in the first place
AK: Added missing include.
AK: added KPROBES_END
Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
A kprobe executes IRET early and that could cause NMI recursion and stack
corruption.
Note: This problem was originally spotted by Andi Kleen. This patch
adds fixes not included in his original patch.
[AK: Jan Beulich originally discovered these classes of bugs]
Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Mark i386-specific cpu cache functions as __cpuinit. They are all
only called from arch/i386/common.c:display_cache_info() that already is
marked as __cpuinit.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Mark i386-specific cpu identification functions as __cpuinit. They are all
only called from arch/i386/common.c:identify_cpu() that already is marked as
__cpuinit.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Mark i386-specific cpu init functions as __cpuinit. They are all
only called from arch/i386/common.c:identify_cpu() that already is marked as
__cpuinit. This patch also removes the empty function init_umc().
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
The different cpu_dev structures are all used from __cpuinit callers as far
as I can tell. So mark them as __cpuinitdata instead of __initdata. I am a
little bit unsure about arch/i386/common.c:default_cpu, especially when it
comes to the purpose of this_cpu.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
cpu_dev->c_identify is only called from arch/i386/common.c:identify_cpu(), and
this after generic_identify() already has been called. There is no need to call
this function twice and hook it in c_identify - but I may be wrong, please
double check before applying.
This patch also removes generic_identify() from cpu.h to avoid unnecessary
future nesting.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch enables ACPI based physical CPU hotplug support for x86_64.
Implements acpi_map_lsapic() and acpi_unmap_lsapic() to support physical cpu
hotplug.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
cyrix_identify() should be __init because transmeta_identify() is.
tsc_init() is only called from setup_arch() which is marked as __init.
These two section mismatches have been detected by running modpost on
a vmlinux image compiled with CONFIG_RELOCATABLE=y.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
The implementation comes from Zach's [RFC, PATCH 10/24] i386 Vmi
descriptor changes:
Descriptor and trap table cleanups. Add cleanly written accessors for
IDT and GDT gates so the subarch may override them. Note that this
allows the hypervisor to transparently tweak the DPL of the descriptors
as well as the RPL of segments in those descriptors, with no unnecessary
kernel code modification. It also allows the hypervisor implementation
of the VMI to tweak the gates, allowing for custom exception frames or
extra layers of indirection above the guest fault / IRQ handlers.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
And add proper CFI annotation to it which was previously
impossible. This prevents "stuck" messages by the dwarf2 unwinder
when reaching the top of a kernel stack.
Includes feedback from Jan Beulich
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Fix
linux/arch/i386/kernel/mpparse.c: In function 'MP_bus_info':
linux/arch/i386/kernel/mpparse.c:232: warning: comparison is always false due to limited range of data type
Signed-off-by: Andi Kleen <ak@suse.de>
This patch moves the entry.S:error_entry to .kprobes.text section,
since code marked unsafe for kprobes jumps directly to entry.S::error_entry,
that must be marked unsafe as well.
This patch also moves all the ".previous.text" asm directives to ".previous"
for kprobes section.
AK: Following a similar i386 patch from Chuck Ebbert
AK: Also merged Jeremy's fix in.
+From: Jeremy Fitzhardinge <jeremy@goop.org>
KPROBE_ENTRY does a .section .kprobes.text, and expects its users to
do a .previous at the end of the function.
Unfortunately, if any code within the function switches sections, for
example .fixup, then the .previous ends up putting all subsequent code
into .fixup. Worse, any subsequent .fixup code gets intermingled with
the code it's supposed to be fixing (which is also in .fixup). It's
surprising this didn't cause more havoc.
The fix is to use .pushsection/.popsection, so this stuff nests
properly. A further cleanup would be to get rid of all
.section/.previous pairs, since they're inherently fragile.
+From: Chuck Ebbert <76306.1226@compuserve.com>
Because code marked unsafe for kprobes jumps directly to
entry.S::error_code, that must be marked unsafe as well.
The easiest way to do that is to move the page fault entry
point to just before error_code and let it inherit the same
section.
Also moved all the ".previous" asm directives for kprobes
sections to column 1 and removed ".text" from them.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
We have a test that looks for invalid pairings of certain athlon/durons
that weren't designed for SMP, and taint accordingly (with 'S') if we find
such a configuration. However, this test shouldn't fire if there's only
a single CPU present. It's perfectly valid for an SMP kernel to boot on UP
hardware for example.
AK: changed to num_possible_cpus()
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Fix a very dubious piece of code in
arch/i386/kernel/cpu/common.c:cpu_init(). This clears out %fs and
%gs, but clobbers %eax in the process without telling gcc. It turns
out that gcc happens to be not using %eax at that point anyway so it
doesn't matter much, but it looks like a bomb waiting to go off.
This does end up saving an instruction, because gcc wants %eax==0 for
the set_debugreg()s below.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Following x86-64 patches. Reuses code from them in fact.
Convert the standard backtracer to do all output using
callbacks. Use the x86-64 stack tracer implementation
that uses these callbacks to implement the stacktrace interface.
This allows using the new dwarf2 unwinder for stacktrace
and get better backtraces.
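A user-space sketch of a callback-driven stack walker, assuming the frames
have already been collected into an array. All type and function names are
hypothetical. One consumer stores addresses into a trace buffer and keeps
its skip count in the same structure.

```c
#include <stddef.h>

/* The walker reports each address through a callback; it prints nothing itself. */
struct trace_ops_sketch {
	void (*address)(void *data, unsigned long addr);
};

static void walk_stack_sketch(const unsigned long *frames, size_t n,
			      const struct trace_ops_sketch *ops, void *data)
{
	size_t i;

	for (i = 0; i < n; i++)
		ops->address(data, frames[i]);
}

#define MAX_ENTRIES_SKETCH 8

struct saved_trace {
	unsigned long entries[MAX_ENTRIES_SKETCH];
	size_t nr;	/* entries stored so far */
	size_t skip;	/* leading entries still to be dropped */
};

/* Consumer: drop the first 'skip' addresses, then store until full. */
static void save_address(void *data, unsigned long addr)
{
	struct saved_trace *t = data;

	if (t->skip > 0) {
		t->skip--;
		return;
	}
	if (t->nr < MAX_ENTRIES_SKETCH)
		t->entries[t->nr++] = addr;
}

static const struct trace_ops_sketch save_ops = {
	.address = save_address,
};
```

A printing backtracer would supply a different callback to the same
walker, which is what allows one unwinder to serve both uses.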
Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
- Remove unused all_contexts parameter
No caller used it
- Move skip argument into the structure (needed for
followon patches)
Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
is_at_popf() needs to test for the iret instruction as well as
popf. So add that test and rename it to is_setting_trap_flag().
Also change max insn length from 16 to 15 to match reality.
LAHF / SAHF can't affect TF, so the comment in x86_64 is removed.
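A sketch of such a check over raw instruction bytes, assuming the bytes
have already been copied from the traced task. sets_trap_flag is a
hypothetical name; the opcode and prefix values are standard x86
encodings.

```c
#include <stddef.h>

#define MAX_INSN_LEN 15	/* architectural maximum x86 instruction length */

/*
 * Skip over prefix bytes and report whether the instruction is popf or
 * iret, the two that can load the trap flag from the stack.
 */
static int sets_trap_flag(const unsigned char *insn, size_t len)
{
	size_t i;

	if (len > MAX_INSN_LEN)
		len = MAX_INSN_LEN;

	for (i = 0; i < len; i++) {
		switch (insn[i]) {
		case 0x9d:	/* popf */
		case 0xcf:	/* iret */
			return 1;
		case 0x66: case 0x67:	/* operand-size, address-size */
		case 0x26: case 0x2e: case 0x36:
		case 0x3e: case 0x64: case 0x65:	/* segment overrides */
		case 0xf0: case 0xf2: case 0xf3:	/* lock, repne, rep */
			continue;
		default:
			return 0;	/* any other opcode */
		}
	}
	return 0;
}
```

SAHF (0x9e) falls through to the default case, consistent with the note
that it cannot affect TF.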
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Removes code duplication between i386/x86-64.
Not needed anymore in setup.c since early_param cleanup
Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
This patch replaces the open-coded early commandline parsing
throughout the i386 boot code with the generic mechanism (already used
by ppc, powerpc, ia64 and s390). The code was inconsistent with
whether it deletes the option from the cmdline or not, meaning some of
these will get passed through the environment into init.
This transformation is mainly mechanical, but there are some notable
parts:
1) Grammar: s/linux never set's it up/linux never sets it up/
2) Remove hacked-in earlyprintk= option scanning. When someone
actually implements CONFIG_EARLY_PRINTK, then they can use
early_param().
[AK: actually it is implemented, but I'm adding the early_param for it in the next
x86-64 patch]
3) Move declaration of generic_apic_probe() from setup.c into asm/apic.h
4) Various parameters now moved into their appropriate files (thanks Andi).
5) All parse functions which examine arg need to check for NULL,
except one where it has subtle humor value.
AK: re-added acpi_sci handling, which was completely dropped
AK: moved some more variables into acpi/boot.c
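To illustrate point 5, here is a simplified user-space model of an
early_param-style handler and its dispatcher. It is not the kernel's
early_param() macro: the table early_opts, the dispatcher do_early_opt and
the variable mem_limit_bytes are hypothetical, and the size parsing is a
reduced stand-in for what the kernel does. The point is only that a handler
receives NULL when the option is given without "=value" and has to cope.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

/* Hypothetical stand-in for a kernel variable set by "mem=". */
unsigned long long mem_limit_bytes;

/* Handler: arg is NULL when the option was given without a value. */
int parse_mem(char *arg)
{
	char *end;
	unsigned long long val;

	if (!arg)
		return -EINVAL;

	val = strtoull(arg, &end, 0);
	if (end == arg)
		return -EINVAL;

	switch (*end) {
	case 'G': case 'g':
		val <<= 10;	/* fall through */
	case 'M': case 'm':
		val <<= 10;	/* fall through */
	case 'K': case 'k':
		val <<= 10;
		break;
	default:
		break;
	}
	mem_limit_bytes = val;
	return 0;
}

struct early_opt {
	const char *name;
	int (*fn)(char *arg);
};

const struct early_opt early_opts[] = {
	{ "mem", parse_mem },
};

/* Dispatch one "name" or "name=value" token to its handler. */
int do_early_opt(char *token)
{
	char *val = strchr(token, '=');
	size_t namelen = val ? (size_t)(val - token) : strlen(token);

	for (size_t i = 0; i < sizeof(early_opts) / sizeof(early_opts[0]); i++) {
		const struct early_opt *o = &early_opts[i];

		if (strlen(o->name) == namelen &&
		    !strncmp(o->name, token, namelen))
			return o->fn(val ? val + 1 : NULL);
	}
	return 1;	/* unknown option: leave it for later parsing */
}
```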
Cc: len.brown@intel.com
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Move initialization of all memory end variables to as early as
possible, so that dependent code doesn't need to check whether these
variables have already been set.
Change the range check in kunmap_atomic to actually make use of this
so that the no-mapping-established path (under CONFIG_DEBUG_HIGHMEM)
gets used only when the address is inside the lowmem area (and BUG()
otherwise).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Remove some unlinuxy ways to write function parameter definitions.
Remove some stray "return;"s
No functional change.
Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
The IO APIC code had lots of duplicated code to read/write 64bit
routing entries into the IO-APIC. Factor this out into common read/write
functions.
In a few cases the IO APIC lock is taken more often now, but this
isn't a problem because it's all initialization/shutdown only
slow path code.
Similar to earlier x86-64 patch.
Includes a fix by Jiri Slaby for a mistake that broke resume
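A minimal sketch of the factoring, run against a simulated register file
instead of real hardware. The array fake_ioapic_regs, the simplified
route_entry layout and the single-APIC signatures are assumptions made for
illustration. Locking is left out here; real helpers would take the IO APIC
lock around the register accesses. The order of the two writes is a choice
of this sketch.

```c
#include <stdint.h>

/* Simulated indirect register file standing in for one IO-APIC. */
uint32_t fake_ioapic_regs[0x40];

uint32_t io_apic_read(unsigned int reg)
{
	return fake_ioapic_regs[reg];
}

void io_apic_write(unsigned int reg, uint32_t val)
{
	fake_ioapic_regs[reg] = val;
}

/* A 64-bit routing entry viewed as two 32-bit words (layout simplified). */
struct route_entry {
	uint32_t low;	/* vector, delivery mode, mask bit, ... */
	uint32_t high;	/* destination */
};

/* Redirection entries start at register 0x10, two registers per pin. */
struct route_entry ioapic_read_entry(unsigned int pin)
{
	struct route_entry e;

	e.low = io_apic_read(0x10 + 2 * pin);
	e.high = io_apic_read(0x11 + 2 * pin);
	return e;
}

void ioapic_write_entry(unsigned int pin, struct route_entry e)
{
	/* This sketch writes the high word first, then the low word. */
	io_apic_write(0x11 + 2 * pin, e.high);
	io_apic_write(0x10 + 2 * pin, e.low);
}
```

Callers then deal in whole entries, and the register arithmetic lives in one
place.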
Signed-off-by: Andi Kleen <ak@suse.de>
- Move them to a pure assembly file. Previously they were in
a C file that only consisted of inline assembly. Doing it in pure
assembler is much nicer.
- Add a frame.i include with FRAME/ENDFRAME macros to easily
add frame pointers to assembly functions
- Add dwarf2 annotation to them so that the new dwarf2 unwinder
doesn't get stuck on them
- Random cleanups
Includes feedback from Jan Beulich and a UML build fix from Andrew
Morton.
Cc: jbeulich@novell.com
Cc: jdike@addtoit.com
Signed-off-by: Andi Kleen <ak@suse.de>
This ports the algorithm from x86-64 (with improvements) to i386.
Previously this only worked for frame pointer enabled kernels.
But spinlocks have a very simple stack frame that can be manually
analyzed. Do this.
Signed-off-by: Andi Kleen <ak@suse.de>
For NUMA optimization and some other algorithms it is useful to have a fast
way to get the current CPU and node numbers in user space.
x86-64 added a fast way to do this in a vsyscall. This adds a generic
syscall for other architectures to make it a generic portable facility.
I expect some of them will also implement it as a faster vsyscall.
The cache is an optimization for the x86-64 vsyscall optimization. Since
what the syscall returns is an approximation anyways and user space
often wants very fast results it can be cached for some time. The normal
methods to get this information in user space are relatively slow.
The vsyscall is in a better position to manage the cache because it has direct
access to a fast time stamp (jiffies). For the generic syscall optimization
it doesn't help much, but enforce a valid argument to keep programs
portable.
I only added an i386 syscall entry for now. Other architectures can follow
as needed.
AK: Also added some cleanups from Andrew Morton
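The caching idea can be sketched as follows. This is a user-space model
under stated assumptions, not the vsyscall code: the cpu_cache layout, the
slow_getcpu stand-in (which returns fixed values) and the explicit "now"
timestamp parameter are all hypothetical. The cached answer is reused while
the coarse timestamp has not advanced, and refreshed otherwise.

```c
#include <stddef.h>

/* Hypothetical cache layout; callers should treat it as opaque. */
struct cpu_cache {
	unsigned long stamp;	/* timestamp of the cached value */
	unsigned int cpu;
	unsigned int node;
	int valid;
};

/* Counts slow lookups so the effect of the cache can be observed. */
unsigned int slow_lookups;

/* Stand-in for the slow path; returns fixed values in this sketch. */
void slow_getcpu(unsigned int *cpu, unsigned int *node)
{
	slow_lookups++;
	*cpu = 2;
	*node = 1;
}

/*
 * Return the CPU and node, reusing the cached result as long as the
 * coarse timestamp "now" matches the one stored in the cache.
 * cache may be NULL, in which case the slow path is always taken.
 */
void cached_getcpu(unsigned int *cpu, unsigned int *node,
		   struct cpu_cache *cache, unsigned long now)
{
	if (cache && cache->valid && cache->stamp == now) {
		*cpu = cache->cpu;
		*node = cache->node;
		return;
	}

	slow_getcpu(cpu, node);

	if (cache) {
		cache->stamp = now;
		cache->cpu = *cpu;
		cache->node = *node;
		cache->valid = 1;
	}
}
```

Because the result is only an approximation anyway (the task can migrate
right after the call), serving a value that is at most one tick old loses
little.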
Signed-off-by: Andi Kleen <ak@suse.de>
AK: This redoes the changes I temporarily reverted.
Intel now has support for Architectural Performance Monitoring Counters
(refer to the IA-32 Intel Architecture Software Developer's Manual,
http://www.intel.com/design/pentium4/manuals/253669.htm). This
feature is present starting from Intel Core Duo and Intel Core Solo processors.
What this means is, the performance monitoring counters and some performance
monitoring events are now defined in an architectural way (using cpuid).
And there will be no need to check for family/model etc for these architectural
events.
Below is the patch to use these performance counters in the nmi watchdog driver.
Patch handles both i386 and x86-64 kernels.
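For illustration, the cpuid-based detection can be sketched as a decoder for
CPUID leaf 0xA. The field positions follow the Intel manual's description of
that leaf; the structure and function names are hypothetical, and the
register values are passed in as plain integers rather than read with the
cpuid instruction, so this is a model and not the watchdog driver's code.

```c
#include <stdint.h>

/* Fields of CPUID leaf 0xA, register EAX. */
struct arch_perfmon_info {
	unsigned int version;		/* EAX[7:0]; 0 means not supported */
	unsigned int num_counters;	/* EAX[15:8]: counters per logical CPU */
	unsigned int bit_width;		/* EAX[23:16]: counter width in bits */
	unsigned int mask_length;	/* EAX[31:24]: valid bits in EBX */
};

struct arch_perfmon_info decode_cpuid_0a_eax(uint32_t eax)
{
	struct arch_perfmon_info info;

	info.version = eax & 0xff;
	info.num_counters = (eax >> 8) & 0xff;
	info.bit_width = (eax >> 16) & 0xff;
	info.mask_length = (eax >> 24) & 0xff;
	return info;
}

/*
 * EBX is a "not available" bit vector: bit 0 set means the unhalted
 * core cycles event is NOT available. The bit is only meaningful if
 * mask_length covers it.
 */
int has_unhalted_core_cycles(uint32_t eax, uint32_t ebx)
{
	struct arch_perfmon_info info = decode_cpuid_0a_eax(eax);

	return info.version > 0 && info.mask_length > 0 && !(ebx & 1u);
}
```

With this, a driver can decide whether an architectural event is usable
without a family/model table.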
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
I've had good experiences with having this on by default on x86-64.
It turns nasty hangs into easier to debug oopses.
Enable the local APIC wdog by default for systems newer than 2004.
This comes from a strange compromise: according to arjan the reason
it was off by default was some old IBM systems that corrupted
registers when an NMI happened in SMI. Can't remember anything more specific,
but >= 2004 should avoid these. It's probably overly broad
because most older systems should be ok (and the really old systems
won't be supported by the local apic watchdog anyways)
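A small sketch of the ">= 2004" cutoff, assuming the system's age is judged
from a BIOS date string of the form MM/DD/YYYY. That source of the year, and
the function names bios_year and nmi_watchdog_default, are assumptions for
illustration only. Dates that cannot be parsed are treated as old, so the
watchdog stays off for them.

```c
#include <stdlib.h>
#include <string.h>

/* Extract a four-digit year from "MM/DD/YYYY"; return 0 if unparsable. */
int bios_year(const char *date)
{
	const char *slash;
	int year;

	if (!date)
		return 0;

	slash = strrchr(date, '/');
	if (!slash)
		return 0;

	year = atoi(slash + 1);
	return year >= 1000 ? year : 0;
}

/* Return 1 if the local APIC watchdog should default to on. */
int nmi_watchdog_default(const char *bios_date)
{
	return bios_year(bios_date) >= 2004;
}
```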
Signed-off-by: Andi Kleen <ak@suse.de>
After a crash we should wait for NMI IPI event and not for external NMI or
NMI watchdog tick.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Make NMI suspend/resume work with SMP. We use CPU hotplug to offline
APs in SMP suspend/resume. Only BSP executes sysdev's .suspend/.resume
method. APs should follow CPU hotplug code path.
And:
From: Don Zickus <dzickus@redhat.com>
Makes the start/stop paths of nmi watchdog more robust to handle the
suspend/resume cases more gracefully.
AK: I merged the two patches together
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Clean up some of the output messages on the nmi error paths to make more
sense when they are displayed. This is mainly a cosmetic fix and
shouldn't impact any normal code path.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
To quote Alan Cox:
The default Linux behaviour on an NMI of either memory or unknown is to
continue operation. For many environments such as scientific computing
it is preferable that the box is taken out and the error dealt with than
an uncorrected parity/ECC error get propagated.
A small number of systems do generate NMI's for bizarre random reasons
such as power management so the default is unchanged. In other respects
the new proc/sys entry works like the existing panic controls already in
that directory.
This is separate to the edac support - EDAC allows supported chipsets to
handle ECC errors well, this change allows unsupported cases to at least
panic rather than cause problems further down the line.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Adds a new /proc/sys/kernel/nmi_watchdog entry that will enable/disable the
nmi watchdog.
By entering a non-zero value here, a user can enable the nmi watchdog to
monitor the online cpus in the system. By entering a zero value here, a
user can disable the nmi watchdog and free up a performance counter which
could then be utilized by the oprofile subsystem, otherwise oprofile may be
short a counter when in use.
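From user space, toggling the watchdog amounts to writing a number to that
file. A minimal sketch, assuming a process with permission to write the
entry; the helper names write_sysctl_int and set_nmi_watchdog are
hypothetical, and only the path comes from the description above.

```c
#include <stdio.h>

#define NMI_WATCHDOG_SYSCTL "/proc/sys/kernel/nmi_watchdog"

/*
 * Write an integer followed by a newline to a sysctl-style file.
 * Returns 0 on success, -1 on failure.
 */
int write_sysctl_int(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Non-zero enables the watchdog, zero disables it and frees the counter. */
int set_nmi_watchdog(int enable)
{
	return write_sysctl_int(NMI_WATCHDOG_SYSCTL, enable ? 1 : 0);
}
```

A profiling tool could call set_nmi_watchdog(0) before starting oprofile and
set_nmi_watchdog(1) afterwards, checking the return value in both cases.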
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>