In previous commit I used u32 for u16 register.
This code will work only when ACPI block address is set.
For now it is only for VT8235 and VT8237.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
WARNING: arch/x86_64/kernel/built-in.o(.text+0xace9): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: arch/x86_64/kernel/built-in.o(.text+0xad09): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: arch/x86_64/kernel/built-in.o(.text+0xad38): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: drivers/built-in.o(.text+0x3a680): Section mismatch: reference to .init.text:acpi_map_pxm_to_node (between 'acpi_get_node' and 'acpi_lock_ac_dir')
AK: also marked mtrr_bp_init __init to avoid some more warnings
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Disable CLFLUSH again; it is still broken. Always do WBINVD.
- Always flush in the i386 case, not only when there are deferred pages.
These are both brute-force inefficient fixes, to be improved
next release cycle.
The changes to i386 are a little more extensive than strictly
needed (some dead code added), but it is more similar to the x86-64 version
now and the dead code will be used soon.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Right now Kprobes cannot write to the write protected kernel text when
DEBUG_RODATA is enabled. Disallow this in Kconfig for now.
Temporary fix for 2.6.22. In .23 add code to temporarily
unprotect it.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Several reports that VIA bridges don't support DAC and corrupt
data. I don't know if it's fixed, but let's just blacklist
them all for now.
It can be overwritten with iommu=usedac
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix oops triggered during: echo 0 > /proc/sys/kernel/nmi_watchdog
The culprit seems to be 09198e6850:
[PATCH] i386: Clean up NMI watchdog code
In two places, the parameters to release_{evntsel,perfctr}_nmi
got interchanged during the cleanup.
Fix interchanged parameters to release_{evntsel,perfctr}_nmi.
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When disabled through /proc/sys/kernel/nmi_watchdog, the NMI watchdog uses the
stop() method directly, which does not decrement the activity counter, leading
to a BUG(). Use the wrapper function instead to fix that.
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At system boot time, the NMI watchdog no longer reserved its MSRs, allowing
other subsystems to mess with them. Fix that.
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix various bits of obviously-busted code which we're not happening to
compile, due to ifdefs.
Signed-off-by: Yoann Padioleau <padator@wanadoo.fr>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a minor fix, but what is currently there is essentially wrong.
In do_page_fault, if the faulting address from user code happens to be
in kernel address space (int *p = (int*)-1; p = 0xbed;) then the
do_page_fault handler will jump over the local_irq_enable with the
goto bad_area_nosemaphore;
But the first line there sees this is user code and goes through the
process of sending a signal to send SIGSEGV to the user task. This whole
time interrupts are disabled and the task can not be preempted by a
higher priority task.
This patch always enables interrupts in the user path of the
bad_area_nosemaphore.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
powernow-k8 really needs to use ACPI to function on SMP systems.
The current Kconfig allows us to build kernels which fail mysteriously
for some users due to us trying to automatically enable this, and
getting it wrong. It's easier to just present this as an option
to the user.
Signed-off-by: Joshua Hoblitt <jhoblitt@ifa.hawaii.edu>
Signed-off-by: Dave Jones <davej@redhat.com>
Current version of "bm status" bit test works as long as
no USB device is in use. When USB device is plugged in ACPI
function in this context is always returning 1. Until reboot.
Direct I/O is working fine even when many USB devices are
connected.
Change bm_timeout value to less annoying. 1000 is still much
more then worst case observed and it is much better when status
bit gets stuck.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Rafael gets this on an SMP box with kernel preemption enabled, during
hibernation and restore (100% of the time):
Enabling non-boot CPUs ...
BUG: using smp_processor_id() in preemptible [00000001] code: bash/4514
caller is mtrr_save_state+0x9/0x40
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the following section mismatch warnings in microcode.c:
WARNING: arch/i386/kernel/built-in.o(.init.text+0x3966): Section mismatch: reference to .exit.text: (between 'microcode_init' and 'parse_maxcpus')
WARNING: arch/i386/kernel/built-in.o(.init.text+0x3992): Section mismatch: reference to .exit.text: (between 'microcode_init' and 'parse_maxcpus')
The warning are caused by a function marked __init that
calls a function marked __exit.
Functions marked __exit may be discarded either during link or run-time
and thus the reference is not good.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Force Dell E520 to use the BIOS to shutdown/reboot.
I have at least one report that this patch fixes shutdown/reboot
problems on the Dell E520 platform.
(Andi says: People can always set the boot option. It hardly seems like a
critical issue needing a backport.)
Signed-off-by: Tim Gardner <tim.gardner@ubuntu.com>
Acked-by: Andi Kleen <ak@suse.de>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chuck reports that the recent fix from Andi to oprofile
6c977aad03 introduces a double free. Each
cpu's cpu_msrs is setup to point to cpu 0's, which causes free_msrs to free
cpu 0's pointers for_each_possible_cpu. Rather than copy the pointers, do
a deep copy instead.
[acme@redhat.com: allocate_msrs() was using for_each_online_cpu()]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Dave Jones <davej@redhat.com>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wolfgang gets:
PCI: Cannot allocate resource region 0 of device 0000:00:04.0
PCI: Error while updating region 0000:00:04.0/0 (a8008000 != fec08000)
Note that the BAR seems to have high address bits hardwired to fec00000.
And device 0000:00:04.0 is
00:04.0 System peripheral: Siemens Nixdorf AG FSC Multiprocessor Interrupt Controller (rev 02)
I'd guess that when we try to reassign this resource, PCI interrupts might
just stop working. This could explain SCSI timeouts and other weird things.
Cc: Wolfgang Erig <Wolfgang.Erig@gmx.de>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jarek Poplawski noted that boot_cpu_data.x86_cache_size is signed int
and can be < 0 too.
In fact we test for it. Except we assigned it to an unsigned value..
Cc: Jarek Poplawski <jarkao2@o2.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove duplicate multipliers in clock_ratio table. On 1,4GHz
Nehemiah two frequencies are present twice in table. It isn't
fatal, but with voltage scaling enabled each will be set twice.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Longhaul with voltage scaling enabled works great on Ezra
CPU (Longhaul ver. 2). As long as "conservative" governor is
used. Both "ondemand" and "userspace" can change voltage
from min to max at once. Motherboard unfortunatly turns off
when vid difference is big. Longhaul was printing warning
message, but it is not enough. Now driver will have
"conservative" governor built in and will split bigger
changes to smaller ones.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
During recent acpi-cpufreq changes, writing to PERF_CTL msr
changed from RMW of entire 64 bit to RMW of low 32 bit and clearing of
upper 32 bit. Fix it back to do a proper RMW of the MSR.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
It is good idea to wait for PCI bus to become idle before
frequency change. Thanks to ACPI it is possible. It makes
sense only when northbridge support is in use because it is
only case in which we can disable arbiter after check if PCI
bus is busy.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Move one line where it should be. After first transition
Longhaul will skip frequency transition if destination
frequency is already set.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Looks like VT8237 has the same bits which VT8235 has.
Poke registers if it is found.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
This patch is removing southbridge support as separate
kind of support. Instead it is used to make other kinds
of support more stable. Also northbridge and ACPI C3
support both will be used if both are available.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
WARNING: arch/i386/mach-generic/built-in.o(.data+0xc4): Section mismatch: reference to .init.text: (between 'apic_bigsmp' and 'cpu.4905')
This appears to be resolvable by removing all the __init and __initdata
qualifiers from arch/i386/mach-generic/bigsmp.c
Signed-off-by: William Irwin <bill.irwin@oracle.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's not sufficiently documented that CONFIG_HOTPLUG_CPU is required for
suspend/hibernation on SMP.
Point out the non-obvious.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fix:
mm/slab: fix section mismatch warning
mm: fix section mismatch warnings
init/main: use __init_refok to fix section mismatch
kbuild: introduce __init_refok/__initdata_refok to supress section mismatch warnings
all-archs: consolidate .data section definition in asm-generic
all-archs: consolidate .text section definition in asm-generic
kbuild: add "Section mismatch" warning whitelist for powerpc
kbuild: make better section mismatch reports on i386, arm and mips
kbuild: make modpost section warnings clearer
kconfig: search harder for curses library in check-lxdialog.sh
kbuild: include limits.h in sumversion.c for PATH_MAX
powerpc: Fix the MODALIAS generation in modpost for of devices
cr4 is a 32-bit register, so casting the mask to an unsigned char is wrong,
as it clears more than the PGE bit.
Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix boot failures with the early CPUID checking on VIA C3
Includes fixes from Christian Volkmann
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- boot/setup.S did not print "PANIC: CPU too old for this kernel"
( not visible, also the message did not match )
- I add "# missed before: set ds"
=> somebody should check if I am right with the way to set.
=> seems to be a generic error in setup.S not to set "ds" for error messages.
AK: extracted patch out of other changes
AK: also couldn't find any other case where ds is wrong
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It reports machine check capability in CPUID, but doesn't actually
implement all the necessary MSRs of the standard Intel machine
check architecture.
This fixes a boot failure on K6s recently introduced.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Only try to allocate MSRs once instead of for every CPU.
This assumes the MSRs are the same on all CPUs which is currently
true. P4-HT is a special case for different SMT threads, but the code
always saves/restores all MSRs so it works identical.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.
This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
getting them indirectly
Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
they don't need sched.h
b) sched.h stops being dependency for significant number of files:
on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
after patch it's only 3744 (-8.3%).
Cross-compile tested on
all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
alpha alpha-up
arm
i386 i386-up i386-defconfig i386-allnoconfig
ia64 ia64-up
m68k
mips
parisc parisc-up
powerpc powerpc-up
s390 s390-up
sparc sparc-up
sparc64 sparc64-up
um-x86_64
x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
as well as my two usual configs.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Indicate number of processors and cores more cleanly
in startup messages.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
This reverts commit c8fdd24725.
It turns out the kernel was correct, and the gcc complaint was a gcc
bug. The preferred stack boundary is expressed not in bytes, but in the
the log2() of the preferred boundary, so "-mpreferred-stack-boundary=2"
is in fact exactly what we want, but a gcc that is compiled for x86-64
will consider it an error (because the 64-bit calling sequence says that
the stack should be 16-byte aligned) even if we are then using "-m32" to
generate 32-bit code.
Noted-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: Jan Hubicka <jh@suse.cz>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No other architecture calls check_pgt_cache() from within flush_tlb_mm(),
and i386 is already calling check_pgt_cache() from the usual places,
tlb_finish_mmu() and cpu_idle() (the latter being odd, but not unusual).
flush_tlb_mm() has no business to be freeing pages: remove that line, which
sneaked in with slub's i386 support.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Andi Kleen <ak@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to
.init.text:mtrr_bp_init from .text between 'id entify_cpu' (at offset 0x6571)
and 'IRQ0x20_interrupt'
It's because identify_cpu() which is __cpuinit calls mtrr_bp_init() which is
__init(). __cpuinit() expands to nothing if CONFIG_HOTPLUG_CPU=y and so the
call is illegal.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's not necessarily a very sane configuration, but people running "make
randconfig" noticed it wouldn't compile. This fixes some obvious
problems in discontig.c to allow a clean compile.
Acked-by: andrew hendry <andrew.hendry@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] Correct revision mask for powernow-k8
[CPUFREQ] powernow-k7: fix MHz rounding issue with perflib
[CPUFREQ] Support rev H AMD64s in powernow-k8
This adds an smp_ops for voyager, and hooks things up appropriately. This is
the first baby-step to making subarch runtime switchable.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Several parts of kernel/smp.c and smpboot.c are generally useful for other
subarchitectures and paravirt_ops implementations, so make them available for
reuse.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Langsdorf points out that the correct define for this
revision bump is 0x80000. Also to save us having to keep
renaming the #define, give it a more meaningful name.
Signed-off-by: Dave Jones <davej@redhat.com>
This reverts commit f64da958df.
Andi Kleen is unhappy with the changes, and they really do not seem
worth it. IPMI could use DIE_NMI_IPI instead of the new callback, even
though that ends up having its own set of problems too, mainly because
the IPMI code cannot really know the NMI was from IPMI or not.
Manually fix up conflicts in arch/x86_64/kernel/traps.c and
drivers/char/ipmi/ipmi_watchdog.c.
Cc: Andi Kleen <ak@suse.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Corey Minyard <minyard@acm.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the PST tables are broken, powernow-k7 uses ACPI's processor_perflib to
deduce the available frequency multipliers from the _PSS tables.
Upon frequency change, processor_perflib performs some verification on the
frequency (checks that it's within allowable bounds).
powernow-k7 deals with absolute frequencies in KHz, whereas perflib only
deals with MHz values. When performing the above verification, perflib
multiplies the MHz values by 1000 to obtain the KHz value.
We then end up with situations like the following:
- powernow-k7 multiplies the multiplier by the FSB, and obtains a value
such as 1266768 KHz
- perflib belives the same state has frequency of 1266 MHz
- acpi_processor_ppc_notifier calls cpufreq_verify_within_limits to verify
that 1266768 is in the allowable range of 0 to 1266000 (i.e. 1266 * 1000)
- it's not, so that frequency is rejected
- the maximum CPU frequency is not reachable
This patch solves the problem by rounding up the MHz values stored in perflib's
tables. Additionally it corrects a broken URL.
It also fixes http://bugzilla.kernel.org/show_bug.cgi?id=8255 although this
case is a bit different: the frequencies in the _PSS tables are wildly wrong,
but we get better results if we force ACPI to respect the fsb * multiplier
calculations (even though it seems that the multiplier values aren't entirely
correct either).
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Dave Jones <davej@redhat.com>
SLUB cannot run on i386 at this point because i386 uses the page->private and
page->index field of slab pages for the pgd cache.
Make SLUB run on i386 by replacing the pgd slab cache with a quicklist.
Limit the changes as much as possible. Leave the improvised linked list in place
etc etc. This has been working here for a couple of weeks now.
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch wires the eventfd system call to the x86 architectures.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch wires the timerfd system call to the x86 architectures.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch wires the signalfd system call to the x86 architectures.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit c9ccf30d77.
Entering the kernel at startup_32 without passing our real mode data in
%esi, and without guaranteeing that physical and virtual addresses are
identity mapped makes head.S impossible to maintain.
The only user of this infrastructure is lguest which is not merged so
nothing we currently support will break by removing this over designed
nightmare, and only the pending lguest patches will be affected. The
pending Xen patches have a different entry point that they use.
We are currently discussing what Xen and lguest need to do to boot the
kernel in a more normal fashion so using startup_32 in this weird manner is
clearly not their long term direction.
So let's remove this code in head.S before it causes brain damage to people
trying to maintain head.S
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Zachary Amsden <zach@vmware.com>
CC: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
sound: convert "sound" subdirectory to UTF-8
MAINTAINERS: Add cxacru website/mailing list
include files: convert "include" subdirectory to UTF-8
general: convert "kernel" subdirectory to UTF-8
documentation: convert the Documentation directory to UTF-8
Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
remove broken URLs from net drivers' output
Magic number prefix consistency change to Documentation/magic-number.txt
trivial: s/i_sem /i_mutex/
fix file specification in comments
drivers/base/platform.c: fix small typo in doc
misc doc and kconfig typos
Remove obsolete fat_cvf help text
Fix occurrences of "the the "
Fix minor typoes in kernel/module.c
Kconfig: Remove reference to external mqueue library
Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
Correct comments in genrtc.c to refer to correct /proc file.
Fix more "deprecated" spellos.
Fix "deprecated" typoes.
...
Fix trivial comment conflict in kernel/relay.c.
The definition of USER686 is supposed to be a mask of feature bits,
not an OR of feature numbers! It happened to work anyway on the only
processor affected, simply by pure coincidence. Fix.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace automatic variable instances of __attribute__((unused)) with
__maybe_unused in mca_nmi_hook().
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the new macro here
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recently a few direct accesses to the thread_info in the task structure snuck
back, so this wraps them with the appropriate wrapper.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make the microcode driver use the suspend-related CPU hotplug notifications
to handle the CPU hotplug events occuring during system-wide suspend and
resume transitions. Remove the global variable suspend_cpu_hotplug
previously used for this purpose.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress. This
patch introduces such notifications and causes them to be used during
suspend and resume transitions. It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).
[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
hard_smp_processor_id used to be just a macro that hard-coded
hard_smp_processor_id to 0 in the non SMP case. When booting non SMP kernels
on hardware where the boot ioapic id is not 0 this turns out to be a problem.
This is happens frequently in the case of kdump and once in a great while in
the case of real hardware.
Use the APIC to determine the hardware processor id in both UP and SMP kernels
to fix this issue.
Notice that hard_smp_processor_id is only used by SMP code or by code that
works with apics so we do not need to handle the case when apics are not
present and hard_smp_processor_id should never be called there.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many files include the filename at the beginning, serveral used a wrong one.
Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Fix a couple grammatical errors in arch/i386/Kconfig.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
This reverts commit 464bdd33e9.
Peter Anvin correctly points out that VESA modes have nothing to do with
frame buffers per se - they are often just regular extended text modes.
Disabling them just because we don't have frame buffer support is very
wrong.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Antonino A. Daplas <adaplas@gmail.com>,
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6: (32 commits)
Use menuconfig objects - hwmon
hwmon/smsc47b397: Use dynamic sysfs callbacks
hwmon/smsc47b397: Convert to a platform driver
hwmon/w83781d: Deprecate W83627HF support
hwmon/w83781d: Use dynamic sysfs callbacks
hwmon/w83781d: Be less i2c_client-centric
hwmon/w83781d: Clean up conversion macros
hwmon/w83781d: No longer use i2c-isa
hwmon/ams: Do not print error on systems without apple motion sensor
hwmon/ams: Fix I2C read retry logic
hwmon: New AD7416, AD7417 and AD7418 driver
hwmon/coretemp: Add documentation
hwmon: New coretemp driver
i386: Use functions from library in msr driver
i386: Add safe variants of rdmsr_on_cpu and wrmsr_on_cpu
hwmon/lm75: Use dynamic sysfs callbacks
hwmon/lm78: Use dynamic sysfs callbacks
hwmon/lm78: Be less i2c_client-centric
hwmon/lm78: No longer use i2c-isa
hwmon: New max6650 driver
...
If the option vga=<VESA graphics mode> is added to the boot parameter, it will
activate graphics mode, but without any framebuffer support, the user is left
with an unusable display.
Change the behavior such that the user is instead prompted for another mode
(ala vga=ask).
NOTE: People can always use vbetool to set a graphics mode if this is really
desired, but the number of people doing this approaches zero.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make x86 COM ports into platform devices and don't probe for them
if we have PNP.
This prevents double discovery, where a device was found both by
the legacy probe and by 8250_pnp, e.g.,
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:02: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
This also means IRDA devices without a UART PNP ID will no longer be
claimed by the serial driver, which might require changes in IRDA
drivers and administration.
In addition to this patch, you may need to configure a setserial init
script, e.g., /etc/init.d/setserial, so it doesn't poke legacy UART
stuff back in. On Debian, "dpkg-reconfigure setserial" with the "kernel"
option does this.
To force the old legacy probe behavior even when we have PNPBIOS or
ACPI, load the new legacy_serial module (or build 8250 static) with
the "legacy_serial.force" option.
[akpm@linux-foundation.org: fix makefiles]
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Keith Owens <kaos@ocs.com.au>
Cc: Len Brown <lenb@kernel.org>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Matthieu CASTET <castet.matthieu@free.fr>
Cc: Jean Tourrilhes <jt@hpl.hp.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Ville Syrjala <syrjala@sci.fi>
Cc: Russell King <rmk+serial@arm.linux.org.uk>
Cc: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch provides a debugfs knob to turn kprobes on/off
o A new file /debug/kprobes/enabled indicates if kprobes is enabled or
not (default enabled)
o Echoing 0 to this file will disarm all installed probes
o Any new probe registration when disabled will register the probe but
not arm it. A message will be printed out in such a case.
o When a value 1 is echoed to the file, all probes (including ones
registered in the intervening period) will be enabled
o Unregistration will happen irrespective of whether probes are globally
enabled or not.
o Update Documentation/kprobes.txt to reflect these changes. While there
also update the doc to make it current.
We are also looking at providing sysrq key support to tie to the disabling
feature provided by this patch.
[akpm@linux-foundation.org: Use bool like a bool!]
[akpm@linux-foundation.org: add printk facility levels]
[cornelia.huck@de.ibm.com: Add the missing arch_trampoline_kprobe() for s390]
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- consolidate duplicate code in all arch_prepare_kretprobe instances
into common code
- replace various odd helpers that use hlist_for_each_entry to get
the first elemenet of a list with either a hlist_for_each_entry_save
or an opencoded access to the first element in the caller
- inline add_rp_inst into it's only remaining caller
- use kretprobe_inst_table_head instead of opencoding it
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement utimensat(2) which is an extension to futimesat(2) in that it
a) supports nano-second resolution for the timestamps
b) allows to selectively ignore the atime/mtime value
c) allows to selectively use the current time for either atime or mtime
d) supports changing the atime/mtime of a symlink itself along the lines
of the BSD lutimes(3) functions
For this change the internally used do_utimes() functions was changed to
accept a timespec time value and an additional flags parameter.
Additionally the sys_utime function was changed to match compat_sys_utime
which already use do_utimes instead of duplicating the work.
Also, the completely missing futimensat() functionality is added. We have
such a function in glibc but we have to resort to using /proc/self/fd/* which
not everybody likes (chroot etc).
Test application (the syscall number will need per-arch editing):
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <sys/time.h>
#include <stddef.h>
#include <syscall.h>
#define __NR_utimensat 280
#define UTIME_NOW ((1l << 30) - 1l)
#define UTIME_OMIT ((1l << 30) - 2l)
int
main(void)
{
int status = 0;
int fd = open("ttt", O_RDWR|O_CREAT|O_EXCL, 0666);
if (fd == -1)
error (1, errno, "failed to create test file \"ttt\"");
struct stat64 st1;
if (fstat64 (fd, &st1) != 0)
error (1, errno, "fstat failed");
struct timespec t[2];
t[0].tv_sec = 0;
t[0].tv_nsec = 0;
t[1].tv_sec = 0;
t[1].tv_nsec = 0;
if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
error (1, errno, "utimensat failed");
struct stat64 st2;
if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");
if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
{
puts ("atim not reset to zero");
status = 1;
}
if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
{
puts ("mtim not reset to zero");
status = 1;
}
if (status != 0)
goto out;
t[0] = st1.st_atim;
t[1].tv_sec = 0;
t[1].tv_nsec = UTIME_OMIT;
if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
error (1, errno, "utimensat failed");
if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");
if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
|| st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
{
puts ("atim not set");
status = 1;
}
if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
{
puts ("mtim changed from zero");
status = 1;
}
if (status != 0)
goto out;
t[0].tv_sec = 0;
t[0].tv_nsec = UTIME_OMIT;
t[1] = st1.st_mtim;
if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
error (1, errno, "utimensat failed");
if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");
if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
|| st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
{
puts ("mtim changed from original time");
status = 1;
}
if (st2.st_mtim.tv_sec != st1.st_mtim.tv_sec
|| st2.st_mtim.tv_nsec != st1.st_mtim.tv_nsec)
{
puts ("mtim not set");
status = 1;
}
if (status != 0)
goto out;
sleep (2);
t[0].tv_sec = 0;
t[0].tv_nsec = UTIME_NOW;
t[1].tv_sec = 0;
t[1].tv_nsec = UTIME_NOW;
if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
error (1, errno, "utimensat failed");
if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");
struct timeval tv;
gettimeofday(&tv,NULL);
if (st2.st_atim.tv_sec <= st1.st_atim.tv_sec
|| st2.st_atim.tv_sec > tv.tv_sec)
{
puts ("atim not set to NOW");
status = 1;
}
if (st2.st_mtim.tv_sec <= st1.st_mtim.tv_sec
|| st2.st_mtim.tv_sec > tv.tv_sec)
{
puts ("mtim not set to NOW");
status = 1;
}
if (symlink ("ttt", "tttsym") != 0)
error (1, errno, "cannot create symlink");
t[0].tv_sec = 0;
t[0].tv_nsec = 0;
t[1].tv_sec = 0;
t[1].tv_nsec = 0;
if (syscall(__NR_utimensat, AT_FDCWD, "tttsym", t, AT_SYMLINK_NOFOLLOW) != 0)
error (1, errno, "utimensat failed");
if (lstat64 ("tttsym", &st2) != 0)
error (1, errno, "lstat failed");
if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
{
puts ("symlink atim not reset to zero");
status = 1;
}
if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
{
puts ("symlink mtim not reset to zero");
status = 1;
}
if (status != 0)
goto out;
t[0].tv_sec = 1;
t[0].tv_nsec = 0;
t[1].tv_sec = 1;
t[1].tv_nsec = 0;
if (syscall(__NR_utimensat, fd, NULL, t, 0) != 0)
error (1, errno, "utimensat failed");
if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");
if (st2.st_atim.tv_sec != 1 || st2.st_atim.tv_nsec != 0)
{
puts ("atim not reset to one");
status = 1;
}
if (st2.st_mtim.tv_sec != 1 || st2.st_mtim.tv_nsec != 0)
{
puts ("mtim not reset to one");
status = 1;
}
if (status == 0)
puts ("all OK");
out:
close (fd);
unlink ("ttt");
unlink ("tttsym");
return status;
}
[akpm@linux-foundation.org: add missing i386 syscall table entry]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Cc: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dma_declare_coherent_memory() allocates a bitmap 1 bit per page, it
calculates the bitmap size based on size of long, but allocates bytes...
Thanks to James Bottomley for clarifications and corrections.
Signed-off-by: G. Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
HZ has not always been 100Hz for some time.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We used to warn unless the EFI system table major revision was exactly 1.
But EFI 2.00 firmware is starting to appear, and the 2.00 changes don't
affect anything in Linux.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In certain cases like when the real return address can't be found or when
the number of tracked calls to a kretprobed function is less than the
number of returns, we may not be able to find the correct return address
after processing a kretprobe. Currently we just do a BUG_ON, but no
information is provided about the actual failing kretprobe.
Print out details of the kretprobe before calling BUG().
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.
Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch moves the die notifier handling to common code. Previous
various architectures had exactly the same code for it. Note that the new
code is compiled unconditionally, this should be understood as an appel to
the other architecture maintainer to implement support for it aswell (aka
sprinkling a notify_die or two in the proper place)
arm had a notifiy_die that did something totally different, I renamed it to
arm_notify_die as part of the patch and made it static to the file it's
declared and used at. avr32 used to pass slightly less information through
this interface and I brought it into line with the other architectures.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
[bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert over to the new NMI handling for getting IPMI watchdog timeouts via an
NMI. This add config options to know if there is the ability to receive NMIs
and if it has an NMI post processing call. Then it modifies the IPMI watchdog
to take advantage of this so that it can know if an NMI comes in.
It also adds testing that the IPMI NMI watchdog works.
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use SLAB_PANIC and delete duplicated panic().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Ian Molton <spyro@f2s.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use safe MSR functions provided by arch/*/lib/msr-on-cpu.c in
arch/i386/kernel/msr.c.
Signed-off-by: Nicolas Boichat <nicolas@boichat.ch>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Add safe (exception handled) variants of rdmsr_on_cpu and wrmsr_on_cpu.
You should use these when the target MSR may not actually exist, as
doing so could trigger an exception which the regular functions do not
handle. The safe variants are slower, though.
The upcoming coretemp hardware monitoring driver will need this.
Signed-off-by: Rudolf Marek <r.marek@assembler.cz>
Cc: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
This is a new slab allocator which was motivated by the complexity of the
existing code in mm/slab.c. It attempts to address a variety of concerns
with the existing implementation.
A. Management of object queues
A particular concern was the complex management of the numerous object
queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
each allocating CPU and use objects from a slab directly instead of
queueing them up.
B. Storage overhead of object queues
SLAB Object queues exist per node, per CPU. The alien cache queue even
has a queue array that contain a queue for each processor on each
node. For very large systems the number of queues and the number of
objects that may be caught in those queues grows exponentially. On our
systems with 1k nodes / processors we have several gigabytes just tied up
for storing references to objects for those queues This does not include
the objects that could be on those queues. One fears that the whole
memory of the machine could one day be consumed by those queues.
C. SLAB meta data overhead
SLAB has overhead at the beginning of each slab. This means that data
cannot be naturally aligned at the beginning of a slab block. SLUB keeps
all meta data in the corresponding page_struct. Objects can be naturally
aligned in the slab. F.e. a 128 byte object will be aligned at 128 byte
boundaries and can fit tightly into a 4k page with no bytes left over.
SLAB cannot do this.
D. SLAB has a complex cache reaper
SLUB does not need a cache reaper for UP systems. On SMP systems
the per CPU slab may be pushed back into partial list but that
operation is simple and does not require an iteration over a list
of objects. SLAB expires per CPU, shared and alien object queues
during cache reaping which may cause strange hold offs.
E. SLAB has complex NUMA policy layer support
SLUB pushes NUMA policy handling into the page allocator. This means that
allocation is coarser (SLUB does interleave on a page level) but that
situation was also present before 2.6.13. SLABs application of
policies to individual slab objects allocated in SLAB is
certainly a performance concern due to the frequent references to
memory policies which may lead a sequence of objects to come from
one node after another. SLUB will get a slab full of objects
from one node and then will switch to the next.
F. Reduction of the size of partial slab lists
SLAB has per node partial lists. This means that over time a large
number of partial slabs may accumulate on those lists. These can
only be reused if allocator occur on specific nodes. SLUB has a global
pool of partial slabs and will consume slabs from that pool to
decrease fragmentation.
G. Tunables
SLAB has sophisticated tuning abilities for each slab cache. One can
manipulate the queue sizes in detail. However, filling the queues still
requires the uses of the spin lock to check out slabs. SLUB has a global
parameter (min_slab_order) for tuning. Increasing the minimum slab
order can decrease the locking overhead. The bigger the slab order the
less motions of pages between per CPU and partial lists occur and the
better SLUB will be scaling.
G. Slab merging
We often have slab caches with similar parameters. SLUB detects those
on boot up and merges them into the corresponding general caches. This
leads to more effective memory use. About 50% of all caches can
be eliminated through slab merging. This will also decrease
slab fragmentation because partial allocated slabs can be filled
up again. Slab merging can be switched off by specifying
slub_nomerge on boot up.
Note that merging can expose heretofore unknown bugs in the kernel
because corrupted objects may now be placed differently and corrupt
differing neighboring objects. Enable sanity checks to find those.
H. Diagnostics
The current slab diagnostics are difficult to use and require a
recompilation of the kernel. SLUB contains debugging code that
is always available (but is kept out of the hot code paths).
SLUB diagnostics can be enabled via the "slab_debug" option.
Parameters can be specified to select a single or a group of
slab caches for diagnostics. This means that the system is running
with the usual performance and it is much more likely that
race conditions can be reproduced.
I. Resiliency
If basic sanity checks are on then SLUB is capable of detecting
common error conditions and recover as best as possible to allow the
system to continue.
J. Tracing
Tracing can be enabled via the slab_debug=T,<slabcache> option
during boot. SLUB will then protocol all actions on that slabcache
and dump the object contents on free.
K. On demand DMA cache creation.
Generally DMA caches are not needed. If a kmalloc is used with
__GFP_DMA then just create this single slabcache that is needed.
For systems that have no ZONE_DMA requirement the support is
completely eliminated.
L. Performance increase
Some benchmarks have shown speed improvements on kernbench in the
range of 5-10%. The locking overhead of slub is based on the
underlying base allocation size. If we can reliably allocate
larger order pages then it is possible to increase slub
performance much further. The anti-fragmentation patches may
enable further performance increases.
Tested on:
i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator
SLUB Boot options
slub_nomerge Disable merging of slabs
slub_min_order=x Require a minimum order for slab caches. This
increases the managed chunk size and therefore
reduces meta data and locking overhead.
slub_min_objects=x Mininum objects per slab. Default is 8.
slub_max_order=x Avoid generating slabs larger than order specified.
slub_debug Enable all diagnostics for all caches
slub_debug=<options> Enable selective options for all caches
slub_debug=<o>,<cache> Enable selective options for a certain set of
caches
Available Debug options
F Double Free checking, sanity and resiliency
R Red zoning
P Object / padding poisoning
U Track last free / alloc
T Trace all allocs / frees (only use for individual slabs).
To use SLUB: Apply this patch and then select SLUB as the default slab
allocator.
[hugh@veritas.com: fix an oops-causing locking error]
[akpm@linux-foundation.org: various stupid cleanups and small fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This was broken. It adds complexity, for no good reason. Rather than
separate __pa() and __pa_symbol(), we should deprecate __pa_symbol(),
and preferably __pa() too - and just use "virt_to_phys()" instead, which
is more readable and has nicer semantics.
However, right now, just undo the separation, and make __pa_symbol() be
the exact same as __pa(). That fixes the bugs this patch introduced,
and we can do the fairly obvious cleanups later.
Do the new __phys_addr() function (which is now the actual workhorse for
the unified __pa()/__pa_symbol()) as a real external function, that way
all the potential issues with compile/link-time optimizations of
constant symbol addresses go away, and we can also, if we choose to, add
more sanity-checking of the argument.
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (231 commits)
[PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall
[PATCH] i386: type may be unused
[PATCH] i386: Some additional chipset register values validation.
[PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split.
[PATCH] x86-64: Don't exclude asm-offsets.c in Documentation/dontdiff
[PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu
[PATCH] i386: white space fixes in i387.h
[PATCH] i386: Drop noisy e820 debugging printks
[PATCH] x86-64: Fix allnoconfig error in genapic_flat.c
[PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems
[PATCH] x86-64: Share identical video.S between i386 and x86-64
[PATCH] x86-64: Remove CONFIG_REORDER
[PATCH] x86-64: Print type and size correctly for unknown compat ioctls
[PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0)
[PATCH] i386: Little cleanups in smpboot.c
[PATCH] x86-64: Don't enable NUMA for a single node in K8 NUMA scanning
[PATCH] x86: Use RDTSCP for synchronous get_cycles if possible
[PATCH] i386: Add X86_FEATURE_RDTSCP
[PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386
[PATCH] i386: Implement alternative_io for i386
...
Fix up trivial conflict in include/linux/highmem.h manually.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/voyager-2.6:
[VOYAGER] add smp alternatives
[VOYAGER] Use modern techniques to setup and teardown low identiy mappings.
[VOYAGER] Convert the monitor thread to use the kthread API
[VOYAGER] clockevents driver: bring voyager in to line
[VOYAGER] clockevents: correct boot cpu is zero assumption
[VOYAGER] add smp_call_function_single
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (59 commits)
PCI: Free resource files in error path of pci_create_sysfs_dev_files()
pci-quirks: disable MSI on RS400-200 and RS480
PCI hotplug: Use menuconfig objects
PCI: ZT5550 CPCI Hotplug driver fix
PCI: rpaphp: Remove semaphores
PCI: rpaphp: Ensure more pcibios_add/pcibios_remove symmetry
PCI: rpaphp: Use pcibios_remove_pci_devices() symmetrically
PCI: rpaphp: Document is_php_dn()
PCI: rpaphp: Document find_php_slot()
PCI: rpaphp: Rename rpaphp_register_pci_slot() to rpaphp_enable_slot()
PCI: rpaphp: refactor tail call to rpaphp_register_slot()
PCI: rpaphp: remove rpaphp_set_attention_status()
PCI: rpaphp: remove print_slot_pci_funcs()
PCI: rpaphp: Remove setup_pci_slot()
PCI: rpaphp: remove a call that does nothing but a pointer lookup
PCI: rpaphp: Remove another wrappered function
PCI: rpaphp: Remve another call that is a wrapper
PCI: rpaphp: remove a function that does nothing but wrap debug printks
PCI: rpaphp: Remove un-needed goto
PCI: rpaphp: Fix a memleak; slot->location string was never freed
...
Add more information to PCI resource collision message
to help with debugging.
Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
set_irq_msi() currently connects an irq_desc to an msi_desc. The archs call
it at some point in their setup routine, and then the generic code sets up the
reverse mapping from the msi_desc back to the irq.
set_irq_msi() should do both connections, making it the one and only call
required to connect an irq with it's MSI desc and vice versa.
The arch code MUST call set_irq_msi(), and it must do so only once it's sure
it's not going to fail the irq allocation.
Given that there's no need for the arch to return the irq anymore, the return
value from the arch setup routine just becomes 0 for success and anything else
for failure.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Allows architectures to advertise that they support MSI rather than listing
each architecture as a PCI_MSI dependency.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
At one time, if a BIOS ROM shadow was detected for the boot video
device (stored at offset 0xc0000), we'd set a special resource flag,
IORESOURCE_ROM_SHADOW, so that the sysfs ROM file code could handle
it properly. That broke along the way somewhere though, so current
kernels will be missing 'rom' files in sysfs if the video device
doesn't have an explicit ROM BAR.
This patch fixes the regression by moving the video fixup quirk to a
little later in the boot cycle (to avoid having its work undone by
PCI resource allocation) and checking in the PCI sysfs code whether
a rom file should be created due to a shadow resource, which is also
moved to a little later in the boot cycle so it will occur after the
video fixup. Tested and works on my i386 test box.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I noticed that many source files include <linux/pci.h> while they do
not appear to need it. Here is an attempt to clean it all up.
In order to find all possibly affected files, I searched for all
files including <linux/pci.h> but without any other occurence of "pci"
or "PCI". I removed the include statement from all of these, then I
compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
false positives manually.
My tests covered 66% of the affected files, so there could be false
positives remaining. Untested files are:
arch/alpha/kernel/err_common.c
arch/alpha/kernel/err_ev6.c
arch/alpha/kernel/err_ev7.c
arch/ia64/sn/kernel/huberror.c
arch/ia64/sn/kernel/xpnet.c
arch/m68knommu/kernel/dma.c
arch/mips/lib/iomap.c
arch/powerpc/platforms/pseries/ras.c
arch/ppc/8260_io/enet.c
arch/ppc/8260_io/fcc_enet.c
arch/ppc/8xx_io/enet.c
arch/ppc/syslib/ppc4xx_sgdma.c
arch/sh64/mach-cayman/iomap.c
arch/xtensa/kernel/xtensa_ksyms.c
arch/xtensa/platform-iss/setup.c
drivers/i2c/busses/i2c-at91.c
drivers/i2c/busses/i2c-mpc.c
drivers/media/video/saa711x.c
drivers/misc/hdpuftrs/hdpu_cpustate.c
drivers/misc/hdpuftrs/hdpu_nexus.c
drivers/net/au1000_eth.c
drivers/net/fec_8xx/fec_main.c
drivers/net/fec_8xx/fec_mii.c
drivers/net/fs_enet/fs_enet-main.c
drivers/net/fs_enet/mac-fcc.c
drivers/net/fs_enet/mac-fec.c
drivers/net/fs_enet/mac-scc.c
drivers/net/fs_enet/mii-bitbang.c
drivers/net/fs_enet/mii-fec.c
drivers/net/ibm_emac/ibm_emac_core.c
drivers/net/lasi_82596.c
drivers/parisc/hppb.c
drivers/sbus/sbus.c
drivers/video/g364fb.c
drivers/video/platinumfb.c
drivers/video/stifb.c
drivers/video/valkyriefb.c
include/asm-arm/arch-ixp4xx/dma.h
sound/oss/au1550_ac97.c
I would welcome test reports for these files. I am fine with removing
the untested files from the patch if the general opinion is that these
changes aren't safe. The tested part would still be nice to have.
Note that this patch depends on another header fixup patch I submitted
to LKML yesterday:
[PATCH] scatterlist.h needs types.h
http://lkml.org/lkml/2007/3/01/141
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In arch/i386/cpu/common.c there is:
cpu_devs[X86_VENDOR_INTEL]
cpu_devs[X86_VENDOR_CYRIX]
cpu_devs[X86_VENDOR_AMD]
...
They are all filled with data early.
The data (struct) got set to NULL for all, but Intel in different
late_initcall (exit_cpu_vendor) calls.
I don't see what sense this makes at all, maybe something that got
forgotten with the HOTPLUG_CPU extenstions?
Please check/review whether initdata, cpuinitdata is still ok and this
still works with HOTPLUG_CPU and without, it should...
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: davej@redhat.com
In the case of !CONFIG_PCI_DIRECT && !CONFIG_PCI_MMCONFIG, type is
unreferened.
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
On i945, a mmconfig range hitting the f0000000-ffffffff zone conflicts
with the APIC registers and others. Consider it invalid.
On E7520, values 0000 and f000 for the window register are defined
invalid in the documentation.
I haven't seen a bios use these values, but who trusts biosen these
days?
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: Andi Kleen <ak@suse.de>
arch/i386/pci/mmconfig-shared.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)