David Witbrodt tracked down (and bisected) a hpet bootup hang on his
system to the following problem: a BIOS bug made the hpet device
visible as a generic PCI device. If e820 reserved entries happen to
be registered first in the resource tree [which v2.6.26 started doing],
then the PCI code will reallocate that device's BAR to some other
address - breaking timer IRQs and hanging the system.
( Normally hpet devices are hidden by the BIOS from the OS's PCI
discovery via chipset magic. Sometimes the hpet is not a PCI device
at all. )
Solve this fundamental fragility by making non-PCI platform drivers
insert resources into the resource tree even if it overlaps the e820
reserved entry, to keep the resource manager from updating the BAR.
Also do these checks for the ioapic and mmconfig addresses, and emit
a warning if this happens.
Bisected-by: David Witbrodt <dawitbro@sbcglobal.net>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Tested-by: David Witbrodt <dawitbro@sbcglobal.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: crash on non-TSC-equipped CPUs
Don't enable the TSC notifier if we *either*:
1. don't have a TSC, or
2. have a CPU with constant TSC.
In either of those cases, the notifier is either damaging (1) or useless (2).
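The condition can be written down as a small predicate. This is only a sketch of the rule stated above, not the code from the patch; the function name and its two boolean parameters are hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a TSC frequency notifier is worth
 * registering. Without a TSC the notifier is damaging, and with a
 * constant TSC it is useless, so it is wanted only in the remaining case.
 */
bool tsc_notifier_wanted(bool has_tsc, bool constant_tsc)
{
    return has_tsc && !constant_tsc;
}
```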
From: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
During CPU hot-remove the sysfs directory created by
threshold_create_bank(), defined in
arch/x86/kernel/cpu/mcheck/mce_amd_64.c, has to be removed before
its parent directory, created by mce_create_device(), defined in
arch/x86/kernel/cpu/mcheck/mce_64.c . Moreover, when the CPU in
question is hotplugged again, obviously the latter has to be created
before the former. At present, the right ordering is not enforced,
because all of these operations are carried out by CPU hotplug
notifiers which are not appropriately ordered with respect to each
other. This leads to serious problems on systems with two or more
multicore AMD CPUs, among other things during suspend and hibernation.
Fix the problem by placing threshold bank CPU hotplug callbacks in
mce_cpu_callback(), so that they are invoked at the right places,
if defined. Additionally, use kobject_del() to remove the sysfs
directory associated with the kobject created by
kobject_create_and_add() in threshold_create_bank(), to prevent the
kernel from crashing during CPU hotplug operations on systems with
two or more multicore AMD CPUs.
This patch fixes bug #11337.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Andi Kleen <andi@firstfloor.org>
Tested-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jan Beulich wrote:
> Even worse - this would even try to access the MSR on non-AMD CPUs
> (currently probably prevented just by the fact that only AMD ones use
> family values of 0x10 or higher).
This patch adds cpu vendor check to the postcore_initcalls.
Reported-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
improve the debug printout:
- make it actually display something
- print it only once
would be nice to have a WARN_ONCE() facility, to feed such things to
kerneloops.org.
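A print-once facility of the kind wished for above could look roughly like this in plain C. It is a userspace sketch under the assumption that a single static flag is enough; report_once() is a hypothetical name and not an existing kernel interface.

```c
#include <stdio.h>

/*
 * Hypothetical print-once helper: emits the message the first time it is
 * called and stays silent afterwards. Returns 1 if the message was
 * printed by this call, 0 otherwise.
 */
int report_once(const char *msg)
{
    static int reported;    /* zero-initialized; set after first report */

    if (reported)
        return 0;
    reported = 1;
    fprintf(stderr, "%s\n", msg);
    return 1;
}
```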
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: vmlinux.o(.cpuinit.text+0x3cc4): Section mismatch in reference from the function uv_cpu_init() to the function .init.text:uv_system_init()
The function __cpuinit uv_cpu_init() references
a function __init uv_system_init().
If uv_system_init is only used by uv_cpu_init then
annotate uv_system_init with a matching annotation.
uv_system_init was meant to be called only once, so do it from a codepath
(native_smp_prepare_cpus) which is called once, right before activation
of other cpus (smp_init).
Note: old code relied on uv_node_to_blade being initialized to 0,
but it's not initialized from anywhere.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
While fixing a different bug I moved the call to vmi_init before
early params could be parsed.
This broke the vmi specific commandline parameters.
Fix that by moving vmi initialization after the kernel has had a chance to
parse early parameters.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
early_io{re,un}map() are __init and hence can't be called from __meminit
functions.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
While I don't have a hotplug capable system at hand, I think two issues need
fixing:
- pud_phys (in kernel_physical_mapping_init()) would remain uninitialized in
the after_bootmem case
- the locking done just around phys_pmd_{init,update}() would leave out pgd
updates, and it was needlessly covering code portions that do allocations
(perhaps using a more friendly gfp value in alloc_low_page() would then be
possible)
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Joshua Hoblitt reported that only 3 GB of his 16 GB of RAM is
usable. Booting with mtrr_show showed us the BIOS-initialized
MTRR settings - which are all wrong.
So the root cause is that the BIOS has not set the mask correctly:
> [ 0.429971] MSR00000200: 00000000d0000000
> [ 0.433305] MSR00000201: 0000000ff0000800
> should be ==> [ 0.433305] MSR00000201: 0000003ff0000800
>
> [ 0.436638] MSR00000202: 00000000e0000000
> [ 0.439971] MSR00000203: 0000000fe0000800
> should be ==> [ 0.439971] MSR00000203: 0000003fe0000800
>
> [ 0.443304] MSR00000204: 0000000000000006
> [ 0.446637] MSR00000205: 0000000c00000800
> should be ==> [ 0.446637] MSR00000205: 0000003c00000800
>
> [ 0.449970] MSR00000206: 0000000400000006
> [ 0.453303] MSR00000207: 0000000fe0000800
> should be ==> [ 0.453303] MSR00000207: 0000003fe0000800
>
> [ 0.456636] MSR00000208: 0000000420000006
> [ 0.459970] MSR00000209: 0000000ff0000800
> should be ==> [ 0.459970] MSR00000209: 0000003ff0000800
So detect this borkage and add the prefix 111.
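To make the arithmetic in the listing above concrete, here is a small sketch of such a fixup. It is not the code from this patch. It assumes the BIOS programmed the masks as if the CPU had 36 physical address bits while the CPU really has 38, which is what the 0xf versus 0x3f difference in the listing suggests; the function name and both width parameters are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical fixup: if the mask was written for a narrower physical
 * address width than the CPU really has, set the missing high bits.
 * The mask is only extended when its top "assumed" bit is set, i.e. when
 * it looks like a mask that was simply cut off too early.
 */
uint64_t mtrr_fix_mask(uint64_t mask, unsigned assumed_bits, unsigned phys_bits)
{
    uint64_t missing;

    if (assumed_bits == 0 || phys_bits <= assumed_bits || phys_bits > 63)
        return mask;
    if (!(mask & (1ULL << (assumed_bits - 1))))
        return mask;

    /* bits assumed_bits .. phys_bits-1 */
    missing = ((1ULL << phys_bits) - 1) & ~((1ULL << assumed_bits) - 1);
    return mask | missing;
}
```

With assumed_bits = 36 and phys_bits = 38 this turns 0000000ff0000800 into 0000003ff0000800 and 0000000c00000800 into 0000003c00000800, matching the "should be" lines.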
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pentium III and Core Solo/Duo CPUs have an erratum
" Page with PAT set to WC while associated MTRR is UC may consolidate to UC "
which can result in WC setting in PAT to be ineffective. We will disable
PAT on such CPUs, so that we can continue to use MTRR WC setting.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
All kernel mappings like ioremap(), etc. use UC_MINUS as the type. /dev/mem
mappings with /dev/mem opened with O_SYNC, however, were using UC,
resulting in a conflict and the /dev/mem mmap failing. This seems to be
affecting some apps (one being flashrom) which use O_SYNC and which were
working before.
Switch /dev/mem with O_SYNC to UC_MINUS as well.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Big thinko in pat memtype tracking code. reserve_memtype should be called
with physical address and not virtual address.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This section mismatch:
>> Seems to be a section mismatch; init_intel() is __cpuinit while
>> numaq_tsc_disable() is __init. Seems to be introduced in:
>>
>> commit 64898a8bad
>> Author: Yinghai Lu <yhlu.kernel@gmail.com>
>> Date: Sat Jul 19 18:01:16 2008 -0700
>>
>> x86: extend and use x86_quirks to clean up NUMAQ code
>
> Oops, I am wrong about numaq_tsc_disable() being __init. Still, I
> believe that Yinghai might be able to say what's really wrong :-)
Would lead to this crash:
BUG: unable to handle kernel paging request at c08a45f0
IP: [<c08a45f0>] numaq_tsc_disable+0x0/0x40
Fixed by the patch below.
Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
None of the spinlock API is exported GPL, so there's no reason for
pv_lock_ops to be.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: drago01 <drago01@gmail.com>
WARNING: vmlinux.o(.text+0x180af): Section mismatch in reference from the function leave_uniprocessor() to the function .cpuinit.text:cpu_up()
The function leave_uniprocessor() references
the function __cpuinit cpu_up().
This is often because leave_uniprocessor lacks a __cpuinit
annotation or the annotation of cpu_up is wrong.
leave_uniprocessor calls cpu_up only when CONFIG_HOTPLUG_CPU is set,
so it can be safely annotated as __ref
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pekka Paalanen <pq@iki.fi>
Use WARN() instead of a printk+WARN_ON() pair; this way the message
becomes part of the warning section for better reporting/collection.
This also allowed the folding of some if()'s into the WARN()
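The folding mentioned above works because WARN() evaluates to its condition, so it can sit directly inside an if(). The following is a userspace sketch of that pattern only; warn_sketch() and check_len() are hypothetical stand-ins and are not the kernel macro or any code touched by this patch.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a WARN()-style helper: prints the formatted
 * message when the condition is true and returns the condition, so the
 * caller can test and warn in one expression.
 */
int warn_sketch(int condition, const char *fmt, ...)
{
    if (condition) {
        va_list args;

        fputs("WARNING: ", stderr);
        va_start(args, fmt);
        vfprintf(stderr, fmt, args);
        va_end(args);
    }
    return condition != 0;
}

/* Before: if (len > max) { printk(...); WARN_ON(1); return -1; } */
int check_len(int len, int max)
{
    if (warn_sketch(len > max, "len %d exceeds %d\n", len, max))
        return -1;
    return 0;
}
```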
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use WARN() instead of a printk+WARN_ON() pair; this way the message
becomes part of the warning section for better reporting/collection.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix an integer comparison always false warning in the PCI Calgary 64 driver.
A u8 is being compared to something that's 512 by default, resulting in the
following warning:
arch/x86/kernel/pci-calgary_64.c:1285: warning: comparison is always false due to limited range of data type
This was introduced by patch b34e90b8f0.
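The effect behind the warning can be reproduced in isolation. This sketch is not the driver code; the function names are hypothetical and the limit is passed as a parameter so the example compiles without the warning itself.

```c
#include <stdint.h>

/*
 * Narrow variant: the value is squeezed into a u8 first, so it can never
 * reach a limit of 512 and the comparison is always false.
 */
int narrow_reaches_limit(unsigned int value, unsigned int limit)
{
    uint8_t bus = (uint8_t)value;   /* keeps only the low 8 bits */

    return bus >= limit;
}

/* Wide variant: a type with enough range makes the check meaningful. */
int wide_reaches_limit(unsigned int value, unsigned int limit)
{
    return value >= limit;
}
```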
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Vegard Nossum reported oprofile + hibernation problems:
> Now some warnings:
>
> ------------[ cut here ]------------
> WARNING: at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/kernel/smp.c:328 smp_call_function_mask+0x194/0x1a0()
The usual problem: the suspend function is called with interrupts
already disabled and calls smp_call_function, which is not allowed with
interrupts off. But at this point all the other CPUs should already be
down anyway, so it should be enough to just drop that.
This patch should fix at least that problem by fixing cpu hotplug &
suspend support.
[ mingo@elte.hu: fixed 5 coding style errors. ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The UV TLB shootdown mechanism needs a system interrupt vector.
Its vector had been hardcoded as 200, but needs to be moved to the reserved
system vector range so that it does not collide with some device vector.
This is still temporary until dynamic system IRQ allocation is provided.
But it will be needed when real UV hardware becomes available and runs 2.6.27.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use savesegment and loadsegment consistently in ia32 compat code.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rene Herman reported significant Xorg startup/shutdown slowdown due
to PAT. It turns out that the memtype list has thousands of entries.
Add cached_entry to list add routine, in order to speed up the
lookup for sequential reserve_memtype calls.
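The idea of a cached entry can be sketched on a plain sorted list. This is an illustration only, not the memtype code: the struct layout, the names and the step counter are all hypothetical, and overlap/conflict checking is left out.

```c
#include <stddef.h>
#include <stdint.h>

struct memtype_node {
    uint64_t start;
    struct memtype_node *next;
};

struct memtype_list {
    struct memtype_node head;       /* dummy head; head.next is the first entry */
    struct memtype_node *cached;    /* most recently inserted entry, or NULL */
    unsigned long steps;            /* list hops taken while searching */
};

void memtype_list_init(struct memtype_list *l)
{
    l->head.start = 0;
    l->head.next = NULL;
    l->cached = NULL;
    l->steps = 0;
}

/* Insert n keeping the list sorted by start address. */
void memtype_list_add(struct memtype_list *l, struct memtype_node *n)
{
    struct memtype_node *pos = &l->head;

    /* Sequential callers insert at or after the cached entry: start there. */
    if (l->cached && l->cached->start <= n->start)
        pos = l->cached;

    while (pos->next && pos->next->start <= n->start) {
        pos = pos->next;
        l->steps++;
    }

    n->next = pos->next;
    pos->next = n;
    l->cached = n;
}
```

For ascending start addresses the search begins at the previous insertion point, so no list hops are needed; an out-of-order insert falls back to walking from the head.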
Reported-by: Rene Herman <rene.herman@keyaccess.nl>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cyrix MediaGXm/Cx5530 Unicorn Revision 1.19.3B has stopped
booting starting at v2.6.22.
The reason is this commit:
> commit f25f64ed5b
> Author: Juergen Beisert <juergen@kreuzholzen.de>
> Date: Sun Jul 22 11:12:38 2007 +0200
>
> x86: Replace NSC/Cyrix specific chipset access macros by inlined functions.
this commit activated a macro which was dormant before due to (buggy)
macro side-effects.
I've looked through various datasheets and found that the GXm and GXLV
Geode processors don't have an incrementor.
Remove the incrementor setup entirely. As the incrementor value
differs according to clock speed and we would hope that the BIOS
configures it correctly, it is probably the right solution.
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The parentheses in __iommu_queue_command() are not needed when assigning
to the 'target' variable.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This recent patch
commit c3965bd151
Author: Paul Jackson <pj@sgi.com>
Date: Wed May 14 08:15:34 2008 -0700
x86 boot: proper use of ARRAY_SIZE instead of repeated E820MAX constant
caused these new warnings during a normal build:
In file included from linux-2.6/arch/x86/boot/memory.c:17:
linux-2.6/include/linux/log2.h: In function '__ilog2_u32':
linux-2.6/include/linux/log2.h:34: warning: implicit declaration of function 'fls'
linux-2.6/include/linux/log2.h: In function '__ilog2_u64':
linux-2.6/include/linux/log2.h:42: warning: implicit declaration of function 'fls64'
linux-2.6/include/linux/log2.h: In function '__roundup_pow_of_two':
linux-2.6/include/linux/log2.h:63: warning: implicit declaration of function 'fls_long'
I tried to fix them in log2.h, but it's difficult because the real mode
environment is completely different from a normal kernel environment. Instead
define a separate ARRAY_SIZE macro in boot.h, similar to the other private
macros there.
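A private macro of this kind needs nothing beyond sizeof, which is why it avoids the log2.h dependency. The sketch below shows the usual form; the struct, the array and its size of 128 are illustrative assumptions and not taken from boot.h.

```c
#include <stddef.h>

/* Element count of a true array (not a pointer), computed at compile time. */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))

/* Illustrative stand-in for a fixed-size e820-style table. */
struct sketch_e820_entry {
    unsigned long long addr;
    unsigned long long size;
    unsigned int type;
};

struct sketch_e820_entry sketch_map[128];

/* Use the array's own size instead of repeating the constant. */
size_t sketch_map_capacity(void)
{
    return ARRAY_SIZE(sketch_map);
}
```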
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: vmlinux.o(.text+0x27032): Section mismatch in reference from the function get_tce_space_from_tar() to the function .init.text:calgary_bus_has_devices()
The function get_tce_space_from_tar() references
the function __init calgary_bus_has_devices().
This is often because get_tce_space_from_tar lacks a __init
annotation or the annotation of calgary_bus_has_devices is wrong.
get_tce_space_from_tar is called only from __init function (calgary_init)
and calls __init function (calgary_bus_has_devices).
So annotate it properly.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Chandru Siddalingappa <chandru@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move the part of get_local_pda that references an __init function (free_bootmem)
into a new (static) function marked as __ref. This is safe because free_bootmem
is called before __init sections are dropped.
WARNING: vmlinux.o(.cpuinit.text+0x3cd7): Section mismatch in reference from the function get_local_pda() to the function .init.text:free_bootmem()
The function __cpuinit get_local_pda() references
a function __init free_bootmem().
If free_bootmem is only used by get_local_pda then
annotate free_bootmem with a matching annotation.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/power/cpu_32.c: __save_processor_state calls read_cr4(),
but an i486 CPU doesn't have the CR4 register. Trying to read it
produces an invalid opcode oops during suspend to disk.
Use the safe CR4 reading op instead. If the value to be written is
zero the write is skipped.
arch/x86/power/hibernate_asm_32.S
done: swapped the use of %eax and %ecx to use jecxz for
the zero test and jump over store to %cr4.
restore_image: s/%ecx/%eax/ to be consistent with done:
In addition to __save_processor_state, acpi_save_state_mem,
efi_call_phys_prelog, and efi_call_phys_epilog had checks added
(acpi restore was in assembly and already had a check for
non-zero). There were other reads and writes of CR4, but MCE and
virtualization shouldn't be executed on an i486 anyway.
Signed-off-by: David Fries <david@fries.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: vmlinux.o(.text+0x118f7): Section mismatch in reference from the function construct_ioapic_table() to the function .init.text:MP_bus_info()
The function construct_ioapic_table() references
the function __init MP_bus_info().
This is often because construct_ioapic_table lacks a __init
annotation or the annotation of MP_bus_info is wrong.
construct_ioapic_table is called only from construct_default_ISA_mptable which is __init
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x1591): Section mismatch in reference from the function init_amd() to the function .init.text:check_enable_amd_mmconf_dmi()
The function __cpuinit init_amd() references
a function __init check_enable_amd_mmconf_dmi().
If check_enable_amd_mmconf_dmi is only used by init_amd then
annotate check_enable_amd_mmconf_dmi with a matching annotation.
check_enable_amd_mmconf_dmi is only called from init_amd which is __cpuinit
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x1fe7): Section mismatch in reference from the function MP_processor_info() to the variable .init.data:x86_quirks
The function __cpuinit MP_processor_info() references
a variable __initdata x86_quirks.
If x86_quirks is only used by MP_processor_info then
annotate x86_quirks with a matching annotation.
MP_processor_info uses x86_quirks which is __init and is used only from
smp_read_mpc and construct_default_ISA_mptable which are __init
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: vmlinux.o(.text+0x7950): Section mismatch in reference from the function native_calibrate_tsc() to the function .init.text:tsc_read_refs()
The function native_calibrate_tsc() references
the function __init tsc_read_refs().
This is often because native_calibrate_tsc lacks a __init
annotation or the annotation of tsc_read_refs is wrong.
tsc_read_refs is called from native_calibrate_tsc which is not __init
and native_calibrate_tsc cannot be marked __init
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (32 commits)
x86: add MAP_STACK mmap flag
x86: fix section mismatch warning - spp_getpage()
x86: change init_gdt to update the gdt via write_gdt, rather than a direct write.
x86-64: fix overlap of modules and fixmap areas
x86, geode-mfgpt: check IRQ before using MFGPT as clocksource
x86, acpi: cleanup, temp_stack is used only when CONFIG_SMP is set
x86: fix spin_is_contended()
x86, nmi: clean UP NMI watchdog failure message
x86, NMI: fix watchdog failure message
x86: fix /proc/meminfo DirectMap
x86: fix readb() et al compile error with gcc-3.2.3
arch/x86/Kconfig: clean up, experimental adjustement
x86: invalidate caches before going into suspend
x86, perfctr: don't use CCCR_OVF_PMI1 on Pentium 4Ds
x86, AMD IOMMU: initialize dma_ops after sysfs registration
x86m AMD IOMMU: cleanup: replace LOW_U32 macro with generic lower_32_bits
x86, AMD IOMMU: initialize device table properly
x86, AMD IOMMU: use status bit instead of memory write-back for completion wait
x86: silence mmconfig printk
x86, msr: fix NULL pointer deref due to msr_open on nonexistent CPUs
...
* 'release-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-acpi-2.6:
cpuidle: Make ladder governor honor latency requirements fully
cpuidle: Menu governor fix wrong usage of measured_us
cpuidle: Do not use poll_idle unless user asks for it
x86: Fix ioremap off by one BUG
WARNING: vmlinux.o(.text+0x17a3e): Section mismatch in reference from the function set_pte_vaddr_pud() to the function .init.text:spp_getpage()
The function set_pte_vaddr_pud() references
the function __init spp_getpage().
This is often because set_pte_vaddr_pud lacks a __init
annotation or the annotation of spp_getpage is wrong.
spp_getpage is called from __init (__init_extra_mapping) and
non __init (set_pte_vaddr_pud) functions, so it can't be __init.
Unfortunately it calls alloc_bootmem_pages which is __init,
but does it only when bootmem allocator is available (after_bootmem == 0).
So annotate it accordingly.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
By writing directly, a memory access violation can occur whilst
hotplugging a CPU if the entry was previously marked read-only.
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jean Delvare's machine triggered this BUG
acpi_os_map_memory phys ffff0000 size 65535
------------[ cut here ]------------
kernel BUG at arch/x86/mm/pat.c:233!
with ACPI in the backtrace.
Adding some debugging output showed that ACPI calls
acpi_os_map_memory phys ffff0000 size 65535
And ioremap/PAT does this check in 32bit, so addr+size wraps and the BUG
in reserve_memtype() triggers incorrectly.
BUG_ON(start >= end); /* end is exclusive */
But reserve_memtype already uses u64:
int reserve_memtype(u64 start, u64 end,
so the 32bit truncation must happen in the caller. Presumably in ioremap
when it passes this information to reserve_memtype().
This patch does this computation in 64bit.
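The wrap is easy to reproduce outside the kernel. This sketch is not the ioremap code; it assumes the requested size is rounded up to a 4 KiB page before the end address is computed, and the function and macro names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* End address computed in 32 bits: wraps for ranges ending at 4 GiB. */
uint32_t range_end_32bit(uint32_t phys, uint32_t size)
{
    uint32_t aligned = (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);

    return phys + aligned;
}

/* Same computation widened to 64 bits before adding: no wrap. */
uint64_t range_end_64bit(uint32_t phys, uint32_t size)
{
    uint64_t aligned = ((uint64_t)size + SKETCH_PAGE_SIZE - 1) &
                       ~(uint64_t)(SKETCH_PAGE_SIZE - 1);

    return (uint64_t)phys + aligned;
}
```

For phys ffff0000 and size 65535 the 32-bit version yields 0, so start >= end and the BUG_ON fires, while the 64-bit version yields 0x100000000.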
http://bugzilla.kernel.org/show_bug.cgi?id=11346
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Fix coding style of traps_64.c with improvements suggested by Ingo.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ftrace depends on some processor state that is destroyed during kexec and
restored by restore_processor_state(). So save_processor_state() and
restore_processor_state() are moved into machine_kexec() and ftrace is
restored after restore_processor_state().
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>