PnP BIOS data, code, and 32-bit entry segments all have fixed limits as well;
set them in the GDT rather than adding more code. It would be nice to add
these fixups to the boot GDT rather than setting the GDT for each CPU; perhaps
I can wiggle this in later, but getting it in before the subsys init looks
tricky.
Also, make some progress on deprecating the ugly Q_SET_SEL macros.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: "Seth, Rohit" <rohit.seth@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The one remaining caller of set_limit, the PnP BIOS code, calls into the PnP
BIOS, passing kernel parameters in and out. These parameteres may be passed
from arbitrary kernel virtual memory, so they deserve strict protection to
stop a bad BIOS from smashing beyond the object size.
Unfortunately, the use of set_limit was badly botching this by setting the
limit in terms of pages, when it really should have byte granularity.
When doing this, I discovered my BIOS had the buggy code during the "get
system device node" call:
mov ax, es:[bx]
Which is harmless, but has a trivial workaround.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: "Seth, Rohit" <rohit.seth@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since APM BIOS segment limits are now fixed, set them in head.S GDT and don't
use the complicated _set_limit() macro expansion.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Acked-by: "Seth, Rohit" <rohit.seth@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
APM BIOSes have many bugs regarding proper representation of the appropriate
segment limits for calling the BIOS. By default, APM_RELAX_SEGMENTS is always
turned on to support running the APM BIOS on these buggy machines. Keeping
64k limits poses very little danger to the kernel, because the pages where the
APM BIOS is located will always be in low physical memory BIOS areas, which
should already be marked reserved, and only buggy BIOSes would possibly
overstep the segment bounds with writes to data anyway.
Since forcing stricter limits breaks many machines and is not default
behavior, it seems reasonable to deprecate the older code which may cause APM
BIOS to fault.
If you really have a badly enough broken APM BIOS that you have to turn off
APM_RELAX_SEGMENTS, seems like the best recourse here would be to disable the
APM BIOS and / or not compile it into your kernel to begin with, and / or add
your system to the known bad list.
The reason I want to deprecate this code is there is underlying brokenness
with the set_limit macros, and getting rid of many of the call sites rather
than rewriting them seems to be the simplest and most correct course of
action.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Acked-by: "Seth, Rohit" <rohit.seth@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
So some 486 processors do have CR4 register. Allow them to present it in
register dumps by using the old fault technique rather than testing processor
family.
Thanks to Maciej for noticing this.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: "Seth, Rohit" <rohit.seth@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Other than apparently commonly assumed, the bound instruction does not
require the corresponding IDT entry to have DPL 3.
Acked-by: "Seth, Rohit" <rohit.seth@intel.com>
Acked-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move some code unrelated to any dealing with hardware bugs from i386's
bugs.h to a more logical place.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rather than blindly re-enabling interrupts in die(), save their state
upon entry and then restore that state.
If the kernel is in really bad condition and faults with interrupts disabled,
re-enabling them in die() may cause even more trouble, implying more chances
of data corruption.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make GDT page aligned and page padded to support running inside of a
hypervisor. This prevents false sharing of the GDT page with other hot
data, which is not allowed in Xen, and causes performance problems in
VMware.
Rather than go back to the old method of statically allocating the GDT
(which wastes unneded space for non-present CPUs), the GDT for APs is
allocated dynamically.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: "Seth, Rohit" <rohit.seth@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Setting RF (resume flag) allows a debugger to resume execution after a
code breakpoint without tripping the breakpoint again. It is reset by
the CPU after execution of one instruction.
Requested by Stephane Eranian:
"I am trying to the user HW debug registers on i386 and I am running
into a problem with ptrace() not allowing access to EFLAGS_RF for
POKEUSER (see FLAG_MASK). [ ... ] It avoids the need to remove the
breakpoint, single step, and reinstall. The equivalent functionality
exists on IA-64 and is allowed by ptrace()"
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
According to the manual, INT 6 is "invalid opcode", not "invalid operand".
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now needs to include the type 1 functions ("direct") too.
Reported by Pavel Roskin <proski@gnu.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use correct address when referencing mmconfig aperture while checking
for broken MCFG. This was a typo when porting the code from 64bit to
32bit. It caused oopses at boot on some ThinkPads.
Should definitely go into 2.6.15.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
They report all busses as MMCONFIG capable, but it never works for the
internal devices in the CPU's builtin northbridge.
It just probes all func 0 devices on bus 0 (the internal northbridge is
currently always on bus 0) and if they are not accessible using MCFG they are
put into a special fallback bitmap.
On systems where it isn't we assume the BIOS vendor supplied correct MCFG.
Requires the earlier patch for mmconfig type1 fallback
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When there is no entry for a bus in MCFG fall back to type1. This is
especially important on K8 systems where always some devices can't be accessed
using mmconfig (in particular the builtin northbridge doesn't support it for
its own devices)
Cc: <gregkh@suse.de>
Cc: <jgarzik@pobox.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's illegal because it can sleep.
Use a two step lookup scheme instead. First look up the vm_struct, then
change the direct mapping, then finally unmap it. That's ok because nobody
can change the particular virtual address range as long as the vm_struct is
still in the global list.
Also added some LinuxDoc documentation to iounmap.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Disabling LAPIC timer isn't sufficient. In some situations, such as we
enabled NMI watchdog, there is still unexpected interrupt (such as NMI)
invoked in offline CPU. This also avoids offline CPU receives spurious
interrupt and anything similar.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: "Seth, Rohit" <rohit.seth@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When multiple probes are registered at the same address and if due to some
recursion (probe getting triggered within a probe handler), we skip calling
pre_handlers and just increment nmissed field.
The below patch make sure it walks the list for multiple probes case.
Without the below patch we get incorrect results of nmissed count for
multiple probe case.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With CPU hotplug enabled, NMI watchdog stoped working. It appears the
violation is the cpu_online check in nmi handler. local ACPI based NMI
watchdog is initialized before we set CPU online for APs. It's quite
possible a NMI is fired before we set CPU online, and that's what happens
here.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
What is the value shown in "cpu MHz" of /proc/cpuinfo when CPUs are capable of
changing frequency?
Today the answer is: It depends.
On i386:
SMP kernel - It is always the boot frequency
UP kernel - Scales with the frequency change and shows that was last set.
On x86_64:
There is one single variable cpu_khz that gets written by all the CPUs. So,
the frequency set by last CPU will be seen on /proc/cpuinfo of all the
CPUs in the system. What you see also depends on whether you have constant_tsc
capable CPU or not.
On ia64:
It is always boot time frequency of a particular CPU that gets displayed.
The patch below changes this to:
Show the last known frequency of the particular CPU, when cpufreq is present. If
cpu doesnot support changing of frequency through cpufreq, then boot frequency
will be shown. The patch affects i386, x86_64 and ia64 architectures.
Signed-off-by: Venkatesh Pallipadi<venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
This patch moves away PMBASE reading and only performs it at
cpufreq_register_driver time by exiting with -ENODEV if unable to read
the value.
Signed-off-by: Mattia Dongili <malattia@linux.it>
Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
The attached patch introduces runtime latency measurement for ICH[234]
based chipsets instead of using CPUFREQ_ETERNAL. It includes
some sanity checks in case the measured value is out of range and
assigns a safe value of 500uSec that should still be enough on
problematics chipsets (current testing report values ~200uSec). The
measurement is currently done in speedstep_get_freqs in order to avoid
further unnecessary transitions and in the hope it'll come handy for SMI
also.
Signed-off-by: Mattia Dongili <malattia@linux.it>
Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
speedstep-ich.c | 4 ++--
speedstep-lib.c | 32 +++++++++++++++++++++++++++++++-
speedstep-lib.h | 1 +
speedstep-smi.c | 1 +
4 files changed, 35 insertions(+), 3 deletions(-)
If a user has booted with 'quiet', some important messages don't
get displayed which really should. We've seen at least one case
where powernow-k8 stopped working, and the user needed a BIOS update
that they didn't know about.
Signed-off-by: Dave Jones <davej@redhat.com>
Thanks to LinuxICC (http://linuxicc.sf.net), a comparison of a u32 less
than 0 was found, this patch changes the variable to a signed int so that
comparison is meaningful.
Signed-off-by: Gabriel A. Devenyi <ace@staticwave.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Setting irq affinity stops working when MSI is enabled. With MSI, move_irq
is empty, so we can't change irq affinity. It appears a typo in Ashok's
original commit for this issue. X86_64 actually is using move_native_irq.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Anne NICOLAS <anne.nicolas@mandriva.com> and Andres Kaaber
<andres.kaaber@rescue.ee> reported their HP laptop didn't reboot smoothly.
Signed-off-by: Thierry Vignaud <tvignaud@mandriva.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Future versions of the Opteron processor may support
frequency transitions of 100 MHz, instead of the=20
current 200 MHz. This patch enables the powernow-k8
driver to transition to an odd FID code, indicating
a multiple of 100 MHz frequency.
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
The DBG() call where updated with the appropriate KERN_* symbol.
Signed-off-by: Daniel Marjamäki <daniel.marjamaki@comhem.se>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When attempting to hotadd a PCI card with a bridge on it, I saw
the kernel reporting resource collision errors even when there were
really no collisions. The problem is that the code doesn't skip
over "invalid" resources with their resource type flag not set.
Others have reported similar problems at boot time and for
non-bridge PCI card hotplug too, where the code flags a
resource collision for disabled ROMs. This patch fixes both
problems.
Signed-off-by: Rajesh Shah <rajesh.shah@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Modified common.c so it's using the appropriate KERN_* in printk() calls.
Signed-off-by: Daniel Marjamäkia <daniel.marjamaki@comhem.se>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a bug in kprobes that can cause an Oops or even a crash when a return
probe is installed on one of the following functions: sys_execve,
do_execve, load_*_binary, flush_old_exec, or flush_thread. The fix is to
remove the call to kprobe_flush_task() in flush_thread(). This fix has
been tested on all architectures for which the return-probes feature has
been implemented (i386, x86_64, ppc64, ia64). Please apply.
BACKGROUND
Up to now, we have called kprobe_flush_task() under two situations: when a
task exits, and when it execs. Flushing kretprobe_instances on exit is
correct because (a) do_exit() doesn't return, and (b) one or more
return-probed functions may be active when a task calls do_exit(). Neither
is the case for sys_execve() and its callees.
Initially, the mistaken call to kprobe_flush_task() on exec was harmless
because we put the "real" return address of each active probed function
back in the stack, just to be safe, when we recycled its
kretprobe_instance. When support for ppc64 and ia64 was added, this safety
measure couldn't be employed, and was eventually dropped even for i386 and
x86_64. sys_execve() and its callees were informally blacklisted for
return probes until this fix was developed.
Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch cleans up some error messages in the
powernow-k8 driver and makes them more understandable.
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Needed to make the earlier use disabled CPUs for CPU hotplug patch
actually work.
Need to register disabled processors as well, so we can count them
towards cpu_possible_map as hot pluggable cpus.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
bigsmp is reported to work on large Opteron systems on 32bit too.
Enable it by default there.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A patch by Eric was merged (f2b36db692)
and later on reverted back (1e4c85f97f).
Along with above patch, another patch was posted and has been merged
(3d1675b41b). That patch was dependent on
the above patch and now it should also be reverted.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix the Intel cache detection code assumption that number of threads
sharing the cache will either be equal to number of HT or core siblings.
This also cleans up the code in general a bit.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fields obtained through cpuid vector 0x1(ebx[16:23]) and
vector 0x4(eax[14:25], eax[26:31]) indicate the maximum values and might not
always be the same as what is available and what OS sees. So make sure
"siblings" and "cpu cores" values in /proc/cpuinfo reflect the values as seen
by OS instead of what cpuid instruction says. This will also fix the buggy BIOS
cases (for example where cpuid on a single core cpu says there are "2" siblings,
even when HT is disabled in the BIOS.
http://bugzilla.kernel.org/show_bug.cgi?id=4359)
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
They report 40bit, but only have 36bits of physical address space.
This caused problems with setting up the correct masks for MTRR.
CPUID workaround for steppings 0F33h(supporting x86) and 0F34h(supporting x86
and EM64T). Detail info can be found at:
http://download.intel.com/design/Xeon/specupdt/30240216.pdfhttp://download.intel.com/design/Pentium4/specupdt/30235221.pdf
Signed-off-by: Shaohua Li<shaohua.li@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We should zap the low mappings, as soon as possible, so that we can catch
kernel bugs more effectively. Previously early boot had NULL mapped
and didn't trap on NULL references.
This patch introduces boot_level4_pgt, which will always have low identity
addresses mapped. Druing boot, all the processors will use this as their
level4 pgt. On BP, we will switch to init_level4_pgt as soon as we enter C
code and zap the low mappings as soon as we are done with the usage of
identity low mapped addresses. On AP's we will zap the low mappings as
soon as we jump to C code.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
According to cpuid instruction in IA32 SDM-Vol2, when computing cpu model,
we need to consider extended model ID for family 0x6 also.
AK: Also added fixes/simplifcation from Petr Vandrovec
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here's a patch that builds on Natalie Protasevich's IRQ compression
patch and tries to work for MPS boots as well as ACPI. It is meant for
a 4-node IBM x460 NUMA box, which was dying because it had interrupt
pins with GSI numbers > NR_IRQS and thus overflowed irq_desc.
The problem is that this system has 270 GSIs (which are 1:1 mapped with
I/O APIC RTEs) and an 8-node box would have 540. This is much bigger
than NR_IRQS (224 for both i386 and x86_64). Also, there aren't enough
vectors to go around. There are about 190 usable vectors, not counting
the reserved ones and the unused vectors at 0x20 to 0x2F. So, my patch
attempts to compress the GSI range and share vectors by sharing IRQs.
Cc: "Protasevich, Natalie" <Natalie.Protasevich@unisys.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>