* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
[PATCH] Don't set calgary iommu as default y
[PATCH] i386/x86-64: New Intel feature flags
[PATCH] x86: Add a cumulative thermal throttle event counter.
[PATCH] i386: Make the jiffies compares use the 64bit safe macros.
[PATCH] x86: Refactor thermal throttle processing
[PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
[PATCH] Fix unwinder warning in traps.c
[PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
[PATCH] x86: Move direct PCI scanning functions out of line
[PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
[PATCH] Don't leak NT bit into next task
[PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
[PATCH] Fix some broken white space in ia32_signal.c
[PATCH] Initialize argument registers for 32bit signal handlers.
[PATCH] Remove all traces of signal number conversion
[PATCH] Don't synchronize time reading on single core AMD systems
[PATCH] Remove outdated comment in x86-64 mmconfig code
[PATCH] Use string instructions for Core2 copy/clear
[PATCH] x86: - restore i8259A eoi status on resume
[PATCH] i386: Split multi-line printk in oops output.
...
The name of the pagedir_nosave variable does not make sense any more, so it
seems reasonable to change it to something more meaningful.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On x86_64 machines with more than 2 GB of RAM there are large memory gaps
(with no corresponding kernel virtual addresses) and reserved memory
regions between areas of usable physical RAM. Moreover, if CONFIG_FLATMEM
is set, they appear within the normal zone. swsusp should not try to save
them, so the corresponding page structs have to be marked as 'nosave'.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If we're going to implement smp_call_function_single() on three architecture
with the same prototype then it should have a declaration in a
non-arch-specific header file.
Move it into <linux/smp.h>.
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To prevent the emulated RTC timer from stopping when interrupts are delayed
for too long, disable interrupts around all of the register initialization,
and check that the interrupt handler did not schedule the next interrupt in
the past.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Vojtech Pavlik <vojtech@suse.cz>
Cc: Robert Picco <Robert.Picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add supplemental SSE3 instructions flag, and Direct Cache Access flag.
As described in "Intel Processor idenfication and the CPUID instruction
AP485 Sept 2006"
AK: also added for x86-64
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
The counter is exported to /sys that keeps track of the
number of thermal events, such that the user knows how bad the
thermal problem might be (since the logging to syslog and mcelog
is rate limited).
AK: Fixed cpu hotplug locking
Signed-off-by: Dmitriy Zavin <dmitriyz@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Refactor the event processing (syslog messaging and rate limiting)
into separate file therm_throt.c. This allows consistent reporting
of CPU thermal throttle events.
After ACK'ing the interrupt, if the event is current, the user
(p4.c/mce_intel.c) calls therm_throt_process to log (and rate limit)
the event. If that function returns 1, the user has the option to log
things further (such as to mce_log in x86_64).
AK: minor cleanup
Signed-off-by: Dmitriy Zavin <dmitriyz@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Some buggy systems can machine check when config space accesses
happen for some non existent devices. i386/x86-64 do some early
device scans that might trigger this. Allow pci=noearly to disable
this. Also when type 1 is disabling also don't do any early
accesses which are always type1.
This moves the pci= configuration parsing to be a early parameter.
I don't think this can break anything because it only changes
a single global that is only used by PCI.
Cc: gregkh@suse.de
Cc: Trammell Hudson <hudson@osresearch.net>
Signed-off-by: Andi Kleen <ak@suse.de>
This is useful on systems with broken PCI bus. Affects various
scans in x86-64 and i386's early ACPI quirk scan.
Cc: gregkh@suse.de
Cc: len.brown@intel.com
Cc: Trammell Hudson <hudson@osresearch.net>
Signed-off-by: Andi Kleen <ak@suse.de>
SYSENTER can cause a NT to be set which might cause crashes on the IRET
in the next task.
Following similar i386 patch from Linus.
Signed-off-by: Andi Kleen <ak@suse.de>
Current gcc generates calls not jumps to noreturn functions. When that happens the
return address can point to the next function, which confuses the unwinder.
This patch works around it by marking asynchronous exception
frames in contrast normal call frames in the unwind information. Then teach
the unwinder to decode this.
For normal call frames the unwinder now subtracts one from the address which avoids
this problem. The standard libgcc unwinder uses the same trick.
It doesn't include adjustment of the printed address (i.e. for the original
example, it'd still be kernel_math_error+0 that gets displayed, but the
unwinder wouldn't get confused anymore.
This only works with binutils 2.6.17+ and some versions of H.J.Lu's 2.6.16
unfortunately because earlier binutils don't support .cfi_signal_frame
[AK: added automatic detection of the new binutils and wrote description]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
We do some additional CPU synchronization in gettimeofday et.al. to make
sure the time stamps are always monotonic over multiple CPUs. But on
single core systems that is not needed. So don't do it.
Signed-off-by: Andi Kleen <ak@suse.de>
Got it. i8259A_resume calls init_8259A(0) unconditionally, even if
auto_eoi has been set. Keep track of the current status and restore that
on resume. This fixes it for AMD64 and i386.
Signed-off-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Patch inserts the GART region into the iomem resource map. The GART will then
be visible within /proc/iomem. It will also allow for other users
utilizing the GART to subreserve the region (agp or IOMMU).
Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Previously exit_idle would be called more often than enter_idle
Now instead of using complicated tests just keep track of it
using the per CPU variable as a flip flop. I moved the idle state into the
PDA to make the access more efficient.
Original bug report and an initial patch from Stephane Eranian,
but redone by AK.
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Before 2.6.16 this was changed to work around code that accessed
CPUs not in the possible map. But that code should be all fixed now,
so mark it __initdata again.
Signed-off-by: Andi Kleen <ak@suse.de>
- Don't zero for __copy_from_user_inatomic following i386.
This will prevent spurious zeros for parallel file system writers when
one does a exception
- The string instruction version didn't zero the output on
exception. Oops.
Also I cleaned up the code a bit while I was at it and added a minor
optimization to the string instruction path.
Signed-off-by: Andi Kleen <ak@suse.de>
This patch places the IOAPIC(s) and the Local APIC specified by ACPI
tables into the resource map. The APICs will then be visible within
/proc/iomem
Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Fix
linux/arch/x86_64/kernel/process.c: In function __switch_to:
linux/arch/x86_64/kernel/process.c:626: warning: assignment makes integer from pointer without a cast
Signed-off-by: Andi Kleen <ak@suse.de>
This patch adds the per thread cookie field to the task struct and the PDA.
Also it makes sure that the PDA value gets the new cookie value at context
switch, and that a new task gets a new cookie at task creation time.
Signed-off-by: Arjan van Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
CC: Andi Kleen <ak@suse.de>
Because it can take spinlocks.
Suggested by Mathieu Desnoyers
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Signed-off-by: Andi Kleen <ak@suse.de>
kexec: Avoid overwriting the current pgd (V4, x86_64)
This patch upgrades the x86_64-specific kexec code to avoid overwriting the
current pgd. Overwriting the current pgd is bad when CONFIG_CRASH_DUMP is used
to start a secondary kernel that dumps the memory of the previous kernel.
The code introduces a new set of page tables. These tables are used to provide
an executable identity mapping without overwriting the current pgd.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Remove most of the special cases for the debug IST stack. This is a
follow on clean up patch, it requires the bug fix patch that adds
orig_ist.
Signed-off-by: Keith Owens <kaos@ocs.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Based on a idea by Jeremy Fitzhardinge:
Replace the volatiles and memory clobbers in the PDA access with
telling gcc about access to a proxy PDA structure that doesn't
actually exist. But the dummy accesses give a defined ordering for
read/write accesses.
Also add some memory barriers to the early GS initialization to
make sure no PDA access is moved before it.
Advantage is some .text savings (probably most from better
code for accessing "current"):
text data bss dec hex filename
4845647 1223688 615864 6685199 66020f vmlinux
4837780 1223688 615864 6677332 65e354 vmlinux-pda
1.2% smaller code
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch updates x86_64 linker script to pack any .note.* sections
into a PT_NOTE segment in the output file.
To do this, we tell ld that we need a PT_NOTE segment. This requires
us to start explicitly mapping sections to segments, so we also need
to explicitly create PT_LOAD segments for text and data, and map the
sections to them appropriately. Fortunately, each section will
default to its previous section's segment, so it doesn't take many
changes to vmlinux.lds.S.
The corresponding change is already made for i386 in -mm and I'd like
this patch to join it. The section to segment mappings do change as do
the segment flags so some time in -mm would be good for that reason as
well, just in case.
In particular .data and .bss move from the text segment to the data
segment and .data.cacheline_aligned .data.read_mostly are put in the
data segment instead of a separate one.
I think that it would be possible to exactly match the existing section
to segment mapping and flags but it would be a more intrusive change and
I'm not sure there is a reason for the existing layout other than it is
what you get by default if you don't explicitly specify something else.
If there is a reason for the existing layout then I will of course make
the more intrusive change. If there is no reason we could probably drop
the executable or writable flags from some segments but I don't know how
much attention is paid to them anyway so it might not be worth the
effort.
The vsyscall related sections need to go in a different segment to the
normal data segment and so I invented a "user" segment to contain them.
I believe this should appear to be another data segment as far as the
kernel is concerned so the flags are setup accordingly.
The notes will be used in the Xen paravirt_ops backend to provide
additional information to the domain builder. I am in the process of
converting the xen-unstable kernels and tools over to this scheme at the
moment to support this in the future.
It has been suggested to me that the notes segment should have flags 0
(i.e. not readable) since it is only used by the loader and is not used
at runtime. For now I went with a readable segment since that is what
the i386 patch uses.
AK: dropped NOTES addition right now because the needed infrastructure
for that is not merged yet
Signed-off-by: Ian Campbell <ian.campbell@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
In long mode the %cs is largely a relic. However there are a few cases
like iret where it matters that we have a valid value. Without this
patch it is possible to enter the kernel in startup_64 without setting
%cs to a valid value. With this patch we don't care what %cs value
we enter the kernel with, so long as the cs shadow register indicates
it is a privileged code segment.
Thanks to Magnus Damm for finding this problem and posting the
first workable patch. I have moved the jump to set %cs down a
few instructions so we don't need to take an extra jump. Which
keeps the code simpler.
Signed-of-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Drop support for non e820 BIOS calls to get the memory map.
The boot assembler code still has some support, but not the C code now.
Signed-off-by: Andi Kleen <ak@suse.de>
NMIs are not supposed to track the irq flags, but TRACE_IRQS_IRETQ
did it anyways. Add a check.
Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
- Remove a define that was used only once
- Remove the too large APIC ID check because we always support
the full 8bit range of APICs.
- Restructure code a bit to be simpler.
Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
ACPI went to great trouble to get the APIC version and CPU capabilities
of different CPUs before passing them to the mpparser. But all
that data was used was to print it out. Actually it even faked some data
based on the boot cpu, not on the actual CPU being booted.
Remove all this code because it's not needed.
Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
And replace all users with ordinary smp_processor_id. The function
was originally added to get some basic oops information out even
if the GS register was corrupted. However that didn't
work for some anymore because printk is needed to print the oops
and it uses smp_processor_id() already. Also GS register corruptions
are not particularly common anymore.
This also helps the Xen port which would otherwise need to
do this in a special way because it can't access the local APIC.
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Detect the situations in which the time after a resume from disk would
be earlier than the time before the suspend and prevent them from
happening on x86_64.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andi Kleen <ak@suse.de>
From i386 x86-64 inherited code to force reserve the 640k-1MB area.
That was needed on some old systems.
But we generally trust the e820 map to be correct on 64bit systems
and mark all areas that are not memory correctly.
This patch will allow to use the real memory in there.
Or rather the only way to find out if it's still needed is to
try. So far I'm optimistic.
Signed-off-by: Andi Kleen <ak@suse.de>
The init_amd() function is only called from identify_cpu() which is already
marked as __cpuinit. So let's mark it as __cpuinit.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Implement pause_on_oops() on x86_64.
AK: I redid the patch to do the oops_enter/exit in the existing
oops_begin()/end(). This makes it much shorter.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Right now the kernel on x86-64 has a 100% lazy fpu behavior: after *every*
context switch a trap is taken for the first FPU use to restore the FPU
context lazily. This is of course great for applications that have very
sporadic or no FPU use (since then you avoid doing the expensive
save/restore all the time). However for very frequent FPU users... you
take an extra trap every context switch.
The patch below adds a simple heuristic to this code: After 5 consecutive
context switches of FPU use, the lazy behavior is disabled and the context
gets restored every context switch. If the app indeed uses the FPU, the
trap is avoided. (the chance of the 6th time slice using FPU after the
previous 5 having done so are quite high obviously).
After 256 switches, this is reset and lazy behavior is returned (until
there are 5 consecutive ones again). The reason for this is to give apps
that do longer bursts of FPU use still the lazy behavior back after some
time.
[akpm@osdl.org: place new task_struct field next to jit_keyring to save space]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Now for a completely different but trivial approach.
I just boot tested it with 255 CPUS and everything worked.
Currently everything (except module data) we place in
the per cpu area we know about at compile time. So
instead of allocating a fixed size for the per_cpu area
allocate the number of bytes we need plus a fixed constant
for to be used for modules.
It isn't perfect but it is much less of a pain to
work with than what we are doing now.
AK: fixed warning
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
I've noticed some erratic behavior while testing the X86_64 version
of monotonic_clock().
While spinning in a loop reading monotonic clock values (pinned to a
single cpu) I noticed that the difference between subsequent values
occasionally went negative (time going backwards).
I found that in the following code:
this_offset = get_cycles_sync();
/* FIXME: 1000 or 1000000? */
--> offset = (this_offset - last_offset)*1000 / cpu_khz;
}
return base + offset;
the offset sometimes turns out to be 0, even though
this_offset > last_offset.
+Added fix From: Toyo Abe <toyoa@mvista.com>
The x86_64-mm-monotonic-clock.patch in 2.6.18-rc4-mm2 made a change to
the updating of monotonic_base. It now uses cycles_2_ns().
I suggest that a set_cyc2ns_scale() should be done prior to the setup_irq().
Because cycles_2_ns() can be called from the timer ISR right after the irq0
is enabled.
Signed-off-by: Toyo Abe <toyoa@mvista.com>
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch moves the entry.S:error_entry to .kprobes.text section,
since code marked unsafe for kprobes jumps directly to entry.S::error_entry,
that must be marked unsafe as well.
This patch also moves all the ".previous.text" asm directives to ".previous"
for kprobes section.
AK: Following a similar i386 patch from Chuck Ebbert
AK: Also merged Jeremy's fix in.
+From: Jeremy Fitzhardinge <jeremy@goop.org>
KPROBE_ENTRY does a .section .kprobes.text, and expects its users to
do a .previous at the end of the function.
Unfortunately, if any code within the function switches sections, for
example .fixup, then the .previous ends up putting all subsequent code
into .fixup. Worse, any subsequent .fixup code gets intermingled with
the code its supposed to be fixing (which is also in .fixup). It's
surprising this didn't cause more havok.
The fix is to use .pushsection/.popsection, so this stuff nests
properly. A further cleanup would be to get rid of all
.section/.previous pairs, since they're inherently fragile.
+From: Chuck Ebbert <76306.1226@compuserve.com>
Because code marked unsafe for kprobes jumps directly to
entry.S::error_code, that must be marked unsafe as well.
The easiest way to do that is to move the page fault entry
point to just before error_code and let it inherit the same
section.
Also moved all the ".previous" asm directives for kprobes
sections to column 1 and removed ".text" from them.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This unifies the standard backtracer and the new stacktrace
in memory backtracer. The standard one is converted to use callbacks
and then reimplement stacktrace using new callbacks.
The main advantage is that stacktrace can now use the new dwarf2 unwinder
and avoid false positives in many cases.
I kept it simple to make sure the standard backtracer stays reliable.
Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
Lockdep can call the dwarf2 unwinder early, and the dwarf2 code
uses safe_smp_processor_id which tries to access the local APIC page.
But that doesn't work before the APIC code has set up its fixmap.
Check for this case and always return boot cpu then.
Cc: jbeulich@novell.com
Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
- Remove unused all_contexts parameter
No caller used it
- Move skip argument into the structure (needed for
followon patches)
Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
tce_cache_blast_stress was useful during bringup to stress the IOMMU's
cache flushing. Now that we quiesce DMAs on every cache flush, using
_stress() brings the machine down to its knees once you put it under
load. Remove this debug / bringup code that isn't useful anymore
completely.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Introduce new function verify_bit_range(). Define two versions, one
for CONFIG_IOMMU_DEBUG enabled and one for disabled. Previously we
were checking that the bitmap was consistent every time we allocated
or freed an entry in the TCE table, which is good for debugging but
incurs an unnecessary penalty on non debug builds.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
is_at_popf() needs to test for the iret instruction as well as
popf. So add that test and rename it to is_setting_trap_flag().
Also change max insn length from 16 to 15 to match reality.
LAHF / SAHF can't affect TF, so the comment in x86_64 is removed.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
The combination of "local_save_flags" and "local_irq_disable" seems to be
equivalent to "local_irq_save" (see code snips below). Consequently, replace
occurrences of local_save_flags+local_irq_disable with local_irq_save.
* local_irq_save
#define raw_local_irq_save(flags) \
do { (flags) = __raw_local_irq_save(); } while (0)
static inline unsigned long __raw_local_irq_save(void)
{
unsigned long flags = __raw_local_save_flags();
raw_local_irq_disable();
return flags;
}
* local_save_flags
#define raw_local_save_flags(flags) \
do { (flags) = __raw_local_save_flags(); } while (0)
Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Move it into srat.c No need to clutter up setup.c for it
And remove use in setup.c completely - it only guarded a printk
which can be done unconditionally.
Signed-off-by: Andi Kleen <ak@suse.de>
Removes code duplication between i386/x86-64.
Not needed anymore in setup.c since early_param cleanup
Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
I think it was only needed for the printks and we can do them later.
I put in a single early_printk so that we know the kernel is alive
(early_printk doesn't need any locks)
This makes some things easier for initialization of unwind for
lockdep, which is needed by later patches.
cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
Instead of hackish manual parsing
Requires earlier i386 patchkit, but also fixes i386 early_printk again.
I removed some obsolete really early parameters which didn't do anything useful.
Also made a few parameters that needed it early (mostly oops printing setup)
Also removed one panic check that wasn't visible without
early console anyways (the early console is now initialized after that
panic)
This cleans up a lot of code.
Signed-off-by: Andi Kleen <ak@suse.de>
This makes it possible to modify CPU flags in command line
options without hacks.
And remove another copy in head64.c
Signed-off-by: Andi Kleen <ak@suse.de>
The lock prefix will cause an exception when used with the
popf instruction, so no need to continue searching after it's
found.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
There's no need to check for invalid DMA data direction in nommu and
gart since we do it in dma-mapping.h anyway before calling the
individual dma-ops.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Allows easier extension of the GDT by using the proper C symbol
for the size in the descriptor.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
When testing for the REX instruction prefix, first check
for 32-bit mode because in compat mode the REX prefix is an
increment instruction.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Make translation_disabled a uchar rather than an int
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
The pci_get_device() API decrements the reference count on the 'from'
parameter when it continues searching. Therefore, take a ref count on
Calgary bus when we initialize them in either translated or
non-translated mode.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
We were freeing the iommu_table and leaking the bitmap pages. Also
rename it to calgary_free_bus, which is more accurate.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Move the tce_table_kva array, disabled bitmap and bus_to_phb array
into a new per bus 'struct calgary_bus_info'. Also slightly reorganize
build_tce_table and tce_table_setparms to avoid exporting bus_info to
tce.c.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
The genapic field and the accessor macro weren't used anywhere.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
While an earlier patch already did a small step into that direction,
this patch moves initialization of all memory end variables to as
early as possible, so that dependent code doesn't need to check
whether these variables have already been set.
Also, remove a misleading (perhaps just outdated) comment, and make
static a variable only used in a single file.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
They did not really belong into io_apic.c. Move them into a new file
and clean it up a bit.
Also remove outdated ATI quirk that was obsolete,
Signed-off-by: Andi Kleen <ak@suse.de>
The MPS table specification says that the operating system should
renumber the IO-APICs following the table as needed. However in
ACPI this is not allowed or neeeded and all x86-64 systems are ACPI
compliant.
The code was already disabled on some systems because it caused
problems there. Remove it completely now.
CC: mdomsch@dell.com
Signed-off-by: Andi Kleen <ak@suse.de>
The IO APIC code had lots of duplicated code to read/write 64bit
routing entries into the IO-APIC. Factor this out int common read/write
functions
In a few cases the IO APIC lock is taken more often now, but this
isn't a problem because it's all initialization/shutdown only
slow path code.
Signed-off-by: Andi Kleen <ak@suse.de>
PIC mode is an outdated way to drive the APICs that was used on
some early MP boards. It is not supported in the ACPI model.
It is unlikely to be ever configured by any x86-64 system
Remove it thus.
Signed-off-by: Andi Kleen <ak@suse.de>
No 64bit EISA or Microchannel systems ever. Remove the left over code
in the IO-APIC driver and the mptable parser
Signed-off-by: Andi Kleen <ak@suse.de>
This was an old workaround for broken MP-BIOS. The user could
specify overwrites on the command line.
I've never seen it being used for anything on 64bit. So get
rid of it for now.
Signed-off-by: Andi Kleen <ak@suse.de>
IO-APIC or local APIC can only be disabled at runtime anyways and
Kconfig has forced these options on for a long time now.
The Kconfigs are kept only now for the benefit of the shared acpi
boot.c code.
Signed-off-by: Andi Kleen <ak@suse.de>
Previously it didn't align. Use the same one as the C compiler
in blended mode, which is good for K8 and Core2 and doesn't hurt
on P4.
Signed-off-by: Andi Kleen <ak@suse.de>
Use knowledge about EFLAGS layout (bits 22:63 are always 0) to distingush
EFLAGS word and kernel address in the spin lock stack frame.
Signed-off-by: Andi Kleen <ak@suse.de>
A few trivial spelling and grammar mistakes picked up in
"arch/x86_64/aperture.c", "arch/x86_64/crash.c" and
"arch/x86_64/apic.c". I think all are correct fixes but am ever aware
of my fallibility :o) This is my first patch submission so all
feedback is appreciated, esp. WRT CCing to Linus, Andi and
trivial@kernel.org, is this correct? And which is the most appropriate
kernel version to diff against? If any.
Should apply cleanly to 2.6.18-rc1
Signed-off-by: Adam Henley <adamazing@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
- adam
Hello,
Following my discussion with Andi. Here is a patch that introduces
two new TIF flags to simplify the context switch code in __switch_to().
The idea is to minimize the number of cache lines accessed in the common
case, i.e., when neither the debug registers nor the I/O bitmap are used.
This patch covers the x86-64 modifications. A patch for i386 follows.
Changelog:
- add TIF_DEBUG to track when debug registers are active
- add TIF_IO_BITMAP to track when I/O bitmap is used
- modify __switch_to() to use the new TIF flags
<signed-off-by>: eranian@hpl.hp.com
Signed-off-by: Andi Kleen <ak@suse.de>
This patch adds a vgetcpu vsyscall, which depending on the CPU RDTSCP
capability uses either the RDTSCP or CPUID to obtain a CPU and node
numbers and pass them to the program.
AK: Lots of changes over Vojtech's original code:
Better prototype for vgetcpu()
It's better to pass the cpu / node numbers as separate arguments
to avoid mistakes when going from SMP to NUMA.
Also add a fast time stamp based cache using a user supplied
argument to speed things more up.
Use fast method from Chuck Ebbert to retrieve node/cpu from
GDT limit instead of CPUID
Made sure RDTSCP init is always executed after node is known.
Drop printk
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>