Instead of prefetching all segment bases before emulation, read them at the
last moment. Since most of them are unneeded, we save some cycles on
Intel machines where this is a bit expensive.
Signed-off-by: Avi Kivity <avi@qumranet.com>
rip relative decoding is relative to the instruction pointer of the next
instruction; by moving address adjustment until after decoding is complete,
we remove the need to determine the instruction size.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch enables coalesced MMIO for x86 architecture.
It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO.
It enables the compilation of coalesced_mmio.c.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Prefixes functions that will be exported with kvm_.
We also prefixed set_segment() even if it still static
to be coherent.
signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Add emulation for the memory type range registers, needed by VMware esx 3.5,
and by pci device assignment.
Signed-off-by: Avi Kivity <avi@qumranet.com>
KVM turns off hardware virtualization extensions during reboot, in order
to disassociate the memory used by the virtualization extensions from the
processor, and in order to have the system in a consistent state.
Unfortunately virtual machines may still be running while this goes on,
and once virtualization extensions are turned off, any virtulization
instruction will #UD on execution.
Fix by adding an exception handler to virtualization instructions; if we get
an exception during reboot, we simply spin waiting for the reset to complete.
If it's a true exception, BUG() so we can have our stack trace.
Signed-off-by: Avi Kivity <avi@qumranet.com>
The KVM MMU tries to detect when a speculative pte update is not actually
used by demand fault, by checking the accessed bit of the shadow pte. If
the shadow pte has not been accessed, we deem that page table flooded and
remove the shadow page table, allowing further pte updates to proceed
without emulation.
However, if the pte itself points at a page table and only used for write
operations, the accessed bit will never be set since all access will happen
through the emulator.
This is exactly what happens with kscand on old (2.4.x) HIGHMEM kernels.
The kernel points a kmap_atomic() pte at a page table, and then
proceeds with read-modify-write operations to look at the dirty and accessed
bits. We get a false flood trigger on the kmap ptes, which results in the
mmu spending all its time setting up and tearing down shadows.
Fix by setting the shadow accessed bit on emulated accesses.
Signed-off-by: Avi Kivity <avi@qumranet.com>
To distinguish between real page faults and nested page faults they should be
traced as different events. This is implemented by this patch.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Fallout from commit 33185c504f ("x86:
merge signal_32/64.h")
Thanks to Dick Streefland who provided an useful testcase on
http://lkml.org/lkml/2008/3/17/205 (only applicable to 2.6.24.x), that
helped a lot as a deterministic way to bisect an issue that leaded to
this fix.
Signed-off-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br>
Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
Cc: Roland McGrath <roland@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix asm/e820.h for userspace inclusion
x86: fix numaq_tsc_disable
x86: fix kernel_physical_mapping_init() for large x86 systems
asm-x86/e820.h is included from userspace. 'x86: make e820.c to have
common functions' (b79cd8f126) broke it:
make -C Documentation/lguest
cc -Wall -Wmissing-declarations -Wmissing-prototypes -O3 -I../../include
lguest.c -lz -o lguest
In file included from ../../include/asm-x86/bootparam.h:8,
from lguest.c:45:
../../include/asm/e820.h:66: error: expected ‘)’ before ‘start’
../../include/asm/e820.h:67: error: expected ‘)’ before ‘start’
../../include/asm/e820.h:68: error: expected ‘)’ before ‘start’
../../include/asm/e820.h:72: error: expected ‘=’, ‘,’, ‘;’, ‘asm’
or ‘__attribute__’ before ‘e820_update_range’
...
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (72 commits)
Revert "x86/PCI: ACPI based PCI gap calculation"
PCI: remove unnecessary volatile in PCIe hotplug struct controller
x86/PCI: ACPI based PCI gap calculation
PCI: include linux/pm_wakeup.h for device_set_wakeup_capable
PCI PM: Fix pci_prepare_to_sleep
x86/PCI: Fix PCI config space for domains > 0
Fix acpi_pm_device_sleep_wake() by providing a stub for CONFIG_PM_SLEEP=n
PCI: Simplify PCI device PM code
PCI PM: Introduce pci_prepare_to_sleep and pci_back_from_sleep
PCI ACPI: Rework PCI handling of wake-up
ACPI: Introduce new device wakeup flag 'prepared'
ACPI: Introduce acpi_device_sleep_wake function
PCI: rework pci_set_power_state function to call platform first
PCI: Introduce platform_pci_power_manageable function
ACPI: Introduce acpi_bus_power_manageable function
PCI: make pci_name use dev_name
PCI: handle pci_name() being const
PCI: add stub for pci_set_consistent_dma_mask()
PCI: remove unused arch pcibios_update_resource() functions
PCI: fix pci_setup_device()'s sprinting into a const buffer
...
Fixed up conflicts in various files (arch/x86/kernel/setup_64.c,
arch/x86/pci/irq.c, arch/x86/pci/pci.h, drivers/acpi/sleep/main.c,
drivers/pci/pci.c, drivers/pci/pci.h, include/acpi/acpi_bus.h) from x86
and ACPI updates manually.
"idle=nomwait" disables the use of the MWAIT
instruction from both C1 (C1_FFH) and deeper (C2C3_FFH)
C-states.
When MWAIT is unavailable, the BIOS and OS generally
negotiate to use the HALT instruction for C1,
and use IO accesses for deeper C-states.
This option is useful for power and performance
comparisons, and also to work around BIOS bugs
where broken MWAIT support is advertised.
http://bugzilla.kernel.org/show_bug.cgi?id=10807http://bugzilla.kernel.org/show_bug.cgi?id=10914
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Li Shaohua <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
"idle=halt" limits the idle loop to using
the halt instruction. No MWAIT, no IO accesses,
no C-states deeper than C1.
If something is broken in the idle code,
"idle=halt" is a less severe workaround
than "idle=poll" which disables all power savings.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
AS arch/x86/lib/csum-copy_64.o
arch/x86/lib/csum-copy_64.S: Assembler messages:
arch/x86/lib/csum-copy_64.S:48: Error: Macro `ignore' was already defined
make[1]: *** [arch/x86/lib/csum-copy_64.o] Error 1
make: *** [arch/x86/lib] Error 2
It appears that csum-copy_64.S and dwarf2.h both define an ignore macro.
I would expect one of them can be renamed quite easily, unless they
are references elsewhere.
Caused-by-commit: 392a0fc96b
x86: merge dwarf2 headers
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Strengthen the return type for the _node_to_cpumask_ptr to be
a const pointer. This adds compiler checking to insure that
node_to_cpumask_map[] is not changed inadvertently.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Acked-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that IRQ2 is never made available to the I/O APIC, there is no need
to special-case it and mask as a workaround for broken systems. Actually,
because of the former, mask_IO_APIC_irq(2) is a no-op already.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
got this on a test-system:
calling numaq_tsc_disable+0x0/0x39
NUMAQ: disabling TSC
initcall numaq_tsc_disable+0x0/0x39 returned 0 after 0 msecs
that's because we should not be using arch_initcall to call numaq_tsc_disable.
need to call it in setup_arch before time_init()/tsc_init()
and call it in init_intel() to make the cpu feature bits right.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix size of LDT entries. On x86-64, ldt_desc is a double-sized descriptor.
Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In dwarf2_32.h, test for CONFIG_AS_CFI instead of
CONFIG_UNWIND_INFO. Turns out that searching for UNWIND_INFO
returns no match in any Kconfig or Makefile, so we're really
just throwing everything away regarding dwarf frames for i386.
The test that generates CONFIG_AS_CFI does not have anything
x86_64-specific, and right now, checking V=1 builds shows me
that the flags is there anyway, although unused.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In dwarf_64.h header, use the "ignore" macro the way
i386 does.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
i spent a fair amount of time chasing a 64-bit bootup crash that manifested
itself as bootup segfaults:
S10network[1825]: segfault at 7f3e2b5d16b8 ip 00000031108748c9 sp 00007fffb9c14c70 error 4 in libc-2.7.so[3110800000+14d000]
eventually causing init to die and panic the system:
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Not tainted 2.6.26-rc9-tip #13878
after a maratonic bisection session, the bad commit turned out to be:
| b7675791859075418199c7af86a116ea34eaf5bd is first bad commit
| commit b7675791859075418199c7af86a116ea34eaf5bd
| Author: Jeremy Fitzhardinge <jeremy@goop.org>
| Date: Wed Jun 25 00:19:00 2008 -0400
|
| x86: remove open-coded save/load segment operations
|
| This removes a pile of buggy open-coded implementations of savesegment
| and loadsegment.
after some more bisection of this patch itself, it turns out that what
makes the difference are the savesegment() changes to __switch_to().
Taking a look at this portion of arch/x86/kernel/process_64.o revealed
this crutial difference:
| good: 99c: 8c e0 mov %fs,%eax
| 99e: 89 45 cc mov %eax,-0x34(%rbp)
|
| bad: 99c: 8c 65 cc mov %fs,-0x34(%rbp)
which is due to:
| unsigned fsindex;
| - asm volatile("movl %%fs,%0" : "=r" (fsindex));
| + savesegment(fs, fsindex);
savesegment() is implemented as:
#define savesegment(seg, value) \
asm("mov %%" #seg ",%0":"=rm" (value) : : "memory")
note the "m" modifier - it allows GCC to generate the segment move
into a memory operand as well.
But regarding segment operands there's a subtle detail in the x86
instruction set: the above 16-bit moves are zero-extend, but only
if it goes to a register.
If it goes to a memory operand, -0x34(%rbp) in the above case, there's
no zero-extend to 32-bit and the instruction will only save 16 bits
instead of the intended 32-bit.
The other 16 bits is random data - which can cause problems when that
value is used later on.
The solution is to only allow segment operands to go to registers.
This fix allows my test-system to boot up without crashing.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add pseudo-feature bits to describe whether the CPU supports sysenter
and/or syscall from ia32-compat userspace. This removes a hardcoded
test in vdso32-setup.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
when more than 4g memory is installed, don't map the big hole below 4g.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
NR_IRQS: let VISWS be just a sub-case of the generic code.
This can create a somewhat larger irq_desc[] array if NR_CPUS is high
but that should not worry VisWS which has 4 CPUs at most.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
move the SGIVW definitions from setup_arch.h into its own header file.
preparation for turning VISWS into a generic PC architecture.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
move the include/asm-x86/mach-visws/ VISWS specific hardware
details include files into include/asm-x86/visws, to be used from
generic code.
No code changed.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
update asm-x86/mach-visws/mach_apicdef.h to the generic version.
This should work fine as VISWS has a standard local APIC and thus
its mach_apicdef.h copy is just an ancient version of the generic code.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
now that include/asm-x86/mach-visws/smpboot_hooks.h equals
to the default file in ../mach-default/smpboot_hooks.h, simply
include it instead of maintaining a copy.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
update include/asm-x86/mach-visws/smpboot_hooks.h to
include/asm-x86/mach-default/smpboot_hooks.h (the generic version).
this _should_ work, because VISWS sets skip_ioapic_setup, but it
should be tested on a real VISWS to make sure.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Allow the generic smpboot quirks code to be built with
ONFIG_X86_IO_APIC disabled. This way VISWS will be able
to use it as-is.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
now that include/asm-x86/mach-visws/mach_apic.h equals
to include/asm-x86/mach-default/mach_apic.h, simply start
using the generic one.
Signed-off-by: Ingo Molnar <mingo@elte.hu>