Commit Graph

89 Commits

Author SHA1 Message Date
Eddie Dong
2a8067f17b KVM: pending irq save/restore
Add in kernel irqchip save/restore support for pending vectors.

[avi: fix compile warning on i386]
[avi: remove printk]

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:26 +02:00
Eddie Dong
b6958ce44a KVM: Emulate hlt in the kernel
By sleeping in the kernel when hlt is executed, we simplify the in-kernel
guest interrupt path considerably.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:25 +02:00
Eddie Dong
97222cc831 KVM: Emulate local APIC in kernel
Because lightweight exits (exits which don't involve userspace) are many
times faster than heavyweight exits, it makes sense to emulate high usage
devices in the kernel.  The local APIC is one such device, especially for
Windows and for SMP, so we add an APIC model to kvm.

It also allows in-kernel host-side drivers to inject interrupts without
going through userspace.

[compile fix on i386 from Jindrich Makovicka]

Signed-off-by: Yaozu (Eddie) Dong <Eddie.Dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:25 +02:00
Eddie Dong
7017fc3d1a KVM: Define and use cr8 access functions
This patch is to wrap APIC base register and CR8 operation which can
provide a unique API for user level irqchip and kernel irqchip.
This is a preparation of merging lapic/ioapic patch.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:25 +02:00
Eddie Dong
85f455f7dd KVM: Add support for in-kernel PIC emulation
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:24 +02:00
Avi Kivity
7e66f350cf KVM: Close minor race in signal handling
We need to check for signals inside the critical section, otherwise a
signal can be sent which we will not notice.  Also move the check
before entry, so that if the signal happens before the first entry,
we exit immediately instead of waiting for something to happen to the
guest.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:23 +02:00
Laurent Vivier
3090dd7377 KVM: Clean up kvm_setup_pio()
Split kvm_setup_pio() into two functions, one to setup in/out pio
(kvm_emulate_pio()) and one to setup ins/outs pio (kvm_emulate_pio_string()).

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:23 +02:00
Laurent Vivier
e70669abd4 KVM: Cleanup string I/O instruction emulation
Both vmx and svm decode the I/O instructions, and both botch the job,
requiring the instruction prefixes to be fetched in order to completely
decode the instruction.

So, if we see a string I/O instruction, use the x86 emulator to decode it,
as it already has all the prefix decoding machinery.

This patch defines ins/outs opcodes in x86_emulate.c and calls
emulate_instruction() from io_interception() (svm.c) and from handle_io()
(vmx.c).  It removes all vmx/svm prefix instruction decoders
(get_addr_size(), io_get_override(), io_address(), get_io_count())

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:23 +02:00
Rusty Russell
a477034750 KVM: Use kmem_cache_free for kmem_cache_zalloc'ed objects
We use kfree in svm.c and vmx.c, and this works, but it could break at
any time.  kfree() is supposed to match up with kmalloc().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:23 +02:00
Rusty Russell
f02424785a KVM: Add and use pr_unimpl for standard formatting of unimplemented features
All guest-invokable printks should be ratelimited to prevent malicious
guests from flooding logs.  This is a start.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:22 +02:00
Rusty Russell
bfc733a7a3 KVM: SVM: Make set_msr_interception more reliable
set_msr_interception() is used by svm to set up which MSRs should be
intercepted.  It can only fail if someone has changed the code to try
to intercept an MSR without updating the array of ranges.

The return value is ignored anyway: it should just BUG() if it doesn't
work.  (A build-time failure would be better, but that's tricky).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:22 +02:00
Yang, Sheng
002c7f7c32 KVM: VMX: Add cpu consistency check
All the physical CPUs on the board should support the same VMX feature
set.  Add check_processor_compatibility to kvm_arch_ops for the consistency
check.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:22 +02:00
Rusty Russell
b114b0804d KVM: Use alignment properties of vcpu to simplify FPU ops
Now we use a kmem cache for allocating vcpus, we can get the 16-byte
alignment required by fxsave & fxrstor instructions, and avoid
manually aligning the buffer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:21 +02:00
Rusty Russell
c16f862d02 KVM: Use kmem cache for allocating vcpus
Avi wants the allocations of vcpus centralized again.  The easiest way
is to add a "size" arg to kvm_init_arch, and expose the thus-prepared
cache to the modules.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:21 +02:00
Laurent Vivier
e7d5d76cae KVM: Remove kvm_{read,write}_guest()
... in favor of the more general emulator_{read,write}_*.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:21 +02:00
Rusty Russell
0e5017d4ae KVM: SVM: internal function name cleanup
Changes some svm.c internal function names:
1) io_adress -> io_address  (de-germanify the spelling)
2) kvm_reput_irq -> reput_irq  (it's not a generic kvm function)
3) kvm_do_inject_irq -> (it's not a generic kvm function)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:21 +02:00
Rusty Russell
e756fc626d KVM: SVM: de-containization
container_of is wonderful, but not casting at all is better.  This
patch changes svm.c's internal functions to pass "struct vcpu_svm"
instead of "struct kvm_vcpu" and using container_of.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:21 +02:00
Rusty Russell
3077c4513c KVM: Remove three magic numbers
There are several places where hardcoded numbers are used in place of
the easily-available constant, which is poor form.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:21 +02:00
Shaohua Li
11ec280471 KVM: Convert vm lock to a mutex
This allows the kvm mmu to perform sleepy operations, such as memory
allocation.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:20 +02:00
Avi Kivity
15ad71460d KVM: Use the scheduler preemption notifiers to make kvm preemptible
Current kvm disables preemption while the new virtualization registers are
in use.  This of course is not very good for latency sensitive workloads (one
use of virtualization is to offload user interface and other latency
insensitive stuff to a container, so that it is easier to analyze the
remaining workload).  This patch re-enables preemption for kvm; preemption
is now only disabled when switching the registers in and out, and during
the switch to guest mode and back.

Contains fixes from Shaohua Li <shaohua.li@intel.com>.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:20 +02:00
Rusty Russell
fb3f0f51d9 KVM: Dynamically allocate vcpus
This patch converts the vcpus array in "struct kvm" to a pointer
array, and changes the "vcpu_create" and "vcpu_setup" hooks into one
"vcpu_create" call which does the allocation and initialization of the
vcpu (calling back into the kvm_vcpu_init core helper).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:20 +02:00
Gregory Haskins
a2fa3e9f52 KVM: Remove arch specific components from the general code
struct kvm_vcpu has vmx-specific members; remove them to a private structure.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:20 +02:00
Jeff Dike
8fc0d085f5 KVM: Set exit_reason to KVM_EXIT_MMIO where run->mmio is initialized.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:19 +02:00
Rusty Russell
66aee91aaa KVM: Use standard CR4 flags, tighten checking
On this machine (Intel), writing to the CR4 bits 0x00000800 and
0x00001000 cause a GPF.  The Intel manual is a little unclear, but
AFIACT they're reserved, too.

Also fix spelling of CR4_RESEVED_BITS.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:18 +02:00
Rusty Russell
707d92fa72 KVM: Trivial: Use standard CR0 flags macros from asm/cpu-features.h
The kernel now has asm/cpu-features.h: use those macros instead of
inventing our own.

Also spell out definition of CR0_RESEVED_BITS (no code change) and fix typo.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:18 +02:00
Qing He
dad3795d2b KVM: SMP: Add vcpu_id field in struct vcpu
This patch adds a `vcpu_id' field in `struct vcpu', so we can
differentiate BSP and APs without pointer comparison or arithmetic.

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:17 +02:00
Avi Kivity
e495606dd0 KVM: Clean up #includes
Remove unnecessary ones, and rearange the remaining in the standard order.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:49 +03:00
Joerg Roedel
6031a61c2e KVM: SVM: Reliably detect if SVM was disabled by BIOS
This patch adds an implementation to the svm is_disabled function to
detect reliably if the BIOS disabled the SVM feature in the CPU. This
fixes the issues with kernel panics when loading the kvm-amd module on
machines where SVM is available but disabled.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:49 +03:00
Avi Kivity
94cea1bb9d KVM: Initialize the BSP bit in the APIC_BASE msr correctly
Needs to be set on vcpu 0 only.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:47 +03:00
Shani Moideen
129ee910df KVM: SVM: Replace memset(<addr>, 0, PAGESIZE) with clear_page(<addr>)
Signed-off-by: Shani Moideen <shani.moideen@wipro.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:46 +03:00
Avi Kivity
d9e368d612 KVM: Flush remote tlbs when reducing shadow pte permissions
When a vcpu causes a shadow tlb entry to have reduced permissions, it
must also clear the tlb on remote vcpus.  We do that by:

- setting a bit on the vcpu that requests a tlb flush before the next entry
- if the vcpu is currently executing, we send an ipi to make sure it
  exits before we continue

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:46 +03:00
Avi Kivity
d3bef15f84 KVM: Move duplicate halt handling code into kvm_main.c
Will soon have a thid user.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:46 +03:00
Avi Kivity
17c3ba9d37 KVM: Lazy guest cr3 switching
Switch guest paging context may require us to allocate memory, which
might fail.  Instead of wiring up error paths everywhere, make context
switching lazy and actually do the switch before the next guest entry,
where we can return an error if allocation fails.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:45 +03:00
Anthony Liguori
c86813393f KVM: SVM: Allow direct guest access to PC debug port
The PC debug port is used for IO delay and does not require emulation.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:37 +03:00
Alexey Dobriyan
e8edc6e03a Detach sched.h from mm.h
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.

This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
   getting them indirectly

Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
   they don't need sched.h
b) sched.h stops being dependency for significant number of files:
   on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
   after patch it's only 3744 (-8.3%).

Cross-compile tested on

	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
	alpha alpha-up
	arm
	i386 i386-up i386-defconfig i386-allnoconfig
	ia64 ia64-up
	m68k
	mips
	parisc parisc-up
	powerpc powerpc-up
	s390 s390-up
	sparc sparc-up
	sparc64 sparc64-up
	um-x86_64
	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

as well as my two usual configs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:18:19 -07:00
Anthony Liguori
94dfbdb389 KVM: SVM: Only save/restore MSRs when needed
We only have to save/restore MSR_GS_BASE on every VMEXIT.  The rest can be
saved/restored when we leave the VCPU.  Since we don't emulate the DEBUGCTL
MSRs and the guest cannot write to them, we don't have to worry about
saving/restoring them at all.

This shaves a whopping 40% off raw vmexit costs on AMD.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:32 +03:00
Anthony Liguori
25c4c2762e KVM: VMX: Properly shadow the CR0 register in the vcpu struct
Set all of the host mask bits for CR0 so that we can maintain a proper
shadow of CR0.  This exposes CR0.TS, paving the way for lazy fpu handling.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:31 +03:00
Anthony Liguori
7807fa6ca5 KVM: Lazy FPU support for SVM
Avoid saving and restoring the guest fpu state on every exit.  This
shaves ~100 cycles off the guest/host switch.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:31 +03:00
Avi Kivity
1165f5fec1 KVM: Per-vcpu statistics
Make the exit statistics per-vcpu instead of global.  This gives a 3.5%
boost when running one virtual machine per core on my two socket dual core
(4 cores total) machine.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:30 +03:00
Avi Kivity
364b625b56 KVM: SVM: Report hardware exit reason to userspace instead of dmesg
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:29 +03:00
Avi Kivity
5008fdf5b6 KVM: Use kernel-standard types
Noted by Joerg Roedel.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:28 +03:00
Joerg Roedel
80b7706e4c KVM: SVM: enable LBRV virtualization if available
This patch enables the virtualization of the last branch record MSRs on
SVM if this feature is available in hardware. It also introduces a small
and simple check feature for specific SVM extensions.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:28 +03:00
Michal Piotrowski
55bf402834 KVM: Remove unused function
Remove unused function

CC      drivers/kvm/svm.o
drivers/kvm/svm.c:207: warning: ‘inject_db’ defined but not used

Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:27 +03:00
Avi Kivity
0cc5064d33 KVM: SVM: Ensure timestamp counter monotonicity
When a vcpu is migrated from one cpu to another, its timestamp counter
may lose its monotonic property if the host has unsynced timestamp counters.
This can confuse the guest, sometimes to the point of refusing to boot.

As the rdtsc instruction is rather fast on AMD processors (7-10 cycles),
we can simply record the last host tsc when we drop the cpu, and adjust
the vcpu tsc offset when we detect that we've migrated to a different cpu.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:27 +03:00
Joerg Roedel
916ce2360f KVM: SVM: forbid guest to execute monitor/mwait
This patch forbids the guest to execute monitor/mwait instructions on
SVM. This is necessary because the guest can execute these instructions
if they are available even if the kvm cpuid doesn't report its
existence.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:26 +03:00
Avi Kivity
fcd3410870 KVM: Remove unused and write-only variables
Trivial cleanup.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:26 +03:00
Avi Kivity
6da63cf95f KVM: Don't allow the guest to turn off the cpu cache
The cpu cache is a host resource; the guest should not be able to turn
it off (even for itself).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:26 +03:00
Avi Kivity
f6528b03f1 KVM: Remove set_cr0_no_modeswitch() arch op
set_cr0_no_modeswitch() was a hack to avoid corrupting segment registers.
As we now cache the protected mode values on entry to real mode, this
isn't an issue anymore, and it interferes with reboot (which usually _is_
a modeswitch).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:25 +03:00
Avi Kivity
039576c03c KVM: Avoid guest virtual addresses in string pio userspace interface
The current string pio interface communicates using guest virtual addresses,
relying on userspace to translate addresses and to check permissions.  This
interface cannot fully support guest smp, as the check needs to take into
account two pages at one in case an unaligned string transfer straddles a
page boundary.

Change the interface not to communicate guest addresses at all; instead use
a buffer page (mmaped by userspace) and do transfers there.  The kernel
manages the virtual to physical translation and can perform the checks
atomically by taking the appropriate locks.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:25 +03:00
Avi Kivity
6722c51c51 KVM: Initialize the apic_base msr on svm too
Older userspace didn't care, but newer userspace (with the cpuid changes)
does.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:24 +03:00