Commit Graph

53265 Commits

Author SHA1 Message Date
Avi Kivity
2ff81f70b5 KVM: Remove unused 'instruction_length'
As we no longer emulate in userspace, this is meaningless.  We don't
compute it on SVM anyway.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:32 +03:00
Avi Kivity
02c8320972 KVM: Don't require explicit indication of completion of mmio or pio
It is illegal not to return from a pio or mmio request without completing
it, as mmio or pio is an atomic operation.  Therefore, we can simplify
the userspace interface by avoiding the completion indication.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:32 +03:00
Avi Kivity
e7df56e4a0 KVM: Remove extraneous guest entry on mmio read
When emulating an mmio read, we actually emulate twice: once to determine
the physical address of the mmio, and, after we've exited to userspace to
get the mmio value, we emulate again to place the value in the result
register and update any flags.

But we don't really need to enter the guest again for that, only to take
an immediate vmexit.  So, if we detect that we're doing an mmio read,
emulate a single instruction before entering the guest again.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:32 +03:00
Anthony Liguori
94dfbdb389 KVM: SVM: Only save/restore MSRs when needed
We only have to save/restore MSR_GS_BASE on every VMEXIT.  The rest can be
saved/restored when we leave the VCPU.  Since we don't emulate the DEBUGCTL
MSRs and the guest cannot write to them, we don't have to worry about
saving/restoring them at all.

This shaves a whopping 40% off raw vmexit costs on AMD.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:32 +03:00
Adrian Bunk
2807696c37 KVM: fix an if() condition
It might have worked in this case since PT_PRESENT_MASK is 1, but let's
express this correctly.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:31 +03:00
Anthony Liguori
2ab455ccce KVM: VMX: Add lazy FPU support for VT
Only save/restore the FPU host state when the guest is actually using the
FPU.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:31 +03:00
Anthony Liguori
25c4c2762e KVM: VMX: Properly shadow the CR0 register in the vcpu struct
Set all of the host mask bits for CR0 so that we can maintain a proper
shadow of CR0.  This exposes CR0.TS, paving the way for lazy fpu handling.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:31 +03:00
Avi Kivity
e0e5127d06 KVM: Don't complain about cpu erratum AA15
It slows down Windows x64 horribly.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:31 +03:00
Anthony Liguori
7807fa6ca5 KVM: Lazy FPU support for SVM
Avoid saving and restoring the guest fpu state on every exit.  This
shaves ~100 cycles off the guest/host switch.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:31 +03:00
Avi Kivity
4c690a1e86 KVM: Allow passing 64-bit values to the emulated read/write API
This simplifies the API somewhat (by eliminating the special-case
cmpxchg8b on i386).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:31 +03:00
Avi Kivity
1165f5fec1 KVM: Per-vcpu statistics
Make the exit statistics per-vcpu instead of global.  This gives a 3.5%
boost when running one virtual machine per core on my two socket dual core
(4 cores total) machine.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:30 +03:00
Yaozu Dong
3fca036530 KVM: VMX: Avoid unnecessary vcpu_load()/vcpu_put() cycles
By checking if a reschedule is needed, we avoid dropping the vcpu.

[With changes by me, based on Anthony Liguori's observations]

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:30 +03:00
Yaozu Dong
d6c69ee9a2 KVM: MMU: Avoid heavy ASSERT at non debug mode.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:30 +03:00
Avi Kivity
4d56c8a787 KVM: VMX: Only save/restore MSR_K6_STAR if necessary
Intel hosts only support syscall/sysret in long more (and only if efer.sce
is enabled), so only reload the related MSR_K6_STAR if the guest will
actually be able to use it.

This reduces vmexit cost by about 500 cycles (6400 -> 5870) on my setup.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:30 +03:00
Avi Kivity
35cc7f9711 KVM: Fold drivers/kvm/kvm_vmx.h into drivers/kvm/vmx.c
No meat in that file.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:30 +03:00
Avi Kivity
e38aea3e93 KVM: VMX: Don't switch 64-bit msrs for 32-bit guests
Some msrs are only used by x86_64 instructions, and are therefore
not needed when the guest is legacy mode.  By not bothering to switch
them, we reduce vmexit latency by 2400 cycles (from about 8800) when
running a 32-bt guest on a 64-bit host.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:30 +03:00
Avi Kivity
2345df8c55 KVM: VMX: Reduce unnecessary saving of host msrs
THe automatically switched msrs are never changed on the host (with
the exception of MSR_KERNEL_GS_BASE) and thus there is no need to save
them on every vm entry.

This reduces vmexit latency by ~400 cycles on i386 and by ~900 cycles (10%)
on x86_64.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:29 +03:00
Avi Kivity
c9047f5333 KVM: Handle guest page faults when emulating mmio
Usually, guest page faults are detected by the kvm page fault handler,
which detects if they are shadow faults, mmio faults, pagetable faults,
or normal guest page faults.

However, in ceratin circumstances, we can detect a page fault much later.
One of these events is the following combination:

- A two memory operand instruction (e.g. movsb) is executed.
- The first operand is in mmio space (which is the fault reported to kvm)
- The second operand is in an ummaped address (e.g. a guest page fault)

The Windows 2000 installer does such an access, an promptly hangs.  Fix
by adding the missing page fault injection on that path.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:29 +03:00
Avi Kivity
364b625b56 KVM: SVM: Report hardware exit reason to userspace instead of dmesg
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:29 +03:00
Avi Kivity
8c4385024d KVM: Retry sleeping allocation if atomic allocation fails
This avoids -ENOMEM under memory pressure.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:29 +03:00
Avi Kivity
b5a33a7572 KVM: Use slab caches to allocate mmu data structures
Better leak detection, statistics, memory use, speed -- goodness all
around.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:29 +03:00
Avi Kivity
417726a3fb KVM: Handle partial pae pdptr
Some guests (Solaris) do not set up all four pdptrs, but leave some invalid.
kvm incorrectly treated these as valid page directories, pinning the
wrong pages and causing general confusion.

Fix by checking the valid bit of a pae pdpte.  This closes sourceforge bug
1698922.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:29 +03:00
Avi Kivity
d917a6b92d KVM: Initialize cr0 to indicate an fpu is present
Solaris panics if it sees a cpu with no fpu, and it seems to rely on this
bit.  Closes sourceforge bug 1698920.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:29 +03:00
Eric Sesterhenn / Snakebyte
3964994bb5 KVM: Fix overflow bug in overflow detection code
The expression

   sp - 6 < sp

where sp is a u16 is undefined in C since 'sp - 6' is promoted to int,
and signed overflow is undefined in C.  gcc 4.2 actually warns about it.
Replace with a simpler test.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:29 +03:00
Avi Kivity
5008fdf5b6 KVM: Use kernel-standard types
Noted by Joerg Roedel.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:28 +03:00
Joerg Roedel
80b7706e4c KVM: SVM: enable LBRV virtualization if available
This patch enables the virtualization of the last branch record MSRs on
SVM if this feature is available in hardware. It also introduces a small
and simple check feature for specific SVM extensions.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:28 +03:00
Avi Kivity
b8836737d9 KVM: Add fpu get/set operations
These are really helpful when migrating an floating point app to another
machine.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:28 +03:00
Avi Kivity
e8207547d2 KVM: Add physical memory aliasing feature
With this, we can specify that accesses to one physical memory range will
be remapped to another.  This is useful for the vga window at 0xa0000 which
is used as a movable window into the (much larger) framebuffer.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:28 +03:00
Avi Kivity
954bbbc236 KVM: Simply gfn_to_page()
Mapping a guest page to a host page is a common operation.  Currently,
one has first to find the memory slot where the page belongs (gfn_to_memslot),
then locate the page itself (gfn_to_page()).

This is clumsy, and also won't work well with memory aliases.  So simplify
gfn_to_page() not to require memory slot translation first, and instead do it
internally.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:28 +03:00
Dor Laor
e0fa826f96 KVM: Add mmu cache clear function
Functions that play around with the physical memory map
need a way to clear mappings to possibly nonexistent or
invalid memory.  Both the mmu cache and the processor tlb
are cleared.

Signed-off-by: Dor Laor <dor.laor@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:28 +03:00
Avi Kivity
df513e2cdd KVM: x86 emulator: fix bit string operations operand size
On x86, bit operations operate on a string of bits that can reside in
multiple words.  For example, 'btsl %eax, (blah)' will touch the word
at blah+4 if %eax is between 32 and 63.

The x86 emulator compensates for that by advancing the operand address
by (bit offset / BITS_PER_LONG) and truncating the bit offset to the
range (0..BITS_PER_LONG-1).  This has a side effect of forcing the operand
size to 8 bytes on 64-bit hosts.

Now, a 32-bit guest goes and fork()s a process.  It write protects a stack
page at 0xbffff000 using the 'btr' instruction, at offset 0xffc in the page
table, with bit offset 1 (for the write permission bit).

The emulator now forces the operand size to 8 bytes as previously described,
and an innocent page table update turns into a cross-page-boundary write,
which is assumed by the mmu code not to be a page table, so it doesn't
actually clear the corresponding shadow page table entry.  The guest and
host permissions are out of sync and guest memory is corrupted soon
afterwards, leading to guest failure.

Fix by not using BITS_PER_LONG as the word size; instead use the actual
operand size, so we get a 32-bit write in that case.

Note we still have to teach the mmu to handle cross-page-boundary writes
to guest page table; but for now this allows Damn Small Linux 0.4 (2.4.20)
to boot.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:28 +03:00
Avi Kivity
afeb1f14c5 KVM: Remove debug message
No longer interesting.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:27 +03:00
Avi Kivity
36868f7b0e KVM: Use list_move()
Use list_move() where possible.  Noticed by Dor Laor.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:27 +03:00
Michal Piotrowski
55bf402834 KVM: Remove unused function
Remove unused function

CC      drivers/kvm/svm.o
drivers/kvm/svm.c:207: warning: ‘inject_db’ defined but not used

Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:27 +03:00
Avi Kivity
0cc5064d33 KVM: SVM: Ensure timestamp counter monotonicity
When a vcpu is migrated from one cpu to another, its timestamp counter
may lose its monotonic property if the host has unsynced timestamp counters.
This can confuse the guest, sometimes to the point of refusing to boot.

As the rdtsc instruction is rather fast on AMD processors (7-10 cycles),
we can simply record the last host tsc when we drop the cpu, and adjust
the vcpu tsc offset when we detect that we've migrated to a different cpu.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:27 +03:00
Avi Kivity
d28c6cfbbc KVM: MMU: Fix hugepage pdes mapping same physical address with different access
The kvm mmu keeps a shadow page for hugepage pdes; if several such pdes map
the same physical address, they share the same shadow page.  This is a fairly
common case (kernel mappings on i386 nonpae Linux, for example).

However, if the two pdes map the same memory but with different permissions, kvm
will happily use the cached shadow page.  If the access through the more
permissive pde will occur after the access to the strict pde, an endless pagefault
loop will be generated and the guest will make no progress.

Fix by making the access permissions part of the cache lookup key.

The fix allows Xen pae to boot on kvm and run guest domains.

Thanks to Jeremy Fitzhardinge for reporting the bug and testing the fix.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:27 +03:00
Joerg Roedel
916ce2360f KVM: SVM: forbid guest to execute monitor/mwait
This patch forbids the guest to execute monitor/mwait instructions on
SVM. This is necessary because the guest can execute these instructions
if they are available even if the kvm cpuid doesn't report its
existence.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:26 +03:00
Sergey Kiselev
0e5bf0d0e4 KVM: Handle writes to MCG_STATUS msr
Some older (~2.6.7) kernels write MCG_STATUS register during kernel
boot (mce_clear_all() function, called from mce_init()). It's not
currently handled by kvm and will cause it to inject a GPF.
Following patch adds a "nop" handler for this.

Signed-off-by: Sergey Kiselev <sergey.kiselev@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:26 +03:00
Avi Kivity
fcd3410870 KVM: Remove unused and write-only variables
Trivial cleanup.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:26 +03:00
Avi Kivity
6da63cf95f KVM: Don't allow the guest to turn off the cpu cache
The cpu cache is a host resource; the guest should not be able to turn
it off (even for itself).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:26 +03:00
Avi Kivity
038881c8be KVM: Hack real-mode segments on vmx from KVM_SET_SREGS
As usual, we need to mangle segment registers when emulating real mode
as vm86 has specific constraints.  We special case the reset segment base,
and set the "access rights" (or descriptor flags) to vm86 comaptible values.

This fixes reboot on vmx.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:26 +03:00
Avi Kivity
024aa1c02f KVM: Modify guest segments after potentially switching modes
The SET_SREGS ioctl modifies both cr0.pe (real mode/protected mode) and
guest segment registers.  Since segment handling is modified by the mode on
Intel procesors, update the segment registers after the mode switch has taken
place.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:26 +03:00
Avi Kivity
f6528b03f1 KVM: Remove set_cr0_no_modeswitch() arch op
set_cr0_no_modeswitch() was a hack to avoid corrupting segment registers.
As we now cache the protected mode values on entry to real mode, this
isn't an issue anymore, and it interferes with reboot (which usually _is_
a modeswitch).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:25 +03:00
Avi Kivity
8cb5b03332 KVM: Workaround vmx inability to virtualize the reset state
The reset state has cs.selector == 0xf000 and cs.base == 0xffff0000,
which aren't compatible with vm86 mode, which is used for real mode
virtualization.

When we create a vcpu, we set cs.base to 0xf0000, but if we get there by
way of a reset, the values are inconsistent and vmx refuses to enter
guest mode.

Workaround by detecting the state and munging it appropriately.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:25 +03:00
Avi Kivity
aac012245a KVM: MMU: Remove global pte tracking
The initial, noncaching, version of the kvm mmu flushed the all nonglobal
shadow page table translations (much like a native tlb flush).  The new
implementation flushes translations only when they change, rendering global
pte tracking superfluous.

This removes the unused tracking mechanism and storage space.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:25 +03:00
Avi Kivity
ca5aac1f96 KVM: MMU: Remove unnecessary check for pdptr access
We already special case the pdptr access, so no need to check it again.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:25 +03:00
Avi Kivity
039576c03c KVM: Avoid guest virtual addresses in string pio userspace interface
The current string pio interface communicates using guest virtual addresses,
relying on userspace to translate addresses and to check permissions.  This
interface cannot fully support guest smp, as the check needs to take into
account two pages at one in case an unaligned string transfer straddles a
page boundary.

Change the interface not to communicate guest addresses at all; instead use
a buffer page (mmaped by userspace) and do transfers there.  The kernel
manages the virtual to physical translation and can perform the checks
atomically by taking the appropriate locks.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:25 +03:00
Avi Kivity
f0fe510864 KVM: Future-proof argument-less ioctls
Some ioctls ignore their arguments.  By requiring them to be zero now,
we allow a nonzero value to have some special meaning in the future.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:25 +03:00
Avi Kivity
07c45a366d KVM: Allow kernel to select size of mmap() buffer
This allows us to store offsets in the kernel/user kvm_run area, and be
sure that userspace has them mapped.  As offsets can be outside the
kvm_run struct, userspace has no way of knowing how much to mmap.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:24 +03:00
Avi Kivity
1961d276c8 KVM: Add guest mode signal mask
Allow a special signal mask to be used while executing in guest mode.  This
allows signals to be used to interrupt a vcpu without requiring signal
delivery to a userspace handler, which is quite expensive.  Userspace still
receives -EINTR and can get the signal via sigwait().

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:24 +03:00