Replaced check_user_space() + __check_access_register with the new
check_space(). The old functions made wrong assumptions about kernel
and user space when the kernel and user address spaces are switched
(kernel in home space, user in primary/secondary space).
Secondly the user process can switch to the accress register mode if
it is running in primary or secondary mode. In addition it can load
an arbitrary value to the access registers. If any other value than
0 for primary space or 1 for secondary space is loaded and memory
is accessed using the base register related to the access register,
the program should be terminated with a SIGSEGV. To achieve that the
DUALD pointer in the DUCT and the PSALD pointer in the PASTE need
to point to an array of 8 invalid access-list entries to get a
ALEN-translation exception if an invalid alet is used.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For extended error reporting we sometimes have to start an
Sense Subsystem Status request (SNSS). When this request needs
to be recovered for some reason, the recovery request will
fail with 'command reject'.
Our usual recovery procedure will retry the failed request by
creating a new request and chaining the failed request from that
one. SNSS requests, though, must not be chained from anything,
so the recovery request will fail permanently.
Use the default recovery for SNSS request, which will just restart
the original request without further ado.
Signed-off-by: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Don't have functions in header files unless they are inline.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
After switching compression on/off with the mt command, tape encryption is no
longer working. The reason for that is, that the modeset_byte is set to
the compression value instead of using bitwise and/or bit operations to
enable/disable the corresponding bit.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
nss and kexec don't work together since kexec wants to write to the
read-only text section of the shared kernel image.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reipl doesn't work on older machines were s390_reset_machine() gets
called. The reason is that the text section is read-only but the
variable dump_prefix_page is there. Since s390_reset_machine() writes
to it we get a protection exception.
Therefore move dump_prefix_page to the bss section.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Avoid sprinkling a _lot_ of preempt_disable/preempt_enable pairs.
This would be necessary for e.g. the iucv driver. Also this way we
are more consistent with other architectures which disable
preemption at least for smp_call_function.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The illegal operation handler calls the die notifier with DIE_BPT to
let kprobes pick up its breakpoint. If kprobes does not find its
breakpoint it returns NOTIFY_STOP instead of NOTIFY_DONE.
Since we use stop_machine_run on s390 to arm/disarm the kprobes
breakpoints the race that kprobe_handler tries to solve by checking
for the kprobes breakpoints does not exist. Removing the check makes
BUG_ON working again.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
I recognized a compile error in latest git:
/here/workdir/git/drivers/net/gianfar.c: In function `gfar_vlan_rx_kill_vid':
/here/workdir/git/drivers/net/gianfar.c:1135: error: structure has no member named `vgrp'
This error was introduced in commit:
commit 6d04e3b04b
...
[VLAN]: Avoid a 4-order allocation.
Signed-off-by: Jan Altenberg <jan@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix reference counting (memory leak) problem in __nfulnl_send() and callers
related to packet queueing.
Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Count module references correctly: after instance_destroy() there
might be timer pending and holding a reference for this netlink instance.
Based on patch by Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate possible NULL pointer dereference in nfulnl_recv_config().
Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paranoia: instance_put() might have freed the inst pointer when we
spin_unlock_bh().
Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stop reference leaking in nfulnl_log_packet(). If we start a timer we
are already taking another reference.
Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some stacks apparently send packets with SYN|URG set. Linux accepts
these packets, so TCP conntrack should to.
Pointed out by Martijn Posthuma <posthuma@sangine.com>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The nf_conntrack_netlink config option is named CONFIG_NF_CT_NETLINK,
but multiple files use CONFIG_IP_NF_CONNTRACK_NETLINK or
CONFIG_NF_CONNTRACK_NETLINK for ifdefs.
Fix this and reformat all CONFIG_NF_CT_NETLINK ifdefs to only use a line.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix {nf,ip}_ct_iterate_cleanup unconfirmed list handling:
- unconfirmed entries can not be killed manually, they are removed on
confirmation or final destruction of the conntrack entry, which means
we might iterate forever without making forward progress.
This can happen in combination with the conntrack event cache, which
holds a reference to the conntrack entry, which is only released when
the packet makes it all the way through the stack or a different
packet is handled.
- taking references to an unconfirmed entry and using it outside the
locked section doesn't work, the list entries are not refcounted and
another CPU might already be waiting to destroy the entry
What the code really wants to do is make sure the references of the hash
table to the selected conntrack entries are released, so they will be
destroyed once all references from skbs and the event cache are dropped.
Since unconfirmed entries haven't even entered the hash yet, simply mark
them as dying and skip confirmation based on that.
Reported and tested by Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just define a local {claim,release}_dma_lock() implementation
for the floppy driver to use so we don't need to define and
export to modules the silly dma_spin_lock.
Signed-off-by: David S. Miller <davem@davemloft.net>
CONFIG_PARAVIRT broke old glibc bootup: it silently turned off the
selectability of CONFIG_COMPAT_VDSO and thus rendered distro kernels
unbootable on old-style VDSO glibc setups.
the proper solution is to keep COMPAT_VDSO available - if a hypervisor
needs any modification of that concept then we'll judge those changes in
full context, once those changes are submitted.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
there's a new NMI watchdog related problem: KVM crashes on certain
bzImages because ... we enable the NMI watchdog by default (even if the
user does not ask for it) , and no other OS on this planet does that so
KVM doesnt have emulation for that yet. So KVM injects a #GP, which
crashes the Linux guest:
general protection fault: 0000 [#1]
PREEMPT SMP
Modules linked in:
CPU: 0
EIP: 0060:[<c011a8ae>] Not tainted VLI
EFLAGS: 00000246 (2.6.20-rc5-rt0 #3)
EIP is at setup_apic_nmi_watchdog+0x26d/0x3d3
and no, i did /not/ request an nmi_watchdog on the boot command line!
Solution: turn off that darn thing! It's a debug tool, not a 'make life
harder' tool!!
with this patch the KVM guest boots up just fine.
And with this my laptop (Lenovo T60) also stopped its sporadic hard
hanging (sometimes in acpi_init(), sometimes later during bootup,
sometimes much later during actual use) as well. It hung with both
nmi_watchdog=1 and nmi_watchdog=2, so it's generally the fact of NMI
injection that is causing problems, not the NMI watchdog variant, nor
any particular bootup code.
[ NMI breaks on some systems, esp in combination with SMM -Arjan ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
do not use default=y for CONFIG_VMI (we do not do that for any driver or
special-hardware feature): the overwhelming majority of Linux users does
not need it, and interested users and distributions can enable it
as-needed.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clarify the description of the CONFIG_VMI option: describe the reality
that VMI is a VMWare-only interface for now. Once that changes and
another hypervisor adopts the VMI ABI we can change the text.
As can be seen from the Xen paravirtualization patches submitted to lkml
the Xen project has chosen its own, non-VMI interface between Xen and
the para-Linux - so remove Xen from the description.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Temove the mistaken turning on of NO_IDLE_HZ on x86+PARAVIRT kernels.
It's an obsolete, limited form of dynticks.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CT based mach64 cards were reported to hang on sparc64 boxes when
compiled with gcc-4.1.x and later.
Looking at this piece of code, it's no surprise. A critical
delay was implemented as an empty for() loop, and gcc 4.0.x
and previous did not optimize it away, so we did get a delay.
But gcc-4.1.x and later can optimize it away, and we get crashes.
Use a real udelay() to fix this. Fix verified on SunBlade100.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CC arch/i386/kernel/vmi.o
/home/bunk/linux/kernel-2.6/linux-2.6.21-rc2-mm1/arch/i386/kernel/vmi.c: In function 'vmi_map_pt_hook':
/home/bunk/linux/kernel-2.6/linux-2.6.21-rc2-mm1/arch/i386/kernel/vmi.c:387: error: 'KM_PTE0' undeclared (first use in this function)
/home/bunk/linux/kernel-2.6/linux-2.6.21-rc2-mm1/arch/i386/kernel/vmi.c:387: error: (Each undeclared identifier is reported only once
/home/bunk/linux/kernel-2.6/linux-2.6.21-rc2-mm1/arch/i386/kernel/vmi.c:387: error: for each function it appears in.)
/home/bunk/linux/kernel-2.6/linux-2.6.21-rc2-mm1/arch/i386/kernel/vmi.c:387: error: 'KM_PTE1' undeclared (first use in this function)
make[2]: *** [arch/i386/kernel/vmi.o] Error 1
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replacing use of UTS_RELEASE with utsname()->release avoids that the
usb-storage driver is recompiled each time the kernel version changes.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add some documentation for the new and very useful io-accounting feature.
It's being added to Documentation/filesystems/proc.txt
Signed-off-by: Roland Kletzing <devzero@web.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"drivers/char/epca.c:2741: warning: 'get_termio' defined but not used"
Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Doing something like this on a two cpu system
# echo 0 > /sys/devices/system/cpu/cpu0/online
# echo 1 > /sys/devices/system/cpu/cpu0/online
# echo 0 > /sys/devices/system/cpu/cpu1/online
will give me this:
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.21-rc2-g562aa1d4-dirty #7
-------------------------------------------------------
bash/1282 is trying to acquire lock:
(&cpu_base->lock_key){.+..}, at: [<000000000005f17e>] hrtimer_cpu_notify+0xc6/0x240
but task is already holding lock:
(&cpu_base->lock_key#2){.+..}, at: [<000000000005f174>] hrtimer_cpu_notify+0xbc/0x240
which lock already depends on the new lock.
This happens because we have the following code in kernel/hrtimer.c:
migrate_hrtimers(int cpu)
[...]
old_base = &per_cpu(hrtimer_bases, cpu);
new_base = &get_cpu_var(hrtimer_bases);
[...]
spin_lock(&new_base->lock);
spin_lock(&old_base->lock);
Which means the spinlocks are taken in an order which depends on which cpu
gets shut down from which other cpu. Therefore lockdep complains that there
might be an ABBA deadlock. Since migrate_hrtimers() gets only called on
cpu hotplug it's safe to assume that it isn't executed concurrently on a
The same problem exists in kernel/timer.c: migrate_timers().
As pointed out by Christian Borntraeger one possible solution to avoid
the locking order complaints would be to make sure that the locks are
always taken in the same order. E.g. by taking the lock of the cpu with
the lower number first.
To achieve this we introduce two new spinlock functions double_spin_lock
and double_spin_unlock which lock or unlock two locks in a given order.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Christian Borntraeger <cborntra@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch resolves the issue found here:
http://bugme.osdl.org/show_bug.cgi?id=7426
The basic summary is:
Currently we register most of i386/x86_64 clocksources at module_init
time. Then we enable clocksource selection at late_initcall time. This
causes some problems for drivers that use gettimeofday for init
calibration routines (specifically the es1968 driver in this case),
where durring module_init, the only clocksource available is the low-res
jiffies clocksource. This may cause slight calibration errors, due to
the small sampling time used.
It should be noted that drivers that require fine grained time may not
function on architectures that do not have better then jiffies
resolution timekeeping (there are a few). However, this does not
discount the reasonable need for such fine-grained timekeeping at init
time.
Thus the solution here is to register clocksources earlier (ideally when
the hardware is being initialized), and then we enable clocksource
selection at fs_initcall (before device_initcall).
This patch should probably get some testing time in -mm, since
clocksource selection is one of the most important issues for correct
timekeeping, and I've only been able to test this on a few of my own
boxes.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ipmi_si_intf tries to access default ports, if no device could be found
elsewhere. On PPC we have a function to check, if these legacy IO ports
are accessible. This patch adds a check for these ports on PPC. This
patch fixes a breakage of IPMI module on PPC machines without a BMC.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Corey Minyard <minyard@acm.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- In fact we don't have to fail if AOP_TRUNCATED_PAGE was returned from
prepare_write or commit_write. It is beter to retry attempt where it
is possible.
- Rearange ecryptfs_get_lower_page() error handling logic, make it more clean.
Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Acked-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Currently after path_lookup succeed we dot't have any guarantie what
it is DIR. This must be explicitly demanded.
- path_lookup can't return negative dentry, So inode check is useless.
Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Acked-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Testing NMI watchdog ... CPU#0: NMI appears to be stuck (54->54)!
CPU#1: NMI appears to be stuck (0->0)!
Keep the PIT/HPET alive when nmi_watchdog = 1 is given on the command
line.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent patch for raid6 reshape had a change missing that showed up in
subsequent review.
Many places in the raid5 code used "conf->raid_disks-1" to mean "number of
data disks". With raid6 that had to be changed to "conf->raid_disk -
conf->max_degraded" or similar. One place was missed.
This bug means that if a raid6 reshape were aborted in the middle the
recorded position would be wrong. On restart it would either fail (as the
position wasn't on an appropriate boundary) or would leave a section of the
array unreshaped, causing data corruption.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Critical fixes for SMP.
Fix a couple functions which needed to be __devinit and fix a bogus parameter
to AP startup that just so happened to work because the low virtual mapping of
memory was still established.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use para_fill instead of directly setting the APIC ops to the result of the
vmi_get_function call - this allows one to implement a VMI ROM without
implementing APIC functions, just using the native APIC functions.
While doing this, I realized that there is a lot more cleanup that should have
been done. Basically, we should never assume that the ROM implements a
specific set of functions, and always allow fallback to the native
implementation.
This is critical for future compatibility.
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
More goo from hrtimers integration. We do compile and run properly with NO_HZ
enabled. There was a period when we didn't because of a missing export, but
that was since fixed.
And with the clocksource code now firmly in place, we can get rid of code that
fixes up the wallclock, since this is done in the common infrastructure. This
actually fixes a timer bug as well, that was caused by do_settimeofday no
longer being callable with interrupts disabled due to the use of
on_each_cpu().
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The time_init_hook in paravirt-ops no longer functions in the correct manner
after the integration of the hrtimers code. The problem is that now the call
path for time initialization is:
time_init :
late_time_init = hpet_time_init;
late_time_init -> hpet_time_init:
setup_pit_timer (BAD)
do_time_init --> (via paravirt.h)
time_init_hook --> (via arch_hooks.h)
time_init_hook (in SUBARCH/setup.c)
If this isn't confusing enough, the paravirt case goes through an indirect
function pointer in the paravirt-ops table. The problem is, by the time the
paravirt hook is called, the pit timer is already enabled.
But paravirt guests have their own timer, and don't want to use the PIT.
Rather than intensify the struggle for power going on here, just make it all
nice and simple and just unconditionally do all timer setup in the
late_time_init hook. This also has the advantage of enabling timers in the
same place in all code paths, so everyone has the same bugs and we don't have
outliers who break other code because they turn on timer too early or too
late.
So the paravirt-ops time init function is now by default hpet_time_init, which
is the time init function used for native hardware. Paravirt guests have the
chance to override this when they setup the paravirt-ops table, and should
need no change.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Not respecting udelay causes problems with any virtual hardware that is passed
through to real hardware. This can be noticed by any device that interacts
with the real world in real time - like AP startup, which takes real time. Or
keyboard LEDs, which should blink in real-time. Or floppy drives, but only
when passed through to a real floppy controller on OSes which can't
sufficiently buffer the floppy commands to emulate a zero latency floppy. Or
IDE drives, when connecting to a physical CDROM.
This was mostly a hack to get the kernel to boot faster, but it introduced a
number of misvirtualization bugs, and Alan and Pavel argued pretty strongly
against it. We were the only client, and now want to clean up this cruft.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide a PT map hook for HIGHPTE kernels to designate where they are mapping
page tables. This information is required so the physical address of PTE
updates can be determined; otherwise, the mm layer would have to carry the
physical address all the way to each PTE modification callsite, which is even
more hideous that the macros required to provide the proper hooks.
So lets not mess up arch neutral code to achieve this, but keep the horror in
an #ifdef HIGHPTE in include/asm-i386/pgtable.h. I had to use macros here
because some types are not yet defined in all the include paths for this
header.
This patch is absolutely required for HIGHPTE kernels to operate properly with
VMI.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In order to share the common code in tsc.c which does CPU Khz calibration, we
need to make an accurate value of CPU speed available to the tsc.c code. This
value loses a lot of precision in a VM because of the timing differences with
real hardware, but we need it to be as precise as possible so the guest can
make accurate time calculations with the cycle counters.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The custom_sched_clock hook is broken. The result from sched_clock needs to
be in nanoseconds, not in CPU cycles. The TSC is insufficient for this
purpose, because TSC is poorly defined in a virtual environment, and mostly
represents real world time instead of scheduled process time (which can be
interrupted without notice when a virtual machine is descheduled).
To make the scheduler consistent, we must expose a different nature of time,
that is scheduled time. So deprecate this custom_sched_clock hack and turn it
into a paravirt-op, as it should have been all along. This allows the tsc.c
code which converts cycles to nanoseconds to be shared by all paravirt-ops
backends.
It is unfortunate to add a new paravirt-op, but this is a very distinct
abstraction which is clearly different for all virtual machine
implementations, and it gets rid of an ugly indirect function which I
ashamedly admit I hacked in to try to get this to work earlier, and then even
got in the wrong units.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>