Not respecting udelay causes problems with any virtual hardware that is passed
through to real hardware. This can be noticed by any device that interacts
with the real world in real time - like AP startup, which takes real time. Or
keyboard LEDs, which should blink in real-time. Or floppy drives, but only
when passed through to a real floppy controller on OSes which can't
sufficiently buffer the floppy commands to emulate a zero latency floppy. Or
IDE drives, when connecting to a physical CDROM.
This was mostly a hack to get the kernel to boot faster, but it introduced a
number of misvirtualization bugs, and Alan and Pavel argued pretty strongly
against it. We were the only client, and now want to clean up this cruft.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide a PT map hook for HIGHPTE kernels to designate where they are mapping
page tables. This information is required so the physical address of PTE
updates can be determined; otherwise, the mm layer would have to carry the
physical address all the way to each PTE modification callsite, which is even
more hideous that the macros required to provide the proper hooks.
So lets not mess up arch neutral code to achieve this, but keep the horror in
an #ifdef HIGHPTE in include/asm-i386/pgtable.h. I had to use macros here
because some types are not yet defined in all the include paths for this
header.
This patch is absolutely required for HIGHPTE kernels to operate properly with
VMI.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In order to share the common code in tsc.c which does CPU Khz calibration, we
need to make an accurate value of CPU speed available to the tsc.c code. This
value loses a lot of precision in a VM because of the timing differences with
real hardware, but we need it to be as precise as possible so the guest can
make accurate time calculations with the cycle counters.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The custom_sched_clock hook is broken. The result from sched_clock needs to
be in nanoseconds, not in CPU cycles. The TSC is insufficient for this
purpose, because TSC is poorly defined in a virtual environment, and mostly
represents real world time instead of scheduled process time (which can be
interrupted without notice when a virtual machine is descheduled).
To make the scheduler consistent, we must expose a different nature of time,
that is scheduled time. So deprecate this custom_sched_clock hack and turn it
into a paravirt-op, as it should have been all along. This allows the tsc.c
code which converts cycles to nanoseconds to be shared by all paravirt-ops
backends.
It is unfortunate to add a new paravirt-op, but this is a very distinct
abstraction which is clearly different for all virtual machine
implementations, and it gets rid of an ugly indirect function which I
ashamedly admit I hacked in to try to get this to work earlier, and then even
got in the wrong units.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we do not check for vma flags if sys_move_pages is called to move
individual pages. If sys_migrate_pages is called to move pages then we
check for vm_flags that indicate a non migratable vma but that still
includes VM_LOCKED and we can migrate mlocked pages.
Extract the vma_migratable check from mm/mempolicy.c, fix it and put it
into migrate.h so that is can be used from both locations.
Problem was spotted by Lee Schermerhorn
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the SMT-nice feature which idles sibling cpus on SMT cpus to
facilitiate nice working properly where cpu power is shared. The idling of
cpus in the presence of runnable tasks is considered too fragile, easy to
break with outside code, and the complexity of managing this system if an
architecture comes along with many logical cores sharing cpu power will be
unworkable.
Remove the associated per_cpu_gain variable in sched_domains used only by
this code.
Also:
The reason is that with dynticks enabled, this code breaks without yet
further tweaks so dynticks brought on the rapid demise of this code. So
either we tweak this code or kill it off entirely. It was Ingo's preference
to kill it off. Either way this needs to happen for 2.6.21 since dynticks
has gone in.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The gpio_keys driver is wrongly ARM-specific; it can't build on
other platforms with GPIO suport. This fixes that problem.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: pHilipp Zabel <philipp.zabel@gmail.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Ben Nizette <ben.nizette@iinet.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In some cases when we are not using msi we need a way to ensure that the
hardware does not have an msi capability enabled. Currently the code has been
calling disable_msi_mode to try and achieve that. However disable_msi_mode
has several other side effects and is only available when msi support is
compiled in so it isn't really appropriate.
Instead this patch implements pci_msi_off which disables all msi and msix
capabilities unconditionally with no additional side effects.
pci_disable_device was redundantly clearing the bus master enable flag and
clearing the msi enable bit. A device that is not allowed to perform bus
mastering operations cannot generate intx or msi interrupt messages as those
are essentially a special case of dma, and require bus mastering. So the call
in pci_disable_device to disable msi capabilities was redundant.
quirk_pcie_pxh also called disable_msi_mode and is updated to use pci_msi_off.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A -mm patch caused:
In file included from drivers/pci/quirks.c:532:
include/asm/io_apic.h:61: error: "MAX_IO_APICS" undeclared here (not in a function)
So let's include the needed header.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[VLAN]: Avoid a 4-order allocation.
[HDLC] Fix dev->header_cache_update having a random value.
[NetLabel]: Verify sensitivity level has a valid CIPSO mapping
[PPPOE]: Key connections properly on local device.
[AF_UNIX]: Test against sk_max_ack_backlog properly.
[NET]: Fix bugs in "Whether sock accept queue is full" checking
* master.kernel.org:/home/rmk/linux-2.6-arm: (30 commits)
[ARM] Acorn: move the i2c bus driver into drivers/i2c
[ARM] ARM SCSI: Don't try to dma_map_sg too many scatterlist entries
[ARM] ARM FAS216: don't modify scsi_cmnd request_bufflen
[ARM] rtc-pcf8583: Final fixes for this RTC on RiscPC
[ARM] rtc-pcf8583: correct month and year offsets
[ARM] rtc-pcf8583: don't use BCD_TO_BIN/BIN_TO_BCD
[ARM] EBSA110: Work around build errors
[ARM] 4241/1: Define mb() as compiler barrier on a uniprocessor system
[ARM] 4239/1: S3C24XX: Update kconfig entries for PM
[ARM] 4238/1: S3C24XX: docs: update suspend and resume
[ARM] 4237/2: oprofile: Always allow backtraces on ARM
[ARM] Yet more asm/apm-emulation.h stuff
ARM: OMAP: Add missing get_irqnr_preamble and arch_ret_to_user for omap2
ARM: OMAP: Use linux/delay.h not asm/delay.h
ARM: OMAP: Remove obsolete alsa typedefs
ARM: OMAP: omap1510->15xx conversions needed for sx1
ARM: OMAP: Add missing includes to board-nokia770
ARM: OMAP: Workqueue changes for board-h4.c
ARM: OMAP: dmtimer.c omap1 register fix
ARM: OMAP: board-nokia770: correct lcd name
...
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] MTX1: clear PCI errors
[MIPS] MTX1: add idsel cardbus ressources
[MIPS] MTX1: remove unneeded settings
[MIPS] dma_sync_sg_for_cpu is a no-op except for non-coherent R10000s.
[MIPS] Cobalt: update reserved resources
[MIPS] SN: PCI fixup needs to include <irq.h>.
[MIPS] DMA: Fix a bunch of warnings due to missing inline keywords.
[MIPS] RM: It should be #ifdef CONFIG_FOO not #if CONFIG_FOO ...
[MIPS] Fix and cleanup the mess that a dozen prom_printf variants are.
[MIPS] DEC: Remove redeclarations of mips_machgroup and mips_machtype.
[MIPS] No need to write c0_compare in plat_timer_setup
[MIPS] Convert to RTC-class ds1742 driver
[MIPS] Oprofile: Add missing break statements.
[MIPS] jmr3927: build fix
[MIPS] SNI: Fix mc146818_decode_year
[MIPS] Replace sys32_timer_create with the generic compat_sys_timer_create.
[MIPS] Replace sys32_socketcall with the generic compat_sys_socketcall.
[MIPS] N32 waitid is the same as o32.
The generic rtc-ds1742 driver can be used for RBTX4927 and JMR3927
(with __swizzle_addr trick). This patch also removes MIPS local
DS1742 stuff.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Big endian RMs uses a different mc146818_decode_year than little endian RMs
Correct mc146818_decode_year for years before 2000
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Use the standard magic.h for kvmfs.
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Allocate a distinct inode for every vcpu in a VM. This has the following
benefits:
- the filp cachelines are no longer bounced when f_count is incremented on
every ioctl()
- the API and internal code are distinctly clearer; for example, on the
KVM_GET_REGS ioctl, there is no need to copy the vcpu number from
userspace and then copy the registers back; the vcpu identity is derived
from the fd used to make the call
Right now the performance benefits are completely theoretical since (a) we
don't support more than one vcpu per VM and (b) virtualization hardware
inefficiencies completely everwhelm any cacheline bouncing effects. But
both of these will change, and we need to prepare the API today.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This avoids having filp->f_op and the corresponding inode->i_fop different,
which is a little unorthodox.
The ioctl list is split into two: global kvm ioctls and per-vm ioctls. A new
ioctl, KVM_CREATE_VM, is used to create VMs and return the VM fd.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This adds a special MSR based hypercall API to KVM. This is to be
used by paravirtual kernels and virtual drivers.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The function ide_get_best_pio_mode() fails to return the correct IORDY setting
for the explicitly specified modes -- fix this along with the heading comment,
and also remove the long commented out code.
Also, while at it, correct the misliading comment about the PIO cycle time in
<linux/ide.h> -- it actually consists of only the active and recovery periods,
with only some chips also including the address setup time into equation...
[ bart: sl82c105 seems to be currently the only driver affected by this fix ]
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
This patch splits the vlan_group struct into a multi-allocated struct. On
x86_64, the size of the original struct is a little more than 32KB, causing
a 4-order allocation, which is prune to problems caused by buddy-system
external fragmentation conditions.
I couldn't just use vmalloc() because vfree() cannot be called in the
softirq context of the RCU callback.
Signed-off-by: Dan Aloni <da-x@monatomic.org>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
when I use linux TCP socket, and find there is a bug in function
sk_acceptq_is_full().
When a new SYN comes, TCP module first checks its validation. If valid,
send SYN,ACK to the client and add the sock to the syn hash table. Next
time if received the valid ACK for SYN,ACK from the client. server will
accept this connection and increase the sk->sk_ack_backlog -- which is
done in function tcp_check_req().We check wether acceptq is full in
function tcp_v4_syn_recv_sock().
Consider an example:
After listen(sockfd, 1) system call, sk->sk_max_ack_backlog is set to
1. As we know, sk->sk_ack_backlog is initialized to 0. Assuming accept()
system call is not invoked now.
1. 1st connection comes. invoke sk_acceptq_is_full(). sk-
>sk_ack_backlog=0 sk->sk_max_ack_backlog=1, function return 0 accept
this connection. Increase the sk->sk_ack_backlog
2. 2nd connection comes. invoke sk_acceptq_is_full(). sk-
>sk_ack_backlog=1 sk->sk_max_ack_backlog=1, function return 0 accept
this connection. Increase the sk->sk_ack_backlog
3. 3rd connection comes. invoke sk_acceptq_is_full(). sk-
>sk_ack_backlog=2 sk->sk_max_ack_backlog=1, function return 1. Refuse
this connection.
I think it has bugs. after listen system call. sk->sk_max_ack_backlog=1
but now it can accept 2 connections.
Signed-off-by: Wei Dong <weid@np.css.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The information contained within platform_data should be self-contained.
Replace the pointer to a MAC address with the actual MAC address in
struct mv643xx_eth_platform_data.
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Conditionalize all PM related stuff in libata core layer using
CONFIG_PM.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The initial simplex handling code is fooled if you suspend and resume.
This also causes problems with some single channel controllers which
claim to be simplex.
The fix is fairly simple, instead of keeping a flag to remember if we
gave away the simplex channel we remember the actual owner. As the owner
is always part of the host_set we don't even need a refcount.
Knowing the owner also means we can reassign simplex DMA channels in
future hotplug code etc if we need to
Signed-off-by: Alan Cox <alan@redhat.com>
(and a signed-off for the patch I sent before while I remember)
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Currently, the mb() is defined as a DMB operation on ARMv6, even for
UP systems. This patch defines mb() as a compiler barrier only. For
the SMP case, the smp_* variants should be used anyway and the patch
defines them as DMB.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Convert 1510->15xx in generic omap code, so that sx1 can work.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Tony Lindgren <tony@atomide.com>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jikos/hid:
HID: fix Logitech DiNovo Edge touchwheel and Logic3 /SpectraVideo middle button
HID: add git tree information to MAINTAINERS
HID: fix broken Logitech S510 keyboard report descriptor; make extra keys work
HID: fix possible double-free on error path in hid parser
HID: hid-debug.c should #include <linux/hid-debug.h>
HID: fix bug in zeroing the last field byte in output reports
USB HID: use CONFIG_HID_DEBUG for outputting report descriptor
USB HID: Fix USB vendor and product IDs endianness for USB HID devices
This patch provides the following hugetlb-related fixes to the recent stacked
shm files changes:
- Update is_file_hugepages() so it will reconize hugetlb shm segments.
- get_unmapped_area must be called with the nested file struct to handle
the sfd->file->f_ops->get_unmapped_area == NULL case.
- The fsync f_op must be wrapped since it is specified in the hugetlbfs
f_ops.
This is based on proposed fixes from Eric Biederman that were debugged and
tested by me. Without it, attempting to use hugetlb shared memory segments
on powerpc (and likely ia64) will kill your box.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: William Irwin <bill.irwin@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The CAPI trace debug functions were using a fixed size buffer, which can be
overflowed if wrong formatted CAPI messages were sent to the kernel capi
layer. The code was also not protected against multiple callers. This fix
bug 8028.
Additionally the patch make the CAPI trace functions optional.
Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the fact that pte_mkread set _PAGE_RW instead of _PAGE_USER (the logic is
copied from i386 in most place, so it is really as bad as you're thinking).
Thus currently page tables are more permissive than they should.
Such a change may trigger other latent bugs, so be careful with this.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
linux/irq.h uses EINVAL but does not #include linux/errno.h. This results in
the compiler spitting out errors on some files.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
throttle_vm_writeout() is designed to wait for the dirty levels to subside.
But if the caller holds IO or FS locks, we might be holding up that writeout.
So change it to take a single nap to give other devices a chance to clean some
memory, then return.
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename PG_checked to PG_owner_priv_1 to reflect its availablilty as a
private flag for use by the owner/allocator of the page. In the case of
pagecache pages (which might be considered to be owned by the mm),
filesystems may use the flag.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>