Commit Graph

173037 Commits

Author SHA1 Message Date
Steven Whitehouse
7e71c55ee7 GFS2: Fix potential race in glock code
We need to be careful of the ordering between clearing the
GLF_LOCK bit and scheduling the workqueue.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-12-03 11:42:25 +00:00
Frederic Weisbecker
c08f782985 mutex: Fix missing conditions to build mutex_spin_on_owner()
We don't need to build mutex_spin_on_owner() if we have
CONFIG_DEBUG_MUTEXES or CONFIG_HAVE_DEFAULT_NO_SPIN_MUTEXES as
it won't be used under such configs.

Use CONFIG_MUTEX_SPIN_ON_OWNER as it gathers all the necessary
checks before building it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1259783357-8542-2-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
2009-12-03 11:50:11 +01:00
Frederic Weisbecker
c02260277e mutex: Better control mutex adaptive spinning config
Introduce CONFIG_MUTEX_SPIN_ON_OWNER so that we can centralize
in a single place the conditions that determine its definition
and use.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1259783357-8542-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
2009-12-03 11:50:11 +01:00
Manuel Lauss
efd9eb96d5 ASoC: au1x: dbdma2: plug memleak in pcm device creation error path
free the allocated pcm platform device in the error path.

Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Acked-by: Liam Girdwood <lrg@slimlogic.co.uk>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2009-12-03 10:49:55 +00:00
Manuel Lauss
1bc8079879 ASoC: au1x: dbdma2: fix oops on soc device removal.
platform_device_unregister() frees resources for us, no need to
do it explicitly.  Fixes an oops when machine code removes the
soc-audio device.

Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Acked-by: Liam Girdwood <lrg@slimlogic.co.uk>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2009-12-03 10:49:55 +00:00
Darrick J. Wong
4528752f49 x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking up the PCI tree
On a multi-node x3950M2 system, there's a slight oddity in the
PCI device tree for all secondary nodes:

 30:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
  \-33:00.0 PCI bridge: IBM CalIOC2 PCI-E Root Port (rev 01)
     \-34:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)

...as compared to the primary node:

 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
  \-01:00.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
 03:00.0 PCI bridge: IBM CalIOC2 PCI-E Root Port (rev 01)
  \-04:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)

In both nodes, the LSI RAID controller hangs off a CalIOC2
device, but on the secondary nodes, the BIOS hides the VGA
device and substitutes the device tree ending with the disk
controller.

It would seem that Calgary devices don't necessarily appear at
the top of the PCI tree, which means that the current code to
find the Calgary IOMMU that goes with a particular device is
buggy.

Rather than walk all the way to the top of the PCI
device tree and try to match bus number with Calgary descriptor,
the code needs to examine each parent of the particular device;
if it encounters a Calgary with a matching bus number, simply
use that.

Otherwise, we BUG() when the bus number of the Calgary doesn't
match the bus number of whatever's at the top of the device tree.

Extra note: This patch appears to work correctly for the x3950
that came before the x3950 M2.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Jon D. Mason <jdmason@kudzu.us>
Cc: Corinna Schultz <coschult@us.ibm.com>
Cc: <stable@kernel.org>
LKML-Reference: <20091202230556.GG10295@tux1.beaverton.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 11:44:05 +01:00
Paul E. McKenney
8bfb2f8e65 rcu: Make RCU's CPU-stall detector be default
The RCU_CPU_STALL_DETECTOR costs almost nothing and has located
some bugs that might otherwise have been difficult to track
down.  Make it be default for the TREE RCU implementations.

The vmlinux size impact is limited (on 64-bit x86 defconfig):

   text	   data	    bss	    dec	    hex	filename
   8440248	1260076	 995588	10695912	 a334e8	vmlinux.before
   8440774	1260060	 995588	10696422	 a336e6	vmlinux.after

+526 bytes - acceptable default cost.

For RAM starved systems, TINY_RCU does not support CPU-stall detection
and is much smaller, but then again it is a uniprocessor...

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12597846162906-git-send-email->
[ v2: added image size calculations to the changelog ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 11:35:27 +01:00
Paul E. McKenney
d9a3da0699 rcu: Add expedited grace-period support for preemptible RCU
Implement an synchronize_rcu_expedited() for preemptible RCU
that actually is expedited.  This uses
synchronize_sched_expedited() to force all threads currently
running in a preemptible-RCU read-side critical section onto the
appropriate ->blocked_tasks[] list, then takes a snapshot of all
of these lists and waits for them to drain.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1259784616158-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 11:35:25 +01:00
Paul E. McKenney
cf244dc01b rcu: Enable fourth level of TREE_RCU hierarchy
Enable a fourth level of rcu_node hierarchy for TREE_RCU and
TREE_PREEMPT_RCU.  This is for stress-testing and experiemental
purposes only, although in theory this would enable 16,777,216
CPUs on 64-bit systems, though only 1,048,576 CPUs on 32-bit
systems. Normal experimental use of this fourth level will
normally set CONFIG_RCU_FANOUT=2, requiring a 16-CPU system,
though the more adventurous (and more fortunate) experimenters
may wish to chose CONFIG_RCU_FANOUT=3 for 81-CPU systems or even
CONFIG_RCU_FANOUT=4 for 256-CPU systems.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12597846161257-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 11:34:53 +01:00
Paul E. McKenney
d3f6bad391 rcu: Rename "quiet" functions
The number of "quiet" functions has grown recently, and the
names are no longer very descriptive.  The point of all of these
functions is to do some portion of the task of reporting a
quiescent state, so rename them accordingly:

o	cpu_quiet() becomes rcu_report_qs_rdp(), which reports a
	quiescent state to the per-CPU rcu_data structure.  If this
	turns out to be a new quiescent state for this grace period,
	then rcu_report_qs_rnp() will be invoked to propagate the
	quiescent state up the rcu_node hierarchy.

o	cpu_quiet_msk() becomes rcu_report_qs_rnp(), which reports
	a quiescent state for a given CPU (or possibly a set of CPUs)
	up the rcu_node hierarchy.

o	cpu_quiet_msk_finish() becomes rcu_report_qs_rsp(), which
	reports a full set of quiescent states to the global rcu_state
	structure.

o	task_quiet() becomes rcu_report_unblock_qs_rnp(), which reports
	a quiescent state due to a task exiting an RCU read-side critical
	section that had previously blocked in that same critical section.
	As indicated by the new name, this type of quiescent state is
	reported up the rcu_node hierarchy (using rcu_report_qs_rnp()
	to do so).

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12597846163698-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 11:34:26 +01:00
Takashi Iwai
ac2c92e0cd ALSA: hda - Fix memory leaks in the previous patch
The previous hack for replacing the codec name give memory leaks at
error paths.  This patch fixes them.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2009-12-03 10:14:10 +01:00
Kailang Yang
274693f370 ALSA: hda - Add ALC661/259, ALC892/888VD support
Fixed List:
   1. Add alc_read_coef_idx function
   2. Add ALC661 ALC259
   3. Add ALC892 ALC888VD

Signed-off-by: Kailang Yang <kailang@realtek.com.tw>

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2009-12-03 10:07:50 +01:00
Uwe Kleine-König
b16533d333 MAINTAINERS: add tree and file pattern for ARM IMX
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
2009-12-03 09:33:03 +01:00
Martin K. Petersen
98262f2762 block: Allow devices to indicate whether discarded blocks are zeroed
The discard ioctl is used by mkfs utilities to clear a block device
prior to putting metadata down.  However, not all devices return zeroed
blocks after a discard.  Some drives return stale data, potentially
containing old superblocks.  It is therefore important to know whether
discarded blocks are properly zeroed.

Both ATA and SCSI drives have configuration bits that indicate whether
zeroes are returned after a discard operation.  Implement a block level
interface that allows this information to be bubbled up the stack and
queried via a new block device ioctl.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 09:24:48 +01:00
Alan Cox
d6250a03fa pata_ali: Fix regression with old devices
Making the new stuff work broke some of the old chipsets. We need to go
back to the old set up values for these it seems. Unfortunately even with
documentation this is basically a mix of cargoculting and guesswork.

Chased down to the exact line by Gianluca.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:36 -05:00
Alan Cox
be315d4615 [libata] PATA: Update experimental tags
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:36 -05:00
Alan Cox
d43744390e pata_cmd64x: implement serialization as per notes
Daniela Engert pointed out that there are some implementation notes for the
643 and 646 that deal with certain serialization rules. In theory we don't
need them because they apply when the motherboard decides not to retry PCI
requests for long enough and the chip is busy doing a DMA transfer on the
other channel.

The rule basically is "don't touch the taskfile of the other channel while
a DMA is in progress". To implement that we need to

- not issue a command on a channel when there is a DMA command queued
- not issue a DMA command on a channel when there are PIO commands queued
- use the alternative access to the interrupt source so that we do not
  touch altstatus or status on shared IRQ.

Updated to remote extra conditional check Bartlomiej noted and to remove
the variables for irq checks as the CMD648 doesn't have the underlying problem.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:36 -05:00
Alan Cox
f20941f334 pata_sis: Implement MWDMA for the UDMA 133 capable chips
Bartlomiej pointed out that while this got fixed in the old driver whoever
did it didn't port it across.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:36 -05:00
Alan Cox
10734fc8d5 pata_via: Blacklist some combinations of Transcend Flash and via
Reported by Mikulas Patocka.

VIA VT82C586B + Transcend TS64GSSD25-M v0826 does not work in UDMA mode

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:36 -05:00
Benjamin Herrenschmidt
294264a942 libata/sff: Use ops->bmdma_stop instead of ata_bmdma_stop()
In libata-sff, ata_sff_post_internal_cmd() directly calls ata_bmdma_stop()
instead of ap->ops->bmdma_stop(). This can be a problem for controllers
that use their own bmdma_stop for which the generic sff one isn't suitable

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:36 -05:00
Christoph Hellwig
18f0f97850 libata: add translation for SCSI WRITE SAME (aka TRIM support)
Add support for the ATA TRIM command in libata.  We translate a WRITE SAME 16
command with the unmap bit set into an ATA TRIM command and export enough
information in READ CAPACITY 16 and the block limits EVPD page so that the new
SCSI layer discard support will driver this for us.

Note that I hardcode the WRITE_SAME_16 opcode for now as the patch to introduce
the symbolic is not in 2.6.32 yet but only in the SCSI tree - as soon as it is
merged we can fix it up to properly use the symbolic name.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:35 -05:00
Tejun Heo
6013efd886 libata: retry failed FLUSH if device didn't fail it
If ATA device failed FLUSH, it means that the device failed to write
out some amount of data and the error needs to be reported to upper
layers. As retries can't recover the lost data, FLUSH failures need to
be reported immediately in general.

However, if FLUSH fails due to transmission errors, the FLUSH needs to
be retried; otherwise, filesystems may switch to RO mode and/or raid
array may drop a drive for a random transmission glitch.

This condition can be rather easily reproduced on certain ahci
controllers which go through a PHY event after powersave mode switch +
ext4 combination.  Powersave mode switch is often closely followed by
flush from the filesystem failing the FLUSH with ATA bus error which
makes the filesystem code believe that data is lost and drop to RO
mode.  This was reported in the following bugzilla bug.

  http://bugzilla.kernel.org/show_bug.cgi?id=14543

This patch makes libata EH retry FLUSH if it wasn't failed by the
device.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andrey Vihrov <andrey.vihrov@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:35 -05:00
ashish kalra
fd6c29e3de sata_fsl: Add asynchronous notification support
Enable device hot-plug support on Port multiplier fan-out ports

Signed-off-by: Ashish Kalra <Ashish.Kalra@freescale.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:35 -05:00
Bartlomiej Zolnierkiewicz
10a9c96922 pata_hpt{37x,3x2n}: add debounce delay to cable detection methods
Alan Cox reported that cable detection sometimes works unreliably
for HPT3xxN and that the issue is fixed by adding debounce delay
as used by the vendor driver.

Sergei Shtylyov also noticed that debounce delay is needed for all
HPT37x and HPT3xxN chipsets according to vendor drivers.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:35 -05:00
Bartlomiej Zolnierkiewicz
f3b1cf40d4 pata_hpt3x2n: fix cable detection
The detection was reversed between primary and secondary ports.

Fix it to match hpt366 and the vendor driver.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:35 -05:00
Matthew Garrett
6a74463798 ata: Don't require newlines for link_power_management_policy
sysfs attributes shouldn't require newlines. Make it possible to set the
link power management policy without a trailing newline.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:35 -05:00
Otavio Salvador
4192be6402 pata-it821x: use PCI_DEVICE_ID_RDC_D1010 define
Signed-off-by: Otavio Salvador <otavio@ossystems.com.br>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:35 -05:00
Bartlomiej Zolnierkiewicz
ab81a505ae pata_hpt37x: unify ->pre_reset methods
We can use the same ->pre_reset method for all HPT37x chipsets now.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:34 -05:00
Bartlomiej Zolnierkiewicz
9e87be9edd pata_hpt37x: add proper cable detection methods
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:34 -05:00
Shaohua Li
1b677afda4 ahci: disable SNotification capability for ich8
I obseved there is a sata_async_notification() for every ahci
interrupt. But the async notification does nothing (this is hard
disk drive and no pmp). This cause cpu wastes some time on sntf
register access.

It appears ICH AHCI doesn't support SNotification register, but the
controller reports it does. After quirking it, the async notification
disappears.

PS. it appears all ICH don't support SNotification register from ICH
manual, don't know if we need quirk all ICH. I don't have machines
with all kinds of ICH.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:34 -05:00
Vivek Mahajan
dae77214fa sata_sil24: MSI support, disabled by default
The following patch adds MSI support. Some platforms
may have broken MSI, so those are defaulted to use
legacy PCI interrupts.

Signed-off-by: Vivek Mahajan <vivek.mahajan@freescale.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:34 -05:00
Robert Hancock
097dac9183 libata: remove experimental tag on PATA drivers
Remove the experimental tag on Parallel ATA drivers. Though some of the
individual PATA drivers are still marked as experimental, as a group they can
hardly be considered to be, given they've been used in various distros for some
time.

Signed-off-by: Robert Hancock <hancockrwd@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:34 -05:00
Thiago Farina
4c4a90fd2b sata_mv: Clean up hard coded array size calculation.
Use ARRAY_SIZE macro of kernel api instead.

Signed-off-by: Thiago Farina <tfransosi@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:34 -05:00
Jiri Slaby
7095e3eb49 pata_via: fix double put on isa bridge
In via_init_one, when via_isa_bridges iterator reaches
PCI_DEVICE_ID_VIA_ANON and last but one via_isa_bridges bridge is
found but rev doesn't match, pci_dev_put(isa) is called twice.

Do pci_dev_put only once.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:34 -05:00
Krzysztof Halasa
ba3a221ce2 pata_cs5536: use 32-bit BM DMA template instead of 16-bit.
Tested on IXP425 + CS5536.

Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:34 -05:00
Tejun Heo
f2406770a2 libata-acpi: missing _SDD is not an error
Missing _SDD is not an error.  Don't treat it as one.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:33 -05:00
Avi Kivity
d5696725b2 KVM: VMX: Fix comparison of guest efer with stale host value
update_transition_efer() masks out some efer bits when deciding whether
to switch the msr during guest entry; for example, NX is emulated using the
mmu so we don't need to disable it, and LMA/LME are handled by the hardware.

However, with shared msrs, the comparison is made against a stale value;
at the time of the guest switch we may be running with another guest's efer.

Fix by deferring the mask/compare to the actual point of guest entry.

Noted by Marcelo.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:34:20 +02:00
Carsten Otte
f50146bd7b KVM: s390: Fix prefix register checking in arch/s390/kvm/sigp.c
This patch corrects the checking of the new address for the prefix register.
On s390, the prefix register is used to address the cpu's lowcore (address
0...8k). This check is supposed to verify that the memory is readable and
present.
copy_from_guest is a helper function, that can be used to read from guest
memory. It applies prefixing, adds the start address of the guest memory in
user, and then calls copy_from_user. Previous code was obviously broken for
two reasons:
- prefixing should not be applied here. The current prefix register is
  going to be updated soon, and the address we're looking for will be
  0..8k after we've updated the register
- we're adding the guest origin (gmsor) twice: once in subject code
  and once in copy_from_guest

With kuli, we did not hit this problem because (a) we were lucky with
previous prefix register content, and (b) our guest memory was mmaped
very low into user address space.

Cc: stable@kernel.org
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reported-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:26 +02:00
Avi Kivity
3548bab501 KVM: Drop user return notifier when disabling virtualization on a cpu
This way, we don't leave a dangling notifier on cpu hotunplug or module
unload.  In particular, module unload leaves the notifier pointing into
freed memory.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:26 +02:00
Sheng Yang
046d87103a KVM: VMX: Disable unrestricted guest when EPT disabled
Otherwise would cause VMEntry failure when using ept=0 on unrestricted guest
supported processors.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:25 +02:00
Avi Kivity
eb3c79e64a KVM: x86 emulator: limit instructions to 15 bytes
While we are never normally passed an instruction that exceeds 15 bytes,
smp games can cause us to attempt to interpret one, which will cause
large latencies in non-preempt hosts.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:25 +02:00
Carsten Otte
d7b0b5eb30 KVM: s390: Make psw available on all exits, not just a subset
This patch moves s390 processor status word into the base kvm_run
struct and keeps it up-to date on all userspace exits.

The userspace ABI is broken by this, however there are no applications
in the wild using this.  A capability check is provided so users can
verify the updated API exists.

Cc: stable@kernel.org
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:25 +02:00
Jan Kiszka
3cfc3092f4 KVM: x86: Add KVM_GET/SET_VCPU_EVENTS
This new IOCTL exports all yet user-invisible states related to
exceptions, interrupts, and NMIs. Together with appropriate user space
changes, this fixes sporadic problems of vmsave/restore, live migration
and system reset.

[avi: future-proof abi by adding a flags field]

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:25 +02:00
Avi Kivity
65ac726404 KVM: VMX: Report unexpected simultaneous exceptions as internal errors
These happen when we trap an exception when another exception is being
delivered; we only expect these with MCEs and page faults.  If something
unexpected happens, things probably went south and we're better off reporting
an internal error and freezing.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:24 +02:00
Avi Kivity
a9c7399d6c KVM: Allow internal errors reported to userspace to carry extra data
Usually userspace will freeze the guest so we can inspect it, but some
internal state is not available.  Add extra data to internal error
reporting so we can expose it to the debugger.  Extra data is specific
to the suberror.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:24 +02:00
Jan Kiszka
c54d2aba27 KVM: Reorder IOCTLs in main kvm.h
Obviously, people tend to extend this header at the bottom - more or
less blindly. Ensure that deprecated stuff gets its own corner again by
moving things to the top. Also add some comments and reindent IOCTLs to
make them more readable and reduce the risk of number collisions.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:24 +02:00
Jan Kiszka
4f926bf291 KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG
Decouple KVM_GUESTDBG_INJECT_DB and KVM_GUESTDBG_INJECT_BP from
KVM_GUESTDBG_ENABLE, their are actually orthogonal. At this chance,
avoid triggering the WARN_ON in kvm_queue_exception if there is already
an exception pending and reject such invalid requests.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:24 +02:00
Marcelo Tosatti
e50212bb51 KVM: only clear irq_source_id if irqchip is present
Otherwise kvm might attempt to dereference a NULL pointer.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:23 +02:00
Marcelo Tosatti
2204ae3c96 KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic
Otherwise kvm might attempt to dereference a NULL pointer.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:23 +02:00
Marcelo Tosatti
3ddea128ad KVM: x86: disallow multiple KVM_CREATE_IRQCHIP
Otherwise kvm will leak memory on multiple KVM_CREATE_IRQCHIP.
Also serialize multiple accesses with kvm->lock.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:23 +02:00