android_kernel_xiaomi_sm8350

Author	SHA1	Message	Date
Len Brown	96916090f4	Merge branches 'release', 'acpica', 'bugzilla-10224', 'bugzilla-9772', 'bugzilla-9916', 'ec', 'eeepc', 'idle', 'misc', 'pm-legacy', 'sysfs-links-2.6.26', 'thermal', 'thinkpad' and 'video' into release	2008-04-30 13:58:00 -04:00
Roland McGrath	5a8da0ea82	signals: x86 TS_RESTORE_SIGMASK Replace TIF_RESTORE_SIGMASK with TS_RESTORE_SIGMASK and define our own set_restore_sigmask() function. This saves the costly SMP-safe set_bit operation, which we do not need for the sigmask flag since TIF_SIGPENDING always has to be set too. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-30 08:29:37 -07:00
Yinghai Lu	70b9f7dc14	x86/pci: remove flag in pci_cfg_space_size_ext so let pci_cfg_space_size call it directly without flag. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-04-29 15:34:05 -07:00
Sam Ravnborg	98db6f193c	x86: fix section mismatch in pci_scan_bus Fix following section mismatch warning: WARNING: vmlinux.o(.text+0x275616): Section mismatch in reference from the function pci_scan_bus() to the function .devinit.text:pci_scan_bus_parented() The warning was seen with a CONFIG_DEBUG_SECTION_MISMATCH=y build. The inline function pci_scan_bus refer to functions annotated __devinit - so annotate it __devinit too. This revealed a few x86 specific functions that were only used from __init or __devinit context. So annotate these __devinit and the warning was killed. The added include in pci.h was not strictly required but added to avoid being dependent on indirect includes. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Jesse Barnes <jbarnes@hobbes.lan>	2008-04-29 13:41:59 -07:00
Linus Torvalds	1f43c53930	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes: x86: fix PCI MSI breaks when booting with nosmp x86: vget_cycles() __always_inline x86: add more boot protocol documentation bootprotocol: cleanup x86: fix warning in "x86: clean up vSMP detection" x86: !x & y typo in mtrr code	2008-04-29 09:03:19 -07:00
Linus Torvalds	5f78e4d339	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-pci * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-pci: x86: add pci=check_enable_amd_mmconf and dmi check x86: work around io allocation overlap of HT links acpi: get boot_cpu_id as early for k8_scan_nodes x86_64: don't need set default res if only have one root bus x86: double check the multi root bus with fam10h mmconf x86: multi pci root bus with different io resource range, on 64-bit x86: use bus conf in NB conf fun1 to get bus range on, on 64-bit x86: get mp_bus_to_node early x86 pci: remove checking type for mmconfig probe x86: remove unneeded check in mmconf reject driver core: try parent numa_node at first before using default x86: seperate mmconf for fam10h out from setup_64.c x86: if acpi=off, force setting the mmconf for fam10h x86_64: check MSR to get MMCONFIG for AMD Family 10h x86_64: check and enable MMCONFIG for AMD Family 10h x86_64: set cfg_size for AMD Family 10h in case MMCONFIG x86: mmconf enable mcfg early x86: clear pci_mmcfg_virt when mmcfg get rejected x86: validate against acpi motherboard resources Fixed up fairly trivial conflicts in arch/x86/pci/{init.c,pci.h} due to OLPC support manually.	2008-04-29 08:26:51 -07:00
Linus Torvalds	44473d9913	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] state info wrong after resume [CPUFREQ] allow use of the powersave governor as the default one [CPUFREQ] document the currently undocumented parts of the sysfs interface [CPUFREQ] expose cpufreq coordination requirements regardless of coordination mechanism	2008-04-29 08:18:49 -07:00
Christoph Lameter	66916cd267	x86: use kbuild.h Drop the macro definitions in asm-offsets_*.c and use kbuild.h Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Sam Ravnborg <sam@ravnborg.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:29 -07:00
Tim Gardner	8c4dd60682	edd: add default mode CONFIG_EDD_OFF=n, override with edd={on,off} Add a kernel parameter option to 'edd' to enable/disable BIOS Enhanced Disk Drive Services. CONFIG_EDD_OFF disables EDD while still compiling EDD into the kernel. Default behavior can be forced using 'edd=on' or 'edd=off' as a kernel parameter. [akpm@linux-foundation.org: fix kernel-parameters.txt] Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Matt Domsch <Matt_Domsch@dell.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:23 -07:00
Alexey Dobriyan	c74c120a21	proc: remove proc_root from drivers Remove proc_root export. Creation and removal works well if parent PDE is supplied as NULL -- it worked always that way. So, one useless export removed and consistency added, some drivers created PDEs with &proc_root as parent but removed them as NULL and so on. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:18 -07:00
Andres Salomon	3ef0e1f8ca	x86: olpc: add One Laptop Per Child architecture support This adds support for OLPC XO hardware. Open Firmware on XOs don't contain the VSA, so it is necessary to emulate the PCI BARs in the kernel. This also adds functionality for running EC commands, and a CONFIG_OLPC. A number of OLPC drivers depend upon CONFIG_OLPC. olpc_ec_timeout is a hack to work around Embedded Controller bugs. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: geode_has_vsa build fix] [akpm@linux-foundation.org: olpc_register_battery_callback doesn't exist] Signed-off-by: Andres Salomon <dilinger@debian.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@suse.de> Cc: Jordan Crouse <jordan.crouse@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:07 -07:00
FUJITA Tomonori	a852250920	swiotlb: use iommu_is_span_boundary helper function iommu_is_span_boundary in lib/iommu-helper.c was exported for PARISC IOMMUs (commit `3715863aa1`). SWIOTLB can use it instead of the homegrown function. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:05 -07:00
Adrian Bunk	7d195a5409	proper extern for late_time_init Add a proper extern for late_time_init in include/linux/init.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:03 -07:00
Adrian Bunk	eb0f1c442d	proper __do_softirq() prototype Add a proper prototype for __do_softirq() in include/linux/interrupt.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:02 -07:00
Jesse Barnes	e90955c26d	x86: fix PCI MSI breaks when booting with nosmp set up sane APIC state even in the nosmp case. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-29 13:45:24 +02:00
Ingo Molnar	781fe2ebc0	bootprotocol: cleanup Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-29 13:45:24 +02:00
Alexander van Heukelum	8008abbd87	x86: fix warning in "x86: clean up vSMP detection" The function detect_vsmp_box is a void function in the PCI case. Change the !PCI stub to void too. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-29 13:45:24 +02:00
Harvey Harrison	e686d34156	x86: !x & y typo in mtrr code As written, this can never be true. Spotted by the Sparse checker. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-29 13:45:24 +02:00
Roland McGrath	d9dedc1385	x86_64 vDSO: use initdata The 64-bit vDSO image is in a special ".vdso" section for no reason I can determine. Furthermore, the location of the vdso_end symbol includes some wrongly-calculated padding space in the image, which is then (correctly) rounded to page size, resulting in an extra page of zeros in the image mapped in to user processes. This changes it to put the vdso.so image into normal initdata as we have always done for the 32-bit vDSO images. The extra padding is gone, so the user VMA is one page instead of two. The image that was already copied around at boot time is now in initdata, so we recover that wasted space after boot. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 13:49:35 -07:00
Darrick J. Wong	e8628dd06d	[CPUFREQ] expose cpufreq coordination requirements regardless of coordination mechanism Currently, affected_cpus shows which CPUs need to have their frequency coordinated in software. When hardware coordination is in use, the contents of this file appear the same as when no coordination is required. This can lead to some confusion among user-space programs, for example, that do not know that extra coordination is required to force a CPU core to a particular speed to control power consumption. To fix this, create a "related_cpus" attribute that always displays the coordination map regardless of whatever coordination strategy the cpufreq driver uses (sw or hw). If the cpufreq driver does not provide a value, fall back to policy->cpus. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Jones <davej@redhat.com>	2008-04-28 16:27:08 -04:00
Venkatesh Pallipadi	e56a727b02	[CPUFREQ] Make acpi-cpufreq more robust against BIOS freq changes behind our back. We checked the hardware freq with OS cached freq value in get_cur_freqon_cpu(). Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Dave Jones <davej@redhat.com>	2008-04-28 15:16:46 -04:00
PJ Waskiewicz	9d9ad4b51d	x86: Fix 32-bit MSI-X allocation leakage This bug was introduced in the 2.6.24 i386/x86_64 tree merge, where MSI-X vector allocation will eventually fail. The cause is the new bit array tracking used vectors is not getting cleared properly on IRQ destruction on the 32-bit APIC code. This can be seen easily using the ixgbe 10 GbE driver on multi-core systems by simply loading and unloading the driver a few times. Depending on the number of available vectors on the host system, the MSI-X allocation will eventually fail, and the driver will only be able to use legacy interrupts. I am generating the same patch for both stable trees for 2.6.24 and 2.6.25. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 10:49:17 -07:00
Andres Salomon	32bf87e369	x86: geode: MSR cleanup This cleans up a few MSR-using drivers in the following manner: - Ensures MSRs are all defined in asm/geode.h, rather than in misc places - Makes the naming consistent; cs553[56] ones begin with MSR_, GX-specific ones start with MSR_GX_, and LX-specific ones start with MSR_LX_. Also, make the names match the data sheet. - Use MSR names rather than numbers in source code - Document the fact that the LX's MSR_PADSEL has the wrong value in the data sheet. That's, uh, good to note. Signed-off-by: Andres Salomon <dilinger@debian.org> Acked-by: Jordan Crouse <jordan.crouse@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:35 -07:00
Thomas Petazzoni	7ae9392c0a	x86: configurable DMI scanning code Turn CONFIG_DMI into a selectable option if EMBEDDED is defined, in order to be able to remove the DMI table scanning code if it's not needed, and then reduce the kernel code size. With CONFIG_DMI (i.e before) : text data bss dec hex filename 1076076 128656 98304 1303036 13e1fc vmlinux Without CONFIG_DMI (i.e after) : text data bss dec hex filename 1068092 126308 98304 1292704 13b9a0 vmlinux Result: text data bss dec hex filename -7984 -2348 0 -10332 -285c vmlinux The new option appears in "Processor type and features", only when CONFIG_EMBEDDED is defined. This patch is part of the Linux Tiny project, and is based on previous work done by Matt Mackall <mpm@selenic.com>. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Anvin" <hpa@zytor.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:30 -07:00
Christoph Lameter	d60cd46bbd	pageflags: use proper page flag functions in Xen Xen uses bitops to manipulate page flags. Make it use proper page flag functions. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:22 -07:00
Christoph Lameter	2301696932	vmallocinfo: add caller information Add caller information so that /proc/vmallocinfo shows where the allocation request for a slice of vmalloc memory originated. Results in output like this: 0xffffc20000000000-0xffffc20000801000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages 0xffffc20000801000-0xffffc20000806000 20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc 0xffffc20000806000-0xffffc20000c07000 4198400 alloc_large_system_hash+0x127/0x246 pages=1024 vmalloc vpages 0xffffc20000c07000-0xffffc20000c0a000 12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc 0xffffc20000c0a000-0xffffc20000c0c000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap 0xffffc20000c0c000-0xffffc20000c0f000 12288 acpi_os_map_memory+0x13/0x1c phys=cff64000 ioremap 0xffffc20000c10000-0xffffc20000c15000 20480 acpi_os_map_memory+0x13/0x1c phys=cff65000 ioremap 0xffffc20000c16000-0xffffc20000c18000 8192 acpi_os_map_memory+0x13/0x1c phys=cff69000 ioremap 0xffffc20000c18000-0xffffc20000c1a000 8192 acpi_os_map_memory+0x13/0x1c phys=fed1f000 ioremap 0xffffc20000c1a000-0xffffc20000c1c000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap 0xffffc20000c1c000-0xffffc20000c1e000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap 0xffffc20000c1e000-0xffffc20000c20000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap 0xffffc20000c20000-0xffffc20000c22000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap 0xffffc20000c22000-0xffffc20000c24000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap 0xffffc20000c24000-0xffffc20000c26000 8192 acpi_os_map_memory+0x13/0x1c phys=e0081000 ioremap 0xffffc20000c26000-0xffffc20000c28000 8192 acpi_os_map_memory+0x13/0x1c phys=e0080000 ioremap 0xffffc20000c28000-0xffffc20000c2d000 20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc 0xffffc20000c2d000-0xffffc20000c31000 16384 tcp_init+0xd5/0x31c pages=3 vmalloc 0xffffc20000c31000-0xffffc20000c34000 12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc 0xffffc20000c34000-0xffffc20000c36000 8192 init_vdso_vars+0xde/0x1f1 0xffffc20000c36000-0xffffc20000c38000 8192 pci_iomap+0x8a/0xb4 phys=d8e00000 ioremap 0xffffc20000c38000-0xffffc20000c3a000 8192 usb_hcd_pci_probe+0x139/0x295 [usbcore] phys=d8e00000 ioremap 0xffffc20000c3a000-0xffffc20000c3e000 16384 sys_swapon+0x509/0xa15 pages=3 vmalloc 0xffffc20000c40000-0xffffc20000c61000 135168 e1000_probe+0x1c4/0xa32 phys=d8a20000 ioremap 0xffffc20000c61000-0xffffc20000c6a000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc20000c6a000-0xffffc20000c73000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc20000c73000-0xffffc20000c7c000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc20000c7c000-0xffffc20000c7f000 12288 e1000e_setup_tx_resources+0x29/0xbe pages=2 vmalloc 0xffffc20000c80000-0xffffc20001481000 8392704 pci_mmcfg_arch_init+0x90/0x118 phys=e0000000 ioremap 0xffffc20001481000-0xffffc20001682000 2101248 alloc_large_system_hash+0x127/0x246 pages=512 vmalloc 0xffffc20001682000-0xffffc20001e83000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages 0xffffc20001e83000-0xffffc20002204000 3674112 alloc_large_system_hash+0x127/0x246 pages=896 vmalloc vpages 0xffffc20002204000-0xffffc2000220d000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc2000220d000-0xffffc20002216000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc20002216000-0xffffc2000221f000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc2000221f000-0xffffc20002228000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc20002228000-0xffffc20002231000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc20002231000-0xffffc20002234000 12288 e1000e_setup_rx_resources+0x35/0x122 pages=2 vmalloc 0xffffc20002240000-0xffffc20002261000 135168 e1000_probe+0x1c4/0xa32 phys=d8a60000 ioremap 0xffffc20002261000-0xffffc2000270c000 4894720 sys_swapon+0x509/0xa15 pages=1194 vmalloc vpages 0xffffffffa0000000-0xffffffffa0022000 139264 module_alloc+0x4f/0x55 pages=33 vmalloc 0xffffffffa0022000-0xffffffffa0029000 28672 module_alloc+0x4f/0x55 pages=6 vmalloc 0xffffffffa002b000-0xffffffffa0034000 36864 module_alloc+0x4f/0x55 pages=8 vmalloc 0xffffffffa0034000-0xffffffffa003d000 36864 module_alloc+0x4f/0x55 pages=8 vmalloc 0xffffffffa003d000-0xffffffffa0049000 49152 module_alloc+0x4f/0x55 pages=11 vmalloc 0xffffffffa0049000-0xffffffffa0050000 28672 module_alloc+0x4f/0x55 pages=6 vmalloc [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:21 -07:00
Pekka Enberg	1b27d05b6e	mm: move cache_line_size() to <linux/cache.h> Not all architectures define cache_line_size() so as suggested by Andrew move the private implementations in mm/slab.c and mm/slob.c to <linux/cache.h>. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:19 -07:00
Jeremy Fitzhardinge	180c06efce	hotplug-memory: make online_page() common All architectures use an effectively identical definition of online_page(), so just make it common code. x86-64, ia64, powerpc and sh are actually identical; x86-32 is slightly different. x86-32's differences arise because it puts its hotplug pages in the highmem zone. We can handle this in the generic code by inspecting the page to see if its in highmem, and update the totalhigh_pages count appropriately. This leaves init_32.c:free_new_highpage with a single caller, so I folded it into add_one_highpage_init. I also removed an incorrect comment referring to the NUMA case; any NUMA details have already been dealt with by the time online_page() is called. [akpm@linux-foundation.org: fix indenting] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Dave Hansen <dave@linux.vnet.ibm.com> Reviewed-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com> Tested-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Christoph Lameter <clameter@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:17 -07:00
Ingo Molnar	f022bfd582	x86: PAT fix Adrian Bunk noticed the following Coverity report: > Commit `e7f260a276` > (x86: PAT use reserve free memtype in mmap of /dev/mem) > added the following gem to arch/x86/mm/pat.c: > > <-- snip --> > > ... > int phys_mem_access_prot_allowed(struct file file, unsigned long pfn, > unsigned long size, pgprot_t vma_prot) > { > u64 offset = ((u64) pfn) << PAGE_SHIFT; > unsigned long flags = _PAGE_CACHE_UC_MINUS; > unsigned long ret_flags; > ... > ... (nothing that touches ret_flags) > ... > if (flags != _PAGE_CACHE_UC_MINUS) { > retval = reserve_memtype(offset, offset + size, flags, NULL); > } else { > retval = reserve_memtype(offset, offset + size, -1, &ret_flags); > } > > if (retval < 0) > return 0; > > flags = ret_flags; > > if (pfn <= max_pfn_mapped && > ioremap_change_attr((unsigned long)__va(offset), size, flags) < 0) { > free_memtype(offset, offset + size); > printk(KERN_INFO > "%s:%d /dev/mem ioremap_change_attr failed %s for %Lx-%Lx\n", > current->comm, current->pid, > cattr_name(flags), > offset, offset + size); > return 0; > } > > vma_prot = __pgprot((pgprot_val(vma_prot) & ~_PAGE_CACHE_MASK) \| > flags); > return 1; > } > > <-- snip --> > > If (flags != _PAGE_CACHE_UC_MINUS) we pass garbage from the stack to > ioremap_change_attr() and/or __pgprot(). > > Spotted by the Coverity checker. the fix simplifies the code as we get rid of the 'ret_flags' complication. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:15:16 -07:00
Linus Torvalds	86cf02f8ea	x86 PAT: tone down debugging messages some more Ingo already fixed one of these at my request (in "x86 PAT: tone down debugging messages", commit `1ebcc654f0`), but there was another one he missed. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-27 11:59:30 -07:00
Linus Torvalds	42cadc8600	Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (147 commits) KVM: kill file->f_count abuse in kvm KVM: MMU: kvm_pv_mmu_op should not take mmap_sem KVM: SVM: remove selective CR0 comment KVM: SVM: remove now obsolete FIXME comment KVM: SVM: disable CR8 intercept when tpr is not masking interrupts KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabled KVM: export kvm_lapic_set_tpr() to modules KVM: SVM: sync TPR value to V_TPR field in the VMCB KVM: ppc: PowerPC 440 KVM implementation KVM: Add MAINTAINERS entry for PowerPC KVM KVM: ppc: Add DCR access information to struct kvm_run ppc: Export tlb_44x_hwater for KVM KVM: Rename debugfs_dir to kvm_debugfs_dir KVM: x86 emulator: fix lea to really get the effective address KVM: x86 emulator: fix smsw and lmsw with a memory operand KVM: x86 emulator: initialize src.val and dst.val for register operands KVM: SVM: force a new asid when initializing the vmcb KVM: fix kvm_vcpu_kick vs __vcpu_run race KVM: add ioctls to save/store mpstate KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_* ...	2008-04-27 10:13:52 -07:00
Marcelo Tosatti	960b399169	KVM: MMU: kvm_pv_mmu_op should not take mmap_sem kvm_pv_mmu_op should not take mmap_sem. All gfn_to_page() callers down in the MMU processing will take it if necessary, so as it is it can deadlock. Apparently a leftover from the days before slots_lock. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:45 +03:00
Joerg Roedel	1336028b9a	KVM: SVM: remove selective CR0 comment There is not selective cr0 intercept bug. The code in the comment sets the CR0.PG bit. But KVM sets the CR4.PG bit for SVM always to implement the paged real mode. So the 'mov %eax,%cr0' instruction does not change the CR0.PG bit. Selective CR0 intercepts only occur when a bit is actually changed. So its the right behavior that there is no intercept on this instruction. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:44 +03:00
Joerg Roedel	aaf697e4e0	KVM: SVM: remove now obsolete FIXME comment With the usage of the V_TPR field this comment is now obsolete. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:43 +03:00
Joerg Roedel	aaacfc9ae2	KVM: SVM: disable CR8 intercept when tpr is not masking interrupts This patch disables the intercept of CR8 writes if the TPR is not masking interrupts. This reduces the total number CR8 intercepts to below 1 percent of what we have without this patch using Windows 64 bit guests. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:43 +03:00
Joerg Roedel	d7bf8221a3	KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabled If the CR8 write intercept is disabled the V_TPR field of the VMCB needs to be synced with the TPR field in the local apic. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:42 +03:00
Joerg Roedel	ec7cf6903f	KVM: export kvm_lapic_set_tpr() to modules This patch exports the kvm_lapic_set_tpr() function from the lapic code to modules. It is required in the kvm-amd module to optimize CR8 intercepts. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:41 +03:00
Joerg Roedel	649d68643e	KVM: SVM: sync TPR value to V_TPR field in the VMCB This patch adds syncing of the lapic.tpr field to the V_TPR field of the VMCB. With this change we can safely remove the CR8 read intercept. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:40 +03:00
Avi Kivity	f9b7aab35c	KVM: x86 emulator: fix lea to really get the effective address We never hit this, since there is currently no reason to emulate lea. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:35 +03:00
Avi Kivity	16286d082d	KVM: x86 emulator: fix smsw and lmsw with a memory operand lmsw and smsw were implemented only with a register operand. Extend them to support a memory operand as well. Fixes Windows running some display compatibility test on AMD hosts. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:34 +03:00
Avi Kivity	66b8550573	KVM: x86 emulator: initialize src.val and dst.val for register operands This lets us treat the case where mod == 3 in the same manner as other cases. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:33 +03:00
Avi Kivity	a79d2f1805	KVM: SVM: force a new asid when initializing the vmcb Shutdown interception clears the vmcb, leaving the asid at zero (which is illegal. so force a new asid on vmcb initialization. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:32 +03:00
Marcelo Tosatti	e9571ed54b	KVM: fix kvm_vcpu_kick vs __vcpu_run race There is a window open between testing of pending IRQ's and assignment of guest_mode in __vcpu_run. Injection of IRQ's can race with __vcpu_run as follows: CPU0 CPU1 kvm_x86_ops->run() vcpu->guest_mode = 0 SET_IRQ_LINE ioctl .. kvm_x86_ops->inject_pending_irq kvm_cpu_has_interrupt() apic_test_and_set_irr() kvm_vcpu_kick if (vcpu->guest_mode) send_ipi() vcpu->guest_mode = 1 So move guest_mode=1 assignment before ->inject_pending_irq, and make sure that it won't reorder after it. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:32 +03:00
Marcelo Tosatti	62d9f0dbc9	KVM: add ioctls to save/store mpstate So userspace can save/restore the mpstate during migration. [avi: export the #define constants describing the value] [christian: add s390 stubs] [avi: ditto for ia64] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:16 +03:00
Avi Kivity	a45352908b	KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_* We wish to export it to userspace, so move it into the kvm namespace. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:04:13 +03:00
Marcelo Tosatti	3d80840d96	KVM: hlt emulation should take in-kernel APIC/PIT timers into account Timers that fire between guest hlt and vcpu_block's add_wait_queue() are ignored, possibly resulting in hangs. Also make sure that atomic_inc and waitqueue_active tests happen in the specified order, otherwise the following race is open: CPU0 CPU1 if (waitqueue_active(wq)) add_wait_queue() if (!atomic_read(pit_timer->pending)) schedule() atomic_inc(pit_timer->pending) Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:04:11 +03:00
Joerg Roedel	3564990af1	KVM: SVM: do not intercept task switch with NPT When KVM uses NPT there is no reason to intercept task switches. This patch removes the intercept for it in that case. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:01:23 +03:00
Feng(Eric) Liu	d4c9ff2d1b	KVM: Add kvm trace userspace interface This interface allows user a space application to read the trace of kvm related events through relayfs. Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:01:22 +03:00
Feng (Eric) Liu	2714d1d3d6	KVM: Add trace markers Trace markers allow userspace to trace execution of a virtual machine in order to monitor its performance. Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:01:19 +03:00
Joerg Roedel	53371b5098	KVM: SVM: add intercept for machine check exception To properly forward a MCE occured while the guest is running to the host, we have to intercept this exception and call the host handler by hand. This is implemented by this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:01:18 +03:00
Joerg Roedel	6394b6494c	KVM: SVM: align shadow CR4.MCE with host This patch aligns the host version of the CR4.MCE bit with the CR4 active in the guest. This is necessary to get MCE exceptions when the guest is running. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:01:18 +03:00
Joerg Roedel	ec077263b2	KVM: SVM: indent svm_set_cr4 with tabs instead of spaces The svm_set_cr4 function is indented with spaces. This patch replaces them with tabs. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:01:17 +03:00
Anthony Liguori	35149e2129	KVM: MMU: Don't assume struct page for x86 This patch introduces a gfn_to_pfn() function and corresponding functions like kvm_release_pfn_dirty(). Using these new functions, we can modify the x86 MMU to no longer assume that it can always get a struct page for any given gfn. We don't want to eliminate gfn_to_page() entirely because a number of places assume they can do gfn_to_page() and then kmap() the results. When we support IO memory, gfn_to_page() will fail for IO pages although gfn_to_pfn() will succeed. This does not implement support for avoiding reference counting for reserved RAM or for IO memory. However, it should make those things pretty straight forward. Since we're only introducing new common symbols, I don't think it will break the non-x86 architectures but I haven't tested those. I've tested Intel, AMD, NPT, and hugetlbfs with Windows and Linux guests. [avi: fix overflow when shifting left pfns by adding casts] Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:01:15 +03:00
Marcelo Tosatti	bed1d1dfc4	KVM: MMU: prepopulate guest pages after write-protecting Zdenek reported a bug where a looping "dmsetup status" eventually hangs on SMP guests. The problem is that kvm_mmu_get_page() prepopulates the shadow MMU before write protecting the guest page tables. By doing so, it leaves a window open where the guest can mark a pte as present while the host has shadow cached such pte as "notrap". Accesses to such address will fault in the guest without the host having a chance to fix the situation. Fix by moving the write protection before the pte prefetch. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:58 +03:00
Avi Kivity	fcd6dbac92	KVM: MMU: Only mark_page_accessed() if the page was accessed by the guest If the accessed bit is not set, the guest has never accessed this page (at least through this spte), so there's no need to mark the page accessed. This provides more accurate data for the eviction algortithm. Noted by Andrea Arcangeli. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:57 +03:00
Avi Kivity	3d45830c2b	KVM: Free apic access page on vm destruction Noticed by Marcelo Tosatti. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:54 +03:00
Izik Eidus	3ee16c8145	KVM: MMU: allow the vm to shrink the kvm mmu shadow caches Allow the Linux memory manager to reclaim memory in the kvm shadow cache. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:53 +03:00
Marcelo Tosatti	3200f405a1	KVM: MMU: unify slots_lock usage Unify slots_lock acquision around vcpu_run(). This is simpler and less error-prone. Also fix some callsites that were not grabbing the lock properly. [avi: drop slots_lock while in guest mode to avoid holding the lock for indefinite periods] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:52 +03:00
Sheng Yang	25c5f225be	KVM: VMX: Enable MSR Bitmap feature MSR Bitmap controls whether the accessing of an MSR causes VM Exit. Eliminating exits on automatically saved and restored MSRs yields a small performance gain. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:52 +03:00
Izik Eidus	37817f2982	KVM: x86: hardware task switching support This emulates the x86 hardware task switch mechanism in software, as it is unsupported by either vmx or svm. It allows operating systems which use it, like freedos, to run as kvm guests. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:39 +03:00
Izik Eidus	2e4d265349	KVM: x86: add functions to get the cpl of vcpu Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:38 +03:00
Avi Kivity	4c9fc8ef50	KVM: VMX: Add module option to disable flexpriority Useful for debugging. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:37 +03:00
Avi Kivity	268fe02ae0	KVM: no longer EXPERIMENTAL Long overdue. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:36 +03:00
Avi Kivity	0b49ea8659	KVM: MMU: Introduce and use spte_to_page() Encapsulate the pte mask'n'shift in a function. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:35 +03:00
Izik Eidus	855149aaa9	KVM: MMU: fix dirty bit setting when removing write permissions When mmu_set_spte() checks if a page related to spte should be release as dirty or clean, it check if the shadow pte was writeble, but in case rmap_write_protect() is called called it is possible for shadow ptes that were writeble to become readonly and therefor mmu_set_spte will release the pages as clean. This patch fix this issue by marking the page as dirty inside rmap_write_protect(). Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:34 +03:00
Avi Kivity	947da53830	KVM: MMU: Set the accessed bit on non-speculative shadow ptes If we populate a shadow pte due to a fault (and not speculatively due to a pte write) then we can set the accessed bit on it, as we know it will be set immediately on the next guest instruction. This saves a read-modify-write operation. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:33 +03:00
Glauber Costa	1e977aa12d	x86: KVM guest: disable clock before rebooting. This patch writes 0 (actually, what really matters is that the LSB is cleared) to the system time msr before shutting down the machine for kexec. Without it, we can have a random memory location being written when the guest comes back It overrides the functions shutdown, used in the path of kernel_kexec() (sys.c) and crash_shutdown, used in the path of crash_kexec() (kexec.c) Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:31 +03:00
Glauber Costa	3c62c62502	x86: make native_machine_shutdown non-static it will allow external users to call it. It is mainly useful for routines that will override its machine_ops field for its own special purposes, but want to call the normal shutdown routine after they're done Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:30 +03:00
Glauber Costa	ed23dc6f5b	x86: allow machine_crash_shutdown to be replaced This patch a llows machine_crash_shutdown to be replaced, just like any of the other functions in machine_ops Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:29 +03:00
Marcelo Tosatti	096d14a3b5	x86: KVM guest: hypercall batching Batch pte updates and tlb flushes in lazy MMU mode. [avi: - adjust to mmu_op - helper for getting para_state without debug warnings] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:28 +03:00
Marcelo Tosatti	1da8a77bdc	x86: KVM guest: hypercall based pte updates and TLB flushes Hypercall based pte updates are faster than faults, and also allow use of the lazy MMU mode to batch operations. Don't report the feature if two dimensional paging is enabled. [avi: - guest/host split - fix 32-bit truncation issues - adjust to mmu_op - adjust to ->release_*() renamed - add ->release_pud()] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:28 +03:00
Marcelo Tosatti	2f333bcb4e	KVM: MMU: hypercall based pte updates and TLB flushes Hypercall based pte updates are faster than faults, and also allow use of the lazy MMU mode to batch operations. Don't report the feature if two dimensional paging is enabled. [avi: - one mmu_op hypercall instead of one per op - allow 64-bit gpa on hypercall - don't pass host errors (-ENOMEM) to guest] [akpm: warning fix on i386] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:27 +03:00
Avi Kivity	9f81128591	KVM: Provide unlocked version of emulator_write_phys() Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:26 +03:00
Marcelo Tosatti	0cf1bfd273	x86: KVM guest: add basic paravirt support Add basic KVM paravirt support. Avoid vm-exits on IO delays. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:25 +03:00
Marcelo Tosatti	a28e4f5a62	KVM: add basic paravirt support Add basic KVM paravirt support. Avoid vm-exits on IO delays. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:24 +03:00
Sheng Yang	308b0f239e	KVM: Add reset support for in kernel PIT Separate the reset part and prepare for reset support. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:23 +03:00
Sheng Yang	e0f63cb927	KVM: Add save/restore supporting of in kernel PIT Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:22 +03:00
Sheng Yang	7837699fa6	KVM: In kernel PIT model The patch moves the PIT model from userspace to kernel, and increases the timer accuracy greatly. [marcelo: make last_injected_time per-guest] Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Tested-and-Acked-by: Alex Davis <alex14641@yahoo.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:21 +03:00
Avi Kivity	4fcaa98267	KVM: Remove pointless desc_ptr #ifdef The desc_struct changes left an unnecessary #ifdef; remove it. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:27 +03:00
Avi Kivity	019960ae99	KVM: VMX: Don't adjust tsc offset forward Most Intel hosts have a stable tsc, and playing with the offset only reduces accuracy. By limiting tsc offset adjustment only to forward updates, we effectively disable tsc offset adjustment on these hosts. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:27 +03:00
Harvey Harrison	b8688d51bb	KVM: replace remaining __FUNCTION__ occurances __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:27 +03:00
Joerg Roedel	71c4dfafc0	KVM: detect if VCPU triple faults In the current inject_page_fault path KVM only checks if there is another PF pending and injects a DF then. But it has to check for a pending DF too to detect a shutdown condition in the VCPU. If this is not detected the VCPU goes to a PF -> DF -> PF loop when it should triple fault. This patch detects this condition and handles it with an KVM_SHUTDOWN exit to userspace. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:27 +03:00
Avi Kivity	2d3ad1f40c	KVM: Prefix control register accessors with kvm_ to avoid namespace pollution Names like 'set_cr3()' look dangerously close to affecting the host. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:26 +03:00
Marcelo Tosatti	05da45583d	KVM: MMU: large page support Create large pages mappings if the guest PTE's are marked as such and the underlying memory is hugetlbfs backed. If the largepage contains write-protected pages, a large pte is not used. Gives a consistent 2% improvement for data copies on ram mounted filesystem, without NPT/EPT. Anthony measures a 4% improvement on 4-way kernbench, with NPT. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:25 +03:00
Marcelo Tosatti	2e53d63acb	KVM: MMU: ignore zapped root pagetables Mark zapped root pagetables as invalid and ignore such pages during lookup. This is a problem with the cr3-target feature, where a zapped root table fools the faulting code into creating a read-only mapping. The result is a lockup if the instruction can't be emulated. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:25 +03:00
Alexander Graf	847f0ad8cb	KVM: Implement dummy values for MSR_PERF_STATUS Darwin relies on this and ceases to work without. Signed-off-by: Alexander Graf <alex@csgraf.de> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:25 +03:00
Harvey Harrison	14af3f3c56	KVM: sparse fixes for kvm/x86.c In two case statements, use the ever popular 'i' instead of index: arch/x86/kvm/x86.c:1063:7: warning: symbol 'index' shadows an earlier one arch/x86/kvm/x86.c:1000:9: originally declared here arch/x86/kvm/x86.c:1079:7: warning: symbol 'index' shadows an earlier one arch/x86/kvm/x86.c:1000:9: originally declared here Make it static. arch/x86/kvm/x86.c:1945:24: warning: symbol 'emulate_ops' was not declared. Should it be static? Drop the return statements. arch/x86/kvm/x86.c:2878:2: warning: returning void-valued expression arch/x86/kvm/x86.c:2944:2: warning: returning void-valued expression Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:24 +03:00
Harvey Harrison	4866d5e3d5	KVM: SVM: make iopm_base static Fixes sparse warning as well. arch/x86/kvm/svm.c:69:15: warning: symbol 'iopm_base' was not declared. Should it be static? Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:24 +03:00
Harvey Harrison	77cd337f22	KVM: x86 emulator: fix sparse warnings in x86_emulate.c Nesting __emulate_2op_nobyte inside__emulate_2op produces many shadowed variable warnings on the internal variable _tmp used by both macros. Change the outer macro to use __tmp. Avoids a sparse warning like the following at every call site of __emulate_2op arch/x86/kvm/x86_emulate.c:1091:3: warning: symbol '_tmp' shadows an earlier one arch/x86/kvm/x86_emulate.c:1091:3: originally declared here [18 more warnings suppressed] Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:24 +03:00
Amit Shah	f11c3a8d84	KVM: Add stat counter for hypercalls Signed-off-by: Amit Shah <amit.shah@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:24 +03:00
Avi Kivity	a5f61300c4	KVM: Use x86's segment descriptor struct instead of private definition The x86 desc_struct unification allows us to remove segment_descriptor.h. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:24 +03:00
Avi Kivity	a988b910ef	KVM: Add API for determining the number of supported memory slots Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:23 +03:00
Avi Kivity	f725230af9	KVM: Add API to retrieve the number of supported vcpus per vm Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:23 +03:00
Harvey Harrison	7a95727567	KVM: x86 emulator: make register_address_increment and JMP_REL static inlines Change jmp_rel() to a function as well. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:23 +03:00
Harvey Harrison	e4706772ea	KVM: x86 emulator: make register_address, address_mask static inlines Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:22 +03:00
Harvey Harrison	ddcb2885e2	KVM: x86 emulator: add ad_mask static inline Replaces open-coded mask calculation in macros. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:22 +03:00
Glauber de Oliveira Costa	790c73f628	x86: KVM guest: paravirtualized clocksource This is the guest part of kvm clock implementation It does not do tsc-only timing, as tsc can have deltas between cpus, and it did not seem worthy to me to keep adjusting them. We do use it, however, for fine-grained adjustment. Other than that, time comes from the host. [randy dunlap: add missing include] [randy dunlap: disallow on Voyager or Visual WS] Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:22 +03:00
Glauber de Oliveira Costa	18068523d3	KVM: paravirtualized clocksource: host part This is the host part of kvm clocksource implementation. As it does not include clockevents, it is a fairly simple implementation. We only have to register a per-vcpu area, and start writing to it periodically. The area is binary compatible with xen, as we use the same shadow_info structure. [marcelo: fix bad_page on MSR_KVM_SYSTEM_TIME] [avi: save full value of the msr, even if enable bit is clear] [avi: clear previous value of time_page] Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:22 +03:00
Joerg Roedel	24e09cbf48	KVM: SVM: enable LBR virtualization This patch implements the Last Branch Record Virtualization (LBRV) feature of the AMD Barcelona and Phenom processors into the kvm-amd module. It will only be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So there is no increased world switch overhead when the guest doesn't use these MSRs. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Markus Rechberger <markus.rechberger@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:21 +03:00
Joerg Roedel	f65c229c3e	KVM: SVM: allocate the MSR permission map per VCPU This patch changes the kvm-amd module to allocate the SVM MSR permission map per VCPU instead of a global map for all VCPUs. With this we have more flexibility allowing specific guests to access virtualized MSRs. This is required for LBR virtualization. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Markus Rechberger <markus.rechberger@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:21 +03:00
Joerg Roedel	e6101a96c9	KVM: SVM: let init_vmcb() take struct vcpu_svm as parameter Change the parameter of the init_vmcb() function in the kvm-amd module from struct vmcb to struct vcpu_svm. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Markus Rechberger <markus.rechberger@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:21 +03:00
Ryan Harper	2e11384c2c	KVM: VMX: fix typo in VMX header define Looking at Intel Volume 3b, page 148, table 20-11 and noticed that the field name is 'Deliver' not 'Deliever'. Attached patch changes the define name and its user in vmx.c Signed-off-by: Ryan Harper <ryanh@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:21 +03:00
Joerg Roedel	709ddebf81	KVM: SVM: add support for Nested Paging This patch contains the SVM architecture dependent changes for KVM to enable support for the Nested Paging feature of AMD Barcelona and Phenom processors. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:21 +03:00
Joerg Roedel	fb72d1674d	KVM: MMU: add TDP support to the KVM MMU This patch contains the changes to the KVM MMU necessary for support of the Nested Paging feature in AMD Barcelona and Phenom Processors. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:20 +03:00
Joerg Roedel	cc4b6871e7	KVM: export the load_pdptrs() function to modules The load_pdptrs() function is required in the SVM module for NPT support. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:20 +03:00
Joerg Roedel	4d9976bbdc	KVM: MMU: make the __nonpaging_map function generic The mapping function for the nonpaging case in the softmmu does basically the same as required for Nested Paging. Make this function generic so it can be used for both. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:20 +03:00
Joerg Roedel	1855267210	KVM: export information about NPT to generic x86 code The generic x86 code has to know if the specific implementation uses Nested Paging. In the generic code Nested Paging is called Two Dimensional Paging (TDP) to avoid confusion with (future) TDP implementations of other vendors. This patch exports the availability of TDP to the generic x86 code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:19 +03:00
Joerg Roedel	6c7dac72d5	KVM: SVM: add module parameter to disable Nested Paging To disable the use of the Nested Paging feature even if it is available in hardware this patch adds a module parameter. Nested Paging can be disabled by passing npt=0 to the kvm_amd module. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:19 +03:00
Joerg Roedel	e3da3acdb3	KVM: SVM: add detection of Nested Paging feature Let SVM detect if the Nested Paging feature is available on the hardware. Disable it to keep this patch series bisectable. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:19 +03:00
Joerg Roedel	33bd6a0b3e	KVM: SVM: move feature detection to hardware setup code By moving the SVM feature detection from the each_cpu code to the hardware setup code it runs only once. As an additional advance the feature check is now available earlier in the module setup process. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:19 +03:00
Joerg Roedel	9457a712a2	KVM: allow access to EFER in 32bit KVM This patch makes the EFER register accessible on a 32bit KVM host. This is necessary to boot 32 bit PAE guests under SVM. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:19 +03:00
Joerg Roedel	9f62e19a11	KVM: VMX: unifdef the EFER specific code To allow access to the EFER register in 32bit KVM the EFER specific code has to be exported to the x86 generic code. This patch does this in a backwards compatible manner. [avi: add check for EFER-less hosts] Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:18 +03:00
Joerg Roedel	50a37eb4e0	KVM: align valid EFER bits with the features of the host system This patch aligns the bits the guest can set in the EFER register with the features in the host processor. Currently it lets EFER.NX disabled if the processor does not support it and enables EFER.LME and EFER.LMA only for KVM on 64 bit hosts. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:18 +03:00
Joerg Roedel	f2b4b7ddf6	KVM: make EFER_RESERVED_BITS configurable for architecture code This patch give the SVM and VMX implementations the ability to add some bits the guest can set in its EFER register. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:18 +03:00
Sheng Yang	2384d2b326	KVM: VMX: Enable Virtual Processor Identification (VPID) To allow TLB entries to be retained across VM entry and VM exit, the VMM can now identify distinct address spaces through a new virtual-processor ID (VPID) field of the VMCS. [avi: drop vpid_sync_all()] [avi: add "cc" to asm constraints] Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:17 +03:00
Avi Kivity	d196e34336	KVM: MMU: Decouple mmio from shadow page tables Currently an mmio guest pte is encoded in the shadow pagetable as a not-present trapping pte, with the SHADOW_IO_MARK bit set. However nothing is ever done with this information, so maintaining it is a useless complication. This patch moves the check for mmio to before shadow ptes are instantiated, so the shadow code is never invoked for ptes that reference mmio. The code is simpler, and with future work, can be made to handle mmio concurrently. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:17 +03:00
Avi Kivity	1d6ad2073e	KVM: x86 emulator: group decoding for group 1 instructions Opcodes 0x80-0x83 Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:16 +03:00
Avi Kivity	d95058a1a7	KVM: x86 emulator: add group 7 decoding This adds group decoding for opcode 0x0f 0x01 (group 7). Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:15 +03:00
Avi Kivity	fd60754e4f	KVM: x86 emulator: Group decoding for groups 4 and 5 Add group decoding support for opcode 0xfe (group 4) and 0xff (group 5). Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:15 +03:00
Avi Kivity	7d858a19ef	KVM: x86 emulator: Group decoding for group 3 This adds group decoding support for opcodes 0xf6, 0xf7 (group 3). Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:14 +03:00
Avi Kivity	43bb19cd33	KVM: x86 emulator: group decoding for group 1A This adds group decode support for opcode 0x8f. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:14 +03:00
Avi Kivity	e09d082c03	KVM: x86 emulator: add support for group decoding Certain x86 instructions use bits 3:5 of the byte following the opcode as an opcode extension, with the decode sometimes depending on bits 6:7 as well. Add support for this in the main decoding table rather than an ad-hock adaptation per opcode. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:14 +03:00
Dong, Eddie	1ae0a13def	KVM: MMU: Simplify hash table indexing Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:14 +03:00
Dong, Eddie	489f1d6526	KVM: MMU: Update shadow ptes on partial guest pte writes A guest partial guest pte write will leave shadow_trap_nonpresent_pte in spte, which generates a vmexit at the next guest access through that pte. This patch improves this by reading the full guest pte in advance and thus being able to update the spte and eliminate the vmexit. This helps pae guests which use two 32-bit writes to set a single 64-bit pte. [truncation fix by Eric] Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:13 +03:00
Peter Zijlstra	7f424a8b08	fix idle (arch, acpi and apm) and lockdep OK, so 25-mm1 gave a lockdep error which made me look into this. The first thing that I noticed was the horrible mess; the second thing I saw was hacks like: `71e93d1561` The problem is that arch idle routines are somewhat inconsitent with their IRQ state handling and instead of fixing _that_, we go paper over the problem. So the thing I've tried to do is set a standard for idle routines and fix them all up to adhere to that. So the rules are: idle routines are entered with IRQs disabled idle routines will exit with IRQs enabled Nearly all already did this in one form or another. Merge the 32 and 64 bit bits so they no longer have different bugs. As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated irq-enable. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-27 00:01:45 +02:00
Yinghai Lu	5f0b2976cb	x86: add pci=check_enable_amd_mmconf and dmi check so will disable that feature by default, and only enable that via pci=check_enable_amd_mmconf or for system match with dmi table. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 23:41:04 +02:00
Yinghai Lu	e8ee6f0ae5	x86: work around io allocation overlap of HT links normally BIOSes assign io/mmio range to different HT links without overlapping, even same node same link should get non overlapping entries. but Rafael L. Wysocki's buggy BIOS creates a link with overlapping entries for mmio and io: node 0 link 0: io port [1000, ffffff] node 0 link 0: mmio [e0000000, efffffff] node 0 link 0: mmio [a0000, bffff] node 0 link 0: mmio [80000000, ffffffff] try to merge them and we will get: bus: [00, ff] on node 0 link 0 bus: 00 index 0 io port: [0, ffff] bus: 00 index 1 mmio: [80000000, fcffffffff] bus: 00 index 2 mmio: [a0000, bffff] so later we will reduce the chance to assign used resource to unassigned device. Reported-by: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Tested-by: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 23:41:04 +02:00
Yinghai Lu	cbf9bd603a	acpi: get boot_cpu_id as early for k8_scan_nodes [mingo@elte.hu: split from "x86_64: get boot_cpu_id as early for k8_scan_nodes] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 23:41:04 +02:00
Yinghai Lu	4cf1946374	x86_64: don't need set default res if only have one root bus if there's only one root bus there's no need to split resources. This patch fixes the issue described at: http://lkml.org/lkml/2008/4/10/304 Reported-and-bisected-by: Rafael J. Wysocki <rjw@sisk.pl> Tested-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 23:41:04 +02:00
Yinghai Lu	6e184f299d	x86: double check the multi root bus with fam10h mmconf some bioses give same range to mmconf for fam10h msr, and mmio for node/link. fam10h msr will overide mmio for node/link. so we can not assign range to devices under node/link for unassigned resources. this patch will take range out from the mmio for node/link Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 23:41:04 +02:00
Yinghai Lu	30a18d6c3f	x86: multi pci root bus with different io resource range, on 64-bit scan AMD opteron io/mmio routing to make sure every pci root bus get correct resource range. Thus later pci scan could assign correct resource to device with unassigned resource. this can fix a system without _CRS for multi pci root bus. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 23:41:04 +02:00
Yinghai Lu	35ddd068fb	x86: use bus conf in NB conf fun1 to get bus range on, on 64-bit ... so we use the same code with Quad core cpu as old opteron. This patch is useful when acpi=off or _PXM is not there in DSDT. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 23:41:04 +02:00
Yinghai Lu	871d5f8dd0	x86: get mp_bus_to_node early Currently, on an amd k8 system with multi ht chains, the numa_node of pci devices under /sys/devices/pci0000:80/* is always 0, even if that chain is on node 1 or 2 or 3. Workaround: pcibus_to_node(bus) is used when we want to get the node that pci_device is on. In struct device, we already have numa_node member, and we could use dev_to_node()/set_dev_node() to get and set numa_node in the device. set_dev_node is called in pci_device_add() with pcibus_to_node(bus), and pcibus_to_node uses bus->sysdata for nodeid. The problem is when pci_add_device is called, bus->sysdata is not assigned correct nodeid yet. The result is that numa_node will always be 0. pcibios_scan_root and pci_scan_root could take sysdata. So we need to get mp_bus_to_node mapping before these two are called, and thus get_mp_bus_to_node could get correct node for sysdata in root bus. In scanning of the root bus, all child busses will take parent bus sysdata. So all pci_device->dev.numa_node will be assigned correctly and automatically. Later we could use dev_to_node(&pci_dev->dev) to get numa_node, and we could also could make other bus specific device get the correct numa_node too. This is an updated version of pci_sysdata and Jeff's pci_domain patch. [ mingo@elte.hu: build fix ] Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 23:41:04 +02:00
Yinghai Lu	bb63b42199	x86 pci: remove checking type for mmconfig probe doesn't need to check if it is type1 or type2, we can use raw_pci_ops directly. also make pci_direct_conf1 static again. anyway is there system with type 2 and mmconf support? Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 23:41:04 +02:00
Yinghai Lu	d2ebdf4bae	x86: remove unneeded check in mmconf reject mmconfig is only used to access extended configuration space. so don't need to reject MFG that only have one entry and only handle bus0. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 23:41:04 +02:00
Yinghai Lu	d39398a333	x86: seperate mmconf for fam10h out from setup_64.c Separate mmconf for fam10h out from setup_64.c Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 23:41:04 +02:00
Yinghai Lu	d4c4d09415	x86: if acpi=off, force setting the mmconf for fam10h some BIOS only let AMD fam 10h handle bus0, and nvidia mcp55/ck804 to handle other buses. at that case MCFG will cover all over them. but with acpi=off, we can not use MCFG. this patch will double check the busnbits, and if it is less handling 256 bues, and acpi=off will forcely reset the mmconf in msr, so we still use mmconf in above case. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 23:41:04 +02:00
Yinghai Lu	7fd0da4085	x86_64: check MSR to get MMCONFIG for AMD Family 10h so even booting kernel with acpi=off or even MCFG is not there, we still can use MMCONFIG. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Cc: Andi Kleen <ak@suse.de> Cc: Greg KH <greg@kroah.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 23:41:04 +02:00
Yinghai Lu	eee206c3bf	x86_64: check and enable MMCONFIG for AMD Family 10h So we can use MMCONF when MMCONF is not set by BIOS using TOP_MEM2 msr to get memory top, and try to scan fam10h mmio routing to make sure the range is not conflicted with some prefetch MMIO that is above 4G. (current only LinuxBIOS assign 64 bit mmio above 4G for some co-processor) Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 23:41:04 +02:00
Yinghai Lu	57741a7790	x86_64: set cfg_size for AMD Family 10h in case MMCONFIG reuse pci_cfg_space_size but skip check pci express and pci-x CAP ID. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 23:41:03 +02:00
Yinghai Lu	05c58b8ac7	x86: mmconf enable mcfg early Patch "x86: validate against ACPI motherboard resources" changed the mmconf init sequence, and init MMCONF late in acpi_init. here change it back to old sequence: 1. check hostbridge in early 2. check MCFG with e820 in early 3. if all fail, will check MCFg with acpi _CRS in acpi_init So we can make MCONF working again when acpi=off is set if hostbridge support that. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg KH <greg@kroah.com> Cc: Greg KH <greg@kroah.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 23:41:03 +02:00
Yinghai Lu	0b64ad7123	x86: clear pci_mmcfg_virt when mmcfg get rejected For x86_64, need to free pci_mmcfg_virt, and iounmap some pointers when MMCONF is not reserved in E820 or acpi _CRS and get rejected. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg KH <greg@kroah.com> Cc: Greg KH <greg@kroah.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 23:41:03 +02:00
Robert Hancock	7752d5cfe3	x86: validate against acpi motherboard resources This path adds validation of the MMCONFIG table against the ACPI reserved motherboard resources. If the MMCONFIG table is found to be reserved in ACPI, we don't bother checking the E820 table. The PCI Express firmware spec apparently tells BIOS developers that reservation in ACPI is required and E820 reservation is optional, so checking against ACPI first makes sense. Many BIOSes don't reserve the MMCONFIG region in E820 even though it is perfectly functional, the existing check needlessly disables MMCONFIG in these cases. In order to do this, MMCONFIG setup has been split into two phases. If PCI configuration type 1 is not available then MMCONFIG is enabled early as before. Otherwise, it is enabled later after the ACPI interpreter is enabled, since we need to be able to execute control methods in order to check the ACPI reserved resources. Presently this is just triggered off the end of ACPI interpreter initialization. There are a few other behavioral changes here: - Validate all MMCONFIG configurations provided, not just the first one. - Validate the entire required length of each configuration according to the provided ending bus number is reserved, not just the minimum required allocation. - Validate that the area is reserved even if we read it from the chipset directly and not from the MCFG table. This catches the case where the BIOS didn't set the location properly in the chipset and has mapped it over other things it shouldn't have. This also cleans up the MMCONFIG initialization functions so that they simply do nothing if MMCONFIG is not compiled in. Based on an original patch by Rajesh Shah from Intel. [akpm@linux-foundation.org: many fixes and cleanups] Signed-off-by: Robert Hancock <hancockr@shaw.ca> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Andi Kleen <ak@suse.de> Cc: Rajesh Shah <rajesh.shah@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Cc: Greg KH <greg@kroah.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 23:41:03 +02:00
Linus Torvalds	c3bf9bc243	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3: x86_64/mm: check and print vmemmap allocation continuous x86_64: fix setup_node_bootmem to support big mem excluding with memmap x86_64: make reserve_bootmem_generic() use new reserve_bootmem() mm: allow reserve_bootmem() cross nodes mm: offset align in alloc_bootmem() mm: fix alloc_bootmem_core to use fast searching for all nodes mm: make mem_map allocation continuous	2008-04-26 14:04:32 -07:00
Yinghai Lu	c2b91e2eec	x86_64/mm: check and print vmemmap allocation continuous On big systems with lots of memory, don't print out too much during bootup, and make it easy to find if it is continuous. on 256G 8 sockets system will get [ffffe20000000000-ffffe20002bfffff] PMD -> [ffff810001400000-ffff810003ffffff] on node 0 [ffffe2001c700000-ffffe2001c7fffff] potential offnode page_structs [ffffe20002c00000-ffffe2001c7fffff] PMD -> [ffff81000c000000-ffff8100255fffff] on node 0 [ffffe20038700000-ffffe200387fffff] potential offnode page_structs [ffffe2001c800000-ffffe200387fffff] PMD -> [ffff810820200000-ffff81083c1fffff] on node 1 [ffffe20040000000-ffffe2007fffffff] PUD ->ffff811027a00000 on node 2 [ffffe20038800000-ffffe2003fffffff] PMD -> [ffff811020200000-ffff8110279fffff] on node 2 [ffffe20054700000-ffffe200547fffff] potential offnode page_structs [ffffe20040000000-ffffe200547fffff] PMD -> [ffff811027c00000-ffff81103c3fffff] on node 2 [ffffe20070700000-ffffe200707fffff] potential offnode page_structs [ffffe20054800000-ffffe200707fffff] PMD -> [ffff811820200000-ffff81183c1fffff] on node 3 [ffffe20080000000-ffffe200bfffffff] PUD ->ffff81202fa00000 on node 4 [ffffe20070800000-ffffe2007fffffff] PMD -> [ffff812020200000-ffff81202f9fffff] on node 4 [ffffe2008c700000-ffffe2008c7fffff] potential offnode page_structs [ffffe20080000000-ffffe2008c7fffff] PMD -> [ffff81202fc00000-ffff81203c3fffff] on node 4 [ffffe200a8700000-ffffe200a87fffff] potential offnode page_structs [ffffe2008c800000-ffffe200a87fffff] PMD -> [ffff812820200000-ffff81283c1fffff] on node 5 [ffffe200c0000000-ffffe200ffffffff] PUD ->ffff813037a00000 on node 6 [ffffe200a8800000-ffffe200bfffffff] PMD -> [ffff813020200000-ffff8130379fffff] on node 6 [ffffe200c4700000-ffffe200c47fffff] potential offnode page_structs [ffffe200c0000000-ffffe200c47fffff] PMD -> [ffff813037c00000-ffff81303c3fffff] on node 6 [ffffe200c4800000-ffffe200e07fffff] PMD -> [ffff813820200000-ffff81383c1fffff] on node 7 instead of a very long print out... Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 22:51:09 +02:00
Yinghai Lu	1a27fc0a42	x86_64: fix setup_node_bootmem to support big mem excluding with memmap typical case: four sockets system, every node has 4g ram, and we are using: memmap=10g$4g to mask out memory on node1 and node2 when numa is enabled, early_node_mem is used to get node_data and node_bootmap. if it can not get memory from the same node with find_e820_area(), it will use alloc_bootmem to get buff from previous nodes. so check it and print out some info about it. need to move early_res_to_bootmem into every setup_node_bootmem. and it takes range that node has. otherwise alloc_bootmem could return addr that reserved early. depends on "mm: make reserve_bootmem can crossed the nodes". Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 22:51:08 +02:00
Yinghai Lu	8b3cd09ed2	x86_64: make reserve_bootmem_generic() use new reserve_bootmem() "mm: make reserve_bootmem can crossed the nodes" provides new reserve_bootmem(), let reserve_bootmem_generic() use that. reserve_bootmem_generic() is used to reserve initramdisk, so this way we can make sure even when bootloader or kexec load ranges cross the node memory boundaries, reserve_bootmem still works. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 22:51:08 +02:00
Linus Torvalds	9b79ed952b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-generic-bitops-v3 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-generic-bitops-v3: x86, bitops: select the generic bitmap search functions x86: include/asm-x86/pgalloc.h/bitops.h: checkpatch cleanups - formatting only x86: finalize bitops unification x86, UML: remove x86-specific implementations of find_first_bit x86: optimize find_first_bit for small bitmaps x86: switch 64-bit to generic find_first_bit x86: generic versions of find_first_(zero_)bit, convert i386 bitops: use __fls for fls64 on 64-bit archs generic: implement __fls on all 64-bit archs generic: introduce a generic __fls implementation x86: merge the simple bitops and move them to bitops.h x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps x86, uml: fix uml with generic find_next_bit for x86 x86: change x86 to use generic find_next_bit uml: Kconfig cleanup uml: fix build error	2008-04-26 13:46:11 -07:00
Linus Torvalds	539a5fe226	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootparam * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootparam: x86, boot: Document for linked list of struct setup_data x86, boot: export linked list of struct setup_data via debugfs x86, boot: add linked list of struct setup_data x86, boot: add free_early to early reservation machanism	2008-04-26 13:29:41 -07:00
Huang, Ying	c14b2adf19	x86, boot: export linked list of struct setup_data via debugfs Export linked list of struct setup_data via debugfs. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 21:34:42 +02:00

1 2 3 4 5 ...

2353 Commits