Commit Graph

5343 Commits

Author SHA1 Message Date
Ingo Molnar
695a461296 Merge branch 'amd-iommu/2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu 2009-09-04 14:44:16 +02:00
Yinghai Lu
0d96b9ff74 x86: Use hard_smp_processor_id() to get apic id for AMD K8 cpus
Otherwise, system with apci id lifting will have wrong apicid in
/proc/cpuinfo.

and use that in srat_detect_node().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <4A998CCA.1040407@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 09:55:29 +02:00
markus.t.metzger@intel.com
1653192f51 x86, perf_counter, bts: Do not allow kernel BTS tracing for now
Kernel BTS tracing generates too much data too fast for us to
handle, causing the kernel to hang.

Fail for BTS requests for kernel code.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Acked-by: Peter Zijlstra <a.p.zjilstra@chello.nl>
LKML-Reference: <20090902140616.901253000@intel.com>
[ This is really a workaround - but we want BTS tracing in .32
  so make sure we dont regress. The lockup should be fixed
  ASAP. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 09:26:40 +02:00
markus.t.metzger@intel.com
596da17f94 x86, perf_counter, bts: Correct pointer-to-u64 casts
On 32bit, pointers in the DS AREA configuration are cast to
u64. The current (long) cast to avoid compiler warnings results
in a signed 64bit address.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090902140615.305889000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 09:26:39 +02:00
markus.t.metzger@intel.com
747b50aaf7 x86, perf_counter, bts: Fail if BTS is not available
Reserve PERF_COUNT_HW_BRANCH_INSTRUCTIONS with sample_period ==
1 for BTS tracing and fail, if BTS is not available.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090902140612.943801000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 09:26:39 +02:00
Jeremy Fitzhardinge
53f824520b x86/i386: Put aligned stack-canary in percpu shared_aligned section
Pack aligned things together into a special section to minimize
padding holes.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Tejun Heo <tj@kernel.org>
LKML-Reference: <4AA035C0.9070202@goop.org>
[ queued up in tip:x86/asm because it depends on this commit:
  x86/i386: Make sure stack-protector segment base is cache aligned ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 07:10:31 +02:00
Andreas Herrmann
5a925b4282 x86, sched: Workaround broken sched domain creation for AMD Magny-Cours
Current sched domain creation code can't handle multi-node processors.
When switching to power_savings scheduling errors show up and
system might hang later on (due to broken sched domain hierarchy):

  # echo 0  >> /sys/devices/system/cpu/sched_mc_power_savings
  CPU0 attaching sched-domain:
   domain 0: span 0-5 level MC
    groups: 0 1 2 3 4 5
    domain 1: span 0-23 level NODE
     groups: 0-5 6-11 18-23 12-17
  ...
  # echo 1  >> /sys/devices/system/cpu/sched_mc_power_savings
  CPU0 attaching sched-domain:
   domain 0: span 0-11 level MC
    groups: 0 1 2 3 4 5 6 7 8 9 10 11
  ERROR: parent span is not a superset of domain->span
    domain 1: span 0-5 level CPU
  ERROR: domain->groups does not contain CPU0
     groups: 6-11 (__cpu_power = 12288)
  ERROR: groups don't span domain->span
     domain 2: span 0-23 level NODE
      groups:
  ERROR: domain->cpu_power not set

  ERROR: groups don't span domain->span
  ...

Fixing all aspects of power-savings scheduling for Magny-Cours needs
some larger changes in the sched domain creation code.

As a short-term and temporary workaround avoid the problems by
extending "the worst possible hack" ;-(
and always use llc_shared_map on AMD Magny-Cours when MC domain span
is calculated.

With this I get:

  # echo 1  >> /sys/devices/system/cpu/sched_mc_power_savings
  CPU0 attaching sched-domain:
   domain 0: span 0-5 level MC
    groups: 0 1 2 3 4 5
    domain 1: span 0-5 level CPU
     groups: 0-5 (__cpu_power = 6144)
     domain 2: span 0-23 level NODE
      groups: 0-5 (__cpu_power = 6144) 6-11 (__cpu_power = 6144) 18-23 (__cpu_power = 6144) 12-17 (__cpu_power = 6144)
  ...

I.e. no errors during sched domain creation, no system hangs, and also
mc_power_savings scheduling works to a certain extend.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-03 15:10:14 -07:00
Andreas Herrmann
cb9805ab5b x86, mcheck: Use correct cpumask for shared bank4
This fixes threshold_bank4 support on multi-node processors.

The correct mask to use is llc_shared_map, representing an internal
node on Magny-Cours.

We need to create 2 sets of symlinks for sibling shared banks -- one
set for each internal node, symlinks of each set should target the
first core on same internal node.

Currently only one set is created where all symlinks are targeting
the first core of the entire socket.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-03 15:10:08 -07:00
Andreas Herrmann
a326e948c5 x86, cacheinfo: Fixup L3 cache information for AMD multi-node processors
L3 cache size, associativity and shared_cpu information need to be
adapted to show information for an internal node instead of the
entire physical package.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-03 15:10:03 -07:00
Andreas Herrmann
4a376ec3a2 x86: Fix CPU llc_shared_map information for AMD Magny-Cours
Construct entire NodeID and use it as cpu_llc_id. Thus internal node
siblings are stored in llc_shared_map.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-03 15:09:59 -07:00
Jeremy Fitzhardinge
1ea0d14e48 x86/i386: Make sure stack-protector segment base is cache aligned
The Intel Optimization Reference Guide says:

	In Intel Atom microarchitecture, the address generation unit
	assumes that the segment base will be 0 by default. Non-zero
	segment base will cause load and store operations to experience
	a delay.
		- If the segment base isn't aligned to a cache line
		  boundary, the max throughput of memory operations is
		  reduced to one [e]very 9 cycles.
	[...]
	Assembly/Compiler Coding Rule 15. (H impact, ML generality)
	For Intel Atom processors, use segments with base set to 0
	whenever possible; avoid non-zero segment base address that is
	not aligned to cache line boundary at all cost.

We can't avoid having a non-zero base for the stack-protector
segment, but we can make it cache-aligned.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: <stable@kernel.org>
LKML-Reference: <4AA01893.6000507@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-03 21:30:51 +02:00
Joerg Roedel
2b681fafcc Merge branch 'amd-iommu/pagetable' into amd-iommu/2.6.32
Conflicts:
	arch/x86/kernel/amd_iommu.c
2009-09-03 17:14:57 +02:00
Joerg Roedel
03362a05c5 Merge branch 'amd-iommu/passthrough' into amd-iommu/2.6.32
Conflicts:
	arch/x86/kernel/amd_iommu.c
	arch/x86/kernel/amd_iommu_init.c
2009-09-03 16:34:23 +02:00
Joerg Roedel
85da07c409 Merge branches 'gart/fixes', 'amd-iommu/fixes+cleanups' and 'amd-iommu/fault-handling' into amd-iommu/2.6.32 2009-09-03 16:32:00 +02:00
Joerg Roedel
4751a95134 x86/amd-iommu: Initialize passthrough mode when requested
This patch enables the passthrough mode for AMD IOMMU by
running the initialization function when iommu=pt is passed
on the kernel command line.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:15:46 +02:00
Joerg Roedel
a1ca331c8a x86/amd-iommu: Don't detach device from pt domain on driver unbind
This patch makes sure a device is not detached from the
passthrough domain when the device driver is unloaded or
does otherwise release the device.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:15:46 +02:00
Joerg Roedel
21129f786f x86/amd-iommu: Make sure a device is assigned in passthrough mode
When the IOMMU driver runs in passthrough mode it has to
make sure that every device not assigned to an IOMMU-API
domain must be put into the passthrough domain instead of
keeping it unassigned.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:15:45 +02:00
Joerg Roedel
eba6ac60ba x86/amd-iommu: Align locking between attach_device and detach_device
This patch makes the locking behavior between the functions
attach_device and __attach_device consistent with the
locking behavior between detach_device and __detach_device.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:15:44 +02:00
Joerg Roedel
aa879fff5d x86/amd-iommu: Fix device table write order
The V bit of the device table entry has to be set after the
rest of the entry is written to not confuse the hardware.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:15:43 +02:00
Joerg Roedel
0feae533dd x86/amd-iommu: Add passthrough mode initialization functions
When iommu=pt is passed on kernel command line the devices
should run untranslated. This requires the allocation of a
special domain for that purpose. This patch implements the
allocation and initialization path for iommu=pt.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:15:42 +02:00
Joerg Roedel
2650815fb0 x86/amd-iommu: Add core functions for pd allocation/freeing
This patch factors some code of protection domain allocation
into seperate functions. This way the logic can be used to
allocate the passthrough domain later. As a side effect this
patch fixes an unlikely domain id leakage bug.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:15:34 +02:00
Joerg Roedel
ac0101d396 x86/dma: Mark iommu_pass_through as __read_mostly
This variable is read most of the time. This patch marks it
as such. It also documents the meaning the this variable
while at it.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:13:46 +02:00
Joerg Roedel
abdc5eb3d6 x86/amd-iommu: Change iommu_map_page to support multiple page sizes
This patch adds a map_size parameter to the iommu_map_page
function which makes it generic enough to handle multiple
page sizes. This also requires a change to alloc_pte which
is also done in this patch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:11:17 +02:00
Joerg Roedel
a6b256b413 x86/amd-iommu: Support higher level PTEs in iommu_page_unmap
This patch changes fetch_pte and iommu_page_unmap to support
different page sizes too.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:11:08 +02:00
Joerg Roedel
8f7a017ce0 x86/amd-iommu: Use 2-level page tables for dma_ops domains
The driver now supports a dynamic number of levels for IO
page tables. This allows to reduce the number of levels for
dma_ops domains by one because a dma_ops domain has usually
an address space size between 128MB and 4G.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:49 +02:00
Joerg Roedel
bad1cac28a x86/amd-iommu: Remove bus_addr check in iommu_map_page
The driver now supports full 64 bit device address spaces.
So this check is not longer required.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:48 +02:00
Joerg Roedel
8c8c143cdc x86/amd-iommu: Remove last usages of IOMMU_PTE_L0_INDEX
This change allows to remove these old macros later.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:47 +02:00
Joerg Roedel
8bc3e12742 x86/amd-iommu: Change alloc_pte to support 64 bit address space
This patch changes the alloc_pte function to be able to map
pages into the whole 64 bit address space supported by AMD
IOMMU hardware from the old limit of 2**39 bytes.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:46 +02:00
Joerg Roedel
50020fb632 x86/amd-iommu: Introduce increase_address_space function
This function will be used to increase the address space
size of a protection domain.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:46 +02:00
Joerg Roedel
04bfdd8406 x86/amd-iommu: Flush domains if address space size was increased
Thist patch introduces the update_domain function which
propagates the larger address space of a protection domain
to the device table and flushes all relevant DTEs and the
domain TLB.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:45 +02:00
Joerg Roedel
407d733e30 x86/amd-iommu: Introduce set_dte_entry function
This function factors out some logic of attach_device to a
seperate function. This new function will be used to update
device table entries when necessary.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:44 +02:00
Joerg Roedel
6a0dbcbe4e x86/amd-iommu: Add a gneric version of amd_iommu_flush_all_devices
This patch adds a generic variant of
amd_iommu_flush_all_devices function which flushes only the
DTEs for a given protection domain.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:43 +02:00
Joerg Roedel
a6d41a4027 x86/amd-iommu: Use fetch_pte in amd_iommu_iova_to_phys
Don't reimplement the page table walker in this function.
Use the generic one.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:42 +02:00
Joerg Roedel
38a76eeeaf x86/amd-iommu: Use fetch_pte in iommu_unmap_page
Instead of reimplementing existing logic use fetch_pte to
walk the page table in iommu_unmap_page.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:41 +02:00
Joerg Roedel
9355a08186 x86/amd-iommu: Make fetch_pte aware of dynamic mapping levels
This patch changes the fetch_pte function in the AMD IOMMU
driver to support dynamic mapping levels.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:34 +02:00
Joerg Roedel
6a1eddd2f9 x86/amd-iommu: Reset command buffer if wait loop fails
Instead of a panic on an comletion wait loop failure, try to
recover from that event from resetting the command buffer.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:56:19 +02:00
Joerg Roedel
b26e81b871 x86/amd-iommu: Panic if IOMMU command buffer reset fails
To prevent the driver from doing recursive command buffer
resets, just panic when that recursion happens.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:55:35 +02:00
Joerg Roedel
a345b23b79 x86/amd-iommu: Reset command buffer on ILLEGAL_COMMAND_ERROR
On an ILLEGAL_COMMAND_ERROR the IOMMU stops executing
further commands. This patch changes the code to handle this
case better by resetting the command buffer in the IOMMU.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:55:34 +02:00
Joerg Roedel
93f1cc67cf x86/amd-iommu: Add reset function for command buffers
This patch factors parts of the command buffer
initialization code into a seperate function which can be
used to reset the command buffer later.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:55:34 +02:00
Joerg Roedel
d586d7852c x86/amd-iommu: Add function to flush all DTEs on one IOMMU
This function flushes all DTE entries on one IOMMU for all
devices behind this IOMMU. This is required for command
buffer resetting later.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:55:23 +02:00
Joerg Roedel
e0faf54ee8 x86/amd-iommu: fix broken check in amd_iommu_flush_all_devices
The amd_iommu_pd_table is indexed by protection domain
number and not by device id. So this check is broken and
must be removed.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:49:56 +02:00
Joerg Roedel
ae908c22aa x86/amd-iommu: Remove redundant 'IOMMU' string
The 'IOMMU: ' prefix is not necessary because the
DUMP_printk macro already prints its own prefix.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:49:56 +02:00
Joerg Roedel
4c6f40d4e0 x86/amd-iommu: replace "AMD IOMMU" by "AMD-Vi"
This patch replaces the "AMD IOMMU" printk strings with the
official name for the hardware: "AMD-Vi".

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:49:56 +02:00
Joerg Roedel
f2430bd104 x86/amd-iommu: Remove some merge helper code
This patch removes some left-overs which where put into the code to
simplify merging code which also depends on changes in other trees.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:49:55 +02:00
Joerg Roedel
e394d72aa8 x86/amd-iommu: Introduce function for iommu-local domain flush
This patch introduces a function to flush all domain tlbs
for on one given IOMMU. This is required later to reset the
command buffer on one IOMMU.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:41:34 +02:00
Joerg Roedel
945b4ac44e x86/amd-iommu: Dump illegal command on ILLEGAL_COMMAND_ERROR
This patch adds code to dump the command which caused an
ILLEGAL_COMMAND_ERROR raised by the IOMMU hardware.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 14:28:04 +02:00
Joerg Roedel
e3e59876e8 x86/amd-iommu: Dump fault entry on DTE error
This patch adds code to dump the content of the device table
entry which caused an ILLEGAL_DEV_TABLE_ENTRY error from the
IOMMU hardware.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 14:17:08 +02:00
Ingo Molnar
f76bd108e5 Merge branch 'perfcounters/urgent' into perfcounters/core
Merge reason: We are going to modify a place modified by
              perfcounters/urgent.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-02 21:42:59 +02:00
David Howells
ee18d64c1f KEYS: Add a keyctl to install a process's session keyring on its parent [try #6]
Add a keyctl to install a process's session keyring onto its parent.  This
replaces the parent's session keyring.  Because the COW credential code does
not permit one process to change another process's credentials directly, the
change is deferred until userspace next starts executing again.  Normally this
will be after a wait*() syscall.

To support this, three new security hooks have been provided:
cred_alloc_blank() to allocate unset security creds, cred_transfer() to fill in
the blank security creds and key_session_to_parent() - which asks the LSM if
the process may replace its parent's session keyring.

The replacement may only happen if the process has the same ownership details
as its parent, and the process has LINK permission on the session keyring, and
the session keyring is owned by the process, and the LSM permits it.

Note that this requires alteration to each architecture's notify_resume path.
This has been done for all arches barring blackfin, m68k* and xtensa, all of
which need assembly alteration to support TIF_NOTIFY_RESUME.  This allows the
replacement to be performed at the point the parent process resumes userspace
execution.

This allows the userspace AFS pioctl emulation to fully emulate newpag() and
the VIOCSETTOK and VIOCSETTOK2 pioctls, all of which require the ability to
alter the parent process's PAG membership.  However, since kAFS doesn't use
PAGs per se, but rather dumps the keys into the session keyring, the session
keyring of the parent must be replaced if, for example, VIOCSETTOK is passed
the newpag flag.

This can be tested with the following program:

	#include <stdio.h>
	#include <stdlib.h>
	#include <keyutils.h>

	#define KEYCTL_SESSION_TO_PARENT	18

	#define OSERROR(X, S) do { if ((long)(X) == -1) { perror(S); exit(1); } } while(0)

	int main(int argc, char **argv)
	{
		key_serial_t keyring, key;
		long ret;

		keyring = keyctl_join_session_keyring(argv[1]);
		OSERROR(keyring, "keyctl_join_session_keyring");

		key = add_key("user", "a", "b", 1, keyring);
		OSERROR(key, "add_key");

		ret = keyctl(KEYCTL_SESSION_TO_PARENT);
		OSERROR(ret, "KEYCTL_SESSION_TO_PARENT");

		return 0;
	}

Compiled and linked with -lkeyutils, you should see something like:

	[dhowells@andromeda ~]$ keyctl show
	Session Keyring
	       -3 --alswrv   4043  4043  keyring: _ses
	355907932 --alswrv   4043    -1   \_ keyring: _uid.4043
	[dhowells@andromeda ~]$ /tmp/newpag
	[dhowells@andromeda ~]$ keyctl show
	Session Keyring
	       -3 --alswrv   4043  4043  keyring: _ses
	1055658746 --alswrv   4043  4043   \_ user: a
	[dhowells@andromeda ~]$ /tmp/newpag hello
	[dhowells@andromeda ~]$ keyctl show
	Session Keyring
	       -3 --alswrv   4043  4043  keyring: hello
	340417692 --alswrv   4043  4043   \_ user: a

Where the test program creates a new session keyring, sticks a user key named
'a' into it and then installs it on its parent.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-09-02 21:29:22 +10:00
Ingo Molnar
936e894a97 Merge commit 'v2.6.31-rc8' into x86/txt
Conflicts:
	arch/x86/kernel/reboot.c
	security/Kconfig

Merge reason: resolve the conflicts, bump up from rc3 to rc8.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-02 08:17:56 +02:00
Shane Wang
69575d3886 x86, intel_txt: clean up the impact on generic code, unbreak non-x86
Move tboot.h from asm to linux to fix the build errors of intel_txt
patch on non-X86 platforms. Remove the tboot code from generic code
init/main.c and kernel/cpu.c.

Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-01 18:25:07 -07:00
Prarit Bhargava
1a8e42fa81 [CPUFREQ] Create a blacklist for processors that should not load the acpi-cpufreq module.
Create a blacklist for processors that should not load the acpi-cpufreq module.

The initial entry in the blacklist function is the Intel 0f68 processor.  It's
specification update mentions errata AL30 which implies that cpufreq should not
run on this processor.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:20 -04:00
Mark Langsdorf
db39d5529d [CPUFREQ] Powernow-k8: Enable more than 2 low P-states
Remove an obsolete check that used to prevent there being more
than 2 low P-states.  Now that low-to-low P-states changes are
enabled, it prevents otherwise workable configurations with
multiple low P-states.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Tested-by: Krists Krilovs <pow@pow.za.net>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01 12:45:20 -04:00
Catalin Marinas
acde31dc46 kmemleak: Ignore the aperture memory hole on x86_64
This block is allocated with alloc_bootmem() and scanned by kmemleak but
the kernel direct mapping may no longer exist. This patch tells kmemleak
to ignore this memory hole. The dma32_bootmem_ptr in
dma32_reserve_bootmem() is also ignored.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-09-01 11:12:32 +01:00
H. Peter Anvin
ff55df53df x86, msr: Export the register-setting MSR functions via /dev/*/msr
Make it possible to access the all-register-setting/getting MSR
functions via the MSR driver.  This is implemented as an ioctl() on
the standard MSR device node.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <petkovbb@gmail.com>
2009-08-31 16:16:04 -07:00
H. Peter Anvin
0cc0213e73 x86, msr: Have the _safe MSR functions return -EIO, not -EFAULT
For some reason, the _safe MSR functions returned -EFAULT, not -EIO.
However, the only user which cares about the return code as anything
other than a boolean is the MSR driver, which wants -EIO.  Change it
to -EIO across the board.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2009-08-31 15:15:23 -07:00
Borislav Petkov
6b0f43ddfa x86, AMD: Disable wrongly set X86_FEATURE_LAHF_LM CPUID bit
fbd8b1819e turns off the bit for
/proc/cpuinfo. However, a proper/full fix would be to additionally
turn off the bit in the CPUID output so that future callers get
correct CPU features info.

Do that by basically reversing what the BIOS wrongfully does at boot.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1251705011-18636-3-git-send-email-petkovbb@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-31 15:14:29 -07:00
Borislav Petkov
177fed1ee8 x86, msr: Rewrite AMD rd/wrmsr variants
Switch them to native_{rd,wr}msr_safe_regs and remove
pv_cpu_ops.read_msr_amd.

Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
LKML-Reference: <1251705011-18636-2-git-send-email-petkovbb@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-31 15:14:28 -07:00
Borislav Petkov
132ec92f3f x86, msr: Add rd/wrmsr interfaces with preset registers
native_{rdmsr,wrmsr}_safe_regs are two new interfaces which allow
presetting of a subset of eight x86 GPRs before executing the rd/wrmsr
instructions. This is needed at least on AMD K8 for accessing an erratum
workaround MSR.

Originally based on an idea by H. Peter Anvin.

Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
LKML-Reference: <1251705011-18636-1-git-send-email-petkovbb@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-31 15:14:26 -07:00
Thomas Gleixner
e11dadabf4 x86: apic namespace cleanup
boot_cpu_physical_apicid is a global variable and used as function
argument as well. Rename the function arguments to avoid confusion.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 21:30:47 +02:00
Thomas Gleixner
bc07844a33 x86: Distangle ioapic and i8259
The proposed Moorestown support patches use an extra feature flag
mechanism to make the ioapic work w/o an i8259. There is a much
simpler solution.

Most i8259 specific functions are already called dependend on the irq
number less than NR_IRQS_LEGACY. Replacing that constant by a
read_mostly variable which can be set to 0 by the platform setup code
allows us to achieve the same without any special feature flags.

That trivial change allows us to proceed with MRST w/o doing a full
blown overhaul of the ioapic code which would delay MRST unduly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 19:23:09 +02:00
Thomas Gleixner
3f4110a48a x86: Add Moorestown early detection
Moorestown MID devices need to be detected early in the boot process
to setup and do not call x86_default_early_setup as there is no EBDA
region to reserve.

[ Copied the minimal code from Jacobs latest MRST series ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
2009-08-31 11:09:40 +02:00
Pan, Jacob jun
162bc7ab01 x86: Add hardware_subarch ID for Moorestown
x86 bootprotocol 2.07 has introduced hardware_subarch ID in the boot
parameters provided by FW. We use it to identify Moorestown platforms.

[ tglx: Cleanup and paravirt fix ]

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 11:09:40 +02:00
Thomas Gleixner
47a3d5da70 x86: Add early platform detection
Platforms like Moorestown require early setup and want to avoid the
call to reserve_ebda_region. The x86_init override is too late when
the MRST detection happens in setup_arch. Move the default i386
x86_init overrides and the call to reserve_ebda_region into a separate
function which is called as the default of a switch case depending on
the hardware_subarch id in boot params. This allows us to add a case
for MRST and let MRST have its own early setup function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 11:09:40 +02:00
Thomas Gleixner
dd0a70c8f9 x86: Move tsc_init to late_time_init
We do not need the TSC before late_time_init. Move the tsc_init to the
late time init code so we can also utilize HPET for calibration (which
we claimed to do but never did except in some older kernel
version). This also helps Moorestown to calibrate the TSC with the
AHBT timer which needs to be initialized in late_time_init like HPET.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:47 +02:00
Thomas Gleixner
2d826404f0 x86: Move tsc_calibration to x86_init_ops
TSC calibration is modified by the vmware hypervisor and paravirt by
separate means. Moorestown wants to add its own calibration routine as
well. So make calibrate_tsc a proper x86_init_ops function and
override it by paravirt or by the early setup of the vmware
hypervisor.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:47 +02:00
Thomas Gleixner
47926214d8 x86: Replace the now identical time_32/64.c by time.c
Remove the redundant copy.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:46 +02:00
Thomas Gleixner
ef4512882d x86: time_32/64.c unify profile_pc
The code is identical except for the formatting and a useless
#ifdef. Make it the same.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:46 +02:00
Thomas Gleixner
08047c4f17 x86: Move calibrate_cpu to tsc.c
Move the code where it's only user is. Also we need to look whether
this hardwired hackery might interfere with perfcounters.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:46 +02:00
Thomas Gleixner
454ede7eeb x86: Make timer setup and global variables the same in time_32/64.c
The timer and timer irq setup code is identical in 32 and 64 bit. Make
it the same formatting as well. Also add the global variables under
the necessary ifdefs to both files.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:46 +02:00
Thomas Gleixner
0be6939422 x86: Remove mca bus ifdef from timer interrupt
MCA_bus is constant 0 when CONFIG_MCA=n. So the compiler removes that
code w/o needing an extra #ifdef

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:46 +02:00
Thomas Gleixner
64fcbac1f3 x86: Simplify timer_ack magic in time_32.c
Let the compiler optimize the timer_ack magic away in the 32bit timer
interrupt and put the same code into time_64.c. It's optimized out for
CONFIG_X86_IO_APIC on 32bit and for 64bit because timer_ack is const 0
in both cases.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:46 +02:00
Thomas Gleixner
dd3e6e8c6e x86: Prepare unification of time_32/64.c
Unify the top comment and the includes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:46 +02:00
Thomas Gleixner
ecce85089e x86: Remove do_timer hook
This is a left over of the old x86 sub arch support. Remove it and
open code it like we do in time_64.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:46 +02:00
Thomas Gleixner
845b3944bb x86: Add timer_init to x86_init_ops
The timer init code is convoluted with several quirks and the paravirt
timer chooser. Figuring out which code path is actually taken is not
for the faint hearted.

Move the numaq TSC quirk to tsc_pre_init x86_init_ops function and
replace the paravirt time chooser and the remaining x86 quirk with a
simple x86_init_ops function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:46 +02:00
Thomas Gleixner
736decac64 x86: Move percpu clockevents setup to x86_init_ops
paravirt overrides the setup of the default apic timers as per cpu
timers. Moorestown needs to override that as well.

Move it to x86_init_ops setup and create a separate x86_cpuinit struct
which holds the function for the secondary evtl. hotplugabble CPUs.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:46 +02:00
Thomas Gleixner
f1d7062a23 x86: Move xen_post_allocator_init into xen_pagetable_setup_done
We really do not need two paravirt/x86_init_ops functions which are
called in two consecutive source lines. Move the only user of
post_allocator_init into the already existing pagetable_setup_done
function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Thomas Gleixner
030cb6c00d x86: Move paravirt pagetable_setup to x86_init_ops
Replace more paravirt hackery by proper x86_init_ops.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Thomas Gleixner
6f30c1ac3f x86: Move paravirt banner printout to x86_init_ops
Replace another obscure paravirt magic and move it to
x86_init_ops. Such a hook is also useful for embedded and special
hardware.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Thomas Gleixner
42bbdb43b1 x86: Replace ARCH_SETUP by a proper x86_init_ops
ARCH_SETUP is a horrible leftover from the old arch/i386 mach support
code. It still has a lonely user in xen. Move it to x86_init_ops.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Thomas Gleixner
428cf9025b x86: Move traps_init to x86_init_ops
Replace the quirks by a simple x86_init_ops function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Thomas Gleixner
66bcaf0bde x86: Move irq_init to x86_init_ops
irq_init is overridden by x86_quirks and by paravirts. Unify the whole
mess and make it an unconditional x86_init_ops function which defaults
to the standard function and can be overridden by the early platform
code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Thomas Gleixner
d9112f4302 x86: Move pre_intr_init to x86_init_ops
Replace the quirk machinery by a x86_init_ops function which
defaults to the standard implementation. This is also a preparatory
patch for Moorestown support which needs to replace the default
init_ISA_irqs as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Thomas Gleixner
b3f1b617f4 x86: Move get/find_smp_config to x86_init_ops
Replace the quirk machinery by a x86_init_ops function which defaults
to the standard implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Jan Beulich
47d25003cb x86: Fix earlyprintk=dbgp for machines without NX
Since parse_early_param() may (e.g. for earlyprintk=dbgp)
involve calls to page table manipulation functions (here
set_fixmap_nocache()), NX hardware support must be determined
before calling that function (so that __supported_pte_mask gets
properly set up).

But the call after parse_early_param() can also not go away, as
that will honor eventual command line specified disabling of
the NX functionality.

( This will then just result in whatever mappings got
  established during parse_early_param() having the NX bit set
  despite it being disabled on the command line, but I think
  that's tolerable).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
LKML-Reference: <4A97F3BD02000078000121B9@vpn.id2.novell.com>
[ merged to x86/pat to resolve a conflict. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-29 15:47:32 +02:00
Ingo Molnar
eebc57f73d Merge branch 'for-ingo' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6 into x86/apic
Merge reason: the SFI (Simple Firmware Interface) feature in the ACPI
              tree needs this cleanup, pull it into the APIC branch as
	      well so that there's no interactions.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-29 09:31:47 +02:00
Feng Tang
2a4ab640d3 ACPI, x86: expose some IO-APIC routines when CONFIG_ACPI=n
Some IO-APIC routines are ACPI specific now, but need to
be exposed when CONFIG_ACPI=n for the benefit of SFI.

Remove #ifdef ACPI around these routines:

io_apic_get_unique_id(int ioapic, int apic_id);
io_apic_get_version(int ioapic);
io_apic_get_redir_entries(int ioapic);

Move these routines from ACPI-specific boot.c to io_apic.c:

uniq_ioapic_id(u8 id)
mp_find_ioapic()
mp_find_ioapic_pin()
mp_register_ioapic()

Also, since uniq_ioapic_id() is now no longer static,
re-name it to io_apic_unique_id() for consistency
with the other public io_apic routines.

For simplicity, do not #ifdef the resulting code ACPI || SFI,
thought that could be done in the future if it is important
to optimize the !ACPI !SFI IO-APIC x86 kernel for size.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
2009-08-28 19:57:27 -04:00
Thomas Gleixner
7285dd7fd3 clocksource: Resolve cpu hotplug dead lock with TSC unstable
Martin Schwidefsky analyzed it:
To register a clocksource the clocksource_mutex is acquired and if
necessary timekeeping_notify is called to install the clocksource as
the timekeeper clock. timekeeping_notify uses stop_machine which needs
to take cpu_add_remove_lock mutex.
Starting a new cpu is done with the cpu_add_remove_lock mutex held.
native_cpu_up checks the tsc of the new cpu and if the tsc is no good
clocksource_change_rating is called. Which needs the clocksource_mutex
and the deadlock is complete.

The solution is to replace the TSC via the clocksource watchdog
mechanism. Mark the TSC as unstable and schedule the watchdog work so
it gets removed in the watchdog thread context.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
2009-08-28 20:25:24 +02:00
Thomas Gleixner
90e1c6969d x86: Move oem_bus_info to x86_init_ops
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:53 +02:00
Thomas Gleixner
52fdb56846 x86: Move mpc_oem_pci_bus to x86_init_ops
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
72302142e1 x86: Move smp_read_mpc_oem to x86_init_ops.
Move smp_read_mpc_oem from quirks to x86_init.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
fd6c666149 x86: Move mpc_apic_id to x86_init_ops
The mpc_apic_id setup is handled by a x86_quirk. Make it a
x86_init_ops function with a default implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
de93410310 x86: Move ioapic_ids_setup to x86_init_ops
32bit and also the numaq code have special requirements on the
ioapic_id setup. Convert it to a x86_init_ops function and get rid
of the quirks and #ifdefs

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
f4848472cd x86: Sanitize smp_record and move it to x86_init_ops
The x86 quirkification introduced an extra ugly hackery with a
variable pointer in the mpparse code. If the pointer is initialized
then it is dereferenced and the variable set to 0 or incremented.

Create a x86_init_ops function and let the affected numaq code
hold the function. Default init is a setup noop.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
6b18ae3e2f x86: Move memory_setup to x86_init_ops
memory_setup is overridden by x86_quirks and by paravirts with weak
functions and quirks. Unify the whole mess and make it an
unconditional x86_init_ops function which defaults to the standard
function and can be overridden by the early platform code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
816c25e7d4 x86: Add reserve_ebda_region to x86_init_ops
reserve_ebda_region needs to be called befor start_kernel. Moorestown
needs to override it. Make it a x86_init_ops function and initialize
it with the default reserve_ebda_region.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
8fee697d99 x86: Add request_standard_resources to x86_init
The 32bit and the 64bit code are slighty different in the reservation
of standard resources. Also the upcoming Moorestown support needs its
own version of that.

Add it to x86_init_ops and initialize it with the 64bit default. 32bit
overrides it in early boot. Now moorestown can add it's own override
w/o sprinkling the code with more #ifdefs

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
f7cf5a5b8c x86: Add probe_roms to x86_init
probe_roms is only used on 32bit. Add it to the x86_init ops and
remove the #ifdefs.

Default initializer is x86_init_noop() which is overridden in
the 32bit boot code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
57844a8f8e x86: Add x86_init infrastructure
The upcoming Moorestown support brings the embedded world to x86. The
setup code of x86 has already a couple of hooks which are either
x86_quirks or paravirt ops. Some of those setup hooks are pretty
convoluted like the timer setup and the tsc calibration code. But
there are other places which could do with a cleanup.

Instead of having inline functions/macros which are modified at
compile time I decided to introduce x86_init ops which are
unconditional in the code and make it clear that they can be changed
either during compile time or in the early boot process. The function
pointers are initialized by default functions which can be noops so
that the pointer can be called unconditionally in the most cases. This
also allows us to remove 32bit/64bit, paravirt and other #ifdeffery.

paravirt guests are just a hardware platform in the setup code, so we
should treat them as such and not hide all behind multiple layers of
indirection and compile time dependencies.

It's more obvious that x86_init.timers.timer_init() is a function
pointer than the late_time_init = choose_time_init() obscurity. It's
also way simpler to grep for x86_init.timers.timer_init and find all
the places which modify that function pointer instead of analyzing
weak functions, macros and paravirt indirections.

Note. This is not a general paravirt_ops replacement. It just will
move setup related hooks which are potentially useful for other
platform setup purposes as well out of the paravirt domain.

Add the base infrastructure without any functionality.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
4152f93508 Merge branch 'sched/clock' into x86/cleanups
Reason: The tsc init cleanup depends on sched_clock_init moving past
late_time_init.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:06:24 +02:00