Commit Graph

1984 Commits

Author SHA1 Message Date
Glauber Costa
e3f77edfc1 x86: use initial_code for i386
x86_64 jumps to whatever is written in "initial_code" symbol,
instead of a fixed address. Do it for i386 too. It will allow us
to integrate more of the smp boot code.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:17 +02:00
Glauber Costa
a939098afc x86: move x86_64 gdt closer to i386
i386 and x86_64 used two different schemes for maintaining the gdt.
With this patch, x86_64 initial gdt table is defined in a .c file,
same way as i386 is now. Also, we call it "gdt_page", and the descriptor,
"early_gdt_descr". This way we achieve common naming, which can allow for
more code integration.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:16 +02:00
Glauber Costa
736f12bff9 x86: don't use gdt_page openly.
There's a macro available for that.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:15 +02:00
Glauber Costa
9cf4f298e2 x86: use stack_start in x86_64
call x86_64's init_rsp stack_start, just as i386 does.
Put a zeroed stack segment for consistency. With this,
we can eliminate one ugly ifdef in smpboot.c.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:14 +02:00
Jeremy Fitzhardinge
a7bf0bd5e6 build: add __page_aligned_data and __page_aligned_bss
Making a variable page-aligned by using
__attribute__((section(".data.page_aligned"))) is fragile because if
sizeof(variable) is not also a multiple of page size, it leaves
variables in the remainder of the section unaligned.

This patch introduces two new qualifiers, __page_aligned_data and
__page_aligned_bss to set the section *and* the alignment of
variables.  This makes page-aligned variables more robust because the
linker will make sure they're aligned properly.  Unfortunately it
requires *all* page-aligned data to use these macros...

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:13 +02:00
Bernhard Walle
1ecd27657b x86: unify crashkernel reservation for 32 and 64 bit
This patch moves the reserve_crashkernel() to setup.c and removes the
architecture-specific version. Both versions were more or less the same.

I tested it on both x86-64 and i386, with CONFIG_KEXEC on and off (so
that it compiles).

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: yhlu.kernel@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:45:44 +02:00
Ingo Molnar
6236af82d8 Merge branch 'x86/fixmap' into x86/devel
Conflicts:

	arch/x86/mm/init_64.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:24:29 +02:00
Ingo Molnar
e3ae0acf59 Merge branch 'x86/uv' into x86/devel 2008-07-08 12:24:13 +02:00
Cliff Wickman
e7eb8726d0 x86, SGI UV: uv_ptc_proc_write fix
Someone could write 0 bytes to /proc/sgi_uv/ptc_statistics,
causing
  optstr[count - 1] = '\0';
to write to who-knows-where.

(Andi Kleen noticed this need from a patch I sent for
 similar code in the ia64 world (sn2_ptc_proc_write()).)

(count less than zero is not possible here, as count is unsigned)

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:23:31 +02:00
Cliff Wickman
cef5327868 x86, SGI UV: TLB shootdown using broadcast assist unit, v6
v6: 6/19 close the security hole in uv_ptc_proc_write())

  > Found a potential security hole while doing that:
  > static ssize_t uv_ptc_proc_write(struct file *file, const char __user *user,
  >                              size_t count, loff_t *data)
  >     if (copy_from_user(optstr, user, count))
  >             return -EFAULT;
  >
  > is count guaranteed to never be larger than 64?

is fixed below.

It adds tlb_uv.o to the Makefile.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: mingo@elte.hu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:23:30 +02:00
Jack Steiner
b6df1b8bc1 x86: fix stack overflow for large values of MAX_APICS
physid_mask_of_physid() causes a huge stack (12k) to be created if the
number of APICS is large. Replace physid_mask_of_physid() with a
new function that does not create large stacks. This is a problem only
on large x86_64 systems.

this paves the way to increase MAX_APICS.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: linux-mm@kvack.org
Cc: mingo@elte.hu
Cc: tglx@linutronix.de
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:23:28 +02:00
Ingo Molnar
d400524aff SGI UV: TLB shootdown using broadcast assist unit, fix
fix:

arch/x86/kernel/tlb_uv.c: In function ‘uv_table_bases_init':
arch/x86/kernel/tlb_uv.c:612: error: ‘bau_tabsp' undeclared (first use in this function)
arch/x86/kernel/tlb_uv.c:612: error: (Each undeclared identifier is reported only once
arch/x86/kernel/tlb_uv.c:612: error: for each function it appears in.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:23:27 +02:00
Ingo Molnar
b4c286e6af SGI UV: clean up arch/x86/kernel/tlb_uv.c
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:23:26 +02:00
Ingo Molnar
dc163a41ff SGI UV: TLB shootdown using broadcast assist unit
TLB shootdown for SGI UV.

v5: 6/12 corrections/improvements per Ingo's second review

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:23:25 +02:00
Cliff Wickman
b194b12050 SGI UV: TLB shootdown using broadcast assist unit, cleanups
TLB shootdown for SGI UV.

v1: 6/2 original
v2: 6/3 corrections/improvements per Ingo's review
v3: 6/4 split atomic operations off to a separate patch (Jeremy's review)
v4: 6/12 include <mach_apic.h> rather than <asm/mach-bigsmp/mach_apic.h>
         (fixes a !SMP build problem that Ingo found)
         fix the index on uv_table_bases[blade]

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:23:24 +02:00
Cliff Wickman
1812924bb1 x86, SGI UV: TLB shootdown using broadcast assist unit
TLB shootdown for SGI UV.

Depends on patch (in tip/x86/irq):
   x86-update-macros-used-by-uv-platform.patch   Jack Steiner May 29

This patch provides the ability to flush TLB's in cpu's that are not on
the local node.  The hardware mechanism for distributing the flush
messages is the UV's "broadcast assist unit".

The hook to intercept TLB shootdown requests is a 2-line change to
native_flush_tlb_others() (arch/x86/kernel/tlb_64.c).

This code has been tested on a hardware simulator. The real hardware
is not yet available.

The shootdown statistics are provided through /proc/sgi_uv/ptc_statistics.
The use of /sys was considered, but would have required the use of
many /sys files.  The debugfs was also considered, but these statistics
should be available on an ongoing basis, not just for debugging.

Issues to be fixed later:
- The IRQ for the messaging interrupt is currently hardcoded as 200
  (see UV_BAU_MESSAGE).  It should be dynamically assigned in the future.
- The use of appropriate udelay()'s is untested, as they are a problem
  in the simulator.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:23:22 +02:00
Ingo Molnar
d98b940ab2 Merge branch 'linus' into x86/irq 2008-07-08 12:23:00 +02:00
Ingo Molnar
4b62ac9a2b Merge branch 'x86/nmi' into x86/devel
Conflicts:

	arch/x86/kernel/nmi.c
	arch/x86/kernel/nmi_32.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:17:08 +02:00
Ingo Molnar
2b4fa851b2 Merge branch 'x86/numa' into x86/devel
Conflicts:

	arch/x86/Kconfig
	arch/x86/kernel/e820.c
	arch/x86/kernel/efi_64.c
	arch/x86/kernel/mpparse.c
	arch/x86/kernel/setup.c
	arch/x86/kernel/setup_32.c
	arch/x86/mm/init_64.c
	include/asm-x86/proto.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:59:23 +02:00
Bernhard Walle
46f68e1c6b x86: use reserve_bootmem_generic() to reserve crashkernel memory on x86_64
This patch uses reserve_bootmem_generic() instead of reserve_bootmem()
to reserve the crashkernel memory on x86_64. That's necessary for NUMA
machines, see 00212fef81:

  [PATCH] Fix kdump Crash Kernel boot memory reservation for NUMA machines

  This patch will fix a boot memory reservation bug that trashes memory on
  the ES7000 when loading the kdump crash kernel.

  The code in arch/x86_64/kernel/setup.c to reserve boot memory for the crash
  kernel uses the non-numa aware "reserve_bootmem" function instead of the
  NUMA aware "reserve_bootmem_generic".  I checked to make sure that no other
  function was using "reserve_bootmem" and found none, except the ones that
  had NUMA ifdef'ed out.

  I have tested this patch only on an ES7000 with NUMA on and off (numa=off)
  in a single (non-NUMA) and multi-cell (NUMA) configurations.

  Signed-off-by: Amul Shah <amul.shah@unisys.com>
  Looks-good-to: Vivek Goyal <vgoyal@in.ibm.com>
  Cc: Andi Kleen <ak@muc.de>
  Signed-off-by: Andrew Morton <akpm@osdl.org>
  Signed-off-by: Linus Torvalds <torvalds@osdl.org>

The switch-back to reserve_bootmem() was accidentally introduced in
5c3391f9f7 when adding the BOOTMEM_EXCLUSIVE
parameter.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:49:52 +02:00
Bernhard Walle
3fd052b1b4 x86: add flags parameter to reserve_bootmem_generic()
This patch adds a 'flags' parameter to reserve_bootmem_generic() like it
already has been added in reserve_bootmem() with commit
72a7fe3967.

It also changes all users to use BOOTMEM_DEFAULT, which doesn't effectively
change the behaviour. Since the change is x86-specific, I don't think it's
necessary to add a new API for migration. There are only 4 users of that
function.

The change is necessary for the next patch, using reserve_bootmem_generic()
for crashkernel reservation.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:49:49 +02:00
Randy Dunlap
053713f574 x86: fix setup.c printk format warning
Fix setup.c printk format warning:

linux-next-20080605/arch/x86/kernel/setup.c: In function 'setup_per_cpu_areas':
linux-next-20080605/arch/x86/kernel/setup.c:173: warning: format '%lu' expects type 'long unsigned int', but argument 2 has type 'ssize_t'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:31:32 +02:00
Vegard Nossum
03db1f74a7 x86: don't return invalid pointers from node_to_cpumask()
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:31:31 +02:00
Jeremy Fitzhardinge
f307d25e63 x86: compile error fix for smpboot.c
Without this patch, my link fails with:

arch/x86/kernel/built-in.o(.cpuinit.text+0x3c6e): In function `get_local_pda':
: undefined reference to `_cpu_pda'
arch/x86/kernel/built-in.o(.cpuinit.text+0x3cd1): In function `get_local_pda':
: undefined reference to `after_bootmem'
arch/x86/kernel/built-in.o(.cpuinit.text+0x3cec): In function `get_local_pda':
: undefined reference to `_cpu_pda'
make[2]: *** [.tmp_vmlinux1] Error 1

Caused by commit 766da892634694f795b18b9538407816896fc470
    x86: remove static boot_cpu_pda array v2

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-08 11:31:27 +02:00
Mike Travis
5deb0b2a25 x86: leave initial __cpu_pda array in place until cpus are booted
Ingo Molnar wrote:
...
> they crashed after about 3 randconfig iterations with:
>
>   early res: 4 [8000-afff] PGTABLE
>   early res: 5 [b000-b87f] MEMNODEMAP
> PANIC: early exception 0e rip 10:ffffffff8077a150 error 2 cr2 37
> Pid: 0, comm: swapper Not tainted 2.6.25-sched-devel.git-x86-latest.git #14
>
> Call Trace:
>  [<ffffffff81466196>] early_idt_handler+0x56/0x6a
>  [<ffffffff8077a150>] ? numa_set_node+0x30/0x60
>  [<ffffffff8077a129>] ? numa_set_node+0x9/0x60
>  [<ffffffff8147a543>] numa_init_array+0x93/0xf0
>  [<ffffffff8147b039>] acpi_scan_nodes+0x3b9/0x3f0
>  [<ffffffff8147a496>] numa_initmem_init+0x136/0x150
>  [<ffffffff8146da5f>] setup_arch+0x48f/0x700
>  [<ffffffff802566ea>] ? clockevents_register_notifier+0x3a/0x50
>  [<ffffffff81466a87>] start_kernel+0xd7/0x440
>  [<ffffffff81466422>] x86_64_start_kernel+0x222/0x280
...
Here's the fixup...  This one should follow the previous patches.

Thanks,
Mike
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-08 11:31:26 +02:00
Mike Travis
3461b0af02 x86: remove static boot_cpu_pda array v2
* Remove the boot_cpu_pda array and pointer table from the data section.
    Allocate the pointer table and array during init.  do_boot_cpu()
    will reallocate the pda in node local memory and if the cpu is being
    brought up before the bootmem array is released (after_bootmem = 0),
    then it will free the initial pda.  This will happen for all cpus
    present at system startup.

    This removes 512k + 32k bytes from the data section.

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-08 11:31:25 +02:00
Mike Travis
9f248bde9d x86: remove the static 256k node_to_cpumask_map
* Consolidate node_to_cpumask operations and remove the 256k
    byte node_to_cpumask_map.  This is done by allocating the
    node_to_cpumask_map array after the number of possible nodes
    (nr_node_ids) is known.

  * Debug printouts when CONFIG_DEBUG_PER_CPU_MAPS is active have
    been increased.  It now shows faults when calling node_to_cpumask()
    and node_to_cpumask_ptr().

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-08 11:31:24 +02:00
Mike Travis
7891a24e1e x86: restore pda nodenumber field
* Restore the nodenumber field in the x86_64 pda.  This field is slightly
    different than the x86_cpu_to_node_map mainly because it's a static
    indication of which node the cpu is on while the cpu to node map is a
    dyanamic mapping that may get reset if the cpu goes offline.  This also
    simplifies the numa_node_id() macro.

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-08 11:31:23 +02:00
Mike Travis
23ca4bba3e x86: cleanup early per cpu variables/accesses v4
* Introduce a new PER_CPU macro called "EARLY_PER_CPU".  This is
    used by some per_cpu variables that are initialized and accessed
    before there are per_cpu areas allocated.

    ["Early" in respect to per_cpu variables is "earlier than the per_cpu
    areas have been setup".]

    This patchset adds these new macros:

	DEFINE_EARLY_PER_CPU(_type, _name, _initvalue)
	EXPORT_EARLY_PER_CPU_SYMBOL(_name)
	DECLARE_EARLY_PER_CPU(_type, _name)

	early_per_cpu_ptr(_name)
	early_per_cpu_map(_name, _idx)
	early_per_cpu(_name, _cpu)

    The DEFINE macro defines the per_cpu variable as well as the early
    map and pointer.  It also initializes the per_cpu variable and map
    elements to "_initvalue".  The early_* macros provide access to
    the initial map (usually setup during system init) and the early
    pointer.  This pointer is initialized to point to the early map
    but is then NULL'ed when the actual per_cpu areas are setup.  After
    that the per_cpu variable is the correct access to the variable.

    The early_per_cpu() macro is not very efficient but does show how to
    access the variable if you have a function that can be called both
    "early" and "late".  It tests the early ptr to be NULL, and if not
    then it's still valid.  Otherwise, the per_cpu variable is used
    instead:

	#define early_per_cpu(_name, _cpu) 			\
		(early_per_cpu_ptr(_name) ?			\
			early_per_cpu_ptr(_name)[_cpu] :	\
			per_cpu(_name, _cpu))

    A better method is to actually check the pointer manually.  In the
    case below, numa_set_node can be called both "early" and "late":

	void __cpuinit numa_set_node(int cpu, int node)
	{
	    int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);

	    if (cpu_to_node_map)
		    cpu_to_node_map[cpu] = node;
	    else
		    per_cpu(x86_cpu_to_node_map, cpu) = node;
	}

  * Add a flag "arch_provides_topology_pointers" that indicates pointers
    to topology cpumask_t maps are available.  Otherwise, use the function
    returning the cpumask_t value.  This is useful if cpumask_t set size
    is very large to avoid copying data on to/off of the stack.

  * The coverage of CONFIG_DEBUG_PER_CPU_MAPS has been increased while
    the non-debug case has been optimized a bit.

  * Remove an unreferenced compiler warning in drivers/base/topology.c

  * Clean up #ifdef in setup.c

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-08 11:31:20 +02:00
Mike Travis
7496b60654 x86: fix remove cpu_pda table patch
Mike Travis wrote:
> Ingo Molnar wrote:
>> * Mike Travis <travis@sgi.com> wrote:
>>
>>> [Ingo - please replace "PATCH 07/11" with this one.]
>>>
>>>     *	Remove 544k bytes from the kernel by removing the boot_cpu_pda
>>> 	array from the data section and allocating it during startup.
>>>
>>> 	Fixed panic in setup_per_cpu_areas when HOTPLUG_CPU not set.
>>>
>>> For inclusion into sched-devel/latest tree.
>> sched-devel.git randconfig testing found another crash with your queue:
>>
>> [    0.111060] Brought up 1 CPUs
>> [    0.111986] Total of 1 processors activated (4022.73 BogoMIPS).
>> [    0.112987] Testing NMI watchdog ... <1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
>> [    0.114982] IP: [<ffffffff8180d4a0>] check_nmi_watchdog+0xb0/0x210
>> [    0.114982] PGD 0
>> [    0.114982] Oops: 0000 [1] SMP
>> [    0.114982] CPU 0
>> [............]
>>
>>  http://redhat.com/~mingo/misc/config-Mon_Apr_28_23_25_25_CEST_2008.bad
>>  http://redhat.com/~mingo/misc/log-Mon_Apr_28_23_25_25_CEST_2008.bad
>>
>> 	Ingo
>
> Hi Ingo,
>
> I need a bit more information on your hardware configuration.  Building a
> kernel with the above config file started up fine on both the Intel and AMD
> boxes.
>
> Based on the above output it looks like it might be a UP machine?
...

Ok, I think I found it.  In check_nmi_watchdog():

        for (cpu = 0; cpu < NR_CPUS; cpu++)
                prev_nmi_count[cpu] = cpu_pda(cpu)->__nmi_count;

As I mentioned it works fine on both of my systems so could you try it out?

Thanks!
Mike
--

  * Change function check_nmi_watchdog() to use nr_cpu_ids instead of NR_CPUS.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-08 11:28:47 +02:00
Ingo Molnar
3de352bbd8 Merge branch 'x86/mpparse' into x86/devel
Conflicts:

	arch/x86/Kconfig
	arch/x86/kernel/io_apic_32.c
	arch/x86/kernel/setup_64.c
	arch/x86/mm/init_32.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:14:58 +02:00
Matthew Garrett
9340e1ccdf x86, ioapic, acpi quirk: disable IRQ 0 through I/O APIC for some HP systems
Some HP laptops have a problem with their DSDT reporting as
HP/SB400/10000, which includes some code which overrides all temperature
trip points to 16C if the INTIN2 input of the I/O APIC is enabled.  This
input is incorrectly designated the ISA IRQ 0 via an interrupt source
override even though it is wired to the output of the master 8259A and
INTIN0 is not connected at all.  So far two models have been identified,
namely nx6125 and nx6325.

Use a knob provided by the I/O APIC interrupt registration code to
abandon any attempts to route IRQ 0 through the I/O APIC for these
systems.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:10:41 +02:00
Maciej W. Rozycki
471694ea6c x86, ioapic, acpi: add a knob to disable IRQ 0 through I/O APIC
As discovered recently some systems exhibit problems when the 8254 timer
IRQ is routed through the I/O APIC.  These problems do not affect the
timer IRQ itself and therefore cannot be detected when the correctness of
operation of the interrupt is verified in check_timer().  Therefore the
I/O APIC path of the timer IRQ has to be disabled entirely.

This is a change that lets platforms ask for the timer IRQ not to be
registered in the I/O APIC interrupt tables.  The local APIC and ExtINTA
paths are unaffected.  This request is only taken into account for ACPI
platforms as MP table systems seem unaffected so far.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:10:32 +02:00
Yinghai Lu
fcfa146e41 x86: update mptable fix with no ioapic v2
if the system doesn't have ioapic, we don't need to store entries for mptable
update

also let mp_config_acpi_gsi not call func in mpparse
so later could decouple mpparse with acpi more easily

Reported-by: Daniel Exner <dex@dragonslave.de>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Daniel Exner <dex@dragonslave.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:39:07 +02:00
Yinghai Lu
95a71a45c2 x86: cleanup machine_specific_memory_setup, v2
1. let 64bit support 88 and e801 too
2. introduce default_machine_specific_memory_setup, and reuse it
   for voyager

v2: fix 64 bit compiling

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:39:01 +02:00
Yinghai Lu
1c6e55032e x86: use acpi_numa_init to parse on 32-bit numa
seperate SRAT finding and parsing from get_memcfg_from_srat,
and let getmemcfg_from_srat only handle array from previous step.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:38:47 +02:00
Yinghai Lu
593a0cc390 x86: move some function out of setup_bootmem_alloc
... to make it more like 64-bit.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:38:34 +02:00
Yinghai Lu
064d25f120 x86: merge setup_memory_map with e820
... and kill e820_32/64.c and e820_32/64.h

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:38:25 +02:00
Yinghai Lu
cc9f7a0ccf x86: kill bad_ppro
so don't punish all other cpus without that problem when init highmem

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:38:19 +02:00
Yinghai Lu
41c094fd3c x86: move e820_resource_resources to e820.c
and make 32-bit resource registration more like 64 bit.

also move probe_roms back to setup_32.c

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:38:14 +02:00
Huang, Ying
8c5beb50d3 x86 boot: pass E820 memory map entries more than 128 via linked list of setup data
Because of the size limits of struct boot_params (zero page), the
maximum number of E820 memory map entries can be passed to kernel is
128. As pointed by Paul Jackson, there is some machine produced by SGI
with so many nodes that the number of E820 memory map entries is more
than 128. To enabling Linux kernel on these system, a new setup data
type named SETUP_E820_EXT is defined to pass additional memory map
entries to Linux kernel.

This patch is based on x86/auto-latest branch of git-x86 tree and has
been tested on x86_64 and i386 platform.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:37:39 +02:00
Yinghai Lu
b5bc6c0e55 x86, mm: use add_highpages_with_active_regions() for high pages init v2
use early_node_map to init high pages, so we can remove page_is_ram() and
page_is_reserved_early() in the big loop with add_one_highpage

also remove page_is_reserved_early(), it is not needed anymore.

v2: fix the build of other platforms

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:37:25 +02:00
Yinghai Lu
d0be6bdea1 x86: rename two e820 related functions
rename update_memory_range to e820_update_range
rename add_memory_region to e820_add_region

to make it more clear that they are about e820 map operations.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:37:01 +02:00
Yinghai Lu
6df8809bbd x86: use dstapic in mp_config_acpi_legacy_irqs
so we don't get the same value multiple times.

also make mp_config_acpi_legacy_irqs more readable by moving assignments
together.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:36:55 +02:00
Yinghai Lu
d867e5310b x86: keep MP_intsrc_info untouched if we do not update mptable
Daniel Exner reported IO-APIC enumeration breakage in linux-next.

Alexey Starikovskiy found out that it might be related to
commit 2944e16b25 "x86: update mptable".

use enable_update_mptable to decide if need check before add mp_irqs array.

Reported-by: Daniel Exner <webmaster@dragonslave.de>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:36:40 +02:00
Yinghai Lu
9a27f5c516 x86: clean up relocate_initrd
1. move that before zone_sizes_init ...
2. add free_early for one old one, otherwise it will be be reserved again
   when we init highmem.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:36:34 +02:00
Yinghai Lu
d2dbf34332 x86: clean up reserve_bootmem_generic() and port it to 32-bit
1. add reserve_bootmem_generic for 32bit
2. change len to unsigned long
3. make early_res_to_bootmem to use it

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:36:17 +02:00
Yinghai Lu
b1f006b65c x86: make generic arch support NUMAQ, fix #2
we are checking mptable early for numaq, so don't need to reserve_bootmem
for it. bootmem is not there yet.

do the same thing as 64-bit.

found it on 64g above system from 64-bit kernel kexec to 32 bit kernel with
numaq support.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:36:04 +02:00
Yinghai Lu
ab4a465e96 x86: e820 merge parsing of the mem=/memmap= boot parameters
since we now have 32-bit support for e820_register_active_regions(),
we can merge the parsing of the mem=/memmap= boot parameters.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:35:38 +02:00
Ingo Molnar
df5f6c212c x86: unify the reserve_bootmem() behavior of early_res_to_bootmem()
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:35:31 +02:00
Bernhard Walle
12448e3ef3 x86: use reserve_bootmem_generic() to reserve crashkernel memory on x86_64
This patch uses reserve_bootmem_generic() instead of reserve_bootmem()
to reserve the crashkernel memory on x86_64. That's necessary for NUMA
machines, see 00212fef81:

  [PATCH] Fix kdump Crash Kernel boot memory reservation for NUMA machines

  This patch will fix a boot memory reservation bug that trashes memory on
  the ES7000 when loading the kdump crash kernel.

  The code in arch/x86_64/kernel/setup.c to reserve boot memory for the crash
  kernel uses the non-numa aware "reserve_bootmem" function instead of the
  NUMA aware "reserve_bootmem_generic".  I checked to make sure that no other
  function was using "reserve_bootmem" and found none, except the ones that
  had NUMA ifdef'ed out.

  I have tested this patch only on an ES7000 with NUMA on and off (numa=off)
  in a single (non-NUMA) and multi-cell (NUMA) configurations.

  Signed-off-by: Amul Shah <amul.shah@unisys.com>
  Looks-good-to: Vivek Goyal <vgoyal@in.ibm.com>
  Cc: Andi Kleen <ak@muc.de>
  Signed-off-by: Andrew Morton <akpm@osdl.org>
  Signed-off-by: Linus Torvalds <torvalds@osdl.org>

The switch-back to reserve_bootmem() was accidentally introduced in
5c3391f9f7 when adding the BOOTMEM_EXCLUSIVE
parameter.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:35:13 +02:00
Bernhard Walle
8b2ef1d728 x86: add flags parameter to reserve_bootmem_generic()
This patch adds a 'flags' parameter to reserve_bootmem_generic() like it
already has been added in reserve_bootmem() with commit
72a7fe3967.

It also changes all users to use BOOTMEM_DEFAULT, which doesn't effectively
change the behaviour. Since the change is x86-specific, I don't think it's
necessary to add a new API for migration. There are only 4 users of that
function.

The change is necessary for the next patch, using reserve_bootmem_generic()
for crashkernel reservation.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:34:54 +02:00
Ingo Molnar
896395c290 Merge branch 'linus' into tmp.x86.mpparse.new 2008-07-08 10:32:56 +02:00
Ingo Molnar
1b8ba39a3f Merge branch 'x86/irq' into x86/devel
Conflicts:

	arch/x86/kernel/i8259.c
	arch/x86/kernel/irqinit_64.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:53:57 +02:00
Ingo Molnar
58cf35228f Merge branches 'x86/mmio', 'x86/delay', 'x86/idle', 'x86/oprofile', 'x86/debug', 'x86/ptrace' and 'x86/amd-iommu' into x86/devel 2008-07-08 09:46:15 +02:00
Ingo Molnar
6924d1ab8b Merge branches 'x86/numa-fixes', 'x86/apic', 'x86/apm', 'x86/bitops', 'x86/build', 'x86/cleanups', 'x86/cpa', 'x86/cpu', 'x86/defconfig', 'x86/gart', 'x86/i8259', 'x86/intel', 'x86/irqstats', 'x86/kconfig', 'x86/ldt', 'x86/mce', 'x86/memtest', 'x86/pat', 'x86/ptemask', 'x86/resumetrace', 'x86/threadinfo', 'x86/timers', 'x86/vdso' and 'x86/xen' into x86/devel 2008-07-08 09:16:56 +02:00
Christophe Jaillet
25556c1699 x86, arch/x86/kernel/io_apic_32.c: use kzalloc instead of kmalloc/memset
1) replace kmalloc/memset with equivalent kzalloc.

Signed-off-by: Christophe Jaillet <jaillet.christophe@wanadoo.fr>
Cc: cj <jaillet.christophe@wanadoo.fr>
Cc: petero2@telia.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:25 +02:00
Maciej W. Rozycki
7f0dbbc08d x86: fix IO APIC breakage on HP nx6325, v2
> That helped a lot, the system seems to work normally now.
>
> Here's the relevant snippet from dmesg:
>
> [    0.108006] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [    0.108006] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> [    0.108006] ...trying to set up timer (IRQ0) through the 8259A ... <3>
> [    0.108006] ..... (found apic 0 pin 2) ...<3> failed.
> [    0.108006] ...trying to set up timer as Virtual Wire IRQ...<3> works.
>
> and the whole thing is at: http://www.sisk.pl/kernel/debug/20080618/dmesg-2.log

 Hmm, that only proved the 8259A is indeed wired to the pin #2 of the I/O
APIC.

> I, personally, don't have any and AMD only has SB600 documentation on its
> web page (it's still marked as "AMD confidential" ;-)).

 Well, the IC block is most likely the same as that's not rocket science
and once done there is no need to fiddle with that.  That written, I am
afraid there is nothing useful about the IC in the document, except that
it's there and consists of an I/O APIC providing 24 inputs and the usual
pair of 8259A cores.  Thanks for the reference anyway.

> There is an interrupt controller in there, but I'm not sure if there's any
> 8259A.  The northbridge is on the CPU, actually.

 I will praise the day someone ships an x86 machine without an 8259A core!

 As expressed in another mail I suspect there may actually be a direct
route from the 8254 to INTIN0 in the southbridge -- this is what other
bootstrap logs seen in the Internet suggest.  This would mean this
particular BIOS is buggy (is it the latest version?) and provides an
incorrect IRQ override in its ACPI tables, for example because the
responsible block has been blindly copied from a machine using a commoner
wiring.  This could be moderately easily fixed up with a quirk based on
the PCI ID (after checking it again, we actually used to have a quirk for
ATI in this area, but the way it was done suggests the issue was not
understood well enough).

 Could you please remove the hack sent yesterday and test the patch
provided below?  I do hope it builds, but I have no immediate means to
check it.  Please report the output.  The intent is to test INTIN0
directly before testing INTIN2 through the 8259A.  Thanks.

 Aside of that, what I have gathered from your reports (please correct me
if I have got it wrong) is that when the through-8259A mode is used, then
after a while 8254 timer interrupts stop arriving.  What's interesting,
the "Virtual Wire IRQ" seems to work for you correctly (that's quite an
odd setup where a local APIC input is used in the native mode -- please
post /proc/interrupts for confirmation), which in turn implies the master
8259A drives its INT output as we expect.  Why would the I/O APIC input
have problems then?  Hmm...

[ mingo@elte.hu: revert the "x86: fix IO APIC breakage on HP nx6325"
  version. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:24 +02:00
Maciej W. Rozycki
cd08d0754e x86: fix IO APIC breakage on HP nx6325
On Thu, 19 Jun 2008, Rafael J. Wysocki wrote:

> >  With such a configuration the "x86: I/O APIC: timer through 8259A
> > second-chance" patch should not matter, because the only change it
> > introduces is an attempt to try the same I/O APIC pin again, but with the
> > IRQ0 line of the master 8259A enabled.  That's not a terribly unusual
> > configuration and nothing should get confused in the system.
>
> But it _does_ get confused, really.

 Something certainly gets confused, but so far I am not sure which bit
exactly it is, are you?

> >  Barring the unlikely possibility of the 8259A actually being wired to
> > INTIN2 of the I/O APIC I can see two possible explanations:
> >
> > 1. The 8259A interrupt actually escapes to the CPU somehow and is handled
> >    as an ExtINTA interrupt.  This would make the code in check_timer()
> >    decide it has found a working configuration, while actually it has been
> >    fooled.
[...]
> Here you go:
>
> [    0.108006] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [    0.108006] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> [    0.108006] ...trying to set up timer (IRQ0) through the 8259A ... <3>
> [    0.108006] ..... (found apic 0 pin 2) ...<3> works.
>
> The full dmesg is at: http://www.sisk.pl/kernel/debug/20080618/dmesg-1.log

Thanks.  In this case I suspect the case #1 quoted above happens, that is
the 8259A manages to deliver its interrupt somehow.  Note at this stage it
is meant to be in the AEOI mode, so it can happily resubmit the interrupt
indefinitely with no additional handling as long as it receives INTA
cycles.

Can you please try the patch below on top of "x86: I/O APIC: timer
through 8259A second-chance" to see whether my hypothesis is true?  It
modifies the through-8259A setup path so that the APIC input gets masked,
but the 8259A has the timer interrupt still enabled.  Let me know how the
timer interrupt is routed in this case.

Bisected-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:24 +02:00
Paolo Ciarrocchi
360624484c x86: coding style fixes to arch/x86/kernel/io_apic_32.c
Before:
total: 91 errors, 73 warnings, 2850 lines checked

After:
total: 1 errors, 47 warnings, 2848 lines checked

Compile tested:

paolo@paolo-desktop:/tmp$ size io*
   text    data     bss     dec     hex filename
  13836    1756   11104   26696    6848 io_apic_32.o.after
  13836    1756   11104   26696    6848 io_apic_32.o.before

Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:23 +02:00
Cyrill Gorcunov
46b3b4ef1e x86, io-apic: use predefined names instead of numeric constants
This patch replaces some hard-coded numbers with predefined names.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:22 +02:00
Maciej W. Rozycki
d54db1ac9e x86: APIC/SMP: Downgrade the NMI watchdog for "nosmp"
If configured to use the I/O APIC, the NMI watchdog is deemed to fail if
the chip has been deactivated as a result of "nosmp".  Downgrade to the
local APIC watchdog similarly to what is done for the UP case.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:19 +02:00
Maciej W. Rozycki
1966202748 x86: APIC/UP: Remove redundant NMI watchdog downgrade
For the UP case the NMI watchdog downgrade is done consistently in
APIC_init_uniprocessor() now.  Remove redundant code used only when
BIOS-disabled local APIC is activated.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:18 +02:00
Maciej W. Rozycki
acae7d906f x86: APIC/UP: Downgrade the NMI watchdog for no I/O APIC
If configured to use the I/O APIC, the NMI watchdog is deemed to fail if
the chip will not be used in the UP configuration, because "noapic" has
been specified or the chip is simply not there.  Downgrade to the local
APIC watchdog to rectify.

The new #ifdef is ugly, I know.  A proper solution is to provide suitable
definitions of smp_found_config, etc. for !CONFIG_X86_IO_APIC in a header.
Likewise the whole if () condition should be moved to a static inline
function.  Such clean-ups are beyond the scope of this change and can be
done once the whole issue of the timer has been sorted out.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:17 +02:00
Ingo Molnar
6fe9fe8756 Revert "x86: APIC/SMP: downgrade the NMI watchdog for "nosmp""
This reverts commit 791b93d3dfaf16c23e978bec0cc0a3dd9d855d63.

A better fix from Maciej will be merged.
2008-07-08 09:13:15 +02:00
Ingo Molnar
ab5a5be099 Revert "x86, io-apic: fix nmi_watchdog=1 bootup hang"
This reverts commit 2229ff84f01746d02fb6b79e156fb5cce48c908f.

A better fix from Maciej will be merged.
2008-07-08 09:13:14 +02:00
Ingo Molnar
ff11571b25 x86, io-apic: fix nmi_watchdog=1 bootup hang
nmi_watchdog=1 hangs on 64-bit:

[    0.250000] Detected 12.564 MHz APIC timer.
[    0.254178] APIC timer registered as dummy, due to nmi_watchdog=1!
[    0.260366] Testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (0->0)!
[    ...     ]
[    0.470003] calling  genl_init+0x0/0xd0
[  hard hang ]

bisected it down to:

 git-bisect start
 git-bisect good 1beee8dc8c
 git-bisect bad 11582ece0aaa2d0f94f345c08a4ab9997078a083
 git-bisect bad 5479c623bb44089844022c03d4c0eb16d5b7a15f
 git-bisect bad cfb4c7fabeb499e1c29f9d1878968e37a938e28a
 git-bisect good 246dd412d3
 git-bisect bad 3f8237eaff7dc1e35fa791dae095574fd974e671
 git-bisect good 90e23b13ab849e2a11f00c655eb3a2011b4623be
 git-bisect bad 833526a34eeefc117df3191a594c3c3a4f15a9ac
 git-bisect good 791b93d3dfaf16c23e978bec0cc0a3dd9d855d63
 git-bisect bad 65767c64068f2c93e56a1accfed5c78230ac12d7
 git-bisect bad 2abc5c05dd82c188e3bdf6641a274f013348d14b
 git-bisect bad 317e1f2597ffb4d4db940577bbe56dc6e881ef07

| 317e1f2597ffb4d4db940577bbe56dc6e881ef07 is first bad commit
| commit 317e1f2597ffb4d4db940577bbe56dc6e881ef07
| Author: Maciej W. Rozycki <macro@linux-mips.org>
| Date:   Wed May 21 22:10:22 2008 +0100
|     x86: I/O APIC: clean up the 8259A on a NMI watchdog failure

the problem is that in the dummy-lapic branch we rely on the i8259A
but if the NMI watchdog fails we turn off IRQ 0 - which doesnt work
too well ;-)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:13 +02:00
Cyrill Gorcunov
067fa0ff0c x86: IO-APIC - use NMI_NONE instead of numeric constant
Not sure but maybe it is better to use NMI_DISABLED,
will take a look. But for now this patch is not change
anything in logic so it will not hurt/broke the kernel.
For most cases nmi_watchdog assignment is by one of NMI_*
macro so I think there it make sense too.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:12 +02:00
Ingo Molnar
b1b57ee135 x86 build fix:
arch/x86/kernel/io_apic_64.c: In function 'check_timer':
  arch/x86/kernel/io_apic_64.c:1688: error: 'vector' undeclared (first use in this function)
  arch/x86/kernel/io_apic_64.c:1688: error: (Each undeclared identifier is reported only once
  arch/x86/kernel/io_apic_64.c:1688: error: for each function it appears in.)
2008-07-08 09:13:11 +02:00
Thomas Gleixner
431ee79db0 x86: apic_64.c fix sparse warnings about shadowed variables
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:10 +02:00
Thomas Gleixner
7223daf5e1 x86: make irq_cfg static
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:09 +02:00
Maciej W. Rozycki
691874fa96 x86: I/O APIC: timer through 8259A second-chance
Some systems incorrectly report the ExtINTA pin of the I/O APIC as the
genuine target of the timer interrupt.  Here is a change that copies timer
pin information found to the other pin if one has been found only.  This
way both a direct and a through-8259A route is tested with the pin letting
these problematic systems work well enough.  If no timer pin information
has been found for the I/O APIC, then local APIC variations are tried
only, similarly to what is done without the change (except without the
misleading messages).

Obviously if we try the first-chance path without being told by the BIOS
to do so, we should not complain either, so do not print the message in
this case.

The 64-bit variation should be updated with a call to
replace_pin_at_irq() which can be done with the upcoming merge.  Since
add_pin_to_irq() is now always called in the first-chance path, the
condition to require it in the second-chance path no longer happens.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:07 +02:00
Maciej W. Rozycki
03be750559 x86: I/O APIC: keep the timer IRQ masked during set-up
Keep the timer interrupt line masked when reconfiguring its interrupt
redirection entry in the I/O APIC.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:06 +02:00
Maciej W. Rozycki
24742ece8e x86: I/O APIC: unmask the second-chance timer interrupt
Unmask the timer interrupt line set up in the through-8259A mode
explicitly after setup_timer_IRQ0_pin() has set up the I/O APIC interrupt
redirection entry to let the two operations be unbound from each other.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:05 +02:00
Maciej W. Rozycki
f7633ce55b x86: I/O APIC: rename setup_ExtINT_IRQ0_pin()
Rename setup_ExtINT_IRQ0_pin() to setup_timer_IRQ0_pin() to better
reflect the upcoming role of a function setting up a (semi-)arbitrary I/O
APIC pin appropriately for the 8254 timer.  By "appropriate" the following
settings are meant: edge-triggered, active-high, all the other settings
per-architecture.  Adjust comments to reflect code appropriately.  No
functional changes.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:04 +02:00
Maciej W. Rozycki
6b4722a777 x86: I/O APIC: remove redundant LVT0 masking
The LINT0 line of the local APIC is masked in the LVT0 entry in
check_timer() before this function is ever called.  Removed the
redundant unmasking for better control.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:03 +02:00
Maciej W. Rozycki
80d16bace6 x86: I/O APIC: remove redundant 8259A {,un}masking
For a better control the masking and unmasking of the timer interrupt
line in the 8259A operating in the 'Virtual Wire' mode has been moved out
of setup_ExtINT_IRQ0_pin() now, so remove the redundant calls from the
function.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:02 +02:00
Maciej W. Rozycki
f08252623c x86: I/O APIC: fix the name of the through-8259A handler
When the through-8259A mode is used for the timer, the call to
set_irq_handler() will register a NULL handler name, resulting in
"IO-APIC-<NULL>" reported.  Fix by calling ioapic_register_intr() as done
for all the other I/O APIC interrupts.

The 64-bit variation calls set_irq_chip_and_handler_name() here
needlessly and should get fixed with the upcoming merge.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:01 +02:00
Maciej W. Rozycki
9a1c619291 x86: I/O APIC: fix the name of the L-APIC IRQ handler
The local APIC interrupt handler gets registered with
set_irq_chip_and_handler_name(), which results in
"local-APIC-edge-fasteoi" reported as the name of the handler.  Fix by
removing the type of the handler left over from before the generic
handlers were introduced.

The 64-bit variation should get fixed with the upcoming merge.

NB It should really use the "edge" handler and not the "fasteoi" one,
but that's a separate issue.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:00 +02:00
Maciej W. Rozycki
35542c5ebc x86: I/O APIC: clean up the 8259A on a NMI watchdog failure
There is no point in keeping the 8259A enabled if the I/O APIC NMI
watchdog has failed and the 8259A is not used to pass through regular
timer interrupts.  This fixes problems with some systems where some logic
gets confused.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:12:59 +02:00
Maciej W. Rozycki
a1133d8e4f x86: APIC/SMP: downgrade the NMI watchdog for "nosmp"
If configured to use the I/O APIC, the NMI watchdog is deemed to fail if
the chip has been deactivated as a result of "nosmp".  Downgrade to the
local APIC watchdog similarly to what is done for the UP case.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:12:58 +02:00
Maciej W. Rozycki
73d08e6360 x86: APIC/SMP: correct the message for "nosmp"
The local APIC is no longer forced off when "nosmp" has been specified.
Correct the message printed.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:12:57 +02:00
Maciej W. Rozycki
60134ebe79 x86: I/O APIC: keep IRQ off when changing LVT registers
Disable the 8259A acting in the "virtual wire" mode to keep the interrupt
line inactive while fiddling with local APIC interrupt vector registers
associated with its destination inputs.  To be on the safe side,
especially concerning flipping the trigger mode.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:12:56 +02:00
Maciej W. Rozycki
e67465f129 x86: I/O APIC: clean up after a fasteoi failure
Disable the 8259A when routing of the timer interrupt through the chip to
the local APIC of the primary processor has failed.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:12:55 +02:00
Maciej W. Rozycki
ecd29476ae x86: I/O APIC: remove parameters to fiddle with the 8259A
Remove the "disable_8254_timer" and "enable_8254_timer" kernel
parameters.  Now that AEOI acknowledgements are no longer needed for
correct timer operation, the 8259A can be kept disabled unconditionally
unless interrupts, either timer or watchdog ones, are actually passed
through it.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:12:54 +02:00
Maciej W. Rozycki
d11d5794e0 x86: I/O APIC: AEOI timer acknowledgement clean-ups
The code that used to be in do_slow_gettimeoffset() that relied on the
IRR bit of the master 8259A PIC for IRQ0 to check the state of the output
timer 0 of the PIT is no longer there.  As a result, there is no need to
use the POLL command to acknowledge the timer interrupt in the "8259A
Virtual Wire", except for the NMI watchdog when the i82489DX APIC is used
(this is because this particular APIC treats NMIs as level-triggered and
keeping the input asserted would keep motherboard NMI sources held off for
too long).  Remove the unneeded bits and adjust comments accordingly.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:12:53 +02:00
Ingo Molnar
a0176e2485 Revert "Revert "x86: fix ioapic bug again""
This reverts commit 0b6a39f7eb.

The changes in tip/x86/apic solve this better.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:12:49 +02:00
Ingo Molnar
93022136ff Merge commit 'v2.6.26-rc9' into x86/cpu 2008-07-08 07:47:47 +02:00
Yinghai Lu
c49c412a47 x86: make 64bit identify_cpu use cpu_dev v2
v2: fix early_panic on this config:

  http://redhat.com/~mingo/misc/config-Thu_Jun_19_14_22_37_CEST_2008.bad

reason : struct cpu_vendor_dev size is 16, need to make table to be 16
         byte alignment

also print out the cpu supported...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 07:47:40 +02:00
Yinghai Lu
dcd32b6a1f x86: make 64-bit identify_cpu use cpu_dev
we may need to move some functions to common.c later

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 07:47:39 +02:00
Robert Richter
3a27dd1ce5 x86: Move PCI IO ECS code to x86/pci
"Form follows function". Code is now where it belongs to.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 07:47:39 +02:00
Thomas Gleixner
0beefa208b x86: add C1E aware idle function, fix
On Tue, 17 Jun 2008, Rafael J. Wysocki wrote:
>
> BTW, with the C1E patches reverted I don't get the
> WARNING: at /home/rafael/src/linux-next/kernel/smp.c:215 smp_call_function_single+0x3d/0xa2
> in the log.  Thomas?

The BROADCAST_FORCE notification uses smp_function_call and therefor
must be run with interrupts enabled.

While at it, add a comment for the BROADCAST_EXIT notifier as well.

Reported-and-bisected-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 07:47:37 +02:00
Thomas Gleixner
aa276e1caf x86, clockevents: add C1E aware idle function
C1E on AMD machines is like C3 but without control from the OS. Up to
now we disabled the local apic timer for those machines as it stops
when the CPU goes into C1E. This excludes those machines from high
resolution timers / dynamic ticks, which hurts especially X2 based
laptops.

The current boot time C1E detection has another, more serious flaw
as well: some BIOSes do not enable C1E until the ACPI processor module
is loaded. This causes systems to stop working after that point.

To work nicely with C1E enabled machines we use a separate idle
function, which checks on idle entry whether C1E was enabled in the
Interrupt Pending Message MSR. This allows us to do timer broadcasting
for C1E and covers the late enablement of C1E as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 07:47:18 +02:00
Jens Axboe
5e374fb626 generic-ipi: fixlet
create proper stackframe.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-06 14:01:50 +02:00
Ingo Molnar
a1716d508a Merge branch 'x86/s2ram-fix' into x86/urgent 2008-07-05 08:42:45 +02:00
Rafael J. Wysocki
64e83b5a91 x86 ACPI: fix resume from suspend to RAM on uniprocessor x86-64
Since the trampoline code is now used for ACPI resume from suspend to RAM,
the trampoline page tables have to be fixed up during boot not only on SMP
systems, but also on UP systems that use the trampoline.

Reference: http://bugzilla.kernel.org/show_bug.cgi?id=10923

Reported-by: Dionisus Torimens <djtm@gmx.net>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: pm list <linux-pm@lists.linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-05 08:42:28 +02:00
H. Peter Anvin
4b4f7280d7 x86 ACPI: normalize segment descriptor register on resume
Some Dell laptops enter resume with apparent garbage in the segment
descriptor registers (almost certainly the result of a botched
transition from protected to real mode.)  The only way to clean that
up is to enter protected mode ourselves and clean out the descriptor
registers.

This fixes resume on Dell XPS M1210 and Dell D620.

Reference: http://bugzilla.kernel.org/show_bug.cgi?id=10927

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: pm list <linux-pm@lists.linux-foundation.org>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-05 08:25:40 +02:00
Linus Torvalds
b8a0b6ccf2 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  xen: fix address truncation in pte mfn<->pfn conversion
  arch/x86/mm/init_64.c: early_memtest(): fix types
  x86: fix Intel Mac booting with EFI
2008-07-04 10:46:46 -07:00
Joerg Roedel
fb339690a0 x86, AMD IOMMU: remove unnecessary code from the iommu_enable function
This code removes a leftover from the iommu_enable function. The ctrl variable
is assigned but never used.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: robert.richter@amd.com
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-04 11:44:44 +02:00
Joerg Roedel
c1cbebeec4 x86, AMD IOMMU: don't try to init IOMMU if early detect code did not detect one
This patch adds a check if the early detect code has found AMD IOMMU hardware
descriptions and does not try to initialize hardware if the check failed.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: robert.richter@amd.com
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-04 11:44:43 +02:00
Joerg Roedel
8b14518fad x86, AMD IOMMU: honor iommu=off instead of amd_iommu=off
This patch removes the amd_iommu=off kernel parameter and honors the generic
iommu=off parameter for the same purpose.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: robert.richter@amd.com
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-04 11:44:41 +02:00
Joerg Roedel
999ba417cc x86, AMD IOMMU: flush domain TLB when there is more than one page to flush
This patch changes the domain TLB flushing behavior of the driver. When there
is more than one page to flush it flushes the whole domain TLB instead of every
single page. So we send only a single command to the IOMMU in every case which
is faster to execute.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: robert.richter@amd.com
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-04 11:44:40 +02:00
Joerg Roedel
5f6a59d8ad x86, AMD IOMMU: remove unnecessary set_bit_string
The set_bit_string call in the address allocator is not necessary because its
already called in iommu_area_alloc().

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: robert.richter@amd.com
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-04 11:44:39 +02:00
Venki Pallipadi
2d144e6309 x86, mce_64.c: mce_cpu_quirks being ignored
Quirks getting ignored was a bug. Below patch fixes the bug, until
we have the dynamic banks support.

Sysfs choice configuration should not have any issues with the earlier patch
as we look for NR_SYSFS_BANKS in do_machine_check().

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Max Asbock <masbock@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-03 15:05:21 +02:00
Ingo Molnar
a8cac81776 Merge commit 'v2.6.26-rc8' into x86/mce 2008-07-03 15:03:02 +02:00
Ben Castricum
bc4e0f9ae2 x86: microcode: cosmetic changes
First announce ourself, then start working. Currently this module reports
itself when all is completed which is not most modules do. Plus some
cosmetic/whitespace cleanups.

Signed-off-by: Ben Castricum <lk0806@bencastricum.nl>
Cc: trivial@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-03 11:37:20 +02:00
Hugh Dickins
216705d272 x86: fix Intel Mac booting with EFI
Fedora reports that mem_init()'s zap_low_mappings(), extended to SMP in
61165d7a03 x86: fix app crashes after SMP
resume causes 32-bit Intel Mac machines to reboot very early when
booting with EFI.

The EFI code appears to manage low mappings for itself when needed; but
like many before it, confuses PSE with PAE.  So it has only been mapping
half the space it needed when PSE but not PAE.  This remained unnoticed
until we moved the SMP zap_low_mappings() before
efi_enter_virtual_mode().  Presumably could have been noticed years ago
if anyone ran a UP kernel on such machines?

Reported-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Glauber Costa <gcosta@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Peter Jones <pjones@redhat.com>
2008-07-03 08:19:18 +02:00
Arnd Bergmann
38c4c97c62 x86-mce: BKL pushdown
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2008-07-02 15:06:26 -06:00
Roland McGrath
45fdc3a762 x86 ptrace: fix PTRACE_GETFPXREGS error
ptrace has always returned only -EIO for all failures to access
registers.  The user_regset calls are allowed to return a more
meaningful variety of errors.  The REGSET_XFP calls use -ENODEV
for !cpu_has_fxsr hardware.  Make ptrace return the traditional
-EIO instead of the error code from the user_regset call.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-01 11:03:31 +02:00
Joerg Roedel
7441e9cb18 x86, AMD IOMMU: disable suspend/resume with IOMMU enabled (for now)
This patch disables suspend/resume on machines with AMD IOMMU enabled. Real
suspend/resume support for AMD IOMMU is currently being worked on. Until this
is ready it will be disabled to avoid data corruption when the IOMMU is not
properly re-enabled at resume. The patch is based on a similar patch for the
GART driver written by Pavel Machek.

The overall driver merged into tip/master is tested with parallel disk and
network loads and showed no problems in a test running for 3 days.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-30 20:35:15 +02:00
Linus Torvalds
bbad5d4750 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  ptrace GET/SET FPXREGS broken
  x86: fix cpu hotplug crash
  x86: section/warning fixes
  x86: shift bits the right way in native_read_tscp
2008-06-30 08:56:57 -07:00
TAKADA Yoshihito
11dbc963a8 ptrace GET/SET FPXREGS broken
When I update kernel 2.6.25 from 2.6.24, gdb does not work.
On 2.6.25, ptrace(PTRACE_GETFPXREGS, ...) returns ENODEV.

But 2.6.24 kernel's ptrace() returns EIO.
It is issue of compatibility.

I attached test program as pt.c and patch for fix it.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <errno.h>
#include <sys/ptrace.h>
#include <sys/types.h>

struct user_fxsr_struct {
	unsigned short	cwd;
	unsigned short	swd;
	unsigned short	twd;
	unsigned short	fop;
	long	fip;
	long	fcs;
	long	foo;
	long	fos;
	long	mxcsr;
	long	reserved;
	long	st_space[32];	/* 8*16 bytes for each FP-reg = 128 bytes */
	long	xmm_space[32];	/* 8*16 bytes for each XMM-reg = 128 bytes */
	long	padding[56];
};

int main(void)
{
  pid_t pid;

  pid = fork();

  switch(pid){
  case -1:/*  error */
    break;
  case 0:/*  child */
    child();
    break;
  default:
    parent(pid);
    break;
  }
  return 0;
}

int child(void)
{
  ptrace(PTRACE_TRACEME);
  kill(getpid(), SIGSTOP);
  sleep(10);
  return 0;
}
int parent(pid_t pid)
{
  int ret;
  struct user_fxsr_struct fpxregs;

  ret = ptrace(PTRACE_GETFPXREGS, pid, 0, &fpxregs);
  if(ret < 0){
    printf("%d: %s.\n", errno, strerror(errno));
  }
  kill(pid, SIGCONT);
  wait(pid);
  return 0;
}

/* in the kerel, at kernel/i387.c get_fpxregs() */

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-30 14:35:18 +02:00
Zhang, Yanmin
fcb43042ef x86: fix cpu hotplug crash
Vegard Nossum reported crashes during cpu hotplug tests:

  http://marc.info/?l=linux-kernel&m=121413950227884&w=4

In function _cpu_up, the panic happens when calling
__raw_notifier_call_chain at the second time. Kernel doesn't panic when
calling it at the first time. If just say because of nr_cpu_ids, that's
not right.

By checking the source code, I found that function do_boot_cpu is the culprit.
Consider below call chain:
 _cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu.

So do_boot_cpu is called in the end. In do_boot_cpu, if
boot_error==true, cpu_clear(cpu, cpu_possible_map) is executed. So later
on, when _cpu_up calls __raw_notifier_call_chain at the second time to
report CPU_UP_CANCELED, because this cpu is already cleared from
cpu_possible_map, get_cpu_sysdev returns NULL.

Many resources are related to cpu_possible_map, so it's better not to
change it.

Below patch against 2.6.26-rc7 fixes it by removing the bit clearing in
cpu_possible_map.

Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-30 13:15:43 +02:00
FUJITA Tomonori
44974c8fc1 x86 gart: remove unnecessary set_bit_string
iommu_area_alloc internally calls set_bit_string and set bits
properly. This set_bit_string is unnecessary.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: joerg.roedel@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-30 10:52:40 +02:00
Ingo Molnar
8594698ebd stacktrace: fix modular build, export print_stack_trace and save_stack_trace
fix:

ERROR: "print_stack_trace" [kernel/backtracetest.ko] undefined!
ERROR: "save_stack_trace" [kernel/backtracetest.ko] undefined!

Signed-off-by: Ingo Molnar <mingo@elte.hu>

and fix:

  Building modules, stage 2.
  MODPOST 376 modules
ERROR: "print_stack_trace" [kernel/backtracetest.ko] undefined!
make[1]: *** [__modpost] Error 1

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-30 09:20:55 +02:00
Vegard Nossum
9d8ad5d6c7 x86: don't destroy %rbp on kernel-mode faults
From the code:

    "B stepping K8s sometimes report an truncated RIP for IRET exceptions
    returning to compat mode. Check for these here too."

The code then proceeds to truncate the upper 32 bits of %rbp. This means
that when do_page_fault() is finally called, its prologue,

    do_page_fault:
        push %rbp
        movl %rsp, %rbp

will put the truncated base pointer on the stack. This means that the
stack tracer will not be able to follow the base-pointer changes and
will see all subsequent stack frames as unreliable.

This patch changes the code to use a different register (%rcx) for the
checking and leaves %rbp untouched.

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 17:45:59 +02:00
Ingo Molnar
127a237a1f fix "smp_call_function: get rid of the unused nonatomic/retry argument"
fix:

arch/x86/kernel/process.c: In function 'cpu_idle_wait':
arch/x86/kernel/process.c:64: error: too many arguments to function 'smp_call_function'

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 11:48:22 +02:00
Ingo Molnar
92af4e2902 x86, AMD IOMMU, build fix #2
fix:

 arch/x86/kernel/amd_iommu.c: In function ‘amd_iommu_init_dma_ops':
 arch/x86/kernel/amd_iommu.c:940: error: lvalue required as left operand of assignment
 arch/x86/kernel/amd_iommu.c:941: error: lvalue required as left operand of assignment

due to !CONFIG_GART_IOMMU.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:52:34 +02:00
Joerg Roedel
a69ca34018 AMD_IOMMU: call detect and initialization functions from dma code
This patch adds the function calls to initialize the AMD IOMMU hardware and
dma_ops to the generic DMA code for the x86 architecture.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:22 +02:00
Joerg Roedel
8736197ba8 x86, AMD IOMMU: initialize dma_ops from IOMMU initialization and enable IOMMUs
This patch adds a call to the driver specific dma_ops initialization routine
from the AMD IOMMU hardware initialization. Further it adds the necessary code
to finally enable AMD IOMMU hardware.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:21 +02:00
Joerg Roedel
c6da992e16 x86, AMD IOMMU: add amd_iommu.h to export functions to the generic x86 dma code
This patch adds the amd_iommu.h file which will be included in the generic
code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:21 +02:00
Joerg Roedel
6631ee9d00 x86, AMD IOMMU: add dma_ops initialization function
This patch adds the function to initialize the dma_ops for the AMD IOMMU.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:20 +02:00
Joerg Roedel
c432f3df8e x86, AMD IOMMU: add pre-allocation of protection domains
This patch adds a function to pre-allocate protection domains. So we don't have
to allocate it on the first request for a device (which can happen in atomic
mode).

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:20 +02:00
Joerg Roedel
5d8b53cf3f x86, AMD IOMMU: add mapping functions for coherent mappings
This patch adds the dma_ops functions for coherent mappings.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:19 +02:00
Joerg Roedel
65b050adbf x86, AMD IOMMU: add mapping functions for scatter gather lists
This patch adds the dma_ops functions for mapping and unmapping scatter gather
lists.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:19 +02:00
Joerg Roedel
4da70b9e4f x86, AMD IOMMU: add dma_ops mapping functions for single mappings
This patch adds the dma_ops specific mapping functions for single mappings.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:18 +02:00
Joerg Roedel
cb76c32297 x86, AMD IOMMU: add generic dma_ops mapping functions
This patch adds the generic functions to map and unmap pages to a protection
domain for dma_ops usage.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:18 +02:00
Joerg Roedel
b20ac0d4d6 x86, AMD IOMMU: add functions to find IOMMU device resources
This patch adds functions necessary to find the IOMMU resources for a specific
device.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:17 +02:00
Joerg Roedel
ec487d1a11 x86, AMD IOMMU: add domain allocation and deallocation functions
This patch adds the functions to allocate and free protection domains for the
IOMMU.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:17 +02:00
Joerg Roedel
d3086444b2 x86, AMD IOMMU: add address allocation and deallocation functions
This patch adds the address allocator to the AMD IOMMU code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:16 +02:00
Joerg Roedel
bd0e521158 x86, AMD IOMMU: add functions to initialize unity mappings
This patch adds the functions which will initialize the unity mappings in the
device page tables.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:16 +02:00
Joerg Roedel
a19ae1eccf x86, AMD IOMMU: add functions to send IOMMU commands
This patch adds generic handling function as well as all functions to send
specific commands to the IOMMU hardware as required by this driver.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:15 +02:00
Joerg Roedel
000fca2dfc x86, AMD IOMMU: add amd_iommu.c to Makefile
This patch adds the new amd_iommu.c file to the architecture kernel Makefile
for x86.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:15 +02:00
Joerg Roedel
b6c02715cf x86, AMD IOMMU: add generic defines and structures for mapping code
This patch adds generic stuff used by the mapping and unmapping code for the
AMD IOMMU.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:14 +02:00
Joerg Roedel
918ad6c54d x86, AMD IOMMU: add kernel command line parameters for AMD IOMMU
This patch adds two parameters to the kernel command line to control behavior
of the AMD IOMMU.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:14 +02:00
Joerg Roedel
ae7877de9c x86, AMD IOMMU: add early detection code
This patch adds code to detect an AMD IOMMU early during boot before the early
detection code of other IOMMUs run.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:13 +02:00
Joerg Roedel
fe74c9cf39 x86, AMD IOMMU: clue initialization code together
This patch puts the AMD IOMMU ACPI table parsing and hardware initialization
functions together to the main intialization routine.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:13 +02:00
Joerg Roedel
be2a022c0d x86, AMD IOMMU: add functions to parse IOMMU memory mapping requirements for devices
This patch adds the functions to parse the information about IOMMU exclusion
ranges and required unity mappings for the devices handled by the IOMMU.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:12 +02:00
Joerg Roedel
e47d402d2d x86, AMD IOMMU: add detect code for AMD IOMMU hardware
This patch adds the detection of AMD IOMMU hardware provided on information
from ACPI provided by the BIOS.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:12 +02:00
Joerg Roedel
5d0c8e49f8 x86, AMD IOMMU: add functions for IOMMU hardware initialization from ACPI
This patch adds functions to initialize the IOMMU hardware with information
from ACPI and PCI.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:11 +02:00
Joerg Roedel
3566b7786a x86, AMD IOMMU: add device table initialization functions
This patch adds functions necessary to initialize the device table from the
ACPI definitions.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:11 +02:00
Joerg Roedel
b36ca91e1d x86, AMD IOMMU: add command buffer (de-)allocation
This patch adds the functions to allocate and deallocate the command buffer for
one IOMMU in the system.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:10 +02:00
Joerg Roedel
b2026aa2dc x86, AMD IOMMU: add functions for programming IOMMU MMIO space
This patch adds the functions required to programm the IOMMU with the MMIO
space.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:10 +02:00
Joerg Roedel
6c56747b46 x86, AMD IOMMU: add functions for mapping/unmapping the MMIO space
This patch contains two functions to map and unmap the MMIO region of an IOMMU.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:09 +02:00
Joerg Roedel
ca7ed057ae x86, AMD IOMMU: add amd_iommu_init.c to Makefile
This patch adds the source file amd_iommu_init.c to the kernel Makefile for the
x86 architecture.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:09 +02:00
Joerg Roedel
3e8064ba59 x86, AMD IOMMU: add functions to find last possible PCI device for IOMMU
This patch adds functions to find the last PCI bus/device/function the IOMMU
driver has to handle. This information is used later to allocate the memory for
the data structures.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:08 +02:00
Joerg Roedel
928abd2545 x86, AMD IOMMU: add data structures to manage the IOMMUs in the system
This patch adds the data structures which will contain the information read
from the ACPI table.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:08 +02:00
Joerg Roedel
f6e2e6b6fc x86, AMD IOMMU: add defines and structures for ACPI scanning code
This patch adds the required data structures and constants required to parse
the ACPI table for the AMD IOMMU.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:07 +02:00
Pavel Machek
fa3d319ac6 pci-gart_64.c: could we get better explanation?
Add better explanation to pci-gart.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Trivial patch monkey <trivial@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-26 15:17:18 +02:00
Jens Axboe
15c8b6c1aa on_each_cpu(): kill unused 'retry' parameter
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:24:38 +02:00
Jens Axboe
8691e5a8f6 smp_call_function: get rid of the unused nonatomic/retry argument
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:24:35 +02:00
Jens Axboe
3b16cf8748 x86: convert to generic helpers for IPI function calls
This converts x86, x86-64, and xen to use the new helpers for
smp_call_function() and friends, and adds support for
smp_call_function_single().

Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:21:54 +02:00
Jeremy Fitzhardinge
08b882c627 paravirt: add hooks for ptep_modify_prot_start/commit
This patch adds paravirt-ops hooks in pv_mmu_ops for ptep_modify_prot_start and
ptep_modify_prot_commit.  This allows the hypervisor-specific backends to
implement these in some more efficient way.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-25 15:16:00 +02:00
Ingo Molnar
f6477cc76c Merge branch 'linus' into x86/timers 2008-06-25 12:36:55 +02:00
Ingo Molnar
8700600a74 Merge branch 'linus' into x86/nmi 2008-06-25 12:31:28 +02:00
Ingo Molnar
cbd6712406 Merge branch 'linus' into x86/irq 2008-06-25 12:30:49 +02:00
Ingo Molnar
48cf937f48 Merge branch 'linus' into x86/i8259 2008-06-25 12:30:33 +02:00
Ingo Molnar
037a6079eb Merge branch 'linus' into x86/gart 2008-06-25 12:30:26 +02:00
Ingo Molnar
8b7ef4ec5b Merge branch 'linus' into x86/fixmap 2008-06-25 12:30:21 +02:00
Ingo Molnar
28f73e51d0 Merge branch 'linus' into x86/delay
Conflicts:

	arch/x86/kernel/tsc_32.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-25 12:30:10 +02:00
Ingo Molnar
1262b0088f Merge branch 'linus' into x86/cleanups 2008-06-25 12:29:32 +02:00
Ingo Molnar
97e6722b8d Merge branch 'linus' into tracing/ftrace 2008-06-25 12:27:56 +02:00
Ingo Molnar
d02859ecb3 Merge commit 'v2.6.26-rc8' into x86/xen
Conflicts:

	arch/x86/xen/enlighten.c
	arch/x86/xen/mmu.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-25 12:16:51 +02:00
Gerd Hoffmann
f6e16d5ad4 x86: KVM guest: Use the paravirt clocksource structs and functions
This patch updates the kvm host code to use the pvclock structs
and functions, thereby making it compatible with Xen.

The patch also fixes an initialization bug: on SMP systems the
per-cpu has two different locations early at boot and after CPU
bringup.  kvmclock must take that in account when registering the
physical address within the host.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24 21:02:33 +03:00
Gerd Hoffmann
7af192c954 x86: Add structs and functions for paravirt clocksource
This patch adds structs for the paravirt clocksource ABI
used by both xen and kvm (pvclock-abi.h).

It also adds some helper functions to read system time and
wall clock time from a paravirtual clocksource (pvclock.[ch]).
They are based on the xen code.  They are enabled using
CONFIG_PARAVIRT_CLOCK.

Subsequent patches of this series will put the code in use.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24 21:02:31 +03:00
WANG Cong
378fc6eedc x86: arch/x86/kernel/machine_kexec_32.c: remove extra semicolons
I can't see any reason to keep these semicolons.

Signed-off-by: WANG Cong <wangcong@zeuux.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 18:37:49 +02:00
Alok Kataria
f3f3149f35 x86: use cpu_khz for loops_per_jiffy calculation, cleanup
As suggested by Ingo, remove all references to tsc from init/calibrate.c

TSC is x86 specific, and using tsc in variable names in a generic file should
be avoided. lpj_tsc is now called lpj_fine, since it is related to fine tuning
of lpj value. Also tsc_rate_*  is called timer_rate_*

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Daniel Hecht <dhecht@vmware.com>
Cc: Tim Mann <mann@vmware.com>
Cc: Zach Amsden <zach@vmware.com>
Cc: Sahil Rihan <srihan@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:53:46 +02:00
Ingo Molnar
6ff10de374 x86: fix "x86: use cpu_khz for loops_per_jiffy calculation"
fix:

arch/x86/kernel/tsc_32.c: In function ‘tsc_init':
arch/x86/kernel/tsc_32.c:421: error: ‘lpj_tsc' undeclared (first use in this function)
arch/x86/kernel/tsc_32.c:421: error: (Each undeclared identifier is reported only once
arch/x86/kernel/tsc_32.c:421: error: for each function it appears in.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 09:29:03 +02:00
Alok Kataria
3da757daf8 x86: use cpu_khz for loops_per_jiffy calculation
On the x86 platform we can use the value of tsc_khz computed during tsc
calibration to calculate the loops_per_jiffy value. Its very important
to keep the error in lpj values to minimum as any error in that may
result in kernel panic in check_timer. In virtualization environment, On
a highly overloaded host the guest delay calibration may sometimes
result in errors beyond the ~50% that timer_irq_works can handle,
resulting in the guest panicking.

Does some formating changes to lpj_setup code to now have a single
printk to print the bogomips value.

We do this only for the boot processor because the AP's can have
different base frequencies or the BIOS might boot a AP at a different
frequency.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Daniel Hecht <dhecht@vmware.com>
Cc: Tim Mann <mann@vmware.com>
Cc: Zach Amsden <zach@vmware.com>
Cc: Sahil Rihan <srihan@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-23 22:51:33 +02:00
Abhishek Sagar
395a59d0f8 ftrace: store mcount address in rec->ip
Record the address of the mcount call-site. Currently all archs except sparc64
record the address of the instruction following the mcount call-site. Some
general cleanups are entailed. Storing mcount addresses in rec->ip enables
looking them up in the kprobe hash table later on to check if they're kprobe'd.

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: davem@davemloft.net
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-23 22:10:56 +02:00
Yinghai Lu
0754557d72 x86: change early_gart_iommu_check() back to any_mapped
Kevin Winchester reported a GART related direct rendering failure against
linux-next-20080611, which shows up via these log entries:

 PCI: Using ACPI for IRQ routing
 PCI: Cannot allocate resource region 0 of device 0000:00:00.0
 agpgart: Detected AGP bridge 0
 agpgart: Aperture conflicts with PCI mapping.
 agpgart: Aperture from AGP @ e0000000 size 128 MB
 agpgart: Aperture conflicts with PCI mapping.
 agpgart: No usable aperture found.
 agpgart: Consider rebooting with iommu=memaper=2 to get a good aperture.

instead of the expected:

 PCI: Using ACPI for IRQ routing
 agpgart: Detected AGP bridge 0
 agpgart: Aperture from AGP @ e0000000 size 128 MB

Kevin bisected it down to this change in tip/x86/gart:
"x86: checking aperture size order".

agp check is using request_mem_region(), and could fail if e820 is reserved...

change it back to e820_any_mapped().

Reported-and-bisected-by: "Kevin Winchester" <kjwinchester@gmail.com>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Tested-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-23 13:12:04 +02:00
Ingo Molnar
f34bfb1bee Merge branch 'linus' into tracing/ftrace 2008-06-23 11:11:42 +02:00
Arnd Bergmann
77149367da microcode: BKL pushdown
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2008-06-20 14:05:59 -06:00
Arnd Bergmann
864fe51671 apm_32: BKL pushdown
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2008-06-20 14:05:54 -06:00
Jeremy Fitzhardinge
aeaaa59c7e x86/paravirt/xen: add set_fixmap pv_mmu_ops
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 15:09:56 +02:00
Jan Beulich
5f0120b578 x86-64: remove unnecessary ptregs call stubs
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: "Andi Kleen" <andi@firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 14:25:11 +02:00
Jordan Crouse
ffe6e1da86 x86, geode: add a VSA2 ID for General Software
General Software writes their own VSA2 module for their version
of the Geode BIOS, which returns a different ID then the standard
VSA2.  This was causing the framebuffer driver to break for most
GSW boards.

Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Cc: tglx@linutronix.de
Cc: linux-geode@lists.infradead.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 14:19:03 +02:00
Glauber Costa
b8e0418b2a x86: fix typo CONFIX -> CONFIG
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 11:59:11 +02:00
Bernhard Walle
d3942cff62 x86: use BOOTMEM_EXCLUSIVE on 32-bit
This patch uses the BOOTMEM_EXCLUSIVE for crashkernel reservation also for
i386 and prints a error message on failure.

The patch is still for 2.6.26 since it is only bug fixing. The unification
of reserve_crashkernel() between i386 and x86_64 should be done for 2.6.27.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
2008-06-19 10:08:48 +02:00
Mikael Pettersson
df17b1d990 x86, 32-bit: fix boot failure on TSC-less processors
Booting 2.6.26-rc6 on my 486 DX/4 fails with a "BUG: Int 6"
(invalid opcode) and a kernel halt immediately after the
kernel has been uncompressed. The BUG shows EIP pointing
to an rdtsc instruction in native_read_tsc(), invoked from
native_sched_clock().

(This error occurs so early that not even the serial console
can capture it.)

A bisection showed that this bug first occurs in 2.6.26-rc3-git7,
via commit 9ccc906c97:

>x86: distangle user disabled TSC from unstable
>
>tsc_enabled is set to 0 from the command line switch "notsc" and from
>the mark_tsc_unstable code. Seperate those functionalities and replace
>tsc_enable with tsc_disable. This makes also the native_sched_clock()
>decision when to use TSC understandable.
>
>Preparatory patch to solve the sched_clock() issue on 32 bit.
>
>Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

The core reason for this bug is that native_sched_clock() gets
called before tsc_init().

Before the commit above, tsc_32.c used a "tsc_enabled" variable
which defaulted to 0 == disabled, and which only got enabled late
in tsc_init(). Thus early calls to native_sched_clock() would skip
the TSC and use jiffies instead.

After the commit above, tsc_32.c uses a "tsc_disabled" variable
which defaults to 0, meaning that the TSC is Ok to use. Early calls
to native_sched_clock() now erroneously try to use the TSC on
!cpu_has_tsc processors, leading to invalid opcode exceptions.

My proposed fix is to initialise tsc_disabled to a "soft disabled"
state distinct from the hard disabled state set up by the "notsc"
kernel option. This fixes the native_sched_clock() problem. It also
allows tsc_init() to be simplified: instead of setting tsc_disabled = 1
on every error return, we just set tsc_disabled = 0 once when all
checks have succeeded.

I've verified that this lets my 486 boot again. I've also verified
that a Core2 machine still uses the TSC as clocksource after the patch.

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 10:08:47 +02:00
Suresh Siddha
75118a82e2 x86: fix NULL pointer deref in __switch_to
Patrick McHardy reported a crash:

> > I get this oops once a day, its apparently triggered by something
> > run by cron, but the process is a different one each time.
> >
> > Kernel is -git from yesterday shortly before the -rc6 release
> > (last commit is the usb-2.6 merge, the x86 patches are missing),
> > .config is attached.
> >
> > I'll retry with current -git, but the patches that have gone in
> > since I last updated don't look related.
> >
> > [62060.043009] BUG: unable to handle kernel NULL pointer dereference at
> > 000001ff
> > [62060.043009] IP: [<c0102a9b>] __switch_to+0x2f/0x118
> > [62060.043009] *pde = 00000000
> > [62060.043009] Oops: 0002 [#1] PREEMPT

Vegard Nossum analyzed it:

> This decodes to
>
>    0:   0f ae 00                fxsave (%eax)
>
> so it's related to the floating-point context. This is the exact
> location of the crash:
>
> $ addr2line -e arch/x86/kernel/process_32.o -i ab0
> include/asm/i387.h:232
> include/asm/i387.h:262
> arch/x86/kernel/process_32.c:595
>
> ...so it looks like prev_task->thread.xstate->fxsave has become NULL.
> Or maybe it never had any other value.

Somehow (as described below) TS_USEDFPU is set but the fpu is not
allocated or freed.

Another possible FPU pre-emption issue with the sleazy FPU optimization
which was benign before but not so anymore, with the dynamic FPU allocation
patch.

New task is getting exec'd and it is prempted at the below point.

flush_thread() {
	...
	/*
	* Forget coprocessor state..
	*/
	clear_fpu(tsk);
		<----- Preemption point
	clear_used_math();
	...
}

Now when it context switches in again, as the used_math() is still set
and fpu_counter can be > 5, we will do a math_state_restore() which sets
the task's TS_USEDFPU. After it continues from the above preemption point
it does clear_used_math() and much later free_thread_xstate().

Now, at the next context switch, it is quite possible that xstate is
null, used_math() is not set and TS_USEDFPU is still set. This will
trigger unlazy_fpu() causing kernel oops.

Fix this  by clearing tsk's fpu_counter before clearing task's fpu.

Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 10:08:45 +02:00
Paolo Ciarrocchi
219835f10e x86: coding style fixes to x86/kernel/cpu/cpufreq/cpufreq-nforce2.c
Before:
total: 22 errors, 8 warnings, 440 lines checked

After:
total: 0 errors, 8 warnings, 442 lines checked

paolo@paolo-desktop:~/linux.trees.git$ md5sum /tmp/cpufreq-nforce2.o.*
3d4330a5d188fe904446e5948a618b48  /tmp/cpufreq-nforce2.o.after
1477e6b0dcd6f59b1fb6b4490042eca6  /tmp/cpufreq-nforce2.o.before
^^^ I guess this is because I fixed a few "do not initialise statics to 0 or NULL"

paolo@paolo-desktop:~/linux.trees.git$ size /tmp/cpufreq-nforce2.o.*
   text    data     bss     dec     hex filename
   1923      72      16    2011     7db /tmp/cpufreq-nforce2.o.after
   1923      72      16    2011     7db /tmp/cpufreq-nforce2.o.before

Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-18 15:00:14 +02:00
Paolo Ciarrocchi
5175676a2d x86: coding style fixes to arch/x86/kernel/cpu/mcheck/k7.c
Before:
total: 6 errors, 13 warnings, 105 lines checked

After:
total: 0 errors, 0 warnings, 105 lines checked

paolo@paolo-desktop:~/linux.trees.git$ size /tmp/k7*
   text    data     bss     dec     hex filename
   1135       0       0    1135     46f /tmp/k7.o.after
   1135       0       0    1135     46f /tmp/k7.o.before

paolo@paolo-desktop:~/linux.trees.git$ md5sum /tmp/k7*
87b14954045aa37dbaee6fb7e022ed9a  /tmp/k7.o.after
87b14954045aa37dbaee6fb7e022ed9a  /tmp/k7.o.before

Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-18 15:00:12 +02:00
Paolo Ciarrocchi
fe94ae995d x86: coding style fixes to arch/x86/kernel/cpu/mcheck/p4.c
Before:
total: 16 errors, 34 warnings, 257 lines checked

After:
total: 0 errors, 2 warnings, 257 lines checked

No changes in the compiled code:

paolo@paolo-desktop:~/linux.trees.git$ size /tmp/p4*
   text    data     bss     dec     hex filename
   2644       4       4    2652     a5c /tmp/p4.o.after
   2644       4       4    2652     a5c /tmp/p4.o.before

paolo@paolo-desktop:~/linux.trees.git$ md5sum /tmp/p4*
13f1b21c4246b31a28aaff38184586ca  /tmp/p4.o.after
13f1b21c4246b31a28aaff38184586ca  /tmp/p4.o.before

Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-18 15:00:11 +02:00
Jiri Slaby
e17ba73b0e x86, generic: mark early_printk as asmlinkage
It's not explicitly marked as asmlinkage, but invoked from x86_32
startup code with parameters on stack.

No other architectures define early_printk and none of them are affected
by this change, since defines asmlinkage as empty token.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-18 13:11:01 +02:00
Daniel Rahn
b4b3bd96f2 x86: correctly report NR_BANKS in mce_64.c
attached is a no-brainer that makes kernel correctly report
NR_BANKS for MCE. We are right now limited to NR_BANKS==6, but the
error message will use the available number of banks instead of the
defined maximum.

For a Nehalem based system it will print:

"MCE: warning: using only 9 banks"

while the correct message would be

"MCE: warning: using only 6 banks"

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-18 10:29:58 +02:00
Ingo Molnar
ee4311adf1 ftrace: build fix with gcc 4.3
fix:

arch/x86/kernel/ftrace.c: Assembler messages:
arch/x86/kernel/ftrace.c:82: Error: bad register name `%sil'
make[1]: *** [arch/x86/kernel/ftrace.o] Error 1

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-17 17:43:53 +02:00
Jesse Barnes
15650a2f64 x86/PCI: fixup early quirk probing
On x86, we do early PCI probing to apply some quirks for chipset bugs.
However, in a recent cleanup (7bcbc78dea) a
thinko was introduced that causes us to probe all subfunctions of even single
function devices (a function was factored out of an inner loop and a "break"
became a "return").  Fix that up by making check_dev_quirk() return a value so
we can keep the factored code intact.

Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-16 15:29:45 -07:00
Ingo Molnar
c54f9da1c8 Merge branch 'linus' into x86/irqstats 2008-06-16 11:27:53 +02:00
Ingo Molnar
d939d2851f Merge branch 'linus' into x86/irq 2008-06-16 11:27:45 +02:00
Ingo Molnar
33ee375b2e Merge branch 'linus' into x86/gart 2008-06-16 11:27:18 +02:00
Ingo Molnar
6d72b7952f Merge branch 'linus' into core/rodata 2008-06-16 11:24:00 +02:00
Ingo Molnar
688d22e23a Merge branch 'linus' into x86/xen 2008-06-16 11:21:27 +02:00
Ingo Molnar
fd2c17e177 Merge branch 'linus' into x86/timers 2008-06-16 11:20:57 +02:00
Ingo Molnar
faeca31d06 Merge branch 'linus' into x86/pat 2008-06-16 11:20:28 +02:00
Ingo Molnar
1791a78c0b Merge branch 'linus' into x86/cleanups 2008-06-16 11:17:50 +02:00
Ingo Molnar
e765ee90da Merge branch 'linus' into tracing/ftrace 2008-06-16 11:15:58 +02:00
Ingo Molnar
28638ea4f8 Merge branch 'linus' into x86/nmi
Conflicts:

	arch/x86/kernel/nmi_32.c
2008-06-16 10:17:15 +02:00
Linus Torvalds
0269c5c6d9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: fixup write combine comment in pci_mmap_resource
  x86: PAT export resource_wc in pci sysfs
  x86, pci-dma.c: don't always add __GFP_NORETRY to gfp
  suspend-vs-iommu: prevent suspend if we could not resume
  x86: pci-dma.c: use __GFP_NO_OOM instead of __GFP_NORETRY
  pci, x86: add workaround for bug in ASUS A7V600 BIOS (rev 1005)
  PCI: use dev_to_node in pci_call_probe
  PCI: Correct last two HP entries in the bfsort whitelist
2008-06-14 13:32:56 -07:00
Stas Sergeev
1da2e3d679 provide rtc_cmos platform device
Recently (around 2.6.25) I've noticed that RTC no longer works for me.  It
turned out this is because I use pnpacpi=off kernel option to work around
the parport_pc bugs.  I always did so, but RTC used to work fine in the
past, and now it have regressed.

The patch fixes the problem by creating the platform device for the RTC
when PNP is disabled.  This may also help running the PNP-enabled kernel
on an older PCs.

Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Cc: David Brownell <david-b@pacbell.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Adam Belay <ambx1@neo.rr.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-12 18:05:42 -07:00
Rafael J. Wysocki
d8f3de0d24 Suspend-related patches for 2.6.27
ACPI PM: Add possibility to change suspend sequence

There are some systems out there that don't work correctly with
our current suspend/hibernation code ordering.  Provide a workaround
for these systems allowing them to pass 'acpi_sleep=old_ordering' in
the kernel command line so that it will use the pre-ACPI 2.0 ("old")
suspend code ordering.

Unfortunately, this requires us to add a platform hook to the
resuming of devices for recovering the platform in case one of the
device drivers' .suspend() routines returns error code.  Namely,
ACPI 1.0 specifies that _PTS should be called before suspending
devices, but _WAK still should be called before resuming them in
order to undo the changes made by _PTS.  However, if there is an
error during suspending devices, they are automatically resumed
without returning control to the PM core, so the _WAK has to be
called from within device_resume() in that cases.

The patch also reorders and refactors the ACPI suspend/hibernation
code to avoid duplication as far as reasonably possible.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-12 14:25:09 -07:00
Jesse Barnes
883eed1b3e Merge branch 'pci-for-jesse' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip into for-linus 2008-06-12 13:51:05 -07:00
Vegard Nossum
4461145ef1 x86, lockdep: fix "WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()"
Alessandro Suardi reported:
> Recently upgraded my FC6 desktop to Fedora 9; with the
>  latest nautilus RPM updates my VNC session went nuts
>  with nautilus pegging the CPU for everything that breathed.
>
> I now reverted to an earlier nautilus package, but during
>  the peak CPU period my kernel spat this:
>
> [314185.623294] ------------[ cut here ]------------
> [314185.623414] WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()
> [314185.623514] Modules linked in: iptable_filter ip_tables x_tables
> sunrpc ipv6 fuse snd_via82xx snd_ac97_codec ac97_bus snd_mpu401_uart
> snd_rawmidi via686a hwmon parport_pc sg parport uhci_hcd ehci_hcd
> [314185.623924] Pid: 12314, comm: nautilus Not tainted 2.6.26-rc5-git2 #4
> [314185.624021]  [<c0115b95>] warn_on_slowpath+0x41/0x7b
> [314185.624021]  [<c010de70>] ? do_page_fault+0x2c1/0x5fd
> [314185.624021]  [<c0128396>] ? up_read+0x16/0x28
> [314185.624021]  [<c010de70>] ? do_page_fault+0x2c1/0x5fd
> [314185.624021]  [<c012fa33>] ? __lock_acquire+0xbb4/0xbc3
> [314185.624021]  [<c012d0a0>] check_flags+0x4c/0x128
> [314185.624021]  [<c012fa73>] lock_acquire+0x31/0x7d
> [314185.624021]  [<c0128cf6>] __atomic_notifier_call_chain+0x30/0x80
> [314185.624021]  [<c0128cc6>] ? __atomic_notifier_call_chain+0x0/0x80
> [314185.624021]  [<c0128d52>] atomic_notifier_call_chain+0xc/0xe
> [314185.624021]  [<c0128d81>] notify_die+0x2d/0x2f
> [314185.624021]  [<c01043b0>] do_int3+0x1f/0x4d
> [314185.624021]  [<c02f2d3b>] int3+0x27/0x2c
> [314185.624021]  =======================
> [314185.624021] ---[ end trace 1923f65a2d7bb246 ]---
> [314185.624021] possible reason: unannotated irqs-off.
> [314185.624021] irq event stamp: 488879
> [314185.624021] hardirqs last  enabled at (488879): [<c0102d67>]
> restore_nocheck+0x12/0x15
> [314185.624021] hardirqs last disabled at (488878): [<c0102dca>]
> work_resched+0x19/0x30
> [314185.624021] softirqs last  enabled at (488876): [<c011a1ba>]
> __do_softirq+0xa6/0xac
> [314185.624021] softirqs last disabled at (488865): [<c010476e>]
> do_softirq+0x57/0xa6
>
> I didn't seem to find it with some googling, so here it is.
>
> I was incidentally ltracing that process to try and find out
>  what was gulping down that much CPU (sorry, no idea
>  whether ltrace and the WARNING happened at the same
>  time or which came first) and:

Yeah, this is extremely likely to be the source of the warning.

The warning should be harmless, however.

> Box is my trusty noname K7-800, 512MB RAM; if there's
>  anything else useful I might be able to provide, just ask.

It would be interesting to see where the int3 comes from.  Too bad,
lockdep doesn't provide the register dump. The stacktrace also doesn't
go further than the int3(), I wonder if this int3 came from userspace?
The ltrace readme says "software breakpoints, like gdb", so I guess
this is the case. Yep, seems like it.

This looks relevant:

| commit fb1dac909d
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date:   Wed Jan 16 09:51:59 2008 +0100
|
|     lockdep: more hardirq annotations for notify_die()

I'm attaching a similarly-looking patch for this case (DO_VM86_ERROR),
though I suspect it might be missing for the other cases
(DO_ERROR/DO_ERROR_INFO) as well.

Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:27:19 +02:00
Peter Zijlstra
e32e58a96d x86: fix lockdep warning during suspend-to-ram
Andrew Morton wrote:

> I've been seeing the below for a long time during suspend-to-ram on the Vaio.
>
>
> PM: Syncing filesystems ... done.
> PM: Preparing system for mem sleep
> Freezing user space processes ... <4>------------[ cut here ]------------
> WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x127()
> Modules linked in: i915 drm ipw2200 sonypi ipv6 autofs4 hidp l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables acpi_cpufreq nvram ohci1394 ieee1394 ehci_hcd uhci_hcd sg joydev snd_hda_intel snd_seq_dummy sr_mod snd_seq_oss cdrom snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ieee80211 pcspkr ieee80211_crypt snd_pcm i2c_i801 snd_timer i2c_core ide_pci_generic piix snd soundcore snd_page_alloc button ext3 jbd ide_disk ide_core [last unloaded: ipw2200]
> Pid: 3250, comm: zsh Not tainted 2.6.26-rc5 #1
>  [<c011c5f5>] warn_on_slowpath+0x41/0x6d
>  [<c01080e6>] ? native_sched_clock+0x82/0x96
>  [<c013789c>] ? mark_held_locks+0x41/0x5c
>  [<c0315688>] ? _spin_unlock_irqrestore+0x36/0x58
>  [<c0137a29>] ? trace_hardirqs_on+0xe6/0x10d
>  [<c0138637>] ? __lock_acquire+0xae3/0xb2b
>  [<c0313413>] ? schedule+0x39b/0x3b4
>  [<c0135596>] check_flags+0x4c/0x127
>  [<c01386b9>] lock_acquire+0x3a/0x86
>  [<c0315075>] _spin_lock+0x26/0x53
>  [<c0140660>] ? refrigerator+0x13/0xc3
>  [<c0140660>] refrigerator+0x13/0xc3
>  [<c012684a>] get_signal_to_deliver+0x3c/0x31e
>  [<c0102fe7>] do_notify_resume+0x91/0x6ee
>  [<c01359fd>] ? lock_release_holdtime+0x50/0x56
>  [<c0315688>] ? _spin_unlock_irqrestore+0x36/0x58
>  [<c0235d24>] ? read_chan+0x0/0x58c
>  [<c0137a29>] ? trace_hardirqs_on+0xe6/0x10d
>  [<c0315694>] ? _spin_unlock_irqrestore+0x42/0x58
>  [<c0230afa>] ? tty_ldisc_deref+0x5c/0x63
>  [<c0233104>] ? tty_read+0x66/0x98
>  [<c014b3f0>] ? audit_syscall_exit+0x2aa/0x2c5
>  [<c0109430>] ? do_syscall_trace+0x6b/0x16f
>  [<c0103a9c>] work_notifysig+0x13/0x1b
>  =======================
> ---[ end trace 25b49fe59a25afa5 ]---
> possible reason: unannotated irqs-off.
> irq event stamp: 58919
> hardirqs last  enabled at (58919): [<c0103afd>] syscall_exit_work+0x11/0x26

Joy - I so love entry.S

Best I can make of it:

syscall_exit_work
  resume_userspace
    DISABLE_INTERRUPTS
    (no TRACE_IRQS_OFF)
      work_pending
        work_notifysig
          do_notify_resume()
            do_signal()
              get_signal_to_deliver()
                try_to_freeze()
                  refrigerator()
                    task_lock() -> check_flags() -> BANG

The normal path is:

syscall_exit_work
  resume_userspace
    DISABLE_INTERRUPTS
    restore_all
      TRACE_IRQS_IRET
      iret

No idea why that would not warn..

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:27:09 +02:00
Ingo Molnar
0b6a39f7eb Revert "x86: fix ioapic bug again"
This reverts commit 6e908947b4.

Németh Márton reported:

| there is a problem in 2.6.26-rc3 which was not there in case of
| 2.6.25: the CPU wakes up ~90,000 times per sec instead of ~60 per sec.
|
| I also "git bisected" the problem, the result is:
|
| 6e908947b4 is first bad commit
| commit 6e908947b4
| Author: Ingo Molnar <mingo@elte.hu>
| Date:   Fri Mar 21 14:32:36 2008 +0100
|
|     x86: fix ioapic bug again

the original problem is fixed by Maciej W. Rozycki in the tip/x86/apic
branch (confirmed by Márton), but those changes are too intrusive for
v2.6.26 so we'll go for the less intrusive (repeated) revert now.

Reported-and-bisected-by: Németh Márton <nm127@freemail.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:26:28 +02:00
Joe Korty
86b2b70e15 x86: fix asm warning in head_32.S
On Mon, May 19, 2008 at 04:10:02PM -0700, Linus Torvalds wrote:
> It also causes these warnings on 32-bit PAE:
>
> 	  AS      arch/x86/kernel/head_32.o
> 	arch/x86/kernel/head_32.S: Assembler messages:
> 	arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
> 	arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed
>
> and I do not see why (the end result seems to be identical).

Fix head_32.S gcc bignum warnings when CONFIG_PAE=y.

    arch/x86/kernel/head_32.S: Assembler messages:
    arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
    arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed

The assembler was stumbling over the 64-bit constant 0x100000000 in the
KPMDS #define.

Testing: a cmp(1) on head_32.o before and after shows the binary is unchanged.

Signed-off-by: Joe Korty <joe.korty@ccur.com
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Gabriel C <nix.or.die@googlemail.com>
Cc: Keith Packard <keithp@keithp.com>
Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: "Siddha Suresh B" <suresh.b.siddha@intel.com>
Cc: bugme-daemon@bugzilla.kernel.org
Cc: airlied@linux.ie
Cc: "Barnes Jesse" <jesse.barnes@intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:26:12 +02:00
Ingo Molnar
3703f39965 geode: fix modular build
-tip testing found this build bug:

 MODPOST 331 modules
 ERROR: "geode_mfgpt_toggle_event" [drivers/watchdog/geodewdt.ko] undefined!
 ERROR: "geode_mfgpt_alloc_timer" [drivers/watchdog/geodewdt.ko] undefined!
 make[1]: *** [__modpost] Error 1
 make: *** [modules] Error 2

with this config:

  http://redhat.com/~mingo/misc/config-Wed_Jun__4_18_01_59_CEST_2008.bad

export those symbols.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:25:51 +02:00
Cyrill Gorcunov
f781b03c4b x86: touch_nmi_watchdog(): reset alert counters for supported nmi_watchdog modes only
The checking 'if nmi_watchdog > 0' (ie NMI_NONE) is quite fast but it
has a side effect - it's taken even if nmi_watchdog = NMI_DISABLED.

Nowadays nmi_watchdog is set up to NMI_NONE by default so this condition
is properly taken most the time but we better show this explicitly.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 15:14:54 +02:00
Rafael J. Wysocki
6703f6d10d x86, gart: add resume handling
If GART IOMMU is used on an AMD64 system, the northbridge registers
related to it should be restored during resume so that memory is not
corrupted.  Make gart_resume() handle that as appropriate.

Ref. http://lkml.org/lkml/2008/5/25/96 and the following thread.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 14:11:25 +02:00
Ingo Molnar
bb6dfb32f9 Merge branch 'linus' into x86/gart 2008-06-12 11:27:22 +02:00
Andreas Herrmann
cd7a4e936d x86: PAT: fixed checkpatch errors (and whitespaces)
x86: PAT: fixed checkpatch errors (and whitespaces)

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:14:24 +02:00
Andreas Herrmann
97cfab6ac4 x86: PAT: fix ambiguous paranoia check in pat_init()
Starting with commit 8d4a430085 (x86:
cleanup PAT cpu validation) the PAT CPU feature flag is not cleared
anymore. Now the error message

  "PAT enabled, but CPU feature cleared"

in pat_init() is misleading.

Furthermore the current code does not check for existence of the PAT
CPU feature flag if a CPU is whitelisted in validate_pat_support.

This patch clears pat_wc_enabled if boot CPU has no PAT feature flag
and adapts the paranoia check.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:14:22 +02:00
Andreas Herrmann
ee863ba7ab x86: unconditionally enable PAT for AMD CPUs
If PAT support is advertised it should just work. No errata known.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:14:20 +02:00
Yinghai Lu
e3f2baebf4 PCI/x86: early dump pci conf space v2
Allows us to dump PCI space before any kernel changes have been made.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-10 10:59:52 -07:00
Rafael J. Wysocki
1eede070a5 Introduce new top level suspend and hibernation callbacks
Introduce 'struct pm_ops' and 'struct pm_ext_ops' ('ext' meaning
'extended') representing suspend and hibernation operations for bus
types, device classes, device types and device drivers.

Modify the PM core to use 'struct pm_ops' and 'struct pm_ext_ops'
objects, if defined, instead of the ->suspend(), ->resume(),
->suspend_late(), and ->resume_early() callbacks (the old callbacks
will be considered as legacy and gradually phased out).

The main purpose of doing this is to separate suspend (aka S2RAM and
standby) callbacks from hibernation callbacks in such a way that the
new callbacks won't take arguments and the semantics of each of them
will be clearly specified.  This has been requested for multiple
times by many people, including Linus himself, and the reason is that
within the current scheme if ->resume() is called, for example, it's
difficult to say why it's been called (ie. is it a resume from RAM or
from hibernation or a suspend/hibernation failure etc.?).

The second purpose is to make the suspend/hibernation callbacks more
flexible so that device drivers can handle more than they can within
the current scheme.  For example, some drivers may need to prevent
new children of the device from being registered before their
->suspend() callbacks are executed or they may want to carry out some
operations requiring the availability of some other devices, not
directly bound via the parent-child relationship, in order to prepare
for the execution of ->suspend(), etc.

Ultimately, we'd like to stop using the freezing of tasks for suspend
and therefore the drivers' suspend/hibernation code will have to take
care of the handling of the user space during suspend/hibernation.
That, in turn, would be difficult within the current scheme, without
the new ->prepare() and ->complete() callbacks.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-10 10:59:50 -07:00
Thomas Gleixner
00dba56465 x86: move more common idle functions/variables to process.c
more unification. Should cause no change in functionality.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 15:52:29 +02:00
Thomas Gleixner
09fd4b4ef5 x86: use cpuid to check MWAIT support for C1
cpuid(0x05) provides extended information about MWAIT in EDX when bit
0 of ECX is set. Bit 4-7 of EDX determine whether MWAIT is supported
for C1. C1E enabled CPUs have these bits set to 0.

Based on an earlier patch from Andi Kleen.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 15:52:21 +02:00
Thomas Gleixner
732d7be17b x86: use cpuinfo to check for interrupt pending message msr
Simplify code: no need to do a cpuid(1) again. The cpuinfo structure
has all necessary information already.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 15:52:14 +02:00
Thomas Gleixner
aa83f3f2cf x86: cleanup C1E enabled detection
Rename the "MSR_K8_ENABLE_C1E" MSR to INT_PENDING_MSG, which is the
name in the data sheet as well. Move the C1E mask to the header file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 15:52:07 +02:00
Thomas Gleixner
6ddd2a2794 x86: simplify idle selection
default_idle is selected in cpu_idle(), when no other idle routine is
selected. Select it in select_idle_routine() when mwait is not
selected.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 15:52:01 +02:00
Paolo Ciarrocchi
5b80fe8bd7 x86: coding style fixes to arch/x86/kernel/sys_i386_32.c
Before:
total: 16 errors, 25 warnings, 246 lines checked

After:
total: 0 errors, 7 warnings, 246 lines checked

Compile tested.

paolo@paolo-desktop:/tmp$ size sys*
   text    data     bss     dec     hex filename
   1209       0       0    1209     4b9 sys_i386_32.o.after
   1209       0       0    1209     4b9 sys_i386_32.o.before

paolo@paolo-desktop:/tmp$ md5sum sys*
6144f6d6ce7342c3e192681a1ccaa1c1  sys_i386_32.o.after
6144f6d6ce7342c3e192681a1ccaa1c1  sys_i386_32.o.before

Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 12:34:54 +02:00
Robert Richter
9e26d84273 fix build bug in "x86: add PCI extended config space access for AMD Barcelona"
Also much less code now.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 12:32:53 +02:00
Miquel van Smoorenburg
b7f09ae583 x86, pci-dma.c: don't always add __GFP_NORETRY to gfp
Currently arch/x86/kernel/pci-dma.c always adds __GFP_NORETRY
to the allocation flags, because it wants to be reasonably
sure not to deadlock when calling alloc_pages().

But really that should only be done in two cases:
- when allocating memory in the lower 16 MB DMA zone.
  If there's no free memory there, waiting or OOM killing is of no use
- when optimistically trying an allocation in the DMA32 zone
  when dma_mask < DMA_32BIT_MASK hoping that the allocation
  happens to fall within the limits of the dma_mask

Also blindly adding __GFP_NORETRY to the the gfp variable might
not be a good idea since we then also use it when calling
dma_ops->alloc_coherent(). Clearing it might also not be a
good idea, dma_alloc_coherent()'s caller might have set it
on purpose. The gfp variable should not be clobbered.

[ mingo@elte.hu: converted to delta patch ontop of previous version. ]

Signed-off-by: Miquel van Smoorenburg <miquels@cistron.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 12:22:18 +02:00
Andreas Herrmann
668231141f x86: fix compile warning in io_apic_{32,64}.c
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 12:21:10 +02:00
Abhishek Sagar
1d74f2a0f6 ftrace: remove ftrace_ip_converted()
Remove the unneeded function ftrace_ip_converted().

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:57:49 +02:00
Yinghai Lu
6780711e98 x86: update mptable, fix
need to call early_reserve_e820() to preallocate mptable for 32bit

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:35:04 +02:00
Yinghai Lu
d49c428840 x86: make generic arch support NUMAQ
... so it could fall back to normal numa and we'd reduce the impact of the
NUMAQ subarch.

NUMAQ depends on GENERICARCH
also decouple genericarch numa from acpi.
also make it fall back to bigsmp if apicid > 8.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:34:42 +02:00
Ingo Molnar
4e78c91abe Revert "x86, numaq: add pci_acpi_scan_root() stub"
This reverts commit f329469097.

That bug will be fixed in a better way via:

  x86: make generic arch support NUMAQ
2008-06-10 11:33:01 +02:00
Yinghai Lu
e0da336468 x86: introduce max_physical_apicid for bigsmp switching
a multi-socket test-system with 3 or 4 ioapics, when 4 dualcore cpus or
2 quadcore cpus installed, needs to switch to bigsmp or physflat.

CPU apic id is [4,11] instead of [0,7], and we need to check max apic
id instead of cpu numbers.

also add check for 32 bit when acpi is not compiled in or acpi=off.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:32:09 +02:00
Yinghai Lu
c3ff01672a x86: fix boot failure with 64GB+ system with numa 32-bit
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:31:59 +02:00
Yinghai Lu
db3660c190 x86: remove all active memory ranges before registering them again after trimming - 64bit
this way we keep the early_node_map all right.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:30:47 +02:00
Ingo Molnar
2377494285 Revert "x86, 32-bit: SRAT fix"
This reverts commit ea57a5a6db, a better
fix will be merged.
2008-06-09 10:57:16 +02:00
Ingo Molnar
ea57a5a6db x86, 32-bit: SRAT fix
we were adding reserved BIOS ranges as general memory as well => not good.

solves this crash:

[   20.068075] hostname used greatest stack depth: 6464 bytes left
[   20.121404] BUG: unable to handle kernel <1>BUG: unable to handle kernel NULL pointer dereference at 00000b8c
[   20.121404] IP: [<c01160ae>] kmap_atomic_prot+0x2d/0x1c3
[   20.121404] *pdpt = 00000000367eb001 *pde = 0000000000000000
[   20.121404] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   20.121404]
[   20.121404] Pid: 2061, comm: rc.sysinit Not tainted (2.6.26-rc3 #2440)
[   20.121404] EIP: 0060:[<c01160ae>] EFLAGS: 00010206 CPU: 0
[   20.121404] EIP is at kmap_atomic_prot+0x2d/0x1c3

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-06 16:38:35 +02:00
mingo@elte.hu
75b9f5d2a0 x86, nmi: fix build
fix:

arch/x86/kernel/built-in.o: In function `proc_nmi_enabled':
: undefined reference to `nmi_watchdog_default'
arch/x86/kernel/built-in.o: In function `native_smp_prepare_cpus':
: undefined reference to `nmi_watchdog_default'

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-05 15:15:10 +02:00
Cyrill Gorcunov
1a1b1d1322 x86: watchdog - check for CPU is being supported
This patch does check if CPU is being recongnized
before call the unreserve(). Since enable_lapic_nmi_watchdog()
does have such a check the same is make sense here too
in a sake of code consistency (but nothing more).

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: macro@linux-mips.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:14:14 +02:00
Cyrill Gorcunov
3ed3f06295 x86: nmi - consolidate nmi_watchdog_default for 32bit mode
64bit mode bootstrap code does set nmi_watchdog to NMI_NONE
by default and doing the same on 32bit mode is safe too.
Such an action saves us from several #ifdef.

Btw, my previous commit

commit 19ec673ced
Author: Cyrill Gorcunov <gorcunov@gmail.com>
Date:   Wed May 28 23:00:47 2008 +0400

    x86: nmi - fix incorrect NMI watchdog used by default

did not fix the problem completely, moreover it
introduced additional bug - nmi_watchdog would be
set to either NMI_LOCAL_APIC or NMI_IO_APIC
_regardless_ to boot option if being enabled thru
/proc/sys/kernel/nmi_watchdog. Sorry for that.
Fix it too.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: macro@linux-mips.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:13:59 +02:00
Krzysztof Oledzki
74e411cb64 x86: add another PCI ID for ICH6 force hpet.
Tested on Asus P5GDC-V

$ lspci -n -n |grep ISA
00:1f.0 ISA bridge [0601]: Intel Corporation 82801FB/FR (ICH6/ICH6R) LPC Interface Bridge [8086:2640] (rev 03)

Force enabled HPET at base address 0xfed00000
hpet clockevent registered
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
hpet0: 3 64-bit timers, 14318180 Hz

Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:11:35 +02:00
Huang, Ying
c45a707dbe x86: linked list of setup_data for i386
This patch adds linked list of struct setup_data supported for i386.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:02 +02:00
Huang, Ying
0c51a965ed x86: extract common part of head32.c and head64.c into head.c
This patch extracts the common part of head32.c and head64.c into head.c.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:02 +02:00
Huang, Ying
ecacf09f7d x86: reserve EFI memory map with reserve_early
This patch reserves the EFI memory map with reserve_early(). Because EFI
memory map is allocated by bootloader, if it is not reserved by
reserved_early(), it may be overwritten through address returned by
find_e820_area().

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:02 +02:00
Huang, Ying
d0ec2c6f2c x86: reserve highmem pages via reserve_early
This patch makes early reserved highmem pages become reserved
pages. This can be used for highmem pages allocated by bootloader such
as EFI memory map, linked list of setup_data, etc.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:02 +02:00
Huang, Ying
d3fbe5ea95 x86: split out common code into find_overlapped_early()
This patch clean up reserve_early() family functions by extracting the
common part of reserve_early(), free_early() and bad_addr() into
find_overlapped_early().

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:01 +02:00
Pavel Machek
4f384f8bcd x86: aperture_64.c: corner case wrong
If

fix == 0, aper_enabled == 1, gart_fix_e820 == 0

	if (!fix && !aper_enabled)
		return;

	if (gart_fix_e820 && !fix && aper_enabled) {
		if (e820_any_mapped(aper_base, aper_base + aper_size,
				    E820_RAM)) {
			/* reserve it, so we can reuse it in second kernel */
			printk(KERN_INFO "update e820 for GART\n");
			add_memory_region(aper_base, aper_size, E820_RESERVED);
			update_e820();
		}
		return;
	}

	/* different nodes have different setting, disable them all atfirst*/

we'll fall back here and disable all the settings, even when they were
all consistent.

What about this? (I hope it compiles...)

Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 13:59:13 +02:00
Pavel Machek
fa5b8a30cf aperture_64.c: duplicated code, buggy?
Hi!

void __init early_gart_iommu_check(void)

contains

	for (num = 24; num < 32; num++) {
		if (!early_is_k8_nb(read_pci_config(0, num, 3, 0x00)))
			continue;

loop, with very similar loop duplicated in

void __init gart_iommu_hole_init(void)

. First copy of a loop seems to be buggy, too. It uses 0 as a "nothing
set" value, which may actually bite us in last_aper_enabled case
(because it may be often zero).

(Beware, it is hard to test this patch, because this code has about
2^8 different code paths, depending on hardware and cmdline settings).

Plus, the second loop does not check for consistency of
aper_enabled. Should it?

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 13:58:42 +02:00
Yinghai Lu
bd70e522af x86: e820 max_arch_pfn typo fix for 64 bit
should use right shift

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-04 23:15:33 +02:00
Suresh Siddha
870568b390 x86, fpu: fix CONFIG_PREEMPT=y corruption of application's FPU stack
Jürgen Mell reported an FPU state corruption bug under CONFIG_PREEMPT,
and bisected it to commit v2.6.19-1363-gacc2076, "i386: add sleazy FPU
optimization".

Add tsk_used_math() checks to prevent calling math_state_restore()
which can sleep in the case of !tsk_used_math(). This prevents
making a blocking call in __switch_to().

Apparently "fpu_counter > 5" check is not enough, as in some signal handling
and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible.

It's a side effect though. This is the failing scenario:

process 'A' in save_i387_ia32() just after clear_used_math()

Got an interrupt and pre-empted out.

At the next context switch to process 'A' again, kernel tries to restore
the math state proactively and sees a fpu_counter > 0 and !tsk_used_math()

This results in init_fpu() during the __switch_to()'s math_state_restore()

And resulting in fpu corruption which will be saved/restored
(save_i387_fxsave and restore_i387_fxsave) during the remaining
part of the signal handling after the context switch.

Bisected-by: Jürgen Mell <j.mell@t-online.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Jürgen Mell <j.mell@t-online.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
2008-06-04 16:21:24 +02:00
Pavel Machek
cd76374e9d suspend-vs-iommu: prevent suspend if we could not resume
iommu/gart support misses suspend/resume code, which can do bad stuff,
including memory corruption on resume.  Prevent system suspend in case we
would be unable to resume.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Tested-by: Patrick <ragamuffin@datacomm.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 13:11:47 +02:00
Suresh Siddha
e8a496ac8c x86: fix broken math-emu with lazy allocation of fpu area
Fix the math emulation that got broken with the recent lazy allocation of FPU
area. init_fpu() need to be added for the math-emulation path aswell
for the FPU area allocation.

math emulation enabled kernel booted fine with this, in the presence
of "no387 nofxsr" boot param.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: hpa@zytor.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-04 13:11:46 +02:00
Ingo Molnar
deef325086 x86: disable preemption in native_smp_prepare_cpus
Priit Laes reported the following warning:

Call Trace:
 [<ffffffff8022f1e1>] warn_on_slowpath+0x51/0x63
 [<ffffffff80282e48>] sys_ioctl+0x2d/0x5d
 [<ffffffff805185ff>] _spin_lock+0xe/0x24
 [<ffffffff80227459>] task_rq_lock+0x3d/0x73
 [<ffffffff805133c3>] set_cpu_sibling_map+0x336/0x350
 [<ffffffff8021c1b8>] read_apic_id+0x30/0x62
 [<ffffffff806d921d>] verify_local_APIC+0x90/0x138
 [<ffffffff806d84b5>] native_smp_prepare_cpus+0x1f9/0x305
 [<ffffffff806ce7b1>] kernel_init+0x59/0x2d9
 [<ffffffff80518a26>] _spin_unlock_irq+0x11/0x2b
 [<ffffffff8020bf48>] child_rip+0xa/0x12
 [<ffffffff806ce758>] kernel_init+0x0/0x2d9
 [<ffffffff8020bf3e>] child_rip+0x0/0x12

fix this by generally disabling preemption in native_smp_prepare_cpus().

Reported-and-bisected-by: Priit Laes <plaes@plaes.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-04 13:11:46 +02:00
Yinghai Lu
fb3bbd6a66 x86: fix APIC warning on 32bit v2
for http://bugzilla.kernel.org/show_bug.cgi?id=10613

BIOS bug, APIC version is 0 for CPU#0! fixing up to 0x10. (tell your hw vendor)

v2: fix 64 bit compilation

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Gabriel C <nix.or.die@googlemail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-04 13:11:46 +02:00