Commit Graph

832 Commits

Author SHA1 Message Date
Eric W. Biederman
8b955b0ddd [PATCH] Initial generic hypertransport interrupt support
This patch implements two functions ht_create_irq and ht_destroy_irq for
use by drivers.  Several other functions are implemented as helpers for
arch specific irq_chip handlers.

The driver for the card I tested this on isn't yet ready to be merged.
However this code is and hypertransport irqs are in use in a few other
places in the kernel.  Not that any of this will get merged before 2.6.19

Because the ipath-ht400 is slightly out of spec this code will need to be
generalized to work there.

I think all of the powerpc uses are for a plain interrupt controller in a
chipset so support for native hypertransport devices is a little less
interesting.

However I think this is a half way decent model on how to separate arch
specific and generic helper code, and I think this is a functional model of
how to get the architecture dependencies out of the msi code.

[akpm@osdl.org: Kconfig fix]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg KH <greg@kroah.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:29 -07:00
Eric W. Biederman
cd1182f56a [PATCH] genirq: x86_64 irq: Kill irq compression
With more irqs in the system we don't need this.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:29 -07:00
Eric W. Biederman
f023d764cc [PATCH] genirq: x86_64 irq: Kill gsi_irq_sharing
After raising the number of irqs the system supports this function is no
longer necessary.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:29 -07:00
Eric W. Biederman
550f2299ac [PATCH] genirq: x86_64 irq: make vector_irq per cpu
This refactors the irq handling code to make the vectors a per cpu resource so
the same vector number can be simultaneously used on multiple cpus for
different irqs.

This should make systems that were hitting limits on the total number of irqs
much more livable.

[akpm@osdl.org: build fix]
[akpm@osdl.org: __target_IO_APIC_irq is unneeded on UP]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:29 -07:00
Eric W. Biederman
e500f57436 [PATCH] genirq: x86_64 irq: Make the external irq handlers report their vector, not the irq number
This is a small pessimization but it paves the way for making this information
per cpu.  Which allows the the maximum number of IRQS to become NR_CPUS*224.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:28 -07:00
Eric W. Biederman
04b9267b15 [PATCH] genirq: x86_64 irq: Remove the msi assumption that irq == vector
This patch removes the change in behavior of the irq allocation code when
CONFIG_PCI_MSI is defined.  Removing all instances of the assumption that irq
== vector.

create_irq is rewritten to first allocate a free irq and then to assign that
irq a vector.

assign_irq_vector is made static and the AUTO_ASSIGN case which allocates an
vector not bound to an irq is removed.

The ioapic vector methods are removed, and everything now works with irqs.

The definition of NR_IRQS no longer depends on CONFIG_PCI_MSI

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:28 -07:00
Eric W. Biederman
589e367f9b [PATCH] genirq: x86_64 irq: Move msi message composition into io_apic.c
This removes the hardcoded assumption that irq == vector in the msi
composition code, and it allows the msi message composition to setup logical
mode, or lowest priorirty delivery mode as we do for other apic interrupts,
and with the same selection criteria.

Basically this moves the problem of what is in the msi message into the
architecture irq management code where it belongs.  Not in a generic layer
that doesn't have enough information to compose msi messages properly.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:28 -07:00
Eric W. Biederman
c4fa0bbf38 [PATCH] genirq: x86_64 irq: Dynamic irq support
The current implementation of create_irq() is a hack but it is the current
hack that msi.c uses, and unfortunately the ``generic'' apic msi ops depend on
this hack.  Thus we are this hack of assuming irq == vector until the
depencencies in the generic irq code are removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:28 -07:00
Eric W. Biederman
0be6652f1e [PATCH] genirq: x86_64 irq: Reenable migrating irqs to other cpus
In the latest changes the code for migrating x86_64 irqs was dropped.  This
reads it in a fashion that will work even if we change the vector on level
triggered irqs when we migrate them.

[akpm@osdl.org: build fix]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:26 -07:00
Ingo Molnar
f29bd1ba68 [PATCH] genirq: convert the x86_64 architecture to irq-chips
This patch converts all the x86_64 PIC controllers layers to the new and
simpler irq-chip interrupt handling layer.

[mingo@elte.hu: The patch also enables the fasteoi handler for x86_64]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:25 -07:00
Dave Jones
038b0a6d8d Remove all inclusions of <linux/config.h>
kbuild explicitly includes this at build time.

Signed-off-by: Dave Jones <davej@redhat.com>
2006-10-04 03:38:54 -04:00
Andrew Morton
0e4a523fa3 [PATCH] revert "insert IOAPIC(s) and Local APIC into resource map"
Commit 54dbc0c9eb is causing various
people's machines to fail to map PCI resources.

Revert it in preparation for addressing the show-APICs-in-/proc/iomem
requirement in a different manner.

Cc: Aaron Durbin <adurbin@google.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 19:46:18 -07:00
Arnd Bergmann
3db03b4afb [PATCH] rename the provided execve functions to kernel_execve
Some architectures provide an execve function that does not set errno, but
instead returns the result code directly.  Rename these to kernel_execve to
get the right semantics there.  Moreover, there is no reasone for any of these
architectures to still provide __KERNEL_SYSCALLS__ or _syscallN macros, so
remove these right away.

[akpm@osdl.org: build fix]
[bunk@stusta.de: build fix]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:23 -07:00
Serge E. Hallyn
96b644bdec [PATCH] namespaces: utsname: use init_utsname when appropriate
In some places, particularly drivers and __init code, the init utsns is the
appropriate one to use.  This patch replaces those with a the init_utsname
helper.

Changes: Removed several uses of init_utsname().  Hope I picked all the
	right ones in net/ipv4/ipconfig.c.  These are now changed to
	utsname() (the per-process namespace utsname) in the previous
	patch (2/7)

[akpm@osdl.org: CIFS fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:21 -07:00
Serge E. Hallyn
e9ff3990f0 [PATCH] namespaces: utsname: switch to using uts namespaces
Replace references to system_utsname to the per-process uts namespace
where appropriate.  This includes things like uname.

Changes: Per Eric Biederman's comments, use the per-process uts namespace
	for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c

[jdike@addtoit.com: UML fix]
[clg@fr.ibm.com: cleanup]
[akpm@osdl.org: build fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:21 -07:00
Serge E. Hallyn
0437eb594e [PATCH] nsproxy: move init_nsproxy into kernel/nsproxy.c
Move the init_nsproxy definition out of arch/ into kernel/nsproxy.c.  This
avoids all arches having to be updated.  Compiles and boots on s390.

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
Serge E. Hallyn
ab516013ad [PATCH] namespaces: add nsproxy
This patch adds a nsproxy structure to the task struct.  Later patches will
move the fs namespace pointer into this structure, and introduce a new utsname
namespace into the nsproxy.

The vserver and openvz functionality, then, would be implemented in large part
by virtualizing/isolating more and more resources into namespaces, each
contained in the nsproxy.

[akpm@osdl.org: build fix]
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
bibo,mao
99219a3fbc [PATCH] kretprobe spinlock deadlock patch
kprobe_flush_task() possibly calls kfree function during holding
kretprobe_lock spinlock, if kfree function is probed by kretprobe that will
incur spinlock deadlock.  This patch moves kfree function out scope of
kretprobe_lock.

Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:16 -07:00
bibo,mao
62c27be0dd [PATCH] kprobe whitespace cleanup
Whitespace is used to indent, this patch cleans up these sentences by
kernel coding style.

Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:16 -07:00
Atsushi Nemoto
8ef386092d [PATCH] kill wall_jiffies
With 2.6.18-rc4-mm2, now wall_jiffies will always be the same as jiffies.
So we can kill wall_jiffies completely.

This is just a cleanup and logically should not change any real behavior
except for one thing: RTC updating code in (old) ppc and xtensa use a
condition "jiffies - wall_jiffies == 1".  This condition is never met so I
suppose it is just a bug.  I just remove that condition only instead of
kill the whole "if" block.

[heiko.carstens@de.ibm.com: s390 build fix and cleanup]
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:27 -07:00
Andi Kleen
34596dc9e5 [PATCH] Define vsyscall cache as blob to make clearer that user space shouldn't use it
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Vivek Goyal
120b114237 [PATCH] Re-positioning the bss segment
[AK: This apparently broke some systems, but we need it to fix
a compile problem with old binutils and in theory the patch
is correct. So let's trying reenabling it again.]

o Currently bss segment is being placed somewhere in the middle (after .data)
  section and after bss lots of init section and data sections are coming.
  Is it intentional?

o One side affect of placing bss in the middle is that objcopy keeps the
  bss in raw binary image (vmlinux.bin) hence unnecessarily increasing
  the size of raw binary image. (In my case ~600K). It also increases
  the size of generated bzImage, though the increase is very small
  (896 bytes), probably a very high compression ratio for stream
  of zeros.

o This patch moves the bss at the end hence reducing the size of
  bzImage by 896 bytes and size of vmlinux.bin by 600K.

o This change benefits in the context of relocatable kernel patches. If
  kernel bss is not part of compressed data (vmlinux.bin) then it does
  not have to be decompressed and this area can be used by the decompressor
  for its execution hence keeping the memory requirements bounded and
  decompressor code does not stomp over any other data loaded beyond
  kernel image (As might be the case with bootloaders like kexec).

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
9d0ef4fd61 [PATCH] Use ARRAY_SIZE in setup.c
Based on i386 patch from Bjorn.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
29cbc78b90 [PATCH] x86: Clean up x86 NMI sysctls
Use prototypes in headers
Don't define panic_on_unrecovered_nmi for all architectures

Cc: dzickus@redhat.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
013bf2c50e [PATCH] Refactor some duplicated code in mpparse.c
No logic changes

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
d802ab981d [PATCH] Document iommu=panic
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
ded318ec80 [PATCH] Fix broken indentation in iommu_setup
No functional changes; only white space.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
ece6684012 [PATCH] Allow disabling DAC using command line options
Might or might not work around some reported bugs on VIA systems.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Atsushi Nemoto
3171a0305d [PATCH] simplify update_times (avoid jiffies/jiffies_64 aliasing problem)
Pass ticks to do_timer() and update_times(), and adjust x86_64 and s390
timer interrupt handler with this change.

Currently update_times() calculates ticks by "jiffies - wall_jiffies", but
callers of do_timer() should know how many ticks to update.  Passing ticks
get rid of this redundant calculation.  Also there are another redundancy
pointed out by Martin Schwidefsky.

This cleanup make a barrier added by
5aee405c66 needless.  So this patch removes
it.

As a bonus, this cleanup make wall_jiffies can be removed easily, since now
wall_jiffies is always synced with jiffies.  (This patch does not really
remove wall_jiffies.  It would be another cleanup patch)

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:15 -07:00
Rolf Eike Beer
d6bd3a39f7 [PATCH] Move valid_dma_direction() from x86_64 to generic code
As suggested by Muli Ben-Yehuda this function is moved to generic code as
may be useful for all archs.

[akpm@osdl.org: fix]
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:10 -07:00
Mel Gorman
5cb248abf5 [PATCH] Have x86_64 use add_active_range() and free_area_init_nodes
Size zones and holes in an architecture independent manner for x86_64.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:11 -07:00
Linus Torvalds
b278240839 Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
  [PATCH] Don't set calgary iommu as default y
  [PATCH] i386/x86-64: New Intel feature flags
  [PATCH] x86: Add a cumulative thermal throttle event counter.
  [PATCH] i386: Make the jiffies compares use the 64bit safe macros.
  [PATCH] x86: Refactor thermal throttle processing
  [PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
  [PATCH] Fix unwinder warning in traps.c
  [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
  [PATCH] x86: Move direct PCI scanning functions out of line
  [PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
  [PATCH] Don't leak NT bit into next task
  [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
  [PATCH] Fix some broken white space in ia32_signal.c
  [PATCH] Initialize argument registers for 32bit signal handlers.
  [PATCH] Remove all traces of signal number conversion
  [PATCH] Don't synchronize time reading on single core AMD systems
  [PATCH] Remove outdated comment in x86-64 mmconfig code
  [PATCH] Use string instructions for Core2 copy/clear
  [PATCH] x86: - restore i8259A eoi status on resume
  [PATCH] i386: Split multi-line printk in oops output.
  ...
2006-09-26 13:07:55 -07:00
Rafael J. Wysocki
75534b50cc [PATCH] Change the name of pagedir_nosave
The name of the pagedir_nosave variable does not make sense any more, so it
seems reasonable to change it to something more meaningful.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:49:01 -07:00
Rafael J. Wysocki
e8eff5ac29 [PATCH] Make swsusp avoid memory holes and reserved memory regions on x86_64
On x86_64 machines with more than 2 GB of RAM there are large memory gaps
(with no corresponding kernel virtual addresses) and reserved memory
regions between areas of usable physical RAM.  Moreover, if CONFIG_FLATMEM
is set, they appear within the normal zone.  swsusp should not try to save
them, so the corresponding page structs have to be marked as 'nosave'.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:48:58 -07:00
Andrew Morton
a3bc0dbc81 [PATCH] smp_call_function_single() cleanup
If we're going to implement smp_call_function_single() on three architecture
with the same prototype then it should have a declaration in a
non-arch-specific header file.

Move it into <linux/smp.h>.

Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:48:56 -07:00
Clemens Ladisch
1447c27d38 [PATCH] hpet rtc emulation: add watchdog timer
To prevent the emulated RTC timer from stopping when interrupts are delayed
for too long, disable interrupts around all of the register initialization,
and check that the interrupt handler did not schedule the next interrupt in
the past.

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Vojtech Pavlik <vojtech@suse.cz>
Cc: Robert Picco <Robert.Picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:48:54 -07:00
Dave Jones
dcf10307c3 [PATCH] i386/x86-64: New Intel feature flags
Add supplemental SSE3 instructions flag, and Direct Cache Access flag.
As described in "Intel Processor idenfication and the CPUID instruction
AP485 Sept 2006"

AK: also added for x86-64

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:42 +02:00
Dmitriy Zavin
3222b36f46 [PATCH] x86: Add a cumulative thermal throttle event counter.
The counter is exported to /sys that keeps track of the
number of thermal events, such that the user knows how bad the
thermal problem might be (since the logging to syslog and mcelog
is rate limited).

AK: Fixed cpu hotplug locking

Signed-off-by: Dmitriy Zavin <dmitriyz@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:42 +02:00
Dmitriy Zavin
15d5f83983 [PATCH] x86: Refactor thermal throttle processing
Refactor the event processing (syslog messaging and rate limiting)
into separate file therm_throt.c. This allows consistent reporting
of CPU thermal throttle events.

After ACK'ing the interrupt, if the event is current, the user
(p4.c/mce_intel.c) calls therm_throt_process to log (and rate limit)
the event. If that function returns 1, the user has the option to log
things further (such as to mce_log in x86_64).

AK: minor cleanup

Signed-off-by: Dmitriy Zavin <dmitriyz@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:42 +02:00
Andi Kleen
0637a70a5d [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
Some buggy systems can machine check when config space accesses
happen for some non existent devices.  i386/x86-64 do some early
device scans that might trigger this. Allow pci=noearly to disable
this. Also when type 1 is disabling also don't do any early
accesses which are always type1.

This moves the pci= configuration parsing to be a early parameter.
I don't think this can break anything because it only changes
a single global that is only used by PCI.

Cc: gregkh@suse.de
Cc: Trammell Hudson <hudson@osresearch.net>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
f157cbb1eb [PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
This is useful on systems with broken PCI bus. Affects various
scans in x86-64 and i386's early ACPI quirk scan.

Cc: gregkh@suse.de
Cc: len.brown@intel.com
Cc: Trammell Hudson <hudson@osresearch.net>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
658fdbef66 [PATCH] Don't leak NT bit into next task
SYSENTER can cause a NT to be set which might cause crashes on the IRET
in the next task.

Following similar i386 patch from Linus.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Jan Beulich
adf1423698 [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
Current gcc generates calls not jumps to noreturn functions. When that happens the
return address can point to the next function, which confuses the unwinder.

This patch works around it by marking asynchronous exception
frames in contrast normal call frames in the unwind information.  Then teach
the unwinder to decode this.

For normal call frames the unwinder now subtracts one from the address which avoids
this problem.  The standard libgcc unwinder uses the same trick.

It doesn't include adjustment of the printed address (i.e. for the original
example, it'd still be kernel_math_error+0 that gets displayed, but the
unwinder wouldn't get confused anymore.

This only works with binutils 2.6.17+ and some versions of H.J.Lu's 2.6.16
unfortunately because earlier binutils don't support .cfi_signal_frame

[AK: added automatic detection of the new binutils and wrote description]

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
dd54a11004 [PATCH] Remove all traces of signal number conversion
This was old code that was needed for iBCS and x86-64 never supported that.

Pointed out by Albert Cahalan
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
2049336f60 [PATCH] Don't synchronize time reading on single core AMD systems
We do some additional CPU synchronization in gettimeofday et.al. to make
sure the time stamps are always monotonic over multiple CPUs. But on
single core systems that is not needed. So don't do it.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
27fbe5b28a [PATCH] Use string instructions for Core2 copy/clear
It is faster than using a unrolled loop for the use cases the kernel
cares about (cached, sizes typically < 4K)

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Matthew Garrett
35d534a3ff [PATCH] x86: - restore i8259A eoi status on resume
Got it. i8259A_resume calls init_8259A(0) unconditionally, even if
auto_eoi has been set. Keep track of the current status and restore that
on resume. This fixes it for AMD64 and i386.

Signed-off-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Aaron Durbin
56dd669a13 [PATCH] Insert GART region into resource map
Patch inserts the GART region into the iomem resource map. The GART will then
be visible within /proc/iomem. It will also allow for other users
utilizing the GART to subreserve the region (agp or IOMMU).

Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:40 +02:00
Andi Kleen
a15da49deb [PATCH] Fix idle notifiers
Previously exit_idle would be called more often than enter_idle

Now instead of using complicated tests just keep track of it
using the per CPU variable as a flip flop.  I moved the idle state into the
PDA to make the access more efficient.

Original bug report and an initial patch from Stephane Eranian,
but redone by AK.

Cc: Stephane Eranian <eranian@hpl.hp.com>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:40 +02:00
Andi Kleen
b38337a624 [PATCH] Mark per cpu data initialization __initdata again
Before 2.6.16 this was changed to work around code that accessed
CPUs not in the possible map. But that code should be all fixed now,
so mark it __initdata again.
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:40 +02:00
Andi Kleen
3022d734a5 [PATCH] Fix zeroing on exception in copy_*_user
- Don't zero for __copy_from_user_inatomic following i386.
This will prevent spurious zeros for parallel file system writers when
one does a exception
- The string instruction version didn't zero the output on
exception. Oops.

Also I cleaned up the code a bit while I was at it and added a minor
optimization to the string instruction path.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:39 +02:00
adurbin@google.com
54dbc0c9eb [PATCH] insert IOAPIC(s) and Local APIC into resource map
This patch places the IOAPIC(s) and the Local APIC specified by ACPI
tables into the resource map. The APICs will then be visible within
/proc/iomem

Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:39 +02:00
Andi Kleen
7b0bda74f7 [PATCH] Fix a PDA warning uncovered by the new type checking
Fix
linux/arch/x86_64/kernel/process.c: In function __switch_to:
linux/arch/x86_64/kernel/process.c:626: warning: assignment makes integer from pointer without a cast

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:39 +02:00
Andi Kleen
96e540492a [PATCH] Fix a irqcount comment in entry.S
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:39 +02:00
Arjan van de Ven
0a42540580 [PATCH] Add the canary field to the PDA area and the task struct
This patch adds the per thread cookie field to the task struct and the PDA.
Also it makes sure that the PDA value gets the new cookie value at context
switch, and that a new task gets a new cookie at task creation time.

Signed-off-by: Arjan van Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
CC: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Andi Kleen
e8c7391de4 [PATCH] Don't use kernel_text_address in oops context
Because it can take spinlocks.

Suggested by Mathieu Desnoyers

Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Magnus Damm
4bfaaef01a [PATCH] Avoid overwriting the current pgd (V4, x86_64)
kexec: Avoid overwriting the current pgd (V4, x86_64)

This patch upgrades the x86_64-specific kexec code to avoid overwriting the
current pgd. Overwriting the current pgd is bad when CONFIG_CRASH_DUMP is used
to start a secondary kernel that dumps the memory of the previous kernel.

The code introduces a new set of page tables. These tables are used to provide
an executable identity mapping without overwriting the current pgd.

Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Keith Owens
f574164491 [PATCH] Remove most of the special cases for the debug IST stack
Remove most of the special cases for the debug IST stack.  This is a
follow on clean up patch, it requires the bug fix patch that adds
orig_ist.

Signed-off-by: Keith Owens <kaos@ocs.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Andi Kleen
53ee11ae0d [PATCH] Optimize PDA accesses slightly
Based on a idea by Jeremy Fitzhardinge:

Replace the volatiles and memory clobbers in the PDA access with
telling gcc about access to a proxy PDA structure that doesn't
actually exist. But the dummy accesses give a defined ordering for
read/write accesses.

Also add some memory barriers to the early GS initialization to
make sure no PDA access is moved before it.

Advantage is some .text savings (probably most from better
code for accessing "current"):

   text    data     bss     dec     hex filename
4845647 1223688  615864 6685199  66020f vmlinux
4837780 1223688  615864 6677332  65e354 vmlinux-pda

1.2% smaller code

Cc:  Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Ian Campbell
f2a9e1dec2 [PATCH] Put .note.* sections into a PT_NOTE segment
This patch updates x86_64 linker script to pack any .note.* sections
into a PT_NOTE segment in the output file.

To do this, we tell ld that we need a PT_NOTE segment.  This requires
us to start explicitly mapping sections to segments, so we also need
to explicitly create PT_LOAD segments for text and data, and map the
sections to them appropriately.  Fortunately, each section will
default to its previous section's segment, so it doesn't take many
changes to vmlinux.lds.S.

The corresponding change is already made for i386 in -mm and I'd like
this patch to join it. The section to segment mappings do change as do
the segment flags so some time in -mm would be good for that reason as
well, just in case.

In particular .data and .bss move from the text segment to the data
segment and .data.cacheline_aligned .data.read_mostly are put in the
data segment instead of a separate one.

I think that it would be possible to exactly match the existing section
to segment mapping and flags but it would be a more intrusive change and
I'm not sure there is a reason for the existing layout other than it is
what you get by default if you don't explicitly specify something else.
If there is a reason for the existing layout then I will of course make
the more intrusive change. If there is no reason we could probably drop
the executable or writable flags from some segments but I don't know how
much attention is paid to them anyway so it might not be worth the
effort.

The vsyscall related sections need to go in a different segment to the
normal data segment and so I invented a "user" segment to contain them.
I believe this should appear to be another data segment as far as the
kernel is concerned so the flags are setup accordingly.

The notes will be used in the Xen paravirt_ops backend to provide
additional information to the domain builder. I am in the process of
converting the xen-unstable kernels and tools over to this scheme at the
moment to support this in the future.

It has been suggested to me that the notes segment should have flags 0
(i.e. not readable) since it is only used by the loader and is not used
at runtime. For now I went with a readable segment since that is what
the i386 patch uses.

AK: dropped NOTES addition right now because the needed infrastructure
for that is not merged yet

Signed-off-by: Ian Campbell <ian.campbell@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Eric W. Biederman
26374c7b7d [PATCH] Reload CS when startup_64 is used.
In long mode the %cs is largely a relic.  However there are a few cases
like iret where it matters that we have a valid value.  Without this
patch it is possible to enter the kernel in startup_64 without setting
%cs to a valid value.  With this patch we don't care what %cs value
we enter the kernel with, so long as the cs shadow register indicates
it is a privileged code segment.

Thanks to Magnus Damm for finding this problem and posting the
first workable patch.  I have moved the jump to set %cs down a
few instructions so we don't need to take an extra jump.  Which
keeps the code simpler.

Signed-of-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Andi Kleen
8380aabb99 [PATCH] Remove non e820 fallbacks in high level code
Drop support for non e820 BIOS calls to get the memory map.

The boot assembler code still has some support, but not the C code now.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
7a0a2dff1c [PATCH] Add a missing check for irq flags tracing in NMI
NMIs are not supposed to track the irq flags, but TRACE_IRQS_IRETQ
did it anyways. Add a check.

Cc: mingo@elte.hu

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
aecc63615e [PATCH] Fix coding style and output of the mptable parser
Give the printks a consistent prefix.
Add some missing white space.

Cc: len.brown@intel.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
e4251e130d [PATCH] Remove some cruft in apic id checking during processor setup
- Remove a define that was used only once
- Remove the too large APIC ID check because we always support
the full 8bit range of APICs.
- Restructure code a bit to be simpler.

Cc: len.brown@intel.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
f2c2cca3ac [PATCH] Remove APIC version/cpu capability mpparse checking/printing
ACPI went to great trouble to get the APIC version and CPU capabilities
of different CPUs before passing them to the mpparser. But all
that data was used was to print it out.  Actually it even faked some data
based on the boot cpu, not on the actual CPU being booted.

Remove all this code because it's not needed.

Cc: len.brown@intel.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
151f8cc116 [PATCH] Remove safe_smp_processor_id()
And replace all users with ordinary smp_processor_id.  The function
was originally added to get some basic oops information out even
if the GS register was corrupted. However that didn't
work for some anymore because printk is needed to print the oops
and it uses smp_processor_id() already. Also GS register corruptions
are not particularly common anymore.

This also helps the Xen port which would otherwise need to
do this in a special way because it can't access the local APIC.

Cc: Chris Wright <chrisw@sous-sol.org>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Rafael J. Wysocki
34464a5b89 [PATCH] Detect clock skew during suspend
Detect the situations in which the time after a resume from disk would
be earlier than the time before the suspend and prevent them from
happening on x86_64.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
dbf9272e86 [PATCH] Don't force reserve the 640k-1MB range
From i386 x86-64 inherited code to force reserve the 640k-1MB area.
That was needed on some old systems.

But we generally trust the e820 map to be correct on 64bit systems
and mark all areas that are not memory correctly.

This patch will allow to use the real memory in there.

Or rather the only way to find out if it's still needed is to
try. So far I'm optimistic.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:36 +02:00
Magnus Damm
ed77504b20 [PATCH] mark init_amd() as __cpuinit
The init_amd() function is only called from identify_cpu() which is already
marked as __cpuinit. So let's mark it as __cpuinit.

Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:36 +02:00
Andrew Morton
abf0f10948 [PATCH] wire up oops_enter()/oops_exit()
Implement pause_on_oops() on x86_64.

AK: I redid the patch to do the oops_enter/exit in the existing
oops_begin()/end(). This makes it much shorter.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:36 +02:00
Arjan van de Ven
e07e23e1fd [PATCH] non lazy "sleazy" fpu implementation
Right now the kernel on x86-64 has a 100% lazy fpu behavior: after *every*
context switch a trap is taken for the first FPU use to restore the FPU
context lazily.  This is of course great for applications that have very
sporadic or no FPU use (since then you avoid doing the expensive
save/restore all the time).  However for very frequent FPU users...  you
take an extra trap every context switch.

The patch below adds a simple heuristic to this code: After 5 consecutive
context switches of FPU use, the lazy behavior is disabled and the context
gets restored every context switch.  If the app indeed uses the FPU, the
trap is avoided.  (the chance of the 6th time slice using FPU after the
previous 5 having done so are quite high obviously).

After 256 switches, this is reset and lazy behavior is returned (until
there are 5 consecutive ones again).  The reason for this is to give apps
that do longer bursts of FPU use still the lazy behavior back after some
time.

[akpm@osdl.org: place new task_struct field next to jit_keyring to save space]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-26 10:52:36 +02:00
Eric W. Biederman
ba4d40bb5c [PATCH] Auto size the per cpu area.
Now for a completely different but trivial approach.
I just boot tested it with 255 CPUS and everything worked.

Currently everything (except module data) we place in
the per cpu area we know about at compile time.  So
instead of allocating a fixed size for the per_cpu area
allocate the number of bytes we need plus a fixed constant
for to be used for modules.

It isn't perfect but it is much less of a pain to
work with than what we are doing now.

AK: fixed warning

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:35 +02:00
Andi Kleen
2717941c6a [PATCH] Make boot_param_data pure BSS
Since it's all zero.

Actually I think gcc 4+ will do that automatically, but earlier compilers won't
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:35 +02:00
Dimitri Sivanich
cbf9b4bb76 [PATCH] X86_64 monotonic_clock goes backwards
I've noticed some erratic behavior while testing the X86_64 version
of monotonic_clock().

While spinning in a loop reading monotonic clock values (pinned to a
single cpu) I noticed that the difference between subsequent values
occasionally went negative (time going backwards).

I found that in the following code:
                this_offset = get_cycles_sync();
                /* FIXME: 1000 or 1000000? */
-->             offset = (this_offset - last_offset)*1000 / cpu_khz;
        }
        return base + offset;

the offset sometimes turns out to be 0, even though
this_offset > last_offset.

+Added fix From: Toyo Abe <toyoa@mvista.com>

The x86_64-mm-monotonic-clock.patch in 2.6.18-rc4-mm2 made a change to
the updating of monotonic_base. It now uses cycles_2_ns().

I suggest that a set_cyc2ns_scale() should be done prior to the setup_irq().
Because cycles_2_ns() can be called from the timer ISR right after the irq0
is enabled.

Signed-off-by: Toyo Abe <toyoa@mvista.com>
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Prasanna S.P
d28c4393a7 [PATCH] x86: error_code is not safe for kprobes
This patch moves the entry.S:error_entry to .kprobes.text section,
since code marked unsafe for kprobes jumps directly to entry.S::error_entry,
that must be marked unsafe as well.
This patch also moves all the ".previous.text" asm directives to ".previous"
for kprobes section.

AK: Following a similar i386 patch from Chuck Ebbert
AK: Also merged Jeremy's fix in.

+From: Jeremy Fitzhardinge <jeremy@goop.org>

KPROBE_ENTRY does a .section .kprobes.text, and expects its users to
do a .previous at the end of the function.

Unfortunately, if any code within the function switches sections, for
example .fixup, then the .previous ends up putting all subsequent code
into .fixup.  Worse, any subsequent .fixup code gets intermingled with
the code its supposed to be fixing (which is also in .fixup).  It's
surprising this didn't cause more havok.

The fix is to use .pushsection/.popsection, so this stuff nests
properly.  A further cleanup would be to get rid of all
.section/.previous pairs, since they're inherently fragile.

+From: Chuck Ebbert <76306.1226@compuserve.com>

Because code marked unsafe for kprobes jumps directly to
entry.S::error_code, that must be marked unsafe as well.
The easiest way to do that is to move the page fault entry
point to just before error_code and let it inherit the same
section.

Also moved all the ".previous" asm directives for kprobes
sections to column 1 and removed ".text" from them.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Andi Kleen
be7a91709b [PATCH] Check for end of stack trace before falling back
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Andi Kleen
c0b766f13d [PATCH] Merge stacktrace and show_trace
This unifies the standard backtracer and the new stacktrace
in memory backtracer. The standard one is converted to use callbacks
and then reimplement stacktrace using new callbacks.

The main advantage is that stacktrace can now use the new dwarf2 unwinder
and avoid false positives in many cases.

I kept it simple to make sure the standard backtracer stays reliable.

Cc: mingo@elte.hu

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Andi Kleen
b7f5e3c774 [PATCH] Don't access the APIC in safe_smp_processor_id when it is not mapped yet
Lockdep can call the dwarf2 unwinder early, and the dwarf2 code
uses safe_smp_processor_id which tries to access the local APIC page.
But that doesn't work before the APIC code has set up its fixmap.

Check for this case and always return boot cpu then.

Cc: jbeulich@novell.com
Cc: mingo@elte.hu

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Andi Kleen
5a1b3999d6 [PATCH] x86: Some preparationary cleanup for stack trace
- Remove unused all_contexts parameter
No caller used it
- Move skip argument into the structure (needed for
followon patches)

Cc: mingo@elte.hu

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Muli Ben-Yehuda
4ea8a5d8b5 [PATCH] Calgary IOMMU: eradicate sole remaining 80 chars per line offender
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Muli Ben-Yehuda
4ccf4ae314 [PATCH] remove tce_cache_blast_stress()
tce_cache_blast_stress was useful during bringup to stress the IOMMU's
cache flushing. Now that we quiesce DMAs on every cache flush, using
_stress() brings the machine down to its knees once you put it under
load. Remove this debug / bringup code that isn't useful anymore
completely.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Muli Ben-Yehuda
796e4390e0 [PATCH] only verify the allocation bitmap if CONFIG_IOMMU_DEBUG is on
Introduce new function verify_bit_range(). Define two versions, one
for CONFIG_IOMMU_DEBUG enabled and one for disabled. Previously we
were checking that the bitmap was consistent every time we allocated
or freed an entry in the TCE table, which is good for debugging but
incurs an unnecessary penalty on non debug builds.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Muli Ben-Yehuda
de684652f3 [PATCH] print whether CONFIG_IOMMU_DEBUG is enabled
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Chuck Ebbert
2ade2920dc [PATCH] i386/x86-64: rename is_at_popf(), add iret to tests and fix
is_at_popf() needs to test for the iret instruction as well as
popf.  So add that test and rename it to is_setting_trap_flag().

Also change max insn length from 16 to 15 to match reality.

LAHF / SAHF can't affect TF, so the comment in x86_64 is removed.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Fernando Luis Vázquez Cao
2b94ab2fd5 [PATCH] Replace local_save_flags+local_irq_disable with
The combination of "local_save_flags" and "local_irq_disable" seems to be
equivalent to "local_irq_save" (see code snips below). Consequently, replace
occurrences of local_save_flags+local_irq_disable with local_irq_save.

* local_irq_save
#define raw_local_irq_save(flags) \
                do { (flags) = __raw_local_irq_save(); } while (0)

static inline unsigned long __raw_local_irq_save(void)
{
        unsigned long flags = __raw_local_save_flags();

        raw_local_irq_disable();

        return flags;
}

* local_save_flags
#define raw_local_save_flags(flags) \
                do { (flags) = __raw_local_save_flags(); } while (0)

Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
131cfd7bd5 [PATCH] Add sparse annotation to vsyscall.c
Fixes

linux/arch/x86_64/kernel/vsyscall.c:276:7: warning: constant 0x0f40000000000 is so big it is long
linux/arch/x86_64/kernel/vsyscall.c:80:14: warning: incorrect type in argument 1 (different address spaces)
linux/arch/x86_64/kernel/vsyscall.c:80:14:    expected void const volatile [noderef] *addr<asn:2>
linux/arch/x86_64/kernel/vsyscall.c:80:14:    got void *<noident>
linux/arch/x86_64/kernel/vsyscall.c:200:7: warning: incorrect type in assignment (different address spaces)
linux/arch/x86_64/kernel/vsyscall.c:200:7:    expected unsigned short [usertype] *map1
linux/arch/x86_64/kernel/vsyscall.c:200:7:    got void [noderef] *<asn:2>
linux/arch/x86_64/kernel/vsyscall.c:203:7: warning: incorrect type in assignment (different address spaces)
linux/arch/x86_64/kernel/vsyscall.c:203:7:    expected unsigned short [usertype] *map2
linux/arch/x86_64/kernel/vsyscall.c:203:7:    got void [noderef] *<asn:2>
linux/arch/x86_64/kernel/vsyscall.c:215:10: warning: incorrect type in argument 1 (different address spaces)
linux/arch/x86_64/kernel/vsyscall.c:215:10:    expected void volatile [noderef] *addr<asn:2>
linux/arch/x86_64/kernel/vsyscall.c:215:10:    got unsigned short [usertype] *map2
linux/arch/x86_64/kernel/vsyscall.c:217:10: warning: incorrect type in argument 1 (different address spaces)
linux/arch/x86_64/kernel/vsyscall.c:217:10:    expected void volatile [noderef] *addr<asn:2>
linux/arch/x86_64/kernel/vsyscall.c:217:10:    got unsigned short [usertype] *map1

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
3bd4d18cba [PATCH] Move e820 map into e820.c
Minor cleanup. Keep setup.c free from unrelated clutter.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
c31fbb1ad8 [PATCH] Clean up acpi_numa variable
Move it into srat.c No need to clutter up setup.c for it

And remove use in setup.c completely - it only guarded a printk
which can be done unconditionally.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
df3bb57d2c [PATCH] i386/x86-64: Move acpi_disabled variables into acpi/boot.c
Removes code duplication between i386/x86-64.

Not needed anymore in setup.c since early_param cleanup

Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
43c85c9c5d [PATCH] Remove need for early lockdep init
I think it was only needed for the printks and we can do them later.

I put in a single early_printk so that we know the kernel is alive
(early_printk doesn't need any locks)

This makes some things easier for initialization of unwind for
lockdep, which is needed by later patches.

cc: mingo@elte.hu

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00
Andi Kleen
2c8c0e6b8d [PATCH] Convert x86-64 to early param
Instead of hackish manual parsing

Requires earlier i386 patchkit, but also fixes i386 early_printk again.

I removed some obsolete really early parameters which didn't do anything useful.
Also made a few parameters that needed it early (mostly oops printing setup)

Also removed one panic check that wasn't visible without
early console anyways (the early console is now initialized after that
panic)

This cleans up a lot of code.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00
Andi Kleen
9ca33eb698 [PATCH] Use early CPU identify before early command line parsing
This makes it possible to modify CPU flags in command line
options without hacks.

And remove another copy in head64.c

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00
Chuck Ebbert
d4d35854a1 [PATCH] remove lock prefix from is_at_popf() tests
The lock prefix will cause an exception when used with the
popf instruction, so no need to continue searching after it's
found.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00
Muli Ben-Yehuda
145106e810 [PATCH] remove superflous BUG_ON's in nommu and gart
There's no need to check for invalid DMA data direction in nommu and
gart since we do it in dma-mapping.h anyway before calling the
individual dma-ops.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00
Eric W. Biederman
29a6c25bd6 [PATCH] Fix gdt table size in trampoline.S
Allows easier extension of the GDT by using the proper C symbol
for the size in the descriptor.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00
Chuck Ebbert
a752d7194c [PATCH] fix is_at_popf() for compat tasks
When testing for the REX instruction prefix, first check
for 32-bit mode because in compat mode the REX prefix is an
increment instruction.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00
Muli Ben-Yehuda
0577f148b5 [PATCH] Calgary IOMMU: save a bit of space in bus_info
Make translation_disabled a uchar rather than an int

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:31 +02:00
Muli Ben-Yehuda
a4fc520a0f [PATCH] Calgary IOMMU: calgary_init_one_nontraslated() can return void
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:31 +02:00
Muli Ben-Yehuda
871b17008e [PATCH] Calgary IOMMU: fix reference counting of Calgary PCI devices
The pci_get_device() API decrements the reference count on the 'from'
parameter when it continues searching. Therefore, take a ref count on
Calgary bus when we initialize them in either translated or
non-translated mode.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:31 +02:00