Commit Graph

642 Commits

Author SHA1 Message Date
Jan Beulich
00f1ea6967 [PATCH] x86: adjust inclusion of asm/fixmap.h
Move inclusion of asm/fixmap.h to where it is really used rather than
where it may have been used long ago (requires a few other adjustments
to includes due to previous implicit dependencies).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:04 +02:00
Ingo Molnar
3c43f03908 [PATCH] x86: default to physical mode on hotplug CPU kernels
Default to physical mode on hotplug CPU kernels.  Furher simplify and clean up
the APIC initialization code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-05-02 19:27:04 +02:00
Andrew Morton
a86f34b49f [PATCH] x86: revert x86_64-mm-fix-the-irqbalance-quirk-for-e7320-e7520-e7525
Obsoleted by Ingo's genapic stuff.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:04 +02:00
Eric Dumazet
92f37fd2ee [NET]: Adding SO_TIMESTAMPNS / SCM_TIMESTAMPNS support
Now that network timestamps use ktime_t infrastructure, we can add a new
SOL_SOCKET sockopt  SO_TIMESTAMPNS.

This command is similar to SO_TIMESTAMP, but permits transmission of
a 'timespec struct' instead of a 'timeval struct' control message.
(nanosecond resolution instead of microsecond)

Control message is labelled SCM_TIMESTAMPNS instead of SCM_TIMESTAMP

A socket cannot mix SO_TIMESTAMP and SO_TIMESTAMPNS : the two modes are
mutually exclusive.

sock_recv_timestamp() became too big to be fully inlined so I added a
__sock_recv_timestamp() helper function.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: linux-arch@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:21 -07:00
Eric Dumazet
ae40eb1ef3 [NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolution
Now network timestamps use ktime_t infrastructure, we can add a new
ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'.
User programs can thus access to nanosecond resolution.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:04 -07:00
Stephen Hemminger
3927f2e8f9 [NET]: div64_64 consolidate (rev3)
Here is the current version of the 64 bit divide common code.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:33 -07:00
Zachary Amsden
49f1971051 [PATCH] Proper fix for highmem kmap_atomic functions for VMI for 2.6.21
Since lazy MMU batching mode still allows interrupts to enter, it is
possible for interrupt handlers to try to use kmap_atomic, which fails when
lazy mode is active, since the PTE update to highmem will be delayed.  The
best workaround is to issue an explicit flush in kmap_atomic_functions
case; this is the only way nested PTE updates can happen in the interrupt
handler.

Thanks to Jeremy Fitzhardinge for noting the bug and suggestions on a fix.

This patch gets reverted again when we start 2.6.22 and the bug gets fixed
differently.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-08 19:47:55 -07:00
Andi Kleen
3556ddfa92 [PATCH] x86-64: Disable local APIC timer use on AMD systems with C1E
AMD dual core laptops with C1E do not run the APIC timer correctly
when they go idle. Previously the code assumed this only happened
on C2 or deeper.  But not all of these systems report support C2.

Use a AMD supplied snippet to detect C1E being enabled and then disable
local apic timer use.

This supercedes an earlier workaround using DMI detection of specific systems.

Thanks to Mark Langsdorf for the detection snippet.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-04-02 12:14:12 +02:00
Alan Cox
62b3e920ed [PATCH] tty: minor merge correction
Its now used.. because we added the new definitions so enabled all the
goodies on i386

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-27 09:05:15 -07:00
Roland McGrath
6ea65ff79c [PATCH] i386: clear segment register padding in core dumps
The segment register slots in struct pt_regs are padded to 32 bits.
Some of these are stored with instructions like "pushl %es", which
leaves the high 16 bits as they were.  So the high bits of these
fields in struct pt_regs contain kernel stack garbage.  These bits are
ignored by everything and never leak to user space, except in core
dumps.  The user struct pt_regs is always at the base of the thread's
kernel stack and so it seems unlikely the information that leaks from
here is ever worthwhile so as to be a security concern, but I'm not
sure about that.  It has been this way for ages; userland consumers of
core dumps all mask off these high bits themselves.  So it is not urgent.

This change masks off the padding bits of the segment register slots
in core dumps.  ptrace already masks off these high bits, so this
makes the values in core dumps consistent with what ptrace would
report just before the process died.

As I read the processor manuals, the cs and ss values will always be
padded with zero bits rather than stack garbage.  But unlike "pushl %es",
this is not simple to test with a userland program.  So I added the two
instructions rather than wonder if they are really never necessary.

I think that x86_64 does not have this problem (for either 32-bit or
64-bit processes).  It only uses "mov" instructions from segment
registers, which zero-extend.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-23 15:32:58 -07:00
Thomas Gleixner
e585bef815 [PATCH] i386: add command line option "local_apic_timer_c2_ok"
It turned out that it is almost impossible to trust ACPI, BIOS & Co.
regarding the C states. This was the reason to switch the local apic
timer off in C2 state already. OTOH there are sane and well behaving
systems, which get punished by that decision.

Allow the user to confirm that the local apic timer is trustworthy in C2
state. This keeps the default behaviour on the safe side.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-23 10:21:02 -07:00
Jeremy Fitzhardinge
014efb1df7 [PATCH] i386: fix typo in sync_constant_test_bit()'s name
Fix typo in sync_constant_test_bit()'s name, so sync_bitops.h is consistent
with bitops.h

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-16 19:25:07 -07:00
Linus Torvalds
8ce5e3e45e Disable NMI watchdog by default properly
This reverts commit 6ebf622b25 and
replaces it with one that actually works.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-14 17:53:43 -07:00
Al Viro
192cd59bd9 [PATCH] fastcall still doesn't make sense in paravirt
Andi had removed a bunch of those, but one more had creeped in...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-14 15:27:49 -07:00
Zachary Amsden
2cb8a57b98 [PATCH] Fix vmi time header bug
Some gcc put this function in .init.text because the header didn't
match.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-12 16:36:16 -07:00
Andres Salomon
2272b0e03e [PATCH] i386: make x86_64 tsc header require i386 rather than vice-versa
Prior to commit 95492e4646 ([PATCH] x86:
rewrite SMP TSC sync code), the headers in asm-i386 did not really require
anything in include/asm-x86_64.  This means that distributions such as
fedora did not include asm-x86_64 in kernel-devel headers for i386.  Ingo's
commit changed that, and broke things.  This is easy enough to hack around
in package builds by just including asm-x86_64 on i386, but that's kind of
annoying.  If anything, x86_64 should depend upon i386, not the other way
around.

This patch changes it so that asm-x86_64/tsc.h includes asm-i386/tsc.h,
rather than vice-versa.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-06 09:30:24 -08:00
Andrew Morton
6261d720da [PATCH] fix build with CONFIG_NO_IDLE_HZ=n
arch/i386/kernel/vmi.c: In function 'vmi_safe_halt':
arch/i386/kernel/vmi.c:262: warning: implicit declaration of function 'vmi_stop_hz_timer'
arch/i386/kernel/vmi.c:266: warning: implicit declaration of function 'vmi_account_time_restart_hz_timer'

Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-06 09:30:24 -08:00
Ingo Molnar
6ebf622b25 [PATCH] disable NMI watchdog by default
there's a new NMI watchdog related problem: KVM crashes on certain
bzImages because ... we enable the NMI watchdog by default (even if the
user does not ask for it) , and no other OS on this planet does that so
KVM doesnt have emulation for that yet. So KVM injects a #GP, which
crashes the Linux guest:

 general protection fault: 0000 [#1]
 PREEMPT SMP
 Modules linked in:
 CPU:    0
 EIP:    0060:[<c011a8ae>]    Not tainted VLI
 EFLAGS: 00000246   (2.6.20-rc5-rt0 #3)
 EIP is at setup_apic_nmi_watchdog+0x26d/0x3d3

and no, i did /not/ request an nmi_watchdog on the boot command line!

Solution: turn off that darn thing! It's a debug tool, not a 'make life
harder' tool!!

with this patch the KVM guest boots up just fine.

And with this my laptop (Lenovo T60) also stopped its sporadic hard
hanging (sometimes in acpi_init(), sometimes later during bootup,
sometimes much later during actual use) as well. It hung with both
nmi_watchdog=1 and nmi_watchdog=2, so it's generally the fact of NMI
injection that is causing problems, not the NMI watchdog variant, nor
any particular bootup code.

[ NMI breaks on some systems, esp in combination with SMM -Arjan ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-05 08:23:51 -08:00
Zachary Amsden
772205f62e [PATCH] vmi: apic ops
Use para_fill instead of directly setting the APIC ops to the result of the
vmi_get_function call - this allows one to implement a VMI ROM without
implementing APIC functions, just using the native APIC functions.

While doing this, I realized that there is a lot more cleanup that should have
been done.  Basically, we should never assume that the ROM implements a
specific set of functions, and always allow fallback to the native
implementation.

This is critical for future compatibility.

Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Zachary Amsden <zach@vmware.com>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-05 07:57:52 -08:00
Zachary Amsden
e30fab3ad3 [PATCH] vmi: pit override
The time_init_hook in paravirt-ops no longer functions in the correct manner
after the integration of the hrtimers code.  The problem is that now the call
path for time initialization is:

  time_init :
       late_time_init = hpet_time_init;

  late_time_init -> hpet_time_init:
       setup_pit_timer (BAD)
       do_time_init --> (via paravirt.h)
          time_init_hook --> (via arch_hooks.h)
              time_init_hook (in SUBARCH/setup.c)

If this isn't confusing enough, the paravirt case goes through an indirect
function pointer in the paravirt-ops table.  The problem is, by the time the
paravirt hook is called, the pit timer is already enabled.

But paravirt guests have their own timer, and don't want to use the PIT.
Rather than intensify the struggle for power going on here, just make it all
nice and simple and just unconditionally do all timer setup in the
late_time_init hook.  This also has the advantage of enabling timers in the
same place in all code paths, so everyone has the same bugs and we don't have
outliers who break other code because they turn on timer too early or too
late.

So the paravirt-ops time init function is now by default hpet_time_init, which
is the time init function used for native hardware.  Paravirt guests have the
chance to override this when they setup the paravirt-ops table, and should
need no change.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-05 07:57:52 -08:00
Zachary Amsden
eda08b1bef [PATCH] vmi: paravirt drop udelay op
Not respecting udelay causes problems with any virtual hardware that is passed
through to real hardware.  This can be noticed by any device that interacts
with the real world in real time - like AP startup, which takes real time.  Or
keyboard LEDs, which should blink in real-time.  Or floppy drives, but only
when passed through to a real floppy controller on OSes which can't
sufficiently buffer the floppy commands to emulate a zero latency floppy.  Or
IDE drives, when connecting to a physical CDROM.

This was mostly a hack to get the kernel to boot faster, but it introduced a
number of misvirtualization bugs, and Alan and Pavel argued pretty strongly
against it.  We were the only client, and now want to clean up this cruft.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-05 07:57:52 -08:00
Zachary Amsden
9a1c13e91f [PATCH] vmi: fix highpte
Provide a PT map hook for HIGHPTE kernels to designate where they are mapping
page tables.  This information is required so the physical address of PTE
updates can be determined; otherwise, the mm layer would have to carry the
physical address all the way to each PTE modification callsite, which is even
more hideous that the macros required to provide the proper hooks.

So lets not mess up arch neutral code to achieve this, but keep the horror in
an #ifdef HIGHPTE in include/asm-i386/pgtable.h.  I had to use macros here
because some types are not yet defined in all the include paths for this
header.

This patch is absolutely required for HIGHPTE kernels to operate properly with
VMI.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-05 07:57:52 -08:00
Zachary Amsden
1182d8528b [PATCH] vmi: cpu cycles fix
In order to share the common code in tsc.c which does CPU Khz calibration, we
need to make an accurate value of CPU speed available to the tsc.c code.  This
value loses a lot of precision in a VM because of the timing differences with
real hardware, but we need it to be as precise as possible so the guest can
make accurate time calculations with the cycle counters.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-05 07:57:52 -08:00
Zachary Amsden
6cb9a8350a [PATCH] vmi: sched clock paravirt op fix
The custom_sched_clock hook is broken.  The result from sched_clock needs to
be in nanoseconds, not in CPU cycles.  The TSC is insufficient for this
purpose, because TSC is poorly defined in a virtual environment, and mostly
represents real world time instead of scheduled process time (which can be
interrupted without notice when a virtual machine is descheduled).

To make the scheduler consistent, we must expose a different nature of time,
that is scheduled time.  So deprecate this custom_sched_clock hack and turn it
into a paravirt-op, as it should have been all along.  This allows the tsc.c
code which converts cycles to nanoseconds to be shared by all paravirt-ops
backends.

It is unfortunate to add a new paravirt-op, but this is a very distinct
abstraction which is clearly different for all virtual machine
implementations, and it gets rid of an ugly indirect function which I
ashamedly admit I hacked in to try to get this to work earlier, and then even
got in the wrong units.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-05 07:57:52 -08:00
Con Kolivas
69f7c0a1be [PATCH] sched: remove SMT nice
Remove the SMT-nice feature which idles sibling cpus on SMT cpus to
facilitiate nice working properly where cpu power is shared.  The idling of
cpus in the presence of runnable tasks is considered too fragile, easy to
break with outside code, and the complexity of managing this system if an
architecture comes along with many logical cores sharing cpu power will be
unworkable.

Remove the associated per_cpu_gain variable in sched_domains used only by
this code.

Also:

  The reason is that with dynticks enabled, this code breaks without yet
  further tweaks so dynticks brought on the rapid demise of this code.  So
  either we tweak this code or kill it off entirely.  It was Ingo's preference
  to kill it off.  Either way this needs to happen for 2.6.21 since dynticks
  has gone in.

Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-05 07:57:51 -08:00
Jean Delvare
58a53b246b [PATCH] io_apic.h needs apicdef.h
A -mm patch caused:

In file included from drivers/pci/quirks.c:532:
include/asm/io_apic.h:61: error: "MAX_IO_APICS" undeclared here (not in a function)

So let's include the needed header.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-05 07:57:50 -08:00
Linus Torvalds
6f8c480f99 Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] constify some data tables.
  [CPUFREQ] constify cpufreq_driver where possible.
  {rd,wr}msr_on_cpu SMP=n optimization
  [CPUFREQ] cpufreq_ondemand.c: don't use _WORK_NAR
  rdmsr_on_cpu, wrmsr_on_cpu
  [CPUFREQ] Revert default on deprecated config X86_SPEEDSTEP_CENTRINO_ACPI
2007-02-26 14:17:50 -08:00
Linus Torvalds
ea3d5226f5 Revert "[PATCH] i386: add idle notifier"
This reverts commit 2ff2d3d747.

Uwe Bugla reports that he cannot mount a floppy drive any more, and Jiri
Slaby bisected it down to this commit.

Benjamin LaHaise also points out that this is a big hot-path, and that
interrupt delivery while idle is very common and should not go through
all these expensive gyrations.

Fix up conflicts in arch/i386/kernel/apic.c and arch/i386/kernel/irq.c
due to other unrelated irq changes.

Cc: Stephane Eranian <eranian@hpl.hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Uwe Bugla <uwe.bugla@gmx.de>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-26 09:21:46 -08:00
Adrian Bunk
b44755cfaa {rd,wr}msr_on_cpu SMP=n optimization
Let's save a few bytes in the CONFIG_SMP=n case.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-02-20 14:29:37 -05:00
Alexey Dobriyan
b077ffb3b7 rdmsr_on_cpu, wrmsr_on_cpu
There was OpenVZ specific bug rendering some cpufreq drivers unusable on SMP.
In short, when cpufreq code thinks it confined itself to needed cpu by means
of set_cpus_allowed() to execute rdmsr, some "virtual cpu" feature can migrate
process to anywhere.  This triggers bugons and does wrong things in general.

This got fixed by introducing rdmsr_on_cpu and wrmsr_on_cpu executing rdmsr
and wrmsr on given physical cpu by means of smp_call_function_single().

Dave Jones mentioned cpufreq might be not only user of rdmsr_on_cpu() and
wrmsr_on_cpu(), so I'm putting them into arch/{i386,x86_64}/lib/ .

Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-02-20 14:23:43 -05:00
Len Brown
81450b73dd Pull misc-for-upstream into release branch
Conflicts:

	drivers/usb/misc/appledisplay.c

Signed-off-by: Len Brown <len.brown@intel.com>
2007-02-16 18:52:41 -05:00
Thomas Gleixner
d36b49b910 [PATCH] i386 rework local apic timer calibration
The local apic timer calibration has two problem cases:

1.  The calibration is based on readout of the PIT/HPET timer to detect the
   wrap of the periodic tick.  It happens that a box gets stuck in the
   calibration loop due to a PIT with a broken readout function.

2.  CoreDuo boxen show a sporadic PIT runs too slow defect, which results
   in a wrong lapic calibration.  The PIT goes back to normal operation once
   the lapic timer is switched to periodic mode.

Both are existing and unfixed problems in the current upstream kernel and
prevent certain laptops and other systems from booting Linux.

Rework the code to address both problems:

- Make the calibration interrupt driven.  This removes the wait_timer_tick
  magic hackery from lapic.c and time_hpet.c.  The clockevents framework
  allows easy substitution of the global tick event handler for the
  calibration.  This is more accurate than monitoring jiffies.  At this point
  of the boot process, nothing disturbes the interrupt delivery, so the
  results are very accurate.

- Verify the calibration against the PM timer, when available by using the
  early access function.  When the measured calibration period is outside of
  an one percent window, then the lapic timer calibration is adjusted to the
  pm timer result.

- Verify the calibration by running the lapic timer with the calibration
  handler.  Disable lapic timer in case of deviation.

This also removes the "synchronization" of the local apic timer to the global
tick.  This synchronization never worked, as there is no way to synchronize
PIT(HPET) and local APIC timer.  The synchronization by waiting for the tick
just alignes the local APIC timer for the first events, but later the events
drift away due to the different clocks.  Removing the "sync" is just
randomizing the asynchronous behaviour at setup time.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rohit Seth <rohitseth@google.com>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16 08:13:59 -08:00
Thomas Gleixner
e9e2cdb412 [PATCH] clockevents: i386 drivers
Add clockevent drivers for i386: lapic (local) and PIT/HPET (global).  Update
the timer IRQ to call into the PIT/HPET driver's event handler and the
lapic-timer IRQ to call into the lapic clockevent driver.  The assignement of
timer functionality is delegated to the core framework code and replaces the
compile and runtime evalution in do_timer_interrupt_hook()

Use the clockevents broadcast support and implement the lapic_broadcast
function for ACPI.

No changes to existing functionality.

[ kdump fix from Vivek Goyal <vgoyal@in.ibm.com> ]
[ fixes based on review feedback from Arjan van de Ven <arjan@infradead.org> ]
Cleanups-from: Adrian Bunk <bunk@stusta.de>
Build-fixes-from: Andrew Morton <akpm@osdl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16 08:13:59 -08:00
Thomas Gleixner
e05d723f98 [PATCH] i386, apic: clean up the APIC code
The apic code is quite unstructured and missing a lot of comments.

- Restructure the code into helper functions, timer, setup/shutdown,
  interrupt and power management blocks.
- Fixup comments.
- Namespace fixups
- Inline helpers for version and is_integrated
- Combine the ack_bad_irq functions

No functional changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rohit Seth <rohitseth@google.com>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16 08:13:58 -08:00
Marcelo Tosatti
07190a08ee [PATCH] Mark TSC on GeodeLX reliable
The Geode can safely use the TSC for highres, since:

1) Does not support frequency scaling,

2) The TSC _does_ count when the CPU is halted.  Furthermore, the Geode
   supports a mode called "suspension on halt", where Suspend mode (which
   interacts with the power management states) is entered.  TSC counting
   during suspend mode is controlled by bit 8 of the Bus Controller
   Configuration Register #0 (thanks Tom!).

3) no SMP :)

Check if "RTSC counts during suspension" and remove the requirement for
verification, so the clocksource code can safely select it as an timesource
for the highres timers subsystem.

Signed-off-by: Marcelo Tosatti <marcelo@kvack.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16 08:13:58 -08:00
Ingo Molnar
95492e4646 [PATCH] x86: rewrite SMP TSC sync code
make the TSC synchronization code more robust, and unify it between x86_64 and
i386.

The biggest change is the removal of the 'fix up TSCs' code on x86_64 and
i386, in some rare cases it was /causing/ time-warps on SMP systems.

The new code only checks for TSC asynchronity - and if it can prove a
time-warp (if it can observe the TSC going backwards when going from one CPU
to another within a critical section), then the TSC clock-source is turned
off.

The TSC synchronization-checking code also got moved into a separate file.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16 08:13:57 -08:00
Rusty Russell
40d22c1b56 [PATCH] i386: Remove extern declaration from mm/discontig.c, put in header.
Extern declarations belong in headers.  Times, they are a'changin.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>

===================================================================
2007-02-13 13:26:26 +01:00
Rusty Russell
2a57ff1a70 [PATCH] i386: Rename cpu_gdt_descr and remove extern declaration from smpboot.c
When I implemented the DECLARE_PER_CPU(var) macros, I was careful that
people couldn't use "var" in a non-percpu context, by prepending
percpu__.  I never considered that this would allow them to overload
the same name for a per-cpu and a non-percpu variable.

It is only one of many horrors in the i386 boot code, but let's rename
the non-perpcu cpu_gdt_descr to early_gdt_descr (not boot_gdt_descr,
that's something else...)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>

===================================================================
2007-02-13 13:26:26 +01:00
Rusty Russell
105fddb862 [PATCH] i386: Move mce_disabled to asm/mce.h
Allows external actors to disable mce.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>

===================================================================
2007-02-13 13:26:26 +01:00
Andi Kleen
1a1eecd1c2 [PATCH] i386: Remove fastcall in paravirt.[ch]
Not needed because fastcall is always default now

Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:25 +01:00
Ingo Molnar
f9690982b8 [PATCH] i386: improve sched_clock() on i686
Clean up sched_clock() on i686: it will use the TSC if available and falls
back to jiffies only if the user asked for it to be disabled via notsc or
the CPU calibration code didnt figure out the right cpu_khz.

This generally makes the scheduler timestamps more finegrained, on all
hardware.  (the current scheduler is pretty resistant against asynchronous
sched_clock() values on different CPUs, it will allow at most up to a jiffy
of jitter.)

Also simplify sched_clock()'s check for TSC availability: propagate the
desire and ability to use the TSC into the tsc_disable flag, previously
this flag only indicated whether the notsc option was passed.  This makes
the rare low-res sched_clock() codepath a single branch off a read-mostly
flag.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:22 +01:00
Stephane Eranian
2ff2d3d747 [PATCH] i386: add idle notifier
Add a notifier mechanism to the low level idle loop.  You can register a
callback function which gets invoked on entry and exit from the low level idle
loop.  The low level idle loop is defined as the polling loop, low-power call,
or the mwait instruction.  Interrupts processed by the idle thread are not
considered part of the low level loop.

The notifier can be used to measure precisely how much is spent in useless
execution (or low power mode).  The perfmon subsystem uses it to turn on/off
monitoring.

Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:22 +01:00
Zachary Amsden
7b35520243 [PATCH] i386: Profile pc badness
Profile_pc was broken when using paravirtualization because the
assumption the kernel was running at CPL 0 was violated, causing
bad logic to read a random value off the stack.

The only way to be in kernel lock functions is to be in kernel
code, so validate that assumption explicitly by checking the CS
value.  We don't want to be fooled by BIOS / APM segments and
try to read those stacks, so only match KERNEL_CS.

I moved some stuff in segment.h to make it prettier.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:21 +01:00
Zachary Amsden
bbab4f3bb7 [PATCH] i386: vMI timer patches
VMI timer code.  It works by taking over the local APIC clock when APIC is
configured, which requires a couple hooks into the APIC code.  The backend
timer code could be commonized into the timer infrastructure, but there are
some pieces missing (stolen time, in particular), and the exact semantics of
when to do accounting for NO_IDLE need to be shared between different
hypervisors as well.  So for now, VMI timer is a separate module.

[Adrian Bunk: cleanups]

Subject: VMI timer patches
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-02-13 13:26:21 +01:00
Zachary Amsden
7ce0bcfd16 [PATCH] i386: vMI backend for paravirt-ops
Fairly straightforward implementation of VMI backend for paravirt-ops.

[Adrian Bunk: some cleanups]

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-02-13 13:26:21 +01:00
Zachary Amsden
ae5da273fe [PATCH] i386: SMP boot hook for paravirt
Add VMI SMP boot hook.  We emulate a regular boot sequence and use the same
APIC IPI initiation, we just poke magic values to load into the CPU state when
the startup IPI is received, rather than having to jump through a real mode
trampoline.

This is all that was needed to get SMP to work.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-02-13 13:26:21 +01:00
Zachary Amsden
9226d125d9 [PATCH] i386: paravirt CPU hypercall batching mode
The VMI ROM has a mode where hypercalls can be queued and batched.  This turns
out to be a significant win during context switch, but must be done at a
specific point before side effects to CPU state are visible to subsequent
instructions.  This is similar to the MMU batching hooks already provided.
The same hooks could be used by the Xen backend to implement a context switch
multicall.

To explain a bit more about lazy modes in the paravirt patches, basically, the
idea is that only one of lazy CPU or MMU mode can be active at any given time.
 Lazy MMU mode is similar to this lazy CPU mode, and allows for batching of
multiple PTE updates (say, inside a remap loop), but to avoid keeping some
kind of state machine about when to flush cpu or mmu updates, we just allow
one or the other to be active.  Although there is no real reason a more
comprehensive scheme could not be implemented, there is also no demonstrated
need for this extra complexity.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-02-13 13:26:21 +01:00
Zachary Amsden
c119ecce89 [PATCH] MM: page allocation hooks for VMI backend
The VMI backend uses explicit page type notification to track shadow page
tables.  The allocation of page table roots is especially tricky.  We need to
clone the root for non-PAE mode while it is protected under the pgd lock to
correctly copy the shadow.

We don't need to allocate pgds in PAE mode, (PDPs in Intel terminology) as
they only have 4 entries, and are cached entirely by the processor, which
makes shadowing them rather simple.

For base page table level allocation, pmd_populate provides the exact hook
point we need.  Also, we need to allocate pages when splitting a large page,
and we must release pages before returning the page to any free pool.

Despite being required with these slightly odd semantics for VMI, Xen also
uses these hooks to determine the exact moment when page tables are created or
released.

AK: All nops for other architectures

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-02-13 13:26:21 +01:00
Jeremy Fitzhardinge
464d1a78fb [PATCH] i386: Convert i386 PDA code to use %fs
Convert the PDA code to use %fs rather than %gs as the segment for
per-processor data.  This is because some processors show a small but
measurable performance gain for reloading a NULL segment selector (as %fs
generally is in user-space) versus a non-NULL one (as %gs generally is).

On modern processors the difference is very small, perhaps undetectable.
Some old AMD "K6 3D+" processors are noticably slower when %fs is used
rather than %gs; I have no idea why this might be, but I think they're
sufficiently rare that it doesn't matter much.

This patch also fixes the math emulator, which had not been adjusted to
match the changed struct pt_regs.

[frederik.deweerdt@gmail.com: fixit with gdb]
[mingo@elte.hu: Fix KVM too]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Ian Campbell <Ian.Campbell@XenSource.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Zachary Amsden <zach@vmware.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-02-13 13:26:20 +01:00
Rusty Russell
a795ca5852 ACPI: cleanup: make disable_acpi() valid w/o CONFIG_ACPI
Len Brown <lenb@kernel.org> said:
> Okay, but better to use disable_acpi()
> indeed, since this would be the first code not already inside CONFIG_ACPI
> to invoke disable_acpi(), we could define the inline as empty and you could
> then scratch the #ifdef too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Len Brown <len.brown@intel.com>
2007-02-13 00:09:13 -05:00
Alon Bar-Lev
7bf9f974fb [PATCH] i386: 2048-byte command line
Current implementation allows the kernel to receive up to 255 characters from
the bootloader.  While the boot protocol allows greater buffers to be sent.

In current environment, the command-line is used in order to specify many
values, including suspend/resume, module arguments, splash, initramfs and
more.

255 characters are not enough anymore.

After edd issue was fixed, and dynammic kernel command-line patch was
accepted, we can extend the COMMAND_LINE_SIZE without runtime memory
requirements.

Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:39 -08:00
Robert P. J. Day
72fd4a35a8 [PATCH] Numerous fixes to kernel-doc info in source files.
A variety of (mostly) innocuous fixes to the embedded kernel-doc content in
source files, including:

  * make multi-line initial descriptions single line
  * denote some function names, constants and structs as such
  * change erroneous opening '/*' to '/**' in a few places
  * reword some text for clarity

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:32 -08:00
Tilman Schmidt
16cf5b39b8 [PATCH] fix sparse warnings from {asm,net}/checksum.h
Rename the variable "sum" in the __range_ok macros to avoid name collisions
causing lots of "symbol shadows an earlier one" warnings by sparse.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Andi Kleen <ak@suse.de>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Acked-by: Ian Molton <spyro@f2s.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:31 -08:00
Tilman Schmidt
4564f9e5fd [PATCH] consolidate line discipline number definitions
The line discipline numbers N_* are currently defined for each architecture
individually, but (except for a seeming mistake) identically, in
asm/termios.h.  There is no obvious reason why these numbers should be
architecture specific, nor any apparent relationship with the termios
structure.  The total number of these, NR_LDISCS, is defined in linux/tty.h
anyway.  So I propose the following patch which moves the definitions of
the individual line disciplines to linux/tty.h too.

Three of these numbers (N_MASC, N_PROFIBUS_FDL, and N_SMSBLOCK) are unused
in the current kernel, but the patch still keeps the complete set in case
there are plans to use them yet.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:26 -08:00
Al Viro
4ec031166f [PATCH] kill eth_io_copy_and_sum()
On all targets that sucker boils down to memcpy_fromio(sbk->data, from, len).
The function name is highly misguiding (it _never_ does any checksums), the
last argument is just a noise and simply expanding the call to memcpy_fromio()
gives shorter and more readable source.  For a lot of reasons it has almost
no remaining users, so it's better to just outright kill it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-09 09:14:07 -08:00
Alexey Starikovskiy
f18c5a08bf ACPICA: Allow ACPI id to be u32 instead of u8.
Allow ACPI id to be u32 instead of u8.
Requires drop of conversion tables with the acpiid as index.

Signed-off-by: Len Brown <len.brown@intel.com>
2007-02-02 21:14:31 -05:00
Alexey Starikovskiy
15a58ed121 ACPICA: Remove duplicate table definitions (non-conflicting), cont
Signed-off-by: Len Brown <len.brown@intel.com>
2007-02-02 21:14:29 -05:00
Alexey Starikovskiy
ad71860a17 ACPICA: minimal patch to integrate new tables into Linux
Signed-off-by: Len Brown <len.brown@intel.com>
2007-02-02 21:14:22 -05:00
Roland McGrath
f47aef55d9 [PATCH] i386 vDSO: use VM_ALWAYSDUMP
This patch fixes core dumps to include the vDSO vma, which is left out now.
It removes the special-case core writing macros, which were not doing the
right thing for the vDSO vma anyway.  Instead, it uses VM_ALWAYSDUMP in the
vma; there is no need for the fixmap page to be installed.  It handles the
CONFIG_COMPAT_VDSO case by making elf_core_dump use the fake vma from
get_gate_vma after real vmas in the same way the /proc/PID/maps code does.

This changes core dumps so they no longer include the non-PT_LOAD phdrs from
the vDSO.  I made the change to add them in the first place, but in turned out
that nothing ever wanted them there since the advent of NT_AUXV.  It's cleaner
to leave them out, and just let the phdrs inside the vDSO image speak for
themselves.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:50:58 -08:00
Roland McGrath
a1f3bb9ae4 [PATCH] Fix CONFIG_COMPAT_VDSO
I wouldn't mind if CONFIG_COMPAT_VDSO went away entirely.  But if it's there,
it should work properly.  Currently it's quite haphazard: both real vma and
fixmap are mapped, both are put in the two different AT_* slots, sysenter
returns to the vma address rather than the fixmap address, and core dumps yet
are another story.

This patch makes CONFIG_COMPAT_VDSO disable the real vma and use the fixmap
area consistently.  This makes it actually compatible with what the old vdso
implementation did.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:50:58 -08:00
James Bottomley
9ee79a3d37 [PATCH] x86: fix PDA variables to work during boot
The current PDA code, which went in in post 2.6.19 has a flaw in that it
doesn't correctly cycle the GDT and %GS segment through the boot PDA,
the CPU PDA and finally the per-cpu PDA.

The bug generally doesn't show up if the boot CPU id is zero, but
everything falls apart for a non zero boot CPU id.  The basically kills
voyager which is perfectly capable of doing non zero CPU id boots, so
voyager currently won't boot without this.

The fix is to be careful and actually do the GDT setups correctly.

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-22 19:39:36 -08:00
Vivek Goyal
dd0ec16fa6 [PATCH] i386: Restore CONFIG_PHYSICAL_START option
o Relocatable bzImage support had got rid of CONFIG_PHYSICAL_START option
  thinking that now this option is not required as people can build a
  second kernel as relocatable and load it anywhere. So need of compiling
  the kernel for a custom address was gone. But Magnus uses vmlinux images
  for second kernel in Xen environment and he wants to continue to use
  it.

o Restoring the CONFIG_PHYSICAL_START option for the time being. I think
  down the line we can get rid of it.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:23 -08:00
Linus Torvalds
18ed1c0513 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (68 commits)
  ACPI: replace kmalloc+memset with kzalloc
  ACPI: Add support for acpi_load_table/acpi_unload_table_id
  fbdev: update after backlight argument change
  ACPI: video: Add dev argument for backlight_device_register
  ACPI: Implement acpi_video_get_next_level()
  ACPI: Kconfig - depend on PM rather than selecting it
  ACPI: fix NULL check in drivers/acpi/osl.c
  ACPI: make drivers/acpi/ec.c:ec_ecdt static
  ACPI: prevent processor module from loading on failures
  ACPI: fix single linked list manipulation
  ACPI: ibm_acpi: allow clean removal
  ACPI: fix git automerge failure
  ACPI: ibm_acpi: respond to workqueue update
  ACPI: dock: add uevent to indicate change in device status
  ACPI: ec: Lindent once again
  ACPI: ec: Change #define to enums there possible.
  ACPI: ec: Style changes.
  ACPI: ec: Acquire Global Lock under EC mutex.
  ACPI: ec: Drop udelay() from poll mode. Loop by reading status field instead.
  ACPI: ec: Rename gpe_bit to gpe
  ...
2006-12-22 18:46:56 -08:00
Yasunori Goto
5c95da9f5a [PATCH] compile error of register_memory()
register_memory() becomes double definition in 2.6.20-rc1.  It is defined
in arch/i386/kernel/setup.c as static definition in 2.6.19.  But it is
moved to arch/i386/kernel/e820.c in 2.6.20-rc1.  And same name function is
defined in driver/base/memory.c too.  So, it becomes cause of compile error
of duplicate definition if memory hotplug option is on.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22 08:55:49 -08:00
Len Brown
9774f33841 merge linus into test branch 2006-12-20 02:53:13 -05:00
Len Brown
463e7c7cf9 Pull trivial into test branch
Conflicts:

	drivers/acpi/ec.c
2006-12-16 00:45:07 -05:00
Linus Torvalds
d1526e2cda Remove stack unwinder for now
It has caused more problems than it ever really solved, and is
apparently not getting cleaned up and fixed.  We can put it back when
it's stable and isn't likely to make warning or bug events worse.

In the meantime, enable frame pointers for more readable stack traces.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-15 08:47:51 -08:00
Ralf Baechle
ec8c0446b6 [PATCH] Optimize D-cache alias handling on fork
Virtually index, physically tagged cache architectures can get away
without cache flushing when forking.  This patch adds a new cache
flushing function flush_cache_dup_mm(struct mm_struct *) which for the
moment I've implemented to do the same thing on all architectures
except on MIPS where it's a no-op.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:27:08 -08:00
Rafael J. Wysocki
8a102eed9c [PATCH] PM: Fix SMP races in the freezer
Currently, to tell a task that it should go to the refrigerator, we set the
PF_FREEZE flag for it and send a fake signal to it.  Unfortunately there
are two SMP-related problems with this approach.  First, a task running on
another CPU may be updating its flags while the freezer attempts to set
PF_FREEZE for it and this may leave the task's flags in an inconsistent
state.  Second, there is a potential race between freeze_process() and
refrigerator() in which freeze_process() running on one CPU is reading a
task's PF_FREEZE flag while refrigerator() running on another CPU has just
set PF_FROZEN for the same task and attempts to reset PF_FREEZE for it.  If
the refrigerator wins the race, freeze_process() will state that PF_FREEZE
hasn't been set for the task and will set it unnecessarily, so the task
will go to the refrigerator once again after it's been thawed.

To solve first of these problems we need to stop using PF_FREEZE to tell
tasks that they should go to the refrigerator.  Instead, we can introduce a
special TIF_*** flag and use it for this purpose, since it is allowed to
change the other tasks' TIF_*** flags and there are special calls for it.

To avoid the freeze_process()-refrigerator() race we can make
freeze_process() to always check the task's PF_FROZEN flag after it's read
its "freeze" flag.  We should also make sure that refrigerator() will
always reset the task's "freeze" flag after it's set PF_FROZEN for it.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:49 -08:00
Dave Jones
c4366889dd Merge ../linus
Conflicts:

	drivers/cpufreq/cpufreq.c
2006-12-12 17:41:41 -05:00
Christoph Lameter
08c183f31b [PATCH] sched: add option to serialize load balancing
Large sched domains can be very expensive to scan.  Add an option SD_SERIALIZE
to the sched domain flags.  If that flag is set then we make sure that no
other such domain is being balanced.

[akpm@osdl.org: build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:43 -08:00
Alan Cox
b148900996 [PATCH] ide: more conversion to pci_get APIs
This completes IDE except for one use which requires a new core PCI function
and will be polished up at the end

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:29:03 -08:00
Alan Cox
be90038a24 [PATCH] tty: preparatory structures for termios revamp
In order to sort out our struct termios and add proper speed control we need
to separate the kernel and user termios structures.  Glibc is fine but the
other libraries rely on the kernel exported struct termios and we need to
extend this without breaking the ABI/API

To do so we add a struct ktermios which is the kernel view of a termios
structure and overlaps the struct termios with extra fields on the end for
now.  (That limitation will go away in later patches).  Some platforms (eg
alpha) planned ahead and thus use the same struct for both, others did not.

This just adds the structures but does not use them, it seems a sensible
splitting point for bisect if there are compile failures (not that I expect
them)

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:56 -08:00
Jeremy Fitzhardinge
91768d6c2b [PATCH] Generic BUG for i386
This makes i386 use the generic BUG machinery.  There are no functional
changes from the old i386 implementation.

The main advantage in using the generic BUG machinery for i386 is that the
inlined overhead of BUG is just the ud2a instruction; the file+line(+function)
information are no longer inlined into the instruction stream.  This reduces
cache pollution, and makes disassembly work properly.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Hugh Dickens <hugh@veritas.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:39 -08:00
Linus Torvalds
4522d58275 Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (156 commits)
  [PATCH] x86-64: Export smp_call_function_single
  [PATCH] i386: Clean up smp_tune_scheduling()
  [PATCH] unwinder: move .eh_frame to RODATA
  [PATCH] unwinder: fully support linker generated .eh_frame_hdr section
  [PATCH] x86-64: don't use set_irq_regs()
  [PATCH] x86-64: check vector in setup_ioapic_dest to verify if need setup_IO_APIC_irq
  [PATCH] x86-64: Make ix86 default to HIGHMEM4G instead of NOHIGHMEM
  [PATCH] i386: replace kmalloc+memset with kzalloc
  [PATCH] x86-64: remove remaining pc98 code
  [PATCH] x86-64: remove unused variable
  [PATCH] x86-64: Fix constraints in atomic_add_return()
  [PATCH] x86-64: fix asm constraints in i386 atomic_add_return
  [PATCH] x86-64: Correct documentation for bzImage protocol v2.05
  [PATCH] x86-64: replace kmalloc+memset with kzalloc in MTRR code
  [PATCH] x86-64: Fix numaq build error
  [PATCH] x86-64: include/asm-x86_64/cpufeature.h isn't a userspace header
  [PATCH] unwinder: Add debugging output to the Dwarf2 unwinder
  [PATCH] x86-64: Clarify error message in GART code
  [PATCH] x86-64: Fix interrupt race in idle callback (3rd try)
  [PATCH] x86-64: Remove unwind stack pointer alignment forcing again
  ...

Fixed conflict in include/linux/uaccess.h manually

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:59:11 -08:00
Adrian Bunk
7d1362c0d0 [PATCH] cleanup asm/setup.h userspace visibility
Make the contents of the userspace asm/setup.h header consistent on all
architectures:

 - export setup.h to userspace on all architectures
 - export only COMMAND_LINE_SIZE to userspace
 - frv: move COMMAND_LINE_SIZE from param.h
 - i386: remove duplicate COMMAND_LINE_SIZE from param.h
 - arm:
   - export ATAGs to userspace
   - change u8/u16/u32 to __u8/__u16/__u32

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:46 -08:00
Ralf Baechle
d3fa72e455 [PATCH] Pass struct dev pointer to dma_cache_sync()
Pass struct dev pointer to dma_cache_sync()

dma_cache_sync() is ill-designed in that it does not have a struct device
pointer argument which makes proper support for systems that consist of a
mix of coherent and non-coherent DMA devices hard.  Change dma_cache_sync
to take a struct device pointer as first argument and fix all its callers
to pass it.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:41 -08:00
Ralf Baechle
f67637ee4b [PATCH] Add struct dev pointer to dma_is_consistent()
dma_is_consistent() is ill-designed in that it does not have a struct
device pointer argument which makes proper support for systems that consist
of a mix of coherent and non-coherent DMA devices hard.  Change
dma_is_consistent to take a struct device pointer as first argument and fix
the sole caller to pass it.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:41 -08:00
Arnd Bergmann
f5738ceed4 [PATCH] remove kernel syscalls
The last thing we agreed on was to remove the macros entirely for 2.6.19,
on all architectures. Unfortunately, I think nobody actually _did_ that,
so they are still there.

[akpm@osdl.org: x86_64 fix]
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Schafer <gschafer@zip.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:37 -08:00
Peter Zijlstra
6cfd76a26d [PATCH] lockdep: name some old style locks
Name some of the remaning 'old_style_spin_init' locks

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:36 -08:00
Rafael J. Wysocki
2d4a34c936 [PATCH] swsusp: Support i386 systems with PAE or without PSE
Make swsusp support i386 systems with PAE or without PSE.

This is done by creating temporary page tables located in resume-safe page
frames before the suspend image is restored in the same way as x86_64 does
it.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andi Kleen <ak@suse.de>
Cc: Dave Jones <davej@redhat.com>
Cc: Nigel Cunningham <ncunningham@linuxmail.org>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:28 -08:00
Christoph Lameter
e18b890bb0 [PATCH] slab: remove kmem_cache_t
Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

	#!/bin/sh
	#
	# Replace one string by another in all the kernel sources.
	#

	set -e

	for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
		quilt add $file
		sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
		mv /tmp/$$ $file
		quilt refresh
	done

The script was run like this

	sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:25 -08:00
Andy Whitcroft
4af2bfc120 [PATCH] silence unused pgdat warning from alloc_bootmem_node and friends
x86 NUMA systems only define bootmem for node 0.  alloc_bootmem_node() and
friends therefore ignore the passed pgdat and use NODE_DATA(0) in all
cases.  This leads to the following warnings as we are not using the passed
parameter:

  .../mm/page_alloc.c: In function 'zone_wait_table_init':
  .../mm/page_alloc.c:2259: warning: unused variable 'pgdat'

One option would be to define all variables used with these macros
__attribute__ ((unused)), but this would leave us exposed should these
become genuinely unused.

The key here is that we _are_ using the value, we ignore it but that is a
deliberate action.  This patch adds a nested local variable within the
alloc_bootmem_node helper to which the pgdat parameter is assigned making
it 'used'.  The nested local is marked __attribute__ ((unused)) to silence
this same warning for it.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:23 -08:00
Peter Zijlstra
a866374aec [PATCH] mm: pagefault_{disable,enable}()
Introduce pagefault_{disable,enable}() and use these where previously we did
manual preempt increments/decrements to make the pagefault handler do the
atomic thing.

Currently they still rely on the increased preempt count, but do not rely on
the disabled preemption, this might go away in the future.

(NOTE: the extra barrier() in pagefault_disable might fix some holes on
       machines which have too many registers for their own good)

[heiko.carstens@de.ibm.com: s390 fix]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:21 -08:00
Burman Yan
116780fc04 [PATCH] i386: replace kmalloc+memset with kzalloc
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:19 +01:00
Adrian Bunk
d7fb027128 [PATCH] x86-64: remove remaining pc98 code
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:19 +01:00
Duncan Sands
e4b522d7ef [PATCH] x86-64: fix asm constraints in i386 atomic_add_return
Since v->counter is both read and written, it should be an output as well
as an input for the asm.  The current code only gets away with this because
counter is volatile.  Also, according to Documents/atomic_ops.txt,
atomic_add_return should provide a memory barrier, in particular a compiler
barrier, so the asm should be marked as clobbering memory.

Test case:

#include <stdio.h>

typedef struct { int counter; } atomic_t; /* NB: no "volatile" */

#define ATOMIC_INIT(i)	{ (i) }

#define atomic_read(v)		((v)->counter)

static __inline__ int atomic_add_return(int i, atomic_t *v)
{
	int __i = i;

	__asm__ __volatile__(
		"lock; xaddl %0, %1;"
		:"=r"(i)
		:"m"(v->counter), "0"(i));
/*	__asm__ __volatile__(
		"lock; xaddl %0, %1"
		:"+r" (i), "+m" (v->counter)
		: : "memory"); */
	return i + __i;
}

int main (void) {
	atomic_t a = ATOMIC_INIT(0);
	int x;

	x = atomic_add_return (1, &a);
	if ((x!=1) || (atomic_read(&a)!=1))
		printf("fail: %i, %i\n", x, atomic_read(&a));
}

Signed-off-by: Duncan Sands <baldrick@free.fr>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:13 +01:00
Jan Beulich
359ad0d401 [PATCH] unwinder: more sanity checks in Dwarf2 unwinder
Tighten the requirements on both input to and output from the Dwarf2
unwinder.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:13 +01:00
Adrian Bunk
a1a70c25be [PATCH] i386: always enable regparm
-mregparm=3 has been enabled by default for some time on i386, and AFAIK
there aren't any problems with it left.

This patch removes the REGPARM config option and sets -mregparm=3
unconditionally.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:12 +01:00
Stephane Eranian
538f188e03 [PATCH] i386: i386 add Intel BTS cpufeature bit and detection (take 2)
Here is a small patch for i386 which adds a cpufeature flag and
detection code for Intel's Branch Trace Store (BTS) feature. This
feature can be found on Intel P4 and Core 2 processors among others.
It can also be used by perfmon.

changelog:
	- add CPU_FEATURE_BTS
	- add Branch Trace Store detection

signed-off-by: stephane eranian <eranian@hpl.hp.com>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:11 +01:00
Siddha, Suresh B
b0d0a4ba45 [PATCH] x86: fix the irqbalance quirk for E7320/E7520/E7525
Move the irqbalance quirks for E7320/E7520/E7525(Errata 23 in
http://download.intel.com/design/chipsets/specupdt/30304203.pdf) to early
quirks.

And add a PCI quirk for these platforms to check(which happens very late
during the boot) if the APIC routing is indeed set to default flat mode.

This fixes the breakage(in x86_64) of this quirk due to cpu hotplug which
selects physical mode instead of the logical flat(as needed for this errata
workaround).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:10 +01:00
Siddha, Suresh B
fd6d7d2689 [PATCH] i386: introduce the mechanism of disabling cpu hotplug control
Add 'enable_cpu_hotplug' flag and when cleared, the hotplug control file
("online") will not be added under /sys/devices/system/cpu/cpuX/

Next patch doing PCI quirks will use this.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:10 +01:00
Andi Kleen
c55d92d141 [PATCH] i386: Add support for compilation for Core2
gcc doesn't support -mtune=core2 yet, but will be soon. Use -mtune=generic or -mtune=i686
as fallback

TBD need benchmarking for INTEL_USERCOPY etc. So far I used the same defaults as MPENTIUMM

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:09 +01:00
Zachary Amsden
8ecb895069 [PATCH] paravirt: fix missing pte update
The function ptep_get_and_clear uses an atomic instruction sequence to get and
clear an active pte.  Rather than add such an atomic operator to all virtual
machine implementations in paravirt-ops, it is easier to support the raw
atomic sequence and use either a trapping writable pagetable approach, or a
post-update notification.  For the post update notification, we require the
pte_update function to be called after the access.  Combine the 2-level and
3-level paging operators into one common function which does the post-update
notification, and rename the actual atomic sequences to raw_ptep_xxx
operators.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:09 +01:00
Zachary Amsden
dfbea0ad50 [PATCH] paravirt: fix parameter names in mmu operations
Make parameter names match function argument names for the yet to be defined
pte_update_defer accessor.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:08 +01:00
Zachary Amsden
a2952d8949 [PATCH] paravirt: Preparatory mmu header movement
Move header includes for the nopud / nopmd types to the location of the actual
pte / pgd type definitions.  This allows generic 4-level page type code to be
written before the split 2/3 level page table headers are included.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:08 +01:00
Rusty Russell
da181a8b39 [PATCH] paravirt: Add MMU virtualization to paravirt_ops
Add the three bare TLB accessor functions to paravirt-ops.  Most amusingly,
flush_tlb is redefined on SMP, so I can't call the paravirt op flush_tlb.
Instead, I chose to indicate the actual flush type, kernel (global) vs. user
(non-global).  Global in this sense means using the global bit in the page
table entry, which makes TLB entries persistent across CR3 reloads, not
global as in the SMP sense of invoking remote shootdowns, so the term is
confusingly overloaded.

AK: folded in fix from Zach for PAE compilation

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:08 +01:00
Rusty Russell
13623d7930 [PATCH] paravirt: Add APIC accessors to paravirt-ops.
Add APIC accessors to paravirt-ops.  Unfortunately, we need two write
functions, as some older broken hardware requires workarounds for
Pentium APIC errata - this is the purpose of apic_write_atomic.

AK: replaced __inline with inline

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:08 +01:00
Rusty Russell
4f205fd45a [PATCH] paravirt: Allow selected bug checks to be
Allow selected bug checks to be skipped by paravirt kernels.  The two most
important are the F00F workaround (which is either done by the hypervisor,
or not required), and the 'hlt' instruction check, which can break under
some hypervisors.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:08 +01:00
Rusty Russell
c9ccf30d77 [PATCH] paravirt: Add startup infrastructure for paravirtualization
1) Each hypervisor writes a probe function to detect whether we are
   running under that hypervisor.  paravirt_probe() registers this
   function.

2) If vmlinux is booted with ring != 0, we call all the probe
   functions (with registers except %esp intact) in link order: the
   winner will not return.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:08 +01:00