Commit Graph

244 Commits

Author SHA1 Message Date
Dave Hansen
e1474e2d9d [PATCH] re-disable TSC on NUMAQ
Somewhere recently, the TSC got re-enabled for timekeeping on NUMAQ
machines.  However, the hardware makes these get unsynchronized quite
badly.  So badly, in fact, that the code to fix up the skew can just hang
on boot.

This patch re-disables them.  It's nicely confined to the numaq.c file.  It
would be great if this could make it into 2.6.13, I think it counts as a
bugfix.

Tested on a 16-proc 4-node NUMAQ.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-28 21:46:05 -07:00
Andi Kleen
e2cac78935 [PATCH] x86_64: When running cpuid4 need to run on the correct CPU
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-28 21:46:01 -07:00
Dave Jones
7153d9612f powernow-k8.c: In function `query_current_values_with_pending_wait':
powernow-k8.c:110: warning: `hi' may be used uninitialized in this function

Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2005-07-28 09:45:10 -07:00
Dave Jones
841e40b380 Opteron revision F will support higher frequencies than
can be encoded in the current driver's 4 bit frequency
field.  This patch updates the driver to support Rev F
including 6 bit FIDs and processor ID updates.

This should apply cleanly whether or not the dual-core
bugfix I sent out last week is applied.  I'd prefer
that both get applied, of course.

Signed-off-by: David Keck <david.keck@amd.com>
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-07-28 09:40:04 -07:00
Dave Jones
03938c3f10 powernow-k8 requires that a data structure for
each core be created in the _cpu_init function
call.  The cpufreq infrastructure doesn't call
_cpu_init for the second core in each processor.
Some systems crashed when _get was called with
an odd-numbered core because it tried to
dereference a NULL pointer since the data
structure had not been created.

The attached patch solves the problem by
initializing data structures for all shared
cores in the _cpu_init function.  It should
apply to 2.6.12-rc6 and has been tested by
AMD and Sun.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-07-28 09:38:21 -07:00
Blaisorblade
71ae18ec69 [PATCH] sys_get_thread_area does not clear the returned argument
sys_get_thread_area does not memset to 0 its struct user_desc info before
copying it to user space...  since sizeof(struct user_desc) is 16 while the
actual datas which are filled are only 12 bytes + 9 bits (across the
bitfields), there is a (small) information leak.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:26:08 -07:00
Brian Gerst
7657e20e46 [PATCH] Fix warning in powernow-k8.c
powernow-k8.c: In function `query_current_values_with_pending_wait':
powernow-k8.c:110: warning: `hi' may be used uninitialized in this function

Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:25:54 -07:00
Eric W. Biederman
910de55c66 [PATCH] APM: Remove redundant call to set_cpus_allowed
machine_power_off now always switches to the boot cpu so there
is no reason for APM to also do that.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 14:35:45 -07:00
Eric W. Biederman
4fa2564a6f [PATCH] i386 machine_power_off cleanup
Call machine_shutdown() to move to the boot cpu
and disable apics.  Both acpi_power_off and
apm_power_off want to move to the boot cpu.
and we are already disabling the local apics
so calling machine_shutdown simply reuses
code.

ia64 doesn't have a special path in power_off
for efi so there is no reason i386 should.  If
we really need to call the efi power off path
the efi driver can set pm_power_off like everyone
else.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 14:35:44 -07:00
Eric W. Biederman
d8e392e7c8 [PATCH] machine_shutdown: Typo fix to actually allow specifying which cpu to reboot on
This appears to be a typo I introduced when cleaning
this code up earlier. Ooops.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 14:35:44 -07:00
Eric W. Biederman
4a1421f81b [PATCH] i386: Implement machine_emergency_reboot
set_cpus_allowed is not safe in interrupt context
and disabling apics is complicated code so don't
call machine_shutdown on i386 from emergency_restart().

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 14:35:42 -07:00
Eric W. Biederman
59586e5a26 [PATCH] Don't export machine_restart, machine_halt, or machine_power_off.
machine_restart, machine_halt and machine_power_off are machine
specific hooks deep into the reboot logic, that modules
have no business messing with.  Usually code should be calling
kernel_restart, kernel_halt, kernel_power_off, or
emergency_restart. So don't export machine_restart,
machine_halt, and machine_power_off so we can catch buggy users.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 14:35:42 -07:00
Linus Torvalds
72538d8565 Remove "noreplacement" kernel command line option.
It is no longer valid to not replace instructions, since we depend on
different behaviour depending on CPU capabilities.

If you need to limit the capabilities of the replacements (because the
boot CPU has features that non-boot CPU's do not have, for example), you
need to explicitly disable those capabilities that are not shared across
all CPU's.

For example, if your boot CPU has FXSR, but other CPU's in your system
do not, you need to use the "nofxsr" kernel command line, not disable
instruction replacement per se.
2005-07-22 18:29:40 -04:00
Linus Torvalds
8ed1383fb7 x86: make restore_fpu() use alternative assembler instructions
It's really just a single instruction, conditional on whether the CPU
supports FXSR or not, so implement it as such instead of making it a
function that queries FXSR dynamically.

This means that the instruction just gets automatically rewritten to the
correct one at boot-time.
2005-07-22 16:06:16 -04:00
Linus Torvalds
b339a18b81 Fix up incorrect "unlikely()" on %gs reload in x86 __switch_to
These days %gs is normally the TLS segment, so it's no longer zero.  As
a result, we shouldn't just assume that %fs/%gs tend to be zero
together, but test them independently instead.

Also, fix setting of debug registers to use the "next" pointer instead
of "current".  It so happens that the scheduler will have set the new
current pointer before calling __switch_to(), but that's just an
implementation detail.
2005-07-22 15:23:47 -04:00
Robert Love
0eeca28300 [PATCH] inotify
inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:

        * dnotify requires the opening of one fd per each directory
          that you intend to watch. This quickly results in too many
          open files and pins removable media, preventing unmount.
        * dnotify is directory-based. You only learn about changes to
          directories. Sure, a change to a file in a directory affects
          the directory, but you are then forced to keep a cache of
          stat structures.
        * dnotify's interface to user-space is awful.  Signals?

inotify provides a more usable, simple, powerful solution to file change
notification:

        * inotify's interface is a system call that returns a fd, not SIGIO.
	  You get a single fd, which is select()-able.
        * inotify has an event that says "the filesystem that the item
          you were watching is on was unmounted."
        * inotify can watch directories or files.

Inotify is currently used by Beagle (a desktop search infrastructure),
Gamin (a FAM replacement), and other projects.

See Documentation/filesystems/inotify.txt.

Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-12 20:38:38 -07:00
Len Brown
5028770a42 [ACPI] merge acpi-2.6.12 branch into latest Linux 2.6.13-rc...
Signed-off-by: Len Brown <len.brown@intel.com>
2005-07-12 17:21:56 -04:00
Venkatesh Pallipadi
02df8b9385 [ACPI] enable C2 and C3 idle power states on SMP
http://bugzilla.kernel.org/show_bug.cgi?id=4401

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2005-07-12 00:14:36 -04:00
Nickolai Zeldovich
9d9437759e [ACPI] S3 resume -- use lgdtl, not lgdt
From: Nickolai Zeldovich <kolya@MIT.EDU>
Signed-off-by: Len Brown <len.brown@intel.com>
2005-07-12 00:04:31 -04:00
Christoph Lameter
6c036527a6 [PATCH] mostly_read data section
Add a new section called ".data.read_mostly" for data items that are read
frequently and rarely written to like cpumaps etc.

If these maps are placed in the .data section then these frequenly read
items may end up in cachelines with data is is frequently updated.  In that
case all processors in an SMP system must needlessly reload the cachelines
again and again containing elements of those frequently used variables.

The ability to share these cachelines will allow each cpu in an SMP system
to keep local copies of those shared cachelines thereby optimizing
performance.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
Signed-off-by: Christoph Lameter <christoph@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07 18:23:46 -07:00
Shaohua Li
3b520b238e [PATCH] MTRR suspend/resume cleanup
There has been some discuss about solving the SMP MTRR suspend/resume
breakage, but I didn't find a patch for it.  This is an intent for it.  The
basic idea is moving mtrr initializing into cpu_identify for all APs (so it
works for cpu hotplug).  For BP, restore_processor_state is responsible for
restoring MTRR.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07 18:23:42 -07:00
Rusty Lynch
6772926bef [PATCH] kprobes: fix namespace problem and sparc64 build
The following renames arch_init, a kprobes function for performing any
architecture specific initialization, to arch_init_kprobes in order to
cleanup the namespace.

Also, this patch adds arch_init_kprobes to sparc64 to fix the sparc64 kprobes
build from the last return probe patch.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-05 19:19:00 -07:00
Greg Kroah-Hartman
7586585897 [PATCH] PCI: clean up dynamic pci id logic
The dynamic pci id logic has been bothering me for a while, and now that
I started to look into how to move some of this to the driver core, I
thought it was time to clean it all up.

It ends up making the code smaller, and easier to follow, and fixes a
few bugs at the same time (dynamic ids were not being matched
everywhere, and so could be missed on some call paths for new devices,
semaphore not needed to be grabbed when adding a new id and calling the
driver core, etc.)

I also renamed the function pci_match_device() to pci_match_id() as
that's what it really does.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-07-01 13:35:50 -07:00
Ingo Molnar
306e440daf [PATCH] x86: i8253/i8259A lock cleanup
Introduce proper declarations for i8253_lock and i8259A_lock.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-30 08:45:10 -07:00
Greg KH
8644d2a42b Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2005-06-27 22:07:56 -07:00
Greg Kroah-Hartman
545493917d [PATCH] PCI: add proper MCFG table parsing to ACPI core.
This patch is the first step in properly handling the MCFG PCI table.
It defines the structures properly, and saves off the table so that the
pci mmconfig code can access it.  It moves the parsing of the table a
little later in the boot process, but still before the information is
needed.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-27 21:52:47 -07:00
Kenji Kaneshige
b1bb248a5d [PATCH] ACPI based I/O APIC hot-plug: add interfaces
This patch adds the following new interfaces for I/O xAPIC
hotplug. The implementation of these interfaces depends on each
architecture.

    o int acpi_register_ioapic(acpi_handle handle, u64 phys_addr,
			       u32 gsi_base);

        This new interface is to add a new I/O xAPIC specified by
        phys_addr and gsi_base pair. phys_addr is the physical address
        to which the I/O xAPIC is mapped and gsi_base is global system
        interrupt base of the I/O xAPIC. acpi_register_ioapic returns
        0 on success, or negative value on error.

    o int acpi_unregister_ioapic(acpi_handle handle, u32 gsi_base);

        This new interface is to remove a I/O xAPIC specified by
        gsi_base. acpi_unregister_ioapic returns 0 on success, or
        negative value on error.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-27 21:52:44 -07:00
Rusty Lynch
4bdbd37f6d [PATCH] Return probe redesign: i386 specific changes
The following patch contains the i386 specific changes for the new
return probe design.  Changes include:

 * Removing the architecture specific functions for querying a return probe
   instance off a stack address
 * Complete rework onf arch_prepare_kretprobe() and trampoline_probe_handler()
 * Removing trampoline_post_handler()
 * Adding arch_init() so that now we handle registering the return probe
   trampoline instead of kernel/kprobes.c doing it

Signed-off-by: Rusty Lynch <rusty.lynch@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-27 15:23:53 -07:00
Andrea Arcangeli
ffaa8bd6c9 [PATCH] seccomp: tsc disable
I believe at least for seccomp it's worth to turn off the tsc, not just for
HT but for the L2 cache too.  So it's up to you, either you turn it off
completely (which isn't very nice IMHO) or I recommend to apply this below
patch.

This has been tested successfully on x86-64 against current cogito
repository (i686 compiles so I didn't bother testing ;).  People selling
the cpu through cpushare may appreciate this bit for a peace of mind.

There's no way to get any timing info anymore with this applied
(gettimeofday is forbidden of course).  The seccomp environment is
completely deterministic so it can't be allowed to get timing info, it has
to be deterministic so in the future I can enable a computing mode that
does a parallel computing for each task with server side transparent
checkpointing and verification that the output is the same from all the 2/3
seller computers for each task, without the buyer even noticing (for now
the verification is left to the buyer client side and there's no
checkpointing, since that would require more kernel changes to track the
dirty bits but it'll be easy to extend once the basic mode is finished).

Eliminating a cold-cache read of the cr4 global variable will save one
cacheline during the tlb flush while making the code per-cpu-safe at the
same time.  Thanks to Mikael Pettersson for noticing the tlb flush wasn't
per-cpu-safe.

The global tlb flush can run from irq (IPI calling do_flush_tlb_all) but
it'll be transparent to the switch_to code since the IPI won't make any
change to the cr4 contents from the point of view of the interrupted code
and since it's now all per-cpu stuff, it will not race.  So no need to
disable irqs in switch_to slow path.

Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-27 15:11:44 -07:00
Jens Axboe
22e2c507c3 [PATCH] Update cfq io scheduler to time sliced design
This updates the CFQ io scheduler to the new time sliced design (cfq
v3).  It provides full process fairness, while giving excellent
aggregate system throughput even for many competing processes.  It
supports io priorities, either inherited from the cpu nice value or set
directly with the ioprio_get/set syscalls.  The latter closely mimic
set/getpriority.

This import is based on my latest from -mm.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-27 14:33:29 -07:00
Linus Torvalds
6f0dcb72d6 Fix up try_to_freeze() usage in arch/i386/kernel/signal.c
The parentheses were missing. Noted by Pavel Machek.
2005-06-25 20:09:12 -07:00
Linus Torvalds
2031d0f586 Merge Christoph's freeze cleanup patch 2005-06-25 17:16:53 -07:00
Christoph Lameter
3e1d1d28d9 [PATCH] Cleanup patch for process freezing
1. Establish a simple API for process freezing defined in linux/include/sched.h:

   frozen(process)		Check for frozen process
   freezing(process)		Check if a process is being frozen
   freeze(process)		Tell a process to freeze (go to refrigerator)
   thaw_process(process)	Restart process
   frozen_process(process)	Process is frozen now

2. Remove all references to PF_FREEZE and PF_FROZEN from all
   kernel sources except sched.h

3. Fix numerous locations where try_to_freeze is manually done by a driver

4. Remove the argument that is no longer necessary from two function calls.

5. Some whitespace cleanup

6. Clear potential race in refrigerator (provides an open window of PF_FREEZE
   cleared before setting PF_FROZEN, recalc_sigpending does not check
   PF_FROZEN).

This patch does not address the problem of freeze_processes() violating the rule
that a task may only modify its own flags by setting PF_FREEZE. This is not clean
in an SMP environment. freeze(process) is therefore not SMP safe!

Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 17:10:13 -07:00
Adrian Bunk
9e84d1c36a [PATCH] i386: cleanup boot_cpu_logical_apicid variables
There are currently two different boot_cpu_logical_apicid variables:
- a global one in mpparse.c
- a static one in smpboot.c

Of these two, only the one in smpboot.c might be used (through
boot_cpu_apicid).

This patch therefore removes the one in mpparse.c .

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:25:05 -07:00
Jesper Juhl
4ae6673e02 [PATCH] get rid of redundant NULL checks before kfree() in arch/i386/
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:25:00 -07:00
Domen Puncer
77617bd806 [PATCH] arch/i386/kernel/apm.c: fix sparse warnings
Signed-off-by: Alexey Dobriyan <adobriyan@mail.ru>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:58 -07:00
Domen Puncer
3f3ae3471f [PATCH] arch/i386/kernel/traps.c: fix sparse warnings
Signed-off-by: Alexey Dobriyan <adobriyan@mail.ru>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:58 -07:00
Maneesh Soni
72414d3f1d [PATCH] kexec code cleanup
o Following patch provides purely cosmetic changes and corrects CodingStyle
  guide lines related certain issues like below in kexec related files

  o braces for one line "if" statements, "for" loops,
  o more than 80 column wide lines,
  o No space after "while", "for" and "switch" key words

o Changes:
  o take-2: Removed the extra tab before "case" key words.
  o take-3: Put operator at the end of line and space before "*/"

Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:55 -07:00
Alexander Nyberg
4f339ecb30 [PATCH] kdump: Save trap information for later analysis
If we are faulting in kernel it is quite possible this will lead to a
panic.  Save trap number, cr2 (in case of page fault) and error_code in the
current thread (these fields already exist for signal delivery but are not
used here).

This helps later kdump crash analyzing from user-space (a script has been
submitted to dig this info out in gdb).

Signed-off-by: Alexander Nyberg <alexn@telia.com>
Cc: <fastboot@lists.osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:55 -07:00
Alexander Nyberg
6e274d1443 [PATCH] kdump: Use real pt_regs from exception
Makes kexec_crashdump() take a pt_regs * as an argument.  This allows to
get exact register state at the point of the crash.  If we come from direct
panic assertion NULL will be passed and the current registers saved before
crashdump.

This hooks into two places:
die(): check the conditions under which we will panic when calling
do_exit and go there directly with the pt_regs that caused the fatal
fault.

die_nmi(): If we receive an NMI lockup while in the kernel use the
pt_regs and go directly to crash_kexec(). We're probably nested up badly
at this point so this might be the only chance to escape with proper
information.

Signed-off-by: Alexander Nyberg <alexn@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:54 -07:00
Vivek Goyal
2030eae52b [PATCH] Retrieve elfcorehdr address from command line
This patch adds support for retrieving the address of elf core header if one
is passed in command line.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:53 -07:00
Vivek Goyal
92aa63a5a1 [PATCH] kdump: Retrieve saved max pfn
This patch retrieves the max_pfn being used by previous kernel and stores it
in a safe location (saved_max_pfn) before it is overwritten due to user
defined memory map.  This pfn is used to make sure that user does not try to
read the physical memory beyond saved_max_pfn.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:52 -07:00
Vivek Goyal
a3ea8ac846 [PATCH] Kexec: Kexec on panic fix with nmi watchdog enabled
o Problem: Kexec on panic hangs if first kernel is booted with nmi_watchdog
  command line parameter. This problem occurs because kexec crash shutdown
  code replaces the NMI callback handler. This handler saves the cpu register
  states and halts the cpu. If system is booted with nmi_watchdog parameter,
  then crashing cpu also runs this nmi handler and halts itself.

o This patch fixes the problem by keeping a track of crashing cpu and not
  executing the new nmi handler on crashing cpu.

o There is a dependence on smp_processor_id() function which might return
  insane value for cpu, if cpu field of thread_info is corrupted.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:52 -07:00
Vivek Goyal
4d55476c3f [PATCH] kdump: NMI handler segment selector, stack pointer fix
CPU does not save ss and esp on stack if execution was already in kernel mode
at the time of NMI occurrence.  This leads to saving of erractic values for ss
and esp.  This patch fixes the issue.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:52 -07:00
Vivek Goyal
625f1c8219 [PATCH] Kdump: Export crash notes section address through sysfs
o Following patch exports kexec global variable "crash_notes" to user space
  through sysfs as kernel attribute in /sys/kernel.

Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:51 -07:00
Eric W. Biederman
1bc3b91aee [PATCH] crashdump: x86 crashkernel option
This is the x86 implementation of the crashkernel option.  It reserves a
window of memory very early in the bootup process, so we never use it for
anything but the kernel to switch to when the running kernel panics.

In addition to reserving this memory a resource structure is registered so
looking at /proc/iomem it is clear what happened to that memory.

ISSUES:
Is it possible to implement this in a architecture generic way?
What should be done with architectures that always use an iommu and
thus don't report their RAM memory resources in /proc/iomem?

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:50 -07:00
Eric W. Biederman
63d30298ef [PATCH] kexec: x86 shutdown APICs during crash_shutdown
In the case of a crash/panic an architecture specific function
machine_crash_shutdown is called.  This patch adds to the x86 machine_crash
function the standard kernel code for shutting down apics.

Every line of code added to that function increases the risk that we will call
code after a kernel panic that is not safe.

This patch should not make it to the stable kernel without a being reviewed a
lot more.  It is unclear how much a hardned kernel can take when it comes to
misconfigured apics.  So since a normal kernel has problems this patch does a
clean shutdown.

It is my expectation this patch will be dropped from future generations of the
kexec work.  But for the moment it is a crutch to keep from breaking
everything.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:50 -07:00
Eric W. Biederman
2c818b45a2 [PATCH] kexec: x86: snapshot registers during crash shutdown
After the kernel panics if we wish to generate an entire machine core file it
is very nice to know the register state at the time the machine crashed.

After long discussion it was realized that if you are going to be saving the
information anyway it is reasonable to store the information in a format that
it will be used and recognized in so the register state is stored in the
standard ELF note format.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:49 -07:00
Eric W. Biederman
c4ac4263a0 [PATCH] crashdump: x86: add NMI handler to capture other CPUs
One of the dangers when switching from one kernel to another is what happens
to all of the other cpus that were running in the crashed kernel.  In an
attempt to avoid that problem this patch adds a nmi handler and attempts to
shoot down the other cpus by sending them non maskable interrupts.

The code then waits for 1 second or until all known cpus have stopped running
and then jumps from the running kernel that has crashed to the kernel in
reserved memory.

The kernel spin loop is used for the delay as that should behave continue to
be safe even in after a crash.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:49 -07:00
Eric W. Biederman
5033cba087 [PATCH] kexec: x86 kexec core
This is the i386 implementation of kexec.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:49 -07:00
Eric W. Biederman
dd2a13054f [PATCH] kexec: x86: factor out apic shutdown code
Factor out the apic and smp shutdown code from machine_restart so it can be
called by in the kexec reboot path as well.

By switching to the bootstrap cpu by default on reboot I can delete/simplify
some motherboard fixups well.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:49 -07:00
Vivek Goyal
8a9190853c [PATCH] kexec: reserve Bootmem fix for booting nondefault location kernel
This patch fixes a problem with reserving memory during boot up of a kernel
built for non-default location.  Currently boot memory allocator reserves
the memory required by kernel image, boot allocaotor bitmap etc.  It
assumes that kernel is loaded at 1MB (HIGH_MEMORY hard coded to 1024*1024).
 But kernel can be built for non-default locatoin, hence existing
hardcoding will lead to reserving unnecessary memory.  This patch fixes it.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:48 -07:00
Eric W. Biederman
3d345e3fc9 [PATCH] kexec: x86: add CONFIG_PYSICAL_START
For one kernel to report a crash another kernel has created we need
to have 2 kernels loaded simultaneously in memory.  To accomplish this
the two kernels need to built to run at different physical addresses.

This patch adds the CONFIG_PHYSICAL_START option to the x86 kernel
so we can do just that.  You need to know what you are doing and
the ramifications are before changing this value, and most users
won't care so I have made it depend on CONFIG_EMBEDDED

bzImage kernels will work and run at a different address when compiled
with this option but they will still load at 1MB.  If you need a kernel
loaded at a different address as well you need to boot a vmlinux.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:48 -07:00
Eric W. Biederman
ad0d75ebac [PATCH] kexec: x86: vmlinux: fix physical addresses
The vmlinux on i386 does not report the correct physical address of
the kernel.  Instead in the physical address field it currently
reports the virtual address of the kernel.

This is patch is a bug fix that corrects vmlinux to report the
proper physical addresses.

This is potentially a help for crash dump analysis tools.

This definitiely allows bootloaders that load vmlinux as a standard
ELF executable.  Bootloaders directly loading vmlinux become of
practical importance when we consider the kexec on panic case.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:47 -07:00
Eric W. Biederman
650927ef8a [PATCH] kexec: x86: resture apic virtual wire mode on shutdown
When coming out of apic mode attempt to set the appropriate
apic back into virtual wire mode.  This improves on previous versions
of this patch by by never setting bot the local apic and the ioapic
into veritual wire mode.

This code looks at data from the mptable to see if an ioapic has
an ExtInt input to make this decision.  A future improvement
is to figure out which apic or ioapic was in virtual wire mode
at boot time and to remember it.  That is potentially a more accurate
method, of selecting which apic to place in virutal wire mode.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:47 -07:00
Eric W. Biederman
cee5dab485 [PATCH] kexec: x86: i8259 shutdown: disable interrupts
From: Eric W. Biederman <ebiederm@xmission.com>

This patch disables interrupt generation from the legacy pic on reboot.  Now
that there is a sys_device class it should not be called while drivers are
still using interrupts.

There is a report about this breaking ACPI power off on some systems.
http://bugme.osdl.org/show_bug.cgi?id=4041
However the final comment seems to exonerate this code.  So until
I get more information I believe that was a false positive.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:46 -07:00
Eric W. Biederman
9635b47d91 [PATCH] kexec: x86: local apic fix
From: "Maciej W. Rozycki" <macro@linux-mips.org>

Fix a kexec problem whcih causes local APIC detection failure.

The problem is detect_init_APIC() is called early, before the command line
have been processed.  Therefore "lapic" (and "nolapic") have not been seen,
yet.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:46 -07:00
Li Shaohua
5a72e04df5 [PATCH] suspend/resume SMP support
Using CPU hotplug to support suspend/resume SMP.  Both S3 and S4 use
disable/enable_nonboot_cpus API.  The S4 part is based on Pavel's original S4
SMP patch.

Signed-off-by: Li Shaohua<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:32 -07:00
Li Shaohua
e1367daf3e [PATCH] cpu state clean after hot remove
Clean CPU states in order to reuse smp boot code for CPU hotplug.

Signed-off-by: Li Shaohua<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:30 -07:00
Li Shaohua
0bb3184df5 [PATCH] init call cleanup
Trival patch for CPU hotplug.  In CPU identify part, only did cleaup for intel
CPUs.  Need do for other CPUs if they support S3 SMP.

Signed-off-by: Li Shaohua<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:30 -07:00
Li Shaohua
d720803a93 [PATCH] sibling map initializing rework
Make sibling map init per-cpu.  Hotplug CPU may change the map at runtime.

Signed-off-by: Li Shaohua<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:29 -07:00
Li Shaohua
6fe940d6c3 [PATCH] sep initializing rework
Make SEP init per-cpu, so it is hotplug safe.

Signed-off-by: Li Shaohua<shaohua.li@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:29 -07:00
Zwane Mwaikambo
f370513640 [PATCH] i386 CPU hotplug
(The i386 CPU hotplug patch provides infrastructure for some work which Pavel
is doing as well as for ACPI S3 (suspend-to-RAM) work which Li Shaohua
<shaohua.li@intel.com> is doing)

The following provides i386 architecture support for safely unregistering and
registering processors during runtime, updated for the current -mm tree.  In
order to avoid dumping cpu hotplug code into kernel/irq/* i dropped the
cpu_online check in do_IRQ() by modifying fixup_irqs().  The difference being
that on cpu offline, fixup_irqs() is called before we clear the cpu from
cpu_online_map and a long delay in order to ensure that we never have any
queued external interrupts on the APICs.  There are additional changes to s390
and ppc64 to account for this change.

1) Add CONFIG_HOTPLUG_CPU
2) disable local APIC timer on dead cpus.
3) Disable preempt around irq balancing to prevent CPUs going down.
4) Print irq stats for all possible cpus.
5) Debugging check for interrupts on offline cpus.
6) Hacky fixup_irqs() to redirect irqs when cpus go off/online.
7) play_dead() for offline cpus to spin inside.
8) Handle offline cpus set in flush_tlb_others().
9) Grab lock earlier in smp_call_function() to prevent CPUs going down.
10) Implement __cpu_disable() and __cpu_die().
11) Enable local interrupts in cpu_enable() after fixup_irqs()
12) Don't fiddle with NMI on dead cpu, but leave intact on other cpus.
13) Program IRQ affinity whilst cpu is still in cpu_online_map on offline.

Signed-off-by: Zwane Mwaikambo <zwane@linuxpower.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:29 -07:00
Shaohua Li
d92de65cab [PATCH] variable overflow after hundreds round of hotplug CPU
I'm doing the cpu hotplug stress test and found a variable ('ready') is
overflow after several hundreds rounds of cpu hotplug.  Here is a fix.

Signed-off-by: Shaohua Li<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:28 -07:00
Shaohua Li
a13db56624 [PATCH] CPU hotplug: fix hpet sectioning
With hpet enabled, cpu hotplug uses some routines marked with __init.

Signed-off-by: Shaohua Li<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:28 -07:00
Andrey Panin
1249c5138f [PATCH] dmi: spring cleanup
Whitespace and CodingStyle cleanup. No functionality changes.

Signed-off-by: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:28 -07:00
Andrey Panin
b625883f24 [PATCH] dmi: remove central blacklist
Since last dmi quirk looks useless (it just prints 404 compliant url) we can
finally remove central dmi blacklist.

Signed-off-by: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:28 -07:00
Andrey Panin
0f8133a8db [PATCH] dmi: move ACPI sleep quirk
This patch moves ACPI sleep quirk out of dmi_scan.c

Signed-off-by: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:28 -07:00
Andrey Panin
aea00143a8 [PATCH] dmi: move ACPI boot quirk
This patch moves ACPI boot quirks out of dmi_scan.c

Signed-off-by: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:27 -07:00
Dmitry Torokhov
e70c9d5e61 [PATCH] I8K: use standard DMI interface
I8K: Change to use stock dmi infrastructure instead of homegrown
     parsing code. The driver now requires box's DMI data to match
     list of supported models so driver can be safely compiled-in
     by default without fear of it poking into random SMM BIOS
     code. DMI checks can be ignored with i8k.ignore_dmi option.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:24 -07:00
Prasanna S Panchamukhi
417c8da651 [PATCH] kprobes: Temporary disarming of reentrant probe for i386
This patch includes i386 architecture specific changes to support temporary
disarming on reentrancy of probes.

Signed-of-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:24 -07:00
Hien Nguyen
0aa55e4d7d [PATCH] kprobes: moves lock-unlock to non-arch kprobe_flush_task
This patch moves the lock/unlock of the arch specific kprobe_flush_task()
to the non-arch specific kprobe_flusk_task().

Signed-off-by: Hien Nguyen <hien@us.ibm.com>
Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:21 -07:00
Rusty Lynch
7e1048b11c [PATCH] Move kprobe [dis]arming into arch specific code
The architecture independent code of the current kprobes implementation is
arming and disarming kprobes at registration time.  The problem is that the
code is assuming that arming and disarming is a just done by a simple write
of some magic value to an address.  This is problematic for ia64 where our
instructions look more like structures, and we can not insert break points
by just doing something like:

*p->addr = BREAKPOINT_INSTRUCTION;

The following patch to 2.6.12-rc4-mm2 adds two new architecture dependent
functions:

     * void arch_arm_kprobe(struct kprobe *p)
     * void arch_disarm_kprobe(struct kprobe *p)

and then adds the new functions for each of the architectures that already
implement kprobes (spar64/ppc64/i386/x86_64).

I thought arch_[dis]arm_kprobe was the most descriptive of what was really
happening, but each of the architectures already had a disarm_kprobe()
function that was really a "disarm and do some other clean-up items as
needed when you stumble across a recursive kprobe." So...  I took the
liberty of changing the code that was calling disarm_kprobe() to call
arch_disarm_kprobe(), and then do the cleanup in the block of code dealing
with the recursive kprobe case.

So far this patch as been tested on i386, x86_64, and ppc64, but still
needs to be tested in sparc64.

Signed-off-by: Rusty Lynch <rusty.lynch@intel.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:21 -07:00
Hien Nguyen
b94cce926b [PATCH] kprobes: function-return probes
This patch adds function-return probes to kprobes for the i386
architecture.  This enables you to establish a handler to be run when a
function returns.

1. API

Two new functions are added to kprobes:

	int register_kretprobe(struct kretprobe *rp);
	void unregister_kretprobe(struct kretprobe *rp);

2. Registration and unregistration

2.1 Register

  To register a function-return probe, the user populates the following
  fields in a kretprobe object and calls register_kretprobe() with the
  kretprobe address as an argument:

  kp.addr - the function's address

  handler - this function is run after the ret instruction executes, but
  before control returns to the return address in the caller.

  maxactive - The maximum number of instances of the probed function that
  can be active concurrently.  For example, if the function is non-
  recursive and is called with a spinlock or mutex held, maxactive = 1
  should be enough.  If the function is non-recursive and can never
  relinquish the CPU (e.g., via a semaphore or preemption), NR_CPUS should
  be enough.  maxactive is used to determine how many kretprobe_instance
  objects to allocate for this particular probed function.  If maxactive <=
  0, it is set to a default value (if CONFIG_PREEMPT maxactive=max(10, 2 *
  NR_CPUS) else maxactive=NR_CPUS)

  For example:

    struct kretprobe rp;
    rp.kp.addr = /* entrypoint address */
    rp.handler = /*return probe handler */
    rp.maxactive = /* e.g., 1 or NR_CPUS or 0, see the above explanation */
    register_kretprobe(&rp);

  The following field may also be of interest:

  nmissed - Initialized to zero when the function-return probe is
  registered, and incremented every time the probed function is entered but
  there is no kretprobe_instance object available for establishing the
  function-return probe (i.e., because maxactive was set too low).

2.2 Unregister

  To unregiter a function-return probe, the user calls
  unregister_kretprobe() with the same kretprobe object as registered
  previously.  If a probed function is running when the return probe is
  unregistered, the function will return as expected, but the handler won't
  be run.

3. Limitations

3.1 This patch supports only the i386 architecture, but patches for
    x86_64 and ppc64 are anticipated soon.

3.2 Return probes operates by replacing the return address in the stack
    (or in a known register, such as the lr register for ppc).  This may
    cause __builtin_return_address(0), when invoked from the return-probed
    function, to return the address of the return-probes trampoline.

3.3 This implementation uses the "Multiprobes at an address" feature in
    2.6.12-rc3-mm3.

3.4 Due to a limitation in multi-probes, you cannot currently establish
    a return probe and a jprobe on the same function.  A patch to remove
    this limitation is being tested.

This feature is required by SystemTap (http://sourceware.org/systemtap),
and reflects ideas contributed by several SystemTap developers, including
Will Cohen and Ananth Mavinakayanahalli.

Signed-off-by: Hien Nguyen <hien@us.ibm.com>
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@laposte.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:21 -07:00
Vincent Hanquez
717b594a41 [PATCH] xen: x86: Use more usermode macro
Use the user_mode macro where it's possible.

Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:14 -07:00
Vincent Hanquez
fa1e1bdf78 [PATCH] xen: x86: Rename usermode macro
Rename user_mode to user_mode_vm and add a user_mode macro similar to the
x86-64 one.

This is useful for Xen because the linux xen kernel does not runs on the same
priviledge that a vanilla linux kernel, and with this we just need to redefine
user_mode().

Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:14 -07:00
Vincent Hanquez
1cc6f12e03 [PATCH] xen: x86: Use new macro for debugreg
Make use of the 2 new macro set_debugreg and get_debugreg.

Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:13 -07:00
Andrew Morton
c92c6ffdb1 [PATCH] mtrr size-and-base debugging
Consolidate the mtrr sanity checking, add a dump_stack().

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:12 -07:00
Andrew Morton
a3a255e744 [PATCH] x86: cpu_khz type fix
x86_64's cpu_khz is unsigned int and there is no reason why x86 needs to use
unsigned long.

So make cpu_khz unsigned int on x86 as well.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:11 -07:00
Alexey Dobriyan
129f69465b [PATCH] Remove i386_ksyms.c, almost.
* EXPORT_SYMBOL's moved to other files
* #include <linux/config.h>, <linux/module.h> where needed
* #include's in i386_ksyms.c cleaned up
* After copy-paste, redundant due to Makefiles rules preprocessor directives
  removed:

	#ifdef CONFIG_FOO
	EXPORT_SYMBOL(foo);
	#endif

	obj-$(CONFIG_FOO) += foo.o

* Tiny reformat to fit in 80 columns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:11 -07:00
Natalie Protasevich
c434b7a6ae [PATCH] x86: avoid wasting IRQs for PCI devices
I have submitted the patch for x86_64, this is submission for i386.

The patch changes the way IRQs are handed out to PCI devices.  Currently,
each I/O APIC pin gets associated with an IRQ, no matter if the pin is used
or not.  This imposes severe limitation on systems that have designs that
employ many I/O APICs, only utilizing couple lines of each, such as P64H2
chipset.  It is used in ES7000, and currently, there is no way to boot the
system with more that 9 I/O APICs.

The simple change below allows to boot a system with say 64 (or more) I/O
APICs, each providing 1 slot, which otherwise impossible because of the IRQ
gaps created for unused lines on each I/O APIC.  It does not resolve the
problem with number of devices that exceeds number of possible IRQs, but
eases up a tension for IRQs on any large system with potentually large
number of devices.

Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:10 -07:00
Jan Beulich
7fbb4f6e68 [PATCH] adjust i386 watchdog tick calculation
Get the i386 watchdog tick calculation into a state where it can also be used
on CPUs with frequencies beyond 4GHz, and it consolidates the calculation into
a single place (for potential furture adjustments).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:09 -07:00
Natalie Protasevich
ca05fea6db [PATCH] Do not enforce unique IO_APIC_ID check for xAPIC systems (i386)
This patch is per Andi's request to remove NO_IOAPIC_CHECK from genapic and
use heuristics to prevent unique I/O APIC ID check for systems that don't
need it.  The patch disables unique I/O APIC ID check for Xeon-based and
other platforms that don't use serial APIC bus for interrupt delivery.
Andi stated that AMD systems don't need unique IO_APIC_IDs either.

Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:09 -07:00
Roland McGrath
7c1def1652 [PATCH] i386: never block forced SIGSEGV
This problem was first noticed on PPC and has already been fixed there.
But the exact same issue applies to other platforms in the same way.  The
signal blocking for sa_mask and the handled signal takes place after the
handler setup.  When the stack is bogus, the handler setup forces a
SIGSEGV.  But then this will be blocked, and returning to user mode will
fault again and iterate.  This patch fixes the problem by checking whether
signal handler setup failed, and not doing the signal-blocking if so.  This
copies what was done in the ppc code.  I think all architectures' signal
handler setup code follows this pattern and needs the change.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:09 -07:00
Venkatesh Pallipadi
8a9e1b0f56 [PATCH] Platform SMIs and their interferance with tsc based delay calibration
Issue:
Current tsc based delay_calibration can result in significant errors in
loops_per_jiffy count when the platform events like SMIs
(System Management Interrupts that are non-maskable) are present. This could
lead to potential kernel panic(). This issue is becoming more visible with 2.6
kernel (as default HZ is 1000) and on platforms with higher SMI handling
latencies. During the boot time, SMIs are mostly used by BIOS (for things
like legacy keyboard emulation).

Description:
The psuedocode for current delay calibration with tsc based delay looks like
(0) Estimate a value for loops_per_jiffy
(1) While (loops_per_jiffy estimate is accurate enough)
(2)   wait for jiffy transition (jiffy1)
(3)   Note down current tsc (tsc1)
(4)   loop until tsc becomes tsc1 + loops_per_jiffy
(5)   check whether jiffy changed since jiffy1 or not and refine
loops_per_jiffy estimate

Consider the following cases
Case 1:
If SMIs happen between (2) and (3) above, we can end up with a
loops_per_jiffy value that is too low. This results in shorted delays and
kernel can panic () during boot (Mostly at IOAPIC timer initialization
timer_irq_works() as we don't have enough timer interrupts in a specified
interval).

Case 2:
If SMIs happen between (3) and (4) above, then we can end up with a
loops_per_jiffy value that is too high. And with current i386 code, too
high lpj value (greater than 17M) can result in a overflow in
delay.c:__const_udelay() again resulting in shorter delay and panic().

Solution:
The patch below makes the calibration routine aware of asynchronous events
like SMIs. We increase the delay calibration time and also identify any
significant errors (greater than 12.5%) in the calibration and notify it to
user.

Patch below changes both i386 and x86-64 architectures to use this
new and improved calibrate_delay_direct() routine.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:08 -07:00
Andy Whitcroft
05b79bdcb4 [PATCH] sparsemem memory model for i386
Provide the architecture specific implementation for SPARSEMEM for i386 SMP
and NUMA systems.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:05 -07:00
Martin Hicks
753ee72896 [PATCH] VM: early zone reclaim
This is the core of the (much simplified) early reclaim.  The goal of this
patch is to reclaim some easily-freed pages from a zone before falling back
onto another zone.

One of the major uses of this is NUMA machines.  With the default allocator
behavior the allocator would look for memory in another zone, which might be
off-node, before trying to reclaim from the current zone.

This adds a zone tuneable to enable early zone reclaim.  It is selected on a
per-zone basis and is turned on/off via syscall.

Adding some extra throttling on the reclaim was also required (patch
4/4).  Without the machine would grind to a crawl when doing a "make -j"
kernel build.  Even with this patch the System Time is higher on
average, but it seems tolerable.  Here are some numbers for kernbench
runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run:

			wall  user   sys   %cpu  ctx sw.  sleeps
			----  ----   ---   ----   ------  ------
No patch		1009  1384   847   258   298170   504402
w/patch, no reclaim     880   1376   667   288   254064   396745
w/patch & reclaim       1079  1385   926   252   291625   548873

These numbers are the average of 2 runs of 3 "make -j" runs done right
after system boot.  Run-to-run variability for "make -j" is huge, so
these numbers aren't terribly useful except to seee that with reclaim
the benchmark still finishes in a reasonable amount of time.

I also looked at the NUMA hit/miss stats for the "make -j" runs and the
reclaim doesn't make any difference when the machine is thrashing away.

Doing a "make -j8" on a single node that is filled with page cache pages
takes 700 seconds with reclaim turned on and 735 seconds without reclaim
(due to remote memory accesses).

The simple zone_reclaim syscall program is at
http://www.bork.org/~mort/sgi/zone_reclaim.c

Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:14 -07:00
Ingo Molnar
39c715b717 [PATCH] smp_processor_id() cleanup
This patch implements a number of smp_processor_id() cleanup ideas that
Arjan van de Ven and I came up with.

The previous __smp_processor_id/_smp_processor_id/smp_processor_id API
spaghetti was hard to follow both on the implementational and on the
usage side.

Some of the complexity arose from picking wrong names, some of the
complexity comes from the fact that not all architectures defined
__smp_processor_id.

In the new code, there are two externally visible symbols:

 - smp_processor_id(): debug variant.

 - raw_smp_processor_id(): nondebug variant. Replaces all existing
   uses of _smp_processor_id() and __smp_processor_id(). Defined
   by every SMP architecture in include/asm-*/smp.h.

There is one new internal symbol, dependent on DEBUG_PREEMPT:

 - debug_smp_processor_id(): internal debug variant, mapped to
                             smp_processor_id().

Also, i moved debug_smp_processor_id() from lib/kernel_lock.c into a new
lib/smp_processor_id.c file.  All related comments got updated and/or
clarified.

I have build/boot tested the following 8 .config combinations on x86:

 {SMP,UP} x {PREEMPT,!PREEMPT} x {DEBUG_PREEMPT,!DEBUG_PREEMPT}

I have also build/boot tested x64 on UP/PREEMPT/DEBUG_PREEMPT.  (Other
architectures are untested, but should work just fine.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:13 -07:00
gregkh@suse.de
8874b414ff [PATCH] class: convert arch/* to use the new class api instead of class_simple
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:09 -07:00
Thomas Hood
92c6dc59b7 [PATCH] apm.c: ignore_normal_resume is set a bit too late
This patch causes the ignore_normal_resume flag to be set slightly earlier,
before there is a chance that the apm driver will receive the normal resume
event from the BIOS.  (Addresses Debian bug #310865)

Signed-off-by: Thomas Hood <jdthood@yahoo.co.uk>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-14 07:19:35 -07:00
Keith Owens
5754c9b649 [PATCH] Stop arch/i386/kernel/vsyscall-note.o being rebuilt every time
arch/i386/kernel/vsyscall-note.o is not listed as a target so its .cmd file
is neither considered as a target nor is it read on the next build.  This
causes vsyscall-note.o to be rebuilt every time that you run make, which
causes vmlinux to be rebuilt every time.

Signed-off-by: Keith Owens <kaos@ocs.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-08 16:21:13 -07:00
Dave Jones
f94ea640a2 [CPUFREQ] Typos.
cpfureq developers cant spel.

Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:52 -07:00
Dave Jones
6778bae0f2 [CPUFREQ] longhaul - adjust transition latency.
From patch by: Ken Staton <ken_staton@agilent.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:51 -07:00
Dave Jones
1174631418 [CPUFREQ] Longhaul: Magic timer frobbing.
As mandated by the spec, disable timer around transitions.

From code by : Ken Staton <ken_staton@agilent.com
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:51 -07:00
Dave Jones
3be6a48f3c [CPUFREQ] longhaul - disable PCI mastering around transition.
The spec states that we have to do this, which is *horrid*.

Based on code from: Ken Staton <ken_staton@agilent.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:51 -07:00
Dave Jones
065b807ca1 [CPUFREQ] dual-core powernow-k8
With the release of the dual-core AMD Opterons last week,
it's high time that cpufreq supported them.  The attached
patch applies cleanly to 2.6.12-rc3 and updates powernow-k8
to support the latest Athlon 64 and Opteron processors.

Update the driver to version 1.40.0 and provide support
for dual-core processors.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:46 -07:00
Dave Jones
c5d28fb297 [CPUFREQ] Recalibrate cpu_khz [2/2]
Some cpufreq drivers (at that time, only powernow-k7) need to recalibrate the
cpu_khz at runtime.

Signed-off-by: Bruno Ducrot <ducrot@poupinou.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:46 -07:00
Dave Jones
91350ed49b [CPUFREQ] Recalibrate cpu_khz [1/2]
We have to recalibrate cpu_khz in order to use the current FID instead the max
FID since some BIOS do not put the processor at maximum frequency at POST. 
Also, some BIOS will change the processor frequency at our back after cpu_khz
was calibrate.  Finally, this will fix a long standing bug when we do
something like this:

# rmmod powernow-k7
# modprobe powernow-k7

Signed-off-by: Bruno Ducrot <ducrot@poupinou.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:45 -07:00
Dave Jones
bf6fc9fd2d [CPUFREQ] AMD Elan SC520 cpufreq driver.
From: Sean Young <sean@mess.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:45 -07:00
Dave Jones
6f4095af6d [CPUFREQ] speedstep-smi: it works on at least one P4M
The speedstep-smi driver actually works on >=1 notebook with a
Pentium 4-M CPU where all other cpufreq drivers fail. Therefore,
allow speedstep-smi on P4Ms again, but warn users of likely failure

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:44 -07:00
Dave Jones
8282864a96 [CPUFREQ] speedstep-centrino: Pentium 4 - M (HT) support
The Pentium 4 - Ms (HT) with CPUID 0xF34 and 0xF41 seem to support
centrino-like enhanced speedstep; however, no "table" support is possible.
Therefore, put NULL entries into speedstep-centrino.c

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:43 -07:00
Dave Jones
7eb53d8823 [CPUFREQ] powernow-k7: don't print khz element of FSB.
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:42 -07:00
Alexander Nyberg
adaa765d76 [PATCH] acpi build fix: x86 setup.c
This is a neverending story

linux/acpi.h contains empty declarations for acpi_boot_init() &
acpi_boot_table_init() but they are nested inside #ifdef CONFIG_ACPI.

So we'll have to #ifdef in arch/i386/kernel/setup.c: setup_arch()

Signed-off-by: Alexander Nyberg <alexn@telia.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-31 14:54:17 -07:00
Siddha, Suresh B
49f384b82b [PATCH] x86: fix smp_num_siblings on buggy BIOSes
This fixes 'smp_num_siblings' value on the systems with a buggy bios,
which sets number of siblings to '2' even when HT is disabled.  (more
details are at http://bugzilla.kernel.org/show_bug.cgi?id=4359)

I am planning to do more cleanup in this area (like moving smp_num_siblings
to per cpuinfo) shortly.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-28 11:14:00 -07:00
Adrian Bunk
70ffc71c5c [PATCH] arch/i386/kernel/cpu/intel_cacheinfo.c: section fix
num_cache_leaves is used in __devexit cache_remove_dev() and can therefore
not be __devinit.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-28 11:14:00 -07:00
Andi Kleen
2df9fa3664 [PATCH] x86_64: i386/x86-64: Export cpu_core_map
Needed for the powernow k8 driver for dual core support.

Signed-off-by: Andi Kleen <ak@suse.de>
Cc: <mark.langsdorf@amd.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-20 15:48:21 -07:00
Andi Kleen
b41e29398a [PATCH] x86_64: 386/x86-64 Further AMD dual core fixes
- Remove duplicated ifdef
- Make core_id match what Intel uses
- Initialize phys_proc_id correctly for non DC case
- Handle non power of two core numbers.

Fixes for both i386 and x86-64

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-20 15:48:20 -07:00
Andi Kleen
a158608bf4 [PATCH] x86_64/i386: fix defaults for physical/core id in /proc/cpuinfo
Last round hopefully of cpu_core_id changes hopefully fow now:

- Always initialize cpu_core_id for all CPUs, even when no dual core setup
  is detected.  This prevents funny /proc/cpuinfo output

- Do the same with phys_proc_id[] even when no HyperThreading - dito.

- Use the CPU APIC-ID from CPUID 1 instead of the linux virtual CPU number
  to identify the core for AMD dual core setups.

Patch for i386/x86-64.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17 07:59:13 -07:00
Linus Torvalds
22490eb80c Fix acpi_find_rsdp() - acpi_scan_rsdp takes length, not end
Noticed by Jakub Jermar <jermar@itbs.cz>
2005-05-06 15:39:23 -07:00
maximilian attems
a27e951f1e [PATCH] cyrix: eliminate bad section references
Fix cyrix section references:
 convert __initdata to __devinitdata.

Error: ./arch/i386/kernel/cpu/mtrr/cyrix.o .text refers to 00000379
R_386_32          .init.data
Error: ./arch/i386/kernel/cpu/mtrr/cyrix.o .text refers to 00000399
R_386_32          .init.data
Error: ./arch/i386/kernel/cpu/mtrr/cyrix.o .text refers to 000003b3
R_386_32          .init.data
Error: ./arch/i386/kernel/cpu/mtrr/cyrix.o .text refers to 000003b9
R_386_32          .init.data
Error: ./arch/i386/kernel/cpu/mtrr/cyrix.o .text refers to 000003bf
R_386_32          .init.data

Signed-of-by: maximilian attems <janitor@sternwelten.at>

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05 16:36:47 -07:00
Prasanna S Panchamukhi
0b9e2cac8a [PATCH] Kprobes: Incorrect handling of probes on ret/lret instruction
Kprobes could not handle the insertion of a probe on the ret/lret
instruction and used to oops after single stepping since kprobes was
modifying eip/rip incorrectly.  Adjustment of eip/rip is not required after
single stepping in case of ret/lret instruction, because eip/rip points to
the correct location after execution of the ret/lret instruction.  This
patch fixes the above problem.

Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05 16:36:39 -07:00
Paolo 'Blaisorblade' Giarrusso
0c28130b5c [PATCH] x86_64: make string func definition work as intended
In include/asm-x86_64/string.h there are such comments:

/* Use C out of line version for memcmp */
#define memcmp __builtin_memcmp
int memcmp(const void * cs,const void * ct,size_t count);

This would mean that if the compiler does not decide to use __builtin_memcmp,
it emits a call to memcmp to be satisfied by the C out-of-line version in
lib/string.c.  What happens is that after preprocessing, in lib/string.i you
may find the definition of "__builtin_strcmp".

Actually, by accident, in the object you will find the definition of strcmp
and such (maybe a trick intended to redirect calls to __builtin_memcmp to the
default memcmp when the definition is not expanded); however, this particular
case is not a documented feature as far as I can see.

Also, the EXPORT_SYMBOL does not work, so it's duplicated in the arch.

I simply added some #undef to lib/string.c and removed the (now duplicated)
exports in x86-64 and UML/x86_64 subarchs (the second ones are introduced by
another patch I just posted for -mm).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
CC: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05 16:36:33 -07:00
Alexander Nyberg
f48d9663f1 [PATCH] x86 stack initialisation fix
The recent change fix-crash-in-entrys-restore_all.patch

 	childregs->esp = esp;

 	p->thread.esp = (unsigned long) childregs;
-	p->thread.esp0 = (unsigned long) (childregs+1);
+	p->thread.esp0 = (unsigned long) (childregs+1) - 8;

 	p->thread.eip = (unsigned long) ret_from_fork;

introduces an inconsistency between esp and esp0 before the task is run the
first time.  esp0 is no longer the actual start of the stack, but 8 bytes
off.

This shows itself clearly in a scenario when a ptracer that is set to also
ptrace eventual children traces program1 which then clones thread1.  Now
the ptracer wants to modify the registers of thread1.  The x86 ptrace
implementation bases it's knowledge about saved user-space registers upon
p->thread.esp0.  But this will be a few bytes off causing certain writes to
the kernel stack to overwrite a saved kernel function address making the
kernel when actually running thread1 jump out into user-space.  Very
spectacular.

The testcase I've used is:
/* start with strace -f ./a.out */
#include <pthread.h>
#include <stdio.h>

void *do_thread(void *p)
{
	for (;;);
}

int main()
{
	pthread_t one;
	pthread_create(&one, NULL, &do_thread, NULL);
	for (;;);
	return 0;
}

So, my solution is to instead of just adjusting esp0 that creates an
inconsitent state I adjust where the user-space registers are saved with -8
bytes.  This gives us the wanted extra bytes on the start of the stack and
esp0 is now correct.  This solves the issues I saw from the original
testcase from Mateusz Berezecki and has survived testing here.  I think
this should go into -mm a round or two first however as there might be some
cruft around depending on pt_regs lying on the start of the stack.  That
however would have broken with the first change too!

It's actually a 2-line diff but I had to move the comment of why the -8 bytes
are there a few lines up. Thanks to Zwane for helping me with this.

Signed-off-by: Alexander Nyberg <alexn@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05 16:36:30 -07:00
David Woodhouse
27b030d58c Merge with master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6.git 2005-05-03 08:14:09 +01:00
Adrian Bunk
408b664a7d [PATCH] make lots of things static
Another large rollup of various patches from Adrian which make things static
where they were needlessly exported.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:59:29 -07:00
Jesper Juhl
7ed20e1ad5 [PATCH] convert that currently tests _NSIG directly to use valid_signal()
Convert most of the current code that uses _NSIG directly to instead use
valid_signal().  This avoids gcc -W warnings and off-by-one errors.

Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:59:14 -07:00
Jesper Juhl
e49332bd12 [PATCH] misc verify_area cleanups
There were still a few comments left refering to verify_area, and two
functions, verify_area_skas & verify_area_tt that just wrap corresponding
access_ok_skas & access_ok_tt functions, just like verify_area does for
access_ok - deprecate those.

There was also a few places that still used verify_area in commented-out
code, fix those up to use access_ok.

After applying this one there should not be anything left but finally
removing verify_area completely, which will happen after a kernel release
or two.

Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:59:08 -07:00
Matt Mackall
d59745ce3e [PATCH] clean up kernel messages
Arrange for all kernel printks to be no-ops.  Only available if
CONFIG_EMBEDDED.

This patch saves about 375k on my laptop config and nearly 100k on minimal
configs.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:59:02 -07:00
Paolo 'Blaisorblade' Giarrusso
5e7b83ffc6 [PATCH] uml: fix syscall table by including $(SUBARCH)'s one, for i386
Split the i386 entry.S files into entry.S and syscall_table.S which is
included in the previous one (so actually there is no difference between them)
and use the syscall_table.S in the UML build, instead of tracking by hand the
syscall table changes (which is inherently error-prone).

We must only insert the right #defines to inject the changes we need from the
i386 syscall table (for instance some different function names); also, we
don't implement some i386 syscalls, as ioperm(), nor some TLS-related ones
(yet to provide).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:55 -07:00
Pavel Pisa
ad6714230f [PATCH] Linux 2.6.x VM86 interrupt emulation fixes
Patch solves VM86 interrupt emulation deadlock on SMP systems.  The VM86
interrupt emulation has been heavily tested and works well on UP systems
after last update, but it seems to deadlock when we have used it on SMP/HT
boxes now.

It seems, that disable_irq() cannot be called from interrupts, because it
waits until disabled interrupt handler finishes
(/kernel/irq/manage.c:synchronize_irq():while(IRQ_INPROGRESS);).  This
blocks one CPU after another.  Solved by use disable_irq_nosync.

There is the second problem.  If IRQ source is fast, it is possible, that
interrupt is sometimes processed and re-enabled by the second CPU, before
it is disabled by the first one, but negative IRQ disable depths are not
allowed.  The spinlocking and disabling IRQs over call to
disable_irq_nosync/enable_irq is the only solution found reliable till now.

Signed-off-by: Michal Sojka <sojkam1@control.felk.cvut.cz>
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:52 -07:00
Zwane Mwaikambo
3c3b73b6f5 [PATCH] cpuid x87 bit on AMD falsely marked as PNI
http://bugme.osdl.org/show_bug.cgi?id=4426

vendor_id       : AuthenticAMD
cpu family      : 6
model           : 10
model name      : AMD Athlon(tm) XP
stepping        : 0
cpu MHz         : 2204.807
<snipped>
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 mmx fxsr sse pni syscall mmxext 3dnowext 3dnow
bogomips        : 4358.14

We're marking bit 0 of extended function 0x80000001 cpuid as PNI support on
AMD processors, when it actually denotes x87 FPU present.  Patch for i386
and x86_64 below.

Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:51 -07:00
john stultz
35492df5ae [PATCH] i386: fix hpet for systems that don't support legacy replacement
Currently the i386 HPET code assumes the entire HPET implementation from
the spec is present.  This breaks on boxes that do not implement the
optional legacy timer replacement functionality portion of the spec.

This patch, which is very similar to my x86-64 patch for the same issue,
fixes the problem allowing i386 systems that cannot use the HPET for the
timer interrupt and RTC to still use the HPET as a time source.  I've
tested this patch on a system systems without HPET, with HPET but without
legacy timer replacement, as well as HPET with legacy timer replacement.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:50 -07:00
Lee Revell
a6954ba2e8 [PATCH] Enable write combining for server works LE rev > 6
Enable write combining for server works LE rev > 6 per
http://www.ussg.iu.edu/hypermail/linux/kernel/0104.3/1007.html

Signed-Off-By: Lee Revell <rlrevell@joe-job.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:49 -07:00
Stas Sergeev
48c88211a6 [PATCH] x86: entry.S trap return fixes
do_debug() and do_int3() return void.

This patch fixes the CONFIG_KPROBES variant of do_int3() to return void too
and adjusts entry.S accordingly.

Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:49 -07:00
Jaya Kumar
a2f7c35415 [PATCH] x86 reboot: Add reboot fixup for gx1/cs5530a
This patch by Jaya Kumar introduces a generic infrastructure to deal with
x86 chipsets with nonstandard reset sequences, and adds support for the
Geode gx1/cs5530a chipset.

Signed-off-by: Jaya Kumar <jayalk@intworks.biz>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:49 -07:00
Jack F Vogel
67701ae976 [PATCH] check nmi watchdog is broken
A bug against an xSeries system showed up recently noting that the
check_nmi_watchdog() test was failing.

I have been investigating it and discovered in both i386 and x86_64 the
recent change to the routine to use the cpu_callin_map has uncovered a
problem.  Prior to that change, on an SMP box, the test was trivally
passing because all cpu's were found to not yet be online, but now with the
callin_map they are discovered, it goes on to test the counter and they
have not yet begun to increment, so it announces a CPU is stuck and bails
out.

On all the systems I have access to test, the announcement of failure is
also bougs...  by the time you can login and check /proc/interrupts, the
NMI count is happily incrementing on all CPUs.  Its just that the test is
being done too early.

I have tried moving the call to the test around a bit, and it was always
too early.  I finally hit on this proposed solution, it delays the routine
via a late_initcall(), seems like the right solution to me.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:48 -07:00
H. J. Lu
fd51f666fa [PATCH] i386/x86_64 segment register access update
The new i386/x86_64 assemblers no longer accept instructions for moving
between a segment register and a 32bit memory location, i.e.,

        movl (%eax),%ds
        movl %ds,(%eax)

To generate instructions for moving between a segment register and a
16bit memory location without the 16bit operand size prefix, 0x66,

        mov (%eax),%ds
        mov %ds,(%eax)

should be used. It will work with both new and old assemblers. The
assembler starting from 2.16.90.0.1 will also support

        movw (%eax),%ds
        movw %ds,(%eax)

without the 0x66 prefix. I am enclosing patches for 2.4 and 2.6 kernels
here. The resulting kernel binaries should be unchanged as before, with
old and new assemblers, if gcc never generates memory access for

               unsigned gsindex;
               asm volatile("movl %%gs,%0" : "=g" (gsindex));

If gcc does generate memory access for the code above, the upper bits
in gsindex are undefined and the new assembler doesn't allow it.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:48 -07:00
Linus Torvalds
a879cbbb34 x86: make traps on 'iret' be debuggable in user space
This makes a trap on the 'iret' that returns us to user space
cause a nice clean SIGSEGV, instead of just a hard (and silent)
exit.

That way a debugger can actually try to see what happened, and
we also properly notify everybody who might be interested about
us being gone.

This loses the error code, but tells the debugger what happened
with ILL_BADSTK in the siginfo.
2005-04-29 09:38:44 -07:00
2fd6f58ba6 [AUDIT] Don't allow ptrace to fool auditing, log arch of audited syscalls.
We were calling ptrace_notify() after auditing the syscall and arguments,
but the debugger could have _changed_ them before the syscall was actually
invoked. Reorder the calls to fix that.

While we're touching ever call to audit_syscall_entry(), we also make it
take an extra argument: the architecture of the syscall which was made,
because some architectures allow more than one type of syscall.

Also add an explicit success/failure flag to audit_syscall_exit(), for
the benefit of architectures which return that in a condition register
rather than only returning a single register.

Change type of syscall return value to 'long' not 'int'.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2005-04-29 16:08:28 +01:00
James Bottomley
1cff94c6fe [PATCH] fix subarch breakage in amd dual core updates
The patch to arch/i386/kernel/cpu/amd.c relies on the variable
cpu_core_id which is defined in i386/kernel/smpboot.c.  This means it is
only present if CONFIG_X86_SMP is defined, not CONFIG_SMP (alternative
SMP harnesses won't have it, which is why it breaks voyager).

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-21 16:20:35 -07:00
Chris Wedgwood
15e8869943 [PATCH] x86: fix acpi compile without CONFIG_ACPI_BUS
The recent acpi boot patch breaks for me: acpi_fadt needs CONFIG_ACPI_BUS.

Signed-off-By: Chris Wedgwood <cw@f00f.org>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-18 08:01:30 -07:00
maximilian attems
c41f5eb3b8 [PATCH] efi: eliminate bad section references
Randy please double check especially this one.
there may be a better solution.

Fix efi section references:
 remove __initdata for struct efi efi_phys 
 and struct efi_memory_map memmap

Error: ./arch/i386/kernel/efi.o .text refers to 000000d3 R_386_32
.init.data
Error: ./arch/i386/kernel/efi.o .text refers to 000000ff R_386_32
.init.data

efi_memmap_walk (which is not __init nor static) 
accesses both efi_phys and memmap.

Signed-off-by: maximilian attems <janitor@sternwelten.at>
Acked-by: Randy Dunlap <rddunlap@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:25:53 -07:00
Pavel Machek
438510f6f0 [PATCH] pm_message_t: more fixes in common and i386
I thought I'm done with fixing u32 vs.  pm_message_t ...  unfortunately
that turned out not to be the case as Russel King pointed out.  Here are
fixes for Documentation and common code (mainly system devices).

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:25:24 -07:00
Siddha, Suresh B
d31ddaa172 [PATCH] x86, x86_64: dual core proc-cpuinfo and sibling-map fix
- broken sibling_map setup in x86_64

- grouping all the core and HT related cpuinfo fields.
  We are reasonably sure that adding new cpuinfo fields after "siblings" field,
  will not cause any app failure. Thats because today's /proc/cpuinfo
  format is completely different on x86, x86_64 and we haven't heard of any
  x86 app breakage because of this issue. Grouping these fields will 
  result in more or less common format on all architectures (ia64, x86 and 
  x86_64) and will cause less confusion.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:25:20 -07:00
Andi Kleen
635186447d [PATCH] x86_64: Final support for AMD dual core
Clean up the code greatly.  Now uses the infrastructure from the Intel dual
core patch Should fix a final bug noticed by Tyan of not detecting the nodes
correctly in some corner cases.

Patch for x86-64 and i386

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:25:16 -07:00
Andi Kleen
3dd9d51484 [PATCH] x86_64: add support for Intel dual-core detection and displaying
Appended patch adds the support for Intel dual-core detection and displaying
the core related information in /proc/cpuinfo.  

It adds two new fields "core id" and "cpu cores" to x86 /proc/cpuinfo and the
"core id" field for x86_64("cpu cores" field is already present in x86_64).

Number of processor cores in a die is detected using cpuid(4) and this is
documented in IA-32 Intel Architecture Software Developer's Manual (vol 2a)
(http://developer.intel.com/design/pentium4/manuals/index_new.htm#sdm_vol2a)

This patch also adds cpu_core_map similar to cpu_sibling_map.

Slightly hacked by AK.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:25:15 -07:00
Siddha, Suresh B
cf94b62f70 [PATCH] x86_64-always-use-cpuid-80000008-to-figure-out-mtrr fix
We need to use the size_and_mask in set_mtrr_var_ranges(which is called
while programming MTRR's for AP's

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:25:11 -07:00
Andi Kleen
1f2c958ad5 [PATCH] x86_64: Always use CPUID 80000008 to figure out MTRR address space size
It doesn't make sense to only do this only for AMD K8.

This would support future CPUs with extended address spaces properly.

For i386 and x86-64

Cc: <davej@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:25:10 -07:00
Jason Davis
90660ec3c3 [PATCH] x86_64 genapic update
x86_64 genapic mechanism should be aware of machines that use physical APIC
mode regardless of how many clusters/processors are detected.

ACPI 3.0 FADT makes this determination very simple by providing a feature
flag "force_apic_physical_destination_mode" to state whether the machine
unconditionally uses physical APIC mode.

Unisys' next generation x86_64 ES7000 will need to utilize this FADT
feature flag in order to boot the x86_64 kernel in the correct APIC mode. 
This patch has been tested on both x86_64 commodity and ES7000 boxes.

Signed-off-by: Jason Davis <jason.davis@unisys.com>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:24:53 -07:00
Andi Kleen
db46868128 [PATCH] x86-64/i386: Revert cpuinfo siblings behaviour back to 2.6.10
Only display physical id/siblings when there are siblings or dual core.

In 2.6.11 I accidentially broke it and it was always displaying these
fields But for compatibility to all these /proc parsers around it is better
to do it in the old way again.  

Noticed by Suresh Siddha

Cc: <Suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:24:51 -07:00
Roland McGrath
c97db4a0a7 [PATCH] i386 vDSO: add PT_NOTE segment
This patch adds an ELF note to the vDSO giving the LINUX_VERSION_CODE
value.  Having this in the vDSO lets the dynamic linker avoid the `uname'
syscall it now always does at startup to ascertain the kernel ABI
available.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:24:48 -07:00
Roland McGrath
ecd02dddd1 [PATCH] i386: Use loaddebug macro consistently
This moves the macro loaddebug from asm-i386/suspend.h to
asm-i386/processor.h, which is the place that makes sense for it to be
defined, removes the extra copy of the same macro in
arch/i386/kernel/process.c, and makes arch/i386/kernel/signal.c use the
macro in place of its expansion.

This is a purely cosmetic cleanup for the normal i386 kernel.  However, it
is handy for Xen to be able to just redefine the loaddebug macro once
instead of also changing the signal.c code.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:24:46 -07:00
Stas Sergeev
5df240826c [PATCH] fix crash in entry.S restore_all
Fix the access-above-bottom-of-stack crash.

1. Allows to preserve the valueable optimization

2. Works for NMIs

3.  Doesn't care whether or not there are more of the like instances
   where the stack is left empty.

4. Seems to work for me without the crashes:) 

(akpm: this is still under discussion, although I _think_ it's OK.  You might
want to hold off)

Signed-off-by: Stas Sergeev <stsp@aknet.ru> 
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:24:01 -07:00
Linus Torvalds
1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00