Commit Graph

607 Commits

Author SHA1 Message Date
Zachary Amsden
72e12b76fe [PATCH] x86: bogus tls from gdt
The per-CPU initialization code is copying in bogus data into
thread->tls_array.  Note that it copies &per_cpu(cpu_gdt_table, cpu), not
&per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TLS_MIN).  That is totally broken
and unnecessary.  Make the initialization explicitly NULL.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:12 -08:00
Natalie Protasevich
9f40a72a7e [PATCH] x86: hot plug CPU to support physical add of new processors
The patch allows physical bring-up of new processors (not initially present
in the configuration) from facilities such as driver/utility implemented on
a platform.  The actual method of making processors available is up to the
platform implementation.

Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Zwane Mwaikambo <zwane@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:12 -08:00
Siddha, Suresh B
d16aafff25 [PATCH] intel_cacheinfo: remove MAX_CACHE_LEAVES limit
Initial internal version of Venki's cpuid(4) deterministic cache parameter
identification patch used static arrays of size MAX_CACHE_LEAVES.  Final patch
which made to the base used dynamic array allocation, with this
MAX_CACHE_LEAVES limit hunk still in place.

cpuid(4) already has a mechanism to find out the number of cache levels
implemented and there is no need for this hardcoded MAX_CACHE_LEAVES limit.

So remove the MAX_CACHE_LEAVES limit from the routine which calculates the
number of cache levels using cpuid(4)

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:11 -08:00
Bart Oldeman
d5cd4aadd3 [PATCH] x86: initialise tss->io_bitmap_owner to something
There exists a field io_bitmap_owner in the TSS that is only checked, but
never set to anything else but NULL.

Signed-off-by: Bart Oldeman <bartoldeman@users.sourceforge.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:11 -08:00
Shaohua Li
08967f941a [PATCH] FPU context corrupted after resume
mxcsr_feature_mask_init isn't needed in suspend/resume time (we can use
boot time mask).  And actually it's harmful, as it clear task's saved
fxsave in resume.  This bug is widely seen by users using zsh.

(akpm: my eyes.  Fixed some surrounding whitespace mess)

Signed-off-by: Shaohua Li<shaohua.li@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:11 -08:00
Jan Beulich
8896fab35e [PATCH] x86: cmpxchg improvements
This adjusts i386's cmpxchg patterns so that

- for word and long cmpxchg-es the compiler can utilize all possible
  registers

- cmpxchg8b gets disabled when the minimum specified hardware architectur
  doesn't support it (like was already happening for the byte, word, and
  long ones).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:11 -08:00
Mathieu Desnoyers
dacb16b1a0 [PATCH] i386 and x86_64 TSC set_cyc2ns_scale imprecision
I just found out that some precision is unnecessarily lost in the
arch/i386/kernel/timers/timer_tsc.c:set_cyc2ns_scale function.  It uses a
cpu_mhz parameter when it could use a cpu_khz.  In the specific case of an
Intel P4 running at 3001.171 Mhz, the truncation to 3001 Mhz leads to an
imprecision of 19 microseconds per second : this is very sad for a timer with
nearly nanosecond accuracy.

Fix the x86_64 architecture too.

Cc: george anzinger <george@mvista.com>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:11 -08:00
Brian Gerst
0d078f6f96 [PATCH] CONFIG_IA32
Add CONFIG_X86_32 for i386.  This allows selecting options that only apply
to 32-bit systems.

(X86 && !X86_64) becomes X86_32
(X86 ||  X86_64) becomes X86

Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:10 -08:00
Dave Hansen
05039b9263 [PATCH] memory hotplug: i386 addition functions
Adds the necessary for non-NUMA hot-add of highmem to an existing zone on
i386.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:45 -07:00
Dave Hansen
208d54e551 [PATCH] memory hotplug locking: node_size_lock
pgdat->node_size_lock is basically only neeeded in one place in the normal
code: show_mem(), which is the arch-specific sysrq-m printing function.

Strictly speaking, the architectures not doing memory hotplug do no need this
locking in show_mem().  However, they are all included for completeness.  This
should also make any future consolidation of all of the implementations a
little more straightforward.

This lock is also held in the sparsemem code during a memory removal, as
sections are invalidated.  This is the place there pfn_valid() is made false
for a memory area that's being removed.  The lock is only required when doing
pfn_valid() operations on memory which the user does not already have a
reference on the page, such as in show_mem().

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:44 -07:00
Hugh Dickins
4c21e2f244 [PATCH] mm: split page table lock
Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
a many-threaded application which concurrently initializes different parts of
a large anonymous area.

This patch corrects that, by using a separate spinlock per page table page, to
guard the page table entries in that page, instead of using the mm's single
page_table_lock.  (But even then, page_table_lock is still used to guard page
table allocation, and anon_vma allocation.)

In this implementation, the spinlock is tucked inside the struct page of the
page table page: with a BUILD_BUG_ON in case it overflows - which it would in
the case of 32-bit PA-RISC with spinlock debugging enabled.

Splitting the lock is not quite for free: another cacheline access.  Ideally,
I suppose we would use split ptlock only for multi-threaded processes on
multi-cpu machines; but deciding that dynamically would have its own costs.
So for now enable it by config, at some number of cpus - since the Kconfig
language doesn't support inequalities, let preprocessor compare that with
NR_CPUS.  But I don't think it's worth being user-configurable: for good
testing of both split and unsplit configs, split now at 4 cpus, and perhaps
change that to 8 later.

There is a benefit even for singly threaded processes: kswapd can be attacking
one part of the mm while another part is busy faulting.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:42 -07:00
Hugh Dickins
60ec558549 [PATCH] mm: i386 sh sh64 ready for split ptlock
Use pte_offset_map_lock, instead of pte_offset_map (or inappropriate
pte_offset_kernel) and mm-wide page_table_lock, in sundry arch places.

The i386 vm86 mark_screen_rdonly: yes, there was and is an assumption that the
screen fits inside the one page table, as indeed it does.

The sh __do_page_fault: which handles both kernel faults (without lock) and
user mm faults (locked - though it set_pte without locking before).

The sh64 flush_cache_range and helpers: which wrongly thought callers held
page_table_lock before (only its tlb_start_vma did, and no longer does so);
moved the flush loop down, and adjusted the large versus small range decision
to consider a range which spans page tables as large.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:41 -07:00
Hugh Dickins
c34d1b4d16 [PATCH] mm: kill check_user_page_readable
check_user_page_readable is a problematic variant of follow_page.  It's used
only by oprofile's i386 and arm backtrace code, at interrupt time, to
establish whether a userspace stackframe is currently readable.

This is problematic, because we want to push the page_table_lock down inside
follow_page, and later split it; whereas oprofile is doing a spin_trylock on
it (in the i386 case, forgotten in the arm case), and needs that to pin
perhaps two pages spanned by the stackframe (which might be covered by
different locks when we split).

I think oprofile is going about this in the wrong way: it doesn't need to know
the area is readable (neither i386 nor arm uses read protection of user
pages), it doesn't need to pin the memory, it should simply
__copy_from_user_inatomic, and see if that succeeds or not.  Sorry, but I've
not got around to devising the sparse __user annotations for this.

Then we can eliminate check_user_page_readable, and return to a single
follow_page without the __follow_page variants.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:41 -07:00
Hugh Dickins
872fec16d9 [PATCH] mm: init_mm without ptlock
First step in pushing down the page_table_lock.  init_mm.page_table_lock has
been used throughout the architectures (usually for ioremap): not to serialize
kernel address space allocation (that's usually vmlist_lock), but because
pud_alloc,pmd_alloc,pte_alloc_kernel expect caller holds it.

Reverse that: don't lock or unlock init_mm.page_table_lock in any of the
architectures; instead rely on pud_alloc,pmd_alloc,pte_alloc_kernel to take
and drop it when allocating a new one, to check lest a racing task already
did.  Similarly no page_table_lock in vmalloc's map_vm_area.

Some temporary ugliness in __pud_alloc and __pmd_alloc: since they also handle
user mms, which are converted only by a later patch, for now they have to lock
differently according to whether or not it's init_mm.

If sources get muddled, there's a danger that an arch source taking
init_mm.page_table_lock will be mixed with common source also taking it (or
neither take it).  So break the rules and make another change, which should
break the build for such a mismatch: remove the redundant mm arg from
pte_alloc_kernel (ppc64 scrapped its distinct ioremap_mm in 2.6.13).

Exceptions: arm26 used pte_alloc_kernel on user mm, now pte_alloc_map; ia64
used pte_alloc_map on init_mm, now pte_alloc_kernel; parisc had bad args to
pmd_alloc and pte_alloc_kernel in unused USE_HPPA_IOREMAP code; ppc64
map_io_page forgot to unlock on failure; ppc mmu_mapin_ram and ppc64 im_free
took page_table_lock for no good reason.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:40 -07:00
Jesse Barnes
f8977d0a9b [PATCH] PCI fixup for Toshiba laptops and ohci1394
This is a fix for a bug I see on my Toshiba laptop, where the ohci1394
controller gets initialized improperly.  The patch adds two PCI fixups
to arch/i386/pci/fixup.c, one that happens early on to cache the value
of the PCI_CACHE_LINE_SIZE config register, and another that later
restores the value, along with a valid IRQ number and some BAR values.
I've tested it on my laptop, and it prevents me from running into what I
consider to be a major bug: IRQ 11 is disabled by the IRQ debug code,
causing my wireless to break.

Thanks to Rob for the original patch to ohci1394.c and Stefan for lots
of proofreading (and a last minute bug caught in review!) and additional
information collection.  I think the DMI system list is correct, but we
may need to add some more PCI IDs to the PCI_FIXUP macros over time.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-10-28 15:37:02 -07:00
Greg Kroah-Hartman
53f4654272 [PATCH] Driver Core: fix up all callers of class_device_create()
The previous patch adding the ability to nest struct class_device
changed the paramaters to the call class_device_create().  This patch
fixes up all in-kernel users of the function.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-10-28 09:52:52 -07:00
Chris Wright
63172cb3d5 [PATCH] typo fix in last cpufreq powernow patch
Not sure how it slipped by, but here's a trivial typo fix for powernow.

Signed-off-by: Chris Wright <chrisw@osdl.org>
[ It's "nurter" backwards.. Maybe we have a hillbilly The Shining fan? ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-21 17:08:30 -07:00
Dave Jones
0213df7431 [PATCH] cpufreq: fix pending powernow timer stuck condition
AMD recently discovered that on some hardware, there is a race condition
possible when a C-state change request goes onto the bus at the same
time as a P-state change request.

Both requests happen, but the southbridge hardware only acknowledges the
C-state change.  The PowerNow! driver is then stuck in a loop, waiting
for the P-state change acknowledgement.  The driver eventually times
out, but can no longer perform P-state changes.

It turns out the solution is to resend the P-state change, which the
southbridge will acknowledge normally.

Thanks to Johannes Winkelmann for reporting this and testing the fix.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-21 14:28:58 -07:00
Dave Jones
bfdc708dc7 [CPUFREQ] kzalloc conversions for i386 drivers.
Signed-off-by: Dave Jones <davej@redhat.com>
2005-10-20 15:16:15 -07:00
Andi Kleen
3c92c2ba33 [PATCH] i386: Don't discard upper 32bits of HWCR on K8
Need to use long long, not long when RMWing a MSR. I think
it's harmless right now, but still should be better fixed
if AMD adds any bits in the upper 32bit of HWCR.

Bug was introduced with the TLB flush filter fix for i386

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-10 16:34:09 -07:00
Markus F.X.J. Oberhumer
d347f37227 [PATCH] i386: fix stack alignment for signal handlers
This fixes the setup of the alignment of the signal frame, so that all
signal handlers are run with a properly aligned stack frame.

The current code "over-aligns" the stack pointer so that the stack frame
is effectively always mis-aligned by 4 bytes.  But what we really want
is that on function entry ((sp + 4) & 15) == 0, which matches what would
happen if the stack were aligned before a "call" instruction.

Signed-off-by: Markus F.X.J. Oberhumer <markus@oberhumer.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-10 08:45:06 -07:00
Al Viro
dd0fc66fb3 [PATCH] gfp flags annotations - part 1
- added typedef unsigned int __nocast gfp_t;

 - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
   the same warnings as far as sparse is concerned, doesn't change
   generated code (from gcc point of view we replaced unsigned int with
   typedef) and documents what's going on far better.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-08 15:00:57 -07:00
Nick Piggin
b33fa1f3c3 [PATCH] i386: include linux/irq.h rather than asm/hw_irq.h
I need the following patch to compile -git8 here, otherwise these
files fail to compile (asm/hw_irq.h needs definitions from
linux/irq.h and that file provides the required include ordering).

I did not do a full audit, though there looks to be many other
places that should get the same treatment, if this is  the right
way to do it.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-30 10:58:37 -07:00
Andi Kleen
7d318d7747 [PATCH] Fix up TLB flush filter disabling
I checked with AMD and they requested to only disable it for family 15.
Also disable it for i386 too. And some style fixes.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-29 15:41:42 -07:00
Al Viro
ce3a161e69 [PATCH] useless includes of linux/irq.h in arch/i386
Most of these guys are simply not needed (pulled by other stuff
via asm-i386/hardirq.h).  One that is not entirely useless is hilarious -
arch/i386/oprofile/nmi_timer_int.c includes linux/irq.h... as a way to
get linux/errno.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-26 18:29:50 -07:00
Dave Jones
b9111b7b7f [CPUFREQ] Remove preempt_disable from powernow-k8
Via reading the code, my understanding is that powernow-k8 uses
preempt_disable to ensure that driver->target doesn't migrate across cpus
whilst it's accessing per processor registers, however set_cpus_allowed
will provide this for us.  Additionally, remove schedule() calls from
set_cpus_allowed as set_cpus_allowed ensures that you're executing on the
target processor on return.

Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-09-23 11:10:42 -07:00
David S. Miller
4db2ce0199 [LIB]: Consolidate _atomic_dec_and_lock()
Several implementations were essentialy a common piece of C code using
the cmpxchg() macro.  Put the implementation in one spot that everyone
can share, and convert sparc64 over to using this.

Alpha is the lone arch-specific implementation, which codes up a
special fast path for the common case in order to avoid GP reloading
which a pure C version would require.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-14 21:47:01 -07:00
Linus Torvalds
1619cca292 Partially revert "Fix time going twice as fast problem on ATI Xpress chipsets"
Commit 66759a01ad introduced the fix for
time ticking too fast on some boards by disabling one of the doubly
connected timer pins on ATI boards.

However, it ends up being _much_ too broad a brush, and that just makes
some other ATI boards not work at all since they now have no timer
source.

So disable the automatic ATI southbridge detection, and just rely on
people who see this problem disabling it by hand with the option
"disable_timer_pin_1" on the kernel command line.

Maybe somebody can figure out the proper tests at a later date.

Acked-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-14 15:56:27 -07:00
Cal Peake
0a305d2e1b [PATCH] Even more fallout from ATI Xpress timer workaround
disable_timer_pin_1 needs IO-APIC, not just local APIC.

Signed-off-by: Cal Peake <cp@absolutedigital.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 15:07:06 -07:00
Chuck Ebbert
33333373c4 [PATCH] i386: Ignore masked FPU exceptions
Masked FPU exceptions should obviously not happen in the first place,
but if they do, ignoring them seems to be the right thing to do.

Although there is no documentation available for Cyrix MII, I did find
erratum F-7 for Winchip C6, "FPU instruction may result in spurious
exception under certain conditions" which seems to indicate that this
can happen.

That would also explain the behaviour Ondrej Zary reported on the MII.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 09:59:04 -07:00
Tobias Klauser
6f673d83ca [PATCH] arch/i386: Replace custom macro with isdigit()
Replace the custom is_digit() macro with isdigit() from <linux/ctype.h>

Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:33 -07:00
Randy Dunlap
9f1583339a [PATCH] use add_taint() for setting tainted bit flags
Use the add_taint() interface for setting tainted bit flags instead of
doing it manually.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:29 -07:00
Linus Torvalds
a217e8c181 Fix fallout from ATI Xpress timer workaround
ACPI earlyquirks needs to honor the proper config variables, and include
the right header file.

(Fixes commit 66759a01ad)

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 12:32:31 -07:00
Chuck Ebbert
66759a01ad [PATCH] x86-64: i386/x86-64: Fix time going twice as fast problem on ATI Xpress chipsets
Original patch from Bertro Simul

This is probably still not quite correct, but seems to be
the best solution so far.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:58 -07:00
Andi Kleen
69e1a33f62 [PATCH] x86-64: Use ACPI PXM to parse PCI<->node assignments
Since this is shared code I had to implement it for i386 too

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:49:57 -07:00
Andi Kleen
f343bb4cd7 [PATCH] x86{-64}: Remove old hack that disabled mmconfig support on AMD systems.
Now that Greg implemented MCFG/_SEG support this shouldn't be needed
anymore

Cc: gregkh@suse.de

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:49:55 -07:00
Roland McGrath
c3ff8ec31c [PATCH] i386: Don't miss pending signals returning to user mode after signal processing
Signed-off-by: Roland McGrath <roland@redhat.com>
Acked-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 07:52:16 -07:00
Paolo 'Blaisorblade' Giarrusso
a7d0c21033 [PATCH] i386 / uml: add dwarf sections to static link script
Inside the linker script, insert the code for DWARF debug info sections. This
may help GDB'ing a Uml binary. Actually, it seems that ld is able to guess
what I added correctly, but normal linker scripts include this section so it
should be correct anyway adding it.

On request by Sam Ravnborg <sam@ravnborg.org>, I've added it to
asm-generic/vmlinux.lds.s. I've also moved there the stabs debug section,
used the new macro in i386 linker script and added DWARF debug section to
that.

In the truth, I've not been able to verify the difference in GDB behaviour
after this change (I've seen large improvements with another patch). This
may depend on my binutils version, older one may have worse defaults.

However, this section is present in normal linker script, so add it at
least for the sake of cleanness.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 12:00:17 -07:00
Nishanth Aravamudan
52e6e63088 [PATCH] i386: fix-up schedule_timeout() usage
Use schedule_timeout_interruptible() instead of
set_current_state()/schedule_timeout() to reduce kernel size.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:37 -07:00
Adrian Bunk
672289e9fa [PATCH] i386/x86_64: make get_cpu_vendor() static
get_cpu_vendor() no longer has any users in other files.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:35 -07:00
Pavel Machek
b01d8684e9 [PATCH] remove ACPI S4bios support
Remove S4BIOS support.  It is pretty useless, and only ever worked for _me_
once.  (I do not think anyone else ever tried it).  It was in feature-removal
for a long time, and it should have been removed before.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:35 -07:00
Nishanth Aravamudan
aeb8397b6a [PATCH] i386/smpboot: use msleep() instead of schedule_timeout()
Replace schedule_timeout() with msleep() to guarantee the task delays as
expected.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:29 -07:00
Linus Torvalds
486a153f0e Merge master.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild 2005-09-09 15:46:49 -07:00
Daniel Ritz
f0eca9626c [PATCH] Update PCI IOMEM allocation start
This fixes the problem with "Averatec 6240 pcmcia_socket0: unable to
apply power", which was due to the CardBus IOMEM register region being
allocated at an address that was actually inside the RAM window that had
been reserved for video frame-buffers in an UMA setup.

The BIOS _should_ have marked that region reserved in the e820 memory
descriptor tables, but did not.

It is fixed by rounding up the default starting address of PCI memory
allocations, so that we leave a bigger gap after the final known memory
location.  The amount of rounding depends on how big the unused memory
gap is that we can allocate IOMEM from.

Based on example code by Linus.

Acked-by: Greg KH <greg@kroah.com>
Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 15:25:46 -07:00
Ingo Molnar
8d06afab73 [PATCH] timer initialization cleanup: DEFINE_TIMER
Clean up timer initialization by introducing DEFINE_TIMER a'la
DEFINE_SPINLOCK.  Build and boot-tested on x86.  A similar patch has been
been in the -RT tree for some time.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:48 -07:00
Antonino A. Daplas
5e518d7672 [PATCH] fbdev: Resurrect hooks to get EDID from firmware
For the i386, code is already present in video.S that gets the EDID from the
video BIOS.  Make this visible so drivers can also use this data as fallback
when i2c does not work.

To ensure that the EDID block is returned for the primary graphics adapter
only, by check if the IORESOURCE_ROM_SHADOW flag is set.

Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:59 -07:00
Antonino A. Daplas
d2d58384fc [PATCH] vesafb: Add blanking support
Add rudimentary support by manipulating the VGA registers.  However, not
all vesa modes are VGA compatible, so VGA compatiblity is checked first.
Only 2 levels are supported, powerup and powerdown.

Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:58 -07:00
Andrea Arcangeli
4c7fc7220f [PATCH] i386: seccomp fix for auditing/ptrace
This is the same issue as ppc64 before, when returning to userland we
shouldn't re-compute the seccomp check or the task could be killed during
sigreturn when orig_eax is overwritten by the sigreturn syscall.  This was
found by Roland.

This was harmless from a security standpoint, but some i686 users reported
failures with auditing enabled system wide (some distro surprisingly makes
it the default) and I reproduced it too by keeping the whole workload under
strace -f.

Patch is tested and works for me under strace -f.

nobody@athlon:~/cpushare> strace -o /tmp/o -f python seccomp_test.py
make: Nothing to be done for `seccomp_test'.
Starting computing some malicious bytecode
init
load
start
stop
receive_data failure
kill
exit_code 0 signal 9
The malicious bytecode has been killed successfully by seccomp
Starting computing some safe bytecode
init
load
start
stop
174 counts
kill
exit_code 0 signal 0
The seccomp_test.py completed successfully, thank you for testing.

(akpm: collaterally cleaned up a bit of do_syscall_trace() too)

Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:30 -07:00
Andrew Morton
1299232b57 [PATCH] x86: MP_processor_info fix
Remove the weird and apparently unnecessary logic in MP_processor_info() which
assumes that the BSP is the first one to run MP_processor_info().  On one of
my boxes that isn't true and cpu_possible_map gets the wrong value.

Cc: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Cc: Alexander Nyberg <alexn@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:56:43 -07:00
Karsten Wiese
0b968d2361 [PATCH] Fix misspelled i8259 typo in io_apic.c
The legacy PIC's name is "i8259".

Signed-off-by: Karsten Wiese <annabellesgarden@yahoo.de>
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 10:37:10 -07:00
viro@ZenIV.linux.org.uk
fc0b1af257 [PATCH] __user annotations for pointers in i386 sigframe
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 10:31:59 -07:00
Sam Ravnborg
86feeaa812 kbuild: full dependency check on asm-offsets.h
Building asm-offsets.h has been moved to a seperate Kbuild file
located in the top-level directory. This allow us to share the
functionality across the architectures.

The old rules in architecture specific Makefiles will die
in subsequent patches.

Furhtermore the usual kbuild dependency tracking is now used
when deciding to rebuild asm-offsets.s. So we no longer risk
to fail a rebuild caused by asm-offsets.c dependencies being touched.

With this common rule-set we now force the same name across
all architectures. Following patches will fix the rest.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2005-09-09 19:28:28 +02:00
Linus Torvalds
529980c8b0 Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq 2005-09-08 17:24:34 -07:00
Michael S. Tsirkin
346d38823b [PATCH] arch/386/pci: remap_pfn_range -> io_remap_pfn_range
Convert i386/pci to use io_remap_pfn_range instead of remap_pfn_range.
This is good for Xen which reuses i386/pci/i386.c for domain 0 code.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-09-08 14:57:24 -07:00
Len Brown
64e47488c9 Merge linux-2.6 with linux-acpi-2.6 2005-09-08 01:45:47 -04:00
viro@ZenIV.linux.org.uk
a08b6b7968 [PATCH] Kconfig fix (BLK_DEV_FD dependencies)
Sanitized and fixed floppy dependencies: split the messy dependencies for
BLK_DEV_FD by introducing a new symbol (ARCH_MAY_HAVE_PC_FDC), making
BLK_DEV_FD depend on that one and taking declarations of ARCH_MAY_HAVE_PC_FDC
to arch/*/Kconfig.  While we are at it, fixed several obvious cases when
BLK_DEV_FD should have been excluded (architectures lacking asm/floppy.h
are *not* going to have floppy.c compile, let alone work).

If you can come up with better name for that ("this architecture might
have working PC-compatible floppy disk controller"), you are more than
welcome - just s/ARCH_MAY_HAVE_PC_FDC/your_prefered_name/g in the patch
below...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 17:17:12 -07:00
Keshavamurthy Anil S
deac66ae45 [PATCH] kprobes: fix bug when probed on task and isr functions
This patch fixes a race condition where in system used to hang or sometime
crash within minutes when kprobes are inserted on ISR routine and a task
routine.

The fix has been stress tested on i386, ia64, pp64 and on x86_64.  To
reproduce the problem insert kprobes on schedule() and do_IRQ() functions
and you should see hang or system crash.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:58:01 -07:00
Jim Keniston
bce0649417 [PATCH] kprobes: fix handling of simultaneous probe hit/unregister
This patch fixes a bug in kprobes's handling of a corner case on i386 and
x86_64.  On an SMP system, if one CPU unregisters a kprobe just after
another CPU hits that probepoint, kprobe_handler() on the latter CPU sees
that the kprobe has been unregistered, and attempts to let the CPU continue
as if the probepoint hadn't been hit.  The bug is that on i386 and x86_64,
we were neglecting to set the IP back to the beginning of the probed
instruction.  This could cause an oops or crash.

This bug doesn't exist on ppc64 and ia64, where a breakpoint instruction
leaves the IP pointing to the beginning of the instruction.  I don't know
about sparc64.  (Dave, could you please advise?)

This fix has been tested on i386 and x86_64 SMP systems.  To reproduce the
problem, set one CPU to work registering and unregistering a kprobe
repeatedly, and another CPU pounding the probepoint in a tight loop.

Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:58:01 -07:00
Prasanna S Panchamukhi
3d97ae5b95 [PATCH] kprobes: prevent possible race conditions i386 changes
This patch contains the i386 architecture specific changes to prevent the
possible race conditions.

Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:59 -07:00
Robert Love
640e803376 [PATCH] fix: dmi_check_system
Background:

	1) dmi_check_system() returns the count of the number of
	   matches.  Zero thus means no matches.
	2) A match callback can return nonzero to stop the match
	   checking.

Bug: The count is incremented after we check for the nonzero return value,
so it does not reflect the actual count.  We could say this is intended,
for some dumb reason, except that it means that a match on the first check
returns zero--no matches--if the callback returns nonzero.

Attached patch implements the count before calling the callback and thus
before potentially short-circuiting.

Signed-off-by: Robert Love <rml@novell.com>
Cc: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:44 -07:00
Andrey Panin
ebad6a4230 [PATCH] dmi: add onboard devices discovery
This patch adds onboard devices and IPMI BMC discovery into DMI scan code.
Drivers can use dmi_find_device() function to search for devices by type and
name.

Signed-off-by: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:44 -07:00
Andrey Panin
c3c7120d55 [PATCH] dmi: make dmi_string() behave like strdup()
This patch changes dmi_string() function to allocate string copy by itself, to
avoid code duplication in the next patch.

Signed-off-by: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:44 -07:00
Andrey Panin
4e70b9a3d6 [PATCH] dmi: remove old debugging code
DMI debugging code is unused for ages.  This patch removes it.

Signed-off-by: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:44 -07:00
Andrey Panin
61e032fa2f [PATCH] dmi: remove uneeded function
After elimination of central DMI blacklist dmi_scan_machine() function became
a wrapper for dmi_iterate().  This patch moves some code around to kill
unneeded function.

Signed-off-by: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:44 -07:00
john stultz
b149ee2233 [PATCH] NTP: ntp-helper functions
This patch cleans up a commonly repeated set of changes to the NTP state
variables by adding two helper inline functions:

ntp_clear(): Clears the ntp state variables

ntp_synced(): Returns 1 if the system is synced with a time server.

This was compile tested for alpha, arm, i386, x86-64, ppc64, s390, sparc,
sparc64.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:34 -07:00
Ravikiran G Thirumalai
6c231b7bab [PATCH] Additions to .data.read_mostly section
Mark variables which are usually accessed for reads with __readmostly.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:33 -07:00
Adrian Bunk
7f4bde9a34 [PATCH] remove the second arg of do_timer_interrupt()
The second arg of do_timer_interrupt() is not used in the functions, and
all callers pass NULL.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Paul Mundt <lethal@Linux-SH.ORG>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:32 -07:00
David Gibson
96d0821cac [PATCH] Fix function/macro name collision on i386 oprofile
The i386 OProfile code has a function named nmi_exit(), which collides with
the nmi_exit() macro in linux/hardirq.h.  At the moment, we get away with
it, because hardirq.h isn't included in the oprofile code.  I hit this as a
bug when working with a patch which (indirectly) adds a #include of
hardirq.h to oprofile.

Regardless, the name collision is probably not a good idea, so this patch
fixes it, renaming the oprofile function to op_nmi_exit().  It also renames
the nmi_init() and nmi_timer_init() functions similarly, for consistency.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:29 -07:00
H. Peter Anvin
f8eeaaf418 [PATCH] Make the bzImage format self-terminating
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Frank Sorenson <frank@tuxrocks.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:29 -07:00
Paul E. McKenney
19306059cd [PATCH] NMI: Update NMI users of RCU to use new API
Uses of RCU for dynamically changeable NMI handlers need to use the new
rcu_dereference() and rcu_assign_pointer() facilities.  This change makes
it clear that these uses are safe from a memory-barrier viewpoint, but the
main purpose is to document exactly what operations are being protected by
RCU.  This has been tested on x86 and x86-64, which are the only
architectures affected by this change.

Signed-off-by: <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:19 -07:00
Christoph Lameter
c3d8c14145 [PATCH] More __read_mostly variables
Move some more frequently read variables that showed up during some of our
performance tests as sometimes ending up in hot cachelines to the
read_mostly section.

Fix: Move the __read_mostly from before hpet_usec_quotient to follow the
variable like the other uses of __read_mostly.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Christoph Lameter <christoph@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:18 -07:00
Ingo Molnar
8446f1d391 [PATCH] detect soft lockups
This patch adds a new kernel debug feature: CONFIG_DETECT_SOFTLOCKUP.

When enabled then per-CPU watchdog threads are started, which try to run
once per second.  If they get delayed for more than 10 seconds then a
callback from the timer interrupt detects this condition and prints out a
warning message and a stack dump (once per lockup incident).  The feature
is otherwise non-intrusive, it doesnt try to unlock the box in any way, it
only gets the debug info out, automatically, and on all CPUs affected by
the lockup.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-Off-By: Matthias Urlichs <smurf@smurf.noris.de>
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:17 -07:00
Ashok Raj
a888cebe17 [PATCH] x86_64: create sysfs entries for cpu only for present cpus
Need to create sysfs only for cpus that are present.  Without which we see
NR_CPUS entries created when we have CONFIG_HOTPLUG and CONFIG_HOTPLUG_CPU
enabled.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:16 -07:00
Ashok Raj
54d5d42404 [PATCH] x86/x86_64: deferred handling of writes to /proc/irqxx/smp_affinity
When handling writes to /proc/irq, current code is re-programming rte
entries directly. This is not recommended and could potentially cause
chipset's to lockup, or cause missing interrupts.

CONFIG_IRQ_BALANCE does this correctly, where it re-programs only when the
interrupt is pending. The same needs to be done for /proc/irq handling as well.
Otherwise user space irq balancers are really not doing the right thing.

- Changed pending_irq_balance_cpumask to pending_irq_migrate_cpumask for
  lack of a generic name.
- added move_irq out of IRQ_BALANCE, and added this same to X86_64
- Added new proc handler for write, so we can do deferred write at irq
  handling time.
- Display of /proc/irq/XX/smp_affinity used to display CPU_MASKALL, instead
  it now shows only active cpu masks, or exactly what was set.
- Provided a common move_irq implementation, instead of duplicating
  when using generic irq framework.

Tested on i386/x86_64 and ia64 with CONFIG_PCI_MSI turned on and off.
Tested UP builds as well.

MSI testing: tbd: I have cards, need to look for a x-over cable, although I
did test an earlier version of this patch.  Will test in a couple days.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Zwane Mwaikambo <zwane@holomorphy.com>
Grudgingly-acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Coywolf Qi Hunt <coywolf@lovecn.org>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:15 -07:00
Paolo 'Blaisorblade' Giarrusso
640aa46e25 [PATCH] uml: SYSEMU: slight cleanup and speedup
As a follow-up to "UML Support - Ptrace: adds the host SYSEMU support, for
UML and general usage" (i.e.  uml-support-* in current mm).

Avoid unconditionally jumping to work_pending and code copying, just reuse
the already existing resume_userspace path.

One interesting note, from Charles P.  Wright, suggested that the API is
improvable with no downsides for UML (except that it will have to support
yet another host API, since dropping support for the current API, for UML,
is not reasonable from users' point of view).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
CC: Charles P. Wright <cwright@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:20 -07:00
Bodo Stroesser
ab1c23c244 [PATCH] SYSEMU: fix sysaudit / singlestep interaction
Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>

This is simply an adjustment for "Ptrace - i386: fix Syscall Audit interaction
with singlestep" to work on top of SYSEMU patches, too.  On this patch, I have
some doubts: I wonder why we need to alter that way ptrace_disable().

I left the patch this way because it has been extensively tested, but I don't
understand the reason.

The current PTRACE_DETACH handling simply clears child->ptrace; actually this
is not enough because entry.S just looks at the thread_flags; actually,
do_syscall_trace checks current->ptrace but I don't think depending on that is
good, at least for performance, so I think the clearing is done elsewhere.
For instance, on PTRACE_CONT it's done, but doing PTRACE_DETACH without
PTRACE_CONT is possible (and happens when gdb crashes and one kills it
manually).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
CC: Roland McGrath <roland@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:20 -07:00
Bodo Stroesser
1b38f0064e [PATCH] Uml support: add PTRACE_SYSEMU_SINGLESTEP option to i386
This patch implements the new ptrace option PTRACE_SYSEMU_SINGLESTEP, which
can be used by UML to singlestep a process: it will receive SINGLESTEP
interceptions for normal instructions and syscalls, but syscall execution will
be skipped just like with PTRACE_SYSEMU.

Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:20 -07:00
Bodo Stroesser
c8c86cecd1 [PATCH] Uml support: reorganize PTRACE_SYSEMU support
With this patch, we change the way we handle switching from PTRACE_SYSEMU to
PTRACE_{SINGLESTEP,SYSCALL}, to free TIF_SYSCALL_EMU from double use as a
preparation for PTRACE_SYSEMU_SINGLESTEP extension, without changing the
behavior of the host kernel.

Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:20 -07:00
Laurent Vivier
ed75e8d580 [PATCH] UML Support - Ptrace: adds the host SYSEMU support, for UML and general usage
Jeff Dike <jdike@addtoit.com>,
      Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>,
      Bodo Stroesser <bstroesser@fujitsu-siemens.com>

Adds a new ptrace(2) mode, called PTRACE_SYSEMU, resembling PTRACE_SYSCALL
except that the kernel does not execute the requested syscall; this is useful
to improve performance for virtual environments, like UML, which want to run
the syscall on their own.

In fact, using PTRACE_SYSCALL means stopping child execution twice, on entry
and on exit, and each time you also have two context switches; with SYSEMU you
avoid the 2nd stop and so save two context switches per syscall.

Also, some architectures don't have support in the host for changing the
syscall number via ptrace(), which is currently needed to skip syscall
execution (UML turns any syscall into getpid() to avoid it being executed on
the host).  Fixing that is hard, while SYSEMU is easier to implement.

* This version of the patch includes some suggestions of Jeff Dike to avoid
  adding any instructions to the syscall fast path, plus some other little
  changes, by myself, to make it work even when the syscall is executed with
  SYSENTER (but I'm unsure about them). It has been widely tested for quite a
  lot of time.

* Various fixed were included to handle the various switches between
  various states, i.e. when for instance a syscall entry is traced with one of
  PT_SYSCALL / _SYSEMU / _SINGLESTEP and another one is used on exit.
  Basically, this is done by remembering which one of them was used even after
  the call to ptrace_notify().

* We're combining TIF_SYSCALL_EMU with TIF_SYSCALL_TRACE or TIF_SINGLESTEP
  to make do_syscall_trace() notice that the current syscall was started with
  SYSEMU on entry, so that no notification ought to be done in the exit path;
  this is a bit of a hack, so this problem is solved in another way in next
  patches.

* Also, the effects of the patch:
"Ptrace - i386: fix Syscall Audit interaction with singlestep"
are cancelled; they are restored back in the last patch of this series.

Detailed descriptions of the patches doing this kind of processing follow (but
I've already summed everything up).

* Fix behaviour when changing interception kind #1.

  In do_syscall_trace(), we check the status of the TIF_SYSCALL_EMU flag
  only after doing the debugger notification; but the debugger might have
  changed the status of this flag because he continued execution with
  PTRACE_SYSCALL, so this is wrong.  This patch fixes it by saving the flag
  status before calling ptrace_notify().

* Fix behaviour when changing interception kind #2:
  avoid intercepting syscall on return when using SYSCALL again.

  A guest process switching from using PTRACE_SYSEMU to PTRACE_SYSCALL
  crashes.

  The problem is in arch/i386/kernel/entry.S.  The current SYSEMU patch
  inhibits the syscall-handler to be called, but does not prevent
  do_syscall_trace() to be called after this for syscall completion
  interception.

  The appended patch fixes this.  It reuses the flag TIF_SYSCALL_EMU to
  remember "we come from PTRACE_SYSEMU and now are in PTRACE_SYSCALL", since
  the flag is unused in the depicted situation.

* Fix behaviour when changing interception kind #3:
  avoid intercepting syscall on return when using SINGLESTEP.

  When testing 2.6.9 and the skas3.v6 patch, with my latest patch and had
  problems with singlestepping on UML in SKAS with SYSEMU.  It looped
  receiving SIGTRAPs without moving forward.  EIP of the traced process was
  the same for all SIGTRAPs.

What's missing is to handle switching from PTRACE_SYSCALL_EMU to
PTRACE_SINGLESTEP in a way very similar to what is done for the change from
PTRACE_SYSCALL_EMU to PTRACE_SYSCALL_TRACE.

I.e., after calling ptrace(PTRACE_SYSEMU), on the return path, the debugger is
notified and then wake ups the process; the syscall is executed (or skipped,
when do_syscall_trace() returns 0, i.e.  when using PTRACE_SYSEMU), and
do_syscall_trace() is called again.  Since we are on the return path of a
SYSEMU'd syscall, if the wake up is performed through ptrace(PTRACE_SYSCALL),
we must still avoid notifying the parent of the syscall exit.  Now, this
behaviour is extended even to resuming with PTRACE_SINGLESTEP.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:20 -07:00
Bodo Stroesser
94c80b2598 [PATCH] Ptrace/i386: fix "syscall audit" interaction with singlestep
Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>

Avoid giving two traps for singlestep instead of one, when syscall auditing is
enabled.

In fact no singlestep trap is sent on syscall entry, only on syscall exit, as
can be seen in entry.S:

# Note that in this mask _TIF_SINGLESTEP is not tested !!! <<<<<<<<<<<<<<
        testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SECCOMP),TI_flags(%ebp)
        jnz syscall_trace_entry
	...
syscall_trace_entry:
	...
	call do_syscall_trace

But auditing a SINGLESTEP'ed process causes do_syscall_trace to be called, so
the tracer will get one more trap on the syscall entry path, which it
shouldn't.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
CC: Roland McGrath <roland@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:19 -07:00
Shaohua Li
c3c433e4f3 [PATCH] add suspend/resume for timer
The timers lack .suspend/.resume methods.  Because of this, jiffies got a
big compensation after a S3 resume.  And then softlockup watchdog reports
an oops.  This occured with HPET enabled, but it's also possible for other
timers.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:18 -07:00
Pavel Machek
829ca9a30a [PATCH] swsusp: fix remaining u32 vs. pm_message_t confusion
Fix remaining bits of u32 vs.  pm_message confusion.  Should not break
anything.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:15 -07:00
Pierre Ossman
795312e763 [PATCH] ISA DMA suspend for i386
Reset the ISA DMA controller into a known state after a suspend.  Primary
concern was reenabling the cascading DMA channel (4).

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:14 -07:00
Benjamin LaHaise
52fdd08903 [PATCH] unify x86/x86-64 semaphore code
This patch moves the common code in x86 and x86-64's semaphore.c into a
single file in lib/semaphore-sleepers.c.  The arch specific asm stubs are
left in the arch tree (in semaphore.c for i386 and in the asm for x86-64).
There should be no changes in code/functionality with this patch.

Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:14 -07:00
Zwane Mwaikambo
4ad8d38342 [PATCH] i386 boottime for_each_cpu broken
for_each_cpu walks through all processors in cpu_possible_map, which is
defined as cpu_callout_map on i386 and isn't initialised until all
processors have been booted. This breaks things which do for_each_cpu
iterations early during boot. So, define cpu_possible_map as a bitmap with
NR_CPUS bits populated. This was triggered by a patch i'm working on which
does alloc_percpu before bringing up secondary processors.

From: Alexander Nyberg <alexn@telia.com>

i386-boottime-for_each_cpu-broken.patch
i386-boottime-for_each_cpu-broken-fix.patch

The SMP version of __alloc_percpu checks the cpu_possible_map before
allocating memory for a certain cpu.  With the above patches the BSP cpuid
is never set in cpu_possible_map which breaks CONFIG_SMP on uniprocessor
machines (as soon as someone tries to dereference something allocated via
__alloc_percpu, which in fact is never allocated since the cpu is not set
in cpu_possible_map).

Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Alexander Nyberg <alexn@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:13 -07:00
Zachary Amsden
d7271b14b2 [PATCH] i386: encapsulate copying of pgd entries
Add a clone operation for pgd updates.

This helps complete the encapsulation of updates to page tables (or pages
about to become page tables) into accessor functions rather than using
memcpy() to duplicate them.  This is both generally good for consistency
and also necessary for running in a hypervisor which requires explicit
updates to page table entries.

The new function is:

clone_pgd_range(pgd_t *dst, pgd_t *src, int count);

   dst - pointer to pgd range anwhere on a pgd page
   src - ""
   count - the number of pgds to copy.

   dst and src can be on the same page, but the range must not overlap
   and must not cross a page boundary.

Note that I ommitted using this call to copy pgd entries into the
software suspend page root, since this is not technically a live paging
structure, rather it is used on resume from suspend.  CC'ing Pavel in case
he has any feedback on this.

Thanks to Chris Wright for noticing that this could be more optimal in
PAE compiles by eliminating the memset.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:13 -07:00
George Anzinger
748f2edb52 [PATCH] x86 NMI: better support for debuggers
This patch adds a notify to the die_nmi notify that the system is about to
be taken down.  If the notify is handled with a NOTIFY_STOP return, the
system is given a new lease on life.

We also change the nmi watchdog to carry on if die_nmi returns.

This give debug code a chance to a) catch watchdog timeouts and b) possibly
allow the system to continue, realizing that the time out may be due to
debugger activities such as single stepping which is usually done with
"other" cpus held.

Signed-off-by: George Anzinger<george@mvista.com>
Cc: Keith Owens <kaos@ocs.com.au>
Signed-off-by: George Anzinger <george@mvista.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:13 -07:00
Zachary Amsden
f2f30ebca6 [PATCH] x86: introduce a write acessor for updating the current LDT
Introduce a write acessor for updating the current LDT.  This is required
for hypervisors like Xen that do not allow LDT pages to be directly
written.

Testing - here's a fun little LDT test that can be trivially modified to
test limits as well.

/*
 * Copyright (c) 2005, Zachary Amsden (zach@vmware.com)
 * This is licensed under the GPL.
 */

#include <stdio.h>
#include <signal.h>
#include <asm/ldt.h>
#include <asm/segment.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/mman.h>
#define __KERNEL__
#include <asm/page.h>

void main(void)
{
        struct user_desc desc;
        char *code;
        unsigned long long tsc;

        code = (char *)mmap(0, 8192, PROT_EXEC|PROT_READ|PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        desc.entry_number = 0;
        desc.base_addr = code;
        desc.limit = 1;
        desc.seg_32bit = 1;
        desc.contents = MODIFY_LDT_CONTENTS_CODE;
        desc.read_exec_only = 0;
        desc.limit_in_pages = 1;
        desc.seg_not_present = 0;
        desc.useable = 1;
        if (modify_ldt(1, &desc, sizeof(desc)) != 0) {
                perror("modify_ldt");
        }
        printf("code base is 0x%08x\n", (unsigned)code);
        code[0x0ffe] = 0x0f;  /* rdtsc */
        code[0x0fff] = 0x31;
        code[0x1000] = 0xcb;  /* lret */
        __asm__ __volatile("lcall $7,$0xffe" : "=A" (tsc));
        printf("TSC is 0x%016llx\n", tsc);
}

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:13 -07:00
Zachary Amsden
e9f86e351f [PATCH] x86: remove redundant TSS clearing
When reviewing GDT updates, I found the code:

	set_tss_desc(cpu,t);	/* This just modifies memory; ... */
        per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS].b &= 0xfffffdff;

This second line is unnecessary, since set_tss_desc() has already cleared
the busy bit.

Commented disassembly, line 1:

c028b8bd:       8b 0c 86                mov    (%esi,%eax,4),%ecx
c028b8c0:       01 cb                   add    %ecx,%ebx
c028b8c2:       8d 0c 39                lea    (%ecx,%edi,1),%ecx

  => %ecx = per_cpu(cpu_gdt_table, cpu)

c028b8c5:       8d 91 80 00 00 00       lea    0x80(%ecx),%edx

  => %edx = &per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS]

c028b8cb:       66 c7 42 00 73 20       movw   $0x2073,0x0(%edx)
c028b8d1:       66 89 5a 02             mov    %bx,0x2(%edx)
c028b8d5:       c1 cb 10                ror    $0x10,%ebx
c028b8d8:       88 5a 04                mov    %bl,0x4(%edx)
c028b8db:       c6 42 05 89             movb   $0x89,0x5(%edx)

  => ((char *)%edx)[5] = 0x89
  (equivalent) ((char *)per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS])[5] = 0x89

c028b8df:       c6 42 06 00             movb   $0x0,0x6(%edx)
c028b8e3:       88 7a 07                mov    %bh,0x7(%edx)
c028b8e6:       c1 cb 10                ror    $0x10,%ebx

  => other bits

Commented disassembly, line 2:

c028b8e9:       8b 14 86                mov    (%esi,%eax,4),%edx
c028b8ec:       8d 04 3a                lea    (%edx,%edi,1),%eax

  => %eax = per_cpu(cpu_gdt_table, cpu)

c028b8ef:       81 a0 84 00 00 00 ff    andl   $0xfffffdff,0x84(%eax)

  => per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS].b &= 0xfffffdff;
  (equivalent) ((char *)per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS])[5] &= 0xfd

Note that (0x89 & ~0xfd) == 0; i.e, set_tss_desc(cpu,t) has already stored
the type field in the GDT with the busy bit clear.

Eliminating redundant and obscure code is always a good thing; in fact, I
pointed out this same optimization many moons ago in arch/i386/setup.c,
back when it used to be called that.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:13 -07:00
Zachary Amsden
a520112930 [PATCH] x86: make IOPL explicit
The pushf/popf in switch_to are ONLY used to switch IOPL.  Making this
explicit in C code is more clear.  This pushf/popf pair was added as a
bugfix for leaking IOPL to unprivileged processes when using
sysenter/sysexit based system calls (sysexit does not restore flags).

When requesting an IOPL change in sys_iopl(), it is just as easy to change
the current flags and the flags in the stack image (in case an IRET is
required), but there is no reason to force an IRET if we came in from the
SYSENTER path.

This change is the minimal solution for supporting a paravirtualized Linux
kernel that allows user processes to run with I/O privilege.  Other
solutions require radical rewrites of part of the low level fault / system
call handling code, or do not fully support sysenter based system calls.

Unfortunately, this added one field to the thread_struct.  But as a bonus,
on P4, the fastest time measured for switch_to() went from 312 to 260
cycles, a win of about 17% in the fast case through this performance
critical path.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:12 -07:00
Zachary Amsden
0998e4228a [PATCH] x86: privilege cleanup
Privilege checking cleanup.  Originally, these diffs were much greater, but
recent cleanups in Linux have already done much of the cleanup.  I added
some explanatory comments in places where the reasoning behind certain
tests is rather subtle.

Also, in traps.c, we can skip the user_mode check in handle_BUG().  The
reason is, there are only two call chains - one via die_if_kernel() and one
via do_page_fault(), both entering from die().  Both of these paths already
ensure that a kernel mode failure has happened.  Also, the original check
here, if (user_mode(regs)) was insufficient anyways, since it would not
rule out BUG faults from V8086 mode execution.

Saving the %ss segment in show_regs() rather than assuming a fixed value
also gives better information about the current kernel state in the
register dump.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:12 -07:00
Zachary Amsden
f2ab446124 [PATCH] x86: more asm cleanups
Some more assembler cleanups I noticed along the way.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:12 -07:00
Zachary Amsden
c9b02a2413 [PATCH] i386: use set_pte macros in a couple places where they were missing
Also, setting PDPEs in PAE mode does not require atomic operations, since the
PDPEs are cached by the processor, and only reloaded on an explicit or
implicit reload of CR3.

Since the four PDPEs must always be present in an active root, and the kernel
PDPE is never updated, we are safe even from SMIs and interrupts / NMIs using
task gates (which reload CR3).  Actually, much of this is moot, since the user
PDPEs are never updated either, and the only usage of task gates is by the
doublefault handler.  It appears the only place PGDs get updated in PAE mode
is in init_low_mappings() / zap_low_mapping() for initial page table creation
and recovery from ACPI sleep state, and these sites are safe by inspection.
Getting rid of the cmpxchg8b saves code space and 720 cycles in pgd_alloc on
P4.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:12 -07:00
Zachary Amsden
e7a2ff593c [PATCH] i386: load_tls() fix
Subtle fix: load_TLS has been moved after saving %fs and %gs segments to avoid
creating non-reversible segments.  This could conceivably cause a bug if the
kernel ever needed to save and restore fs/gs from the NMI handler.  It
currently does not, but this is the safest approach to avoiding fs/gs
corruption.  SMIs are safe, since SMI saves the descriptor hidden state.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:11 -07:00
Zachary Amsden
4d37e7e3fd [PATCH] i386: inline assembler: cleanup and encapsulate descriptor and task register management
i386 inline assembler cleanup.

This change encapsulates descriptor and task register management.  Also,
it is possible to improve assembler generation in two cases; savesegment
may store the value in a register instead of a memory location, which
allows GCC to optimize stack variables into registers, and MOV MEM, SEG
is always a 16-bit write to memory, making the casting in math-emu
unnecessary.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:11 -07:00
Zachary Amsden
245067d167 [PATCH] i386: cleanup serialize msr
i386 arch cleanup.  Introduce the serialize macro to serialize processor
state.  Why the microcode update needs it I am not quite sure, since wrmsr()
is already a serializing instruction, but it is a microcode update, so I will
keep the semantic the same, since this could be a timing workaround.  As far
as I can tell, this has always been there since the original microcode update
source.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:11 -07:00
Zachary Amsden
4bb0d3ec3e [PATCH] i386: inline asm cleanup
i386 Inline asm cleanup.  Use cr/dr accessor functions.

Also, a potential bugfix.  Also, some CR accessors really should be volatile.
Reads from CR0 (numeric state may change in an exception handler), writes to
CR4 (flipping CR4.TSD) and reads from CR2 (page fault) prevent instruction
re-ordering.  I did not add memory clobber to CR3 / CR4 / CR0 updates, as it
was not there to begin with, and in no case should kernel memory be clobbered,
except when doing a TLB flush, which already has memory clobber.

I noticed that page invalidation does not have a memory clobber.  I can't find
a bug as a result, but there is definitely a potential for a bug here:

#define __flush_tlb_single(addr) \
	__asm__ __volatile__("invlpg %0": :"m" (*(char *) addr))

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:11 -07:00
Roland McGrath
2a0694d15d [PATCH] i386: clean up vDSO alignment padding
This makes the vDSO use nops for all its padding around instructions,
rather than sometimes zeros, and nop-pads the end of the area containing
instructions to a 32-byte cache line, to keep text and data in separate
lines.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:10 -07:00
Natalie.Protasevich@unisys.com
56f1d5d52a [PATCH] ES7000 platform update (i386)
This is subarch update for ES7000.  I've modified platform check code and
removed unnecessary OEM table parsing for newer systems that don't use OEM
information during boot.  Parsing the table in fact is causing problems,
and the platform doesn't get recognized.  The patch only affects the ES7000
subach.

Signed-off-by: <Natalie.Protasevich@unisys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:10 -07:00
Venkatesh Pallipadi
252943efcf [PATCH] x86: Add the check for all the cores in a package in cache information
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:10 -07:00
Venkatesh Pallipadi
911a62d423 [PATCH] x86: sutomatically enable bigsmp when we have more than 8 CPUs
i386 generic subarchitecture requires explicit dmi strings or command line
to enable bigsmp mode.  The patch below removes that restriction, and uses
bigsmp as soon as it finds more than 8 logical CPUs, Intel processors and
xAPIC support.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:10 -07:00
Vivek Goyal
484b90c4b9 [PATCH] kdump: Save parameter segment in protected mode (x86)
o With introduction of kexec as boot-loader, the assumption that parameter
  segment will always be loaded at lower address than kernel and will be
  addressable by early bootup page tables is no longer valid. In kexec on
  panic case parameter segment might well be loaded beyond kernel image and
  might not be addressable by early boot page tables.
o This case might hit in the scenario where user has reserved a chunk of
  memory for second kernel, for example 16MB to 64MB, and has also built
  second kernel for physical memory location 16MB. In this case kexec has no
  choice but to load the parameter segment at a higher address than new kernel
  image at safe location where new kernel does not stomp it.
o Though problem should automatically go away once relocatable kernel for i386
  is in place and kexec can determine the location of new kernel at run time
  and load parameter segment at lower address than kernel image. But till then
  this patch can go in (assuming it does not break something else).
o This patch moves up the boot parameter saving code. Now boot parameters
  are copied out in protected mode before page tables are initialized. This
  will ensure that parameter segment is always addressable irrespective of
  its physical location.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:09 -07:00
Petr Tesarik
5fd75ebb1a [PATCH] vm86: Honor TF bit when emulating an instruction
If the virtual 86 machine reaches an instruction which raises a General
Protection Fault (such as CLI or STI), the instruction is emulated (in
handle_vm86_fault).  However, the emulation ignored the TF bit, so the
hardware debug interrupt was not invoked after such an emulated instruction
(and the DOS debugger missed it).

This patch fixes the problem by emulating the hardware debug interrupt as
the last action before control is returned to the VM86 program.

Signed-off-by: Petr Tesarik <kernel@tesarici.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:09 -07:00
Matt Tolentino
7ae65fd334 [PATCH] x86: fix EFI memory map parsing
The memory descriptors that comprise the EFI memory map are not fixed in
stone such that the size could change in the future.  This uses the memory
descriptor size obtained from EFI to iterate over the memory map entries
during boot.  This enables the removal of an x86 specific pad (and ifdef)
in the EFI header.  I also couldn't stomach the broken up nature of the
function to put EFI runtime calls into virtual mode any longer so I fixed
that up a bit as well.

For reference, this patch only impacts x86.

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:09 -07:00
Venkatesh Pallipadi
4116c527ea [PATCH] hpet: use read_timer_tsc only when CPU has TSC
Only use read_timer_tsc only when CPU has TSC.  Thanks to Andrea for
pointing this out.  Should not be issue on any platforms as all recent
systems that has HPET also has CPUs that supports TSC.  The patch is still
required for correctness.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:09 -07:00
Ingo Molnar
869f96a00e [PATCH] x86: compress the stack layout of do_page_fault()
This patch pushes the creation of a rare signal frame (SIGBUS or SIGSEGV)
into a separate function, thus saving stackspace in the main
do_page_fault() stackframe.  The effect is 132 bytes less of stack used by
the typical do_page_fault() invocation - resulting in a denser
cache-layout.

(Another minor effect is that in case of kernel crashes that come from a
pagefault, we add less space to the already existing frame, giving the
crash functions a slightly higher chance to do their stuff without
overflowing the stack.)

(The changes also result in slightly cleaner code.)

argument bugfix from "Guillaume C." <guichaz@gmail.com>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:06:09 -07:00
Chen, Kenneth W
0e5c9f39f6 [PATCH] remove hugetlb_clean_stale_pgtable() and fix huge_pte_alloc()
I don't think we need to call hugetlb_clean_stale_pgtable() anymore
in 2.6.13 because of the rework with free_pgtables().  It now collect
all the pte page at the time of munmap.  It used to only collect page
table pages when entire one pgd can be freed and left with staled pte
pages.  Not anymore with 2.6.13.  This function will never be called
and We should turn it into a BUG_ON.

I also spotted two problems here, not Adam's fault :-)
(1) in huge_pte_alloc(), it looks like a bug to me that pud is not
    checked before calling pmd_alloc()
(2) in hugetlb_clean_stale_pgtable(), it also missed a call to
    pmd_free_tlb.  I think a tlb flush is required to flush the mapping
    for the page table itself when we clear out the pmd pointing to a
    pte page.  However, since hugetlb_clean_stale_pgtable() is never
    called, so it won't trigger the bug.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: Adam Litke <agl@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:05:46 -07:00
Adam Litke
02b0ccef90 [PATCH] hugetlb: check p?d_present in huge_pte_offset()
For demand faulting, we cannot assume that the page tables will be
populated.  Do what the rest of the architectures do and test p?d_present()
while walking down the page table.

Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: <linux-mm@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:05:46 -07:00
Adam Litke
7bf07f3d4b [PATCH] hugetlb: move stale pte check into huge_pte_alloc()
Initial Post (Wed, 17 Aug 2005)

This patch moves the
	if (! pte_none(*pte))
		hugetlb_clean_stale_pgtable(pte);
logic into huge_pte_alloc() so all of its callers can be immune to the bug
described by Kenneth Chen at http://lkml.org/lkml/2004/6/16/246

> It turns out there is a bug in hugetlb_prefault(): with 3 level page table,
> huge_pte_alloc() might return a pmd that points to a PTE page. It happens
> if the virtual address for hugetlb mmap is recycled from previously used
> normal page mmap. free_pgtables() might not scrub the pmd entry on
> munmap and hugetlb_prefault skips on any pmd presence regardless what type
> it is.

Unless I am missing something, it seems more correct to place the check inside
huge_pte_alloc() to prevent a the same bug wherever a huge pte is allocated.
It also allows checking for this condition when lazily faulting huge pages
later in the series.

Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: <linux-mm@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:05:46 -07:00
Bob Picco
3e347261a8 [PATCH] sparsemem extreme implementation
With cleanups from Dave Hansen <haveblue@us.ibm.com>

SPARSEMEM_EXTREME makes mem_section a one dimensional array of pointers to
mem_sections.  This two level layout scheme is able to achieve smaller
memory requirements for SPARSEMEM with the tradeoff of an additional shift
and load when fetching the memory section.  The current SPARSEMEM
implementation is a one dimensional array of mem_sections which is the
default SPARSEMEM configuration.  The patch attempts isolates the
implementation details of the physical layout of the sparsemem section
array.

SPARSEMEM_EXTREME requires bootmem to be functioning at the time of
memory_present() calls.  This is not always feasible, so architectures
which do not need it may allocate everything statically by using
SPARSEMEM_STATIC.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:05:38 -07:00
Len Brown
129521dcc9 Merge linux-2.6 into linux-acpi-2.6 test 2005-09-03 02:44:09 -04:00
Dave Jones
52c18fd2dc [CPUFREQ] Remove trailing whitespace before \n's in printks.
From: Denis Vlasenko <vda@ilport.com.ua>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-09-01 11:01:02 -07:00
Mika Kukkonen
ce38b51edf [CPUFREQ] Remove extra arg from dprintk in cpufreq/speedstep-smi.c
Minor fallout from my upcoming __attribute__((format(printf,x,y)))
patches. The variable 'result' is untouched, so this patch just removes
it.

Signed-off-by: Mika Kukkonen <mikukkon@gmail.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-08-31 22:21:29 -07:00
Mika Kukkonen
123411f2d0 [CPUFREQ] dprintf format fixes in cpufreq/speedstep-centrino.c
Ho-hum, did not notice there was more printf fixes for cpufreq (you
should see the amount I have for isdn and reiser ...). Sorry for noise.

Signed-off-by: Mika Kukkonen <mikukkon@gmail.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-08-31 22:21:29 -07:00
Venkatesh Pallipadi
f914be79ab [CPUFREQ] speedstep-centrino: skip extract_clock logic for acpi based centrino
speedstep_centrino.c:extract_clock() assumes the bus speed of 100MHz, which is
not true with latest laptops. Due to this assumption and due to the encoded
frequency check during initialization, speedstep-centrino driver fails even
on systems that has proper ACPI information to do the P-state transition.

The change below moves the centrino-speedstep detection to be used only
when table based P-state transition is done. For ACPI based P-state
transition, we skip the centrino_cpu identification, and as a result we
don't use the bus speed assumption in extract_clock. This change makes
speedstep-centrino work on Pentium-M based systems, which have more than 100MHz
bus speed.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-08-31 22:21:29 -07:00
Ivan Kokshaysky
81d4af1340 [PATCH] x86: pci_assign_unassigned_resources() update
I had some time to think about PCI assign issues in 2.6.13-rc series.

The major problem here is that we call pci_assign_unassigned_resources()
way too early - at subsys_initcall level. Therefore we give no chances
to ACPI and PnP routines (called at fs_initcall level) to reserve their
respective resources properly, as the comments in drivers/pnp/system.c
and drivers/acpi/motherboard.c suggest:

 /**
  * Reserve motherboard resources after PCI claim BARs,
  * but before PCI assign resources for uninitialized PCI devices
  */

So I moved the pci_assign_unassigned_resources() call to
pcibios_assign_resources() (fs_initcall), which should hopefully fix a
lot of problems and make PCIBIOS_MIN_IO tweaks unnecessary.

Other changes:
- remove resource assignment code from pcibios_assign_resources(), since
  it duplicates pci_assign_unassigned_resources() functionality and
  actually does nothing in 2.6.13;
- modify ROM assignment code as per Ben's suggestion: try to use firmware
  settings by default (if PCI_ASSIGN_ROMS is not set);
- set CARDBUS_IO_SIZE back to 4K as it's a wonderful stress test for
  various setups.

Confirmed by Tero Roponen <teanropo@cc.jyu.fi> (who had problems with
the 4kB CardBus IO size previously).

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-30 11:14:48 -07:00
Len Brown
27a639a92d Auto-update from upstream 2005-08-29 17:02:17 -04:00
Steven Rostedt
69be8f1896 [PATCH] convert signal handling of NODEFER to act like other Unix boxes.
It has been reported that the way Linux handles NODEFER for signals is
not consistent with the way other Unix boxes handle it.  I've written a
program to test the behavior of how this flag affects signals and had
several reports from people who ran this on various Unix boxes,
confirming that Linux seems to be unique on the way this is handled.

The way NODEFER affects signals on other Unix boxes is as follows:

1) If NODEFER is set, other signals in sa_mask are still blocked.

2) If NODEFER is set and the signal is in sa_mask, then the signal is
still blocked. (Note: this is the behavior of all tested but Linux _and_
NetBSD 2.0 *).

The way NODEFER affects signals on Linux:

1) If NODEFER is set, other signals are _not_ blocked regardless of
sa_mask (Even NetBSD doesn't do this).

2) If NODEFER is set and the signal is in sa_mask, then the signal being
handled is not blocked.

The patch converts signal handling in all current Linux architectures to
the way most Unix boxes work.

Unix boxes that were tested:  DU4, AIX 5.2, Irix 6.5, NetBSD 2.0, SFU
3.5 on WinXP, AIX 5.3, Mac OSX, and of course Linux 2.6.13-rcX.

* NetBSD was the only other Unix to behave like Linux on point #2. The
main concern was brought up by point #1 which even NetBSD isn't like
Linux.  So with this patch, we leave NetBSD as the lonely one that
behaves differently here with #2.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-29 10:03:11 -07:00
Venkatesh Pallipadi
d395bf12d1 [ACPI] Reduce acpi-cpufreq switching latency by 50%
The acpi-cpufreq driver does a P-state get after a P-state set
to verify whether set went through successfully. This test
is kind of redundant as set goes throught most of the times,
and the test is also expensive as a get of P-states can
take a lot of time (same as a set operation) as it goes
through SMM mode. Effectively, we are doubling the P-state
latency due to this get opertion.

momdule parameter "acpi_pstate_strict" restores orginal paranoia.

http://bugzilla.kernel.org/show_bug.cgi?id=5129

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2005-08-26 22:22:33 -04:00
Len Brown
09d4a80e66 Merge HEAD from ../from-linus 2005-08-25 12:45:49 -04:00
Len Brown
6153df7b2f [ACPI] delete CONFIG_ACPI_PCI
Delete the ability to build an ACPI kernel that does
not include PCI support.  When such a machine is created
and it requires a tuned kernel, send a patch.

http://bugzilla.kernel.org/show_bug.cgi?id=1364

Signed-off-by: Len Brown <len.brown@intel.com>
2005-08-25 12:40:44 -04:00
Len Brown
76f5858482 [ACPI] delete CONFIG_ACPI_BUS
it is a synonym for CONFIG_ACPI

Signed-off-by: Len Brown <len.brown@intel.com>
2005-08-24 12:11:34 -04:00
Len Brown
8466361ad5 [ACPI] delete CONFIG_ACPI_INTERPRETER
it is a synonym for CONFIG_ACPI

Signed-off-by: Len Brown <len.brown@intel.com>
2005-08-24 12:10:43 -04:00
Len Brown
888ba6c62b [ACPI] delete CONFIG_ACPI_BOOT
it has been a synonym for CONFIG_ACPI since 2.6.12

Signed-off-by: Len Brown <len.brown@intel.com>
2005-08-24 12:08:54 -04:00
Chuck Ebbert
b1daec3089 [PATCH] i386: fix incorrect FP signal code
i386 floating-point exception handling has a bug that can cause error
code 0 to be sent instead of the proper code during signal delivery.

This is caused by unconditionally checking the IS and c1 bits from the
FPU status word when they are not always relevant.  The IS bit tells
whether an exception is a stack fault and is only relevant when the
exception is IE (invalid operation.) The C1 bit determines whether a
stack fault is overflow or underflow and is only relevant when IS and IE
are set.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-23 19:52:37 -07:00
Len Brown
84ffa74752 Merge from-linus to-akpm 2005-08-23 22:12:23 -04:00
Steven Rostedt
cd3716ab40 [PATCH] Mobil Pentium 4 HT and the NMI
I'm trying to get the nmi working with my laptop (IBM ThinkPad G41) and after
debugging it a while, I found that the nmi code doesn't want to set it up for
this particular CPU.

Here I have:

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Mobile Intel(R) Pentium(R) 4 CPU 3.33GHz
stepping        : 1
cpu MHz         : 3320.084
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 3
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni
monitor ds_cpl est tm2 cid xtpr
bogomips        : 6642.39

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Mobile Intel(R) Pentium(R) 4 CPU 3.33GHz
stepping        : 1
cpu MHz         : 3320.084
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 3
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni
monitor ds_cpl est tm2 cid xtpr
bogomips        : 6637.46

And the following code shows:

$ cat linux-2.6.13-rc6/arch/i386/kernel/nmi.c

[...]

void setup_apic_nmi_watchdog (void)
{
        switch (boot_cpu_data.x86_vendor) {
        case X86_VENDOR_AMD:
                if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15)
                        return;
                setup_k7_watchdog();
                break;
        case X86_VENDOR_INTEL:
                 switch (boot_cpu_data.x86) {
                case 6:
                        if (boot_cpu_data.x86_model > 0xd)
                                return;

                        setup_p6_watchdog();
                        break;
                case 15:
                        if (boot_cpu_data.x86_model > 0x3)
                                return;

Here I get boot_cpu_data.x86_model == 0x4.  So I decided to change it and
reboot.  I now seem to have a working NMI.  So, unless there's something know
to be bad about this processor and the NMI.  I'm submitting the following
patch.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Acked-by: Mikael Pettersson <mikpe@csd.uu.se>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-19 18:44:56 -07:00
Andi Kleen
6be382ea0c [PATCH] x86: Remove obsolete get_cpu_vendor call
Since early CPU identify is in this information is already available

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-18 12:53:59 -07:00
Len Brown
95f193aa4f Merge ../to-linus 2005-08-11 00:56:08 -04:00
Ravikiran G Thirumalai
4b0271eb9d [PATCH] Move the fix to align node_end_pfns to a proper location
Move the fix to align node_end_pfns to a proper location.  The earlier fix
made the node_remap_start_vaddr to get misaligned causing remap_numa_kva to
barf again :-/

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-07 10:00:39 -07:00
Tom Duffy
46bdac9938 [PATCH] visws: linkage fix
This patch add stubs to allow the visws subarch to link again.

Signed-off-by: Tom Duffy <thomas.duffy.99@alumni.brown.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-07 10:00:38 -07:00
Eric W. Biederman
36cf446c2c [PATCH] i386 visws: Add machine_shutdown and emergency_restart
Another x86 subarchitecture bit I missed.  This adds both
machine_emergency_restart missed in my reboot fixes and
machine_shutdown needed for kexec support.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-06 12:54:57 -07:00
Eric W. Biederman
094528a7fb [PATCH] i386 voyager: Add machine_shutdown
Here is one more bit of breakage my x86 sub-architecture
confusion caused.

Add machine_shutdown to voyager so it will compile with CONFIG_KEXEC.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-06 12:54:57 -07:00
James Bottomley
e6cb99413d [PATCH] fix voyager compile after machine_emergency_restart breakage
[PATCH] i386: Implement machine_emergency_reboot

introduced this new function into arch/i386/reboot.c.  However,
subarchitectures are entitled to implement their own copies of reboot.c
from which this new function is now missing.

It looks like visws will also need a similar fixup

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-05 12:22:37 -07:00
Len Brown
1f3a730117 [ACPI] Lindent created a syntax error that broke the build
Signed-off-by: Len Brown <len.brown@intel.com>
2005-08-05 03:52:45 -04:00
Len Brown
c202ac9fbd Merge ../to-linus 2005-08-05 00:49:06 -04:00
Len Brown
4be44fcd3b [ACPI] Lindent all ACPI files
Signed-off-by: Len Brown <len.brown@intel.com>
2005-08-05 00:45:14 -04:00
Len Brown
1d492eb413 [ACPI] Merge acpi-2.6.12 branch into 2.6.13-rc3
Signed-off-by: Len Brown <len.brown@intel.com>
2005-08-05 00:31:42 -04:00
Kenji Kaneshige
1f3a6a1577 [ACPI] acpi_register_gsi() can return error
Current acpi_register_gsi() function has no way to indicate errors to its
callers even though acpi_register_gsi() can fail to register gsi because of
some reasons (out of memory, lack of interrupt vectors, incorrect BIOS, and so
on).  As a result, caller of acpi_register_gsi() cannot handle the case that
acpi_register_gsi() fails.  I think failure of acpi_register_gsi() should be
handled properly.

This series of patches changes acpi_register_gsi() to return negative value on
error, and also changes callers of acpi_register_gsi() to handle failure of
acpi_register_gsi().

This patch changes the type of return value of acpi_register_gsi() from
"unsigned int" to "int" to indicate an error.  If acpi_register_gsi() fails to
register gsi, it returns negative value.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2005-08-04 22:12:08 -04:00
Venkatesh Pallipadi
c91096d85c [PATCH] remove special HPET_EMULATE_RTC config option
We had a user whose apps weren't working correctly because his "rtc" wasn't
working fully.

For the sake of simplicity, it seems sensible to always enable HPET RTC
emulation.

Remove a special config option for HPET_EMULATE_RTC and make it directly
depend on HPET_TIMER and RTC. This will avoid the hangs when EMULATE_RTC
is not configured and when some userlevel script depends on RTC interrupt,
as in:

http://bugzilla.kernel.org/show_bug.cgi?id=4904

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-04 16:27:58 -07:00
Andrew Morton
39bbb07d7c [PATCH] transmeta: CONFIG_PROC_FS=n build fix
Fix bug found by Grant Coady <lkml@dodo.com.au>'s autobuild setup.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-01 21:38:00 -07:00
Eric Lammerts
cdf32eaa4e [PATCH] disable addres space randomization default on transmeta CPUs
We know that the randomisation slows down some workloads on Transmeta CPUs
by quite large amounts.  We think it's because the CPU needs to recode the
same x86 instructions when they pop up at a different virtual address after
a fork+exec.

So disable randomization by default on those CPUs.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-01 19:13:59 -07:00
Ingo Molnar
6cb54819d7 [PATCH] remove sys_set_zone_reclaim()
This removes sys_set_zone_reclaim() for now.  While i'm sure Martin is
trying to solve a real problem, we must not hard-code an incomplete and
insufficient approach into a syscall, because syscalls are pretty much
for eternity.  I am quite strongly convinced that this syscall must not
hit v2.6.13 in its current form.

Firstly, the syscall lacks basic syscall design: e.g. it allows the
global setting of VM policy for unprivileged users. (!) [ Imagine an
Oracle installation and a SAP installation on the same NUMA box fighting
over the 'optimal' setting for this flag. What will they do? Will they
try to set the flag to their own preferred value every second or so? ]

Secondly, it was added based on a single datapoint from Martin:

 http://marc.theaimsgroup.com/?l=linux-mm&m=111763597218177&w=2

where Martin characterizes the numbers the following way:

 ' Run-to-run variability for "make -j" is huge, so these numbers aren't
   terribly useful except to see that with reclaim the benchmark still
   finishes in a reasonable amount of time. '

in other words: the fundamental problem has likely not been solved, only
a tendential move into the right direction has been observed, and a
handful of numbers were picked out of a set of hugely variable results,
without showing the variability data. How much variance is there
run-to-run?

I'd really suggest to first walk the walk and see what's needed to get
stable & predictable kernel compilation numbers on that NUMA box, before
adding random syscalls to tune a particular aspect of the VM ... which
approach might not even matter once the whole picture has been analyzed
and understood!

The third, most important point is that the syscall exposes VM tuning
internals in a completely unstructured way. What sense does it make to
have a _GLOBAL_ per-node setting for 'should we go to another node for
reclaim'? If then it might make sense to do this per-app, via numalib or
so.

The change is minimalistic in that it doesnt remove the syscall and the
underlying infrastructure changes, only the user-visible changes.  We
could perhaps add a CAP_SYS_ADMIN-only sysctl for this hack, a'ka
/proc/sys/vm/swappiness, but even that looks quite counterproductive
when the generic approach is that we are trying to reduce the number of
external factors in the VM balance picture.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-01 10:03:56 -07:00
Len Brown
adbedd3424 merge 2.6.13-rc4 with ACPI's to-linus tree 2005-07-30 01:55:32 -04:00
Len Brown
d6ac1a7910 /home/lenb/src/to-linus branch 'acpi-2.6.12' 2005-07-29 23:31:17 -04:00
David Shaohua Li
87bec66b96 [ACPI] suspend/resume ACPI PCI Interrupt Links
Add reference count and disable ACPI PCI Interrupt Link
when no device still uses it.

Warn when drivers have not released Link at suspend time.

http://bugzilla.kernel.org/show_bug.cgi?id=3469

Signed-off-by: David Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2005-07-29 22:49:38 -04:00
Dominik Brodowski
4b31e77455 [ACPI] Always set P-state on initialization
Otherwise a platform that supports ACPI based cpufreq
and boots up at lowest possible speed could stay there
forever.  This because the governor may request max speed,
but the code doesn't update if there is no change in
speed, and it assumed the initial state of max speed.

http://bugzilla.kernel.org/show_bug.cgi?id=4634

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2005-07-29 18:29:47 -04:00
Natalie.Protasevich@unisys.com
e1afc3f522 [PATCH] x86: avoid wasting IRQs patch update
The patch addresses a problem with ACPI SCI interrupt entry, which gets
re-used, and the IRQ is assigned to another unrelated device.  The patch
corrects the code such that SCI IRQ is skipped and duplicate entry is
avoided.  Second issue came up with VIA chipset, the problem was caused by
original patch assigning IRQs starting 16 and up.  The VIA chipset uses
4-bit IRQ register for internal interrupt routing, and therefore cannot
handle IRQ numbers assigned to its devices.  The patch corrects this
problem by allowing PCI IRQs below 16.

Signed-off by: Natalie Protasevich <Natalie.Protasevich@unisys.com>

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-29 15:01:13 -07:00
Ravikiran G Thirumalai
94d2ac66c1 [PATCH] mm: Ensure proper alignment for node_remap_start_pfn
While reserving KVA for lmem_maps of node, we have to make sure that
node_remap_start_pfn[] is aligned to a proper pmd boundary.
(node_remap_start_pfn[] gets its value from node_end_pfn[])

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-29 15:01:13 -07:00
Linus Torvalds
590f47a1d9 Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq 2005-07-29 14:40:08 -07:00
Dave Jones
094ce7fde4 arch/i386/kernel/cpu/cpufreq/powernow-k8.c: In function `powernow_k8_cpu_init_acpi':
arch/i386/kernel/cpu/cpufreq/powernow-k8.c:740: warning: unused variable `vid'
arch/i386/kernel/cpu/cpufreq/powernow-k8.c:739: warning: unused variable `fid'
arch/i386/kernel/cpu/cpufreq/powernow-k8.c:743: warning: unused variable `vid'
arch/i386/kernel/cpu/cpufreq/powernow-k8.c:742: warning: unused variable `fid'
arch/i386/kernel/cpu/cpufreq/powernow-k8.c:746: `fid' undeclared (first use in this function)
arch/i386/kernel/cpu/cpufreq/powernow-k8.c:746: `vid' undeclared (first use in this function)

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-07-29 12:55:40 -07:00
Eric W. Biederman
e7b47ccaf6 [PATCH] i386 machine_kexec: Cleanup inline assembly
For some reason I was telling my inline assembly that the
input argument was an output argument.

Playing in the trampoline code I have seen a couple of
instances where lgdt get the wrong size (because the
trampolines run in 16bit mode) so use lgdtl and lidtl to
be explicit.

Additionally gcc-3.3 and gcc-3.4 want's an lvalue for a
memory argument and it doesn't think an array of characters
is an lvalue so use a packed structure instead.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-29 12:17:26 -07:00
Dave Jones
2bcad935a3 Fix up powernow-k8 compile. (Missing definitions).
From: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-07-29 09:56:41 -07:00
Linus Torvalds
33ac02aa4c Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq 2005-07-29 10:16:25 -07:00
Dave Hansen
e1474e2d9d [PATCH] re-disable TSC on NUMAQ
Somewhere recently, the TSC got re-enabled for timekeeping on NUMAQ
machines.  However, the hardware makes these get unsynchronized quite
badly.  So badly, in fact, that the code to fix up the skew can just hang
on boot.

This patch re-disables them.  It's nicely confined to the numaq.c file.  It
would be great if this could make it into 2.6.13, I think it counts as a
bugfix.

Tested on a 16-proc 4-node NUMAQ.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-28 21:46:05 -07:00
Andi Kleen
e2cac78935 [PATCH] x86_64: When running cpuid4 need to run on the correct CPU
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-28 21:46:01 -07:00
Dave Jones
7153d9612f powernow-k8.c: In function `query_current_values_with_pending_wait':
powernow-k8.c:110: warning: `hi' may be used uninitialized in this function

Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2005-07-28 09:45:10 -07:00
Dave Jones
841e40b380 Opteron revision F will support higher frequencies than
can be encoded in the current driver's 4 bit frequency
field.  This patch updates the driver to support Rev F
including 6 bit FIDs and processor ID updates.

This should apply cleanly whether or not the dual-core
bugfix I sent out last week is applied.  I'd prefer
that both get applied, of course.

Signed-off-by: David Keck <david.keck@amd.com>
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-07-28 09:40:04 -07:00
Dave Jones
03938c3f10 powernow-k8 requires that a data structure for
each core be created in the _cpu_init function
call.  The cpufreq infrastructure doesn't call
_cpu_init for the second core in each processor.
Some systems crashed when _get was called with
an odd-numbered core because it tried to
dereference a NULL pointer since the data
structure had not been created.

The attached patch solves the problem by
initializing data structures for all shared
cores in the _cpu_init function.  It should
apply to 2.6.12-rc6 and has been tested by
AMD and Sun.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-07-28 09:38:21 -07:00
Giancarlo Formicuccia
ac12259f29 [PATCH] Fix incorrect Asus k7m irq router detection
This patch:
http://marc.theaimsgroup.com/?l=bk-commits-head&m=111955644929114&w=2
uncovered a k7m bios bug, where the VT82C686A router is reported as
being "586-compatible". The two chips have different pirq mapping, so
this leads to "irq routing conflict" on many pci devices.

The suggested fix was discussed with Aleksey Gorelov, who helped me
to identify the problem as a probable bios bug.

Signed-off-by: Giancarlo Formicuccia <giancarlo.formicuccia@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-28 08:39:01 -07:00
Blaisorblade
71ae18ec69 [PATCH] sys_get_thread_area does not clear the returned argument
sys_get_thread_area does not memset to 0 its struct user_desc info before
copying it to user space...  since sizeof(struct user_desc) is 16 while the
actual datas which are filled are only 12 bytes + 9 bits (across the
bitfields), there is a (small) information leak.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:26:08 -07:00
Adrian Bunk
dab175f393 [PATCH] i386: add missing Kconfig help text
There's no help text for CONFIG_DEBUG_STACKOVERFLOW - add one.

Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:25:58 -07:00
Brian Gerst
7657e20e46 [PATCH] Fix warning in powernow-k8.c
powernow-k8.c: In function `query_current_values_with_pending_wait':
powernow-k8.c:110: warning: `hi' may be used uninitialized in this function

Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:25:54 -07:00
Eric W. Biederman
910de55c66 [PATCH] APM: Remove redundant call to set_cpus_allowed
machine_power_off now always switches to the boot cpu so there
is no reason for APM to also do that.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 14:35:45 -07:00
Eric W. Biederman
4fa2564a6f [PATCH] i386 machine_power_off cleanup
Call machine_shutdown() to move to the boot cpu
and disable apics.  Both acpi_power_off and
apm_power_off want to move to the boot cpu.
and we are already disabling the local apics
so calling machine_shutdown simply reuses
code.

ia64 doesn't have a special path in power_off
for efi so there is no reason i386 should.  If
we really need to call the efi power off path
the efi driver can set pm_power_off like everyone
else.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 14:35:44 -07:00
Eric W. Biederman
d8e392e7c8 [PATCH] machine_shutdown: Typo fix to actually allow specifying which cpu to reboot on
This appears to be a typo I introduced when cleaning
this code up earlier. Ooops.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 14:35:44 -07:00
Eric W. Biederman
4a1421f81b [PATCH] i386: Implement machine_emergency_reboot
set_cpus_allowed is not safe in interrupt context
and disabling apics is complicated code so don't
call machine_shutdown on i386 from emergency_restart().

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 14:35:42 -07:00
Eric W. Biederman
59586e5a26 [PATCH] Don't export machine_restart, machine_halt, or machine_power_off.
machine_restart, machine_halt and machine_power_off are machine
specific hooks deep into the reboot logic, that modules
have no business messing with.  Usually code should be calling
kernel_restart, kernel_halt, kernel_power_off, or
emergency_restart. So don't export machine_restart,
machine_halt, and machine_power_off so we can catch buggy users.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 14:35:42 -07:00
Linus Torvalds
72538d8565 Remove "noreplacement" kernel command line option.
It is no longer valid to not replace instructions, since we depend on
different behaviour depending on CPU capabilities.

If you need to limit the capabilities of the replacements (because the
boot CPU has features that non-boot CPU's do not have, for example), you
need to explicitly disable those capabilities that are not shared across
all CPU's.

For example, if your boot CPU has FXSR, but other CPU's in your system
do not, you need to use the "nofxsr" kernel command line, not disable
instruction replacement per se.
2005-07-22 18:29:40 -04:00
Linus Torvalds
8ed1383fb7 x86: make restore_fpu() use alternative assembler instructions
It's really just a single instruction, conditional on whether the CPU
supports FXSR or not, so implement it as such instead of making it a
function that queries FXSR dynamically.

This means that the instruction just gets automatically rewritten to the
correct one at boot-time.
2005-07-22 16:06:16 -04:00
Linus Torvalds
b339a18b81 Fix up incorrect "unlikely()" on %gs reload in x86 __switch_to
These days %gs is normally the TLS segment, so it's no longer zero.  As
a result, we shouldn't just assume that %fs/%gs tend to be zero
together, but test them independently instead.

Also, fix setting of debug registers to use the "next" pointer instead
of "current".  It so happens that the scheduler will have set the new
current pointer before calling __switch_to(), but that's just an
implementation detail.
2005-07-22 15:23:47 -04:00
Alexey Dobriyan
35e422c967 [PATCH] visws: reexport pm_power_off
More fallout from the i386_ksyms.c cleanups.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-15 09:54:51 -07:00
James Bottomley
153f805781 [PATCH] fix voyager subarchitecture EXPORT_SYMBOL breakage caused by i386_ksym reduction
This patch:

	[PATCH] Remove i386_ksyms.c, almost

made files like smp.c do their own EXPORT_SYMBOLS.  This means that all
subarchitectures that override these symbols now have to do the exports
themselves.  This patch adds the exports for voyager (which is the most
affected since it has a separate smp harness).  However, someone should
audit all the other subarchitectures to see if any others got broken.

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-13 11:07:54 -07:00
Robert Love
0eeca28300 [PATCH] inotify
inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:

        * dnotify requires the opening of one fd per each directory
          that you intend to watch. This quickly results in too many
          open files and pins removable media, preventing unmount.
        * dnotify is directory-based. You only learn about changes to
          directories. Sure, a change to a file in a directory affects
          the directory, but you are then forced to keep a cache of
          stat structures.
        * dnotify's interface to user-space is awful.  Signals?

inotify provides a more usable, simple, powerful solution to file change
notification:

        * inotify's interface is a system call that returns a fd, not SIGIO.
	  You get a single fd, which is select()-able.
        * inotify has an event that says "the filesystem that the item
          you were watching is on was unmounted."
        * inotify can watch directories or files.

Inotify is currently used by Beagle (a desktop search infrastructure),
Gamin (a FAM replacement), and other projects.

See Documentation/filesystems/inotify.txt.

Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-12 20:38:38 -07:00
Len Brown
5028770a42 [ACPI] merge acpi-2.6.12 branch into latest Linux 2.6.13-rc...
Signed-off-by: Len Brown <len.brown@intel.com>
2005-07-12 17:21:56 -04:00
Venkatesh Pallipadi
02df8b9385 [ACPI] enable C2 and C3 idle power states on SMP
http://bugzilla.kernel.org/show_bug.cgi?id=4401

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2005-07-12 00:14:36 -04:00
Nickolai Zeldovich
9d9437759e [ACPI] S3 resume -- use lgdtl, not lgdt
From: Nickolai Zeldovich <kolya@MIT.EDU>
Signed-off-by: Len Brown <len.brown@intel.com>
2005-07-12 00:04:31 -04:00
Sam Ravnborg
d5950b4355 [NET]: add a top-level Networking menu to *config
Create a new top-level menu named "Networking" thus moving
net related options and protocol selection way from the drivers
menu and up on the top-level where they belong.

To implement this all architectures has to source "net/Kconfig" before
drivers/*/Kconfig in their Kconfig file. This change has been
implemented for all architectures.

Device drivers for ordinary NIC's are still to be found
in the Device Drivers section, but Bluetooth, IrDA and ax25
are located with their corresponding menu entries under the new
networking menu item.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11 21:03:49 -07:00
David Shaohua Li
c9c3e457de [ACPI] PNPACPI vs sound IRQ
http://bugme.osdl.org/show_bug.cgi?id=4016

Written-by: David Shaohua Li <shaohua.li@intel.com>
Acked-by: Adam Belay <abelay@novell.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2005-07-12 00:03:30 -04:00
Christoph Lameter
6c036527a6 [PATCH] mostly_read data section
Add a new section called ".data.read_mostly" for data items that are read
frequently and rarely written to like cpumaps etc.

If these maps are placed in the .data section then these frequenly read
items may end up in cachelines with data is is frequently updated.  In that
case all processors in an SMP system must needlessly reload the cachelines
again and again containing elements of those frequently used variables.

The ability to share these cachelines will allow each cpu in an SMP system
to keep local copies of those shared cachelines thereby optimizing
performance.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
Signed-off-by: Christoph Lameter <christoph@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07 18:23:46 -07:00
Shaohua Li
3b520b238e [PATCH] MTRR suspend/resume cleanup
There has been some discuss about solving the SMP MTRR suspend/resume
breakage, but I didn't find a patch for it.  This is an intent for it.  The
basic idea is moving mtrr initializing into cpu_identify for all APs (so it
works for cpu hotplug).  For BP, restore_processor_state is responsible for
restoring MTRR.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07 18:23:42 -07:00
Andrew Morton
c23a4e9649 [PATCH] iounmap debugging
We get sporadic reports of `__iounmap: bad address' coming out.  Add a
dump_stack() to find the culprit.

Try to identify which subsystem is having iounmap() problems.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07 18:23:35 -07:00
Rusty Lynch
6772926bef [PATCH] kprobes: fix namespace problem and sparc64 build
The following renames arch_init, a kprobes function for performing any
architecture specific initialization, to arch_init_kprobes in order to
cleanup the namespace.

Also, this patch adds arch_init_kprobes to sparc64 to fix the sparc64 kprobes
build from the last return probe patch.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-05 19:19:00 -07:00
Greg Kroah-Hartman
7586585897 [PATCH] PCI: clean up dynamic pci id logic
The dynamic pci id logic has been bothering me for a while, and now that
I started to look into how to move some of this to the driver core, I
thought it was time to clean it all up.

It ends up making the code smaller, and easier to follow, and fixes a
few bugs at the same time (dynamic ids were not being matched
everywhere, and so could be missed on some call paths for new devices,
semaphore not needed to be grabbed when adding a new id and calling the
driver core, etc.)

I also renamed the function pci_match_device() to pci_match_id() as
that's what it really does.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-07-01 13:35:50 -07:00
Ivan Kokshaysky
299de0343c [PATCH] PCI: pci_assign_unassigned_resources() on x86
- Add sanity check for io[port,mem]_resource in setup-bus.c. These
  resources look like "free" as they have no parents, but obviously
  we must not touch them.
- In i386.c:pci_allocate_bus_resources(), if a bridge resource cannot be
  allocated for some reason, then clear its flags. This prevents any child
  allocations in this range, so the setup-bus code will work with a clean
  resource sub-tree.
- i386.c:pcibios_enable_resources() doesn't enable bridges, as it checks
  only resources 0-5, which looks like a clear bug to me. I suspect it
  might break hotplug as well in some cases.

From: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-07-01 13:35:50 -07:00
Ingo Molnar
306e440daf [PATCH] x86: i8253/i8259A lock cleanup
Introduce proper declarations for i8253_lock and i8259A_lock.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-30 08:45:10 -07:00
KAMBAROV, ZAUR
a8f5034540 [PATCH] coverity: i386: build.c: negative return to unsigned fix
Variable "c" was declared as an unsigned int, but used in:

125  		for (i=0 ; (c=read(fd, buf, sizeof(buf)))>0 ; i+=c )
126  			if (write(1, buf, c) != c)
127  				die("Write call failed");

(akpm: read() can return -1.  If it does, we fill the disk up with garbage).

Signed-off-by: Zaur Kambarov <zkambarov@coverity.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-28 21:20:33 -07:00
Greg KH
8644d2a42b Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2005-06-27 22:07:56 -07:00
Greg Kroah-Hartman
d57e26ceb7 [PATCH] PCI: use the MCFG table to properly access pci devices (i386)
Now that we have access to the whole MCFG table, let's properly use it
for all pci device accesses (as that's what it is there for, some boxes
don't put all the busses into one entry.)

If, for some reason, the table is incorrect, we fallback to the "old
style" of mmconfig accesses, namely, we just assume the first entry in
the table is the one for us, and blindly use it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-27 21:52:47 -07:00
Greg Kroah-Hartman
545493917d [PATCH] PCI: add proper MCFG table parsing to ACPI core.
This patch is the first step in properly handling the MCFG PCI table.
It defines the structures properly, and saves off the table so that the
pci mmconfig code can access it.  It moves the parsing of the table a
little later in the boot process, but still before the information is
needed.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-27 21:52:47 -07:00
Kenji Kaneshige
b1bb248a5d [PATCH] ACPI based I/O APIC hot-plug: add interfaces
This patch adds the following new interfaces for I/O xAPIC
hotplug. The implementation of these interfaces depends on each
architecture.

    o int acpi_register_ioapic(acpi_handle handle, u64 phys_addr,
			       u32 gsi_base);

        This new interface is to add a new I/O xAPIC specified by
        phys_addr and gsi_base pair. phys_addr is the physical address
        to which the I/O xAPIC is mapped and gsi_base is global system
        interrupt base of the I/O xAPIC. acpi_register_ioapic returns
        0 on success, or negative value on error.

    o int acpi_unregister_ioapic(acpi_handle handle, u32 gsi_base);

        This new interface is to remove a I/O xAPIC specified by
        gsi_base. acpi_unregister_ioapic returns 0 on success, or
        negative value on error.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-27 21:52:44 -07:00
Rajesh Shah
c431ada45d [PATCH] acpi bridge hotadd: ACPI based root bridge hot-add
When you hot-plug a (root) bridge hierarchy, it may have p2p bridges and
devices attached to it that have not been configured by firmware.  In this
case, we need to configure the devices before starting them.  This patch
separates device start from device scan so that we can introduce the
configuration step in the middle.

I kept the existing semantics for pci_scan_bus() since there are a huge number
of callers to that function.

Also, I have no way of testing the changes I made to the parisc files, so this
needs review by those folks.  Sorry for the massive cross-post, this touches
files in many different places.

Signed-off-by: Rajesh Shah <rajesh.shah@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-27 21:52:39 -07:00
jayalk@intworks.biz
120bb4246a [PATCH] PCI Allow OutOfRange PIRQ table address
I updated this to remove unnecessary variable initialization, make
check_routing be inline only and not __init, switch to strtoul, and
formatting fixes as per Randy Dunlap's recommendations.

I updated this to change pirq_table_addr to a long, and to add a warning
msg if the PIRQ table wasn't found at the specified address, as per thread
with Matthew Wilcox.

In our hardware situation, the BIOS is unable to store or generate it's PIRQ
table in the F0000h-100000h standard range. This patch adds a pci kernel
parameter, pirqaddr to allow the bootloader (or BIOS based loader) to inform
the kernel where the PIRQ table got stored. A beneficial side-effect is that,
if one's BIOS uses a static address each time for it's PIRQ table, then
pirqaddr can be used to avoid the $pirq search through that address block each
time at boot for normal PIRQ BIOSes.

Signed-off-by: Jaya Kumar <jayalk@intworks.biz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-27 21:52:38 -07:00
Rusty Lynch
4bdbd37f6d [PATCH] Return probe redesign: i386 specific changes
The following patch contains the i386 specific changes for the new
return probe design.  Changes include:

 * Removing the architecture specific functions for querying a return probe
   instance off a stack address
 * Complete rework onf arch_prepare_kretprobe() and trampoline_probe_handler()
 * Removing trampoline_post_handler()
 * Adding arch_init() so that now we handle registering the return probe
   trampoline instead of kernel/kprobes.c doing it

Signed-off-by: Rusty Lynch <rusty.lynch@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-27 15:23:53 -07:00
Andrea Arcangeli
ffaa8bd6c9 [PATCH] seccomp: tsc disable
I believe at least for seccomp it's worth to turn off the tsc, not just for
HT but for the L2 cache too.  So it's up to you, either you turn it off
completely (which isn't very nice IMHO) or I recommend to apply this below
patch.

This has been tested successfully on x86-64 against current cogito
repository (i686 compiles so I didn't bother testing ;).  People selling
the cpu through cpushare may appreciate this bit for a peace of mind.

There's no way to get any timing info anymore with this applied
(gettimeofday is forbidden of course).  The seccomp environment is
completely deterministic so it can't be allowed to get timing info, it has
to be deterministic so in the future I can enable a computing mode that
does a parallel computing for each task with server side transparent
checkpointing and verification that the output is the same from all the 2/3
seller computers for each task, without the buyer even noticing (for now
the verification is left to the buyer client side and there's no
checkpointing, since that would require more kernel changes to track the
dirty bits but it'll be easy to extend once the basic mode is finished).

Eliminating a cold-cache read of the cr4 global variable will save one
cacheline during the tlb flush while making the code per-cpu-safe at the
same time.  Thanks to Mikael Pettersson for noticing the tlb flush wasn't
per-cpu-safe.

The global tlb flush can run from irq (IPI calling do_flush_tlb_all) but
it'll be transparent to the switch_to code since the IPI won't make any
change to the cr4 contents from the point of view of the interrupted code
and since it's now all per-cpu stuff, it will not race.  So no need to
disable irqs in switch_to slow path.

Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-27 15:11:44 -07:00
Jens Axboe
22e2c507c3 [PATCH] Update cfq io scheduler to time sliced design
This updates the CFQ io scheduler to the new time sliced design (cfq
v3).  It provides full process fairness, while giving excellent
aggregate system throughput even for many competing processes.  It
supports io priorities, either inherited from the cpu nice value or set
directly with the ioprio_get/set syscalls.  The latter closely mimic
set/getpriority.

This import is based on my latest from -mm.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-27 14:33:29 -07:00
Linus Torvalds
6f0dcb72d6 Fix up try_to_freeze() usage in arch/i386/kernel/signal.c
The parentheses were missing. Noted by Pavel Machek.
2005-06-25 20:09:12 -07:00
Linus Torvalds
2031d0f586 Merge Christoph's freeze cleanup patch 2005-06-25 17:16:53 -07:00
Christoph Lameter
3e1d1d28d9 [PATCH] Cleanup patch for process freezing
1. Establish a simple API for process freezing defined in linux/include/sched.h:

   frozen(process)		Check for frozen process
   freezing(process)		Check if a process is being frozen
   freeze(process)		Tell a process to freeze (go to refrigerator)
   thaw_process(process)	Restart process
   frozen_process(process)	Process is frozen now

2. Remove all references to PF_FREEZE and PF_FROZEN from all
   kernel sources except sched.h

3. Fix numerous locations where try_to_freeze is manually done by a driver

4. Remove the argument that is no longer necessary from two function calls.

5. Some whitespace cleanup

6. Clear potential race in refrigerator (provides an open window of PF_FREEZE
   cleared before setting PF_FROZEN, recalc_sigpending does not check
   PF_FROZEN).

This patch does not address the problem of freeze_processes() violating the rule
that a task may only modify its own flags by setting PF_FREEZE. This is not clean
in an SMP environment. freeze(process) is therefore not SMP safe!

Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 17:10:13 -07:00
Christophe Lucas
f90e7185ee [PATCH] printk: arch/i386/mm/pgtable.c
printk() calls should include appropriate KERN_* constant.

Signed-off-by: Christophe Lucas <clucas@rotomalug.org>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:25:08 -07:00
Christophe Lucas
ee48dd5799 [PATCH] printk: arch/i386/mm/ioremap.c
printk() calls should include appropriate KERN_* constant.

Signed-off-by: Christophe Lucas <clucas@rotomalug.org>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:25:08 -07:00
Adrian Bunk
9e84d1c36a [PATCH] i386: cleanup boot_cpu_logical_apicid variables
There are currently two different boot_cpu_logical_apicid variables:
- a global one in mpparse.c
- a static one in smpboot.c

Of these two, only the one in smpboot.c might be used (through
boot_cpu_apicid).

This patch therefore removes the one in mpparse.c .

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:25:05 -07:00
Domen Puncer
f45494480f [PATCH] x86_64: coding style and whitespace fixups
Remove some of the unnecessary differences between arch/i386 and
arch/x86_64.  This patch fixes more whitespace issues, some miscellaneous
typos, a wrong URL and a factually incorrect statement about the current
boot sector code.

Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:25:02 -07:00
Jesper Juhl
4ae6673e02 [PATCH] get rid of redundant NULL checks before kfree() in arch/i386/
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:25:00 -07:00
Domen Puncer
40086ea17e [PATCH] arch/i386/crypto/aes.c: fix sparse warnings
Signed-off-by: Alexey Dobriyan <adobriyan@mail.ru>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:59 -07:00
Domen Puncer
c7c5844526 [PATCH] arch/i386/mm/fault.c: fix sparse warnings
Signed-off-by: Alexey Dobriyan <adobriyan@mail.ru>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:59 -07:00
Domen Puncer
77617bd806 [PATCH] arch/i386/kernel/apm.c: fix sparse warnings
Signed-off-by: Alexey Dobriyan <adobriyan@mail.ru>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:58 -07:00
Domen Puncer
3f3ae3471f [PATCH] arch/i386/kernel/traps.c: fix sparse warnings
Signed-off-by: Alexey Dobriyan <adobriyan@mail.ru>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:58 -07:00
Maneesh Soni
72414d3f1d [PATCH] kexec code cleanup
o Following patch provides purely cosmetic changes and corrects CodingStyle
  guide lines related certain issues like below in kexec related files

  o braces for one line "if" statements, "for" loops,
  o more than 80 column wide lines,
  o No space after "while", "for" and "switch" key words

o Changes:
  o take-2: Removed the extra tab before "case" key words.
  o take-3: Put operator at the end of line and space before "*/"

Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:55 -07:00
Alexander Nyberg
4f339ecb30 [PATCH] kdump: Save trap information for later analysis
If we are faulting in kernel it is quite possible this will lead to a
panic.  Save trap number, cr2 (in case of page fault) and error_code in the
current thread (these fields already exist for signal delivery but are not
used here).

This helps later kdump crash analyzing from user-space (a script has been
submitted to dig this info out in gdb).

Signed-off-by: Alexander Nyberg <alexn@telia.com>
Cc: <fastboot@lists.osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:55 -07:00
Alexander Nyberg
6e274d1443 [PATCH] kdump: Use real pt_regs from exception
Makes kexec_crashdump() take a pt_regs * as an argument.  This allows to
get exact register state at the point of the crash.  If we come from direct
panic assertion NULL will be passed and the current registers saved before
crashdump.

This hooks into two places:
die(): check the conditions under which we will panic when calling
do_exit and go there directly with the pt_regs that caused the fatal
fault.

die_nmi(): If we receive an NMI lockup while in the kernel use the
pt_regs and go directly to crash_kexec(). We're probably nested up badly
at this point so this might be the only chance to escape with proper
information.

Signed-off-by: Alexander Nyberg <alexn@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:54 -07:00
Vivek Goyal
2030eae52b [PATCH] Retrieve elfcorehdr address from command line
This patch adds support for retrieving the address of elf core header if one
is passed in command line.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:53 -07:00
Vivek Goyal
60e64d46a5 [PATCH] kdump: Routines for copying dump pages
This patch provides the interfaces necessary to read the dump contents,
treating it as a high memory device.

Signed off by Hariprasad Nellitheertha <hari@in.ibm.com>
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:53 -07:00
Vivek Goyal
5f016456c9 [PATCH] kdump: Kconfig
- config option CONFIG_CRASH_DUMP

- Made it dependent on HIGHMEM.  This is required as capture kernel treats
  the previous kernel's memory as high memmory and stitches a PTE for
  accessing it.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:53 -07:00
Vivek Goyal
92aa63a5a1 [PATCH] kdump: Retrieve saved max pfn
This patch retrieves the max_pfn being used by previous kernel and stores it
in a safe location (saved_max_pfn) before it is overwritten due to user
defined memory map.  This pfn is used to make sure that user does not try to
read the physical memory beyond saved_max_pfn.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:52 -07:00
Vivek Goyal
a3ea8ac846 [PATCH] Kexec: Kexec on panic fix with nmi watchdog enabled
o Problem: Kexec on panic hangs if first kernel is booted with nmi_watchdog
  command line parameter. This problem occurs because kexec crash shutdown
  code replaces the NMI callback handler. This handler saves the cpu register
  states and halts the cpu. If system is booted with nmi_watchdog parameter,
  then crashing cpu also runs this nmi handler and halts itself.

o This patch fixes the problem by keeping a track of crashing cpu and not
  executing the new nmi handler on crashing cpu.

o There is a dependence on smp_processor_id() function which might return
  insane value for cpu, if cpu field of thread_info is corrupted.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:52 -07:00
Vivek Goyal
4d55476c3f [PATCH] kdump: NMI handler segment selector, stack pointer fix
CPU does not save ss and esp on stack if execution was already in kernel mode
at the time of NMI occurrence.  This leads to saving of erractic values for ss
and esp.  This patch fixes the issue.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:52 -07:00
Vivek Goyal
625f1c8219 [PATCH] Kdump: Export crash notes section address through sysfs
o Following patch exports kexec global variable "crash_notes" to user space
  through sysfs as kernel attribute in /sys/kernel.

Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:51 -07:00
Eric W. Biederman
1bc3b91aee [PATCH] crashdump: x86 crashkernel option
This is the x86 implementation of the crashkernel option.  It reserves a
window of memory very early in the bootup process, so we never use it for
anything but the kernel to switch to when the running kernel panics.

In addition to reserving this memory a resource structure is registered so
looking at /proc/iomem it is clear what happened to that memory.

ISSUES:
Is it possible to implement this in a architecture generic way?
What should be done with architectures that always use an iommu and
thus don't report their RAM memory resources in /proc/iomem?

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:50 -07:00
Eric W. Biederman
63d30298ef [PATCH] kexec: x86 shutdown APICs during crash_shutdown
In the case of a crash/panic an architecture specific function
machine_crash_shutdown is called.  This patch adds to the x86 machine_crash
function the standard kernel code for shutting down apics.

Every line of code added to that function increases the risk that we will call
code after a kernel panic that is not safe.

This patch should not make it to the stable kernel without a being reviewed a
lot more.  It is unclear how much a hardned kernel can take when it comes to
misconfigured apics.  So since a normal kernel has problems this patch does a
clean shutdown.

It is my expectation this patch will be dropped from future generations of the
kexec work.  But for the moment it is a crutch to keep from breaking
everything.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:50 -07:00
Eric W. Biederman
2c818b45a2 [PATCH] kexec: x86: snapshot registers during crash shutdown
After the kernel panics if we wish to generate an entire machine core file it
is very nice to know the register state at the time the machine crashed.

After long discussion it was realized that if you are going to be saving the
information anyway it is reasonable to store the information in a format that
it will be used and recognized in so the register state is stored in the
standard ELF note format.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:49 -07:00
Eric W. Biederman
c4ac4263a0 [PATCH] crashdump: x86: add NMI handler to capture other CPUs
One of the dangers when switching from one kernel to another is what happens
to all of the other cpus that were running in the crashed kernel.  In an
attempt to avoid that problem this patch adds a nmi handler and attempts to
shoot down the other cpus by sending them non maskable interrupts.

The code then waits for 1 second or until all known cpus have stopped running
and then jumps from the running kernel that has crashed to the kernel in
reserved memory.

The kernel spin loop is used for the delay as that should behave continue to
be safe even in after a crash.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:49 -07:00
Eric W. Biederman
5033cba087 [PATCH] kexec: x86 kexec core
This is the i386 implementation of kexec.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:49 -07:00
Eric W. Biederman
dd2a13054f [PATCH] kexec: x86: factor out apic shutdown code
Factor out the apic and smp shutdown code from machine_restart so it can be
called by in the kexec reboot path as well.

By switching to the bootstrap cpu by default on reboot I can delete/simplify
some motherboard fixups well.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:49 -07:00
Vivek Goyal
8a9190853c [PATCH] kexec: reserve Bootmem fix for booting nondefault location kernel
This patch fixes a problem with reserving memory during boot up of a kernel
built for non-default location.  Currently boot memory allocator reserves
the memory required by kernel image, boot allocaotor bitmap etc.  It
assumes that kernel is loaded at 1MB (HIGH_MEMORY hard coded to 1024*1024).
 But kernel can be built for non-default locatoin, hence existing
hardcoding will lead to reserving unnecessary memory.  This patch fixes it.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:48 -07:00
Eric W. Biederman
3d345e3fc9 [PATCH] kexec: x86: add CONFIG_PYSICAL_START
For one kernel to report a crash another kernel has created we need
to have 2 kernels loaded simultaneously in memory.  To accomplish this
the two kernels need to built to run at different physical addresses.

This patch adds the CONFIG_PHYSICAL_START option to the x86 kernel
so we can do just that.  You need to know what you are doing and
the ramifications are before changing this value, and most users
won't care so I have made it depend on CONFIG_EMBEDDED

bzImage kernels will work and run at a different address when compiled
with this option but they will still load at 1MB.  If you need a kernel
loaded at a different address as well you need to boot a vmlinux.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:48 -07:00
Eric W. Biederman
ad0d75ebac [PATCH] kexec: x86: vmlinux: fix physical addresses
The vmlinux on i386 does not report the correct physical address of
the kernel.  Instead in the physical address field it currently
reports the virtual address of the kernel.

This is patch is a bug fix that corrects vmlinux to report the
proper physical addresses.

This is potentially a help for crash dump analysis tools.

This definitiely allows bootloaders that load vmlinux as a standard
ELF executable.  Bootloaders directly loading vmlinux become of
practical importance when we consider the kexec on panic case.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:47 -07:00
Eric W. Biederman
650927ef8a [PATCH] kexec: x86: resture apic virtual wire mode on shutdown
When coming out of apic mode attempt to set the appropriate
apic back into virtual wire mode.  This improves on previous versions
of this patch by by never setting bot the local apic and the ioapic
into veritual wire mode.

This code looks at data from the mptable to see if an ioapic has
an ExtInt input to make this decision.  A future improvement
is to figure out which apic or ioapic was in virtual wire mode
at boot time and to remember it.  That is potentially a more accurate
method, of selecting which apic to place in virutal wire mode.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:47 -07:00
Eric W. Biederman
cee5dab485 [PATCH] kexec: x86: i8259 shutdown: disable interrupts
From: Eric W. Biederman <ebiederm@xmission.com>

This patch disables interrupt generation from the legacy pic on reboot.  Now
that there is a sys_device class it should not be called while drivers are
still using interrupts.

There is a report about this breaking ACPI power off on some systems.
http://bugme.osdl.org/show_bug.cgi?id=4041
However the final comment seems to exonerate this code.  So until
I get more information I believe that was a false positive.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:46 -07:00
Eric W. Biederman
9635b47d91 [PATCH] kexec: x86: local apic fix
From: "Maciej W. Rozycki" <macro@linux-mips.org>

Fix a kexec problem whcih causes local APIC detection failure.

The problem is detect_init_APIC() is called early, before the command line
have been processed.  Therefore "lapic" (and "nolapic") have not been seen,
yet.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:46 -07:00
Ingo Molnar
cc19ca86a0 [PATCH] consolidate PREEMPT options into kernel/Kconfig.preempt
This patch consolidates the CONFIG_PREEMPT and CONFIG_PREEMPT_BKL
preemption options into kernel/Kconfig.preempt.  This, besides reducing
source-code, also enables more centralized tweaking of preemption related
options.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:45 -07:00
Pavel Machek
8d783b3e02 [PATCH] swsusp: clean assembly parts
This patch fixes register saving so that each register is only saved once,
and adds missing saving of %cr8 on x86-64.  Some reordering so that
save/restore is more logical/safer (segment registers should be restored
after gdt).

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:33 -07:00
Pavel Machek
648be31881 [PATCH] swsusp: kill config_pm_disk
CONFIG_PM_DISK is long gone, but it still managed to survived at few
places.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:32 -07:00
Li Shaohua
5a72e04df5 [PATCH] suspend/resume SMP support
Using CPU hotplug to support suspend/resume SMP.  Both S3 and S4 use
disable/enable_nonboot_cpus API.  The S4 part is based on Pavel's original S4
SMP patch.

Signed-off-by: Li Shaohua<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:32 -07:00
Ashok Raj
76e4f660d9 [PATCH] x86_64: CPU hotplug support
Experimental CPU hotplug patch for x86_64
  -----------------------------------------
This supports logical CPU online and offline.
- Test with maxcpus=1, and then kick other cpu's off to test if init code
  is all cleaned up. CONFIG_SCHED_SMT works as well.
- idle threads are forked on demand from keventd threads for clean startup

TBD:
1. Not tested on a real NUMA machine (tested with numa=fake=2)
2. Handle ACPI pieces for physical hotplug support.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Shaohua.li<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:30 -07:00
Li Shaohua
e1367daf3e [PATCH] cpu state clean after hot remove
Clean CPU states in order to reuse smp boot code for CPU hotplug.

Signed-off-by: Li Shaohua<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:30 -07:00
Li Shaohua
0bb3184df5 [PATCH] init call cleanup
Trival patch for CPU hotplug.  In CPU identify part, only did cleaup for intel
CPUs.  Need do for other CPUs if they support S3 SMP.

Signed-off-by: Li Shaohua<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:30 -07:00
Li Shaohua
d720803a93 [PATCH] sibling map initializing rework
Make sibling map init per-cpu.  Hotplug CPU may change the map at runtime.

Signed-off-by: Li Shaohua<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:29 -07:00
Li Shaohua
6fe940d6c3 [PATCH] sep initializing rework
Make SEP init per-cpu, so it is hotplug safe.

Signed-off-by: Li Shaohua<shaohua.li@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:29 -07:00
Ashok Raj
67664c8f7e [PATCH] i386: Dont use IPI broadcast when using cpu hotplug.
This patch introduces a startup parameter no_broadcast.  When we enable
CONFIG_HOTPLUG_CPU, we dont want to use broadcast shortcut as it has ill
effects on a offline cpu.  If we issue broadcast, the IPI is also delivered
to offline cpus, or partially up cpu causing stale IPI's to be handled,
which is a problem and can cause undesirable effects.

Introduces a new startup cmdline option no_ipi_broadcast, that can be
switched at cmdline if necessary.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:29 -07:00
Zwane Mwaikambo
f370513640 [PATCH] i386 CPU hotplug
(The i386 CPU hotplug patch provides infrastructure for some work which Pavel
is doing as well as for ACPI S3 (suspend-to-RAM) work which Li Shaohua
<shaohua.li@intel.com> is doing)

The following provides i386 architecture support for safely unregistering and
registering processors during runtime, updated for the current -mm tree.  In
order to avoid dumping cpu hotplug code into kernel/irq/* i dropped the
cpu_online check in do_IRQ() by modifying fixup_irqs().  The difference being
that on cpu offline, fixup_irqs() is called before we clear the cpu from
cpu_online_map and a long delay in order to ensure that we never have any
queued external interrupts on the APICs.  There are additional changes to s390
and ppc64 to account for this change.

1) Add CONFIG_HOTPLUG_CPU
2) disable local APIC timer on dead cpus.
3) Disable preempt around irq balancing to prevent CPUs going down.
4) Print irq stats for all possible cpus.
5) Debugging check for interrupts on offline cpus.
6) Hacky fixup_irqs() to redirect irqs when cpus go off/online.
7) play_dead() for offline cpus to spin inside.
8) Handle offline cpus set in flush_tlb_others().
9) Grab lock earlier in smp_call_function() to prevent CPUs going down.
10) Implement __cpu_disable() and __cpu_die().
11) Enable local interrupts in cpu_enable() after fixup_irqs()
12) Don't fiddle with NMI on dead cpu, but leave intact on other cpus.
13) Program IRQ affinity whilst cpu is still in cpu_online_map on offline.

Signed-off-by: Zwane Mwaikambo <zwane@linuxpower.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:29 -07:00
Shaohua Li
d92de65cab [PATCH] variable overflow after hundreds round of hotplug CPU
I'm doing the cpu hotplug stress test and found a variable ('ready') is
overflow after several hundreds rounds of cpu hotplug.  Here is a fix.

Signed-off-by: Shaohua Li<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:28 -07:00
Shaohua Li
a13db56624 [PATCH] CPU hotplug: fix hpet sectioning
With hpet enabled, cpu hotplug uses some routines marked with __init.

Signed-off-by: Shaohua Li<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:28 -07:00
Andrey Panin
1249c5138f [PATCH] dmi: spring cleanup
Whitespace and CodingStyle cleanup. No functionality changes.

Signed-off-by: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:28 -07:00
Andrey Panin
b625883f24 [PATCH] dmi: remove central blacklist
Since last dmi quirk looks useless (it just prints 404 compliant url) we can
finally remove central dmi blacklist.

Signed-off-by: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:28 -07:00
Andrey Panin
0f8133a8db [PATCH] dmi: move ACPI sleep quirk
This patch moves ACPI sleep quirk out of dmi_scan.c

Signed-off-by: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:28 -07:00
Andrey Panin
aea00143a8 [PATCH] dmi: move ACPI boot quirk
This patch moves ACPI boot quirks out of dmi_scan.c

Signed-off-by: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:27 -07:00
Dmitry Torokhov
e70c9d5e61 [PATCH] I8K: use standard DMI interface
I8K: Change to use stock dmi infrastructure instead of homegrown
     parsing code. The driver now requires box's DMI data to match
     list of supported models so driver can be safely compiled-in
     by default without fear of it poking into random SMM BIOS
     code. DMI checks can be ignored with i8k.ignore_dmi option.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:24 -07:00
Prasanna S Panchamukhi
417c8da651 [PATCH] kprobes: Temporary disarming of reentrant probe for i386
This patch includes i386 architecture specific changes to support temporary
disarming on reentrancy of probes.

Signed-of-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:24 -07:00
Hien Nguyen
0aa55e4d7d [PATCH] kprobes: moves lock-unlock to non-arch kprobe_flush_task
This patch moves the lock/unlock of the arch specific kprobe_flush_task()
to the non-arch specific kprobe_flusk_task().

Signed-off-by: Hien Nguyen <hien@us.ibm.com>
Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:21 -07:00
Rusty Lynch
7e1048b11c [PATCH] Move kprobe [dis]arming into arch specific code
The architecture independent code of the current kprobes implementation is
arming and disarming kprobes at registration time.  The problem is that the
code is assuming that arming and disarming is a just done by a simple write
of some magic value to an address.  This is problematic for ia64 where our
instructions look more like structures, and we can not insert break points
by just doing something like:

*p->addr = BREAKPOINT_INSTRUCTION;

The following patch to 2.6.12-rc4-mm2 adds two new architecture dependent
functions:

     * void arch_arm_kprobe(struct kprobe *p)
     * void arch_disarm_kprobe(struct kprobe *p)

and then adds the new functions for each of the architectures that already
implement kprobes (spar64/ppc64/i386/x86_64).

I thought arch_[dis]arm_kprobe was the most descriptive of what was really
happening, but each of the architectures already had a disarm_kprobe()
function that was really a "disarm and do some other clean-up items as
needed when you stumble across a recursive kprobe." So...  I took the
liberty of changing the code that was calling disarm_kprobe() to call
arch_disarm_kprobe(), and then do the cleanup in the block of code dealing
with the recursive kprobe case.

So far this patch as been tested on i386, x86_64, and ppc64, but still
needs to be tested in sparc64.

Signed-off-by: Rusty Lynch <rusty.lynch@intel.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:21 -07:00
Hien Nguyen
b94cce926b [PATCH] kprobes: function-return probes
This patch adds function-return probes to kprobes for the i386
architecture.  This enables you to establish a handler to be run when a
function returns.

1. API

Two new functions are added to kprobes:

	int register_kretprobe(struct kretprobe *rp);
	void unregister_kretprobe(struct kretprobe *rp);

2. Registration and unregistration

2.1 Register

  To register a function-return probe, the user populates the following
  fields in a kretprobe object and calls register_kretprobe() with the
  kretprobe address as an argument:

  kp.addr - the function's address

  handler - this function is run after the ret instruction executes, but
  before control returns to the return address in the caller.

  maxactive - The maximum number of instances of the probed function that
  can be active concurrently.  For example, if the function is non-
  recursive and is called with a spinlock or mutex held, maxactive = 1
  should be enough.  If the function is non-recursive and can never
  relinquish the CPU (e.g., via a semaphore or preemption), NR_CPUS should
  be enough.  maxactive is used to determine how many kretprobe_instance
  objects to allocate for this particular probed function.  If maxactive <=
  0, it is set to a default value (if CONFIG_PREEMPT maxactive=max(10, 2 *
  NR_CPUS) else maxactive=NR_CPUS)

  For example:

    struct kretprobe rp;
    rp.kp.addr = /* entrypoint address */
    rp.handler = /*return probe handler */
    rp.maxactive = /* e.g., 1 or NR_CPUS or 0, see the above explanation */
    register_kretprobe(&rp);

  The following field may also be of interest:

  nmissed - Initialized to zero when the function-return probe is
  registered, and incremented every time the probed function is entered but
  there is no kretprobe_instance object available for establishing the
  function-return probe (i.e., because maxactive was set too low).

2.2 Unregister

  To unregiter a function-return probe, the user calls
  unregister_kretprobe() with the same kretprobe object as registered
  previously.  If a probed function is running when the return probe is
  unregistered, the function will return as expected, but the handler won't
  be run.

3. Limitations

3.1 This patch supports only the i386 architecture, but patches for
    x86_64 and ppc64 are anticipated soon.

3.2 Return probes operates by replacing the return address in the stack
    (or in a known register, such as the lr register for ppc).  This may
    cause __builtin_return_address(0), when invoked from the return-probed
    function, to return the address of the return-probes trampoline.

3.3 This implementation uses the "Multiprobes at an address" feature in
    2.6.12-rc3-mm3.

3.4 Due to a limitation in multi-probes, you cannot currently establish
    a return probe and a jprobe on the same function.  A patch to remove
    this limitation is being tested.

This feature is required by SystemTap (http://sourceware.org/systemtap),
and reflects ideas contributed by several SystemTap developers, including
Will Cohen and Ananth Mavinakayanahalli.

Signed-off-by: Hien Nguyen <hien@us.ibm.com>
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@laposte.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:21 -07:00
Vincent Hanquez
717b594a41 [PATCH] xen: x86: Use more usermode macro
Use the user_mode macro where it's possible.

Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:14 -07:00
Vincent Hanquez
fa1e1bdf78 [PATCH] xen: x86: Rename usermode macro
Rename user_mode to user_mode_vm and add a user_mode macro similar to the
x86-64 one.

This is useful for Xen because the linux xen kernel does not runs on the same
priviledge that a vanilla linux kernel, and with this we just need to redefine
user_mode().

Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:14 -07:00
Vincent Hanquez
1cc6f12e03 [PATCH] xen: x86: Use new macro for debugreg
Make use of the 2 new macro set_debugreg and get_debugreg.

Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:13 -07:00
Andrew Morton
c92c6ffdb1 [PATCH] mtrr size-and-base debugging
Consolidate the mtrr sanity checking, add a dump_stack().

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:12 -07:00
Andrew Morton
a3a255e744 [PATCH] x86: cpu_khz type fix
x86_64's cpu_khz is unsigned int and there is no reason why x86 needs to use
unsigned long.

So make cpu_khz unsigned int on x86 as well.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:11 -07:00
Alexey Dobriyan
129f69465b [PATCH] Remove i386_ksyms.c, almost.
* EXPORT_SYMBOL's moved to other files
* #include <linux/config.h>, <linux/module.h> where needed
* #include's in i386_ksyms.c cleaned up
* After copy-paste, redundant due to Makefiles rules preprocessor directives
  removed:

	#ifdef CONFIG_FOO
	EXPORT_SYMBOL(foo);
	#endif

	obj-$(CONFIG_FOO) += foo.o

* Tiny reformat to fit in 80 columns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:11 -07:00
Aleksey Gorelov
80bb82afea [PATCH] VIA 82C586B IRQ routing fix
According to the VIA 82C586B datasheet (still available from
http://gkernel.sourceforge.net/specs/via/586b.pdf.bz2) this chip need a
special PIRQ mapping.

Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Aleksey Gorelov <aleksey_gorelov@phoenix.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:11 -07:00
Natalie Protasevich
c434b7a6ae [PATCH] x86: avoid wasting IRQs for PCI devices
I have submitted the patch for x86_64, this is submission for i386.

The patch changes the way IRQs are handed out to PCI devices.  Currently,
each I/O APIC pin gets associated with an IRQ, no matter if the pin is used
or not.  This imposes severe limitation on systems that have designs that
employ many I/O APICs, only utilizing couple lines of each, such as P64H2
chipset.  It is used in ES7000, and currently, there is no way to boot the
system with more that 9 I/O APICs.

The simple change below allows to boot a system with say 64 (or more) I/O
APICs, each providing 1 slot, which otherwise impossible because of the IRQ
gaps created for unused lines on each I/O APIC.  It does not resolve the
problem with number of devices that exceeds number of possible IRQs, but
eases up a tension for IRQs on any large system with potentually large
number of devices.

Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:10 -07:00
Christoph Lameter
5912100372 [PATCH] i386: Selectable Frequency of the Timer Interrupt
Make the timer frequency selectable. The timer interrupt may cause bus
and memory contention in large NUMA systems since the interrupt occurs
on each processor HZ times per second.

Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:10 -07:00
Jan Beulich
7fbb4f6e68 [PATCH] adjust i386 watchdog tick calculation
Get the i386 watchdog tick calculation into a state where it can also be used
on CPUs with frequencies beyond 4GHz, and it consolidates the calculation into
a single place (for potential furture adjustments).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:09 -07:00
Natalie Protasevich
ca05fea6db [PATCH] Do not enforce unique IO_APIC_ID check for xAPIC systems (i386)
This patch is per Andi's request to remove NO_IOAPIC_CHECK from genapic and
use heuristics to prevent unique I/O APIC ID check for systems that don't
need it.  The patch disables unique I/O APIC ID check for Xeon-based and
other platforms that don't use serial APIC bus for interrupt delivery.
Andi stated that AMD systems don't need unique IO_APIC_IDs either.

Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:09 -07:00
Roland McGrath
7c1def1652 [PATCH] i386: never block forced SIGSEGV
This problem was first noticed on PPC and has already been fixed there.
But the exact same issue applies to other platforms in the same way.  The
signal blocking for sa_mask and the handled signal takes place after the
handler setup.  When the stack is bogus, the handler setup forces a
SIGSEGV.  But then this will be blocked, and returning to user mode will
fault again and iterate.  This patch fixes the problem by checking whether
signal handler setup failed, and not doing the signal-blocking if so.  This
copies what was done in the ppc code.  I think all architectures' signal
handler setup code follows this pattern and needs the change.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:09 -07:00
Venkatesh Pallipadi
8a9e1b0f56 [PATCH] Platform SMIs and their interferance with tsc based delay calibration
Issue:
Current tsc based delay_calibration can result in significant errors in
loops_per_jiffy count when the platform events like SMIs
(System Management Interrupts that are non-maskable) are present. This could
lead to potential kernel panic(). This issue is becoming more visible with 2.6
kernel (as default HZ is 1000) and on platforms with higher SMI handling
latencies. During the boot time, SMIs are mostly used by BIOS (for things
like legacy keyboard emulation).

Description:
The psuedocode for current delay calibration with tsc based delay looks like
(0) Estimate a value for loops_per_jiffy
(1) While (loops_per_jiffy estimate is accurate enough)
(2)   wait for jiffy transition (jiffy1)
(3)   Note down current tsc (tsc1)
(4)   loop until tsc becomes tsc1 + loops_per_jiffy
(5)   check whether jiffy changed since jiffy1 or not and refine
loops_per_jiffy estimate

Consider the following cases
Case 1:
If SMIs happen between (2) and (3) above, we can end up with a
loops_per_jiffy value that is too low. This results in shorted delays and
kernel can panic () during boot (Mostly at IOAPIC timer initialization
timer_irq_works() as we don't have enough timer interrupts in a specified
interval).

Case 2:
If SMIs happen between (3) and (4) above, then we can end up with a
loops_per_jiffy value that is too high. And with current i386 code, too
high lpj value (greater than 17M) can result in a overflow in
delay.c:__const_udelay() again resulting in shorter delay and panic().

Solution:
The patch below makes the calibration routine aware of asynchronous events
like SMIs. We increase the delay calibration time and also identify any
significant errors (greater than 12.5%) in the calibration and notify it to
user.

Patch below changes both i386 and x86-64 architectures to use this
new and improved calibrate_delay_direct() routine.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:08 -07:00
Ian Campbell
0f8e2d62fa [PATCH] use ${CROSS_COMPILE}installkernel in arch/*/boot/install.sh
The attached patch causes the various arch specific install.sh scripts to
look for ${CROSS_COMPILE}installkernel rather than just installkernel (in
both /sbin/ and ~/bin/ where the script already did this).  This allows you
to have e.g.  arm-linux-installkernel as a handy way to install on your
cross target.  It also prevents the script picking up on the host
/sbin/installkernel which causes the script to fall through and do the
install itself (which is what I actually use myself, with $INSTALL_PATH
set).

I don't believe it causes back-compatibility problems since calling the
host installkernel was never likely to work or be what you wanted when
cross compiling anyway.  If $CROSS_COMPILE isn't set then nothing changes.

I only use ARM and i386 myself but I figured it couldn't hurt to do the
whole lot.  I've cc'd those who I hope are the arch maintainers for files
that I've touched.

Signed-off-by: Ian Campbell <icampbell@arcom.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:07 -07:00
H. Peter Anvin
d0e7feb03d [PATCH] biarch compiler support for i386
This allows the i386 architecture to be built on a system with a biarch
compiler that defaults to x86-64, merely by specifying ARCH=i386.

As previously discussed, this uses the equivalent logic to the ppc port.

Signed-Off-By: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:07 -07:00
Martin J. Bligh
6f4e1e5061 [PATCH] add page_state info to show_mem
This helps a lot when debugging out of memory stuff - useful especially to
see if all the memory is sucked into slab, etc.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:07 -07:00
Andy Whitcroft
05b79bdcb4 [PATCH] sparsemem memory model for i386
Provide the architecture specific implementation for SPARSEMEM for i386 SMP
and NUMA systems.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:05 -07:00
Andy Whitcroft
d41dee369b [PATCH] sparsemem memory model
Sparsemem abstracts the use of discontiguous mem_maps[].  This kind of
mem_map[] is needed by discontiguous memory machines (like in the old
CONFIG_DISCONTIGMEM case) as well as memory hotplug systems.  Sparsemem
replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
become a complete replacement.

A significant advantage over DISCONTIGMEM is that it's completely separated
from CONFIG_NUMA.  When producing this patch, it became apparent in that NUMA
and DISCONTIG are often confused.

Another advantage is that sparse doesn't require each NUMA node's ranges to be
contiguous.  It can handle overlapping ranges between nodes with no problems,
where DISCONTIGMEM currently throws away that memory.

Sparsemem uses an array to provide different pfn_to_page() translations for
each SECTION_SIZE area of physical memory.  This is what allows the mem_map[]
to be chopped up.

In order to do quick pfn_to_page() operations, the section number of the page
is encoded in page->flags.  Part of the sparsemem infrastructure enables
sharing of these bits more dynamically (at compile-time) between the
page_zone() and sparsemem operations.  However, on 32-bit architectures, the
number of bits is quite limited, and may require growing the size of the
page->flags type in certain conditions.  Several things might force this to
occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
memory), an increase in the physical address space, or an increase in the
number of used page->flags.

One thing to note is that, once sparsemem is present, the NUMA node
information no longer needs to be stored in the page->flags.  It might provide
speed increases on certain platforms and will be stored there if there is
room.  But, if out of room, an alternate (theoretically slower) mechanism is
used.

This patch introduces CONFIG_FLATMEM.  It is used in almost all cases where
there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
often have to compile out the same areas of code.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:04 -07:00
Andy Whitcroft
af705362ab [PATCH] generify memory present
Allow architectures to indicate that they will be providing hooks to indice
installed memory areas, memory_present().  Provide prototypes for the i386
implementation.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:04 -07:00
Andy Whitcroft
b159d43fbf [PATCH] generify early_pfn_to_nid
Provide a default implementation for early_pfn_to_nid returning node 0.  Allow
architectures to override this with their own implementation out of
asm/mmzone.h.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:04 -07:00
Dave Hansen
3f22ab276b [PATCH] make each arch use mm/Kconfig
For all architectures, this just means that you'll see a "Memory Model"
choice in your architecture menu.  For those that implement DISCONTIGMEM,
you may eventually want to make your ARCH_DISCONTIGMEM_ENABLE a "def_bool
y" and make your users select DISCONTIGMEM right out of the new choice
menu.  The only disadvantage might be if you have some specific things that
you need in your help option to explain something about DISCONTIGMEM.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:02 -07:00
Dave Hansen
5b505b90b2 [PATCH] sparsemem base: teach discontig about sparse ranges
discontig.c has some assumptions that mem_map[]s inside of a node are
contiguous.  Teach it to make sure that each region that it's bringing online
is actually made up of valid ranges of ram.

Written-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:01 -07:00
Dave Hansen
6f167ec721 [PATCH] sparsemem base: simple NUMA remap space allocator
Introduce a simple allocator for the NUMA remap space.  This space is very
scarce, used for structures which are best allocated node local.

This mechanism is also used on non-NUMA ia64 systems with a vmem_map to keep
the pgdat->node_mem_map initialized in a consistent place for all
architectures.

Issues:
o alloc_remap takes a node_id where we might expect a pgdat which was intended
  to allow us to allocate the pgdat's using this mechanism; which we do not yet
  do.  Could have alloc_remap_node() and alloc_remap_nid() for this purpose.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:01 -07:00
Dave Hansen
c2ebaa425e [PATCH] sparsemem base: early_pfn_to_nid() (works before sparse is initialized)
The following four patches provide the last needed changes before the
introduction of sparsemem.  For a more complete description of what this
will do, please see this patch:

http://www.sr71.net/patches/2.6.11/2.6.11-bk7-mhp1/broken-out/B-sparse-150-sparsemem.patch

or previous posts on the subject:
http://marc.theaimsgroup.com/?t=110868540700001&r=1&w=2
http://marc.theaimsgroup.com/?l=linux-mm&m=109897373315016&w=2

Three of these are i386-only, but one of them reorganizes the macros
used to manage the space in page->flags, and will affect all platforms.
There are analogous patches to the i386 ones for ppc64, ia64, and
x86_64, but those will be submitted by the normal arch maintainers.

The combination of the four patches has been test-booted on a variety of
i386 hardware, and compiled for ppc64, i386, and x86-64 with about 17
different .configs.  It's also been runtime-tested on ia64 configs (with
more patches on top).

This patch:

We _know_ which node pages in general belong to, at least at a very gross
level in node_{start,end}_pfn[].  Use those to target the allocations of
pages.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:00 -07:00
Dave Hansen
408fde81c1 [PATCH] remove non-DISCONTIG use of pgdat->node_mem_map
This patch effectively eliminates direct use of pgdat->node_mem_map outside
of the DISCONTIG code.  On a flat memory system, these fields aren't
currently used, neither are they on a sparsemem system.

There was also a node_mem_map(nid) macro on many architectures.  Its use
along with the use of ->node_mem_map itself was not consistent.  It has
been removed in favor of two new, more explicit, arch-independent macros:

	pgdat_page_nr(pgdat, pagenr)
	nid_page_nr(nid, pagenr)

I called them "pgdat" and "nid" because we overload the term "node" to mean
"NUMA node", "DISCONTIG node" or "pg_data_t" in very confusing ways.  I
believe the newer names are much clearer.

These macros can be overridden in the sparsemem case with a theoretically
slower operation using node_start_pfn and pfn_to_page(), instead.  We could
make this the only behavior if people want, but I don't want to change too
much at once.  One thing at a time.

This patch removes more code than it adds.

Compile tested on alpha, alpha discontig, arm, arm-discontig, i386, i386
generic, NUMAQ, Summit, ppc64, ppc64 discontig, and x86_64.  Full list
here: http://sr71.net/patches/2.6.12/2.6.12-rc1-mhp2/configs/

Boot tested on NUMAQ, x86 SMP and ppc64 power4/5 LPARs.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin J. Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:00 -07:00
Coywolf Qi Hunt
2894801db1 [PATCH] kbuild: display compile version
I am always trying to make sure I've booted the right kernel after a new
install.  Too paranoid maybe.  But I guess there're other people like me.
So let's make kbuild display the compile version number at the end to give
us a hint.  I know we may be booting vmlinux someday, but don't care about
it for now.

Signed-off-by: Coywolf Qi Hunt <coywolf@lovecn.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:22 -07:00
Badari Pulavarty
cbe37d0937 [PATCH] mm: remove PG_highmem
Remove PG_highmem, to save a page flag.  Use is_highmem() instead.  It'll
generate a little more code, but we don't use PageHigheMem() in many places.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:17 -07:00
Wolfgang Wander
1363c3cd86 [PATCH] Avoiding mmap fragmentation
Ingo recently introduced a great speedup for allocating new mmaps using the
free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
causes huge performance increases in thread creation.

The downside of this patch is that it does lead to fragmentation in the
mmap-ed areas (visible via /proc/self/maps), such that some applications
that work fine under 2.4 kernels quickly run out of memory on any 2.6
kernel.

The problem is twofold:

  1) the free_area_cache is used to continue a search for memory where
     the last search ended.  Before the change new areas were always
     searched from the base address on.

     So now new small areas are cluttering holes of all sizes
     throughout the whole mmap-able region whereas before small holes
     tended to close holes near the base leaving holes far from the base
     large and available for larger requests.

  2) the free_area_cache also is set to the location of the last
     munmap-ed area so in scenarios where we allocate e.g.  five regions of
     1K each, then free regions 4 2 3 in this order the next request for 1K
     will be placed in the position of the old region 3, whereas before we
     appended it to the still active region 1, placing it at the location
     of the old region 2.  Before we had 1 free region of 2K, now we only
     get two free regions of 1K -> fragmentation.

The patch addresses thes issues by introducing yet another cache descriptor
cached_hole_size that contains the largest known hole size below the
current free_area_cache.  If a new request comes in the size is compared
against the cached_hole_size and if the request can be filled with a hole
below free_area_cache the search is started from the base instead.

The results look promising: Whereas 2.6.12-rc4 fragments quickly and my
(earlier posted) leakme.c test program terminates after 50000+ iterations
with 96 distinct and fragmented maps in /proc/self/maps it performs nicely
(as expected) with thread creation, Ingo's test_str02 with 20000 threads
requires 0.7s system time.

Taking out Ingo's patch (un-patch available per request) by basically
deleting all mentions of free_area_cache from the kernel and starting the
search for new memory always at the respective bases we observe: leakme
terminates successfully with 11 distinctive hardly fragmented areas in
/proc/self/maps but thread creating is gringdingly slow: 30+s(!) system
time for Ingo's test_str02 with 20000 threads.

Now - drumroll ;-) the appended patch works fine with leakme: it ends with
only 7 distinct areas in /proc/self/maps and also thread creation seems
sufficiently fast with 0.71s for 20000 threads.

Signed-off-by: Wolfgang Wander <wwc@rentec.com>
Credit-to: "Richard Purdie" <rpurdie@rpsys.net>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu> (partly)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:16 -07:00
David Gibson
63551ae0fe [PATCH] Hugepage consolidation
A lot of the code in arch/*/mm/hugetlbpage.c is quite similar.  This patch
attempts to consolidate a lot of the code across the arch's, putting the
combined version in mm/hugetlb.c.  There are a couple of uglyish hacks in
order to covert all the hugepage archs, but the result is a very large
reduction in the total amount of code.  It also means things like hugepage
lazy allocation could be implemented in one place, instead of six.

Tested, at least a little, on ppc64, i386 and x86_64.

Notes:
	- this patch changes the meaning of set_huge_pte() to be more
	  analagous to set_pte()
	- does SH4 need s special huge_ptep_get_and_clear()??

Acked-by: William Lee Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:15 -07:00
Martin Hicks
753ee72896 [PATCH] VM: early zone reclaim
This is the core of the (much simplified) early reclaim.  The goal of this
patch is to reclaim some easily-freed pages from a zone before falling back
onto another zone.

One of the major uses of this is NUMA machines.  With the default allocator
behavior the allocator would look for memory in another zone, which might be
off-node, before trying to reclaim from the current zone.

This adds a zone tuneable to enable early zone reclaim.  It is selected on a
per-zone basis and is turned on/off via syscall.

Adding some extra throttling on the reclaim was also required (patch
4/4).  Without the machine would grind to a crawl when doing a "make -j"
kernel build.  Even with this patch the System Time is higher on
average, but it seems tolerable.  Here are some numbers for kernbench
runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run:

			wall  user   sys   %cpu  ctx sw.  sleeps
			----  ----   ---   ----   ------  ------
No patch		1009  1384   847   258   298170   504402
w/patch, no reclaim     880   1376   667   288   254064   396745
w/patch & reclaim       1079  1385   926   252   291625   548873

These numbers are the average of 2 runs of 3 "make -j" runs done right
after system boot.  Run-to-run variability for "make -j" is huge, so
these numbers aren't terribly useful except to seee that with reclaim
the benchmark still finishes in a reasonable amount of time.

I also looked at the NUMA hit/miss stats for the "make -j" runs and the
reclaim doesn't make any difference when the machine is thrashing away.

Doing a "make -j8" on a single node that is filled with page cache pages
takes 700 seconds with reclaim turned on and 735 seconds without reclaim
(due to remote memory accesses).

The simple zone_reclaim syscall program is at
http://www.bork.org/~mort/sgi/zone_reclaim.c

Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:14 -07:00
Ingo Molnar
39c715b717 [PATCH] smp_processor_id() cleanup
This patch implements a number of smp_processor_id() cleanup ideas that
Arjan van de Ven and I came up with.

The previous __smp_processor_id/_smp_processor_id/smp_processor_id API
spaghetti was hard to follow both on the implementational and on the
usage side.

Some of the complexity arose from picking wrong names, some of the
complexity comes from the fact that not all architectures defined
__smp_processor_id.

In the new code, there are two externally visible symbols:

 - smp_processor_id(): debug variant.

 - raw_smp_processor_id(): nondebug variant. Replaces all existing
   uses of _smp_processor_id() and __smp_processor_id(). Defined
   by every SMP architecture in include/asm-*/smp.h.

There is one new internal symbol, dependent on DEBUG_PREEMPT:

 - debug_smp_processor_id(): internal debug variant, mapped to
                             smp_processor_id().

Also, i moved debug_smp_processor_id() from lib/kernel_lock.c into a new
lib/smp_processor_id.c file.  All related comments got updated and/or
clarified.

I have build/boot tested the following 8 .config combinations on x86:

 {SMP,UP} x {PREEMPT,!PREEMPT} x {DEBUG_PREEMPT,!DEBUG_PREEMPT}

I have also build/boot tested x64 on UP/PREEMPT/DEBUG_PREEMPT.  (Other
architectures are untested, but should work just fine.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:13 -07:00
gregkh@suse.de
8874b414ff [PATCH] class: convert arch/* to use the new class api instead of class_simple
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:09 -07:00
Thomas Hood
92c6dc59b7 [PATCH] apm.c: ignore_normal_resume is set a bit too late
This patch causes the ignore_normal_resume flag to be set slightly earlier,
before there is a chance that the apm driver will receive the normal resume
event from the BIOS.  (Addresses Debian bug #310865)

Signed-off-by: Thomas Hood <jdthood@yahoo.co.uk>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-14 07:19:35 -07:00
Keith Owens
5754c9b649 [PATCH] Stop arch/i386/kernel/vsyscall-note.o being rebuilt every time
arch/i386/kernel/vsyscall-note.o is not listed as a target so its .cmd file
is neither considered as a target nor is it read on the next build.  This
causes vsyscall-note.o to be rebuilt every time that you run make, which
causes vmlinux to be rebuilt every time.

Signed-off-by: Keith Owens <kaos@ocs.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-08 16:21:13 -07:00
Dave Jones
f94ea640a2 [CPUFREQ] Typos.
cpfureq developers cant spel.

Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:52 -07:00
Dave Jones
6778bae0f2 [CPUFREQ] longhaul - adjust transition latency.
From patch by: Ken Staton <ken_staton@agilent.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:51 -07:00
Dave Jones
1174631418 [CPUFREQ] Longhaul: Magic timer frobbing.
As mandated by the spec, disable timer around transitions.

From code by : Ken Staton <ken_staton@agilent.com
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:51 -07:00
Dave Jones
3be6a48f3c [CPUFREQ] longhaul - disable PCI mastering around transition.
The spec states that we have to do this, which is *horrid*.

Based on code from: Ken Staton <ken_staton@agilent.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:51 -07:00
Dave Jones
065b807ca1 [CPUFREQ] dual-core powernow-k8
With the release of the dual-core AMD Opterons last week,
it's high time that cpufreq supported them.  The attached
patch applies cleanly to 2.6.12-rc3 and updates powernow-k8
to support the latest Athlon 64 and Opteron processors.

Update the driver to version 1.40.0 and provide support
for dual-core processors.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:46 -07:00
Dave Jones
c5d28fb297 [CPUFREQ] Recalibrate cpu_khz [2/2]
Some cpufreq drivers (at that time, only powernow-k7) need to recalibrate the
cpu_khz at runtime.

Signed-off-by: Bruno Ducrot <ducrot@poupinou.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:46 -07:00
Dave Jones
91350ed49b [CPUFREQ] Recalibrate cpu_khz [1/2]
We have to recalibrate cpu_khz in order to use the current FID instead the max
FID since some BIOS do not put the processor at maximum frequency at POST. 
Also, some BIOS will change the processor frequency at our back after cpu_khz
was calibrate.  Finally, this will fix a long standing bug when we do
something like this:

# rmmod powernow-k7
# modprobe powernow-k7

Signed-off-by: Bruno Ducrot <ducrot@poupinou.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:45 -07:00
Dave Jones
bf6fc9fd2d [CPUFREQ] AMD Elan SC520 cpufreq driver.
From: Sean Young <sean@mess.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:45 -07:00
Dave Jones
6f4095af6d [CPUFREQ] speedstep-smi: it works on at least one P4M
The speedstep-smi driver actually works on >=1 notebook with a
Pentium 4-M CPU where all other cpufreq drivers fail. Therefore,
allow speedstep-smi on P4Ms again, but warn users of likely failure

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:44 -07:00
Dave Jones
8282864a96 [CPUFREQ] speedstep-centrino: Pentium 4 - M (HT) support
The Pentium 4 - Ms (HT) with CPUID 0xF34 and 0xF41 seem to support
centrino-like enhanced speedstep; however, no "table" support is possible.
Therefore, put NULL entries into speedstep-centrino.c

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:43 -07:00
Dave Jones
7eb53d8823 [CPUFREQ] powernow-k7: don't print khz element of FSB.
Signed-off-by: Dave Jones <davej@redhat.com>
2005-05-31 19:03:42 -07:00
Alexander Nyberg
adaa765d76 [PATCH] acpi build fix: x86 setup.c
This is a neverending story

linux/acpi.h contains empty declarations for acpi_boot_init() &
acpi_boot_table_init() but they are nested inside #ifdef CONFIG_ACPI.

So we'll have to #ifdef in arch/i386/kernel/setup.c: setup_arch()

Signed-off-by: Alexander Nyberg <alexn@telia.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-31 14:54:17 -07:00
Siddha, Suresh B
49f384b82b [PATCH] x86: fix smp_num_siblings on buggy BIOSes
This fixes 'smp_num_siblings' value on the systems with a buggy bios,
which sets number of siblings to '2' even when HT is disabled.  (more
details are at http://bugzilla.kernel.org/show_bug.cgi?id=4359)

I am planning to do more cleanup in this area (like moving smp_num_siblings
to per cpuinfo) shortly.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-28 11:14:00 -07:00
Adrian Bunk
70ffc71c5c [PATCH] arch/i386/kernel/cpu/intel_cacheinfo.c: section fix
num_cache_leaves is used in __devexit cache_remove_dev() and can therefore
not be __devinit.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-28 11:14:00 -07:00