When using Ski to debug early startup, it's a bit of a pain not to
have printk.
This patch enables the simulated console very early.
It may be worth conditionalising on the command line... but this is
enough for now.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The "ri" field in the processor status register only has defined
values of 0, 1, 2. Do not let ptrace set this to 3. As with
other reserved fields in registers we silently discard the value.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There is a bug in the ia64_do_page_fault code that can cause a failure
to grow the register backing store, or any mapping that is marked as
VM_GROWSUP if the mapping is the highest mapped area of memory.
When the address accessed is below the first mapping the previous mapping
is returned as NULL, and this case is handled. However, when the address
accessed is above the highest mapping the vma returned is NULL, this
case is not handled correctly, and it fails to spot that this access
might require an existing mapping to grow upwards.
Signed-off-by: Andrew Burgess <andrew@transitive.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The core cpufreq code doesn't appear to understand returning -EAGAIN
for the get() function of the cpufreq_driver. If PAL_GET_PSTATE returns
-1, such as when running on Xen, scaling_cur_freq is happy to return
4294967285 kHz (ie. (unsigned)-11). The other drivers appear to return
0 for a failure, and doing so gives me the max frequency from
scaling_cur_frequency and "<unknown>" from cpuinfo_cur_frequency. I
believe that's the desired behavior.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
If the interrupt has been disabled, don't call the force_interrupt provider.
Doing so can result in an infinite runaway interrupt loop.
Signed-off-by: Mike Habeck <habeck@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The slab allocator was changed in 2.6.23 to default to SLUB. However,
the config files in arch/ia64/configs still use SLAB. Switch them to SLUB.
Added same change to arch/ia64/defconfig ... Tony
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Explicitly put the unwind section into its own program-header. This
used to be unnecessary (probably because binutils did it for us), but
with current binutils (e.g., v2.17.50.20070804) we won't get
the PT_IA_64_UNWIND header without this patch which will break
unwinding in a debugger and simulators such as Ski.
Signed-off-by: David Mosberger-Tang <dmosberger@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add NOTES to linker script such that the kernel can be built with
recent versions of binutils. Without this patch, final link fails
with this error:
ld: .tmp_vmlinux1: section `.text' can't be allocated in segment 0
ld: final link failed: Bad value
This error is due to the fact that the --build-id option is used
with newer linkers to include a .notes section on the kernel, but
without the NOTES macro, that section won't be included in the kernel
which then leads to the above error message.
Signed-off-by: David Mosberger-Tang <dmosberger@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add a dummy nop at the end of _start() to maintain the invariant that
the return-pointer (rp) always point to the calling function. This
makes unwinding stop at the last frame, as it should.
Signed-off-by: David Mosberger-Tang <dmosberger@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Use local_vector_to_irq() instead of looping through all NR_IRQS.
This avoids registering the CPE handler on multiple irqs. Only
register if the irq is valid. If no valid irq is found, print an
error message and set up polling.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
arch/ia64/Kconfig failed to include kernel/Kconfig.preempt that meant it
did not support PREEMPT_VOLUNTARY and PREEMPT_BKL (inadvertently).
This was recently noticed when the newly-added PREEMPT_NOTIFIERS in
Kconfig.preempt that was "select"ed from drivers/kvm/Kconfig (therefore)
started giving bogus warnings ('select' used by config symbol 'KVM' refers
to undefined symbol 'PREEMPT_NOTIFIERS') on ia64 builds.
So let's remove the open-coded definition of CONFIG_PREEMPT in
arch/ia64/Kconfig and replace it with just including Kconfig.preempt
instead, like the other archs do.
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add base support for implementing platform_irq_to_vector(), and
then use it on SN2.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Acked-by: John Keller <jpk@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While sending interrupts to a cpu to repeatedly wake a thread, on occasion
that thread will take a full timer tick cycle (4002 usec in my case)
to wakeup.
The problem concerns a race condition in the code around the safe_halt()
call in the default_idle() routine. Setting 'nohalt' on the kernel
command line causes the long wakeups to disappear.
void
default_idle (void)
{
local_irq_enable();
while (!need_resched()) {
--> if (can_do_pal_halt)
--> safe_halt();
else
A timer tick could arrive between the check for !need_resched and the
actual call to safe_halt() (which does a pal call to PAL_HALT_LIGHT).
By the time the timer tick completes, a thread that might now need to run
could get held up for as long as a timer tick waiting for the halted cpu.
I'm proposing that we disable irq's and check need_resched again before
calling safe_halt(). Does anyone see any problem with this approach?
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Level type interrupts do not need to be resent. It was also found that
some chipsets get confused in case of the resend.
Mark the ioapic level type interrupts as such to avoid the resend
functionality in the generic irq code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (28 commits)
ACPI: thermal: add DMI hooks to handle AOpen's broken Award BIOS
ACPI: thermal: create "thermal.act=" to disable or override active trip point
ACPI: thermal: create "thermal.nocrt" to disable critical actions
ACPI: thermal: create "thermal.psv=" to override passive trip points
ACPI: thermal: expose "thermal.tzp=" to set global polling frequency
ACPI: thermal: create "thermal.off=1" to disable ACPI thermal support
ACPI: thinkpad-acpi: fix sysfs paths in documentation
ACPI: static
ACPI EC: remove potential deadlock from EC
ACPI: dock: Send key=value pair instead of plain value
ACPI: bay: send envp with uevent - fix
acpi-cpufreq: Fix some x86/x86-64 acpi-cpufreq driver issues
ACPI: fix "Time Problems with 2.6.23-rc1-gf695baf2"
ACPI: thinkpad-acpi: change thinkpad-acpi input default and kconfig help
ACPI: EC: fix run-together printk lines
ACPI: sbs: remove dead code
ACPI: EC: acpi_ec_remove(): fix use-after-free
ACPI: EC: Switch from boot_ec as soon as we find its desc in DSDT.
ACPI: EC: fix build warning
ACPI: EC: If ECDT is not found, look up EC in DSDT.
...
Commit 3320ad994a broke mmio config space
accesses totally on i386 - it dropped the "reg" offset to the address.
Cc: dean gaudet <dean@arctic.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
apply_alternatives uses memcpy() to apply alternatives. Which has the
unfortunate effect that while applying memcpy alternative to memcpy
itself it tries to overwrite itself with nops - which causes #UD fault
as it overwrites half of an instruction in copy loop, and from this
point on only possible outcome is triplefault and reboot.
So let's overwrite only first two instructions of memcpy - as long as
the main memcpy loop is not in first two bytes it will work fine.
Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] monwriter: Serialization bug for multithreaded applications.
[S390] vmur: diag14 only works with buffers below 2GB
[S390] vmur: add "top of queue" sanity check for reader open
[S390] vmur: reject open on z/VM reader files with status HOLD
[S390] vmur: use DECLARE_COMPLETION_ONSTACK to keep lockdep happy
[S390] vmur: allocate single record buffers instead of one big data buffer
[S390] remove DEFAULT_MIGRATION_COST
[S390] qdio: make sure data structures are correctly aligned.
[S390] hypfs: implement show_options
[S390] cio: avoid memory leak on error in css_alloc_subchannel().
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Fix size check for hugetlbfs
[POWERPC] Fix initialization and usage of dma_mask
[POWERPC] Fix more section mismatches in head_64.S
[POWERPC] Revert "[POWERPC] Add 'mdio' to bus scan id list for platforms with QE UEC"
[POWERPC] PS3: Update ps3_defconfig
[POWERPC] PS3: Remove text saying PS3 support is incomplete
[POWERPC] PS3: Fix storage probe logic
[POWERPC] cell: Move SPU affinity init to spu_management_of_ops
[POWERPC] Fix potential duplicate entry in SLB shadow buffer
The new percpu code has apparently broken the doublefault handler
when CONFIG_DEBUG_SPINLOCK is set. Doublefault is handled by
a hardware task, making the check
SPIN_BUG_ON(lock->owner == current, lock, "recursion");
fault because it uses the FS register to access the percpu data
for current, and that register is zero in the new TSS. (The trace
I saw was on 2.6.20 where it was GS, but it looks like this will
still happen with FS on 2.6.22.)
Initializing FS in the doublefault_tss should fix it.
AK: Also fix broken ptr_ok() and turn printks into KERN_EMERG
AK: And add a PANIC prefix to make clear the system will hang
AK: (e.g. x86-64 will recover)
Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Create arch/x86_64/vdso/.gitignore and put vdso.lds into it.
Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Averatec 2370 and some other Turion laptop BIOS seems to program the
ENABLE_C1E MSR inconsistently between cores. This confuses the lapic
use heuristics because when C1E is enabled anywhere it seems to affect
the complete chip.
Use a global flag instead of a per cpu flag to handle this.
If any CPU has C1E enabled disabled lapic use.
Thanks to Cal Peake for debugging.
Cc: tglx@linutronix.de
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
VT is very picky about when it can enter execution.
Get all segments setup and get LDT and TR into valid state to allow
VT execution under VMware and KVM (untested).
This makes the boot decompression run under VT, which makes it several
orders of magnitude faster on 64-bit Intel hardware.
Before, I was seeing times up to a minute or more to decompress a 1.3MB kernel
on a very fast box.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 19d36ccdc3 "x86: Fix alternatives
and kprobes to remap write-protected kernel text" uses code which is
being patched for patching.
In particular, paravirt_ops does patching in two stages: first it
calls paravirt_ops.patch, then it fills any remaining instructions
with nop_out(). nop_out calls text_poke() which calls
lookup_address() which calls pgd_val() (aka paravirt_ops.pgd_val):
that call site is one of the places we patch.
If we always do patching as one single call to text_poke(), we only
need make sure we're not patching the memcpy in text_poke itself.
This means the prototype to paravirt_ops.patch needs to change, to
marshal the new code into a buffer rather than patching in place as it
does now. It also means all patching goes through text_poke(), which
is known to be safe (apply_alternatives is also changed to make a
single patch).
AK: fix compilation on x86-64 (bad rusty!)
AK: fix boot on x86-64 (sigh)
AK: merged with other patches
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It turns out CLFLUSH support is still not complete; we
flush the wrong pages. Again disable it for the release.
Noticed by Jan Beulich who then also noticed a stupid typo later.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Current code assumed that devices were directly connected to a Calgary
bridge, as it tried to get the iommu table directly from the parent bus
controller.
When we have another bridge between the Calgary/CalIOC2 bridge and the
device we should look upwards until we get to the top (Calgary/CalIOC2
bridge), where the iommu table resides.
Signed-off-by: Murillo Fernandes Bernardes <mfb@br.ibm.com>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some broken devices have been discovered to require %al/%ax/%eax registers
for MMIO config space accesses. Modify mmconfig.c to use these registers
explicitly (rather than modify the global readb/writeb/etc inlines).
AK: also changed i386 to always use eax
AK: moved change to extended space probing to different patch
AK: reworked with inlines according to Linus' requirements.
AK: improve comments.
Signed-off-by: dean gaudet <dean@arctic.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch finishes the i386 and x86-64 ->sysdata conversion and hopefully
also fixes Riku's and Andy's observed bugs. It is based on Yinghai Lu's
and Andy Whitcroft's patches (thanks!) with some changes:
- introduce pci_scan_bus_with_sysdata() and use it instead of
pci_scan_bus() where appropriate. pci_scan_bus_with_sysdata() will
allocate the sysdata structure and then call pci_scan_bus().
- always allocate pci_sysdata dynamically. The whole point of this
sysdata work is to make it easy to do root-bus specific things
(e.g., support PCI domains and IOMMU's). I dislike using a default
struct pci_sysdata in some places and a dynamically allocated
pci_sysdata elsewhere - the potential for someone indavertantly
changing the default structure is too high.
- this patch only makes the minimal changes necessary, i.e., the NUMA node is
always initialized to -1. Patches to do the right thing with regards
to the NUMA node can build on top of this (either add a 'node'
parameter to pci_scan_bus_with_sysdata() or just update the node
when it becomes known).
The patch was compile tested with various configurations (e.g., NUMAQ,
VISWS) and run-time tested on i386 and x86-64. Unfortunately none of my
machines exhibited the bugs so caveat emptor.
Andy, could you please see if this fixes the NUMA issues you've seen?
Riku, does this fix "pci=noacpi" on your laptop?
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: <riku.seppala@kymp.net>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This code corrects the usage of the request_irq() routine.
Signed-off-by: Jay Estabrook <jay.estabrook@hp.com>
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Connect up the fallocate() system call.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I have had four seperate system lockups attributable to this exact problem
in two days of testing. Instead of trying to handle all the weird end
cases and wrap, how about changing it to look for exactly what we appear
to want.
The following patch removes a couple races in setup_APIC_timer. One occurs
when the HPET advances the COUNTER past the T0_CMP value between the time
the T0_CMP was originally read and when COUNTER is read. This results in
a delay waiting for the counter to wrap. The other results from the counter
wrapping.
This change takes a snapshot of T0_CMP at the beginning of the loop and
simply loops until T0_CMP has changed (a tick has happened).
<later>
I have one small concern about the patch. I am not sure it meets the intent
as well as it should. I think we are trying to match APIC timer interrupts up
with the hpet counter increment. The event which appears to be disturbing
this loop in our test environment is the NMI watchdog. What we believe has
been happening with the existing code is the setup_APIC_timer loop has read
the CMP value, and the NMI watchdog code fires for the first time. This
results in a series of icache miss slowdowns and by the time we get back to
things it has wrapped.
I think this code is trying to get the CMP as close to the counter value as
possible. If that is the intent, maybe we should really be testing against a
"window" around the CMP. Something like COUNTER = CMP+/2. It appears COUNTER
should get advanced every 89nSec (IIRC). The above seems like an unreasonably
small window, but may be necessary. Without documentation, I am not sure of
the original intent with this code.
In summary, this code fixes my boot hangs, but since I am not certain of the
intent of the existing code, I am not certain this has not introduced new bugs
or unexpected behaviors.
Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: Vojtech Pavlik <vojtech@suse.cz>
Cc: "Aaron Durbin" <adurbin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
My "slices" address space management code that was added in the 2.6.22
implementation of get_unmapped_area() doesn't properly check that the
size is a multiple of the requested page size. This allows userland to
create VMAs that aren't a multiple of the huge page size with hugetlbfs
(since hugetlbfs entirely relies on get_unmapped_area() to do that
checking) which leads to a kernel BUG() when such areas are torn down.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
powerpc has a couple of bugs in the usage of dma_masks that tend to
break when drivers explicitly try to set a 32-bit mask for example.
First, the code that generates the pci devices from the OF device-tree
doesn't initialize the mask properly, then our implementation of
set_dma_mask() was trying to validate the -previous- mask value, not the
one passed in as an argument.
This fixes these problems.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This reverts commit 3baee95595.
That commit was a mistake from the start; I added mdio type to the
bus scan list early on in my ucc_geth migrate to phylib development,
which is just pure wrong (the ucc_geth_mii driver creates the mii
bus and the PHY layer handles PHY enumeration without translation).
This follows on from commit 77926826f3:
Revert "[POWERPC] Don't complain if size-cells == 0 in prom_parse()"
which was basically trying to hide a symptom of the original mistake
this revert fixes.
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove the Kconfig message that indicates the PS3 platform support is
incomplete.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fix the PS3 storage probe logic to properly find device regions on cold
startup.
o Change the storage probe event mask from notify_device_ready
to notify_region_update.
o Improve the storage probe error handling.
o Change ps3_storage_wait_for_device() to use a temporary variable to hold
the buffer address.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch moves affinity initialization code from spu_base.c to a
new spu_management_of_ops function (init_affinity), which is empty
in the case of PS3. This fixes a linking problem that was happening
when compiling for PS3.
Also, some small code style changes were made.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We were getting a duplicate entry in the SLB shadow buffer in
slb_flush_and_rebolt() if the kernel stack was in the same segment
as PAGE_OFFSET, which on POWER6 causes the hypervisor to terminate
the partition with an error. This fixes it.
Also we were not creating an SLB entry (or an SLB shadow buffer
entry) for the kernel stack on secondary CPUs when starting the
CPU. This isn't a major problem, since an appropriate entry will
be created on demand, but this fixes that also for consistency.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Every time a cpu is added via hotplug, we allocate the per-cpu MONDO
queues but we never free them up. Freeing isn't easy since the first
cpu gets this memory from bootmem.
Therefore, the simplest thing to do to fix this bug is to allocate the
queues for all possible cpus at boot time.
Signed-off-by: David S. Miller <davem@davemloft.net>
Check the cpu type in the OBP device tree before committing to
using the optimized Niagara memcpy and memset implementation.
If we don't recognize the cpu type, use a completely generic
version.
Signed-off-by: David S. Miller <davem@davemloft.net>