If the PC was ret_from_irq or ret_from_exception, there will be no
more normal stackframe. Instead of stopping the unwinding, use PC and
RA saved by an exception handler to continue unwinding into the
interrupted context. This also simplifies the CONFIG_STACKTRACE code.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Fix build error due to stacktrace API change. Now save_stack_trace()
tries to save all kernel context, including interrupts and exception.
Also some asm code are changed a bit so that we can detect the end of
current context easily.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This very small patch adds support for little endian on the virtual QEMU
mips platform. The status of this platform is the same as the big endian
one, ie it is possible to boot a system with init=/bin/sh.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Add those lines to all defconfigs.
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
This is a patch againt linux-mips.org git tree.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Implement stacktrace interface by using unwind_stack() and enable lockdep
support in Kconfig.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
In handle_sys and its variants, we must reload some registers which
might be clobbered by trace_hardirqs_on().
Also we must make sure trace_hardirqs_on() called in kernel level (not
exception level).
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The test for the error from pcmcia_replace_cis() was incorrect, and
would always trigger (because if an error didn't happen, the "ret" value
would not be zero, it would be the passed-in count).
Reported and debugged by Fabrice Bellet <fabrice@bellet.info>
Rather than just fix the single broken test, make the code in question
use an understandable code-sequence instead, fixing the whole function
to be more readable.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's not clear how this thinko got through..
Cc: Olaf Hering <olaf@aepfle.de>
Cc: David Brownell <david-b@pacbell.net>
Cc: Alessandro Zummo <alessandro.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
During tracking down a PAE compile failure, I found that config.h was being
included in a bunch of places in i386 code. It is no longer necessary, so
drop it.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a pte_update_hook which notifies about pte changes that have been made
without using the set_pte / clear_pte interfaces. This allows shadow mode
hypervisors which do not trap on page table access to maintain synchronized
shadows.
It also turns out, there was one pte update in PAE mode that wasn't using any
accessor interface at all for setting NX protection. Considering it is PAE
specific, and the accessor is i386 specific, I didn't want to add a generic
encapsulation of this behavior yet.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that ptep_establish has a definition in PAE i386 3-level paging code, the
only paging model which is insane enough to have multi-word hardware PTEs
which are not efficient to set atomically, we can remove the ghost of
set_pte_atomic from other architectures which falesly duplicated it, and
remove all knowledge of it from the generic pgtable code.
set_pte_atomic is now a private pte operator which is specific to i386
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The ptep_establish macro is only used on user-level PTEs, for P->P mapping
changes. Since these always happen under protection of the pagetable lock,
the strong synchronization of a 64-bit cmpxchg is not needed, in fact, not
even a lock prefix needs to be used. We can simply instead clear the P-bit,
followed by a normal set. The write ordering is still important to avoid the
possibility of the TLB snooping a partially written PTE and getting a bad
mapping installed.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Create a new PTE function which combines clearing a kernel PTE with the
subsequent flush. This allows the two to be easily combined into a single
hypercall or paravirt-op. More subtly, reverse the order of the flush for
kmap_atomic. Instead of flushing on establishing a mapping, flush on clearing
a mapping. This eliminates the possibility of leaving stale kmap entries
which may still have valid TLB mappings. This is required for direct mode
hypervisors, which need to reprotect all mappings of a given page when
changing the page type from a normal page to a protected page (such as a page
table or descriptor table page). But it also provides some nicer semantics
for real hardware, by providing extra debug-proofing against using stale
mappings, as well as ensuring that no stale mappings exist when changing the
cacheability attributes of a page, which could lead to cache conflicts when
two different types of mappings exist for the same page.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove ptep_test_and_clear_{dirty|young} from i386, and instead use the
dominating functions, ptep_clear_flush_{dirty|young}. This allows the TLB
page flush to be contained in the same macro, and allows for an eager
optimization - if reading the PTE initially returned dirty/accessed, we can
assume the fact that no subsequent update to the PTE which cleared accessed /
dirty has occurred, as the only way A/D bits can change without holding the
page table lock is if a remote processor clears them. This eliminates an
extra branch which came from the generic version of the code, as we know that
no other CPU could have cleared the A/D bit, so the flush will always be
needed.
We still export these two defines, even though we do not actually define
the macros in the i386 code:
#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY
The reason for this is that the only use of these functions is within the
generic clear_flush functions, and we want a strong guarantee that there
are no other users of these functions, so we want to prevent the generic
code from defining them for us.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement lazy MMU update hooks which are SMP safe for both direct and shadow
page tables. The idea is that PTE updates and page invalidations while in
lazy mode can be batched into a single hypercall. We use this in VMI for
shadow page table synchronization, and it is a win. It also can be used by
PPC and for direct page tables on Xen.
For SMP, the enter / leave must happen under protection of the page table
locks for page tables which are being modified. This is because otherwise,
you end up with stale state in the batched hypercall, which other CPUs can
race ahead of. Doing this under the protection of the locks guarantees the
synchronization is correct, and also means that spurious faults which are
generated during this window by remote CPUs are properly handled, as the page
fault handler must re-check the PTE under protection of the same lock.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change pte_clear_full to a more appropriately named pte_clear_not_present,
allowing optimizations when not-present mapping changes need not be reflected
in the hardware TLB for protected page table modes. There is also another
case that can use it in the fremap code.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We don't want to read PTEs directly like this after they have been modified,
as a lazy MMU implementation of direct page tables may not have written the
updated PTE back to memory yet.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The recent fix to invalidate_inode_pages() (git commit 016eb4a) managed to
unfix invalidate_inode_pages2().
The problem is that various bits of code in the kernel can take transient refs
on pages: the page scanner will do this when inspecting a batch of pages, and
the lru_cache_add() batching pagevecs also hold a ref.
Net result is transient failures in invalidate_inode_pages2(). This affects
NFS directory invalidation (observed) and presumably also block-backed
direct-io (not yet reported).
Fix it by reverting invalidate_inode_pages2() back to the old version which
ignores the page refcounts.
We may come up with something more clever later, but for now we need a 2.6.18
fix for NFS.
Cc: Chuck Lever <cel@citi.umich.edu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Using the infrastructure created in previous patches implement support to
pipe core dumps into programs.
This is done by overloading the existing core_pattern sysctl
with a new syntax:
|program
When the first character of the pattern is a '|' the kernel will instead
threat the rest of the pattern as a command to run. The core dump will be
written to the standard input of that program instead of to a file.
This is useful for having automatic core dump analysis without filling up
disks. The program can do some simple analysis and save only a summary of
the core dump.
The core dump proces will run with the privileges and in the name space of
the process that caused the core dump.
I also increased the core pattern size to 128 bytes so that longer command
lines fit.
Most of the changes comes from allowing core dumps without seeks. They are
fairly straight forward though.
One small incompatibility is that if someone had a core pattern previously
that started with '|' they will get suddenly new behaviour. I think that's
unlikely to be a real problem though.
Additional background:
> Very nice, do you happen to have a program that can accept this kind of
> input for crash dumps? I'm guessing that the embedded people will
> really want this functionality.
I had a cheesy demo/prototype. Basically it wrote the dump to a file again,
ran gdb on it to get a backtrace and wrote the summary to a shared directory.
Then there was a simple CGI script to generate a "top 10" crashes HTML
listing.
Unfortunately this still had the disadvantage to needing full disk space for a
dump except for deleting it afterwards (in fact it was worse because over the
pipe holes didn't work so if you have a holey address map it would require
more space).
Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as
cores (at least it worked with zsh's =(cat core) syntax), so it would be
likely possible to do it without temporary space with a simple wrapper that
calls it in the right way. I ran out of time before doing that though.
The demo prototype scripts weren't very good. If there is really interest I
can dig them out (they are currently on a laptop disk on the desk with the
laptop itself being in service), but I would recommend to rewrite them for any
serious application of this and fix the disk space problem.
Also to be really useful it should probably find a way to automatically fetch
the debuginfos (I cheated and just installed them in advance). If nobody else
does it I can probably do the rewrite myself again at some point.
My hope at some point was that desktops would support it in their builtin
crash reporters, but at least the KDE people I talked too seemed to be happy
with their user space only solution.
Alan sayeth:
I don't believe that piping as such as neccessarily the right model, but
the ability to intercept and processes core dumps from user space is asked
for by many enterprise users as well. They want to know about, capture,
analyse and process core dumps, often centrally and in automated form.
[akpm@osdl.org: loff_t != unsigned long]
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A new member in the ever growing family of call_usermode* functions is
born. The new call_usermodehelper_pipe() function allows to pipe data to
the stdin of the called user mode progam and behaves otherwise like the
normal call_usermodehelp() (except that it always waits for the child to
finish)
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Split the big and hard to read do_pipe function into smaller pieces.
This creates new create_write_pipe/free_write_pipe/create_read_pipe
functions. These functions are made global so that they can be used by
other parts of the kernel.
The resulting code is more generic and easier to read and has cleaner error
handling and less gotos.
[akpm@osdl.org: cleanup]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ioremap must be balanced by an iounmap and failing to do so can result
in a memory leak.
Signed-off-by: Amol Lad <amol@verismonetworks.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David S. Miller <davem@sunset.davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ioremap must be balanced by an iounmap and failing to do so can result
in a memory leak.
Signed-off-by: Amol Lad <amol@verismonetworks.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ioremap must be balanced by an iounmap and failing to do so can result
in a memory leak.
Signed-off-by: Amol Lad <amol@verismonetworks.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Mark A. Greer <mgreer@mvista.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ioremap must be balanced by an iounmap and failing to do so can result
in a memory leak.
Signed-off-by: Amol Lad <amol@verismonetworks.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ioremap must be balanced by an iounmap and failing to do so can result
in a memory leak.
Signed-off-by: Amol Lad <amol@verismonetworks.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ioremap must be balanced by an iounmap and failing to do so can result
in a memory leak.
Signed-off-by: Amol Lad <amol@verismonetworks.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Brent Casavant <bcasavan@sgi.com>
Cc: Pat Gefre <pfg@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ioremap must be balanced by an iounmap and failing to do so can result
in a memory leak.
Signed-off-by: Amol Lad <amol@verismonetworks.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ioremap must be balanced by an iounmap and failing to do so can result
in a memory leak.
Signed-off-by: Amol Lad <amol@verismonetworks.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert Alpha to use generic ioremap_page_range() by turning
__alpha_remap_area_pages() into an inline wrapper around ioremap_page_range().
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The existing implementation of ioremap_page_range(), which was taken
from i386, does this:
flush_cache_all();
/* modify page tables */
flush_tlb_all();
I think this is a bit defensive, so this patch changes the generic
implementation to do:
/* modify page tables */
flush_cache_vmap(start, end);
instead, which is similar to what vmalloc() does. This should still
be correct because we never modify existing PTEs. According to
James Bottomley:
The problem the flush_tlb_all() is trying to solve is to avoid stale tlb
entries in the ioremap area. We're just being conservative by flushing
on both map and unmap. Technically what vmalloc/vfree does (only flush
the tlb on unmap) is just fine because it means that the only tlb
entries in the remap area must belong to in-use mappings.
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Andi Kleen <ak@suse.de>
Cc: <linux-m32r@ml.linux-m32r.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a generic implementation of ioremap_page_range() in
lib/ioremap.c based on the i386 implementation. It differs from the
i386 version in the following ways:
* The PTE flags are passed as a pgprot_t argument and must be
determined up front by the arch-specific code. No additional
PTE flags are added.
* Uses set_pte_at() instead of set_pte()
[bunk@stusta.de: warning fix]
]dhowells@redhat.com: nommu build fix]
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Andi Kleen <ak@suse.de>
Cc: <linux-m32r@ml.linux-m32r.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Re-implement smp_send_nmi_allbutself() so that calls to smp_processor_id
(through send_IPI_allbutself) can be replaced with safe_smp_processor_id
without affecting other parts of the kernel (as suggested by Eric Biederman).
Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Looks-reasonable-to: Andi Kleen <ak@muc.de>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Substitute "smp_processor_id" with the stack overflow-safe
"safe_smp_processor_id" in the reboot path to the second kernel.
[akpm@osdl.org: build fix]
Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Looks-reasonable-to: Andi Kleen <ak@muc.de>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a the first of a series of patch-sets aiming at making kdump more
robust against stack overflows.
This patch set does the following:
* Add safe_smp_processor_id function to i386 architecture (this function was
inspired by the x86_64 function of the same name).
* Substitute "smp_processor_id" with the stack overflow-safe
"safe_smp_processor_id" in the reboot path to the second kernel.
This patch:
On the event of a stack overflow critical data that usually resides at the
bottom of the stack is likely to be stomped and, consequently, its use should
be avoided.
In particular, in the i386 and IA64 architectures the macro smp_processor_id
ultimately makes use of the "cpu" member of struct thread_info which resides
at the bottom of the stack. x86_64, on the other hand, is not affected by
this problem because it benefits from the use of the PDA infrastructure.
To circumvent this problem I suggest implementing "safe_smp_processor_id()"
(it already exists in x86_64) for i386 and IA64 and use it as a replacement
for smp_processor_id in the reboot path to the dump capture kernel. This is a
possible implementation for i386.
Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Looks-reasonable-to: Andi Kleen <ak@muc.de>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some filesystems, instead of simply decrementing i_nlink, simply zero it
during an unlink operation. We need to catch these in addition to the
decrement operations.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>