Provide proper kprobes fault handling, if a user-specified pre/post handlers
tries to access user address space, through copy_from_user(), get_user() etc.
The user-specified fault handler gets called only if the fault occurs while
executing user-specified handlers. In such a case user-specified handler is
allowed to fix it first, later if the user-specifed fault handler does not fix
it, we try to fix it by calling fix_exception().
The user-specified handler will not be called if the fault happens when single
stepping the original instruction, instead we reset the current probe and
allow the system page fault handler to fix it up.
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently kprobe handler traps only happen in kernel space, so function
kprobe_exceptions_notify should skip traps which happen in user space.
This patch modifies this, and it is based on 2.6.16-rc4.
Signed-off-by: bibo mao <bibo.mao@intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Cc: <hiramatu@sdl.hitachi.co.jp>
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When kretprobe probes the schedule() function, if the probed process exits
then schedule() will never return, so some kretprobe instances will never
be recycled.
In this patch the parent process will recycle retprobe instances of the
probed function and there will be no memory leak of kretprobe instances.
Signed-off-by: bibo mao <bibo.mao@intel.com>
Cc: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Create compat_sys_adjtimex and use it an all appropriate places.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We had a copy of the compatibility version of struct timex in each 64 bit
architecture. This patch just creates a global one and replaces all the
usages of the old ones.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When we stop allocating percpu memory for not-possible CPUs we must not touch
the percpu data for not-possible CPUs at all. The correct way of doing this
is to test cpu_possible() or to use for_each_cpu().
This patch is a kernel-wide sweep of all instances of NR_CPUS. I found very
few instances of this bug, if any. But the patch converts lots of open-coded
test to use the preferred helper macros.
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Christian Zankel <chris@zankel.net>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Jens Axboe <axboe@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the assumption that driver_register() returns the number of devices
bound to the driver. In fact, it returns zero for success or a negative
error value.
Nobody uses the return value of of_register_driver() anyway.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In mm_init_ppc64() we calculate the location of the "IO hole", but then
no one ever looks at the value. So don't bother.
That's actually all mm_init_ppc64() does, so get rid of it too.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The dynamic add path for PCI Host Bridges can fail to configure children
adapters under P5IOC controllers. It fails to properly fixup bus/device
resources, and it fails to properly enable EEH. Both of these steps
need to occur before any children devices are enabled in
pci_bus_add_devices().
Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
remove warnings when building a 64bit kernel.
smp_call_function triggers also with 32bit kernel.
WARNING: vmlinux: duplicate symbol 'smp_call_function' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:164:EXPORT_SYMBOL(smp_call_function);
arch/powerpc/kernel/smp.c:300:EXPORT_SYMBOL(smp_call_function);
WARNING: vmlinux: duplicate symbol 'ioremap' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:113:EXPORT_SYMBOL(ioremap);
arch/powerpc/mm/pgtable_64.c:321:EXPORT_SYMBOL(ioremap);
WARNING: vmlinux: duplicate symbol '__ioremap' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:117:EXPORT_SYMBOL(__ioremap);
arch/powerpc/mm/pgtable_64.c:322:EXPORT_SYMBOL(__ioremap);
WARNING: vmlinux: duplicate symbol 'iounmap' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:118:EXPORT_SYMBOL(iounmap);
arch/powerpc/mm/pgtable_64.c:323:EXPORT_SYMBOL(iounmap);
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We should be memset'ing the data we are pointing to, not the pointer
itself. This is in an error path so we probably don't hit it much.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The recent changes to keep gettimeofday in sync with xtime had the side
effect that it was occasionally possible for the time reported by
gettimeofday to go back by a microsecond. There were two reasons:
(1) when we recalculated the offsets used by gettimeofday every 2^31
timebase ticks, we lost an accumulated fractional microsecond, and
(2) because the update is done some time after the notional start of
jiffy, if ntp is slowing the clock, it is possible to see time go backwards
when the timebase factor gets reduced.
This fixes it by (a) slowing the gettimeofday clock by about 1us in
2^31 timebase ticks (a factor of less than 1 in 3.7 million), and (b)
adjusting the timebase offsets in the rare case that the gettimeofday
result could possibly go backwards (i.e. when ntp is slowing the clock
and the timer interrupt is late). In this case the adjustment will
reduce to zero eventually because of (a).
Signed-off-by: Paul Mackerras <paulus@samba.org>
A careful reading of the recent changes to the system call entry/exit
paths revealed several problems, plus some things that could be
simplified and improved:
* 32-bit wasn't testing the _TIF_NOERROR bit in the syscall fast exit
path, so it was only doing anything with it once it saw some other
bit being set. In other words, the noerror behaviour would apply to
the next system call where we had to reschedule or deliver a signal,
which is not necessarily the current system call.
* 32-bit wasn't doing the call to ptrace_notify in the syscall exit
path when the _TIF_SINGLESTEP bit was set.
* _TIF_RESTOREALL was in both _TIF_USER_WORK_MASK and
_TIF_PERSYSCALL_MASK, which is odd since _TIF_RESTOREALL is only set
by system calls. I took it out of _TIF_USER_WORK_MASK.
* On 64-bit, _TIF_RESTOREALL wasn't causing the non-volatile registers
to be restored (unless perhaps a signal was delivered or the syscall
was traced or single-stepped). Thus the non-volatile registers
weren't restored on exit from a signal handler. We probably got
away with it mostly because signal handlers written in C wouldn't
alter the non-volatile registers.
* On 32-bit I simplified the code and made it more like 64-bit by
making the syscall exit path jump to ret_from_except to handle
preemption and signal delivery.
* 32-bit was calling do_signal unnecessarily when _TIF_RESTOREALL was
set - but I think because of that 32-bit was actually restoring the
non-volatile registers on exit from a signal handler.
* I changed the order of enabling interrupts and saving the
non-volatile registers before calling do_syscall_trace_leave; now we
enable interrupts first.
Signed-off-by: Paul Mackerras <paulus@samba.org>
yaboot is scrogged and calls us with an invalid stack alignment,
it seems.
Thanks to David Woodhouse to pointing me to the problem.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On Thu, 2006-03-02 at 19:55 +0100, Olaf Hering wrote:
> My iBook1 has 2 memory regions in reg. Depending on how I boot it
> (vmlinux+initrd) or zImage.initrd, it will not boot with current Linus
> tree.
> rmo_top should be 160MB instead of 32MB.
On logically-partitioned machines the first element of the reg
property in the memory node is defined to be the "RMO" region,
i.e. the memory that the processor can access in real mode. On other
machines the first element has no special meaning, so only take it to
be the RMO region on LPAR machines.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch makes userland aware of the icache snoop capability of the
POWER5 (and possibly others in the future) and of SMT capabilities.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On 32-bit, the exception prolog for the program check exception doesn't
enable interrupts early on. If it is an illegal instruction exception,
we read the instruction in order to emulate certain instructions, and
the get_user of the instruction triggers a WARN_ON since interrupts
are still disabled. This adds a local_irq_enable() to enable
interrupts before reading the instruction.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds mm->task_size to keep track of the task size of a given mm
and uses that to fix the powerpc vdso so that it uses the mm task size to
decide what pages to fault in instead of the current thread flags (which
broke when ptracing).
(akpm: I expect that mm_struct.task_size will become the way in which we
finally sort out the confusion between 32-bit processes and 32-bit mm's. It
may need tweaks, but at this stage this patch is powerpc-only.)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A bug in the assembly code of the vdso can cause gettimeofday() to hang
or to return incorrect results. The wrong register was used to test for
pending updates of the calibration variables and to create a dependency
for subsequent loads. This fixes it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The inline cputime_to_foo and foo_to_cputime conversion functions in
include/asm-powerpc/cputime.h refer to 5 variables, which need to be
exported if those functions are to be usable from modules.
Signed-off-by: Paul Mackerras <paulus@samba.org>
mem= command line option was being ignored in arch/powerpc if we were not
a CONFIG_MULTIPLATFORM (which is handled via prom_init stub). The initial
command line extraction and parsing needed to be moved earlier in the boot
process and have code to actual parse mem= and do something about it.
Also, fixed a compile warning in the file.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This implements accurate task and cpu time accounting for 64-bit
powerpc kernels. Instead of accounting a whole jiffy of time to a
task on a timer interrupt because that task happened to be running at
the time, we now account time in units of timebase ticks according to
the actual time spent by the task in user mode and kernel mode. We
also count the time spent processing hardware and software interrupts
accurately. This is conditional on CONFIG_VIRT_CPU_ACCOUNTING. If
that is not set, we do tick-based approximate accounting as before.
To get this accurate information, we read either the PURR (processor
utilization of resources register) on POWER5 machines, or the timebase
on other machines on
* each entry to the kernel from usermode
* each exit to usermode
* transitions between process context, hard irq context and soft irq
context in kernel mode
* context switches.
On POWER5 systems with shared-processor logical partitioning we also
read both the PURR and the timebase at each timer interrupt and
context switch in order to determine how much time has been taken by
the hypervisor to run other partitions ("steal" time). Unfortunately,
since we need values of the PURR on both threads at the same time to
accurately calculate the steal time, and since we can only calculate
steal time on a per-core basis, the apportioning of the steal time
between idle time (time which we ceded to the hypervisor in the idle
loop) and actual stolen time is somewhat approximate at the moment.
This is all based quite heavily on what s390 does, and it uses the
generic interfaces that were added by the s390 developers,
i.e. account_system_time(), account_user_time(), etc.
This patch doesn't add any new interfaces between the kernel and
userspace, and doesn't change the units in which time is reported to
userspace by things such as /proc/stat, /proc/<pid>/stat, getrusage(),
times(), etc. Internally the various task and cpu times are stored in
timebase units, but they are converted to USER_HZ units (1/100th of a
second) when reported to userspace. Some precision is therefore lost
but there should not be any accumulating error, since the internal
accumulation is at full precision.
Signed-off-by: Paul Mackerras <paulus@samba.org>
HMT support is currently broken and needs to be reworked to play nicely
with the SMT scheduler. Remove the bit rotten bits for the time being.
I also updated an incorrect comment, we enter __secondary_hold with the
physical cpu id in r3.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The runlatch SPR can take a lot of time to write. My original runlatch
code would set it on every exception entry even though most of the time
this was not required. It would also continually set it in the idle
loop, which is an issue on an SMT capable processor.
Now we cache the runlatch value in a threadinfo bit, and only check for
it in decrementer and hardware interrupt exceptions as well as the idle
loop. Boot on POWER3, POWER5 and iseries, and compile tested on pmac32.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
altivec_unavailable_exception is called without setting r3... it looks like
the r3 that actually gets passed in as struct pt_regs *regs is the
undisturbed value of r3 at the time the altivec instruction was encountered.
The user actually gets to choose the pt_regs printed in the Oops!
This fixes the oops by passing the correct pt_regs pointer to
altivec_unavailable_exception.
Signed-off-by: Alan Curry <pacman@TheWorld.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The panic CPU is waiting forever due to some large timeout value if some
CPU is not responding to an IPI.
This patch fixes the problem - the maximum waiting period will be
10 seconds and then the kdump boot will go ahead.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
For kexec we need to know the size of the MMU hash table.
Currently we calculate the size once in the htab code, and then twice more in
the kexec code, once using htab_hash_mask and once using ppc64_pft_size.
On some machines the ppc64_pft_size calculation is broken because
ppc64_pft_size is not set.
So we need to fix the second calculation, but better still we should just
calculate the size once and use it everywhere else.
Tested on Power5 LPAR, Power4 non-LPAR and Power3.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
For UP to SMP kexec to work we need to jump into pSeries_secondary_smp_init
event on a UP + KEXEC kernel. The secondary cpus will not find their hw_cpu_id
in the paca and so they'll jump into kexec_wait, ready for a kexec.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Because smp_release_cpus() is built for SMP || KEXEC, it's not safe to
unconditionally call it from setup_system(). On a UP && KEXEC kernel we'll
start up the secondary CPUs which will then go beserk and we die.
Simple fix is to conditionally call smp_release_cpus() in setup_system(). With
that in place we don't need the dummy definition of smp_release_cpus() because
all call sites are #ifdef'ed either SMP or KEXEC.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fallback gracefully when reading /proc/ppc64/lparcfg when the /rtas
device node can't be found.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
A few symbols are exported twice, remove them from ppc_ksyms.c
Remove users of sys_ctrler in arch/ppc/
WARNING: vmlinux: duplicate symbol '__delay' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol '__up' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol '__down' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol '__down_interruptible' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'sys_ctrler' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strncat' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strncmp' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strchr' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strrchr' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strnlen' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strpbrk' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'memscan' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strstr' previous definition was in vmlinux
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes a regression which was introduced by moving ppc32 to use
the same sort of lockless gettimeofday as ppc64 has been using for
some time. This involves getting the timebase and performing some
simple arithmetic to convert it to seconds and microseconds. However,
the factor and offset used there weren't being updated when NTP
varied the tick length using adjtimex. 64-bit didn't notice the
problem because it had a hook in the 32-bit adjtimex compat routine
that attempted to work out what the generic timekeeping code would
do and alter the factor and offset to match. However, that code
was very complex and it wasn't clear that it still matched what the
generic code would do.
Now we use the generic current_tick_length() routine that was recently
added to check that the current tick will be as long as we expect; if
not we recompute the factor and offset. This keeps gettimeofday and
xtime in sync. In addition we check that gettimeofday hasn't got ahead
of xtime on each timer interrupt; if it has, we resync.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch removes all self references and fixes references to files
in the now defunct arch/ppc64 tree. I think this accomplises
everything wanted, though there might be a few references I missed.
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently we have some stuff in firmware.h and kernel/firmware.c that is
#ifdef CONFIG_PPC_PSERIES. Move it all into platforms/pseries.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The e500 core reference manual indicates that isync is required
after mtmsr(DE bit) and mtspr DBCR0. Add isyncs to make the code
conform to the spec.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Updated FP unavailable exception to refer to the correct
function in traps.c. head_booke.h was using the old name, KernelFP,
instead of kernel_fp_unavailable_exception.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Registers system call for the powerpc architecture.
Signed-off-by: Janak Desai <janak@us.ibm.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
With this, new system calls only have to be wired up in one place
for ARCH=ppc and ARCH=powerpc, rather than 2.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently most callers of lmb_alloc() don't check if it worked or not, if it
ever does weird bad things will probably happen. The few callers who do check
just panic or BUG_ON.
So make lmb_alloc() panic internally, to catch bugs at the source. The few
callers who did check the result no longer need to.
The only caller that did anything interesting with the return result was
careful_allocation(). For it we create __lmb_alloc_base() which _doesn't_ panic
automatically, a little messy, but passable.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
remove extern declarations of pmac_newworld
move pmac_newworld to bss
if there is any "interrupt-controller" device, then it is newworld.
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>