This removes redundant arch code for generic ptrace requests
already handled by ptrace_request and compat_ptrace_request.
It simplifies things to just have the standard entry points,
and use the generic compat_sys_ptrace.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch fixes a bug with cpu bound guest on kvm-s390. Sometimes it
was impossible to deliver a signal to a spinning guest. We used
preemption as a circumvention. The preemption notifiers called
vcpu_load, which checked for pending signals and triggered a host
intercept. But even with preemption, a sigkill was not delivered
immediately.
This patch changes the low level host interrupt handler to check for the
SIE instruction, if TIF_WORK is set. In that case we change the
instruction pointer of the return PSW to rerun the vcpu_run loop. The kvm
code sees an intercept reason 0 if that happens. This patch adds accounting
for these types of intercept as well.
The advantages:
- works with and without preemption
- signals are delivered immediately
- much better host latencies without preemption
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On return from syscall or interrupt, we have to check if we return to
userspace (likely) and if there is work todo (less likely) to decide
if we handle the work. We can optimize this check: we first check for
the less likely work case and then check for userspace.
This patch is also a preparation for an additional patch, that fixes a bug
in KVM dealing with cpu bound guests.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This replaces the duplicated arch-specific versions of "sys_pipe()" with
one unified implementation. This removes almost 250 lines of duplicated
code.
It's marked __weak, so that *if* an architecture wants to override the
default implementation it can do so by simply having its own replacement
version, since many architectures use alternate calling conventions for
the 'pipe()' system call for legacy reasons (ie traditional UNIX
implementations often return the two file descriptors in registers)
I still haven't changed the cris version even though Linus says the BKL
isn't needed. The arch maintainer can easily do it if there are really
no obstacles.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
TIF_RESTORE_SIGMASK no longer needs to be in the _TIF_WORK_* masks. Those low
bits are scarce, and are all used up now. Renumber TIF_RESTORE_SIGMASK to
free one up.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After the PT_IEEE_IP hack has been removed s390 can now use
the common code sys_ptrace function.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The self referential PT_IEEE_IP ptrace peek & poke calls have been
broken for that last 6 years. For peek the code always returns 0
instead of the last ieee fault and for poke the code does nothing.
Since nobody noticed the code seems to be superfluous. So lets
remove it.
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This adds hugetlbfs support on System z, using both hardware large page
support if available and software large page emulation on older hardware.
Shared (large) page tables are implemented in software emulation mode,
by using page->index of the first tail page from a compound large page
to store page table information.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
From: Heiko Carstens <heiko.carstens@de.ibm.com>
From: Carsten Otte <cotte@de.ibm.com>
This lets us use defines for the magic bits in machine flags instead
of using plain numbers all over the place.
In addition on newer machines features/facilities are indicated by the
result of the stfl instruction. So we use these bits instead of trying
to execute new instructions and check wether we get an exception or
not.
Also the mvpg instruction is always available when in zArch mode,
whereas the idte instruction is only available in zArch mode. This
results in some minor optimizations.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When we get a notification that cpu topology changed, we schedule a
work struct which just calls arch_reinit_sched_domains. This function
in turn calls get_online_cpus() which results int the lockdep warning
below.
After all it turnded out that it's not legal to call get_online_cpus()
from the context of a multi-threaded work queue.
It could deadlock this way:
process 0 (events/cpu-x):
-> run_workqueue
-> removes my work_struct from the work queue
-> calls work_struct->fn
-> get_online_cpus()
-> locks on cpu_hotplug.lock since process 1 below is doing cpu hotplug
process 1:
-> cpu_down (for cpu-x)
-> cpu_hotplug_begin (holds cpu_hotplug.lock now)
-> cpu-x dead
-> notifier_call_chain with CPU_DEAD
-> cleanup_workqueue_thread
-> flush_cpu_workqueue (succeeds)
-> kthread_stop for events/cpu-x
-> now kthread_stop waits for my work_struct to complete from within
process 0. -> dead.
A single threaded workqueue wouldn't have such problems, however there is
no such common queue available and it's not worth to create one for the
very rare calls to arch_reinit_sched_domains.
So we just create a kernel thread from our work struct which calls
arch_reinit_sched_domains and are done with it.
Thanks to Oleg Nesterov and Peter Zijlstra for helping me figuring out
that this isn't a false positive lockdep warning:
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.25-03562-g3dc5063-dirty #12
-------------------------------------------------------
events/3/14 is trying to acquire lock:
(&cpu_hotplug.lock){--..}, at: [<0000000000076094>] get_online_cpus+0x50/0x78
but task is already holding lock:
(topology_work){--..}, at: [<0000000000059cde>] run_workqueue+0x106/0x278
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (topology_work){--..}:
[<000000000006fc74>] __lock_acquire+0x1010/0x111c
[<000000000006fe40>] lock_acquire+0xc0/0xf8
[<0000000000059d48>] run_workqueue+0x170/0x278
[<0000000000059edc>] worker_thread+0x8c/0xf0
[<000000000005f5bc>] kthread+0x68/0xa0
[<000000000001a33e>] kernel_thread_starter+0x6/0xc
[<000000000001a338>] kernel_thread_starter+0x0/0xc
-> #1 (events){--..}:
[<000000000006fc74>] __lock_acquire+0x1010/0x111c
[<000000000006fe40>] lock_acquire+0xc0/0xf8
[<000000000005a23c>] cleanup_workqueue_thread+0x60/0xa8
[<00000000003b2ab8>] workqueue_cpu_callback+0xbc/0x170
[<00000000003bba80>] notifier_call_chain+0x5c/0xa4
[<00000000000655a2>] __raw_notifier_call_chain+0x26/0x38
[<00000000000655e2>] raw_notifier_call_chain+0x2e/0x40
[<0000000000075e00>] cpu_down+0x228/0x31c
[<00000000003b1dd8>] store_online+0x64/0xb8
[<00000000001e7128>] sysdev_store+0x48/0x58
[<0000000000121cd2>] sysfs_write_file+0x126/0x1c0
[<00000000000c1944>] vfs_write+0xb0/0x15c
[<00000000000c20e6>] sys_write+0x56/0x88
[<0000000000027a68>] sys32_write+0x34/0x4c
[<0000000000023f70>] sysc_noemu+0x10/0x16
[<0000000077f3f186>] 0x77f3f186
-> #0 (&cpu_hotplug.lock){--..}:
[<000000000006fa84>] __lock_acquire+0xe20/0x111c
[<000000000006fe40>] lock_acquire+0xc0/0xf8
[<00000000003b701c>] mutex_lock_nested+0xd0/0x364
[<0000000000076094>] get_online_cpus+0x50/0x78
[<000000000003a03e>] arch_reinit_sched_domains+0x26/0x58
[<000000000002700e>] topology_work_fn+0x26/0x34
[<0000000000059d4e>] run_workqueue+0x176/0x278
[<0000000000059edc>] worker_thread+0x8c/0xf0
[<000000000005f5bc>] kthread+0x68/0xa0
[<000000000001a33e>] kernel_thread_starter+0x6/0xc
[<000000000001a338>] kernel_thread_starter+0x0/0xc
other info that might help us debug this:
2 locks held by events/3/14:
#0: (events){--..}, at: [<0000000000059cde>] run_workqueue+0x106/0x278
#1: (topology_work){--..}, at: [<0000000000059cde>] run_workqueue+0x106/0x278
stack backtrace:
CPU: 3 Not tainted 2.6.25-03562-g3dc5063-dirty #12
Process events/3 (pid: 14, task: 000000002fb04038, ksp: 000000002fb0bd70)
0400000000000000 000000002fb0ba40 0000000000000002 0000000000000000
000000002fb0bae0 000000002fb0ba58 000000002fb0ba58 0000000000016488
0000000000000000 000000002fb0bd70 0000000000000000 0000000000000000
000000002fb0ba40 000000000000000c 000000002fb0ba40 000000002fb0bab0
00000000003c99e0 0000000000016488 000000002fb0ba40 000000002fb0ba90
Call Trace:
([<00000000000163fc>] show_trace+0x138/0x158)
[<00000000000164e2>] show_stack+0xc6/0xf8
[<0000000000016624>] dump_stack+0xb0/0xc0
[<000000000006cd36>] print_circular_bug_tail+0xa2/0xb4
[<000000000006fa84>] __lock_acquire+0xe20/0x111c
[<000000000006fe40>] lock_acquire+0xc0/0xf8
[<00000000003b701c>] mutex_lock_nested+0xd0/0x364
[<0000000000076094>] get_online_cpus+0x50/0x78
[<000000000003a03e>] arch_reinit_sched_domains+0x26/0x58
[<000000000002700e>] topology_work_fn+0x26/0x34
[<0000000000059d4e>] run_workqueue+0x176/0x278
[<0000000000059edc>] worker_thread+0x8c/0xf0
[<000000000005f5bc>] kthread+0x68/0xa0
[<000000000001a33e>] kernel_thread_starter+0x6/0xc
[<000000000001a338>] kernel_thread_starter+0x0/0xc
INFO: lockdep is turned off.
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On some smp sysfs store attributes get_online_cpus() may block on
cpu_hotplug.lock, but we hold already smp_cpu_state_mutex. Since the
locking order on cpu hotplug via arch_update_cpu_topology is inverse
this might lead to deadlocks.
So make sure locking order is always the same.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This is where it should be and we can get rid of some externs
and a static inline function.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
New version that does not preserve the marker. Arch maintainers indicate
that the marker functionality is is not needed anymore.
Note you may simplify the s390 asm-offsets.c code further if you use the
OFFSET() macro instead of the DEFINE. See kbuild.h
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
s390 has a strange marker in DEFINE. Undefine the DEFINE from kbuild.h and
define it the way s390 wants it to preserve things as they were.
May be good if the arch maintainer could go over this and check if this
workaround is really necessary.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a proper prototype for __do_softirq() in include/linux/interrupt.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds functionality to detect if the kernel runs under the KVM
hypervisor. A macro MACHINE_IS_KVM is exported for device drivers. This
allows drivers to skip device detection if the systems runs non-virtualized.
We also define a preferred console to avoid having the ttyS0, which is a line
mode only console.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch contains the port of Qumranet's kvm kernel module to IBM zSeries
(aka s390x, mainframe) architecture. It uses the mainframe's virtualization
instruction SIE to run virtual machines with up to 64 virtual CPUs each.
This port is only usable on 64bit host kernels, and can only run 64bit guest
kernels. However, running 31bit applications in guest userspace is possible.
The following source files are introduced by this patch
arch/s390/kvm/kvm-s390.c similar to arch/x86/kvm/x86.c, this implements all
arch callbacks for kvm. __vcpu_run calls back into
sie64a to enter the guest machine context
arch/s390/kvm/sie64a.S assembler function sie64a, which enters guest
context via SIE, and switches world before and after that
include/asm-s390/kvm_host.h contains all vital data structures needed to run
virtual machines on the mainframe
include/asm-s390/kvm.h defines kvm_regs and friends for user access to
guest register content
arch/s390/kvm/gaccess.h functions similar to uaccess to access guest memory
arch/s390/kvm/kvm-s390.h header file for kvm-s390 internals, extended by
later patches
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The SIE instruction on s390 uses the 2nd half of the page table page to
virtualize the storage keys of a guest. This patch offers the s390_enable_sie
function, which reorganizes the page tables of a single-threaded process to
reserve space in the page table:
s390_enable_sie makes sure that the process is single threaded and then uses
dup_mm to create a new mm with reorganized page tables. The old mm is freed
and the process has now a page status extended field after every page table.
Code that wants to exploit pgstes should SELECT CONFIG_PGSTE.
This patch has a small common code hit, namely making dup_mm non-static.
Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's
review feedback. Now we do have the prototype for dup_mm in
include/linux/sched.h. Following Martin's suggestion, s390_enable_sie() does now
call task_lock() to prevent race against ptrace modification of mm_users.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
* 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc:
Remove DEBUG_SEMAPHORE from Kconfig
Improve semaphore documentation
Simplify semaphore implementation
Add down_timeout and change ACPI to use it
Introduce down_killable()
Generic semaphore implementation
Add semaphore.h to kernel_lock.c
Fix quota.h includes
Semaphores are no longer performance-critical, so a generic C
implementation is better for maintainability, debuggability and
extensibility. Thanks to Peter Zijlstra for fixing the lockdep
warning. Thanks to Harvey Harrison for pointing out that the
unlikely() was unnecessary.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Newer s390 models have a breaking-event-address-recording register.
Each time an instruction causes a break in the sequential instruction
execution, the address is saved in that hardware register. On a program
interrupt the address is copied to the lowcore address 272-279, which
makes it software accessible.
This patch changes the program check handler and the stack overflow
checker to copy the value into the pt_regs argument.
The oops output is enhanced to show the last known breaking address.
It might give additional information if the stack trace is corrupted.
The feature is only available on 64 bit.
The new oops output looks like:
[---------snip----------]
Modules linked in: vmcp sunrpc qeth_l2 dm_mod qeth ccwgroup
CPU: 2 Not tainted 2.6.24zlive-host #8
Process modprobe (pid: 4788, task: 00000000bf3d8718, ksp: 00000000b2b0b8e0)
Krnl PSW : 0704200180000000 000003e000020028 (vmcp_init+0x28/0xe4 [vmcp])
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3
Krnl GPRS: 0000000004000002 000003e000020000 0000000000000000 0000000000000001
000000000015734c ffffffffffffffff 000003e0000b3b00 0000000000000000
000003e00007ca30 00000000b5bb5d40 00000000b5bb5800 000003e0000b3b00
000003e0000a2000 00000000003ecf50 00000000b2b0bd50 00000000b2b0bcb0
Krnl Code: 000003e000020018: c0c000040ff4 larl %r12,3e0000a2000
000003e00002001e: e3e0f0000024 stg %r14,0(%r15)
000003e000020024: a7f40001 brc 15,3e000020026
>000003e000020028: e310c0100004 lg %r1,16(%r12)
000003e00002002e: c020000413dc larl %r2,3e0000a27e6
000003e000020034: c0a00004aee6 larl %r10,3e0000b5e00
000003e00002003a: a7490001 lghi %r4,1
000003e00002003e: a75900f0 lghi %r5,240
Call Trace:
([<000000000014b300>] blocking_notifier_call_chain+0x2c/0x40)
[<000000000015735c>] sys_init_module+0x19d8/0x1b08
[<0000000000110afc>] sysc_noemu+0x10/0x16
[<000002000011cda2>] 0x2000011cda2
Last Breaking-Event-Address:
[<000003e000020024>] vmcp_init+0x24/0xe4 [vmcp]
[---------snip----------]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Most noteable part of this commit is the new local header file entry.h
which contains all the function declarations of functions that get only
called from asm code or are arch internal. That way we can avoid extern
declarations in C files.
This is more or less the same that was done for sparc64.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
This way we get rid of s390's NO_IDLE_HZ and use the generic dynticks
variant instead. In addition we get high resolution timers for free.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Remove the program check generating monitor calls and use function
calls instead. Theres is no real advantage in using monitor calls,
but they do make debugging harder, because of all the program checks
it generates.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The new function supports setting of permissions for the debugfs files
created by the debug feature. In addition to that, the function provides
uid and gid as parameters for future use. Currently only root is allowed
for uid and gid.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Add get_clock_xt to read an 8 byte clock value using store clock
extended (STCKE) and use get_clock_xt for sched_clock. STCKE should
be faster than STCK on newer machines.
Signed-off-by: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
If vertical cpu polarization is active then the hypervisor will
dispatch certain cpus for a longer time than other cpus for maximum
performance. For example if a guest would have three virtual cpus,
each of them with a share of 33 percent, then in case of vertical
cpu polarization all of the processing time would be combined to a
single cpu which would run all the time, while the other two cpus
would get nearly no cpu time.
There are three different types of vertical cpus: high, medium and
low. Low cpus hardly get any real cpu time, while high cpus get a
full real cpu. Medium cpus get something in between.
In order to switch between the two possible modes (default is
horizontal) a 0 for horizontal polarization or a 1 for vertical
polarization must be written to the dispatching sysfs attribute:
/sys/devices/system/cpu/dispatching
The polarization of each single cpu can be figured out by the
polarization sysfs attribute of each cpu:
/sys/devices/system/cpu/cpuX/polarization
horizontal, vertical:high, vertical:medium, vertical:low or unknown.
When switching polarization the polarization attribute may contain
the value unknown until the configuration change is done and the
kernel has figured out the new polarization of each cpu.
Note that running a system with different types of vertical cpus may
result in significant performance regressions. If possible only one
type of vertical cpus should be used. All other cpus should be
offlined.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Add s390 backend so we can give the scheduler some hints about the
cpu topology.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Make stfle visible so other code can call this.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
This is just a port of 83bd01024b
"x86: protect against sigaltstack wraparound".
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
/sys/firmware/reipl/nss/name contains the nss name when defsys or
savesys command has been executed. If the defsys or savesys command
fails the kernel_nss_name has to be cleared since a reipl on that
nss name won't be possible.
Signed-off-by: Hongjie Yang <hongjie@us.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Normally this should not happen, but it's cleaner to do it that way.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Compile smp.o with -Wno-nonnull so gcc stops warning about memcpy
being used with a null parameter. Also remove the workaround code
and use a char * cast instead of a void * cast to do computations.
Cc: Bastian Blank <bastian@waldi.eu.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If a machine check handling is pending when the idle loop is entered
default_idle will be left with timer ticks and virtual timer disabled.
Fix this by "calling" the idle_chain. Also a BUG_ON(!in_interrupt) in
start_hz_timer must be removed since the function now gets called from
non interrupt context as well.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since a5fbb6d106
"KVM: fix !SMP build error" smp_call_function isn't a define anymore
that folds into nothing but a define that calls up_smp_call_function
with all parameters. Hence we cannot #ifdef out the unused code
anymore...
This seems to be the preferred method, so do this for s390 as well.
arch/s390/kernel/time.c: In function 'etr_sync_clock':
arch/s390/kernel/time.c:825: error: 'clock_sync_cpu_start' undeclared
arch/s390/kernel/time.c:862: error: 'clock_sync_cpu_end' undeclared
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Just copy the first 512 read-only bytes of the current cpu lowcore if
a new cpu gets onlined. The rest is zeroed out and must be explicitly
initialized. Current code just copies the entire lowcore and
initializes the needed fields.
This should reveal bugs in future enhancements quite early.
Also when the lowcore of the first cpu is replaced this is now done
atomically (no interrupts, no machine checks).
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If both NO_IDLE_HZ and VIRT_TIMER are disabled default_idle won't load
an enabled wait psw and busy loop instead. This is because the
idle_chain is empty and the return value of atomic_notifier_call_chain
will be NOTIFY_DONE, which causes default_idle to return instead of
loading an enabled wait psw.
Fix this by calling __atomic_notifier_call_chain instead and add proper
return value handling.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add support for different number of page table levels dependent
on the highest address used for a process. This will cause a 31 bit
process to use a two level page table instead of the four level page
table that is the default after the pud has been introduced. Likewise
a normal 64 bit process will use three levels instead of four. Only
if a process runs out of the 4 tera bytes which can be addressed with
a three level page table the fourth level is dynamically added. Then
the process can use up to 8 peta byte.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently we possibly lookup the pid in the wrong pid namespace. So
seq_file convert proc_pid_status which ensures the proper pid namespaces is
passed in.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: another build fix]
[akpm@linux-foundation.org: s390 build fix]
[akpm@linux-foundation.org: fix task_name() output]
[akpm@linux-foundation.org: fix nommu build]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andrew Morgan <morgan@kernel.org>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset adds a flags variable to reserve_bootmem() and uses the
BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions
between crashkernel area and already used memory.
This patch:
Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE.
If that flag is set, the function returns with -EBUSY if the memory already
has been reserved in the past. This is to avoid conflicts.
Because that code runs before SMP initialisation, there's no race condition
inside reserve_bootmem_core().
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: <linux-arch@vger.kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] dcss: Initialize workqueue before using it.
[S390] Remove BUILD_BUG_ON() in vmem code.
[S390] sclp_tty/sclp_vt220: Fix scheduling while atomic
[S390] dasd: fix panic caused by alias device offline
[S390] dasd: add ifcc handling
[S390] latencytop s390 support.
[S390] Implement ext2_find_next_bit.
[S390] Cleanup & optimize bitops.
[S390] Define GENERIC_LOCKBREAK.
[S390] console: allow vt220 console to be the only console
[S390] Fix couple of section mismatches.
[S390] Fix smp_call_function_mask semantics.
[S390] Fix linker script.
[S390] DEBUG_PAGEALLOC support for s390.
[S390] cio: Add shutdown callback for ccwgroup.
[S390] cio: Update documentation.
[S390] cio: Clean up chsc response code handling.
[S390] cio: make sense id procedure work with partial hardware response
This is the new timerfd API as it is implemented by the following patch:
int timerfd_create(int clockid, int flags);
int timerfd_settime(int ufd, int flags,
const struct itimerspec *utmr,
struct itimerspec *otmr);
int timerfd_gettime(int ufd, struct itimerspec *otmr);
The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.
The timerfd_settime() API give new settings by the timerfd fd, by optionally
retrieving the previous expiration time (in case the "otmr" parameter is not
NULL).
The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
is set in the "flags" parameter. Otherwise it's a relative time.
The timerfd_gettime() API returns the next expiration time of the timer, or
{0, 0} if the timerfd has not been set yet.
Like the previous timerfd API implementation, read(2) and poll(2) are
supported (with the same interface). Here's a simple test program I used to
exercise the new timerfd APIs:
http://www.xmailserver.org/timerfd-test2.c
[akpm@linux-foundation.org: coding-style cleanups]
[akpm@linux-foundation.org: fix ia64 build]
[akpm@linux-foundation.org: fix m68k build]
[akpm@linux-foundation.org: fix mips build]
[akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
[heiko.carstens@de.ibm.com: fix s390]
[akpm@linux-foundation.org: fix powerpc build]
[akpm@linux-foundation.org: fix sparc64 more]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>