ff6552e4f3
4 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Heiko Carstens
|
fd781fa25c |
[S390] cpu topology: Fix possible deadlock.
When we get a notification that cpu topology changed, we schedule a work struct which just calls arch_reinit_sched_domains. This function in turn calls get_online_cpus() which results int the lockdep warning below. After all it turnded out that it's not legal to call get_online_cpus() from the context of a multi-threaded work queue. It could deadlock this way: process 0 (events/cpu-x): -> run_workqueue -> removes my work_struct from the work queue -> calls work_struct->fn -> get_online_cpus() -> locks on cpu_hotplug.lock since process 1 below is doing cpu hotplug process 1: -> cpu_down (for cpu-x) -> cpu_hotplug_begin (holds cpu_hotplug.lock now) -> cpu-x dead -> notifier_call_chain with CPU_DEAD -> cleanup_workqueue_thread -> flush_cpu_workqueue (succeeds) -> kthread_stop for events/cpu-x -> now kthread_stop waits for my work_struct to complete from within process 0. -> dead. A single threaded workqueue wouldn't have such problems, however there is no such common queue available and it's not worth to create one for the very rare calls to arch_reinit_sched_domains. So we just create a kernel thread from our work struct which calls arch_reinit_sched_domains and are done with it. Thanks to Oleg Nesterov and Peter Zijlstra for helping me figuring out that this isn't a false positive lockdep warning: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.25-03562-g3dc5063-dirty #12 ------------------------------------------------------- events/3/14 is trying to acquire lock: (&cpu_hotplug.lock){--..}, at: [<0000000000076094>] get_online_cpus+0x50/0x78 but task is already holding lock: (topology_work){--..}, at: [<0000000000059cde>] run_workqueue+0x106/0x278 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (topology_work){--..}: [<000000000006fc74>] __lock_acquire+0x1010/0x111c [<000000000006fe40>] lock_acquire+0xc0/0xf8 [<0000000000059d48>] run_workqueue+0x170/0x278 [<0000000000059edc>] worker_thread+0x8c/0xf0 [<000000000005f5bc>] kthread+0x68/0xa0 [<000000000001a33e>] kernel_thread_starter+0x6/0xc [<000000000001a338>] kernel_thread_starter+0x0/0xc -> #1 (events){--..}: [<000000000006fc74>] __lock_acquire+0x1010/0x111c [<000000000006fe40>] lock_acquire+0xc0/0xf8 [<000000000005a23c>] cleanup_workqueue_thread+0x60/0xa8 [<00000000003b2ab8>] workqueue_cpu_callback+0xbc/0x170 [<00000000003bba80>] notifier_call_chain+0x5c/0xa4 [<00000000000655a2>] __raw_notifier_call_chain+0x26/0x38 [<00000000000655e2>] raw_notifier_call_chain+0x2e/0x40 [<0000000000075e00>] cpu_down+0x228/0x31c [<00000000003b1dd8>] store_online+0x64/0xb8 [<00000000001e7128>] sysdev_store+0x48/0x58 [<0000000000121cd2>] sysfs_write_file+0x126/0x1c0 [<00000000000c1944>] vfs_write+0xb0/0x15c [<00000000000c20e6>] sys_write+0x56/0x88 [<0000000000027a68>] sys32_write+0x34/0x4c [<0000000000023f70>] sysc_noemu+0x10/0x16 [<0000000077f3f186>] 0x77f3f186 -> #0 (&cpu_hotplug.lock){--..}: [<000000000006fa84>] __lock_acquire+0xe20/0x111c [<000000000006fe40>] lock_acquire+0xc0/0xf8 [<00000000003b701c>] mutex_lock_nested+0xd0/0x364 [<0000000000076094>] get_online_cpus+0x50/0x78 [<000000000003a03e>] arch_reinit_sched_domains+0x26/0x58 [<000000000002700e>] topology_work_fn+0x26/0x34 [<0000000000059d4e>] run_workqueue+0x176/0x278 [<0000000000059edc>] worker_thread+0x8c/0xf0 [<000000000005f5bc>] kthread+0x68/0xa0 [<000000000001a33e>] kernel_thread_starter+0x6/0xc [<000000000001a338>] kernel_thread_starter+0x0/0xc other info that might help us debug this: 2 locks held by events/3/14: #0: (events){--..}, at: [<0000000000059cde>] run_workqueue+0x106/0x278 #1: (topology_work){--..}, at: [<0000000000059cde>] run_workqueue+0x106/0x278 stack backtrace: CPU: 3 Not tainted 2.6.25-03562-g3dc5063-dirty #12 Process events/3 (pid: 14, task: 000000002fb04038, ksp: 000000002fb0bd70) 0400000000000000 000000002fb0ba40 0000000000000002 0000000000000000 000000002fb0bae0 000000002fb0ba58 000000002fb0ba58 0000000000016488 0000000000000000 000000002fb0bd70 0000000000000000 0000000000000000 000000002fb0ba40 000000000000000c 000000002fb0ba40 000000002fb0bab0 00000000003c99e0 0000000000016488 000000002fb0ba40 000000002fb0ba90 Call Trace: ([<00000000000163fc>] show_trace+0x138/0x158) [<00000000000164e2>] show_stack+0xc6/0xf8 [<0000000000016624>] dump_stack+0xb0/0xc0 [<000000000006cd36>] print_circular_bug_tail+0xa2/0xb4 [<000000000006fa84>] __lock_acquire+0xe20/0x111c [<000000000006fe40>] lock_acquire+0xc0/0xf8 [<00000000003b701c>] mutex_lock_nested+0xd0/0x364 [<0000000000076094>] get_online_cpus+0x50/0x78 [<000000000003a03e>] arch_reinit_sched_domains+0x26/0x58 [<000000000002700e>] topology_work_fn+0x26/0x34 [<0000000000059d4e>] run_workqueue+0x176/0x278 [<0000000000059edc>] worker_thread+0x8c/0xf0 [<000000000005f5bc>] kthread+0x68/0xa0 [<000000000001a33e>] kernel_thread_starter+0x6/0xc [<000000000001a338>] kernel_thread_starter+0x0/0xc INFO: lockdep is turned off. Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> |
||
Heiko Carstens
|
d00aa4e7d0 |
[S390] Add topology_core_siblings to topology.h
This exposes the core siblings to user space via sysfs. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> |
||
Heiko Carstens
|
c10fde0d9e |
[S390] Vertical cpu management.
If vertical cpu polarization is active then the hypervisor will dispatch certain cpus for a longer time than other cpus for maximum performance. For example if a guest would have three virtual cpus, each of them with a share of 33 percent, then in case of vertical cpu polarization all of the processing time would be combined to a single cpu which would run all the time, while the other two cpus would get nearly no cpu time. There are three different types of vertical cpus: high, medium and low. Low cpus hardly get any real cpu time, while high cpus get a full real cpu. Medium cpus get something in between. In order to switch between the two possible modes (default is horizontal) a 0 for horizontal polarization or a 1 for vertical polarization must be written to the dispatching sysfs attribute: /sys/devices/system/cpu/dispatching The polarization of each single cpu can be figured out by the polarization sysfs attribute of each cpu: /sys/devices/system/cpu/cpuX/polarization horizontal, vertical:high, vertical:medium, vertical:low or unknown. When switching polarization the polarization attribute may contain the value unknown until the configuration change is done and the kernel has figured out the new polarization of each cpu. Note that running a system with different types of vertical cpus may result in significant performance regressions. If possible only one type of vertical cpus should be used. All other cpus should be offlined. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> |
||
Heiko Carstens
|
dbd70fb499 |
[S390] cpu topology support for s390.
Add s390 backend so we can give the scheduler some hints about the cpu topology. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> |