android_kernel_xiaomi_sm8350/kernel
Nick Piggin cc7a486cac generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask()
* Venki Pallipadi <venkatesh.pallipadi@intel.com> wrote:

> Found a OOPS on a big SMP box during an overnight reboot test with
> upstream git.
>
> Suresh and I looked at the oops and looks like the root cause is in
> generic_smp_call_function_interrupt() and smp_call_function_mask() with
> wait parameter.
>
> The actual oops looked like
>
> [   11.277260] BUG: unable to handle kernel paging request at ffff8802ffffffff
> [   11.277815] IP: [<ffff8802ffffffff>] 0xffff8802ffffffff
> [   11.278155] PGD 202063 PUD 0
> [   11.278576] Oops: 0010 [1] SMP
> [   11.279006] CPU 5
> [   11.279336] Modules linked in:
> [   11.279752] Pid: 0, comm: swapper Not tainted 2.6.27-rc2-00020-g685d87f #290
> [   11.280039] RIP: 0010:[<ffff8802ffffffff>]  [<ffff8802ffffffff>] 0xffff8802ffffffff
> [   11.280692] RSP: 0018:ffff88027f1f7f70  EFLAGS: 00010086
> [   11.280976] RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000000
> [   11.281264] RDX: 0000000000004f4e RSI: 0000000000000001 RDI: 0000000000000000
> [   11.281624] RBP: ffff88027f1f7f98 R08: 0000000000000001 R09: ffffffff802509af
> [   11.281925] R10: ffff8800280c2780 R11: 0000000000000000 R12: ffff88027f097d48
> [   11.282214] R13: ffff88027f097d70 R14: 0000000000000005 R15: ffff88027e571000
> [   11.282502] FS:  0000000000000000(0000) GS:ffff88027f1c3340(0000) knlGS:0000000000000000
> [   11.283096] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [   11.283382] CR2: ffff8802ffffffff CR3: 0000000000201000 CR4: 00000000000006e0
> [   11.283760] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   11.284048] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [   11.284337] Process swapper (pid: 0, threadinfo ffff88027f1f2000, task ffff88027f1f0640)
> [   11.284936] Stack:  ffffffff80250963 0000000000000212 0000000000ee8c78 0000000000ee8a66
> [   11.285802]  ffff88027e571550 ffff88027f1f7fa8 ffffffff8021adb5 ffff88027f1f3e40
> [   11.286599]  ffffffff8020bdd6 ffff88027f1f3e40 <EOI>  ffff88027f1f3ef8 0000000000000000
> [   11.287120] Call Trace:
> [   11.287768]  <IRQ>  [<ffffffff80250963>] ? generic_smp_call_function_interrupt+0x61/0x12c
> [   11.288354]  [<ffffffff8021adb5>] smp_call_function_interrupt+0x17/0x27
> [   11.288744]  [<ffffffff8020bdd6>] call_function_interrupt+0x66/0x70
> [   11.289030]  <EOI>  [<ffffffff8024ab3b>] ? clockevents_notify+0x19/0x73
> [   11.289380]  [<ffffffff803b9b75>] ? acpi_idle_enter_simple+0x18b/0x1fa
> [   11.289760]  [<ffffffff803b9b6b>] ? acpi_idle_enter_simple+0x181/0x1fa
> [   11.290051]  [<ffffffff8053aeca>] ? cpuidle_idle_call+0x70/0xa2
> [   11.290338]  [<ffffffff80209f61>] ? cpu_idle+0x5f/0x7d
> [   11.290723]  [<ffffffff8060224a>] ? start_secondary+0x14d/0x152
> [   11.291010]
> [   11.291287]
> [   11.291654] Code:  Bad RIP value.
> [   11.292041] RIP  [<ffff8802ffffffff>] 0xffff8802ffffffff
> [   11.292380]  RSP <ffff88027f1f7f70>
> [   11.292741] CR2: ffff8802ffffffff
> [   11.310951] ---[ end trace 137c54d525305f1c ]---
>
> The problem is with the following sequence of events:
>
> - CPU A calls smp_call_function_mask() for CPU B with wait parameter
> - CPU A sets up the call_function_data on the stack and does an rcu add to
>   call_function_queue
> - CPU A waits until the WAIT flag is cleared
> - CPU B gets the call function interrupt and starts going through the
>   call_function_queue
> - CPU C also gets some other call function interrupt and starts going through
>   the call_function_queue
> - CPU C, which is also going through the call_function_queue, starts referencing
>   CPU A's stack, as that element is still in call_function_queue
> - CPU B finishes the function call that CPU A set up and as there are no other
>   references to it, rcu deletes the call_function_data (which was from CPU A
>   stack)
> - CPU B sees the wait flag and just clears the flag (no call_rcu to free)
> - CPU A which was waiting on the flag continues executing and the stack
>   contents change
>
> - CPU C is still in rcu_read section accessing the CPU A's stack sees
>   inconsistent call_funation_data and can try to execute
>   function with some random pointer, causing stack corruption for A
>   (by clearing the bits in mask field) and oops.

Nice debugging work.

I'd suggest something like the attached (boot tested) patch as the simple
fix for now.

I expect the benefits from the less synchronized, multiple-in-flight-data
global queue will still outweigh the costs of dynamic allocations. But
if worst comes to worst then we just go back to a globally synchronous
one-at-a-time implementation, but that would be pretty sad!

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-11 15:21:28 +02:00
..
irq genirq: better warning on irqchip->set_type() failure 2008-08-05 14:33:47 -07:00
power kexec jump: save/restore device state 2008-07-26 12:00:04 -07:00
time cpumask: change cpumask_of_cpu_ptr to use new cpumask_of_cpu 2008-07-26 16:40:33 +02:00
trace Merge branch 'linus' into cpus4096 2008-07-28 23:32:00 +02:00
.gitignore
acct.c bsdacct: fix and add comments around acct_process() 2008-07-25 10:53:47 -07:00
audit_tree.c
audit.c [PATCH] Fix the bug of using AUDIT_STATUS_RATE_LIMIT when set fail, no error output. 2008-08-01 12:15:16 -04:00
audit.h
auditfilter.c Re: [PATCH] the loginuid field should be output in all AUDIT_CONFIG_CHANGE audit messages 2008-08-01 12:15:03 -04:00
auditsc.c Re: [PATCH] Fix the kernel panic of audit_filter_task when key field is set 2008-08-04 06:13:50 -04:00
backtracetest.c backtrace: replace timer with tasklet + completions 2008-06-27 18:09:16 +02:00
bounds.c
capability.c security: filesystem capabilities refactor kernel code 2008-07-24 10:47:22 -07:00
cgroup_debug.c
cgroup.c cgroup: uninline cgroup_has_css_refs() 2008-07-30 09:41:44 -07:00
compat.c
configs.c
cpu.c Merge branch 'linus' into cpus4096 2008-07-28 23:32:00 +02:00
cpuset.c cpuset: clean up cpuset hierarchy traversal code 2008-07-30 09:41:44 -07:00
delayacct.c per-task-delay-accounting: update taskstats for memory reclaim delay 2008-07-25 10:53:47 -07:00
dma-coherent.c dma: fix order calculation in dma_mark_declared_memory_occupied() 2008-08-05 14:33:49 -07:00
dma.c
exec_domain.c [PATCH] kill altroot 2008-07-26 20:53:20 -04:00
exit.c tracehook: fix exit_signal=0 case 2008-08-01 12:01:11 -07:00
extable.c
fork.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
futex_compat.c
futex.c futexes: fix fault handling in futex_lock_pi 2008-06-23 13:31:15 +02:00
hrtimer.c Merge branch 'generic-ipi' into generic-ipi-for-linus 2008-07-15 21:55:59 +02:00
itimer.c
kallsyms.c kallsyms: fix potential overflow in binary search 2008-07-25 10:53:27 -07:00
Kconfig.hz sched: fix hrtick & generic-ipi dependency 2008-07-23 11:18:28 +02:00
Kconfig.preempt
kexec.c kexec jump: save/restore device state 2008-07-26 12:00:04 -07:00
kfifo.c
kgdb.c kgdb: fix gdb serial thread queries 2008-08-01 08:39:35 -05:00
kmod.c call_usermodehelper(): increase reliability 2008-07-25 10:53:28 -07:00
kprobes.c kprobes: remove redundant config check 2008-07-25 10:53:30 -07:00
ksysfs.c
kthread.c tracehook: wait_task_inactive 2008-07-26 12:00:09 -07:00
latencytop.c
lockdep_internals.h lockdep: add lock_class information to lock_chain and output it 2008-06-24 01:28:20 +02:00
lockdep_proc.c lockdep: add lock_class information to lock_chain and output it 2008-06-24 01:28:20 +02:00
lockdep.c Merge branch 'core/locking' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-07-14 14:55:13 -07:00
Makefile Merge branch 'linus' into core/generic-dma-coherent 2008-07-29 00:07:55 +02:00
marker.c markers: fix markers read barrier for multiple probes 2008-07-30 09:41:45 -07:00
module.c stop_machine: Wean existing callers off stop_machine_run() 2008-07-28 12:16:31 +10:00
mutex-debug.c
mutex-debug.h
mutex.c locking: fix mutex @key parameter kernel-doc notation 2008-07-28 18:12:36 +02:00
mutex.h
notifier.c
ns_cgroup.c cgroup_clone: use pid of newly created task for new cgroup 2008-07-25 10:53:37 -07:00
nsproxy.c cgroup_clone: use pid of newly created task for new cgroup 2008-07-25 10:53:37 -07:00
panic.c Add a WARN() macro; this is WARN_ON() + printk arguments 2008-07-25 10:53:29 -07:00
params.c
pid_namespace.c bsdacct: switch from global bsd_acct_struct instance to per-pidns one 2008-07-25 10:53:47 -07:00
pid.c pidns: remove now unused find_pid function. 2008-07-25 10:53:45 -07:00
pm_qos_params.c pm_qos: spelling fixes 2008-08-05 14:33:50 -07:00
posix-cpu-timers.c
posix-timers.c posix timers: release_posix_timer: kill the bogus put_task_struct(->it_process); 2008-07-25 10:53:38 -07:00
printk.c printk: fix comment for printk ratelimiting 2008-07-30 09:41:45 -07:00
profile.c build kernel/profile.o only when requested 2008-07-25 10:53:27 -07:00
ptrace.c tracehook: wait_task_inactive 2008-07-26 12:00:09 -07:00
rcuclassic.c stop_machine: Wean existing callers off stop_machine_run() 2008-07-28 12:16:31 +10:00
rcupdate.c Merge branch 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-07-15 14:12:03 -07:00
rcupreempt_trace.c
rcupreempt.c Merge branch 'linus' into cpus4096 2008-07-16 00:29:07 +02:00
rcutorture.c rcu: make rcutorture even more vicious: invoke RCU readers from irq handlers (timers) 2008-06-26 09:24:33 +02:00
relay.c relay: fix "full buffer with exactly full last subbuffer" accounting problem 2008-08-05 14:33:46 -07:00
res_counter.c cgroup files: convert res_counter_write() to be a cgroups write_string() handler 2008-07-25 10:53:36 -07:00
resource.c resource: add resource_size() 2008-07-30 09:41:43 -07:00
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c sysdev: Pass the attribute to the low level sysdev show/store function 2008-07-21 21:55:02 -07:00
rtmutex.c
rtmutex.h
rwsem.c
sched_clock.c Merge branch 'sched/clock' into sched/devel 2008-07-14 12:19:13 +02:00
sched_cpupri.c sched: use a 2-d bitmap for searching lowest-pri CPU 2008-06-06 15:19:28 +02:00
sched_cpupri.h sched: fix the cpuprio count really 2008-06-06 15:19:44 +02:00
sched_debug.c sched: add full schedstats to /proc/sched_debug 2008-06-27 14:31:31 +02:00
sched_fair.c Merge branch 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-07-23 19:36:53 -07:00
sched_features.h sched: bias effective_load() error towards failing wake_affine(). 2008-06-27 14:31:47 +02:00
sched_idletask.c
sched_rt.c Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-07-24 12:53:51 -07:00
sched_stats.h sched: fix accounting in task delay accounting & migration 2008-07-04 12:50:23 +02:00
sched.c __sched_setscheduler: don't do any policy checks when not "user" 2008-08-04 17:16:20 -07:00
seccomp.c
semaphore.c semaphore: __down_common: use signal_pending_state() 2008-08-05 14:33:47 -07:00
signal.c tracehook: force signal_pending() 2008-07-26 12:00:09 -07:00
smp.c generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask() 2008-08-11 15:21:28 +02:00
softirq.c Full conversion to early_initcall() interface, remove old interface 2008-07-26 12:00:04 -07:00
softlockup.c Full conversion to early_initcall() interface, remove old interface 2008-07-26 12:00:04 -07:00
spinlock.c
srcu.c
stacktrace.c stacktrace: fix modular build, export print_stack_trace and save_stack_trace 2008-06-30 09:20:55 +02:00
stop_machine.c stop_machine(): stop_machine_run() changed to use cpu mask 2008-07-28 12:16:30 +10:00
sys_ni.c signalfd: fix undefined reference to `compat_sys_signalfd4' when !CONFIG_SIGNALFD 2008-07-25 11:35:41 -07:00
sys.c kexec jump 2008-07-26 12:00:04 -07:00
sysctl_check.c sysctl: check for bogus modes 2008-07-25 10:53:45 -07:00
sysctl.c lost sysctl fix 2008-07-27 09:45:34 -07:00
taskstats.c taskstats: remove initialization of static per-cpu variable 2008-07-25 10:53:47 -07:00
test_kprobes.c
time.c
timeconst.pl
timer.c Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm 2008-07-14 16:06:58 -07:00
tsacct.c task IO accounting: move all IO statistics in struct task_io_accounting 2008-07-27 16:12:28 -07:00
uid16.c
user_namespace.c
user.c
utsname_sysctl.c
utsname.c
wait.c
workqueue.c workqueues: add comments to __create_workqueue_key() 2008-07-30 09:41:47 -07:00