android_kernel_xiaomi_sm8350/kernel
Paul Mackerras 564c2b210a perf_counter: Optimize context switch between identical inherited contexts
When monitoring a process and its descendants with a set of inherited
counters, we can often get the situation in a context switch where
both the old (outgoing) and new (incoming) process have the same set
of counters, and their values are ultimately going to be added together.
In that situation it doesn't matter which set of counters are used to
count the activity for the new process, so there is really no need to
go through the process of reading the hardware counters and updating
the old task's counters and then setting up the PMU for the new task.

This optimizes the context switch in this situation.  Instead of
scheduling out the perf_counter_context for the old task and
scheduling in the new context, we simply transfer the old context
to the new task and keep using it without interruption.  The new
context gets transferred to the old task.  This means that both
tasks still have a valid perf_counter_context, so no special case
is introduced when the old task gets scheduled in again, either on
this CPU or another CPU.

The equivalence of contexts is detected by keeping a pointer in
each cloned context pointing to the context it was cloned from.
To cope with the situation where a context is changed by adding
or removing counters after it has been cloned, we also keep a
generation number on each context which is incremented every time
a context is changed.  When a context is cloned we take a copy
of the parent's generation number, and two cloned contexts are
equivalent only if they have the same parent and the same
generation number.  In order that the parent context pointer
remains valid (and is not reused), we increment the parent
context's reference count for each context cloned from it.

Since we don't have individual fds for the counters in a cloned
context, the only thing that can make two clones of a given parent
different after they have been cloned is enabling or disabling all
counters with prctl.  To account for this, we keep a count of the
number of enabled counters in each context.  Two contexts must have
the same number of enabled counters to be considered equivalent.

Here are some measurements of the context switch time as measured with
the lat_ctx benchmark from lmbench, comparing the times obtained with
and without this patch series:

		-----Unmodified-----		With this patch series
Counters:	none	2 HW	4H+4S	none	2 HW	4H+4S

2 processes:
Average		3.44	6.45	11.24	3.12	3.39	3.60
St dev		0.04	0.04	0.13	0.05	0.17	0.19

8 processes:
Average		6.45	8.79	14.00	5.57	6.23	7.57
St dev		1.27	1.04	0.88	1.42	1.46	1.42

32 processes:
Average		5.56	8.43	13.78	5.28	5.55	7.15
St dev		0.41	0.47	0.53	0.54	0.57	0.81

The numbers are the mean and standard deviation of 20 runs of
lat_ctx.  The "none" columns are lat_ctx run directly without any
counters.  The "2 HW" columns are with lat_ctx run under perfstat,
counting cycles and instructions.  The "4H+4S" columns are lat_ctx run
under perfstat with 4 hardware counters and 4 software counters
(cycles, instructions, cache references, cache misses, task
clock, context switch, cpu migrations, and page faults).

[ Impact: performance optimization of counter context-switches ]

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <18966.10666.517218.332164@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-22 12:18:20 +02:00
..
irq Revert "genirq: assert that irq handlers are indeed running in hardirq context" 2009-05-01 15:16:04 +02:00
power PM/Hibernate: Fix waiting for image device to appear on resume 2009-04-24 15:31:30 -07:00
time clockevents: prevent endless loop in tick_handle_periodic() 2009-05-02 10:22:27 +02:00
trace tracing: fix ref count in splice pages 2009-04-29 08:02:44 +02:00
.gitignore
acct.c [CVE-2009-0029] System call wrappers part 04 2009-01-14 14:15:19 +01:00
async.c async: remove the temporary (2.6.29) "async is off by default" code 2009-03-28 13:05:30 -07:00
audit_tree.c No need for crossing to mountpoint in audit_tag_tree() 2009-04-20 23:01:15 -04:00
audit.c Audit: remove spaces from audit_log_d_path 2009-04-05 13:49:04 -04:00
audit.h fixing audit rule ordering mess, part 1 2009-01-04 15:14:41 -05:00
auditfilter.c inotify: use GFP_NOFS in kernel_event() to work around a lockdep false-positive 2009-05-06 16:36:09 -07:00
auditsc.c Audit: remove spaces from audit_log_d_path 2009-04-05 13:49:04 -04:00
backtracetest.c
bounds.c
capability.c [CVE-2009-0029] System call wrappers part 04 2009-01-14 14:15:19 +01:00
cgroup_debug.c debug cgroup: remove unneeded cgroup_lock 2009-04-02 19:04:54 -07:00
cgroup_freezer.c
cgroup.c Convert obvious places to deactivate_locked_super() 2009-05-09 10:49:40 -04:00
compat.c signals: implement sys_rt_tgsigqueueinfo 2009-04-30 19:24:24 +02:00
configs.c
cpu.c cpumask: use set_cpu_active in init/main.c 2009-03-30 22:05:12 +10:30
cpuset.c cpusets: prevent PF_THREAD_BOUND tasks from attaching to non-root cpusets 2009-04-02 19:04:57 -07:00
cred-internals.h
cred.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 2009-01-09 13:59:25 -08:00
delayacct.c
dma-coherent.c dma-coherent: Restore dma_alloc_from_coherent() large alloc fall back policy. 2009-01-21 18:51:53 +09:00
dma.c
exec_domain.c Get rid of indirect include of fs_struct.h 2009-03-31 23:00:27 -04:00
exit.c perf_counter: Dynamically allocate tasks' perf_counter_context struct 2009-05-22 12:18:19 +02:00
extable.c Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-05 11:04:19 -07:00
fork.c perf_counter: Dynamically allocate tasks' perf_counter_context struct 2009-05-22 12:18:19 +02:00
freezer.c
futex_compat.c
futex.c futex: comment requeue key reference semantics 2009-04-02 23:39:53 +02:00
hrtimer.c hrtimer: fix rq->lock inversion (again) 2009-03-31 14:52:52 +02:00
hung_task.c softlockup: ensure the task has been switched out once 2009-02-11 11:04:16 +01:00
itimer.c timers: split process wide cpu clocks/timers 2009-02-05 13:04:33 +01:00
kallsyms.c Ksplice: Add functions for walking kallsyms symbols 2009-03-31 13:05:32 +10:30
Kconfig.freezer
Kconfig.hz
Kconfig.preempt
kexec.c kexec: vmcoreinfo_data[] can become static 2009-04-02 19:05:04 -07:00
kfifo.c
kgdb.c sysrq, intel_fb: fix sysrq g collision 2009-05-15 07:56:24 -05:00
kmod.c module: create a request_module_nowait() 2009-03-31 13:05:35 +10:30
kprobes.c kprobes: fix to use text_mutex around arm/disarm kprobe 2009-05-08 16:23:48 -07:00
ksysfs.c kernel/ksysfs.c:fix dependence on CONFIG_NET 2009-01-06 10:44:31 -08:00
kthread.c kthread: move sched-realeted initialization from kthreadd context 2009-04-09 09:50:37 +09:30
latencytop.c sched, latencytop: incorporate review feedback from Andrew Morton 2009-02-11 10:18:04 +01:00
lockdep_internals.h lockdep: get_user_chars() redo 2009-02-14 23:28:22 +01:00
lockdep_proc.c lockstat: warn about disabled lock debugging 2009-02-14 23:28:28 +01:00
lockdep_states.h lockdep: move state bit definitions around 2009-02-14 23:27:59 +01:00
lockdep.c lockdep: more robust lockdep_map init sequence 2009-04-17 18:00:00 +02:00
Makefile Merge commit 'v2.6.30-rc1' into perfcounters/core 2009-04-08 10:35:30 +02:00
marker.c
module.c async: Fix module loading async-work regression 2009-04-11 12:44:49 -07:00
mutex-debug.c mutex: implement adaptive spinning 2009-01-14 18:09:02 +01:00
mutex-debug.h mutex: implement adaptive spinning 2009-01-14 18:09:02 +01:00
mutex.c Merge branch 'core/locking' into perfcounters/core 2009-05-06 08:47:26 +02:00
mutex.h mutex: implement adaptive spinning 2009-01-14 18:09:02 +01:00
notifier.c
ns_cgroup.c cgroups: relax ns_can_attach checks to allow attaching to grandchild cgroups 2009-04-02 19:04:53 -07:00
nsproxy.c
panic.c Eliminate thousands of warnings with gcc 3.2 build 2009-05-06 16:36:09 -07:00
params.c param: fix charp parameters set via sysfs 2009-03-31 13:05:30 +10:30
perf_counter.c perf_counter: Optimize context switch between identical inherited contexts 2009-05-22 12:18:20 +02:00
pid_namespace.c signals: zap_pid_ns_process() should use force_sig() 2009-04-02 19:04:58 -07:00
pid.c pids: refactor vnr/nr_ns helpers to make them safe 2009-04-02 19:05:02 -07:00
pm_qos_params.c
posix-cpu-timers.c kernel/posix-cpu-timers.c: fix sparse warning 2009-04-30 08:08:31 +02:00
posix-timers.c [CVE-2009-0029] System call wrappers part 05 2009-01-14 14:15:20 +01:00
printk.c Merge branch 'printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-05 10:23:25 -07:00
profile.c profiling: fix broken profiling regression 2009-02-10 00:50:37 +01:00
ptrace.c ptrace: ptrace_attach: fix the usage of ->cred_exec_mutex 2009-04-27 20:30:51 +10:00
rcuclassic.c kmemtrace, rcu: fix linux/rcutree.h and linux/rcuclassic.h dependencies 2009-04-03 12:23:02 +02:00
rcupdate.c RCU: Don't try and predeclare inline funcs as it upsets some versions of gcc 2009-04-15 13:55:14 -07:00
rcupreempt_trace.c
rcupreempt.c kmemtrace, rcu: fix rcupreempt.c data structure dependencies 2009-04-03 12:23:04 +02:00
rcutorture.c cpumask: convert rcutorture.c 2009-03-30 22:05:16 +10:30
rcutree_trace.c rcu: Make hierarchical RCU less IPI-happy 2009-04-14 11:31:50 +02:00
rcutree.c rcu: Make hierarchical RCU less IPI-happy 2009-04-14 11:31:50 +02:00
rcutree.h kmemtrace, rcu: fix rcu_tree_trace.c data structure dependencies 2009-04-03 12:23:03 +02:00
relay.c Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-05 11:04:19 -07:00
res_counter.c memcg: memory cgroup resource counters for hierarchy 2009-01-08 08:31:05 -08:00
resource.c Remove 'recurse into child resources' logic from 'reserve_region_with_split()' 2009-04-18 21:44:24 -07:00
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c locking, rtmutex.c: Documentation cleanup 2009-04-29 23:20:17 +02:00
rtmutex.h
rwsem.c
sched_clock.c Merge branch 'tracing/core-v2' into tracing-for-linus 2009-04-02 00:49:02 +02:00
sched_cpupri.c sched_rt: don't allocate cpumask in fastpath 2009-04-01 13:24:51 +02:00
sched_cpupri.h cpumask: remove cpumask_t from core 2009-03-30 22:05:17 +10:30
sched_debug.c sched: remove unused fields from struct rq 2009-03-24 23:16:51 +01:00
sched_fair.c Merge branch 'sched/urgent'; commit 'v2.6.29-rc5' into sched/core 2009-02-15 21:15:16 +01:00
sched_features.h Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-03-30 17:17:35 -07:00
sched_idletask.c
sched_rt.c Merge commit 'v2.6.30-rc1' into sched/urgent 2009-04-08 17:26:00 +02:00
sched_stats.h sched: remove unused fields from struct rq 2009-03-24 23:16:51 +01:00
sched.c perf_counter: Optimize context switch between identical inherited contexts 2009-05-22 12:18:20 +02:00
seccomp.c x86-64: seccomp: fix 32/64 syscall hole 2009-03-02 15:41:30 -08:00
semaphore.c
signal.c signals: implement sys_rt_tgsigqueueinfo 2009-04-30 19:24:24 +02:00
slow-work.c Delete slow-work timers properly 2009-04-24 07:47:59 -07:00
smp.c generic-ipi: eliminate WARN_ON()s during oops/panic 2009-03-13 10:47:34 +01:00
softirq.c kernel/softirq.c: fix sparse warning 2009-04-17 01:57:54 +02:00
softlockup.c softlockup: decouple hung tasks check from softlockup detection 2009-01-16 14:06:04 +01:00
spinlock.c Allow rwlocks to re-enable interrupts 2009-04-02 19:05:11 -07:00
srcu.c
stacktrace.c
stop_machine.c cpumask: remove cpumask_t from core 2009-03-30 22:05:17 +10:30
sys_ni.c Merge commit 'v2.6.29-rc2' into perfcounters/core 2009-01-21 16:37:27 +01:00
sys.c Merge branch 'linus' into perfcounters/core 2009-04-29 14:47:05 +02:00
sysctl_check.c net: add ARP notify option for devices 2009-02-01 01:04:33 -08:00
sysctl.c Merge commit 'v2.6.30-rc6' into perfcounters/core 2009-05-18 07:37:49 +02:00
taskstats.c cpumask: convert rest of files in kernel/ 2009-01-01 10:12:28 +10:30
test_kprobes.c kprobes: add tests for register_kprobes 2009-01-06 15:59:20 -08:00
time.c [CVE-2009-0029] System call wrappers part 01 2009-01-14 14:15:18 +01:00
timeconst.pl
timer.c Merge branch 'linus' into perfcounters/core 2009-04-29 14:47:05 +02:00
tracepoint.c tracepoints: dont update zero-sized tracepoint sections 2009-03-18 19:55:00 +01:00
tsacct.c Fix fixpoint divide exception in acct_update_integrals 2009-03-09 08:13:35 -07:00
uid16.c [CVE-2009-0029] System call wrappers part 19 2009-01-14 14:15:26 +01:00
up.c smp_call_function_single(): be slightly less stupid, fix #2 2009-01-12 16:04:37 +01:00
user_namespace.c Fix recursive lock in free_uid()/free_user_ns() 2009-02-27 16:26:21 -08:00
user.c Merge branch 'master' into next 2009-03-24 10:52:46 +11:00
utsname_sysctl.c proc_sysctl: use CONFIG_PROC_SYSCTL around ipc and utsname proc_handlers 2009-04-02 19:05:01 -07:00
utsname.c
wait.c wait: prevent exclusive waiter starvation 2009-02-05 12:56:48 -08:00
workqueue.c work_on_cpu(): rewrite it to create a kernel thread on demand 2009-04-09 09:50:37 +09:30