Impact: Reduce future memory usage, use new cpumask API.
Since the last patch was created and acked, more old cpumask users
slipped into kernel/trace.
Mostly trivial conversions, except struct trace_iterator's "started"
member becomes a cpumask_var_t.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Impact: simplify/generalize/refactor trace.c
The trace.c file is becoming more difficult to maintain due to the
growing number of events. There is several formats that an event may
be printed. This patch sets up the infrastructure of an event hash to
allow for events to register how they should be printed.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, remove obsolete code
Now that the ring buffer used by ftrace allows for variable length
entries, we do not need the 'cont' feature of the buffer. This code
makes other parts of ftrace more complex and by removing this it
simplifies the ftrace code.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: tracer output improvement
Ending newlines are appended automatically on comments by the function
graph tracer because the newline needs to be placed after the "*/"
comment characters.
So if the user puts an ending newline, we want to strip it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: provide trace headers to explain a bit the output
This patch implements the print_headers callback for the function graph
tracer. These headers are output according to the current trace options.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Handle the TRACE_PRINT entries from the function grapg tracer
and output them as a C comment just below the function that called
it, as if it was a comment inside this function.
Example with an ftrace_printk inside might_sleep() function:
void __might_sleep(char *file, int line)
{
static unsigned long prev_jiffy; /* ratelimiting */
ftrace_printk("Hi I'm a comment in might_sleep() :-)");
A chunk of a resulting trace:
0) | _reiserfs_free_block() {
0) | reiserfs_read_bitmap_block() {
0) | __bread() {
0) | __getblk() {
0) | __find_get_block() {
0) 0.698 us | mark_page_accessed();
0) 2.267 us | }
0) | __might_sleep() {
0) | /* Hi I'm a comment in might_sleep() :-) */
0) 1.321 us | }
0) 5.872 us | }
0) 7.313 us | }
0) 8.718 us | }
And this patch brings two minor fixes:
- The newline after a switch-out task has disappeared
- The "|" sign just before the cpu number on task-switch has been deleted.
0) 0.616 us | pick_next_task_rt();
0) 1.457 us | _spin_trylock();
0) 0.653 us | _spin_unlock();
0) 0.728 us | _spin_trylock();
0) 0.631 us | _spin_unlock();
0) 0.729 us | native_load_sp0();
0) 0.593 us | native_load_tls();
------------------------------------------
0) cat-2834 => migrati-3
------------------------------------------
0) | finish_task_switch() {
0) 0.841 us | _spin_unlock_irq();
0) 0.616 us | post_schedule_rt();
0) 3.882 us | }
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: better trace output of duration for long calls
The old duration output didn't exceeded 9999.999 us to fit the column
and the nanosecs were always 3 numbers. As Ingo suggested, it's better
to have the whole microseconds elapsed time and shift the nanosecs precision
if needed to fit the maximum 7 numbers. And usec need more number, the case
should be rare and important enough to break a bit the column alignment to
show it.
So, depending of the duration value, we now have these patterns:
u.nnn us
uu.nnn us
uuu.nnn us
uuuu.nnn us
uuuuu.nn us
uuuuuu.n us
uuuuuuuu..... us
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: extend function-graph output: let one know which thread called a function
This patch implements a helper function to print the couple cmdline/pid.
Its output is provided during task switching and on each row if the new
"funcgraph-proc" defualt-off option is set through trace_options file.
The output is center aligned and never exceeds 14 characters. The cmdline
is truncated over 7 chars.
But note that if the pid exceeds 6 characters, the column will overflow (but
the situation is abnormal).
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: increase the visual qualities of the call-graph-tracer output
This patch applies various trace output formatting changes:
- CPU is now a decimal number, followed by a parenthesis.
- Overhead is now on the second column (gives a good visibility)
- Cost is now on the third column, can't exceed 9999.99 us. It is
followed by a virtual line based on a "|" character.
- Functions calls are now the last column on the right. This way, we
haven't dynamic column (which flow is harder to follow) on its right.
- CPU and Overhead have their own option flag. They are default-on but you
can disable them easily:
echo nofuncgraph-cpu > trace_options
echo nofuncgraph-overhead > trace_options
TODO:
_ Refactoring of the thread switch output.
_ Give a default-off option to output the thread and its pid on each row.
_ Provide headers
_ ....
Here is an example of the new trace style:
0) | mutex_unlock() {
0) 0.639 us | __mutex_unlock_slowpath();
0) 1.607 us | }
0) | remove_wait_queue() {
0) 0.616 us | _spin_lock_irqsave();
0) 0.616 us | _spin_unlock_irqrestore();
0) 2.779 us | }
0) 0.495 us | n_tty_set_room();
0) ! 9999.999 us | }
0) | tty_ldisc_deref() {
0) 0.615 us | _spin_lock_irqsave();
0) 0.616 us | _spin_unlock_irqrestore();
0) 2.793 us | }
0) | current_fs_time() {
0) 0.488 us | current_kernel_time();
0) 0.495 us | timespec_trunc();
0) 2.486 us | }
0) ! 9999.999 us | }
0) ! 9999.999 us | }
0) ! 9999.999 us | }
0) | sys_read() {
0) 0.796 us | fget_light();
0) | vfs_read() {
0) | rw_verify_area() {
0) | security_file_permission() {
0) 0.488 us | cap_file_permission();
0) 1.720 us | }
0) 3. 4 us | }
0) | tty_read() {
0) 0.488 us | tty_paranoia_check();
0) | tty_ldisc_ref_wait() {
0) | tty_ldisc_try() {
0) 0.615 us | _spin_lock_irqsave();
0) 0.615 us | _spin_unlock_irqrestore();
0) 5.436 us | }
0) 6.427 us | }
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: enhance the output of the graph-tracer
This patch applies some ideas of Ingo Molnar and Steven Rostedt.
* Output leaf functions in one line with parenthesis, semicolon and duration
output.
* Add a second column (after cpu) for an overhead sign.
if duration > 100 us, "!"
if duration > 10 us, "+"
else " "
* Print output in us with remaining nanosec: u.n
* Print duration on the right end, following the indentation of the functions.
Use also visual clues: "-" on entry call (no duration to output) and "+" on
return (duration output).
The name of the tracer has been fixed as well: function-branch becomes
function_branch.
Here is an example of the new output:
CPU[000] dequeue_entity() { -
CPU[000] update_curr() { -
CPU[000] update_min_vruntime(); + 0.512 us
CPU[000] } + 1.504 us
CPU[000] clear_buddies(); + 0.481 us
CPU[000] update_min_vruntime(); + 0.504 us
CPU[000] } + 4.557 us
CPU[000] hrtick_update() { -
CPU[000] hrtick_start_fair(); + 0.489 us
CPU[000] } + 1.443 us
CPU[000] + } + 14.655 us
CPU[000] + } + 15.678 us
CPU[000] + } + 16.686 us
CPU[000] msecs_to_jiffies(); + 0.481 us
CPU[000] put_prev_task_fair(); + 0.504 us
CPU[000] pick_next_task_fair(); + 0.482 us
CPU[000] pick_next_task_rt(); + 0.504 us
CPU[000] pick_next_task_fair(); + 0.481 us
CPU[000] pick_next_task_idle(); + 0.489 us
CPU[000] _spin_trylock(); + 0.655 us
CPU[000] _spin_unlock(); + 0.609 us
CPU[000] ------------8<---------- thread bash-2794 ------------8<----------
CPU[000] finish_task_switch() { -
CPU[000] _spin_unlock_irq(); + 0.722 us
CPU[000] } + 2.369 us
CPU[000] ! } + 501972.605 us
CPU[000] ! } + 501973.763 us
CPU[000] copy_from_read_buf() { -
CPU[000] _spin_lock_irqsave(); + 0.670 us
CPU[000] _spin_unlock_irqrestore(); + 0.699 us
CPU[000] copy_to_user() { -
CPU[000] might_fault() { -
CPU[000] __might_sleep(); + 0.503 us
CPU[000] } + 1.632 us
CPU[000] __copy_to_user_ll(); + 0.542 us
CPU[000] } + 3.858 us
CPU[000] tty_audit_add_data() { -
CPU[000] _spin_lock_irq(); + 0.609 us
CPU[000] _spin_unlock_irq(); + 0.624 us
CPU[000] } + 3.196 us
CPU[000] _spin_lock_irqsave(); + 0.624 us
CPU[000] _spin_unlock_irqrestore(); + 0.625 us
CPU[000] + } + 13.611 us
CPU[000] copy_from_read_buf() { -
CPU[000] _spin_lock_irqsave(); + 0.624 us
CPU[000] _spin_unlock_irqrestore(); + 0.616 us
CPU[000] } + 2.820 us
CPU[000]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: enhancement for function graph tracer
When run on a SMP box, the function graph tracer is confusing because
it shows the different CPUS as changes in the trace.
This patch adds the annotation of 'CPU[###]' where ### is a three digit
number. The output will look similar to this:
CPU[001] dput() {
CPU[000] } 726
CPU[001] } 487
CPU[000] do_softirq() {
CPU[001] } 2221
CPU[000] __do_softirq() {
CPU[000] __local_bh_disable() {
CPU[001] unroll_tree_refs() {
CPU[000] } 569
CPU[001] } 501
CPU[000] rcu_process_callbacks() {
CPU[001] kfree() {
What makes this nice is that now you can grep the file and produce
readable format for a particular CPU.
# cat /debug/tracing/trace > /tmp/trace
# grep '^CPU\[000\]' /tmp/trace > /tmp/trace0
# grep '^CPU\[001\]' /tmp/trace > /tmp/trace1
Will give you:
# head /tmp/trace0
CPU[000] ------------8<---------- thread sshd-3899 ------------8<----------
CPU[000] inotify_dentry_parent_queue_event() {
CPU[000] } 2531
CPU[000] inotify_inode_queue_event() {
CPU[000] } 505
CPU[000] } 69626
CPU[000] } 73089
CPU[000] audit_syscall_exit() {
CPU[000] path_put() {
CPU[000] dput() {
# head /tmp/trace1
CPU[001] ------------8<---------- thread pcscd-3446 ------------8<----------
CPU[001] } 4186
CPU[001] dput() {
CPU[001] } 543
CPU[001] vfs_permission() {
CPU[001] inode_permission() {
CPU[001] shmem_permission() {
CPU[001] generic_permission() {
CPU[001] } 501
CPU[001] } 2205
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: enhancement to function graph tracer
Export the trace_find_cmdline so the function graph tracer can
use it to print the comms of the threads.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: feature
This patch sets a C-like output for the function graph tracing.
For this aim, we now call two handler for each function: one on the entry
and one other on return. This way we can draw a well-ordered call stack.
The pid of the previous trace is loosely stored to be compared against
the one of the current trace to see if there were a context switch.
Without this little feature, the call tree would seem broken at
some locations.
We could use the sched_tracer to capture these sched_events but this
way of processing is much more simpler.
2 spaces have been chosen for indentation to fit the screen while deep
calls. The time of execution in nanosecs is printed just after closed
braces, it seems more easy this way to find the corresponding function.
If the time was printed as a first column, it would be not so easy to
find the corresponding function if it is called on a deep depth.
I plan to output the return value but on 32 bits CPU, the return value
can be 32 or 64, and its difficult to guess on which case we are.
I don't know what would be the better solution on X86-32: only print
eax (low-part) or even edx (high-part).
Actually it's thee same problem when a function return a 8 bits value, the
high part of eax could contain junk values...
Here is an example of trace:
sys_read() {
fget_light() {
} 526
vfs_read() {
rw_verify_area() {
security_file_permission() {
cap_file_permission() {
} 519
} 1564
} 2640
do_sync_read() {
pipe_read() {
__might_sleep() {
} 511
pipe_wait() {
prepare_to_wait() {
} 760
deactivate_task() {
dequeue_task() {
dequeue_task_fair() {
dequeue_entity() {
update_curr() {
update_min_vruntime() {
} 504
} 1587
clear_buddies() {
} 512
add_cfs_task_weight() {
} 519
update_min_vruntime() {
} 511
} 5602
dequeue_entity() {
update_curr() {
update_min_vruntime() {
} 496
} 1631
clear_buddies() {
} 496
update_min_vruntime() {
} 527
} 4580
hrtick_update() {
hrtick_start_fair() {
} 488
} 1489
} 13700
} 14949
} 16016
msecs_to_jiffies() {
} 496
put_prev_task_fair() {
} 504
pick_next_task_fair() {
} 489
pick_next_task_rt() {
} 496
pick_next_task_fair() {
} 489
pick_next_task_idle() {
} 489
------------8<---------- thread 4 ------------8<----------
finish_task_switch() {
} 1203
do_softirq() {
__do_softirq() {
__local_bh_disable() {
} 669
rcu_process_callbacks() {
__rcu_process_callbacks() {
cpu_quiet() {
rcu_start_batch() {
} 503
} 1647
} 3128
__rcu_process_callbacks() {
} 542
} 5362
_local_bh_enable() {
} 587
} 8880
} 9986
kthread_should_stop() {
} 669
deactivate_task() {
dequeue_task() {
dequeue_task_fair() {
dequeue_entity() {
update_curr() {
calc_delta_mine() {
} 511
update_min_vruntime() {
} 511
} 2813
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
This patch changes the name of the "return function tracer" into
function-graph-tracer which is a more suitable name for a tracing
which makes one able to retrieve the ordered call stack during
the code flow.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>