Impact: clean up
The traceon and traceoff function probes are confusing to developers
to what happens when a counter is not specified. This should help
clear things up.
# echo "*:traceoff" > set_ftrace_filter
# cat /debug/tracing/set_ftrace_filter
#### all functions enabled ####
do_fork:traceoff:unlimited
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: cleanup
Fix incorrect hint message in code and typos in comments.
Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch is to fix the return value of trace_selftest_startup_sysprof
and trace_selftest_startup_branch on failure.
Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Pass tsk to tracing_record_cmdline instead of current.
Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: clean up
Ingo Molnar did not like the _hook naming convention used by the
select function tracer. Luis Claudio R. Goncalves suggested using
the "_probe" extension. This patch implements the change of
calling the functions and variables "_hook" and replacing them
with "_probe".
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Ingo Molnar pointed out some coding style issues with the recent ftrace
updates. This patch cleans them up.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch adds a pretty print version of traceon and traceoff
output for set_ftrace_filter.
# echo 'sys_open:traceon:4' > set_ftrace_filter
# cat set_ftrace_filter
#### all functions enabled ####
sys_open:traceon:count=4
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch adds a call back for the tracers that have hooks to
selected functions. This allows the tracer to show better output
in the set_ftrace_filter file.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch adds output to show what functions have tracer hooks
attached to them.
# echo 'sys_open:traceon:4' > /debug/tracing/set_ftrace_filter
# cat set_ftrace_filter
#### all functions enabled ####
sys_open:ftrace_traceon:0000000000000004
# echo 'do_fork:traceoff:' > set_ftrace_filter
# cat set_ftrace_filter
#### all functions enabled ####
sys_open:ftrace_traceon:0000000000000002
do_fork:ftrace_traceoff:ffffffffffffffff
Note the 4 changed to a 2. This is because The code was executed twice
since the traceoff was added. If a cat is done again:
#### all functions enabled ####
sys_open:ftrace_traceon
do_fork:ftrace_traceoff:ffffffffffffffff
The number disappears. That is because it will not print a NULL.
Callbacks to allow the tracer to pretty print will be implemented soon.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch adds the new function selection commands traceon and
traceoff. traceon sets the function to enable the ring buffers
while traceoff disables the ring buffers. You can pass in the
number of times you want the command to be executed when the function
is hit. It will only execute if the state of the buffers are not
already in that state.
Example:
# echo do_fork:traceon:4
Will enable the ring buffers if they are disabled every time it
hits do_fork, up to 4 times.
# echo sys_close:traceoff
This will disable the ring buffers every time (unlimited) when
sys_close is called.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: new feature
Currently, the function tracer only gives you an ability to hook
a tracer to all functions being traced. The dynamic function trace
allows you to pick and choose which of those functions will be
traced, but all functions being traced will call all tracers that
registered with the function tracer.
This patch adds a new feature that allows a tracer to hook to specific
functions, even when all functions are being traced. It allows for
different functions to call different tracer hooks.
The way this is accomplished is by a special function that will hook
to the function tracer and will set up a hash table knowing which
tracer hook to call with which function. This is the most general
and easiest method to accomplish this. Later, an arch may choose
to supply their own method in changing the mcount call of a function
to call a different tracer. But that will be an exercise for the
future.
To register a function:
struct ftrace_hook_ops {
void (*func)(unsigned long ip,
unsigned long parent_ip,
void **data);
int (*callback)(unsigned long ip, void **data);
void (*free)(void **data);
};
int register_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data);
glob is a simple glob to search for the functions to hook.
ops is a pointer to the operations (listed below)
data is the default data to be passed to the hook functions when traced
ops:
func is the hook function to call when the functions are traced
callback is a callback function that is called when setting up the hash.
That is, if the tracer needs to do something special for each
function, that is being traced, and wants to give each function
its own data. The address of the entry data is passed to this
callback, so that the callback may wish to update the entry to
whatever it would like.
free is a callback for when the entry is freed. In case the tracer
allocated any data, it is give the chance to free it.
To unregister we have three functions:
void
unregister_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data)
This will unregister all hooks that match glob, point to ops, and
have its data matching data. (note, if glob is NULL, blank or '*',
all functions will be tested).
void
unregister_ftrace_function_hook_func(char *glob,
struct ftrace_hook_ops *ops)
This will unregister all functions matching glob that has an entry
pointing to ops.
void unregister_ftrace_function_hook_all(char *glob)
This simply unregisters all funcs.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: clean up
Now that ftrace_lock is a mutex, there is no reason to have three
different mutexes protecting similar data. All the mutex paths
are not in hot paths, so having a mutex to cover more data is
not a problem.
This patch removes the ftrace_sysctl_lock and ftrace_start_lock
and uses the ftrace_lock to protect the locations that were protected
by these locks. By doing so, this change also removes some of
the lock nesting that was taking place.
There are still more mutexes in ftrace.c that can probably be
consolidated, but they can be dealt with later. We need to be careful
about the way the locks are nested, and by consolidating, we can cause
a recursive deadlock.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: clean up
The older versions of ftrace required doing the ftrace list
search under atomic context. Now all the calls are in non-atomic
context. There is no reason to keep the ftrace_lock as a spinlock.
This patch converts it to a mutex.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Allow for other tracers to add their own commands for function
selection. This interface gives a trace the ability to name a
command for function selection. Right now it is pretty limited
in what it offers, but this is a building step for more features.
The :mod: command is converted to this interface and also serves
as a template for other implementations.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: fix to prevent empty set_ftrace_filter and no ftrace output
The function filter is used to only trace a given set of functions.
The filter is enabled when a function name is echoed into the
set_ftrace_filter file. But if the name has a typo and the function
is not found, the filter is enabled, but no function is listed.
This makes a confusing situation where set_ftrace_filter is empty
but no functions ever get enabled for tracing.
For example:
# cat /debug/tracing/set_ftrace_filter
#### all functions enabled ####
# echo bad_name > set_ftrace_filter
# cat /debug/tracing/set_ftrace_filter
# echo function > current_tracer
# cat trace
# tracer: nop
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
This patch changes that to only enable filtering if a function
is set to be filtered on. Now, the filter is not enabled if
a bad name is echoed into set_ftrace_filter.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch adds a "command" syntax to the function filtering files:
/debugfs/tracing/set_ftrace_filter
/debugfs/tracing/set_ftrace_notrace
Of the format: <function>:<command>:<parameter>
The command is optional, and dependent on the command, so are
the parameters.
echo do_fork > set_ftrace_filter
Will only trace 'do_fork'.
echo 'sched_*' > set_ftrace_filter
Will only trace functions starting with the letters 'sched_'.
echo '*:mod:ext3' > set_ftrace_filter
Will trace only the ext3 module functions.
echo '*write*:mod:ext3' > set_ftrace_notrace
Will prevent the ext3 functions with the letters 'write' in
the name from being traced.
echo '!*_allocate:mod:ext3' > set_ftrace_filter
Will remove the functions in ext3 that end with the letters
'_allocate' from the ftrace filter.
Although this patch implements the 'command' format, only the
'mod' command is supported. More commands to follow.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: clean up
ftrace_match_records does a lot of things that other features
can use. This patch breaks up ftrace_match_records and pulls
out ftrace_setup_glob and ftrace_match_record.
ftrace_setup_glob prepares a simple glob expression for use with
ftrace_match_record. ftrace_match_record compares a single record
with a glob type.
Breaking this up will allow for more features to run on individual
records.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: clean up
ftrace_match is too generic of a name. What it really does is
search all records and matches the records with the given string,
and either sets or unsets the functions to be traced depending
on if the parameter 'enable' is set or not.
This allows us to make another function called ftrace_match that
can be used to test a single record.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: clean up
To iterate over all the functions that dynamic trace knows about
it requires two for loops. One to iterate over the pages and the
other to iterate over the records within the page.
There are several duplications of these loops in ftrace.c. This
patch creates the macros do_for_each_ftrace_rec and
while_for_each_ftrace_rec to handle this logic, and removes the
duplicate code.
While making this change, I also discovered and fixed a small
bug that one of the iterations should exit the loop after it found the
record it was searching for. This used a break when it should have
used a goto, since there were two loops it needed to break out
from. No real harm was done by this bug since it would only continue
to search the other records, and the code was in a slow path anyway.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: clean up, make set_ftrace_filter less confusing
The set_ftrace_filter shows only the functions that will be traced.
But when it is empty, it will trace all functions. This can be a bit
confusing.
This patch makes set_ftrace_filter show:
#### all functions enabled ####
When all functions will be traced, and we do not filter only a select
few.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
The function bts_trace_init() references a variable
bts_hotcpu_notifier which is marked
as __cpuinitdata. Thus causes section mismatch. This patch fixes it.
LD kernel/trace/built-in.o
WARNING: kernel/trace/built-in.o(.text+0xc90c): Section mismatch in
reference from the function bts_trace_init() to the variable
.cpuinit.data:bts_hotcpu_notifier
The function bts_trace_init() references
the variable __cpuinitdata bts_hotcpu_notifier.
This is often because bts_trace_init lacks a __cpuinitdata
annotation or the annotation of bts_hotcpu_notifier is wrong.
WARNING: kernel/trace/built-in.o(.text+0xc92a): Section mismatch in
reference from the function bts_trace_reset() to the variable
.cpuinit.data:bts_hotcpu_notifier
The function bts_trace_reset() references
the variable __cpuinitdata bts_hotcpu_notifier.
This is often because bts_trace_reset lacks a __cpuinitdata
annotation or the annotation of bts_hotcpu_notifier is wrong.
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: markus.t.metzger@gmail.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cosmetic change in Kconfig menu layout
This patch was originally suggested by Peter Zijlstra, but seems it
was forgotten.
CONFIG_MMIOTRACE and CONFIG_MMIOTRACE_TEST were selectable
directly under the Kernel hacking / debugging menu in the kernel
configuration system. They were present only for x86 and x86_64.
Other tracers that use the ftrace tracing framework are in their own
sub-menu. This patch moves the mmiotrace configuration options there.
Since the Kconfig file, where the tracer menu is, is not architecture
specific, HAVE_MMIOTRACE_SUPPORT is introduced and provided only by
x86/x86_64. CONFIG_MMIOTRACE now depends on it.
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: enhances lost events counting in mmiotrace
The tracing framework, or the ring buffer facility it uses, has a switch
to stop recording data. When recording is off, the trace events will be
lost. The framework does not count these, so mmiotrace has to count them
itself.
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Convert the c/p state "power" tracer to use tracepoints. Avoids a
function call when the tracer is disabled.
Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: clean up
While reviewing the ring buffer code, I thougth I saw a bug with
if (!__raw_spin_trylock(&cpu_buffer->lock))
goto out_unlock;
But I forgot that we use a variable "lock_taken" that is set if
the spinlock is taken, and only unlock it if that variable is set.
To avoid further confusion from other reviewers, this patch
renames the label out_unlock with out_reset, which is the more
appropriate name.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Add the missing pair tracing_{start,stop}_record_cmdline() to record well
the cmdline associated with pid.
Changes in v2:
- fix a build error, the sched_switch tracer is needed to record the
cmdline.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix these sparse warnings:
kernel/trace/ring_buffer.c:70:37: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:84:39: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:96:43: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:2475:13: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:2475:13: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:2478:42: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:2478:42: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:2500:40: warning: incorrect type in argument 3 (different signedness)
kernel/trace/ring_buffer.c:2505:44: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:2507:46: warning: incorrect type in argument 2 (different signedness)
kernel/trace/trace.c:2130:40: warning: incorrect type in argument 3 (different signedness)
kernel/trace/trace.c:2280:40: warning: incorrect type in argument 3 (different signedness)
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: make global variables and a global function static
The function '__trace_userstack' does not seem to have a caller, so it
is commented out.
Fix this sparse warnings:
kernel/trace/trace.c:82:5: warning: symbol 'tracing_disabled' was not declared. Should it be static?
kernel/trace/trace.c:600:10: warning: symbol 'trace_record_cmdline_disabled' was not declared. Should it be static?
kernel/trace/trace.c:957:6: warning: symbol '__trace_userstack' was not declared. Should it be static?
kernel/trace/trace.c:1694:5: warning: symbol 'tracing_release' was not declared. Should it be static?
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch is to make the function return early on failure, and give
correct return value on success.
Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
The C99 specification states in section 6.11.5:
The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: change API and init bpage when copy
ring_buffer_read_page()/rb_remove_entries() may be called for
a partially consumed page.
Add a parameter for rb_remove_entries() and make it update
cpu_buffer->entries correctly for partially consumed pages.
ring_buffer_read_page() now returns the offset to the next event.
Init the bpage's time_stamp when return value is 0.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: Fix bug
I found several very very curious line.
It's so curious that it may be brought by typing mistake.
When (cpu_buffer->reader_page == cpu_buffer->commit_page):
1) We haven't copied it for bpage is changed:
bpage = cpu_buffer->reader_page->page;
memcpy(bpage->data, cpu_buffer->reader_page->page->data + read ... )
2) We need update cpu_buffer->reader_page->read, but
"cpu_buffer->reader_page += read;" is not right.
[
This bug was a typo. The commit->reader_page is a page pointer
and not an index into the page. The line should have been
commit->reader_page->read += read. The other changes
by Lai are nice clean ups to the code. - SDR
]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Ingo Molnar suggested a series of clean ups for the splice code.
This patch implements those suggestions.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This moves the pipe waiting code from tracing_read_pipe() into
tracing_wait_pipe(), which is useful to implement other fops, like
splice_read.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Added and implemented tracing_pipe_fops->splice_read(). This allows
userspace programs to get tracing data more efficiently.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
When one cats the trace file, the leaf functions are printed without brackets:
function();
whereas in the trace_pipe file we'll see the following:
function() {
}
This is because the ring_buffer handling is not the same between those two files.
On the trace file, when an entry is printed, the iterator advanced and then we can
check the next entry.
There is no iterator with trace_pipe, the current entry to print has been peeked
and not consumed. So checking the next entry will still return the current one while
we don't consume it.
This patch introduces a new value for the output callbacks to ask the tracing
core to not consume the current entry after printing it.
We need it because we will have to consume the current entry ourself to check
the next one.
Now the trace_pipe is able to handle well the leaf functions.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: build fix
The BLK_DEV_IO_TRACE entry used to be in block/Kconfig - which
file itself was dependent on CONFIG_BLOCK. But now the entry is
in kernel/trace/Kconfig - which is present even on !CONFIG_BLOCK.
So add a 'depends on BLOCK' to BLK_DEV_IO_TRACE.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: simplification
Instead of requiring that plugins have the sequence:
my_tracer_stop(my_trace_array);
unregister_tracer(my_tracer);
it should be possible just do a:
unregister_tracer(my_tracer);
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Move the power tracer headers to trace/power.h to keep ftrace.h and power bits
more easy to maintain as separated topics.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Making it more easy to do a basic regression test for this tracer.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: clean up
Fixed several typos in the comments.
Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: clean up
Now that a generic in_nmi is available, this patch removes the
special code in the ring_buffer and implements the in_nmi generic
version instead.
With this change, I was also able to rename the "arch_ftrace_nmi_enter"
back to "ftrace_nmi_enter" and remove the code from the ring buffer.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: prevent deadlock in NMI
The ring buffers are not yet totally lockless with writing to
the buffer. When a writer crosses a page, it grabs a per cpu spinlock
to protect against a reader. The spinlocks taken by a writer are not
to protect against other writers, since a writer can only write to
its own per cpu buffer. The spinlocks protect against readers that
can touch any cpu buffer. The writers are made to be reentrant
with the spinlocks disabling interrupts.
The problem arises when an NMI writes to the buffer, and that write
crosses a page boundary. If it grabs a spinlock, it can be racing
with another writer (since disabling interrupts does not protect
against NMIs) or with a reader on the same CPU. Luckily, most of the
users are not reentrant and protects against this issue. But if a
user of the ring buffer becomes reentrant (which is what the ring
buffers do allow), if the NMI also writes to the ring buffer then
we risk the chance of a deadlock.
This patch moves the ftrace_nmi_enter called by nmi_enter() to the
ring buffer code. It replaces the current ftrace_nmi_enter that is
used by arch specific code to arch_ftrace_nmi_enter and updates
the Kconfig to handle it.
When an NMI is called, it will set a per cpu variable in the ring buffer
code and will clear it when the NMI exits. If a write to the ring buffer
crosses page boundaries inside an NMI, a trylock is used on the spin
lock instead. If the spinlock fails to be acquired, then the entry
is discarded.
This bug appeared in the ftrace work in the RT tree, where event tracing
is reentrant. This workaround solved the deadlocks that appeared there.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: fix to prevent developers from using entry->cpu
With the new ring buffer infrastructure, the cpu for the entry is
implicit with which CPU buffer it is on.
The original code use to record the current cpu into the generic
entry header, which can be retrieved by entry->cpu. When the
ring buffer was introduced, the users were convert to use the
the cpu number of which cpu ring buffer was in use (this was passed
to the tracers by the iterator: iter->cpu).
Unfortunately, the cpu item in the entry structure was never removed.
This allowed for developers to use it instead of the proper iter->cpu,
unknowingly, using an uninitialized variable. This was not the fault
of the developers, since it would seem like the logical place to
retrieve the cpu identifier.
This patch removes the cpu item from the entry structure and fixes
all the users that should have been using iter->cpu.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: cleanup
To make it easy for ftrace plugin writers, as this was open coded in
the existing plugins
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Frédéric Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: new API
These new functions do what previously was being open coded, reducing
the number of details ftrace plugin writers have to worry about.
It also standardizes the handling of stacktrace, userstacktrace and
other trace options we may introduce in the future.
With this patch, for instance, the blk tracer (and some others already
in the tree) can use the "userstacktrace" /d/tracing/trace_options
facility.
$ codiff /tmp/vmlinux.before /tmp/vmlinux.after
linux-2.6-tip/kernel/trace/trace.c:
trace_vprintk | -5
trace_graph_return | -22
trace_graph_entry | -26
trace_function | -45
__ftrace_trace_stack | -27
ftrace_trace_userstack | -29
tracing_sched_switch_trace | -66
tracing_stop | +1
trace_seq_to_user | -1
ftrace_trace_special | -63
ftrace_special | +1
tracing_sched_wakeup_trace | -70
tracing_reset_online_cpus | -1
13 functions changed, 2 bytes added, 355 bytes removed, diff: -353
linux-2.6-tip/block/blktrace.c:
__blk_add_trace | -58
1 function changed, 58 bytes removed, diff: -58
linux-2.6-tip/kernel/trace/trace.c:
trace_buffer_lock_reserve | +88
trace_buffer_unlock_commit | +86
2 functions changed, 174 bytes added, diff: +174
/tmp/vmlinux.after:
16 functions changed, 176 bytes added, 413 bytes removed, diff: -237
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Frédéric Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar suggested using goto logic to keep the indentation
down and to be able to remove the nasty line breaks. This actually
makes the code a bit more readable.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: simplification of tracers
As all tracers are doing this we might as well do it in
register_ftrace_event and save one branch each time we call these
callbacks.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
As they actually all return these enumerators.
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: bugfix and cleanup
Some callsites were returning either TRACE_ITER_PARTIAL_LINE if the
trace_seq routines (trace_seq_printf, etc) returned 0 meaning its buffer
was full, or zero otherwise.
But...
/* Return values for print_line callback */
enum print_line_t {
TRACE_TYPE_PARTIAL_LINE = 0, /* Retry after flushing the seq */
TRACE_TYPE_HANDLED = 1,
TRACE_TYPE_UNHANDLED = 2 /* Relay to other output functions */
};
In other cases the return value was not being relayed at all.
Most of the time it didn't hurt because the page wasn't get filled, but
for correctness sake, handle the return values everywhere.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
"ftrace: use struct pid" commit 978f3a45d9
converted ftrace_pid_trace to "struct pid*".
But we can't use do_each_pid_task() without rcu_read_lock() even if
we know the pid itself can't go away (it was pinned in ftrace_pid_write).
The exiting task can detach itself from this pid at any moment.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: API change
The trace_seq and trace_entry are in trace_iterator, where there are
more fields that may be needed by tracers, so just pass the
tracer_iterator as is already the case for struct tracer->print_line.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: make trace_event more convenient for tracers
All tracers (for the moment) that use the struct trace_event want to
have the context info printed before their own output: the pid/cmdline,
cpu, and timestamp.
But some other tracers that want to implement their trace_event
callbacks will not necessary need these information or they may want to
format them as they want.
This patch adds a new default-enabled trace option:
TRACE_ITER_CONTEXT_INFO When disabled through:
echo nocontext-info > /debugfs/tracing/trace_options
The pid, cpu and timestamps headers will not be printed.
IE with the sched_switch tracer with context-info (default):
bash-2935 [001] 100.356561: 2935:120:S ==> [001] 0:140:R <idle>
<idle>-0 [000] 100.412804: 0:140:R + [000] 11:115:S events/0
<idle>-0 [000] 100.412816: 0:140:R ==> [000] 11:115:R events/0
events/0-11 [000] 100.412829: 11:115:S ==> [000] 0:140:R <idle>
Without context-info:
2935:120:S ==> [001] 0:140:R <idle>
0:140:R + [000] 11:115:S events/0
0:140:R ==> [000] 11:115:R events/0
11:115:S ==> [000] 0:140:R <idle>
A tracer can disable it at runtime by clearing the bit
TRACE_ITER_CONTEXT_INFO in trace_flags.
The print routines were renamed to trace_print_context and
trace_print_lat_context, so that they can be used by tracers if they
want to use them for one of the trace_event callbacks.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that we have a working ftrace=<tracer> function, make the boot
tracer get activated by it. This way we can turn it on or off without
recompiling the kernel, as well as keeping the selftests on. The
selftests are disabled whenever a default tracer starts running.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra started the functionality to start up a default
tracing at bootup. This patch finishes the work.
Now if you add 'ftrace=<tracer>' to the command line, when that tracer
is registered on bootup, that tracer is selected and starts tracing.
Note, all selftests for tracers that are registered after this tracer
is disabled. This prevents the selftests from disturbing the running
tracer, or the running tracer from disturbing the selftest.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: New way of using the blktrace infrastructure
This drops the requirement of userspace utilities to use the blktrace
facility.
Configuration is done thru sysfs, adding a "trace" directory to the
partition directory where blktrace can be enabled for the associated
request_queue.
The same filters present in the IOCTL interface are present as sysfs
device attributes.
The /sys/block/sdX/sdXN/trace/enable file allows tracing without any
filters.
The other files in this directory: pid, act_mask, start_lba and end_lba
can be used with the same meaning as with the IOCTL interface.
Using the sysfs interface will only setup the request_queue->blk_trace
fields, tracing will only take place when the "blk" tracer is selected
via the ftrace interface, as in the following example:
To see the trace, one can use the /d/tracing/trace file or the
/d/tracign/trace_pipe file, with semantics defined in the ftrace
documentation in Documentation/ftrace.txt.
[root@f10-1 ~]# cat /t/trace
kjournald-305 [000] 3046.491224: 8,1 A WBS 6367 + 8 <- (8,1) 6304
kjournald-305 [000] 3046.491227: 8,1 Q R 6367 + 8 [kjournald]
kjournald-305 [000] 3046.491236: 8,1 G RB 6367 + 8 [kjournald]
kjournald-305 [000] 3046.491239: 8,1 P NS [kjournald]
kjournald-305 [000] 3046.491242: 8,1 I RBS 6367 + 8 [kjournald]
kjournald-305 [000] 3046.491251: 8,1 D WB 6367 + 8 [kjournald]
kjournald-305 [000] 3046.491610: 8,1 U WS [kjournald] 1
<idle>-0 [000] 3046.511914: 8,1 C RS 6367 + 8 [6367]
[root@f10-1 ~]#
The default line context (prefix) format is the one described in the ftrace
documentation, with the blktrace specific bits using its existing format,
described in blkparse(8).
If one wants to have the classic blktrace formatting, this is possible by
using:
[root@f10-1 ~]# echo blk_classic > /t/trace_options
[root@f10-1 ~]# cat /t/trace
8,1 0 3046.491224 305 A WBS 6367 + 8 <- (8,1) 6304
8,1 0 3046.491227 305 Q R 6367 + 8 [kjournald]
8,1 0 3046.491236 305 G RB 6367 + 8 [kjournald]
8,1 0 3046.491239 305 P NS [kjournald]
8,1 0 3046.491242 305 I RBS 6367 + 8 [kjournald]
8,1 0 3046.491251 305 D WB 6367 + 8 [kjournald]
8,1 0 3046.491610 305 U WS [kjournald] 1
8,1 0 3046.511914 0 C RS 6367 + 8 [6367]
[root@f10-1 ~]#
Using the ftrace standard format allows more flexibility, such
as the ability of asking for backtraces via trace_options:
[root@f10-1 ~]# echo noblk_classic > /t/trace_options
[root@f10-1 ~]# echo stacktrace > /t/trace_options
[root@f10-1 ~]# cat /t/trace
kjournald-305 [000] 3318.826779: 8,1 A WBS 6375 + 8 <- (8,1) 6312
kjournald-305 [000] 3318.826782:
<= submit_bio
<= submit_bh
<= sync_dirty_buffer
<= journal_commit_transaction
<= kjournald
<= kthread
<= child_rip
kjournald-305 [000] 3318.826836: 8,1 Q R 6375 + 8 [kjournald]
kjournald-305 [000] 3318.826837:
<= generic_make_request
<= submit_bio
<= submit_bh
<= sync_dirty_buffer
<= journal_commit_transaction
<= kjournald
<= kthread
Please read the ftrace documentation to use aditional, standardized
tracing filters such as /d/tracing/trace_cpumask, etc.
See also /d/tracing/trace_mark to add comments in the trace stream,
that is equivalent to the /d/block/sdaN/msg interface.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix kmemtrace printk warnings:
kernel/trace/kmemtrace.c:142: warning: format '%4ld' expects type 'long int', but argument 3 has type 'size_t'
kernel/trace/kmemtrace.c:147: warning: format '%4ld' expects type 'long int', but argument 3 has type 'size_t'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch brings various bugfixes:
- Drop the first irrelevant task switch on the very beginning of a trace.
- Drop the OVERHEAD word from the headers, the DURATION word is sufficient
and will not overlap other columns.
- Make the headers fit well their respective columns whatever the
selected options.
Ie, default options:
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
1) 0.646 us | }
1) | mem_cgroup_del_lru_list() {
1) 0.624 us | lookup_page_cgroup();
1) 1.970 us | }
echo funcgraph-proc > trace_options
# tracer: function_graph
#
# CPU TASK/PID DURATION FUNCTION CALLS
# | | | | | | | | |
0) bash-2937 | 0.895 us | }
0) bash-2937 | 0.888 us | __rcu_read_unlock();
0) bash-2937 | 0.864 us | conv_uni_to_pc();
0) bash-2937 | 1.015 us | __rcu_read_lock();
echo nofuncgraph-cpu > trace_options
echo nofuncgraph-proc > trace_options
# tracer: function_graph
#
# DURATION FUNCTION CALLS
# | | | | | |
3.752 us | native_pud_val();
0.616 us | native_pud_val();
0.624 us | native_pmd_val();
About features, one can now disable the duration (this will hide the
overhead too for convenient reasons and because on doesn't need
overhead if it hasn't the duration):
echo nofuncgraph-duration > trace_options
# tracer: function_graph
#
# FUNCTION CALLS
# | | | |
cap_vm_enough_memory() {
__vm_enough_memory() {
vm_acct_memory();
}
}
}
And at last, an option to print the absolute time:
//Restart from default options
echo funcgraph-abstime > trace_options
# tracer: function_graph
#
# TIME CPU DURATION FUNCTION CALLS
# | | | | | | | |
261.339774 | 1) + 42.823 us | }
261.339775 | 1) 1.045 us | _spin_lock_irq();
261.339777 | 1) 0.940 us | _spin_lock_irqsave();
261.339778 | 1) 0.752 us | _spin_unlock_irqrestore();
261.339780 | 1) 0.857 us | _spin_unlock_irq();
261.339782 | 1) | flush_to_ldisc() {
261.339783 | 1) | tty_ldisc_ref() {
261.339783 | 1) | tty_ldisc_try() {
261.339784 | 1) 1.075 us | _spin_lock_irqsave();
261.339786 | 1) 0.842 us | _spin_unlock_irqrestore();
261.339788 | 1) 4.211 us | }
261.339788 | 1) 5.662 us | }
The format is seconds.usecs.
I guess no one needs the nanosec precision here, the main goal is to have
an overview about the general timings of events, and to see the place when
the trace switches from one cpu to another.
ie:
274.874760 | 1) 0.676 us | _spin_unlock();
274.874762 | 1) 0.609 us | native_load_sp0();
274.874763 | 1) 0.602 us | native_load_tls();
274.878739 | 0) 0.722 us | }
274.878740 | 0) 0.714 us | native_pmd_val();
274.878741 | 0) 0.730 us | native_pmd_val();
Here there is a 4000 usecs difference when we switch the cpu.
Changes in V2:
- Completely fix the first pointless task switch.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The logic in the tracing_start/stop code prevents the WARN_ON
from ever detecting if a start/stop pair was mismatched.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup of duplicate features
The trace output disables the ring buffer and prevents tracing to
occur. The code in irqsoff to do the same thing is no longer needed.
This patch removes it.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix bad times of recent resets
The ring buffer needs to reset its timestamps when reseting of the
buffer, otherwise the timestamps are stale and might be used to
calculate times in the buffer causing funny timestamps to appear.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix bad times of recent resets
The ring buffer needs to reset its timestamps when reseting of the
buffer, otherwise the timestamps are stale and might be used to
calculate times in the buffer causing funny timestamps to appear.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: better data for wakeup tracer
This patch adds the wakeup and schedule calls that are used by
the scheduler tracer to make the wakeup tracer more readable.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: add option to trace all tasks or just RT tasks
The current wakeup tracer only traces RT task wakeups. This is
fine for those interested in wake up timings of RT tasks, but
it is useless for those that are interested in the causes
of long wakeups for non RT tasks.
This patch creates a "wakeup_rt" to implement the tracing of just
RT tasks (as the current "wakeup" does). And makes "wakeup" now
trace all tasks as an average developer would expect.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If the ring buffer recording has been disabled. Do not let
swapping of ring buffers occur. Simply return -EAGAIN.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix to erased trace output
To try not to have the outputing of a trace interfere with the wakeup
tracer, it would disable tracing while the output was printing. But
if a trace had started when it was disabled, it can show a partial
trace. To try to solve this, on closing of the tracer, it would
clear the trace buffer.
The latency tracers (wakeup and irqsoff) have two buffers. One for
recording and one for holding the max trace that is printed. The
clearing of the trace above should only affect the recording buffer.
But for some reason it would move the erased trace to the print
buffer. Probably due to a race with the closing of the trace and
the saving ofhe max race.
The above is all pretty useless, and if the user does not want the
printing of the trace to be traced itself, then the user can manual
disable tracing. This patch removes all the code that tries to keep
the output of the tracer from modifying the trace.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: trace max latencies on start of latency tracing
This patch sets the max latency to zero whenever one of the
irq variant tracers or the wakeup tracer is set to current tracer.
Most developers expect to see output when starting up a latency
tracer. But since the max_latency is already set to max, and
it takes a latency greater than max_latency to be recorded, there
is no trace. This is not the expected behavior and has even confused
myself.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: limit ftrace dump output
Currently ftrace_dump only calls ftrace_kill that is a fast way
to prevent the function tracer functions from being called (just sets
a flag and clears the function to call, nothing else). It is better
to also turn off any recording to the ring buffers as well.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix to print out ftrace_dump when expected
I was debugging a hard race condition to only find out that
after I hit the race, my log level was not at level to show
KERN_INFO. The time it took to trigger the race was wasted because
I did not capture the trace.
Since ftrace_dump is only called from kernel oops (and only when
it is set in the kernel command line to do so), or when a
developer adds it to their own local tree, the log level of
the print should be at KERN_EMERG to make sure the print appears.
ftrace_dump is not called by a normal user setup, and will not
add extra unwanted print out to the console. There is no reason
it should be at KERN_INFO.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: reset struct buffer_page.write when interrupt storm
if struct buffer_page.write is not reset, any succedent committing
will corrupted ring_buffer:
static inline void
rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
{
......
cpu_buffer->commit_page->commit =
cpu_buffer->commit_page->write;
......
}
when "if (RB_WARN_ON(cpu_buffer, next_page == reader_page))", ring_buffer
is disabled, but some reserved buffers may haven't been committed.
we need reset struct buffer_page.write.
when "if (unlikely(next_page == cpu_buffer->commit_page))", ring_buffer
is still available, we should not corrupt it.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix a crash while kernel image restore
When the function graph tracer is running and while suspend to disk, some racy
and dangerous things happen against this tracer.
The current task will save its registers including the stack pointer which
contains the return address hooked by the tracer. But the current task will
continue to enter other functions after that to save the memory, and then
it will store other return addresses, and finally loose the old depth which
matches the return address saved in the old stack (during the registers saving).
So on image restore, the code will return to wrong addresses.
And there are other things: on restore, the task will have it's "current"
pointer overwritten during registers restoring....switching from one task to
another... That would be insane to try to trace function graphs at these
stages.
This patch makes the function graph tracer listening on power events, making
it's tracing disabled for the current task (the one that performs the
hibernation work) while suspend/resume to disk, making the tracing safe
during hibernation.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix to allow some archs to use the ring buffer
Commits in the ring buffer are checked by pointer arithmetic.
If the calculation is incorrect, then the commits will never take
place and the buffer will simply fill up and report an error.
Each page in the ring buffer has a small header:
struct buffer_data_page {
u64 time_stamp;
local_t commit;
unsigned char data[];
};
Unfortuntely, some of the calculations used sizeof(struct buffer_data_page)
to know the size of the header. But this is incorrect on some archs,
where sizeof(struct buffer_data_page) does not equal
offsetof(struct buffer_data_page, data), and on those archs, the commits
are never processed.
This patch replaces the sizeof with offsetof.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: use percpu data instead of a global structure
Use:
static DEFINE_PER_CPU(struct workqueue_global_stats, all_workqueue_stat);
instead of allocating a global structure.
percpu data also works well on NUMA.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Change the hw-branch-tracer format to be more readable.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reset the ftrace buffer on close. Since we use cyclic buffers, the
trace is not contiguous, anyway.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>