Commit Graph

6914 Commits

Author SHA1 Message Date
Steven Rostedt
17c873ec28 tracing/events: add export symbols for trace events in modules
Impact: let modules add trace events

The trace event code requires some functions to be exported to allow
modules to use TRACE_EVENT. This patch adds EXPORT_SYMBOL_GPL to the
necessary functions.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-14 12:58:01 -04:00
Steven Rostedt
a59fd60272 tracing/events: convert event call sites to use a link list
Impact: makes it possible to define events in modules

The events are created by reading down the section that they are linked
in by the macros. But this is not scalable to modules. This patch converts
the manipulations to use a global link list, and on boot up it adds
the items in the section to the list.

This change will allow modules to add their tracing events to the list as
well.

Note, this change alone does not permit modules to use the TRACE_EVENT macros,
but the change is needed for them to eventually do so.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-14 12:58:00 -04:00
Steven Rostedt
f42c85e74f tracing/events: move the ftrace event tracing code to core
This patch moves the ftrace creation into include/trace/ftrace.h and
simplifies the work of developers in adding new tracepoints.
Just the act of creating the trace points in include/trace and including
define_trace.h will create the events in the debugfs/tracing/events
directory.

This patch removes the need of include/trace/trace_events.h

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-14 12:57:59 -04:00
Steven Rostedt
97f2025153 tracing/events: move declarations from trace directory to core include
In preparation to allowing trace events to happen in modules, we need
to move some of the local declarations in the kernel/trace directory
into include/linux.

This patch simply moves the declarations and performs no context changes.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-14 12:57:58 -04:00
Steven Rostedt
9504504cba tracing: make trace_seq operations available for core kernel
In the process to make TRACE_EVENT macro work for modules, the trace_seq
operations must be available for core kernel code.

These operations are quite useful and can be used for other implementations.

The main idea is that we create a trace_seq handle that acts very much
like the seq_file handle.

	struct trace_seq *s = kmalloc(sizeof(*s, GFP_KERNEL);

	trace_seq_init(s);
	trace_seq_printf(s, "some data %d\n", variable);

	printk("%s", s->buffer);

The main use is to allow a top level function call several other functions
that may store printf like data into the buffer. Then at the end, the top
level function can process all the data with any method it would like to.
It could be passed to userspace, output via printk or even use seq_file:

	trace_seq_to_user(s, ubuf, cnt);
	seq_puts(m, s->buffer);

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-14 12:57:57 -04:00
Steven Rostedt
a8d154b009 tracing: create automated trace defines
This patch lowers the number of places a developer must modify to add
new tracepoints. The current method to add a new tracepoint
into an existing system is to write the trace point macro in the
trace header with one of the macros TRACE_EVENT, TRACE_FORMAT or
DECLARE_TRACE, then they must add the same named item into the C file
with the macro DEFINE_TRACE(name) and then add the trace point.

This change cuts out the needing to add the DEFINE_TRACE(name).
Every file that uses the tracepoint must still include the trace/<type>.h
file, but the one C file must also add a define before the including
of that file.

 #define CREATE_TRACE_POINTS
 #include <trace/mytrace.h>

This will cause the trace/mytrace.h file to also produce the C code
necessary to implement the trace point.

Note, if more than one trace/<type>.h is used to create the C code
it is best to list them all together.

 #define CREATE_TRACE_POINTS
 #include <trace/foo.h>
 #include <trace/bar.h>
 #include <trace/fido.h>

Thanks to Mathieu Desnoyers and Christoph Hellwig for coming up with
the cleaner solution of the define above the includes over my first
design to have the C code include a "special" header.

This patch converts sched, irq and lockdep and skb to use this new
method.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-14 12:57:28 -04:00
Steven Rostedt
ea20d9293c tracing: consolidate trace and trace_event headers
Impact: clean up

Neil Horman (et. al.) criticized the way the trace events were broken up
into two files. The reason for that was that ftrace needed to separate out
the declarations from where the #include <linux/tracepoint.h> was used.
It then dawned on me that the tracepoint.h header only needs to define the
TRACE_EVENT macro if it is not already defined.

The solution is simply to test if TRACE_EVENT is defined, and if it is not
then the linux/tracepoint.h header can define it. This change consolidates
all the <traces>.h and <traces>_event_types.h into the <traces>.h file.

Reported-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Theodore Tso <tytso@mit.edu>
Reported-by: Jiaying Zhang <jiayingz@google.com>
Cc: Zhaolei <zhaolei@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-14 09:43:40 -04:00
Tom Zanussi
0a19e53c15 tracing/filters: allow on-the-fly filter switching
This patch allows event filters to be safely removed or switched
on-the-fly while avoiding the use of rcu or the suspension of tracing of
previous versions.

It does it by adding a new filter_pred_none() predicate function which
does nothing and by never deallocating either the predicates or any of
the filter_pred members used in matching; the predicate lists are
allocated and initialized during ftrace_event_calls initialization.

Whenever a filter is removed or replaced, the filter_pred_* functions
currently in use by the affected ftrace_event_call are immediately
switched over to to the filter_pred_none() function, while the rest of
the filter_pred members are left intact, allowing any currently
executing filter_pred_* functions to finish up, using the values they're
currently using.

In the case of filter replacement, the new predicate values are copied
into the old predicates after the above step, and the filter_pred_none()
functions are replaced by the filter_pred_* functions for the new
filter.  In this case, it is possible though very unlikely that a
previous filter_pred_* is still running even after the
filter_pred_none() switch and the switch to the new filter_pred_*.  In
that case, however, because nothing has been deallocated in the
filter_pred, the worst that can happen is that the old filter_pred_*
function sees the new values and as a result produces either a false
positive or a false negative, depending on the values it finds.

So one downside to this method is that rarely, it can produce a bad
match during the filter switch, but it should be possible to live with
that, IMHO.

The other downside is that at least in this patch the predicate lists
are always pre-allocated, taking up memory from the start.  They could
probably be allocated on first-use, and de-allocated when tracing is
completely stopped - if this patch makes sense, I could create another
one to do that later on.

Oh, and it also places a restriction on the size of __arrays in events,
currently set to 128, since they can't be larger than the now embedded
str_val arrays in the filter_pred struct.

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: paulmck@linux.vnet.ibm.com
LKML-Reference: <1239610670.6660.49.camel@tropicana>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 00:03:55 +02:00
Ingo Molnar
b5c851a88a Merge branch 'linus' into tracing/core
Merge reason: merge latest tracing fixes to avoid conflicts in
              kernel/trace/trace_events_filter.c with upcoming change

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 00:02:22 +02:00
Tom Zanussi
eb02ce017d tracing/filters: use ring_buffer_discard_commit() in filter_check_discard()
This patch changes filter_check_discard() to make use of the new
ring_buffer_discard_commit() function and modifies the current users to
call the old commit function in the non-discard case.

It also introduces a version of filter_check_discard() that uses the
global trace buffer (filter_current_check_discard()) for those cases.

v2 changes:

- fix compile error noticed by Ingo Molnar

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: fweisbec@gmail.com
LKML-Reference: <1239178554.10295.36.camel@tropicana>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 00:00:56 +02:00
Tom Zanussi
5f77a88b3f tracing/infrastructure: separate event tracer from event support
Add a new config option, CONFIG_EVENT_TRACING that gets selected
when CONFIG_TRACING is selected and adds everything needed by the stuff
in trace_export - basically all the event tracing support needed by e.g.
bprint, minus the actual events, which are only included if
CONFIG_EVENT_TRACER is selected.

So CONFIG_EVENT_TRACER can be used to turn on or off the generated events
(what I think of as the 'event tracer'), while CONFIG_EVENT_TRACING turns
on or off the base event tracing support used by both the event tracer and
the other things such as bprint that can't be configured out.

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: fweisbec@gmail.com
LKML-Reference: <1239178441.10295.34.camel@tropicana>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 00:00:55 +02:00
Steven Rostedt
77d9f465d4 tracing/filters: use ring_buffer_discard_commit for discarded events
The ring_buffer_discard_commit makes better usage of the ring_buffer
when an event has been discarded. It tries to remove it completely if
possible.

This patch converts the trace event filtering to use
ring_buffer_discard_commit instead of the ring_buffer_event_discard.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 00:00:54 +02:00
Steven Rostedt
fa1b47dd85 ring-buffer: add ring_buffer_discard_commit
The ring_buffer_discard_commit is similar to ring_buffer_event_discard
but it can only be done on an event that has yet to be commited.
Unpredictable results can happen otherwise.

The main difference between ring_buffer_discard_commit and
ring_buffer_event_discard is that ring_buffer_discard_commit will try
to free the data in the ring buffer if nothing has addded data
after the reserved event. If something did, then it acts almost the
same as ring_buffer_event_discard followed by a
ring_buffer_unlock_commit.

Note, either ring_buffer_commit_discard and ring_buffer_unlock_commit
can be called on an event, not both.

This commit also exports both discard functions to be usable by
GPL modules.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 00:00:53 +02:00
Tom Zanussi
e45f2e2bd2 tracing/filters: add TRACE_EVENT_FORMAT_NOFILTER event macro
Frederic Weisbecker suggested that the trace_special event shouldn't be
filterable; this patch adds a TRACE_EVENT_FORMAT_NOFILTER event macro
that allows an event format to be exported without having a filter
attached, and removes filtering from the trace_special event.

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 00:00:51 +02:00
Tom Zanussi
e1112b4d96 tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events
This patch adds run-time field descriptions to all the event formats
exported using TRACE_EVENT_FORMAT.  It also hooks up all the tracers
that use them (i.e. the tracers in the 'ftrace subsystem') so they can
also have their output filtered by the event-filtering mechanism.

When I was testing this, there were a couple of things that fooled me
into thinking the filters weren't working, when actually they were -
I'll mention them here so others don't make the same mistakes (and file
bug reports. ;-)

One is that some of the tracers trace multiple events e.g. the
sched_switch tracer uses the context_switch and wakeup events, and if
you don't set filters on all of the traced events, the unfiltered output
from the events without filters on them can make it look like the
filtering as a whole isn't working properly, when actually it is doing
what it was asked to do - it just wasn't asked to do the right thing.

The other is that for the really high-volume tracers e.g. the function
tracer, the volume of filtered events can be so high that it pushes the
unfiltered events out of the ring buffer before they can be read so e.g.
cat'ing the trace file repeatedly shows either no output, or once in
awhile some output but that isn't there the next time you read the
trace, which isn't what you normally expect when reading the trace file.
If you read from the trace_pipe file though, you can catch them before
they disappear.

Changes from v1:

As suggested by Frederic Weisbecker:

- get rid of externs in functions
- added unlikely() to filter_check_discard()

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 00:00:50 +02:00
Rafael J. Wysocki
c751085943 PM/Hibernate: Wait for SCSI devices scan to complete during resume
There is a race between resume from hibernation and the asynchronous
scanning of SCSI devices and to prevent it from happening we need to
call scsi_complete_async_scans() during resume from hibernation.

In addition, if the resume from hibernation is userland-driven, it's
better to wait for all device probes in the kernel to complete before
attempting to open the resume device.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-13 11:37:07 -07:00
Linus Torvalds
8255309b88 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing/filters: return proper error code when writing filter file
  tracing/filters: allow user input integer to be oct or hex
  tracing/filters: fix NULL pointer dereference
  tracing/filters: NIL-terminate user input filter
  ftrace: Output REC->var instead of __entry->var for trace format
  Make __stringify support variable argument macros too
  tracing: fix document references
  tracing: fix splice return too large
  tracing: update file->f_pos when splice(2) it
  tracing: allocate page when needed
  tracing: disable seeking for trace_pipe_raw
2009-04-13 11:31:28 -07:00
Linus Torvalds
bf20753c0c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  lockdep: continue lock debugging despite some taints
  lockdep: warn about lockdep disabling after kernel taint
2009-04-13 11:30:26 -07:00
Linus Torvalds
d811f236d9 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  percpu: unbreak alpha percpu
  mutex: have non-spinning mutexes on s390 by default
2009-04-13 08:22:43 -07:00
Frederic Weisbecker
574bbe7820 lockdep: continue lock debugging despite some taints
Impact: broaden lockdep checks

Lockdep is disabled after any kernel taints. This might be convenient
to ignore bad locking issues which sources come from outside the kernel
tree. Nevertheless, it might be a frustrating experience for the
staging developers or those who experience a warning but are focused
on another things that require lockdep.

The v2 of this patch simply don't disable anymore lockdep in case
of TAINT_CRAP and TAINT_WARN events.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: LTP <ltp-list@lists.sourceforge.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg KH <gregkh@suse.de>
LKML-Reference: <1239412638-6739-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12 16:10:52 +02:00
Frederic Weisbecker
9eeba6138c lockdep: warn about lockdep disabling after kernel taint
Impact: provide useful missing info for developers

Kernel taint can occur in several situations such as warnings,
load of prorietary or staging modules, bad page, etc...

But when such taint happens, a developer might still be working on
the kernel, expecting that lockdep is still enabled. But a taint
disables lockdep without ever warning about it.
Such a kernel behaviour doesn't really help for kernel development.

This patch adds this missing warning.

Since the taint is done most of the time after the main message that
explain the real source issue, it seems safe to warn about it inside
add_taint() so that it appears at last, without hurting the main
information.

v2: Use a generic helper to disable lockdep instead of an
    open coded xchg().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1239412638-6739-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12 16:10:51 +02:00
Li Zefan
66de7792c0 blktrace: fix output of BLK_TC_PC events
BLK_TC_PC events should be treated differently with BLK_TC_FS events.

Before this patch:

 # echo 1 > /sys/block/sda/sda1/trace/enable
 # echo pc > /sys/block/sda/sda1/trace/act_mask
 # echo blk > /debugfs/tracing/current_tracer
 # (generate some BLK_TC_PC events)
 # cat trace
        bash-2184  [000]  1774.275413:   8,7    I   N [bash]
        bash-2184  [000]  1774.275435:   8,7    D   N [bash]
        bash-2184  [000]  1774.275540:   8,7    I   R [bash]
        bash-2184  [000]  1774.275547:   8,7    D   R [bash]
 ksoftirqd/0-4     [000]  1774.275580:   8,7    C   N 0 [0]
        bash-2184  [000]  1774.275648:   8,7    I   R [bash]
        bash-2184  [000]  1774.275653:   8,7    D   R [bash]
 ksoftirqd/0-4     [000]  1774.275682:   8,7    C   N 0 [0]
        bash-2184  [000]  1774.275739:   8,7    I   R [bash]
        bash-2184  [000]  1774.275744:   8,7    D   R [bash]
 ksoftirqd/0-4     [000]  1774.275771:   8,7    C   N 0 [0]
        bash-2184  [000]  1774.275804:   8,7    I   R [bash]
        bash-2184  [000]  1774.275808:   8,7    D   R [bash]
 ksoftirqd/0-4     [000]  1774.275836:   8,7    C   N 0 [0]

After this patch:

 # cat trace
        bash-2263  [000]   366.782149:   8,7    I   N 0 (00 ..) [bash]
        bash-2263  [000]   366.782323:   8,7    D   N 0 (00 ..) [bash]
        bash-2263  [000]   366.782557:   8,7    I   R 8 (25 00 ..) [bash]
        bash-2263  [000]   366.782560:   8,7    D   R 8 (25 00 ..) [bash]
 ksoftirqd/0-4     [000]   366.782582:   8,7    C   N (25 00 ..) [0]
        bash-2263  [000]   366.782648:   8,7    I   R 8 (5a 00 3f 00) [bash]
        bash-2263  [000]   366.782650:   8,7    D   R 8 (5a 00 3f 00) [bash]
 ksoftirqd/0-4     [000]   366.782669:   8,7    C   N (5a 00 3f 00) [0]
        bash-2263  [000]   366.782710:   8,7    I   R 8 (5a 00 08 00) [bash]
        bash-2263  [000]   366.782713:   8,7    D   R 8 (5a 00 08 00) [bash]
 ksoftirqd/0-4     [000]   366.782730:   8,7    C   N (5a 00 08 00) [0]
        bash-2263  [000]   366.783375:   8,7    I   R 36 (5a 00 08 00) [bash]
        bash-2263  [000]   366.783379:   8,7    D   R 36 (5a 00 08 00) [bash]
 ksoftirqd/0-4     [000]   366.783404:   8,7    C   N (5a 00 08 00) [0]

This is what we do with PC events in user-space blktrace.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <49D32387.9040106@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12 15:32:46 +02:00
Li Zefan
b78825d608 blktrace: fix output of unknown events
Not all events are pc (packet command) events. An event is a pc
event only if it has BLK_TC_PC bit set.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <49D3236D.3090705@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12 15:32:45 +02:00
Zhaolei
02af61bb50 tracing, kmemtrace: Separate include/trace/kmemtrace.h to kmemtrace part and tracepoint part
Impact: refactor code for future changes

Current kmemtrace.h is used both as header file of kmemtrace and kmem's
tracepoints definition.

Tracepoints' definition file may be used by other code, and should only have
definition of tracepoint.

We can separate include/trace/kmemtrace.h into 2 files:

  include/linux/kmemtrace.h: header file for kmemtrace
  include/trace/kmem.h:      definition of kmem tracepoints

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <49DEE68A.5040902@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12 15:22:55 +02:00
Li Zefan
44e9c8b7ad tracing/filters: return proper error code when writing filter file
- propagate return value of filter_add_pred() to the user

- return -ENOSPC but not -ENOMEM or -EINVAL when the filter array
  is full

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <49E04CF0.3010105@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12 11:59:29 +02:00
Li Zefan
a3e0ab0507 tracing/filters: allow user input integer to be oct or hex
Before patch:

 # echo 'parent_pid == 0x10' > events/sched/sched_process_fork/filter
 # cat sched/sched_process_fork/filter
 parent_pid == 0

After patch:

 # cat sched/sched_process_fork/filter
 parent_pid == 16

Also check the input more strictly.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <49E04C53.4010600@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12 11:59:28 +02:00
Li Zefan
bcabd91c27 tracing/filters: fix NULL pointer dereference
Try this, and you'll see NULL pointer dereference bug:

  # echo -n 'parent_comm ==' > sched/sched_process_fork/filter

Because we passed NULL ptr to simple_strtoull().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <49E04C43.1050504@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12 11:59:28 +02:00
Li Zefan
8433a40eb7 tracing/filters: NIL-terminate user input filter
Make sure messages from user space are NIL-terminated strings,
otherwise we could dump random memory while reading filter file.

Try this:
 # echo 'parent_comm ==' > events/sched/sched_process_fork/filter
 # cat events/sched/sched_process_fork/filter
 parent_comm == �

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <49E04C32.6060508@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12 11:59:27 +02:00
Linus Torvalds
d6de2c80e9 async: Fix module loading async-work regression
Several drivers use asynchronous work to do device discovery, and we
synchronize with them in the compiled-in case before we actually try to
mount root filesystems etc.

However, when compiled as modules, that synchronization is missing - the
module loading completes, but the driver hasn't actually finished
probing for devices, and that means that any user mode that expects to
use the devices after the 'insmod' is now potentially broken.

We already saw one case of a similar issue in the ACPI battery code,
where the kernel itself expected the module to be all done, and unmapped
the init memory - but the async device discovery was still running.
That got hacked around by just removing the "__init" (see commit
5d38258ec0 "ACPI battery: fix async boot
oops"), but the real fix is to just make the module loading wait for all
async work to be completed.

It will slow down module loading, but since common devices should be
built in anyway, and since the bug is really annoying and hard to handle
from user space (and caused several S3 resume regressions), the simple
fix to wait is the right one.

This fixes at least

	http://bugzilla.kernel.org/show_bug.cgi?id=13063

but probably a few other bugzilla entries too (12936, for example), and
is confirmed to fix Rafael's storage driver breakage after resume bug
report (no bugzilla entry).

We should also be able to now revert that ACPI battery fix.

Reported-and-tested-by: Rafael J. Wysocki <rjw@suse.com>
Tested-by: Heinz Diehl <htd@fancy-poultry.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-11 12:44:49 -07:00
Zhaolei
0462b5664b ftrace: Output REC->var instead of __entry->var for trace format
print fmt: "irq=%d return=%s", __entry->irq, __entry->ret ? \"handled\" : \"unhandled\"

"__entry" should be convert to "REC" by __stringify() macro.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <49DC679D.2090901@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10 15:48:53 +02:00
Li Zefan
4d1f4372db tracing: fix document references
When moving documents to Documentation/trace/, I forgot to
grep Kconfig to find out those references.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Pekka Paalanen <pq@iki.fi>
Cc: eduard.munteanu@linux360.ro
LKML-Reference: <49DE97EF.7080208@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10 13:08:50 +02:00
Frederic Weisbecker
2062501ae6 tracing/lockdep: report the time waited for a lock
While trying to optimize the new lock on reiserfs to replace
the bkl, I find the lock tracing very useful though it lacks
something important for performance (and latency) instrumentation:
the time a task waits for a lock.

That's what this patch implements:

  bash-4816  [000]   202.652815: lock_contended: lock_contended: &sb->s_type->i_mutex_key
  bash-4816  [000]   202.652819: lock_acquired: &rq->lock (0.000 us)
 <...>-4787  [000]   202.652825: lock_acquired: &rq->lock (0.000 us)
 <...>-4787  [000]   202.652829: lock_acquired: &rq->lock (0.000 us)
  bash-4816  [000]   202.652833: lock_acquired: &sb->s_type->i_mutex_key (16.005 us)

As shown above, the "lock acquired" field is followed by the time
it has been waiting for the lock. Usually, a lock contended entry
is followed by a near lock_acquired entry with a non-zero time waited.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1238975373-15739-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10 12:50:56 +02:00
Ingo Molnar
1cad1252ed Merge branch 'tracing/urgent' into tracing/core
Merge reason: pick up both v2.6.30-rc1 [which includes tracing/urgent fixes]
              and pick up the current lineup of tracing/urgent fixes as well

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10 12:46:51 +02:00
Lai Jiangshan
93cfb3c9fd tracing: fix splice return too large
I got these from strace:

 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 16384
 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192

I wanted to splice_read 4096 bytes, but it returns 8192 or larger.

It is because the return value of tracing_buffers_splice_read()
does not include "zero out any left over data" bytes.

But tracing_buffers_read() includes these bytes, we make them
consistent.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D46674.9030804@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10 12:44:46 +02:00
Lai Jiangshan
c7625a555f tracing: update file->f_pos when splice(2) it
Impact: Cleanup

These two lines:

	if (unlikely(*ppos))
		return -ESPIPE;

in tracing_buffers_splice_read() are not needed, VFS layer
has disabled seek(2).

We remove these two lines, and then we can update file->f_pos.

And tracing_buffers_read() updates file->f_pos, this fix
make tracing_buffers_splice_read() updates file->f_pos too.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D46670.4010503@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10 12:44:44 +02:00
Lai Jiangshan
ddd538f3e6 tracing: allocate page when needed
Impact: Cleanup

Sometimes, we open trace_pipe_raw, but we don't read(2) it,
we just splice(2) it, thus, the page is not used.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D4666B.4010608@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10 12:44:43 +02:00
Lai Jiangshan
d1e7e02f30 tracing: disable seeking for trace_pipe_raw
Impact: disable pread()

We set tracing_buffers_fops.llseek to no_llseek,
but we can still perform pread() to read this file.

That is not expected.

This fix uses nonseekable_open() to disable it.

tracing_buffers_fops.llseek is still set to no_llseek,
it mark this file is a "non-seekable device" and is used by
sys_splice(). See also do_splice() or manual of splice(2):

ERRORS
       EINVAL Target file system doesn't support  splicing;
              neither  of the descriptors refers to a pipe;
              or offset given for non-seekable device.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D46668.8030806@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10 12:44:42 +02:00
Linus Torvalds
c2ea122cd7 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: consolidate documents
  blktrace: pass the right pointer to kfree()
  tracing/syscalls: use a dedicated file header
  tracing: append a comma to INIT_FTRACE_GRAPH
2009-04-09 10:37:46 -07:00
Linus Torvalds
17b2e9bf27 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: do not count frozen tasks toward load
  sched: refresh MAINTAINERS entry
  sched: Print sched_group::__cpu_power in sched_domain_debug
  cpuacct: add per-cgroup utime/stime statistics
  posixtimers, sched: Fix posix clock monotonicity
  sched_rt: don't allocate cpumask in fastpath
  cpuacct: make cpuacct hierarchy walk in cpuacct_charge() safe when rcupreempt is used -v2
2009-04-09 10:37:28 -07:00
Linus Torvalds
422a253483 Merge branches 'core-fixes-for-linus', 'irq-fixes-for-linus' and 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  printk: fix wrong format string iter for printk
  futex: comment requeue key reference semantics

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  irq: fix cpumask memory leak on offstack cpumask kernels

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  posix-timers: fix RLIMIT_CPU && setitimer(CPUCLOCK_PROF)
  posix-timers: fix RLIMIT_CPU && fork()
  timers: add missing kernel-doc
2009-04-09 10:35:30 -07:00
Heiko Carstens
36cd3c9f92 mutex: have non-spinning mutexes on s390 by default
Impact: performance regression fix for s390

The adaptive spinning mutexes will not always do what one would expect on
virtualized architectures like s390. Especially the cpu_relax() loop in
mutex_spin_on_owner might hurt if the mutex holding cpu has been scheduled
away by the hypervisor.

We would end up in a cpu_relax() loop when there is no chance that the
state of the mutex changes until the target cpu has been scheduled again by
the hypervisor.

For that reason we should change the default behaviour to no-spin on s390.

We do have an instruction which allows to yield the current cpu in favour of
a different target cpu. Also we have an instruction which allows us to figure
out if the target cpu is physically backed.

However we need to do some performance tests until we can come up with
a solution that will do the right thing on s390.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
LKML-Reference: <20090409184834.7a0df7b2@osiris.boeblingen.de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-09 19:28:24 +02:00
Li Zefan
9eb85125ce blktrace: pass the right pointer to kfree()
Impact: fix kfree crash with non-standard act_mask string

If passing a string with leading white spaces to strstrip(),
the returned ptr != the original ptr.

This bug was introduced by me.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <49DD694C.8020902@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-09 05:52:40 +02:00
Frederic Weisbecker
47788c58e6 tracing/syscalls: use a dedicated file header
Impact: fix build warnings and possibe compat misbehavior on IA64

Building a kernel on ia64 might trigger these ugly build warnings:

CC      arch/ia64/ia32/sys_ia32.o
In file included from arch/ia64/ia32/sys_ia32.c:55:
arch/ia64/ia32/ia32priv.h:290:1: warning: "elf_check_arch" redefined
In file included from include/linux/elf.h:7,
                 from include/linux/module.h:14,
                 from include/linux/ftrace.h:8,
                 from include/linux/syscalls.h:68,
                 from arch/ia64/ia32/sys_ia32.c:18:
arch/ia64/include/asm/elf.h:19:1: warning: this is the location of the previous definition
[...]

sys_ia32.c includes linux/syscalls.h which in turn includes linux/ftrace.h
to import the syscalls tracing prototypes.

But including ftrace.h can pull too much things for a low level file,
especially on ia64 where the ia32 private headers conflict with higher
level headers.

Now we isolate the syscall tracing headers in their own lightweight file.

Reported-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jason Baron <jbaron@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Michael Davidson <md@google.com>
LKML-Reference: <20090408184058.GB6017@nowhere>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-09 05:43:32 +02:00
Andrew Morton
6b44003e5c work_on_cpu(): rewrite it to create a kernel thread on demand
Impact: circular locking bugfix

The various implemetnations and proposed implemetnations of work_on_cpu()
are vulnerable to various deadlocks because they all used queues of some
form.

Unrelated pieces of kernel code thus gained dependencies wherein if one
work_on_cpu() caller holds a lock which some other work_on_cpu() callback
also takes, the kernel could rarely deadlock.

Fix this by creating a short-lived kernel thread for each work_on_cpu()
invokation.

This is not terribly fast, but the only current caller of work_on_cpu() is
pci_call_probe().

It would be nice to find some other way of doing the node-local
allocations in the PCI probe code so that we can zap work_on_cpu()
altogether.  The code there is rather nasty.  I can't think of anything
simple at this time...

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-04-09 09:50:37 +09:30
Oleg Nesterov
1c99315bb3 kthread: move sched-realeted initialization from kthreadd context
kthreadd is the single thread which implements ths "create" request, move
sched_setscheduler/etc from create_kthread() to kthread_create() to
improve the scalability.

We should be careful with sched_setscheduler(), use _nochek helper.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Vitaliy Gusev <vgusev@openvz.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-04-09 09:50:37 +09:30
Vitaliy Gusev
3217ab97f1 kthread: Don't looking for a task in create_kthread() #2
Remove the unnecessary find_task_by_pid_ns(). kthread() can just
use "current" to get the same result.

Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-04-09 09:50:36 +09:30
Roland McGrath
3a70970353 ptrace: some checkpatch fixes
This fixes all the checkpatch --file complaints about kernel/ptrace.c
and also removes an unused #include.  I've verified that there are no
changes to the compiled code on x86_64.

Signed-off-by: Roland McGrath <roland@redhat.com>
[ Removed the parts that just split a line  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-08 10:21:44 -07:00
Oleg Nesterov
8f2e586567 posix-timers: fix RLIMIT_CPU && setitimer(CPUCLOCK_PROF)
update_rlimit_cpu() tries to optimize out set_process_cpu_timer() in case
when we already have CPUCLOCK_PROF timer which should expire first. But it
uses cputime_lt() instead of cputime_gt().

Test case:

	int main(void)
	{
		struct itimerval it = {
			.it_value = { .tv_sec = 1000 },
		};

		assert(!setitimer(ITIMER_PROF, &it, NULL));

		struct rlimit rl = {
			.rlim_cur = 1,
			.rlim_max = 1,
		};

		assert(!setrlimit(RLIMIT_CPU, &rl));

		for (;;)
			;

		return 0;
	}

Without this patch, the task is not killed as RLIMIT_CPU demands.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Peter Lojkin <ia6432@inbox.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: stable@kernel.org
LKML-Reference: <20090327000610.GA10108@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08 17:51:39 +02:00
Oleg Nesterov
6279a751fe posix-timers: fix RLIMIT_CPU && fork()
See http://bugzilla.kernel.org/show_bug.cgi?id=12911

copy_signal() copies signal->rlim, but RLIMIT_CPU is "lost". Because
posix_cpu_timers_init_group() sets cputime_expires.prof_exp = 0 and thus
fastpath_timer_check() returns false unless we have other expired cpu timers.

Change copy_signal() to set cputime_expires.prof_exp if we have RLIMIT_CPU.
Also, set cputimer.running = 1 in that case. This is not strictly necessary,
but imho makes sense.

Reported-by: Peter Lojkin <ia6432@inbox.ru>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Peter Lojkin <ia6432@inbox.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: stable@kernel.org
LKML-Reference: <20090327000607.GA10104@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08 17:51:38 +02:00
Ingo Molnar
5af8c4e0fa Merge commit 'v2.6.30-rc1' into sched/urgent
Merge reason: update to latest upstream to queue up fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08 17:26:00 +02:00
Ingo Molnar
ff96e612cb Merge commit 'v2.6.30-rc1' into core/urgent
Merge reason: need latest upstream to queue up dependent fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08 17:02:57 +02:00
Linus Torvalds
1551260d1f Merge branch 'core/softlockup' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core/softlockup' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  softlockup: make DETECT_HUNG_TASK default depend on DETECT_SOFTLOCKUP
  softlockup: move 'one' to the softlockup section in sysctl.c
  softlockup: ensure the task has been switched out once
  softlockup: remove timestamp checking from hung_task
  softlockup: convert read_lock in hung_task to rcu_read_lock
  softlockup: check all tasks in hung_task
  softlockup: remove unused definition for spawn_softlockup_task
  softlockup: fix potential race in hung_task when resetting timeout
  softlockup: fix to allow compiling with !DETECT_HUNG_TASK
  softlockup: decouple hung tasks check from softlockup detection
2009-04-07 14:11:07 -07:00
Linus Torvalds
c93f216b5b Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y
  branch tracer: Fix for enabling branch profiling makes sparse unusable
  ftrace: Correct a text align for event format output
  Update /debug/tracing/README
  tracing/ftrace: alloc the started cpumask for the trace file
  tracing, x86: remove duplicated #include
  ftrace: Add check of sched_stopped for probe_sched_wakeup
  function-graph: add proper initialization for init task
  tracing/ftrace: fix missing include string.h
  tracing: fix incorrect return type of ns2usecs()
  tracing: remove CALLER_ADDR2 from wakeup tracer
  blktrace: fix pdu_len when tracing packet command requests
  blktrace: small cleanup in blk_msg_write()
  blktrace: NUL-terminate user space messages
  tracing: move scripts/trace/power.pl to scripts/tracing/power.pl
2009-04-07 14:10:10 -07:00
Linus Torvalds
c61b79b6ef Merge branch 'irq/threaded' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq/threaded' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: fix devres.o build for GENERIC_HARDIRQS=n
  genirq: provide old request_irq() for CONFIG_GENERIC_HARDIRQ=n
  genirq: threaded irq handlers review fixups
  genirq: add support for threaded interrupts to devres
  genirq: add threaded interrupt handler support
2009-04-07 14:07:52 -07:00
Masami Hiramatsu
de5bd88d5a kprobes: support per-kprobe disabling
Add disable_kprobe() and enable_kprobe() to disable/enable kprobes
temporarily.

disable_kprobe() asynchronously disables probe handlers of specified
kprobe.  So, after calling it, some handlers can be called at a while.
enable_kprobe() enables specified kprobe.

aggr_pre_handler and aggr_post_handler check disabled probes.  On the
other hand aggr_break_handler and aggr_fault_handler don't check it
because these handlers will be called while executing pre or post handlers
and usually those help error handling.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 08:31:08 -07:00
Masami Hiramatsu
e579abeb58 kprobes: rename kprobe_enabled to kprobes_all_disarmed
Rename kprobe_enabled to kprobes_all_disarmed and invert logic due to
avoiding naming confusion from per-probe disabling.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 08:31:08 -07:00
Masami Hiramatsu
99081ab553 kprobes: move EXPORT_SYMBOL_GPL just after function definitions
Clean up positions of EXPORT_SYMBOL_GPL in kernel/kprobes.c according to
checkpatch.pl.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 08:31:08 -07:00
Masami Hiramatsu
b918e5e60d kprobes: cleanup aggr_kprobe related code
Currently, kprobes can disable all probes at once, but can't disable it
individually (not unregister, just disable an kprobe, because
unregistering needs to wait for scheduler synchronization).  These patches
introduce APIs for on-the-fly per-probe disabling and re-enabling by
dis-arming/re-arming its breakpoint instruction.

This patch:

Change old_p to ap in add_new_kprobe() for readability, copy flags member
in add_aggr_kprobe(), and simplify the code flow of
register_aggr_kprobe().

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 08:31:07 -07:00
Peter W Morreale
fafd688e4c mm: add /proc controls for pdflush threads
Add /proc entries to give the admin the ability to control the minimum and
maximum number of pdflush threads.  This allows finer control of pdflush
on both large and small machines.

The rationale is simply one size does not fit all.  Admins on large and/or
small systems may want to tune the min/max pdflush thread count to best
suit their needs.  Right now the min/max is hardcoded to 2/8.  While
probably a fair estimate for smaller machines, large machines with large
numbers of CPUs and large numbers of filesystems/block devices may benefit
from larger numbers of threads working on different block devices.

Even if the background flushing algorithm is radically changed, it is
still likely that multiple threads will be involved and admins would still
desire finer control on the min/max other than to have to recompile the
kernel.

The patch adds '/proc/sys/vm/nr_pdflush_threads_min' and
'/proc/sys/vm/nr_pdflush_threads_max' with r/w permissions.

The minimum value for nr_pdflush_threads_min is 1 and the maximum value is
the current value of nr_pdflush_threads_max.  This minimum is required
since additional thread creation is performed in a pdflush thread itself.

The minimum value for nr_pdflush_threads_max is the current value of
nr_pdflush_threads_min and the maximum value can be 1000.

Documentation/sysctl/vm.txt is also updated.

[akpm@linux-foundation.org: fix comment, fix whitespace, use __read_mostly]
Signed-off-by: Peter W Morreale <pmorreale@novell.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 08:31:03 -07:00
Zhaolei
dcef788eb9 ftrace: clean up enable logic for sched_switch
Unify sched_switch and sched_wakeup's action to following logic:
Do record_cmdline when start_cmdline_record() is called.
Start tracing events when the tracer is started.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
LKML-Reference: <49D1C596.5050203@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-07 14:43:09 +02:00
Steven Rostedt
597af81537 function-graph: use int instead of atomic for ftrace_graph_active
Impact: cleanup

The variable ftrace_graph_active is only modified under the
ftrace_lock mutex, thus an atomic is not necessary for modification.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-07 14:43:08 +02:00
Frederic Weisbecker
5452af664f tracing/ftrace: factorize the tracing files creation
Impact: cleanup

Most of the tracing files creation follow the same pattern:

ret = debugfs_create_file(...)
if (!ret)
	pr_warning("Couldn't create ... entry\n")

Unify it!

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1238109938-11840-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-07 14:43:07 +02:00
Li Zefan
a5dec5573f tracing: use macros to denote usec and nsec per second
Impact: cleanup

Use USEC_PER_SEC and NSEC_PER_SEC instead of 1000000 and 1000000000.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <49CC7870.9000309@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-07 14:43:06 +02:00
Ingo Molnar
86665c75da Merge branch 'tracing/urgent' into tracing/ftrace 2009-04-07 14:41:17 +02:00
Zhaolei
1bbe2a83ab ftrace: Correct a text align for event format output
If we cat debugfs/tracing/events/ftrace/bprint/format, we'll see:
name: bprint
ID: 6
format:
	field:unsigned char common_type;	offset:0;	size:1;
	field:unsigned char common_flags;	offset:1;	size:1;
	field:unsigned char common_preempt_count;	offset:2;	size:1;
	field:int common_pid;	offset:4;	size:4;
	field:int common_tgid;	offset:8;	size:4;

	field:unsigned long ip;	offset:12;	size:4;
	field:char * fmt;	offset:16;	size:4;
	field: char buf;	offset:20;	size:0;

print fmt: "%08lx (%d) fmt:%p %s"

There is an inconsistent blank before char buf.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
LKML-Reference: <49D5E3EE.70201@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07 14:02:42 +02:00
Nikanth Karthikesan
bc2b6871c1 Update /debug/tracing/README
Some of the tracers have been renamed, which was not updated in the in-kernel
run-time README file. Update it.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
LKML-Reference: <200903231158.32151.knikanth@suse.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07 14:02:36 +02:00
Frederic Weisbecker
b0dfa978c7 tracing/ftrace: alloc the started cpumask for the trace file
Impact: fix a crash while cat trace file

Currently we are using a cpumask to remind each cpu where a
trace occured. It lets us notice the user that a cpu just had
its first trace.

But on latest -tip we have the following crash once we cat the trace
file:

IP: [<c0270c4a>] print_trace_fmt+0x45/0xe7
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/class/net/eth0/carrier
Pid: 3897, comm: cat Not tainted (2.6.29-tip-02825-g0f22972-dirty #81)
EIP: 0060:[<c0270c4a>] EFLAGS: 00010297 CPU: 0
EIP is at print_trace_fmt+0x45/0xe7
EAX: 00000000 EBX: 00000000 ECX: c12d9e98 EDX: ccdb7010
ESI: d31f4000 EDI: 00322401 EBP: d31f3f10 ESP: d31f3efc
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process cat (pid: 3897, ti=d31f2000 task=d3b3cf20 task.ti=d31f2000)
Stack:
d31f4080 ccdb7010 d31f4000 d691fe70 ccdb7010 d31f3f24 c0270e5c d31f4000
d691fe70 d31f4000 d31f3f34 c02718e8 c12d9e98 d691fe70 d31f3f70 c02bfc33
00001000 09130000 d3b46e00 d691fe98 00000000 00000079 00000001 00000000
Call Trace:
[<c0270e5c>] ? print_trace_line+0x170/0x17c
[<c02718e8>] ? s_show+0xa7/0xbd
[<c02bfc33>] ? seq_read+0x24a/0x327
[<c02bf9e9>] ? seq_read+0x0/0x327
[<c02ab18b>] ? vfs_read+0x86/0xe1
[<c02ab289>] ? sys_read+0x40/0x65
[<c0202d8f>] ? sysenter_do_call+0x12/0x3c
Code: 00 00 00 89 45 ec f7 c7 00 20 00 00 89 55 f0 74 4e f6 86 98 10 00 00 02 74 45 8b 86 8c 10 00 00 8b 9e a8 10 00 00 e8 52 f3 ff ff <0f> a3 03 19 c0 85 c0 75 2b 8b 86 8c 10 00 00 8b 9e a8 10 00 00
EIP: [<c0270c4a>] print_trace_fmt+0x45/0xe7 SS:ESP 0068:d31f3efc
CR2: 0000000000000000
---[ end trace aa9cf38e5ebed9dd ]---

This is because we alloc the iter->started cpumask on tracing_pipe_open but
not on tracing_open.

It hadn't been noticed until now because we need to have ring buffer overruns
to activate the starting of cpu buffer detection.

Also, we need a check to not print the messagge for the first trace on the file.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1238619188-6109-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07 14:02:03 +02:00
Zhaolei
8bcae09b93 ftrace: Add check of sched_stopped for probe_sched_wakeup
The wakeup tracing in sched_switch does not stop when a user
disables tracing. This is because the probe_sched_wakeup() is missing
the check to prevent the wakeup from being traced.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
LKML-Reference: <49D1C543.3010307@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07 14:01:11 +02:00
Frederic Weisbecker
5f0c6c03c5 tracing/ftrace: fix missing include string.h
Building a kernel with tracing can raise the following warning on
tip/master:

kernel/trace/trace.c:1249: error: implicit declaration of function 'vbin_printf'

We are missing an include to string.h

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1238160130-7437-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07 14:00:18 +02:00
Lai Jiangshan
cf8e347465 tracing: fix incorrect return type of ns2usecs()
Impact: fix time output bug in 32bits system

ns2usecs() returns 'long', it's incorrect.

(In i386)
...
          <idle>-0     [000]   521.442100: _spin_lock <-tick_do_update_jiffies64
          <idle>-0     [000]   521.442101: do_timer <-tick_do_update_jiffies64
          <idle>-0     [000]   521.442102: update_wall_time <-do_timer
          <idle>-0     [000]   521.442102: update_xtime_cache <-update_wall_time
....
(It always print the time less than 2200 seconds besides ...)
Because 'long' is 32bits in i386. ( (1<<31) useconds is about 2200 seconds)

...
          <idle>-0     [001] 4154502640.134759: rcu_bh_qsctr_inc <-__do_softirq
          <idle>-0     [001] 4154502640.134760: _local_bh_enable <-__do_softirq
          <idle>-0     [001] 4154502640.134761: idle_cpu <-irq_exit
...
(very large value)
Because 'long' is a signed type and it is 32bits in i386.

Changes in v2:
return 'unsigned long long' instead of 'cycle_t'

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <49D05D10.4030009@cn.fujitsu.com>
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07 13:59:23 +02:00
Steven Rostedt
301fd748e2 tracing: remove CALLER_ADDR2 from wakeup tracer
Maneesh Soni was getting a crash when running the wakeup tracer.
We debugged it down to the recording of the function with the
CALLER_ADDR2 macro.  This is used to get the location of the caller
to schedule.

But the problem comes when schedule is called by assmebly. In the case
that Maneesh had, retint_careful would call schedule. But retint_careful
does not set up a proper frame pointer. CALLER_ADDR2 is defined as
__builtin_return_address(2). This produces the following assembly in
the wakeup tracer code.

   mov    0x0(%rbp),%rcx  <--- get the frame pointer of the caller
   mov    %r14d,%r8d
   mov    0xf2de8e(%rip),%rdi

   mov    0x8(%rcx),%rsi  <-- this is __builtin_return_address(1)
   mov    0x28(%rdi,%rax,8),%rbx

   mov    (%rcx),%rax  <-- get the frame pointer of the caller's caller
   mov    %r12,%rcx
   mov    0x8(%rax),%rdx <-- this is __builtin_return_address(2)

At the reading of 0x8(%rax) Maneesh's machine would take a fault.
The reason is that retint_careful did not set up the return address
and the content of %rax here was zero.

To verify this, I sent Maneesh a patch to create a frame pointer
in retint_careful. He ran the test again but this time he would take
the same type of fault from sysret_careful. The retint_careful was no
longer an issue, but there are other callers that still have issues.

Instead of adding frame pointers for all callers to schedule (in possibly
all archs), it is much safer to simply not use CALLER_ADDR2. This
loses out on knowing what called schedule, but the function tracer
will help there if needed.

Reported-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07 13:58:54 +02:00
Ingo Molnar
93776a8ec7 Merge branch 'linus' into tracing/core
Merge reason: update to upstream tracing facilities

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07 13:47:45 +02:00
Ingo Molnar
a053958f07 Merge branch 'tracing/blktrace-fixes' into tracing/urgent
Merge reason: this used to be a tracing/blktrace-v2 devel topic still
              cooking during the merge window - has propagated to fixes

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07 13:40:55 +02:00
Ingo Molnar
5e34437840 Merge branch 'linus' into core/softlockup
Conflicts:
	kernel/sysctl.c
2009-04-07 11:15:40 +02:00
Rusty Russell
2e45e77787 Revert "module: remove the SHF_ALLOC flag on the __versions section."
This reverts commit 9cb610d8e3.

This was an impressively stupid patch.  Firstly, we reset the SHF_ALLOC
flag lower down in the same function, so the patch was useless.  Even
better, find_sec() ignores sections with SHF_ALLOC not set, so
it breaks CONFIG_MODVERSIONS=y with CONFIG_MODULE_FORCE_LOAD=n, which
refuses to load the module since it can't find the __versions section.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-04-07 17:12:43 +09:30
Oleg Nesterov
432870dab8 exit_notify: kill the wrong capable(CAP_KILL) check
The CAP_KILL check in exit_notify() looks just wrong, kill it.

Whatever logic we have to reset ->exit_signal, the malicious user
can bypass it if it execs the setuid application before exiting.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-06 14:57:23 -07:00
Linus Torvalds
cd5f9a4c31 kernel/sysctl.c: avoid annoying warnings
Some of the limit constants are used only depending on some complex
configuration dependencies, yet it's not worth making the simple
variables depend on those configuration details.  Just mark them as
perhaps not being unused, and avoid the warning.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-06 13:38:46 -07:00
Linus Torvalds
609862be07 Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  lockdep: add stack dumps to asserts
  hrtimer: fix rq->lock inversion (again)
2009-04-06 13:37:30 -07:00
Linus Torvalds
12fe32e4f9 Merge branch 'kmemtrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'kmemtrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  kmemtrace: trace kfree() calls with NULL or zero-length objects
  kmemtrace: small cleanups
  kmemtrace: restore original tracing data binary format, improve ABI
  kmemtrace: kmemtrace_alloc() must fill type_id
  kmemtrace: use tracepoints
  kmemtrace, rcu: don't include unnecessary headers, allow kmemtrace w/ tracepoints
  kmemtrace, rcu: fix rcupreempt.c data structure dependencies
  kmemtrace, rcu: fix rcu_tree_trace.c data structure dependencies
  kmemtrace, rcu: fix linux/rcutree.h and linux/rcuclassic.h dependencies
  kmemtrace, mm: fix slab.h dependency problem in mm/failslab.c
  kmemtrace, kbuild: fix slab.h dependency problem in lib/decompress_unlzma.c
  kmemtrace, kbuild: fix slab.h dependency problem in lib/decompress_bunzip2.c
  kmemtrace, kbuild: fix slab.h dependency problem in lib/decompress_inflate.c
  kmemtrace, squashfs: fix slab.h dependency problem in squasfs
  kmemtrace, befs: fix slab.h dependency problem
  kmemtrace, security: fix linux/key.h header file dependencies
  kmemtrace, fs: fix linux/fdtable.h header file dependencies
  kmemtrace, fs: uninline simple_transaction_set()
  kmemtrace, fs, security: move alloc_secdata() and free_secdata() to linux/security.h
2009-04-06 13:30:00 -07:00
Ingo Molnar
9efe21cb82 Merge branch 'linus' into irq/threaded
Conflicts:
	include/linux/irq.h
	kernel/irq/handle.c
2009-04-06 01:41:22 +02:00
Linus Torvalds
0221c81b1b Merge branch 'audit.b62' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b62' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
  Audit: remove spaces from audit_log_d_path
  audit: audit_set_auditable defined but not used
  audit: incorrect ref counting in audit tree tag_chunk
  audit: Fix possible return value truncation in audit_get_context()
  audit: ignore terminating NUL in AUDIT_USER_TTY messages
  Audit: fix handling of 'strings' with NULL characters
  make the e->rule.xxx shorter in kernel auditfilter.c
  auditsc: fix kernel-doc notation
  audit: EXECVE record - removed bogus newline
2009-04-05 12:36:11 -07:00
Linus Torvalds
714f83d5d9 Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits)
  tracing, net: fix net tree and tracing tree merge interaction
  tracing, powerpc: fix powerpc tree and tracing tree interaction
  ring-buffer: do not remove reader page from list on ring buffer free
  function-graph: allow unregistering twice
  trace: make argument 'mem' of trace_seq_putmem() const
  tracing: add missing 'extern' keywords to trace_output.h
  tracing: provide trace_seq_reserve()
  blktrace: print out BLK_TN_MESSAGE properly
  blktrace: extract duplidate code
  blktrace: fix memory leak when freeing struct blk_io_trace
  blktrace: fix blk_probes_ref chaos
  blktrace: make classic output more classic
  blktrace: fix off-by-one bug
  blktrace: fix the original blktrace
  blktrace: fix a race when creating blk_tree_root in debugfs
  blktrace: fix timestamp in binary output
  tracing, Text Edit Lock: cleanup
  tracing: filter fix for TRACE_EVENT_FORMAT events
  ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()
  x86: kretprobe-booster interrupt emulation code fix
  ...

Fix up trivial conflicts in
 arch/parisc/include/asm/ftrace.h
 include/linux/memory.h
 kernel/extable.c
 kernel/module.c
2009-04-05 11:04:19 -07:00
Eric Paris
def5754341 Audit: remove spaces from audit_log_d_path
audit_log_d_path had spaces in the strings which would be emitted on the
error paths.  This patch simply replaces those spaces with an _ or removes
the needless spaces entirely.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-04-05 13:49:04 -04:00
Eric Paris
679173b724 audit: audit_set_auditable defined but not used
after 0590b9335a audit_set_auditable() is now only
used by the audit tree code.  If CONFIG_AUDIT_TREE is unset it will be defined
but unused.  This patch simply moves the function inside a CONFIG_AUDIT_TREE
block.

cc1: warnings being treated as errors
/home/acme_unencrypted/git/linux-2.6-tip/kernel/auditsc.c:745: error: ‘audit_set_auditable’ defined but not used
make[2]: *** [kernel/auditsc.o] Error 1
make[1]: *** [kernel] Error 2
make[1]: *** Waiting for unfinished jobs....

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-04-05 13:48:52 -04:00
Eric Paris
318b6d3d7d audit: incorrect ref counting in audit tree tag_chunk
tag_chunk has bad exit paths in which the inotify ref counting is wrong.
At the top of the function we found &old_watch using  inotify_find_watch().
inotify_find_watch takes a reference to the watch.  This is never dropped
on an error path.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-04-05 13:48:26 -04:00
Paul Moore
6d208da89a audit: Fix possible return value truncation in audit_get_context()
The audit subsystem treats syscall return codes as type long, unfortunately
the audit_get_context() function mistakenly converts the return code to an
int type in the parameters which could cause problems on systems where the
sizeof(int) != sizeof(long).

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-04-05 13:46:19 -04:00
Miloslav Trmac
55ad2f8d34 audit: ignore terminating NUL in AUDIT_USER_TTY messages
AUDIT_USER_TTY, like all other messages sent from user-space, is sent
NUL-terminated.  Unlike other user-space audit messages, which come only
from trusted sources, AUDIT_USER_TTY messages are processed using
audit_log_n_untrustedstring().

This patch modifies AUDIT_USER_TTY handling to ignore the trailing NUL
and use the "quoted_string" representation of the message if possible.

Signed-off-by: Miloslav Trmac <mitr@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-04-05 13:43:36 -04:00
Miloslav Trmac
b3897f5671 Audit: fix handling of 'strings' with NULL characters
currently audit_log_n_untrustedstring() uses audit_string_contains_control()
to check if the 'string' has any control characters.  If the 'string' has an
embedded NULL audit_string_contains_control() will return that the data has
no control characters and will then pass the string to audit_log_n_string
with the total length, not the length up to the first NULL.
audit_log_n_string() does a memcpy of the entire length and so the actual
audit record emitted may then contain a NULL and then whatever random memory
is after the NULL.

Since we want to log the entire octet stream (if we can't trust the data
to be a string we can't trust that a NULL isn't actually a part of it)
we should just consider NULL as a control character.  If the caller is
certain they want to stop at the first NULL they should be using
audit_log_untrustedstring.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-04-05 13:43:24 -04:00
Zhenwen Xu
c28bb7da74 make the e->rule.xxx shorter in kernel auditfilter.c
make the e->rule.xxx shorter in kernel/auditfilter.c
--
---------------------------------
Zhenwen Xu - Open and Free
Home Page:	http://zhwen.org
My Studio:	http://dim4.cn

>From 99692dc640b278f1cb1a15646ce42f22e89c0f77 Mon Sep 17 00:00:00 2001
From: Zhenwen Xu <Helight.Xu@gmail.com>
Date: Thu, 12 Mar 2009 22:04:59 +0800
Subject: [PATCH] make the e->rule.xxx shorter in kernel/auditfilter.c

Signed-off-by: Zhenwen Xu <Helight.Xu@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-04-05 13:40:33 -04:00
Randy Dunlap
6b96255998 auditsc: fix kernel-doc notation
Fix auditsc kernel-doc notation:

Warning(linux-2.6.28-git7//kernel/auditsc.c:2156): No description found for parameter 'attr'
Warning(linux-2.6.28-git7//kernel/auditsc.c:2156): Excess function parameter 'u_attr' description in '__audit_mq_open'
Warning(linux-2.6.28-git7//kernel/auditsc.c:2204): No description found for parameter 'notification'
Warning(linux-2.6.28-git7//kernel/auditsc.c:2204): Excess function parameter 'u_notification' description in '__audit_mq_notify'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
cc:	Al Viro <viro@zeniv.linux.org.uk>
cc:	Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-04-05 13:39:19 -04:00
Jiri Pirko
ca96a895a6 audit: EXECVE record - removed bogus newline
(updated)
Added hunk that changes the comment, the rest is the same.

EXECVE records contain a newline after every argument. auditd converts
"\n" to " " so you cannot see newlines even in raw logs, but they're
there nevertheless. If you're not using auditd, you need to work round
them. These '\n' chars are can be easily replaced by spaces when
creating record in kernel. Note there is no need for trailing '\n' in
an audit record.

record before this patch:
"type=EXECVE msg=audit(1231421801.566:31): argc=4 a0=\"./test\"\na1=\"a\"\na2=\"b\"\na3=\"c\"\n"

record after this patch:
"type=EXECVE msg=audit(1231421801.566:31): argc=4 a0=\"./test\" a1=\"a\" a2=\"b\" a3=\"c\""

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-04-05 13:38:59 -04:00
Linus Torvalds
90975ef712 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask: (36 commits)
  cpumask: remove cpumask allocation from idle_balance, fix
  numa, cpumask: move numa_node_id default implementation to topology.h, fix
  cpumask: remove cpumask allocation from idle_balance
  x86: cpumask: x86 mmio-mod.c use cpumask_var_t for downed_cpus
  x86: cpumask: update 32-bit APM not to mug current->cpus_allowed
  x86: microcode: cleanup
  x86: cpumask: use work_on_cpu in arch/x86/kernel/microcode_core.c
  cpumask: fix CONFIG_CPUMASK_OFFSTACK=y cpu hotunplug crash
  numa, cpumask: move numa_node_id default implementation to topology.h
  cpumask: convert node_to_cpumask_map[] to cpumask_var_t
  cpumask: remove x86 cpumask_t uses.
  cpumask: use cpumask_var_t in uv_flush_tlb_others.
  cpumask: remove cpumask_t assignment from vector_allocation_domain()
  cpumask: make Xen use the new operators.
  cpumask: clean up summit's send_IPI functions
  cpumask: use new cpumask functions throughout x86
  x86: unify cpu_callin_mask/cpu_callout_mask/cpu_initialized_mask/cpu_sibling_setup_mask
  cpumask: convert struct cpuinfo_x86's llc_shared_map to cpumask_var_t
  cpumask: convert node_to_cpumask_map[] to cpumask_var_t
  x86: unify 32 and 64-bit node_to_cpumask_map
  ...
2009-04-05 10:33:07 -07:00
Linus Torvalds
cab4e4c43f Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-module-and-param
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-module-and-param:
  module: use strstarts()
  strstarts: helper function for !strncmp(str, prefix, strlen(prefix))
  arm: allow usage of string functions in linux/string.h
  module: don't use stop_machine on module load
  module: create a request_module_nowait()
  module: include other structures in module version check
  module: remove the SHF_ALLOC flag on the __versions section.
  module: clarify the force-loading taint message.
  module: Export symbols needed for Ksplice
  Ksplice: Add functions for walking kallsyms symbols
  module: remove module_text_address()
  module: __module_address
  module: Make find_symbol return a struct kernel_symbol
  kernel/module.c: fix an unused goto label
  param: fix charp parameters set via sysfs

Fix trivial conflicts in kernel/extable.c manually.
2009-04-05 10:30:21 -07:00
Linus Torvalds
e4c393fd55 Merge branch 'printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  printk: correct the behavior of printk_timed_ratelimit()
  vsprintf: unify the format decoding layer for its 3 users, cleanup
  fix regression from "vsprintf: unify the format decoding layer for its 3 users"
  vsprintf: fix bug in negative value printing
  vsprintf: unify the format decoding layer for its 3 users
  vsprintf: add binary printf
  printk: introduce printk_once()

Fix trivial conflicts (printk_once vs log_buf_kexec_setup() added near
each other) in include/linux/kernel.h.
2009-04-05 10:23:25 -07:00
Linus Torvalds
09f38dc19d Merge branch 'core-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  ptrace: remove a useless goto
2009-04-03 17:35:06 -07:00
Linus Torvalds
30a39e0e97 Merge branch 'stacktrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'stacktrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  symbols, stacktrace: look up init symbols after module symbols
2009-04-03 17:34:41 -07:00
Linus Torvalds
c7edad5fcb Merge branch 'rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu
2009-04-03 17:34:12 -07:00
Linus Torvalds
b1dbb67911 Merge branch 'ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  s390: remove arch specific smp_send_stop()
  panic: clean up kernel/panic.c
  panic, smp: provide smp_send_stop() wrapper on UP too
  panic: decrease oops_in_progress only after having done the panic
  generic-ipi: eliminate WARN_ON()s during oops/panic
  generic-ipi: cleanups
  generic-ipi: remove CSD_FLAG_WAIT
  generic-ipi: remove kmalloc()
  generic IPI: simplify barriers and locking
2009-04-03 17:33:30 -07:00
Linus Torvalds
492f59f526 Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  locking: rename trace_softirq_[enter|exit] => lockdep_softirq_[enter|exit]
  lockdep: remove duplicate CONFIG_DEBUG_LOCKDEP definitions
  lockdep: require framepointers for x86
  lockdep: remove extra "irq" string
  lockdep: fix incorrect state name
2009-04-03 17:29:53 -07:00
Linus Torvalds
811158b147 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (28 commits)
  trivial: Update my email address
  trivial: NULL noise: drivers/mtd/tests/mtd_*test.c
  trivial: NULL noise: drivers/media/dvb/frontends/drx397xD_fw.h
  trivial: Fix misspelling of "Celsius".
  trivial: remove unused variable 'path' in alloc_file()
  trivial: fix a pdlfush -> pdflush typo in comment
  trivial: jbd header comment typo fix for JBD_PARANOID_IOFAIL
  trivial: wusb: Storage class should be before const qualifier
  trivial: drivers/char/bsr.c: Storage class should be before const qualifier
  trivial: h8300: Storage class should be before const qualifier
  trivial: fix where cgroup documentation is not correctly referred to
  trivial: Give the right path in Documentation example
  trivial: MTD: remove EOL from MODULE_DESCRIPTION
  trivial: Fix typo in bio_split()'s documentation
  trivial: PWM: fix of #endif comment
  trivial: fix typos/grammar errors in Kconfig texts
  trivial: Fix misspelling of firmware
  trivial: cgroups: documentation typo and spelling corrections
  trivial: Update contact info for Jochen Hein
  trivial: fix typo "resgister" -> "register"
  ...
2009-04-03 15:24:35 -07:00