Commit Graph

18 Commits

Author SHA1 Message Date
Paul Mackerras
57e7986ed1 perf_counter: Provide a way to enable counters on exec
This provides a way to mark a counter to be enabled on the next
exec. This is useful for measuring the total activity of a
program without including overhead from the process that
launches it.

This also changes the perf stat command to use this new
facility.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <19017.43927.838745.689203@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-30 12:00:16 +02:00
Paul Mackerras
051ae7f734 perf_counter tools: Reduce perf stat measurement overhead/skew
Vince Weaver reported a 'perf stat' measurement overhead in the
count of retired instructions, which can amount to a +6000
instructions inflated count in the reported count.

At present, perf stat creates its counters on the perf process.  Thus
the counters count the fork and various other activity in both the
parent and child, such as the resolver overhead for resolving PLT
entries for any libc functions that haven't been called before, such
as execvp.

This reduces the overhead by creating the counters on the child process
after the fork, using a couple of pipes to synchronize so that the
child process waits until the parent has created the counters before
doing the exec.  To eliminate the PLT resolution overhead on calling
execvp, this does a dummy execvp first which will always fail.

With this, the overhead of executing a program goes down from over
4800 instructions to about 90 instructions on powerpc (32-bit).
This was measured with a statically-linked program written in
assembler which only does the 3 instructions needed to call _exit(0).

Before:

$ perf stat -e 0:1:u ./three

 Performance counter stats for './three':

           4858  instructions

    0.001274523  seconds time elapsed

After:

$ perf stat -e 0:1:u ./three

 Performance counter stats for './three':

             92  instructions

    0.000468153  seconds time elapsed

Reported-by: Vince Weaver <vince@deater.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <19016.41425.814043.870352@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-29 22:38:09 +02:00
Ingo Molnar
210ad39fb7 perf stat: Use percentages for scaling output
Peter expressed a strong preference for percentage based
display of scaled values - so revert to that from the
recently introduced multiplication-factor unit.

Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-29 21:50:54 +02:00
Jaswinder Singh Rajput
c3043569dc perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run
Set attrs and nr_counters if no event is selected and !null_run.

Setting of attrs should depend on number of counters,
so we need to memcpy only for sizeof(default_attrs)

Also set nr_counters as ARRAY_SIZE(default_attrs) in place of
hardcoded value.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1246126749.32198.16.camel@hpdv5.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-28 15:22:47 +02:00
Jaswinder Singh Rajput
6e750a8fc0 perf stat: Improve output
Increase size for event name to handle bigger names like
'L1-d$-prefetch-misses'

Changed scaled counters from percentage to a multiplicative
factor because the latter is more expressive.

Also aligned the scaling factor, otherwise sometimes it looks
like:

            384  iTLB-load-misses           (4.74x scaled)
         452029  branch-loads               (8.00x scaled)
           5892  branch-load-misses         (20.39x scaled)
         972315  iTLB-loads                 (3.24x scaled)

Before:
         150708  L1-d$-stores          (scaled from 23.57%)
         428804  L1-d$-prefetches      (scaled from 23.47%)
         314446  L1-d$-prefetch-misses  (scaled from 23.42%)
      252626137  L1-i$-loads           (scaled from 23.24%)
        5297550  dTLB-load-misses      (scaled from 23.96%)
      106992392  branch-loads          (scaled from 23.67%)
        5239561  branch-load-misses    (scaled from 23.43%)

After:
        1731713  L1-d$-loads               (  14.25x scaled)
          44241  L1-d$-prefetches          (   3.88x scaled)
          21076  L1-d$-prefetch-misses     (   3.40x scaled)
        5789421  L1-i$-loads               (   3.78x scaled)
          29645  dTLB-load-misses          (   2.95x scaled)
         461474  branch-loads              (   6.52x scaled)
           7493  branch-load-misses        (  26.57x scaled)

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1246051927.2988.10.camel@hpdv5.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-27 18:39:41 +02:00
Ingo Molnar
566747e629 perf stat: Fix multi-run stats
In multi-run (-r/--repeat) printouts, print out the noise of
the wall-clock average as well.

Also, fix a bug in printing out scaled counters: if it was not
scaled then we should not update the average with -1.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-27 06:34:37 +02:00
Ingo Molnar
0cfb7a13b8 perf stat: Add -n/--null option to run without counters
Allow a no-counters run. This can be useful to measure just
elapsed wall-clock time - or to assess the raw overhead of perf
stat itself, without running any counters.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-27 06:11:24 +02:00
Jaswinder Singh Rajput
3d63259583 perf stat: Remove dead code
Remove dead code and do some code alignment.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1245847774.2681.2.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-24 14:55:45 +02:00
Jaswinder Singh Rajput
cca03c0aeb perf stat: Fix verbose for perf stat
Error message should use stderr for verbose (-v), otherwise
message will be lost for:

 $ ./perf stat -v <cmd>  > /dev/null

For example on AMD bus-cycles event is not available so now
it looks like:

 $ ./perf stat -v -e bus-cycles ls > /dev/null
Error: counter 0, sys_perf_counter_open() syscall returned with -1 (Invalid argument)

 Performance counter stats for 'ls':

  <not counted>  bus-cycles

    0.006765877  seconds time elapsed.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1245757369.3776.1.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-23 21:58:44 +02:00
Paul Mackerras
9cffa8d533 perf_counter tools: Define and use our own u64, s64 etc. definitions
On 64-bit powerpc, __u64 is defined to be unsigned long rather than
unsigned long long.  This causes compiler warnings every time we
print a __u64 value with %Lx.

Rather than changing __u64, we define our own u64 to be unsigned long
long on all architectures, and similarly s64 as signed long long.
For consistency we also define u32, s32, u16, s16, u8 and s8.  These
definitions are put in a new header, types.h, because these definitions
are needed in util/string.h and util/symbol.h.

The main change here is the mechanical change of __[us]{64,32,16,8}
to remove the "__".  The other changes are:

* Create types.h
* Include types.h in perf.h, util/string.h and util/symbol.h
* Add types.h to the LIB_H definition in Makefile
* Added (u64) casts in process_overflow_event() and print_sym_table()
  to kill two remaining warnings.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: benh@kernel.crashing.org
LKML-Reference: <19003.33494.495844.956580@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-19 18:25:47 +02:00
Ingo Molnar
ef281a196d perf stat: Enable raw data to be printed
If -vv (very verbose) is specified, print out raw data
in the following format:

$ perf stat -vv -r 3 ./loop_1b_instructions

[ perf stat: executing run #1 ... ]
[ perf stat: executing run #2 ... ]
[ perf stat: executing run #3 ... ]

debug:              runtime[0]: 235871872
debug:             walltime[0]: 236646752
debug:       runtime_cycles[0]: 755150182
debug:            counter/0[0]: 235871872
debug:            counter/1[0]: 235871872
debug:            counter/2[0]: 235871872
debug:               scaled[0]: 0
debug:            counter/0[1]: 2
debug:            counter/1[1]: 235870662
debug:            counter/2[1]: 235870662
debug:               scaled[1]: 0
debug:            counter/0[2]: 1
debug:            counter/1[2]: 235870437
debug:            counter/2[2]: 235870437
debug:               scaled[2]: 0
debug:            counter/0[3]: 140
debug:            counter/1[3]: 235870298
debug:            counter/2[3]: 235870298
debug:               scaled[3]: 0
debug:            counter/0[4]: 755150182
debug:            counter/1[4]: 235870145
debug:            counter/2[4]: 235870145
debug:               scaled[4]: 0
debug:            counter/0[5]: 1001411258
debug:            counter/1[5]: 235868838
debug:            counter/2[5]: 235868838
debug:               scaled[5]: 0
debug:            counter/0[6]: 27897
debug:            counter/1[6]: 235868560
debug:            counter/2[6]: 235868560
debug:               scaled[6]: 0
debug:            counter/0[7]: 2910
debug:            counter/1[7]: 235868151
debug:            counter/2[7]: 235868151
debug:               scaled[7]: 0
debug:              runtime[0]: 235980257
debug:             walltime[0]: 236770942
debug:       runtime_cycles[0]: 755114546
debug:            counter/0[0]: 235980257
debug:            counter/1[0]: 235980257
debug:            counter/2[0]: 235980257
debug:               scaled[0]: 0
debug:            counter/0[1]: 3
debug:            counter/1[1]: 235980049
debug:            counter/2[1]: 235980049
debug:               scaled[1]: 0
debug:            counter/0[2]: 1
debug:            counter/1[2]: 235979907
debug:            counter/2[2]: 235979907
debug:               scaled[2]: 0
debug:            counter/0[3]: 135
debug:            counter/1[3]: 235979780
debug:            counter/2[3]: 235979780
debug:               scaled[3]: 0
debug:            counter/0[4]: 755114546
debug:            counter/1[4]: 235979652
debug:            counter/2[4]: 235979652
debug:               scaled[4]: 0
debug:            counter/0[5]: 1001439771
debug:            counter/1[5]: 235979304
debug:            counter/2[5]: 235979304
debug:               scaled[5]: 0
debug:            counter/0[6]: 23723
debug:            counter/1[6]: 235979050
debug:            counter/2[6]: 235979050
debug:               scaled[6]: 0
debug:            counter/0[7]: 2213
debug:            counter/1[7]: 235978820
debug:            counter/2[7]: 235978820
debug:               scaled[7]: 0
debug:              runtime[0]: 235888002
debug:             walltime[0]: 236700533
debug:       runtime_cycles[0]: 754881504
debug:            counter/0[0]: 235888002
debug:            counter/1[0]: 235888002
debug:            counter/2[0]: 235888002
debug:               scaled[0]: 0
debug:            counter/0[1]: 2
debug:            counter/1[1]: 235887793
debug:            counter/2[1]: 235887793
debug:               scaled[1]: 0
debug:            counter/0[2]: 1
debug:            counter/1[2]: 235887645
debug:            counter/2[2]: 235887645
debug:               scaled[2]: 0
debug:            counter/0[3]: 135
debug:            counter/1[3]: 235887499
debug:            counter/2[3]: 235887499
debug:               scaled[3]: 0
debug:            counter/0[4]: 754881504
debug:            counter/1[4]: 235887368
debug:            counter/2[4]: 235887368
debug:               scaled[4]: 0
debug:            counter/0[5]: 1001401731
debug:            counter/1[5]: 235887024
debug:            counter/2[5]: 235887024
debug:               scaled[5]: 0
debug:            counter/0[6]: 24212
debug:            counter/1[6]: 235886786
debug:            counter/2[6]: 235886786
debug:               scaled[6]: 0
debug:            counter/0[7]: 1824
debug:            counter/1[7]: 235886560
debug:            counter/2[7]: 235886560
debug:               scaled[7]: 0

 Performance counter stats for '/home/mingo/loop_1b_instructions' (3 runs):

     235.913377  task-clock-msecs     #      0.997 CPUs    ( +-   0.011% )
              2  context-switches     #      0.000 M/sec   ( +-   0.000% )
              1  CPU-migrations       #      0.000 M/sec   ( +-   0.000% )
            136  page-faults          #      0.001 M/sec   ( +-   0.730% )
      755048744  cycles               #   3200.534 M/sec   ( +-   0.009% )
     1001417586  instructions         #      1.326 IPC     ( +-   0.001% )
          25277  cache-references     #      0.107 M/sec   ( +-   3.988% )
           2315  cache-misses         #      0.010 M/sec   ( +-   9.845% )

    0.236706075  seconds time elapsed.

This allows the summary stats to be validated.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 15:40:35 +02:00
Ingo Molnar
42202dd56c perf stat: Add feature to run and measure a command multiple times
Add the --repeat <n> feature to perf stat, which repeats a given
command up to a 100 times, collects the stats and calculates an
average and a stddev.

For example, the following oneliner 'perf stat' command runs hackbench
5 times and prints a tabulated result of all metrics, with averages
and noise levels (in percentage) printed:

 aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10
 Time: 0.117
 Time: 0.108
 Time: 0.089
 Time: 0.088
 Time: 0.100

 Performance counter stats for '/home/mingo/hackbench 10' (5 runs):

    1243.989586  task-clock-msecs     #     10.460 CPUs    ( +-   4.720% )
          47706  context-switches     #      0.038 M/sec   ( +-  19.706% )
            387  CPU-migrations       #      0.000 M/sec   ( +-   3.608% )
          17793  page-faults          #      0.014 M/sec   ( +-   0.354% )
     3770941606  cycles               #   3031.329 M/sec   ( +-   4.621% )
     1566372416  instructions         #      0.415 IPC     ( +-   2.703% )
       16783421  cache-references     #     13.492 M/sec   ( +-   5.202% )
        7128590  cache-misses         #      5.730 M/sec   ( +-   7.420% )

    0.118924455  seconds time elapsed.

The goal of this feature is to allow the reliance on these accurate
statistics and to know how many times a command has to be repeated
for the noise to go down to an acceptable level.

(The -v option can be used to see a line printed out as each run progresses.)

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 15:18:57 +02:00
Ingo Molnar
44175b6f39 perf stat: Reorganize output
- use IPC for the instruction normalization output
 - CPUs for the CPU utilization factor value.
 - print out time elapsed like the other rows
 - tidy up the task-clocks/cpu-clocks printout

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 13:40:03 +02:00
Peter Zijlstra
f4dbfa8f31 perf_counter: Standardize event names
Pure renames only, to PERF_COUNT_HW_* and PERF_COUNT_SW_*.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 17:54:15 +02:00
Peter Zijlstra
f7b7c26e01 perf_counter tools: Propagate signals properly
Currently report and stat catch SIGINT (and others) without altering
their exit state. This means that things like:

   while :; do perf stat ./foo ; done

Loops become hard-to-interrupt, because bash never sees perf terminate
due to interruption. Fix this.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-10 16:55:27 +02:00
Ingo Molnar
e779898aa7 perf stat: Print out instructins/cycle metric
Before:

     7549326754  cycles               #    3201.811 M/sec
    10007594937  instructions         #    4244.408 M/sec

After:

     7542051194  cycles               #    3201.996 M/sec
    10007743852  instructions         #    4248.811 M/sec # 1.327 per cycle

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-07 18:14:46 +02:00
Ingo Molnar
743ee1f804 perf stat: Continue even on counter creation error
Before:

 $ perf stat ~/hackbench 5

 error: syscall returned with -1 (No such device)

After:

 $ perf stat ~/hackbench 5
 Time: 1.640

 Performance counter stats for '/home/mingo/hackbench 5':

    6524.570382  task-clock-ticks     #       3.838 CPU utilization factor
          35704  context-switches     #       0.005 M/sec
            191  CPU-migrations       #       0.000 M/sec
           8958  page-faults          #       0.001 M/sec
  <not counted>  cycles
  <not counted>  instructions
  <not counted>  cache-references
  <not counted>  cache-misses

 Wall-clock time elapsed:  1699.999995 msecs

Also add -v (--verbose) option to allow the printing of failed
counter opens.

Plus dont print 'inf' if wall-time is zero (due to jiffies granularity),
instead skip the printing of the CPU utilization factor.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-07 17:08:59 +02:00
Ingo Molnar
864709302a perf_counter tools: Move from Documentation/perf_counter/ to tools/perf/
Several people have suggested that 'perf' has become a full-fledged
tool that should be moved out of Documentation/. Move it to the
(new) tools/ directory.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-06 20:33:43 +02:00