copy_insn() fails with -EIO if ->readpage == NULL, but this error
is not propagated unless uprobe_register() path finds ->mm which
already mmaps this file. In this case (say) "perf record" does not
actually install the probe, but the user can't know about this.
Move this check into uprobe_register() so that this problem can be
detected earlier and reported to user.
Note: this is still not perfect,
- copy_insn() and arch_uprobe_analyze_insn() should be called
by uprobe_register() but this is not simple, we need vm_file
for read_mapping_page() (although perhaps we can pass NULL),
and we need ->mm for is_64bit_mm() (although this logic is
broken anyway).
- uprobe_register() should be called by create_trace_uprobe(),
not by probe_event_enable(), so that an error can be detected
at "perf probe -x" time. This also needs more changes in the
core uprobe code, uprobe register/unregister interface was
poorly designed from the very beginning.
Reported-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140519184054.GA6750@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull perf/core improvements and fixes from Jiri Olsa:
* Warn the user when trace command is not available (Arnaldo Carvalho de Melo)
* Add warning when disabling perl scripting support due to missing devel files (Arnaldo Carvalho de Melo)
* Consider header files outside perf directory in tags target (Sebastian Andrzej Siewior)
* Allow overriding sysfs and proc finding with env var (Cody P Schafer)
* Fix "==" into "=" in ui_browser__warning assignment (zhangdianfang)
* Factor elide bool handling in sort code (Jiri Olsa)
* Fix poll return value propagation (Jiri Olsa)
* Fix 'make help' message error (Jianyu Zhan)
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently 'make help' message has such hint:
use "make prefix=<path> <install target>" to install to a particular
path like make prefix=/usr/local install install-doc
But this is misleading, when I specify "prefix=/usr/local", it has got no
respect at all.
This is because that, "DESTDIR" is considered first. In this case, "DESTDIR"
has an empty value, so "prefix" is honored. However, "prefix" is unconditionally
assigned to $HOME, regardless of what it is set to from command line. So our
"prefix" setting got no respect and the actual destination falls back to $HOME.
This patch fixes this issue and corrects the help message.
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1401727474-19370-1-git-send-email-nasa4836@gmail.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
If the perf record command is interrupted in record__mmap_read_all
function, the 'done' is set and err has the latest poll return
value, which is most likely positive number (= number of pollfds
ready to read).
This 'positive err' is then propagated to the exit code, resulting
in not finishing the perf.data header properly, causing following
error in report:
# perf record -F 50000 -a
---
make the system real busy, so there's more chance
to interrupt perf in event writing code
---
^C[ perf record: Woken up 16 times to write data ]
[ perf record: Captured and wrote 30.292 MB perf.data (~1323468 samples) ]
# perf report --stdio > /dev/null
WARNING: The perf.data file's data size field is 0 which is unexpected.
Was the 'perf record' command properly terminated?
Fixing this by checking for positive poll return value
and setting err to 0.
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1401732126-19465-1-git-send-email-jolsa@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
After output/sort fields refactoring, it's expensive
to check the elide bool in its current location inside
the 'struct sort_entry'.
The perf_hpp__should_skip function gets highly noticable in
workloads with high number of output/sort fields, like for:
$ perf report -i perf-test.data -F overhead,sample,period,comm,pid,dso,symbol,cpu --stdio
Performance report:
9.70% perf [.] perf_hpp__should_skip
Moving the elide bool into the 'struct perf_hpp_fmt', which
makes the perf_hpp__should_skip just single struct read.
Got speedup of around 22% for my test perf.data workload.
The change should not harm any other workload types.
Performance counter stats for (10 runs):
before:
358,319,732,626 cycles ( +- 0.55% )
467,129,581,515 instructions # 1.30 insns per cycle ( +- 0.00% )
150.943975206 seconds time elapsed ( +- 0.62% )
now:
278,785,972,990 cycles ( +- 0.12% )
370,146,797,640 instructions # 1.33 insns per cycle ( +- 0.00% )
116.416670507 seconds time elapsed ( +- 0.31% )
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20140601142622.GA9131@krava.brq.redhat.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Pull perf/core improvements and fixes from Jiri Olsa:
* Add support to accumulate hist periods (Namhyung Kim)
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On stdio, there's a problem that it shows invalid values for
callchains in cumulated hist entries. It's because it only cares
about the self period. But with --children behavior, we always add
callchain info to the cumulated entries so it should use the value in
that case.
Before:
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. ................
#
61.22% 0.32% swapper [kernel.kallsyms] [k] cpu_idle
|
--- cpu_idle
|
|--16530.76%-- start_secondary
|
|--2758.70%-- rest_init
| start_kernel
| x86_64_start_reservations
| x86_64_start_kernel
--6837850969203030.00%-- [...]
After:
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. ................
#
61.22% 0.32% swapper [kernel.kallsyms] [k] cpu_idle
|
--- cpu_idle
|
|--85.70%-- start_secondary
|
--14.30%-- rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1401335910-16832-24-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Add top.children config option for setting default value of
callchain accumulation. It affects the output only if one of
-g or --call-graph option is given as well.
A user can write .perfconfig file like below to enable accumulation
by default:
$ cat ~/.perfconfig
[top]
children = true
And it can be disabled through command line:
$ perf top --no-children
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arun Sharma <asharma@fb.com>
Tested-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1401335910-16832-22-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The new ->add_entry_cb() will be called after an entry was added to
the histogram. It's used for code sharing between perf report and
perf top. Note that ops->add_*_entry() should set iter->he properly
in order to call the ->add_entry_cb.
Also pass @arg to the callback function. It'll be used by perf top
later.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arun Sharma <asharma@fb.com>
Tested-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/87k393g999.fsf@sejong.aot.lge.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Add report.children config option for setting default value of
callchain accumulation. It affects the report output only if
perf.data contains callchain info.
A user can write .perfconfig file like below to enable accumulation
by default:
$ cat ~/.perfconfig
[report]
children = true
And it can be disabled through command line:
$ perf report --no-children
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arun Sharma <asharma@fb.com>
Tested-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1401335910-16832-17-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Print accumulated stat of a hist entry if requested.
To do that, add new HPP_PERCENT_ACC_FNS macro and generate a
perf_hpp_fmt using it. The __hpp__sort_acc() function sorts entries
by accumulated period value. When accumulated periods of two entries
are same (i.e. single path callchain) put the caller above since
accumulation tends to put callers on higher position for obvious
reason.
Also add "overhead_children" output field to be selected by user.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arun Sharma <asharma@fb.com>
Tested-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1401335910-16832-11-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
There're some duplicate code when adding hist entries. They are
different in that some have branch info or mem info but generally do
same thing. So introduce new struct hist_entry_iter and add callbacks
to customize each case in general way.
The new perf_evsel__add_entry() function will look like:
iter->prepare_entry();
iter->add_single_entry();
while (iter->next_entry())
iter->add_next_entry();
iter->finish_entry();
This will help further work like the cumulative callchain patchset.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arun Sharma <asharma@fb.com>
Tested-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1401335910-16832-3-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Pull uprobes fixes and changes from Oleg Nesterov:
" Denys found another nasty old bug in uprobes/x86: div, mul, shifts with
count in CL, and cmpxchg are not handled correctly.
Plus a couple of other minor fixes. Nobody acked the changes in x86/traps,
hopefully they are simple enough, and I believe that they make sense anyway
and allow to do more cleanups."
Signed-off-by: Ingo Molnar <mingo@kernel.org>