Linux performance counters with perf and perf-tools
Linux performance counters, also called perf_events, are queried and analyzed with the perf command. On Ubuntu, perf is part of the linux-tools-common package. Originally, perf_events were known as Performance Counters for Linux (PCL). They are used to analyze and report performance counters and values, for example from special hardware registers of the CPUs (Performance Monitoring Units, PMUs). This makes it possible to record and analyze events such as executed instructions or cache misses. In addition, perf offers tracing functionality for evaluating kernel functions. [1]
Unlike strace, for example, perf_events are known for having little impact on application performance during profiling. Further characteristics of perf:
- many sub-commands, each specialized in one task
- workload recording per thread, per process, per CPU, or system-wide
- no running daemons, only command-line tools
Documentation
Documentation in the kernel source tree:
- tools/perf/Documentation/examples.txt
for example for kernel 3.19.1:
- Kernel 3.19.1 perf (kernel.org)
Further useful information can be found here:
- perf Wiki tutorial (perf.wiki.kernel.org)
- The Unofficial Linux Perf Events Web-Page (web.eece.maine.edu)
- Modern CPU Performance Analysis on Linux (halobates.de)
- perf User Guide (google.com)
Overview of perf commands
The most important perf commands are listed below:
| command | purpose |
|---|---|
| perf list | show list of events |
| perf stat | collect event statistics |
| perf record | create event profile (Profiling) |
| perf record -e 'syscalls:*' | Static Tracing |
| perf probe + perf record | Dynamic Tracing (Probes) |
| perf report | report on recorded samples |
| perf top | display performance counters in real time |
A complete list of commands can be found in the perf Wiki or by running perf on the command line without arguments.
Useful perf one-liners
Further one-liners can be found under perf Examples - One Liners (brendangregg.com).
| perf command | purpose |
|---|---|
| perf top --pid 4766 | live perf_events analysis of PID 4766 |
| perf record --pid 4766 | record samples of PID 4766 |
| | set probe and trace malloc in libc |
| perf stat -e cycles:u,instructions:u -a -C 1 | count user-space cycles and instructions on CPU 1 only |
| perf stat -e 'ext4:*' -a sleep 10 | collect ext4 events from the whole system for 10 seconds |
Live events with perf top
perf top displays live sampling results, either system-wide or for a single process ID, in the style of the classic top command:
# perf top
20,90% libxul.so [.] 0x00000000008ec220
9,43% libv8.so [.] 0x000000000009afe3
5,19% perf [.] 0x0000000000056584
4,61% libblink_web.so [.] 0x000000000043c7d3
2,85% firefox [.] 0x000000000000eb64
[...]
# perf top --pid 24410
86,05% python2.7 [.] PyEval_EvalFrameEx
3,33% python2.7 [.] list_ass_subscript.16933
2,49% python2.7 [.] rangeiter_next.21172
[...]
Filtering by events is also supported:
# perf top -p 25396 -e instructions
Samples: 40K of event 'instructions', Event count (approx.): 37825818495
73,92% python2.7 [.] PyEval_EvalFrameEx
8,31% python2.7 [.] list_ass_subscript.16933
4,16% python2.7 [.] rangeiter_next.21172
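The percentages in the perf top output above use a decimal comma because of the German locale, which matters when such lines are processed further. The following Python sketch assumes lines in exactly the "overhead object [.] symbol" layout shown above and sums the overhead per shared object. perf itself can group this way with perf top --sort dso, so the script only illustrates the parsing; overhead_by_object is a hypothetical helper.

```python
from collections import defaultdict

# Lines copied from the perf top output above (German locale: decimal comma).
SAMPLE = """\
20,90% libxul.so [.] 0x00000000008ec220
9,43% libv8.so [.] 0x000000000009afe3
5,19% perf [.] 0x0000000000056584
4,61% libblink_web.so [.] 0x000000000043c7d3
2,85% firefox [.] 0x000000000000eb64
[...]
"""


def overhead_by_object(text: str) -> dict:
    """Sum the overhead percentages per shared object (second column)."""
    totals = defaultdict(float)
    for line in text.splitlines():
        fields = line.split()
        # Expected layout: "<overhead>% <object> [.] <symbol>"; skip anything else.
        if len(fields) < 4 or not fields[0].endswith("%"):
            continue
        totals[fields[1]] += float(fields[0].rstrip("%").replace(",", "."))
    return dict(totals)


if __name__ == "__main__":
    ranked = sorted(overhead_by_object(SAMPLE).items(), key=lambda kv: kv[1], reverse=True)
    for obj, pct in ranked:
        print(f"{pct:6.2f}%  {obj}")
```

For the python2.7 example, all symbols belong to the same object, so their percentages simply add up.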
List of events
The perf list command shows the pre-defined events supported by perf, which can be selected with the -e option.
The events fall into the following categories:
- hardware events (PMUs, see also the section RAW CPU counters)
- software events (for example context switches)
- tracepoints (locations in the kernel code that are instrumented with an event)
# perf list
List of pre-defined events (to be used in -e):
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
cache-references [Hardware event]
cache-misses [Hardware event]
[...]
xen:xen_cpu_write_gdt_entry [Tracepoint event]
xen:xen_cpu_set_ldt [Tracepoint event]
# perf list | wc -l
1403
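Since perf list | wc -l also counts header and blank lines, it only approximates the number of events. The Python sketch below counts event lines per bracketed type instead. It assumes the output format shown above (newer perf versions also print descriptions and other type labels), and count_event_types is a hypothetical helper. The perf versions we know also accept a filter argument such as perf list tracepoint or perf list hw.

```python
import re
from collections import Counter

# Excerpt of the perf list output above; newer perf versions format this differently.
SAMPLE = """\
List of pre-defined events (to be used in -e):
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  cache-references                                   [Hardware event]
  cache-misses                                       [Hardware event]
  xen:xen_cpu_write_gdt_entry                        [Tracepoint event]
  xen:xen_cpu_set_ldt                                [Tracepoint event]
"""

# An event line ends with its type in square brackets, e.g. "[Hardware event]".
EVENT_LINE = re.compile(r"^\s*(?P<name>\S.*?)\s+\[(?P<kind>[^\]]+)\]\s*$")


def count_event_types(text: str) -> Counter:
    """Count event lines per bracketed event type."""
    kinds = Counter()
    for line in text.splitlines():
        match = EVENT_LINE.match(line)
        if match:
            kinds[match.group("kind")] += 1
    return kinds


if __name__ == "__main__":
    for kind, n in count_event_types(SAMPLE).most_common():
        print(f"{n:5d}  {kind}")
```

To analyze real output, SAMPLE can be replaced with sys.stdin.read() and perf list piped into the script.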
Collecting statistics of programs
perf stat collects performance counter statistics. If no event is specified with -e, it prints a summary of a default set of counters.
$ man perf-stat
- Example for a Python program:
# perf stat python numpy-matrix.py -i matrix.in
Performance counter stats for 'python numpy-matrix.py -i matrix.in':
532,132546 task-clock (msec) # 0,998 CPUs utilized
62 context-switches # 0,117 K/sec
2 cpu-migrations # 0,004 K/sec
9.736 page-faults # 0,018 M/sec
1.652.381.223 cycles # 3,105 GHz [83,37%]
746.873.941 stalled-cycles-frontend # 45,20% frontend cycles idle [83,47%]
379.979.967 stalled-cycles-backend # 23,00% backend cycles idle [66,97%]
2.776.008.940 instructions # 1,68 insns per cycle
# 0,27 stalled cycles per insn [83,50%]
574.792.729 branches # 1080,168 M/sec [83,47%]
3.423.864 branch-misses # 0,60% of all branches [82,72%]
0,533060251 seconds time elapsed
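The ratios in the comment column of perf stat are derived from the raw counts, so they can be recomputed by hand. The Python sketch below does this for the numbers above. It assumes the German number format of this output (dots as thousands separators, decimal comma); parse_de and derived_metrics are hypothetical helpers, not part of perf. The bracketed percentages such as [83,37%] indicate that counters were multiplexed and their counts scaled, so these ratios are estimates.

```python
def parse_de(number: str) -> float:
    """Hypothetical helper: parse German-locale numbers like '1.652.381.223' or '0,998'."""
    return float(number.replace(".", "").replace(",", "."))


# Raw values copied from the perf stat output above (German number format).
COUNTS = {
    "task-clock-ms": parse_de("532,132546"),
    "cycles": parse_de("1.652.381.223"),
    "stalled-cycles-frontend": parse_de("746.873.941"),
    "stalled-cycles-backend": parse_de("379.979.967"),
    "instructions": parse_de("2.776.008.940"),
    "branches": parse_de("574.792.729"),
    "branch-misses": parse_de("3.423.864"),
}


def derived_metrics(c: dict) -> dict:
    """Recompute the derived values perf stat prints next to the raw counts."""
    cycles = c["cycles"]
    insns = c["instructions"]
    # Use the larger of the two stall counts for "stalled cycles per insn".
    stalled = max(c["stalled-cycles-frontend"], c["stalled-cycles-backend"])
    return {
        "GHz": cycles / (c["task-clock-ms"] / 1000) / 1e9,
        "frontend idle %": 100 * c["stalled-cycles-frontend"] / cycles,
        "backend idle %": 100 * c["stalled-cycles-backend"] / cycles,
        "insns per cycle": insns / cycles,
        "stalled cycles per insn": stalled / insns,
        "branch miss %": 100 * c["branch-misses"] / c["branches"],
    }


if __name__ == "__main__":
    for name, value in derived_metrics(COUNTS).items():
        print(f"{name:>24}: {value:.3f}")
```

For the run above this gives about 1.68 instructions per cycle, 45.20 % frontend-idle cycles, 0.27 stalled cycles per instruction and a branch miss rate of about 0.60 %, matching the values perf printed.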
- Count syscall events system-wide for 5 seconds using tracepoints:
# perf stat -e 'syscalls:*' -a sleep 5
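To rank system calls by frequency instead of scanning the full list, perf stat can write machine-readable output, for example with perf stat -x, -o syscalls.csv -e 'syscalls:sys_enter_*' -a sleep 5. According to man perf-stat, each CSV line starts with the counter value, the unit and the event name. The Python sketch below assumes that layout; the sample lines and counts are made up purely to illustrate the format, the number of trailing fields may differ between perf versions, and top_syscalls is a hypothetical helper.

```python
import csv
import io

# Made-up sample in the perf stat -x, layout: value, unit, event name, run time, percentage, ...
SAMPLE = """\
1523,,syscalls:sys_enter_read,5001234567,100.00,,
1519,,syscalls:sys_exit_read,5001234567,100.00,,
87,,syscalls:sys_enter_openat,5001234567,100.00,,
<not counted>,,syscalls:sys_enter_kill,0,0.00,,
4410,,syscalls:sys_enter_poll,5001234567,100.00,,
"""


def top_syscalls(csv_text: str, n: int = 3, prefix: str = "syscalls:sys_enter_") -> list:
    """Return the n most frequent system calls as (name, count) tuples."""
    counts = []
    for fields in csv.reader(io.StringIO(csv_text)):
        # Skip empty or comment lines and events that are not syscall entries.
        if len(fields) < 3 or not fields[2].startswith(prefix):
            continue
        try:
            value = int(fields[0])
        except ValueError:
            continue  # e.g. "<not counted>" or "<not supported>"
        counts.append((fields[2][len(prefix):], value))
    counts.sort(key=lambda item: item[1], reverse=True)
    return counts[:n]


if __name__ == "__main__":
    for name, value in top_syscalls(SAMPLE):
        print(f"{value:8d}  {name}")
```

Counting only the sys_enter_ tracepoints avoids counting each call twice (once on entry, once on exit).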
Recording samples with perf record
With perf record, the events are written to perf.data; perf report then summarizes them by tracepoint and program name:
# perf record -e 'syscalls:*' -a sleep 5
# perf report
You can also record a process that is already running:
# ps aux | grep firefox
gschoenb 4766 12.5 13.2 2573504 1068276 ? Sl 07:11 25:35 /usr/lib/firefox/firefox
# perf record --pid 4766
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.407 MB perf.data (~61466 samples) ]
# perf report
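The ps aux | grep firefox pipeline above can also match the grep process itself. The Python sketch below reads /proc/<pid>/comm on Linux and returns only PIDs whose process name matches exactly, similar to pgrep -x firefox. pids_by_name is a hypothetical helper; note that the kernel truncates comm to 15 characters.

```python
import os


def pids_by_name(name: str) -> list:
    """Hypothetical helper: PIDs whose /proc/<pid>/comm equals name (Linux only).

    The kernel truncates comm to 15 characters, so pass at most that many.
    """
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/cpuinfo
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().rstrip("\n")
        except OSError:
            continue  # the process exited in the meantime or access was denied
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    # Candidate PIDs for "perf record --pid <PID>".
    print(pids_by_name("firefox"))
```

A PID found this way can then be passed to perf record --pid.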
The -g option additionally records call graphs (stack traces) of the sampled functions.
perf-tools collection
perf-tools is a collection of user-space tools that use perf_events and ftrace. The tools are available on GitHub:
- perf-tools (github.com)
A presentation about these tools from LISA 2014 is available here:
- Linux Performance Analysis New Tools and Old Secrets (slideshare.net)

# git clone https://github.com/brendangregg/perf-tools.git
The following tools are included:
perf-tools# find . -type f -executable | grep -v git
./net/tcpretrans
./killsnoop
./execsnoop
./opensnoop
./iosnoop
./misc/perf-stat-hist
./tools/reset-ftrace
./kernel/funcslower
./kernel/functrace
./kernel/funccount
./kernel/kprobe
./kernel/funcgraph
./disk/bitesize
./system/tpoint
./iolatency
./fs/cachestat
./syscount
RAW CPU counters
Advantages of CPU counters in performance monitoring units (PMUs):[2]
- low overhead (implemented in hardware)
- low influence on other components (for example ALU)
- high resolution (HW-events that cannot be measured by SW)
As the PMUs differ for every CPU microarchitecture, the counters/events are documented by the manufacturers:
- Performance monitoring event lists for Intel processors (download01.org)
- The README describes the meaning of the Event List Fields:
- perfmon README (download01.org)
For the local CPU, the event codes can also be looked up directly with libpfm4:[3]
# git clone git://perfmon2.git.sourceforge.net/gitroot/perfmon2/libpfm4
# cd libpfm4
# make
# cd examples/
# ./showevtinfo
# ./showevtinfo | grep LLC | grep MISSES
# ./check_events LLC_MISSES | grep Codes
Codes : 0x53412e
This raw code can be passed to perf as an event with the r prefix:
# perf stat -e r53412e sleep 5
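On Intel CPUs, a raw code like 0x53412e follows the layout of the IA32_PERFEVTSELx registers described in the Intel SDM, Volume 3: bits 0-7 event select, bits 8-15 unit mask, bits 16-23 control flags, bits 24-31 counter mask. The Python sketch below decodes a code under that assumption; it does not apply to AMD or other vendors, and decode_perfevtsel is a hypothetical helper.

```python
# Control flag bits of Intel's IA32_PERFEVTSELx registers (Intel SDM, Vol. 3).
CONTROL_BITS = {16: "USR", 17: "OS", 18: "E", 19: "PC", 20: "INT", 21: "ANY", 22: "EN", 23: "INV"}


def decode_perfevtsel(code: int) -> dict:
    """Split a raw Intel event code into event select, unit mask, flags and counter mask."""
    return {
        "event": code & 0xFF,            # bits 0-7: event select
        "umask": (code >> 8) & 0xFF,     # bits 8-15: unit mask
        "flags": [name for bit, name in sorted(CONTROL_BITS.items()) if (code >> bit) & 1],
        "cmask": (code >> 24) & 0xFF,    # bits 24-31: counter mask
    }


if __name__ == "__main__":
    fields = decode_perfevtsel(0x53412E)  # code reported by check_events above
    print(f"event=0x{fields['event']:02x} umask=0x{fields['umask']:02x} "
          f"flags={','.join(fields['flags'])} cmask={fields['cmask']}")
```

For 0x53412e this yields event 0x2e with umask 0x41, Intel's architectural LLC misses event, with the USR, OS, INT and EN flags set. Only the event select and unit mask identify the event; the remaining bits are control settings.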
References
- ↑ Linux Performance Analysis: New Tools and Old Secrets (brendangregg.com)
- ↑ Hardware Performance Monitoring (cse.shirazu.ac.ir)
- ↑ How to monitor the full range of CPU performance events (bnikolic.co.uk)
Author: Georg Schönberger
Translator: Alina Ranzinger Alina has been working at Thomas-Krenn.AG since 2024. After her training as multilingual business assistant, she got her job as assistant of the Product Management and is responsible for the translation of texts and for the organisation of the department.

