The Art of JVM Profiling
Andrei Pangin
Vadim Tsesko
2017
2
http://recetasfamilia.com/escala-scoville/
● 48 M DAU
● 8500 machines in 4 DC
● 1.2 Tb/s
● Up to 70 K QPS/server
● 99% < 100 ms
Odnoklassniki
3
Java
Profilers
0
What to profile?
• IO (disk, network)
• syscalls
• Synchronization
• SQL queries
• …
5
How to profile?
Instrumenting
• Trace method transitions
• Measure/count
• Slooow
Sampling
• Snapshot state
• Periodic
• Suitable for PROD
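To make the "Instrumenting" column concrete, here is a minimal sketch of hand-written instrumentation: every call is wrapped, counted, and timed. The class name `ManualInstrumentation` and the helper `timed` are illustrative assumptions, not part of any profiler mentioned in this talk; real instrumenting profilers inject similar code into bytecode automatically.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

public class ManualInstrumentation {
    // Per-method call counters and accumulated wall-clock time (hypothetical names).
    static final Map<String, LongAdder> CALLS = new ConcurrentHashMap<>();
    static final Map<String, LongAdder> NANOS = new ConcurrentHashMap<>();

    /** Runs body, recording one call and its duration under the given name. */
    static <T> T timed(String name, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            CALLS.computeIfAbsent(name, k -> new LongAdder()).increment();
            NANOS.computeIfAbsent(name, k -> new LongAdder()).add(System.nanoTime() - start);
        }
    }

    /** Number of recorded calls for a name, or 0 if it was never called. */
    static long calls(String name) {
        LongAdder adder = CALLS.get(name);
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            timed("parse", () -> Integer.parseInt("12345"));
        }
        System.out.println("parse: calls=" + calls("parse")
                + ", nanos=" + NANOS.get("parse").sum());
    }
}
```

The bookkeeping runs on every single call, which is where the "Slooow" bullet comes from: for short methods the measurement can cost more than the method itself.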
6
Thread Dump
1
How does it work?
Java: Thread.getAllStackTraces()
    class StackTraceElement {
        String declaringClass;
        String methodName;
        String fileName;
        int lineNumber;
    }
Native (JVM TI): GetAllStackTraces()
    struct {
        jmethodID method;
        jlocation location;
    }
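A thread-dump sampler can be sketched in a few lines on top of `Thread.getAllStackTraces()`. The class `ThreadDumpSampler`, its `sample` method, and the choice to count only the top frame of RUNNABLE threads are assumptions made for illustration; the tools listed later in the deck are far more elaborate.

```java
import java.util.HashMap;
import java.util.Map;

public class ThreadDumpSampler {
    static volatile long sink;

    /** Takes `samples` thread dumps, `intervalMs` apart, and counts top frames of RUNNABLE threads. */
    static Map<String, Integer> sample(int samples, long intervalMs) {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                Thread t = e.getKey();
                StackTraceElement[] stack = e.getValue();
                // Skip the sampler itself, threads without Java frames, and non-running threads.
                if (t == Thread.currentThread() || stack.length == 0
                        || t.getState() != Thread.State.RUNNABLE) {
                    continue;
                }
                StackTraceElement top = stack[0];
                hits.merge(top.getClassName() + "." + top.getMethodName(), 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return hits;
    }

    /** Busy loop used as a workload to profile. */
    static void spin() {
        long x = 0;
        while (true) {
            x += System.nanoTime() % 7;
            sink = x;
        }
    }

    public static void main(String[] args) {
        Thread busy = new Thread(ThreadDumpSampler::spin, "busy");
        busy.setDaemon(true);
        busy.start();
        sample(50, 10).forEach((frame, n) -> System.out.println(n + "\t" + frame));
    }
}
```

Each dump is taken at a safepoint, so a sampler like this inherits the bias discussed on the following slides.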
Overhead
9
• 1K threads ~ 10 MB
• up to 50 ms
Advantages
• Simple
• All Java platforms and versions
• No JVM options needed
• VisualVM, Java Mission Control, YourKit, JProfiler ...
10
DEMO
11
Safepoint
[Diagram: timelines of Thread 1 and Thread 2 (memory access, math loop, return) between "safepoint request" and "safepoint start"]
12
Are we there yet?
for-loop
public Theme getThemeById(Long id) {
    for (int i = 0; i < themes.length; i++) {
        if (id.equals(themes[i].getId())) {
            return themes[i];
        }
    }
    return null;
}
-XX:+UseCountedLoopSafepoints
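The loop above has an `int` index and a loop-invariant bound, the shape HotSpot's JIT can treat as a "counted loop" and compile without a safepoint poll. The sketch below places such a loop next to a `long`-indexed variant, which is commonly reported not to qualify as a counted loop and therefore to keep its poll. That difference is an assumption about JIT behavior that varies by JVM version; the code itself only shows that both forms compute the same result. `CountedLoopDemo` and its method names are made up for this example.

```java
public class CountedLoopDemo {
    /** int index, invariant bound: a candidate "counted loop" for the JIT. */
    static long sumIntIndexed(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    /** long index: assumed NOT to be treated as a counted loop (JVM-dependent). */
    static long sumLongIndexed(int[] data) {
        long sum = 0;
        for (long i = 0; i < data.length; i++) {
            sum += data[(int) i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        java.util.Arrays.fill(data, 1);
        System.out.println(sumIntIndexed(data) + " " + sumLongIndexed(data));
    }
}
```

Running the same code with and without `-XX:+UseCountedLoopSafepoints` under a safepoint-based profiler is one way to check whether the flag changes what the profiler can see.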
Safepoints make profiling
• Useless
• Unreliable
• It happens
https://jug.ru/2016/05/андрей-паньгин-всё-что-вы-хотели-зна/
http://psy-lob-saw.blogspot.ru/2016/02/why-most-sampling-java-profilers-are.html
15
DEMO
16
Off CPU
• All threads
• Native → RUNNABLE
• How to interpret?
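The "Native → RUNNABLE" point can be observed directly: a thread blocked inside a native read consumes no CPU, yet a thread dump typically reports it as RUNNABLE. The sketch below uses `java.nio.channels.Pipe` so that it needs no network or files. The class and method names are hypothetical, and the state reported may differ on other JVMs or with virtual threads.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

public class BlockedButRunnable {
    /** Returns the Java thread state observed while a thread is blocked in a pipe read. */
    static Thread.State stateWhileBlockedInRead() {
        try {
            Pipe pipe = Pipe.open();
            Thread reader = new Thread(() -> {
                try {
                    pipe.source().read(ByteBuffer.allocate(1)); // blocks until a byte arrives
                } catch (IOException ignored) {
                    // Nothing to do in this sketch.
                }
            }, "blocked-reader");
            reader.start();
            Thread.sleep(300); // give the reader time to block
            Thread.State observed = reader.getState();
            pipe.sink().write(ByteBuffer.wrap(new byte[] {1})); // unblock the reader
            reader.join();
            pipe.sink().close();
            pipe.source().close();
            return observed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("State while blocked in read: " + stateWhileBlockedInRead());
    }
}
```

A sampler that filters on RUNNABLE therefore cannot tell such idle threads apart from threads that are actually burning CPU.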
17
Problems
● Safepoints
● Off CPU
● Native
Can we do better?
18
AsyncGetCallTrace
2
• Oracle Developer Studio
• github.com/jvm-profiling-tools/honest-profiler
• github.com/apangin/async-profiler
How does it work?
20
void AsyncGetCallTrace(ASGCT_CallTrace *trace,
                       jint depth,
                       void* ucontext)
Called from a signal handler
setitimer(ITIMER_PROF) + SIGPROF
DEMO
21
Advantages
• Not limited to safepoints
  -XX:+DebugNonSafepoints
• Active threads
• All Java: interpreted, compiled, inlined
22
Disadvantages
• No Windows support
• No native frames
• No JVM frames (GC, compiler…)
23
DEMO
24
Problems
src/share/vm/prims/forte.cpp
enum {
    ticks_no_Java_frame         =  0,
    ticks_no_class_load         = -1,
    ticks_GC_active             = -2,
    ticks_unknown_not_Java      = -3,
    ticks_not_walkable_not_Java = -4,
    ticks_unknown_Java          = -5,
    ticks_not_walkable_Java     = -6,
    ticks_unknown_state         = -7,
    ticks_thread_exit           = -8,
    ticks_deopt                 = -9,
    ticks_safepoint             = -10
};
25
Inconsistent frame
public int getX() {
    return x;
}
Prologue
    mov %eax,-0x6000(%rsp)
    push %rbp
    sub $0x30,%rsp

    mov 0xc(%rdx),%eax
Epilogue
    add $0x30,%rsp
    pop %rbp
    test %eax,-0x12345a(%rip)
    retq
26
Workaround
1. Fix SP, IP
2. Retry AsyncGetCallTrace()
unknown_Java < 0.05%
bugs.openjdk.java.net/browse/JDK-8178287
27
DEMO
28
Visualization
3
Flat
30
Tree
31
DEMO
32
● brendangregg.com
Problems
● Safepoints
● Off CPU
● Native
Can we do better?
33
Perf Events
4
PMU (HW interrupts)
• HW events
  • Cycles, instructions
  • Cache misses, branch misses
• SW events
  • CPU clock
  • Page faults
  • Context switches
35
perf_event_open()
• Linux syscall
• fd → counter
• mmap page → samples
• Samples
  • pid, tid
  • CPU registers
  • Call chain (user + kernel)
36
perf
$ perf record -F 999 java ...
$ perf report
4.70% java [kernel.kallsyms] [k] clear_page_c
2.10% java libpthread-2.17.so [.] pthread_cond_wait
1.97% java libjvm.so [.] Unsafe_Park
1.40% java libjvm.so [.] Parker::park
1.31% java [kernel.kallsyms] [k] try_to_wake_up
1.31% java perf-18762.map [.] 0x00007f8510e9e757
1.21% java perf-18762.map [.] 0x00007f8510e9e89e
1.17% java perf-18762.map [.] 0x00007f8510e9cc17
perf.wiki.kernel.org/index.php/Tutorial
37
Java symbols
38
● No symbols for JITted code
● /tmp/perf-<pid>.map
7fe0e9117220 80 java.lang.Object::<init>
7fe0e91175e0 140 java.lang.String::hashCode
7fe0e9117900 20 java.lang.Math::min
7fe0e9117ae0 60 java.lang.String::length
7fe0e9117d20 180 java.lang.String::indexOf
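Each line of the map file is a hexadecimal start address, a hexadecimal size, and a symbol name. As a rough sketch of how a tool could use it, the code below loads such lines into a sorted map and resolves a sampled address to a symbol. `PerfMapResolver` and its methods are names invented for this example and not part of perf or perf-map-agent.

```java
import java.util.Map;
import java.util.TreeMap;

public class PerfMapResolver {
    /** A code region: its size in bytes and the symbol it belongs to. */
    private static final class Symbol {
        final long size;
        final String name;

        Symbol(long size, String name) {
            this.size = size;
            this.name = name;
        }
    }

    // Regions keyed by start address, so floorEntry finds the candidate region.
    private final TreeMap<Long, Symbol> symbols = new TreeMap<>();

    /** Builds a resolver from lines of the form "<hex start> <hex size> <name>". */
    static PerfMapResolver fromLines(String... lines) {
        PerfMapResolver resolver = new PerfMapResolver();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+", 3);
            long start = Long.parseLong(parts[0], 16);
            long size = Long.parseLong(parts[1], 16);
            resolver.symbols.put(start, new Symbol(size, parts[2]));
        }
        return resolver;
    }

    /** Returns the symbol containing the address, or null if no region covers it. */
    String resolve(long address) {
        Map.Entry<Long, Symbol> entry = symbols.floorEntry(address);
        if (entry == null || address - entry.getKey() >= entry.getValue().size) {
            return null;
        }
        return entry.getValue().name;
    }

    public static void main(String[] args) {
        PerfMapResolver resolver = fromLines(
                "7fe0e9117220 80 java.lang.Object::<init>",
                "7fe0e91175e0 140 java.lang.String::hashCode",
                "7fe0e9117900 20 java.lang.Math::min");
        System.out.println(resolver.resolve(0x7fe0e9117600L));
    }
}
```

The sample address `0x7fe0e9117600` lies inside the 0x140-byte region that starts at `7fe0e91175e0`, so it resolves to `java.lang.String::hashCode`.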
JVM TI
39
CompiledMethodLoad() // Compiled Java
DynamicCodeGenerated() // VM Runtime
$ java -agentpath:/usr/lib/libperfmap.so …
github.com/jrudolph/perf-map-agent
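Native stack
[Diagram: stack frames of method 1, method 2, and the current method; each frame stores a return address (ret1, ret2) and the previous BP; BP and SP delimit the current frame; IP points into the current method]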
40
Java stack
[Diagram: frames store return addresses (ret1, ret2) without a saved BP chain; SP and BP shown]
-XX:+PreserveFramePointer
41
DEMO
42
$ perf record -F $HZ -o $RAW -g -p $PID -- sleep $SEC
$ perf script -i $RAW > $PERF
$ FlameGraph/stackcollapse-perf.pl $PERF > $STACKS
$ FlameGraph/flamegraph.pl $STACKS > $SVG
github.com/brendangregg/FlameGraph
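The intermediate `$STACKS` file holds "folded" stacks: one line per unique stack, frames joined by `;` from the outermost caller inward, followed by a sample count. The sketch below shows that aggregation step in Java on in-memory samples. `StackFolder` is an illustrative name, and the input representation (lists of frame names) is an assumption; the real script parses `perf script` output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StackFolder {
    /** Folds stacks (outermost frame first) into sorted "frame1;frame2;... count" lines. */
    static List<String> fold(List<List<String>> stacks) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> stack : stacks) {
            counts.merge(String.join(";", stack), 1, Integer::sum);
        }
        List<String> lines = new ArrayList<>();
        counts.forEach((stack, count) -> lines.add(stack + " " + count));
        return lines;
    }

    public static void main(String[] args) {
        List<List<String>> samples = Arrays.asList(
                Arrays.asList("JavaApp::main", "java.io.FileInputStream::read"),
                Arrays.asList("JavaApp::main", "java.io.FileInputStream::read"),
                Arrays.asList("JavaApp::main", "java.lang.String::hashCode"));
        fold(samples).forEach(System.out::println);
    }
}
```

Identical stacks collapse into a single line, which is why the folded file is much smaller than the raw sample dump.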
App startup
WTF?!
Hot interpreter
com.maxmind.geoip.RegionName::regionNameByCode
44
Poor GeoIP library
if (country_code.equals("RU")) {
    switch (region_code) {
        case 1:
            name = "Adygeya";
            break;
        case 2:
            name = "Aginsky Buryatsky AO";
            break;
        case 3:
            name = "Gorno-Altaysk";
            break;
        ...
-XX:-DontCompileHugeMethods
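By default HotSpot refuses to JIT-compile methods whose bytecode exceeds a size limit, so a giant switch like this one stays in the interpreter unless `-XX:-DontCompileHugeMethods` is set. An alternative that avoids the flag is to move the data out of the code. The sketch below is a hypothetical rewrite and not the library's actual fix; `RegionNames` and its key format are assumptions, and only the three regions visible on the slide are included.

```java
import java.util.HashMap;
import java.util.Map;

public class RegionNames {
    // Lookup table keyed by "<country>:<region>" (key format chosen for this sketch).
    private static final Map<String, String> NAMES = new HashMap<>();

    static {
        put("RU", 1, "Adygeya");
        put("RU", 2, "Aginsky Buryatsky AO");
        put("RU", 3, "Gorno-Altaysk");
    }

    private static void put(String countryCode, int regionCode, String name) {
        NAMES.put(countryCode + ":" + regionCode, name);
    }

    /** Returns the region name, or null if the pair is unknown. */
    static String regionNameByCode(String countryCode, int regionCode) {
        return NAMES.get(countryCode + ":" + regionCode);
    }

    public static void main(String[] args) {
        System.out.println(regionNameByCode("RU", 2));
    }
}
```

The lookup method is tiny and can be compiled normally, while the table is built once during class initialization.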
45
Disadvantages
46
● No interpreted Java
● -XX:+PreserveFramePointer
● Java ≥ 8u60
● JIT recompile
● /proc/sys/kernel/perf_event_paranoid
● Limited stack depth
● Unstable (many threads)
● Big data :)
Full-stack Profiler
5
Ideal profiler
• Kernel + native stacks
• HW counters
• Full Java stack
• Fast and simple
perf_event_open
AsyncGetCallTrace
48
Put together
java.io.FileInputStream::readBytes
java.io.FileInputStream::read
JavaApp::main
readBytes
system_call_fastpath
sys_read
xfs_file_aio_read
[Diagram: perf samples (S) supply the native and kernel frames; AsyncGetCallTrace supplies the Java frames]
SIGIO
fcntl(): signal owner = this thread
Issues
• Stack merge point
• Online aggregation
• Native symbols
• Event-per-thread
• ulimit -n
• /proc/sys/kernel/perf_event_mlock_kb
• Concurrency
50
Case: file reading
byte[] buf = new byte[bufSize];
try (FileInputStream in = new FileInputStream(fileName)) {
    int bytesRead;
    while ((bytesRead = in.read(buf)) > 0) {
        ...
    }
}
Buffer size?
❑ 8 K
❑ 64 K
❑ 250 K
❑ 1 M
❑ 4 M
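One way to explore the question is to time the same read loop with each candidate buffer size. The harness below is a rough sketch and not a rigorous benchmark: it ignores JIT warm-up and the page cache, the 16 MB test file is an arbitrary choice, and `ReadBufferBench` and its helpers are names invented here.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadBufferBench {
    /** Reads the whole file through a buffer of bufSize bytes; returns total bytes read. */
    static long readAll(String fileName, int bufSize) {
        byte[] buf = new byte[bufSize];
        long total = 0;
        try (FileInputStream in = new FileInputStream(fileName)) {
            int bytesRead;
            while ((bytesRead = in.read(buf)) > 0) {
                total += bytesRead;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Creates a zero-filled temporary file of the given size, deleted on JVM exit. */
    static String createTempFile(int size) {
        try {
            Path tmp = Files.createTempFile("read-bench", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, new byte[size]);
            return tmp.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String file = createTempFile(16 << 20);
        int[] sizes = {8 << 10, 64 << 10, 250 << 10, 1 << 20, 4 << 20};
        for (int size : sizes) {
            long start = System.nanoTime();
            long bytes = readAll(file, size);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.printf("%5d KB buffer: %d bytes in %d us%n", size >> 10, bytes, micros);
        }
    }
}
```

Timings alone show which size is faster but not why; that is the part a full-stack profile with native and kernel frames is meant to answer.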
51
Full-stack profile
52
Full-stack profile
53
Read buffer: 260K => 250K
                     AsyncGCT   Perf             Full-stack Profiler
Java stack           Yes        No interpreted   Yes
Native stack         No         Yes              Yes
Kernel stack         No         Yes              Yes
JDK version          6+         8u60+            6+
Idle overhead        0          2-5%             0
Online aggregation   Yes        No               Yes
Stable               Yes        No               Yes
54
Performance problem?
CPU utilization?
Thread dump
AsyncGetCallTrace
Perf
Full-stack profiler
Future improvements
55
Try it
• github.com/apangin/async-profiler
• Contributions are welcome!
56
Contacts
57
Andrei Pangin
@AndreiPangin
Vadim Tsesko
@incubos
https://v.ok.ru/vacancies.html