Java in Flames
Flame graphs: Visualization of profiled software
M. Isuru Tharanga Chrishantha Perera
Technical Lead at WSO2, Co-organizer of Java Colombo Meetup
Profiling Software
● Profiling can help you to analyze the performance of your applications and
improve poorly performing sections in your code
Java Profiling Tools Available in JDK
● Java VisualVM
● Java Mission Control
Other Java Profiling Tools
● JProfiler - A commercially licensed Java profiling tool developed by
ej-technologies
● Honest Profiler - A sampling JVM profiler without the safepoint sample bias
● Async Profiler - Sampling CPU and HEAP profiler for Java featuring
AsyncGetCallTrace + perf_events
Java Profiling Tools
Survey by RebelLabs in 2016: http://pages.zeroturnaround.com/RebelLabs-Developer-Productivity-Report-2016.html
Attitude toward performance work
Survey by RebelLabs in 2017:
https://zeroturnaround.com/rebellabs/developer-productivity-survey-2017/
Measuring Methods for CPU Profiling
Sampling: Monitor running code externally and check which code is executed
Instrumentation: Include measurement code into the real code
Sampling [diagram: main(), foo(), bar()]
Instrumentation [diagram: main(), foo(), bar()]
Sampling vs. Instrumentation
Sampling:
● Overhead depends on the sampling interval
● Can see execution hotspots
● Can miss methods that return faster than the sampling interval
Instrumentation:
● Precise measurement for execution times
● More data to process
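The two approaches can be sketched in plain Java. This is only an illustration, not how any particular profiler is implemented: the class ProfilingStyles, the busyWork() workload and the helper names are hypothetical. Note that Thread.getStackTrace() itself collects the stack at a safepoint, so the sampler below carries the safepoint bias discussed later.

```java
import java.util.Map;
import java.util.TreeMap;

public class ProfilingStyles {

    // Hypothetical workload standing in for real application code.
    static long busyWork(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i % 7;
        }
        return sum;
    }

    // Instrumentation: measurement code is wrapped around the real code.
    static long timeNanos(Runnable code) {
        long start = System.nanoTime();
        code.run();
        return System.nanoTime() - start;
    }

    // Sampling: an external thread periodically checks what the target is executing
    // and counts how often each method is seen on top of the stack.
    static Map<String, Integer> sample(Thread target, long intervalMillis) {
        Map<String, Integer> hits = new TreeMap<>();
        while (target.isAlive()) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                StackTraceElement top = stack[0];
                hits.merge(top.getClassName() + "." + top.getMethodName(), 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return hits;
    }

    public static void main(String[] args) throws InterruptedException {
        long nanos = timeNanos(() -> System.out.println("result: " + busyWork(50_000_000)));
        System.out.println("instrumented time: " + nanos / 1_000_000 + " ms");

        Thread worker = new Thread(
                () -> System.out.println("result: " + busyWork(500_000_000)), "worker");
        worker.start();
        System.out.println("samples by top frame: " + sample(worker, 10));
        worker.join();
    }
}
```

The instrumented call reports one precise duration, while the sampler only reports how often each method was seen on top of the stack, so its resolution is limited by the 10 ms interval chosen here.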
How Do Profilers Work?
● Generic profilers rely on the JVMTI specification
● JVMTI only offers stack trace collection at safepoints, so samples taken
through it are biased toward safepoints
● Some profilers use the AsyncGetCallTrace method, an OpenJDK-internal API
that allows stack traces to be collected outside safepoints
Safepoints
● A safepoint is a moment in time when a thread’s data, its internal state and
representation in the JVM are, well, safe for observation by other threads in
the JVM.
● Safepoint polls happen in these places:
○ Between every 2 bytecodes (interpreter mode)
○ Backedge of non-’counted’ loops
○ Method exit
○ JNI call exit
Flame Graphs
● “Flame graphs are a visualization of profiled software, allowing the most
frequent code-paths to be identified quickly and accurately.”
● Developed by Brendan Gregg, an industry expert in computing performance
and cloud computing.
● Flame Graphs can be generated using
https://github.com/brendangregg/FlameGraph
○ This creates an interactive SVG
http://www.brendangregg.com/flamegraphs.html
Flame Graph Example
Flame Graph: Definition
● The x-axis shows the stack profile population, sorted alphabetically
● The y-axis shows stack depth
○ The top edge shows what is on-CPU, and beneath it is its ancestry
● Each rectangle represents a stack frame.
● Box width is proportional to the total time a function was profiled directly or
its children were profiled
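As a rough sketch of the definition above, the following Java program computes box widths from "folded" stacks, the one-line-per-stack format (frames joined by ";" followed by a sample count) that the FlameGraph scripts consume. The class name FlameWidths and the sample data are made up for illustration, and flamegraph.pl does considerably more than this.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FlameWidths {

    // Maps each frame, identified by its full ancestry path, to the number of
    // samples in which it was on the stack (directly or through its children).
    static Map<String, Long> samplesPerFrame(List<String> foldedLines) {
        Map<String, Long> totals = new TreeMap<>(); // alphabetical, like the x-axis
        for (String line : foldedLines) {
            int split = line.lastIndexOf(' ');
            String[] frames = line.substring(0, split).split(";");
            long count = Long.parseLong(line.substring(split + 1).trim());
            StringBuilder path = new StringBuilder();
            for (String frame : frames) {
                if (path.length() > 0) {
                    path.append(';');
                }
                path.append(frame);
                totals.merge(path.toString(), count, Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample data: stack (root first) and number of samples.
        List<String> folded = Arrays.asList(
                "main;foo;bar 3",
                "main;foo 1",
                "main;baz 2");
        Map<String, Long> totals = samplesPerFrame(folded);
        // Root frames have no ';' in their path; their counts add up to all samples.
        long all = totals.entrySet().stream()
                .filter(e -> !e.getKey().contains(";"))
                .mapToLong(Map.Entry::getValue)
                .sum();
        totals.forEach((path, count) ->
                System.out.printf("%-14s depth=%d width=%.1f%%%n",
                        path, path.split(";").length, 100.0 * count / all));
    }
}
```

Each printed row corresponds to one rectangle: the depth gives its position on the y-axis and the width is its share of all samples.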
Types of Flame Graphs
● CPU - see which code-paths are hot (busy on-CPU)
● Memory - find memory leaks (and growth)
● Off-CPU - Time spent by processes and threads when they are not running
on-CPU
● Hot/Cold - both CPU and Off-CPU
● Differential - compare before and after flame graphs
Why do we need Flame Graphs?
● Finding out why CPUs are busy is an important task when troubleshooting
performance issues
● Can use a sampling profiler to see which code-paths are hot.
● Usually a profiler will dump a lot of data with thousands of lines
● A Flame Graph gives a simple visualization of the stack traces output by a
sampling profiler.
Naive Profiling: Taking Thread Dumps
● “A thread dump is a snapshot of the state of all threads that are part of the
process.”
● The state of the thread is represented with a stack trace.
● A thread can be in only one state at a given point in time.
● You can take thread dumps at regular intervals to do “Naive Java Profiling”
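The idea of naive profiling can also be sketched inside a single JVM, using Thread.getAllStackTraces() as a stand-in for an external thread dump. This is an illustrative sketch only: the class NaiveProfiler and its methods are hypothetical, and it profiles the JVM it runs in rather than attaching to another process the way jstack does.

```java
import java.util.Map;
import java.util.TreeMap;

public class NaiveProfiler {

    // Converts one stack trace into a folded key: thread name, then root frame first.
    static String fold(String threadName, StackTraceElement[] stack) {
        StringBuilder sb = new StringBuilder(threadName);
        // getStackTrace() lists the most recent call first, so walk it backwards.
        for (int i = stack.length - 1; i >= 0; i--) {
            sb.append(';')
              .append(stack[i].getClassName())
              .append('.')
              .append(stack[i].getMethodName());
        }
        return sb.toString();
    }

    // Takes `dumps` snapshots of all threads, `intervalMillis` apart,
    // and counts how often each identical RUNNABLE stack was seen.
    static Map<String, Integer> profile(int dumps, long intervalMillis) {
        Map<String, Integer> folded = new TreeMap<>();
        for (int i = 0; i < dumps; i++) {
            Thread.getAllStackTraces().forEach((thread, stack) -> {
                if (thread.getState() == Thread.State.RUNNABLE && stack.length > 0) {
                    folded.merge(fold(thread.getName(), stack), 1, Integer::sum);
                }
            });
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return folded;
    }

    public static void main(String[] args) {
        // 30 snapshots, 100 ms apart; prints one "stack count" line per distinct stack.
        profile(30, 100).forEach((stack, count) -> System.out.println(stack + " " + count));
    }
}
```

The output has the same "stack count" shape as folded stacks, which is why regularly taken thread dumps are enough to draw a flame graph.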
Sample program to profile
● Get the sample “highcpu” program from
https://github.com/chrishantha/sample-java-programs
● mvn clean install
● cd highcpu
● java -jar target/highcpu.jar --help
Flame Graph with Thread Dumps
i=0; while (( i++ < 30 )); do jstack $(pgrep -f highcpu) >> out.jstacks; sleep 2; done
cat out.jstacks | $FLAMEGRAPH_DIR/stackcollapse-jstack.pl > out.stacks-folded
cat out.stacks-folded | $FLAMEGRAPH_DIR/flamegraph.pl > jstack_flamegraph.svg
firefox jstack_flamegraph.svg
Flame Graph with Thread Dumps
Flame Graph with Thread Dumps (Without Thread Names)
● Top edge shows the methods directly on-CPU
● Visually compare lengths
● Ancestry
● Code path
● Branches
Flame Graphs with Java Flight Recordings
● We can generate CPU Flame Graphs from a Java Flight Recording
● The program is available on GitHub:
https://github.com/chrishantha/jfr-flame-graph
● The program uses the (unsupported) JMC Parser
Java Flight Recorder (JFR)
● A profiling and event collection framework built into the Oracle JDK
● Gathers low-level information about the JVM and application behaviour with
very low performance overhead (less than 2%)
● Enables always-on profiling in production environments
● Engine was released with Java 7 update 4
● Commercial feature in Oracle JDK
● A main tool in Java Mission Control (since Java 7 update 40)
Generating a Flame Graph using JFR dump
● JFR has Method Profiling Samples
○ You can view those in “Hot Methods” and “Call Tree” tabs
● A Flame Graph can be generated using these Method Profiling Samples
● Use the following flags to improve the accuracy of the JFR Method Profiler:
○ -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints
Profiling the Sample Program
● Get a Profiling Recording
○ java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=delay=10s,duration=1m,name=Profiling,filename=highcpu_profiling.jfr,settings=profile -jar target/highcpu.jar --hashing-algo SHA-512 --hashing-workers 20 --math-workers 10
Tree View (in JFR)
Using jfr-flame-graph
create_flamegraph.sh -f highcpu_profiling.jfr -i > jfr_flamegraph.svg
Java Mixed-Mode Flame Graphs
● With Java profilers, we can get information about the Java process only.
● However, with Java Mixed-Mode Flame Graphs, we can see how much CPU
time is spent in Java methods, system libraries and the kernel.
● Mixed-mode means that the Flame Graph shows profile information from
both system code paths and Java code paths.
Linux Perf (perf_events)
● System profiler
● Userspace + Kernel
Installing “perf_events” on Ubuntu
● In a terminal, type perf to check whether it is already installed
● If it is not, install it:
○ sudo apt install linux-tools-common
○ sudo apt install linux-tools-generic
The Problem with Java and Perf
● perf needs a Java symbol table to name JIT-compiled frames, and the JVM
doesn’t preserve frame pointers by default, so perf cannot walk Java stacks.
● Run sample program
○ java -jar target/highcpu.jar --hashing-algo SHA-512 --hashing-workers 20 --math-workers 10 --exit-timeout 300
● Run perf record
○ sudo perf record -F 99 -g -p `pgrep -f highcpu` -- sleep 60
● Display trace output
○ sudo perf script
No Java Frames!
Preserving Frame Pointers in JVM
● Run the Java program with the JVM flag "-XX:+PreserveFramePointer"
○ java -XX:+PreserveFramePointer -jar target/highcpu.jar --hashing-algo SHA-512 --hashing-workers 20 --math-workers 10 --exit-timeout 300
● This flag is available only in JDK 8 update 60 and above.
● Some frames may still be missing when compared to Flame Graphs
generated from JFR or jstack due to “inlining”.
● You can reduce the amount of inlining if you need to see more frames in the
profile.
○ For example, -XX:InlineSmallCode=500
How to Generate a Java Symbol Table
● Use a Java agent to generate method mappings for the Linux `perf` tool
○ Clone & Build https://github.com/jvm-profiling-tools/perf-map-agent
● Create symbol map
○ ./create-java-perf-map.sh `pgrep -f highcpu`
● You can also use the “jmaps” tool in the FlameGraph repository to create
symbol files for all Java processes.
○ export AGENT_HOME=/home/isuru/performance/git-projects/perf-map-agent
○ sudo perf record -F 499 -a -g -- sleep 30; sudo -E $FLAMEGRAPH_DIR/jmaps
● Let Java “warm up” before generating symbol maps.
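To make the role of the symbol map concrete, here is a small Java sketch of how an address can be resolved against a perf map file. perf reads /tmp/perf-<pid>.map, where each line holds a hexadecimal start address, a hexadecimal size and a symbol name. The class PerfMapLookup, its methods and the sample entries are hypothetical and only illustrate the lookup; perf's own implementation is different.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PerfMapLookup {

    // One line of a perf map file: "START SIZE symbolname" (START and SIZE in hex).
    static final class Symbol {
        final long start;
        final long size;
        final String name;

        Symbol(long start, long size, String name) {
            this.start = start;
            this.size = size;
            this.name = name;
        }
    }

    // Parses map lines into a tree keyed by start address.
    static TreeMap<Long, Symbol> parse(List<String> lines) {
        TreeMap<Long, Symbol> symbols = new TreeMap<>();
        for (String line : lines) {
            // Limit of 3 because the symbol name may itself contain spaces.
            String[] parts = line.trim().split(" ", 3);
            if (parts.length < 3) {
                continue; // skip malformed lines
            }
            long start = Long.parseUnsignedLong(parts[0], 16);
            long size = Long.parseUnsignedLong(parts[1], 16);
            symbols.put(start, new Symbol(start, size, parts[2]));
        }
        return symbols;
    }

    // Finds the symbol whose [start, start + size) range contains the address.
    static String resolve(TreeMap<Long, Symbol> symbols, long address) {
        Map.Entry<Long, Symbol> entry = symbols.floorEntry(address);
        if (entry == null) {
            return "[unknown]";
        }
        Symbol s = entry.getValue();
        return address - s.start < s.size ? s.name : "[unknown]";
    }

    public static void main(String[] args) {
        // Made-up entries in the perf map format.
        TreeMap<Long, Symbol> symbols = parse(Arrays.asList(
                "7f0000001000 40 Example::hash",
                "7f0000002000 80 Example::compute"));
        System.out.println(resolve(symbols, 0x7f0000001010L));
        System.out.println(resolve(symbols, 0x7f0000003000L));
    }
}
```

Because JIT-compiled methods are added and recompiled while the application runs, a map taken too early misses them, which is why the warm-up step matters.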
Generate Java Mixed-Mode Flame Graph
● Run perf and create symbol map
○ export AGENT_HOME=/home/isuru/performance/git-projects/perf-map-agent
○ sudo perf record -F 499 -a -g -- sleep 30; sudo -E $FLAMEGRAPH_DIR/jmaps
● Generate Flame Graph
○ sudo perf script -F comm,pid,tid,cpu,time,event,ip,sym,dso,trace | stackcollapse-perf.pl --pid | grep java-`pgrep -f highcpu` | flamegraph.pl --color=java --hash --width 1080 > java-mixed-mode.svg
○ firefox java-mixed-mode.svg
Java Mixed-Mode Flame Graph
Java Mixed-Mode Flame Graph for Netty
Java Mixed-Mode Flame Graph
● Helps to understand Java CPU Usage
● With Flame Graphs, we can see both Java and system profiles
● Can profile GC as well
Thank you!
Any questions?
