JEEConf 2019 | Let’s build a Java backend designed for a high load

Let’s build a Java backend designed for a high load
Alex Moskvin
CTO@Plexteq

About myself
• CTO@Plexteq OÜ
• Ph.D. in information technology
• Interests
• Software architecture
• High-load systems
• Everything under the hood
• AI/ML + BigData
• Knowledge sharing ;)
• Follow me
• https://twitter.com/amoskvin
• https://www.facebook.com/moskvin.aleksey

Agenda
1. Why your Java-based software fails to handle a high load
2. What can be done about it

High load
What load should be considered high?

High load
What clients see:
- Low responsiveness of the service
- Sporadic errors
- Interrupted connections

High load
What sysops/devops see:
- Large iowait
- High CPU contention
- High RAM usage
- High number of open files (processes, sockets, threads)
- High contention on target data storage
- High GC activity

High load
What the application is actually doing:

Problem
Software aggregator that gathers sensor data from multiple remote
locations

Problem :: key aspects
1. Interaction with remote/external services
2. Processing data from remote/external services
3. Storing data
4. Handling interaction with clients

Distributed services
The 8 fallacies of distributed
computing
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn’t change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous

Distributed services
Engineers cannot just ignore these issues,
they have to explicitly deal with them!

Distributed services :: example 1

Distributed services :: watch timeouts
Issue #1:
Code relies on default timeouts (connect and read)
By default, timeouts are INFINITE

Issue #1:
Solution
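A minimal sketch of one way to stop relying on the default timeouts, assuming the plain JDK `HttpURLConnection` is the client in use. The class name `TimeoutAwareClient` and the timeout values are hypothetical and should be tuned to the latency budget of the service.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class TimeoutAwareClient {
    // Hypothetical values, not taken from the talk's code.
    static final int CONNECT_TIMEOUT_MS = 2_500;
    static final int READ_TIMEOUT_MS = 5_000;

    /** Prepares a connection with explicit timeouts instead of the infinite defaults. */
    static HttpURLConnection open(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(CONNECT_TIMEOUT_MS); // max time to establish the connection
        conn.setReadTimeout(READ_TIMEOUT_MS);       // max time to wait for data once connected
        return conn;
    }

    /** Performs a GET and returns the body; a timeout surfaces as SocketTimeoutException. */
    static String get(String url) throws IOException {
        HttpURLConnection conn = open(url);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

With both timeouts set, a request thread that hits an unresponsive remote is released after a bounded wait instead of hanging indefinitely.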

Profiling results
Case: 1 request per thread, 100 requests, the remote HTTP server’s thread pool is exhausted
[Figure: default (infinite connect timeout) vs. 2500 ms connect timeout]

Distributed services :: handle errors!
Issue #2:
No error handling
Solution
1. Handle them all
2. Check response code
• e.g. 429 (Too Many Requests) means you must stop sending requests
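Checking the response code can be sketched roughly as follows. The class `ResponsePolicy`, the `Action` enum and the exact mapping of status ranges to actions are illustrative assumptions, not the talk's code.

```java
final class ResponsePolicy {
    /** What the caller should do next with a given HTTP response. */
    enum Action { PROCESS, BACK_OFF, RETRY, GIVE_UP }

    static final int HTTP_TOO_MANY_REQUESTS = 429;

    static Action classify(int status) {
        if (status >= 200 && status < 300) {
            return Action.PROCESS;   // success: safe to read and process the body
        }
        if (status == HTTP_TOO_MANY_REQUESTS) {
            return Action.BACK_OFF;  // the remote asks us to stop sending requests for a while
        }
        if (status >= 500) {
            return Action.RETRY;     // server-side failure: may succeed later
        }
        return Action.GIVE_UP;       // everything else is treated as non-retryable in this sketch
    }
}
```

The point of the sketch is that every response is classified explicitly, so no status code is silently treated as success.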

Distributed services :: fail fast
Issue #3:
Code relies on high availability of external service
Solutions:
1. Fail-fast

Distributed services :: circuit breaker
Fail-fast with circuit breaker

Distributed services :: circuit breaker
Stability pattern used when calling remote functions

Distributed services :: Circuit breaker
States: CLOSED (service is OK), OPEN (service is FAILING), HALF OPEN
If state = OPEN && grace period passed -> retry (HALF OPEN)
If request succeeded -> CLOSED
Otherwise -> OPEN
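The state transitions above can be sketched as a small hand-rolled breaker. This is an illustration under stated assumptions, not the implementation shown in the talk: the class `SimpleCircuitBreaker`, the consecutive-failure threshold and the injectable clock are all hypothetical, and in HALF OPEN this sketch lets concurrent trial calls through rather than exactly one.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures before the breaker opens
    private final long gracePeriodMs;   // how long to stay OPEN before allowing a trial call
    private final LongSupplier clock;   // time source in ms, e.g. System::currentTimeMillis

    private State state = State.CLOSED;
    private int failures;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long gracePeriodMs, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.gracePeriodMs = gracePeriodMs;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    /** Runs the remote call, or returns the fallback immediately while the breaker is OPEN. */
    <T> T call(Supplier<T> remoteCall, T fallback) {
        if (!allowRequest()) {
            return fallback; // fail fast: do not touch the failing service
        }
        try {
            T result = remoteCall.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback;
        }
    }

    private synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < gracePeriodMs) {
                return false;
            }
            state = State.HALF_OPEN; // grace period passed: allow a trial request
        }
        return true;
    }

    private synchronized void onSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }
}
```

A caller would wrap each remote invocation, for example `breaker.call(() -> client.fetch(), cachedValue)`, where `client.fetch()` and `cachedValue` stand in for whatever the application actually uses.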

Distributed services :: Circuit breaker
Fail-fast with circuit breaker

Interaction with external services
100 threads concurrently accessing a failing remote resource
[Figure: default (infinite connect timeout) vs. circuit breaker + 2 s timeout]

Processing, offloading and storing data

Processing data
What is wrong here once again?

Processing data :: deserialization
Issue #1:
Deserialization is expensive with large payloads

Issue #1:
Consequences:
1. Heap saturation -> potential OOM
2. GC pauses -> high latencies
3. Deserialization is sequential -> thread is not doing anything useful until deserialization completes

Any ideas?

Processing data :: streaming deserialization
Issue #1:
Solutions:
1. Stream-based processing

Your favorite deserializer most likely supports
streaming processing already!

XML
• SAX (Streaming API for XML)
JSON
• Gson
• Jackson
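A minimal sketch of stream-based processing with SAX, which ships with the JDK (Gson and Jackson offer streaming readers of their own for JSON). The XML layout, with `reading` elements carrying a `value` attribute, and the class name `StreamingReadings` are assumptions made up for this illustration.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class StreamingReadings {
    /** Running aggregate: no document tree and no list of readings is kept in memory. */
    static final class Stats extends DefaultHandler {
        long count;
        double sum;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // Called by the parser for each opening tag as the input is read.
            if ("reading".equals(qName)) {
                count++;
                sum += Double.parseDouble(attrs.getValue("value"));
            }
        }
    }

    /** Parses the stream element by element and returns the aggregate. */
    static Stats aggregate(InputStream xml) throws Exception {
        Stats stats = new Stats();
        SAXParserFactory.newInstance().newSAXParser().parse(xml, stats);
        return stats;
    }
}
```

Because each element is handled as it arrives and then discarded, heap usage stays roughly constant regardless of payload size, at the cost of writing handler logic by hand.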

Deserialization :: profiling
Test: 500 MB JSON payload
All-at-once deserialization

Deserialization :: profiling
Test: 500 MB JSON payload
Stream-based

Storing data
Okay, so now we need to store data somewhere.

Storing data
Problem:
1. Your data set will grow when you least expect it

Storing data
Our recommendation is to think through upfront:
1. Domain model and entity relationships
2. Scaling strategy for your data
3. Approach for achieving optimal read/write performance

Storing data
• In-heap
• ACID compliant
• BASE compliant

Storing data
Our problem:
• In-heap: monitored device online status (deviceID:status)
• ACID compliant: customer billing data
• BASE compliant: time series, monitoring data

Storing data
In-heap storage. Issues:
1. GC overhead
2. Heap usage

Storing data
In-heap storage. Solution:
1. Store payloads off-heap (e.g. Chronicle Map)
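To illustrate the off-heap idea with the JDK alone, the sketch below keeps a deviceID:status table in a direct `ByteBuffer`. It is not Chronicle Map and not a general-purpose map: it assumes device IDs are dense integers in the range 0..deviceCount-1, it is not designed for concurrent writers, and the class name `OffHeapStatusTable` is hypothetical.

```java
import java.nio.ByteBuffer;

final class OffHeapStatusTable {
    static final byte OFFLINE = 0;
    static final byte ONLINE = 1;

    // Direct buffer: its contents are allocated outside the Java heap,
    // so the garbage collector does not scan or move the payload.
    private final ByteBuffer slots;

    OffHeapStatusTable(int deviceCount) {
        this.slots = ByteBuffer.allocateDirect(deviceCount); // zero-filled, i.e. all OFFLINE
    }

    /** Stores one status byte in the slot that belongs to the device. */
    void setStatus(int deviceId, byte status) {
        slots.put(deviceId, status); // absolute put, does not move the buffer position
    }

    byte getStatus(int deviceId) {
        return slots.get(deviceId); // absolute get
    }

    boolean isOffHeap() {
        return slots.isDirect();
    }
}
```

A library such as Chronicle Map adds what this sketch lacks: arbitrary keys and values, concurrency control, and persistence.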

Storing data
ConcurrentHashMap (JRE)
1.5M keys, 5M concurrent write ops in total

Storing data
ChronicleMap
1.5M keys, 5M concurrent write ops in total

Storing data
• No GC involved
• Low heap usage (10x smaller)
• Map can be shared between multiple JVM processes
• Map can be replicated across multiple nodes (commercial feature)

Storing data
It’s easy-peasy:

Storing data
Relational storage. Issues:
1. ACID is expensive
2. Operations are blocking
3. Connection pool is limited

Storing data
Relational storage. Solution:
1. Avoid pessimistic locking
2. Transaction isolation stricter than Read Committed is a no-go
3. Scale up!
4. Connection pool implementation matters (HikariCP)
5. Connection pool sizes on the app side and on the DB side must correlate
6. Use indices wisely
7. Know your execution plans!

Handling client requests
We deal with a thread pool which is constrained

Let’s do some stuff asynchronously!

https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/scheduling/annotation/EnableAsync.html
By default, Spring will be searching for an associated thread pool
definition: either a unique TaskExecutor bean in the context, or an
Executor bean named "taskExecutor" otherwise. If neither of the two
is resolvable, a SimpleAsyncTaskExecutor will be used to process
async method invocations.
https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/core/task/SimpleAsyncTaskExecutor.html
NOTE: This implementation does not reuse threads! Consider a
thread-pooling TaskExecutor implementation instead, in particular for
executing a large number of short-lived tasks.

Pros:
1. System is more predictable
• Target resource (e.g. MongoDB) load is managed and constrained with a thread pool (a kind of DoS protection)
• Thread allocation is managed and constrained
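The constraint described above can be sketched with a plain JDK executor: a fixed number of threads plus a bounded queue, so excess work is rejected instead of piling up. The class `BoundedAsync` and its method names are hypothetical. In a Spring application a comparable bound would presumably be configured on a pooled `TaskExecutor` bean rather than built by hand.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedAsync {
    /** Fixed thread count and bounded queue: both thread allocation and backlog are capped. */
    static ThreadPoolExecutor newBoundedExecutor(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,                         // core size == max size
                0L, TimeUnit.MILLISECONDS,                // no extra threads to time out
                new ArrayBlockingQueue<>(queueCapacity),  // bounded backlog
                new ThreadPoolExecutor.AbortPolicy());    // reject when saturated
    }

    /** Returns true if the task was accepted, false if the pool and queue are full. */
    static boolean trySubmit(ThreadPoolExecutor executor, Runnable task) {
        try {
            executor.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            return false; // caller can shed load, e.g. by answering "try again later"
        }
    }
}
```

With this setup the target resource never sees more than `threads` concurrent operations from this service, which is what makes the system more predictable under load.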

Okay, but what if we need to reply with some data?

Servlet 3.0 introduced the startAsync method, which returns an AsyncContext
https://docs.oracle.com/javaee/7/tutorial/servlets012.htm

Servlet 3.0

Spring

Then probably this will work as well?

No, it does not

Hmm, maybe streaming then?
