Rethinking Stream Processing with Apache Kafka:
Applications vs. Clusters, Streams vs. Databases
Michael G. Noll, Confluent
@miguno
Google GDG DevFest Switzerland, October 28-29, 2017
[Timeline figure, 2012 to 2017, Kafka releases and their headline features]
0.7   Cluster mirroring
0.8   Intra-cluster replication
0.9   Data integration (Connect API)
0.10  Data processing (Streams API)
0.11  Exactly-once semantics
Apache Kafka: birthed as a messaging system, now a streaming platform
(Does NOT run inside the Kafka brokers!)
http://docs.confluent.io/current/streams/kafka-streams-examples/docs/index.html
Before
With Kafka’s Streams API
KStream<Integer, Integer> input =
    builder.stream("numbers-topic");

// Stateless computation
KStream<Integer, Integer> doubled =
    input.mapValues(v -> v * 2);

// Stateful computation
KTable<Integer, Integer> sumOfOdds = input
    .filter((k, v) -> v % 2 != 0)
    .selectKey((k, v) -> 1)
    .groupByKey()
    .reduce((v1, v2) -> v1 + v2, "sum-of-odds");

// Processor API: a processor that prints every value it receives
class PrintToConsoleProcessor<K, V>
    implements Processor<K, V> {

  @Override
  public void init(ProcessorContext context) {}

  @Override
  public void process(K key, V value) {
    System.out.println("Got value " + value);
  }

  @Override
  public void punctuate(long timestamp) {}

  @Override
  public void close() {}
}
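To make the difference between the stateless and the stateful computation above concrete, here is a minimal sketch in plain Java. It does not use the Kafka Streams API; the class name SumOfOddsSketch and both method names are hypothetical, and a list stands in for the contents of "numbers-topic". The stateless step needs only the current value, while the stateful step has to remember a running sum and emits an update whenever that sum changes. A real Kafka Streams application may emit fewer intermediate updates than this sketch, for example because of record caching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java illustration; not part of the Kafka Streams API.
class SumOfOddsSketch {

    // Stateless: each output depends only on the current input value.
    static List<Integer> doubled(List<Integer> input) {
        List<Integer> out = new ArrayList<>();
        for (int v : input) {
            out.add(v * 2);
        }
        return out;
    }

    // Stateful: keeps a running sum of the odd values and records every
    // update to it, similar in spirit to the updates behind a KTable.
    static List<Integer> sumOfOddsUpdates(List<Integer> input) {
        List<Integer> updates = new ArrayList<>();
        Integer sum = null; // no aggregate exists until the first odd value
        for (int v : input) {
            if (v % 2 == 0) {
                continue; // same predicate as filter((k, v) -> v % 2 != 0)
            }
            sum = (sum == null) ? v : sum + v;
            updates.add(sum);
        }
        return updates;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(doubled(numbers));          // [2, 4, 6, 8, 10]
        System.out.println(sumOfOddsUpdates(numbers)); // [1, 4, 9]
    }
}
```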
Linux Windows
http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
https://kafka.apache.org/documentation/streams#streams_duality
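The second link points to the stream-table duality section of the Kafka documentation. As a rough illustration of that idea, the plain-Java sketch below treats a stream as a changelog and a table as the latest value per key. It is a sketch under stated assumptions, not Kafka code: DualitySketch, toTable, and toStream are hypothetical names, and deletes (tombstones) are not modeled.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java illustration; not part of the Kafka Streams API.
class DualitySketch {

    // Stream -> table: replay the changelog; the latest value per key wins.
    static Map<String, Integer> toTable(List<Map.Entry<String, Integer>> changelog) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> record : changelog) {
            table.put(record.getKey(), record.getValue());
        }
        return table;
    }

    // Table -> stream: emit one record per current row of the table.
    static List<Map.Entry<String, Integer>> toStream(Map<String, Integer> table) {
        List<Map.Entry<String, Integer>> stream = new ArrayList<>();
        for (Map.Entry<String, Integer> row : table.entrySet()) {
            stream.add(new SimpleEntry<>(row.getKey(), row.getValue()));
        }
        return stream;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> changelog = new ArrayList<>();
        changelog.add(new SimpleEntry<>("alice", 1));
        changelog.add(new SimpleEntry<>("bob", 1));
        changelog.add(new SimpleEntry<>("alice", 2)); // overwrites alice=1
        Map<String, Integer> table = toTable(changelog);
        System.out.println(table);           // {alice=2, bob=1}
        System.out.println(toStream(table)); // [alice=2, bob=1]
    }
}
```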
…and many more…
[Timeline figure, 2016 to 2017]
2016: First release of Kafka’s Streams API (0.10.0)
2017: today, Kafka 1.0*
Kafka Streams API in the wild
In production at LINE Corp., Japan
220+ million active users, processing millions of msg/s
“Applying Kafka Streams for internal message delivery pipeline”
https://engineering.linecorp.com/en/blog/detail/80
Supported since Apache Kafka 0.11 (June 2017)
…and more…
$ curl -sXGET http://localhost:7070/kafka-music/charts/top-five
[
  {
    "artist": "Subhumans",
    "album": "Live In A Dive",
    "name": "All Gone Dead",
    "plays": 126
  },
  {
    "artist": "Wheres The Pope?",
    "album": "PSI",
    "name": "Fear Of God",
    "plays": 115
  },
  ...
]
https://kafka.apache.org/documentation/streams
http://docs.confluent.io/current/streams/
https://www.confluent.io/downloads/
KSQL: a Streaming SQL Engine for Apache Kafka™ from Confluent
KSQL is the simplest way to process streams of data in real-time
✓ No coding required, all you need is SQL
✓ No separate processing cluster required
✓ Powered by Kafka: elastic, scalable, distributed, battle-tested
✓ Perfect for streaming ETL, anomaly detection, event monitoring, and more
✓ Part of Confluent Open Source

CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 SECONDS)
  GROUP BY card_number
  HAVING count(*) > 3;

CREATE STREAM vip_actions AS
  SELECT userid, page, action
  FROM clickstream c
  LEFT JOIN users u
  ON c.userid = u.userid
  WHERE u.level = 'Platinum';

https://github.com/confluentinc/ksql
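
To illustrate what the possible_fraud query computes, the plain-Java sketch below counts authorization attempts per card number in 5-second tumbling windows and keeps only the groups with more than 3 attempts. This is only an approximation of the query's logic, not how KSQL is implemented: TumblingWindowSketch and possibleFraud are hypothetical names, timestamps are assumed to be non-negative milliseconds, and continuous updates and late-arriving data are not modeled.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java illustration; not KSQL or Kafka Streams code.
class TumblingWindowSketch {

    static final long WINDOW_SIZE_MS = 5_000L; // WINDOW TUMBLING (SIZE 5 SECONDS)

    // Returns "windowStartMs:cardNumber" -> count for every group whose
    // count exceeds 3. Input arrays are parallel: attempt i happened at
    // timestampsMs[i] for card cardNumbers[i].
    static Map<String, Integer> possibleFraud(long[] timestampsMs, String[] cardNumbers) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < timestampsMs.length; i++) {
            // Tumbling windows do not overlap, so each timestamp falls into
            // exactly one window, identified here by its start time.
            long windowStart = (timestampsMs[i] / WINDOW_SIZE_MS) * WINDOW_SIZE_MS;
            counts.merge(windowStart + ":" + cardNumbers[i], 1, Integer::sum);
        }
        counts.values().removeIf(c -> c <= 3); // HAVING count(*) > 3
        return counts;
    }

    public static void main(String[] args) {
        long[] ts = {0L, 1_000L, 2_000L, 3_000L, 6_000L, 7_000L};
        String[] cards = {"A", "A", "A", "A", "A", "B"};
        System.out.println(possibleFraud(ts, cards)); // {0:A=4}
    }
}
```

Card "A" has four attempts in the window starting at 0 ms, so it is reported; its fifth attempt and the attempt for card "B" land in the next window with one attempt each and are dropped.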