Build Fast, Scalable
App Monitoring with
Open Source
Robert Hodges - Altinity
Roman Khavronenko - VictoriaMetrics
1
Let’s make some introductions
2
Robert Hodges
Database geek with 30+ years
on DBMS systems. Day job:
CEO at Altinity
Roman Khavronenko
Distributed systems and
monitoring engineer. Day job:
SE at VictoriaMetrics
What is
application
monitoring?
3
Monitoring is for answering questions
● Why are users getting errors?
● When did it start?
● How many users are affected?
● Which service is failing?
4
To get an answer to a question, you need 3 things
1. The question
2. The information to process
3. The respondent
5
Using
VictoriaMetrics
11
VictoriaMetrics - Open Source Time Series Database & Monitoring Solution
● Vertically and horizontally scalable
● Operational simplicity
● Cost-efficient
● Prometheus compatible
● Free forever
12
VictoriaMetrics - Open Source Time Series Database & Monitoring Solution
● Kubernetes monitoring
● Hardware and infrastructure monitoring
● Application Performance Monitoring (APM)
● IoT
● Edge computing
● Alerting
13
14
A metric is a numeric measure or observation of something:
● Number of served requests
● Request latency
● CPU or memory usage
● Occupied or free disk space
What is a metric?
15
Metrics structure
16
Storage for metrics
17
● The VictoriaMetrics data model is schemaless
● No need to define metric names or their labels in advance
● Users are free to add or change ingested metrics at any time
Storage for metrics
18
OSA Con 2021: How ClickHouse Inspired Us
to Build a High Performance TSDB
● VictoriaMetrics is a specialized solution for time series data
● Compression reaches 0.4 bytes per sample
● Ingestion speed: 300K samples/s per CPU core
● Scanning speed: 50M samples/s per CPU core
19
> curl https://my.application/metrics
requests_total{path="/",code="200"} 10
requests_total{path="/",code="240300"} 1
20
> curl -d 'requests_total{path="/",code="200"} 10' -X POST \
http://victoriametrics/api/v1/import/prometheus
21
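The same push can be scripted. A minimal sketch in Python, assuming the requests package and a single-node VictoriaMetrics instance listening on its default port 8428 (the hostname is a placeholder from the slide):

import requests

# One sample in Prometheus text exposition format, pushed to the import endpoint
line = 'requests_total{path="/",code="200"} 10'
resp = requests.post("http://victoriametrics:8428/api/v1/import/prometheus", data=line)
resp.raise_for_status()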
More than one protocol for metrics
● Prometheus remote write API.
● Prometheus text exposition format.
● DataDog protocol.
● InfluxDB line protocol over HTTP, TCP and UDP.
● Graphite plaintext protocol with tags.
● OpenTSDB put message.
● HTTP OpenTSDB /api/put requests.
● JSON line format.
● Arbitrary CSV data.
22
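As an illustration of one of the alternative protocols (not taken from the slides), a rough sketch pushing a single sample in InfluxDB line protocol over HTTP; the /write endpoint and hostname assume a default single-node setup:

import requests

# InfluxDB line protocol: measurement,tag=value field=value
line = 'requests_total,path=/,code=200 count=10'
resp = requests.post("http://victoriametrics:8428/write", data=line)
resp.raise_for_status()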
Querying via MetricsQL
23
Querying via MetricsQL
24
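As a rough illustration of MetricsQL (not taken from the slides), the Prometheus-compatible query API can evaluate an expression such as a per-code request rate; the endpoint and metric name are placeholders:

import requests

resp = requests.get(
    "http://victoriametrics:8428/api/v1/query",
    params={"query": 'sum(rate(requests_total[5m])) by (code)'},
)
print(resp.json())  # Prometheus-style JSON response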
Demo time!
● Run VictoriaMetrics
● Write some metrics
● Execute read queries
25
Frequently asked questions
● Can I monitor MySQL Server, Postgres, MongoDB, ClickHouse?
○ Yes, there are plenty of exporters, dashboards, and alerting rules available.
● Can I monitor my applications?
○ Yes, there are libraries for multiple programming languages to instrument the application with
metrics.
● How expensive is monitoring?
○ With VictoriaMetrics, the total cost of storing metrics from 100 instances, each emitting 1000
metrics every 30s, breaks down as follows (see the arithmetic sketch after this list):
■ 100GB of disk space $0.045 per GB-month: 100*0.045*12 = $54
■ One t3.medium instance, $0.0418 per hour: 0.0418*730*12 = $366
■ Total: $420 per year for monitoring 100 instances.
● Can I run it in Kubernetes?
○ Sure! We have k8s operator and helm charts for VictoriaMetrics!
26
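For reference, a tiny Python sketch reproducing the back-of-the-envelope math from the cost answer above:

instances = 100
metrics_per_instance = 1000
scrape_interval_s = 30

samples_per_second = instances * metrics_per_instance / scrape_interval_s  # ~3,333 samples/s
disk_per_year = 100 * 0.045 * 12        # 100 GB at $0.045 per GB-month -> $54
compute_per_year = 0.0418 * 730 * 12    # one t3.medium at $0.0418/hour -> ~$366
print(samples_per_second, disk_per_year + compute_per_year)  # ~$420 per year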
Using
ClickHouse
27
ClickHouse: a real-time analytic database
It understands SQL
It’s Apache 2.0
It handles many use cases beyond monitoring
It also handles time series data very well
28
ClickHouse optimizes for fast response on large datasets
29
● Highly compressed column storage with indexing
● Parallelized/vectorized query execution, e.g. SELECT host, avg(idle) FROM vmstat GROUP BY host
● Automatic replication between nodes (table replicas)
ClickHouse can load millions of events per second
30
● Unaggregated event data loads in parallel from a custom application, an event queue (Kafka), or a data lake (S3, HDFS) into the source data table(s)
● Materialized views maintain precomputed aggregates
● Data is instantly queryable
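To make the precomputed aggregates concrete, here is a hedged sketch (not from the slides) of a materialized view that keeps per-minute counts up to date as rows arrive; the events table and its columns are hypothetical, and the DDL is posted over ClickHouse's HTTP interface with the Python requests package:

import requests

ddl = """
CREATE MATERIALIZED VIEW events_per_minute
ENGINE = SummingMergeTree
ORDER BY (host, minute)
AS SELECT
    host,
    toStartOfMinute(timestamp) AS minute,
    count() AS events
FROM events
GROUP BY host, minute
"""
requests.post("http://localhost:8123/", data=ddl).raise_for_status()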
…And supports [many] dozens of input formats
31
INSERT INTO some_table Format <format>
TabSeparated
TabSeparatedWithNames
CSV
CSVWithNames
CustomSeparated
Values
JSON
JSONEachRow
Protobuf
Parquet
...
There are many ways to store and manipulate time data
Date -- Precision to day
DateTime -- Precision to second
DateTime64 -- Precision to nanosecond
toYear(), toMonth(), toWeek(),
toDayOfWeek(), toDay(), toHour(), ...
toStartOfYear(), toStartOfQuarter(),
toStartOfMonth(), toStartOfHour(),
toStartOfMinute(), …, toStartOfInterval()
toYYYYMM()
toYYYYMMDD()
toYYYYMMDDhhmmss()
And many more!
32
BI tools like Grafana prefer DateTime values
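A quick illustration of a few of these functions (a sketch run over ClickHouse's HTTP interface; the endpoint is a placeholder):

import requests

sql = """
SELECT
    now() AS ts,
    toStartOfMinute(ts),
    toStartOfInterval(ts, INTERVAL 5 minute),
    toYYYYMMDD(ts)
FORMAT TabSeparatedWithNames
"""
print(requests.post("http://localhost:8123/", data=sql).text)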
Let’s build a simple host monitoring system
33
$ vmstat 1 -n
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 166912 2645740 36792 3360652 0 0 3 101 1 1 2 1 98 0 0
1 0 166912 2645360 36792 3360652 0 0 0 0 1182 3986 7 1 93 0 0
(Pipeline: vmstat → ClickHouse → Grafana dashboard)
Step 1: Generate vmstat data
34
#!/usr/bin/env python3
import datetime, json, socket, subprocess

host = socket.gethostname()
with subprocess.Popen(['vmstat', '-n', '1'], stdout=subprocess.PIPE) as proc:
    proc.stdout.readline()  # discard first line (category banner)
    header_names = proc.stdout.readline().decode().split()
    values = proc.stdout.readline().decode()
    while values != '' and proc.poll() is None:
        record = {}
        record['timestamp'] = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        record['host'] = host
        for (header, value) in zip(header_names, values.split()):
            record[header] = int(value)
        print(json.dumps(record), flush=True)  # one JSON line per vmstat sample
        values = proc.stdout.readline().decode()
Here’s the output
35
{"timestamp": "2023-01-22 18:13:16", "host": "logos3", "r": 0, "b":
0, "swpd": 166912, "free": 2523688, "buff": 41412, "cache": 3408292,
"si": 0, "so": 0, "bi": 3, "bo": 101, "in": 1, "cs": 0, "us": 2,
"sy": 1, "id": 98, "wa": 0, "st": 0}
{"timestamp": "2023-01-22 18:13:17", "host": "logos3", "r": 0, "b":
0, "swpd": 166912, "free": 2523696, "buff": 41412, "cache": 3408316,
"si": 0, "so": 0, "bi": 0, "bo": 216, "in": 1214, "cs": 4320, "us":
1, "sy": 1, "id": 98, "wa": 0, "st": 0}
{"timestamp": "2023-01-22 18:13:18", "host": "logos3", "r": 0, "b":
0, "swpd": 166912, "free": 2527120, "buff": 41412, "cache": 3408572,
"si": 0, "so": 0, "bi": 0, "bo": 0, "in": 1172, "cs": 4162, "us": 2,
"sy": 1, "id": 98, "wa": 0, "st": 0}
Step 2: Design a ClickHouse table to hold data
36
CREATE TABLE monitoring.vmstat (
timestamp DateTime,
day UInt32 default toYYYYMMDD(timestamp),
host String,
r UInt64, b UInt64, -- procs
swpd UInt64, free UInt64, buff UInt64, cache UInt64, -- memory
si UInt64, so UInt64, -- swap
bi UInt64, bo UInt64, -- io
in UInt64, cs UInt64, -- system
us UInt64, sy UInt64, id UInt64, wa UInt64, st UInt64 -- cpu
) ENGINE=MergeTree
PARTITION BY day
ORDER BY (host, timestamp)
Dimensions: timestamp, day, host. Measurements: the remaining vmstat columns.
Step 3: Load data into ClickHouse
37
INSERT INTO vmstat Format JSONEachRow
E.g.
INSERT='INSERT%20INTO%20vmstat%20Format%20JSONEachRow'
cat vmstat.dat | curl -X POST --data-binary @- \
"http://logos3:8123/?database=monitoring&query=${INSERT}"
(Or a Python script)
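A rough sketch of that Python alternative: read the JSON lines produced in Step 1 from stdin and post them to ClickHouse in batches over HTTP (assumes the requests package; host, database, and batch size are placeholders):

import sys
import requests

URL = "http://logos3:8123/"
PARAMS = {"database": "monitoring",
          "query": "INSERT INTO vmstat FORMAT JSONEachRow"}

batch = []
for line in sys.stdin:
    batch.append(line)
    if len(batch) >= 60:  # vmstat emits one line per second, so flush roughly once a minute
        requests.post(URL, params=PARAMS, data="".join(batch)).raise_for_status()
        batch = []
if batch:  # flush any remainder on EOF
    requests.post(URL, params=PARAMS, data="".join(batch)).raise_for_status()

Usage: pipe the Step 1 generator into it, e.g. ./vmstat_json.py | python3 loader.py (script names are placeholders).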
Step 4: Build a Grafana dashboard to show results
38
● ClickHouse data source for Grafana
● Altinity plugin for ClickHouse
After loading you can go crazy with analytical queries
39
SELECT host, count() AS loaded_minutes
FROM (
SELECT
toStartOfMinute(timestamp) AS minute, host, avg(100 - id) AS load
FROM monitoring.vmstat
WHERE timestamp > (now() - toIntervalDay(1))
GROUP BY minute, host HAVING load > 25
)
GROUP BY host ORDER BY loaded_minutes DESC
┌─host───┬─loaded_minutes─┐
│ logos3 │ 6 │
│ logos2 │ 5 │
└────────┴────────────────┘
2 hosts had > 25% load for at least
a minute in the last 24 hours
40
DEMO TIME!
Can ClickHouse store data in a “schemaless” way?
{{"timestamp":
"2023-01-23
19:53:14",
"host": "logos3",
...}
SQL Table
JSON
String
JSON String (“blob”) with
derived header values
One table can handle
many entity types!
41
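One hedged way to set this up (table and column names are hypothetical): keep the raw JSON as a String column and derive a few header columns from it with DEFAULT expressions, so the table can be ordered and filtered without a fixed schema for the payload:

import requests

ddl = """
CREATE TABLE monitoring.events_raw (
    raw String,
    host String DEFAULT JSONExtractString(raw, 'host'),
    timestamp DateTime DEFAULT parseDateTimeBestEffort(JSONExtractString(raw, 'timestamp'))
) ENGINE = MergeTree
ORDER BY (host, timestamp)
"""
requests.post("http://localhost:8123/", data=ddl).raise_for_status()
# Insert only the raw column; host and timestamp are derived automatically.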
More schemaless ways to store data
● Arrays: header values plus paired arrays of keys and values
● Map: header values plus a Map of key/value pairs
● JSON data type: JSON mapped directly to column storage
42
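A sketch of the Map variant (names are hypothetical; the Map data type is available in recent ClickHouse releases):

import requests

ddl = """
CREATE TABLE monitoring.metrics_kv (
    timestamp DateTime,
    host String,
    metrics Map(String, UInt64)
) ENGINE = MergeTree
ORDER BY (host, timestamp)
"""
requests.post("http://localhost:8123/", data=ddl).raise_for_status()
# Individual values are addressed by key in queries, e.g.
# SELECT host, avg(metrics['cs']) FROM monitoring.metrics_kv GROUP BY host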
Where is the software to build monitoring?
43
Event streaming
● Apache Kafka
● Apache Pulsar
● Vectorized Redpanda
ELT
● Apache Airflow
● Rudderstack
Rendering/Display
● Apache Superset
● Cube.js
● Grafana
Client Libraries
● C++ - ClickHouse CPP
● Golang - ClickHouse Go
● Java - ClickHouse JDBC
● Javascript/Node.js - Apla
● ODBC - ODBC Driver for ClickHouse
● Python - ClickHouse Driver, ClickHouse
SQLAlchemy
More client library links HERE
Kubernetes
● Altinity Operator for ClickHouse
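For example, a minimal sketch with the ClickHouse Driver package for Python listed above (pip install clickhouse-driver; the hostname is a placeholder):

from clickhouse_driver import Client

client = Client("logos3")  # native protocol, default port 9000
rows = client.execute(
    "SELECT host, avg(100 - id) AS load FROM monitoring.vmstat GROUP BY host"
)
print(rows)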
Where can I find out more about ClickHouse?
ClickHouse official docs – https://clickhouse.com/docs/
Altinity Blog – https://altinity.com/blog/
Altinity YouTube Channel –
https://www.youtube.com/channel/UCE3Y2lDKl_ZfjaCrh62onYA
Altinity Knowledge Base – https://kb.altinity.com/
Meetups, other blogs, and external resources. Use your powers of Search!
44
Wrap-up
45
Comparing VictoriaMetrics and ClickHouse databases
VictoriaMetrics
Talks MetricsQL, PromQL, Graphite QL
Stores time series data
No explicit schema
Easy to load data using simple clients
Can pull data from Prometheus exporters and Kafka
Time-series specific functions and transformations
Integrates with any BI tool that speaks PromQL
Extremely fast and scalable
ClickHouse
Talks SQL
Stores any kind of data
Uses tables; many ways to represent data
Easy to load data using simple clients
Can pull data from Kafka and object storage
Versatile queries including JOIN and aggregation
Most BI tools have ClickHouse adapters
Extremely fast and scalable
46
Help for building monitoring systems that work
VictoriaMetrics Inc.
VictoriaMetrics Community
VictoriaMetrics Enterprise
VictoriaMetrics Managed platform
Altinity Inc.
Altinity.Cloud managed ClickHouse platform
Enterprise support for ClickHouse
Altinity Developer Academy classes
Altinity Stable Builds for ClickHouse
Altinity Kubernetes Operator for ClickHouse
47
Thank you!
Questions?
https://altinity.com
Contact Altinity
48
https://victoriametrics.com
Contact VictoriaMetrics
