Building an Analytic Extension to MySQL with ClickHouse
Vadim Tkachenko (Percona) and Kanthi Subramanian (Altinity)
2 March 2023
Who we are
Vadim Tkachenko
CTO, Percona
Kanthi Subramanian
Open source contributor / Data Engineer / Developer Advocate
MySQL Strengths
- OLTP database (operational): handles up to 1 mln transactions per second
- Thousands of concurrent transactions
MySQL is good for:
1. ACID transactions.
2. Excellent concurrency.
3. Very fast point lookups and short transactions.
4. Excellent tooling for building OLTP applications.
It's very good for running interactive online properties:
- e-commerce
- online gaming
- social networks
Analytics with MySQL
- Only for small data sets.
- Aggregation queries (GROUP BY) can be problematic (slow) on 10 mln+ rows.
In summary: analyzing data spread over millions of small transactions is not a good use case for MySQL.
Some examples (next slides):
Query comparison (MySQL/ClickHouse)
The number of flights delayed by more than 10 minutes,
grouped by the day of the week, for 2000-2008
SELECT DayOfWeek, count(*) AS c
FROM ontime_snapshot
WHERE DepDel15>10 AND Year>=2000 AND Year<=2008
GROUP BY DayOfWeek
ORDER BY c DESC;
176mln rows to process
MySQL: 573 seconds (about 9.5 minutes)
ClickHouse: 0.5 seconds
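To reproduce a comparison like this yourself, the sketch below times the same query against both servers. It is a minimal illustration, not the benchmark harness behind the numbers above: it assumes the ontime dataset is loaded on both sides, placeholder localhost credentials and table names, and the mysql-connector-python and clickhouse-driver packages.

# Hedged sketch: time the same aggregate query on MySQL and ClickHouse.
# Adjust hosts, credentials, and table names for your environment.
import time

import mysql.connector                 # pip install mysql-connector-python
from clickhouse_driver import Client   # pip install clickhouse-driver

QUERY = """
SELECT DayOfWeek, count(*) AS c
FROM ontime
WHERE DepDel15 > 10 AND Year >= 2000 AND Year <= 2008
GROUP BY DayOfWeek
ORDER BY c DESC
"""

def time_mysql() -> float:
    conn = mysql.connector.connect(host="localhost", user="root",
                                   password="root", database="ontime")
    cur = conn.cursor()
    start = time.monotonic()
    cur.execute(QUERY)
    cur.fetchall()
    elapsed = time.monotonic() - start
    conn.close()
    return elapsed

def time_clickhouse() -> float:
    client = Client(host="localhost")
    start = time.monotonic()
    client.execute(QUERY)
    return time.monotonic() - start

if __name__ == "__main__":
    print(f"MySQL:      {time_mysql():8.2f} s")
    print(f"ClickHouse: {time_clickhouse():8.2f} s")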
Query comparison (MySQL/ClickHouse)
The percentage of flights delayed by more than 10 minutes, grouped by year
SELECT Year, avg(DepDelay>10)*100
FROM ontime
GROUP BY Year
ORDER BY Year;
176mln rows to process
MySQL: 240 seconds (4 minutes)
ClickHouse: 0.674 seconds
What gives such a difference?
MySQL's features:
- storing data in rows
- single-threaded queries
- optimization for high concurrency
are exactly the opposite of those needed to run analytic queries that compute aggregates on large datasets.
ClickHouse is designed for analytic processing:
- stores data in columns
- has optimizations to minimize I/O
- computes aggregates very efficiently
- parallelizes query processing
Why choose ClickHouse as a complement to
MySQL?
For the same query (flights delayed by more than 10 minutes, grouped by day of the week, 2000-2008): MySQL reads all columns in every row, while ClickHouse reads only the selected columns.
Signs that MySQL needs analytic help
MySQL, hypothetical query: reading all columns means scanning 59 GB (100%) of the data.
ClickHouse, the same query:
- read only the 3 needed columns: 1.7 GB (3%)
- read the 3 columns compressed: 21 MB (0.035%)
- read the 3 compressed columns over 8 threads: about 2.6 MB (0.0044%) per thread
Why is MySQL a natural complement to ClickHouse?
MySQL:
- Transactional processing
- Fast single-row updates
- High concurrency: MySQL supports a large number of concurrent queries
ClickHouse:
- Does not support ACID transactions
- Updating a single row is problematic: ClickHouse needs to read and rewrite a lot of data
- A single query can use a lot of resources, so ClickHouse is not a good fit for highly concurrent access
Leveraging Analytical Benefits of ClickHouse
● Identify databases/tables in MySQL to be replicated
● Create schemas/databases in ClickHouse
● Transfer data from MySQL to ClickHouse
https://github.com/Altinity/clickhouse-sink-connector
Fully wired, continuous replication
Pipeline: OLTP app → MySQL → MySQL binlog → Debezium → Kafka* event stream → Altinity Sink Connector → ClickHouse (ReplacingMergeTree table engine) → analytic app.
An initial dump/load seeds ClickHouse before continuous replication starts.
*Including Pulsar and RedPanda
Replication Setup
1. Initial Dump/Load
2. Validate Data
3. Setup CDC Replication
1. Initial Dump/Load
Why do we need custom load/dump tools?
● Data type limits and data types are not the same in MySQL and ClickHouse, e.g. the DATE maximum is 9999-12-31 in MySQL but 2299-12-31 for ClickHouse Date32 (see the sketch after this list).
● Translate/read the MySQL schema and create the ClickHouse schema (identify the primary key and partitioning and translate them to ORDER BY in ClickHouse's ReplacingMergeTree).
● Faster transfer, leveraging existing MySQL and ClickHouse tools.
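To make the date-range point concrete, here is a minimal sketch that flags or clamps MySQL DATE values that do not fit ClickHouse's Date32 range. The function name and the clamping policy are illustrative assumptions, not part of the Altinity tooling.

# Hedged sketch: flag/clamp MySQL DATE values that do not fit ClickHouse Date32.
# MySQL DATE goes up to 9999-12-31; ClickHouse Date32 covers roughly
# 1900-01-01..2299-12-31 (plain Date is narrower still). Names and the clamping
# policy here are illustrative only.
from datetime import date

CH_DATE32_MIN = date(1900, 1, 1)
CH_DATE32_MAX = date(2299, 12, 31)

def fit_date32(value: date, clamp: bool = True) -> date:
    """Return a Date32-safe value, clamping or raising on out-of-range dates."""
    if CH_DATE32_MIN <= value <= CH_DATE32_MAX:
        return value
    if clamp:
        return min(max(value, CH_DATE32_MIN), CH_DATE32_MAX)
    raise ValueError(f"{value} is outside the ClickHouse Date32 range")

print(fit_date32(date(9999, 12, 31)))   # -> 2299-12-31 (clamped)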
1. Initial Dump/Load (MySQL Shell)
https://dev.mysql.com/blog-archive/mysql-shell-8-0-21-speeding-up-the-dump-process/
https://blogs.oracle.com/mysql/post/mysql-shell-dump-load-and-compression
1. Initial Dump/Load
MySQL Shell: multi-threaded, splits large tables into smaller chunks, compression, speeds up to 3 GB/s.
ClickHouse client: multi-threaded, reads compressed data.
1. Initial Dump/Load
Install mysql-shell (JS)
mysqlsh -uroot -proot -hlocalhost -e "util.dump_tables('test', ['employees'], '/tmp/employees_12');" --verbose
python db_load/clickhouse_loader.py --clickhouse_host localhost --clickhouse_database $DATABASE --dump_dir $HOME/dbdumps/$DATABASE --clickhouse_user root --clickhouse_password root --threads 4 --mysql_source_database $DATABASE --mysqlshell
1. Initial Dump/Load
ClickHouse target table (created by clickhouse_loader, which adds the _sign and _version columns):
CREATE TABLE IF NOT EXISTS `employees_predated` (
`emp_no` int NOT NULL,
`birth_date` Date32 NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` enum('M','F') NOT NULL,
`hire_date` Date32 NOT NULL,
`salary` bigint unsigned DEFAULT NULL,
`num_years` tinyint unsigned DEFAULT NULL,
`bonus` mediumint unsigned DEFAULT NULL,
`small_value` smallint unsigned DEFAULT NULL,
`int_value` int unsigned DEFAULT NULL,
`discount` bigint DEFAULT NULL,
`num_years_signed` tinyint DEFAULT NULL,
`bonus_signed` mediumint DEFAULT NULL,
`small_value_signed` smallint DEFAULT NULL,
`int_value_signed` int DEFAULT NULL,
`last_modified_date_time` DateTime64(0) DEFAULT NULL,
`last_access_time` String DEFAULT NULL,
`married_status` char(1) DEFAULT NULL,
`perDiemRate` decimal(30,12) DEFAULT NULL,
`hourlyRate` double DEFAULT NULL,
`jobDescription` text DEFAULT NULL,
`updated_time` String NULL ,
`bytes_date` longblob DEFAULT NULL,
`binary_test_column` varbinary(255) DEFAULT NULL,
`blob_med` mediumblob DEFAULT NULL,
`blob_new` blob DEFAULT NULL,
`_sign` Int8 DEFAULT 1,
`_version` UInt64 DEFAULT 0
) ENGINE = ReplacingMergeTree(_version) ORDER BY (`emp_no`)
SETTINGS index_granularity = 8192;
MySQL source table:
CREATE TABLE `employees_predated` (
`emp_no` int NOT NULL,
`birth_date` date NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` enum('M','F') NOT NULL,
`hire_date` date NOT NULL,
`salary` bigint unsigned DEFAULT NULL,
`num_years` tinyint unsigned DEFAULT NULL,
`bonus` mediumint unsigned DEFAULT NULL,
`small_value` smallint unsigned DEFAULT NULL,
`int_value` int unsigned DEFAULT NULL,
`discount` bigint DEFAULT NULL,
`num_years_signed` tinyint DEFAULT NULL,
`bonus_signed` mediumint DEFAULT NULL,
`small_value_signed` smallint DEFAULT NULL,
`int_value_signed` int DEFAULT NULL,
`last_modified_date_time` datetime DEFAULT NULL,
`last_access_time` time DEFAULT NULL,
`married_status` char(1) DEFAULT NULL,
`perDiemRate` decimal(30,12) DEFAULT NULL,
`hourlyRate` double DEFAULT NULL,
`jobDescription` text,
`updated_time` timestamp NULL DEFAULT NULL,
`bytes_date` longblob,
`binary_test_column` varbinary(255) DEFAULT NULL,
`blob_med` mediumblob,
`blob_new` blob,
PRIMARY KEY (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
COLLATE=utf8mb4_0900_ai_ci
/*!50100 PARTITION BY RANGE (`emp_no`)
(PARTITION p1 VALUES LESS THAN (1000) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)
*/
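Because the ClickHouse table above is a ReplacingMergeTree versioned by _version, reads that must see exactly one row per key can use FINAL. A minimal sketch using the clickhouse-driver package and placeholder credentials, not part of the original tooling:

# Hedged sketch: read deduplicated rows from the ReplacingMergeTree table above.
# FINAL merges duplicate versions of each key at query time; it costs extra CPU,
# so analytic scans that tolerate a few not-yet-merged duplicates can skip it.
from clickhouse_driver import Client  # pip install clickhouse-driver

client = Client(host="localhost", user="root", password="root")

rows = client.execute(
    """
    SELECT emp_no, first_name, last_name, salary
    FROM employees_predated FINAL
    ORDER BY emp_no
    LIMIT 10
    """
)
for row in rows:
    print(row)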
2. Validate Data
Why is a basic count check not enough?
● It is essential to validate the values themselves, for example decimal/floating-point precision and data type limits.
● Data types are different between MySQL and ClickHouse.
Solution: MD5 checksum of column data (courtesy: Sisense)
1. Take the MD5 of each column. Use a space for
NULL values.
2. Concatenate those results, and MD5 this result.
3. Split into 4 8-character hex strings.
4. Convert into 32-bit integers and sum.
python db_compare/mysql_table_checksum.py --mysql_host localhost --mysql_user root --mysql_password root --mysql_database menagerie --tables_regex "^pet" --debug_output
python db_compare/clickhouse_table_checksum.py --clickhouse_host localhost --clickhouse_user root --clickhouse_password root --clickhouse_database menagerie --tables_regex "^pet" --debug_output
diff out.pet.ch.txt out.pet.mysql.txt | grep "<|>"
Credits: Arnaud
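For intuition, here is a self-contained sketch of the four checksum steps above applied to a single row of made-up values; it is not the exact implementation used by the db_compare scripts.

# Hedged sketch of the per-row checksum scheme described above:
# MD5 each column value (space for NULL), MD5 the concatenation, split the
# digest into four 8-character hex chunks, convert to 32-bit ints, and sum.
import hashlib

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode("utf-8")).hexdigest()

def row_checksum(values) -> int:
    col_hashes = [md5_hex(" " if v is None else str(v)) for v in values]
    row_hash = md5_hex("".join(col_hashes))
    chunks = [row_hash[i:i + 8] for i in range(0, 32, 8)]
    return sum(int(chunk, 16) for chunk in chunks)

# Example row (hypothetical values): identical values on the MySQL and
# ClickHouse sides should produce identical sums, which can then be
# aggregated per table and compared.
print(row_checksum(["Fluffy", "Harold", "cat", "f", "1993-02-04", None]))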
3. Setup CDC Replication
MySQL source position: binlog file mysql.bin.00001 and binlog position 100002, or GTID 1233:223232323.
Pipeline: Debezium → Kafka* event stream → Altinity Sink Connector → ClickHouse.
Setup Debezium to start from binlog file/position or Gtid
https://github.com/Altinity/clickhouse-sink-connector/blob/develop/doc/debezium_setup.md
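A minimal sketch of registering a Debezium MySQL source connector through the Kafka Connect REST API. Property names follow the Debezium 2.x documentation; hosts, credentials, and topic names are placeholders, and starting from a specific binlog position or GTID follows the debezium_setup.md document linked above rather than anything shown here.

# Hedged sketch: register a Debezium MySQL source connector via the Kafka
# Connect REST API. Property names are from the Debezium 2.x docs; hosts,
# credentials, and names are placeholders.
import requests  # pip install requests

CONNECT_URL = "http://localhost:8083/connectors"  # Kafka Connect REST endpoint

connector = {
    "name": "mysql-source",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "184054",
        "topic.prefix": "oltp",
        "database.include.list": "test",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.test",
    },
}

resp = requests.post(CONNECT_URL, json=connector, timeout=10)
resp.raise_for_status()
print(resp.json())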
Final step - Deploy
● Docker Compose (Debezium/Strimzi and Sink/Strimzi images)
https://hub.docker.com/repository/docker/altinity/clickhouse-sink-connector
● Kubernetes (Docker images)
● JAR file
Simplified Architecture
MySQL (binlog file mysql.bin.00001, binlog position 100002, or GTID 1233:223232323) → Debezium + Altinity Sink Connector running as one executable, one service → ClickHouse.
Final step - Monitor
● Monitor Lag
● Connector Status
● Kafka monitoring
● CPU/Memory Stats
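Connector status, one item on the list above, can be polled from the standard Kafka Connect REST API. A minimal sketch, assuming Connect listens on localhost:8083 and using placeholder connector names; lag and CPU/memory metrics come from Kafka and system dashboards instead.

# Hedged sketch: poll connector/task state via the standard Kafka Connect REST
# API (GET /connectors/<name>/status). Host and connector names are placeholders.
import requests  # pip install requests

CONNECT_URL = "http://localhost:8083"
CONNECTORS = ["mysql-source", "clickhouse-sink"]

for name in CONNECTORS:
    status = requests.get(f"{CONNECT_URL}/connectors/{name}/status", timeout=10).json()
    state = status["connector"]["state"]
    task_states = [t["state"] for t in status.get("tasks", [])]
    print(f"{name}: connector={state} tasks={task_states}")
    if state != "RUNNING" or any(s != "RUNNING" for s in task_states):
        print(f"  ALERT: {name} is not fully running")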
Challenges
- MySQL master failover
- Schema changes (DDL)
MySQL Master Replication
MySQL Master Failover
MySQL Master Failover: Snowflake ID derived from the binlog timestamp
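The slide only names the idea; the sketch below shows one way a snowflake-style, monotonically increasing version can be composed from the binlog event timestamp plus a sequence counter, so row versions keep ordering correctly across a master failover. The bit layout is an illustrative assumption, not the sink connector's exact format.

# Hedged sketch: compose a 64-bit, ever-increasing version from the binlog event
# timestamp (milliseconds) plus a per-millisecond sequence, snowflake-style.
# After a master failover, the new master's binlog timestamps preserve ordering.
# The bit layout is illustrative, not the sink connector's exact format.
_last_ms = 0
_seq = 0

def snowflake_version(binlog_ts_ms: int, seq_bits: int = 22) -> int:
    """Return (timestamp_ms << seq_bits) | sequence, resetting the sequence per ms."""
    global _last_ms, _seq
    if binlog_ts_ms == _last_ms:
        _seq += 1
    else:
        _last_ms, _seq = binlog_ts_ms, 0
    return (binlog_ts_ms << seq_bits) | _seq

# Two events in the same millisecond still get distinct, increasing versions.
print(snowflake_version(1_677_715_200_000))
print(snowflake_version(1_677_715_200_000))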
Alter Table support
MySQL: ADD COLUMN <col_name> varchar(1000) NULL → ClickHouse: ADD COLUMN <col_name> Nullable(String)
MySQL: ADD INDEX ... type btree → ClickHouse: ADD INDEX ... TYPE minmax
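A minimal sketch of the kind of DDL translation involved, covering only the two examples above; the real connector handles many more cases, and the helper names here are hypothetical.

# Hedged sketch: translate the two MySQL ALTER TABLE examples above into their
# ClickHouse equivalents. Only illustrates the column-type and index-type
# mappings shown on the slide; names and coverage are illustrative.
def translate_add_column(col_name: str, mysql_type: str, nullable: bool) -> str:
    ch_type = "String" if mysql_type.lower().startswith("varchar") else mysql_type
    if nullable:
        ch_type = f"Nullable({ch_type})"
    return f"ALTER TABLE t ADD COLUMN {col_name} {ch_type}"

def translate_add_index(index_name: str, column: str) -> str:
    # MySQL btree secondary index -> ClickHouse minmax data-skipping index.
    return (f"ALTER TABLE t ADD INDEX {index_name} ({column}) "
            f"TYPE minmax GRANULARITY 1")

print(translate_add_column("notes", "varchar(1000)", nullable=True))
print(translate_add_index("idx_salary", "salary"))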
Replicating Schema Changes
● Debezium does not provide events for all DDL changes
● The complete DDL is only available in a separate topic (not as a SinkRecord)
● Parallel Kafka workers might process messages out of order
Where can I get more information?
Altinity Sink Connector for ClickHouse
https://github.com/Altinity/clickhouse-sink-connector
https://github.com/ClickHouse/ClickHouse
https://github.com/mydumper/mydumper
Project roadmap and next steps
- PostgreSQL, MongoDB, SQL Server support
- ClickHouse shards/replicas support
- Transaction support
Thank you!
Questions?
https://altinity.com https://percona.com


Editor's Notes

  • #14 Experience deploying to customers and the tools we have developed in the process. It's a complicated set of steps, it will be easier to automate the entire process. Create schema/databases -> we have scripts for the initial load that simplifies this process, and sink connector can also auto create tables. Complete suite of tools to simplify the process end to end.
  • #15 Existing data in MySQL might be big, need a solution that will be fast to do the Initial transfer. (CH needs to be in-sync) End to End solution for transferring data from MySQL to ClickHouse for Production Deployments. Debezium timeout(STATEMENT execution timeout). Source DB might have limited permissions. You might not have permission to perform OUTFILE.
  • #16 Step 1: Perform a dump of data from MySQL and load it into ClickHouse. Debezium initial snapshot might not be faster. Step 2: After the dump is loaded, validate the data. Step 3: Setup CDC replication using Debezium and Altinity sink connector.
  • #17 Debezium provides initial snapshotting, but it’s slow. Debezium load times very slow. MAX_EXECUTION_TIMEOUT
  • #18 Debezium provides initial snapshotting, but it's slow. mysqlsh requires a PK; if a PK is not present, it does not parallelize and does not provide chunking capabilities.
  • #19 Debezium provides initial snapshotting, but it's slow. MySQL Shell uses the zstd compression standard by default. The --threads option provides parallelism.
  • #20 Debezium provides initial snapshotting, but it's slow. MySQL Shell uses the zstd compression standard by default. The --threads option provides parallelism. clickhouse_loader creates the ClickHouse schema and adds version and sign columns for UPDATEs/DELETEs.
  • #21 Debezium provides initial snapshotting, but it's slow. MySQL Shell uses the zstd compression standard by default. The --threads option provides parallelism. clickhouse_loader creates the ClickHouse schema and adds version and sign columns for UPDATEs/DELETEs.
  • #22 Debezium provides initial snapshotting, but it’s slow. Compare results of the aggregation table that drives your dashboard. Sales numbers have to be accurate.
  • #24 Debezium provides initial snapshotting, but it’s slow. Different environments We also maintain images for Debezium/Strimzi and Sink/Strimzi
  • #25 Debezium provides initial snapshotting, but it’s slow. Different environments We also maintain images for Debezium/Strimzi and Sink/Strimzi
  • #26 Setup Alerts if connectors are down. Setup Alerts when there is a lag. Setup Alerts when there are errors. We also bundle the debezium dashboard and the kafka dashboard.
  • #32 Co-ordination is Key! Tradeoff between Parallelism and Consistency.
  • #33 Events: Truncate table.
  • #34 Events: Truncate table.
  • #37 Events: Truncate table.