Enabling product personalisation using Apache Kafka, Apache Pinot and Trino with Stuart Coleman | Kafka Summit London 2022

© 10x Banking Technology Limited 2022. All rights reserved
Stuart Coleman – 10x Banking
Enabling product personalization using Kafka, Pinot and Trino
05 April 2022

What are we going to talk about?
1. Why your bank has not offered you new products for a long time.
2. Breaking the monolith of core banking systems and liberating data from the core.
3. Taking advantage of that hard work to offer new and dynamic product personalization.
4. Our experiences are based on transactions and accounts (but you should be able to substitute purchases and customers, or movies and users).
What are we not going to talk about?
1. Lots of details of Kafka, Pinot or Trino architecture.

What is core banking and how does it work?

Core banking is conceptually easy:
• Customers onboard to the bank and subscribe to one or more products.
• They make and receive payments.
• Payments are booked into the Ledger synchronously in real time (to avoid double spend).
• Products define a series of lifecycle steps which happen after payments are posted:
  • Interest calculation
  • Fees and rewards
  • Reporting and accounting

How does my bank work?
• Correctness is absolute – it’s a bank!
• Status quo is to use a mainframe
• Highly performant, available and reliable.
• Consistency is much easier in a monolith
• Applications directly communicate with mainframe
• Single monolithic shared databases
Courtesy of https://docs.microsoft.com/en-us/azure/architecture/example-scenario/mainframe/ibm-zos-online-transaction-processing-azure

Domain driven design, microservices and data encapsulation

Retrieving balances and adding cleansed merchant name to a transaction

Retrieving balances and adding cleansed merchant name to a transaction – mainframe version
Courtesy of https://docs.microsoft.com/en-us/azure/architecture/example-scenario/mainframe/ibm-zos-online-transaction-processing-azure
• No new components
• Front end and business logic components need to be modified
• Required new data fields added to monolithic data layer
• Complex and risky change

Data Dichotomy
Taken from https://www.confluent.io/en-gb/blog/data-dichotomy-rethinking-the-way-we-treat-data-and-services/

Event Driven Design

Correctness and integrity baked in – the Outbox pattern

Putting core banking data to use for customers

Your bank account today

Let’s build some new products

What dimensions are interesting?

What we need
Ability to compute analytical aggregates on data with filters from data in other domains

Possible solutions – data warehouse + real time component

Pre-aggregation for reliable real-time query latency
Pre-aggregation gives reliable query latency, but:
• Dimensions need to be known beforehand
• One record generates multiple aggregates
• Dimension and storage explosion
• Difficult to scale
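
To make the dimension explosion concrete, here is a minimal sketch. It assumes that every combination of dimensions has to be pre-aggregated so that any filter can be answered from a stored total. The dimension names and the sample transaction are hypothetical and are not taken from the talk.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimensions for illustration; not the ones used in the talk.
DIMENSIONS = ("merchant", "category", "country", "day")


def aggregate_keys(record):
    """Yield every pre-aggregate key that a single record contributes to."""
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            yield tuple((d, record[d]) for d in dims)


def pre_aggregate(records):
    """Maintain a running sum of amounts for every dimension combination."""
    totals = defaultdict(float)
    for record in records:
        for key in aggregate_keys(record):
            totals[key] += record["amount"]
    return totals


transaction = {"merchant": "Cafe A", "category": "coffee",
               "country": "GB", "day": "2022-04-05", "amount": 3.5}

# One record fans out into 2 ** len(DIMENSIONS) aggregates.
keys_per_record = len(list(aggregate_keys(transaction)))
```

In this sketch each extra dimension doubles the number of aggregates a single record has to update, which is the storage and write amplification the bullets above describe.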

Flexibility vs Latency

Pinot and Trino in 1 minute
Pinot is a purpose-built data store for ultra-low latency analytics at high throughput
• Column oriented
• Powerful indexing techniques for low latency aggregation and filtering
• Horizontally scalable
• Supports high concurrency queries
Trino is a distributed, ANSI SQL-compliant query engine
• Pluggable connector architecture which allows querying across many data stores (including Pinot)
• Built for low latency and efficiency even on large batch queries

Bridging the gap with Pinot and Trino

Handling both types of queries with Pinot and Trino
• Single copy of data
• No need to handle two ingest pipelines
• Horizontally scalable by adding more Pinot servers and Trino workers

How much to denormalize?
• Pinot has limited support for lookup joins, but not fully featured SQL joins
• Aggregation, filtering and grouping are strong, with a wide range of indexes to speed up queries
• For our use case, it is most practical to pre-join outside of Pinot and to ingest the pre-joined topic. Aggregations and group-bys are performed in Trino or Pinot depending on query size
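
The pre-join step can be sketched as follows. This is an illustration only: it assumes a plain dictionary stands in for account reference data and a list stands in for the transactions topic, and every field name is hypothetical. In a real pipeline a stream processor would perform the join and write the result to the pre-joined Kafka topic that Pinot ingests.

```python
# Hypothetical account reference data, keyed by account id. In a real
# pipeline this would be a table built from an accounts topic.
ACCOUNTS = {
    "acc-1": {"product": "current", "customer_segment": "student"},
    "acc-2": {"product": "savings", "customer_segment": "retail"},
}


def pre_join(transaction, accounts):
    """Return a flat, denormalized record ready for ingestion.

    Account attributes are copied onto the transaction so the analytics
    store can filter and group on them without a join at query time.
    """
    account = accounts.get(transaction["account_id"])
    if account is None:
        # Assumption: unmatched transactions are kept with empty attributes
        # rather than dropped, so no payment silently disappears.
        account = {"product": None, "customer_segment": None}
    return {**transaction, **account}


transactions = [
    {"txn_id": "t1", "account_id": "acc-1", "amount": 12.0},
    {"txn_id": "t2", "account_id": "acc-9", "amount": 5.0},
]
joined = [pre_join(t, ACCOUNTS) for t in transactions]
```

The trade-off is that every attribute needed for filtering has to be chosen before ingestion, in exchange for queries that never need a join.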

Ensuring correctness

Deduplication
• Duplication is (obviously) not acceptable in core banking
• The Outbox pattern only guarantees at-least-once delivery
• A message is only marked as sent after it has been published to Kafka, which happens outside of the database transaction
• Pinot ingestion is exactly-once, but there is no built-in deduplication in the ingestion component
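
A minimal sketch of why the Outbox pattern is at-least-once, assuming SQLite stands in for the service database and a Python list stands in for the Kafka topic. The table names, the `relay` function and the `crash_before_mark` flag are hypothetical and exist only to simulate a failure between publishing and marking the row as sent.

```python
import sqlite3


def book_payment(conn, payment_id, amount):
    """Write the ledger entry and its outbox event in ONE transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO ledger VALUES (?, ?)", (payment_id, amount))
        conn.execute(
            "INSERT INTO outbox (payment_id, sent) VALUES (?, 0)", (payment_id,)
        )


def relay(conn, publish, crash_before_mark=False):
    """Publish unsent outbox rows, then mark them as sent.

    The publish call and the 'sent' update are not atomic, so a crash
    between them causes the same event to be published again later.
    """
    rows = conn.execute(
        "SELECT payment_id FROM outbox WHERE sent = 0 ORDER BY rowid"
    ).fetchall()
    for (payment_id,) in rows:
        publish(payment_id)  # stand-in for a Kafka producer send
        if crash_before_mark:
            return  # simulated crash: the row stays unsent
        with conn:
            conn.execute(
                "UPDATE outbox SET sent = 1 WHERE payment_id = ?", (payment_id,)
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (payment_id TEXT PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE outbox (payment_id TEXT PRIMARY KEY, sent INTEGER)")

topic = []  # in-memory stand-in for the Kafka topic
book_payment(conn, "p1", 10.0)
relay(conn, topic.append, crash_before_mark=True)  # published, not marked
relay(conn, topic.append)                          # the retry publishes again
```

After the two relay runs the ledger holds one payment but the topic holds the event twice, so anything downstream has to tolerate or remove duplicates.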

Pinot Upsert and how Pinot consumes in real time
• Low-level consumer ingestion uses one consumer per topic partition
• Each consumer is duplicated according to the Pinot replication factor
• Pinot upsert requires that all records with the same primary key are placed on the same partition
• Each server can then check its own partitions for duplicate records
• This gives an efficient way to deduplicate without a global coordinator
But there are some cons:
• Duplicates are only checked in a given time window
• Increasing partitions in Kafka is problematic
• Read consistency is not guaranteed
• Certain Pinot indexes cannot be used (star-tree)
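
The partition-local check described above can be sketched in a few lines. This is not Pinot code: it assumes a CRC32 hash stands in for the Kafka partitioner, a dictionary per consumer stands in for the primary-key map, and the `txn_id` and `ts` fields are hypothetical.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(primary_key):
    """Deterministically map a primary key to a partition."""
    return zlib.crc32(primary_key.encode("utf-8")) % NUM_PARTITIONS


class PartitionConsumer:
    """One consumer per partition, keeping only the latest record per key."""

    def __init__(self):
        self.latest = {}  # primary key -> record

    def ingest(self, record):
        key = record["txn_id"]
        current = self.latest.get(key)
        # Keep the record with the greatest comparison value (here a
        # hypothetical 'ts' field); a redelivered duplicate overwrites
        # the stored record with identical content.
        if current is None or record["ts"] >= current["ts"]:
            self.latest[key] = record


consumers = [PartitionConsumer() for _ in range(NUM_PARTITIONS)]

events = [
    {"txn_id": "t1", "ts": 1, "amount": 10.0},
    {"txn_id": "t2", "ts": 2, "amount": 4.0},
    {"txn_id": "t1", "ts": 1, "amount": 10.0},  # duplicate delivery
]
for event in events:
    consumers[partition_for(event["txn_id"])].ingest(event)

total = sum(r["amount"] for c in consumers for r in c.latest.values())
```

Because a key always maps to the same consumer, no consumer needs to ask another whether it has seen a record. The sketch also hints at why changing the partition count is a problem: a different `NUM_PARTITIONS` changes the mapping, so a later duplicate could reach a consumer that has never seen the key.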

Deduplication - Subquery
• Safest to deduplicate on the read side
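
A minimal sketch of read-side deduplication with a subquery. The query used in the talk is not part of the extracted text, so this is an assumption about its general shape: SQLite stands in for the query engine, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (txn_id TEXT, account_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("t1", "acc-1", 10.0),
        ("t1", "acc-1", 10.0),  # duplicate from at-least-once delivery
        ("t2", "acc-1", 4.0),
        ("t3", "acc-2", 7.5),
    ],
)

# The inner query collapses duplicates to one row per transaction;
# the outer query aggregates over the deduplicated rows.
DEDUP_SQL = """
SELECT account_id, SUM(amount) AS total
FROM (
    SELECT DISTINCT txn_id, account_id, amount
    FROM transactions
) AS deduped
GROUP BY account_id
ORDER BY account_id
"""

naive = dict(conn.execute(
    "SELECT account_id, SUM(amount) FROM transactions GROUP BY account_id"
))
deduped = dict(conn.execute(DEDUP_SQL))
```

Here the naive aggregate counts the redelivered transaction twice, while the subquery version does not. The cost is that every read pays for the deduplication step.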

Deduplication – Subquery in Pinot
• Pinot wraps the two-step query in a single query using IdSets

Takeaways
Domain-driven design and microservices have lots of great benefits. Processes become less coupled and product innovation can happen in a safer and more flexible way.
But data can become trapped inside domains, and building features which require data across multiple domains, like product personalization, becomes difficult and entangled.
Event-based architectures are a great way to share data across domains, allowing datasets to be joined.
Apache Pinot provides the ability to perform real-time, customer-facing analytics and personalisation without the dimension explosion typical of pre-aggregation solutions in stream processing.
