© 10x Banking Technology Limited 2022. All rights reserved
Stuart Coleman – 10x Banking
Enabling product personalization using Kafka, Pinot and Trino
05 April 2022
What are we going to talk about?
1. Why your bank has not offered you new products for a long time.
2. Breaking the monolith of core banking systems and liberating data from the core.
3. Taking advantage of the hard work to offer new and dynamic product personalization.
4. Our experiences are based on transactions and accounts (but you should be able to substitute purchases and customers, or movies and users).
What are we not going to talk about?
1. Lots of details of Kafka, Pinot or Trino architecture
What is core banking and how does it work?
Core banking is conceptually easy:
• Customers onboard to the bank and subscribe to one or more products.
• They make and receive payments.
• Payments are booked into the Ledger synchronously in real time (to avoid double spend).
• Products define a series of lifecycle steps which happen after payments are posted:
  • Interest calculation
  • Fees and rewards
  • Reporting and accounting
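The double-spend point can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `Ledger` class (the class, its methods and the account names are illustrative, not taken from the talk): the balance check and the posting happen inside one critical section, so two payments can never both spend the same funds.

```python
import threading
from decimal import Decimal


class InsufficientFunds(Exception):
    """Raised when a debit would take the account below zero."""


class Ledger:
    """Hypothetical in-memory ledger; all names are illustrative only."""

    def __init__(self):
        self._balances = {}
        self._lock = threading.Lock()

    def deposit(self, account, amount):
        with self._lock:
            self._balances[account] = self._balances.get(account, Decimal("0")) + amount

    def book_payment(self, debtor, creditor, amount):
        # Check and post inside one critical section: the caller only sees
        # success once the ledger has actually been updated.
        with self._lock:
            available = self._balances.get(debtor, Decimal("0"))
            if available < amount:
                raise InsufficientFunds(debtor)
            self._balances[debtor] = available - amount
            self._balances[creditor] = self._balances.get(creditor, Decimal("0")) + amount

    def balance(self, account):
        with self._lock:
            return self._balances.get(account, Decimal("0"))
```

With a balance of 100, a first payment of 80 succeeds and a second payment of 80 is rejected, because the second check sees the already-updated balance.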
How does my bank work?
• Correctness is absolute – it’s a bank!
• Status quo is to use a mainframe
• Highly performant, available and reliable.
• Consistency is much easier in a monolith
• Applications directly communicate with mainframe
• Single monolithic shared databases
Courtesy of https://docs.microsoft.com/en-us/azure/architecture/example-scenario/mainframe/ibm-zos-online-transaction-processing-azure
Domain driven design, microservices and data encapsulation
Retrieving balances and adding cleansed merchant name to a transaction
Retrieving balances and adding cleansed merchant name to a transaction – mainframe version
Courtesy of https://docs.microsoft.com/en-us/azure/architecture/example-scenario/mainframe/ibm-zos-online-transaction-processing-azure
• No new components
• Front end and business logic components need to be modified
• Required new data fields added to monolithic data layer
• Complex and risky change
Data Dichotomy
Taken from https://www.confluent.io/en-gb/blog/data-dichotomy-rethinking-the-way-we-treat-data-and-services/
Event Driven Design
Correctness and integrity baked in – the Outbox pattern
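As a rough illustration of the Outbox pattern, the sketch below uses SQLite as a stand-in for a service's own database. The table names, event shape and the `publish` callback (standing in for a Kafka producer) are assumptions made for the example, not details from the talk. The key idea is that the business row and the event row are written in the same local transaction, and a separate relay publishes the events afterwards.

```python
import json
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ledger_entry (id TEXT PRIMARY KEY, account TEXT, amount INTEGER)")
    conn.execute(
        "CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, sent INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def post_entry(conn, entry_id, account, amount):
    # One local transaction covers both writes: either the ledger row and
    # its event are both stored, or neither is.
    with conn:
        conn.execute("INSERT INTO ledger_entry VALUES (?, ?, ?)", (entry_id, account, amount))
        event = {"type": "EntryPosted", "id": entry_id, "account": account, "amount": amount}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay_once(conn, publish):
    """Publish unsent outbox rows in order, then mark each one as sent."""
    rows = conn.execute("SELECT seq, payload FROM outbox WHERE sent = 0 ORDER BY seq").fetchall()
    for seq, payload in rows:
        publish(payload)  # stand-in for a Kafka producer send
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

Because the event is stored atomically with the state change, no committed change can be left without an event, and no event can exist for a change that was rolled back.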
Putting core banking data to use for customers
Your bank account today
Let’s build some new products
What dimensions are interesting?
What we need
Ability to compute analytical aggregates on data in one domain, filtered by data from other domains
Possible solutions – data warehouse + real time component
Pre-aggregation for reliable real-time query latency
Reliable query latency, but:
• Dimensions need to be known beforehand
• One record generates multiple aggregates
• Dimension and storage explosion
• Difficult to scale
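The "one record generates multiple aggregates" problem can be made concrete with a small sketch. It assumes the simplest pre-aggregation scheme, one running total per non-empty combination of dimensions; the dimension names and the `amount` measure are hypothetical. With d dimensions, each record updates 2^d − 1 aggregates.

```python
from collections import defaultdict
from itertools import combinations


def aggregate_keys(record, dimensions):
    """Yield one aggregate key per non-empty subset of the dimensions."""
    for size in range(1, len(dimensions) + 1):
        for subset in combinations(dimensions, size):
            yield tuple((dim, record[dim]) for dim in subset)


def pre_aggregate(records, dimensions, measure="amount"):
    """Maintain a running sum of the measure for every aggregate key."""
    totals = defaultdict(int)
    for record in records:
        for key in aggregate_keys(record, dimensions):
            totals[key] += record[measure]
    return totals
```

With three dimensions a single record already touches seven aggregates, and every additional dimension doubles that count (plus one), which is where the storage explosion comes from.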
Flexibility vs Latency
Pinot and Trino in 1 minute
Pinot is a purpose-built data store for ultra-low latency analytics at high throughput
• Column-oriented
• Powerful indexing techniques for low latency aggregation and filtering
• Horizontally scalable
• Supports high concurrency queries
Trino is a distributed, ANSI SQL compliant query engine
• Pluggable connector architecture which allows querying across many data stores (including Pinot)
• Built for low latency and efficiency even on large batch queries
Bridging the gap with Pinot and Trino
Handling both types of queries with Pinot and Trino
• Single copy of data
• No need to handle two ingest pipelines
• Scalable horizontally through more Pinot servers and Trino workers
How much to denormalize?
• Pinot does have limited support for lookup joins but not fully featured SQL joins
• Aggregation, filtering and grouping are strong, with a wide range of indexes to speed up queries
• For our use case, it is most practical to pre-join outside of Pinot and to ingest the pre-joined topic. Aggregations and group-bys are performed in Trino or Pinot depending on query size
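A minimal sketch of the pre-join idea follows. In practice this would be a stream processor joining Kafka topics; here plain dictionaries stand in for the account and merchant lookup state, and every field name is a hypothetical example rather than the schema used in the talk.

```python
def pre_join(transactions, accounts, merchants):
    """Yield one flat, denormalized row per transaction."""
    for txn in transactions:
        # Missing lookups produce None fields rather than dropping the row.
        account = accounts.get(txn["account_id"], {})
        merchant = merchants.get(txn["merchant_id"], {})
        yield {
            "transaction_id": txn["transaction_id"],
            "amount": txn["amount"],
            "account_id": txn["account_id"],
            "product": account.get("product"),
            "merchant_id": txn["merchant_id"],
            "merchant_name": merchant.get("clean_name"),
            "merchant_category": merchant.get("category"),
        }
```

Each output row carries every dimension needed for filtering and grouping, so the store that ingests it never has to join at query time.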
Ensuring correctness
Deduplication
• Duplication is (obviously) not acceptable in core banking
• The Outbox only guarantees at-least-once delivery
• A message is only marked as sent after publication to Kafka, outside of a transaction, so a failure between the two steps causes a resend
• Pinot ingestion is exactly-once but has no built-in deduplication in the ingestion component
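How at-least-once delivery produces duplicates can be shown with a toy relay. The sketch assumes an in-memory list as the outbox and another list as the topic (both stand-ins, not real Kafka calls). Publishing and marking as sent are two separate steps, so a crash between them leads to a second publish on restart.

```python
class CrashAfterPublish(Exception):
    """Simulates the relay dying after the broker has accepted the message."""


def relay(outbox, topic, crash_before_mark=False):
    """Publish every unsent outbox row, then mark it sent (two separate steps)."""
    for row in outbox:
        if row["sent"]:
            continue
        topic.append(row["payload"])   # step 1: publish (stand-in for Kafka)
        if crash_before_mark:
            raise CrashAfterPublish()  # step 2 never happens
        row["sent"] = True             # step 2: mark as sent
```

Running the relay once with a simulated crash and then again normally leaves the same payload on the topic twice, which is the duplicate that downstream consumers have to tolerate.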
Pinot Upsert and how Pinot consumes in real time
• Low-level consumer ingestion has one consumer per topic partition
• This is duplicated by the replication factor for Pinot
• Pinot upsert requires that all records with the same primary key are placed on the same partition
• Each record can then be checked for duplicates on that server
• Gives an efficient way of deduplicating without a global coordinator
But there are some cons:
• Duplicates are only checked in a given time window
• Increasing partitions in Kafka is problematic
• Read consistency is not guaranteed
• Certain Pinot indexes cannot be used (star-tree)
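The reason partitioning by primary key allows deduplication without a global coordinator can be sketched as follows. This is a simplified model, not Pinot's implementation: `zlib.crc32` stands in for the producer's partitioner, and the `PartitionConsumer` class and `id` field are hypothetical.

```python
import zlib


def partition_for(primary_key, num_partitions):
    # Stable hash: the same key always maps to the same partition while
    # num_partitions stays fixed.
    return zlib.crc32(primary_key.encode("utf-8")) % num_partitions


class PartitionConsumer:
    """Stand-in for the single consumer that owns one partition."""

    def __init__(self):
        self.latest = {}  # primary key -> latest record seen

    def consume(self, record):
        # A repeated key overwrites the earlier record: upsert semantics,
        # which also collapses exact duplicates.
        self.latest[record["id"]] = record


def ingest(records, num_partitions):
    consumers = [PartitionConsumer() for _ in range(num_partitions)]
    for record in records:
        consumers[partition_for(record["id"], num_partitions)].consume(record)
    return consumers
```

Every copy of a key lands on one consumer, so that consumer alone can decide whether it has seen the key before. The same model hints at why increasing partitions is problematic: changing `num_partitions` changes the key-to-partition mapping, so an old and a new copy of a key may end up on different consumers.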
Deduplication - Subquery
• Safest to deduplicate on the read side
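One possible shape for read-side deduplication is an aggregate over a deduplicating subquery. The query shown on the slide is not part of this transcript, so the sketch below is an assumption: SQLite stands in for the SQL engine, and the `transactions` table and its columns are hypothetical.

```python
import sqlite3

# The inner query collapses repeated deliveries of the same transaction;
# the outer query aggregates over the deduplicated rows.
DEDUP_SUM = """
SELECT account_id, SUM(amount) AS total
FROM (
    SELECT DISTINCT transaction_id, account_id, amount
    FROM transactions
) AS deduplicated
GROUP BY account_id
ORDER BY account_id
"""


def totals_by_account(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (transaction_id TEXT, account_id TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    return conn.execute(DEDUP_SUM).fetchall()
```

Deduplicating at read time costs an extra pass per query, but it does not depend on any ingestion-side time window or partition layout.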
Deduplication – Subquery in Pinot
• Pinot can wrap the two queries in a single query using IdSets
Takeaways
Domain driven design and microservices have lots of great benefits. Processes become less coupled and product innovation can happen in a safer and more flexible way.
But data can become trapped inside domains, and building features which require data across multiple domains, like product personalization, becomes difficult and entangled.
Event based architectures are a great way to share data across domains, allowing datasets to be joined.
Apache Pinot provides the ability to perform real-time, customer-facing analytics and personalization without the dimension explosion typical of pre-aggregation solutions in stream processing.
Thank you

Enabling product personalisation using Apache Kafka, Apache Pinot and Trino with Stuart Coleman | Kafka Summit London 2022
