BANKING CIRCLE
Advanced Analytics
and graphs in AML
Banking Circle
Financial Crime and Graph Technology
Confidential
Our vision is to build a global account-to-account payments network and related financial infrastructure that is
simple, fast and low-cost, with world-class levels of security and compliance, for all major businesses
Executive summary
• Banking Circle is a payments bank, delivering payments and banking services by
connecting to the world’s clearing systems.
• This requires a more scalable approach to anti-money laundering (AML) than
customer-facing banks use.
• A data-driven approach to AML has significantly improved BC’s ability to detect
potentially suspicious flows.
• A network representation of our payment data is a core part of this data-driven
approach:
• Embeddings improve machine learning models
• Visualisations aid manual investigations
Scale
What does this mean for AML at Banking Circle?
• Lots of payments (data)
• Different data from other banks and firms in the system
• The traditional approach of throwing bodies at the problem won’t work
• We need to use the data to make better decisions.
A data-driven approach to AML can help mitigate some of the issues in “traditional”
systems
• Traditional AML approach
‒ Based on rules: manually tuned decision trees, in effect a forest of very shallow trees
• E.g. catch certain words, amounts or locations; attempts at catching patterns/behaviours
‒ (Very) high false positive rates
• Data-driven approach:
• Kill/adjust non-performing models based on continuous testing and evaluation
• Complex ensemble ML models
‒ Gradient boosted trees, neural networks, natural language processing
• Enrich with new data representations
• Augment with statistics-based features:
‒ Anomaly detection, pattern descriptions
• Incorporate external data sets
SCAM – System for Catching Attempted Moneylaundering – an ensemble of different machine
learning models
[Figure: SCAM pipeline. New payments pass through static rules + NLP, supervised models and diversification models; hits are entered into a hitlist DB with an explanation for manual evaluation, with feedback from the monitoring team. Example hit and payment lists for "Entity A" illustrate the process.]
Unsupervised models – do we have payments coming through that we can
learn from?
‒ Decision-tree-based models
‒ Statistical measures of similarity
‒ NLP
• Will generate new hits and reduce bias
• Will help label payments not found by rules or supervised methods
Supervised models – what can we learn from already
evaluated payments?
‒ Decision-tree-based models
‒ Deep-neural-network-based models
• Will score each payment (and account/client) and
constantly improve performance as more payments are
manually evaluated
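The ensemble idea above can be sketched as follows. This is a minimal illustration, not BC's implementation: the `Payment` fields, the keyword and country lists, the thresholds and the stand-in scoring functions are all hypothetical. In practice the supervised score would come from a trained model and the unsupervised score from an anomaly detector. Every component that fires adds a reason, so each hitlist entry carries an explanation for manual evaluation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Payment:
    # Hypothetical payment record; real payment messages carry many more fields.
    payment_id: str
    amount: float
    country: str
    reference: str

@dataclass
class Hit:
    payment_id: str
    reasons: List[str] = field(default_factory=list)

# Hypothetical static rule inputs.
KEYWORDS = {"cash", "crypto"}
HIGH_RISK_COUNTRIES = {"XX"}

def static_rules(p: Payment) -> List[str]:
    """Rule layer: fire on known words, amounts or locations."""
    reasons = []
    if set(p.reference.lower().split()) & KEYWORDS:
        reasons.append("rule: keyword in reference")
    if p.country in HIGH_RISK_COUNTRIES:
        reasons.append("rule: high-risk country")
    if p.amount >= 10_000:
        reasons.append("rule: amount threshold")
    return reasons

def screen(
    p: Payment,
    supervised_score: Callable[[Payment], float],
    anomaly_score: Callable[[Payment], float],
    supervised_threshold: float = 0.5,
    anomaly_threshold: float = 0.9,
) -> Optional[Hit]:
    """Combine rules, a supervised score and an unsupervised score into one hit."""
    reasons = static_rules(p)
    s = supervised_score(p)
    if s >= supervised_threshold:
        reasons.append(f"supervised: score {s:.2f}")
    a = anomaly_score(p)
    if a >= anomaly_threshold:
        reasons.append(f"unsupervised: anomaly {a:.2f}")
    return Hit(p.payment_id, reasons) if reasons else None

# Demo with stand-in scorers (placeholders for trained models).
payments = [
    Payment("p1", 500.0, "DK", "invoice 42"),
    Payment("p2", 12_000.0, "XX", "cash deposit"),
]
hitlist = []
for p in payments:
    hit = screen(p, supervised_score=lambda q: 0.1,
                 anomaly_score=lambda q: 0.95 if q.amount > 10_000 else 0.2)
    if hit is not None:
        hitlist.append(hit)
```

Keeping the reasons next to the hit is what makes the hitlist reviewable: the monitoring team sees which layer (rules, supervised or unsupervised) raised the payment.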
Core risk scoring model
• Inputs: payment details; payment and monitoring history; data related to the payment; payment patterns and connections
• Output: a probability for each payment, e.g. 0.8 not suspicious vs. 0.2 potentially suspicious
• The “best” model minimizes a combination of error rates and complexity on a historical data set
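"Minimizes a combination of error rates and complexity" describes a regularized objective. As a small, self-contained stand-in (the deck names gradient boosted trees and neural networks, not logistic regression), the sketch below fits an L2-penalized logistic regression by gradient descent: the mean log-loss measures error on historical, manually evaluated payments, and the penalty `lam * ||w||^2` measures complexity. The two features and the labels are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability that a payment is potentially suspicious."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def objective(w, b, X, y, lam):
    """Mean log-loss (error) plus L2 penalty (complexity)."""
    eps = 1e-12
    loss = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(w, b, xi)
        loss -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return loss / len(X) + lam * sum(wi * wi for wi in w)

def fit(X: List[List[float]], y: List[int], lam: float = 0.01,
        lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Batch gradient descent on the regularized objective."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [2 * lam * wi for wi in w]  # gradient of the penalty term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wi - lr * gj for wi, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical features per payment: [scaled amount, has link to a high-risk node]
X = [[0.1, 0.0], [0.2, 0.0], [0.3, 1.0], [0.9, 1.0], [0.8, 0.0], [0.95, 1.0]]
y = [0, 0, 0, 1, 0, 1]  # 1 = confirmed suspicious after manual evaluation
w, b = fit(X, y)
p = predict_proba(w, b, [0.85, 1.0])
print(f"not suspicious: {1 - p:.2f}, potentially suspicious: {p:.2f}")
```

`1 - p` gives the "not suspicious" share, matching the two-way split on the slide. Raising `lam` trades training error for a simpler model.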
SCAM relies on a network representation to capture connections
Query
• Entities (owners, directors, etc.)
• Accounts
• Countries
related to a
• Payment
• Account
• Entity
within 1, 2, …, n links
Graph interaction used for feature extraction
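The "within 1, 2, …, n links" query is a bounded breadth-first traversal. In Neo4j it would typically be written with a Cypher variable-length pattern such as `MATCH (p:Payment {id: $id})-[*1..3]-(e:Entity) RETURN DISTINCT e`; those labels and properties are hypothetical, not BC's schema. The stdlib Python sketch below performs the same traversal on an in-memory adjacency list and returns each related node with its hop distance. Node names and the `type:` prefixes are illustrative assumptions.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]

def add_edge(g: Graph, a: str, b: str) -> None:
    """Undirected link between two nodes (payment, account, entity, country)."""
    g.setdefault(a, set()).add(b)
    g.setdefault(b, set()).add(a)

def within_n_links(g: Graph, start: str, n: int) -> Dict[str, int]:
    """Return {node: hop distance} for all nodes 1..n links away from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == n:
            continue  # do not expand beyond n hops
        for nb in g.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    del dist[start]
    return dist

def related_of_type(g: Graph, start: str, n: int, prefix: str) -> List[str]:
    """Filter related nodes by a (hypothetical) type prefix such as 'entity:'."""
    return sorted(v for v in within_n_links(g, start, n) if v.startswith(prefix))

g: Graph = {}
for a, b in [
    ("payment:p1", "account:A1"),         # remitter account
    ("payment:p1", "account:B1"),         # beneficiary account
    ("account:B1", "entity:ACME"),        # account owner
    ("entity:ACME", "entity:director_x"), # director of the owner
    ("entity:director_x", "country:XX"),  # director's country
]:
    add_edge(g, a, b)
two_hops = within_n_links(g, "payment:p1", 2)
entities = related_of_type(g, "payment:p1", 3, "entity:")
```

Here `two_hops` holds the two accounts at distance 1 and the owner at distance 2; the director's country only appears once `n` reaches 4.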
Network-based feature generation
• Basic properties that are difficult to get from “traditional” DBs
‒ Number of connections for an account or entity
‒ Names of all beneficiaries, remitters and their locations
‒ Distance to tax havens or known crooks
‒ Number of addresses used or associated directors
‒ etc.
• Advanced feature generation (WIP)
‒ Graph properties generate features
• Connectedness and centrality vary between customers’ business types
• Color propagation – diffuse risk through a network
‒ Subgraph embeddings to generate features
‒ Community detection
‒ ML anomaly detection as feature input into ML models
• Results in an AUC improvement of 0.02 points
‒ 10–25% reduction in false negatives, no change in false positives
‒ Halving of overall alerts generated
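A few of the features listed above can be sketched on an in-memory graph: degree (number of connections), hop distance to a set of flagged nodes (e.g. known high-risk counterparties), and a simple form of risk ("color") propagation in which each non-seed node repeatedly takes a damped average of its neighbours' risk while seed nodes keep their initial score. The propagation rule, the damping factor `alpha` and the iteration count are assumptions chosen for illustration, not the method used in SCAM, where features are computed in Cypher within Neo4j.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]

def degree(g: Graph, node: str) -> int:
    """Number of connections of an account or entity."""
    return len(g.get(node, ()))

def distance_to_flagged(g: Graph, start: str, flagged: Set[str]) -> float:
    """Shortest hop distance from start to any flagged node (inf if unreachable)."""
    if start in flagged:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        for nb in g.get(node, ()):
            if nb in flagged:
                return d + 1  # BFS order guarantees this is the minimum
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, d + 1))
    return float("inf")

def propagate_risk(g: Graph, seeds: Dict[str, float],
                   alpha: float = 0.5, iterations: int = 10) -> Dict[str, float]:
    """Diffuse risk from seeds: non-seed nodes get alpha * mean neighbour risk."""
    risk = {v: seeds.get(v, 0.0) for v in g}
    for _ in range(iterations):
        new = {}
        for v, nbs in g.items():
            if v in seeds:
                new[v] = seeds[v]  # seeds (e.g. confirmed cases) stay fixed
            else:
                new[v] = alpha * sum(risk[u] for u in nbs) / len(nbs) if nbs else 0.0
        risk = new
    return risk

# A chain of accounts A - B - C - D, where A is a known high-risk account.
g: Graph = {
    "acct:A": {"acct:B"},
    "acct:B": {"acct:A", "acct:C"},
    "acct:C": {"acct:B", "acct:D"},
    "acct:D": {"acct:C"},
}
features = {v: (degree(g, v), distance_to_flagged(g, v, {"acct:A"})) for v in g}
risk = propagate_risk(g, seeds={"acct:A": 1.0})
```

The propagated risk decays with distance from the seed, so it behaves like a smoothed version of the hop-distance feature that also accounts for how many paths lead to risky nodes.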
AI on connected data has led to better AML decisions while reducing operational costs
• AML system based on:
• Rules (to avoid embarrassing misses and to comply with regulation)
• Machine learning running on top of network data, to mitigate the deficiencies of a purely
rule-based system and to catch new behaviour rather than only known typologies that fit
predefined static rules.
• Daily AML screening of all processed payments through the same system.
• AI is the only way to future-proof an AML model in real time whilst constantly increasing the value
of all data in the ecosystem.
Operational scaling
AI on connected data has led to better AML decisions while reducing operational costs
Expose known unknowns and increase holistic detection based on connections and enriched
data
• 50% of accounts closed or escalated to compliance are now due to purely AI-related AML
findings.
• 10+% of AI alerts are escalated vs. a few % for traditional rules.
• External requests do not lead to additional true positives (TPs).
In-house development by a cross-functional team has allowed BC to
• Learn from already evaluated transaction history.
• Be adaptable and continuously identify payment patterns not covered by a static rule
set.
A data-driven approach to AML has allowed BC to highlight fewer non-risky payments and
generate fewer false positives, leaving more time to investigate suspicious payments:
• The number of payments has increased by ~200%; the number of generated AML alerts has
decreased by ~30%.
• The number of AML alerts leading to compliance handling or account closure has
increased by ~1300%.
Currently screening over 1,000,000 payments/day
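The percentages above combine into per-payment and per-alert rates. Reading "increased by ~200%" as a factor of 3, "decreased by ~30%" as a factor of 0.7 and "increased by ~1300%" as a factor of 14 (an interpretation of the rounded figures, not additional data), the arithmetic is:

```latex
\begin{aligned}
\frac{\text{alerts per payment (now)}}{\text{alerts per payment (before)}}
  &= \frac{0.7}{3} \approx 0.23 \quad (\text{a reduction of roughly } 77\%),\\[4pt]
\frac{\text{escalations per alert (now)}}{\text{escalations per alert (before)}}
  &= \frac{14}{0.7} = 20 .
\end{aligned}
```

On these rounded figures, each alert is about 20 times as likely as before to lead to compliance handling or account closure.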
A closer look at the implementation
• A streaming setup based on Azure Service Bus and Kubernetes
• Some data is cached on a daily basis, but most is computed live
• Graph nodes and connections are added for each payment
• Features are computed on the graph
• All computation is done in Cypher
• Limited to the methods available in Neo4j
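The per-payment flow above (consume a message, add nodes and relationships, compute features on the graph) can be sketched as follows. This is a hedged illustration only: a stdlib `queue.Queue` stands in for Azure Service Bus, a small in-memory class stands in for Neo4j, and the message fields, node labels and the `UPSERT_PAYMENT` Cypher text are hypothetical. In a real deployment the Cypher would be sent through the official Neo4j driver (e.g. `session.run(query, parameters)`) rather than mirrored in Python.

```python
import queue
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Set

# Hypothetical Cypher an implementation might run per payment via the Neo4j driver.
UPSERT_PAYMENT = """
MERGE (r:Account {iban: $remitter})
MERGE (b:Account {iban: $beneficiary})
MERGE (p:Payment {id: $payment_id})
  SET p.amount = $amount
MERGE (r)-[:SENT]->(p)
MERGE (p)-[:RECEIVED_BY]->(b)
"""

@dataclass(frozen=True)
class PaymentMsg:
    payment_id: str
    remitter: str
    beneficiary: str
    amount: float

class InMemoryGraph:
    """Stand-in for Neo4j: mirrors what UPSERT_PAYMENT would create."""
    def __init__(self) -> None:
        self.sent_to: Dict[str, Set[str]] = defaultdict(set)        # remitter -> beneficiaries
        self.received_from: Dict[str, Set[str]] = defaultdict(set)  # beneficiary -> remitters
        self.payments: Dict[str, float] = {}

    def upsert_payment(self, m: PaymentMsg) -> None:
        # Idempotent like MERGE: replaying a message does not duplicate data.
        self.payments[m.payment_id] = m.amount
        self.sent_to[m.remitter].add(m.beneficiary)
        self.received_from[m.beneficiary].add(m.remitter)

    def features(self, m: PaymentMsg) -> Dict[str, int]:
        """Live features for the payment, computed right after the graph update."""
        return {
            "remitter_n_beneficiaries": len(self.sent_to[m.remitter]),
            "beneficiary_n_remitters": len(self.received_from[m.beneficiary]),
        }

def consume(bus: "queue.Queue[PaymentMsg]", graph: InMemoryGraph) -> Dict[str, Dict[str, int]]:
    """Drain the queue: add each payment to the graph, then compute its features."""
    scored = {}
    while True:
        try:
            msg = bus.get_nowait()
        except queue.Empty:
            break
        graph.upsert_payment(msg)
        scored[msg.payment_id] = graph.features(msg)
    return scored

bus = queue.Queue()
for msg in [
    PaymentMsg("p1", "DK01", "GB01", 100.0),
    PaymentMsg("p2", "DK01", "GB02", 250.0),
    PaymentMsg("p3", "DE01", "GB02", 75.0),
]:
    bus.put(msg)
graph = InMemoryGraph()
scored = consume(bus, graph)
```

Updating the graph before computing features means each payment is scored against a graph that already includes it, which is the property that moving from batch to live scoring relies on.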
Graphs in anti-money laundering investigations
Empowering investigations
Investigation of known associates and connections
‒ New view onto the same data
‒ Linking diverse data sources in one view, complements “tabular”
reports
‒ Natural representation for criminal networks / organized crime
‒ Very effective – some analyses are cut from days to minutes
Investigation tool allows easy access to full data model
Empowering investigations
• Investigate individual entities, accounts or transfers
• Resulted in the open-source neographviz Python package
Investigation tool allows easy access to full data model
Empowering investigations
Investigation of connections
Takeaways
• A data-driven approach to AML has significantly improved BC’s ability
to detect potentially suspicious flows
• A network representation of our payment data is a core part of this
data-driven approach
‒ Machine learning models
‒ Manual investigations
Thank you for your attention
Contact:
Ruben Menke
rum@bankingcircle.com

Banking Circle: Money Laundering Beware: A Modern Approach to AML with Machine Learning and Graphs

Editor's Notes

  • #12, #13, #14: Moving from a batch-based setup to live scoring of payments and clients