ML & Graph algorithms to prevent financial crime in digital payments
This document discusses using machine learning and graph algorithms to prevent financial crime in digital payments. It presents a four-level approach: Level 0 uses rule-based SQL queries to detect anomalies, Level 1 applies supervised machine learning to score customers by their probability of being notified for suspicious transactions, Level 2 uses a graph database and Cypher rules to model network anomalies, and Level 3 combines the ML score with Personalized PageRank to spread anomaly scores through the transaction network and identify suspicious groups. The approach is piloted within the Infinitech Project, which develops technologies for financial crime prevention, cybersecurity, and personalized products using AI, big data, IoT, and blockchain.
1.
This content is classified as Internal
ML & Graph algorithms to prevent financial crime in digital payments
Alberto Danese, Head of Data Science
Paolo Testa, Lead Data Scientist
2.
Next generation payments
Next to Customers
3.
End-to-end payment solutions for Financial Institutions, Merchants and Consumers
4.
CAPABILITIES · MANAGING · SERVING · SCALE
• >10,500 people
• >3,000 Product & Tech Development Specialists
• ~€300mln Annual Total IT & Innovation Spending
• ~170mln Cards
• ~2.2mln Merchants
• #1 Merchant acquirer by number of merchants and transaction value
• #1 Card processor by number of cards and transaction volume
• >1k Top financial Institutions
• ~15 bn ACH trx
• >25 Countries
The Leading PayTech, European by scale, Local by nature
5.
SELECT * FROM Nexi WHERE dept LIKE 'DATA%'
• ~40 internal people (in the hub - Data Area), including 10 in the Data Science Team
• 5+ spokes within the business lines
• Support Business Development
• Fight Financial Crime
• Evolve Tech and Develop Special Projects
• 90% of projects on public cloud
• Product-oriented mindset with end-to-end ownership
6.
Work with us!
Check LinkedIn jobs for Nexi Italy, Nexi Group and Nexi Digital
Currently, you may be interested in:
• Junior Data Scientist – Fraud Intelligence Specialist (Nexi Italy)
• Data Engineer (Nexi Digital)
7.
ML & Graph algorithms to prevent financial crime in digital payments
8.
Agenda - ML & Graph algorithms to prevent financial crime in digital payments
01 FINANCIAL CRIME IN DIGITAL PAYMENTS
What are the main challenges related to financial crime? What is Anti-money laundering (AML)?
02 ML & GRAPH APPLICATION
How can ML and graphs help AML investigators optimize and improve their daily activity?
03 INFINITECH PROJECT
What is the Infinitech Project and how can you access it?
9.
Financial crime challenges
• Financial crime is one of the main challenges that banks and financial institutions must face, leading to:
  • Financial penalties
  • Reputational damage
  • Loss of customer trust
• While security measures are more and more advanced, fraudsters and organized crime keep evolving as well
• Two types of criminal activity:
  • Payments Fraud → the cardholder is the victim of a fraudulent organization
  • Money Laundering → the cardholder is involved in criminal activity, using payment tools
In 2020 global banks were hit with $10.4bn in fines for money-laundering violations, an increase of more than 80% on 2019
10.
Anti-Money Laundering (AML)
Money laundering in digital payments
• Money laundering happens when individuals attempt to hide profits gained by criminal means
• The money can originate from gambling, drug trafficking, and other illegal activities
• The aim of Nexi AML is to put risk procedures in place to recognize and stop criminal activities that use Nexi products (cards or POS)
AML Transaction Monitoring workflow
1. An alert is generated for suspicious payment transactions
2. The AML investigators analyze the data related to the alert
3. The investigator evaluates whether to notify the suspicious case to the central authority (the Financial Intelligence Unit at the Bank of Italy)
11.
Anti-Money Laundering Challenges
“Human in the Loop” is a must
We can’t fully automate transaction monitoring, since evaluation by an AML investigator is a requirement.
But we can optimize the volume of alerts, that is, the number of suspicious transactions to analyze.
[Figure: number of analyzed payment transactions, time & cost, effectiveness]
FP – FN trade-off
• False positives (FP): you can’t analyze every single transaction
• False negatives (FN): risk of sanctions due to missed notifications
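The trade-off can be made concrete with a toy calculation. The sketch below counts alerts, false positives and false negatives at different alert thresholds; the scores and labels are invented for illustration and are not Nexi data.

```python
# Hypothetical (score, is_truly_suspicious) pairs; illustrative only.
CASES = [
    (0.95, True), (0.90, False), (0.80, True), (0.60, False),
    (0.55, True), (0.40, False), (0.30, False), (0.10, False),
]

def alert_stats(cases, threshold):
    """Count alerts, false positives and false negatives at a score threshold."""
    alerts = [(s, y) for s, y in cases if s >= threshold]
    false_positives = sum(1 for _, y in alerts if not y)
    false_negatives = sum(1 for s, y in cases if s < threshold and y)
    return {"alerts": len(alerts), "fp": false_positives, "fn": false_negatives}

# A low threshold floods investigators with alerts; a high one misses true cases.
for thr in (0.2, 0.5, 0.85):
    print(thr, alert_stats(CASES, thr))
```

Lowering the threshold removes false negatives at the price of more alerts to review; raising it does the opposite.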
13.
Level 0 – rule-based approach
Architecture
Data Lake (Amazon S3) → SQL-based Anomaly Engine (Amazon Athena) → Reporting Layer (Microsoft PowerBI)
-- Template rule: flag users whose metrics exceed both thresholds (THR1 and THR2 are placeholders)
SELECT user_id
FROM transactions
WHERE metric_1 > THR1
  AND metric_2 > THR2
Highlights
PROS
• Simplicity: each alert is triggered by a SQL query
• Explainability: each SQL rule is shared with the AML team
CONS
• Scale: high transaction volume and 300+ rules generate a large number of alerts, leading to a high false positive rate
• Groups: relational databases are not the best solution for network modelling
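To illustrate how independent threshold rules turn into alerts, here is a minimal Python sketch of the same idea as the SQL template. The user ids, metric names, rule names and thresholds are all hypothetical; the production engine described in the slides runs SQL on Amazon Athena, not this code.

```python
# Hypothetical per-user metrics; names and values are illustrative only.
USERS = {
    "u1": {"metric_1": 120.0, "metric_2": 0.40},
    "u2": {"metric_1": 15.0, "metric_2": 0.90},
    "u3": {"metric_1": 300.0, "metric_2": 0.05},
}

# Each rule mirrors a SQL WHERE clause: every listed metric must exceed its threshold.
RULES = {
    "rule_a": {"metric_1": 100.0, "metric_2": 0.30},
    "rule_b": {"metric_2": 0.80},
    "rule_c": {"metric_1": 250.0},
}

def run_rules(users, rules):
    """Return (user_id, rule_name) alerts, one per rule a user triggers."""
    alerts = []
    for user_id, metrics in users.items():
        for rule_name, thresholds in rules.items():
            if all(metrics[m] > thr for m, thr in thresholds.items()):
                alerts.append((user_id, rule_name))
    return alerts

alerts = run_rules(USERS, RULES)
print(alerts)
```

Each alert is fully explained by the rule that fired, but every added rule can only add alerts, which is the scaling problem listed under CONS.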
14.
Level 01 - Supervised ML
Level 01 Description
The first approach is to train an ML model to classify anomalous cases based on the notification pattern (target).
Thanks to this, we avoid manually setting 500+ rules; we just use the metrics of those rules as features in the ML model.
The output of the model is the probability that a customer will be notified due to suspicious transactions.
AML investigators only need to tune one threshold (the probability score), which can also be used as a ranking metric.
User id   Feature 01   Feature 02   …   Feature P   Target
373429    €12,000      23%          …   1           1
598492    €1,000       10%          …   0           0
…         …            …            …   …           …
598492    €160,000     37%          …   1           0
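As a rough illustration of the idea (rule metrics as features, past notifications as target, a probability as output), the sketch below fits a tiny logistic regression in plain Python and ranks unseen users by score. The slides do not name the model used, so logistic regression is only a simple stand-in; the data, feature scaling and user ids are invented.

```python
import math

# Hypothetical training rows: (features, target). Features stand in for rule metrics,
# already scaled to [0, 1]; target = 1 if the case was notified by an investigator.
TRAIN = [
    ([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.7, 0.6], 1),
    ([0.2, 0.1], 0), ([0.1, 0.3], 0), ([0.3, 0.2], 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Probability that a user will be notified, given its features."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def train(rows, epochs=2000, lr=0.5):
    """Fit a logistic regression with plain batch gradient descent."""
    weights, bias = [0.0] * len(rows[0][0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for features, target in rows:
            error = predict(weights, bias, features) - target
            grad_b += error
            for i, x in enumerate(features):
                grad_w[i] += error * x
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias

weights, bias = train(TRAIN)

# Score unseen (hypothetical) users and rank them by probability, highest first.
NEW_USERS = {"user_a": [0.85, 0.7], "user_b": [0.15, 0.2], "user_c": [0.5, 0.6]}
scores = {u: predict(weights, bias, f) for u, f in NEW_USERS.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The single tunable quantity for investigators is then the cut-off on `scores`, and `ranking` gives the order in which cases would be reviewed.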
15.
Level 01 – Deep dive
SERVERLESS & FULLY MANAGED STACK
[Architecture diagram: Data Lake (Amazon S3) → process → AML Feature Store → prepare train, test and score datasets → Amazon SageMaker (train, score & SHAP) → ML score → Back-end → Front-end report]
Services: Amazon S3, AWS Glue, Amazon SageMaker (with Model Registry and Experiment Registry), AWS Step Functions (pipeline orchestration)
• Serverless & fully managed ML architecture
• The AML Feature Store (FS) is part of the Nexi FS
• Data-centric AI: Great Expectations and TDD guarantee data quality
• SageMaker transparently stores training job artifacts and experiments in dedicated registries
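The data quality point can be illustrated with a few declarative checks on feature rows. This is a hand-rolled sketch in the spirit of expectation-based validation; it does not use or reproduce the Great Expectations API, and the column names and rows are hypothetical.

```python
# Hypothetical feature-store rows; column names are illustrative only.
ROWS = [
    {"user_id": "373429", "amount_eur": 12000.0, "share_pct": 23.0},
    {"user_id": "598492", "amount_eur": 1000.0, "share_pct": 10.0},
    {"user_id": None, "amount_eur": -5.0, "share_pct": 137.0},
]

# Each expectation is (description, column, predicate on the column value).
EXPECTATIONS = [
    ("user_id is not null", "user_id", lambda v: v is not None),
    ("amount_eur is non-negative", "amount_eur", lambda v: v >= 0),
    ("share_pct is between 0 and 100", "share_pct", lambda v: 0 <= v <= 100),
]

def validate(rows, expectations):
    """Return a list of (row_index, description) for every failed expectation."""
    failures = []
    for index, row in enumerate(rows):
        for description, column, predicate in expectations:
            if not predicate(row[column]):
                failures.append((index, description))
    return failures

failures = validate(ROWS, EXPECTATIONS)
print(failures)
```

Running such checks before training and scoring keeps bad rows from silently shifting the model's features.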
16.
Level 01 – Highlights
[Screenshot: report of the top 10000 users by risk, showing the ML score, features ordered by SHAP value, and notified cases]
17.
Level 01 – Highlights
PROS
• The output is a probability score: one threshold to fine-tune the results
• SHAP values provide explainability
CONS
• The score is associated with a single, independent user, NOT with a network
• The ML model learns only from the notifications of AML investigators
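For intuition on how SHAP values explain a score: for a linear model with features treated as independent, the SHAP value of feature i is w_i · (x_i − mean(x_i)), and the values sum to the difference between the user's score and the average score. The sketch below applies that closed form; the weights, metric names and data are hypothetical, and the slides do not state which model or explainer is actually used.

```python
# Hypothetical linear risk model: score = bias + sum(w_i * x_i).
WEIGHTS = {"metric_1": 2.0, "metric_2": -1.0, "metric_3": 0.5}
BIAS = 0.1

# Hypothetical background data used to compute the average feature values.
BACKGROUND = [
    {"metric_1": 0.2, "metric_2": 0.5, "metric_3": 1.0},
    {"metric_1": 0.4, "metric_2": 0.1, "metric_3": 3.0},
    {"metric_1": 0.6, "metric_2": 0.3, "metric_3": 2.0},
]

def model(x):
    """Raw linear score of one user."""
    return BIAS + sum(WEIGHTS[name] * x[name] for name in WEIGHTS)

def linear_shap(x, background):
    """SHAP values of a linear model under the feature-independence assumption."""
    means = {n: sum(row[n] for row in background) / len(background) for n in WEIGHTS}
    return {n: WEIGHTS[n] * (x[n] - means[n]) for n in WEIGHTS}

user = {"metric_1": 0.9, "metric_2": 0.2, "metric_3": 2.0}
contributions = linear_shap(user, BACKGROUND)

# Order features by absolute contribution, most influential first.
ordered = sorted(contributions, key=lambda n: abs(contributions[n]), reverse=True)
print(ordered, contributions)
```

Ordering features by absolute contribution is what lets an investigator see which metrics drove a given user's score.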
18.
Level 02 – Graph rule-based approach
Level 02 Description
[Figure: example payment network. User nodes (user 01 … user N) and an organization node (Org.) carry anomaly scores (0.93, 0.8, 0.4, 0.3, 0.2) and are connected by "send money" edges]
• Network anomalies are hard to model and check with a relational database
• Graph databases, like Neo4j, are the best choice when the analytical focus is on the links between peers, that is, on a network
• Thanks to the graph data structure, several anomaly detection rules can be built to fight groups of suspicious transactions or users
// Find pairs of high-score users connected by a chain of one or more send_money relationships
MATCH (u:User)-[:send_money*]->(v:User)
WHERE u.score > 0.8 AND v.score > 0.8
RETURN u, v
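The logic of the Cypher rule can be sketched in plain Python as a reachability check over "send money" edges. The graph and scores below are invented, and unlike Cypher, which returns one row per matching path, this sketch returns each qualifying pair once.

```python
from collections import deque

# Hypothetical network: directed "send money" edges and per-user anomaly scores.
SEND_MONEY = {"a": ["b"], "b": ["c"], "c": ["d"], "d": [], "e": ["a"]}
SCORES = {"a": 0.93, "b": 0.30, "c": 0.40, "d": 0.85, "e": 0.20}

def reachable(graph, start):
    """All users reachable from start over one or more send-money hops (BFS)."""
    seen, queue = set(), deque(graph[start])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph[node])
    return seen

def risky_pairs(graph, scores, threshold=0.8):
    """Pairs (u, v), both above threshold, where money can flow from u to v."""
    risky = [u for u in graph if scores[u] > threshold]
    return [(u, v) for u in risky for v in sorted(reachable(graph, u))
            if scores[v] > threshold]

pairs = risky_pairs(SEND_MONEY, SCORES)
print(pairs)
```

Here the two high-score users are linked only through low-score intermediaries, a pattern that a per-user rule cannot see.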
19.
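Level 02 – Graph rule-based approach
[Architecture diagram: Data Lake (Amazon S3), AML Feature Store, AWS Glue, graph database on EC2, Back-end and Front-end, connected by ingest, process, load and query steps that surface graph anomalies]
DATA MODEL
[Diagram: graph data model, including a "has account" relationship]
1 Neo4j CE on EC2 (not managed)
2 Neo4j Spark connector
3 Cypher query:
MATCH (u:User)-[:send_money*]->(v:User)
WHERE u.score > 0.8 AND v.score > 0.8
RETURN u, v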
20.
Level 02 – Graph rule-based approach
PROS
• Network anomaly scenarios are now addressed
• Graph models are easy to explain and rules are straightforward to set
CONS
• We lose the scale factor given by the automation of the ML model: it is a manual rule-based approach built with Cypher queries instead of SQL
21.
Level 03 – ML and Graph
Level 03 Description
The ML anomaly score only considers single users.
USER      ML SCORE   RANK
user 03   0.93       1
user 01   0.8        2
user 05   0.4        3
user 02   0.3        4
user 04   0.2        5
If a user belongs to a high-risk network, we want to spread this information to that user.
USER      ML SCORE   NEW RANK   ML RANK
user 03   0.93       1          1
user 01   0.8        2          2
user 02   0.3        3 (+1)     4
user 04   0.2        4 (+1)     5
user 05   0.4        5 (-2)     3
What we need is an algorithm that spreads the ML score information through a given network topology, to adjust the scores and find groups of anomalous cases.
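The rank shifts in the second table follow directly from comparing the two rankings. A minimal sketch using the values from the table (the helper names are illustrative):

```python
# Rows from the Level 03 example: (user, ML score, new rank after spreading).
ROWS = [("user 03", 0.93, 1), ("user 01", 0.8, 2), ("user 02", 0.3, 3),
        ("user 04", 0.2, 4), ("user 05", 0.4, 5)]

def ml_ranks(rows):
    """Rank users by ML score alone, highest score first (rank 1 = riskiest)."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    return {user: position for position, (user, _, _) in enumerate(ordered, start=1)}

def rank_shifts(rows):
    """Positions gained (+) or lost (-) when moving from ML rank to new rank."""
    ranks = ml_ranks(rows)
    return {user: ranks[user] - new_rank for user, _, new_rank in rows}

shifts = rank_shifts(ROWS)
print(shifts)
```

Users 02 and 04 each gain one position and user 05 loses two, matching the (+1), (+1) and (-2) annotations.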
22.
Personalized Page Rank (PPR)
Page Rank (PR) is an iterative algorithm originally used by Google to rank web pages (nodes) according to the quality and quantity of links (edges) pointing to them.
$x' = (1 - \alpha)\, P' x + \alpha\, \nu$
$x$ = importance rank
$\nu$ = intrinsic centrality
$P$ = adjacency matrix
We use the ML score as the intrinsic centrality ($\nu$) to spread anomaly information within the network, with the aim of finding groups of suspicious individuals.
[Figure: Page Rank vs. personalized Page Rank]
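The update rule can be sketched as a plain power iteration. The code below uses the ML scores from the Level 03 example as the intrinsic centrality; the edges, the value of alpha, the normalization of the scores and the treatment of links as undirected are all assumptions made for illustration, and the pilot applies its own customized PPR inside the graph database.

```python
# ML anomaly scores from the Level 03 example table.
ML_SCORE = {"user 01": 0.8, "user 02": 0.3, "user 03": 0.93,
            "user 04": 0.2, "user 05": 0.4}

# Hypothetical undirected payment links (the slides do not list the edges).
EDGES = [("user 03", "user 01"), ("user 03", "user 02"), ("user 03", "user 04"),
         ("user 01", "user 02"), ("user 04", "user 05")]

def personalized_pagerank(scores, edges, alpha=0.15, tol=1e-12, max_iter=1000):
    """Iterate x' = (1 - alpha) * P' x + alpha * nu until x stops changing.

    Assumes every user has at least one link, so no rank mass is lost.
    """
    neighbors = {u: [] for u in scores}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    total = sum(scores.values())
    nu = {u: s / total for u, s in scores.items()}  # normalized ML scores
    x = dict(nu)
    for _ in range(max_iter):
        new_x = {}
        for u in scores:
            # Each neighbor v passes an equal share of its rank to u.
            incoming = sum(x[v] / len(neighbors[v]) for v in neighbors[u])
            new_x[u] = (1 - alpha) * incoming + alpha * nu[u]
        change = sum(abs(new_x[u] - x[u]) for u in scores)
        x = new_x
        if change < tol:
            break
    return x

ppr = personalized_pagerank(ML_SCORE, EDGES)
ppr_ranking = sorted(ppr, key=ppr.get, reverse=True)
print(ppr_ranking)
```

With these invented edges, users 02 and 04, which are linked to the high-score user 03, end up ranked above user 05 even though their own ML scores are lower, which is the kind of rank change shown in the Level 03 example.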
23.
Level 03 – Architecture & Contribution
[Architecture diagram: Data Lake, AML Feature Store, ML anomaly score, graph database with PPR, Back-end and Front-end, connected by ingest, load and "apply PPR" steps]
1 The Level 01 ML anomaly score is loaded into the graph database as a node attribute
2 A customized PPR is applied as a Cypher function over the graph database
3 The PowerBI report shows the PPR score
24.
Level 03 - Highlights
• The scale factor is addressed, since the PPR run is operationalized
• The PPR rank combines the best of the Level 01 (ML) and Level 02 (graph algorithms) approaches
• As for Level 02, the graph and PPR are highly interpretable and explainable
PPR RANK   ML RANK   ML SCORE
3          4859      0.93
3 neighbors
25.
Summary
Level 0 – SQL rule based approach
Level 1 – Supervised ML
Level 2 – Graph rule based approach
Level 3 – ML and Graph based approach
INFINITECH
26.
INFINITECH PROJECT
Flagship initiative in Digital Finance within the ICT Horizon 2020 Programme
Project deliverables:
• Marketplace of AI/IoT/Blockchain/Cybersecurity technologies (algorithms, real-time analytics, blockchain-based data sharing)
• Testbeds and sandbox for fintech players to test and deploy their own solutions
• Regulatory tools that help companies comply with regulations
27.
INFINITECH PILOTS
Nexi Payments is one of 15 organizations (commercial banks, central banks, startups and insurance firms) that joined the Infinitech project as pilots.
Fintech/Insurtech application fields:
• Financial crime
• Cybersecurity
• Personalized products
Technologies:
• #AI
• #BigData
• #IoT
• #Blockchain
28.
INFINITECH OUTPUT
There are two outputs in the Infinitech Testbed/Sandbox:
1. PILOT DEMO WEB-APP
Web app to show the algorithms (both anomaly detection and PPR) on third-party or synthetically generated data
https://pilot16.infinitech-h2020.eu/
2. ANOMALY ENGINE w/ JUPYTER
Git repository with Docker containers to run the pilot graph algorithms on third-party data in a Jupyter notebook environment
https://gitlab.infinitech-h2020.eu/pilot16/aml-graph-payments-anomaly-detection
29.
1. PILOT DEMO WEB-APP
https://pilot16.infinitech-h2020.eu/
Web app to show the algorithms (both anomaly detection and PPR) on third-party or synthetically generated data
30.
2. ANOMALY ENGINE w/ JUPYTER
Git repository with Docker containers to run the pilot graph algorithms on third-party data in a Jupyter notebook environment
https://gitlab.infinitech-h2020.eu/pilot16/aml-graph-payments-anomaly-detection
31.
Publications and Contributions
32.
Wrap up and special thanks
What did we learn along the way?
• Rule-based approaches are still present and widely adopted in 2023: that’s fine! But ML is a sound approach when dealing with hundreds of rules
• Human in the loop is key: optimizing the work of an analyst may be even better than a fully automated approach (in some scenarios)
• A network is a natural representation for some phenomena, like digital payments: a good data scientist should be able to find the best combination of ML & graph data science
• EU projects are a great opportunity!
33.
Wrap up and special thanks
SPECIAL THANK YOU TO…
It’s been teamwork, with multiple people giving support in one way or another!
Thank you to:
• Fabio Dezi, Luca Latella and Luca Rinaldi, and the always supportive DST
• Alfredo Fomitchenko and Vittorio Giatti
• Laura Arditti, Alberto De Lazzari and the LARUS team
• The Infinitech organization and GFT