Generalized B2B Machine Learning
at Retention Science

Andrew Waage (andrew@retentionscience.com)
Co-founder / CTO
Founded: 2012
Location: Santa Monica
Team: ~50 (and hiring!)
AI-Powered Marketing Automation Platform
Step 1: Collect Data
Step 2: Generate Predictions
Step 3: Automation Powers Intelligent Campaigns Across Channels
Data: Ecom / Retail, Behavioral, Custom, Demographic
Channels: Email Campaigns, On-Site, Display, Mobile, Call Center
What Kind of Scale?
100s of Clients
210M+ Customers Tracked
1000+ Client-specific Models
2 Billion+ Predictions Daily
10K+ Actions / Second
“Generalized ML Platform”
What’s the Challenge?
The Challenge
•  Many Clients
•  Dirty Data
•  Sparse Datasets
•  Custom Attributes
•  Various Industries
[Diagram: clients C1, C2, C3, C4 feed a common Model Layer that outputs Clean Predictions]
What Kind of Predictions?
Purchase Probability: High-likelihood
Lifecycle Stage: Ready to Buy
Churn Time: 300 days
Customer Future Value: $925
Contact Frequency: Every 3 days
Optimal Time to Engage: Thursday 7-9PM
Optimal Incentive/Discount: Dollars Off
Product Recommendations: Based on interest
Optimal Subject Line: Individual preference
Optimal Template: Individual preference
Our Approach & Learnings
1.  Robust Ingestion Pipeline
2.  Common Feature Engineering Layer
3.  “Plug-in” Architecture for Models
4.  Evaluation / A/B Testing
5.  Robust Monitoring & Visualization
1. Robust Ingestion Pipeline
[Diagram: 10K+ actions per second flow through Flume and Kinesis (auto-scaling) into auto-scaling lambdas]
•  Abstraction Layer: Data Ingestion
•  Do not compromise on clean data
•  Auto-scaling everywhere
•  High confidence in upstream data
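A minimal sketch of the ingestion abstraction layer, assuming a hypothetical canonical `Event` schema. The field names, the `ALLOWED_ACTIONS` whitelist and the `ingest` helper are illustrative, not Retention Science's actual code. Each raw client record is either normalized onto the schema or quarantined with a reason, so dirty data is rejected at the boundary instead of leaking into models:

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List, Optional, Tuple


# Hypothetical canonical schema that every client feed is mapped onto.
@dataclass(frozen=True)
class Event:
    client_id: str
    customer_id: str
    action: str
    timestamp: datetime
    value: float = 0.0


# Illustrative whitelist; a real deployment would likely configure this per client.
ALLOWED_ACTIONS = {"view", "add_to_cart", "purchase", "email_open", "email_click"}


def normalize_event(raw: Dict[str, Any]) -> Tuple[Optional[Event], Optional[str]]:
    """Return (event, None) for a clean record, or (None, reason) if it is rejected."""
    try:
        client_id = str(raw["client_id"]).strip()
        customer_id = str(raw["customer_id"]).strip()
        action = str(raw["action"]).strip().lower()
        timestamp = datetime.fromtimestamp(float(raw["ts"]), tz=timezone.utc)
        value = float(raw.get("value", 0.0))
    except (KeyError, TypeError, ValueError, OverflowError, OSError) as exc:
        return None, f"malformed: {exc!r}"
    if not client_id or not customer_id:
        return None, "missing id"
    if action not in ALLOWED_ACTIONS:
        return None, f"unknown action: {action}"
    if not math.isfinite(value) or value < 0:
        return None, "invalid value"
    return Event(client_id, customer_id, action, timestamp, value), None


def ingest(batch: Iterable[Dict[str, Any]]) -> Tuple[List[Event], List[Tuple[Dict[str, Any], str]]]:
    """Split a batch into clean events and quarantined records (with reasons)."""
    accepted: List[Event] = []
    quarantined: List[Tuple[Dict[str, Any], str]] = []
    for raw in batch:
        event, reason = normalize_event(raw)
        if event is None:
            quarantined.append((raw, reason or "unknown"))
        else:
            accepted.append(event)
    return accepted, quarantined
```

Keeping rejected records together with their reasons, rather than silently patching them, is one way to earn the "high confidence in upstream data" the slide asks for.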
2. Common Feature Engineering Layer
•  Abstraction: Feature Layer
•  Allow custom features
•  Handle feature selection
•  Modelers know what to expect
[Diagram: Raw Data → User Behavior Features, Product Features, User Static Features → Timing Model, CLV Model, Recommender]
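The feature layer can be sketched as a registry of named feature groups that every model draws from. `FeatureLayer`, `user_behavior` and `activity` are hypothetical names chosen for illustration. Custom client features register through the same interface, models request only the groups they need (a crude form of feature selection), and namespaced output keys tell modelers exactly what to expect:

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A feature group maps raw events to {user_id: {feature_name: value}}.
FeatureGroup = Callable[[List[dict]], Dict[str, Dict[str, float]]]


class FeatureLayer:
    """Hypothetical shared registry of feature groups used by every model."""

    def __init__(self) -> None:
        self._groups: Dict[str, FeatureGroup] = {}

    def register(self, name: str, fn: FeatureGroup) -> None:
        # Client-specific (custom) groups are registered the same way as built-in ones.
        self._groups[name] = fn

    def build(self, events: List[dict], required: List[str]) -> Dict[str, Dict[str, float]]:
        """Compute only the groups a model asks for and merge them per user."""
        missing = [g for g in required if g not in self._groups]
        if missing:
            raise KeyError(f"unknown feature groups: {missing}")
        table: Dict[str, Dict[str, float]] = defaultdict(dict)
        for group in required:
            for user_id, feats in self._groups[group](events).items():
                for name, value in feats.items():
                    # Namespaced keys ("group.feature") make the contract explicit.
                    table[user_id][f"{group}.{name}"] = value
        return dict(table)


def user_behavior(events: List[dict]) -> Dict[str, Dict[str, float]]:
    """Purchase count and total spend per user (illustrative features)."""
    out: Dict[str, Dict[str, float]] = defaultdict(lambda: {"purchases": 0.0, "spend": 0.0})
    for e in events:
        if e["action"] == "purchase":
            out[e["user"]]["purchases"] += 1
            out[e["user"]]["spend"] += e["value"]
    return dict(out)


def activity(events: List[dict]) -> Dict[str, Dict[str, float]]:
    """Total number of events per user (illustrative features)."""
    out: Dict[str, Dict[str, float]] = defaultdict(lambda: {"events": 0.0})
    for e in events:
        out[e["user"]]["events"] += 1
    return dict(out)
```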
3. Model Plug-in Architecture
•  Plug-in Architecture
•  Tune model hyper-parameters
•  A/B test models per client
[Diagram: Client’s Model Execution Plan, clients C1 to C8]
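One way to realize the plug-in idea, assuming a hypothetical decorator-based `MODEL_REGISTRY` and a per-client execution plan expressed as a list of steps. `MeanBaseline` and `Ridge1D` are toy stand-ins for real models; the plan is where per-client hyper-parameters (here `alpha`) would be tuned or swapped out for A/B tests:

```python
from typing import Any, Callable, Dict, List, Protocol

Matrix = List[List[float]]


class Model(Protocol):
    def fit(self, X: Matrix, y: List[float]) -> "Model": ...
    def predict(self, X: Matrix) -> List[float]: ...


# Hypothetical registry: model name -> constructor that accepts hyper-parameters.
MODEL_REGISTRY: Dict[str, Callable[..., Model]] = {}


def register(name: str) -> Callable[[type], type]:
    def wrap(cls: type) -> type:
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap


@register("mean_baseline")
class MeanBaseline:
    """Simplest possible model: predict the training mean."""

    def __init__(self) -> None:
        self.mean = 0.0

    def fit(self, X: Matrix, y: List[float]) -> "MeanBaseline":
        self.mean = sum(y) / len(y) if y else 0.0
        return self

    def predict(self, X: Matrix) -> List[float]:
        return [self.mean] * len(X)


@register("ridge_1d")
class Ridge1D:
    """Toy one-feature ridge regression without intercept; alpha is the tunable penalty."""

    def __init__(self, alpha: float = 0.0) -> None:
        self.alpha = alpha
        self.coef = 0.0

    def fit(self, X: Matrix, y: List[float]) -> "Ridge1D":
        xs = [row[0] for row in X]
        denom = sum(x * x for x in xs) + self.alpha
        self.coef = sum(x * t for x, t in zip(xs, y)) / denom if denom else 0.0
        return self

    def predict(self, X: Matrix) -> List[float]:
        return [self.coef * row[0] for row in X]


def run_plan(plan: List[Dict[str, Any]], X: Matrix, y: List[float]) -> Dict[str, List[float]]:
    """Execute one client's plan: each step names a registered model and its hyper-parameters."""
    results: Dict[str, List[float]] = {}
    for step in plan:
        model = MODEL_REGISTRY[step["model"]](**step.get("params", {}))
        results[step["name"]] = model.fit(X, y).predict(X)
    return results
```

Because models are looked up by name, adding a model means registering one class; no client plan or pipeline code has to change.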
Recommender System
Multi-Layer Personalization:
•  Layer 1: ML / Algorithmic
    •  ALS CF, Content-based, Item-Item
•  Layer 2: User-level Domain Logic
    •  User-level predictions (gender, location, shoe sizes)
•  Layer 3: Client-tuned Domain Logic / Controls
    •  Rank by profit-margin
    •  Increase discovery rate influxer
[Diagram: 1. Algorithmic → 2. User-Level Domain Logic → 3. Client-Level Controls]
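The three layers can be sketched as successive functions over candidate items. This assumes Layer 1 scores arrive precomputed (for example from ALS or item-item CF, which is not implemented here). The `Item` fields, the gender and shoe-size filter, and the `margin_weight` and `discovery_boost` knobs are hypothetical illustrations of user-level logic and client-tuned controls:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Item:
    item_id: str
    category: str
    gender: Optional[str]  # None means unisex
    size: Optional[float]  # None means the item is not sized
    margin: float          # profit margin as a fraction, e.g. 0.3


def layer1_algorithmic(scores: Dict[str, float], k: int) -> List[str]:
    """Layer 1: keep the top-k items by model score (scores assumed to come from CF / content models)."""
    return sorted(scores, key=lambda i: scores[i], reverse=True)[:k]


def layer2_user_logic(candidates: List[str], catalog: Dict[str, Item],
                      user: Dict[str, object]) -> List[str]:
    """Layer 2: drop items that conflict with user-level predictions."""
    kept = []
    for item_id in candidates:
        item = catalog[item_id]
        if item.gender is not None and user.get("gender") not in (None, item.gender):
            continue
        if (item.size is not None and user.get("shoe_size") is not None
                and item.size != user["shoe_size"]):
            continue
        kept.append(item_id)
    return kept


def layer3_client_controls(candidates: List[str], scores: Dict[str, float],
                           catalog: Dict[str, Item], seen_categories: Set[str],
                           margin_weight: float, discovery_boost: float) -> List[str]:
    """Layer 3: re-rank with client-tuned knobs for profit margin and discovery."""
    def final_score(item_id: str) -> float:
        item = catalog[item_id]
        score = scores[item_id] + margin_weight * item.margin
        if item.category not in seen_categories:
            score += discovery_boost  # favor categories the user has not explored yet
        return score
    return sorted(candidates, key=final_score, reverse=True)
```

Keeping the client controls in the last layer means a client can change the margin or discovery knobs without retraining the Layer 1 models.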
4. Model Evaluation / Fast Feedback
•  Start simple
•  Collect feedback data
•  Skip long production cycle
•  Unbiased policy generation is important
[Diagram: A/B Framework routes traffic across models M1, M2, M3; M1 → Campaign Predictions]
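A sketch of unbiased assignment for the A/B framework, assuming deterministic hashing of (client, experiment, user) into traffic buckets. Logging the assignment probability (propensity) alongside each outcome is what makes an unbiased inverse-propensity (IPS) estimate possible later. The function names and the log format are assumptions, not the platform's actual API:

```python
import hashlib
from typing import Dict, List, Tuple


def assign_variant(client_id: str, experiment: str, user_id: str,
                   weights: Dict[str, float]) -> Tuple[str, float]:
    """Deterministically assign a user to a variant; return (variant, propensity)."""
    key = f"{client_id}:{experiment}:{user_id}".encode("utf-8")
    # Use the top 53 bits of a SHA-256 digest so u is an exact float in [0, 1).
    u = (int.from_bytes(hashlib.sha256(key).digest()[:8], "big") >> 11) / 2**53
    total = sum(weights.values())
    cumulative = 0.0
    for variant, weight in weights.items():
        cumulative += weight / total
        if u < cumulative:
            return variant, weight / total
    # Guard against floating-point round-off leaving u just above the last boundary.
    last = list(weights)[-1]
    return last, weights[last] / total


def ips_estimate(logs: List[Tuple[str, float, float]], target: str) -> float:
    """Inverse-propensity estimate of mean reward if every user had received `target`.

    Each log entry is (variant_shown, propensity_logged_at_assignment, reward).
    """
    if not logs:
        return 0.0
    return sum(r / p for v, p, r in logs if v == target) / len(logs)
```

Hashing instead of calling a random generator at request time keeps a user in the same variant across sessions, while assignment still does not depend on any user feature.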
5. Robust Model Monitoring and Visualization
“Sauron” (LOTR)!
Monitor, monitor, monitor!!
Monitor Recs: Distributions, Coverage, Diversity
REF: http://sauron.rsci.co/advanced/sites/181/data_types/31
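Coverage and diversity can be computed directly from the recommendation lists that were served. The metric definitions below (catalog coverage, pairwise category diversity, top-item share as a simple distribution check) are common choices assumed here, not necessarily the ones Sauron plots:

```python
from collections import Counter
from itertools import chain, combinations
from typing import Dict, List


def catalog_coverage(recs: Dict[str, List[str]], catalog_size: int) -> float:
    """Share of the catalog that appears in at least one user's recommendations."""
    return len(set(chain.from_iterable(recs.values()))) / catalog_size


def intra_list_diversity(recs: Dict[str, List[str]], category: Dict[str, str]) -> float:
    """Mean fraction of item pairs within a list that come from different categories."""
    scores = []
    for items in recs.values():
        pairs = list(combinations(items, 2))
        if pairs:  # lists with fewer than two items have no pairs and are skipped
            scores.append(sum(category[a] != category[b] for a, b in pairs) / len(pairs))
    return sum(scores) / len(scores) if scores else 0.0


def top_item_share(recs: Dict[str, List[str]]) -> float:
    """Fraction of all recommendation slots taken by the single most-recommended item."""
    counts = Counter(chain.from_iterable(recs.values()))
    total = sum(counts.values())
    return counts.most_common(1)[0][1] / total if total else 0.0
```

Tracked per client over time, a sudden drop in coverage or a spike in top-item share is an early sign that a model has collapsed onto a few popular items.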
Monitoring Subject Line Bandit Models
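The slide does not say which bandit algorithm is used. As an assumption, here is a Beta-Bernoulli Thompson sampling sketch over candidate subject lines that treats an email open as the reward; `ThompsonSubjectLines` and its `snapshot()` method are hypothetical names, with `snapshot()` returning the per-arm send counts and posterior means a monitoring view could chart:

```python
import random
from typing import Dict, List, Optional


class ThompsonSubjectLines:
    """Beta-Bernoulli Thompson sampling over subject lines (reward = email opened)."""

    def __init__(self, subject_lines: List[str], seed: Optional[int] = None) -> None:
        self.opens: Dict[str, int] = {s: 0 for s in subject_lines}
        self.misses: Dict[str, int] = {s: 0 for s in subject_lines}
        self.rng = random.Random(seed)

    def choose(self) -> str:
        # Sample an open rate from each arm's Beta(1 + opens, 1 + misses) posterior
        # and send the subject line with the highest draw.
        draws = {s: self.rng.betavariate(1 + self.opens[s], 1 + self.misses[s])
                 for s in self.opens}
        return max(draws, key=lambda s: draws[s])

    def update(self, subject_line: str, opened: bool) -> None:
        if opened:
            self.opens[subject_line] += 1
        else:
            self.misses[subject_line] += 1

    def snapshot(self) -> Dict[str, Dict[str, float]]:
        """Per-arm sends and posterior mean open rate, e.g. for a monitoring chart."""
        return {
            s: {
                "sends": self.opens[s] + self.misses[s],
                "posterior_mean": (1 + self.opens[s]) / (2 + self.opens[s] + self.misses[s]),
            }
            for s in self.opens
        }
```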
Churn Rates, ROC Curve, Reliability Curve
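Reliability (calibration) curves and ROC AUC for a churn classifier can both be computed from predicted probabilities and observed labels. This sketch assumes equal-width probability bins and uses the O(n·m) pairwise form of AUC for clarity rather than speed; the function names are illustrative:

```python
from typing import List, Tuple


def reliability_curve(probs: List[float], labels: List[int],
                      n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed positive rate, count) per non-empty bin."""
    counts = [0] * n_bins
    pred_sums = [0.0] * n_bins
    label_sums = [0.0] * n_bins
    for p, y in zip(probs, labels):
        # Equal-width bins; p == 1.0 is clamped into the last bin.
        idx = min(max(int(p * n_bins), 0), n_bins - 1)
        counts[idx] += 1
        pred_sums[idx] += p
        label_sums[idx] += y
    return [(pred_sums[i] / counts[i], label_sums[i] / counts[i], counts[i])
            for i in range(n_bins) if counts[i] > 0]


def roc_auc(probs: List[float], labels: List[int]) -> float:
    """AUC as the probability a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For a well-calibrated churn model, each bin's mean predicted probability sits close to its observed churn rate, so the curve hugs the diagonal.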
Our Data Science Stack
[Diagram: tools grouped under Persistence, Pipeline / Process, Viz / Monitor, Code]
Takeaways
1.  Use abstraction layers
-  Clean / common interfaces
2.  Monitor, monitor, monitor
-  Fast feedback
3.  Start simple and keep iterating
Thank You!
andrew@retentionscience.com
