Machine Learning in
Production
Krishna Sridhar (@krishna_srd)
Data Scientist, Dato Inc.
1
About Me
• Background
- Machine Learning (ML) Research.
- Ph.D. in Numerical Optimization @Wisconsin
• Now
- Build ML tools for data scientists & developers @Dato.
- Help deploy ML algorithms.
@krishna_srd, @DatoInc
2
Overview
• Lots of fundamental problems to tackle.
• Blend of statistics, applied-ML, and software engineering.
• The space is new, so lots of room for innovation!
• Understanding production helps make better modeling
decisions.
3
What is an ML app?
4
Why production?
5
Why production?
6
• Share: make your predictions available to everyone.
• Review: measure the quality of the predictions over time.
• React: improve prediction quality with feedback.
ML in Production - 101
7
[Diagram: in Creation, Historical Data → Trained Model; in Production, the Trained Model becomes the Deployed Model, and Live Data → Deployed Model → Predictions.]
What is Production?
• Deployment: making model predictions easily available.
• Evaluation: measuring the quality of deployed models.
• Monitoring: tracking model quality over time.
• Management: improving deployed models with feedback.
8
What is Production?
Evaluation
Monitoring
Deployment
Management
9
Deployment
10
What is Deployment?
Evaluation
Monitoring
Deployment
Management
11
ML in Production - 101
12
[Diagram: in Creation, Historical Data → Trained Model; in Production, the Trained Model becomes the Deployed Model, and Live Data → Deployed Model → Predictions.]
What are we deploying?
13
def predict(data):
    data['is_good'] = data['rating'] > 3
    return model.predict(data)
Advantages
• Flexibility: No need for complicated abstractions.
• Software deployment is a very mature field.
• Rapid model updating with continuous deployments.
Treat model deployment the same way as code deployment!
What are we deploying?
def predict(data):
    data['is_good'] = data['rating'] > 3
    return model.predict(data)

def predict(data): Double = {
  data("is_good") = data("rating") > 3
  model.predict(data)
}

predict <- function(data) {
  data$is_good <- data$rating > 3
  predict(model, data)
}
14
What’s the challenge?
Wall of confusion
Data Scientists: "Beat baseline by 15%. Time to deploy!"
Deployment Engineers: "What the **** are alpha and beta?"
15
What’s the solution?
Data Scientists: "Beat baseline by 15%. Time to deploy!"
Deployment Engineers: "Beat baseline by 15%!"
16
Deployment - Demo
17
Deploying ML: Requirements
1. Ease of integration.
- Any code, any language.
2. Low latency predictions.
- Cache frequent predictions.
3. Fault Tolerant.
- Replicate models, run on many machines.
4. Scalable.
- Elastically scale nodes up or down.
5. Maintainable.
- Easily update with newer models.
18
Deploying ML
[Diagram: Client → Load Balancer → Nodes 1-3; each node runs a Web Service with a Prediction Cache in front of the Model.]
19
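As a rough illustration of one such node (the deck demos Dato's own deployment tooling; Flask, StubModel, and the naive in-memory cache here are stand-ins, not the deck's stack), a minimal web service that wraps a model and caches frequent predictions might look like this:

from flask import Flask, request, jsonify

class StubModel:
    # Stand-in for a trained model; replace with the real, serialized model.
    def predict(self, data):
        return 1.0 if data["is_good"] else 0.0

app = Flask(__name__)
model = StubModel()
cache = {}  # naive in-memory cache of frequent predictions (flat JSON payloads assumed)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    key = tuple(sorted(payload.items()))   # hashable cache key
    if key not in cache:                   # cache miss: featurize and score
        payload["is_good"] = payload["rating"] > 3
        cache[key] = model.predict(payload)
    return jsonify(prediction=cache[key])

if __name__ == "__main__":
    app.run(port=8080)

Replication and elastic scaling then come from running this service on several nodes behind the load balancer, as in the diagram above.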
Evaluation
20
What is Evaluation?
Evaluation
Monitoring
Deployment
Management
21
What is Evaluation?
22
Evaluation = Predictions + Metric
Which metric?
Model evaluation metric != business metric
• Model metrics: precision-recall, DCG, NDCG.
• Business metrics: user engagement, click-through rate.
Track both ML and business metrics to see if they correlate!
23
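As a reminder of what the model-metric side looks like in code, here is a minimal NDCG@k in plain Python. The functions ndcg_at_k and dcg_at_k and the sample relevance grades are illustrative, not taken from the deck.

import math

def dcg_at_k(relevances, k):
    # Discounted cumulative gain of the top-k items, in ranked order.
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    # Normalize by the DCG of the ideal (descending-relevance) ordering.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Graded relevance of items in the order the model ranked them.
print(ndcg_at_k([3, 2, 3, 0, 1, 2], k=5))

The business metric (e.g. click-through rate) comes from production logs, and the point of the slide is to track both series and check that they move together.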
Evaluating Models
24
[Diagram: offline evaluation uses Historical Data with the Trained Model; online evaluation uses Live Data with the Deployed Model to produce Predictions.]
Monitoring & Management
25
Monitoring & Management?
Evaluation
Monitoring
Deployment
Management
26
Monitoring & Management?
Tracking metrics over time and reacting to feedback from deployed models.
Monitoring & Management
28
[Diagram: Historical Data → Trained Model → Deployed Model; Live Data → Deployed Model → Predictions, with a Feedback loop from Predictions back into the pipeline.]
Monitoring & Management
Important for software engineering
- Versioning.
- Logging.
- Provenance.
- Dashboards.
- Reports.
Interesting for applied-ML researchers
- Updating models.
29
Updating models
When to update?
• Trends and user tastes change over time.
- I liked R in the past, but now I like Python!
- Tip: track statistics about the data over time (see the drift-check sketch after this slide).
• Model performance drops.
- CTR was down 20% last month.
- Tip: monitor both offline and online metrics, and track their correlation!
How to update?
• A/B Testing
• Multi-armed bandits
30
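A minimal sketch of the first tip, tracking data statistics over time: compare a live window of a feature against a reference window and flag large mean shifts. The helper mean_shift_alert, the use of numpy, and the 3-standard-error threshold are illustrative assumptions, not the deck's method.

import numpy as np

def mean_shift_alert(reference, live, threshold=3.0):
    # Flag drift when the live window's mean sits more than `threshold`
    # standard errors away from the reference window's mean.
    ref, cur = np.asarray(reference, float), np.asarray(live, float)
    stderr = ref.std(ddof=1) / np.sqrt(len(cur))
    z = abs(cur.mean() - ref.mean()) / stderr
    return z > threshold, z

# e.g. ratings logged last month vs. ratings logged today
drifted, z = mean_shift_alert([4, 5, 3, 4, 4, 5, 3], [2, 1, 2, 3, 1])
print(drifted, round(z, 2))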
A/B testing
Is model V2 significantly better than model V1?
• A (Model V1): 2,000 visits, 10% CTR.
• B (Model V2): 2,000 visits, 30% CTR.
If B wins, the world gets V2.
Be really careful with A/B testing.
31
Multi-armed Bandits
32
• Explore 10% of the time: split those visits between Model V1 (A) and Model V2 (B), e.g. 2,000 visits each, observing 10% vs 30% CTR.
• Exploit 90% of the time: send traffic to the current best model, e.g. 36k visits at 30% CTR.
The world gets V2.
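The 10%/90% split above reads like epsilon-greedy with epsilon = 0.1; the deck does not name a specific bandit algorithm, so the following is just a minimal epsilon-greedy sketch, and EpsilonGreedyBandit is an illustrative name.

import random

class EpsilonGreedyBandit:
    # Explore a random model with probability epsilon; otherwise exploit the
    # model with the best observed click-through rate so far.
    def __init__(self, n_models, epsilon=0.10):
        self.epsilon = epsilon
        self.visits = [0] * n_models
        self.clicks = [0] * n_models

    def select_model(self):
        if random.random() < self.epsilon:                        # explore
            return random.randrange(len(self.visits))
        ctr = [c / v if v else 0.0 for c, v in zip(self.clicks, self.visits)]
        return max(range(len(ctr)), key=ctr.__getitem__)          # exploit

    def record(self, model, clicked):
        self.visits[model] += 1
        self.clicks[model] += int(clicked)

# Serve a request: pick V1 (arm 0) or V2 (arm 1), then log whether the user clicked.
bandit = EpsilonGreedyBandit(n_models=2)
arm = bandit.select_model()
bandit.record(arm, clicked=True)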
MAB vs A/B Testing
Why MAB?
• “Set and forget approach” for continuous optimization.
• Minimize your losses.
• Good MAB algorithms converge very quickly!
Why A/B testing?
• Easy and quick to set up!
• Answer relevant business questions.
• Sometimes, it could take a while before you observe results.
Conclusion
@krishna_srd, @DatoInc
• ML in production can be fun! Lots of new challenges in
deployment, evaluation, monitoring, and management.
• Summary of tips:
- Try to run the same code in modeling & deployment mode.
- Business metric != Model metric
- Monitor offline and online behavior, track their correlation.
- Be really careful with A/B testing.
- Minimize your losses with multi-armed bandits!
35
Thanks!
Download
pip install graphlab-create
Docs
https://dato.com/learn/
Source
https://github.com/dato-code/tutorials
Thank you!
37
Backup
38
When/how to evaluate ML
• Offline evaluation
- Evaluate on historical labeled data.
- Make sure you evaluate on a test set!
• Online evaluation
- A/B testing: split off a portion of incoming requests (B) to evaluate the new deployment, and use the rest as the control group (A).
39
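A minimal sketch of that online split: hash a stable user id into A or B so that roughly b_fraction of traffic hits the new deployment and each user keeps a consistent experience. The function assign_bucket, the MD5 hash, and the 10% default are assumptions for illustration.

import hashlib

def assign_bucket(user_id, b_fraction=0.10):
    # Deterministically map a user to "B" (new deployment) or "A" (control).
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return "B" if int(digest, 16) % 1000 < b_fraction * 1000 else "A"

print(assign_bucket("user-42"))   # the same user always lands in the same group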
ML Deployment - 2
[Diagram: Historical data → Prototype model → Deployed model; New request → Online adaptive (deployed) model → Predictions.]
40
Online Learning
• Benefits
- Computationally faster and more efficient.
- Deployment and training are the same!
• Key Challenges
- How do we maintain distributed state?
- Do standard algorithms need to change in order to be more deployment
friendly?
- How much should the model “forget”?
- Tricky to evaluate.
• Simple Ideas that work.
- Splitting the model space so the state of each model can lie in a single
machine.
41
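A minimal sketch of the online-learning idea: a logistic model that takes one SGD step per incoming example, with a crude weight-decay knob standing in for "forgetting". OnlineLogisticRegression, the learning rate, and the decay value are illustrative assumptions, not the deck's algorithm.

import numpy as np

class OnlineLogisticRegression:
    # One SGD step per (features, label) pair: deployment and training coincide.
    def __init__(self, n_features, lr=0.1, decay=0.0):
        self.w = np.zeros(n_features)
        self.lr = lr
        self.decay = decay          # crude "forgetting": shrink old weights each step

    def predict_proba(self, x):
        return 1.0 / (1.0 + np.exp(-np.dot(self.w, x)))

    def observe(self, x, y):
        # Serve a prediction, then immediately learn from the observed label.
        p = self.predict_proba(x)
        self.w = (1.0 - self.decay) * self.w + self.lr * (y - p) * np.asarray(x, float)
        return p

model = OnlineLogisticRegression(n_features=2, decay=0.001)
print(model.observe([1.0, 0.5], y=1))   # prediction is made before the update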
A/B testing
[Figure: two Gaussian click-through-rate distributions, one for A and one for B, each with its own variance.]
42
Running an A/B test
As easy as alpha, beta, gamma, delta.
• Procedure
- Pick significance level α.
- Compute the test statistic.
- Compute p-value (probability of test statistic under the null
hypothesis).
- Reject the null hypothesis if p-value is less than α.
43
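The slide does not fix a particular test statistic; as one concrete choice, here is a two-sided two-proportion z-test on the click-through rates, plugged with the visit counts from the earlier A/B slide. two_proportion_z_test and the pooled-variance form are a standard choice but still an assumption here.

from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(clicks_a, visits_a, clicks_b, visits_b):
    # Two-sided z-test for a difference between two click-through rates.
    p_a, p_b = clicks_a / visits_a, clicks_b / visits_b
    p_pool = (clicks_a + clicks_b) / (visits_a + visits_b)
    stderr = sqrt(p_pool * (1 - p_pool) * (1 / visits_a + 1 / visits_b))
    z = (p_b - p_a) / stderr
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# A: 2,000 visits at 10% CTR; B: 2,000 visits at 30% CTR (the earlier slide).
z, p = two_proportion_z_test(200, 2000, 600, 2000)
print(z, p, "reject H0" if p < 0.05 else "fail to reject H0")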
How long to run the test?
• Run the test until you see a significant difference?
- Wrong! Don’t do this.
• Statistical tests directly control for false positive rate (significance)
- With probability 1-α, Population 1 is different from Population 0
• The statistical power of a test controls for the false negative rate
- How many observations do I need to discern a difference of δ between
the means with power 0.8 and significance 0.05?
• Determine how many observations you need before you start the test
- Pick the power (1-β), significance α, and magnitude of difference δ
- Calculate n, the number of observations needed
- Don’t stop the test until you’ve made this many observations.
44
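A minimal sketch of that calculation for two proportions: given the baseline CTR, the smallest absolute lift δ you care about, significance α, and power, compute the visits needed per arm. sample_size_two_proportions and the normal-approximation formula illustrate the standard approach and are not taken from the deck.

from math import ceil
from statistics import NormalDist

def sample_size_two_proportions(p_base, delta, alpha=0.05, power=0.80):
    # Visits needed per arm to detect an absolute lift of `delta` over the
    # baseline rate `p_base` at significance `alpha` with the given power
    # (normal approximation).
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_alt = p_base + delta
    variance = p_base * (1 - p_base) + p_alt * (1 - p_alt)
    return ceil((z_alpha + z_power) ** 2 * variance / delta ** 2)

# e.g. detect a 2-point lift over a 10% baseline CTR
print(sample_size_two_proportions(0.10, 0.02))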
Separation of experiences
How well did you split off group B?
[Figure: group A sees the Homepage and group B the New homepage; both then reach the same Second page and the same Buttons.]
45
Separation of experiences
How well did you split off group B?
[Figure: same flow; A and B diverge only at the homepage but share the Second page and Buttons.]
Unclean separation of experiences!
46
Shock of newness
• People hate change
• Why is my button now blue??
• Wait until the “shock of newness” wears off, then measure
• Some populations of users are forever wedded to old ways
• Consider obtaining a fresh population
[Figure: click-through rate over time dips at t0 when the change ships (the shock of newness), then recovers.]
47