1 
Lessons Learned from Building Machine Learning Software at Netflix
Justin Basilico 
Page Algorithms Engineering December 13, 2014 
@JustinBasilico 
Workshop 2014
2 
Introduction
3 
Introduction 
[Screenshots: the Netflix homepage in 2006 vs. 2014]
4 
Netflix Scale 
• > 50M members
• > 40 countries
• > 1000 device types
• Hours: > 2B/month
• Plays: > 70M/day
• Log 100B events/day
• 34.2% of peak US downstream traffic
5 
Goal 
Help members find content to watch and enjoy, to maximize member satisfaction and retention
6 
Everything is a Recommendation 
[Homepage screenshot, annotated: selecting the rows and ranking the videos within each row are both recommendation problems]
Over 75% of what people watch comes from our recommendations.
Recommendations are driven by Machine Learning.
7 
Machine Learning Approach 
[Diagram linking Problem, Data, Metrics, Model, and Algorithm]
8 
Models & Algorithms 
• Regression (linear, logistic, elastic net)
• SVD and other matrix factorizations
• Factorization Machines
• Restricted Boltzmann Machines
• Deep Neural Networks
• Markov Models and graph algorithms
• Clustering
• Latent Dirichlet Allocation
• Gradient Boosted Decision Trees / Random Forests
• Gaussian Processes
• …
9 
Design Considerations 
Recommendations 
• Personal 
• Accurate 
• Diverse 
• Novel 
• Fresh 
Software 
• Scalable 
• Responsive 
• Resilient 
• Efficient 
• Flexible
10 
Software Stack 
http://techblog.netflix.com
11 
Lessons Learned
12 
Lesson 1: 
Be flexible about where and when computation happens.
13 
System Architecture 
• Offline: Process data
• Nearline: Process events
• Online: Process requests
• Learning, features, or model evaluation can be done at any level
[Architecture diagram: an OFFLINE layer (offline data, model training, offline computation), a NEARLINE layer (nearline computation fed by the user event queue and event distribution), and an ONLINE layer (online data service, online computation, and an algorithm service returning query results and recommendations to the UI client); models are published through Netflix.Hermes and Netflix.Manhattan; member actions (play, rate, browse...) enter the user event queue; machine learning algorithms run at every layer]
More details on Netflix Techblog
14 
Where to place components? 
• Example: Matrix Factorization, X ≈ UVᵀ
• Offline:
  • Collect sample of play data
  • Run a batch learning algorithm like SGD to produce the factorization
  • Publish video factors V
• Nearline (sketched in code below):
  • Solve user factors uᵢ (normal equations Auᵢ = b)
  • Compute user-video dot products sᵢⱼ = uᵢ·vⱼ
  • Store scores sᵢⱼ in cache
• Online:
  • Presentation-context filtering (e.g. keep only sᵢⱼ > t)
  • Serve recommendations
[Same system architecture diagram as before, annotated with where each step runs]
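To make the nearline step concrete, here is a minimal Python sketch, an illustration rather than Netflix's code, assuming numpy, ridge-regularized least squares, and toy data: it solves one user's factors against the published video factors V and computes the dot-product scores.

import numpy as np

def solve_user_factors(V, ratings, reg=0.1):
    # Solve Au = b with A = Vr'Vr + reg*I and b = Vr'r, where Vr holds the
    # factors of the videos this user rated (regularized least squares).
    items = list(ratings.keys())
    r = np.array([ratings[j] for j in items], dtype=float)
    Vr = V[items]
    A = Vr.T @ Vr + reg * np.eye(V.shape[1])
    b = Vr.T @ r
    return np.linalg.solve(A, b)

# Toy usage: 5 videos with 3-dimensional published factors, one user.
V = np.random.rand(5, 3)                      # published by the offline layer
u = solve_user_factors(V, {0: 4.0, 3: 1.0})   # solved nearline per user
scores = V @ u                                # s_ij = u_i . v_j, cached nearline
top = np.argsort(-scores)                     # online layer filters s_ij > t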
15 
Lesson 2: 
Think about distribution starting from the outermost levels.
16 
Three levels of Learning Distribution/Parallelization 
1. For each subset of the population (e.g. region)
  • Want independently trained and tuned models
2. For each combination of (hyper)parameters (see the sketch after this list)
  • Simple: Grid search
  • Better: Bayesian optimization using Gaussian Processes
3. For each subset of the training data
  • Distribute over machines (e.g. ADMM)
  • Multi-core parallelism (e.g. HogWild)
  • Or… use GPUs
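As an illustration of level 2, a minimal sketch (with a toy objective standing in for a real training job; nothing here is Netflix-specific) of farming independent hyperparameter combinations out to worker processes:

from concurrent.futures import ProcessPoolExecutor
from itertools import product

def train_and_validate(params):
    # Stand-in for a real training job: train a model with these
    # hyperparameters and return its validation metric.
    reg, lr = params
    return -((reg - 0.1) ** 2 + (lr - 0.01) ** 2)  # toy validation score

if __name__ == "__main__":
    # Every (hyper)parameter combination is independent, so grid search
    # parallelizes trivially over processes (or, at scale, machines).
    grid = list(product([0.01, 0.1, 1.0],      # regularization strengths
                        [0.001, 0.01, 0.1]))   # learning rates
    with ProcessPoolExecutor() as pool:
        scores = list(pool.map(train_and_validate, grid))
    best_score, best_params = max(zip(scores, grid))
    print(best_params, best_score)

Bayesian optimization replaces the fixed grid with a loop that proposes the next combination based on the scores seen so far, which is what tools like Spearmint and MOE provide.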
17 
Example: Training Neural Networks 
• Level 1: Machines in different AWS regions
• Level 2: Machines in the same AWS region
  • Spearmint or MOE for parameter optimization
  • Condor, StarCluster, Mesos, etc. for coordination
• Level 3: Highly optimized, parallel CUDA code on GPUs
18 
Lesson 3: 
Design application software for experimentation.
19 
Example development process 
[Flowchart: in the experimentation environment, an idea plus data feed offline modeling (R, Python, MATLAB, …), iterating to a final model; the model is then implemented in the production system (Java, C++, …) and its actual output is A/B tested in the production environment. Between the two environments lurk data discrepancies, code discrepancies, missing post-processing logic, and performance issues.]
20 
Avoid dual implementations 
[Diagram: instead of parallel experiment code and production code, both the experiment and production paths use a shared engine containing the models, features, algorithms, …]
21 
Solution: Share and lean towards production 
• Developing machine learning software is an iterative process
  • Want a short pipeline to rapidly try ideas
  • Want to see the output of the complete system, not just the learned component
• Make application components easy to experiment with
  • Share them between online, nearline, and offline (sketched below)
  • Make it possible to run individual parts of the software
• Use the real code whenever possible
  • Have well-defined interfaces and formats that let you go off the beaten path
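A hedged sketch of the sharing idea (class and function names are invented for illustration): define each component once, behind an interface, and let both the offline experiment harness and the online service call the same implementation.

from abc import ABC, abstractmethod

class Feature(ABC):
    # One implementation shared by experiments and production, so there
    # is a single definition to keep correct.
    @abstractmethod
    def compute(self, user, video) -> float: ...

class PopularityFeature(Feature):
    def __init__(self, play_counts):
        self.play_counts = play_counts

    def compute(self, user, video) -> float:
        return float(self.play_counts.get(video, 0))

def score(features, weights, user, video):
    # The same scoring path runs offline over logged data and online
    # per request; only the surrounding harness differs.
    return sum(w * f.compute(user, video) for w, f in zip(weights, features))

features = [PopularityFeature({"video_a": 10, "video_b": 3})]
print(score(features, [0.2], user="u1", video="video_a"))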
22 
Lesson 4: 
Make algorithms extensible and modular.
23 
Make algorithms and models extensible and modular
• Algorithms often need to be tailored for a specific application
  • Treating an algorithm as a black box is limiting
  • Better to make algorithms extensible and modular to allow for customization
• Separate models and algorithms (see the sketch below)
  • Many algorithms can learn the same model (e.g. a linear binary classifier)
  • Many algorithms can be trained on the same types of data
• Support composing algorithms
[Diagram: a black box taking Data and Parameters directly to a Model, vs. a design where Data and Parameters feed an Algorithm that produces a separate Model]
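A minimal sketch of the model/algorithm separation (illustrative names, not a specific library): two different training algorithms that both produce the same linear binary classifier model, so downstream code does not care which one was used.

import numpy as np

class LinearBinaryClassifier:
    # The model: weights, bias, and prediction only; no training logic.
    def __init__(self, w, b=0.0):
        self.w, self.b = np.asarray(w, dtype=float), float(b)

    def predict(self, x):
        return 1 if x @ self.w + self.b > 0 else -1

def train_perceptron(X, y, epochs=10):
    # One algorithm that learns a LinearBinaryClassifier.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified: update
                w, b = w + yi * xi, b + yi
    return LinearBinaryClassifier(w, b)

def train_logistic_sgd(X, y, lr=0.1, epochs=50):
    # A different algorithm producing the same kind of model.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = -yi / (1.0 + np.exp(yi * (xi @ w + b)))  # d loss / d score
            w, b = w - lr * g * xi, b - lr * g
    return LinearBinaryClassifier(w, b)

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
print(train_perceptron(X, y).predict(np.array([1.5, 1.5])))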
24 
Provide building blocks 
• Don’t start from scratch
• Linear algebra: vectors, matrices, …
• Statistics: distributions, tests, …
• Models, features, metrics, ensembles, …
• Cost, distance, kernel, … functions
• Optimization, inference, …
• Layers, activation functions, …
• Initializers, stopping criteria, …
• …
• Domain-specific components
Build abstractions on familiar concepts. Make the software put them together.
25 
Example: Tailoring Random Forests 
[Code screenshot with callouts: use a custom tree split; customize it to run for an hour; report a custom metric each iteration; inspect the ensemble. A generic sketch follows below.]
Using Cognitive Foundry: http://github.com/algorithmfoundry/Foundry
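The Foundry's real API is at the link above; as a library-agnostic sketch of what such extension points can look like (all names hypothetical, tree induction stubbed out), here is a training loop that exposes the split rule, the stopping criterion, and a per-iteration hook:

import random
import time

def build_tree(sample, split_fn):
    # Stub standing in for real tree induction with the custom split rule.
    return {"n": len(sample), "split_rule": split_fn.__name__}

def train_forest(data, split_fn, stop_fn, on_iteration):
    # The loop exposes its extension points instead of hiding them.
    ensemble, i = [], 0
    while not stop_fn(i, ensemble):
        ensemble.append(build_tree(random.sample(data, len(data) // 2), split_fn))
        on_iteration(i, ensemble)  # e.g. report a custom metric each iteration
        i += 1
    return ensemble  # the caller can inspect the ensemble afterwards

def my_custom_split(sample):
    return sample  # placeholder for a custom tree split

deadline = time.time() + 3600  # "run it for an hour"...
forest = train_forest(
    data=list(range(1000)),
    split_fn=my_custom_split,
    stop_fn=lambda i, ens: time.time() > deadline or i >= 10,  # ...capped at 10 trees here
    on_iteration=lambda i, ens: print("iteration", i, "trees", len(ens)),
)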
26 
Lesson 5: 
Describe your input and output transformations with your model.
27 
Putting learning in an application 
[Diagram: the application passes input through Feature Encoding into a machine-learned model (ℝᵈ ⟶ ℝᵏ), then passes the result through Output Decoding; the open question is whether those transformations live in application code or model code]
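One way to make the answer "model code" is sketched below, an assumed, illustrative design rather than Netflix's: a wrapper that persists the encoder and decoder alongside the model, so every consumer applies the same transformations.

import numpy as np

class DeployableModel:
    # Bundle the raw R^d -> R^k model with the input encoding and output
    # decoding it was trained with, so they cannot drift apart.
    def __init__(self, encode, core_model, decode):
        self.encode = encode           # application objects -> vector in R^d
        self.core_model = core_model   # R^d -> R^k
        self.decode = decode           # vector in R^k -> application output

    def __call__(self, raw_input):
        return self.decode(self.core_model(self.encode(raw_input)))

# Toy usage: encode named features, score linearly, decode to a decision.
weights = np.array([0.2, 1.2])
model = DeployableModel(
    encode=lambda d: np.array([d["popularity"], d["predictedRating"]]),
    core_model=lambda x: np.array([x @ weights - 0.5]),
    decode=lambda y: "show" if y[0] > 0 else "hide",
)
print(model({"popularity": 2.0, "predictedRating": 3.0}))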
28 
Example: Simple ranking system 
• High-level API: List<Video> rank(User u, List<Video> videos)
• Example model description file:
{
  "type": "ScoringRanker",
  "scorer": {
    "type": "FeatureScorer",
    "features": [
      {"type": "Popularity", "days": 10},
      {"type": "PredictedRating"}
    ],
    "function": {
      "type": "Linear",
      "bias": -0.5,
      "weights": {
        "popularity": 0.2,
        "predictedRating": 1.2,
        "predictedRating*popularity": 3.5
      }
    }
  }
}
[Callouts on the file: the ranker, its scorer, the features, the linear function, and the feature transformations]
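A sketch of consuming such a file, in Python rather than the production Java; the feature classes and the product-key convention for weights like "predictedRating*popularity" are assumptions for illustration.

import json

class Popularity:
    name = "popularity"
    def __init__(self, days=30):
        self.days = days
    def compute(self, user, video):
        return video.get("popularity", 0.0)  # toy: would count recent plays

class PredictedRating:
    name = "predictedRating"
    def compute(self, user, video):
        return video.get("predictedRating", 0.0)

FEATURE_TYPES = {"Popularity": Popularity, "PredictedRating": PredictedRating}

def build_ranker(spec):
    scorer = spec["scorer"]
    features = [FEATURE_TYPES[f["type"]](**{k: v for k, v in f.items() if k != "type"})
                for f in scorer["features"]]
    fn = scorer["function"]  # assumes "type": "Linear"

    def score(user, video):
        values = {f.name: f.compute(user, video) for f in features}
        total = fn["bias"]
        for key, w in fn["weights"].items():
            prod = 1.0
            for part in key.split("*"):  # keys like "a*b" are feature products
                prod *= values[part]
            total += w * prod
        return total

    def rank(user, videos):
        return sorted(videos, key=lambda v: score(user, v), reverse=True)

    return rank

with open("ranker.json") as f:  # the description file shown above
    rank = build_ranker(json.load(f))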
29 
Lesson 6: 
Don’t just rely on metrics for testing.
30 
Importance of Testing 
• Temptation: Use validation metrics to test software
  • When things work this seems great
  • When metrics don’t improve: was it the code, data, metric, idea, …?
• Machine learning code involves intricate math and logic
  • Rounding issues, corner cases, …
  • Is that a + or a -? (The math or the paper could be wrong.)
• Solution: Unit test
  • Testing of metric code is especially important (a sketch follows below)
  • Test the whole system
  • Compare output across versions to catch unexpected changes
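A minimal example of the kind of unit test this argues for, using a hypothetical rank metric (recall@k, not one named in the deck) with hand-computed corner cases:

import unittest

def recall_at_k(ranked_items, relevant, k):
    # Fraction of the relevant items appearing in the top k.
    if not relevant:
        return 0.0  # corner case: nothing relevant
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / len(relevant)

class TestRecallAtK(unittest.TestCase):
    def test_hand_computed_example(self):
        self.assertAlmostEqual(recall_at_k(["a", "b", "c", "d"], {"a", "d"}, k=2), 0.5)

    def test_perfect_and_empty(self):
        self.assertEqual(recall_at_k(["a", "b"], {"a", "b"}, k=2), 1.0)
        self.assertEqual(recall_at_k(["a", "b"], set(), k=2), 0.0)

    def test_k_larger_than_list(self):
        self.assertEqual(recall_at_k(["a"], {"a", "b"}, k=10), 0.5)

if __name__ == "__main__":
    unittest.main()

Pinning metrics down this way makes it possible to tell "the idea didn't work" apart from "the measurement is wrong".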
31 
Conclusions
32 
Two ways to solve computational problems
• Software Development: Know solution → Write code → Compile code → Test code → Deploy code
• Machine Learning: Know relevant data → Develop algorithmic approach → Train model on data using algorithm → Validate model with metrics → Deploy model
(the Machine Learning steps may themselves involve Software Development)
33 
Take-aways for building machine learning software 
• Building machine learning software is an iterative process
• Make experimentation easy
• Take a holistic view of both the application and experimental environments
• Optimize only what matters
• Testing can be hard but is worthwhile
34
Thank You
Justin Basilico
jbasilico@netflix.com
@JustinBasilico
We’re hiring

Editor's Notes

  • System Architecture (slide 13): http://techblog.netflix.com/2013/03/system-architectures-for.html
  • Example: Training Neural Networks (slide 17): http://techblog.netflix.com/2014/02/distributed-neural-networks-with-gpus.html
  • We’re hiring (slide 34): http://jobs.netflix.com/jobs.php?id=NFX01267