Machine Learning @ Netflix
(and some lessons learned)
Yves Raimond (@moustaki)
Research/Engineering Manager
Search & Recommendations
Algorithm Engineering
Netflix evolution
Netflix scale
● > 69M members
● > 50 countries
● > 1000 device types
● > 3B hours/month
● 36% of peak US downstream traffic
Recommendations @ Netflix
● Goal: Help members find content to watch and enjoy, in order to maximize satisfaction and retention
● Over 80% of what people watch comes from our recommendations
● Top Picks, Because You Watched, Trending Now, Row Ordering, Evidence, Search, Search Recommendations, Personalized Genre Rows, ...
Models & Algorithms
▪ Regression (linear, logistic, elastic net)
▪ SVD and other matrix factorizations
▪ Factorization Machines
▪ Restricted Boltzmann Machines
▪ Deep Neural Networks
▪ Markov Models and Graph Algorithms
▪ Clustering
▪ Latent Dirichlet Allocation
▪ Gradient Boosted Decision Trees / Random Forests
▪ Gaussian Processes
▪ …
Some lessons learned
Build the offline experimentation framework first
When tackling a new problem
● What offline metrics can we compute that capture what online improvements we're actually trying to achieve? (see the sketch after this list)
● How should the input data for that evaluation be constructed (train, validation, test)?
● How fast and easy is it to run a full cycle of offline experimentation?
○ Minimize time to first metric
● How replicable is the evaluation? How shareable are the results?
○ Provenance (see Dagobah)
○ Notebooks (see Jupyter, Zeppelin, Spark Notebook)
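As a concrete illustration of the points above, here is a minimal sketch of one offline evaluation cycle: a chronological train/validation/test split plus a simple recall@k proxy metric. This is not Netflix's actual framework; the DataFrame layout and the `model.rank(member_id, candidates)` API are assumptions made for the example.

```python
# Minimal sketch of an offline evaluation cycle (illustrative, not Netflix's framework).
# Assumes a pandas DataFrame of (member_id, item_id, timestamp) play events and a
# hypothetical `model.rank(member_id, candidates)` method returning ranked item ids.
import pandas as pd

def time_split(plays: pd.DataFrame, train_end, valid_end):
    """Split play events chronologically into train / validation / test."""
    train = plays[plays.timestamp < train_end]
    valid = plays[(plays.timestamp >= train_end) & (plays.timestamp < valid_end)]
    test = plays[plays.timestamp >= valid_end]
    return train, valid, test

def recall_at_k(model, holdout: pd.DataFrame, candidates, k: int = 10) -> float:
    """One possible offline proxy for 'helped the member find something to watch'."""
    hits, total = 0, 0
    for member_id, watched in holdout.groupby("member_id").item_id:
        top_k = set(model.rank(member_id, candidates)[:k])
        hits += len(top_k & set(watched))
        total += len(watched)
    return hits / max(total, 1)
```

Keeping the split and the metric as small, reusable functions is one way to minimize "time to first metric" and make the evaluation easy to re-run and share.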
When tackling an old problem
● Same…
○ Are the metrics that were designed when experimentation first started in that space still appropriate now?
Think about distribution from the outermost layers
1. For each combination of hyper-parameters (sketch below)
(e.g. grid search, random search, Gaussian processes, …)
2. For each subset of the training data
a. Multi-core learning (e.g. Hogwild!)
b. Distributed learning (e.g. ADMM, distributed L-BFGS, …)
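A minimal sketch of the outermost layer: hyper-parameter combinations are independent of one another, so they can be farmed out to separate processes (or machines) before any thought is given to distributing the learning algorithm itself. The `train_and_evaluate` function is a hypothetical single-machine training run returning an offline metric.

```python
# Sketch: distribute over the outermost layer (hyper-parameter combinations) first,
# since those runs are independent. `train_and_evaluate(config)` is assumed to fit
# one model on one machine and return an offline metric.
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def grid(params: dict):
    """Yield every combination of the given hyper-parameter values (grid search)."""
    keys = list(params)
    for values in product(*(params[k] for k in keys)):
        yield dict(zip(keys, values))

def search(train_and_evaluate, params: dict, max_workers: int = 8):
    """Run each configuration in its own process and return the best (score, config)."""
    configs = list(grid(params))
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(train_and_evaluate, configs))
    return max(zip(scores, configs), key=lambda pair: pair[0])

# e.g. search(train_and_evaluate, {"reg": [0.01, 0.1, 1.0], "rank": [20, 50, 100]})
```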
When to use distributed learning?
● The communication overhead incurred when building distributed ML algorithms is non-trivial
● Is your data big enough that the gain from distribution offsets the communication overhead? (see the back-of-envelope sketch below)
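A back-of-envelope way to frame that question: estimate per-iteration compute time against the cost of shipping the model between workers. All numbers below are illustrative assumptions, not measurements.

```python
# Rough sketch: does splitting the data across workers pay off once each iteration
# also has to move the model (e.g. gradients) over the network? Illustrative numbers only.

def iteration_time(n_examples, n_workers, examples_per_sec=1e5,
                   model_bytes=100e6, network_bytes_per_sec=1e9):
    compute = n_examples / (n_workers * examples_per_sec)
    # crude all-reduce-style cost: each worker exchanges the full model once per iteration
    communicate = 0.0 if n_workers == 1 else 2 * model_bytes / network_bytes_per_sec
    return compute + communicate

for workers in (1, 4, 16, 64):
    print(workers, "workers:", round(iteration_time(1e9, workers), 2), "s/iteration")
```

With small data the communication term dominates and more workers can make each iteration slower; with large enough data the compute term dominates and distribution wins.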
Example: Uncollapsed Gibbs sampler for LDA
(more details here)
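For intuition, here is a single-machine numpy sketch of one sweep of an uncollapsed Gibbs sampler for LDA (not the implementation referenced above). Because the document-topic matrix theta and the topic-word matrix phi are kept explicit, the per-token topic draws are conditionally independent given (theta, phi), which is what makes this sampler easy to distribute across documents.

```python
# Sketch of one sweep of an uncollapsed Gibbs sampler for LDA (numpy, single machine).
# docs: list of word-id arrays; z: list of topic-assignment arrays (same shapes as docs);
# theta: D x K document-topic matrix; phi: K x V topic-word matrix.
import numpy as np

def gibbs_sweep(docs, z, theta, phi, alpha=0.1, beta=0.01, rng=np.random):
    K, V = phi.shape
    # 1. Resample the topic assignment of every token.
    #    Given (theta, phi) this loop is embarrassingly parallel over documents.
    for d, words in enumerate(docs):
        p = theta[d][:, None] * phi[:, words]      # K x len(words) unnormalized posteriors
        p /= p.sum(axis=0)
        z[d] = np.array([rng.choice(K, p=p[:, i]) for i in range(len(words))])
    # 2. Resample theta and phi from their Dirichlet posteriors given the counts.
    doc_topic = np.zeros((len(docs), K))
    topic_word = np.zeros((K, V))
    for d, words in enumerate(docs):
        np.add.at(doc_topic[d], z[d], 1)
        np.add.at(topic_word, (z[d], words), 1)
    theta = np.array([rng.dirichlet(alpha + row) for row in doc_topic])
    phi = np.array([rng.dirichlet(beta + row) for row in topic_word])
    return z, theta, phi
```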
Design production code to be experimentation-friendly
Example development process (diagram): Idea → Data → Offline modeling and iteration (R, Python, MATLAB, …) → Final model → Re-implementation in the production system (Java, C++, …) → Production environment (A/B test) → Actual output. Along the way: missing post-processing logic, performance issues, code discrepancies, data discrepancies.
Avoid dual implementations
Shared Engine (diagram): a single shared engine backs both the experiment code and the production code, and runs in both the Experiment and Production environments (sketch below).
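A minimal sketch of the shared-engine idea: the scoring and post-processing logic lives in one module that is imported by both the offline experiment harness and the online production path, so there is nothing to re-implement and nothing to drift. All names here are illustrative assumptions, not Netflix code.

```python
# Sketch of a shared engine used by both experiment and production code (names illustrative).

class RankingEngine:
    """Single implementation of scoring + post-processing, shared by offline and online."""
    def __init__(self, model):
        self.model = model

    def rank(self, member_features, candidates):
        scored = [(self.model.score(member_features, c), c) for c in candidates]
        return [c for _, c in sorted(scored, key=lambda sc: sc[0], reverse=True)]

# Offline: replay historical sessions through the same engine to compute offline metrics.
def offline_evaluate(engine: RankingEngine, sessions):
    return [engine.rank(s.member_features, s.candidates) for s in sessions]

# Online: the production request handler calls the very same engine.
def handle_request(engine: RankingEngine, request):
    return engine.rank(request.member_features, request.candidates)
```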
To be continued...
We’re hiring!
Yves Raimond (@moustaki)

Paris ML meetup
