Running
Machine Learning Applications
In Production
Sam BESSALAH
@samklr
Might work well for Kaggle!
But Kaggle isn't real-world machine learning!
In Real Life
- Trade-off: accuracy vs. interpretability vs. speed vs. infrastructure constraints (illustrative sketch below)
- Interpretability and speed often beat accuracy
- Most of the time, Kaggle is a feature engineering contest
- Contest-oriented vs. real product impact
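As a rough illustration of that trade-off (not from the talk), here is a minimal sketch comparing a linear model with a gradient-boosted ensemble; scikit-learn and the synthetic dataset are assumptions made for the example:

```python
# Hypothetical illustration of the accuracy vs. interpretability vs. speed trade-off.
# Synthetic data and scikit-learn are assumptions, not part of the original talk.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, n_features=40, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [("logistic_regression", LogisticRegression(max_iter=1000)),
                    ("gradient_boosting", GradientBoostingClassifier())]:
    t0 = time.perf_counter()
    model.fit(X_tr, y_tr)
    fit_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    acc = model.score(X_te, y_te)
    pred_s = time.perf_counter() - t0

    # The linear model is usually faster and easier to explain (coefficients);
    # the ensemble often buys a bit of accuracy at the cost of both.
    print(f"{name}: accuracy={acc:.3f} fit={fit_s:.2f}s predict={pred_s:.3f}s")
```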
But in real life … things are less obvious
[Diagram: Data Engineers (Data Pipeline) → Data Scientists / ML Engineers → Application Developers (App)]
Innovation is often (wrongly?) thought to be here …
http://www.slideshare.net/jssm1th/an-architecture-for-agile-machine-learning-in-realtime-applications
@josh_wills
Production Requirements:
- Flexibility and agility
- Scalability and performance
- Enable real-time decision making, sometimes at huge QPS with sub-second latency
- Security
@josh_wills
Machine Learning as a Software Problem
- Most ML development patterns lead to software design anti-patterns
- Dependencies in code creep into model dependencies on data
- Wasteful use of data, since most ML model selection requires multiple versions of the data; hence the instability of the data, and of the prediction services
- Breaks system isolation, leading to unmaintainable stacks (see the interface sketch below)
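One way to read the isolation point: keep application code coupled to a small, stable prediction interface rather than to a concrete model and its feature code. A minimal sketch, with illustrative names that are not from the talk:

```python
# Hypothetical sketch: hide the concrete model and its feature code behind a
# small, versioned interface so application code never imports model internals.
from dataclasses import dataclass
from typing import Protocol, Sequence


class Scorer(Protocol):
    """The only contract application code is allowed to depend on."""
    version: str
    def score(self, features: Sequence[float]) -> float: ...


@dataclass
class LinearScorer:
    """One concrete implementation; can be swapped without touching callers."""
    version: str
    weights: Sequence[float]
    bias: float = 0.0

    def score(self, features: Sequence[float]) -> float:
        return self.bias + sum(w * x for w, x in zip(self.weights, features))


def handle_request(scorer: Scorer, features: Sequence[float]) -> dict:
    # The application only sees the interface and the model version.
    return {"model_version": scorer.version, "score": scorer.score(features)}


print(handle_request(LinearScorer(version="v2", weights=[0.4, -1.2]), [1.0, 0.5]))
```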
In Production, Machine Learning is a
Software and System Problem.
Treat it accordingly!
Deployment / Model Serving
The Missing Part in ML
- Model serving is often ignored, or left to back-end engineers to implement as they see fit.
- More often it involves serving an API or a service that exposes the predict function. But that is often not enough (minimal example below).
- Scaling the serving software can become problematic for the accuracy of the model.
- How many models are you serving?
- Are you running something else alongside?
- Are you updating your model in real time?
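A minimal sketch of the "serve an API that does predict" pattern, assuming Flask and a pickled scikit-learn model on disk; the path, port and payload format are illustrative assumptions:

```python
# Hypothetical minimal prediction service; model path and field names are assumptions.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Loaded once at startup; real deployments also need versioning, warm-up,
# monitoring and a strategy for updating the model without downtime.
with open("model.pkl", "rb") as f:
    MODEL = pickle.load(f)


@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]   # e.g. [[1.2, 0.4, ...]]
    prediction = MODEL.predict(features).tolist()
    return jsonify({"prediction": prediction})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```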
Example: Airbnb
- Trained models are stored as PMML files
- They serve their models via Openscoring (sketch of the flow below)
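A hedged sketch of that flow, assuming sklearn2pmml for the export and Openscoring's default REST layout for deployment and scoring; the model name and the x1..x4 field names are assumptions, so check the respective docs:

```python
# Hypothetical PMML + Openscoring flow; endpoints and field names are assumptions.
import requests
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

# 1. Train and export the model as PMML.
X, y = load_iris(return_X_y=True)
pipeline = PMMLPipeline([("classifier", DecisionTreeClassifier())])
pipeline.fit(X, y)
sklearn2pmml(pipeline, "iris.pmml")

# 2. Deploy it to a running Openscoring server (assumed default REST layout).
with open("iris.pmml", "rb") as f:
    requests.put("http://localhost:8080/openscoring/model/iris",
                 data=f.read(), headers={"Content-Type": "application/xml"})

# 3. Score one record through the served model (argument names depend on the PMML).
resp = requests.post("http://localhost:8080/openscoring/model/iris",
                     json={"arguments": {"x1": 5.1, "x2": 3.5, "x3": 1.4, "x4": 0.2}})
print(resp.json())
```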
PMML?
- Might be the solution for some (most?) cases
- Supports many models, but lacks support for many others
- Fails to capture the evolution of your modeling process: transformations, re-encoding, etc.
- Better suited for exporting models to other systems than for serving user-facing machine learning products
- And … XML?? Really????
Model Versioning - Packaging
- You usually don’t serve only one model, but many more, especially when running experiments.
- You should strive to package your models in a versioned way.
- Git is awesome, but not appropriate for live model serving.
- Build a model repository or a model index (sketch below).
- I usually use a fast KV store or an advanced data store to save my models.
- Build a service to manage your models (a Model Manager) responsible for evaluating and updating them.
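A minimal sketch of such a model index on top of Redis (Redis and the key layout are assumptions; any fast KV store would do):

```python
# Hypothetical versioned model store on Redis; the key layout is an assumption.
import pickle
import time

import redis

r = redis.Redis(host="localhost", port=6379)


def publish_model(name: str, model) -> int:
    """Store a new immutable version and move the 'current' pointer to it."""
    version = r.incr(f"models:{name}:latest_version")
    r.set(f"models:{name}:{version}:blob", pickle.dumps(model))
    r.set(f"models:{name}:{version}:created_at", time.time())
    r.set(f"models:{name}:current", version)   # what the serving layer reads
    return version


def load_current_model(name: str):
    """Fetch whatever version the pointer currently designates."""
    version = int(r.get(f"models:{name}:current"))
    return version, pickle.loads(r.get(f"models:{name}:{version}:blob"))
```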
TensorFlow Serving
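With TensorFlow Serving, the trained model is exported as a SavedModel into a version-numbered directory and served by the TF Serving binary; here is a hedged client sketch against its REST API, where the model name "clicks" and the inputs are assumptions:

```python
# Hypothetical client call against TensorFlow Serving's REST API
# (default port 8501; the model name "clicks" and the inputs are assumptions).
import requests

payload = {"instances": [[0.2, 1.5, 3.1], [0.9, 0.1, 2.2]]}
resp = requests.post(
    "http://localhost:8501/v1/models/clicks:predict",
    json=payload,
    timeout=1.0,   # serving is latency-sensitive; fail fast
)
print(resp.json()["predictions"])
```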
Serialization
- Remember PMML?
- In Big Data, data has a schema and proper schema evolution. Why not models?
- Lots to choose from: Protobuf, Avro
- Use a binary schema to represent and version your models (sketch below)
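A minimal sketch of that idea using an Avro schema for model artifacts (fastavro and the field layout are assumptions; Protobuf would work the same way):

```python
# Hypothetical Avro schema for versioned model artifacts; fastavro is assumed.
import io
import pickle

from fastavro import parse_schema, reader, writer
from sklearn.linear_model import LogisticRegression

schema = parse_schema({
    "name": "ModelArtifact",
    "type": "record",
    "fields": [
        {"name": "model_name", "type": "string"},
        {"name": "version", "type": "int"},
        {"name": "framework", "type": "string"},
        {"name": "blob", "type": "bytes"},   # serialized model payload
    ],
})

model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])
record = {"model_name": "churn", "version": 3,
          "framework": "sklearn", "blob": pickle.dumps(model)}

buf = io.BytesIO()
writer(buf, schema, [record])        # the schema travels with the data
buf.seek(0)
restored = next(reader(buf))
print(restored["model_name"], restored["version"])
```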
Evaluation
- Business metrics often differ from core model metrics: there is a trade-off between long-term and short-term metrics
- Hyperparameters
- A/B testing and the multi-armed bandit problem (sketch below)
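As an illustration of the bandit approach to comparing model versions, here is a minimal epsilon-greedy sketch (the model names, reward signal and click rates are made up):

```python
# Hypothetical epsilon-greedy bandit routing traffic between two model versions.
import random


class EpsilonGreedyRouter:
    def __init__(self, arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {arm: 0 for arm in arms}
        self.rewards = {arm: 0.0 for arm in arms}

    def choose(self) -> str:
        if random.random() < self.epsilon:              # explore
            return random.choice(list(self.counts))
        # exploit: the arm with the best observed average reward so far
        return max(self.counts, key=lambda a:
                   self.rewards[a] / self.counts[a] if self.counts[a] else 0.0)

    def update(self, arm: str, reward: float) -> None:
        self.counts[arm] += 1
        self.rewards[arm] += reward


router = EpsilonGreedyRouter(["model_v1", "model_v2"])
for _ in range(1000):
    arm = router.choose()
    clicked = random.random() < (0.05 if arm == "model_v1" else 0.08)  # fake feedback
    router.update(arm, 1.0 if clicked else 0.0)
print(router.counts)   # traffic should drift toward the better-performing model
```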
Hyperparameters
Netflix
A/B Testing: Multi-armed Bandit
Dataiku
Experiments
Reproducibility
- How do you keep track of the data used for training? (a minimal metadata sketch below)
- Are notebooks enough?
- Jupyter notebooks, Spark Notebooks, Zeppelin, etc.
- Need for an end-to-end solution. Not perfect, but a workable one.
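One workable, if imperfect, approach is to store a fingerprint of the training data next to the model; a minimal sketch where the paths and fields are illustrative assumptions:

```python
# Hypothetical sketch: record a fingerprint of the training data next to the model
# so a given model version can be traced back to the exact data it was trained on.
import hashlib
import json
import time


def fingerprint_file(path: str) -> str:
    """SHA-256 of a (possibly large) file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


run_metadata = {
    "model_name": "churn",
    "trained_at": time.time(),
    "training_data": "data/train_2016_10.csv",                  # illustrative path
    "training_data_sha256": fingerprint_file("data/train_2016_10.csv"),
    "git_commit": "<commit of the training code>",              # left to the build system
    "hyperparameters": {"max_depth": 6, "eta": 0.1},
}

with open("churn_v3.metadata.json", "w") as f:
    json.dump(run_metadata, f, indent=2)
```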
I forgot many things
- Monitoring (drift sketch below)
- Pipeline tuning (one model is often fed into another)
- RPC over REST for fast model serving?
- How to deal with heterogeneous systems?
- Do you really have to distribute your processing?
- Is more data better than smartly tuned algorithms?
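On the monitoring point, a minimal sketch of one common tactic: comparing the live score distribution against what was seen at training time. The thresholds and window size are assumptions:

```python
# Hypothetical monitoring hook: compare the live score distribution against the
# distribution observed at training time to catch drift between model and data.
import statistics
from collections import deque


class ScoreMonitor:
    def __init__(self, train_mean: float, train_stdev: float, window: int = 1000):
        self.train_mean = train_mean
        self.train_stdev = train_stdev
        self.recent = deque(maxlen=window)

    def record(self, score: float) -> None:
        self.recent.append(score)

    def drifted(self, tolerance: float = 3.0) -> bool:
        if len(self.recent) < self.recent.maxlen:
            return False                      # not enough live data yet
        live_mean = statistics.fmean(self.recent)
        # Flag when the live mean wanders too far from what training saw.
        return abs(live_mean - self.train_mean) > tolerance * self.train_stdev
```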