Potential reasons include:
◎ Technically infeasible or poorly scoped
◎ Never make the leap to production
◎ Unclear success criteria (metrics)
◎ Poor team management
This talk aims to be an engineering
guideline for building
production-level machine learning
systems that will be deployed in
real-world applications.
Important Note:
It is important to understand the state of the art in your
domain:
Why?
◎ Helps to understand what is possible
◎ Helps to know what to try next
Important factors to consider when defining and
prioritizing ML projects:
High Impact
◎ Complex parts of your pipeline
◎ Where "cheap prediction" is
valuable
◎ Where automating a complicated
manual process is valuable
Low Cost
◎ Cost is driven by:
○ Data availability
○ Performance requirements:
costs tend to scale super-linearly
in the accuracy requirement
○ Problem difficulty
2.1 Data Sources
◎ Supervised deep learning requires a lot of labeled data
◎ Labeling your own data is costly!
◎ Here are some resources for data:
○ Open source data (good to start with, but not an
advantage)
○ Data augmentation (a MUST for computer vision, an
option for NLP)
○ Synthetic data (almost always worth starting with, esp.
in NLP)
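The augmentation option above can be sketched in a few lines. This is a toy, dependency-free illustration (function name and the flip transforms are illustrative choices; real pipelines would use a library such as torchvision or albumentations): each call may return a differently transformed copy of an image, multiplying the effective dataset size.

```python
import random

def augment(image, seed=None):
    """Randomly flip a 2-D image (a list of rows) left-right and/or
    up-down -- two of the cheapest label-preserving augmentations."""
    rng = random.Random(seed)
    out = [row[:] for row in image]      # never mutate the original
    if rng.random() < 0.5:               # horizontal flip
        out = [row[::-1] for row in out]
    if rng.random() < 0.5:               # vertical flip
        out = out[::-1]
    return out

# twenty draws from a tiny 2x2 "image" yield several distinct variants
img = [[1, 2], [3, 4]]
variants = {tuple(map(tuple, augment(img, seed=s))) for s in range(20)}
```

Because every transform preserves the label, each variant is a free extra training example.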
2.2 Data Labeling
◎ Requires: a separate software stack (labeling platforms),
temporary labor, and quality control (QC)
◎ Sources of labor for labeling:
○ Crowdsourcing (Mechanical Turk): cheap and scalable,
less reliable, needs QC
○ Hiring own annotators: less QC needed, expensive,
slow to scale
○ Data labeling service companies
◎ Labeling platforms
2.3 Data Storage
◎ Data storage options:
○ Object store: Store binary data (images, sound files,
compressed texts)
○ Database: Store metadata (file paths, labels, user
activity, etc.)
○ Data Lake: Aggregate features that are not obtainable
from a database (e.g. logs)
○ Feature Store: store, access, and share machine
learning features
◎ Suggestion: At training time, copy data into a local or
networked filesystem (NFS)
2.4 Data Versioning
◎ It's a "MUST" for deployed ML models:
Deployed ML models are part code, part data. No data
versioning means no model versioning.
◎ Data versioning platforms
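The "part code, part data" point can be made concrete with a tiny sketch. This is a toy stand-in for a versioning platform such as DVC (the helper names are illustrative): content-hash the training data so every model records exactly which data it was trained on.

```python
import hashlib

def dataset_version(path):
    """Content-hash a data file; identical data => identical version,
    any change => a new version (a toy stand-in for DVC-style tools)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()[:12]

def model_card(model_name, code_rev, data_path):
    """A deployed model's version is part code, part data: record both."""
    return {"model": model_name, "code": code_rev,
            "data": dataset_version(data_path)}
```

With this, two models trained from the same git revision but different data get distinguishable versions.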
2.5 Data Processing
◎ Training data for production models may come from
different sources.
◎ There are dependencies between tasks, each needs to be
kicked off after its dependencies are finished.
◎ Makefiles do not scale; 'workflow managers' become
essential in this regard.
◎ Workflow orchestration
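The dependency-ordering idea can be sketched with the standard library. This is a toy stand-in for a workflow manager such as Airflow (the task names and `run_pipeline` helper are illustrative): each task is kicked off only after its dependencies have finished.

```python
from graphlib import TopologicalSorter

def run_pipeline(tasks, deps):
    """Run each task only after its dependencies have finished.
    `tasks` maps name -> callable(results); `deps` maps name -> set
    of upstream task names."""
    order = list(TopologicalSorter(deps).static_order())
    results = {}
    for name in order:
        results[name] = tasks[name](results)  # upstream outputs flow in
    return order, results

# a tiny three-stage training pipeline: ingest -> clean -> train
tasks = {
    "ingest": lambda r: [3, 1, 2],
    "clean":  lambda r: sorted(r["ingest"]),
    "train":  lambda r: sum(r["clean"]) / len(r["clean"]),
}
deps = {"clean": {"ingest"}, "train": {"clean"}}
order, results = run_pipeline(tasks, deps)
```

Real workflow managers add what this sketch lacks: scheduling, retries, distributed execution, and monitoring.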
3.1 Software Engineering
◎ Winner language: Python
◎ Editors:
○ VS Code, Pycharm
○ Notebooks: Jupyter Notebook, JupyterLab, nteract
○ Streamlit: Interactive data science tool with applets
◎ Compute recommendations
○ For individuals or startups: Use GPU PC or buy shared
servers or use cloud instances
○ For large companies: Use cloud instances with proper
provisioning and handling of failures
3.2 Resource Management
◎ Allocating free resources to programs
◎ Resource management options:
○ Old school cluster job scheduler
○ Docker + Kubernetes
○ Kubeflow
○ Polyaxon (paid features)
3.6 Distributed Training
◎ Data parallelism: use it when iteration time is too long
(both TensorFlow and PyTorch support it)
◎ Model parallelism: use it when the model does not fit on a single GPU
◎ Solutions
○ Horovod
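The data-parallel idea can be shown in miniature. A hedged sketch (the shards run sequentially here; real setups run one shard per GPU, and the function name is illustrative): split the batch across workers, compute per-shard gradients, then average them, which is the same reduction Horovod's all-reduce performs.

```python
def data_parallel_step(grad_fn, batch, n_workers):
    """Shard the batch across workers, compute each shard's gradient,
    and average -- the core of synchronous data parallelism."""
    shards = [batch[i::n_workers] for i in range(n_workers)]
    grads = [grad_fn(shard) for shard in shards if shard]
    return sum(grads) / len(grads)

# toy "gradient": just the mean of the shard's values
grad_fn = lambda shard: sum(shard) / len(shard)
g = data_parallel_step(grad_fn, [1.0, 2.0, 3.0, 4.0], n_workers=2)
```

With equal-sized shards the averaged result matches the single-worker gradient, which is why adding workers shortens iteration time without changing the optimization.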
4.2 Web Deployment
◎ Consists of a Prediction System and a Serving System
◎ Serving options:
○ Deploy to VMs, scale by adding instances
○ Deploy as containers, scale via orchestration
◎ Model serving:
○ Specialized web deployment for ML models
○ Frameworks:
◉ TensorFlow Serving, Clipper (Berkeley), Seldon
◎ Decision making: CPU or GPU?
◎ (Bonus) Deploying Jupyter Notebooks: Use Kubeflow
Fairing
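The prediction-system/serving-system split can be sketched in a few lines. This is a minimal illustration under assumed names (`PredictionSystem`, `serving_handler`); a real deployment would put the handler behind a web framework or a model server such as TensorFlow Serving.

```python
import json

class PredictionSystem:
    """The model side: wrap the model behind a predict() interface."""
    def __init__(self, model):
        self.model = model
    def predict(self, features):
        return {"prediction": self.model(features)}

def serving_handler(system, request_body):
    """The serving side: parse the request body, delegate to the
    prediction system, and serialize the response."""
    features = json.loads(request_body)["features"]
    return json.dumps(system.predict(features))

system = PredictionSystem(model=sum)   # stand-in model for the sketch
response = serving_handler(system, '{"features": [1, 2, 3]}')
```

Keeping the two sides separate is what lets you scale them independently, e.g. many serving instances in front of a few GPU-backed prediction workers.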
4.3 Service Mesh and Traffic Routing
◎ The transition from monolithic applications to a
distributed microservice architecture can be challenging.
◎ A Service mesh (consisting of a network of microservices)
reduces the complexity of such deployments, and eases the
strain on development teams.
○ Istio: a service mesh that eases creation of a network of
deployed services with load balancing,
service-to-service authentication, and monitoring, with
little or no change to service code.
4.4 Monitoring
◎ Purpose of monitoring:
○ Alerts for downtime, errors, and distribution shifts
○ Catching service and data regressions
◎ Kiali: an observability console for Istio with service mesh
configuration capabilities. It answers these questions: How
are the microservices connected? How are they
performing?
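The distribution-shift alert above can be sketched as a simple statistical check. A crude illustration (the z-score test and threshold are assumed choices, not a production recipe): alert when a live feature's mean drifts too many standard errors from its training mean.

```python
from statistics import mean, stdev

def drift_alert(train_values, live_values, z_threshold=3.0):
    """Flag a distribution shift when the live mean of a feature
    drifts more than `z_threshold` standard errors from the mean
    seen at training time."""
    mu, sigma = mean(train_values), stdev(train_values)
    se = sigma / len(live_values) ** 0.5   # standard error of the mean
    return abs(mean(live_values) - mu) / se > z_threshold

train = [float(x) for x in range(100)]
```

Real monitoring would run such checks per feature on a schedule and wire the alerts into the same downtime/error channels.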
4.5 Deploying on Embedded and Mobile Devices
◎ Main challenge: memory footprint and compute constraints
◎ Solutions:
○ Quantization
○ Reduced model size (MobileNets)
○ Knowledge Distillation
◎ Embedded and Mobile Frameworks:
○ TensorFlow Lite, PyTorch Mobile, Core ML, FRITZ, ML Kit
◎ Model Conversion:
○ Open Neural Network Exchange (ONNX): open-source
format for deep learning models
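The quantization option above can be shown from scratch. A hedged sketch of uniform affine quantization (function names are illustrative; a real conversion would go through a toolchain such as TensorFlow Lite's converter): map float weights onto 8-bit integer levels, so float32 storage shrinks roughly 4x at the cost of a small rounding error.

```python
def quantize(weights, bits=8):
    """Uniform affine quantization: map floats in [lo, hi] onto
    2**bits integer levels."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (2 ** bits - 1) or 1.0  # guard: constant weights
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize(q, scale, lo):
    """Recover approximate floats from the integer levels."""
    return [level * scale + lo for level in q]

q, scale, lo = quantize([-1.0, -0.25, 0.5, 1.0])
restored = dequantize(q, scale, lo)
```

The rounding error per weight is at most half a quantization step, which is why 8-bit inference usually costs little accuracy.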
4.6 All-in-one solutions
◎ TensorFlow Extended (TFX)
◎ Michelangelo (Uber)
◎ Google Cloud AI Platform
◎ Amazon SageMaker
◎ Neptune
◎ FLOYD
◎ Paperspace
◎ Determined AI
◎ Domino data lab