Machine Learning in Production
Hello!
I am Frederick Apina. I am here because I love to give presentations.
You can email me at: fred.apina@gmail.com
“Deploying deep learning models in production can be challenging: it goes far beyond training models with good performance.”
Fun Fact: 85% of AI projects fail.
Potential reasons include:
◎ Technically infeasible or poorly scoped
◎ Never make the leap to production
◎ Unclear success criteria (metrics)
◎ Poor team management
“This talk aims to be an engineering guideline for building production-level machine learning systems that will be deployed in real-world applications.”
1.
ML Project Lifecycle
Important Note:
It is important to understand the state of the art in your domain.
Why?
◎ It helps you understand what is possible
◎ It helps you know what to try next
Important factors to consider when defining and prioritizing ML projects:
High Impact
◎ Complex parts of your pipeline
◎ Where "cheap prediction" is valuable
◎ Where automating a complicated manual process is valuable
Low Cost
◎ Cost is driven by:
○ Data availability
○ Performance requirements: costs tend to scale super-linearly with the accuracy requirement
○ Problem difficulty
2.
Data Management
2.1 Data Sources
◎ Supervised deep learning requires a lot of labeled data
◎ Labeling your own data is costly!
◎ Here are some resources for data:
○ Open-source data (good to start with, but not a competitive advantage)
○ Data augmentation (a MUST for computer vision, an option for NLP; see the sketch below)
○ Synthetic data (almost always worth starting with, esp. in NLP)
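For the data-augmentation bullet above, a minimal sketch with torchvision (an assumed choice; any augmentation library works similarly):

```python
# Random augmentations applied on the fly at training time.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),            # random crop, then resize
    transforms.RandomHorizontalFlip(p=0.5),       # mirror half the images
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
# Hand the transform to a dataset, e.g.:
# torchvision.datasets.ImageFolder("data/train", transform=train_transform)
```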
2.2 Data Labeling
◎ Requires a separate software stack (labeling platforms), temporary labor, and QC
◎ Sources of labor for labeling:
○ Crowdsourcing (Mechanical Turk): cheap and scalable, less reliable, needs QC (see the aggregation sketch below)
○ Hiring your own annotators: less QC needed, expensive, slow to scale
○ Data labeling service companies
◎ Labeling platforms
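One simple QC baseline for crowdsourced labels is to collect several labels per item and keep the majority vote, flagging low-agreement items for re-labeling. A minimal sketch in plain Python:

```python
# Majority-vote label aggregation with an agreement threshold.
from collections import Counter, defaultdict

def aggregate_labels(annotations, min_agreement=0.7):
    """annotations: iterable of (item_id, label) pairs, several per item."""
    by_item = defaultdict(list)
    for item_id, label in annotations:
        by_item[item_id].append(label)

    accepted, needs_review = {}, []
    for item_id, labels in by_item.items():
        top_label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) >= min_agreement:
            accepted[item_id] = top_label
        else:
            needs_review.append(item_id)   # low agreement: send back for QC
    return accepted, needs_review
```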
2.3 Data Storage
◎ Data storage options:
○ Object store: store binary data (images, sound files, compressed texts)
○ Database: store metadata (file paths, labels, user activity, etc.)
○ Data lake: aggregate features that are not obtainable from the database (e.g. logs)
○ Feature store: store, access, and share machine learning features
◎ Suggestion: at training time, copy data into a local or networked filesystem (NFS); see the sketch below
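A minimal sketch of how these pieces fit together at training time: metadata (paths and labels) lives in a database, binaries live in an object store, and data is copied to local disk before training. The bucket, table, and column names are hypothetical:

```python
import os
import sqlite3
import boto3   # AWS S3 client, assuming an S3-style object store

s3 = boto3.client("s3")
db = sqlite3.connect("metadata.db")
os.makedirs("local_cache", exist_ok=True)

# Pull each sample's binary blob down next to its label.
for key, label in db.execute("SELECT s3_key, label FROM samples"):
    local_path = os.path.join("local_cache", os.path.basename(key))
    if not os.path.exists(local_path):
        s3.download_file("training-data-bucket", key, local_path)
    # ... hand (local_path, label) to the training pipeline
```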
2.4 Data Versioning
◎ It's a "MUST" for deployed ML models: deployed ML models are part code, part data. No data versioning means no model versioning.
◎ Data versioning platforms (see the sketch below)
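One common platform is DVC, which pins datasets to git revisions so a model version can always be traced back to its exact training data. A minimal sketch with DVC's Python API (repo URL, file path, and tag are hypothetical):

```python
import dvc.api

# Open the data file exactly as it existed at git tag "v1.2",
# so retraining model v1.2 always sees the same bytes.
with dvc.api.open(
    "data/train.csv",
    repo="https://github.com/example/ml-project",
    rev="v1.2",
) as f:
    train_rows = f.readlines()
```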
2.5 Data Processing
◎ Training data for production models may come from different sources.
◎ There are dependencies between tasks; each needs to be kicked off after its dependencies finish.
◎ Makefiles are not scalable; workflow managers become essential in this regard.
◎ Workflow orchestration (one common option sketched below)
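A minimal sketch of such a dependency-aware pipeline in Apache Airflow (one common workflow manager; task bodies are placeholders):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():   pass   # pull raw data from its sources
def transform(): pass   # clean and featurize
def train():     pass   # fit the model

with DAG("training_pipeline",
         start_date=datetime(2024, 1, 1),
         schedule_interval="@daily",
         catchup=False) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_train = PythonOperator(task_id="train", python_callable=train)

    # Each task is kicked off only after its dependencies finish:
    t_extract >> t_transform >> t_train
```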
3.
Development, Training
and Evaluation
3.1 Software Engineering
◎ Winner language: Python
◎ Editors:
○ VS Code, PyCharm
○ Notebooks: Jupyter Notebook, JupyterLab, nteract
○ Streamlit: interactive data science tool with applets (tiny sketch below)
◎ Compute recommendations:
○ For individuals or startups: use a GPU PC, buy shared servers, or use cloud instances
○ For large companies: use cloud instances with proper provisioning and handling of failures
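As a taste of the Streamlit bullet above, a tiny applet sketch (run with `streamlit run app.py`; `predict` is a hypothetical model call):

```python
import streamlit as st

st.title("Model demo")
threshold = st.slider("Decision threshold", 0.0, 1.0, 0.5)
uploaded = st.file_uploader("Upload an image")
if uploaded is not None:
    st.image(uploaded)
    score = predict(uploaded)   # hypothetical: load model, return probability
    st.write("Positive" if score > threshold else "Negative")
```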
3.2 Resource Management
◎ Allocating free resources to programs
◎ Resource management options:
○ Old-school cluster job schedulers
○ Docker + Kubernetes
○ Kubeflow
○ Polyaxon (paid features)
3.3 DL Frameworks
3.4 Experiment Management
◎ Development, training, and evaluation strategy:
○ Always start simple
○ Experiment management tools (one sketched below):
◉ TensorBoard
◉ Comet
◉ Weights & Biases
◉ MLflow Tracking
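To make the tooling concrete, a minimal tracking sketch with MLflow (the other tools have similar APIs; `train_one_epoch` is a placeholder):

```python
import mlflow

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("lr", 3e-4)
    mlflow.log_param("batch_size", 64)
    for epoch in range(10):
        val_acc = train_one_epoch()                    # hypothetical training step
        mlflow.log_metric("val_acc", val_acc, step=epoch)
    mlflow.log_artifact("model.pt")                    # saved weights
```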
3.5 Hyperparameter Tuning
◎ Approaches (random search sketched below):
○ Grid search
○ Random search
○ Bayesian optimization
◎ Platforms:
○ Ray Tune
○ Katib
○ Hyperas
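A minimal random-search sketch with scikit-learn (the platforms above scale the same idea out across machines; `X_train`/`y_train` are assumed to exist):

```python
from scipy.stats import loguniform
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-3, 1e2)},  # log-scale search space
    n_iter=20,     # 20 random samples from the space
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```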
3.6 Distributed Training
◎ Data parallelism: use it when iteration time is too long (both TensorFlow and PyTorch support it)
◎ Model parallelism: use it when the model does not fit on a single GPU
◎ Solutions (sketched below):
○ Horovod
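A minimal data-parallel sketch with Horovod and PyTorch: each worker gets a shard of the data, and gradients are averaged across workers at each step. `model` and `dataset` are assumed to be defined; launch with e.g. `horovodrun -np 4 python train.py`:

```python
import torch
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())        # one GPU per process

# Each worker reads a disjoint shard of the dataset.
sampler = torch.utils.data.distributed.DistributedSampler(
    dataset, num_replicas=hvd.size(), rank=hvd.rank())
loader = torch.utils.data.DataLoader(dataset, batch_size=64, sampler=sampler)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
optimizer = hvd.DistributedOptimizer(           # all-reduces gradients each step
    optimizer, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)  # same init everywhere
```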
4.
Testing and Deployment
4.2 Web Deployment
◎ Consists of a prediction system and a serving system
◎ Serving options:
○ Deploy to VMs, scale by adding instances
○ Deploy as containers, scale via orchestration
◎ Model serving:
○ Specialized web deployment for ML models
○ Frameworks: TensorFlow Serving, Clipper (Berkeley), Seldon (client sketch below)
◎ Decision making: CPU or GPU?
◎ (Bonus) Deploying Jupyter Notebooks: use Kubeflow Fairing
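For a feel of the serving side, a minimal client sketch against TensorFlow Serving's REST API (model name, port, and input shape are hypothetical):

```python
import requests

payload = {"instances": [[1.0, 2.0, 3.0, 4.0]]}   # one input row
resp = requests.post(
    "http://localhost:8501/v1/models/my_model:predict",
    json=payload,
    timeout=1.0,   # fail fast instead of hanging on a stuck server
)
resp.raise_for_status()
print(resp.json()["predictions"])
```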
4.3 Service Mesh and Traffic Routing
◎ The transition from monolithic applications towards a distributed microservice architecture can be challenging.
◎ A service mesh (consisting of a network of microservices) reduces the complexity of such deployments and eases the strain on development teams.
○ Istio: a service mesh that eases creation of a network of deployed services with load balancing, service-to-service authentication, and monitoring, with few or no changes in service code.
4.4 Monitoring
◎ Purpose of monitoring:
○ Alerts for downtime, errors, and distribution shifts (a simple drift check is sketched below)
○ Catching service and data regressions
◎ Kiali: an observability console for Istio with service mesh configuration capabilities. It answers: how are the microservices connected, and how are they performing?
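A minimal sketch of a distribution-shift check: compare a feature's live values against a training-time reference sample with a two-sample Kolmogorov-Smirnov test (the `alert` hook is a placeholder):

```python
from scipy.stats import ks_2samp

def check_drift(reference, live, alpha=0.01):
    """reference, live: 1-D arrays of one input feature's values."""
    stat, p_value = ks_2samp(reference, live)
    if p_value < alpha:   # distributions differ significantly
        alert(f"input drift detected (KS={stat:.3f}, p={p_value:.1e})")  # hypothetical hook
```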
4.5 Deploying on Embedded and Mobile Devices
◎ Main challenge: memory footprint and compute constraints
◎ Solutions (quantization sketched below):
○ Quantization
○ Reduced model size (MobileNets)
○ Knowledge distillation
◎ Embedded and mobile frameworks:
○ TensorFlow Lite, PyTorch Mobile, Core ML, FRITZ, ML Kit
◎ Model conversion:
○ Open Neural Network Exchange (ONNX): an open-source format for deep learning models
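A minimal post-training quantization sketch with the TensorFlow Lite converter (the SavedModel path is hypothetical):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # enable weight quantization
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)   # typically ~4x smaller with 8-bit weights
```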
4.6 All-in-one Solutions
◎ TensorFlow Extended (TFX)
◎ Michelangelo (Uber)
◎ Google Cloud AI Platform
◎ Amazon SageMaker
◎ Neptune
◎ FLOYD
◎ Paperspace
◎ Determined AI
◎ Domino Data Lab
Are We Done?
Thanks!
Any questions?
You can find me at:
fred.apina@gmail.com