1 7 J U N E 2 0 2 1
S C A L I N G A I I N P R O D U C T I O N
U S I N G P Y T O R C H
G E E T A C H A U H A N
PyTorch Partner Engineering, Facebook AI
@ C H A U H A N G
MLOps World 2021
A G E N D A

0 1   C H A L L E N G E S   W I T H   M L   I N   P R O D U C T I O N

0 2   T O R C H S E R V E   O V E R V I E W

0 3   B E S T   P R A C T I C E S   F O R   P R O D U C T I O N   D E P L O Y M E N T
MLOps World 2021
P Y T O R C H C O M M U N I T Y G R O W T H
Source: https://paperswithcode.com/trends
MLOps World 2021
[Diagram: a production ML pipeline running in the cloud or on-prem, with preprocessing, application logic, and postprocessing around the model]

Key challenges: Performance · Ease of use · Cost efficiency · Deployment at scale
C H A L L E N G E S W I T H M L I N D E P L O Y M E N T
MLOps World 2021
INFERENCE AT SCALE

Deploying and managing models in production is difficult. Some of the pain points include:

• Loading and managing multiple models, on multiple servers or end devices

• Running pre-processing and post-processing code on prediction requests

• How to log, monitor and secure predictions

• What happens when you hit scale?
MLOps World 2021
TORCHSERVE
Easily deploy PyTorch models in production at scale


D E F A U L T   H A N D L E R S   F O R   C O M M O N   T A S K S

L O W   L A T E N C Y   M O D E L   S E R V I N G

W O R K S   W I T H   A N Y   M L   E N V I R O N M E N T
MLOps World 2021
• Default handlers for common use
cases (e.g., image segmentation,
text classification) along with
custom handlers support for other
use cases and a Model Zoo


• Multi-model serving, Model
versioning and ability to roll back
to an earlier version


• Automatic batching of individual
inferences across HTTP requests
• Logging including common
metrics, and the ability to
incorporate custom metrics


• Robust HTTP APIs: Management and Inference
[Architecture diagram: torch-model-archiver packages model .pth files into .mar archives (model1.mar … model5.mar) stored under <path>/model_store; torchserve --start launches the server, which loads and serves multiple models concurrently and exposes the Inference API (http://localhost:8080/ …), the Management API (http://localhost:8081/ …), and the Metrics API, along with logging and metrics]
TORCHSERVE
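A minimal sketch of talking to these endpoints from Python, assuming TorchServe is running locally on the default ports and that a model named my_model (a placeholder) has been registered:

# Minimal sketch; ports are the TorchServe defaults, model name is a placeholder.
import requests

# Inference API (default port 8080): health check
print(requests.get("http://localhost:8080/ping").json())

# Management API (default port 8081): list registered models, then describe one
print(requests.get("http://localhost:8081/models").json())
print(requests.get("http://localhost:8081/models/my_model").json())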
T O R C H S E R V E D E T A I L :


M O D E L H A N D L E R S
TorchServe has default model handlers that
perform boilerplate data transforms for
common cases:


• Image Classification


• Image Segmentation


• Object Detection


• Text Classification


You can also create custom model handlers
for any model and inference task.
import torch


class MyModelHandler(object):

    def __init__(self):
        self.initialized = False
        self.model = None
        self.device = None

    def initialize(self, context):
        # get GPU status & device handle
        # load model & supporting files (vocabularies etc.)
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        # self.model = torch.jit.load(..., map_location=self.device)
        self.initialized = True

    def preprocess(self, data):
        # put incoming data into a tensor
        # transform as needed for your model
        return data

    def inference(self, data):
        # do predictions
        return data

    def postprocess(self, output):
        # process inference output, e.g. extracting top K
        # package output for web delivery
        return output


_service = MyModelHandler()


def handle(data, context):
    # module-level entry point called by TorchServe
    if not _service.initialized:
        _service.initialize(context)
    if data is None:
        return None
    data = _service.preprocess(data)
    data = _service.inference(data)
    data = _service.postprocess(data)
    return data
M O D E L A R C H I V E
torch-model-archiver: a CLI tool for packaging all
model artifacts into a single deployment unit

• model checkpoints, or a model definition file
with state_dict

• TorchScript and eager mode support

• Extra files like vocab, config, index_to_name
mapping


torch-model-archiver \
  --model-name BERTSeqClassification_Torchscript \
  --version 1.0 \
  --serialized-file Transformer_model/traced_model.pt \
  --handler ./Transformer_handler_generalized.py \
  --extra-files "./setup_config.json,./Seq_classification_artifacts/index_to_name.json"


setup_config.json

{
  "model_name": "bert-base-uncased",
  "mode": "sequence_classification",
  "do_lower_case": "True",
  "num_labels": "2",
  "save_mode": "torchscript",
  "max_length": "150"
}


torchserve --start \
  --model-store model_store \
  --models <path-to model-file/s3-url/azure-blob-url>
https://github.com/pytorch/serve/tree/master/model-archiver#creating-a-model-archive
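A hedged sketch of calling the served model once the archive above is registered; it assumes the generalized Transformer handler accepts raw text in the request body:

# Sketch: send an inference request to the model registered above; the input
# text is an example and the handler's expected payload is an assumption.
import requests

url = "http://localhost:8080/predictions/BERTSeqClassification_Torchscript"
resp = requests.post(url, data="Bloomberg has decided to publish a new report on global markets.")
print(resp.status_code, resp.text)   # e.g. the predicted sequence label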
D Y N A M I C B A T C H I N G
Via Custom Handlers


• Model Configuration based


• batch_size: the maximum batch size

• max_batch_delay: the maximum time (ms) TorchServe waits
to receive batch_size requests before running inference

• (Coming soon) Batching support in default
handlers


curl localhost:8081/models/resnet-152

{
  "modelName": "resnet-152",
  "modelUrl": "https://s3.amazonaws.com/model-server/model_archive_1.0/examples/resnet-152-batching/resnet-152.mar",
  "runtime": "python",
  "minWorkers": 1,
  "maxWorkers": 1,
  "batchSize": 8,
  "maxBatchDelay": 10,
  "workers": [
    {
      "id": "9008",
      "startTime": "2019-02-19T23:56:33.907Z",
      "status": "READY",
      "gpu": false,
      "memoryUsage": 607715328
    }
  ]
}


https://github.com/pytorch/serve/blob/master/docs/batch_inference_with_ts.md
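A hedged sketch of how batch_size and max_batch_delay are typically supplied when registering a model through the Management API; the .mar URL is the one from the response above, and the worker count and delay values are examples:

# Sketch: register a model with batching parameters via the Management API.
import requests

params = {
    "url": "https://s3.amazonaws.com/model-server/model_archive_1.0/examples/resnet-152-batching/resnet-152.mar",
    "batch_size": 8,          # max requests aggregated into a single batch
    "max_batch_delay": 10,    # ms to wait for a full batch before running inference
    "initial_workers": 1,
}
print(requests.post("http://localhost:8081/models", params=params).json())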
M E T R I C S
Out-of-the-box metrics with the ability to extend


• CPU, Disk, Memory utilization


• Request type counts


• ts.metrics class for extension


• Types supported - Size, percentage, counter,
general metric


• Prometheus metrics support available


# Access context metrics as follows
metrics = context.metrics

# Create Dimension objects
from ts.metrics.dimension import Dimension

# Dimensions are name-value pairs
dim1 = Dimension(name, value)
# ...
dimN = Dimension(name_n, value_n)

# Add Distance as a generic metric
# dimensions = [dim1, dim2, dim3, ..., dimN]
metrics.add_metric('DistanceInKM', distance, 'km', dimensions=dimensions)

# Add Image size as a size metric
metrics.add_size('SizeOfImage', img_size, None, 'MB', dimensions)

# Add MemoryUtilization as a percentage metric
metrics.add_percent('MemoryUtilization', utilization_percent, None, dimensions)

# Create a counter with name 'LoopCount' and dimensions
metrics.add_counter('LoopCount', 1, None, dimensions)

# Log custom metrics
for metric in metrics.store:
    logger.info("[METRICS]%s", str(metric))


https://github.com/pytorch/serve/blob/master/docs/metrics.md
MLOps World 2021
RECENT FEATURES
+ Ensemble Model support, Captum Model Interpretability


+ Kubeflow Pipelines /KFServing Integration with Auto-scaling and Canary rollout on any cloud/on-prem


+ GCP Vertex AI Serverless pipelines


+ MLflow Integration




+ Prometheus Integration with Grafana


+ Multiple nodes on EC2, Autoscaling on SageMaker/EKS, AWS Inferentia support


+ MMF, NMT, DeepLabV3 new examples




Deployment models: Standalone · Primary/backup · Orchestration · Cloud vs. on-premises

Optimizations: Performance vs. latency · TorchScript profiling · Offline vs. real-time · Cost

Resilience: Robust endpoint · Auto-scaling · Canary deployments · A/B testing

Measurement: Metrics · Model performance · Interpretability · Feedback loop

Responsible AI: Fairness · Human-centered design
B E S T P R A C T I C E S F O R P R O D U C T I O N D E P L O Y M E N T S
MLOps World 2021
Fairness by design


• Measure skewness of data, model bias, data bias; identify relevant metrics


• Transparency, Explainable AI, inclusive design


Human-centered design


• Consider AI-driven decisions and their impact on people at the time of model design


• Provide the ability for human recourse vs. full automation – for example, avoid a mortgage-application
AI rejecting people of a certain category or race


• For computer vision models, measure results across demographics; for example, include support for different
skin tones and age groups
R E S P O N S I B L E A I
MLOps World 2021
• Build with performance vs. latency goals in mind


• Reduce size of the model: Quantization, pruning, mixed precision training


• Reduce latency: TorchScript model; use SnakeViz profiler


• Evaluate GPU vs. CPU for low latency


• Evaluate REST vs. gRPC for your prediction service
O P T I M I Z A T I O N S
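A small sketch of the first two bullets, assuming a placeholder Linear stack in place of a real Linear/LSTM-heavy model and an arbitrary output file name:

# Sketch: shrink a model with post-training dynamic quantization, then TorchScript it for serving.
import torch
import torch.nn as nn

# Placeholder model standing in for a Linear/LSTM-heavy network (e.g. a Transformer encoder)
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 2)).eval()

# Dynamic quantization: nn.Linear weights stored as int8, activations quantized on the fly
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# TorchScript the quantized model for lower-latency serving; the saved .pt file
# can be passed to torch-model-archiver via --serialized-file
example = torch.randn(1, 128)
traced = torch.jit.trace(quantized, example)
traced.save("model_quantized_traced.pt")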
MLOps World 2021
Quantization results (fp32 accuracy → int8 accuracy, change, technique, CPU inference speed-up):

• ResNet50 (Top-1, ImageNet): 76.1 → 75.9 (-0.2), post-training quantization, 2x speed-up (214 ms → 102 ms, Intel Skylake-DE)

• MobileNetV2 (Top-1, ImageNet): 71.9 → 71.6 (-0.3), quantization-aware training, 4x speed-up (75 ms → 18 ms, OnePlus 5, Snapdragon 835)

• Translate / FairSeq (BLEU, IWSLT 2014 de-en): 32.78 → 32.78 (0.0), dynamic quantization (weights only), 4x speed-up for the encoder (Intel Skylake-SE)
These models and more available on TorchHub - https://pytorch.org/hub/
QUANTIZATION
MLOps World 2021
B E R T


M O D E L


P R O F I L I N G


Eager Mode
MLOps World 2021
B E R T


M O D E L


P R O F I L I N G


Torchscript Mode


4x speedup
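A hedged sketch of this kind of profiling: comparing eager vs. TorchScript-traced BERT latency and writing a cProfile dump that SnakeViz can visualize; it assumes the HuggingFace transformers package, and the prompt, iteration count, and output file name are placeholders.

# Sketch: compare eager vs. TorchScript latency and dump a profile for SnakeViz.
import cProfile
import time
import torch
from transformers import BertModel, BertTokenizer   # assumed dependency

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", torchscript=True).eval()
inputs = tokenizer("TorchServe makes deployment easy", return_tensors="pt")

# Trace the model with example inputs (input_ids, attention_mask)
traced = torch.jit.trace(model, (inputs["input_ids"], inputs["attention_mask"]))

def run(m):
    with torch.no_grad():
        for _ in range(20):
            m(inputs["input_ids"], inputs["attention_mask"])

for name, m in [("eager", model), ("torchscript", traced)]:
    start = time.time()
    run(m)
    print(name, f"{(time.time() - start) / 20 * 1000:.1f} ms / iteration")

cProfile.run("run(traced)", "bert.prof")   # visualize with: snakeviz bert.prof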
MLOps World 2021
Offline vs. real-time predictions


• Offline: Dynamic batching


• Online: Async processing – push/poll


• Pre-computed predictions for certain elements


Cost optimizations


• Spot Instances for offline


• Autoscaling based on metrics, on-demand cluster


• Evaluate supported AI accelerators, such as AWS Inferentia, for a lower cost point


O P T I M I Z A T I O N S ( C O N T D . )
MLOps World 2021
[Matrix slide: deployment options across on-prem, cloud, and managed environments, by stage (develop/test, staging/experiments, production, large-scale production, hybrid cloud) – install from source, standalone Docker, Minikube, self-managed Docker, AWS CloudFormation, cloud VMs/containers, microservices behind an API gateway, AWS SageMaker endpoints (BYOC), EKS/AKS/GKE, AWS SageMaker / GCP AI Platform, serverless functions, GCP Vertex AI with canary rollouts, Kubernetes with Kubeflow/KFServing, MLflow/Kubeflow, Databricks Managed MLflow, primary/backup, ML microservices, autoscaling and canary rollouts]
D E P L O Y I N G M O D E L S I N P R O D U C T I O N
MLOps World 2021
Create a robust endpoint for serving, for example a SageMaker endpoint

Auto-scaling with orchestrated deployments, multi-node on EC2, and other scenarios

Canary deployments: test a new version of a model on a small subset of traffic before making it
the default

Shadow inference: deploy a new version of the model in parallel

A/B testing of different versions of the model
R E S I L I E N C E
MLOps World 2021
Define model performance metrics, such as accuracy, while designing the AI service;
these are use-case specific


Add custom metrics as appropriate


Use CloudWatch or Prometheus dashboards for monitoring model performance


Model interpretability analysis via Captum


Deploy with a feedback loop; if model accuracy drops over time or with a new version,
analyze issues such as concept drift, stale data, etc.
M E A S U R E M E N T
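A minimal sketch of feeding such a dashboard or feedback loop, assuming TorchServe's Prometheus-format metrics endpoint on the default port 8082:

# Sketch: scrape TorchServe's Prometheus-format metrics so they can feed a
# Grafana/CloudWatch dashboard or an accuracy/drift feedback loop.
import requests

metrics_text = requests.get("http://localhost:8082/metrics").text
for line in metrics_text.splitlines():
    # e.g. ts_inference_requests_total, ts_inference_latency_microseconds, ...
    if line.startswith("ts_"):
        print(line)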
MLOps World 2021
Understand: How might the product’s goals, its policy, and its implementation affect users from
different subgroups? Identify contextual definitions of fairness

Align: Stakeholder conversations to find consensus and outline measurement and mitigation plans

Measure: Analyze model performance, label bias, outcomes, and other relevant signals

Mitigate: Address observed issues in dataset, models, policies, etc.

Monitor: Monitor the effect of mitigations on subgroups, and ensure the fairness analysis holds as
the product adapts
FAIRNESS BY DESIGN
CAPTUM
[Example visualization: multimodal attribution – Text Contributions: 7.54, Image Contributions: 11.19, Total Contributions: 18.73]
S U P P O R T   F O R   A T T R I B U T I O N   A L G O R I T H M S
T O   I N T E R P R E T :


• Output predictions with respect to inputs


• Output predictions with respect to layers


• Neurons with respect to inputs


• Currently provides gradient & perturbation based
approaches (e.g. Integrated Gradients)
Model interpretability library for PyTorch
https://captum.ai/
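A minimal Integrated Gradients sketch, assuming a toy classifier in place of a real model:

# Sketch: attribute a toy classifier's prediction to its input features with Integrated Gradients.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 3)).eval()
inputs = torch.randn(1, 10, requires_grad=True)

ig = IntegratedGradients(model)
attributions, delta = ig.attribute(inputs, target=1, return_convergence_delta=True)
print(attributions)   # per-feature contribution to the class-1 score
print(delta)          # convergence error of the integral approximation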
MLOps World 2021
DYNABOARD & FLORES 101 WMT COMPETITION
http://www.statmt.org/wmt21/large-scale-multilingual-translation-task.html
https://github.com/facebookresearch/dynalab
https://dynabench.org/tasks/3#overall
MLOps World 2021
COMMUNITY PROJECTS
https://github.com/cceyda/torchserve-dashboard
https://github.com/Unity-Technologies/SynthDet
https://medium.com/pytorch/how-wadhwani-ai-uses-pytorch-to-empower-cotton-farmers-14397f4c9f2b
MLOps World 2021
FUTURE RELEASES
+ Improved memory and resource usage for better scalability


+ C++ Backend for lower latency


+ Enhanced profiling tools
• TorchServe: https://github.com/pytorch/serve


• Management API: https://github.com/pytorch/serve/blob/master/docs/management_api.md


• Inference API: https://github.com/pytorch/serve/blob/master/docs/inference_api.md


• Language Translation Ensemble example: https://github.com/pytorch/serve/tree/master/examples/Workflows/nmt_tranformers_pipeline


• BERT Model example: https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers


• Model Zoo: https://github.com/pytorch/serve/blob/master/docs/model_zoo.md


• SnakeViz visualizations: https://github.com/pytorch/serve/tree/master/benchmarks#visualize-snakeviz-results


• Logging: https://github.com/pytorch/serve/blob/master/docs/logging.md


• Metrics: https://github.com/pytorch/serve/blob/master/docs/metrics.md


• Prometheus Metrics: https://github.com/pytorch/serve/blob/master/docs/metrics_api.md


• Batch Inference: https://github.com/pytorch/serve/blob/master/docs/batch_inference_with_ts.md


• Kubeflow Pipelines: https://github.com/kubeflow/pipelines/tree/master/components/PyTorch/pytorch-kfp-components


• Kubernetes support: https://github.com/pytorch/serve/blob/master/kubernetes/README.md


• TorchServe Dashboard (Community): https://cceyda.github.io/blog/torchserve/streamlit/dashboard/2020/10/15/torchserve.html


• Custom Handler community blog: https://towardsdatascience.com/deploy-models-and-create-custom-handlers-in-torchserve-fc2d048fbe91


• Captum Interpretability for BERT models: https://github.com/pytorch/serve/blob/master/captum/Captum_visualization_for_bert.ipynb


• Operationalize, Scale and Infuse Trust in AI using KFServing: https://blog.kubeflow.org/release/official/2021/03/08/kfserving-0.5.html


REFERENCES
QUESTIONS?


Contact:


Email: gchauhan@fb.com


Linkedin: https://www.linkedin.com/in/geetachauhan/
