R E P R O D U C I B L E A I
U S I N G P Y T O R C H
A N D M L F L O W
Geeta Chauhan
PyTorch Partner
Engineering, Facebook AI
@ C H A U H A N G
MLFLOW + PyTorch
A G E N D A
01  REPRODUCIBLE AI CHALLENGE
02  MLFLOW + PYTORCH
03  DEMO
PyTorch + MLflow
Reproducible AI Challenge
PyTorch + MLflow
• Continuous, iterative process that optimizes for a metric
• Quality depends on the data and tuning parameters
• Experiment tracking is difficult
• Data changes over time, causing model drift
• Model artifacts get lost
• Compare & combine many libraries and models
• Diverse deployment environments
TRADITIONAL SOFTWARE VS MACHINE LEARNING
PyTorch + MLflow
MLflow + PyTorch
PyTorch + MLflow
I N T R O D U C I N G   M L F L O W
AN OPEN SOURCE PLATFORM FOR MACHINE LEARNING LIFECYCLE MANAGEMENT
TRACKING: Record and query experiments: code, data, config, and results.
PROJECTS: Package data science code in a format that enables reproducible runs on many platforms.
MODELS: Deploy machine learning models in diverse serving environments.
MODEL REGISTRY: Store, annotate, and manage models in a central repository.
(A minimal Tracking sketch follows below.)
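For orientation, here is a minimal sketch of the Tracking API; the experiment name, values, and file name are illustrative, not from the talk.

import mlflow

mlflow.set_experiment("mnist-baseline")   # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_param("lr", 1e-3)              # a hyperparameter
    mlflow.log_metric("val_loss", 0.042)      # a result metric
    mlflow.log_artifact("model_summary.txt")  # any local file to keep with the run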
PyTorch + MLflow
MLflow + PyTorch for Reproducibility

TRACKING: Record and query experiments: code, data, config, and results.
PROJECTS: Package data science code in a format that enables reproducible runs on many platforms.
MODELS: Deploy machine learning models in diverse serving environments.
MODEL REGISTRY: Store, annotate, and manage models in a central repository.

PyTorch integration work:
• PYTORCH AUTO LOGGING
• PYTORCH EXAMPLES W/ MLPROJECTS
• TORCHSCRIPTED MODELS, SAVE/LOAD ARTIFACTS
• MLFLOW TORCHSERVE DEPLOYMENT PLUGIN
PyTorch + MLflow
M L F L O W A U T O L O G G I N G
• PyTorch autologging with the Lightning training loop
• Model hyperparameters such as learning rate, model summary, optimizer name, min delta, best score
• Early stopping and other callbacks
• Log every N iterations
• User-defined metrics such as F1 score and test accuracy
import mlflow.pytorch

parser = LightningMNISTClassifier.add_model_specific_args(parent_parser=parser)

# Just add this and your autologging should work!
mlflow.pytorch.autolog()

model = LightningMNISTClassifier(**dict_args)
dm = MNISTDataModule(**dict_args)
dm.prepare_data()
dm.setup(stage="fit")

early_stopping = EarlyStopping(monitor="val_loss", mode="min", verbose=True)
checkpoint_callback = ModelCheckpoint(
    filepath=os.getcwd(), save_top_k=1, verbose=True,
    monitor="val_loss", mode="min", prefix="",
)
lr_logger = LearningRateLogger()

trainer = pl.Trainer.from_argparse_args(
    args, callbacks=[lr_logger, early_stopping],
    checkpoint_callback=checkpoint_callback,
)
trainer.fit(model, dm)   # train with the MNIST datamodule
trainer.test()
PyTorch + MLflow
C O M P A R E E X P E R I M E N T R U N S
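Besides the UI, runs can be compared programmatically with the search API; a short sketch, where the experiment ID, parameter, and metric names are illustrative and assume the autologged run above:

import mlflow

runs = mlflow.search_runs(experiment_ids=["0"], order_by=["metrics.val_loss ASC"])
print(runs[["run_id", "params.lr", "metrics.val_loss"]].head())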
PyTorch + MLflow
mlflow.pytorch.save_model(
    model,
    path=args.model_save_path,
    requirements_file="requirements.txt",
    extra_files=["class_mapping.json", "bert_base_uncased_vocab.txt"],
)

:param requirements_file: An (optional) string containing the path to a requirements file.
    If ``None``, no requirements file is added to the model.
:param extra_files: An (optional) list containing the paths to corresponding extra files.
    For example, consider the following ``extra_files`` list::

        extra_files = ["s3://my-bucket/path/to/my_file1",
                       "s3://my-bucket/path/to/my_file2"]

    In this case, the ``my_file1`` and ``my_file2`` extra files are downloaded from S3.
    If ``None``, no extra files are added to the model.
S A V E   A R T I F A C T S
• Additional artifacts for model reproducibility
• For example: vocabulary files for NLP models, requirements.txt and other extra files for TorchServe deployment (reloading the saved model is sketched below)
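A minimal reload sketch, assuming the save_model call above has already run and using the same args object for the path:

import mlflow.pytorch

loaded_model = mlflow.pytorch.load_model(args.model_save_path)
loaded_model.eval()   # ready for inference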
PyTorch + MLflow
model = LightningMNISTClassifier(**dict_args)

# Convert to a TorchScripted model
scripted_model = torch.jit.script(model)

mlflow.start_run()

# Log the scripted model using log_model
mlflow.pytorch.log_model(scripted_model, "scripted_model")

# If you need to reload the model, just call load_model
uri_path = mlflow.get_artifact_uri()
scripted_loaded_model = mlflow.pytorch.load_model(os.path.join(uri_path, "scripted_model"))

mlflow.end_run()
T O R C H S C R I P T E D   M O D E L
• Log TorchScripted models
• TorchScript is a statically typed subset of the Python language, specialized for ML applications
• Serialize and optimize models to run in a Python-free process
• Recommended for production inference
PYTORCH DEVELOPER DAY 2020 #PTD2
TORCHSERVE
• Default handlers for common use cases (e.g., image segmentation, text classification), plus support for custom handlers and a Model Zoo
• Multi-model serving, model versioning, and the ability to roll back to an earlier version
• Automatic batching of individual inferences across HTTP requests
• Logging, including common metrics, and the ability to incorporate custom metrics
• Robust HTTP APIs for management and inference
[Architecture diagram: model .pth files are packaged into .mar archives by torch-model-archiver and placed in <path>/model_store; TorchServe (started with torchserve --start) loads them and serves multiple models at once, exposing the Inference API over HTTP (http://localhost:8080/ …), the Management API (http://localhost:8081/ …), and a Metrics API, with logging and metrics collection.]
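For concreteness, a command-line sketch of that flow; the file and model names follow the MNIST example used later in this deck, and the flags are the standard torch-model-archiver and TorchServe options:

torch-model-archiver --model-name mnist --version 1.0 \
    --model-file mnist_model.py --serialized-file mnist_cnn.pt \
    --handler mnist_handler.py --export-path model_store

torchserve --start --model-store model_store --models mnist=mnist.mar

curl http://localhost:8080/predictions/mnist -T test_data/one.png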
PyTorch + MLflow
# Deploy the model
mlflow deployments create --name mnist_test --target torchserve \
    --model-uri mnist.pt -C "MODEL_FILE=mnist_model.py" -C "HANDLER=mnist_handler.py"

# Run a prediction
mlflow deployments predict --name mnist_test --target torchserve \
    --input-path sample.json --output-path output.json
D E P L O Y M E N T   P L U G I N
New TorchServe deployment plugin: test models during the development cycle, pull models from the MLflow model repository, and run them.
• CLI
• Works against a local or a remote TorchServe instance
• Python API
import os

import matplotlib.pyplot as plt
from torchvision import transforms
from mlflow.deployments import get_deploy_client

img = plt.imread(os.path.join(os.getcwd(), "test_data/one.png"))
mnist_transforms = transforms.Compose([
    transforms.ToTensor()
])
image = mnist_transforms(img)

plugin = get_deploy_client("torchserve")
config = {
    "MODEL_FILE": "mnist_model.py",
    "HANDLER_FILE": "mnist_handler.py",
}
plugin.create_deployment(name="mnist_test", model_uri="mnist_cnn.pt", config=config)
prediction = plugin.predict("mnist_test", image)
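The plugin implements MLflow's standard deployment-client interface, so the usual client calls are also available; a short sketch (return formats may vary by plugin version):

print(plugin.list_deployments())            # deployments known to the TorchServe target
print(plugin.get_deployment("mnist_test"))  # details for one deployment
plugin.delete_deployment("mnist_test")      # tear the deployment down when finished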
CAPTUM
[Figure: example attribution output for a multimodal prediction (Text contributions: 7.54, Image contributions: 11.19, Total contributions: 18.73), shown as a bar chart of attribution values.]
S U P P O R T   F O R   A T T R I B U T I O N   A L G O R I T H M S   T O   I N T E R P R E T :
• Output predictions with respect to inputs
• Output predictions with respect to layers
• Neurons with respect to inputs
• Currently provides gradient- and perturbation-based approaches (e.g., Integrated Gradients)
Model interpretability library for PyTorch
https://captum.ai/
ATTRIBUTION ALGORITHMS

Attribute model output (or internal neurons) to input features:
• Gradient-based: Integrated Gradients, Saliency, DeepLift, Input * Gradient, GuidedBackprop / Deconvolution, GuidedGradCam, SHAP methods (GradientSHAP, DeepLiftSHAP)
• Perturbation-based: FeatureAblation / FeaturePermutation, Occlusion, Shapley Value Sampling
• Other: NoiseTunnel (SmoothGrad, VarGrad, SmoothGrad Squared)

Attribute model output to the layers of the model:
• Gradient-based: LayerConductance, InternalInfluence, GradCam, LayerActivation, LayerGradientXActivation, LayerDeepLift, LayerIntegratedGradients, SHAP methods (LayerGradientSHAP, LayerDeepLiftSHAP)
• Perturbation-based: LayerFeatureAblation
NEW FEATURES
Integrations and new samples for:
- Model Interpretability using Captum
- Model Signature
- Hyperparameter Optimization using Ax/BoTorch
- Iterative Pruning Example using Ax/BoTorch
# Captum
ig = IntegratedGradients(net)
test_input_tensor.requires_grad_()
attr, _ = ig.attribute(test_input_tensor, target=1, return_convergence_delta=True)
attr = attr.detach().numpy()

# To understand attributions, average across all inputs, then print
# and visualize the average attribution for each feature.
feature_imp, feature_imp_dict = visualize_importances(feature_names, np.mean(attr, axis=0))
mlflow.log_metrics(feature_imp_dict)
mlflow.log_text(str(feature_imp), "feature_imp_summary.txt")

fig, (ax1, ax2) = plt.subplots(2, 1)
fig.tight_layout(pad=3)
ax1.hist(attr[:, 1], 100)
ax1.set(title="Distribution of Sibsp Attribution Values")
# Model Signature
from mlflow.models.signature import infer_signature

train = df.drop(columns=["target_label"])  # feature columns only
predictions = ...  # compute model predictions
signature = infer_signature(train, predictions)
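The inferred signature can then be attached when the model is logged; a one-line sketch, assuming the trained model object from the earlier examples:

import mlflow.pytorch

mlflow.pytorch.log_model(model, "model", signature=signature)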
ADAPTIVE EXPERIMENTATION FOR NEWS FEED RANKING
[Diagram: an Ax/BoTorch multitask model proposes experiment candidates, which are evaluated first in offline simulation and then in online A/B tests.]
NEW FEATURES - HPO WITH AX

with mlflow.start_run(run_name="Parent Run"):
    train_evaluate(params=params, max_epochs=max_epochs)

    ax_client = AxClient()
    ax_client.create_experiment(
        parameters=[
            {"name": "weight_decay", "type": "range", "bounds": [1e-4, 1e-3]},
            {"name": "momentum", "type": "range", "bounds": [0.7, 1.0]},
        ],
        objective_name="test_accuracy",
    )

    for i in range(total_trials):
        with mlflow.start_run(nested=True, run_name="Trial " + str(i)) as child_run:
            parameters, trial_index = ax_client.get_next_trial()
            test_accuracy = train_evaluate(params=parameters, max_epochs=max_epochs)

            # completion of trial
            ax_client.complete_trial(trial_index=trial_index, raw_data=test_accuracy.item())

    best_parameters, metrics = ax_client.get_best_parameters()
    for param_name, value in best_parameters.items():
        mlflow.log_param("optimum_" + param_name, value)
Demo
PYTORCH DEVELOPER DAY 2020 #PTD2
MLOPS WORKFLOW: MLFLOW + PYTORCH + TORCHSERVE
[Workflow diagram: the data scientist builds a PyTorch model; training (including distributed training) produces the PyTorch model and an optimized TorchScript model; MLflow autologging captures the experiment runs; the chosen model is promoted to the Model Registry; and the MLflow TorchServe plugin deploys it to TorchServe for management and inference. A registry sketch follows below.]
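The registry hand-off in the diagram can be done in two calls; a sketch, where the run ID and registered-model name are placeholders rather than values from the talk:

import mlflow
from mlflow.tracking import MlflowClient

result = mlflow.register_model("runs:/<run_id>/scripted_model", "MNISTClassifier")
MlflowClient().transition_model_version_stage(
    name="MNISTClassifier", version=result.version, stage="Staging"
)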
PyTorch + MLflow
FUTURE
• Captum Interpretability for Inference in mlflow-torchserve
• Captum Insights visualizations in MLflow UI
• PyTorch Profiler integration with TensorBoard in the MLflow UI
• More examples
PyTorch + MLflow
RESOURCES
• Reproducibility Checklist: https://www.cs.mcgill.ca/~jpineau/ReproducibilityChecklist.pdf
• NeurIPS Reproducibility updates: https://ai.facebook.com/blog/new-code-completeness-checklist-and-reproducibility-updates/
• arXiv + Papers with Code: https://medium.com/paperswithcode/papers-with-code-partners-with-arxiv-ecc362883167
• MLflow + PyTorch Autolog blog: https://medium.com/pytorch/mlflow-and-pytorch-where-cutting-edge-ai-meets-mlops-1985cf8aa789
• MLflow TorchServe deployment plugin: https://github.com/mlflow/mlflow-torchserve
• MLflow + PyTorch examples: https://github.com/mlflow/mlflow/tree/master/examples/pytorch
T H A N K Y O U
Contact:
Email: gchauhan@fb.com
LinkedIn: https://www.linkedin.com/in/geetachauhan/
