The document discusses the integration of PyTorch and MLflow for reproducible AI workflows, emphasizing their roles in experiment tracking, model deployment, and management of machine learning models. It presents the features of MLflow, such as model registration, autologging, and deployment with TorchServe, along with examples and best practices for setting up experiments. The content also highlights the importance of model interpretability and provides resources for further learning and implementation.
PyTorch + MLflow
TRADITIONAL SOFTWARE VS MACHINE LEARNING
• Continuous, iterative process; optimize for a metric
• Quality depends on the data and the training parameters
• Experiment tracking is difficult
• Data changes over time, leading to model drift
• Model artifacts get lost
• Many libraries and models to compare and combine
• Diverse deployment environments
INTRODUCING MLFLOW: AN OPEN SOURCE PLATFORM FOR MACHINE LEARNING LIFECYCLE MANAGEMENT
• TRACKING: record and query experiments: code, data, config, and results (see the sketch after this list)
• PROJECTS: package data science code in a format that enables reproducible runs on many platforms
• MODELS: deploy machine learning models in diverse serving environments
• MODEL REGISTRY: store, annotate, and manage models in a central repository
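As a quick illustration of the Tracking component, a minimal sketch (the experiment name, parameter, metric, and config file below are made up for this example):

import mlflow

mlflow.set_experiment("mnist-experiments")
with mlflow.start_run():
    mlflow.log_param("lr", 0.01)          # config
    mlflow.log_metric("val_loss", 0.23)   # results
    mlflow.log_artifact("config.yaml")    # supporting files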
MLFLOW + PYTORCH FOR REPRODUCIBILITY
The four MLflow components above map to PyTorch-specific features:
• TRACKING: PyTorch autologging
• PROJECTS: PyTorch examples w/ MLprojects
• MODELS: TorchScripted models, save/load artifacts
• MODEL REGISTRY: MLflow TorchServe deployment plugin
MLFLOW AUTOLOGGING
• PyTorch autologging with the Lightning training loop
• Model hyperparameters such as learning rate, model summary, optimizer name, min delta, best score
• Early stopping and other callbacks
• Log every N iterations
• User-defined metrics such as F1 score and test accuracy
import os
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, LearningRateLogger, ModelCheckpoint
import mlflow.pytorch

parser = LightningMNISTClassifier.add_model_specific_args(parent_parser=parser)

# Just add this and your autologging should work!
mlflow.pytorch.autolog()

model = LightningMNISTClassifier(**dict_args)
dm = MNISTDataModule(**dict_args)
dm.prepare_data()
dm.setup(stage="fit")

early_stopping = EarlyStopping(monitor="val_loss", mode="min", verbose=True)
checkpoint_callback = ModelCheckpoint(
    filepath=os.getcwd(), save_top_k=1, verbose=True,
    monitor="val_loss", mode="min", prefix="",
)
lr_logger = LearningRateLogger()

trainer = pl.Trainer.from_argparse_args(
    args, callbacks=[lr_logger, early_stopping],
    checkpoint_callback=checkpoint_callback,
)
trainer.fit(model, datamodule=dm)
trainer.test()
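To inspect what autologging captured, one option (a sketch, assuming the default tracking URI and that the run above is the most recent one in the active experiment) is to query the runs afterwards:

import mlflow

# Fetch the most recent run; autologged params and metrics appear as columns.
runs = mlflow.search_runs(order_by=["attributes.start_time DESC"], max_results=1)
print(runs.iloc[0])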
SAVE ARTIFACTS
• Additional artifacts for model reproducibility
• For example: vocabulary files for NLP models, requirements.txt, and other extra files for TorchServe deployment

mlflow.pytorch.save_model(
    model,
    path=args.model_save_path,
    requirements_file="requirements.txt",
    extra_files=["class_mapping.json", "bert_base_uncased_vocab.txt"],
)

From the API docs:
:param requirements_file: An (optional) string containing the path to a requirements file.
    If ``None``, no requirements file is added to the model.
:param extra_files: An (optional) list containing the paths to corresponding extra files.
    For example, consider the following ``extra_files`` list::
        extra_files = ["s3://my-bucket/path/to/my_file1",
                       "s3://my-bucket/path/to/my_file2"]
    In this case, the ``my_file1`` and ``my_file2`` extra files are downloaded from S3.
    If ``None``, no extra files are added to the model.
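Reloading works the same way; a minimal sketch, reusing args.model_save_path from the snippet above:

import mlflow.pytorch

# Rebuild the model (along with its saved environment and extra files) from the save path.
loaded_model = mlflow.pytorch.load_model(args.model_save_path)
loaded_model.eval()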
TORCHSCRIPTED MODEL
• Log a TorchScripted model
• TorchScript is a static subset of the Python language, specialized for ML applications
• Serialize and optimize models for a Python-free process (see the sketch after the snippet below)
• Recommended for production inference

model = LightningMNISTClassifier(**dict_args)
# Convert to a TorchScripted model
scripted_model = torch.jit.script(model)
mlflow.start_run()
# Log the scripted model using log_model
mlflow.pytorch.log_model(scripted_model, "scripted_model")
# If you need to reload the model, just call load_model
uri_path = mlflow.get_artifact_uri()
scripted_loaded_model = mlflow.pytorch.load_model(os.path.join(uri_path, "scripted_model"))
mlflow.end_run()
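The Python-free aspect comes from TorchScript serialization itself; a minimal sketch (the file name is illustrative) showing that the scripted model can be saved and reloaded without the original class definition:

import torch

# Serialize the scripted model to a standalone file...
torch.jit.save(scripted_model, "scripted_model.pt")

# ...and load it back without needing the LightningMNISTClassifier source.
restored = torch.jit.load("scripted_model.pt")
restored.eval()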
TORCHSERVE
• Default handlers for common use cases (e.g., image segmentation, text classification), along with custom handler support for other use cases, and a Model Zoo
• Multi-model serving, model versioning, and the ability to roll back to an earlier version
• Automatic batching of individual inferences across HTTP requests
• Logging, including common metrics, and the ability to incorporate custom metrics
• Robust HTTP APIs for management and inference (see the request sketch below)
[Architecture diagram: model .pth files are packaged with torch-model-archiver into .mar files stored in <path>/model_store; TorchServe (launched with torchserve --start) serves them over HTTP through the Inference API (http://localhost:8080/ ...), the Management API (http://localhost:8081/ ...), and the Metrics API, with logging and metrics for each served model.]
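As a rough illustration of the Inference API in the diagram, a request against a locally running TorchServe might look like this (a sketch; the model name "mnist" and the input file are placeholders):

import requests

# POST an input file to TorchServe's Inference API for a model named "mnist".
with open("test_data/one.png", "rb") as f:
    resp = requests.post("http://localhost:8080/predictions/mnist", data=f)
print(resp.status_code, resp.text)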
DEPLOYMENT PLUGIN
New TorchServe deployment plugin: test models during the development cycle, pull models from the MLflow model repository and run them
• CLI
• Python API
• Run against a local or a remote TorchServe

# Deploy the model
mlflow deployments create --name mnist_test --target torchserve \
    --model-uri mnist.pt -C "MODEL_FILE=mnist_model.py" -C "HANDLER=mnist_handler.py"

# Run a prediction
mlflow deployments predict --name mnist_test --target torchserve \
    --input-path sample.json --output-path output.json
import os
import matplotlib.pyplot as plt
from torchvision import transforms
from mlflow.deployments import get_deploy_client

# Load and preprocess a sample image
img = plt.imread(os.path.join(os.getcwd(), "test_data/one.png"))
mnist_transforms = transforms.Compose([transforms.ToTensor()])
image = mnist_transforms(img)

# Create a deployment through the TorchServe plugin and run a prediction
plugin = get_deploy_client("torchserve")
config = {
    "MODEL_FILE": "mnist_model.py",
    "HANDLER_FILE": "mnist_handler.py",
}
plugin.create_deployment(name="mnist_test", model_uri="mnist_cnn.pt", config=config)
prediction = plugin.predict("mnist_test", image)
CAPTUM
Model interpretability library for PyTorch
https://captum.ai/

SUPPORT FOR ATTRIBUTION ALGORITHMS TO INTERPRET:
• Output predictions with respect to inputs
• Output predictions with respect to layers
• Neurons with respect to inputs
• Currently provides gradient- and perturbation-based approaches (e.g. Integrated Gradients)

[Captum Insights example: Text Contributions: 7.54, Image Contributions: 11.19, Total Contributions: 18.73]
ATTRIBUTION ALGORITHMS

Attribute model output (or internal neurons) to input features:
• Gradient: IntegratedGradients, Saliency, DeepLift, GuidedBackprop / Deconvolution, Input * Gradient, GuidedGradCam, SHAP methods (GradientSHAP, DeepLiftSHAP)
• Perturbation: FeatureAblation / FeaturePermutation, Occlusion, Shapley Value Sampling
• Other: NoiseTunnel (Smoothgrad, Vargrad, Smoothgrad Square)

Attribute model output to the layers of the model:
• Gradient: LayerConductance, InternalInfluence, GradCam, LayerActivation, LayerGradientXActivation, LayerDeepLift, LayerIntegratedGradients, SHAP methods (LayerGradientSHAP, LayerDeepLiftSHAP)
• Perturbation: LayerFeatureAblation
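To make the grouping concrete, a short sketch (the toy model and data are made up for illustration) applying one algorithm from each family in captum.attr:

import torch
import torch.nn as nn
from captum.attr import Saliency, FeatureAblation, LayerConductance

# Toy model and input, purely for illustration
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
model.eval()
x = torch.rand(2, 4, requires_grad=True)

# Gradient-based input attribution
saliency_attr = Saliency(model).attribute(x, target=0)

# Perturbation-based input attribution
ablation_attr = FeatureAblation(model).attribute(x, target=0)

# Layer attribution: conductance through the first Linear layer
layer_attr = LayerConductance(model, model[0]).attribute(x, target=0)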
NEW FEATURES
Integrations and new samples for:
- Model Interpretability using Captum
- Model Signature
- Hyperparameter Optimization using Ax/BoTorch
- Iterative Pruning Example using Ax/BoTorch

# Captum
ig = IntegratedGradients(net)
test_input_tensor.requires_grad_()
attr, _ = ig.attribute(test_input_tensor, target=1, return_convergence_delta=True)
attr = attr.detach().numpy()

# To understand attributions, average across all inputs, then print
# and visualize the average attribution for each feature.
feature_imp, feature_imp_dict = visualize_importances(feature_names, np.mean(attr, axis=0))
mlflow.log_metrics(feature_imp_dict)
mlflow.log_text(str(feature_imp), "feature_imp_summary.txt")

fig, (ax1, ax2) = plt.subplots(2, 1)
fig.tight_layout(pad=3)
ax1.hist(attr[:, 1], 100)
ax1.set(title="Distribution of Sibsp Attribution Values")

# Model Signature
from mlflow.models.signature import infer_signature

train = df.drop(columns=["target_label"])
predictions = ...  # compute model predictions
signature = infer_signature(train, predictions)
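The inferred signature can then be attached when the model is logged; a one-line sketch (the artifact path "model" is illustrative):

# Log the model together with its input/output signature.
mlflow.pytorch.log_model(model, "model", signature=signature)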
MLOPS WORKFLOW: MLFLOW + PYTORCH + TORCHSERVE

[Workflow diagram: a data scientist builds a PyTorch model; training / distributed training produces the PyTorch model and an optimized TorchScript model; autologging records the experiment runs; models are promoted to the Model Registry; the MLflow TorchServe plugin deploys them to TorchServe for management and inference.]
FUTURE
• Captum interpretability for inference in mlflow-torchserve
• Captum Insights visualizations in the MLflow UI
• PyTorch Profiler integration with TensorBoard in the MLflow UI
• More examples