What’s Next for MLflow in 2019
Matei Zaharia, Corey Zumar, Sid Murching
February 12, 2019
Outline
MLflow overview
Feedback so far
Databricks’ development themes for 2019
Demos of upcoming features
Outline
MLflow overview
Feedback so far
Databricks’ development themes for 2019
Demos of upcoming features
ML Development is Harder than Traditional Software Development
Traditional Software vs. Machine Learning

Machine Learning:
• Goal: optimize a metric (e.g. accuracy)
• Quality depends on data, code & tuning → must regularly update with fresh data
• Constantly experiment w/ new libraries + models (and must productionize them!)

Traditional Software:
• Goal: meet a functional specification
• Quality depends only on code
• Typically one software stack
What is MLflow?
Open source platform to manage ML development
• Lightweight APIs & abstractions that work with any ML library
• Designed to be useful for 1 user or 1000+ person orgs
• Runs the same way anywhere (e.g. any cloud)
Key principle: “open interface” APIs that work with any
existing ML library, app, deployment tool, etc
MLflow Components
Tracking
Record and query
experiments: code,
params, results, etc
Projects
Code packaging for
reproducible runs
on any platform
Models
Model packaging and
deployment to diverse
environments
Learning MLflow
pip install mlflow to get started in Python
(APIs also available in Java and R)
Docs and tutorials at mlflow.org
• Hyperparameter tuning, REST serving, batch scoring, etc
Outline
MLflow overview
Feedback so far
Databricks’ development themes for 2019
Demos of upcoming features
Running a user survey at mlflow.org (fill it in if you haven’t!)
Users are using all components (but Tracking most popular)
What users want to see next
Outline
MLflow overview
Feedback so far
Databricks’ development themes for 2019
Demos of upcoming features
High-Level Themes
1) Update existing components based on feedback
2) Stabilize the APIs and dev process (MLflow 1.0)
3) Add new features for more of the ML lifecycle
Rough Development Timeline
MLflow 0.9, 0.10, etc: in the next few months
MLflow 1.0 and API stabilization: end of April
(stabilize core APIs and mark others as experimental)
After 1.0: continue releasing regularly to get features out
Updating Existing Components
MLflow Tracking
• SQL database backend for scaling the tracking server (0.9)
• UI scalability improvements (0.8, 0.9, etc)
• X-coordinate logging for metrics & batched logging (1.0)
• Fluent API for Java and Scala (1.0)
Updating Existing Components
MLflow Projects
• Docker-based project environment specification (0.9)
• X-coordinate logging for metrics & batched logging (1.0)
• Packaging projects with build steps (1.0+)
Updating Existing Components
MLflow Models
• Custom model logging in Python, R and Java (0.8, 0.9, 1.0)
• Better environment isolation when loading models (1.0)
• Logging schema of models (1.0+)
New Components in Discussion
Model registry
• A way to name and manage models, track deployments, etc
• Could be new abstraction or tags on existing runs (need feedback!)
Multi-step workflow GUI
• UI to view or even edit multi-step workflows (do you want this?)
MLflow telemetry component
• Standard API for deployed models to log metrics wherever they run
• Data collection and analytics tools downstream (need feedback!)
Outline
MLflow overview
Feedback so far
Databricks’ development themes for 2019
Demos of upcoming features
Demo: Model Customization
Motivating example: MLflow flower classification
f(petal_attribs) -> classification
f(petal_attribs) -> probabilities
Demo: Model Customization
Motivation: ML teams want to capture mathematical models and business logic in a single MLflow model.

mlflow.sklearn.save_model,
mlflow.pytorch.log_model,
….
Demo: Model Customization
MLflow 0.9: users can easily customize models, introducing inference logic and data dependencies

class PythonModel:
    def load_context(self, context):
        # The context object contains paths to
        # files (artifacts) that can be loaded here
        pass

    def predict(self, context, input_df):
        # Inference logic goes here
        pass
class ToyModel(mlflow.pyfunc.PythonModel):
    def __init__(self, return_value):
        self.return_value = return_value

    def predict(self, context, input_df):
        return self.return_value

mlflow.pyfunc.save_model(
    python_model=ToyModel(pd.DataFrame([42])),
    dst_path="toy_model")
class ProbaModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, input_df):
        sk_model = mlflow.sklearn.load_model(
            context.artifacts["sk_model"])
        return sk_model.predict_proba(input_df)

mlflow.pyfunc.save_model(
    dst_path="proba_model",
    python_model=ProbaModel(),
    artifacts={"sk_model": "s3://model/path"})
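The examples above can be exercised without a tracking server or model store. The following pure-Python sketch mimics the mlflow.pyfunc.PythonModel contract; the PythonModel and SimpleContext classes here are illustrative stand-ins rather than mlflow imports, so the snippet runs with no dependencies:

```python
# Self-contained sketch of the pyfunc contract from the slides above.
# PythonModel and SimpleContext mimic mlflow.pyfunc's interfaces so the
# example runs without mlflow installed; names are illustrative.

class PythonModel:
    """Minimal stand-in for mlflow.pyfunc.PythonModel."""
    def load_context(self, context):
        pass  # load artifacts (files) referenced by the context here

    def predict(self, context, input_df):
        raise NotImplementedError

class SimpleContext:
    """Stand-in for the context object: maps artifact names to local paths."""
    def __init__(self, artifacts):
        self.artifacts = artifacts

class ToyModel(PythonModel):
    """Mirrors the ToyModel slide: echoes a fixed return value."""
    def __init__(self, return_value):
        self.return_value = return_value

    def predict(self, context, input_df):
        return self.return_value

context = SimpleContext(artifacts={})
model = ToyModel(return_value=[42])
print(model.predict(context, input_df=None))  # -> [42]
```

In real code you would subclass mlflow.pyfunc.PythonModel directly and let mlflow.pyfunc.save_model / load_model construct the context from the artifacts dict, as shown on the previous slides.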
Demo: Model Customization
We will fit a model that identifies iris flowers based on their petals, emitting a probability distribution across 3 flower types

f(pwidth, plength) -> probabilities
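To make the demo's output shape concrete, here is a toy stand-in (not the demo code): a nearest-centroid classifier that, like the real model, maps (pwidth, plength) to a probability distribution over the 3 iris species. The centroid values are made-up assumptions; the actual demo fits a real classifier on the iris dataset.

```python
import math

# Toy illustration of f(pwidth, plength) -> probabilities across 3 flower
# types. Centroid values are invented for illustration only.
CENTROIDS = {
    "setosa":     (0.25, 1.5),
    "versicolor": (1.3, 4.3),
    "virginica":  (2.0, 5.5),
}

def predict_proba(pwidth, plength):
    # Softmax over negative distances to each class centroid, so closer
    # centroids get higher probability and the outputs sum to 1.
    scores = {
        name: -math.dist((pwidth, plength), centroid)
        for name, centroid in CENTROIDS.items()
    }
    z = sum(math.exp(s) for s in scores.values())
    return {name: math.exp(s) / z for name, s in scores.items()}

probs = predict_proba(0.3, 1.4)
print(probs)  # highest probability for "setosa"
```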
Demo
MLflow Projects
[Diagram: Project Spec (Code, Config, Data) → Local Execution / Remote Execution]
mlflow run git://...
Demo: Docker-based Projects
MLflow 0.9: run projects in Docker containers (contributed by @marcusrehm)

Package code with arbitrary dependencies (Java etc)
Run, share, track code with same MLflow APIs
Demo: Docker-based Projects
Docker handles the dependencies:
docker_env:
  image: continuumio/anaconda
MLflow provides a unified interface for running code:
$ mlflow run git://<my_project>
Project Structure
my_project/
├── MLproject
├── train.py
└── utils.py
...

MLproject:
docker_env:
  image: continuumio/anaconda
entry_points:
  main:
    parameters:
      training_data: path
      lambda: {type: float, default: 0.1}
    command: python train.py {training_data} {lambda}

$ mlflow run git://<my_project>
Demo: Docker-based Projects
See example project at
github.com/mlflow/mlflow/tree/master/examples/docker
Demo
What’s next: Docker-based Projects
Remote execution (Kubernetes, Databricks) for horizontal & vertical scale-out

Ease-of-use improvements: custom Docker build steps, logging to remote artifact stores
Thank You!
Get started with MLflow at mlflow.org
• Fill out our survey and join our Slack!
Spark AI Summit 15% discount: MLflowMeetup
