How to Build an ML Platform Efficiently Using Open-Source
Jean Carlo Machado
Theodore Meynard
GetYourGuide
Agenda
▪ Introduction
▪ ML at GetYourGuide
▪ Before the Platform
▪ ML Platform
▪ Demo
▪ Final words
Who Are We
Theo: Senior Data Scientist (theodore.meynard@getyourguide.com)
Jean: Senior Software Engineer (jean.machado@getyourguide.com)
Introduction
We’ve built the world’s largest
marketplace for travel activities…
Millions of travelers use
GetYourGuide every year
We offer more than 40,000 activities worldwide
We facilitate the transaction
Connecting customers... ...to suppliers around the world
ML at GetYourGuide
Data Product: Ranking Service
Data Product: Recommendation Panels
Data Product: Paid Search
Amsterdam: Canal Cruise
Other Data Products
We also use ML for:
• Demand forecasting
• Inventory labeling
⇒ 20+ ML projects spread across 2 teams, plus models delivered to other teams to maintain
Data Product Principles
Quality, Testing & Monitoring
• We follow clean code principles
• PoCs are temporary
• We build solid, resilient deployment processes
• We know our models’ health at every point in time
Engineering
• We integrate into the engineering ecosystem and leverage open-source
• Data workflows are efficient and cost-effective
• We take reproducibility and modularity seriously
Stakeholder Engagement
• We promote the Data Product mindset
• We deeply integrate with data stakeholders
Workflow
• We actively manage the unknowns in our planning
• Data analytics dynamically informs our project plans
Strategy
• Customer and business value over fancy solutions
• Exploration is one of our goals
Model
• We value small iterations on existing models
• We value explainability over pure accuracy
• Performance is proven online
Our principles explained
Before the Platform
How We Started
Pros
● Widely used by ML practitioners
● Good for starting new projects and prototyping
● Great visualization
Cons
● No proper version control
● No code reuse
● No automatic testing
A Major Improvement
Pros
● Tests included in library
● Version control with code review
● Maintainable projects
Cons
● No CI/CD
● No model tracking
ML Platform
From Amsterdam: Volendam, Marken and Windmills
ML Platform Key Features
● CI/CD
● Model Tracking
● Batch & Online inference
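Model tracking, one of the key features listed above, comes down to recording for every training run which code and parameters produced which metrics. The talk's later example loads a model from MLflow; the sketch below is not the MLflow API, only a minimal standard-library illustration of the idea. The names `log_run` and `best_run` and the JSON-lines store are hypothetical.

```python
import json
import time
import uuid
from pathlib import Path


def log_run(store: Path, params: dict, metrics: dict, code_version: str) -> str:
    """Append one training run to a JSON-lines file and return its run id."""
    run_id = uuid.uuid4().hex
    record = {
        "run_id": run_id,
        "timestamp": time.time(),
        "code_version": code_version,  # assumed to be something like a git commit hash
        "params": params,
        "metrics": metrics,
    }
    with store.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return run_id


def best_run(store: Path, metric: str) -> dict:
    """Return the recorded run with the highest value of `metric`."""
    with store.open(encoding="utf-8") as fh:
        runs = [json.loads(line) for line in fh if line.strip()]
    return max(runs, key=lambda run: run["metrics"][metric])
```

A CI/CD job could call `log_run` after each training step and use `best_run` to pick the candidate for deployment. A dedicated tracking server additionally stores model artifacts and offers a UI, which this sketch leaves out.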
ML Platform Principles
• Maximize data scientist’s model
ownership
• Reproducible Machine Learning
• Reuse most of our existing
infrastructure
• Build incrementally
• Use open-source and open
standards
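As an illustration of the "Reproducible Machine Learning" principle, the sketch below shows two small habits that help make a training run repeatable: fingerprinting the configuration and splitting data with a private, seeded random generator. It is an assumption about what reproducibility can look like in code, not a description of the GetYourGuide platform; `config_fingerprint` and `train_test_split` are hypothetical helper names.

```python
import hashlib
import json
import random


def config_fingerprint(config: dict) -> str:
    """Return a short, stable hash of a training configuration."""
    # Sorting keys makes the hash independent of dict insertion order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def train_test_split(rows: list, test_fraction: float, seed: int) -> tuple:
    """Split rows with a seeded RNG so the result never depends on global state."""
    rng = random.Random(seed)
    shuffled = list(rows)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Storing the fingerprint and the seed next to each trained model lets anyone rerun the same configuration on the same split and compare results.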
Our Current Workflow
ML CI/CD
Training Path
Batch Inference Path
Online Inference Path
Online Inference With BentoML
From MLflow to BentoML Example
iris_classifier.py

import pandas as pd
from bentoml import BentoService, api, artifacts
from bentoml.adapters import DataframeInput
from bentoml.frameworks.sklearn import SklearnModelArtifact

@artifacts([SklearnModelArtifact("model")])
class IrisClassifier(BentoService):
    """A minimum prediction service"""

    @api(input=DataframeInput())
    def predict(self, df: pd.DataFrame):
        """An inference API named `predict`"""
        return self.artifacts.model.predict(df)

mlflow2bentoml.py

import mlflow.sklearn
from iris_classifier import IrisClassifier

# model_uri is not defined on the slide; it must point to a logged MLflow model
# Load the MLflow model
mlflow_model = mlflow.sklearn.load_model(model_uri)
# Create an Iris classifier service instance
iris_classifier_service = IrisClassifier()
# Pack the newly trained model artifact
iris_classifier_service.pack("model", mlflow_model)
# Save the prediction service for model serving
saved_path = iris_classifier_service.save()
Demo
Final remarks
Learnings
● Integrating with the existing architecture fosters proactive collaboration
● The tool space is very new, which makes exploration vital but time-consuming
● A design review is necessary to align everyone
Amsterdam: Moco Museum
Conclusion
● As we grew, we needed to refine our data science process
● Good software engineering practices, with a special twist for ML
● Our platform helps data scientists to
○ Build faster
○ Deploy safer
○ Document automatically
