Agile Deep Learning
David Murgatroyd (@dmurga)
Agile Deep Learning
[Cycle diagram: Identify Problem → Identify Metrics → Identify Data → Identify Model(s) → Explore & Experiment → Analyze → Prioritize → Productize]
Identifying a Problem: Perception
"If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI either now or in the near future."
- Andrew Ng, HBR, Nov 2016
Identifying a Problem: Prediction
"For any concrete, repeated event that we observe, we can reasonably try to predict the outcome of the next such event."
- Andrew Ng, NIPS 2016
Identifying a Problem: Personalization
"If a desire for content is shared by many individuals but should be met in ways specific to each of them, we can probably automate satisfying those desires with AI."
- (yours truly, today :-)
Identifying Metrics
Need both:
‣ Offline: for quick experimentation (model training and quality analysis).
‣ Online: for monitoring alignment with business goals.
Identifying Metrics: Offline
The art is to identify some part of the signal and try to predict it.
Example ways to identify signal:
‣ Create it directly: humans annotate the right output.
‣ Usage of the feature: predict historical usage from the data that preceded it.
‣ Usage of other features: identify other ways users satisfied the need.
‣ Outside the product: predict related data in the same domain.
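Once humans have annotated the right output, the simplest offline metric is agreement with those annotations. A minimal sketch; the toy model and dev set below are illustrative, not from the talk:

```python
# Minimal offline metric: agreement between model output and human annotations.

def accuracy(predictions, annotations):
    """Fraction of examples where the model output matches the annotation."""
    correct = sum(p == a for p, a in zip(predictions, annotations))
    return correct / len(annotations)

# Hand-labeled dev set ("create it directly") and a trivial stand-in model.
dev_inputs = ["good movie", "terrible plot", "loved it"]
dev_labels = ["pos", "neg", "pos"]
model = lambda text: "pos" if ("good" in text or "loved" in text) else "neg"

print(accuracy([model(x) for x in dev_inputs], dev_labels))  # -> 1.0
```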
Identifying Metrics: Online
Usage metrics:
‣ How many users?
‣ How much do they use it?
‣ How often do they use it?
‣ How do they use it?
Explicit feedback metrics:
‣ Thumbs up or down, etc.
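A hedged sketch of computing such online metrics from an event log; the event schema and action names are hypothetical:

```python
# Tally online usage and explicit-feedback metrics from logged events.
from collections import Counter

events = [
    {"user": "u1", "action": "use_feature"},
    {"user": "u1", "action": "thumbs_up"},
    {"user": "u2", "action": "use_feature"},
    {"user": "u2", "action": "thumbs_down"},
    {"user": "u2", "action": "use_feature"},
]

actions = Counter(e["action"] for e in events)
users = {e["user"] for e in events}

print(f"how many users: {len(users)}")
print(f"how much usage: {actions['use_feature']} uses")
print(f"net feedback:   {actions['thumbs_up'] - actions['thumbs_down']}")
```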
Identifying Data
Concrete data is the way deep learning products are specified. Uses:
‣ Truth: start with manual generation, especially by the product manager.
‣ Fodder: not the ultimate output, but valuable for training subcomponents.
‣ Baseline: the output of the simplest solution you can think of; worst case, random.
Watch out for bias against under-represented subpopulations!
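One concrete way to watch for that bias is to slice the offline metric by subpopulation. A sketch; the field names and data are hypothetical:

```python
# Per-subpopulation accuracy, to surface weak, under-represented slices.
from collections import defaultdict

examples = [
    {"group": "majority", "pred": "pos", "gold": "pos"},
    {"group": "majority", "pred": "neg", "gold": "neg"},
    {"group": "majority", "pred": "pos", "gold": "pos"},
    {"group": "minority", "pred": "neg", "gold": "pos"},  # small, weak slice
]

stats = defaultdict(lambda: [0, 0])  # group -> [correct, total]
for ex in examples:
    stats[ex["group"]][0] += int(ex["pred"] == ex["gold"])
    stats[ex["group"]][1] += 1

for group, (correct, total) in stats.items():
    print(f"{group}: accuracy {correct / total:.2f} over {total} examples")
```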
Identifying Model(s)
[Chart: model family as a function of quantity of data and structure of data (categorical, 1D discrete, 1D continuous, 2D+ continuous). With little data: rules, then SVMs and CRFs; with more data: deep models (FF, CNN, RNN (BiLSTM), GAN, Deep RL), ever deeper with richer attention.]
Identifying Model(s)
Seed with pre-trained models from similar tasks.
Consider other properties of the model's output:
‣ interpretability
‣ confidence scores
‣ time / space performance
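A hedged sketch of seeding with a pre-trained model: load ImageNet weights and swap in a new head for your own task. Assumes PyTorch with torchvision >= 0.13; the 5-class task is hypothetical:

```python
import torch.nn as nn
import torchvision.models as models

# Start from a model pre-trained on a similar task (here, ImageNet).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False                # freeze the pre-trained backbone
model.fc = nn.Linear(model.fc.in_features, 5)  # new head, trained from scratch
```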
Explore
Experiment
Gradually increase:
‣ Amount of data used to train / dev
‣ Amount of data used to test
‣ Complexity of model
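A sketch of that gradual increase: train on growing slices of the data and watch dev error, so early experiments stay fast. The synthetic dataset and simple model are stand-ins (scikit-learn assumed):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_dev, y_train, y_dev = train_test_split(X, y, random_state=0)

# Grow the training slice; each run stays quick, and the trend is informative.
for n in (100, 300, 1000, len(X_train)):
    clf = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    print(f"n={n:4d}  dev error={1 - clf.score(X_dev, y_dev):.3f}")
```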
Analyze: Bugs v. Errors
‣ Error: incorrect output from a model despite the model being correctly implemented. Egregious examples are "howlers" or "WTFs".
‣ Bug: the implementation does something other than what was intended.
This distinction is useful for managing expectations about the cost of addressing each.
Analyze: Isolate functional tests
Options:
‣ Black-box style: ensure "can't be wrong" ("earmark") input/output pairs. Might lead to spurious test failures.
‣ Clear-box style: use a mock implementation of the model that produces expected answers.
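A sketch of both styles, assuming a hypothetical pipeline(model, text) that wraps the model with post-processing; runnable with pytest:

```python
real_model = lambda text: "pos" if "loved" in text else "neg"  # stand-in model

def pipeline(model, text):
    return model(text).upper()  # stand-in for post-processing around the model

# Black-box style: an "earmark" input/output pair the model can't get wrong.
def test_earmark_obvious_positive():
    assert pipeline(real_model, "I absolutely loved it") == "POS"

# Clear-box style: mock the model so only the surrounding code is under test;
# this one can't fail spuriously when the model changes.
def test_postprocessing_with_mock_model():
    mock_model = lambda text: "pos"
    assert pipeline(mock_model, "anything at all") == "POS"
```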
Analyze: Automate all tests
Deep learning's dependence on data means changing anything changes everything.
Look at aggregate results across data sets to gauge the importance of a change.
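A sketch of that aggregate view: after any change, re-run every evaluation set and compare against the previous run. The data-set names and scores are hypothetical:

```python
previous = {"news": 0.90, "chat": 0.85, "legal": 0.72}  # accuracies, last run
current = {"news": 0.91, "chat": 0.83, "legal": 0.73}   # accuracies, this run

macro_prev = sum(previous.values()) / len(previous)
macro_curr = sum(current.values()) / len(current)
print(f"macro average: {macro_prev:.3f} -> {macro_curr:.3f}")

# Flag per-set regressions, tolerating a little evaluation noise.
for name in current:
    if current[name] < previous[name] - 0.01:
        print(f"regression on {name}: {previous[name]:.2f} -> {current[name]:.2f}")
```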
Prioritize
High training error?
‣ More training epochs
‣ Bigger / new model
High development error? (train error OK)
‣ More data
‣ Regularize / new model
[Plots: dev error under each remedy]
High evaluation error? (train/dev OK)
Get more development data similar to the test data, so you can return to the "High development error?" step.
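The triage above, written out as a sketch in code; the 5% target is illustrative:

```python
def next_step(train_err, dev_err, eval_err, target=0.05):
    if train_err > target:
        return "high training error: more epochs, or a bigger / new model"
    if dev_err > target:
        return "high dev error: more data, regularize, or a new model"
    if eval_err > target:
        return "high eval error: get dev data more similar to the test data"
    return "good enough: move on"

print(next_step(train_err=0.02, dev_err=0.12, eval_err=0.15))
```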
Productize: Overrides
Kinds of overrides:
‣ Always give this answer.
‣ Never give this answer.
Beware of ‘whack-a-mole’.
Be sad when overrides are used.
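A sketch of an override layer wrapped around a model; the tables and query format are hypothetical. Logging every firing is what lets you "be sad" about them and spot whack-a-mole:

```python
ALWAYS = {"support phone number": "+1-555-0100"}   # "always give this answer"
NEVER = {"offensive_term"}                         # "never give this answer"

def predict_with_overrides(model, query):
    if query in ALWAYS:
        print(f"override used for {query!r}")      # track usage; be sad
        return ALWAYS[query]
    answer = model(query)
    return None if answer in NEVER else answer     # suppress banned answers

print(predict_with_overrides(lambda q: "offensive_term", "some query"))  # None
```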
Productize: How and when to scale
Move from data parallelism to model parallelism: you first get more data, and only later more complex models.
Only scale the rest of the product when you're sure what problem you're solving.
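A hedged sketch of the first step, data parallelism in PyTorch: replicate the model and split each batch across devices. Model parallelism only becomes necessary once the model itself outgrows a single device:

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 10)               # stand-in for a real model
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)       # each forward() splits the batch over GPUs
```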
Productize: Milestones
1. By-hand examples
2. Glued together with some rules (Prototype)
3. Functions on some data ("Labs" / Alpha)
4. Measurable & inspectable (0.1% / early Beta)
5. Accurate, not slow, nice demo, documented & configurable (1% / late Beta)
6. Simple & fast (100% / GA)
7. Handles new domains (post-100% / post-GA)
Agile Deep Learning: Early Iterations (Spikes)
[Cycle diagram]
Agile Deep Learning: Middle Sprints
[Cycle diagram]
Agile Deep Learning: Late Sprints
[Cycle diagram]
Agile Deep Learning
[Cycle diagram]
Thanks! Questions?
David Murgatroyd (@dmurga)
Suggestions:
‣ Pros and cons of different team organization strategies?
‣ What are the different roles in a Deep Learning-oriented group?
‣ Does Scrum or Kanban work better for Deep Learning?
‣ What about "presentation bias" when measuring on historic data?
We're hiring in Boston, NYC, and Stockholm!
Appendix
David Murgatroyd (@dmurga)
How does deep learning affect team organization?
Option 1: integrated teams with cross-team groups (chapters!)
[Diagram: Machine Learning Experts embedded in integrated teams]
‣ Encourages alignment with business goals.
‣ Challenges machine learning collaboration, depth, and reuse.
‣ Best for products with many small, simpler models.
Option 2: independent machine learning team delivering models
‣ Encourages machine learning collaboration, depth, and reuse.
‣ Challenges alignment with business goals.
‣ Best for products with fewer large, complex model(s).
Just one kind of ML Expert?
Three kinds, by analogy: Carpenters, Blacksmiths, Miners.
Carpenters
An Applied Machine Learning Engineer:
● crafts specific (parts of) products
● by applying tools (e.g., libraries)
● to materials (e.g., data)
with an understanding of what sort of product is desired.
Blacksmiths
A Machine Learning Toolist (Engineer/Scientist):
● turns practical machine learning ideas into industrial-strength tools (like a blacksmith firing metal into carpentry tools)
● understands the latest in ML theory and prototypes it to see what's practical (like a blacksmith smelting ore into metal)
Miners
A Machine Learning Theoretician:
● distills new material from nature to be made into tools
● understands the fundamental characteristics of that material to inform its use
Carpenters → Applied ML Engineer
Blacksmiths → ML Toolist
Miners → ML Theoretician
