From multiplication to convolutional networks
How to do ML with Theano
Today’s Talk 
● A motivating problem 
● Understanding a model-based framework 
● Theano 
○ Linear Regression 
○ Logistic Regression 
○ Net 
○ Modern Net 
○ Convolutional Net
Follow along 
Tutorial code at: 
https://github.com/Newmu/Theano-Tutorials 
Data at: 
http://yann.lecun.com/exdb/mnist/ 
Slides at: 
http://goo.gl/vuBQfe
A motivating problem 
How do we program a computer to recognize a picture of a 
handwritten digit as one of 0-9? 
What could we do?
A dataset - MNIST 
What if we have 60,000 of these images and their labels? 
X = images 
Y = labels 
X = (60000 x 784) #matrix (list of lists) 
Y = (60000) #vector (list) 
Given X as input, predict Y
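
To make the shapes concrete, here is a minimal sketch of loading the data. The `mnist` helper is the loader shipped in the tutorial repo's `load.py`; its exact signature here is an assumption.

```python
import numpy as np
from load import mnist  # loader from the tutorial repo (assumed API)

# trX/teX: flattened images, trY/teY: labels
trX, teX, trY, teY = mnist(onehot=False)

print(trX.shape)  # (60000, 784): each 28x28 image flattened to 784 pixels
print(trY.shape)  # (60000,): one integer label 0-9 per image
```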
An idea 
For each image, find the “most similar” image and guess 
that as the label. 
k-nearest neighbors: ~95% accuracy
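
The talk doesn't show code for this baseline, but as a sketch, scikit-learn gets there in a few lines (assuming the data from the previous slide is loaded as `trX`, `trY`, `teX`, `teY`):

```python
from sklearn.neighbors import KNeighborsClassifier

# "most similar image" = 1 nearest neighbor under Euclidean distance
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(trX, trY)            # just memorizes the training set
print(knn.score(teX, teY))   # roughly 0.95 on MNIST
```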
Trying things 
Make some functions that compute relevant information for 
solving the problem.
What we can code 
Make some functions that compute relevant information for 
solving the problem: feature engineering.
What we can code 
Hard-coded rules are brittle, and for many problems they 
aren't obvious or apparent.
Model 
A Machine Learning Framework 
Inputs → Computation (Model) → Outputs
A … model? - GoogLeNet 
(from arXiv:1409.4842v1 [cs.CV], 17 Sep 2014)
A very simple model 
Input → Computation → Output 
3 → mult by x → 12
Theano intro 
● imports 
● theano symbolic variable initialization 
● our model 
● compiling to a python function 
● usage
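
A sketch of the script behind these annotations (it mirrors `0_multiply.py` in the tutorial repo):

```python
# imports
import theano
from theano import tensor as T

# theano symbolic variable initialization
a = T.scalar()
b = T.scalar()

# our model
y = a * b

# compiling to a python function
multiply = theano.function(inputs=[a, b], outputs=y)

# usage
print(multiply(1, 2))  # 2.0
print(multiply(3, 3))  # 9.0
```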
Theano 
● imports 
● training data generation 
● symbolic variable initialization 
● our model 
● model parameter initialization 
● metric to be optimized by the model 
● learning signal for the parameter(s) 
● how to change the parameter based on the learning signal 
● compiling to a python function 
● iterate through the data 100 times, training the model on each input/output pair
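
Assembled into one script, roughly as in the repo's `1_linear_regression.py`:

```python
import numpy as np
import theano
from theano import tensor as T

# training data generation: y is about 2x plus noise
trX = np.linspace(-1, 1, 101)
trY = 2 * trX + np.random.randn(*trX.shape) * 0.33

# symbolic variable initialization
X = T.scalar()
Y = T.scalar()

# our model: a single multiplication
def model(X, w):
    return X * w

# model parameter initialization
w = theano.shared(np.asarray(0., dtype=theano.config.floatX))
y = model(X, w)

# metric to be optimized by the model: mean squared error
cost = T.mean(T.sqr(y - Y))

# learning signal for the parameter: gradient of the cost w.r.t. w
gradient = T.grad(cost=cost, wrt=w)

# how to change the parameter based on the learning signal
updates = [[w, w - gradient * 0.01]]

# compiling to a python function
train = theano.function(inputs=[X, Y], outputs=cost, updates=updates,
                        allow_input_downcast=True)

# iterate through the data 100 times, training on each (x, y) pair
for i in range(100):
    for x, y_ in zip(trX, trY):
        train(x, y_)

print(w.get_value())  # close to 2, the slope of the generated data
```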
Theano doing its thing
Logistic Regression 
y = softmax(T.dot(X, w)) 
Example output probabilities: Zero 0.1, One 0., Two 0., Three 0.1, Four 0., Five 0., Six 0., Seven 0., Eight 0.7, Nine 0.1
Back to Theano 
● convert to correct dtype 
● initialize model parameters 
● our model in matrix format 
● loading data matrices 
● now matrix types 
● probability outputs and argmax predictions 
● classification metric to optimize 
● compile prediction function 
● train on mini-batches of 128 examples
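
The full script, roughly as in `2_logistic_regression.py` (the `mnist` loader is the one from the repo's `load.py`):

```python
import numpy as np
import theano
from theano import tensor as T
from load import mnist  # data loader from the tutorial repo

# convert to correct dtype
def floatX(X):
    return np.asarray(X, dtype=theano.config.floatX)

# initialize model parameters: small random values
def init_weights(shape):
    return theano.shared(floatX(np.random.randn(*shape) * 0.01))

# our model in matrix format: a single softmax layer
def model(X, w):
    return T.nnet.softmax(T.dot(X, w))

# loading data matrices (labels one-hot encoded)
trX, teX, trY, teY = mnist(onehot=True)

# now matrix types
X = T.fmatrix()
Y = T.fmatrix()

w = init_weights((784, 10))

# probability outputs and argmax predictions
py_x = model(X, w)
y_pred = T.argmax(py_x, axis=1)

# classification metric to optimize
cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
gradient = T.grad(cost=cost, wrt=w)
update = [[w, w - gradient * 0.05]]

train = theano.function(inputs=[X, Y], outputs=cost, updates=update,
                        allow_input_downcast=True)
# compile prediction function
predict = theano.function(inputs=[X], outputs=y_pred,
                          allow_input_downcast=True)

# train on mini-batches of 128 examples
for i in range(100):
    for start, end in zip(range(0, len(trX), 128),
                          range(128, len(trX), 128)):
        train(trX[start:end], trY[start:end])
    print(np.mean(np.argmax(teY, axis=1) == predict(teX)))
```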
What it learns 
[visualizations of the learned weights for digits 0-9] 
Test Accuracy: 92.5%
An “old” net (circa 2000) 
h = T.nnet.sigmoid(T.dot(X, wh)) 
y = softmax(T.dot(h, wo)) 
Example output probabilities: Zero 0.0, One 0., Two 0., Three 0.1, Four 0., Five 0., Six 0., Seven 0., Eight 0.9, Nine 0.
Understanding SGD 
2D moons dataset, courtesy of scikit-learn
Understanding Sigmoid Units
An “old” net in Theano 
● generalize to compute gradient descent on all model parameters 
● 2 layers of computation: input -> hidden (sigmoid), hidden -> output (softmax) 
● initialize both weight matrices 
● updated version of updates
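
A sketch of what changes over logistic regression, in the style of `3_net.py` (reusing `init_weights`, `X`, and `Y` from the previous sketch):

```python
# generalize gradient descent to a list of model parameters
def sgd(cost, params, lr=0.05):
    grads = T.grad(cost=cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        updates.append([p, p - g * lr])
    return updates

# 2 layers of computation
def model(X, w_h, w_o):
    h = T.nnet.sigmoid(T.dot(X, w_h))    # input -> hidden (sigmoid)
    pyx = T.nnet.softmax(T.dot(h, w_o))  # hidden -> output (softmax)
    return pyx

# initialize both weight matrices
w_h = init_weights((784, 625))
w_o = init_weights((625, 10))

py_x = model(X, w_h, w_o)
cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))

# updated version of updates: one rule per parameter
updates = sgd(cost, [w_h, w_o])
```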
What an “old” net learns 
Test Accuracy: 98.4%
A “modern” net - 2012+ 
h = rectify(T.dot(X, wh)) 
h2 = rectify(T.dot(h, wh2)) 
y = softmax(T.dot(h2, wo)) 
Noise injected at every layer (or data augmentation) 
Example output probabilities: Zero 0.0, One 0., Two 0., Three 0.1, Four 0., Five 0., Six 0., Seven 0., Eight 0.9, Nine 0.
Understanding rectifier units
Understanding RMSprop 
2D moons dataset, courtesy of scikit-learn
A “modern” net in Theano 
● rectifier 
● numerically stable softmax 
● a running average of the magnitude of the gradient 
● scale the gradient based on the running average 
● randomly drop values and scale the rest 
● noise injected into the model 
● rectifiers now used 
● 2 hidden layers
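
Sketches of the new pieces, in the style of `4_modern_net.py` (the `RandomStreams` import path is the one Theano used at the time):

```python
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams

srng = RandomStreams()

# rectifier: max(0, x)
def rectify(X):
    return T.maximum(X, 0.)

# numerically stable softmax: subtract the row max before exponentiating
def softmax(X):
    e_x = T.exp(X - X.max(axis=1).dimshuffle(0, 'x'))
    return e_x / e_x.sum(axis=1).dimshuffle(0, 'x')

# randomly drop values and scale the rest
def dropout(X, p=0.):
    if p > 0:
        retain_prob = 1 - p
        X *= srng.binomial(X.shape, p=retain_prob,
                           dtype=theano.config.floatX)
        X /= retain_prob
    return X

def RMSprop(cost, params, lr=0.001, rho=0.9, epsilon=1e-6):
    grads = T.grad(cost=cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        # a running average of the (squared) magnitude of the gradient
        acc = theano.shared(p.get_value() * 0.)
        acc_new = rho * acc + (1 - rho) * g ** 2
        # scale the gradient based on the running average
        gradient_scaling = T.sqrt(acc_new + epsilon)
        g = g / gradient_scaling
        updates.append((acc, acc_new))
        updates.append((p, p - lr * g))
    return updates
```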
What a “modern” net learns 
Test Accuracy: 99.0%
Quantifying the difference
What a “modern” net is doing
Convolutional Networks 
(figure from deeplearning.net)
A convolutional network in Theano 
● a “block” of computation: conv -> activate -> pool -> noise 
● convert from 4-tensor to normal matrix 
● reshape into conv 4-tensor (b, c, 0, 1) format 
● now a 4-tensor for conv instead of a matrix 
● conv weights (n_kernels, n_channels, kernel_w, kernel_h) 
● highest conv layer has 128 filters and a 3x3 grid of responses 
● noise during training 
● no noise for prediction
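
A sketch of the model, in the style of `5_convolutional_net.py` (these `conv2d` / `max_pool_2d` import paths are the ones Theano used circa 2014; `rectify`, `softmax`, `dropout`, and `init_weights` come from the earlier sketches):

```python
from theano.tensor.nnet.conv import conv2d
from theano.tensor.signal.downsample import max_pool_2d

def model(X, w, w2, w3, w4, w_o, p_drop_conv, p_drop_hidden):
    # a "block" of computation: conv -> activate -> pool -> noise
    l1a = rectify(conv2d(X, w, border_mode='full'))
    l1 = max_pool_2d(l1a, (2, 2))
    l1 = dropout(l1, p_drop_conv)

    l2a = rectify(conv2d(l1, w2))
    l2 = max_pool_2d(l2a, (2, 2))
    l2 = dropout(l2, p_drop_conv)

    l3a = rectify(conv2d(l2, w3))
    l3b = max_pool_2d(l3a, (2, 2))
    # convert from 4-tensor to normal matrix for the dense layers
    l3 = T.flatten(l3b, outdim=2)
    l3 = dropout(l3, p_drop_conv)

    l4 = rectify(T.dot(l3, w4))
    l4 = dropout(l4, p_drop_hidden)

    pyx = softmax(T.dot(l4, w_o))
    return pyx

# reshape into conv 4-tensor (batch, channel, row, col) format
trX = trX.reshape(-1, 1, 28, 28)
teX = teX.reshape(-1, 1, 28, 28)

# now 4-tensor for conv instead of matrix
X = T.ftensor4()
Y = T.fmatrix()

# conv weights: (n_kernels, n_channels, kernel_w, kernel_h)
w = init_weights((32, 1, 3, 3))
w2 = init_weights((64, 32, 3, 3))
w3 = init_weights((128, 64, 3, 3))
# highest conv layer: 128 filters, each leaving a 3x3 grid of responses
w4 = init_weights((128 * 3 * 3, 625))
w_o = init_weights((625, 10))

# noise during training ...
noise_py_x = model(X, w, w2, w3, w4, w_o, 0.2, 0.5)
# ... no noise for prediction
py_x = model(X, w, w2, w3, w4, w_o, 0., 0.)
y_pred = T.argmax(py_x, axis=1)
```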
What a convolutional network learns 
Test Accuracy: 99.5%
Takeaways 
● A few tricks are needed to get good results 
○ Noise is important for regularization 
○ Rectifiers give faster, better learning 
○ Don't use plain SGD - lots of cheap, simple improvements exist 
● Models need room to compute. 
● If your data has structure, your model should 
respect it.
Resources 
● More in-depth Theano tutorials 
○ http://www.deeplearning.net/tutorial/ 
● Theano docs 
○ http://www.deeplearning.net/software/theano/library/ 
● Community 
○ http://www.reddit.com/r/machinelearning
A plug 
Keep up to date with indico: 
https://indico1.typeform.com/to/DgN5SP
Questions?
