From multiplication to convolutional networks
How to do ML with Theano
Today’s Talk 
● A motivating problem 
● Understanding a model-based framework 
● Theano 
○ Linear Regression 
○ Logistic Regression 
○ Net 
○ Modern Net 
○ Convolutional Net
Follow along 
Tutorial code at: 
https://github.com/Newmu/Theano-Tutorials 
Data at: 
http://yann.lecun.com/exdb/mnist/ 
Slides at: 
http://goo.gl/vuBQfe
A motivating problem 
How do we program a computer to recognize a picture of a 
handwritten digit as one of 0-9? 
What could we do?
A dataset - MNIST 
What if we have 60,000 of these images and their labels? 
X = images 
Y = labels 
X = (60000 x 784) #matrix (list of lists) 
Y = (60000) #vector (list) 
Given X as input, predict Y
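
To make the shapes concrete, here is a minimal sketch of loading the data. The `mnist` helper is the loader shipped in the tutorial repo's `load.py`; its exact signature here is an assumption.

```python
import numpy as np
from load import mnist  # loader from the tutorial repo (assumed API)

# trX/teX: flattened images, trY/teY: labels
trX, teX, trY, teY = mnist(onehot=False)

print(trX.shape)  # (60000, 784): each 28x28 image flattened to 784 pixels
print(trY.shape)  # (60000,): one integer label 0-9 per image
```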
An idea 
For each image, find the “most similar” image and guess 
that as the label. 
k-nearest neighbors: ~95% accuracy
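
The talk doesn't show code for this baseline, but as a sketch, scikit-learn gets there in a few lines (assuming the data from the previous slide is loaded as `trX`, `trY`, `teX`, `teY`):

```python
from sklearn.neighbors import KNeighborsClassifier

# "most similar image" = 1 nearest neighbor under Euclidean distance
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(trX, trY)            # just memorizes the training set
print(knn.score(teX, teY))   # roughly 0.95 on MNIST
```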
Trying things 
Make some functions that compute relevant information for 
solving the problem.
What we can code 
Make some functions that compute relevant information for 
solving the problem: feature engineering.
What we can code 
Hard-coded rules are brittle, and for many problems they 
aren't obvious or apparent.
Model 
A Machine Learning Framework 
Inputs → Computation (Model) → Outputs
A … model? - GoogLeNet 
(from arXiv:1409.4842v1 [cs.CV], 17 Sep 2014)
A very simple model 
Input → Computation → Output 
3 → mult by x → 12
Theano intro 
● imports 
● theano symbolic variable initialization 
● our model 
● compiling to a python function 
● usage
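
A sketch of the script behind these annotations (it mirrors `0_multiply.py` in the tutorial repo):

```python
# imports
import theano
from theano import tensor as T

# theano symbolic variable initialization
a = T.scalar()
b = T.scalar()

# our model
y = a * b

# compiling to a python function
multiply = theano.function(inputs=[a, b], outputs=y)

# usage
print(multiply(1, 2))  # 2.0
print(multiply(3, 3))  # 9.0
```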
Theano 
● imports 
● training data generation 
● symbolic variable initialization 
● our model 
● model parameter initialization 
● metric to be optimized by the model 
● learning signal for the parameter(s) 
● how to change the parameter based on the learning signal 
● compiling to a python function 
● iterate through the data 100 times, training the model on each input/output pair
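
Assembled into one script, roughly as in the repo's `1_linear_regression.py`:

```python
import numpy as np
import theano
from theano import tensor as T

# training data generation: y is about 2x plus noise
trX = np.linspace(-1, 1, 101)
trY = 2 * trX + np.random.randn(*trX.shape) * 0.33

# symbolic variable initialization
X = T.scalar()
Y = T.scalar()

# our model: a single multiplication
def model(X, w):
    return X * w

# model parameter initialization
w = theano.shared(np.asarray(0., dtype=theano.config.floatX))
y = model(X, w)

# metric to be optimized by the model: mean squared error
cost = T.mean(T.sqr(y - Y))

# learning signal for the parameter: gradient of the cost w.r.t. w
gradient = T.grad(cost=cost, wrt=w)

# how to change the parameter based on the learning signal
updates = [[w, w - gradient * 0.01]]

# compiling to a python function
train = theano.function(inputs=[X, Y], outputs=cost, updates=updates,
                        allow_input_downcast=True)

# iterate through the data 100 times, training on each (x, y) pair
for i in range(100):
    for x, y_ in zip(trX, trY):
        train(x, y_)

print(w.get_value())  # close to 2, the slope of the generated data
```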
Theano doing its thing
Logistic Regression 
y = softmax(T.dot(X, w)) 
Example output probabilities: Zero 0.1, One 0., Two 0., Three 0.1, Four 0., Five 0., Six 0., Seven 0., Eight 0.7, Nine 0.1
Back to Theano 
● convert to correct dtype 
● initialize model parameters 
● our model in matrix format 
● loading data matrices 
● now matrix types 
● probability outputs and argmax predictions 
● classification metric to optimize 
● compile prediction function 
● train on mini-batches of 128 examples
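
The full script, roughly as in `2_logistic_regression.py` (the `mnist` loader is the one from the repo's `load.py`):

```python
import numpy as np
import theano
from theano import tensor as T
from load import mnist  # data loader from the tutorial repo

# convert to correct dtype
def floatX(X):
    return np.asarray(X, dtype=theano.config.floatX)

# initialize model parameters: small random values
def init_weights(shape):
    return theano.shared(floatX(np.random.randn(*shape) * 0.01))

# our model in matrix format: a single softmax layer
def model(X, w):
    return T.nnet.softmax(T.dot(X, w))

# loading data matrices (labels one-hot encoded)
trX, teX, trY, teY = mnist(onehot=True)

# now matrix types
X = T.fmatrix()
Y = T.fmatrix()

w = init_weights((784, 10))

# probability outputs and argmax predictions
py_x = model(X, w)
y_pred = T.argmax(py_x, axis=1)

# classification metric to optimize
cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
gradient = T.grad(cost=cost, wrt=w)
update = [[w, w - gradient * 0.05]]

train = theano.function(inputs=[X, Y], outputs=cost, updates=update,
                        allow_input_downcast=True)
# compile prediction function
predict = theano.function(inputs=[X], outputs=y_pred,
                          allow_input_downcast=True)

# train on mini-batches of 128 examples
for i in range(100):
    for start, end in zip(range(0, len(trX), 128),
                          range(128, len(trX), 128)):
        train(trX[start:end], trY[start:end])
    print(np.mean(np.argmax(teY, axis=1) == predict(teX)))
```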
What it learns 
[visualizations of the learned weights for digits 0-9] 
Test Accuracy: 92.5%
An “old” net (circa 2000) 
h = T.nnet.sigmoid(T.dot(X, wh)) 
y = softmax(T.dot(h, wo)) 
Example output probabilities: Zero 0.0, One 0., Two 0., Three 0.1, Four 0., Five 0., Six 0., Seven 0., Eight 0.9, Nine 0.
Understanding SGD 
2D moons dataset, courtesy of scikit-learn
Understanding Sigmoid Units
An “old” net in Theano 
● generalize to compute gradient descent on all model parameters 
● 2 layers of computation: input -> hidden (sigmoid), hidden -> output (softmax) 
● initialize both weight matrices 
● updated version of updates
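
A sketch of what changes over logistic regression, in the style of `3_net.py` (reusing `init_weights`, `X`, and `Y` from the previous sketch):

```python
# generalize gradient descent to a list of model parameters
def sgd(cost, params, lr=0.05):
    grads = T.grad(cost=cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        updates.append([p, p - g * lr])
    return updates

# 2 layers of computation
def model(X, w_h, w_o):
    h = T.nnet.sigmoid(T.dot(X, w_h))    # input -> hidden (sigmoid)
    pyx = T.nnet.softmax(T.dot(h, w_o))  # hidden -> output (softmax)
    return pyx

# initialize both weight matrices
w_h = init_weights((784, 625))
w_o = init_weights((625, 10))

py_x = model(X, w_h, w_o)
cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))

# updated version of updates: one rule per parameter
updates = sgd(cost, [w_h, w_o])
```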
What an “old” net learns 
Test Accuracy: 98.4%
A “modern” net - 2012+ 
h = rectify(T.dot(X, wh)) 
h2 = rectify(T.dot(h, wh2)) 
y = softmax(T.dot(h2, wo)) 
Noise injected at every layer (or data augmentation) 
Example output probabilities: Zero 0.0, One 0., Two 0., Three 0.1, Four 0., Five 0., Six 0., Seven 0., Eight 0.9, Nine 0.
Understanding rectifier units
Understanding RMSprop 
2D moons dataset, courtesy of scikit-learn
A “modern” net in Theano 
● rectifier 
● numerically stable softmax 
● a running average of the magnitude of the gradient 
● scale the gradient based on the running average 
● randomly drop values and scale the rest 
● noise injected into the model 
● rectifiers now used 
● 2 hidden layers
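
Sketches of the new pieces, in the style of `4_modern_net.py` (the `RandomStreams` import path is the one Theano used at the time):

```python
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams

srng = RandomStreams()

# rectifier: max(0, x)
def rectify(X):
    return T.maximum(X, 0.)

# numerically stable softmax: subtract the row max before exponentiating
def softmax(X):
    e_x = T.exp(X - X.max(axis=1).dimshuffle(0, 'x'))
    return e_x / e_x.sum(axis=1).dimshuffle(0, 'x')

# randomly drop values and scale the rest
def dropout(X, p=0.):
    if p > 0:
        retain_prob = 1 - p
        X *= srng.binomial(X.shape, p=retain_prob,
                           dtype=theano.config.floatX)
        X /= retain_prob
    return X

def RMSprop(cost, params, lr=0.001, rho=0.9, epsilon=1e-6):
    grads = T.grad(cost=cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        # a running average of the (squared) magnitude of the gradient
        acc = theano.shared(p.get_value() * 0.)
        acc_new = rho * acc + (1 - rho) * g ** 2
        # scale the gradient based on the running average
        gradient_scaling = T.sqrt(acc_new + epsilon)
        g = g / gradient_scaling
        updates.append((acc, acc_new))
        updates.append((p, p - lr * g))
    return updates
```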
What a “modern” net learns 
Test Accuracy: 99.0%
Quantifying the difference
What a “modern” net is doing
Convolutional Networks 
(figure from deeplearning.net)
A convolutional network in Theano 
● a “block” of computation: conv -> activate -> pool -> noise 
● convert from 4-tensor to normal matrix 
● reshape into conv 4-tensor (b, c, 0, 1) format 
● now a 4-tensor for conv instead of a matrix 
● conv weights (n_kernels, n_channels, kernel_w, kernel_h) 
● highest conv layer has 128 filters and a 3x3 grid of responses 
● noise during training 
● no noise for prediction
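
A sketch of the model, in the style of `5_convolutional_net.py` (these `conv2d` / `max_pool_2d` import paths are the ones Theano used circa 2014; `rectify`, `softmax`, `dropout`, and `init_weights` come from the earlier sketches):

```python
from theano.tensor.nnet.conv import conv2d
from theano.tensor.signal.downsample import max_pool_2d

def model(X, w, w2, w3, w4, w_o, p_drop_conv, p_drop_hidden):
    # a "block" of computation: conv -> activate -> pool -> noise
    l1a = rectify(conv2d(X, w, border_mode='full'))
    l1 = max_pool_2d(l1a, (2, 2))
    l1 = dropout(l1, p_drop_conv)

    l2a = rectify(conv2d(l1, w2))
    l2 = max_pool_2d(l2a, (2, 2))
    l2 = dropout(l2, p_drop_conv)

    l3a = rectify(conv2d(l2, w3))
    l3b = max_pool_2d(l3a, (2, 2))
    # convert from 4-tensor to normal matrix for the dense layers
    l3 = T.flatten(l3b, outdim=2)
    l3 = dropout(l3, p_drop_conv)

    l4 = rectify(T.dot(l3, w4))
    l4 = dropout(l4, p_drop_hidden)

    pyx = softmax(T.dot(l4, w_o))
    return pyx

# reshape into conv 4-tensor (batch, channel, row, col) format
trX = trX.reshape(-1, 1, 28, 28)
teX = teX.reshape(-1, 1, 28, 28)

# now 4-tensor for conv instead of matrix
X = T.ftensor4()
Y = T.fmatrix()

# conv weights: (n_kernels, n_channels, kernel_w, kernel_h)
w = init_weights((32, 1, 3, 3))
w2 = init_weights((64, 32, 3, 3))
w3 = init_weights((128, 64, 3, 3))
# highest conv layer: 128 filters, each leaving a 3x3 grid of responses
w4 = init_weights((128 * 3 * 3, 625))
w_o = init_weights((625, 10))

# noise during training ...
noise_py_x = model(X, w, w2, w3, w4, w_o, 0.2, 0.5)
# ... no noise for prediction
py_x = model(X, w, w2, w3, w4, w_o, 0., 0.)
y_pred = T.argmax(py_x, axis=1)
```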
What a convolutional network learns 
Test Accuracy: 99.5%
Takeaways 
● A few tricks are needed to get good results 
○ Noise is important for regularization 
○ Rectifiers give faster, better learning 
○ Don't use plain SGD - lots of cheap, simple improvements exist 
● Models need room to compute. 
● If your data has structure, your model should 
respect it.
Resources 
● More in-depth Theano tutorials 
○ http://www.deeplearning.net/tutorial/ 
● Theano docs 
○ http://www.deeplearning.net/software/theano/library/ 
● Community 
○ http://www.reddit.com/r/machinelearning
A plug 
Keep up to date with indico: 
https://indico1.typeform.com/to/DgN5SP
Questions?
