Introduction to Hands-on Deep Learning
Imry Kissos
Algorithm Researcher
Outline
● Problem Definition
● Motivation
● Training a Regression DNN
● Training a Classification DNN
● Open Source Packages
● Summary + Questions
Problem Definition
Deep Convolutional Network Tutorial
● Goal: Detect facial landmarks on (normal) face images
● Data set provided by Dr. Yoshua Bengio
● Tutorial code available:
https://github.com/dnouri/kfkd-tutorial/blob/master/kfkd.py
Flow
Train Model (General) → Train Model (“Nose Tip”) → Train Model (“Mouth Corners”) → Predict Points on Test Set
Flow
Train Images + Train Points → Fit → Trained Net
Flow
Test Images → Predict → Predicted Points
Python DL Framework
● nolearn: wrapper to Lasagne (high level)
● Lasagne: Theano extension for Deep Learning
● Theano: define, optimize, and evaluate mathematical expressions
● cuDNN/CUDA: efficient GPU computation for DNNs (low level)
HW support: GPU & CPU
OS: Linux, OS X, Windows
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
4. Training the DNN
Training a Deep Neural Network
1. Data Analysis
a. Exploration + Validation
b. Pre-Processing
c. Batch and Split
2. Architecture Engineering
3. Optimization
4. Training the DNN
Data Exploration + Validation
Data:
● 7K gray-scale images of detected faces
● 96x96 pixels per image
● 15 landmarks per image (?)
Data validation:
● Some landmarks are missing
Pre-Processing
Data Normalization
Shuffle train data
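A minimal numpy sketch of this pre-processing, along the lines of the kfkd tutorial's data loading (the constants 255 and 48 match its 96x96 images and pixel/coordinate ranges; X and y are assumed to already hold the raw images and landmark coordinates):

    import numpy as np

    # assumption: X is (n_samples, 9216) raw pixels in [0, 255],
    #             y is (n_samples, 30) landmark coordinates in pixel units [0, 96]
    X = (X / 255.0).astype(np.float32)         # scale pixels to [0, 1]
    y = ((y - 48) / 48.0).astype(np.float32)   # scale coordinates to [-1, 1]

    # shuffle the train data so mini-batches are not ordered
    perm = np.random.permutation(len(X))
    X, y = X[perm], y[perm]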
Batch
● train batches
● validation batch
● test batches
⇐ one epoch’s data
train/valid/test splits are constant
Train / Validation Split
Classification: the train/validation split preserves class proportions
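A hedged sketch of such a stratified split with scikit-learn; the 80/20 ratio and the variable names are illustrative, not taken from the slides:

    from sklearn.model_selection import train_test_split

    # stratify=y keeps the class proportions identical in the train and validation sets
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)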
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
a. Layers Definition
b. Layers Implementation
3. Optimization
4. Training
Architecture
X → Conv → Pool → Dense → Output → Y
Layers Definition
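A minimal Lasagne-style sketch of such a layer definition (Lasagne is the Python stack used here); the layer sizes are illustrative rather than the tutorial's exact architecture, and the ReLU, dense, and dropout layers it uses are the topics of the next slides:

    from lasagne import layers, nonlinearities

    # illustrative conv/pool/dense/dropout stack for 96x96 gray-scale input
    net = layers.InputLayer(shape=(None, 1, 96, 96))
    net = layers.Conv2DLayer(net, num_filters=32, filter_size=(3, 3),
                             nonlinearity=nonlinearities.rectify)   # ReLU activation
    net = layers.MaxPool2DLayer(net, pool_size=(2, 2))
    net = layers.DropoutLayer(net, p=0.1)                           # dropout regularization
    net = layers.DenseLayer(net, num_units=500,
                            nonlinearity=nonlinearities.rectify)
    net = layers.DenseLayer(net, num_units=30, nonlinearity=None)   # 15 (x, y) landmarks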
Activation Function
ReLU
Dense Layer
Dropout
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
a. Back Propagation
b. Objective
c. SGD
d. Updates
e. Convergence Tuning
4. Training the DNN
Back Propagation
Forward Path
X → Conv → Dense → Output Points (Y)
Back Propagation
Forward Path
X → Conv → Dense → Output Points, compared with the Training Points (Y)
Back Propagation
Backward Path
Y → Dense → Conv → X (the loss gradient flows backward through the layers)
Back Propagation
Update
For all layers: W ← W − η · ∂Loss/∂W
Objective
Regression: mean squared error between the predicted and the ground-truth landmark coordinates (the learning curves later report its square root, RMSE).
S.G.D
Updates the network after each mini-batch
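A small numpy sketch of a single SGD step with classical momentum, to make the per-batch update concrete (the kfkd tutorial actually uses Nesterov momentum; the hyper-parameter defaults below are illustrative):

    import numpy as np

    def sgd_momentum_step(W, grad, velocity, learning_rate=0.01, momentum=0.9):
        """One mini-batch update: SGD with classical momentum.

        W        -- a layer's weight array
        grad     -- dLoss/dW back-propagated for the current mini-batch
        velocity -- running update direction, initialized to np.zeros_like(W)
        """
        velocity = momentum * velocity - learning_rate * grad
        return W + velocity, velocity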
Optimization - Updates
(animations comparing update rules, by Alec Radford)
Adjusting Learning Rate & Momentum
Linear in the epoch number
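A sketch of such a linear schedule, following the kfkd tutorial's AdjustVariable idea (the start/stop values 0.03→0.0001 for the learning rate and 0.9→0.999 for momentum are the tutorial's; epoch i simply reads its entry from each array):

    import numpy as np

    max_epochs = 400
    # learning rate decays linearly, momentum grows linearly, over the epochs
    learning_rates = np.linspace(0.03, 0.0001, max_epochs)
    momentums = np.linspace(0.9, 0.999, max_epochs)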
Convergence Tuning
● Stops according to the validation loss (early stopping)
● Returns the best weights
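A minimal sketch of patience-based early stopping, similar in spirit to the tutorial's EarlyStopping callback (the patience value and the get_weights helper are illustrative assumptions; restoring best_weights at the end is left to the caller):

    import numpy as np

    class EarlyStopping:
        """Stop when the validation loss has not improved for `patience` epochs,
        remembering the weights of the best epoch."""

        def __init__(self, patience=100):
            self.patience = patience
            self.best_loss = np.inf
            self.best_epoch = 0
            self.best_weights = None

        def check(self, epoch, valid_loss, get_weights):
            # get_weights: assumed callable returning a copy of the current weights
            if valid_loss < self.best_loss:
                self.best_loss, self.best_epoch = valid_loss, epoch
                self.best_weights = get_weights()
                return False                                  # improved: keep training
            return epoch > self.best_epoch + self.patience    # True means stop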
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
4. Training the DNN
a. Fit
b. Fine Tune Pre-Trained
c. Learning Curves
Fit
Each epoch:
● Loop over train batches: forward + backprop
● Loop over validation batches: forward only
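A framework-agnostic sketch of that loop; train_fn and valid_fn stand for compiled forward+backprop and forward-only functions (e.g. Theano functions), and they, X_train/y_train, and X_valid/y_valid are assumed to exist:

    import numpy as np

    def iterate_minibatches(X, y, batch_size, shuffle=False):
        """Yield consecutive mini-batches, optionally re-shuffled every epoch."""
        idx = np.arange(len(X))
        if shuffle:
            np.random.shuffle(idx)
        for start in range(0, len(X) - batch_size + 1, batch_size):
            sel = idx[start:start + batch_size]
            yield X[sel], y[sel]

    max_epochs = 400
    for epoch in range(max_epochs):
        train_losses = [train_fn(Xb, yb)   # forward + backprop + update
                        for Xb, yb in iterate_minibatches(X_train, y_train, 128, shuffle=True)]
        valid_losses = [valid_fn(Xb, yb)   # forward only
                        for Xb, yb in iterate_minibatches(X_valid, y_valid, 128)]
        print(epoch, np.mean(train_losses), np.mean(valid_losses))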
Fine Tune Pre-Trained
● Load pre-trained weights
● Change the output layer
● Fine-tune the specialist net
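A hedged Lasagne-style sketch of that idea; build_net is a hypothetical builder that constructs the same architecture with a different output size, and the parameter slicing assumes only the output layer's W and b change shape:

    from lasagne.layers import get_all_param_values, set_all_param_values

    general_net = build_net(num_outputs=30)      # hypothetical builder; general net, already trained
    specialist_net = build_net(num_outputs=8)    # e.g. a "mouth corners" specialist

    general_params = get_all_param_values(general_net)
    specialist_params = get_all_param_values(specialist_net)

    # keep every pre-trained parameter except the output layer's W and b
    merged = general_params[:-2] + specialist_params[-2:]
    set_all_param_values(specialist_net, merged)
    # ... then continue training (fine-tuning) specialist_net on its own targets ...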
Learning Curves
Loop over 6 nets:
(plot: RMSE vs. epochs)
Learning Curves Analysis
(plots: RMSE vs. epochs for Net 1 and Net 2; overfitting, convergence, jittering)
Part 1 Summary
Training a DNN:
Python
● Rich eco-system
● State-of-the-art
● Easy to port from prototype to production
https://github.com/yoavram/Py4Eng
Python DL Framework
Theano-based packages
Part 1 End
Break
Part 2
Outline
● Problem Definition
● Motivation
● Training a regression DNN
● Training a classification DNN
● Improving the DNN
● Open Source Packages
● Summary
Matlab DL Framework
● MatConvNet: open-source CNN toolbox by the Oxford VGG group (high level)
● MATLAB: numerical computing, GPU support via the Parallel Computing Toolbox
● cuDNN/CUDA: efficient GPU computation for DNNs (low level)
HW support: GPU & CPU
OS: Linux, OS X, Windows
Problem Statement
Classify a, b, …, z images into 26 classes:
http://www.robots.ox.ac.uk/~vgg/practicals/cnn/
Bonus - OCR:
Training a Deep Neural Network
1. Data Analysis
2. Training the DNN
3. Architecture Engineering
4. Optimization
Data Analysis
● A set field defines training vs. validation samples
● Class label: one uint per class, in [1, 26]
Data Pre-Processing
Image normalization: subtract a scalar mean
Training Flow
Customized Batch Loading
How would you add Data Augmentation?
trainOpts
Start from the last iteration if interrupted
initializeCharCnn()
Net Architecture
Layers:
● Conv
● Pool
● Conv
● Pool
● Conv
● ReLU
● Conv
● SoftMaxLoss
% f is the initial std of the weights W
Optimization
SoftMax
Scores in (-∞, ∞) → probabilities in [0, 1]
https://classroom.udacity.com/courses/ud730
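A small numpy sketch of the softmax; subtracting the maximum score is the usual numerical-stability trick (it becomes relevant again on the NaN slide later):

    import numpy as np

    def softmax(scores):
        """Map raw scores in (-inf, inf) to probabilities in [0, 1] that sum to 1."""
        shifted = scores - np.max(scores)   # numerical stability: avoids exp overflow
        exp = np.exp(shifted)
        return exp / np.sum(exp)

    print(softmax(np.array([2.0, 1.0, 0.1])))   # about [0.66, 0.24, 0.10]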
One Hot Encoding
Encode class labels
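A minimal numpy sketch of one-hot encoding for the 26 character classes (labels assumed to be in [1, 26], as on the data-analysis slide):

    import numpy as np

    def one_hot(labels, num_classes=26):
        """labels: class indices in [1, num_classes] -> (n, num_classes) 0/1 matrix."""
        encoded = np.zeros((len(labels), num_classes))
        encoded[np.arange(len(labels)), labels - 1] = 1.0   # shift to 0-based column index
        return encoded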
Cross Entropy
Distance measure between S(Y) and Labels
D(S, L) = −Σ_i L_i · log(S_i) = −log(S_t)
D(S, L) is a positive scalar
t - index of the ground-truth class
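A numpy sketch of that distance, matching the formula above (the small eps guards against log(0); probs is a softmax output S(Y), label_one_hot is the encoded label L):

    import numpy as np

    def cross_entropy(probs, label_one_hot, eps=1e-12):
        """D(S, L) = -sum_i L_i * log(S_i): a positive scalar, small when S favours the true class."""
        return float(-np.sum(label_one_hot * np.log(probs + eps)))

    print(cross_entropy(np.array([0.7, 0.2, 0.1]),
                        np.array([1.0, 0.0, 0.0])))   # -log(0.7), about 0.36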
In vl_nnloss.m:
Training Goal
Train the CNN to minimize the loss
Loss = average cross entropy
Minimize the loss with gradient descent: W ← W − η · ∂Loss/∂W
η - learning rate
Error Rate
Top-K: the target label is one of the top K predictions
The error rate is the fraction of samples whose target label is not in the top K predictions
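A numpy sketch of the top-K error rate (scores is one row of class scores per sample; labels are 0-based class indices here):

    import numpy as np

    def top_k_error(scores, labels, k=1):
        """Fraction of samples whose true label is NOT among the k highest-scoring classes."""
        top_k = np.argsort(scores, axis=1)[:, -k:]        # indices of the k largest scores per row
        hit = np.any(top_k == labels[:, None], axis=1)
        return 1.0 - np.mean(hit)

    scores = np.array([[0.1, 0.7, 0.2],
                       [0.5, 0.3, 0.2]])
    labels = np.array([1, 2])                  # the 2nd sample's true class is ranked last
    print(top_k_error(scores, labels, k=1))    # 0.5
    print(top_k_error(scores, labels, k=3))    # 0.0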
Loss & Error Convergence
(plots: loss and error rate vs. training epochs)
Learned Filters
OCR Evaluation
Beyond Training
1. Training a classification DNN
2. Improving the DNN
a. Analysis Capabilities
b. Augmentation
3. Open Source Packages
4. Summary
Basic VS Advanced Mode
(comparison of the basic and the advanced usage mode)
Improving the DNN
Very tempting:
● >1M images
● >1M parameters
● Large gap: Theory ↔ Practice
⇒ Brute-force experiments?!
Analysis Capabilities
1. Theoretical explanation
a. E.g. dropout/augmentation reduce overfitting
2. Empirical claims about a phenomenon
a. E.g. normalization helps convergence
3. Numerical understanding
a. E.g. exploding / vanishing updates
Reduce Overfitting
Solution: Data Augmentation
(plot: Net 1 vs. Net 2 RMSE over epochs, showing overfitting)
Data Augmentation
● Horizontal flip
● Perturbation
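A hedged numpy sketch of the horizontal-flip augmentation for the landmark task (the kfkd tutorial does this inside a custom batch iterator and additionally swaps left/right landmark indices, which is omitted here):

    import numpy as np

    def augment_horizontal_flip(Xb, yb, flip_prob=0.5):
        """Randomly mirror part of the batch horizontally.

        Xb: (batch, 1, 96, 96) images; yb: (batch, 30) landmark coords scaled to [-1, 1].
        """
        Xb, yb = Xb.copy(), yb.copy()
        flip = np.random.rand(len(Xb)) < flip_prob
        Xb[flip] = Xb[flip, :, :, ::-1]    # mirror the image columns
        yb[flip, ::2] = -yb[flip, ::2]     # mirror the x coordinates (every 2nd value)
        return Xb, yb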
Convergence Challenges
Need to monitor the forward + backward paths
(plots: RMSE vs. epochs; panels: Data Error, Normalization)
Deal with NaN
1. NaN within the first 100 iterations
a. The learning rate is too high
2. Beyond 100 iterations
a. Gradient explosion
i. Consider gradient clipping (see the sketch below)
b. Illegal math operation
i. SoftMax: inf/inf
ii. Division by zero in one of your customized layers
http://russellsstewart.com//notes/0.html
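A small numpy sketch of the gradient-clipping fix referenced above (the max_norm threshold is illustrative; the SoftMax inf/inf case is handled by the max-subtraction trick shown in the earlier softmax sketch):

    import numpy as np

    def clip_gradient(grad, max_norm=5.0):
        """Rescale the gradient if its L2 norm exceeds max_norm (gradient clipping).

        Keeps a single exploding update from turning the weights, and the loss, into NaN.
        """
        norm = np.linalg.norm(grad)
        if norm > max_norm:
            grad = grad * (max_norm / norm)
        return grad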
The Net Doesn’t Learn Anything
1. The training loss does not decrease after the first 100 iterations
a. Reduce the training set to 10 instances (images) and overfit it
i. Achieve 100% training accuracy on this small portion of the data
b. Change the batch size to 1 and monitor the error per batch
c. Solve the simplest version of your problem
http://russellsstewart.com//notes/0.html
Beyond Training
1. Training a classification DNN
2. Improving the DNN
3. Open Source Packages
a. DL Open Source Packages
b. Effort Estimation
4. Summary
Tips from Other Packages
● Torch: code organization
● Caffe: separation of configuration ↔ code
● NeuralNet → YAML text format defining the experiment’s configuration
DL Open Source Packages
● Caffe & MatConvNet: for applications
● Torch, TensorFlow, and Theano: for research on DL
http://fastml.com/torch-vs-theano/
(spectrum: simple DNN ↔ complex DNN)
Disruptive Effort Estimation
● Feature Engineering: modest SW infrastructure
● Deep Learning: huge SW infrastructure
Summary
● Dove into Training a DNN
● Presented Analysis Capabilities
● Reviewed Open Source Packages
References
Hinton's Coursera Neural Networks course
https://www.coursera.org/course/neuralnets
Udacity TensorFlow course
https://classroom.udacity.com/courses/ud730
Technion Deep Learning course
http://moodle.technion.ac.il/course/view.php?id=4128
Oxford Deep Learning course
https://www.youtube.com/playlist?list=PLE6Wd9FR--EfW8dtjAuPoTuPcqmOV53Fu
CS231n CNN for Visual Recognition
http://cs231n.github.io/
Deep Learning Book
http://www.deeplearningbook.org/
Questions?
Introduction to deep learning in Python and MATLAB