Introduction to Hands-on Deep Learning
Imry Kissos
Algorithm Researcher
Outline
● Problem Definition
● Motivation
● Training a Regression DNN
● Training a Classification DNN
● Open Source Packages
● Summary + Questions
Problem Definition
Deep Convolutional Network Tutorial
● Goal: Detect facial landmarks on (normal) face images
● Data set provided by Dr. Yoshua Bengio
● Tutorial code available:
https://github.com/dnouri/kfkd-tutorial/blob/master/kfkd.py
Flow
Train Model (General) → Train Model (“Nose Tip”) → Train Model (“Mouth Corners”) → Predict Points on Test Set
Flow
Train Images + Train Points → Fit → Trained Net
Flow
Test Images → Predict → Predicted Points
Python DL Framework
● nolearn: wrapper to Lasagne (high level)
● Lasagne: Theano extension for Deep Learning
● Theano: define, optimize, and evaluate mathematical expressions
● cuDNN/CUDA: efficient GPU computation for DNNs (low level)
HW support: GPU & CPU
OS: Linux, OS X, Windows
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
4. Training the DNN
Training a Deep Neural Network
1. Data Analysis
a. Exploration + Validation
b. Pre-Processing
c. Batch and Split
2. Architecture Engineering
3. Optimization
4. Training the DNN
Data Exploration + Validation
Data:
● 7K gray-scale images of detected faces
● 96x96 pixels per image
● 15 landmarks per image (?)
Data validation:
● Some landmarks are missing
Pre-Processing
Data Normalization
Shuffle train data
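A minimal numpy sketch of this pre-processing, along the lines of the kfkd tutorial's data loading (the constants 255 and 48 match its 96x96 images and pixel/coordinate ranges; X and y are assumed to already hold the raw images and landmark coordinates):

    import numpy as np

    # assumption: X is (n_samples, 9216) raw pixels in [0, 255],
    #             y is (n_samples, 30) landmark coordinates in pixel units [0, 96]
    X = (X / 255.0).astype(np.float32)         # scale pixels to [0, 1]
    y = ((y - 48) / 48.0).astype(np.float32)   # scale coordinates to [-1, 1]

    # shuffle the train data so mini-batches are not ordered
    perm = np.random.permutation(len(X))
    X, y = X[perm], y[perm]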
Batch
● train batches
● validation batch
● test batches
⇐ one epoch’s data
train/valid/test splits are constant
Train / Validation Split
Classification: the train/validation split preserves class proportions
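A hedged sketch of such a stratified split with scikit-learn; the 80/20 ratio and the variable names are illustrative, not taken from the slides:

    from sklearn.model_selection import train_test_split

    # stratify=y keeps the class proportions identical in the train and validation sets
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)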
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
a. Layers Definition
b. Layers Implementation
3. Optimization
4. Training
Architecture
X → Conv → Pool → Dense → Output → Y
Layers Definition
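A minimal Lasagne-style sketch of such a layer definition (Lasagne is the Python stack used here); the layer sizes are illustrative rather than the tutorial's exact architecture, and the ReLU, dense, and dropout layers it uses are the topics of the next slides:

    from lasagne import layers, nonlinearities

    # illustrative conv/pool/dense/dropout stack for 96x96 gray-scale input
    net = layers.InputLayer(shape=(None, 1, 96, 96))
    net = layers.Conv2DLayer(net, num_filters=32, filter_size=(3, 3),
                             nonlinearity=nonlinearities.rectify)   # ReLU activation
    net = layers.MaxPool2DLayer(net, pool_size=(2, 2))
    net = layers.DropoutLayer(net, p=0.1)                           # dropout regularization
    net = layers.DenseLayer(net, num_units=500,
                            nonlinearity=nonlinearities.rectify)
    net = layers.DenseLayer(net, num_units=30, nonlinearity=None)   # 15 (x, y) landmarks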
Activation Function
ReLU
Dense Layer
Dropout
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
a. Back Propagation
b. Objective
c. SGD
d. Updates
e. Convergence Tuning
4. Training the DNN
Back Propagation
Forward Path
X → Conv → Dense → Output Points (Y)
Back Propagation
Forward Path
X → Conv → Dense → Output Points, compared with the Training Points (Y)
Back Propagation
Backward Path
Y → Dense → Conv → X (the loss gradient flows backward through the layers)
Back Propagation
Update
For all layers: W ← W − η · ∂Loss/∂W
Objective
Regression: mean squared error between the predicted and the ground-truth landmark coordinates (the learning curves later report its square root, RMSE).
S.G.D
Updates the network after each mini-batch
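A small numpy sketch of a single SGD step with classical momentum, to make the per-batch update concrete (the kfkd tutorial actually uses Nesterov momentum; the hyper-parameter defaults below are illustrative):

    import numpy as np

    def sgd_momentum_step(W, grad, velocity, learning_rate=0.01, momentum=0.9):
        """One mini-batch update: SGD with classical momentum.

        W        -- a layer's weight array
        grad     -- dLoss/dW back-propagated for the current mini-batch
        velocity -- running update direction, initialized to np.zeros_like(W)
        """
        velocity = momentum * velocity - learning_rate * grad
        return W + velocity, velocity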
Optimization - Updates
(animations comparing update rules, by Alec Radford)
Adjusting Learning Rate & Momentum
Linear in the epoch number
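A sketch of such a linear schedule, following the kfkd tutorial's AdjustVariable idea (the start/stop values 0.03→0.0001 for the learning rate and 0.9→0.999 for momentum are the tutorial's; epoch i simply reads its entry from each array):

    import numpy as np

    max_epochs = 400
    # learning rate decays linearly, momentum grows linearly, over the epochs
    learning_rates = np.linspace(0.03, 0.0001, max_epochs)
    momentums = np.linspace(0.9, 0.999, max_epochs)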
Convergence Tuning
● Stops according to the validation loss (early stopping)
● Returns the best weights
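A minimal sketch of patience-based early stopping, similar in spirit to the tutorial's EarlyStopping callback (the patience value and the get_weights helper are illustrative assumptions; restoring best_weights at the end is left to the caller):

    import numpy as np

    class EarlyStopping:
        """Stop when the validation loss has not improved for `patience` epochs,
        remembering the weights of the best epoch."""

        def __init__(self, patience=100):
            self.patience = patience
            self.best_loss = np.inf
            self.best_epoch = 0
            self.best_weights = None

        def check(self, epoch, valid_loss, get_weights):
            # get_weights: assumed callable returning a copy of the current weights
            if valid_loss < self.best_loss:
                self.best_loss, self.best_epoch = valid_loss, epoch
                self.best_weights = get_weights()
                return False                                  # improved: keep training
            return epoch > self.best_epoch + self.patience    # True means stop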
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
4. Training the DNN
a. Fit
b. Fine Tune Pre-Trained
c. Learning Curves
Fit
Each epoch:
● Loop over train batches: forward + backprop
● Loop over validation batches: forward only
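A framework-agnostic sketch of that loop; train_fn and valid_fn stand for compiled forward+backprop and forward-only functions (e.g. Theano functions), and they, X_train/y_train, and X_valid/y_valid are assumed to exist:

    import numpy as np

    def iterate_minibatches(X, y, batch_size, shuffle=False):
        """Yield consecutive mini-batches, optionally re-shuffled every epoch."""
        idx = np.arange(len(X))
        if shuffle:
            np.random.shuffle(idx)
        for start in range(0, len(X) - batch_size + 1, batch_size):
            sel = idx[start:start + batch_size]
            yield X[sel], y[sel]

    max_epochs = 400
    for epoch in range(max_epochs):
        train_losses = [train_fn(Xb, yb)   # forward + backprop + update
                        for Xb, yb in iterate_minibatches(X_train, y_train, 128, shuffle=True)]
        valid_losses = [valid_fn(Xb, yb)   # forward only
                        for Xb, yb in iterate_minibatches(X_valid, y_valid, 128)]
        print(epoch, np.mean(train_losses), np.mean(valid_losses))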
Fine Tune Pre-Trained
● Load pre-trained weights
● Change the output layer
● Fine-tune the specialist net
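A hedged Lasagne-style sketch of that idea; build_net is a hypothetical builder that constructs the same architecture with a different output size, and the parameter slicing assumes only the output layer's W and b change shape:

    from lasagne.layers import get_all_param_values, set_all_param_values

    general_net = build_net(num_outputs=30)      # hypothetical builder; general net, already trained
    specialist_net = build_net(num_outputs=8)    # e.g. a "mouth corners" specialist

    general_params = get_all_param_values(general_net)
    specialist_params = get_all_param_values(specialist_net)

    # keep every pre-trained parameter except the output layer's W and b
    merged = general_params[:-2] + specialist_params[-2:]
    set_all_param_values(specialist_net, merged)
    # ... then continue training (fine-tuning) specialist_net on its own targets ...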
Learning Curves
Loop over 6 nets:
(plot: RMSE vs. epochs)
Learning Curves Analysis
(plots: RMSE vs. epochs for Net 1 and Net 2; overfitting, convergence, jittering)
Part 1 Summary
Training a DNN:
Python
● Rich eco-system
● State-of-the-art
● Easy to port from prototype to production
https://github.com/yoavram/Py4Eng
Python DL Framework
Theano-based packages
Part 1 End
Break
Part 2
Outline
● Problem Definition
● Motivation
● Training a regression DNN
● Training a classification DNN
● Improving the DNN
● Open Source Packages
● Summary
Matlab DL Framework
● MatConvNet: open-source CNN toolbox by the Oxford VGG group (high level)
● MATLAB: numerical computing, GPU support via the Parallel Computing Toolbox
● cuDNN/CUDA: efficient GPU computation for DNNs (low level)
HW support: GPU & CPU
OS: Linux, OS X, Windows
Problem Statement
Classify a, b, …, z images into 26 classes:
http://www.robots.ox.ac.uk/~vgg/practicals/cnn/
Bonus - OCR:
Training a Deep Neural Network
1. Data Analysis
2. Training the DNN
3. Architecture Engineering
4. Optimization
Data Analysis
● A set field defines training vs. validation samples
● Class label: one uint per class, in [1, 26]
Data Pre-Processing
Image normalization: subtract a scalar mean
Training Flow
Customized Batch Loading
How would you add Data Augmentation?
trainOpts
Start from the last iteration if interrupted
initializeCharCnn()
Net Architecture
Layers:
● Conv
● Pool
● Conv
● Pool
● Conv
● ReLU
● Conv
● SoftMaxLoss
% f is the initial std of the weights W
Optimization
SoftMax
Scores in (-∞, ∞) → probabilities in [0, 1]
https://classroom.udacity.com/courses/ud730
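A small numpy sketch of the softmax; subtracting the maximum score is the usual numerical-stability trick (it becomes relevant again on the NaN slide later):

    import numpy as np

    def softmax(scores):
        """Map raw scores in (-inf, inf) to probabilities in [0, 1] that sum to 1."""
        shifted = scores - np.max(scores)   # numerical stability: avoids exp overflow
        exp = np.exp(shifted)
        return exp / np.sum(exp)

    print(softmax(np.array([2.0, 1.0, 0.1])))   # about [0.66, 0.24, 0.10]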
One Hot Encoding
Encode class labels
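A minimal numpy sketch of one-hot encoding for the 26 character classes (labels assumed to be in [1, 26], as on the data-analysis slide):

    import numpy as np

    def one_hot(labels, num_classes=26):
        """labels: class indices in [1, num_classes] -> (n, num_classes) 0/1 matrix."""
        encoded = np.zeros((len(labels), num_classes))
        encoded[np.arange(len(labels)), labels - 1] = 1.0   # shift to 0-based column index
        return encoded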
Cross Entropy
Distance measure between S(Y) and Labels
D(S, L) = −Σ_i L_i · log(S_i) = −log(S_t)
D(S, L) is a positive scalar
t - index of the ground-truth class
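A numpy sketch of that distance, matching the formula above (the small eps guards against log(0); probs is a softmax output S(Y), label_one_hot is the encoded label L):

    import numpy as np

    def cross_entropy(probs, label_one_hot, eps=1e-12):
        """D(S, L) = -sum_i L_i * log(S_i): a positive scalar, small when S favours the true class."""
        return float(-np.sum(label_one_hot * np.log(probs + eps)))

    print(cross_entropy(np.array([0.7, 0.2, 0.1]),
                        np.array([1.0, 0.0, 0.0])))   # -log(0.7), about 0.36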
In vl_nnloss.m:
Training Goal
Train the CNN to minimize the loss
Loss = average cross entropy
Minimize the loss with gradient descent: W ← W − η · ∂Loss/∂W
η - learning rate
Error Rate
Top-K: the target label is one of the top K predictions
The error rate is the fraction of samples whose target label is not in the top K predictions
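A numpy sketch of the top-K error rate (scores is one row of class scores per sample; labels are 0-based class indices here):

    import numpy as np

    def top_k_error(scores, labels, k=1):
        """Fraction of samples whose true label is NOT among the k highest-scoring classes."""
        top_k = np.argsort(scores, axis=1)[:, -k:]        # indices of the k largest scores per row
        hit = np.any(top_k == labels[:, None], axis=1)
        return 1.0 - np.mean(hit)

    scores = np.array([[0.1, 0.7, 0.2],
                       [0.5, 0.3, 0.2]])
    labels = np.array([1, 2])                  # the 2nd sample's true class is ranked last
    print(top_k_error(scores, labels, k=1))    # 0.5
    print(top_k_error(scores, labels, k=3))    # 0.0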
Loss & Error Convergence
(plots: loss and error rate vs. training epochs)
Learned Filters
OCR Evaluation
Beyond Training
1. Training a classification DNN
2. Improving the DNN
a. Analysis Capabilities
b. Augmentation
3. Open Source Packages
4. Summary
Basic VS Advanced Mode
(comparison of the basic and the advanced usage mode)
Improving the DNN
Very tempting:
● >1M images
● >1M parameters
● Large gap: Theory ↔ Practice
⇒ Brute-force experiments?!
Analysis Capabilities
1. Theoretical explanation
a. E.g. dropout/augmentation reduce overfitting
2. Empirical claims about a phenomenon
a. E.g. normalization helps convergence
3. Numerical understanding
a. E.g. exploding / vanishing updates
Reduce Overfitting
Solution: Data Augmentation
(plot: Net 1 vs. Net 2 RMSE over epochs, showing overfitting)
Data Augmentation
● Horizontal flip
● Perturbation
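A hedged numpy sketch of the horizontal-flip augmentation for the landmark task (the kfkd tutorial does this inside a custom batch iterator and additionally swaps left/right landmark indices, which is omitted here):

    import numpy as np

    def augment_horizontal_flip(Xb, yb, flip_prob=0.5):
        """Randomly mirror part of the batch horizontally.

        Xb: (batch, 1, 96, 96) images; yb: (batch, 30) landmark coords scaled to [-1, 1].
        """
        Xb, yb = Xb.copy(), yb.copy()
        flip = np.random.rand(len(Xb)) < flip_prob
        Xb[flip] = Xb[flip, :, :, ::-1]    # mirror the image columns
        yb[flip, ::2] = -yb[flip, ::2]     # mirror the x coordinates (every 2nd value)
        return Xb, yb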
Convergence Challenges
Need to monitor the forward + backward paths
(plots: RMSE vs. epochs; panels: Data Error, Normalization)
Deal with NaN
1. NaN within the first 100 iterations
a. The learning rate is too high
2. Beyond 100 iterations
a. Gradient explosion
i. Consider gradient clipping (see the sketch below)
b. Illegal math operation
i. SoftMax: inf/inf
ii. Division by zero in one of your customized layers
http://russellsstewart.com//notes/0.html
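A small numpy sketch of the gradient-clipping fix referenced above (the max_norm threshold is illustrative; the SoftMax inf/inf case is handled by the max-subtraction trick shown in the earlier softmax sketch):

    import numpy as np

    def clip_gradient(grad, max_norm=5.0):
        """Rescale the gradient if its L2 norm exceeds max_norm (gradient clipping).

        Keeps a single exploding update from turning the weights, and the loss, into NaN.
        """
        norm = np.linalg.norm(grad)
        if norm > max_norm:
            grad = grad * (max_norm / norm)
        return grad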
The Net Doesn’t Learn Anything
1. The training loss does not decrease after the first 100 iterations
a. Reduce the training set to 10 instances (images) and overfit it
i. Achieve 100% training accuracy on this small portion of the data
b. Change the batch size to 1 and monitor the error per batch
c. Solve the simplest version of your problem
http://russellsstewart.com//notes/0.html
Beyond Training
1. Training a classification DNN
2. Improving the DNN
3. Open Source Packages
a. DL Open Source Packages
b. Effort Estimation
4. Summary
Tips from Other Packages
● Torch: code organization
● Caffe: separation of configuration ↔ code
● NeuralNet → YAML text format defining the experiment’s configuration
DL Open Source Packages
● Caffe & MatConvNet: for applications
● Torch, TensorFlow, and Theano: for research on DL
http://fastml.com/torch-vs-theano/
(spectrum: simple DNN ↔ complex DNN)
Disruptive Effort Estimation
● Feature Engineering: modest SW infrastructure
● Deep Learning: huge SW infrastructure
Summary
● Dove into Training a DNN
● Presented Analysis Capabilities
● Reviewed Open Source Packages
References
Hinton's Coursera Neural Networks course
https://www.coursera.org/course/neuralnets
Udacity TensorFlow course
https://classroom.udacity.com/courses/ud730
Technion Deep Learning course
http://moodle.technion.ac.il/course/view.php?id=4128
Oxford Deep Learning course
https://www.youtube.com/playlist?list=PLE6Wd9FR--EfW8dtjAuPoTuPcqmOV53Fu
CS231n CNN for Visual Recognition
http://cs231n.github.io/
Deep Learning Book
http://www.deeplearningbook.org/
Questions?
Introduction to deep learning in Python and MATLAB