From Artificial Neural Networks to Deep Learning
Viet-Trung Tran

Perceptron
•  Rosenblatt, 1957
•  Input signals x1, x2, …
•  Bias input x0 = 1
•  Net input = weighted sum = Net(w, x)
•  Activation/transfer function = f(Net(w, x))
•  Output
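Written out (assuming the usual notation, with bias input x0 = 1 and bias weight w0), the net input referred to above is:

```latex
\mathrm{Net}(\mathbf{w}, \mathbf{x}) \;=\; \sum_{i=0}^{n} w_i x_i \;=\; w_0 + \sum_{i=1}^{n} w_i x_i, \qquad x_0 = 1 .
```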
(Figure: inputs feed a weighted sum, followed by a step function.)
Weighted Sum and Bias
•  Weighted sum
•  Bias 
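A minimal Python sketch tying the last two slides together (weighted sum with bias input x0 = 1, then the step activation). The logical-AND weights are my own illustration, not from the slides:

```python
# Minimal perceptron sketch: Net(w, x) = weighted sum including the bias,
# output = step(Net(w, x)).
import numpy as np

def step(net):
    # Hard-limiter / step function: 1 if the net input is non-negative, else 0
    return 1 if net >= 0 else 0

def perceptron(w, x):
    # w = [w0, w1, ..., wn], x = [x1, ..., xn]; prepend the bias input x0 = 1
    x = np.concatenate(([1.0], x))
    net = np.dot(w, x)          # Net(w, x): weighted sum including the bias
    return step(net)            # f(Net(w, x)): output of the activation

# Example (assumed, for illustration): a perceptron computing logical AND
w_and = np.array([-1.5, 1.0, 1.0])
print([perceptron(w_and, np.array(p)) for p in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# -> [0, 0, 0, 1]
```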
Hard-limiter function
•  Hard-limiter
– Threshold function
– Discontinuous function
– Discontinuous derivative

Threshold logic function
•  Saturating linear function
•  Continuous function
•  Discontinuous derivative
Sigmoid function
•  Most popular
•  Output (0,1)
•  Continuous derivatives
•  Easy to differentiate
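A compact sketch of the three activation functions from the last few slides. The saturating range [0, 1] for the threshold-logic function is my assumption; some texts use [−1, 1]:

```python
# Hard-limiter, saturating linear (threshold logic), and sigmoid activations.
import numpy as np

def hard_limiter(net):
    # Step / threshold function: discontinuous at 0, derivative zero elsewhere
    return np.where(net >= 0, 1.0, 0.0)

def saturating_linear(net):
    # Linear in the middle, clipped at 0 and 1; continuous,
    # but its derivative jumps at the two corners
    return np.clip(net, 0.0, 1.0)

def sigmoid(net):
    # Smooth, output in (0, 1), easy to differentiate: s * (1 - s)
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_derivative(net):
    s = sigmoid(net)
    return s * (1.0 - s)

print(hard_limiter(np.array([-1.0, 0.5])))            # [0. 1.]
print(saturating_linear(np.array([-1.0, 0.3, 2.0])))  # [0.  0.3 1. ]
print(sigmoid(np.array([0.0])))                       # [0.5]
```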
Artificial neural network (ANN) structure
•  Number of input/output signals
•  Number of hidden layers
•  Number of neurons per layer
•  Neuron weights
•  Topology
•  Biases
Feed-forward neural network
•  Connections between the units do not form a directed cycle
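A minimal feed-forward sketch: information flows strictly from inputs through one hidden layer to the outputs, with no directed cycle. Layer sizes and the sigmoid activation are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input (3) -> hidden (4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # hidden (4) -> output (2)

def forward(x):
    h = sigmoid(W1 @ x + b1)    # hidden layer activations
    y = sigmoid(W2 @ h + b2)    # output layer activations
    return y

print(forward(np.array([0.2, -0.5, 1.0])))
```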
Recurrent neural network
•  A class of artificial neural network where connections between units form a directed cycle
Why hidden layers
Neural network learning
•  Two types of learning
   – Parameter learning: learn the neuron connection weights
   – Structure learning: learn the ANN structure from training data
Error function
•  Consider an ANN with n neurons
•  For each training example (x, d)
   – Training error caused by the current weights w
•  Training error caused by w over the entire set of training examples
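The slide's formulas did not survive extraction; a common choice consistent with these bullets (an assumption on my part, the slide may use a different error) is the squared error:

```latex
% Error of the current weights w on one training example (x, d), with outputs o_j:
E_{(x,d)}(\mathbf{w}) \;=\; \tfrac{1}{2} \sum_{j} \bigl(d_j - o_j(\mathbf{w}, x)\bigr)^{2}
% Training error of w over the whole training set D:
E_{D}(\mathbf{w}) \;=\; \sum_{(x,d) \in D} E_{(x,d)}(\mathbf{w})
```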
Learning principle
Neuron error gradients
Parameter learning: back propagation of error
•  Calculate total error at the top
•  Calculate contributions to error at each step going backwards
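A compact backpropagation sketch for a one-hidden-layer network with sigmoid units and squared error; all sizes and the learning rate are assumptions for illustration. It computes the error at the output and propagates its gradient backwards:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.5, size=(4, 3))   # input (3) -> hidden (4)
W2 = rng.normal(scale=0.5, size=(2, 4))   # hidden (4) -> output (2)
lr = 0.5

def train_step(x, d):
    global W1, W2
    # Forward pass
    h = sigmoid(W1 @ x)
    o = sigmoid(W2 @ h)
    # Total error at the top
    error = 0.5 * np.sum((d - o) ** 2)
    # Backward pass: contributions to the error at each layer
    delta_o = (o - d) * o * (1 - o)           # dE/dnet at the output layer
    delta_h = (W2.T @ delta_o) * h * (1 - h)  # dE/dnet at the hidden layer
    # Gradient-descent weight updates
    W2 -= lr * np.outer(delta_o, h)
    W1 -= lr * np.outer(delta_h, x)
    return error

x, d = np.array([0.5, -1.0, 0.2]), np.array([1.0, 0.0])
print([round(train_step(x, d), 4) for _ in range(5)])   # error should shrink
```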
Back propagation discussion
•  Initial weights 
•  Learning rate
•  Number of neurons per hidden layers
•  Number of hidden layers
Stochastic gradient descent (SGD)
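The slide body did not extract; as a minimal illustration, SGD updates the weights after each (randomly chosen) training example rather than after a full pass over the data. The linear model, data, and learning rate below are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)

w, lr = np.zeros(3), 0.05
for epoch in range(20):
    for i in rng.permutation(len(X)):       # visit examples in random order
        grad = (X[i] @ w - y[i]) * X[i]     # gradient from a single example
        w -= lr * grad                      # immediate update (the "stochastic" part)
print(w.round(2))                           # ~ [ 1.  -2.   0.5]
```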
Deep learning
Google Brain
GPU
Learning from tagged data
•  @Andrew Ng
2006 breakthrough
•  More data
•  Faster hardware: GPUs, multi-core CPUs
•  Working ideas on how to train deep architectures
Deep Learning trends
•  @Andrew Ng
AI will transform the internet
•  @Andrew Ng
•  Technology areas with potential for paradigm shift:
   –  Computer vision
   –  Speech recognition & speech synthesis
   –  Language understanding: machine translation; web search; dialog systems; …
   –  Advertising
   –  Personalization/recommendation systems
   –  Robotics
•  All this is hard: scalability, algorithms.
Deep learning
CONVOLUTIONAL NEURAL NETWORK
http://colah.github.io/
Convolution
•  Convolution is a mathematical operation on two functions f and g, producing a third function that is typically viewed as a modified version of one of the original functions.
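A small numerical illustration of this definition; the signal and kernel are made up:

```python
# Discrete 1-D convolution: (f * g)[n] = sum_k f[k] * g[n - k].
# numpy.convolve implements this directly.
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])   # a signal
g = np.array([0.5, 0.5])             # a small averaging kernel

print(np.convolve(f, g, mode="valid"))   # -> [1.5 2.5 3.5]
```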
Convolutional neural networks
•  ConvNets are a kind of neural network that uses many identical copies of the same neuron
   – Large number of neurons
   – Large computational models
   – Number of actual weights (parameters) to be learned stays fairly small
A 2D Convolutional Neural Network
•  A convolutional neural network can learn a neuron once and use it in many places, making it easier to learn the model and reducing error.
Structure of Conv Nets
•  Problem
   – Predict whether a human is speaking or not
•  Input: audio samples at different points in time
Simple approach
•  Just connect them all to a fully-connected layer
•  Then classify
A more sophisticated approach
•  Local properties of the data
   –  Frequency of sounds (increasing/decreasing)
•  Look at a small window of the audio samples
   –  Create a group of neurons A to compute certain features
   –  The output of this convolutional layer is fed into a fully-connected layer F (sketched below)
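A sketch of this approach: the same small group of neurons A (shared weights) looks at every window of the audio, and a fully-connected layer F combines their outputs. All sizes and the final classifier are my assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
audio = rng.normal(size=18)             # raw audio samples over time
window = 9                              # samples seen by A at once
W_A = rng.normal(size=(4, window))      # group A: 4 feature detectors, shared everywhere
w_F = rng.normal(size=4 * (len(audio) - window + 1))  # fully-connected layer F

# Convolutional layer: apply the *same* neurons A at every position
features = np.array([sigmoid(W_A @ audio[t:t + window])
                     for t in range(len(audio) - window + 1)])

# Fully-connected layer F on top, e.g. "speaking or not"
speaking_prob = sigmoid(w_F @ features.ravel())
print(features.shape, round(float(speaking_prob), 3))
```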
Max pooling layer
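The slide body did not extract; as a minimal illustration, a max-pooling layer keeps only the strongest response in each small neighbourhood:

```python
# 1-D max pooling sketch: keep the maximum in each block of `size` values,
# shrinking the output and making it less sensitive to exact feature position.
import numpy as np

def max_pool_1d(x, size=2):
    trimmed = x[: len(x) // size * size]          # drop a ragged tail, if any
    return trimmed.reshape(-1, size).max(axis=1)  # max over each block

print(max_pool_1d(np.array([1.0, 3.0, 2.0, 5.0, 0.0, 4.0])))   # -> [3. 5. 4.]
```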
2D convolutional neural networks
Three-dimensional convolutional networks
Group of neurons: A
•  A bunch of neurons in parallel
•  They all get the same inputs and compute different features
Network in Network (Lin et al., 2013)
Conv Nets breakthroughs in computer vision
•  Krizhevsky et al. (2012)
Different Levels of Abstraction
RECURRENT NEURAL NETWORKS
http://colah.github.io/
Recurrent Neural Networks (RNN) have loops
•  A loop allows information to be passed from one step of the network to the next.
Unroll RNN
•  Recurrent neural networks are intimately related to sequences and lists.
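A minimal sketch of the unrolled view: the same weights are applied at every time step, and the hidden state carries information from one step to the next. The tanh recurrence below is the standard "vanilla" RNN; sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
W_xh = rng.normal(scale=0.5, size=(5, 3))   # input (3)  -> hidden (5)
W_hh = rng.normal(scale=0.5, size=(5, 5))   # hidden (5) -> hidden (5)

def rnn_unrolled(xs):
    h = np.zeros(5)
    hs = []
    for x in xs:                             # one copy of the cell per time step
        h = np.tanh(W_xh @ x + W_hh @ h)     # new state depends on input and old state
        hs.append(h)
    return hs

sequence = [rng.normal(size=3) for _ in range(4)]
hs = rnn_unrolled(sequence)
print(len(hs), hs[-1].shape)                 # 4 (5,)
```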
Examples
•  Predict the last word in "the clouds are in the sky"
•  The gap between the relevant information and the place that it's needed is small
•  RNNs can learn to use the past information
•  “I grew up in France… I speak fluent French.”
•  As the gap grows, RNNs become unable to
learn to connect the information. 
LONG SHORT-TERM MEMORY NETWORKS
LSTM Networks
LSTM networks
•  A special kind of RNN
•  Capable of learning long-term dependencies
•  Structured as a chain of repeating neural network modules
RNN
•  The repeating module has a very simple structure, such as a single tanh layer
•  The tanh(z) function is a rescaled version of the sigmoid; its output range is [−1, 1] instead of [0, 1].
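The rescaling can be written explicitly (σ denotes the sigmoid):

```latex
\tanh(z) \;=\; \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} \;=\; 2\,\sigma(2z) - 1,
\qquad \sigma(z) = \frac{1}{1 + e^{-z}} .
```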
LSTM networks
•  The repeating module consists of four neural network layers, interacting in a very special way
Core idea behind LSTMs
•  The key to LSTMs is the cell state, the horizontal line running through the top of the diagram
•  The cell state runs straight down the entire chain, with only some minor linear interactions
•  Easy for information to just flow along it unchanged
Gates
•  The ability to remove or add information to the cell state, carefully regulated by structures called gates
•  A sigmoid layer outputs how much of each component should be let through
   – Zero means let nothing through
   – One means let everything through
•  An LSTM has three of these gates
LSTM step 1
•  decide what information we’re going to throw
away from the cell state
•  forget gate layer
LSTM step 2
•  decide what new information we’re going to
store in the cell state
•  input gate layer
LSTM step 3
•  update the old cell state, Ct−1, into the new cell state Ct
LSTM step 4
•  decide what we're going to output
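Putting the four steps together, one LSTM step can be sketched as follows (a hedged illustration following the colah.github.io description; weight shapes and initialisation are assumptions, and real implementations also add bias terms and train all weights jointly):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(5)
n_in, n_hidden = 3, 4
# One weight matrix per gate, each acting on the concatenation [h_{t-1}, x_t]
W_f, W_i, W_c, W_o = (rng.normal(scale=0.5, size=(n_hidden, n_hidden + n_in))
                      for _ in range(4))

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z)              # step 1: what to throw away from C_{t-1}
    i_t = sigmoid(W_i @ z)              # step 2: which new values to store
    C_tilde = np.tanh(W_c @ z)          #         candidate values
    C_t = f_t * C_prev + i_t * C_tilde  # step 3: update the cell state
    o_t = sigmoid(W_o @ z)              # step 4: decide what to output
    h_t = o_t * np.tanh(C_t)            #         output is a filtered cell state
    return h_t, C_t

h, C = np.zeros(n_hidden), np.zeros(n_hidden)
for x in [rng.normal(size=n_in) for _ in range(3)]:
    h, C = lstm_step(x, h, C)
print(h.round(3), C.round(3))
```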
RECURRENT NEURAL NETWORKS WITH WORD EMBEDDINGS
APPENDIX
Perceptron 1957
Perceptron 1957
Perceptron 1986
Perceptron
Activation function
Back propagation 1974/1986
•  Inspired by the architectural depth of the brain, researchers wanted for decades to train deep multi-layer neural networks.
•  No successful attempts were reported before 2006 (exception: convolutional neural networks, LeCun 1998).
•  SVM: Vapnik and his co-workers developed the Support Vector Machine (1993), a shallow architecture.
•  Breakthrough in 2006!
2006 breakthrough
•  More data
•  Faster hardware: GPUs, multi-core CPUs
•  Working ideas on how to train deep architectures
•  Beat state of the art in many areas:
   – Language Modeling (2012, Mikolov et al)
   – Image Recognition (Krizhevsky won the 2012 ImageNet competition)
   – Sentiment Classification (2011, Socher et al)
   – Speech Recognition (2010, Dahl et al)
   – MNIST hand-written digit recognition (Ciresan et al, 2010)
Credits
•  Roelof Pieters, www.graph-technologies.com
•  Andrew Ng
•  http://colah.github.io/