From Artificial Neural Networks to Deep Learning
Viet-Trung Tran

Perceptron
•  Rosenblatt, 1957
•  Input signals x1, x2, …
•  Bias input x0 = 1
•  Net input = weighted sum = Net(w, x)
•  Activation/transfer function = f(Net(w, x))
•  Output
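Written out (assuming the usual notation, with bias input x0 = 1 and bias weight w0), the net input referred to above is:

```latex
\mathrm{Net}(\mathbf{w}, \mathbf{x}) \;=\; \sum_{i=0}^{n} w_i x_i \;=\; w_0 + \sum_{i=1}^{n} w_i x_i, \qquad x_0 = 1 .
```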
(Figure: inputs feed a weighted sum, followed by a step function.)
Weighted Sum and Bias
•  Weighted sum
•  Bias 
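A minimal Python sketch tying the last two slides together (weighted sum with bias input x0 = 1, then the step activation). The logical-AND weights are my own illustration, not from the slides:

```python
# Minimal perceptron sketch: Net(w, x) = weighted sum including the bias,
# output = step(Net(w, x)).
import numpy as np

def step(net):
    # Hard-limiter / step function: 1 if the net input is non-negative, else 0
    return 1 if net >= 0 else 0

def perceptron(w, x):
    # w = [w0, w1, ..., wn], x = [x1, ..., xn]; prepend the bias input x0 = 1
    x = np.concatenate(([1.0], x))
    net = np.dot(w, x)          # Net(w, x): weighted sum including the bias
    return step(net)            # f(Net(w, x)): output of the activation

# Example (assumed, for illustration): a perceptron computing logical AND
w_and = np.array([-1.5, 1.0, 1.0])
print([perceptron(w_and, np.array(p)) for p in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# -> [0, 0, 0, 1]
```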
Hard-limiter function
•  Hard-limiter
– Threshold function
– Discontinuous function
– Discontinuous derivative

Threshold logic function
•  Saturating linear function
•  Continuous function
•  Discontinuous derivative
Sigmoid function
•  Most popular
•  Output (0,1)
•  Continuous derivatives
•  Easy to differentiate
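A compact sketch of the three activation functions from the last few slides. The saturating range [0, 1] for the threshold-logic function is my assumption; some texts use [−1, 1]:

```python
# Hard-limiter, saturating linear (threshold logic), and sigmoid activations.
import numpy as np

def hard_limiter(net):
    # Step / threshold function: discontinuous at 0, derivative zero elsewhere
    return np.where(net >= 0, 1.0, 0.0)

def saturating_linear(net):
    # Linear in the middle, clipped at 0 and 1; continuous,
    # but its derivative jumps at the two corners
    return np.clip(net, 0.0, 1.0)

def sigmoid(net):
    # Smooth, output in (0, 1), easy to differentiate: s * (1 - s)
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_derivative(net):
    s = sigmoid(net)
    return s * (1.0 - s)

print(hard_limiter(np.array([-1.0, 0.5])))            # [0. 1.]
print(saturating_linear(np.array([-1.0, 0.3, 2.0])))  # [0.  0.3 1. ]
print(sigmoid(np.array([0.0])))                       # [0.5]
```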
Artificial neural network (ANN) structure
•  Number of input/output signals
•  Number of hidden layers
•  Number of neurons per layer
•  Neuron weights
•  Topology
•  Biases
Feed-forward neural network
•  Connections between the units do not form a directed cycle
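A minimal feed-forward sketch: information flows strictly from inputs through one hidden layer to the outputs, with no directed cycle. Layer sizes and the sigmoid activation are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input (3) -> hidden (4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # hidden (4) -> output (2)

def forward(x):
    h = sigmoid(W1 @ x + b1)    # hidden layer activations
    y = sigmoid(W2 @ h + b2)    # output layer activations
    return y

print(forward(np.array([0.2, -0.5, 1.0])))
```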
Recurrent neural network
•  A class of artificial neural network where connections between units form a directed cycle
Why hidden layers
Neural network learning
•  Two types of learning
   – Parameter learning: learn the neuron connection weights
   – Structure learning: learn the ANN structure from training data
Error function
•  Consider an ANN with n neurons
•  For each training example (x, d)
   – Training error caused by the current weights w
•  Training error caused by w over the entire set of training examples
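The slide's formulas did not survive extraction; a common choice consistent with these bullets (an assumption on my part, the slide may use a different error) is the squared error:

```latex
% Error of the current weights w on one training example (x, d), with outputs o_j:
E_{(x,d)}(\mathbf{w}) \;=\; \tfrac{1}{2} \sum_{j} \bigl(d_j - o_j(\mathbf{w}, x)\bigr)^{2}
% Training error of w over the whole training set D:
E_{D}(\mathbf{w}) \;=\; \sum_{(x,d) \in D} E_{(x,d)}(\mathbf{w})
```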
Learning principle
Neuron error gradients
Parameter learning: back propagation of error
•  Calculate total error at the top
•  Calculate contributions to error at each step going backwards
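A compact backpropagation sketch for a one-hidden-layer network with sigmoid units and squared error; all sizes and the learning rate are assumptions for illustration. It computes the error at the output and propagates its gradient backwards:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.5, size=(4, 3))   # input (3) -> hidden (4)
W2 = rng.normal(scale=0.5, size=(2, 4))   # hidden (4) -> output (2)
lr = 0.5

def train_step(x, d):
    global W1, W2
    # Forward pass
    h = sigmoid(W1 @ x)
    o = sigmoid(W2 @ h)
    # Total error at the top
    error = 0.5 * np.sum((d - o) ** 2)
    # Backward pass: contributions to the error at each layer
    delta_o = (o - d) * o * (1 - o)           # dE/dnet at the output layer
    delta_h = (W2.T @ delta_o) * h * (1 - h)  # dE/dnet at the hidden layer
    # Gradient-descent weight updates
    W2 -= lr * np.outer(delta_o, h)
    W1 -= lr * np.outer(delta_h, x)
    return error

x, d = np.array([0.5, -1.0, 0.2]), np.array([1.0, 0.0])
print([round(train_step(x, d), 4) for _ in range(5)])   # error should shrink
```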
Back propagation discussion
•  Initial weights 
•  Learning rate
•  Number of neurons per hidden layers
•  Number of hidden layers
Stochastic gradient descent (SGD)
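The slide body did not extract; as a minimal illustration, SGD updates the weights after each (randomly chosen) training example rather than after a full pass over the data. The linear model, data, and learning rate below are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)

w, lr = np.zeros(3), 0.05
for epoch in range(20):
    for i in rng.permutation(len(X)):       # visit examples in random order
        grad = (X[i] @ w - y[i]) * X[i]     # gradient from a single example
        w -= lr * grad                      # immediate update (the "stochastic" part)
print(w.round(2))                           # ~ [ 1.  -2.   0.5]
```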
Deep learning
Google Brain
GPU
Learning from tagged data
•  @Andrew Ng
2006 breakthrough
•  More data
•  Faster hardware: GPUs, multi-core CPUs
•  Working ideas on how to train deep architectures
Deep Learning trends
•  @Andrew Ng
AI will transform the internet
•  @Andrew Ng
•  Technology areas with potential for paradigm shift:
   –  Computer vision
   –  Speech recognition & speech synthesis
   –  Language understanding: machine translation; web search; dialog systems; …
   –  Advertising
   –  Personalization/recommendation systems
   –  Robotics
•  All this is hard: scalability, algorithms.
Deep learning
CONVOLUTIONAL NEURAL NETWORK
http://colah.github.io/
Convolution
•  Convolution is a mathematical operation on two functions f and g, producing a third function that is typically viewed as a modified version of one of the original functions.
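A small numerical illustration of this definition; the signal and kernel are made up:

```python
# Discrete 1-D convolution: (f * g)[n] = sum_k f[k] * g[n - k].
# numpy.convolve implements this directly.
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])   # a signal
g = np.array([0.5, 0.5])             # a small averaging kernel

print(np.convolve(f, g, mode="valid"))   # -> [1.5 2.5 3.5]
```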
Convolutional neural networks
•  ConvNets are a kind of neural network that uses many identical copies of the same neuron
   – Large number of neurons
   – Large computational models
   – Number of actual weights (parameters) to be learned stays fairly small
A 2D Convolutional Neural Network
•  A convolutional neural network can learn a neuron once and use it in many places, making it easier to learn the model and reducing error.
Structure of Conv Nets
•  Problem
   – Predict whether a human is speaking or not
•  Input: audio samples at different points in time
Simple approach
•  Just connect them all to a fully-connected layer
•  Then classify
A more sophisticated approach
•  Local properties of the data
   –  Frequency of sounds (increasing/decreasing)
•  Look at a small window of the audio samples
   –  Create a group of neurons A to compute certain features
   –  The output of this convolutional layer is fed into a fully-connected layer F (sketched below)
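A sketch of this approach: the same small group of neurons A (shared weights) looks at every window of the audio, and a fully-connected layer F combines their outputs. All sizes and the final classifier are my assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
audio = rng.normal(size=18)             # raw audio samples over time
window = 9                              # samples seen by A at once
W_A = rng.normal(size=(4, window))      # group A: 4 feature detectors, shared everywhere
w_F = rng.normal(size=4 * (len(audio) - window + 1))  # fully-connected layer F

# Convolutional layer: apply the *same* neurons A at every position
features = np.array([sigmoid(W_A @ audio[t:t + window])
                     for t in range(len(audio) - window + 1)])

# Fully-connected layer F on top, e.g. "speaking or not"
speaking_prob = sigmoid(w_F @ features.ravel())
print(features.shape, round(float(speaking_prob), 3))
```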
Max pooling layer
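The slide body did not extract; as a minimal illustration, a max-pooling layer keeps only the strongest response in each small neighbourhood:

```python
# 1-D max pooling sketch: keep the maximum in each block of `size` values,
# shrinking the output and making it less sensitive to exact feature position.
import numpy as np

def max_pool_1d(x, size=2):
    trimmed = x[: len(x) // size * size]          # drop a ragged tail, if any
    return trimmed.reshape(-1, size).max(axis=1)  # max over each block

print(max_pool_1d(np.array([1.0, 3.0, 2.0, 5.0, 0.0, 4.0])))   # -> [3. 5. 4.]
```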
2D convolutional neural networks
Three-dimensional convolutional networks
Group of neurons: A
•  A bunch of neurons in parallel
•  They all get the same inputs and compute different features
Network in Network (Lin et al., 2013)
Conv Nets breakthroughs in computer vision
•  Krizhevsky et al. (2012)
Different Levels of Abstraction
RECURRENT NEURAL NETWORKS
http://colah.github.io/
Recurrent Neural Networks (RNN) have loops
•  A loop allows information to be passed from one step of the network to the next.
Unroll RNN
•  Recurrent neural networks are intimately related to sequences and lists.
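A minimal sketch of the unrolled view: the same weights are applied at every time step, and the hidden state carries information from one step to the next. The tanh recurrence below is the standard "vanilla" RNN; sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
W_xh = rng.normal(scale=0.5, size=(5, 3))   # input (3)  -> hidden (5)
W_hh = rng.normal(scale=0.5, size=(5, 5))   # hidden (5) -> hidden (5)

def rnn_unrolled(xs):
    h = np.zeros(5)
    hs = []
    for x in xs:                             # one copy of the cell per time step
        h = np.tanh(W_xh @ x + W_hh @ h)     # new state depends on input and old state
        hs.append(h)
    return hs

sequence = [rng.normal(size=3) for _ in range(4)]
hs = rnn_unrolled(sequence)
print(len(hs), hs[-1].shape)                 # 4 (5,)
```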
Examples
•  Predict the last word in "the clouds are in the sky"
•  The gap between the relevant information and the place that it's needed is small
•  RNNs can learn to use the past information
•  “I grew up in France… I speak fluent French.”
•  As the gap grows, RNNs become unable to
learn to connect the information. 
LONG SHORT-TERM MEMORY NETWORKS
LSTM Networks
LSTM networks
•  A special kind of RNN
•  Capable of learning long-term dependencies
•  Structured as a chain of repeating neural network modules
RNN
•  The repeating module has a very simple structure, such as a single tanh layer
•  The tanh(z) function is a rescaled version of the sigmoid; its output range is [−1, 1] instead of [0, 1].
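The rescaling can be written explicitly (σ denotes the sigmoid):

```latex
\tanh(z) \;=\; \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} \;=\; 2\,\sigma(2z) - 1,
\qquad \sigma(z) = \frac{1}{1 + e^{-z}} .
```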
LSTM networks
•  The repeating module consists of four neural network layers, interacting in a very special way
Core idea behind LSTMs
•  The key to LSTMs is the cell state, the horizontal line running through the top of the diagram
•  The cell state runs straight down the entire chain, with only some minor linear interactions
•  Easy for information to just flow along it unchanged
Gates
•  The ability to remove or add information to the cell state, carefully regulated by structures called gates
•  A sigmoid layer outputs how much of each component should be let through
   – Zero means let nothing through
   – One means let everything through
•  An LSTM has three of these gates
LSTM step 1
•  decide what information we’re going to throw
away from the cell state
•  forget gate layer
LSTM step 2
•  decide what new information we’re going to
store in the cell state
•  input gate layer
LSTM step 3
•  update the old cell state, Ct−1, into the new cell state Ct
LSTM step 4
•  decide what we're going to output
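Putting the four steps together, one LSTM step can be sketched as follows (a hedged illustration following the colah.github.io description; weight shapes and initialisation are assumptions, and real implementations also add bias terms and train all weights jointly):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(5)
n_in, n_hidden = 3, 4
# One weight matrix per gate, each acting on the concatenation [h_{t-1}, x_t]
W_f, W_i, W_c, W_o = (rng.normal(scale=0.5, size=(n_hidden, n_hidden + n_in))
                      for _ in range(4))

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z)              # step 1: what to throw away from C_{t-1}
    i_t = sigmoid(W_i @ z)              # step 2: which new values to store
    C_tilde = np.tanh(W_c @ z)          #         candidate values
    C_t = f_t * C_prev + i_t * C_tilde  # step 3: update the cell state
    o_t = sigmoid(W_o @ z)              # step 4: decide what to output
    h_t = o_t * np.tanh(C_t)            #         output is a filtered cell state
    return h_t, C_t

h, C = np.zeros(n_hidden), np.zeros(n_hidden)
for x in [rng.normal(size=n_in) for _ in range(3)]:
    h, C = lstm_step(x, h, C)
print(h.round(3), C.round(3))
```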
RECURRENT NEURAL NETWORKS WITH WORD EMBEDDINGS
APPENDIX
Perceptron 1957
Perceptron 1957
Perceptron 1986
Perceptron
Activation function
Back propagation 1974/1986
•  Inspired by the architectural depth of the brain, researchers wanted for decades to train deep multi-layer neural networks.
•  No successful attempts were reported before 2006 (exception: convolutional neural networks, LeCun 1998).
•  SVM: Vapnik and his co-workers developed the Support Vector Machine (1993), a shallow architecture.
•  Breakthrough in 2006!
2006 breakthrough
•  More data
•  Faster hardware: GPUs, multi-core CPUs
•  Working ideas on how to train deep architectures
•  Beat state of the art in many areas:
   – Language Modeling (2012, Mikolov et al)
   – Image Recognition (Krizhevsky won the 2012 ImageNet competition)
   – Sentiment Classification (2011, Socher et al)
   – Speech Recognition (2010, Dahl et al)
   – MNIST hand-written digit recognition (Ciresan et al, 2010)
Credits
•  Roelof Pieters, www.graph-technologies.com
•  Andrew Ng
•  http://colah.github.io/