RNN & LSTM
DR. ANINDYA HALDER
Recurrent Neural Networks (RNN):
The RNN is a highly preferred method, especially for sequential data.
Every node at a time step takes an input from the previous node and proceeds using a feedback loop.
In an RNN, each node generates a current hidden state, and its output is obtained from the given input and the previous hidden state as follows:
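A standard formulation of this update (notation assumed here, not taken from the slide figure) is:
h_t = tanh(W_x x_t + W_h h_{t-1} + b_h)
y_t = W_y h_t + b_y
where x_t is the input at time step t, h_t the hidden state, y_t the output, and the W and b terms are weights and biases shared across all time steps.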
Fig: Compressed (left) and unfolded (right) basic Recurrent Neural Network.
How a Recurrent Neural Network works
• The RNN processes the sequence of vectors one by one.
• While processing, it passes the previous hidden state to the next step of the sequence. The hidden state acts as the neural network's memory: it holds information on previous data the network has seen before.
Figure: Processing sequence one by one.
Cont…
• First, the input and previous hidden state are combined to form a vector.
• That vector now has information on the current input and previous inputs. The vector goes
through the tanh activation, and the output is the new hidden state, or the memory of the
network.
Figure: Passing the hidden state to the next time step.
Figure: RNN cell.
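As a concrete illustration of this step, a minimal NumPy sketch of one RNN update (weight names and toy dimensions are assumptions, not taken from the slides):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One RNN time step: combine the current input with the previous
    hidden state and squash the result with tanh."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Toy dimensions (hypothetical): 3-dimensional inputs, 4-dimensional hidden state.
rng = np.random.default_rng(0)
W_x, W_h, b = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)

h = np.zeros(4)                          # initial hidden state ("memory")
for x_t in rng.normal(size=(5, 3)):      # a sequence of 5 input vectors
    h = rnn_step(x_t, h, W_x, W_h, b)    # the hidden state is carried forward
```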
Drawbacks of RNN:
Recurrent Neural Networks suffer from short-term memory. If a sequence is long enough, they have a hard time carrying information from earlier time steps to later ones.
During backpropagation, recurrent neural networks suffer from the vanishing gradient problem. Gradients are the values used to update a neural network's weights. The vanishing gradient problem occurs when the gradient shrinks as it is backpropagated through time. If a gradient value becomes extremely small, it does not contribute much to learning.
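A toy numeric illustration (the per-step factor 0.9 is an arbitrary choice, not from the slides): when each backward step through time scales the gradient by a factor below 1, the contribution of early time steps shrinks exponentially.

```python
# Toy demonstration of the vanishing gradient: repeatedly scaling a gradient
# by a per-step factor < 1 drives it towards zero over long sequences.
grad = 1.0
for t in range(100):          # 100 time steps of backpropagation through time
    grad *= 0.9               # assumed per-step scaling factor
print(grad)                   # ~2.7e-5: early time steps barely influence the update
```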
Pros and Cons of RNN:
The pros and cons of a typical RNN architecture are summed up below:

Advantages:
• Possibility of processing input of any length
• Model size does not increase with the size of the input
• Computation takes historical information into account
• Weights are shared across time

Drawbacks:
• Computation is slow
• Difficulty of accessing information from a long time ago
• Cannot consider any future input for the current state
Applications of RNN:
• Prediction problems.
• Machine Translation.
• Speech Recognition.
• Language Modelling and Generating Text.
• Video Tagging.
• Generating Image Descriptions.
• Text Summarization.
• Call Center Analysis.
Long Short-Term Memory (LSTM):
Long short-term memory is a type of RNN model designed to prevent the output of a neural network from either exploding or decaying (the long-term dependency problem) as it passes through the feedback loops for a given input.
Basic LSTM Unit and corresponding individual blocks
Forget Gate: decides whether a piece of information should be remembered or forgotten.
Cell State: carries the network's memory, taking in the relevant input and forgetting the irrelevant information.
Input Gate: decides what new information needs to be stored in the cell state.
Output Gate: helps decide the value of the next hidden state of the network.
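For reference, the standard LSTM update equations that these four blocks implement (notation assumed here; the slides show the same structure pictorially):
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)      (forget gate)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)      (input gate)
g_t = tanh(W_g · [h_{t-1}, x_t] + b_g)   (candidate cell values)
C_t = f_t * C_{t-1} + i_t * g_t          (cell state update)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)      (output gate)
h_t = o_t * tanh(C_t)                    (new hidden state)
where σ is the sigmoid function and * denotes element-wise multiplication.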
Activation Functions of LSTM
In LSTM architecture, two types of activation functions are used:
• Tanh activation function
• Sigmoid activation function
Cont…
Tanh:
• LSTM gates contain tanh activations.
• Tanh is a non-linear activation function.
• It regulates the values flowing through the network, maintaining them between -1 and 1, and thereby helps keep the values in the network in a stable range.
Figure: Tanh squishes values to be between -1 and 1.
Sigmoid
• LSTM gates contain sigmoid activations.
• The sigmoid function squishes values to be between 0 and 1.
• That is helpful for updating or forgetting data: any number multiplied by 0 becomes 0, so those values disappear or are "forgotten", while any number multiplied by 1 keeps the same value, so it stays the same or is "kept".
• Using the sigmoid activation, the network can learn which data is unimportant and can be forgotten, and which data is important to keep.
Figure: Sigmoid squishes values to be between 0 and 1.
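A quick NumPy check of the two squashing functions (input values chosen arbitrarily for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(np.tanh(z))    # ~[-0.9999, -0.7616, 0.0, 0.7616, 0.9999] -> range (-1, 1)
print(sigmoid(z))    # ~[ 0.0067,  0.2689, 0.5, 0.7311, 0.9933] -> range (0, 1)
```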
Gates of LSTM
Forget Gate
• This gate decides what information should be thrown away or kept.
• Information from the previous hidden state and information from the current input are passed through the sigmoid function.
• Values come out between 0 and 1: the closer to 0, the more likely the information is forgotten; the closer to 1, the more likely it is kept.
Figure: Forget Gate.
Input Gate
• The goal of this gate is to determine what new information should be added to the network's long-term memory (cell state), given the previous hidden state and the new input data.
• The input gate is a sigmoid-activated network which acts as a filter, identifying which components of the 'new memory vector' are worth retaining. This network outputs a vector of values in [0, 1].
• The hidden state and current input are also passed into the tanh function, which squishes values to between -1 and 1 to help regulate the network.
Figure: Input Gate.
Cell State
• The next step is to decide and store the information from the new state in the cell state.
• The previous cell state C(t-1) gets multiplied by the forget vector f(t). Where the outcome is 0, the corresponding values are dropped from the cell state.
• Next, the network takes the output of the input gate i(t) and performs point-by-point addition, which updates the cell state, giving the network a new cell state C(t).
Figure: Cell State.
Output Gate
• The output gate decides what the next hidden state should be. The hidden state contains information on previous inputs and is also used for predictions.
Figure: Output Gate.
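Putting the four gate slides together, a minimal NumPy sketch of one LSTM time step, following the standard equations listed earlier (weight layout, names, and toy dimensions are assumptions, not the slides' notation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W maps the concatenated [h_prev, x_t] onto the
    stacked pre-activations of the four gates; b is the matching bias."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.size
    f = sigmoid(z[0*H:1*H])     # forget gate: what to drop from the old cell state
    i = sigmoid(z[1*H:2*H])     # input gate: which candidate values to store
    g = np.tanh(z[2*H:3*H])     # candidate cell values, squashed to (-1, 1)
    o = sigmoid(z[3*H:4*H])     # output gate: how much of the cell to expose
    c = f * c_prev + i * g      # new cell state (long-term memory)
    h = o * np.tanh(c)          # new hidden state, also used as the output
    return h, c

# Toy usage with assumed sizes: 3-dimensional input, 4-dimensional hidden state.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(16, 7)), np.zeros(16)   # 4 gates x hidden size 4; 7 = 4 + 3
h, c = np.zeros(4), np.zeros(4)
h, c = lstm_step(rng.normal(size=3), h, c, W, b)
```

Stacking the four gate weight matrices into a single W so one matrix multiplication produces all gate pre-activations is a common implementation choice, not a requirement.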
Applications of LSTM:
Autoencoder
An Autoencoder consists of three layers:
1. Encoder
2. Compressed representation layer
3. Decoder
• The Encoder layer compresses the input image into a latent space representation. It encodes the input image as a compressed representation in a reduced dimension.
• The Compressed representation layer represents the compressed input fed to the decoder layer.
• The Decoder layer decodes the encoded image back to the original dimension.
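As a hedged, concrete sketch of this three-layer structure, a minimal fully connected autoencoder in Keras; the layer sizes and the flattened 28x28-image input are illustrative assumptions, not taken from the slides:

```python
import tensorflow as tf
from tensorflow.keras import layers

input_dim, latent_dim = 784, 32     # e.g. flattened 28x28 images (assumption)

autoencoder = tf.keras.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(128, activation="relu"),          # encoder
    layers.Dense(latent_dim, activation="relu"),   # compressed representation
    layers.Dense(128, activation="relu"),          # decoder
    layers.Dense(input_dim, activation="sigmoid"), # reconstruction in the original dimension
])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, ...)  # trained to reconstruct its own input
```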
Case study
A deep LSTM autoencoder for detecting
anomalous ECG
Block diagram
Detailed architecture of ECG-NET
The concept behind ECG-NET
ECG data sets used:
Publicly available ECG5000 dataset:
The original dataset was released by Eamonn Keogh and Yanping Chen and was downloaded from PhysioNet.
It is derived from a 20-hour-long recording ("chf07") in the BIDMC Congestive Heart Failure Database (chfdb), and the processed version is publicly available in the UCR Time Series Classification archive (Bagnall et al., 2018):
http://www.timeseriesclassification.com/description.php?Dataset=ECG5000
The available ECG5000 dataset is a preprocessed dataset, where the preprocessing was performed in two steps: the first step extracts each heartbeat, and the second step makes each heartbeat of equal length using interpolation.
The dataset comprises both normal patients' ECGs as well as the ECGs of patients who have severe congestive heart failure.
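A hedged loading sketch, assuming the whitespace-separated UCR text format with the class label in the first column; the file name is an assumption:

```python
import numpy as np

# Hypothetical file name; the UCR archive distributes ECG5000 as plain text
# with the class label in the first column and one heartbeat per row.
data = np.loadtxt("ECG5000_TRAIN.txt")
labels  = data[:, 0].astype(int)   # class 1 is typically treated as normal, the rest as anomalous
signals = data[:, 1:]              # equal-length, interpolated heartbeats
print(signals.shape)
```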
Details of Data Splitting
Data Source: ECG5000 dataset released by Eamonn Keogh and Yanping Chen, downloaded from PhysioNet.
Threshold techniques used:
• Manual threshold
• Automated threshold (Kapur's thresholding procedure; see the sketch below)
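A minimal sketch of Kapur's entropy-based threshold selection applied to a 1-D histogram of reconstruction losses (the bin count and variable names are assumptions, not taken from the original ECG-NET code):

```python
import numpy as np

def kapur_threshold(losses, bins=256):
    """Choose a threshold on reconstruction losses by maximising the sum of
    the entropies of the two groups (below vs. above the threshold),
    following Kapur's histogram-entropy criterion."""
    hist, edges = np.histogram(losses, bins=bins)
    p = hist.astype(float) / hist.sum()          # normalised histogram
    best_t, best_score = edges[1], -np.inf
    for k in range(1, bins):                     # candidate split between bin k-1 and bin k
        p_a, p_b = p[:k], p[k:]
        w_a, w_b = p_a.sum(), p_b.sum()
        if w_a == 0.0 or w_b == 0.0:
            continue                             # skip degenerate splits
        p_a, p_b = p_a[p_a > 0] / w_a, p_b[p_b > 0] / w_b
        h_a, h_b = -np.sum(p_a * np.log(p_a)), -np.sum(p_b * np.log(p_b))
        if h_a + h_b > best_score:
            best_score, best_t = h_a + h_b, edges[k]
    return best_t

# losses = per-sample reconstruction errors of the trained autoencoder
# threshold = kapur_threshold(losses)
```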
Hyper-parameter settings
Result and analysis section
Training and validation
Reconstruction of the training and
validation normal ECG signals
Training ECG samples
Validation ECG samples
Test Result:
Reconstruction of the test ECG signals
Correctly and incorrectly classified ECG
test samples using Manual Thresholding
procedure
Correctly and incorrectly classified ECG
test samples using Automated
Thresholding procedure
Compared Result
Advantages of the Proposed Method
Challenges involved in automated ECG arrhythmia detection include, but are not limited to, the following:
• Limited availability of annotated ECG signals to train the model;
• The data imbalance problem in ECG datasets, where normal ECG signals predominate over anomalous ECGs.
The above challenges are judiciously handled in the proposed (LSTM-based autoencoder) ECG-NET:
• The ECG-NET method requires only normal ECG signals for training (no anomalous ECG signals are required during the training phase), yet it can achieve good accuracy during the testing phase on both normal and anomalous test ECG signals (a minimal decision-rule sketch follows this list).
• Thus, the data imbalance problem and the limited availability of annotated samples (particularly anomalous ECG signals) are handled.
• An automated reconstruction-loss threshold selection approach for the testing phase, based on Kapur's histogram thresholding, is proposed.
• Three LSTM-based autoencoder architectures are proposed, which yield better accuracy than the existing architectures.
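Conceptually, the test-time decision reduces to a simple reconstruction-error rule, sketched below (function and variable names are assumptions, not ECG-NET's actual code; `model.predict` stands for any trained autoencoder's forward pass):

```python
import numpy as np

def classify_beat(model, beat, threshold):
    """Flag a heartbeat as anomalous when an autoencoder trained only on
    normal beats fails to reconstruct it below the chosen loss threshold."""
    reconstruction = model.predict(beat[np.newaxis, :])[0]
    loss = np.mean((beat - reconstruction) ** 2)    # mean squared reconstruction error
    return "anomalous" if loss > threshold else "normal"
```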
Thank You
