RNN & LSTM
DR. ANINDYA HALDER
Recurrent Neural Networks (RNN):
The RNN is a highly preferred method, especially for sequential data.
Every node at a time step takes an input from the previous node and proceeds using a feedback loop.
In an RNN, each node generates a current hidden state, and its output is obtained from the given input and the previous hidden state as follows:
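A standard formulation of this update (notation assumed here, not taken from the slide figure) is:
h_t = tanh(W_x x_t + W_h h_{t-1} + b_h)
y_t = W_y h_t + b_y
where x_t is the input at time step t, h_t the hidden state, y_t the output, and the W and b terms are weights and biases shared across all time steps.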
Fig: Compressed (left) and unfolded (right) basic Recurrent Neural Network.
How a Recurrent Neural Network works
• The RNN processes the sequence of vectors one by one.
• While processing, it passes the previous hidden state to the next step of the sequence. The hidden state acts as the neural network's memory: it holds information on previous data the network has seen before.
Figure: Processing sequence one by one.
Cont…
• First, the input and previous hidden state are combined to form a vector.
• That vector now has information on the current input and previous inputs. The vector goes
through the tanh activation, and the output is the new hidden state, or the memory of the
network.
Figure: Passing the hidden state to the next time step.
Figure: RNN cell.
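As a concrete illustration of this step, a minimal NumPy sketch of one RNN update (weight names and toy dimensions are assumptions, not taken from the slides):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One RNN time step: combine the current input with the previous
    hidden state and squash the result with tanh."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Toy dimensions (hypothetical): 3-dimensional inputs, 4-dimensional hidden state.
rng = np.random.default_rng(0)
W_x, W_h, b = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)

h = np.zeros(4)                          # initial hidden state ("memory")
for x_t in rng.normal(size=(5, 3)):      # a sequence of 5 input vectors
    h = rnn_step(x_t, h, W_x, W_h, b)    # the hidden state is carried forward
```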
Drawbacks of RNN:
Recurrent Neural Networks suffer from short-term memory. If a sequence is long enough, they have a hard time carrying information from earlier time steps to later ones.
During backpropagation, recurrent neural networks suffer from the vanishing gradient problem. Gradients are the values used to update a neural network's weights. The vanishing gradient problem occurs when the gradient shrinks as it is backpropagated through time. If a gradient value becomes extremely small, it does not contribute much to learning.
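A toy numeric illustration (the per-step factor 0.9 is an arbitrary choice, not from the slides): when each backward step through time scales the gradient by a factor below 1, the contribution of early time steps shrinks exponentially.

```python
# Toy demonstration of the vanishing gradient: repeatedly scaling a gradient
# by a per-step factor < 1 drives it towards zero over long sequences.
grad = 1.0
for t in range(100):          # 100 time steps of backpropagation through time
    grad *= 0.9               # assumed per-step scaling factor
print(grad)                   # ~2.7e-5: early time steps barely influence the update
```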
Pros and Cons of RNN:
The pros and cons of a typical RNN architecture are summed up below:

Advantages:
• Possibility of processing input of any length
• Model size does not increase with the size of the input
• Computation takes historical information into account
• Weights are shared across time

Drawbacks:
• Computation is slow
• Difficulty of accessing information from a long time ago
• Cannot consider any future input for the current state
Applications of RNN:
• Prediction problems.
• Machine Translation.
• Speech Recognition.
• Language Modelling and Generating Text.
• Video Tagging.
• Generating Image Descriptions.
• Text Summarization.
• Call Center Analysis.
Long Short-Term Memory (LSTM):
Long short-term memory is a type of RNN model designed to prevent the output of a neural network from either exploding or decaying (the long-term dependency problem) as it passes through the feedback loops for a given input.
Basic LSTM Unit and corresponding individual blocks
Forget Gate: decides whether a piece of information should be remembered or forgotten.
Cell State: carries the network's memory, taking in the relevant input and forgetting the irrelevant information.
Input Gate: decides what new information needs to be stored in the cell state.
Output Gate: helps decide the value of the next hidden state of the network.
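For reference, the standard LSTM update equations that these four blocks implement (notation assumed here; the slides show the same structure pictorially):
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)      (forget gate)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)      (input gate)
g_t = tanh(W_g · [h_{t-1}, x_t] + b_g)   (candidate cell values)
C_t = f_t * C_{t-1} + i_t * g_t          (cell state update)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)      (output gate)
h_t = o_t * tanh(C_t)                    (new hidden state)
where σ is the sigmoid function and * denotes element-wise multiplication.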
Activation Functions of LSTM
In LSTM architecture, two types of activation functions are used:
• Tanh activation function
• Sigmoid activation function
Cont…
Tanh:
• LSTM gates contain tanh activations.
• Tanh is a non-linear activation function.
• It regulates the values flowing through the network, maintaining them between -1 and 1, and thereby helps keep the values in the network in a stable range.
Figure: Tanh squishes values to be between -1 and 1.
Sigmoid
• LSTM gates contain sigmoid activations.
• The sigmoid function squishes values to be between 0 and 1.
• That is helpful for updating or forgetting data: any number multiplied by 0 becomes 0, so those values disappear or are "forgotten", while any number multiplied by 1 keeps the same value, so it stays the same or is "kept".
• Using the sigmoid activation, the network can learn which data is unimportant and can be forgotten, and which data is important to keep.
Figure: Sigmoid squishes values to be between 0 and 1.
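A quick NumPy check of the two squashing functions (input values chosen arbitrarily for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(np.tanh(z))    # ~[-0.9999, -0.7616, 0.0, 0.7616, 0.9999] -> range (-1, 1)
print(sigmoid(z))    # ~[ 0.0067,  0.2689, 0.5, 0.7311, 0.9933] -> range (0, 1)
```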
Gates of LSTM
Forget Gate
• This gate decides what information should be thrown away or kept.
• Information from the previous hidden state and information from the current input are passed through the sigmoid function.
• Values come out between 0 and 1: the closer to 0, the more likely the information is forgotten; the closer to 1, the more likely it is kept.
Figure: Forget Gate.
Input Gate
• The goal of this gate is to determine what new information should be added to the network's long-term memory (cell state), given the previous hidden state and the new input data.
• The input gate is a sigmoid-activated network which acts as a filter, identifying which components of the 'new memory vector' are worth retaining. This network outputs a vector of values in [0, 1].
• The hidden state and current input are also passed into the tanh function, which squishes values to between -1 and 1 to help regulate the network.
Figure: Input Gate.
Cell State
• The next step is to decide and store the information from the new state in the cell state.
• The previous cell state C(t-1) gets multiplied by the forget vector f(t). Where the outcome is 0, the corresponding values are dropped from the cell state.
• Next, the network takes the output of the input gate i(t) and performs point-by-point addition, which updates the cell state, giving the network a new cell state C(t).
Figure: Cell State.
Output Gate
• The output gate decides what the next hidden state should be. The hidden state contains information on previous inputs and is also used for predictions.
Figure: Output Gate.
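Putting the four gate slides together, a minimal NumPy sketch of one LSTM time step, following the standard equations listed earlier (weight layout, names, and toy dimensions are assumptions, not the slides' notation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W maps the concatenated [h_prev, x_t] onto the
    stacked pre-activations of the four gates; b is the matching bias."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.size
    f = sigmoid(z[0*H:1*H])     # forget gate: what to drop from the old cell state
    i = sigmoid(z[1*H:2*H])     # input gate: which candidate values to store
    g = np.tanh(z[2*H:3*H])     # candidate cell values, squashed to (-1, 1)
    o = sigmoid(z[3*H:4*H])     # output gate: how much of the cell to expose
    c = f * c_prev + i * g      # new cell state (long-term memory)
    h = o * np.tanh(c)          # new hidden state, also used as the output
    return h, c

# Toy usage with assumed sizes: 3-dimensional input, 4-dimensional hidden state.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(16, 7)), np.zeros(16)   # 4 gates x hidden size 4; 7 = 4 + 3
h, c = np.zeros(4), np.zeros(4)
h, c = lstm_step(rng.normal(size=3), h, c, W, b)
```

Stacking the four gate weight matrices into a single W so one matrix multiplication produces all gate pre-activations is a common implementation choice, not a requirement.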
Applications of LSTM:
Autoencoder
An Autoencoder consists of three layers:
1. Encoder
2. Compressed representation layer
3. Decoder
• The Encoder layer compresses the input image into a latent space representation. It encodes the input image as a compressed representation in a reduced dimension.
• The Compressed representation layer represents the compressed input fed to the decoder layer.
• The Decoder layer decodes the encoded image back to the original dimension.
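As a hedged, concrete sketch of this three-layer structure, a minimal fully connected autoencoder in Keras; the layer sizes and the flattened 28x28-image input are illustrative assumptions, not taken from the slides:

```python
import tensorflow as tf
from tensorflow.keras import layers

input_dim, latent_dim = 784, 32     # e.g. flattened 28x28 images (assumption)

autoencoder = tf.keras.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(128, activation="relu"),          # encoder
    layers.Dense(latent_dim, activation="relu"),   # compressed representation
    layers.Dense(128, activation="relu"),          # decoder
    layers.Dense(input_dim, activation="sigmoid"), # reconstruction in the original dimension
])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, ...)  # trained to reconstruct its own input
```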
Case study
A deep LSTM autoencoder for detecting
anomalous ECG
Block diagram
Detailed architecture of ECG-NET
The concept behind ECG-NET
ECG data sets used:
Publicly available ECG5000 dataset:
The original dataset was released by Eamonn Keogh and Yanping Chen and was downloaded from PhysioNet.
It is derived from a 20-hour-long recording ("chf07") in the BIDMC Congestive Heart Failure Database (chfdb), and the processed version is publicly available in the UCR Time Series Classification archive (Bagnall et al., 2018):
http://www.timeseriesclassification.com/description.php?Dataset=ECG5000
The available ECG5000 dataset is a preprocessed dataset, where the preprocessing was performed in two steps: the first step extracts each heartbeat, and the second step makes each heartbeat of equal length using interpolation.
The dataset comprises both normal patients' ECGs as well as the ECGs of patients who have severe congestive heart failure.
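A hedged loading sketch, assuming the whitespace-separated UCR text format with the class label in the first column; the file name is an assumption:

```python
import numpy as np

# Hypothetical file name; the UCR archive distributes ECG5000 as plain text
# with the class label in the first column and one heartbeat per row.
data = np.loadtxt("ECG5000_TRAIN.txt")
labels  = data[:, 0].astype(int)   # class 1 is typically treated as normal, the rest as anomalous
signals = data[:, 1:]              # equal-length, interpolated heartbeats
print(signals.shape)
```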
Details of Data Splitting
Data Source: ECG5000 dataset released by Eamonn Keogh and Yanping Chen, downloaded from PhysioNet.
Threshold techniques used:
• Manual threshold
• Automated threshold (Kapur's thresholding procedure; see the sketch below)
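A minimal sketch of Kapur's entropy-based threshold selection applied to a 1-D histogram of reconstruction losses (the bin count and variable names are assumptions, not taken from the original ECG-NET code):

```python
import numpy as np

def kapur_threshold(losses, bins=256):
    """Choose a threshold on reconstruction losses by maximising the sum of
    the entropies of the two groups (below vs. above the threshold),
    following Kapur's histogram-entropy criterion."""
    hist, edges = np.histogram(losses, bins=bins)
    p = hist.astype(float) / hist.sum()          # normalised histogram
    best_t, best_score = edges[1], -np.inf
    for k in range(1, bins):                     # candidate split between bin k-1 and bin k
        p_a, p_b = p[:k], p[k:]
        w_a, w_b = p_a.sum(), p_b.sum()
        if w_a == 0.0 or w_b == 0.0:
            continue                             # skip degenerate splits
        p_a, p_b = p_a[p_a > 0] / w_a, p_b[p_b > 0] / w_b
        h_a, h_b = -np.sum(p_a * np.log(p_a)), -np.sum(p_b * np.log(p_b))
        if h_a + h_b > best_score:
            best_score, best_t = h_a + h_b, edges[k]
    return best_t

# losses = per-sample reconstruction errors of the trained autoencoder
# threshold = kapur_threshold(losses)
```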
Hyper-parameter settings
Result and analysis section
Training and validation
Reconstruction of the training and
validation normal ECG signals
Training ECG samples
Validation ECG samples
Test Result:
Reconstruction of the test ECG signals
Correctly and incorrectly classified ECG
test samples using Manual Thresholding
procedure
Correctly and incorrectly classified ECG
test samples using Automated
Thresholding procedure
Compared Result
Advantages of the Proposed Method
Challenges involved in automated ECG arrhythmia detection include, but are not limited to, the following:
• Limited availability of annotated ECG signals to train the model;
• The data imbalance problem in ECG datasets, where normal ECG signals predominate over anomalous ECGs.
The above challenges are judiciously handled in the proposed (LSTM-based autoencoder) ECG-NET:
• The ECG-NET method requires only normal ECG signals for training (no anomalous ECG signals are required during the training phase), yet it can achieve good accuracy during the testing phase on both normal and anomalous test ECG signals (a minimal decision-rule sketch follows this list).
• Thus, the data imbalance problem and the limited availability of annotated samples (particularly anomalous ECG signals) are handled.
• An automated reconstruction-loss threshold selection approach for the testing phase, based on Kapur's histogram thresholding, is proposed.
• Three LSTM-based autoencoder architectures are proposed, which yield better accuracy than the existing architectures.
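Conceptually, the test-time decision reduces to a simple reconstruction-error rule, sketched below (function and variable names are assumptions, not ECG-NET's actual code; `model.predict` stands for any trained autoencoder's forward pass):

```python
import numpy as np

def classify_beat(model, beat, threshold):
    """Flag a heartbeat as anomalous when an autoencoder trained only on
    normal beats fails to reconstruct it below the chosen loss threshold."""
    reconstruction = model.predict(beat[np.newaxis, :])[0]
    loss = np.mean((beat - reconstruction) ** 2)    # mean squared reconstruction error
    return "anomalous" if loss > threshold else "normal"
```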
Thank You
