Long Short-Term Memory Neural Network
To overcome the disadvantages of RNNs, Hochreiter and Schmidhuber proposed the Long Short-Term Memory Neural Network (LSTM-NN) architecture, together with an appropriate gradient-based algorithm to train it [79]. The primary objectives of the LSTM-NN are to capture long-term dependencies and to determine the optimal time lag in time-series problems.
In an LSTM, the cell state is divided into two states: a short-term state h(t) (analogous to the hidden state of an RNN) and a long-term state c(t) (Figure 2-65). The long-term state c(t) stores the information needed to capture the long-term dependencies between the current hidden state and the previous hidden states over time. Traversing the cell from left to right, the long-term state first passes through a forget gate, where some memories are dropped, and then gains new memories via an addition operation (Figure 2-65 and Figure 2-66).
As shown in Figure 2-65, a fully connected LSTM cell contains four layers (sigmoid and tanh). The input vector x(t) and the previous short-term state h(t-1) are fed into all of these layers. The main layer uses the tanh activation function and outputs C'(t); this output is partially stored in the long-term state c(t). The other three layers are gate controllers that use the logistic (sigmoid) activation function, so their outputs range from 0 to 1. The forget gate f(t) controls which parts of the long-term state should be erased. The input gate i(t) decides which parts of C'(t) should be added to the long-term state. Finally, the output gate o(t) controls which parts of the long-term state should be read and output at this time step as y(t) (= h(t)). The equations for these operations can be written as follows:
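In the standard formulation (the specific symbols W and b below, denoting the weight matrices and bias vectors of each layer, are the usual notation assumed here), the cell computes

$$
\begin{aligned}
i(t) &= \sigma\left(W_{xi}\, x(t) + W_{hi}\, h(t-1) + b_i\right)\\
f(t) &= \sigma\left(W_{xf}\, x(t) + W_{hf}\, h(t-1) + b_f\right)\\
o(t) &= \sigma\left(W_{xo}\, x(t) + W_{ho}\, h(t-1) + b_o\right)\\
C'(t) &= \tanh\left(W_{xc}\, x(t) + W_{hc}\, h(t-1) + b_c\right)\\
c(t) &= f(t) \otimes c(t-1) + i(t) \otimes C'(t)\\
y(t) &= h(t) = o(t) \otimes \tanh\left(c(t)\right)
\end{aligned}
$$

where $\sigma$ is the logistic function and $\otimes$ denotes element-wise multiplication.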
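To make the flow of the two states concrete, here is a minimal NumPy sketch of a single LSTM cell step. It illustrates the equations above rather than reproducing any implementation from the text; the function name lstm_cell_step, the weight layout (one (n_inputs + n_units, n_units) matrix per layer), and the random toy inputs are assumptions chosen for readability.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: squashes values into the (0, 1) range used by the gates."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step following the equations above (illustrative sketch).

    x_t    : input vector x(t), shape (n_inputs,)
    h_prev : previous short-term state h(t-1), shape (n_units,)
    c_prev : previous long-term state c(t-1), shape (n_units,)
    W, b   : dicts of weight matrices and bias vectors, one entry per layer
    """
    z = np.concatenate([x_t, h_prev])        # x(t) and h(t-1) are fed into all four layers

    f_t = sigmoid(z @ W["f"] + b["f"])       # forget gate: what to erase from c(t-1)
    i_t = sigmoid(z @ W["i"] + b["i"])       # input gate: which parts of C'(t) to add
    o_t = sigmoid(z @ W["o"] + b["o"])       # output gate: what to read out of c(t)
    c_cand = np.tanh(z @ W["c"] + b["c"])    # main tanh layer: candidate memories C'(t)

    c_t = f_t * c_prev + i_t * c_cand        # drop old memories, add new ones
    h_t = o_t * np.tanh(c_t)                 # short-term state and output y(t) = h(t)
    return h_t, c_t

# Illustrative usage with random weights (shapes only; no training involved).
rng = np.random.default_rng(0)
n_inputs, n_units = 3, 4
W = {k: rng.normal(scale=0.1, size=(n_inputs + n_units, n_units)) for k in "fioc"}
b = {k: np.zeros(n_units) for k in "fioc"}

h, c = np.zeros(n_units), np.zeros(n_units)
for x_t in rng.normal(size=(5, n_inputs)):   # unroll over a toy sequence of 5 time steps
    h, c = lstm_cell_step(x_t, h, c, W, b)
print(h.shape, c.shape)                      # (4,) (4,)
```

Note how the two states play different roles in the loop: c is only modified through the forget and addition operations, which is what lets information persist over many time steps, while h is recomputed from c at every step.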