Recurrent Neural Networks
A Recurrent Neural Network (RNN) is a network topology designed to accommodate temporal dynamic behavior and to process sequences of inputs or outputs. RNNs were first introduced in the 1980s [22-25] to capture the order of, and the temporal dependencies between, observations. RNNs have gained considerable attention recently, especially in natural language processing (NLP), due to improvements in computational power and the use of graphics processing units (GPUs). RNNs also have important applications in time series forecasting because of their ability to learn the temporal dependencies between observations. One advantage of NNs over conventional linear models in time series forecasting is their ability to learn and approximate nonlinear relations between the input and output features [26]. Moreover, the RNN eliminates the need for a predefined time window for input or output features and is capable of modeling multivariate sequences [27]. Given its ability to process multivariate sequence data with temporal dependencies, the RNN has potential applications in transportation, such as forecasting the state of the system and its evolution over time.
RNNs capture long-range time dependencies between the observations in sequence data [28]. The input and output sequences are denoted as (x(1), x(2), …, x(T)) and (y(1), y(2), …, y(T)), and each data point x(t) or y(t) can be a real-valued vector of any shape. The sequence represents a temporal or ordinal aspect of the data, and its length can be finite or infinite. An RNN is a feedforward type of network with additional recurrent edges that connect to the next time step. A recurrent edge creates a loop by connecting a node to itself, which allows information to be carried across time while the input is received one step at a time. The recurrent nodes of the RNN maintain information from past events in an internal memory known as the hidden state. Figure 2-14 presents a simple RNN node with a recurrent connection. The output of the recurrent node, y(t), at any time depends on the hidden state, h(t), of the node at that time, according to equation 1. In this equation, V and b_y are the trainable network parameters (weights and biases), and f can be any activation function.
The hidden state, h(t), is the memory of the recurrent node and is estimated from the current input, x(t), and the previous hidden state of the node, h(t-1), according to equation 2. W, U, and b_h are the trainable parameters, and f is the activation function.
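The equations themselves are not included in this extract. Based on the parameter names given above (V, b_y, W, U, b_h), equations 1 and 2 presumably take the standard form of a simple recurrent node; the following is a reconstruction under that assumption:

```latex
% Assumed reconstruction of equations 1 and 2 from the parameter names in the text.
\begin{align}
  y^{(t)} &= f\left(V\, h^{(t)} + b_y\right)                 \tag{1} \\
  h^{(t)} &= f\left(W\, x^{(t)} + U\, h^{(t-1)} + b_h\right) \tag{2}
\end{align}
```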
Consequently, the output of the recurrent node at each time step depends on the current input and the previous hidden state of the node (i.e., the memory of the node). The dynamics of the recurrent node over time can be unfolded, as illustrated in Figure 2-14. By unfolding, the network is unrolled into the full network for the complete sequence. In the unfolded view, the RNN has one layer for each time step, with weights shared across all time steps. The RNN is trained across all the time steps using the backpropagation through time (BPTT) algorithm [29]. Figure 2-14 also shows a simple RNN node with an input sequence and an output sequence. Depending on the architecture of the RNN, the network may or may not produce an output for every time step of the input sequence, and it may or may not require an input for every time step of the output sequence. The main feature of the RNN is that it maintains a hidden state that serves as the memory of the node and captures information about the sequence.
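To make the unfolded computation concrete, below is a minimal NumPy sketch of a simple recurrent node stepping through a sequence. The tanh activation, the identity output activation, and the dimensions in the usage example are assumptions for illustration, not details taken from the text:

```python
import numpy as np

def rnn_forward(x_seq, W, U, V, b_h, b_y, h0=None):
    """Unrolled forward pass of a simple recurrent node.

    x_seq: array of shape (T, input_dim) -- one input vector per time step.
    Returns the output sequence and the final hidden state.
    """
    h = np.zeros(W.shape[0]) if h0 is None else h0
    outputs = []
    for x_t in x_seq:                       # receive the input one step at a time
        h = np.tanh(W @ x_t + U @ h + b_h)  # equation 2: update the hidden state (memory)
        y_t = V @ h + b_y                   # equation 1 (identity output activation assumed)
        outputs.append(y_t)
    return np.stack(outputs), h

# Usage with assumed dimensions: 5 time steps, 3 input features, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W, U, V = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), rng.normal(size=(2, 4))
b_h, b_y = np.zeros(4), np.zeros(2)
y_seq, h_T = rnn_forward(rng.normal(size=(5, 3)), W, U, V, b_h, b_y)
```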
At each iteration of the training process, every parameter of the network is updated based on the gradient of the loss function, in the direction that reduces the total loss (i.e., error) of the network. When the gradient is estimated by backpropagation through many time steps, it can become exponentially large or exponentially small because of the many multiplications introduced by the chain rule. Exploding or vanishing gradients prevent the parameters of the network from converging to values that yield a minimal error during training. For this reason, training an RNN is considered challenging.
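As a back-of-the-envelope illustration of this effect, the snippet below multiplies a hypothetical per-step gradient factor over 100 time steps; a factor slightly below 1 vanishes and a factor slightly above 1 explodes. The specific numbers are illustrative only:

```python
# Repeated multiplication through the chain rule over T time steps: if the
# per-step gradient factor has magnitude below 1 the product vanishes, and
# if it is above 1 the product explodes.
T = 100
for factor in (0.9, 1.1):
    print(f"factor {factor}: gradient scale after {T} steps = {factor ** T:.3e}")
# factor 0.9: ~2.7e-05 (vanishing); factor 1.1: ~1.4e+04 (exploding)
```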
There are techniques, such as truncated backpropagation through time (TBPTT), that prevent the gradient from becoming too large or too small. TBPTT limits the number of time steps over which the error can be propagated, which restricts the ability of the network to learn long-range dependencies in sequences. Fortunately, modern RNN architectures such as the Long Short-Term Memory (LSTM) network address the vanishing and exploding gradient problem while retaining a strong ability to learn long-range dependencies. The LSTM network is a type of RNN with recurrent connections, but with advanced LSTM blocks instead of simple recurrent nodes. The core component of the LSTM block is the cell, which serves as its memory. This component decides what to keep in or remove from memory, considering the previous state and the current input. Most recent RNN architectures use LSTM blocks, and this architecture is discussed in more detail in the deep learning component of this document.
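Purely as a preview of the LSTM block discussed later, the sketch below shows one step of the common textbook LSTM cell, in which gates decide what to remove from and add to the cell memory. The gate formulation and parameter names are the standard variant, not details taken from this document:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a standard LSTM cell (common textbook formulation).

    p holds weight matrices W_*, U_* and biases b_* for the forget (f),
    input (i), and output (o) gates and the candidate cell value (g).
    """
    f = sigmoid(p["W_f"] @ x + p["U_f"] @ h_prev + p["b_f"])  # what to remove from memory
    i = sigmoid(p["W_i"] @ x + p["U_i"] @ h_prev + p["b_i"])  # what to add to memory
    o = sigmoid(p["W_o"] @ x + p["U_o"] @ h_prev + p["b_o"])  # what to expose as output
    g = np.tanh(p["W_g"] @ x + p["U_g"] @ h_prev + p["b_g"])  # candidate new content
    c = f * c_prev + i * g    # cell state: the memory of the block
    h = o * np.tanh(c)        # hidden state passed to the next time step
    return h, c
```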
Partially Recurrent Network
The partially recurrent network (PRN) can be considered a simplified version of the Jordan-Elman network without hidden neurons. It is composed of an input layer of source and feedback nodes, and an output layer containing two types of computation nodes: output neurons and context neurons. The output neurons produce the overall output, while the context neurons provide feedback to the input layer after a time delay. The topological structure of the network is illustrated in Figure 2-16. More details can be found in [30] and [21].
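One plausible reading of this topology (Jordan-style output feedback with no hidden layer) is sketched below; the exact formulation and the activation used by the output neurons should be taken from [30] and [21], so the names and choices here are illustrative assumptions only:

```python
import numpy as np

def prn_forward(x_seq, W, b):
    """Illustrative sketch of a partially recurrent network with no hidden layer:
    the output layer sees the source inputs together with context units that
    hold the previous output (one-step time delay)."""
    context = np.zeros(W.shape[0])           # context neurons start empty
    outputs = []
    for x_t in x_seq:
        z = np.concatenate([x_t, context])   # input layer: source + feedback nodes
        y_t = np.tanh(W @ z + b)             # output neurons (activation assumed)
        context = y_t.copy()                 # delayed feedback to the input layer
        outputs.append(y_t)
    return np.stack(outputs)
```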