Deep Neural Networks
Deep neural networks (DNNs) are networks with multiple layers of neurons between the input and output layers. Increasing the number of layers (i.e., the depth) and using nonlinear activation functions improves the network's ability to learn complicated relationships between the input and output features. Additional layers let the network transform features at different levels of abstraction, building higher-level abstractions layer by layer from lower-level features. This hierarchical feature transformation allows the network to learn more complex mapping functions from input variables to output variables. Empirically, increasing the depth of the network can also improve its generalization [31].

Training DNNs can become challenging because of vanishing or exploding gradients, which arise from the many multiplications introduced by the chain rule across many layers [32]. Nevertheless, DNNs can be trained successfully through a good choice of activation function, proper initialization of the network parameters, and a suitable network architecture. The increased capacity to learn complex input-output relationships, together with the improved generalization, is the main reason for the popularity and success of deep neural networks in a wide variety of tasks such as computer vision, natural language processing, and machine translation. DNN architectures such as Convolutional Neural Networks (CNNs) and deep LSTM networks are discussed in more detail in the following sections of this document.
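The vanishing-gradient problem mentioned above can be made concrete with a small numerical sketch. During backpropagation, the upstream gradient is multiplied by the local derivative of each layer's activation; for the sigmoid, that derivative is at most 0.25, so the product over many layers shrinks geometrically. The depth of 30 layers below is an illustrative assumption, not a value from the text:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s), maximized at x = 0 (value 0.25)
    s = sigmoid(x)
    return s * (1.0 - s)

depth = 30
# Best case for the sigmoid: every pre-activation sits at 0,
# where its derivative is largest (0.25).
pre_activations = np.zeros(depth)

grad = 1.0
for z in pre_activations:
    grad *= sigmoid_grad(z)  # chain rule: multiply local derivatives

print(f"gradient factor after {depth} sigmoid layers: {grad:.3e}")
# Even in this best case, the factor is 0.25**30, roughly 8.7e-19,
# so almost no gradient signal reaches the early layers.
```

This is one motivation for activations such as ReLU, whose derivative is exactly 1 on the active region, and for initialization schemes that keep per-layer gradient scales close to 1.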