ResNet
Because the depth of representations is of central importance for many visual recognition tasks, yet deeper neural networks are difficult to train, He, et al. [51] introduced a residual learning framework to ease the training of substantially deeper networks. In particular, the layers were explicitly reformulated as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions; see Figure 2-40.
Figure 2-40 Residual learning: a building block (He, et al., 2016)
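The idea can be made concrete with a short sketch. The block below is a minimal residual building block in the spirit of Figure 2-40, written with PyTorch as an assumption; the exact layer sizes, the use of batch normalization, and the class name ResidualBlock are illustrative choices, not necessarily the precise configuration used by He, et al. [51].

```python
# Minimal sketch of a residual building block (assumes PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Two 3x3 convolutions form the residual function F(x).
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual branch: F(x) = H(x) - x, learned with reference to the input.
        residual = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        # Skip connection: the block outputs F(x) + x, so an identity
        # mapping corresponds simply to driving F(x) toward zero.
        return F.relu(residual + x)

# Usage: the skip connection requires the output shape to match the input.
x = torch.randn(1, 64, 32, 32)
y = ResidualBlock(64)(x)
assert y.shape == x.shape
```

The key design choice is that the skip connection adds the unchanged input to the output of the convolutional branch, so the weights only need to model the perturbation from the identity rather than the whole mapping.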
As explained by He, et al. [51], it is unlikely that identity mappings are optimal, but the reformulation may help to precondition the problem: if the optimal function is closer to an identity mapping than to a zero mapping, it should be easier for the solver to find the perturbations with reference to an identity mapping than to learn the function anew. Their experiments showed that the learned residual functions generally have small responses, suggesting that identity mappings provide reasonable preconditioning. This was later supported by Li, et al. [54], who explored the structure of neural loss functions and the effect of loss landscapes on generalization using visualization methods. As shown in Figure 2-41, ResNet-56 with skip connections has a much smoother loss surface than the same network without them, which helps explain why ResNet is easier to train.
Figure 2-41 The loss surfaces of ResNet-56 with/without skip connections [54]