U-Net: Fully Convolutional Network
U-Net takes its name from its U-shaped architecture (see Figure 2-41). It is a fully convolutional network (FCN) with three sections: contraction (left), bottleneck (bottom), and expansion (right). The contraction section is made of several blocks, each consisting of two 3x3 convolution layers followed by 2x2 max pooling (down-sampling). As seen in Figure 2-41, the number of feature maps doubles after each block, which lets the network learn increasingly complex structures as the spatial resolution shrinks. The expansion section consists of the same number of blocks, mirroring the contraction section. Each expansion block applies a 2x2 up-sampling layer and combines the result with the feature maps from the corresponding contraction block at the same level. This ensures that features learned at different levels are used to reconstruct the image.
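The block structure described above can be traced with a small shape calculator. This is a minimal sketch, not the architecture from Figure 2-41 itself: the function name `unet_shapes`, the default of 64 base feature maps, the depth of 4, the 256x256 input, and the use of "same"-padded convolutions (so only pooling and up-sampling change the spatial size) are all assumptions made for illustration. It also assumes that combining features means channel concatenation and that the input size is divisible by 2 raised to the depth.

```python
def unet_shapes(input_size=256, in_channels=1, base_channels=64, depth=4):
    """Trace (stage, channels, height/width) through a U-Net-style network.

    Assumes 'same'-padded 3x3 convolutions, so only pooling and
    up-sampling change the spatial size.
    """
    trace = [("input", in_channels, input_size)]
    size, channels = input_size, base_channels
    skips = []

    # Contraction: two 3x3 convs, then 2x2 max pooling halves the size.
    for level in range(depth):
        trace.append((f"down{level + 1} conv", channels, size))
        skips.append((channels, size))  # saved for the expansion path
        size //= 2
        trace.append((f"down{level + 1} pool", channels, size))
        channels *= 2  # feature maps double after each block

    # Bottleneck: two 3x3 convs at the lowest resolution.
    trace.append(("bottleneck", channels, size))

    # Expansion: 2x2 up-sampling doubles the size and halves the channels,
    # then the same-level contraction features are concatenated before
    # two 3x3 convs reduce the channel count again.
    for level in range(depth, 0, -1):
        size *= 2
        channels //= 2
        skip_channels, skip_size = skips.pop()
        assert skip_size == size  # same level, same resolution
        trace.append((f"up{level} concat", channels + skip_channels, size))
        trace.append((f"up{level} conv", channels, size))

    return trace


for stage, channels, size in unet_shapes():
    print(f"{stage:<12} {channels:>5} ch  {size}x{size}")
```

Under these assumptions the trace shows the symmetry of the two paths: every expansion level ends with the same channel count and resolution as the contraction level it mirrors.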
Unlike classification networks, whose last few layers are typically fully connected layers followed by a softmax layer that yields a probability distribution across classes, fully convolutional networks (FCNs) consist only of convolutional layers and are commonly used for semantic segmentation (i.e., predicting a class for every pixel). Although U-Net was first introduced for biomedical image segmentation [55], it has since been widely applied in other domains, including autonomous vehicles and geo-sensing. Some results of U-Net for cell segmentation are shown in Figure 2-42.
Figure 2-43 Results on the ISBI cell tracking challenge. (a) Part of an input image of the “PhC-U373” data set. (b) Segmentation result (cyan mask) with manual ground truth (yellow border). (c) Input image of the “DIC-HeLa” data set. (d) Segmentation result (random colored masks) with manual ground truth (yellow border) [55].
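To make the contrast with classification concrete, the sketch below shows what predicting a class for every pixel can look like: instead of one softmax over the whole image, a softmax is applied across the class scores at each pixel, and the most probable class becomes that pixel's label. The function names `pixelwise_softmax` and `label_map`, the two-class setup (background and cell), and the score values are hypothetical and chosen only for illustration; they are not taken from the U-Net paper.

```python
import math


def pixelwise_softmax(scores):
    """scores[c][y][x] -> probs[c][y][x], normalized over classes per pixel."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    probs = [[[0.0] * width for _ in range(height)] for _ in range(num_classes)]
    for y in range(height):
        for x in range(width):
            pixel = [scores[c][y][x] for c in range(num_classes)]
            peak = max(pixel)  # subtract the max for numerical stability
            exps = [math.exp(s - peak) for s in pixel]
            total = sum(exps)
            for c in range(num_classes):
                probs[c][y][x] = exps[c] / total
    return probs


def label_map(probs):
    """Pick the most probable class at every pixel."""
    num_classes = len(probs)
    height, width = len(probs[0]), len(probs[0][0])
    return [
        [max(range(num_classes), key=lambda c: probs[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical 2-class scores (background, cell) for a 2x3 image.
scores = [
    [[2.0, 0.5, -1.0], [1.5, 0.0, -2.0]],  # class 0: background
    [[-1.0, 1.0, 3.0], [0.0, 0.2, 2.5]],   # class 1: cell
]
probs = pixelwise_softmax(scores)
mask = label_map(probs)
print(mask)
```

The resulting label map has the same height and width as the input scores, which is what allows a segmentation mask to be overlaid on the original image.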