Mask R-CNN
Unlike R-CNN, Fast R-CNN, and Faster R-CNN, which produce bounding boxes rather than the actual shapes of objects, Mask R-CNN [59] extends Faster R-CNN to image segmentation at the pixel level. In Mask R-CNN, a convolutional backbone architecture extracts features over the whole image, followed by a network head for bounding-box recognition (classification and regression). A mask branch is added in parallel with this head to predict a segmentation mask for each detected object. The head architecture of Mask R-CNN with a Feature Pyramid Network (FPN) backbone is shown in Figure 2-47. Some results on the COCO (Common Objects in Context) dataset are shown in Figure 2-48.
Figure 2-47 Head architecture of Mask R-CNN plus Faster R-CNN [59]
Figure 2-48 Keypoint detection results and predicted segmentation masks [59]
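To make the difference between a bounding box and a pixel-level mask concrete, the following is a minimal sketch of how one output of the mask branch could be turned into a full-image binary mask. It assumes the mask branch yields a small m x m grid of probabilities for one detected box, which is then resized to the box and thresholded at 0.5. The function name `paste_mask`, its parameters, and the nearest-neighbor resizing are illustrative assumptions, not the implementation from [59].

```python
# Hypothetical, dependency-free sketch (not the code from [59]):
# paste one per-box mask prediction into a full-size binary image mask.

def paste_mask(mask_probs, box, image_h, image_w, threshold=0.5):
    """Resize an m x m probability grid to its box and binarize it.

    mask_probs: m x m list of floats in [0, 1] (assumed mask branch output)
    box: (x0, y0, x1, y1) integer pixel coordinates, x1 and y1 exclusive
    Returns an image_h x image_w list of 0/1 values.
    """
    m = len(mask_probs)
    x0, y0, x1, y1 = box
    box_w, box_h = x1 - x0, y1 - y0
    full = [[0] * image_w for _ in range(image_h)]
    # Visit only pixels that lie inside both the box and the image.
    for y in range(max(y0, 0), min(y1, image_h)):
        # Nearest-neighbor lookup into the grid (a simplification of resizing).
        row = min((y - y0) * m // box_h, m - 1)
        for x in range(max(x0, 0), min(x1, image_w)):
            col = min((x - x0) * m // box_w, m - 1)
            full[y][x] = 1 if mask_probs[row][col] >= threshold else 0
    return full


# Example: a 2 x 2 mask prediction for a box covering columns 1-4, rows 1-2.
probs = [[0.9, 0.2],
         [0.6, 0.1]]
mask = paste_mask(probs, box=(1, 1, 5, 3), image_h=4, image_w=6)
for line in mask:
    print(line)
```

In this toy example only the left half of the box is marked as object, so the resulting mask follows the predicted shape instead of filling the whole bounding box.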