
R-CNN, Fast R-CNN, and Faster R-CNN



Girshick et al. [56] showed that the rich features generated by a single CNN can be used for both classification and localization tasks. Most features learned in the convolutional layers were found to be general and reusable for feature-based tasks in different domains. Some CNNs pre-trained on the large ImageNet dataset have been successfully fine-tuned for smaller networks with minimal modification. The architecture of the proposed Regions with CNN features (R-CNN) is shown in Figure 2-44.

Figure 2-44 R-CNN architecture [56]

Taking an image as input, R-CNN creates region proposals, or bounding boxes, through selective search (step 2 in Figure 2-44). After the proposals are created, R-CNN warps each region to a standard square size (227 x 227) and passes it to a CNN, a modified version of AlexNet, to compute features (step 3 in Figure 2-44). A Support Vector Machine (SVM) is then applied to the final layer of the CNN to classify the regions by object type (step 4 in Figure 2-44). Once a region is classified as a certain type of object, a linear regression is performed on the coordinates of the region to output a tighter bounding box. In R-CNN, three different models are trained separately: (1) the CNN to extract image features, (2) the SVM classifier, and (3) the regression model to tighten the bounding boxes.
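The bounding-box tightening step can be illustrated with a short sketch. The text above only states that a linear regression is applied to the region coordinates; the center/size offset parameterization used here (shifts scaled by the proposal size, log-scale changes for width and height) is the one commonly associated with R-CNN [56] and is an assumption of this example, as are the function names `box_deltas` and `apply_box_deltas`.

```python
import math

def box_deltas(proposal, target):
    """Regression targets (dx, dy, dw, dh) that map a proposal onto a target box.

    Boxes are (x1, y1, x2, y2). The parameterization is an assumption of this sketch.
    """
    px1, py1, px2, py2 = proposal
    gx1, gy1, gx2, gy2 = target
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    pcx, pcy = px1 + 0.5 * pw, py1 + 0.5 * ph
    gcx, gcy = gx1 + 0.5 * gw, gy1 + 0.5 * gh
    # Center shifts are scaled by the proposal size; size changes use a log scale.
    return ((gcx - pcx) / pw, (gcy - pcy) / ph, math.log(gw / pw), math.log(gh / ph))

def apply_box_deltas(proposal, deltas):
    """Refine a proposal box with predicted offsets (the inverse of box_deltas)."""
    x1, y1, x2, y2 = proposal
    dx, dy, dw, dh = deltas
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)

# Hypothetical boxes: a loose proposal and the tighter box it should become.
proposal = (10.0, 20.0, 110.0, 220.0)
tight_box = (20.0, 30.0, 100.0, 200.0)
deltas = box_deltas(proposal, tight_box)      # what the regressor would learn to predict
refined = apply_box_deltas(proposal, deltas)  # recovers tight_box up to rounding
```

During training, the regression model would be fit to predict `deltas` from the CNN features of the proposal; at inference, the predicted offsets are applied to the proposal as in `apply_box_deltas`.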

Even though region proposal by selective search reduces computation time compared with the sliding-window method, R-CNN is still quite slow, mainly because every single region proposal requires a forward pass of the CNN. To avoid these repeated passes, Fast R-CNN [57] was proposed; its architecture is shown in Figure 2-45.

Figure 2-45 Fast R-CNN architecture [57].
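To make the cost difference concrete, the following minimal sketch counts CNN forward passes under the two schemes. `CountingCNN` is a hypothetical stand-in that returns its input unchanged and only counts calls; `crop`, `rcnn_features`, and `fast_rcnn_features` are illustrative names, and the warping step, feature-map downsampling, and pooling are all omitted.

```python
class CountingCNN:
    """Hypothetical stand-in for a CNN backbone: counts forward passes only."""

    def __init__(self):
        self.forward_passes = 0

    def forward(self, pixels):
        self.forward_passes += 1
        return pixels  # placeholder "features" (identity), for illustration only

def crop(grid, box):
    """Cut the sub-grid (r0, c0, r1, c1), end-exclusive, from a 2D list."""
    r0, c0, r1, c1 = box
    return [row[c0:c1] for row in grid[r0:r1]]

def rcnn_features(cnn, image, proposals):
    # R-CNN style: crop every proposal from the image and run the CNN on each crop.
    return [cnn.forward(crop(image, box)) for box in proposals]

def fast_rcnn_features(cnn, image, proposals):
    # Fast R-CNN style: run the CNN once, then slice the shared feature map.
    feature_map = cnn.forward(image)
    return [crop(feature_map, box) for box in proposals]

image = [[r * 8 + c for c in range(8)] for r in range(8)]  # toy 8 x 8 "image"
proposals = [(0, 0, 4, 4), (2, 2, 6, 6), (4, 4, 8, 8), (0, 4, 4, 8), (1, 1, 7, 7)]

per_region_cnn, shared_cnn = CountingCNN(), CountingCNN()
rcnn_features(per_region_cnn, image, proposals)
fast_rcnn_features(shared_cnn, image, proposals)
print(per_region_cnn.forward_passes, shared_cnn.forward_passes)  # prints "5 1"
```

The number of passes grows with the number of proposals in the first scheme and stays at one in the second, which is the saving Fast R-CNN targets.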

In Fast R-CNN, Region of Interest (ROI) pooling is used to share the CNN features from a single forward pass of an image across its sub-regions: the features falling within each region are pooled from the shared feature map. In addition, Fast R-CNN replaces the SVM classifier with a softmax layer for the classification task and adds a linear regression layer in parallel to the softmax layer to generate bounding box coordinates. As a result, a single network handles both classification and localization.

Despite these improvements, one bottleneck remains in Fast R-CNN: region proposal by selective search. To remove this bottleneck, Faster R-CNN was proposed [58]. Instead of executing a separate selective search algorithm, Faster R-CNN uses the image features extracted in the forward pass of the CNN to generate region proposals, which makes the region proposals nearly cost-free. As seen in Figure 2-46, a single CNN is used for both region proposal generation and classification.

Figure 2-46 Faster R-CNN is a single, unified network for object detection [58].
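The ROI pooling step used in Fast R-CNN can be sketched in a few lines. This is a simplified, single-channel illustration in plain Python, not the implementation from [57]: the function name `roi_max_pool`, the integer binning rule, and the 2 x 2 output size are assumptions made for the example, and the region is assumed to be given directly in feature-map coordinates.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one region of a shared feature map into a fixed out_h x out_w grid.

    feature_map: 2D list (H x W), a single channel.
    roi: (r0, c0, r1, c1) in feature-map cells, end-exclusive, at least 1 x 1.
    """
    r0, c0, r1, c1 = roi
    h, w = r1 - r0, c1 - c0
    pooled = []
    for i in range(out_h):
        # Split the region's rows into out_h bins; each bin keeps at least one row.
        rs = r0 + (i * h) // out_h
        re = r0 + max(((i + 1) * h) // out_h, (i * h) // out_h + 1)
        row = []
        for j in range(out_w):
            cs = c0 + (j * w) // out_w
            ce = c0 + max(((j + 1) * w) // out_w, (j * w) // out_w + 1)
            # Keep the strongest activation inside the bin.
            row.append(max(feature_map[r][c]
                           for r in range(rs, re) for c in range(cs, ce)))
        pooled.append(row)
    return pooled

# One shared feature map (toy values), computed once per image.
feature_map = [[0, 1, 2, 3],
               [4, 5, 6, 7],
               [8, 9, 10, 11],
               [12, 13, 14, 15]]

# Two regions of different sizes both map to a 2 x 2 output.
whole = roi_max_pool(feature_map, (0, 0, 4, 4))  # [[5, 7], [13, 15]]
strip = roi_max_pool(feature_map, (0, 0, 3, 2))  # [[0, 1], [8, 9]]
```

Because every region is reduced to the same fixed grid regardless of its size, the pooled features can be fed to the shared softmax and regression layers without re-running the CNN.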