AIPrimer.AI

Introduction


Last updated 1 year ago

Convolutional Neural Networks, typically referred to as CNNs or ConvNets, are perhaps the greatest success story of biologically inspired artificial intelligence [31]. They have been highly successful in tasks such as image classification, video analysis, object detection and tracking, and natural language processing, and their range of applications continues to expand. CNNs are built from convolutional layers that mimic the mammalian visual cortex, whose neurons individually respond to small regions of the visual field, as discovered by Hubel and Wiesel [33], who later received the Nobel Prize for their work. In a convolutional layer, each neuron receives input from a local subarea of the previous layer, called its receptive field. For an image, for example, the receptive field is typically square (e.g., 3×3) and extends through the depth of the input (e.g., 3 channels for RGB color images).

CNNs also use a parameter-sharing scheme, in which each filter (i.e., the same set of weights) slides over the entire input area. This results in far fewer parameters (or weights) than in traditional multilayer neural networks, in which each neuron receives input from every element of the previous layer. When the receptive field grows to the full size of the previous layer, the convolutional layer becomes equivalent to the fully connected (dense) layer of a traditional multilayer neural network. In this sense, a CNN can be considered a more general form that contains traditional multilayer neural networks as a special case. Local connectivity and parameter sharing make CNNs scale well to large inputs such as images. Figure 2-19 illustrates the concept of parameter sharing with a simple one-dimensional (1D) example.
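
To make the parameter-sharing argument concrete, here is a minimal NumPy sketch, assuming a hypothetical 1D input of 8 values and a single filter of size 3 (these sizes and the helper conv1d_valid are illustrative, not taken from the text). It counts the trainable parameters of one convolutional filter against a dense layer with the same number of outputs, shows the convolution as a dense layer with a banded, weight-sharing matrix, and checks the special case in which the receptive field spans the whole input. Like CNN libraries, the helper computes a sliding dot product without flipping the filter.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in = 8   # length of a hypothetical 1D input
k = 3      # receptive field (filter) size

def conv1d_valid(x, w, b=0.0):
    """Slide ONE shared weight vector w along x (stride 1, no padding, no flip)."""
    n_out = len(x) - len(w) + 1
    return np.array([x[i:i + len(w)] @ w + b for i in range(n_out)])

x = rng.standard_normal(n_in)

# Convolutional layer with one filter: k shared weights + 1 bias
w = rng.standard_normal(k)
y_conv = conv1d_valid(x, w)
conv_params = w.size + 1

# Dense layer with the same number of outputs: each output neuron has its
# own weight for every input element, plus its own bias
n_out = n_in - k + 1
dense_params = n_out * n_in + n_out
print(conv_params, dense_params)   # → 4 54

# The convolution equals a dense layer whose banded weight matrix reuses w in every row
T = np.zeros((n_out, n_in))
for i in range(n_out):
    T[i, i:i + k] = w
assert np.allclose(T @ x, y_conv)

# Special case: receptive field as large as the input -> one output per filter,
# so 5 full-size filters act exactly like a 5-neuron fully connected layer
filters = rng.standard_normal((5, n_in))
y_full = np.array([conv1d_valid(x, f)[0] for f in filters])
assert np.allclose(y_full, filters @ x)
```

The gap grows quickly with input size: the filter keeps its 4 parameters, while the dense layer's count scales with the product of input and output sizes.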

Figure 2-19 Effect of parameter sharing – comparison of CNN versus traditional multilayer neural networks

In practice, CNNs have been used most widely with images, but their applications are not limited to image data. CNNs can discover or learn patterns from any data or signal with structure in its spatial and/or temporal dimensions. For example, Abdoli et al. [34] classified environmental sounds using a 1D CNN (Figure 2-20) that learns a representation directly from the audio signal.
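
The sketch below, assuming a synthetic signal (not data from [34]), shows why a 1D filter suits temporal data: the same three weights slide along the signal and respond wherever the pattern occurs. The filter is set by hand here for clarity; in a trained CNN such as that of Abdoli et al., filter weights are learned from data.

```python
import numpy as np

def conv1d_valid(x, w):
    # Sliding dot product as used in CNN layers: stride 1, no padding, no flip
    return np.array([x[i:i + len(w)] @ w for i in range(len(x) - len(w) + 1)])

# Hypothetical signal: zeros with the same short pattern at two different times
pattern = np.array([1.0, -1.0, 1.0])
signal = np.zeros(20)
signal[3:6] = pattern
signal[14:17] = pattern

# A filter that matches the pattern; the strongest response marks each occurrence
response = conv1d_valid(signal, pattern)
peaks = np.flatnonzero(response == response.max())
print(peaks.tolist())   # → [3, 14]
```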

Figure 2-20 The architecture of the proposed end-to-end 1D CNN for environmental sound classification [34]

Each convolutional layer typically has a number of "filters" that extract or assemble specific features from the previous layer. In the context of image processing, a filter is a set of numbers, typically arranged as one or more square matrices (one per input channel). Each filter looks for a particular type of feature, such as vertical or horizontal edges (an example follows shortly). A filter in a CNN is analogous to an optical filter, which selectively transmits light of certain wavelengths. For example, to identify a distinctive object such as a stop sign, a hierarchy of filters would be needed to extract features at different levels, including color (red and white), edges (vertical, horizontal, and slanted lines), shape (octagon), and the four letters (STOP). Zeiler et al. [35] showed that a CNN can capture image information in a variety of forms: low-level edges, mid-level edge junctions, high-level object parts, and complete objects (see Figure 2-21).
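
As a rough illustration of a filter made of one square matrix per channel, the following sketch assumes a random 32×32 RGB image and a random 3×3×3 filter; filter_response is a hypothetical helper, not a library function. At each position the filter's output is the sum of element-wise products over its receptive field plus a bias, and sliding it over all valid positions yields one feature map.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical RGB image: height x width x channels, values in the 8-bit range
image = rng.integers(0, 256, size=(32, 32, 3)).astype(np.float64)

# One filter = one 3x3 matrix per input channel -> shape (3, 3, 3), plus a bias
filt = rng.standard_normal((3, 3, 3))
bias = 0.1

def filter_response(img, f, b, row, col):
    """Weighted sum over the 3x3x3 receptive field whose top-left corner is (row, col)."""
    kh, kw, _ = f.shape
    patch = img[row:row + kh, col:col + kw, :]
    return float(np.sum(patch * f) + b)

# Sliding the same filter over every valid position gives one feature map
fmap = np.array([[filter_response(image, filt, bias, r, c)
                  for c in range(image.shape[1] - 2)]
                 for r in range(image.shape[0] - 2)])

print(fmap.shape)      # → (30, 30)
print(filt.size + 1)   # → 28 trainable numbers, independent of the image size
```

A layer with several filters simply repeats this with different weights, producing one feature map per filter.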

Features extracted from images are usually general and can be used for many different tasks, such as classification, as shown in Figure 2-22.

Now let us dig a bit deeper to understand how the "magic" happens. When a filter or kernel (the term "filter" is more common in CNNs) is applied to extract information from an image, a convolution operation is carried out between the filter and the image. For brevity, and without loss of generality, we will use a grayscale image, which has one channel, as opposed to the three channels of an RGB image. Each pixel value (gray level) is represented by one byte (8 bits), which encodes 2⁸ = 256 integer values, with 0 for black and 255 for white; any value in between represents an intermediate shade of gray. A simple 9×9 grayscale image is shown in Figure 2-23. The image has two distinct features: one vertical edge and one horizontal edge. How can we extract these features separately by applying two different filters?
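
A minimal NumPy version of such an image, assuming a white square in the lower-right corner on a black background (the exact pixel values in Figure 2-23 may differ), uses the 8-bit unsigned integer type described above. This layout produces exactly one vertical edge and one horizontal edge.

```python
import numpy as np

# 8-bit grayscale: 2**8 = 256 levels, 0 = black, 255 = white
info = np.iinfo(np.uint8)
print(info.min, info.max, 2 ** 8)   # → 0 255 256

# Hypothetical 9x9 image: white lower-right square on black gives
# a vertical edge between columns 3 and 4 (rows 4-8) and
# a horizontal edge between rows 3 and 4 (columns 4-8)
img = np.zeros((9, 9), dtype=np.uint8)
img[4:, 4:] = 255
print(img)
```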

To accomplish this task, we could use the two filters (i.e., matrices) in Figure 2-24. These filters are also known as Sobel kernels [36].

To apply the filters, a convolution operation is performed, as discussed in detail in the following section.
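
As a preview of the next section, the sketch below applies the standard 3×3 Sobel kernels to a hypothetical 9×9 image (white lower-right square on black; the figure's pixel values and the kernels' sign convention may differ). The helper correlate2d_valid is illustrative: it slides the kernel without padding and, as CNN libraries do, without flipping it, so strictly it computes a cross-correlation. The image is converted to floating point first, because filter responses can be negative or exceed 255 and would wrap around in uint8.

```python
import numpy as np

def correlate2d_valid(img, k):
    """Slide kernel k over img (stride 1, no padding) and take weighted sums."""
    kh, kw = k.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * k)
    return out

# Standard Sobel kernels
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])   # left-right intensity change -> vertical edges
sobel_y = sobel_x.T                # top-bottom intensity change -> horizontal edges

# Hypothetical 9x9 image: white lower-right square on a black background
img = np.zeros((9, 9), dtype=np.uint8)
img[4:, 4:] = 255
img = img.astype(np.float64)       # avoid uint8 overflow and negative wrap-around

vertical_edges = correlate2d_valid(img, sobel_x)     # 7x7 feature map
horizontal_edges = correlate2d_valid(img, sobel_y)   # 7x7 feature map

print(np.abs(vertical_edges).astype(int))
print(np.abs(horizontal_edges).astype(int))
```

Because the vertical edge lies between pixel columns 3 and 4, the vertical-edge map is nonzero only in output columns 2 and 3 (kernels centered on pixel columns 3 and 4); the horizontal-edge map behaves the same way in rows 2 and 3. Near the corner, where the two edges meet, both filters respond.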

Figure 2-21 Top-down parts-based image decomposition with an adaptive deconvolutional network. Each column corresponds to a different input image under the same model. Row 1 shows a single activation of a 4th layer feature map projected into image space. Conditional on the activations in the layer above, we also take a subset of 5, 25, and 125 active features in layers 3, 2, and 1, respectively, and visualize them in image space (rows 2–4). The activations reveal mid- and high-level primitives learned by our model. In practice there are many more activations such that the complete set sharply reconstructs the entire image from each layer [35].
Figure 2-22 CNN can solve the classification problem based on a hierarchy of features extracted from images [31]
Figure 2-23 Image with a size of 9 pixels by 9 pixels
Figure 2-24 Sobel Kernels Edge Filters