
Applications in Transportation

The best-known use of PCA is dimensionality reduction. Most real-world datasets have many dimensions, which makes them hard to visualize. By keeping only the first two or three principal components, the samples can be represented in a 2D or 3D space. PCA can also compress a dataset through the same reduction. For example, Li et al. (2007) used PCA to compress traffic flow data. By keeping a certain number of principal components, the compressed data preserves most of the information in the original dataset at an acceptable compression ratio.
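The compression idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the method of the cited study: the synthetic "traffic flow" matrix (200 days by 24 hourly counts built from two made-up peak patterns), the helper names `pca_compress` and `pca_reconstruct`, and the choice of two components are all assumptions made for the example.

```python
import numpy as np

def pca_compress(X, k):
    """Project X (n_samples, n_features) onto its top-k principal directions."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data: the rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    scores = Xc @ components.T
    # Fraction of total variance retained by the first k components.
    explained = (S[:k] ** 2).sum() / (S ** 2).sum()
    return scores, components, mean, explained

def pca_reconstruct(scores, components, mean):
    """Map the compressed scores back to the original feature space."""
    return scores @ components + mean

# Synthetic stand-in for traffic flow data (hypothetical, for illustration only):
# each day is a random mix of a morning-peak and an evening-peak profile plus noise.
rng = np.random.default_rng(0)
hours = np.arange(24)
am_peak = np.exp(-0.5 * ((hours - 8) / 2.0) ** 2)
pm_peak = np.exp(-0.5 * ((hours - 17) / 2.0) ** 2)
weights = rng.uniform(500, 1500, size=(200, 2))
flows = weights @ np.vstack([am_peak, pm_peak]) + rng.normal(0, 20, size=(200, 24))

scores, components, mean, explained = pca_compress(flows, k=2)
approx = pca_reconstruct(scores, components, mean)

# Relative reconstruction error and the ratio of stored numbers before and after.
rel_error = np.linalg.norm(flows - approx) / np.linalg.norm(flows)
compression_ratio = flows.size / (scores.size + components.size + mean.size)
print(f"variance retained: {explained:.3f}")
print(f"relative error: {rel_error:.3f}, compression ratio: {compression_ratio:.1f}")
```

Only the scores, the retained directions, and the mean need to be stored, which is where the compression comes from. Real traffic data would need more components than this toy example.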

Another application in transportation is traffic flow prediction (Tsekeris and Stathopoulos, 2006; Li et al., 2007; Xing et al., 2015). Xing et al. (2015) used robust principal component analysis to predict real-life highway traffic. The prediction results outperformed other prediction methods. Li et al. (2014) also demonstrated that a PCA-based detrending method can be used for traffic prediction by decomposing traffic time series data into independent components.

Because the top principal components of a dataset capture the most important information rather than background noise, PCA is also a useful tool for feature extraction and data denoising. Zhang et al. (2006) used a PCA-SVM method for vehicle classification: PCA extracted the characteristics shared most strongly across a set of vehicle data, and the extracted features were then fed into a Support Vector Machine (SVM) for classification. Jonsson (2011) used PCA to cluster different road conditions using traffic camera and weather data.
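The denoising property can be checked on synthetic data where the clean signal is known. The sketch below assumes a signal that lies in a 3-dimensional subspace of a 20-dimensional space, with Gaussian noise added; the sizes and noise level are arbitrary choices for the example, and the number of components is assumed to be known in advance.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_features, k = 300, 20, 3

# Clean signal confined to a k-dimensional subspace (an assumption of this example).
basis = rng.normal(size=(k, n_features))
clean = rng.normal(size=(n_samples, k)) @ basis
noisy = clean + rng.normal(scale=0.5, size=(n_samples, n_features))

# Fit PCA on the noisy data and keep only the top-k principal directions.
mean = noisy.mean(axis=0)
centered = noisy - mean
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
top = Vt[:k]

# Project onto the retained directions and map back: noise outside them is discarded.
denoised = centered @ top.T @ top + mean

err_noisy = np.mean((noisy - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)
print(f"MSE before: {err_noisy:.4f}, after: {err_denoised:.4f}")
```

The projected scores `centered @ top.T` are also the kind of low-dimensional features that could be passed to a downstream classifier such as an SVM.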

The principal directions are very sensitive to anomalous data, which makes PCA an effective tool for anomaly detection. Lee et al. (2012) noted that removing a normal data point from a dataset, or adding one to it, barely changes the principal direction, whereas removing or adding an abnormal data point changes the principal direction significantly. Huang et al. (2007) extended this approach to traffic network anomaly detection by considering both spatial and temporal relationships.
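A leave-one-out version of this idea is sketched below. It is a simplified illustration, not the algorithm of the cited papers: each point is scored by how much the first principal direction rotates when that point is removed. The 2D data, the planted outlier at (15, 0), and the function names are assumptions made for the example.

```python
import numpy as np

def first_direction(X):
    """Unit vector along the first principal direction of X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

def leave_one_out_scores(X):
    """Score each sample by the rotation of the first principal
    direction caused by removing that sample."""
    base = first_direction(X)
    scores = np.empty(len(X))
    for i in range(len(X)):
        reduced = np.delete(X, i, axis=0)
        direction = first_direction(reduced)
        # The sign of a principal direction is arbitrary, so compare |cosine|.
        scores[i] = 1.0 - abs(base @ direction)
    return scores

rng = np.random.default_rng(2)
# 100 normal points scattered along the line y = x with small noise.
t = rng.normal(0, 3, size=100)
normal = np.column_stack([t, t]) + rng.normal(0, 0.3, size=(100, 2))
# One planted anomaly far from that line (index 100).
outlier = np.array([[15.0, 0.0]])
X = np.vstack([normal, outlier])

scores = leave_one_out_scores(X)
suspect = int(np.argmax(scores))
print("most suspicious index:", suspect)
```

Recomputing the decomposition once per sample is costly on large datasets, so this brute-force loop is only meant to show the principle.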

Limitations

There are two main limitations of PCA:

First, PCA relies on linear assumptions. In other words, PCA performs best when the variables in the dataset are linearly correlated. When they are not, PCA may give poor results.

Second, PCA requires all principal components to be orthogonal to each other. This constraint may prevent the projections from finding the true directions of highest variance.
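The linearity limitation can be illustrated by comparing two synthetic 2D datasets, both of which are essentially one-dimensional: points along a straight line and points on a circle. Both datasets and the helper `explained_variance_ratio` are assumptions made for this sketch.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

rng = np.random.default_rng(3)

# Linearly correlated variables: y is close to 2x.
x = rng.uniform(-1, 1, size=500)
linear = np.column_stack([x, 2 * x + rng.normal(0, 0.05, size=500)])

# Nonlinearly related variables: points on a unit circle, driven by one angle.
theta = np.linspace(0, 2 * np.pi, 500, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
circle = circle + rng.normal(0, 0.02, size=(500, 2))

ratio_linear = explained_variance_ratio(linear)[0]
ratio_circle = explained_variance_ratio(circle)[0]
print(f"first component, linear data: {ratio_linear:.3f}")
print(f"first component, circle data: {ratio_circle:.3f}")
```

On the linear data, one component captures nearly all of the variance. On the circle, the variance is split roughly evenly between the two components, so no single linear projection summarizes the data even though one angle describes every point.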

Summary

In summary, this section explained the principal components, the principal directions, and the calculation process of PCA. The main advantage of PCA is that it can help cluster different classes efficiently. PCA has a straightforward meaning, and its results are interpretable. The examples of traffic data compression, traffic prediction, vehicle feature extraction, and anomaly detection give a first impression of how PCA is applied in transportation. The limitations of the method were also discussed.

