
How Does It Work?



The goal of PCA is to find the principal components of a dataset. The principal components are the values of the dataset projected onto the principal directions. The direction along which the projected values have the largest variance is taken as the first principal direction, the direction with the next-largest variance is the second principal direction, and so on. The principal directions are mutually orthogonal.

PCA can therefore be explained by answering two questions: (a) what are the principal components and principal directions, and (b) how are the principal components and principal directions computed?

Principal Components

The first principal component is defined as follows: among all vectors onto which the dataset can be linearly projected, the first principal direction is the one for which the projected values of the samples have the largest possible variance, and the projected value of each sample along it is that sample's first principal component. By making the variance of the projected values as large as possible, we guarantee that the samples are spread out as much as possible along this direction. Each succeeding principal component in turn has the largest projected variance under the constraint that its direction is orthogonal to the preceding principal directions.
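As an illustrative sketch (the data and direction angles below are made up for this example, not taken from the text), the following Python snippet projects a two-dimensional dataset onto unit vectors at several angles and compares the variance of the projections; the direction with the largest projected variance approximates the first principal direction:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data: large spread along one axis, small along the other
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
theta0 = np.pi / 4
R = np.array([[np.cos(theta0), -np.sin(theta0)],
              [np.sin(theta0),  np.cos(theta0)]])
X = X @ R.T          # rotate the cloud so its long axis lies at 45 degrees
X -= X.mean(axis=0)  # center the data

# Variance of the data projected onto unit vectors at several angles
for deg in (0, 45, 90, 135):
    t = np.radians(deg)
    v = np.array([np.cos(t), np.sin(t)])   # candidate unit direction
    proj = X @ v                           # projected values (1-D)
    print(f"direction {deg:3d} deg: projected variance = {proj.var():.2f}")

# The 45-degree direction yields the largest projected variance, so it is
# (approximately) the first principal direction of this dataset.
```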

Principal Components Calculation

There are many approaches to calculating the principal components of a dataset, such as gradient descent, eigen-decomposition of the covariance/correlation matrix, and singular value decomposition (SVD) of the data matrix (Shamir, 2016; Smith, 2002; Wall et al., 2003). In this section, we give a step-by-step tutorial of the eigen-decomposition method and the singular value decomposition method. The latter is known for its high efficiency and is widely used by machine learning packages.

Step 1. Data preprocessing – standardization

Standardizing the data prior to PCA is crucial for most datasets. By subtracting the mean and dividing by the standard deviation of each column, each feature of the dataset is transformed to unit scale (mean = 0 and variance = 1). This step is especially useful when the features are measured on different scales: if one feature's scale is much larger than the others', its variance will also be significantly larger, which gives that feature a disproportionately high weight in the calculation of the principal components. Mathematically, each entry of the dataset can be standardized by the following equation:

$$z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}$$

where $z_{ij}$ is the standardized value of the $i$-th sample's $j$-th feature, $x_{ij}$ is the original value, $\mu_j$ is the mean of the $j$-th feature of the dataset, and $\sigma_j$ is the standard deviation of the $j$-th feature of the dataset.
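As a minimal sketch of this step (the function name and example values are illustrative, not from the text), standardization takes a few lines of NumPy:

```python
import numpy as np

def standardize(X):
    """Transform each column (feature) to zero mean and unit variance."""
    mu = X.mean(axis=0)       # per-feature mean
    sigma = X.std(axis=0)     # per-feature standard deviation
    return (X - mu) / sigma

# Example: two features measured on very different scales
X = np.array([[1200.0, 1.1],
              [ 900.0, 2.3],
              [1500.0, 0.7],
              [1100.0, 1.9]])
Z = standardize(X)
print(Z.mean(axis=0))  # approximately [0, 0]
print(Z.std(axis=0))   # [1, 1]
```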

Step 2. Covariance/correlation matrix calculation

The classic approach to PCA applies eigen-decomposition to the covariance/correlation matrix of the standardized data. The eigenvectors are the principal directions, while each eigenvalue represents the variance of the dataset along its eigenvector. In other words, the eigenvector with the highest eigenvalue is the first principal direction, the eigenvector with the second-largest eigenvalue corresponds to the second principal direction, and so on.
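Continuing the sketch (Z stands in for the standardized matrix from Step 1; the random data and names are illustrative), the covariance matrix can be computed directly:

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 3))                 # stand-in for raw data
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)      # standardized (Step 1)

n = Z.shape[0]
C = (Z.T @ Z) / (n - 1)   # covariance matrix of the standardized data
# Because Z is standardized, C is also the correlation matrix.
print(np.allclose(C, np.cov(Z, rowvar=False)))  # True
```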

Step 3. Eigen-decomposition of the covariance matrix or correlation matrix

Since both the covariance matrix and the correlation matrix are real symmetric matrices of dimension $p \times p$ (with $p$ the number of features), the eigen-decomposition for real symmetric matrices can be represented as

$$C = V \Lambda V^{T}$$

where $V$ is an orthogonal matrix whose columns are the eigenvectors of $C$, and $\Lambda$ is a diagonal matrix whose entries are the corresponding eigenvalues. Sorting the eigenvalues in descending order, the columns of $V$ give the principal directions, and projecting the standardized data onto them yields the principal components.

Different from the eigen-decomposition of the covariance/correlation matrix, the SVD method factorizes the standardized data matrix directly and computes the principal directions and principal components at the same time, which is very efficient for big-dataset computation.
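A sketch of both routes on the same standardized matrix (all names and data are illustrative; the sign of each principal direction is arbitrary, so the two methods may differ by a factor of -1):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))  # toy raw data
Z = (X - X.mean(axis=0)) / X.std(axis=0)                 # standardize

# Route 1: eigen-decomposition of the covariance matrix
C = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)    # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # re-sort in descending order
directions_eig = eigvecs[:, order]      # principal directions (columns)
components_eig = Z @ directions_eig     # principal components

# Route 2: SVD of the standardized data matrix
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
directions_svd = Vt.T                   # rows of Vt are the directions
components_svd = U * S                  # equivalently Z @ Vt.T

# The two routes agree up to the sign of each direction
for k in range(Z.shape[1]):
    a, b = components_eig[:, k], components_svd[:, k]
    assert np.allclose(a, b) or np.allclose(a, -b)
print("eigen-decomposition and SVD give the same principal components")
```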