
SPECTRAL CLUSTERING

Spectral clustering is a class of clustering algorithms that uses linear algebra, specifically the eigenvectors (the "spectrum") of a graph Laplacian, to identify groups of data points.

The first step in spectral clustering is to construct a similarity graph. Let $x_1, \dots, x_n$ be a set of data points. The similarity graph, $G = (V, E)$, can then be constructed with each vertex $v_i$ representing one data point $x_i$ and the edge between vertices $v_i$ and $v_j$ weighted by their similarity score $s_{i,j}$. Generally all edges with $s_{i,j} > 0$ are included, but a higher threshold can also be used. Various methods exist for calculating similarity scores. Two examples are the $\varepsilon$-neighborhood graph, where $s_{i,j} = 1$ if $\lVert x_i - x_j \rVert < \varepsilon$ and $s_{i,j} = 0$ otherwise, and the $k$-nearest-neighbor graph, where $s_{i,j} = 1$ if $v_j$ is among the $k$ nearest neighbors of $v_i$ and $s_{i,j} = 0$ otherwise.
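
To make the two constructions concrete, here is a minimal sketch in Python using only NumPy. The function names and the choice of Euclidean distance are ours, and the $\varepsilon$ and $k$ values a caller passes in are illustrative, not prescribed by the text.

```python
import numpy as np

def eps_graph(X, eps):
    """Epsilon-neighborhood graph: s_ij = 1 if ||x_i - x_j|| < eps, else 0."""
    # Pairwise Euclidean distances between all rows of X (shape: n x n).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    S = (d < eps).astype(float)
    np.fill_diagonal(S, 0.0)           # no self-loops
    return S

def knn_graph(X, k):
    """k-nearest-neighbor graph: s_ij = 1 if x_j is among the k nearest
    neighbors of x_i, symmetrized so the adjacency matrix is undirected."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)        # a point is not its own neighbor
    S = np.zeros_like(d)
    nn = np.argsort(d, axis=1)[:, :k]  # indices of each point's k nearest neighbors
    S[np.repeat(np.arange(len(X)), k), nn.ravel()] = 1.0
    return np.maximum(S, S.T)          # keep the edge if either endpoint chose it
```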

Next, a graph Laplacian is used to transform the adjacency matrix $W = (w_{i,j})_{i,j=1,\dots,n}$ of the similarity graph. Various graph Laplacians have been developed in the literature on spectral graph theory, including the unnormalized graph Laplacian $L = D - W$, where $D$ is the diagonal degree matrix, and several normalized graph Laplacians [8].
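
As a hedged illustration of this step, the sketch below computes the unnormalized Laplacian $L = D - W$ along with one common normalized variant, $L_{sym} = I - D^{-1/2} W D^{-1/2}$, from the spectral graph theory literature; the function names are ours.

```python
import numpy as np

def unnormalized_laplacian(W):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    return np.diag(W.sum(axis=1)) - W

def normalized_laplacian(W):
    """Symmetric normalized Laplacian L_sym = I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)  # guard isolated vertices
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
```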

The next step is to calculate the first $k$ eigenvectors $u_1, \dots, u_k$ of $L$, those corresponding to the $k$ smallest eigenvalues, and assemble them into a matrix $U \in \mathbb{R}^{n \times k}$ whose columns are the eigenvectors. (In some algorithms, the generalized eigenvectors are computed instead.)

Finally, for $i = 1, \dots, n$, let $y_i \in \mathbb{R}^k$ be the vector corresponding to the $i$-th row of $U$. Then cluster the points $(y_i)_{i=1,\dots,n}$ in $\mathbb{R}^k$ with the $k$-means algorithm into clusters $C_1, \dots, C_k$.
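
Putting the steps together, the following is a minimal end-to-end sketch, assuming SciPy for the eigendecomposition and scikit-learn's KMeans for the final clustering step (tooling choices of ours, not part of the text):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(W, k):
    """Cluster the n points behind the adjacency matrix W into k clusters."""
    L = np.diag(W.sum(axis=1)) - W              # unnormalized Laplacian L = D - W
    # Columns of U are the eigenvectors of the k smallest eigenvalues of L.
    _, U = eigh(L, subset_by_index=[0, k - 1])
    # Row y_i of U is the new k-dimensional representation of point x_i;
    # the final step clusters these rows with ordinary k-means.
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```

Combined with the earlier graph-construction sketch, `spectral_clustering(knn_graph(X, 10), k=3)` would cluster a point set `X` into three groups.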

The advantages of spectral clustering are that it avoids strong assumptions about the shape of clusters and reduces to a standard linear-algebra problem once the similarity graph has been chosen [7]. As mentioned above, $k$-means works best on clusters that are "blob-like". The transformations involved in spectral clustering allow more complex geometries, such as rings or even intertwined spirals, to be identified. Computationally, spectral clustering algorithms are easy to implement, and the eigendecomposition at their core is not sensitive to local minima or initial conditions.
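
As a hedged illustration of the geometry point, the sketch below compares $k$-means and spectral clustering on two concentric rings, using scikit-learn's built-in implementations; the dataset and its parameters are illustrative choices of ours.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.cluster import KMeans, SpectralClustering

# Two concentric rings: a geometry k-means cannot separate but spectral clustering can.
X, y = make_circles(n_samples=400, factor=0.4, noise=0.05, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0).fit_predict(X)

def agreement(labels, truth):
    """Fraction of points matching the true rings, allowing for label swap."""
    acc = np.mean(labels == truth)
    return max(acc, 1.0 - acc)

print(f"k-means agreement:  {agreement(km, y):.2f}")   # near 0.5: cuts the rings in half
print(f"spectral agreement: {agreement(sc, y):.2f}")   # near 1.0: recovers both rings
```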

The disadvantages of spectral clustering lie in constructing a good similarity graph and choosing an appropriate value of $k$. Constructing a similarity graph is nontrivial, and the structure of the similarity graph has been shown to impact clustering convergence. The difficulty in choosing $k$ is largely the same as with $k$-means.
