AIPrimer.AI
  • 🚦AI Primer In Transportation
  • CHAPTER 1 - INTRODUCTION TO MACHINE LEARNING
    • Machine Learning in Transportation
    • What is Machine Learning?
    • Types of Machine Learning
      • Supervised Learning
      • Unsupervised Learning
      • Semi-supervised Learning
      • Reinforcement Learning
    • Fundamental concepts of machine learning
      • Model Training and Testing
      • Evaluating the Model’s Prediction Accuracy
      • The Underfitting and Overfitting Problems
      • Bias-Variance Tradeoff in Overfitting
      • Model Validation Techniques
      • Hyperparameter Tuning
      • Model Regularization
      • The Curse of Dimensionality
    • Machine Learning versus Statistics
  • CHAPTER 2 - SUPERVISED METHODS
    • Supervised Learning
    • K-Nearest Neighbor (KNN) Algorithm
    • Tree-Based Methods
    • Boosting
    • Support Vector Machines (SVMs)
  • CHAPTER 3 - UNSUPERVISED LEARNING
    • Principal Component Analysis
      • How Does It Work?
      • Interpretation of PCA result
      • Applications in Transportation
    • Clustering
      • K-Means
      • Spectral Clustering
      • Hierarchical Clustering
    • Reference
  • CHAPTER 4 - NEURAL NETWORK
    • The Basic Paradigm: Multilayer Perceptron
    • Regression and Classification Problems with Neural Networks
    • Advanced Topologies
      • Modular Network
      • Coactive Neuro–Fuzzy Inference System
      • Recurrent Neural Networks
      • Jordan-Elman Network
      • Time-Lagged Feed-Forward Network
      • Deep Neural Networks
  • CHAPTER 5 - DEEP LEARNING
    • Convolutional Neural Networks
      • Introduction
      • Convolution Operation
      • Typical Layer Structure
      • Parameters and Hyperparameters
      • Summary of Key Features
      • Training of CNN
      • Transfer Learning
    • Recurrent Neural Networks
      • Introduction
      • Long Short-Term Memory Neural Network
      • Applications in Transportation
    • Recent Development
      • AlexNet, ZFNet, VggNet, and GoogLeNet
      • ResNet
      • U-Net: Full Convolutional Network
      • R-CNN, Fast R-CNN, and Faster R-CNN
      • Mask R-CNN
      • SSD and YOLO
      • RetinaNet
      • MobileNets
      • Deformable Convolution Networks
      • CenterNet
      • Exemplar Applications in Transportation
    • Reference
  • CHAPTER 6 - REINFORCEMENT LEARNING
    • Introduction
    • Reinforcement Learning Algorithms
    • Model-free vs. Model-based Reinforcement Learning
    • Applications of Reinforcement Learning to Transportation and Traffic Engineering
    • Reference
  • CHAPTER 7 - IMPLEMENTING ML AND COMPUTATIONAL REQUIREMENTS
    • Data Pipeline for Machine Learning
      • Introduction
      • Problem Definition
      • Data Ingestion
      • Data Preparation
      • Data Segregation
      • Model Training
      • Model Deployment
      • Performance Monitoring
    • Implementation Tools: The Machine Learning Ecosystem
      • Machine Learning Framework
      • Data Ingestion tools
      • Databases
      • Programming Languages
      • Visualization Tools
    • Cloud Computing
      • Types and Services
    • High-Performance Computing
      • Deployment On-Premise vs. On-Cloud
      • Case Study: Data-driven approach for the implementation of Variable Speed Limit
      • Conclusion
  • CHAPTER 8 - RESOURCES
    • Mathematics and Statistics
    • Programming Languages and Software
    • Machine Learning Environments
    • Tools of the Trade
    • Online Learning Sites
    • Key Math Concepts
  • REFERENCES
  • IMPROVEMENT BACKLOG

CHAPTER 4 - NEURAL NETWORK



Neural networks (NNs), or connectionist systems, have experienced a resurgence of interest in recent years as a paradigm of computation and knowledge representation. After a first surge of attempts to simulate the functioning of the human brain with artificial neurons in the 1950s and 1960s, this AI subdiscipline received little attention until the 1990s. The resurgence has been driven mainly by the appearance of faster digital computers capable of simulating large networks and by the discovery of new NN architectures and more powerful learning mechanisms. The new architectures, for the most part, are not meant to duplicate the operation of the human brain, but rather draw inspiration from known facts about how the brain works.

NNs process information through a learning process, responding adaptively to inputs in accordance with a learning rule. These powerful models are composed of many simulated neurons, or simple computational units, connected in such a way that they are able to learn in a manner similar to how the human brain learns. This distributed architecture makes NNs particularly well suited to nonlinear problems and input–output mapping problems. The usual application of NNs is in the learning and generalization of knowledge and patterns; they are not suitable for expert reasoning, and they have poor explanation capabilities.

While there are several definitions of NNs, the following one emphasizes their key features: an NN is a distributed, adaptive, generally nonlinear learning machine built by interconnecting different processing elements (PEs) [15]. The functionality of an NN is based on the interconnectivity between the PEs. Each PE receives connections from other PEs and/or from itself. The connectivity defines the topology of the NN and plays a role at least as important as the PEs themselves in the NN's functionality. The signals transmitted via the connections are scaled by adjustable parameters called weights, $w_{ij}$.

A typical PE structure is depicted in Figure 2-10 as a nonlinear (static) function applied to the sum of all the PE's inputs. Because an NN's knowledge is stored in a distributed fashion across the connection weights between PEs, and because that knowledge is acquired through a learning process that modifies the connection strengths, NNs tend to resemble the human brain in functionality.
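To make the PE computation concrete, the following minimal sketch (in Python with NumPy) evaluates one PE as a nonlinear function applied to the weighted sum of its inputs. The sigmoid nonlinearity, the weight values, and the bias term are illustrative assumptions, not values specified in the text.

```python
import numpy as np

def pe_output(inputs, weights, bias=0.0):
    """Output of one processing element (PE): a nonlinear (static)
    function applied to the weighted sum of all the PE's inputs."""
    activation = np.dot(weights, inputs) + bias   # sum of w_ij * x_j, plus bias
    return 1.0 / (1.0 + np.exp(-activation))      # sigmoid nonlinearity (illustrative choice)

# Hypothetical example: a PE with three inputs
x = np.array([0.5, -1.0, 2.0])    # input signals from other PEs
w = np.array([0.8, 0.2, -0.5])    # adjustable connection weights w_ij
print(pe_output(x, w, bias=0.1))
```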

There are many types of NN architectures, each designed to address a class of problems such as system identification, function approximation, nonlinear prediction, control, pattern recognition, clustering, and feature extraction. NNs may also be classified as either static or dynamic. Static networks are good function approximators, with the ability to build long-term memory into their synaptic weights during training. Dynamic networks, on the other hand, have a built-in mechanism for producing an output based on more than one time instant in the past, establishing what is commonly referred to as short-term memory.
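As a minimal illustration of how a dynamic network obtains its short-term memory, the sketch below builds input patterns that span more than one past time instant; a static network, by contrast, would see only the current observation. The traffic-flow series and the number of lags are hypothetical.

```python
import numpy as np

def lagged_inputs(series, n_lags):
    """Build input vectors from more than one past time instant,
    giving a dynamic network its short-term memory. A static
    network would instead receive only series[t] at each step."""
    rows = [series[t - n_lags:t] for t in range(n_lags, len(series))]
    return np.array(rows)

# Hypothetical traffic-flow series; each row below feeds the network
# the current and previous observations as a single input pattern.
flow = np.array([120, 135, 150, 160, 155, 140])
print(lagged_inputs(flow, n_lags=3))
```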

Figure 2-11. Example of a neural network.

The development of NN models is typically carried out in two stages: training and testing. During the training stage, an NN learns from the patterns presented in an existing dataset; its performance is subsequently evaluated on a testing dataset composed of patterns the network has never been exposed to before. Because the learned knowledge is extracted from training datasets, NNs are considered model-free, data-driven systems. The learning phase usually applies an algorithm that adjusts the connection weights based on a given dataset of input–output pairs. Training patterns are presented to the network repeatedly until the error of the overall output is minimized. A single presentation of all patterns to the network is called an epoch, and each epoch adjusts the connection weights so that the network's performance improves.

The training stage of an NN is terminated when the error drops below a prespecified threshold or when the number of epochs exceeds a prespecified limit. Another way to control the training stage is to monitor the network's performance (errors) during training on a cross-validation (CV) dataset, usually smaller than the learning dataset. The role of CV is to test the network's generalization capability during the training process: if the network is overtrained, a sudden degradation of its performance on the CV data will trigger the training process to stop.
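The training procedure just described can be summarized in a short sketch. The loop below is a minimal illustration, not the book's implementation; the `fit_epoch` and `evaluate` methods stand for a hypothetical model interface, and the threshold, epoch-limit, and patience values are illustrative assumptions.

```python
import numpy as np

def train(model, X_train, y_train, X_cv, y_cv,
          error_threshold=1e-3, max_epochs=1000, patience=5):
    """Training loop as described above: stop when the training error
    drops below a threshold, when the epoch limit is reached, or when
    the cross-validation (CV) error degrades (overtraining)."""
    best_cv_error = np.inf
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        # One epoch: present all training patterns once, adjust the weights.
        train_error = model.fit_epoch(X_train, y_train)   # hypothetical interface
        if train_error < error_threshold:
            break                                         # error threshold reached
        cv_error = model.evaluate(X_cv, y_cv)             # monitor generalization on CV data
        if cv_error < best_cv_error:
            best_cv_error = cv_error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                                     # CV performance degrading: overtrained
    return model
```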