AIPrimer.AI
  • 🚦AI Primer In Transportation
  • CHAPTER 1 - INTRODUCTION TO MACHINE LEARNING
    • Machine Learning in Transportation
    • What is Machine Learning?
    • Types of Machine Learning
      • Supervised Learning
      • Unsupervised Learning
      • Semi-supervised Learning
      • Reinforcement Learning
    • Fundamental concepts of machine learning
      • Model Training and Testing
      • Evaluating the Model’s Prediction Accuracy
      • The Underfitting and Overfitting Problems
      • Bias-Variance Tradeoff in Overfitting
      • Model Validation Techniques
      • Hyperparameter Tuning
      • Model Regularization
      • The Curse of Dimensionality
    • Machine Learning versus Statistics
  • CHAPTER 2 - SUPERVISED METHODS
    • Supervised Learning
    • K-Nearest Neighbor (KNN) Algorithm
    • Tree-Based Methods
    • Boosting
    • Support Vector Machines (SVMs)
  • CHAPTER 3 - UNSUPERVISED LEARNING
    • Principal Component Analysis
      • How Does It Work?
      • Interpretation of PCA result
      • Applications in Transportation
    • Clustering
      • K-Means
      • Spectral Clustering
      • Hierarchical Clustering
    • Reference
  • CHAPTER 4 - NEURAL NETWORK
    • The Basic Paradigm: Multilayer Perceptron
    • Regression and Classification Problems with Neural Networks
    • Advanced Topologies
      • Modular Network
      • Coactive Neuro–Fuzzy Inference System
      • Recurrent Neural Networks
      • Jordan-Elman Network
      • Time-Lagged Feed-Forward Network
      • Deep Neural Networks
  • CHAPTER 5 - DEEP LEARNING
    • Convolutional Neural Networks
      • Introduction
      • Convolution Operation
      • Typical Layer Structure
      • Parameters and Hyperparameters
      • Summary of Key Features
      • Training of CNN
      • Transfer Learning
    • Recurrent Neural Networks
      • Introduction
      • Long Short-Term Memory Neural Network
      • Applications in Transportation
    • Recent Development
      • AlexNet, ZFNet, VggNet, and GoogLeNet
      • ResNet
      • U-Net: Full Convolutional Network
      • R-CNN, Fast R-CNN, and Faster R-CNN
      • Mask R-CNN
      • SSD and YOLO
      • RetinaNet
      • MobileNets
      • Deformable Convolution Networks
      • CenterNet
      • Exemplar Applications in Transportation
    • Reference
  • CHAPTER 6 - REINFORCEMENT LEARNING
    • Introduction
    • Reinforcement Learning Algorithms
    • Model-free vs. Model-based Reinforcement Learning
    • Applications of Reinforcement Learning to Transportation and Traffic Engineering
    • Reference
  • CHAPTER 7 - IMPLEMENTING ML AND COMPUTATIONAL REQUIREMENTS
    • Data Pipeline for Machine Learning
      • Introduction
      • Problem Definition
      • Data Ingestion
      • Data Preparation
      • Data Segregation
      • Model Training
      • Model Deployment
      • Performance Monitoring
    • Implementation Tools: The Machine Learning Ecosystem
      • Machine Learning Framework
      • Data Ingestion tools
      • Databases
      • Programming Languages
      • Visualization Tools
    • Cloud Computing
      • Types and Services
    • High-Performance Computing
      • Deployment On-Premise vs. On-Cloud
      • Case Study: Data-driven approach for the implementation of Variable Speed Limit
      • Conclusion
  • CHAPTER 8 - RESOURCES
    • Mathematics and Statistics
    • Programming Languages and Software
    • Machine Learning Environments
    • Tools of the Trade
    • Online Learning Sites
    • Key Math Concepts
  • REFERENCES
  • IMPROVEMENT BACKLOG

CHAPTER 4 - NEURAL NETWORK



Neural networks (NNs), or connectionist systems, have experienced a resurgence of interest in recent years as a paradigm of computation and knowledge representation. After a first surge of attempts to simulate the functioning of the human brain with artificial neurons in the 1950s and 1960s, this AI subdiscipline received little attention until the 1990s. The resurgence has been driven mainly by the appearance of faster digital computers capable of simulating large networks and by the discovery of new NN architectures and more powerful learning mechanisms. The new architectures, for the most part, are not meant to duplicate the operation of the human brain, but rather draw inspiration from known facts about how the brain works.

NNs process information through a learning process, responding adaptively to inputs in accordance with a learning rule. These powerful models are composed of many simulated neurons, or simple computational units, connected in such a way that they are able to learn in a manner similar to how the human brain learns. This distributed architecture makes NNs particularly well suited to nonlinear problems and input–output mapping problems. The usual application of NNs is in the learning and generalization of knowledge and patterns; they are not suitable for expert reasoning, and they have poor explanation capabilities.

While there are several definitions of NNs, the following one emphasizes their key features: an NN is a distributed, adaptive, generally nonlinear learning machine built by interconnecting different processing elements (PEs) [15]. The functionality of an NN is based on the interconnectivity between the PEs. Each PE receives connections from other PEs and/or from itself. The connectivity defines the topology of the NN and plays a role at least as important as the PEs themselves in the NN's functionality. The signals transmitted via the connections are scaled by adjustable parameters called weights, $w_{ij}$.

A typical PE structure is depicted in Figure 2-10 as a nonlinear (static) function applied to the sum of all the PE's inputs. Because an NN's knowledge is stored in a distributed fashion across the connection weights between PEs, and because that knowledge is acquired through a learning process that modifies the connection strengths, NNs tend to resemble the human brain in functionality.
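To make the PE computation concrete, the following minimal sketch (in Python with NumPy) evaluates one PE as a nonlinear function applied to the weighted sum of its inputs. The sigmoid nonlinearity, the weight values, and the bias term are illustrative assumptions, not values specified in the text.

```python
import numpy as np

def pe_output(inputs, weights, bias=0.0):
    """Output of one processing element (PE): a nonlinear (static)
    function applied to the weighted sum of all the PE's inputs."""
    activation = np.dot(weights, inputs) + bias   # sum of w_ij * x_j, plus bias
    return 1.0 / (1.0 + np.exp(-activation))      # sigmoid nonlinearity (illustrative choice)

# Hypothetical example: a PE with three inputs
x = np.array([0.5, -1.0, 2.0])    # input signals from other PEs
w = np.array([0.8, 0.2, -0.5])    # adjustable connection weights w_ij
print(pe_output(x, w, bias=0.1))
```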

There are many types of NN architectures, each designed to address a class of problems such as system identification, function approximation, nonlinear prediction, control, pattern recognition, clustering, and feature extraction. NNs may also be classified as either static or dynamic. Static networks are good function approximators, with the ability to build long-term memory into their synaptic weights during training. Dynamic networks, on the other hand, have a built-in mechanism for producing an output based on more than one time instant in the past, establishing what is commonly referred to as short-term memory.
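As a minimal illustration of how a dynamic network obtains its short-term memory, the sketch below builds input patterns that span more than one past time instant; a static network, by contrast, would see only the current observation. The traffic-flow series and the number of lags are hypothetical.

```python
import numpy as np

def lagged_inputs(series, n_lags):
    """Build input vectors from more than one past time instant,
    giving a dynamic network its short-term memory. A static
    network would instead receive only series[t] at each step."""
    rows = [series[t - n_lags:t] for t in range(n_lags, len(series))]
    return np.array(rows)

# Hypothetical traffic-flow series; each row below feeds the network
# the current and previous observations as a single input pattern.
flow = np.array([120, 135, 150, 160, 155, 140])
print(lagged_inputs(flow, n_lags=3))
```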

Figure 2-11. Example of a neural network.

The development of NN models is typically carried out in two stages: training and testing. During the training stage, an NN learns from the patterns presented in an existing dataset; its performance is subsequently evaluated on a testing dataset composed of patterns the network has never been exposed to before. Because the learned knowledge is extracted from training datasets, NNs are considered model-free, data-driven systems. The learning phase usually applies an algorithm that adjusts the connection weights based on a given dataset of input–output pairs. Training patterns are presented to the network repeatedly until the error of the overall output is minimized. A single presentation of all patterns to the network is called an epoch, and each epoch adjusts the connection weights so that the network's performance improves.

The training stage of an NN is terminated when the error drops below a prespecified threshold or when the number of epochs exceeds a prespecified limit. Another way to control the training stage is to monitor the network's performance (errors) during training on a cross-validation (CV) dataset, usually smaller than the learning dataset. The role of CV is to test the network's generalization capability during the training process: if the network is overtrained, a sudden degradation of its performance on the CV data will trigger the training process to stop.
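The training procedure just described can be summarized in a short sketch. The loop below is a minimal illustration, not the book's implementation; the `fit_epoch` and `evaluate` methods stand for a hypothetical model interface, and the threshold, epoch-limit, and patience values are illustrative assumptions.

```python
import numpy as np

def train(model, X_train, y_train, X_cv, y_cv,
          error_threshold=1e-3, max_epochs=1000, patience=5):
    """Training loop as described above: stop when the training error
    drops below a threshold, when the epoch limit is reached, or when
    the cross-validation (CV) error degrades (overtraining)."""
    best_cv_error = np.inf
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        # One epoch: present all training patterns once, adjust the weights.
        train_error = model.fit_epoch(X_train, y_train)   # hypothetical interface
        if train_error < error_threshold:
            break                                         # error threshold reached
        cv_error = model.evaluate(X_cv, y_cv)             # monitor generalization on CV data
        if cv_error < best_cv_error:
            best_cv_error = cv_error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                                     # CV performance degrading: overtrained
    return model
```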