Model Training
Based on the problem objective and the available data, a machine learning algorithm is chosen for implementation. A machine learning model is designed with all the necessary variables, weights, and parameters for the problem. To perform any kind of analysis, the model has to learn the patterns within the data. Hence, the model is trained on training datasets to learn the parameters needed to perform the analysis. These parameters are called 'model parameters'. Another set of parameters important to model training is the 'hyperparameters', which define the architecture of the model. (Figure 5-5)
Model parameters
Model parameters represent what the model learns from its data in order to predict accurately. They are estimated using an optimization technique that can be statistical or programmatic. For example, if the distribution of a variable is assumed to be Gaussian, the mean and standard deviation calculated from the data are used as the model parameters.
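As a minimal sketch of the Gaussian example above, the mean and standard deviation estimated from a sample of observations act as the model parameters; the data values here are purely illustrative:

```python
import numpy as np

# Illustrative data: observed values of a single variable
observations = np.array([4.2, 5.1, 4.8, 5.6, 4.9, 5.3])

# Assuming a Gaussian distribution, the model parameters are
# estimated directly from the data.
mu = observations.mean()          # estimated mean
sigma = observations.std(ddof=1)  # estimated standard deviation

print(f"Estimated parameters: mean={mu:.2f}, std={sigma:.2f}")
```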
Hyperparameters
Parameters that define the model architecture are called hyperparameters. The process of finding the optimal hyperparameters for the model is called hyperparameter tuning. Unlike model parameters, hyperparameters cannot be learned from the data; instead, a range of values is tested for each hyperparameter and the best-performing model is chosen. Some common methods of hyperparameter tuning are Grid Search, Random Search, and Bayesian Optimization.
Grid Search
This technique uses a grid of different hyperparameter values. Each combination of these values is used to train the model, which is then validated on the dataset. The results from the combination trials are compared, and the combination with the best results is chosen. Since this technique tests every combination in the grid, it can be exhaustive and inefficient.
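A minimal sketch of grid search, assuming scikit-learn's GridSearchCV as the implementation; the estimator (an SVM classifier), the dataset, and the parameter grid are illustrative choices, not prescribed by the text:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Illustrative grid: every combination of C and kernel is tried.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X, y)

print("Best combination:", grid.best_params_)
print("Best cross-validated score:", grid.best_score_)
```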
Random Search
In Random Search, random values of each hyperparameter are chosen from a range of values to test. Sometimes a statistical distribution is defined for each hyperparameter, from which the random values are generated. The model is trained and validated with these values, and the best-performing model is chosen.
Apart from reducing processing time, random search is the better-suited option when the hyperparameters of the model are not all of equal importance. In a grid search, the values of one hyperparameter are changed while the rest are kept constant; if that hyperparameter is not important to the model score, the effort of those trials is wasted. A random search, on the contrary, performs better because it has higher exploratory power. (Figure 5-6)
Figure 5-6 Grid search vs Random search in minimizing a function with one important and one unimportant parameter
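For comparison with the grid search sketch, a minimal random search example, again assuming scikit-learn (RandomizedSearchCV) with illustrative distributions defined for each hyperparameter:

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# A statistical distribution is defined for each hyperparameter;
# random values are drawn from it for each trial.
param_distributions = {
    "C": loguniform(1e-2, 1e2),
    "gamma": loguniform(1e-4, 1e0),
}

search = RandomizedSearchCV(SVC(), param_distributions, n_iter=20,
                            cv=5, random_state=0)
search.fit(X, y)

print("Best parameters found:", search.best_params_)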
During training, a machine learning model must be checked for bias and variance, which can lead to underfitting and overfitting of data respectively (Figure 5-7). When a model is trained to learn its parameters, the objective is to minimize the error, i.e., the difference between the expected and predicted outcomes. This error during training is called the training error; during validation, it is called the cross-validation error. An effective machine learning model must have both a low training error and a low cross-validation error.
Figure 5-7 Different scenarios of curve fitting for a Machine learning model
When there is high bias in the model, it has high training and cross-validation errors. The model predicts certain outcomes more often than others, and it is said to underfit the data. When the training error is low but the cross-validation error is high, the cause is high variance: the model has been trained to predict every outcome in the training set and therefore overfits the data. To control for bias and variance, a regularization parameter is usually introduced to prevent the model from overfitting. Plotting the training and cross-validation errors over the course of model training can also help identify bias or variance. (Figure 5-8)
Figure 5-8 Underfitting and Overfitting of data for machine learning models
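One way to compare training and cross-validation error, and to see the effect of the regularization parameter mentioned above, is sketched below assuming scikit-learn's validation_curve; the ridge regression model, the alpha range, and the dataset are illustrative choices:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import validation_curve

X, y = load_diabetes(return_X_y=True)

# Vary the regularization strength (alpha) and compare training error
# with cross-validation error to diagnose under- and overfitting.
alphas = np.logspace(-3, 3, 7)
train_scores, cv_scores = validation_curve(
    Ridge(), X, y, param_name="alpha", param_range=alphas,
    cv=5, scoring="neg_mean_squared_error")

for a, tr, cv in zip(alphas,
                     -train_scores.mean(axis=1),
                     -cv_scores.mean(axis=1)):
    print(f"alpha={a:>8.3f}  training MSE={tr:8.1f}  cross-validation MSE={cv:8.1f}")
```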
Model Evaluation
Evaluation of a machine learning model depicts its ability to effectively predict outcomes. It is important to choose the dataset for evaluation carefully in order to correctly assess the performance of the model. The following techniques are adopted to generate data for evaluation (Figure 5-9).
Holdout
In this approach, model performance is evaluated on a dataset different from the one used for training. The data is first segregated into training, validation, and test sets. The training dataset is used to train the model and calculate the model parameters. The validation dataset is used to assess the model structure, fine-tune it, and select the parameters that provide the best results. The unseen test dataset is then used with the model for analysis. This approach is simple, flexible, and efficient in terms of speed. However, it often shows high variability in results, because differences between the training and test datasets can lead to significant differences in the estimated accuracy.
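A minimal sketch of the holdout approach, assuming scikit-learn's train_test_split; the 60/20/20 proportions and the dataset are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First split off the unseen test set, then carve a validation set
# out of the remaining data (a 60/20/20 split in this example).
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 90, 30, 30 for the iris data
```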
Cross-Validation
In cross-validation, the original dataset is divided into a training set and a test set. The most common approach is k-fold cross-validation. Here, the original dataset is divided into k equal-sized subsets or samples, where k is a user-specified number typically ranging from 5 to 10. The process is repeated k times, and each time one of the k subsets is used as the validation or test set while the remaining (k-1) subsets are used to train the model. The error estimate is averaged over all k trials to estimate the overall effectiveness of the model. Another method, called leave-one-out cross-validation, sets the number of subsets k equal to the total number of data points. This approach is particularly useful with small training datasets.
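A minimal sketch of k-fold and leave-one-out cross-validation, assuming scikit-learn; the classifier and dataset are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, LeaveOneOut

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# k-fold: the score is averaged over all k trials (here k=5).
k_fold_scores = cross_val_score(model, X, y, cv=5)
print("5-fold mean accuracy:", k_fold_scores.mean())

# Leave-one-out: k equals the total number of data points.
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print("Leave-one-out mean accuracy:", loo_scores.mean())
```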
Bootstrap
This is another resampling technique similar to cross-validation. In this approach, multiple datasets are generated by sampling from the original dataset. Each newly sampled dataset yields an estimate of the quantity of interest, so multiple estimates are available. It differs from cross-validation in that the data is sampled with replacement: when a point is picked at random from the dataset, it is added to the bootstrapped set and then placed back in the dataset, and the process is repeated. When these datasets are used for evaluation, the unique instances in a bootstrap dataset are used to train the model and the remaining unseen data is used as test data.
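A minimal sketch of building one bootstrap sample, assuming scikit-learn's resample utility; the ten-point dataset is illustrative:

```python
import numpy as np
from sklearn.utils import resample

data = np.arange(10)  # illustrative dataset of 10 points

# Sample with replacement to build one bootstrap set; points never
# drawn (the "out-of-bag" points) serve as the unseen test data.
boot = resample(data, replace=True, n_samples=len(data), random_state=0)
out_of_bag = np.setdiff1d(data, boot)

print("Bootstrap sample:", boot)
print("Out-of-bag (test) points:", out_of_bag)
```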
Different evaluation metrics have been designed to evaluate the prediction accuracy of a model, which is an indicator of its performance. These metrics generally aim to quantify the error in the predictions compared to the expected outcomes. Visualizing the expected and predicted outcomes can also provide insight into the performance of the model.
Confusion Matrix
This metric is used for classification problems. It is a matrix visualization of expected outcomes against predicted outcomes. For example, consider a binary classification problem with outcomes 0 or 1, where 1 is the positive class and 0 is the negative class. The predictions are segregated into four groups: True Positives, False Positives, True Negatives, and False Negatives. (Figure 5-10)
The group conditions are as follows:
True Positives - Predicted and Expected outcomes are both 1.
True Negatives - Predicted and Expected outcomes are both 0.
False Positives - Predicted outcome is 1 but Expected outcome is 0.
False Negatives - Predicted outcome is 0 but Expected outcome is 1.
An ideal model has 0% false positives and false negatives and 100% true positives and true negatives. To approach this ideal, steps must be taken to minimize false positives or false negatives. From the confusion matrix, metrics such as Accuracy, Precision, Recall, and Specificity are derived for model evaluation.
Figure 5-10 Confusion Matrix
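A minimal sketch of computing the confusion matrix and the metrics derived from it, assuming scikit-learn; the expected and predicted labels are illustrative:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Illustrative expected and predicted outcomes for a binary problem.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")

print("Accuracy:   ", accuracy_score(y_true, y_pred))
print("Precision:  ", precision_score(y_true, y_pred))
print("Recall:     ", recall_score(y_true, y_pred))
print("Specificity:", tn / (tn + fp))
```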
Other popular evaluation metrics are listed in the table below (Table 5-1).
Table 5-1 Evaluation Metrics for Machine Learning Models
Classification Models: Accuracy, Precision, Recall, False Positive Rate, Logarithmic Loss, Confusion Matrix, F1 Score, Area under the curve (AUC)
Regression Models: Root Mean Square Error, Mean Absolute Percentage Error
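A minimal sketch of two of the regression metrics from Table 5-1, assuming scikit-learn; the expected and predicted values are illustrative:

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error

# Illustrative expected and predicted values for a regression model.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mape = mean_absolute_percentage_error(y_true, y_pred)

print(f"Root Mean Square Error: {rmse:.3f}")
print(f"Mean Absolute Percentage Error: {mape:.3%}")
```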