Data Segregation
Data segregation is the process of splitting the available data between model training and model evaluation. The data available for training is labeled, meaning the value of the predicted (target) variable is known for every record. This labeled data is split into a training set and a validation set. A separate set of new data, called the test set, is held back and used to assess the model once it is implemented. There are several ways to perform this segregation.
Given a chosen split proportion, such as 70-30, either sequential or random segregation can be applied.
Sequential segregation uses the first 70% of the records, in their existing order, as the training set and the remaining 30% as the validation set.
Random segregation uses a randomly selected 70% of the records as the training set and the remaining 30% (its complement) as the validation set.
Apart from these, users can also apply any custom strategy that can be justified for the problem at hand.
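The two standard strategies differ only in whether the records are shuffled before the cut. The following is a minimal sketch in plain Python, assuming the labeled data fits in memory as a list of records; the function names `sequential_split` and `random_split` and the `seed` parameter are hypothetical illustrations, not part of any particular library.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sequential_split(records: Sequence[T], train_frac: float = 0.7) -> Tuple[List[T], List[T]]:
    """Sequential segregation: first train_frac of the records (in order) for training, the rest for validation."""
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    # round() guards against float noise such as 0.07 * 100 == 7.000000000000001
    cut = round(len(records) * train_frac)
    return list(records[:cut]), list(records[cut:])


def random_split(records: Sequence[T], train_frac: float = 0.7, seed: int = 42) -> Tuple[List[T], List[T]]:
    """Random segregation: shuffle a copy of the records, then cut it sequentially.

    The validation set is the complement of the training set. The seed
    (a hypothetical parameter) only makes the shuffle reproducible.
    """
    shuffled = list(records)  # copy so the caller's data keeps its order
    random.Random(seed).shuffle(shuffled)
    return sequential_split(shuffled, train_frac)


if __name__ == "__main__":
    data = list(range(10))
    train, val = sequential_split(data)
    print(train, val)  # [0, 1, 2, 3, 4, 5, 6] [7, 8, 9]
    train, val = random_split(data)
    print(len(train), len(val))  # 7 3
```

Sequential segregation preserves the original order of the records, which matters when that order carries meaning (for example, records sorted by time). In practice, scikit-learn's `train_test_split` covers both cases: `shuffle=False` gives a sequential split and `shuffle=True` gives a random one.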