Cross Validation
Cross-validation is a method for hyperparameter tuning, often used when the dataset is small. Typical choices are 3-fold, 5-fold, or 10-fold cross-validation.
To perform k-fold cross-validation:
- Split the dataset into k equal folds
- Use one of the folds as the validation set and the remaining k-1 folds as the training set
- Repeat k times (each time using a different fold for validation) and average the performance
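The steps above can be sketched as follows; `train_and_score` is a hypothetical callback (not from the notes) that trains a model on the training folds and returns a validation score:

```python
import numpy as np

def k_fold_cv(X, y, k, train_and_score):
    """Average validation score over k folds.

    `train_and_score(X_tr, y_tr, X_val, y_val)` is an assumed callback
    that trains a model and returns its score on the validation fold.
    """
    n = len(X)
    idx = np.random.permutation(n)   # shuffle before splitting
    folds = np.array_split(idx, k)   # k roughly equal folds
    scores = []
    for i in range(k):
        val_idx = folds[i]
        # all remaining folds form the training set
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_score(X[train_idx], y[train_idx],
                                      X[val_idx], y[val_idx]))
    return float(np.mean(scores))
```

Each example appears in the validation set exactly once, so the averaged score uses the whole dataset for evaluation while never scoring a model on data it was trained on.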
In practice, people often avoid cross-validation in favor of a single validation split, since cross-validation can be computationally expensive.
The splits people tend to use are between 50% and 90% of the data for training, with the rest held out for validation.
- If the number of hyperparameters is large, use bigger validation splits
- If the number of examples in the validation set is small (~few hundred or so), it is safer to use cross-validation.
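A single validation split, as described above, can be sketched like this; the 80% training fraction is an assumed default within the 50%-90% range mentioned:

```python
import numpy as np

def train_val_split(X, y, train_frac=0.8):
    """Hold out a single validation set.

    `train_frac` is an assumed default; the notes suggest values
    between 0.5 and 0.9 depending on the situation.
    """
    n = len(X)
    idx = np.random.permutation(n)   # shuffle so the split is random
    cut = int(n * train_frac)
    return (X[idx[:cut]], y[idx[:cut]],   # training split
            X[idx[cut:]], y[idx[cut:]])   # validation split
```

With many hyperparameters to compare, a larger validation set gives more reliable rankings; when only a few hundred validation examples remain, the averaging of k-fold cross-validation is the safer choice.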