Cross Validation

Cross-validation is a method for hyperparameter tuning, often used when the dataset is small. Typical choices are 3-fold, 5-fold, or 10-fold cross-validation.

To complete a k-fold cross-validation:

  • Split the dataset into k equal folds
  • Use one of the folds as the validation set and the remaining k-1 folds as the training set
  • Repeat k times (each time changing the validation fold) and average the performance across folds
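The steps above can be sketched as follows. This is a minimal illustration, not a library implementation; the `train_and_score` callback is a hypothetical function assumed to train a model on the training split and return a validation score.

```python
import random

def k_fold_cv(data, labels, k, train_and_score):
    """Minimal k-fold cross-validation sketch.

    `train_and_score(train_x, train_y, val_x, val_y)` is a hypothetical
    callback: it trains a model and returns a validation score.
    """
    indices = list(range(len(data)))
    random.shuffle(indices)
    # Partition the shuffled indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]

    scores = []
    for i in range(k):
        # Fold i is the validation set; all other folds form the training set.
        val_idx = folds[i]
        val_set = set(val_idx)
        train_idx = [j for j in indices if j not in val_set]
        score = train_and_score(
            [data[j] for j in train_idx], [labels[j] for j in train_idx],
            [data[j] for j in val_idx], [labels[j] for j in val_idx],
        )
        scores.append(score)
    # Average validation performance over the k folds.
    return sum(scores) / k
```

In a hyperparameter search, this loop would be run once per candidate setting, keeping the setting with the best averaged score.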

In practice, people tend to avoid cross-validation in favor of a single validation split, since cross-validation can be computationally expensive.

Dataset splits: people tend to use between 50% and 90% of the training data for training and the rest for validation.

  • If the number of hyperparameters is large, use a bigger validation split
  • If the number of examples in the validation set is small (~few hundred or so), it is safer to use cross-validation.
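A single held-out split along these lines can be sketched as below. The 80/20 default is an assumption within the 50%-90% range stated above, not a prescription from the source.

```python
import random

def train_val_split(data, labels, train_frac=0.8, seed=0):
    """Single held-out validation split.

    `train_frac` should fall in the 0.5-0.9 range suggested above;
    0.8 is an assumed default for illustration.
    """
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)  # shuffle before splitting
    n_train = int(train_frac * len(idx))
    train_idx, val_idx = idx[:n_train], idx[n_train:]
    return ([data[i] for i in train_idx], [labels[i] for i in train_idx],
            [data[i] for i in val_idx], [labels[i] for i in val_idx])
```

If the resulting validation set ends up with only a few hundred examples, falling back to k-fold cross-validation (as described earlier) gives a less noisy estimate.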