Classification

K-Nearest Neighbors

K-Nearest Neighbors idea: Use the k nearest neighbors of the input (from the training data) to decide the class of the input.

With kNN, there is no “training” stage. Instead, at test time, when we want to produce an output y for a new test input x, we find the k nearest neighbors to x in the training data, and the neighbors “vote” on the label y. Majority wins.
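A minimal NumPy sketch of this test-time procedure (the names X_train, y_train, x_test and the choice of L2 distance are assumptions for illustration):

```python
import numpy as np

def knn_predict(x_test, X_train, y_train, k=5):
    """Predict the label of one test point by majority vote of its k nearest training points."""
    # L2 distances from the test point to every training point
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote over the neighbors' labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```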

https://cs231n.github.io/classification/

If you wish to apply kNN in practice (hopefully not on images, or perhaps as only a baseline) proceed as follows:

  1. Preprocess your data: Normalize the features in your data to have zero mean and unit variance.
  2. If your data is very high-dimensional, consider using a dimensionality reduction technique such as PCA, NCA, or even Random Projections.
  3. Train and evaluate the kNN classifier on the validation data (over all folds, if using cross-validation) for many choices of k (e.g. the more values you try, the better) and across different distance types (L1 and L2 are good candidates); see the cross-validation sketch after this list.
  4. Final performance: Take note of the hyperparameters that gave the best results. Do not use the validation data when training the final classifier; consider it burned on estimating the hyperparameters. Evaluate the best model on the test set, report the test set accuracy, and declare the result to be the performance of the kNN classifier on your data.
  5. If kNN is too slow, use an Approximate Nearest Neighbor library (e.g. FLANN) to accelerate the retrieval, at the cost of some accuracy (see the ANN sketch below).
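A possible end-to-end sketch of steps 1, 3, and 4 using scikit-learn; the data, the hyperparameter grid, and the variable names are assumptions for the example (L1 corresponds to metric="manhattan", L2 to metric="euclidean"):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical data; in practice X, y come from your dataset
X, y = np.random.randn(1000, 50), np.random.randint(0, 3, size=1000)
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 1: zero mean / unit variance, then the kNN classifier
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Step 3: cross-validate over k and the distance type (L1 = manhattan, L2 = euclidean)
grid = GridSearchCV(
    pipe,
    param_grid={
        "knn__n_neighbors": [1, 3, 5, 7, 11, 15, 21],
        "knn__metric": ["manhattan", "euclidean"],
    },
    cv=5,
)
grid.fit(X_trainval, y_trainval)

# Step 4: the validation folds are now "burned"; report accuracy on the held-out test set
print("best hyperparameters:", grid.best_params_)
print("test accuracy:", grid.score(X_test, y_test))
```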
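The notes point to FLANN; as an illustration of the same approximate-retrieval idea with a different ANN library, here is a minimal sketch using Annoy (the data, tree count, and variable names below are made up for the example):

```python
import numpy as np
from annoy import AnnoyIndex

# Hypothetical data: X_train (n x d), y_train (n,) labels, x_query (d,) test point
n, d = 10000, 64
X_train = np.random.randn(n, d).astype("float32")
y_train = np.random.randint(0, 10, size=n)
x_query = np.random.randn(d).astype("float32")

# Build the approximate index (more trees -> better recall, slower build)
index = AnnoyIndex(d, "euclidean")
for i, v in enumerate(X_train):
    index.add_item(i, v.tolist())
index.build(10)  # 10 trees

# Retrieve the (approximate) k nearest neighbors and vote as before
k = 5
neighbor_ids = index.get_nns_by_vector(x_query.tolist(), k)
labels, counts = np.unique(y_train[neighbor_ids], return_counts=True)
print("predicted label:", labels[np.argmax(counts)])
```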