I'm using MATLAB's k-nearest-neighbors classifier (knnclassify) to train and test on data with binary attributes. The default value of k, if none is provided, is 1, and one can choose other values of k. I've done research online and on Stack Overflow, but nothing relevant came up to answer my question of which value of k would work best. Is there a built-in function that can tell me that for my particular data, or is it simply a matter of guessing and seeing what accuracy comes out?
Here is the link to MATLAB's knnclassify documentation: knnclassify
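What you have here is a typical model selection problem: you want to pick the k that gives the lowest overall error on your data, where the error is measured on data the classifier was not trained on. Larger values of k smooth the decision boundary and tend to generalize better, although too large a k underfits; smaller values (k = 1 in particular) tend to overfit.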
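To see why the error should be measured on data the classifier did not see during training, here is a minimal Python sketch (not MATLAB, and not knnclassify itself) of a majority-vote k-NN on binary vectors. It assumes Hamming distance, which for 0/1 vectors ranks neighbours the same way as the Euclidean distance knnclassify uses by default; the tie-breaking rule and the toy data are illustrative assumptions. With k = 1 and distinct training vectors, every point is its own nearest neighbour, so the error on the training set is always zero and says nothing about which k is best.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points nearest to `query`.

    Distance ties keep training-set order (sorted() is stable); vote ties go
    to the label encountered first, i.e. the nearer one (an assumed rule).
    """
    order = sorted(range(len(train_X)), key=lambda i: hamming(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def error_rate(train_X, train_y, test_X, test_y, k):
    """Fraction of test points whose predicted label is wrong."""
    wrong = sum(knn_predict(train_X, train_y, x, k) != y
                for x, y in zip(test_X, test_y))
    return wrong / len(test_y)


# Hypothetical toy data: distinct 4-bit vectors with arbitrary labels.
X = ["0001", "0011", "0111", "1111", "1000", "1100"]
y = [0, 0, 1, 1, 0, 1]

# Resubstitution error with k = 1: each point votes for its own label.
print(error_rate(X, y, X, y, 1))  # → 0.0
```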
Cross-validation is therefore a good way to choose this parameter, and I found this article, which describes what seems like a reasonable method.
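As a rough illustration of the idea, here is a minimal Python sketch of k-fold cross-validation for choosing k, again assuming a Hamming-distance majority vote rather than knnclassify itself. The helper names (`cv_error`, `choose_k`), the interleaved fold assignment `i % n_folds` and the candidate list of odd k values are hypothetical choices; on real data you would shuffle the rows before splitting. Odd values of k avoid tied votes in a two-class problem.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k nearest training points (Hamming distance)."""
    order = sorted(range(len(train_X)), key=lambda i: hamming(train_X[i], query))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def cv_error(X, y, k, n_folds=3):
    """Fraction of points misclassified when each fold is predicted from the rest."""
    wrong = 0
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        test = [i for i in range(len(X)) if i % n_folds == fold]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        wrong += sum(knn_predict(train_X, train_y, X[i], k) != y[i] for i in test)
    return wrong / len(X)


def choose_k(X, y, candidates=(1, 3, 5), n_folds=3):
    """Return the candidate k with the lowest cross-validated error."""
    errors = {k: cv_error(X, y, k, n_folds) for k in candidates}
    best = min(candidates, key=lambda k: errors[k])  # smallest k wins ties
    return best, errors


# Hypothetical toy data: two well-separated groups of 8-bit vectors.
X = ["00000000", "10000000", "01000000", "00100000", "00010000", "00001000",
     "11111111", "01111111", "10111111", "11011111", "11101111", "11110111"]
y = [0] * 6 + [1] * 6

best, errors = choose_k(X, y)
print(best, errors)  # → 1 {1: 0.0, 3: 0.0, 5: 0.0}
```

Because the toy groups are far apart, every candidate reaches zero error and the smallest k wins the tie; on real data the cross-validated errors will differ, and you would pick the k with the lowest one.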