Ulf Johansson, Henrik Boström, Rikard König
The standard kNN algorithm suffers from two major drawbacks: sensitivity to the parameter value k, i.e., the number of neighbors, and the use of k as a global constant that is independent of the particular region in which the example to be classified falls. Methods using weighted voting schemes only partly alleviate these problems, since they still involve choosing a fixed k. In this paper, a novel instance-based learner is introduced that does not require k as a parameter. Instead, it employs a flexible strategy for determining the number of neighbors to consider for the specific example to be classified, thus using a local rather than a global k. A number of variants of the algorithm are evaluated on 18 datasets from the UCI repository. The novel algorithm in its basic form is shown to significantly outperform standard kNN with respect to accuracy, and an adapted version of the algorithm is shown to be clearly ahead with respect to the area under the ROC curve. Like standard kNN, the novel algorithm still allows for various extensions, such as weighted voting and axes scaling.
Subjects: 12. Machine Learning and Discovery
Submitted: Feb 25, 2008
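
The contrast between a global and a local k described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the number of neighbors is chosen, so the selection rule below is purely an assumption made for illustration and is not the authors' algorithm: for each query, it picks the k whose neighborhood has the highest Laplace-smoothed majority-class proportion. The function names `knn_fixed` and `knn_local`, the parameter `k_max`, and the toy data are likewise hypothetical.

```python
import math
from collections import Counter


def sorted_neighbors(train, query):
    """Return training labels ordered by Euclidean distance to the query."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    return [label for _, label in ranked]


def knn_fixed(train, query, k):
    """Standard kNN: majority vote among the k nearest neighbors (global k)."""
    labels = sorted_neighbors(train, query)[:k]
    return Counter(labels).most_common(1)[0][0]


def knn_local(train, query, k_max=None):
    """Illustrative local-k rule (an assumption, not the paper's method).

    For each candidate k, compute the Laplace-smoothed proportion of the
    majority class among the k nearest neighbors, and keep the k that
    maximizes it. Returns (predicted_label, chosen_k).
    """
    labels = sorted_neighbors(train, query)
    n_classes = len(set(labels))
    k_max = min(k_max or len(labels), len(labels))
    best_conf, best_label, best_k = -1.0, None, None
    counts = Counter()
    for k in range(1, k_max + 1):
        counts[labels[k - 1]] += 1  # add the k-th nearest neighbor
        label, count = counts.most_common(1)[0]
        confidence = (count + 1) / (k + n_classes)
        if confidence > best_conf:  # strict: ties keep the smaller k
            best_conf, best_label, best_k = confidence, label, k
    return best_label, best_k


# Toy data: a small cluster of class 'a' and a larger cluster of class 'b'.
train = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
    ((6.0, 6.0), "b"), ((5.5, 5.5), "b"),
]
query = (0.1, 0.3)  # lies inside the small 'a' cluster

print(knn_fixed(train, query, k=1))  # a
print(knn_fixed(train, query, k=5))  # b (the global k reaches into the other cluster)
print(knn_local(train, query))       # ('a', 2)
```

In this toy example, the fixed-k prediction flips from 'a' to 'b' as k grows past the size of the small cluster, which is the sensitivity to k that the abstract refers to. The per-query rule stops at the two neighbors that agree.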