Janet Aisbett and Greg Gibbon, The University of Newcastle
Classification assigns an entity to a category on the basis of feature values encoded from a stimulus. Provided they are presented with sufficient training data, inductive classifier builders such as C4.5 are limited by encoding deficiencies and noise in the data, rather than by the method of deciding the category. However, such classification techniques do not perform well on the small, dirty and/or dynamic data sets which are all that are available in many decision-making domains. Moreover, their computational overhead may not be justified. This paper draws on conjectures about human categorization processes to design a frugal algorithm for use with such data. On presentation of an observation, case-specific rules are derived from a small subset of the stored examples, where the subset is selected on the basis of similarity to the encoded stimulus. Attention is focused on those features that appear to be most useful for distinguishing categories of observations similar to the current one. A measure of logical semantic information value is used to discriminate between the categories that remain plausible after this step. The new classifier is evaluated against neural net and decision tree classifiers on some standard UCI data sets and shown to perform well.
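The three stages outlined above (similarity-based retrieval, attention to discriminating features, and a final choice among the remaining categories) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `classify`, the parameters `k` and `n_features`, the match-count similarity, the purity score used to rank features, and the toy data are all assumptions made for illustration. In particular, the abstract does not define the logical semantic information measure, so the final step substitutes a simple match-weighted vote.

```python
from collections import Counter


def classify(stimulus, examples, k=4, n_features=2):
    """Classify a tuple of discrete feature values (hypothetical sketch).

    examples: list of (features, label) pairs held in memory.
    """
    # Stage 1: retrieve the k stored examples most similar to the stimulus.
    # Assumed similarity: number of feature values that match exactly.
    def similarity(features):
        return sum(a == b for a, b in zip(features, stimulus))

    retrieved = sorted(examples, key=lambda ex: similarity(ex[0]), reverse=True)[:k]

    # Stage 2: attend to the features that best separate categories
    # within the retrieved subset. Assumed score: for each value of the
    # feature, count the examples carrying the majority label, then sum.
    def purity(i):
        by_value = {}
        for features, label in retrieved:
            by_value.setdefault(features[i], Counter())[label] += 1
        return sum(max(counts.values()) for counts in by_value.values())

    attended = sorted(range(len(stimulus)), key=purity, reverse=True)[:n_features]

    # Stage 3 (stand-in for the paper's information measure): each retrieved
    # example votes for its label, weighted by its matches on attended features.
    votes = Counter()
    for features, label in retrieved:
        votes[label] += sum(features[i] == stimulus[i] for i in attended)

    if not votes or max(votes.values()) == 0:
        # No attended feature matches: fall back to the majority label.
        votes = Counter(label for _, label in retrieved)
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration only.
EXAMPLES = [
    (("red", "round", "small"), "apple"),
    (("red", "round", "large"), "apple"),
    (("green", "round", "small"), "apple"),
    (("yellow", "long", "small"), "banana"),
    (("yellow", "long", "large"), "banana"),
    (("green", "long", "large"), "banana"),
]

print(classify(("green", "long", "small"), EXAMPLES))
```

Because the rules are rebuilt from a handful of retrieved examples each time an observation arrives, nothing is trained in advance, and adding or removing stored examples takes effect immediately. This is the property that makes such an approach attractive for small or changing data sets.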