Hans Holland, Miroslav Kubat, Jan Zizka
Machine learning usually assumes that attribute values, as well as class labels, are either known precisely or not known at all. However, in our attempt to automate the evaluation of intrusion detection systems, we encountered ambiguous examples. For instance, an attribute's value in a given example may be known to be a or b, but definitely not c or d. Previous research usually either "disambiguated" such a value by giving preference to a or b, or simply replaced it with a "don't-know" symbol. Dissatisfied with both approaches, we decided to explore alternative ways of handling the situation. To keep the work focused, we limited ourselves to nearest-neighbor classifiers. The paper describes a few techniques and reports relevant experiments. We also discuss certain ambiguity-related issues that deserve closer attention.
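To make the setting concrete, here is a minimal sketch of a nearest-neighbor classifier that keeps ambiguous attribute values as sets of candidate values instead of disambiguating them or collapsing them to "don't-know". This is an illustrative assumption, not one of the paper's techniques. The per-attribute Jaccard distance, the helper names (`attribute_distance`, `distance`, `knn_classify`), and the toy labels are all hypothetical choices. For precisely known values, the distance reduces to the usual overlap metric (0 if equal, 1 otherwise). The sketch handles only ambiguous attributes, not ambiguous class labels.

```python
from collections import Counter
from typing import FrozenSet, Sequence, Tuple

# Hypothetical representation: each attribute holds the SET of values it may take.
# A precisely known value is a one-element set; "don't-know" would be the full domain.
Example = Tuple[FrozenSet[str], ...]


def attribute_distance(u: FrozenSet[str], v: FrozenSet[str]) -> float:
    """Assumed per-attribute distance: Jaccard distance 1 - |u & v| / |u | v|."""
    union = u | v
    if not union:
        return 0.0
    return 1.0 - len(u & v) / len(union)


def distance(x: Example, y: Example) -> float:
    """Sum the per-attribute distances over all attributes."""
    return sum(attribute_distance(a, b) for a, b in zip(x, y))


def knn_classify(train: Sequence[Tuple[Example, str]], query: Example, k: int = 1) -> str:
    """Return the majority label among the k training examples closest to the query."""
    neighbors = sorted(train, key=lambda pair: distance(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


f = frozenset
train = [
    ((f({"a"}), f({"x"})), "attack"),
    ((f({"c"}), f({"y"})), "normal"),
]
# The first attribute is known to be a or b, but definitely not c or d.
query = (f({"a", "b"}), f({"x"}))
print(knn_classify(train, query))  # attack
```

Here the query is at distance 0.5 from the first training example ({a, b} vs {a} contributes 0.5) and at distance 2 from the second. Other set-based distances, such as an optimistic one that is 0 whenever the sets intersect, would be equally plausible choices.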
Subjects: 12. Machine Learning and Discovery; 3.1 Case-Based Reasoning
Submitted: Jan 30, 2007