Markus Junker and Andreas Abecker
Knowledge-based approaches to document categorization rely on elaborate and powerful pattern languages for writing classification rules by hand. Although such classification patterns have proven useful in many practical applications, algorithms that learn classifiers from examples mostly rely on much simpler representations of classification knowledge. In this paper, we describe a learning algorithm that employs a pattern language similar to the languages used for manual rule editing. We focus on learning three specific constructs of this pattern language: phrases, tolerance matches of words, and substring matches of words.
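To make the three constructs concrete, the following minimal sketch shows one plausible reading of each as a test on a tokenized document. The abstract does not define their exact semantics, so everything here is an assumption for illustration: a phrase is taken to be a sequence of consecutive words, a tolerance match is taken to be a word match within a small Levenshtein edit distance, and a substring match is taken to be a fragment occurring inside a word. The function names (`phrase_match`, `tolerance_match`, `substring_match`) and the `max_edits` parameter are hypothetical and are not the pattern language's actual syntax.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-wise dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def phrase_match(tokens, phrase):
    """Assumed semantics: the words of `phrase` occur consecutively in `tokens`."""
    words = tokenize(phrase)
    n = len(words)
    return n > 0 and any(
        tokens[i:i + n] == words for i in range(len(tokens) - n + 1)
    )


def tolerance_match(tokens, word, max_edits=1):
    """Assumed semantics: some token is within `max_edits` edits of `word`."""
    target = word.lower()
    return any(edit_distance(t, target) <= max_edits for t in tokens)


def substring_match(tokens, fragment):
    """Assumed semantics: `fragment` occurs inside at least one token."""
    needle = fragment.lower()
    return any(needle in t for t in tokens)


# Hypothetical example document containing a misspelling and a compound word.
doc = tokenize("The goverment raised interest rates after the stockmarket fell.")
phrase_hit = phrase_match(doc, "interest rates")
tolerance_hit = tolerance_match(doc, "government")
substring_hit = substring_match(doc, "market")
```

Under these assumptions, the phrase test respects word order, the tolerance test absorbs the misspelling "goverment", and the substring test finds "market" inside the compound "stockmarket". A learning algorithm over such a language would search for combinations of tests like these that separate the categories.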