Svetlana Kiritchenko and Stan Matwin, SITE, University of Ottawa
Classification learning algorithms in general, and text classification methods in particular, tend to focus on the features of individual training examples rather than on the relationships between examples. However, in many situations a set of items carries more information than the feature values of its individual items. We propose to recognize and put to use generalized features (or set features) that describe a training example but depend on the dataset as a whole, with the goal of achieving better classification accuracy. In particular, we work on integrating temporal relations into conventional word-based email classification.
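To make the idea of a set feature concrete, the following is a minimal sketch, not the authors' method. It assumes each email is a dictionary with hypothetical keys `sender`, `time`, and `words`, and it adds two illustrative temporal features (`recent_from_sender`, `first_from_sender`) to an ordinary bag-of-words representation. The function name, the feature names, and the 24-hour window are all assumptions chosen for illustration. The point is only that these values cannot be computed from a single message in isolation: they depend on the other messages in the dataset.

```python
from datetime import datetime, timedelta


def add_temporal_features(emails, window=timedelta(hours=24)):
    """Return one feature dict per email, in chronological order.

    Each dict holds word-presence features plus two hypothetical
    set features that depend on earlier messages in the dataset.
    """
    ordered = sorted(emails, key=lambda e: e["time"])
    history = {}  # sender -> timestamps of that sender's earlier messages
    result = []
    for email in ordered:
        earlier = history.setdefault(email["sender"], [])
        # Conventional word-based features: depend on this message only.
        features = {f"word={w}": 1 for w in email["words"]}
        # Set features: depend on the rest of the dataset.
        features["recent_from_sender"] = sum(
            1 for t in earlier if email["time"] - t <= window
        )
        features["first_from_sender"] = int(not earlier)
        result.append(features)
        earlier.append(email["time"])
    return result


# Made-up example messages, for illustration only.
emails = [
    {"sender": "a@example.com", "time": datetime(2004, 1, 1, 9, 0),
     "words": ["meeting"]},
    {"sender": "a@example.com", "time": datetime(2004, 1, 1, 15, 0),
     "words": ["meeting", "agenda"]},
    {"sender": "b@example.com", "time": datetime(2004, 1, 3, 9, 0),
     "words": ["invoice"]},
]
features = add_temporal_features(emails)
```

The resulting dictionaries could be passed to any classifier that accepts sparse feature maps; which temporal relations actually improve accuracy is an empirical question that this sketch does not address.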