Shlomo Argamon and Jeff Dodick
We use textual features motivated by systemic functional linguistic theory for genre-based text categorization. We have developed feature sets representing different types of conjunctions and modal assessment, which together partially indicate how different genres structure texts and express attitudes towards the propositions they contain. Using such features lets us analyze large-scale rhetorical differences between genres by examining which features are most important for classification. The specific domain studied comprises scientific articles in a historical and an experimental science (paleontology and physical chemistry, respectively). Using our feature set, the SMO (Sequential Minimal Optimization) learning algorithm classified articles by field with over 83% accuracy, even though no field-specific terms were used as features. The most highly weighted features were consistent with hypothesized methodological differences between historical and experimental sciences, thus lending empirical support to the notion of multiple scientific methods.
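The general pipeline described above (map each article to frequencies of topic-independent function-word features, train a linear classifier, then read off the most heavily weighted features) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the word lists in `FEATURE_LEXICON` are tiny hypothetical stand-ins for the much richer SFL-derived conjunction and modality feature sets, a plain perceptron stands in for SMO (which trains a support vector machine), and the names `extract_features`, `train_perceptron`, `predict`, and `rank_features` are hypothetical. The toy sentences are invented and say nothing about the paper's results.

```python
import re
from collections import Counter

# Hypothetical, heavily abridged word lists for illustration only; the
# paper's actual feature sets are derived from systemic functional
# linguistics and are far larger and more finely subdivided.
FEATURE_LEXICON = {
    "conj_extension": ["and", "or", "but", "also"],
    "conj_enhancement": ["then", "because", "thus", "therefore"],
    "conj_elaboration": ["namely", "indeed", "specifically"],
    "modal_probability": ["may", "might", "could", "perhaps", "probably"],
    "modal_obligation": ["must", "should", "need"],
}


def extract_features(text):
    """Map a document to the relative frequency of each feature category.

    No content words are used, so the features are (by construction)
    independent of field-specific terminology.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty text
    return [sum(counts[w] for w in words) / total
            for words in FEATURE_LEXICON.values()]


def train_perceptron(docs, labels, epochs=20, lr=1.0):
    """Train a linear classifier on feature vectors with labels +1 / -1.

    A simple perceptron is used here as a stand-in for SMO/SVM training;
    both produce one weight per feature that can be inspected afterwards.
    """
    w = [0.0] * len(FEATURE_LEXICON)
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(docs, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def rank_features(w):
    """Sort feature categories by absolute weight, largest first."""
    return sorted(zip(FEATURE_LEXICON, w), key=lambda p: abs(p[1]), reverse=True)


if __name__ == "__main__":
    # Invented toy documents: +1 = "historical", -1 = "experimental".
    hist = extract_features("the fossil may be older and perhaps the layer could be reworked")
    expt = extract_features("the sample was heated then measured and therefore the rate must rise")
    weights, bias = train_perceptron([hist, expt], [1, -1])
    for name, weight in rank_features(weights):
        print(f"{name:20s} {weight:+.3f}")
```

The sign of each weight indicates which class a feature category pushes toward, and its magnitude is what "most highly weighted" refers to when interpreting the learned model.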