Umarani Pappuswamy, Dumisizwe Bhembe, Pamela W. Jordan, and Kurt VanLehn, University of Pittsburgh
In this paper, we describe a multi-tier Natural Language (NL) clustering approach to classifying students’ essays for tutoring applications. The main task of the classifier is to map the statements in students’ essays to target concepts, namely physics principles and misconceptions. A simple ‘Bag-Of-Words (BOW)’ classifier using a naïve-Bayes algorithm was unsatisfactory for our purposes because it frequently misclassified statements, owing to the semantic relatedness of the NL descriptions of the target concepts. We describe how we used the NL descriptions to define clusters of concepts that reduce the dimensionality of the data when classifying students’ essays. The clustering generated multi-tier tagging schemata (cluster, sub-cluster and class), which led to better classification of the students’ essays.
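To make the tiered idea concrete, the following is a minimal sketch in Python, not the system described in this paper. It assumes a plain multinomial naïve-Bayes BOW classifier with add-one smoothing and shows only two tiers (cluster, then class); a sub-cluster tier would add one more level of the same kind. The class names `NaiveBayesBOW` and `TieredClassifier`, the cluster and class labels, and the training sentences are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesBOW:
    """Multinomial naive Bayes over bag-of-words counts (add-one smoothing)."""

    def __init__(self):
        self.label_counts = Counter()            # number of training texts per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        words = [w for w in text.lower().split() if w in self.vocab]
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class TieredClassifier:
    """Choose a cluster first, then a class among that cluster's members only."""

    def fit(self, texts, clusters, classes):
        self.top = NaiveBayesBOW().fit(texts, clusters)
        grouped = defaultdict(lambda: ([], []))
        for text, cluster, cls in zip(texts, clusters, classes):
            grouped[cluster][0].append(text)
            grouped[cluster][1].append(cls)
        # One small classifier per cluster, trained only on that cluster's texts.
        self.per_cluster = {
            c: NaiveBayesBOW().fit(t, y) for c, (t, y) in grouped.items()
        }
        return self

    def predict(self, text):
        cluster = self.top.predict(text)
        return cluster, self.per_cluster[cluster].predict(text)


# Hypothetical toy data: (statement, cluster, class).
DATA = [
    ("the force of gravity pulls the ball down", "gravity", "principle_gravity_force"),
    ("gravity is the only force on the ball", "gravity", "principle_gravity_force"),
    ("heavier objects fall faster", "gravity", "misconception_heavier_faster"),
    ("the heavy ball falls faster than the light ball", "gravity", "misconception_heavier_faster"),
    ("the velocity stays constant with no net force", "motion", "principle_constant_velocity"),
    ("zero net force means constant velocity", "motion", "principle_constant_velocity"),
]

texts, clusters, classes = zip(*DATA)
model = TieredClassifier().fit(texts, clusters, classes)
print(model.predict("heavier objects fall faster"))
```

In this sketch the second-tier classifier only has to separate the classes inside the chosen cluster, so each decision is made over fewer labels and a smaller vocabulary than a single flat classifier would face. This is one way to read the dimensionality reduction mentioned above, under the assumptions stated.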