Akinori Fujino, Naonori Ueda, Kazumi Saito
Semi-supervised classifier design that simultaneously utilizes both labeled and unlabeled samples is a major research issue in machine learning. Existing semi-supervised learning methods belong to either generative or discriminative approaches. This paper focuses on probabilistic semi-supervised classifier design and presents a hybrid approach to take advantage of the generative and discriminative approaches. Our formulation considers a generative model trained on labeled samples and a newly introduced bias correction model. Both models belong to the same model family. The proposed hybrid model is constructed by combining both generative and bias correction models based on the maximum entropy principle. The parameters of the bias correction model are estimated by using training data, and combination weights are estimated so that labeled samples are correctly classified. We use naive Bayes models as the generative models to apply the hybrid approach to text classification problems. In our experimental results on three text data sets, we confirmed that the proposed method significantly outperformed pure generative and discriminative methods when the classification performances of the both methods were comparable.
Content Area: 12. Machine Learning
Subjects: 12. Machine Learning and Discovery
Submitted: May 8, 2005