Dirichlet Multinomial Mixture with Variational Manifold Regularization: Topic Modeling over Short Texts

Ximing Li; Jiaojiao Zhang; Jihong Ouyang

doi:10.1609/aaai.v33i01.33017884

Authors

Ximing Li, Jilin University
Jiaojiao Zhang, Jilin University
Jihong Ouyang, Jilin University

Abstract

Conventional topic models suffer from a severe sparsity problem when applied to extremely short texts such as social media posts. Models in the Dirichlet multinomial mixture (DMM) family can handle the sparsity problem; however, they remain very sensitive to ordinary and noisy words, which results in inaccurate topic representations at the document level. In this paper, we alleviate this problem by preserving the local neighborhood structure of short texts, which allows topical signals to spread among neighboring documents and thereby corrects the inaccurate topic representations. This is achieved with variational manifold regularization, which constrains close short texts to have similar variational topic representations. Building on this idea, we propose a novel Laplacian DMM (LapDMM) topic model. During document graph construction, we further use the word mover’s distance with word embeddings to measure document similarities at the semantic level. To evaluate LapDMM, we compare it against state-of-the-art short text topic models on several traditional tasks. Experimental results demonstrate that LapDMM achieves significant performance gains over baseline models, e.g., scores about 0.2 higher on clustering and classification tasks in many cases.
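
The regularization idea in the abstract can be illustrated with a generic graph Laplacian penalty. The sketch below is not the LapDMM objective, which the abstract does not spell out. It assumes a precomputed pairwise distance matrix (hand-written toy values here, standing in for word mover's distances), builds a symmetric k-nearest-neighbor graph, and computes the standard smoothness penalty 0.5 * sum_ij W_ij * ||phi_i - phi_j||^2 over per-document topic vectors. The function names `knn_affinity` and `manifold_penalty` are hypothetical.

```python
import numpy as np

def knn_affinity(dist, k):
    """Symmetric 0/1 k-nearest-neighbor affinity from a pairwise distance matrix."""
    n = dist.shape[0]
    d = dist.astype(float).copy()
    np.fill_diagonal(d, np.inf)              # a document is not its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]   # indices of the k closest documents
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(w, w.T)                # keep an edge if either side chose it

def manifold_penalty(phi, w):
    """0.5 * sum_ij w_ij * ||phi_i - phi_j||^2, computed as trace(phi^T L phi)."""
    lap = np.diag(w.sum(axis=1)) - w         # unnormalized graph Laplacian L = D - W
    return float(np.trace(phi.T @ lap @ phi))

# Toy data (assumed): documents 0 and 1 are close, as are documents 2 and 3.
dist = np.array([[0, 1, 5, 6],
                 [1, 0, 5, 6],
                 [5, 5, 0, 1],
                 [6, 6, 1, 0]])
# Per-document topic proportions over 2 topics (assumed values).
phi = np.array([[0.9, 0.1],
                [0.8, 0.2],
                [0.1, 0.9],
                [0.2, 0.8]])

w = knn_affinity(dist, k=1)
penalty = manifold_penalty(phi, w)
```

A small penalty means neighboring documents already have similar topic vectors. Adding such a term to a training objective pushes the representations of nearby documents toward each other, which is the general mechanism the abstract describes as spreading topical signals among neighbors.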
