AAAI Publications, The Twenty-Seventh International FLAIRS Conference

SMART Electronic Legal Discovery Via Topic Modeling
Clint Pazhayidam George, Sahil Puri, Daisy Zhe Wang, Joseph N. Wilson, William F. Hamilton

Last modified: 2014-05-03


Electronic discovery is an interesting subproblem of information retrieval in which one identifies documents that are potentially relevant to the issues and facts of a legal case from an electronically stored document collection (a corpus). In this paper, we consider representing documents in a topic space using well-known topic models such as latent Dirichlet allocation and latent semantic indexing, and solving the information retrieval problem by finding document similarities in the topic space rather than in the corpus vocabulary space. We also develop an iterative SMART ranking and categorization framework in which a human in the loop labels a set of seed (training) documents, which are then used to build a semi-supervised binary document classification model based on Support Vector Machines. To improve this model, we propose a method for choosing seed documents from the whole population via an active learning strategy. We report the results of our experiments on a real dataset in the electronic discovery domain.
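
As a rough illustration of the idea of ranking documents by similarity in a topic space, the following minimal Python sketch scores documents against a query using cosine similarity over topic mixtures. Everything in it is an assumption made for illustration: the three-topic vectors are hand-written rather than produced by latent Dirichlet allocation or latent semantic indexing, the function names are hypothetical, and the seed-selection step uses simple uncertainty sampling around a fixed threshold, which may differ from the active learning strategy used in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_by_topic_similarity(query_topics, doc_topics):
    """Return (doc_id, score) pairs sorted by descending similarity."""
    scores = [(doc_id, cosine(query_topics, vec))
              for doc_id, vec in doc_topics.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def most_uncertain(ranked, k, threshold=0.5):
    """Pick the k documents whose scores lie closest to the threshold.

    Assumption: a simple uncertainty-sampling heuristic for proposing
    seed documents to a human reviewer, not the paper's method.
    """
    by_margin = sorted(ranked, key=lambda pair: abs(pair[1] - threshold))
    return [doc_id for doc_id, _ in by_margin[:k]]


# Hypothetical topic mixtures (each row sums to 1), standing in for
# the output of a topic model fitted to the corpus.
DOC_TOPICS = {
    "doc_a": [0.80, 0.10, 0.10],
    "doc_b": [0.10, 0.80, 0.10],
    "doc_c": [0.40, 0.40, 0.20],
}
QUERY_TOPICS = [0.90, 0.05, 0.05]  # hypothetical query in the same space

ranked = rank_by_topic_similarity(QUERY_TOPICS, DOC_TOPICS)
seed_candidates = most_uncertain(ranked, k=1)
```

In a fuller pipeline, the labels a reviewer assigns to such seed candidates would serve as training data for the binary classifier, and the ranking and selection steps would be repeated on each iteration.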


EDiscovery, Electronic Discovery, Topic Modeling, Latent Dirichlet Allocation, Latent Semantic Indexing, Predictive Coding, Computer Assisted Review
