Catherine Blake and Wanda Pratt
The scientific literature available to researchers continues to increase at an alarming rate. Despite the wealth of knowledge within the published literature, the quantity and unstructured nature of those texts make it difficult to use them for answering research questions. Thus, many potentially useful connections among the documents go unnoticed. To address this problem, we developed three approaches to detect such connections automatically. The simplest approach used only words in document titles. The other two approaches used knowledge from an existing terminology model; one used the knowledge base to transform the titles to known medical concepts, and the other applied additional semantic constraints to prune concepts. To determine effectiveness, we compared each approach on the task of identifying a set of now-known but previously implicit connections in the biomedical literature, which suggest magnesium would be effective in treating migraines. The concept representation improved precision from 8.3 to 9.8% and recall from 22.7 to 30.1% when compared with using word features. Applying additional semantic constraints improved precision (22.3%) with only a small degradation in recall (19.4%).
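As an illustration of the evaluation measures reported above, the following minimal sketch computes precision and recall for a set of candidate connections against a set of known connections. The function name `precision_recall` and the toy term sets are hypothetical and are not drawn from the study's data; the sketch shows only the standard definitions of the two measures, not the authors' implementation.

```python
def precision_recall(predicted, relevant):
    """Return (precision, recall) for predicted items against known relevant items."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    # Precision: fraction of predicted items that are actually relevant.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of relevant items that were predicted.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data (assumption): known connections and two candidate feature sets.
known_connections = {"serotonin", "epilepsy", "platelet aggregation", "vasospasm"}
word_features = {"serotonin", "patients", "study", "effect", "epilepsy", "treatment"}
pruned_concepts = {"serotonin", "epilepsy", "calcium"}

word_scores = precision_recall(word_features, known_connections)
pruned_scores = precision_recall(pruned_concepts, known_connections)
print("words:   precision=%.3f recall=%.3f" % word_scores)
print("pruned:  precision=%.3f recall=%.3f" % pruned_scores)
```

In this toy case, pruning uninformative features raises precision while recall is unchanged; in the reported results, the semantic constraints raised precision at some cost in recall.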