Hagit Shatkay, Stephen Edwards, W. John Wilbur, and Mark Boguski, National Center for Biotechnology Information
The immense volume of data resulting from DNA microarray experiments, accompanied by an increase in the number of publications discussing gene-related discoveries, presents a major data analysis challenge. Current methods for genome-wide analysis of expression data typically rely on cluster analysis of gene expression patterns. Clustering indeed reveals potentially meaningful relationships among genes, but cannot explain the underlying biological mechanisms. In an attempt to address this problem, we have developed a new approach that uses the literature to establish functional relationships among genes on a genome-wide scale. Our method is based on revealing coherent themes within the literature, using a similarity-based search in document space. Content-based relationships among abstracts are then translated into functional connections among genes. We describe preliminary experiments applying our algorithm to a database of documents discussing yeast genes. A comparison of the produced results with well-established yeast gene functions demonstrates the effectiveness of our approach.
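The general idea of a similarity-based search in document space, followed by a translation of abstract-level relationships into gene-level ones, can be illustrated with a minimal sketch. This is not the algorithm evaluated in this work: it assumes a plain bag-of-words representation with cosine similarity as a stand-in similarity measure, one representative abstract per gene, and a simple overlap rule for linking genes. The gene names, document identifiers, abstract texts, and the parameters `k` and `min_shared` are all hypothetical toy values chosen for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one abstract (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def neighbours(doc_id, vectors, k):
    """IDs of the k abstracts most similar to doc_id (the abstract itself included)."""
    ranked = sorted(
        vectors, key=lambda d: cosine(vectors[doc_id], vectors[d]), reverse=True
    )
    return set(ranked[:k])


def related_genes(gene_to_doc, abstracts, k=2, min_shared=1):
    """Link two genes when their retrieved abstract sets share enough documents.

    The overlap rule is an assumption made for this sketch only.
    """
    vectors = {doc_id: vectorize(text) for doc_id, text in abstracts.items()}
    retrieved = {g: neighbours(d, vectors, k) for g, d in gene_to_doc.items()}
    genes = sorted(retrieved)
    return [
        (g1, g2)
        for i, g1 in enumerate(genes)
        for g2 in genes[i + 1:]
        if len(retrieved[g1] & retrieved[g2]) >= min_shared
    ]


# Hypothetical toy corpus: invented texts, not real abstracts.
abstracts = {
    "d1": "dna replication origin helicase",
    "d2": "helicase unwinds dna at replication origin",
    "d3": "glucose transport membrane sugar",
    "d4": "sugar transport across membrane glucose uptake",
}
# Hypothetical mapping from each gene to one representative abstract.
gene_to_doc = {"geneA": "d1", "geneB": "d2", "geneC": "d3"}

pairs = related_genes(gene_to_doc, abstracts)
print(pairs)
```

In this toy setting, genes whose representative abstracts retrieve overlapping sets of similar documents are reported as related, while a gene whose retrieved set is disjoint from the others is left unlinked. A realistic system would need proper tokenization, term weighting, and a principled criterion for deciding when two document sets indicate a shared theme.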