John Carlis, Elizabeth Shoop and Scott Krieger
We have been working on two KDD systems for scientific data. The first supports comparative genomics: its database contains more than 60,000 plant gene and protein sequences, plus results extracted from similarity searches against public sequence databases. The second supports a several-decades-long longitudinal field study of chimpanzee behavior. Both systems have components for storing raw data, for cleaning data before querying begins, and for displaying extracted data, and both use a relational DBMS. In this paper we report on a) the extensions we made to the DBMS to support our analysis of the data, and b) how we used those extensions, working with users, to develop a thought from an initial idea into a richer analysis. We have found that as a user’s initial thought develops, he or she makes finer distinctions and seeks to explain anomalies seen in coarse calculations. In writing the queries for those explorations, we found it valuable to move pieces of SQL commands into attribute values, and to run several smaller queries at once via a command relation. This blurs the distinction between command and data, and that blurring allowed us to formulate and carry out more sophisticated analyses than we could previously.
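The abstract does not show what a command relation looks like in practice. The following is a minimal sketch, assuming Python's standard `sqlite3` module rather than the authors' own DBMS extensions. SQL predicate fragments are stored as attribute values in a hypothetical `command` table, and a hypothetical driver, `run_command_relation`, splices each fragment into a base query and runs every resulting query in one pass. The table names, columns, and data are invented for illustration. Because the fragments are spliced into SQL text rather than bound as parameters, they must come from trusted analysts, not from untrusted input.

```python
import sqlite3


def run_command_relation(conn, command_table, base_query):
    """Hypothetical driver: each row of command_table holds a label and a
    SQL predicate fragment stored as an ordinary attribute value. Each
    fragment is spliced into base_query, and all the resulting queries run
    in one call. Fragments are trusted SQL, not user input."""
    results = {}
    rows = conn.execute(
        f"SELECT label, predicate FROM {command_table} ORDER BY label"
    ).fetchall()
    for label, predicate in rows:
        sql = base_query.format(predicate=predicate)
        results[label] = conn.execute(sql).fetchall()
    return results


def demo():
    # Invented observation data, loosely modeled on a behavior field study.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE observation (chimp TEXT, behavior TEXT, year INTEGER);
        INSERT INTO observation VALUES
            ('A', 'groom', 1975), ('B', 'play', 1978),
            ('A', 'groom', 1985), ('C', 'feed', 1990),
            ('B', 'groom', 1992);
        -- The command relation: each row is a query fragment stored as data.
        CREATE TABLE command (label TEXT PRIMARY KEY, predicate TEXT);
        INSERT INTO command VALUES
            ('early', 'year < 1980'),
            ('late', 'year >= 1980'),
            ('late_groom', 'behavior = ''groom'' AND year >= 1980');
    """)
    return run_command_relation(
        conn, "command", "SELECT COUNT(*) FROM observation WHERE {predicate}"
    )


if __name__ == "__main__":
    for label, rows in demo().items():
        print(label, rows[0][0])
```

Adding a finer distinction, such as the `late_groom` row, means inserting one row into the command relation rather than writing and running a new query by hand. This is the kind of command/data blurring the abstract describes, shown here under the stated assumptions.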