Claudine Medigue, T. Vermat, G. Bisson, A. Viari, and A. Danchin
Analysis of the huge volumes of data generated by large-scale sequencing projects requires new, sophisticated computer systems. Such systems should handle both the biological data and the results of analysing these data. They should also help the user choose the most appropriate method for a simple task and chain together the methods needed to solve a global analysis task. In this paper we present a prototype software system that provides an environment for the analysis of large-scale sequence data. As a first trial, this environment has been tested within the B. subtilis sequencing project. The system integrates descriptive knowledge of the entities involved (genes, regulatory signals, etc.) with methodological knowledge about an extensible set of analytical methods (i.e. how to solve a sequence analysis problem through task decomposition and method selection). The integrated system is implemented using a knowledge representation based on two existing object-oriented models, Shirka and SCARP. The prototype also provides a user interface for displaying the results generated by several methods and for interacting with the objects. We give an overview of the knowledge-based models used to build the system, describe how biological entities and sequence analysis tasks are represented, and illustrate the co-operation between user and system during the problem-solving process. Such a system constitutes a computer workbench for molecular biologists studying the genetic programs of living organisms.
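To make the idea of task decomposition and method selection concrete, the following is a minimal sketch in Python. It is not the Shirka or SCARP representation, whose syntax is not given here. Every name in it (`Method`, `Task`, `gc_strict`, `gc_tolerant`, `atg_scan`, the `analyse` task) is a hypothetical illustration. A task is solved either by delegating to its subtasks (decomposition) or by running the first method whose applicability condition holds for the input (selection).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Method:
    """Hypothetical analysis method with an applicability condition."""
    name: str
    applies: Callable[[str], bool]
    run: Callable[[str], object]


@dataclass
class Task:
    """Hypothetical task: solved by its subtasks or by one of its methods."""
    name: str
    methods: List[Method] = field(default_factory=list)
    subtasks: List["Task"] = field(default_factory=list)

    def solve(self, seq: str):
        # Task decomposition: a composite task delegates to its subtasks.
        if self.subtasks:
            return {t.name: t.solve(seq) for t in self.subtasks}
        # Method selection: use the first method applicable to this input.
        for m in self.methods:
            if m.applies(seq):
                return (m.name, m.run(seq))
        raise ValueError(f"no applicable method for task {self.name!r}")


def gc_strict(seq: str) -> float:
    # G+C fraction, assuming the sequence contains only A, C, G, T.
    return sum(seq.count(b) for b in "GC") / len(seq)


def gc_tolerant(seq: str) -> float:
    # G+C fraction over unambiguous bases only (ignores e.g. 'N').
    known = [b for b in seq if b in "ACGT"]
    return sum(b in "GC" for b in known) / len(known)


def atg_scan(seq: str) -> List[int]:
    # Positions of every ATG triplet, as a stand-in for signal detection.
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


composition = Task("composition", methods=[
    Method("gc_strict", lambda s: set(s) <= set("ACGT"), gc_strict),
    Method("gc_tolerant", lambda s: any(b in "ACGT" for b in s), gc_tolerant),
])
start_signals = Task("start_signals", methods=[
    Method("atg_scan", lambda s: len(s) >= 3, atg_scan),
])
analyse = Task("analyse", subtasks=[composition, start_signals])

result = analyse.solve("ATGGCCATGTAA")
```

Here `analyse` is decomposed into two subtasks, and `composition` picks `gc_strict` for a clean sequence but falls back to `gc_tolerant` when the input contains ambiguous bases. Keeping the applicability test next to each method is one simple way to let new methods be added without changing the task-solving logic.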