Yolanda Gil, Varun Ratnakar, Ewa Deelman, Gaurang Mehta, Jihie Kim
Scientific workflows are being developed for many domains as a useful paradigm to manage complex scientific computations. In our work, we face the challenge of efficiently generating and validating workflows that contain large numbers (hundreds to thousands) of individual computations to be executed over distributed environments. This paper describes a new approach to workflow creation that uses semantic representations to compactly describe complex scientific applications in a data-independent manner, then automatically generates workflows of computations for given data sets, and finally maps them to available computing resources. The semantic representations are also used to automatically generate descriptions for each of the thousands of new data products. We interleave the creation of the workflow with its execution, which allows intermediate data products to influence the generation of the subsequent portions of the workflow. We have implemented this approach in Wings, a workflow creation system that combines semantic representations with planning techniques. We have used Wings to create workflows of thousands of computations, which are submitted to the Pegasus mapping system for execution over distributed computing environments. We show results on an automatically created earthquake simulation workflow that comprised 24,135 jobs and executed for a total of 1.9 CPU years.
Subjects: 1.6 Engineering And Science; 11. Knowledge Representation
Submitted: Apr 3, 2007