Kristina Lerman, Anon Plangprasopchok, Craig A. Knoblock
Information integration systems combine data from multiple heterogeneous Web services to answer complex user queries, provided a user has semantically modeled the service first. To model a service, the user has to specify semantic types of the input and output data it uses and its functionality. As large number of new services come online, it is impractical to require the user to come up with a semantic model of the service or rely on the service providers to conform to a standard. Instead, we would like to automatically learn the semantic model of a new service. This paper addresses one part of the problem: namely, automatically recognizing semantic types of the data used by Web services. We describe a metadata-based classification method for recognizing input data types using only the terms extracted from a Web Service Definition file. We then verify the classifier's predictions by invoking the service with some sample data of that type. Once we discover correct classification, we invoke the service to produce output data samples. We then use content-based classifiers to recognize semantic types of the output data. We provide performance results of both classification methods and validate our approach on several live Web services.
Subjects: 12. Machine Learning and Discovery