Gunter Grieser, Technical University Darmstadt, Germany and Steffen Lange, German Research Center for Artificial Intelligence Ltd., Germany
The number, the size, and the dynamics of Internet information sources bear abundant evidence of the need for automation in information extraction (IE). This paper deals with the question of how such extraction mechanisms can be created automatically by invoking learning techniques. The underlying scenario of system-supported IE imposes certain constraints on the available training examples. Therefore, the traditional approaches to formal language learning do not capture the kind of problems to be solved when learning the corresponding extraction mechanisms. We illustrate the resulting differences by studying the problem of learning a particular type of extraction mechanism (so-called island wrappers). We show how to decompose this learning problem into different subproblems that can be handled independently and in parallel. Moreover, we relate the learning problems at hand to the problems that learning theory papers originally address and point out what they have in common and where they differ.