Peter Vanderheyden and Robin Cohen
The task of information extraction (IE) calls for only a limited understanding of text, bounded by the demands of the user and the domain of inquiry: IE returns specific instances of the concept or relation of interest to the user. Whereas IE systems have, to date, been oriented towards either system experts (e.g., computational linguists) or domain experts (e.g., professionals searching for information within their own field), the availability of large amounts of on-line textual information to the casual user strongly suggests that techniques oriented towards nonexperts are needed. We review current user-involvement techniques in IE and begin to investigate issues of knowledge representation and learning in the development of a mixed-initiative information extraction system. In particular, we discuss some advantages of dividing the knowledge used by the IE system into a query model, a domain model, and a corpus model, to assist casual users in interacting with the system. We also advocate flexibility in determining increments for learning, supporting negotiation between system and user.