Yves Kodratoff, Adrian Dimulescu and Ahmed Amrani
We are developing a tool that helps field experts design the sequence of steps needed to recognize the linguistic instances of a set of concepts in texts from their field. This involves several challenging steps, from cleaning to concept tagging. Our approach relies on two basic principles. The first is that only a field expert can develop tools able to solve these problems; it follows that the computer scientist should provide user-friendly tools that enable the expert to transfer this expertise to the programs. The second is that inductive tools must also be provided, since otherwise the workload is so high that nothing substantial can be achieved. The difficulty is that induction must take place from data that is both incomplete and very noisy, a well-known cause of failure in most existing inductive programs. In this paper, we describe how we make the expert and inductive techniques cooperate to solve three crucial steps of knowledge extraction: part-of-speech tagging, terminology, and coreference resolution.