Tat-Seng Chua and Jimin Liu, National University of Singapore
Named entity (NE) extraction in Chinese is a very difficult task because of the flexibility of the language structure and the uncertainty in word segmentation. It is equivalent to relation and information extraction problems in English. This paper presents a hybrid rule induction approach to extracting NEs in Chinese. The method induces rules for names and their context, and generalizes these rules using linguistic lexical chaining. To handle the ambiguities and other contextual problems peculiar to Chinese, we supplement the basic method with other approaches such as the default-exception tree and the decision tree. We tested our method on the MET2 test set, where it outperformed all reported methods with an overall F1 measure of over 91%.
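For readers unfamiliar with the F1 measure quoted above, the following is a minimal sketch of how it is conventionally computed, as the harmonic mean of precision and recall over extracted entities. The function name `f1_score` and the counts in the usage line are hypothetical illustrations, not figures from this work, and the official MET2 scorer may differ in detail (for example, in how partially correct entities are credited).

```python
def f1_score(num_correct: int, num_predicted: int, num_gold: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    num_correct:   extracted entities that match the answer key
    num_predicted: all entities the system extracted
    num_gold:      all entities in the answer key
    """
    # With no correct matches (or empty inputs), F1 is defined here as 0.
    if num_correct == 0 or num_predicted == 0 or num_gold == 0:
        return 0.0
    precision = num_correct / num_predicted
    recall = num_correct / num_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only; not results from the paper.
example_f1 = f1_score(num_correct=90, num_predicted=100, num_gold=95)
print(f"F1 = {example_f1:.4f}")
```

Because F1 is a harmonic mean, a system cannot reach a high score by trading recall for precision or the reverse; both must be high.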