Cheng Niu, Wei Li, and Rohini K. Srihari
This paper presents a seed-driven, bootstrapping approach to domain porting that could be used to customize a generic information extraction (IE) capability for a specific domain. The approach taken is based on the existence of a robust, domain-independent IE engine that can continue to be enhanced, independent of any particular domain. This approach combines the strengths of parsing-based symbolic rule learning and the high-performance linear string-based Hidden Markov Model (HMM) to automatically derive a customized IE system with balanced precision and recall. The key idea is to apply precision-oriented symbolic rules learned in the first stage to a large corpus in order to construct an automatically tagged training corpus. This training corpus is then used to train an HMM to boost the recall. The experiments conducted in named entity (NE) tagging and relationship extraction show a performance close to the performance of supervised learning systems.
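The two-stage idea can be illustrated with a minimal sketch. It is not the system described in this paper: a single handwritten rule (a token following a title such as "Mr." is a person) stands in for the learned parsing-based symbolic rules, and a small count-based HMM with add-one smoothing stands in for the HMM tagger. The names TITLES, rule_tag, train_hmm and viterbi, the tag set, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

TITLES = {"Mr.", "Dr."}      # hypothetical cue words for the stand-in rule
STATES = ("O", "PER")        # hypothetical two-tag scheme

def rule_tag(tokens):
    """Stage 1: precision-oriented rule. Tag PER only right after a title."""
    return ["PER" if i > 0 and tokens[i - 1] in TITLES else "O"
            for i in range(len(tokens))]

def train_hmm(tagged_sentences):
    """Stage 2: collect transition and emission counts from auto-tagged text."""
    trans, emit, vocab = defaultdict(Counter), defaultdict(Counter), set()
    for tokens, tags in tagged_sentences:
        prev = "<s>"
        for tok, tag in zip(tokens, tags):
            trans[prev][tag] += 1
            emit[tag][tok] += 1
            vocab.add(tok)
            prev = tag
    return trans, emit, vocab

def viterbi(tokens, model):
    """Most likely tag sequence under the add-one smoothed HMM."""
    trans, emit, vocab = model
    v = len(vocab) + 1       # one extra slot for unseen words

    def lt(p, s):            # log transition probability
        return math.log((trans[p][s] + 1) / (sum(trans[p].values()) + len(STATES)))

    def le(s, w):            # log emission probability
        return math.log((emit[s][w] + 1) / (sum(emit[s].values()) + v))

    best = {s: (lt("<s>", s) + le(s, tokens[0]), [s]) for s in STATES}
    for w in tokens[1:]:
        step = {}
        for s in STATES:
            score, path = max(((best[p][0] + lt(p, s), best[p][1]) for p in STATES),
                              key=lambda x: x[0])
            step[s] = (score + le(s, w), path + [s])
        best = step
    return max(best.values(), key=lambda x: x[0])[1]

# Untagged toy corpus; the rule produces the training labels automatically.
corpus = [
    "Mr. Smith visited Boston".split(),
    "Dr. Jones met Mr. Smith".split(),
    "Mr. Smith thanked Dr. Jones".split(),
]
model = train_hmm([(sent, rule_tag(sent)) for sent in corpus])

test = "Smith met Jones".split()
print(rule_tag(test))        # ['O', 'O', 'O']: no title, so the rule misses both names
print(viterbi(test, model))  # ['PER', 'O', 'PER']: the HMM recovers them
```

In this toy setting the rule alone finds neither name in the test sentence because no title precedes them, while the HMM trained on the rule's own output tags both, which is the recall boost the second stage is meant to provide.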