Un Yong Nahm and Raymond J. Mooney
Text mining concerns looking for patterns in unstructured text. The related task of Information Extraction (IE) is about locating specific items in natural-language documents. This paper presents a framework for text mining, called DiscoTEX (Discovery from Text EXtraction), using a learned information extraction system to transform text into more structured data which is then mined for interesting relationships. The initial version of DiscoTEX integrates an IE module acquired by an IE learning system, and a standard rule induction module. However, this approach has problems when the same extracted entity or feature is represented by similar but not identical strings in different documents. Consequently, we also develop an alternate rule induction system called TextRISE, that allows for partial matching of textual items. Encouraging preliminary results are presented on applying these techniques to a corpus of Internet documents.