Phoebus: A System for Extracting and Integrating Data from Unstructured and Ungrammatical Sources

Matthew Michelson, Craig A. Knoblock

With the proliferation of online classifieds and auctions comes a new need to meaningfully search and organize the items for sale. However, since the seller's item descriptions are not structured and do not conform to a standard set of values (think "Chevy" versus "Chevrolet"), searching and organizing this data is difficult. This paper describes a working demonstration of the Phoebus system which uses both record linkage and information extraction to parse out the meaningful attributes of an item description and assign them standard values. This allows the data to be sorted, searched and linked to other data sources where standard values for the attributes are required to link the sources together.

Subjects: 1.10 Information Retrieval

This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.