Schema Discovery for Semistructured Data

Ke Wang and Huiqing Liu

To formulate a meaningful query on semistructured data, such as data on the Web, we first need to discover something about how the information is represented in the source, so that the query matches some of the source’s structure. This task is referred to as schema discovery and was recently considered for a single object. For multiple objects, the task of schema discovery is to identify the typical structuring information of those objects as a whole. We motivate schema discovery in this general setting and propose a framework and an algorithm for it. We apply the framework to a real Web database, the Internet Movie Database, to discover the typical schema of the most-voted movies.
