Wrapper Induction: Efficiency and Expressiveness

Nick Kushmerick

Recently, many systems have been built that automatically interact with Internet information resources. However, these resources are usually formatted for use by people; e.g., the relevant content is embedded in HTML pages. Wrappers are often used to extract a resource's content, but hand-coding wrappers is tedious and error-prone. We advocate wrapper induction, a technique for automatically constructing wrappers. We have identified several wrapper classes that can be learned quickly (most sites require only a handful of examples, consuming a few CPU seconds of processing), yet are expressive enough to handle numerous Internet resources (70% of surveyed sites can be handled by our techniques).
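To make the idea of learning a wrapper from a handful of examples concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a very simple wrapper that is just a pair of delimiter strings, one immediately to the left and one immediately to the right of each value to extract. The function names `induce_delimiters` and `extract`, and the sample pages in the usage note, are hypothetical and chosen only for illustration.

```python
import os


def common_suffix(strings):
    """Longest string that every input ends with."""
    reversed_strings = [s[::-1] for s in strings]
    return os.path.commonprefix(reversed_strings)[::-1]


def induce_delimiters(examples):
    """Learn a (left, right) delimiter pair from labeled pages.

    `examples` is a list of (page_text, values) pairs, where `values`
    lists the strings to extract, in the order they appear on the page.
    This is an illustrative assumption about the training data format.
    """
    befores, afters = [], []
    for page, values in examples:
        pos = 0
        for value in values:
            start = page.index(value, pos)
            end = start + len(value)
            befores.append(page[pos:start])  # text leading up to the value
            afters.append(page[end:])        # text following the value
            pos = end
    left = common_suffix(befores)            # what always precedes a value
    right = os.path.commonprefix(afters)     # what always follows a value
    return left, right


def extract(page, left, right):
    """Apply a learned (left, right) wrapper to a new page."""
    results, pos = [], 0
    while True:
        start = page.find(left, pos)
        if start == -1:
            break
        start += len(left)
        end = page.find(right, start)
        if end == -1:
            break
        results.append(page[start:end])
        pos = end + len(right)
    return results
```

As a usage sketch, training on one hypothetical page such as `<html><b>Congo</b> <i>242</i><br><b>Spain</b> <i>34</i><br></html>` with the labels `["Congo", "Spain"]` yields a delimiter pair that can then be passed to `extract` on another page with the same layout. Real sites generally need more careful handling, for example when a learned delimiter is empty or also occurs inside a value.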
