A Grammar Inference Algorithm for the World Wide Web

Authors

Terrance Goan, Nels Belson, and Oren Etzioni

Proceedings:

Machine Learning in Information Access

Issue:

Papers from the 1996 AAAI Spring Symposium

Track:

Contents

Abstract:

The World Wide Web is a treasure trove of information. The Web’s sheer scale makes automatic location and extraction of information appealing. However, much of the information lies buried in documents designed for human consumption, such as home pages or product catalogs. Before software agents can extract nuggets of information from Web documents, they have to be able to recognize that information despite the multitude of formats in which it may appear. In this paper, we take a machine learning approach to the problem. We explain why existing grammar inference techniques face difficulties in this domain, present a new technique, and demonstrate its success on examples drawn from the Web ranging from CMU Tech Report codes to bus schedules. Our algorithm is shown to learn target languages found on the Web from significantly fewer examples than previous methods.
