Learning to Query the Web

Authors

William W. Cohen

Yoram Singer

Abstract:

The World Wide Web (WWW) is filled with "resource directories", i.e., documents that collect links to all known documents on a specific topic. Keeping resource directories up-to-date is difficult because of the rapid growth in online documents. We propose using machine learning methods to address this problem. In particular, we propose to treat a resource directory as a list of positive examples of an unknown concept, and then use machine learning methods to construct from these examples a definition of the unknown concept. If the learned definition is in the appropriate form, it can be translated into a query, or a series of queries, for a WWW search engine. This query can be used at a later date to detect any new instances of the concept. We present experimental results with two implemented systems and two learning methods. One system is interactive and is implemented as an augmented WWW browser; the other is a batch system, which can collect and label documents without any human intervention. The learning methods are the RIPPER rule learning system and a rule-learning version of a new online weight allocation algorithm called the sleeping experts prediction algorithm. The experiments are performed on real data obtained from the WWW.
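
The abstract says that a learned definition "in the appropriate form" can be translated into a query or a series of queries. The following is a minimal sketch of that translation, not the authors' implementation. It assumes the learned concept is a disjunction of rules, each rule a conjunction of word-presence and word-absence tests (roughly the shape of rule sets a learner such as RIPPER can produce over text). It also assumes a generic AND/NOT query syntax rather than that of any particular search engine. The `Rule` class, the functions `rule_to_query`, `rules_to_queries`, and `matches`, and the example words are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Rule:
    """Hypothetical learned rule: a conjunction of word tests."""
    required: List[str]                                   # words that must appear
    forbidden: List[str] = field(default_factory=list)    # words that must not appear


def rule_to_query(rule: Rule) -> str:
    """Render one rule as a boolean query string (assumed generic syntax)."""
    terms = list(rule.required) + [f"NOT {w}" for w in rule.forbidden]
    return " AND ".join(terms)


def rules_to_queries(rules: List[Rule]) -> List[str]:
    """A disjunction of rules becomes a series of queries, one per rule;
    the union of their results covers the learned concept."""
    return [rule_to_query(r) for r in rules]


def matches(rules: List[Rule], document: str) -> bool:
    """Label a document locally: True if any rule's word tests all hold."""
    words: Set[str] = set(document.lower().split())
    return any(
        all(w in words for w in r.required)
        and not any(w in words for w in r.forbidden)
        for r in rules
    )


# Usage with hypothetical rules, as if learned from a resource directory.
rules = [
    Rule(required=["machine", "learning"], forbidden=["course"]),
    Rule(required=["inductive", "logic"]),
]
print(rules_to_queries(rules))
# ['machine AND learning AND NOT course', 'inductive AND logic']
print(matches(rules, "A survey of machine learning methods"))  # True
```

Issuing one query per rule keeps each query a simple conjunction, which many search engines handle well. The disjunction is then recovered by taking the union of the result sets. A function like `matches` could label pages returned by those queries without human intervention, in the spirit of the batch system described above.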
