Sarah Zelikovitz and Robert Hafner
We show that Web searches can often be used to generate background text for text classification, since the World Wide Web frequently contains many pages relevant to a given classification task. An automatically created secondary corpus of unlabeled but related documents can reduce error rates in text categorization. Furthermore, if the test corpus is known in advance, this related set of documents can be tailored to the particular categorization problem in a transductive approach. Our system uses WHIRL, a tool that combines database functionality with techniques from the information retrieval literature. The method is especially useful when training examples are scarce, or when obtaining them is expensive or difficult.
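The WHIRL system itself performs similarity joins inside a database and is not reproduced here. As a rough illustration of the underlying idea, the sketch below shows how unlabeled background documents can "bridge" a test document to sparse training data: the test document may share no words with any training example directly, yet both may resemble the same background page. All function names and the bridging score are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute sparse TF-IDF weight vectors for a list of token lists."""
    n = len(docs)
    df = Counter()                        # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def classify_with_background(test_doc, train, background):
    """Score each label by direct similarity to the test document plus
    a second-order 'bridge' through each unlabeled background page."""
    docs = [d for d, _ in train] + background + [test_doc]
    vecs = tfidf_vectors([d.split() for d in docs])
    train_v = vecs[:len(train)]
    bg_v = vecs[len(train):len(train) + len(background)]
    test_v = vecs[-1]
    scores = {}
    for (_, label), tv in zip(train, train_v):
        s = cosine(test_v, tv)
        s += sum(cosine(test_v, b) * cosine(b, tv) for b in bg_v)
        scores[label] = scores.get(label, 0.0) + s
    return max(scores, key=scores.get)

# Toy example: the test document shares no words with either training
# example, but reaches the "cs" example through the first background page.
train = [("machine learning course", "cs"),
         ("organic chemistry lab", "chem")]
background = ["neural networks and machine learning tutorial",
              "chemistry of organic compounds"]
print(classify_with_background("neural networks tutorial",
                               train, background))  # → cs
```

In practice the background corpus would be assembled automatically from Web search results rather than hand-picked, which is what makes the approach attractive when labeled examples are expensive to obtain.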