Finding Phrases Rather Than Discovering Collocations: Searching Corpora for Dictionary Phrases

Debra S. Baddorf and Martha W. Evens

This paper describes our attempts to find information about phrases and their syntactic variants for inclusion in a computer lexicon. We started with a list of thirty phrases from the British Collins English Dictionary. After modifying a few of the phrases to reflect American usage, we searched for occurrences in the Gutenberg corpus, the Wall Street Journal (1987, 1988, and 1989), and the Department of Energy technical abstracts on the ACL-DCI CD-ROM. Finding syntactic variants of the phrases forced us to allow variations in word order, tense, and number, and to be flexible when matching the smaller words of a phrase. We had to use flexible matching techniques to handle inserted or changed adjectives, adverbs, and prepositions.
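The kind of flexible matching described above can be illustrated with a small sketch. This is not the authors' method. It rests on several assumptions: a hypothetical stop list (`FUNCTION_WORDS`) stands in for "the smaller words" of a phrase, crude suffix stripping (`normalize`) stands in for real handling of tense and number, and a hypothetical `max_gap` parameter bounds how many inserted words (adjectives, adverbs, and so on) are tolerated between the content words, which may appear in any order.

```python
import re

# Hypothetical, deliberately small stop list standing in for the
# "smaller words" of a phrase; these are ignored when matching.
FUNCTION_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "for",
                  "with", "by", "up", "his", "her", "their", "its", "one's"}


def normalize(word):
    """Crude suffix stripping (an assumption, not real lemmatization)
    so that 'kicked', 'kicking', and 'kicks' all map to 'kick'."""
    w = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def find_phrase(phrase, text, max_gap=3):
    """Return the text spans in which every content word of `phrase`
    occurs, in any order, allowing up to `max_gap` extra tokens per gap."""
    needed = {normalize(w) for w in phrase.split()
              if w.lower() not in FUNCTION_WORDS}
    if not needed:
        return []
    tokens = re.findall(r"[A-Za-z']+", text)
    norm = [normalize(t) for t in tokens]
    # Widest window that still respects the per-gap insertion limit.
    window = len(needed) + max_gap * (len(needed) - 1)
    matches = []
    i = 0
    while i < len(norm):
        if norm[i] in needed:
            remaining = set(needed)
            end = None
            for j in range(i, min(i + window, len(norm))):
                remaining.discard(norm[j])
                if not remaining:
                    end = j
                    break
            if end is not None:
                matches.append(" ".join(tokens[i:end + 1]))
                i = end + 1  # skip past this match to avoid overlaps
                continue
        i += 1
    return matches
```

For example, `find_phrase("kick the bucket", "He finally kicked the old bucket last year.")` accepts both the change of tense and the inserted adjective, and the passive "the bucket was kicked" is also found because word order is not enforced. Irregular forms such as "took" for "take" would need a real morphological analyzer rather than suffix stripping.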

This page is copyrighted by AAAI. All rights reserved.