Clustering Lexical Patterns Obtained from a Text Corpus

Howard W. Beck and Balaji Kumar

A system for lexical acquisition is presented in which word meanings are represented by clusters of phrase patterns obtained from analysis of a text corpus. A sample of cases, in the form of a concordance of phrases in which a particular word occurs in the text, is used for the basic analysis. Clustering techniques are used to group together cases having similar grammar and/or meaning. The view is that words obtain their meaning from the category describing this clustering of cases. This category is theory-based in that it contains a model to represent the word meaning at an abstract level, whereas the cases provide empirical evidence that confirms or disproves the model. A complex category evolves as more cases are encountered. Each new case either matches an existing category or dynamically alters existing categories as needed to account for it. An experimental system is presented which includes syntactic and semantic analysis of phrases obtained from text. It uses a hand-built lexicon and grammar to bootstrap a learning process. The ability to dynamically alter category structure through interpretation of new cases is shown as a way to build lexical structure semi-automatically.
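
The case-by-case process described above can be illustrated with a minimal sketch. It is not the authors' system: the function names (`concordance`, `jaccard`, `cluster`), the context window, and the similarity threshold are all hypothetical, and plain word overlap stands in for the syntactic and semantic analysis that the paper performs with its hand-built lexicon and grammar.

```python
def concordance(tokens, target, window=2):
    """Collect the cases: each window of tokens around an occurrence of `target`."""
    cases = []
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            cases.append(tuple(tokens[lo:i + window + 1]))
    return cases


def jaccard(a, b):
    """Word-overlap similarity; an assumed stand-in for grammar/meaning similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(cases, target, threshold=0.3):
    """Incrementally assign each case to the most similar category, or start a new one.

    A category here is just a list of cases; the abstract model the paper attaches
    to each category is not represented in this sketch.
    """
    categories = []
    for case in cases:
        ctx = set(case) - {target}
        best, best_sim = None, 0.0
        for cat in categories:
            # Similarity to a category = similarity to its closest member case.
            sim = max(jaccard(ctx, set(member) - {target}) for member in cat)
            if sim > best_sim:
                best, best_sim = cat, sim
        if best is not None and best_sim >= threshold:
            best.append(case)          # the new case extends an existing category
        else:
            categories.append([case])  # no adequate match: create a new category
    return categories


if __name__ == "__main__":
    text = ("farmers plant corn seeds early workers left the plant gate "
            "early farmers plant wheat seeds late")
    for category in cluster(concordance(text.split(), "plant"), "plant"):
        print(category)
```

In this toy input the two verb-like uses of "plant" share context words and fall into one category, while the noun-like use starts its own. A real system would compare parsed phrase structure rather than bags of words, and could also split or merge categories when a new case contradicts the current model.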


This page is copyrighted by AAAI. All rights reserved.