Maria Lapata, University of Edinburgh
This paper discusses the interpretation of compound nouns in domain-independent, wide-coverage text. We focus on the interpretation of nominalizations, i.e., compounds whose head noun is a nominalized verb and whose prenominal modifier is derived from either the underlying subject or the direct object of this verb (Levi 1978). Examples of nominalizations are given in (1)-(3).

(1) datum holder
(2) neighbour behaviour
(3) reader reception

Any attempt to interpret nominalizations automatically needs to take into account: (a) the selectional constraints imposed by the deverbal compound head, (b) the fact that these constraints can easily be overridden by contextual or pragmatic factors, and (c) the fact that the relation between the modifier and the head noun can be ambiguous out of context (see example (3)). The interpretation of nominalizations poses a challenge for probabilistic approaches, since the argument relations between a head and its modifier are not readily available in the corpus.

We present a probabilistic model which treats the interpretation task as a disambiguation problem. We show how the severe sparse data problems can be overcome by using partial parsing, smoothing techniques and domain-independent taxonomic information (e.g., WordNet). We report on the results of four experiments which achieve a combined precision of 80% over a baseline of 59% on the British National Corpus, a 100-million-word collection of samples of written and spoken language, drawn from a wide range of sources and designed to represent a wide cross-section of current British English.
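To make the disambiguation framing concrete, the following is a minimal sketch and not the model described in this paper. It assumes that verb-argument tuples have already been extracted from a corpus (for instance by a partial parser), that each nominalized head has been mapped to its underlying verb, and that sparse counts are handled by simple add-alpha smoothing, which stands in here for whatever smoothing the experiments actually use. The counts, the function names `relation_probability` and `interpret`, and the tie-breaking default are all hypothetical.

```python
from collections import Counter

# Toy counts of (verb, relation, noun) tuples, as a partial parser might
# extract them from a corpus. These numbers are invented for illustration.
counts = Counter({
    ("behave", "subj", "neighbour"): 3,
    ("hold", "obj", "datum"): 5,
    ("hold", "subj", "datum"): 1,
    ("receive", "subj", "reader"): 2,
    ("receive", "obj", "reader"): 2,
})

RELATIONS = ("subj", "obj")


def relation_probability(verb, noun, rel, alpha=1.0):
    """Add-alpha smoothed estimate of P(rel | verb, noun).

    Add-alpha smoothing is an assumption of this sketch; unseen
    (verb, noun) pairs receive a uniform distribution over relations.
    """
    total = sum(counts[(verb, r, noun)] for r in RELATIONS)
    return (counts[(verb, rel, noun)] + alpha) / (total + alpha * len(RELATIONS))


def interpret(verb, noun, default="obj"):
    """Choose the more probable relation; use an arbitrary default on ties."""
    p_subj = relation_probability(verb, noun, "subj")
    p_obj = relation_probability(verb, noun, "obj")
    if p_subj == p_obj:
        return default
    return "subj" if p_subj > p_obj else "obj"


if __name__ == "__main__":
    # Compounds (1)-(3), with heads mapped by hand to their underlying verbs.
    for verb, noun in [("hold", "datum"), ("behave", "neighbour"), ("receive", "reader")]:
        print(noun, verb, interpret(verb, noun))
```

In this toy setting the modifier in (3) is equally attested as subject and object, so the decision falls back to the default, which mirrors the out-of-context ambiguity noted for that example.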