Statistically Guided Word Sense Disambiguation

Authors

Elizabeth Liddy and Woojin Paik

Abstract:

Within the field of Natural Language Processing, lexical disambiguation remains one of the toughest hurdles to overcome in the development of fully operational systems. As part of a larger document detection system (DR-LINK), we have implemented a computational approximation of word sense disambiguation that combines information from a machine-readable dictionary, local context, and corpus statistics. We use the Subject-Field Codes (SFCs) extracted from a machine-readable dictionary to produce a preliminary, multi-tagged semantic coding of the words in a text. We then apply local heuristics that evaluate the SFCs of ambiguous words in order to choose among their multiple SFCs. Choices that cannot be made using local heuristics are resolved by statistical evidence, namely an SFC correlation matrix generated by processing a corpus of 977 Wall Street Journal (WSJ) articles containing 442,059 words. The implementation was tested on a sample of 1,638 words from the WSJ and selected the correct SFC 89% of the time. The resulting disambiguated SFC frequencies are summed and normalized to produce a weighted semantic vector representation of each text. These SFC vectors provide the basis on which the system automatically classifies texts as the first stage in DR-LINK.
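The pipeline described above (dictionary SFC candidates, a local heuristic, a correlation-matrix fallback, then a summed and normalized SFC vector) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the DR-LINK implementation. The specific local heuristic (prefer an ambiguous word's SFC that matches an unambiguous SFC elsewhere in the same sentence), the additive correlation score used as the statistical fallback, normalization by total count, and the names `correlation`, `disambiguate`, and `sfc_vector` are all hypothetical. The correlation matrix is taken as a precomputed dictionary; how it would be derived from the WSJ corpus is not shown.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation of the SFC correlation matrix:
# an unordered pair of SFC labels mapped to a correlation score.
CorrMatrix = Dict[Tuple[str, str], float]


def correlation(corr: CorrMatrix, a: str, b: str) -> float:
    """Symmetric lookup; pairs absent from the matrix score 0.0."""
    return corr.get((a, b), corr.get((b, a), 0.0))


def disambiguate(candidates: List[List[str]], corr: CorrMatrix) -> List[str]:
    """Choose one SFC per word of a sentence.

    candidates[i] lists the SFCs the dictionary assigns to word i.
    """
    # Words with exactly one SFC are unambiguous and act as context.
    unique = [c[0] for c in candidates if len(c) == 1]
    unique_counts = Counter(unique)
    chosen = []
    for sfcs in candidates:
        if len(sfcs) == 1:
            chosen.append(sfcs[0])
            continue
        # Assumed local heuristic: prefer a candidate that also occurs as an
        # unambiguous SFC in the sentence; the most frequent such SFC wins.
        matches = [s for s in sfcs if unique_counts[s] > 0]
        if matches:
            chosen.append(max(matches, key=lambda s: unique_counts[s]))
            continue
        # Assumed statistical fallback: pick the candidate whose summed
        # correlation with the unambiguous context SFCs is highest.
        chosen.append(
            max(sfcs, key=lambda s: sum(correlation(corr, s, u) for u in unique))
        )
    return chosen


def sfc_vector(chosen: List[str]) -> Dict[str, float]:
    """Sum SFC frequencies and normalize them so the weights total 1."""
    counts = Counter(chosen)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sfc: n / total for sfc, n in counts.items()}
```

For example, with the hypothetical labels `[["FINANCE"], ["FINANCE", "RIVER"], ["ECONOMICS"], ["SPORT", "LAW"]]`, the second word resolves to `FINANCE` through the local heuristic, while the fourth falls through to the correlation scores and resolves to whichever of `SPORT` or `LAW` correlates more strongly with `FINANCE` and `ECONOMICS`. Resolving unambiguous words first keeps the fallback anchored to context that needs no guessing; other normalizations (for instance, unit Euclidean length) would fit the abstract's description equally well.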