Semi-Automatic Extension of Large-Scale Linguistic Knowledge Bases

Authors

Roberto Navigli

Università di Roma "La Sapienza"

Track:

All Papers

Abstract:

Linguistic resources are essential for the success of many AI tasks. Building a new lexical resource from scratch or combining heterogeneous resources is not only complex and time-consuming, but can also lead to knowledge inconsistency and redundancy. In this paper, we present a novel method for the large-scale semantic enrichment of a computational linguistic resource. To this end, with the aid of a controlled vocabulary, we identified a set of representative concepts, i.e., a restricted but meaningful set of WordNet concepts, each of which can replace any of its descendants in the taxonomic hierarchy without substantial loss of information in natural language sentences (e.g., restaurant#1 is a representative for bistro#1 and cybercafe#1). Then, we manually enriched these representative concepts with collocations extracted from a variety of linguistic resources. After this manual step, representative concepts are still related to words rather than to concepts (e.g., for taxi#1: fare, passenger, driver, etc.). The final step is to disambiguate these terms automatically, using a word sense disambiguation (WSD) algorithm named Structural Semantic Interconnections (SSI). SSI is a knowledge-based WSD algorithm that performs particularly well when the words in a context are strongly semantically associated. As a result, the precision of this automatic disambiguation step is very high, to the point that residual disambiguation errors could be tolerated. In any case, since SSI provides semantic patterns to justify its sense choices, manual correction by human annotators would be considerably facilitated, yielding a significant speed-up in semantic annotation. Furthermore, SSI helps maintain the consistency of the lexical knowledge base.
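The notion of a representative concept can be illustrated with a minimal sketch. It assumes a toy taxonomy in which every concept has a single hypernym (real WordNet allows multiple hypernyms) and a hand-picked set of representatives. The function name `representative_of`, the hypernym links, and the representative set are hypothetical and chosen only for illustration. The sketch does not reproduce the paper's selection procedure or the SSI algorithm; it only shows how a concept is mapped to the nearest representative at or above it.

```python
from typing import Dict, Optional, Set

# Hypothetical toy taxonomy: each concept maps to a single hypernym (parent).
# These links are illustrative only and are not copied from WordNet.
HYPERNYM: Dict[str, str] = {
    "bistro#1": "restaurant#1",
    "cybercafe#1": "cafe#1",
    "cafe#1": "restaurant#1",
    "restaurant#1": "building#1",
    "taxi#1": "car#1",
    "car#1": "motor_vehicle#1",
}

# Hypothetical set of representative concepts for this toy example.
REPRESENTATIVES: Set[str] = {"restaurant#1", "taxi#1", "car#1"}


def representative_of(concept: str) -> Optional[str]:
    """Return the closest representative at or above `concept`, or None."""
    current: Optional[str] = concept
    seen: Set[str] = set()
    # Walk up the hypernym chain until a representative is found.
    while current is not None and current not in seen:
        if current in REPRESENTATIVES:
            return current
        seen.add(current)  # guard against cycles in malformed data
        current = HYPERNYM.get(current)
    return None


if __name__ == "__main__":
    for c in ["bistro#1", "cybercafe#1", "taxi#1", "building#1"]:
        print(c, "->", representative_of(c))
    # bistro#1 -> restaurant#1
    # cybercafe#1 -> restaurant#1
    # taxi#1 -> taxi#1
    # building#1 -> None
```

Walking up from a concept to its nearest representative ancestor is what lets a single representative, such as restaurant#1, stand in for more specific concepts like bistro#1 or cybercafe#1; a concept with no representative above it (building#1 here) is simply left unmapped.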
