Viewing Vision-Language Integration as a Double-Grounding Case

Katerina Pastra

While vision-language integration is important for a wide range of Artificial Intelligence (AI) prototypes and applications, the notion of integration has not been established within a theoretical framework that would allow for more thorough research on the issue. In this paper, we explore the reasons that dictate this content integration by bringing together Searle’s theory of intentionality, the symbol grounding problem, and arguments regarding the nature of images and language developed within different AI subfields. From this synthesis the Double-Grounding theory emerges, which provides an explanatory theoretical definition of vision-language integration. By correlating the need for vision-language integration with inherent characteristics of the integrated media, and by associating this need with an agent’s intentionality and intelligence, the work presented in this paper aims to provide a theoretically established, and therefore solid, common ground for currently isolated and scattered multimedia integration research in AI subfields.

