AAAI Publications, Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence

An Inherently Explainable Model for Video Activity Interpretation
Sathyanarayanan N. Aakur, Fillipe DM de Souza, Sudeep Sarkar

Last modified: 2018-06-20


The ability of artificial intelligence systems to offer explanations for their decisions is central to building user confidence and structuring smart human-machine interactions. Understanding the rationale behind such a system’s output helps users take informed action based on a model’s prediction. In this paper, we introduce a novel framework integrating Grenander’s pattern theory structures to produce inherently explainable, symbolic representations for video activity interpretation. These representations provide semantically coherent, rich interpretations of video activity using connected structures of detected (grounded) concepts, such as objects and actions, that are bound by semantics through background concepts not directly observed, i.e., contextualization cues. We use contextualization cues to establish semantic relationships among entities directly hypothesized from the video signal, such as possible object and action labels, and to infer a deeper interpretation of events than what can be directly sensed. We demonstrate the viability of this idea on video data, primarily from the cooking domain, by introducing a dialog model that uses these interpretations as the source of knowledge to generate explanations grounded in both the video data and the semantic connections between concepts.


Explainable AI; Video Activity Interpretation; Pattern Theory
