D. Magee, C. J. Needham, P. Santos, A. G. Cohn, and D. C. Hogg
A framework is presented for autonomous (human-like) learning of object, event and protocol models from audio-visual data, for use by an artificial "cognitive agent". The work is motivated by the aim of creating a synthetic agent that can observe a scene containing unknown objects and agents, operating under unknown spatio-temporal motion protocols, and learn models of these objects and protocols sufficient to act in accordance with the implicit protocols presented to it. The framework supports low-level (continuous) statistical learning methods for object learning, and higher-level (symbolic) learning of the event sequences that represent implicit temporal protocols (analogous to grammar learning). Symbolic learning is performed using the Inductive Logic Programming (ILP) system Progol, which generalises a symbolic data set produced by the lower-level (continuous) methods. The subsumption-based learning approach employed by the ILP system allows concepts such as equality, transitivity and symmetry, which are not easily generalised using standard statistical techniques, to be generalised, and allows relevant configural and temporal information to be selected automatically. The system is potentially applicable to a wide range of domains. It is demonstrated in several simple game-playing scenarios in which the agent first observes a human playing a game (including vocal and facial expression), and then attempts to play the game itself, based on the low-level (continuous) and high-level (symbolic) generalisations it has formulated.
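To illustrate why a subsumption-based symbolic learner can capture a relation such as equality, the following is a minimal sketch of Plotkin-style θ-subsumption in Python. It is not Progol: Progol's own search (mode-directed inverse entailment over a bottom clause) is not reproduced here. The predicate names (`left`, `right`, `shape`) and the card/shape constants are hypothetical stand-ins for a symbolic data set produced by lower-level vision processing. The point is that a single rule with a shared variable `S` expresses "both cards have the same shape", and it covers shapes never seen during learning, which a fixed-feature statistical classifier would not do without an explicit equality feature.

```python
from typing import Dict, List, Optional, Set, Tuple

Literal = Tuple[str, ...]  # (predicate, arg1, arg2, ...)
Subst = Dict[str, str]     # variable name -> constant


def is_var(term: str) -> bool:
    # Prolog convention: variables start with an uppercase letter or underscore.
    return term[:1].isupper() or term.startswith("_")


def match(lit: Literal, fact: Literal, theta: Subst) -> Optional[Subst]:
    """Extend theta so that lit under theta equals the ground fact, or return None."""
    if lit[0] != fact[0] or len(lit) != len(fact):
        return None
    new = dict(theta)
    for term, const in zip(lit[1:], fact[1:]):
        if is_var(term):
            # Bind an unbound variable; reject if it is already bound elsewhere.
            if new.setdefault(term, const) != const:
                return None
        elif term != const:
            return None
    return new


def theta_subsumes(general: List[Literal], specific: Set[Literal],
                   theta: Optional[Subst] = None) -> Optional[Subst]:
    """Return a substitution theta with general*theta a subset of specific, else None."""
    theta = {} if theta is None else theta
    if not general:
        return theta
    first, rest = general[0], general[1:]
    for fact in specific:
        extended = match(first, fact, theta)
        if extended is not None:
            # Backtrack over alternative matches if the remaining literals fail.
            result = theta_subsumes(rest, specific, extended)
            if result is not None:
                return result
    return None


# Hypothetical rule body: the shared variable S encodes shape equality.
same_rule = [("left", "A"), ("right", "B"), ("shape", "A", "S"), ("shape", "B", "S")]

seen = {("left", "c1"), ("right", "c2"), ("shape", "c1", "circle"), ("shape", "c2", "circle")}
differ = {("left", "c1"), ("right", "c2"), ("shape", "c1", "circle"), ("shape", "c2", "square")}
unseen = {("left", "c1"), ("right", "c2"), ("shape", "c1", "triangle"), ("shape", "c2", "triangle")}

print(theta_subsumes(same_rule, seen))    # -> {'A': 'c1', 'B': 'c2', 'S': 'circle'}
print(theta_subsumes(same_rule, differ))  # -> None
print(theta_subsumes(same_rule, unseen))  # -> {'A': 'c1', 'B': 'c2', 'S': 'triangle'}
```

Note that plain θ-subsumption allows two distinct variables to bind to the same constant; a real learner would typically need an explicit inequality literal when `A` and `B` must denote different objects.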