Peter Gorniak and Bruce Blumberg
Many learning algorithms learn from large amounts of data without human interaction. Synthetic characters that interact with human beings present a wholly different problem: they must learn quickly from a few examples provided by a non-expert teacher. Training must be intuitive, provide feedback, and still allow nontrivial new behaviours to be taught. We present a learning mechanism that allows an autonomous synthetic character to learn sequences of actions from natural interaction with a human trainer. The character learns from only a handful of training examples in a complex, real-time environment. Building on an existing framework for training a virtual dog to perform single actions on command and to explore its action and state space, we give the dog the ability to notice consistent reward patterns that follow sequences of actions. Using an approximate online algorithm to check the Markov property for an action, the dog can discover action sequences that reliably predict rewards and turn these sequences into new actions, which can then be associated with speech commands. This framework leads to a natural and easy training procedure that is a version of backward chaining, a technique commonly used by animal trainers to teach sequences of actions.
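To make the idea of checking the Markov property for an action concrete, the following is a minimal sketch and not the algorithm used in this work, whose details the abstract does not give. It assumes a simple criterion: an action violates the Markov property if its reward rate after one particular preceding action is clearly higher than its reward rate after all other preceding actions. The class name `MarkovCheck`, the thresholds `min_trials` and `min_gap`, and the action names are all hypothetical.

```python
from collections import defaultdict


class MarkovCheck:
    """Online reward statistics for ONE action, split by the preceding action.

    Assumption (not from the paper): a predecessor is flagged when the
    reward rate after it exceeds the rate after all other predecessors
    by at least `min_gap`, with at least `min_trials` samples on each side.
    """

    def __init__(self, min_trials=3, min_gap=0.5):
        self.min_trials = min_trials
        self.min_gap = min_gap
        self.trials = defaultdict(int)   # predecessor -> times the action followed it
        self.rewards = defaultdict(int)  # predecessor -> times that was rewarded

    def update(self, prev_action, rewarded):
        """Record one execution of the action after `prev_action`."""
        self.trials[prev_action] += 1
        if rewarded:
            self.rewards[prev_action] += 1

    def predictive_predecessors(self):
        """Return predecessors after which reward is markedly more likely."""
        total_n = sum(self.trials.values())
        total_r = sum(self.rewards.values())
        found = []
        for prev, n in self.trials.items():
            rest_n = total_n - n
            if n < self.min_trials or rest_n < self.min_trials:
                continue  # too few examples to compare
            rate = self.rewards[prev] / n
            rest_rate = (total_r - self.rewards[prev]) / rest_n
            if rate - rest_rate >= self.min_gap:
                found.append(prev)
        return found


# Hypothetical training history for the action "down":
# (preceding action, whether the trainer rewarded "down").
history = [
    ("sit", True), ("sit", True), ("sit", True),
    ("stand", False), ("stand", False), ("stand", True),
    ("walk", False),
]
check = MarkovCheck()
for prev, rewarded in history:
    check.update(prev, rewarded)

print(check.predictive_predecessors())  # ['sit']
```

Under these assumptions, a flagged pair such as "sit" followed by "down" would be a candidate for promotion to a single new action that can then be tied to a speech command. The small sample thresholds reflect the requirement to learn from only a handful of examples.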