Imitation Learning from Observation
Humans and other animals have a natural ability to learn skills from observation, often simply from seeing the effects of these skills, without direct knowledge of the underlying actions being taken. For example, after observing an actor doing a jumping jack, a child can copy it despite not knowing anything about what is going on inside the actor's brain and nervous system. The main focus of this thesis is extending this ability to artificial autonomous agents, an endeavor recently referred to as "imitation learning from observation." Imitation learning from observation is especially relevant today because of the wide availability of online videos that can be used as demonstrations for robots. Meanwhile, advances in deep learning have enabled us to solve increasingly complex control tasks that map visual input to motor commands. This thesis contributes algorithms that learn control policies from state-only demonstration trajectories. Two types of algorithms are considered. The first type begins by recovering the missing action information from demonstrations and then applies existing imitation learning algorithms to the resulting full state-action trajectories. Our preliminary work has shown that learning an inverse dynamics model of the agent in a self-supervised fashion, and then using it to infer the actions performed by the demonstrator, enables sufficient action recovery for this purpose. The second type of algorithm uses model-free end-to-end learning. Our preliminary results indicate that iteratively optimizing a policy based on the closeness of the imitator's and expert's state transitions leads to a policy that closely mimics the demonstrator's trajectories.
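As a rough illustration of the first type of algorithm, the sketch below fits an inverse dynamics model on the agent's own transitions, uses it to label a state-only demonstration with inferred actions, and then applies plain behavioral cloning to the recovered state-action pairs. It is a minimal sketch under strong assumptions and not the implementation used in this thesis: the toy linear environment (`A`, `B`), the linear expert gain, the least-squares models, and all function names are hypothetical choices made so that the example stays short and self-contained.

```python
import numpy as np

# Hypothetical toy environment (an assumption for illustration):
# linear dynamics s' = A s + B a with 2-D states and 2-D actions.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.1, 0.0], [0.0, 0.1]])


def collect_self_supervised_data(n_steps, rng):
    """The imitator tries random actions and records (s, a, s') transitions."""
    states = rng.normal(size=(n_steps, 2))
    actions = rng.normal(size=(n_steps, 2))
    next_states = states @ A.T + actions @ B.T
    return states, actions, next_states


def fit_inverse_dynamics(states, actions, next_states):
    """Least-squares inverse dynamics model: a ~ [s, s'] @ weights."""
    features = np.hstack([states, next_states])
    weights, *_ = np.linalg.lstsq(features, actions, rcond=None)
    return weights


def infer_actions(weights, demo_states):
    """Label each consecutive pair of demonstration states with an action."""
    features = np.hstack([demo_states[:-1], demo_states[1:]])
    return features @ weights


def behavioral_cloning(demo_states, inferred_actions):
    """Fit a linear policy a ~ s @ policy on the recovered state-action pairs."""
    policy, *_ = np.linalg.lstsq(demo_states[:-1], inferred_actions, rcond=None)
    return policy


def demo(seed=0):
    """Run the pipeline end to end; return (action error, policy error)."""
    rng = np.random.default_rng(seed)

    # Hypothetical expert acting as a = -K s; only its states are kept.
    expert_gain = np.array([[1.0, 0.5], [0.0, 1.0]])
    demo_states = [np.array([1.0, -1.0])]
    true_actions = []
    for _ in range(20):
        s = demo_states[-1]
        a = -expert_gain @ s
        true_actions.append(a)
        demo_states.append(A @ s + B @ a)
    demo_states = np.array(demo_states)
    true_actions = np.array(true_actions)

    # Step 1: self-supervised inverse dynamics model from the agent's own data.
    weights = fit_inverse_dynamics(*collect_self_supervised_data(500, rng))
    # Step 2: recover the actions missing from the demonstration.
    inferred = infer_actions(weights, demo_states)
    # Step 3: ordinary imitation learning on the completed trajectories.
    policy = behavioral_cloning(demo_states, inferred)

    action_error = np.max(np.abs(inferred - true_actions))
    policy_error = np.max(np.abs(policy + expert_gain.T))
    return action_error, policy_error


if __name__ == "__main__":
    print(demo())
```

Because the toy dynamics are linear and noise-free, the inverse dynamics model can recover the demonstrator's actions almost exactly here; with real robots and visual observations, both the model and the policy would typically be learned function approximators, and the recovery would only be approximate.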