Andrea L. Thomaz, Cynthia Breazeal
As robots become a mass consumer product, they will need to learn new skills by interacting with typical human users. Past approaches have adapted reinforcement learning (RL) to accept a human reward signal; however, we question the implicit assumption that people shall only want to give the learner feedback on its past actions. We present findings from a human user study showing that people use the reward signal not only to provide feedback about past actions, but also to provide future directed rewards to guide subsequent actions. Given this, we made specific modifications to the simulated RL robot to incorporate guidance. We then analyze and evaluate its learning performance in a second user study, and we report significant improvements on several measures. This work demonstrates the importance of understanding the human-teacher/robot-learner system as a whole in order to design algorithms that support how people want to teach while simultaneously improving the robot's learning performance.
Subjects: 12.1 Reinforcement Learning; 6. Computer-Human Interaction