AAAI Publications, Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence

Font Size: 
Corrigibility
Nate Soares, Benja Fallenstein, Stuart Armstrong, Eliezer Yudkowsky

Last modified: 2015-04-01

Abstract


As artificially intelligent systems grow in intelligence and capability, some of their available options may allow them to resist intervention by their programmers. We call an AI system "corrigible" if it cooperates with what its creators regard as a corrective intervention, despite default incentives for rational agents to resist attempts to shut them down or modify their preferences. We introduce the notion of corrigibility and analyze utility functions that attempt to make an agent shut down safely if a shutdown button is pressed, while avoiding incentives to prevent the button from being pressed or cause the button to be pressed, and while ensuring propagation of the shutdown behavior as it creates new subsystems or self-modifies. While some proposals are interesting, none have yet been demonstrated to satisfy all of our intuitive desiderata, leaving this simple problem in corrigibility wide-open.

Full Text: PDF