Charles Fox, Neil Girdhar, Kevin Gurney
Reinforcement Learning (RL) is a heuristic method for learning locally optimal policies in Markov Decision Processes (MDPs). Its classical formulation (Sutton and Barto 1998) maintains point estimates of the expected values of states or state-action pairs. Bayesian RL (Dearden, Friedman, and Russell 1998) extends this to beliefs over values. However, the concept of values sits uneasily with the original notion of Bayesian Networks (BNs), which were defined (Pearl 1988) as having explicitly causal semantics. In this paper we show how Bayesian RL can be cast in an explicitly Bayesian Network formalism, making use of backwards-in-time causality. We show how the heuristic used by RL can be seen as an instance of a more general BN inference heuristic, which cuts causal links in the network and replaces them with non-causal approximate hashing links for speed. This view brings RL into line with standard Bayesian AI concepts, and suggests similar hashing heuristics for other general inference tasks.
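The contrast between point estimates and beliefs over values can be illustrated with a minimal sketch. It is not the formulation of the paper or of Dearden, Friedman, and Russell: the Gaussian belief, the fixed observation noise `noise_var`, the discount `GAMMA`, the learning rate `alpha`, and all function names are assumptions chosen for illustration only.

```python
from dataclasses import dataclass

GAMMA = 0.9  # discount factor (assumed for illustration)


def q_update(q, s, a, r, s_next, actions, alpha=0.1):
    """Classical update: q[(s, a)] holds a single point estimate."""
    # Bootstrapped target from the best point estimate in the next state.
    target = r + GAMMA * max(q.get((s_next, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (target - old)
    return q[(s, a)]


@dataclass
class GaussianBelief:
    """Belief over a value: a mean and a variance (assumed Gaussian)."""
    mean: float = 0.0
    var: float = 1.0


def belief_update(beliefs, s, a, r, s_next, actions, noise_var=1.0):
    """Treat the bootstrapped target as a noisy observation of the value
    and apply the conjugate Gaussian update with known noise variance."""
    target = r + GAMMA * max(
        beliefs.get((s_next, b), GaussianBelief()).mean for b in actions
    )
    prior = beliefs.get((s, a), GaussianBelief())
    gain = prior.var / (prior.var + noise_var)
    posterior = GaussianBelief(
        mean=prior.mean + gain * (target - prior.mean),
        var=(1.0 - gain) * prior.var,
    )
    beliefs[(s, a)] = posterior
    return posterior
```

In the point-estimate version the step size `alpha` is fixed by hand, whereas in the belief version the effective step size `gain` shrinks as the variance falls, so uncertainty about a value is tracked alongside the value itself.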
Subjects: 15.5 Decision Theory; 9.1 Causality
Submitted: Feb 24, 2008