*Charles Fox, Neil Girdhar, Kevin Gurney*

Reinforcement Learning (RL) is a heuristic method for learning locally optimal policies in Markov Decision Processes (MDPs). Its classical formulation (Sutton and Barto 1998) maintains point estimates of the expected values of states or state-action pairs. Bayesian RL (Dearden, Friedman, and Russell 1998) extends this to beliefs over values. However, the concept of values sits uneasily with the original notion of Bayesian Networks (BNs), which were defined (Pearl 1988) as having explicitly causal semantics. In this paper we show how Bayesian RL can be cast in an explicitly Bayesian Network formalism, making use of backwards-in-time causality. We show how the heuristic used by RL can be seen as an instance of a more general BN inference heuristic, which cuts causal links in the network and replaces them with non-causal approximate hashing links for speed. This view brings RL into line with standard Bayesian AI concepts, and suggests similar hashing heuristics for other general inference tasks.
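As a rough illustration of the difference between keeping a point estimate of a value and keeping a belief over it, the sketch below tracks both for a single state-action value. The function names, the learning rate `alpha`, and the known observation variance `obs_var` are hypothetical. The belief update is a simplified conjugate Gaussian update chosen for brevity. It is not the model used by Dearden, Friedman, and Russell (1998), nor the Bayesian Network construction proposed in this paper.

```python
# Classical RL style: a single point estimate of a state-action value,
# moved toward each sampled target by a temporal-difference step.
def td_point_update(q, target, alpha=0.1):
    return q + alpha * (target - q)


# Bayesian style (illustrative assumption): a Gaussian belief (mean, variance)
# over the same value. Each target is treated as a noisy observation with a
# known variance obs_var, giving a standard conjugate Gaussian update.
def gaussian_belief_update(mean, var, target, obs_var=1.0):
    precision = 1.0 / var + 1.0 / obs_var           # posterior precision
    new_var = 1.0 / precision                       # posterior variance
    new_mean = new_var * (mean / var + target / obs_var)
    return new_mean, new_var


# Usage: feed the same three targets to both representations.
q = 0.0                 # point estimate
mean, var = 0.0, 10.0   # broad prior belief over the value
for target in [1.0, 1.0, 1.0]:
    q = td_point_update(q, target)
    mean, var = gaussian_belief_update(mean, var, target)

print(q)          # a single number, no measure of uncertainty
print(mean, var)  # a mean plus a variance that shrinks with evidence
```

The point estimate carries no record of how much evidence supports it, whereas the shrinking variance does. Note that real RL targets are bootstrapped from other value estimates rather than being independent observations of a fixed value, so this treatment is a simplification.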

*Subjects:* 15.5 Decision Theory; 9.1 Causality

*Submitted:* Feb 24, 2008

This page is copyrighted by AAAI. All rights reserved.