Approximate Value Iteration (AVI) is an method for solving a Markov Decision Problem by making successive calls to a supervised learning (SL) algorithm. Sequence of value representations Vn are processed iteratively by Vn+1 = A T Vn where T is the Bellman operator and A an approximation operator. Bounds on the error between the performance of the policies induced by the algorithm and the optimal policy are given as a function of weighted L_p-norms (p>=1) of the approximation errors. The results extend usual analysis in L_infinity-norm, and allow to relate the performance of AVI to the approximation power (usually expressed in L_p-norm, for p=1 or 2) of the SL algorithm. We illustrate the tightness of these bounds on an optimal replacement problem.
Content Area: 13. Markov Decision Processes & Uncertainty
Subjects: 12.1 Reinforcement Learning; 9.3 Mathematical Foundations
Submitted: May 5, 2005