Nathaniel Scott Winstead
Historically, the accepted approach to control problems in physically complicated domains has been through machine learning, due to the fact that knowledge engineering in these domains can be extremely complicated. When the already physically complicated domain is also continuous and dynamical (possibly with composite and/or sequential goals), the learning task becomes even more difficult due to ambiguities of reward assignment in these domains. However, these continuous, complicated, dynamical domains can effectively be modeled discretely as Markov Decision Processes, which would suggest using a Temporal Difference learning approach on the problem. This is the traditional method of approaching these problems. In Temporal Difference learning, the value of a discrete action is defined to be the difference in some value (usually an expected reward) between the current state and and its predecessor state. However, in the problem of playing pinball, the traditional Temporal Difference methods converge slowly and perform poorly. This leads to the addition of knowledge engineering elements to the traditional Temporal Difference methods, which was previously considered difficult to do. However, by making straightforward, simple changes to the basic Temporal Difference algorithm to incorporate knowledge engineering I was able to both speed convergence of the algorithm, and greatly improve performance. Composite and/or sequential tasks may similarly be modeled as Markov Decision Processes, which again suggests the use of Temporal Difference learning. However, applying Temporal Difference learning to composite and/or sequential continuous, dynamical tasks has been traditionally viewed as a burdensome task, involving complex changes to the basic learning architecture. Again, one goal was to keep the learning architecture simple, despite the complexity of the environment. Starting with the existing composite Temporal Difference methods, I have constructed a new composite technique which maintains the elegance and simplicity of the Temporal Difference architecture, while enabling for learning of composite tasks.