Gregory D. Weber
Brickworld is a simulated environment developed as a testbed for learning and planning, in particular for learning and using knowledge of causal relations. The environment is both dynamic (there are other "agents" whose actions affect "the" agent’s performance) and stochastic (future states can be predicted only with uncertainty). The task, building and maintaining a wall, has been formulated as a reinforcement learning problem. The ultimate goal of the Brickworld project is to develop a relational reinforcement learning agent that will learn a causal model of the environment representing both its own causal powers and those of the other "agents." The term "agents" is used here in the broadest possible sense, including not only intelligent agents but brute animals and even natural forces such as wind and rain: anything that can be a cause of environmental change. This paper describes seven implemented agents (a quasi-reactive agent, four non-learning rule-based agents, and two non-relational reinforcement learning agents) and compares their performance. The experiments show that a reasonable knowledge representation for the environment results in a state-value function that has local optima, making greedy and ε-greedy policies inappropriate. Deeper search is required, leading to problems of inefficiency, which may be alleviated through hierarchical problem spaces. The paper raises questions about the legitimacy of programmer-designed hierarchies in the framework of reinforcement learning and suggests a principled solution.
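The point about local optima can be illustrated with a small sketch. It is not Brickworld and not one of the seven agents: the one-dimensional chain of states, the `VALUES` table, and the function names are all hypothetical, chosen only to show how a policy that is greedy with respect to a state-value function can stall at a local optimum while a deeper lookahead moves past it. Exploration (the ε in ε-greedy) is not modeled.

```python
from typing import List

# Hypothetical state values on a 1-D chain (illustrative, not from Brickworld).
# State 2 is a local optimum; state 5 is the global optimum.
VALUES: List[float] = [0.0, 1.0, 2.0, 1.0, 1.5, 5.0]


def neighbors(state: int) -> List[int]:
    """States reachable in one step: move left, stay, or move right."""
    return [s for s in (state - 1, state, state + 1) if 0 <= s < len(VALUES)]


def best_reachable_value(state: int, depth: int) -> float:
    """Highest state value reachable from `state` within `depth` further steps."""
    if depth == 0:
        return VALUES[state]
    return max(best_reachable_value(n, depth - 1) for n in neighbors(state))


def lookahead_step(state: int, depth: int) -> int:
    """Pick the next state by searching `depth` steps ahead (depth >= 1).

    depth == 1 is the ordinary greedy choice on state values.
    Ties are broken in favor of the neighbor with the higher immediate value.
    """
    return max(
        neighbors(state),
        key=lambda n: (best_reachable_value(n, depth - 1), VALUES[n]),
    )


def run(start: int, depth: int, max_steps: int = 20) -> int:
    """Follow the lookahead policy until it chooses to stay put."""
    state = start
    for _ in range(max_steps):
        nxt = lookahead_step(state, depth)
        if nxt == state:
            break
        state = nxt
    return state


greedy_final = run(start=0, depth=1)  # stalls at the local optimum
deeper_final = run(start=0, depth=3)  # looks far enough to see state 5
```

In this toy chain the greedy run stops at state 2, because both neighbors have lower value, while the depth-3 run continues on to state 5. The recursive search visits on the order of 3^depth states per decision, which is a small-scale picture of the inefficiency that deeper search brings.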