Stefan Edelkamp and Jürgen Eckerle
In contrast to off-line search algorithms such as A and IDA, in real-time heuristic search we have to commit a move within a limited search horizon or time. One well known algorithm in this class is RTA. An algorithm is said to learn if it improves its performance over successive problem trials. In RTA" the heuristic estimation is in general not admissible. Thus RTA has to be modified to a variant LRTA" that is capable of learning. The aim of the strategies proposed in this paper is to improve the estimations found in LRTA. First, we examine two new schemas forward updating and backward updating for LRTA*. Then we propose CRTA" which works similar to RTA* but terminates with admissible heuristic values. It is shown that the strategy used in CRTA* can be made efficiently. Combined with lazy evaluation updating CRTA* leads to an improved real time learning algorithm called SLRTA. Experimentally we show that CRTA* expands significantly less nodes than LRTA" and thus converges faster to the optimal values.