Larry A. Rendell
The author’s original state-space learning system (based on a probabilistic performance measure clustered in feature space) was effective in optimizing parameterized linear evaluation functions. However, more accurate probability estimates would allow stabilization in cases of strong feature interactions. To attain this accuracy and stability, a second level of learning is added: a genetic (parallel) algorithm that supervises multiple activations of the original system. This scheme is aided by the probability clusters themselves. These structures are intermediate between the detailed performance statistics and the more general heuristic, and they estimate an absolute quantity independently of one another. Consequently, the system allows both credit localization at this mediating level of knowledge and feature interaction at the derived heuristic level. Early experimental results have been encouraging: as predicted by the analysis, stability is very good.