Vadim Bulitko, Ilya Levner, and Russ Greiner
Decision-making in practical domains is usually complex, as a coordinated sequence of actions is needed to reach a satisfactory state, and responsive, as no fixed sequence works for all cases; instead, actions must be selected after sensing the environment. At each step, a lookahead control policy chooses among feasible actions by envisioning their effects into the future and selecting the action that leads to the most promising state. Producing an appropriate policy poses several challenges. First, when each individual state description is large, the policy may instead operate on a low-dimensional abstraction of the states. Second, in some situations the quality of the final state is not given and can only be learned from data. Deeper lookahead typically selects actions that lead to higher-quality outcomes, but deep forecasts are computationally expensive, which is problematic when computation time is a factor. This paper makes this accuracy/efficiency tradeoff explicit, defining a system's effectiveness in terms of both the quality of the returned response and the computational cost. We then investigate how deeply a system should search in order to optimize this "type II" performance criterion.

Keywords: decision-making, control policy, lookahead state tree expansion, abstraction, responsive image recognition, real-time best-first heuristic search, search horizon.
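The lookahead selection described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the toy domain, its actions, and the heuristic value function are assumptions introduced here, standing in for the learned abstraction and state-quality measure discussed in the paper. The `depth` parameter corresponds to the search horizon whose accuracy/cost tradeoff the paper studies.

```python
# Fixed-depth lookahead control policy (illustrative sketch).
# From the current state, expand the tree of action effects to a
# given depth and pick the action whose best reachable state scores
# highest under a heuristic value function.

def lookahead_value(state, actions, transition, value, depth):
    """Best heuristic value reachable from `state` within `depth` steps."""
    if depth == 0:
        return value(state)
    return max(
        lookahead_value(transition(state, a), actions, transition, value, depth - 1)
        for a in actions
    )

def lookahead_policy(state, actions, transition, value, depth):
    """Select the action leading to the most promising envisioned state."""
    # Taking the action consumes one step of the horizon.
    return max(
        actions,
        key=lambda a: lookahead_value(
            transition(state, a), actions, transition, value, depth - 1
        ),
    )

# Toy domain (an assumption for illustration): states are integers,
# actions shift the state, and the heuristic rewards proximity to a
# goal value of 10.
actions = [-1, +1, +2]
transition = lambda s, a: s + a
value = lambda s: -abs(10 - s)

best = lookahead_policy(0, actions, transition, value, depth=3)
print(best)  # -> 2
```

Note that the cost of `lookahead_value` grows exponentially with `depth` (branching factor raised to the horizon), which is precisely why the choice of search depth becomes a tradeoff between decision quality and computation time.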