We formalize a model for supervised learning of action strategies in dynamic stochastic domains, and show that pat-learning results on Occam algorithms hold in this model as well. We then identify a particularly useful bias for action strategies based on production rule systems. We show that a subset of production rule systems, including rules in predicate calculus style, small hidden state, and unobserved support predicates, is properly learnable. The bias we introduce enables the learning algorithm to invent the recursive support predicates which are used in the action strategy, and to reconstruct the internal state of the strategy. It is also shown that hierarchical strategies are learnable if a helpful teacher is available, but that otherwise the problem is computationally hard.