Wai-Leong Yeow, Chen-Khong Tham, and Wai-Choong Wong
In multiple criteriaMarkov Decision Processes (MDP) where multiple costs are incurred at every decision point, current methods solve them by minimising the expected primary cost criterion while constraining the expectations of other cost criteria to some critical values. However, systems are often faced with hard constraints where the cost criteria should never exceed some critical values at any time, rather than constraints based on the expected cost criteria. For example, a resource-limited sensor network no longer functions once its energy is depleted. Based on the semi-MDP (sMDP) model, we study the hard constrained (HC) problem in continuous time, state and action spaces with respect to both finite and infinite horizons, and various cost criteria. We show that the HCsMDP problem is NP-hard and that there exists an equivalent discrete-time MDP to every HCsMDP. Hence, classical methods such as reinforcement learning can solve HCsMDPs.