Exploration With and Without a Map

Authors

Sven Koenig

Reid G. Simmons

Track:

Contents

Downloads:

Abstract:

We investigate the problem for an agent to reach one of a number of goal states by taking actions, where actions cause a deterministic state change. Initially, the topology of the state space and its size are unknown to the agent. We compare a zero-initialized version of Q-learning, that minimizes deliberation time between action executions, with other uninformed search algorithms (i.e. where the effects of actions are initially unknown). We show that the big-O worst-case complexity of every uninformed search algorithm over all domains is at least as large as the big-O worstcase complexity of Q-learning. However, learning and subsequently using a map of the state space can provide a search algorithm with an advantage over Q-learning in some (but not all) domains. Formally, we show that there exists an uninformed search algorithm that dominates Q-learning, i.e. always performs no worse, but performs strictly better in at least one case. In particular, there is at least one domain in which the number of action executions can be reduced by more than a constant factor.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.