Guidelines for Action Space Definition in Reinforcement Learning-Based Traffic Signal Control Systems
Previous work on reinforcement learning applied to traffic signal control (RL-TSC) has focused on optimizing state and reward definitions, leaving the impact of the agent's action space definition largely unexplored. In this paper, we compare two types of TSC controllers, phase-based and step-based, in a simulated network under different traffic demand patterns in order to provide guidelines for optimally defining RL-TSC actions. Our results show that an agent's performance and convergence speed both increase with the frequency of its interactions with the environment. However, certain methods with lower observation frequencies, which can be achieved with realistic sensing technologies, perform comparably to higher-frequency ones in all scenarios, and even outperform them under specific traffic conditions.
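
To make the notion of interaction frequency concrete, the following minimal sketch contrasts the two controller families. It rests on assumptions not stated in the abstract: a step-based agent is taken to act at every one-second simulation step (keep or switch the current phase), while a phase-based agent is taken to act once per phase by choosing its duration. The names `Signal`, `run_step_based`, `run_phase_based`, `min_green`, and `durations` are hypothetical and serve illustration only; they are not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    """Toy signal state (hypothetical): current phase and seconds spent in it."""
    n_phases: int = 2
    phase: int = 0
    elapsed: int = 0


def run_step_based(horizon, policy, min_green=5):
    """Assumed step-based control: one decision per second (0 = keep, 1 = switch)."""
    sig = Signal()
    decisions = 0
    for t in range(horizon):
        action = policy(sig, t)
        decisions += 1
        # A switch is only honoured once the assumed minimum green time has passed.
        if action == 1 and sig.elapsed >= min_green:
            sig.phase = (sig.phase + 1) % sig.n_phases
            sig.elapsed = 0
        else:
            sig.elapsed += 1
    return decisions


def run_phase_based(horizon, policy, durations=(10, 20, 30)):
    """Assumed phase-based control: one decision per phase (index into durations)."""
    sig = Signal()
    decisions = 0
    t = 0
    while t < horizon:
        idx = policy(sig, t)
        decisions += 1
        t += durations[idx]  # the chosen phase runs uninterrupted
        sig.phase = (sig.phase + 1) % sig.n_phases
    return decisions


if __name__ == "__main__":
    # Fixed placeholder policies stand in for a learned agent.
    print(run_step_based(60, lambda sig, t: 0))
    print(run_phase_based(60, lambda sig, t: 1))
```

Under these assumptions, the step-based loop queries the agent once per simulated second, whereas the phase-based loop queries it only at phase boundaries, so the same horizon yields far fewer observations and decisions.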