Meta Learning for Image Captioning
Reinforcement learning (RL) has shown its advantages in image captioning by optimizing the non-differentiable metric directly in the reward learning process. However, due to the reward hacking problem in RL, maximizing reward may not lead to better quality of the caption, especially from the aspects of propositional content and distinctiveness. In this work, we propose to use a new learning method, meta learning, to utilize supervision from the ground truth whilst optimizing the reward function in RL. To improve the propositional content and the distinctiveness of the generated captions, the proposed model provides the global optimal solution by taking different gradient steps towards the supervision task and the reinforcement task, simultaneously. Experimental results on MS COCO validate the effectiveness of our approach when compared with the state-of-the-art methods.