Evaluating Visual Reasoning through Grounded Language Understanding

Authors

  • Alane Suhr, Cornell University
  • Mike Lewis, Facebook
  • James Yeh
  • Yoav Artzi, Cornell University

DOI:

https://doi.org/10.1609/aimag.v39i2.2796

Abstract

Autonomous systems that understand natural language must reason jointly about complex language and visual observations. Key to making progress toward such systems is the availability of benchmark datasets and tasks. We introduce the Cornell Natural Language Visual Reasoning (NLVR) corpus, which targets reasoning skills such as counting, comparisons, and set theory. NLVR contains 92,244 examples of natural language statements paired with synthetic images and annotated with Boolean values for the simple task of determining whether a sentence is true or false about an image. Although the task itself is simple, NLVR is designed to challenge systems with diverse linguistic phenomena and complex reasoning. Linguistic analysis confirms that NLVR presents diversity and complexity beyond what is provided by contemporary benchmarks. Empirical evaluation of several methods further demonstrates the open challenges NLVR presents.
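The task described above reduces to binary classification: given a statement and an image, predict true or false, scored by accuracy. The sketch below illustrates this evaluation setup; the field names (`sentence`, `label`) follow the corpus's released JSON-lines format, but the example data and the trivial baseline are invented for illustration only.

```python
# Hedged sketch of the NLVR true/false evaluation setup.
# Example data below is invented; it is not drawn from the corpus.

def accuracy(predictions, labels):
    """Fraction of examples where the predicted truth value matches the label."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

# Each NLVR example pairs a statement with an image and a Boolean label.
examples = [
    {"sentence": "There is exactly one black square.", "label": "true"},
    {"sentence": "Every tower has a blue block at its top.", "label": "false"},
]

# A trivial majority-class baseline: always predict "true".
predictions = ["true" for _ in examples]
gold = [ex["label"] for ex in examples]
print(accuracy(predictions, gold))  # 0.5
```

Such constant baselines are one reason balanced label distributions matter in benchmarks of this kind: a dataset skewed toward one truth value would reward guessing rather than reasoning.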

Author Biography

James Yeh

Cornell University

Published

2018-07-01

How to Cite

Suhr, A., Lewis, M., Yeh, J., & Artzi, Y. (2018). Evaluating Visual Reasoning through Grounded Language Understanding. AI Magazine, 39(2), 45-52. https://doi.org/10.1609/aimag.v39i2.2796

Section

Articles