  • Critical Thinking for Language Models

    Gregor Betz, Christian Voigt, Kyle Richardson. IWCS 2021. This paper takes a first step towards a critical thinking curriculum for neural auto-regressive language models. We introduce a synthetic text corpus of deductively valid arguments, and use this artificial argument corpus to train and evaluate GPT-2…
  • Temporal Reasoning on Implicit Events from Distant Supervision

    Ben Zhou, Kyle Richardson, Qiang Ning, Tushar Khot, Ashish Sabharwal, D. Roth. NAACL 2021. Existing works on temporal reasoning among events described in text focus on modeling relationships between explicitly mentioned events and do not handle event end time effectively. However, human readers can infer from natural language text many implicit…
  • Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models

    Tushar Khot, Daniel Khashabi, Kyle Richardson, Peter Clark, Ashish Sabharwal. NAACL 2021. A common approach to solve complex tasks is by breaking them down into simple sub-problems that can then be solved by simpler modules. However, these approaches often need to be designed and trained specifically for each complex task. We propose a general…
  • Multi-Modal Answer Validation for Knowledge-Based VQA

    Jialin Wu, Jiasen Lu, Ashish Sabharwal, R. Mottaghi. arXiv 2021. The problem of knowledge-based visual question answering involves answering questions that require external knowledge in addition to the content of the image. Such knowledge typically comes in a variety of forms, including visual, textual, and commonsense…
  • Natural Instructions: Benchmarking Generalization to New Tasks from Natural Language Instructions

    Swaroop Mishra, Daniel Khashabi, Chitta Baral, Hanna Hajishirzi. arXiv 2021. Can we enable NLP models to appropriately respond to instructional prompts and consequently generalize to new tasks? To study this question, we leverage the existing NLP datasets and the instructions that were used to crowdsource them to create…
  • Enriching a Model's Notion of Belief using a Persistent Memory

    Nora Kassner, Oyvind Tafjord, H. Schütze, P. Clark. arXiv 2021. Although pretrained language models (PTLMs) have been shown to contain significant amounts of world knowledge, they can still produce inconsistent answers to questions when probed, even after using specialized training techniques to reduce inconsistency. As a…
  • Thinking Aloud: Dynamic Context Generation Improves Zero-Shot Reasoning Performance of GPT-2

    G. Betz, Kyle Richardson, Christian Voigt. arXiv 2021. Thinking aloud is an effective meta-cognitive strategy human reasoners apply to solve difficult problems. We suggest improving the reasoning ability of pre-trained neural language models in a similar way, namely by expanding a task’s context with problem…
  • Information to Wisdom: Commonsense Knowledge Extraction and Compilation

    Simon Razniewski, Niket Tandon, Aparna S. Varde. WSDM '21: Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 2021. Commonsense knowledge is a foundational cornerstone of artificial intelligence applications. Whereas information extraction and knowledge base construction for instance-oriented assertions, such as Brad Pitt's birth date, or Angelina Jolie's movie awards, has…
  • Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge

    Sumithra Bhakthavatsalam, Daniel Khashabi, Tushar Khot, B. D. Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, P. Clark. arXiv 2021. We present the ARC-DA dataset, a direct-answer (“open response”, “freeform”) version of the ARC (AI2 Reasoning Challenge) multiple-choice dataset. While ARC has been influential in the community, its multiple-choice format is unrepresentative of real-world…
  • GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation

    Daniel Khashabi, Gabriel Stanovsky, Jonathan Bragg, Nicholas Lourie, Jungo Kasai, Yejin Choi, Noah A. Smith, Daniel S. Weld. arXiv 2021. Leaderboards have eased model development for many NLP datasets by standardizing their evaluation and delegating it to an independent external repository. Their adoption, however, is so far limited to tasks which can be reliably evaluated in an automatic…