Papers

Viewing 81-90 of 216 papers
  • General-Purpose Question-Answering with Macaw

    Oyvind Tafjord, Peter Clark. arXiv, 2021. Despite the successes of pretrained language models, there are still few high-quality, general-purpose QA systems that are freely available. In response, we present MACAW, a versatile, generative question-answering (QA) system that we are making available to…
  • Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies

    Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, Jonathan Berant. TACL, 2021. A key limitation in current datasets for multi-hop reasoning is that the required steps for answering the question are mentioned in it explicitly. In this work, we introduce STRATEGYQA, a question answering (QA) benchmark where the required reasoning steps…
  • ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language

    Oyvind Tafjord, B. D. Mishra, P. Clark. Findings of ACL, 2021. Transformers have been shown to emulate logical deduction over natural language theories (logical rules expressed in natural language), reliably assigning true/false labels to candidate implications. However, their ability to generate implications of a theory…
  • ParsiNLU: A Suite of Language Understanding Challenges for Persian

    Daniel Khashabi, Arman Cohan, Siamak Shakeri, et al. TACL, 2021. Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains to be concentrated on resource-rich languages like English. This work focuses on Persian language, one of the widely…
  • Critical Thinking for Language Models

    Gregor Betz, Christian Voigt, Kyle Richardson. IWCS, 2021. This paper takes a first step towards a critical thinking curriculum for neural auto-regressive language models. We introduce a synthetic text corpus of deductively valid arguments, and use this artificial argument corpus to train and evaluate GPT-2…
  • Temporal Reasoning on Implicit Events from Distant Supervision

    Ben Zhou, Kyle Richardson, Qiang Ning, Tushar Khot, Ashish Sabharwal, D. Roth. NAACL, 2021. Existing works on temporal reasoning among events described in text focus on modeling relationships between explicitly mentioned events and do not handle event end time effectively. However, human readers can infer from natural language text many implicit…
  • Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models

    Tushar Khot, Daniel Khashabi, Kyle Richardson, Peter Clark, Ashish Sabharwal. NAACL, 2021. A common approach to solve complex tasks is by breaking them down into simple sub-problems that can then be solved by simpler modules. However, these approaches often need to be designed and trained specifically for each complex task. We propose a general…
  • Thinking Aloud: Dynamic Context Generation Improves Zero-Shot Reasoning Performance of GPT-2

    G. Betz, Kyle Richardson, Christian Voigt. arXiv, 2021. Thinking aloud is an effective meta-cognitive strategy human reasoners apply to solve difficult problems. We suggest to improve the reasoning ability of pre-trained neural language models in a similar way, namely by expanding a task's context with problem…
  • Information to Wisdom: Commonsense Knowledge Extraction and Compilation

    Simon Razniewski, Niket Tandon, Aparna S. Varde. WSDM '21: Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 2021. Commonsense knowledge is a foundational cornerstone of artificial intelligence applications. Whereas information extraction and knowledge base construction for instance-oriented assertions, such as Brad Pitt's birth date, or Angelina Jolie's movie awards, has…
  • Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge

    Sumithra Bhakthavatsalam, Daniel Khashabi, Tushar Khot, B. D. Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, P. Clark. arXiv, 2021. We present the ARC-DA dataset, a direct-answer ("open response", "freeform") version of the ARC (AI2 Reasoning Challenge) multiple-choice dataset. While ARC has been influential in the community, its multiple-choice format is unrepresentative of real-world…