Papers

  • Lila: A Unified Benchmark for Mathematical Reasoning

    Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, Ashwin Kalyan. EMNLP 2022
    Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling. Towards evaluating and improving AI systems in this domain, we propose LILA, a unified mathematical reasoning…
  • Statistical and Computational Guarantees for Influence Diagnostics

    Jillian Fisher, Lang Liu, Krishna Pillutla, Yejin Choi, Zaid Harchaoui. arXiv 2022
    Influence diagnostics such as influence functions and approximate maximum influence perturbations are popular in machine learning and in AI domain applications. Influence diagnostics are powerful statistical tools to identify influential datapoints or subsets…
  • Abstract Visual Reasoning with Tangram Shapes

    Anya Ji, Noriyuki Kojima, N. Rush, Alane Suhr, Wai Keen Vong, Robert D. Hawkins, Yoav Artzi. EMNLP 2022
    Best Long Paper Award
    We introduce KiloGram, a resource for studying abstract visual reasoning in humans and machines. Drawing on the history of tangram puzzles as stimuli in cognitive science, we build a richly annotated dataset that, with > 1k distinct stimuli, is orders of…
  • Calibrating Trust of Multi-Hop Question Answering Systems with Decompositional Probes

    Kaige Xie, Sarah Wiegreffe, Mark O. Riedl. Findings of EMNLP 2022
    Multi-hop Question Answering (QA) is a challenging task since it requires an accurate aggregation of information from multiple context paragraphs and a thorough understanding of the underlying reasoning chains. Recent work in multi-hop QA has shown that…
  • Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems

    Yoshitomo Matsubara, Luca Soldaini, Eric Lind, Alessandro Moschitti. Findings of EMNLP 2022
    Large transformer models can highly improve Answer Sentence Selection (AS2) tasks, but their high computational costs prevent their use in many real-world applications. In this paper, we explore the following research question: How can we make the AS2 models…
  • Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning

    Oyvind Tafjord, Bhavana Dalvi Mishra, Peter Clark. EMNLP 2022
    Our goal is a question-answering (QA) system that can show how its answers are implied by its own internal beliefs via a systematic chain of reasoning. Such a capability would allow better understanding of why a model produced the answer it did. Our approach…
  • GENIE: Toward Reproducible and Standardized Human Evaluation for Text Generation

    Daniel Khashabi, Gabriel Stanovsky, Jonathan Bragg, Nicholas Lourie, Jungo Kasai, Yejin Choi, Noah A. Smith, Daniel S. Weld. EMNLP 2022
    While often assumed a gold standard, effective human evaluation of text generation remains an important, open area for research. We revisit this problem with a focus on producing consistent evaluations that are reproducible over time and across different…
  • How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers

    Michael Hassid, Hao Peng, Daniel Rotem, Jungo Kasai, Ivan Montero, Noah Smith, Roy Schwartz. Findings of EMNLP 2022
    The attention mechanism is considered the backbone of the widely-used Transformer architecture. It contextualizes the input by computing input-specific attention matrices. We find that this mechanism, while powerful and elegant, is not as important as…
  • In-Context Learning for Few-Shot Dialogue State Tracking

    Yushi Hu, Chia-Hsuan Lee, Tianbao Xie, Tao Yu, Noah A. Smith, Mari Ostendorf. Findings of EMNLP 2022
    Collecting and annotating task-oriented dialogues is time-consuming and costly. Thus, zero- and few-shot learning for dialogue tasks presents an exciting opportunity. In this work, we propose an in-context (IC) learning framework for zero-shot and few-shot…
  • Inferring the Reader: Guiding Automated Story Generation with Commonsense Reasoning

    Xiangyu Peng, Siyan Li, Sarah Wiegreffe, Mark O. Riedl. Findings of EMNLP 2022
    Transformer-based language model approaches to automated story generation currently provide state-of-the-art results. However, they still suffer from plot incoherence when generating narratives over time, and critically lack basic commonsense reasoning…