Papers

  • Lila: A Unified Benchmark for Mathematical Reasoning

    Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, Ashwin Kalyan. EMNLP 2022. Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling. Towards evaluating and improving AI systems in this domain, we propose LILA, a unified mathematical reasoning…
  • Calibrating Trust of Multi-Hop Question Answering Systems with Decompositional Probes

    Kaige Xie, Sarah Wiegreffe, Mark O. Riedl. Findings of EMNLP 2022. Multi-hop Question Answering (QA) is a challenging task since it requires an accurate aggregation of information from multiple context paragraphs and a thorough understanding of the underlying reasoning chains. Recent work in multi-hop QA has shown that…
  • Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning

    Oyvind Tafjord, Bhavana Dalvi Mishra, Peter Clark. EMNLP 2022. Our goal is a question-answering (QA) system that can show how its answers are implied by its own internal beliefs via a systematic chain of reasoning. Such a capability would allow better understanding of why a model produced the answer it did. Our approach…
  • Inferring the Reader: Guiding Automated Story Generation with Commonsense Reasoning

    Xiangyu Peng, Siyan Li, Sarah Wiegreffe, Mark O. Riedl. Findings of EMNLP 2022. Transformer-based language model approaches to automated story generation currently provide state-of-the-art results. However, they still suffer from plot incoherence when generating narratives over time, and critically lack basic commonsense reasoning…
  • Towards Teachable Reasoning Systems: Using a Dynamic Memory of User Feedback for Continual System Improvement

    Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Clark. EMNLP 2022. Our goal is a teachable reasoning system for question-answering (QA), where a user can interact with faithful answer explanations, and correct its errors so that the system improves over time. Our approach is to augment a QA model with a dynamic memory of…
  • One Venue, Two Conferences: The Separation of Chinese and American Citation Networks

    Bingchen Zhao, Yuling Gu, Jessica Zosa Forde, Naomi Saphra. NeurIPS • AI Cultures Workshop 2022. At NeurIPS, American and Chinese institutions cite papers from each other’s regions substantially less than they cite endogamously. We build a citation graph to quantify this divide, compare it to European connectivity, and discuss the causes and consequences…
  • Breakpoint Transformers for Modeling and Tracking Intermediate Beliefs

    Kyle Richardson, Ronen Tamari, Oren Sultan, Reut Tsarfaty, Dafna Shahaf, Ashish Sabharwal. EMNLP 2022. Can we teach natural language understanding models to track their beliefs through intermediate points in text? We propose a representation learning framework called breakpoint modeling that allows for learning of this type. Given any text encoder and data…
  • Learning to Decompose: Hypothetical Question Decomposition Based on Comparable Texts

    Ben Zhou, Kyle Richardson, Xiaodong Yu, Dan Roth. EMNLP 2022. Explicit decomposition modeling, which involves breaking down complex tasks into more straightforward and often more interpretable sub-tasks, has long been a central theme in developing robust and interpretable NLU systems. However, despite the many datasets…
  • Just-DREAM-about-it: Figurative Language Understanding with DREAM-FLUTE

    Yuling Gu, Yao Fu, Valentina Pyatkin, Ian Magnusson, Bhavana Dalvi Mishra, Peter Clark. EMNLP • The Third Workshop on Figurative Language Processing 2022. Figurative language (e.g., “he flew like the wind”) is challenging to understand, as it is hard to tell what implicit information is being conveyed from the surface form alone. We hypothesize that to perform this task well, the reader needs to mentally…
  • Decomposed Prompting: A Modular Approach for Solving Complex Tasks

    Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal. arXiv 2022. Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn…