Papers

  • Stubborn Lexical Bias in Data and Models

    Sofia Serrano, Jesse Dodge, Noah A. Smith • ACL • 2023
    In NLP, recent work has seen increased focus on spurious correlations between various features and labels in training data, and how these influence model behavior. However, the presence and effect of such correlations are typically examined feature by feature…
  • Task-aware Retrieval with Instructions

    Akari Asai, Timo Schick, Patrick Lewis, Xilun Chen, Gautier Izacard, Sebastian Riedel, Hannaneh Hajishirzi, Wen-tau Yih • Findings of ACL • 2023
    We study the problem of retrieval with instructions, where users of a retrieval system explicitly describe their intent along with their queries. We aim to develop a general-purpose task-aware retrieval system using multi-task instruction tuning, which can…
  • When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories

    Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, Hannaneh Hajishirzi • ACL • 2023
    Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the difficulty of encoding a wealth of world knowledge in their parameters. This paper aims to understand LMs…
  • Words as Gatekeepers: Measuring Discipline-specific Terms and Meanings in Scholarly Publications

    Li Lucy, Jesse Dodge, David Bamman, Katherine A. Keith • Findings of ACL • 2023
    Scholarly text is often laden with jargon, or specialized language that can facilitate efficient in-group communication within fields but hinder understanding for out-groups. In this work, we develop and validate an interpretable approach for measuring…
  • Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance

    Yao Fu, Litu Ou, Mingyu Chen, Yuhao Wan, Hao Peng, Tushar Khot • ICML 2023, Challenges in Deployable Generative AI Workshop • 2023
    As large language models (LLMs) are continuously being developed, their evaluation becomes increasingly important yet challenging. This work proposes Chain-of-Thought Hub, an open-source evaluation suite on the multi-step reasoning capabilities of large…
  • Estimating the Causal Effect of Early ArXiving on Paper Acceptance

    Yanai Elazar, Jiayao Zhang, David Wadden, Boshen Zhang, Noah A. Smith • arXiv.org • 2023
    What is the effect of releasing a preprint of a paper before it is submitted for peer review? No randomized controlled trial has been conducted, so we turn to observational data to answer this question. We use data from the ICLR conference (2018–2022) and…
  • Evaluating the Social Impact of Generative AI Systems in Systems and Society

    Irene Solaiman, Zeerak Talat, William Agnew, Lama Ahmad, Dylan Baker, Su Lin Blodgett, Hal Daumé, Jesse Dodge, Ellie Evans, Sara Hooker, Yacine Jernite, A. Luccioni, Alberto Lusoli, Margaret Mitchell, J. Newman, Marie-Therese Png, A. Strait, Apostol T. Vassilev • arXiv.org • 2023
    Generative AI systems across modalities, ranging from text, image, audio, and video, have broad social impacts, but there exists no official standard for means of evaluating those impacts and which impacts should be evaluated. We move toward a standard…
  • Morphosyntactic probing of multilingual BERT models

    Judit Ács, Endre Hamerlik, Roy Schwartz, Noah A. Smith, András Kornai • Journal of Natural Language Engineering • 2023
    We introduce an extensive dataset for multilingual probing of morphological information in language models (247 tasks across 42 languages from 10 families), each consisting of a sentence with a target word and a morphological tag as the desired label, derived…
  • Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text

    Wanrong Zhu, Jack Hessel, Anas Awadalla, S. Gadre, Jesse Dodge, Alex Fang, Youngjae Yu, Ludwig Schmidt, William Yang Wang, Yejin Choi • arXiv.org • 2023
    In-context vision and language models like Flamingo support arbitrarily interleaved sequences of images and text as input. This format not only enables few-shot learning via interleaving independent supervised (image, text) examples, but also more complex…
  • Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations

    Xinxi Lyu, Sewon Min, Iz Beltagy, Luke Zettlemoyer, Hannaneh Hajishirzi • ACL • 2023
    Although large language models can be prompted for both zero- and few-shot learning, performance drops significantly when no demonstrations are available. In this paper, we introduce Z-ICL, a new zero-shot method that closes the gap by constructing pseudo…