Papers
Stubborn Lexical Bias in Data and Models
Sofia Serrano, Jesse Dodge, Noah A. Smith
ACL • 2023
In NLP, recent work has seen increased focus on spurious correlations between various features and labels in training data, and how these influence model behavior. However, the presence and effect of such correlations are typically examined feature by feature…
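As a toy illustration of the feature-by-feature analysis the abstract mentions, the sketch below scores each token's association with a label using pointwise mutual information; the dataset, threshold, and choice of PMI are illustrative assumptions, not the paper's protocol.

```python
# Flag tokens whose presence correlates with a label (a feature-by-feature check).
# The toy data and PMI threshold are illustrative, not from the paper.
from collections import Counter
import math

data = [
    ("the movie was great fun", 1),
    ("great acting and great pacing", 1),
    ("a dull and tedious movie", 0),
    ("tedious plot and weak acting", 0),
]

n = len(data)
label_counts = Counter(label for _, label in data)
token_counts, joint_counts = Counter(), Counter()
for text, label in data:
    for tok in set(text.split()):  # document-level presence, not token frequency
        token_counts[tok] += 1
        joint_counts[(tok, label)] += 1

for tok, tok_n in token_counts.items():
    if tok_n < 2:  # skip singletons; their PMI is pure noise
        continue
    for label, label_n in label_counts.items():
        joint = joint_counts[(tok, label)]
        if joint == 0:
            continue
        pmi = math.log2((joint / n) / ((tok_n / n) * (label_n / n)))
        if pmi > 0.5:
            print(f"token {tok!r} skews toward label {label} (PMI = {pmi:.2f})")
```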

Task-aware Retrieval with Instructions
Akari Asai, Timo Schick, Patrick Lewis, Xilun Chen, Gautier Izacard, Sebastian Riedel, Hannaneh Hajishirzi, Wen-tau Yih
Findings of ACL • 2023
We study the problem of retrieval with instructions, where users of a retrieval system explicitly describe their intent along with their queries. We aim to develop a general-purpose task-aware retrieval system using multi-task instruction tuning, which can…
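The core mechanic of retrieval with instructions is that the intent description is prepended to the query before encoding, so one retriever can serve different tasks over the same index. A minimal sketch, assuming a toy bag-of-words encoder in place of the paper's instruction-tuned dual encoder:

```python
# Retrieval with instructions: prepend the intent description to the query.
# `encode` is a toy bag-of-words stand-in for an instruction-tuned dual encoder.
import math
from collections import Counter

def encode(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[t] for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "python code example using the sorted builtin",
    "forum discussion of which sorting algorithm is fastest",
]
index = [encode(d) for d in docs]

query = "how to sort a list"
for instruction in ["retrieve a python code example", "retrieve a forum discussion"]:
    qvec = encode(instruction + " " + query)
    scores = sorted(((cosine(qvec, d), doc) for d, doc in zip(index, docs)), reverse=True)
    print(instruction, "->", scores[0][1])  # same query, different winner per intent
```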

When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories
Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, Hannaneh Hajishirzi
ACL • 2023
Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the difficulty of encoding a wealth of world knowledge in their parameters. This paper aims to understand LMs…
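One natural policy suggested by the parametric vs. non-parametric framing is adaptive retrieval: answer from the LM's own memory for popular entities and retrieve for rare ones. A minimal sketch, assuming pageviews as the popularity proxy and a made-up threshold:

```python
# Adaptive retrieval: rely on parametric memory for popular entities,
# fall back to a retriever for long-tail ones. Counts and threshold are toy values.
PAGEVIEWS = {"Barack Obama": 1_200_000, "Ludolph van Ceulen": 900}  # popularity proxy
THRESHOLD = 10_000  # tuned on a dev set in practice

def answer(question: str, entity: str) -> str:
    if PAGEVIEWS.get(entity, 0) >= THRESHOLD:
        return f"[parametric] answer {question!r} directly with the LM"
    return f"[non-parametric] retrieve passages about {entity!r}, then answer"

print(answer("Where was he born?", "Barack Obama"))
print(answer("What is he known for?", "Ludolph van Ceulen"))
```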

Words as Gatekeepers: Measuring Discipline-specific Terms and Meanings in Scholarly Publications
Li Lucy, Jesse Dodge, David Bamman, Katherine A. Keith
Findings of ACL • 2023
Scholarly text is often laden with jargon, or specialized language that can facilitate efficient in-group communication within fields but hinder understanding for out-groups. In this work, we develop and validate an interpretable approach for measuring…
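One simple, interpretable way to operationalize "discipline-specific" (an illustration, not necessarily the paper's exact measure, which the truncated abstract does not show) is a smoothed ratio of a term's in-field frequency to its background frequency:

```python
# Score how field-specific a term is via a smoothed in-field vs. background
# frequency ratio. The corpora and smoothing constant are toy illustrations.
from collections import Counter

field_docs = ["the gradient of the loss", "stochastic gradient descent converges"]
background_docs = ["the garden was in bloom", "descent from the summit was slow"]

def rate(term: str, docs: list[str], alpha: float = 0.5) -> float:
    toks = [t for d in docs for t in d.split()]
    return (Counter(toks)[term] + alpha) / (len(toks) + alpha)

for term in ["gradient", "descent", "the"]:
    ratio = rate(term, field_docs) / rate(term, background_docs)
    print(f"{term!r}: {ratio:.1f}x in-field vs. background")
```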

Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance
Yao Fu, Litu Ou, Mingyu Chen, Yuhao Wan, Hao Peng, Tushar Khot
Challenges in Deployable Generative AI Workshop at ICML • 2023
As large language models (LLMs) are continuously being developed, their evaluation becomes increasingly important yet challenging. This work proposes Chain-of-Thought Hub, an open-source evaluation suite on the multi-step reasoning capabilities of large…
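A suite like this boils down to a loop of few-shot chain-of-thought prompting, answer extraction, and exact-match scoring. A minimal sketch, where `call_model` is a hypothetical stand-in for any LLM API:

```python
# Skeleton of a chain-of-thought evaluation loop: prompt with a worked exemplar,
# extract the final answer, score exact match. `call_model` is hypothetical.
import re

COT_PROMPT = """Q: Roger has 5 balls. He buys 2 more. How many balls does he have?
A: He starts with 5 and gains 2, so 5 + 2 = 7. The answer is 7.
Q: {question}
A:"""

def call_model(prompt: str) -> str:
    # hypothetical stand-in: replace with a real model API call
    return "There are 3 + 4 = 7 apples. The answer is 7."

def extract_answer(completion: str) -> str:
    m = re.search(r"The answer is (-?\d+)", completion)
    return m.group(1) if m else ""

question, gold = "There are 3 apples and 4 more arrive. How many apples?", "7"
pred = extract_answer(call_model(COT_PROMPT.format(question=question)))
print("correct" if pred == gold else "wrong", pred)
```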

Estimating the Causal Effect of Early ArXiving on Paper Acceptance
Yanai Elazar, Jiayao Zhang, David Wadden, Boshen Zhang, Noah A. Smith
arXiv.org • 2023
What is the effect of releasing a preprint of a paper before it is submitted for peer review? No randomized controlled trial has been conducted, so we turn to observational data to answer this question. We use data from the ICLR conference (2018–2022) and…
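Since the abstract is truncated before the method, the sketch below shows one generic observational estimator (nearest-neighbor matching on an observed covariate) rather than the paper's actual design; all data are toy values.

```python
# Crude matched-pair estimate of a treatment effect from observational data.
# Matching on one covariate for illustration; real studies match on many and
# worry about unobserved confounders. Toy numbers, not the paper's data.
papers = [
    # (arxived_early, mean_review_score, accepted)
    (1, 6.5, 1), (1, 5.0, 1), (1, 4.0, 0),
    (0, 6.5, 1), (0, 5.0, 0), (0, 4.0, 0),
]

treated = [p for p in papers if p[0] == 1]
control = [p for p in papers if p[0] == 0]

# Match each early-arXived paper to the control with the closest review score,
# then average the outcome differences (a rough ATT estimate).
diffs = []
for _, score, accepted in treated:
    match = min(control, key=lambda c: abs(c[1] - score))
    diffs.append(accepted - match[2])
print("estimated effect of early arXiving:", sum(diffs) / len(diffs))
```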

Evaluating the Social Impact of Generative AI Systems in Systems and Society
Irene Solaiman, Zeerak Talat, William Agnew, Lama Ahmad, Dylan Baker, Su Lin Blodgett, Hal Daumé, Jesse Dodge, Ellie Evans, Sara Hooker, Yacine Jernite, A. Luccioni, Alberto Lusoli, Margaret Mitchell, J. Newman, Marie-Therese Png, A. Strait, Apostol T. Vassilev
arXiv.org • 2023
Generative AI systems across modalities, spanning text, image, audio, and video, have broad social impacts, but there exists no official standard for how to evaluate those impacts or which impacts should be evaluated. We move toward a standard…

Morphosyntactic probing of multilingual BERT models
Judit Ács, Endre Hamerlik, Roy Schwartz, Noah A. Smith, András Kornai
Journal of Natural Language Engineering • 2023
We introduce an extensive dataset for multilingual probing of morphological information in language models (247 tasks across 42 languages from 10 families), each consisting of a sentence with a target word and a morphological tag as the desired label, derived…
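A probing task of this shape reduces to: freeze the encoder, take the target word's contextual vector, and fit a light classifier to predict the morphological tag. A minimal sketch with fixed stand-in vectors instead of real multilingual BERT states, and a nearest-centroid classifier in place of the usual linear probe:

```python
# Probing sketch: frozen "embeddings" of target words, tiny classifier on top.
# The 4-d vectors stand in for contextual hidden states; toy values throughout.
EMBED = {
    "dog":  [0.9, 0.1, 0.2, 0.0],
    "dogs": [0.1, 0.8, 0.1, 0.1],
    "cat":  [0.8, 0.2, 0.1, 0.1],
    "cats": [0.2, 0.9, 0.0, 0.2],
}

train = [("dog", "Sing"), ("dogs", "Plur"), ("cat", "Sing")]

# Build one centroid per morphological tag from the frozen vectors.
centroids: dict[str, list[float]] = {}
counts: dict[str, int] = {}
for word, tag in train:
    c = centroids.setdefault(tag, [0.0] * 4)
    for j, v in enumerate(EMBED[word]):
        c[j] += v
    counts[tag] = counts.get(tag, 0) + 1
for tag, c in centroids.items():
    centroids[tag] = [v / counts[tag] for v in c]

def probe(word: str) -> str:
    x = EMBED[word]
    return min(centroids, key=lambda t: sum((a - b) ** 2 for a, b in zip(x, centroids[t])))

print(probe("cats"))  # -> "Plur" if the tag is recoverable from the representation
```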

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text
Wanrong Zhu, Jack Hessel, Anas Awadalla, S. Gadre, Jesse Dodge, Alex Fang, Youngjae Yu, Ludwig Schmidt, William Yang Wang, Yejin Choi
arXiv.org • 2023
In-context vision and language models like Flamingo support arbitrarily interleaved sequences of images and text as input. This format not only enables few-shot learning via interleaving independent supervised (image, text) examples, but also more complex…
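An interleaved document is essentially a list of sentences plus images aligned to sentence positions. The record below is an illustrative shape (the field names are assumptions, not mmc4's actual schema), followed by the reassembly a Flamingo-style model would consume:

```python
# Illustrative interleaved image-text record; field names are hypothetical.
doc = {
    "url": "https://example.com/recipe",
    "text_list": [
        "Whisk the eggs until frothy.",
        "Fold in the flour gently.",
    ],
    "images": [
        # each image is aligned to the sentence it should interleave with
        {"image_url": "https://example.com/step1.jpg", "matched_text_index": 0},
    ],
}

# Reassemble the interleaved sequence: images precede their matched sentence.
sequence = []
for i, sent in enumerate(doc["text_list"]):
    for img in doc["images"]:
        if img["matched_text_index"] == i:
            sequence.append(("<image>", img["image_url"]))
    sequence.append(("<text>", sent))
print(sequence)
```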

Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations
Xinxi Lyu, Sewon Min, Iz Beltagy, Luke Zettlemoyer, Hannaneh Hajishirzi
ACL • 2023
Although large language models can be prompted for both zero- and few-shot learning, performance drops significantly when no demonstrations are available. In this paper, we introduce Z-ICL, a new zero-shot method that closes the gap by constructing pseudo…
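One way to construct pseudo-demonstrations (a simplification of what the paper describes) is to retrieve corpus sentences that resemble the test input and pair them with randomly chosen label words, yielding an in-context prompt with no human-labeled examples. A minimal sketch, assuming a toy word-overlap retriever:

```python
# Pseudo-demonstrations from a raw corpus: nearest sentences + random labels.
# Word-overlap retrieval and the tiny corpus are illustrative simplifications.
import random

random.seed(0)
corpus = [
    "the film was a complete waste of time",
    "an absolute joy from start to finish",
    "the battery lasts all day",
]
labels = ["positive", "negative"]
test_input = "the movie was wonderful from start to finish"

def overlap(a: str, b: str) -> int:
    return len(set(a.split()) & set(b.split()))

# Retrieve the two corpus sentences closest to the test input.
neighbors = sorted(corpus, key=lambda s: -overlap(s, test_input))[:2]

prompt = ""
for sent in neighbors:
    prompt += f"Review: {sent}\nSentiment: {random.choice(labels)}\n\n"
prompt += f"Review: {test_input}\nSentiment:"
print(prompt)  # feed this to the LM; no labeled demonstrations were used
```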