Papers
Viewing 51-60 of 292 papers
Self-Instruct: Aligning Language Models with Self-Generated Instructions
Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi
ACL • 2023
Large “instruction-tuned” language models (i.e., finetuned to respond to instructions) have demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they depend heavily on human-written instruction data that is often limited in…

Stubborn Lexical Bias in Data and Models
Sofia Serrano, Jesse Dodge, Noah A. Smith
ACL • 2023
In NLP, recent work has seen increased focus on spurious correlations between various features and labels in training data, and how these influence model behavior. However, the presence and effect of such correlations are typically examined feature by feature…

Task-aware Retrieval with Instructions
Akari Asai, Timo Schick, Patrick Lewis, Xilun Chen, Gautier Izacard, Sebastian Riedel, Hannaneh Hajishirzi, Wen-tau Yih
Findings of ACL • 2023
We study the problem of retrieval with instructions, where users of a retrieval system explicitly describe their intent along with their queries. We aim to develop a general-purpose task-aware retrieval system using multi-task instruction tuning, which can…

When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories
Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, Hannaneh Hajishirzi
ACL • 2023
Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the difficulty of encoding a wealth of world knowledge in their parameters. This paper aims to understand LMs…

Words as Gatekeepers: Measuring Discipline-specific Terms and Meanings in Scholarly Publications
Li Lucy, Jesse Dodge, David Bamman, Katherine A. Keith
Findings of ACL • 2023
Scholarly text is often laden with jargon, or specialized language that can facilitate efficient in-group communication within fields but hinder understanding for out-groups. In this work, we develop and validate an interpretable approach for measuring…

Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance
Yao Fu, Litu Ou, Mingyu Chen, Yuhao Wan, Hao Peng, Tushar Khot
ICML, Challenges in Deployable Generative AI workshop • 2023
As large language models (LLMs) are continuously being developed, their evaluation becomes increasingly important yet challenging. This work proposes Chain-of-Thought Hub, an open-source evaluation suite on the multi-step reasoning capabilities of large…

ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews
Mike D'Arcy, Alexis Ross, Erin Bransom, Bailey Kuehl, Jonathan Bragg, Tom Hope, Doug Downey
arXiv.org • 2023
Revising scientific papers based on peer feedback is a challenging task that requires not only deep scientific knowledge and reasoning, but also the ability to recognize the implicit requests in high-level feedback and to choose the best of many possible ways…

Evaluating the Social Impact of Generative AI Systems in Systems and Society
Irene Solaiman, Zeerak Talat, William Agnew, Lama Ahmad, Dylan Baker, Su Lin Blodgett, Hal Daumé, Jesse Dodge, Ellie Evans, Sara Hooker, Yacine Jernite, A. Luccioni, Alberto Lusoli, Margaret Mitchell, J. Newman, Marie-Therese Png, A. Strait, Apostol T. Vassilev
arXiv.org • 2023
Generative AI systems across modalities, ranging from text, image, audio, and video, have broad social impacts, but there exists no official standard for means of evaluating those impacts and which impacts should be evaluated. We move toward a standard…

Morphosyntactic probing of multilingual BERT models
Judit Ács, Endre Hamerlik, Roy Schwartz, Noah A. Smith, András Kornai
Journal of Natural Language Engineering • 2023
We introduce an extensive dataset for multilingual probing of morphological information in language models (247 tasks across 42 languages from 10 families), each consisting of a sentence with a target word and a morphological tag as the desired label, derived…

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text
Wanrong Zhu, Jack Hessel, Anas Awadalla, S. Gadre, Jesse Dodge, Alex Fang, Youngjae Yu, Ludwig Schmidt, William Yang Wang, Yejin Choi
arXiv.org • 2023
In-context vision and language models like Flamingo support arbitrarily interleaved sequences of images and text as input. This format not only enables few-shot learning via interleaving independent supervised (image, text) examples, but also, more complex…