Papers
Viewing 31-40 of 292 papers
When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets
Orion Weller, Kyle Lo, David Wadden, Dawn J. Lawrie, Benjamin Van Durme, Arman Cohan, Luca Soldaini • arXiv • 2023
Using large language models (LMs) for query or document expansion can improve generalization in information retrieval. However, it is unknown whether these techniques are universally beneficial or only effective in specific settings, such as for particular…

PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo • ICCV • 2023
Knowledge-based visual question answering (VQA) involves questions that require world knowledge beyond the image to yield the correct answer. Large language models (LMs) like GPT-3 are particularly helpful for this task because of their strong knowledge…

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay Krishna, Noah A. Smith • ICCV • 2023
Despite thousands of researchers, engineers, and artists actively working on improving text-to-image generation models, systems often fail to produce images that accurately align with the text inputs. We introduce TIFA (Text-to-Image Faithfulness evaluation…

The Bias Amplification Paradox in Text-to-Image Generation
P. Seshadri, Sameer Singh, Yanai Elazar • arXiv • 2023
Bias amplification is a phenomenon in which models increase imbalances present in the training data. In this paper, we study bias amplification in the text-to-image domain using Stable Diffusion by comparing gender ratios in training vs. generated images. We…

LEXPLAIN: Improving Model Explanations via Lexicon Supervision
Orevaoghene Ahia, Hila Gonen, Vidhisha Balachandran, Yulia Tsvetkov, Noah A. Smith • *SEM • 2023
Model explanations that shed light on the model's predictions are becoming a desired additional output of NLP models, alongside their predictions. Challenges in creating these explanations include making them trustworthy and faithful to the model's…

When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories
Alex Mallen, Akari Asai, Victor Zhong, R. Das, Daniel Khashabi, Hannaneh Hajishirzi • ACL • 2023
Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the difficulty of encoding a wealth of world knowledge in their parameters. This paper aims to understand LMs…

Data-Efficient Finetuning Using Cross-Task Nearest Neighbors
Hamish Ivison, Noah A. Smith, Hannaneh Hajishirzi, Pradeep Dasigi • ACL Findings • 2023
Language models trained on massive prompted multitask datasets like T0 (Sanh et al., 2021) or FLAN (Wei et al., 2021a) can generalize to tasks unseen during training. We show that training on a carefully chosen subset of instances can outperform training on…

HINT: Hypernetwork Instruction Tuning for Efficient Few- and Zero-Shot Generalisation
Hamish Ivison, Akshita Bhagia, Yizhong Wang, Hannaneh Hajishirzi, Matthew E. Peters • ACL • 2023
Recent NLP models have shown the remarkable ability to effectively generalise "zero-shot" to new tasks using only natural language instructions as guidance. However, many of these approaches suffer from high computational costs due to their reliance on…

Reproducibility in NLP: What Have We Learned from the Checklist?
Ian H. Magnusson, Noah A. Smith, Jesse Dodge • ACL Findings • 2023
Scientific progress in NLP rests on the reproducibility of researchers' claims. The *CL conferences created the NLP Reproducibility Checklist in 2020 to be completed by authors at submission to remind them of key information to include. We provide the first…

CREPE: Open-Domain Question Answering with False Presuppositions
Xinyan Velocity Yu, Sewon Min, Luke Zettlemoyer, Hannaneh Hajishirzi • ACL • 2023
When asking about unfamiliar topics, information seeking users often pose questions with false presuppositions. Most existing question answering (QA) datasets, in contrast, assume all questions have well defined answers. We introduce CREPE, a QA dataset…