Papers

Viewing 31-40 of 292 papers
  • When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets

    Orion Weller, Kyle Lo, David Wadden, Dawn J Lawrie, Benjamin Van Durme, Arman Cohan, Luca Soldaini. arXiv, 2023.
    Using large language models (LMs) for query or document expansion can improve generalization in information retrieval. However, it is unknown whether these techniques are universally beneficial or only effective in specific settings, such as for particular…
  • PromptCap: Prompt-Guided Task-Aware Image Captioning

    Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo. ICCV • Proceedings, 2023.
    Knowledge-based visual question answering (VQA) involves questions that require world knowledge beyond the image to yield the correct answer. Large language models (LMs) like GPT-3 are particularly helpful for this task because of their strong knowledge…
  • TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

    Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay Krishna, Noah A. Smith. ICCV • Proceedings, 2023.
    Despite thousands of researchers, engineers, and artists actively working on improving text-to-image generation models, systems often fail to produce images that accurately align with the text inputs. We introduce TIFA (Text-to-Image Faithfulness evaluation…
  • The Bias Amplification Paradox in Text-to-Image Generation

    P. Seshadri, Sameer Singh, Yanai Elazar. arXiv, 2023.
    Bias amplification is a phenomenon in which models increase imbalances present in the training data. In this paper, we study bias amplification in the text-to-image domain using Stable Diffusion by comparing gender ratios in training vs. generated images. We…
  • LEXPLAIN: Improving Model Explanations via Lexicon Supervision

    Orevaoghene Ahia, Hila Gonen, Vidhisha Balachandran, Yulia Tsvetkov, Noah A. Smith. *SEM • Proceedings, 2023.
    Model explanations that shed light on the model’s predictions are becoming a desired additional output of NLP models, alongside their predictions. Challenges in creating these explanations include making them trustworthy and faithful to the model’s…
  • When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories

    Alex Mallen, Akari Asai, Victor Zhong, R. Das, Daniel Khashabi, Hannaneh Hajishirzi. Annual Meeting of the Association for Computational Linguistics, 2023.
    Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the difficulty of encoding a wealth of world knowledge in their parameters. This paper aims to understand LMs…
  • Data-Efficient Finetuning Using Cross-Task Nearest Neighbors

    Hamish Ivison, Noah A. Smith, Hannaneh Hajishirzi, Pradeep Dasigi. ACL Findings, 2023.
    Language models trained on massive prompted multitask datasets like T0 (Sanh et al., 2021) or FLAN (Wei et al., 2021a) can generalize to tasks unseen during training. We show that training on a carefully chosen subset of instances can outperform training on…
  • HINT: Hypernetwork Instruction Tuning for Efficient Few- and Zero-Shot Generalisation

    Hamish Ivison, Akshita Bhagia, Yizhong Wang, Hannaneh Hajishirzi, Matthew E. Peters. ACL, 2023.
    Recent NLP models have shown the remarkable ability to effectively generalise `zero-shot' to new tasks using only natural language instructions as guidance. However, many of these approaches suffer from high computational costs due to their reliance on…
  • Reproducibility in NLP: What Have We Learned from the Checklist?

    Ian H. Magnusson, Noah A. Smith, Jesse Dodge. Findings of ACL, 2023.
    Scientific progress in NLP rests on the reproducibility of researchers' claims. The *CL conferences created the NLP Reproducibility Checklist in 2020 to be completed by authors at submission to remind them of key information to include. We provide the first…
  • CREPE: Open-Domain Question Answering with False Presuppositions

    Xinyan Velocity Yu, Sewon Min, Luke Zettlemoyer, Hannaneh Hajishirzi. ACL, 2023.
    When asking about unfamiliar topics, information seeking users often pose questions with false presuppositions. Most existing question answering (QA) datasets, in contrast, assume all questions have well defined answers. We introduce CREPE, a QA dataset…