Papers

Viewing 1-10 of 926 papers
  • Self-Refine: Iterative Refinement with Self-Feedback

    Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, K. Hermann, S. Welleck, A. Yazdanbakhsh, Peter Clark. NeurIPS, 2023. Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback…
  • SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding

    Favyen Bastani, Piper Wolters, Ritwik Gupta, Joe Ferdinando, Aniruddha Kembhavi. ICCV, 2023. Remote sensing images are useful for a wide variety of planet monitoring applications, from tracking deforestation to tackling illegal fishing. The Earth is extremely diverse -- the amount of potential tasks in remote sensing images is massive, and the sizes…
  • A machine learning parameterization of clouds in a coarse-resolution climate model for unbiased radiation

    Brian Henn, Y. R. Jauregui, Spencer K. Clark, Noah Brenowitz, J. McGibbon, Oliver Watt-Meyer, Andrew G. Pauling, C. Bretherton. ESSOAr, 2023. Coarse-grid weather and climate models rely particularly on parameterizations of cloud fields, and coarse-grained cloud fields from a fine-grid reference model are a natural target for a machine-learned parameterization. We machine-learn the coarsened-fine…
  • PromptCap: Prompt-Guided Task-Aware Image Captioning

    Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo. ICCV Proceedings, 2023. Knowledge-based visual question answering (VQA) involves questions that require world knowledge beyond the image to yield the correct answer. Large language models (LMs) like GPT-3 are particularly helpful for this task because of their strong knowledge…
  • TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

    Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay Krishna, Noah A. Smith. ICCV Proceedings, 2023. Despite thousands of researchers, engineers, and artists actively working on improving text-to-image generation models, systems often fail to produce images that accurately align with the text inputs. We introduce TIFA (Text-to-Image Faithfulness evaluation…
  • Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations

    Nirbhay Modhe, Qiaozi Gao, A. Kalyan, Dhruv Batra, G. Thattai, G. Sukhatme. arXiv, 2023. Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation by conservative value estimation -- penalizing values of unseen states and actions. Model-free methods penalize values at all unseen actions, while model-based…
  • The Bias Amplification Paradox in Text-to-Image Generation

    P. Seshadri, Sameer Singh, Yanai Elazar. arXiv, 2023. Bias amplification is a phenomenon in which models increase imbalances present in the training data. In this paper, we study bias amplification in the text-to-image domain using Stable Diffusion by comparing gender ratios in training vs. generated images. We…
  • Bound by the Bounty: Collaboratively Shaping Evaluation Processes for Queer AI Harms

    Organizer of Queer In AI, Nathaniel Dennler, Anaelia Ovalle, Ashwin Singh, Luca Soldaini, Arjun Subramonian, Huy Tu, William Agnew, Avijit Ghosh, Kyra Yee, Irene Font Peradejordi, Zeerak Talat, Mayra Russo, Jessica de Jesus de Pinho Pinhal. AIES, 2023. Bias evaluation benchmarks and dataset and model documentation have emerged as central processes for assessing the biases and harms of artificial intelligence (AI) systems. However, these auditing processes have been criticized for their failure to integrate…
  • LEXPLAIN: Improving Model Explanations via Lexicon Supervision

    Orevaoghene Ahia, Hila Gonen, Vidhisha Balachandran, Yulia Tsvetkov, Noah A. Smith. *SEM Proceedings, 2023. Model explanations that shed light on the model’s predictions are becoming a desired additional output of NLP models, alongside their predictions. Challenges in creating these explanations include making them trustworthy and faithful to the model’s…
  • When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories

    Alex Mallen, Akari Asai, Victor Zhong, R. Das, Daniel Khashabi, Hannaneh Hajishirzi. Annual Meeting of the Association for Computational Linguistics, 2023. Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the difficulty of encoding a wealth of world knowledge in their parameters. This paper aims to understand LMs…