Papers
Data-Efficient Finetuning Using Cross-Task Nearest Neighbors
Hamish Ivison, Noah A. Smith, Hannaneh Hajishirzi, Pradeep Dasigi
ACL Findings • 2023
Language models trained on massive prompted multitask datasets like T0 (Sanh et al., 2021) or FLAN (Wei et al., 2021a) can generalize to tasks unseen during training. We show that training on a carefully chosen subset of instances can outperform training on…

HINT: Hypernetwork Instruction Tuning for Efficient Few- and Zero-Shot Generalisation
Hamish Ivison, Akshita Bhagia, Yizhong Wang, Hannaneh Hajishirzi, Matthew E. Peters
ACL • 2023
Recent NLP models have shown the remarkable ability to effectively generalise 'zero-shot' to new tasks using only natural language instructions as guidance. However, many of these approaches suffer from high computational costs due to their reliance on…

Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and Evaluation
Marius Mosbach, Tiago Pimentel, Shauli Ravfogel, D. Klakow, Yanai Elazar
Findings of ACL • 2023
Few-shot fine-tuning and in-context learning are two alternative strategies for task adaptation of pre-trained language models. Recently, in-context learning has gained popularity over fine-tuning due to its simplicity and improved out-of-domain…

Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback
Yao Fu, Hao-Chun Peng, Tushar Khot, Mirella Lapata
arXiv • 2023
We study whether multiple large language models (LLMs) can autonomously improve each other in a negotiation game by playing, reflecting, and criticizing. We are interested in this question because if LLMs were able to improve each other, it would imply the…

Complexity-Based Prompting for Multi-Step Reasoning
Yao Fu, Hao-Chun Peng, Ashish Sabharwal, Peter Clark, Tushar Khot
ICLR • 2023
We study the task of prompting large-scale language models to perform multi-step reasoning. Existing work shows that when prompted with a chain of thoughts (CoT), sequences of short sentences describing intermediate reasoning steps towards a final answer…

LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
Kalpesh Krishna, Erin Bransom, Bailey Kuehl, Mohit Iyyer, Pradeep Dasigi, Arman Cohan, Kyle Lo
EACL • 2023
While human evaluation remains best practice for accurately judging the faithfulness of automatically-generated summaries, few solutions exist to address the increased difficulty and workload when evaluating long-form summaries. Through a survey of 162 papers…

Do Embodied Agents Dream of Pixelated Sheep?: Embodied Decision Making using Language Guided World Modelling
Kolby Nottingham, Prithviraj Ammanabrolu, Alane Suhr, Yejin Choi, Hanna Hajishirzi, Sameer Singh, Roy Fox
arXiv • 2023
Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world, which makes learning complex tasks with sparse rewards difficult. If initialized with knowledge of high-level subgoals and transitions between subgoals, RL…

Does progress on ImageNet transfer to real-world datasets?
Alexander W. Fang, Simon Kornblith, Ludwig Schmidt
arXiv • 2023
Does progress on ImageNet transfer to real-world datasets? We investigate this question by evaluating ImageNet pre-trained models with varying accuracy (57%-83%) on six practical image classification datasets. In particular, we study datasets collected with…

Reproducible scaling laws for contrastive language-image learning
Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, J. Jitsev
arXiv • 2022
Scaling up neural networks has led to remarkable performance across a wide range of tasks. Moreover, performance often follows reliable scaling laws as a function of training set size, model size, and compute, which offers valuable guidance as large-scale…

Continued Pretraining for Better Zero- and Few-Shot Promptability
Zhaofeng Wu, Robert L. Logan IV, Pete Walsh, Akshita Bhagia, Dirk Groeneveld, Sameer Singh, Iz Beltagy
EMNLP • 2022
Recently introduced language model prompting methods can achieve high accuracy in zero- and few-shot settings while requiring few to no learned task-specific parameters. Nevertheless, these methods still often trail behind full model finetuning. In this work…