Papers

  • Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations

    Xinxi Lyu, Sewon Min, Iz Beltagy, Luke Zettlemoyer, Hannaneh Hajishirzi. ACL 2023. Although large language models can be prompted for both zero- and few-shot learning, performance drops significantly when no demonstrations are available. In this paper, we introduce Z-ICL, a new zero-shot method that closes the gap by constructing pseudo… (a sketch of the pseudo-demonstration idea follows this list).
  • Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback

    Yao Fu, Hao Peng, Tushar Khot, Mirella Lapata. arXiv 2023. We study whether multiple large language models (LLMs) can autonomously improve each other in a negotiation game by playing, reflecting, and criticizing. We are interested in this question because if LLMs were able to improve each other, it would imply the…
  • LeTI: Learning to Generate from Textual Interactions

    Xingyao Wang, Hao Peng, Reyhaneh Jabbarvand, Heng Ji. arXiv 2023. Finetuning pre-trained language models (LMs) enhances the models' capabilities. Prior techniques fine-tune a pre-trained LM on input-output pairs (e.g., instruction fine-tuning), or with numerical rewards that gauge the quality of its outputs (e.g…
  • TESS: Text-to-Text Self-Conditioned Simplex Diffusion

    Rabeeh Karimi Mahabadi, Jaesung Tae, Hamish Ivison, J. Henderson, Iz Beltagy, Matthew E. Peters, Arman Cohan. arXiv 2023. Diffusion models have emerged as a powerful paradigm for generation, obtaining strong performance in various domains with continuous-valued inputs. Despite the promises of fully non-autoregressive text generation, applying diffusion models to natural language…
  • Binding Language Models in Symbolic Languages

    Zhoujun Cheng, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu. ICLR 2023. Though end-to-end neural approaches have recently been dominating NLP tasks in both performance and ease-of-use, they lack interpretability and robustness. We propose Binder, a training-free neural-symbolic framework that maps the task input to a program…
  • Complexity-Based Prompting for Multi-Step Reasoning

    Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, Tushar Khot. ICLR 2023. We study the task of prompting large-scale language models to perform multi-step reasoning. Existing work shows that when prompted with a chain of thoughts (CoT), sequences of short sentences describing intermediate reasoning steps towards a final answer… (a sketch of complexity-based exemplar selection follows this list).
  • Editing Models with Task Arithmetic

    Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi. ICLR 2023. Changing how pre-trained models behave -- e.g., improving their performance on a downstream task or mitigating biases learned during pre-training -- is a common practice when developing machine learning systems. In this work, we propose a new paradigm for… (a sketch of the task-vector idea follows this list).
  • InSCIt: Information-Seeking Conversations with Mixed-Initiative Interactions

    Zeqiu Wu, Ryu Parish, Hao Cheng, Sewon Min, Prithviraj Ammanabrolu, Mari Ostendorf, Hannaneh Hajishirzi. TACL 2023. In an information-seeking conversation, a user may ask questions that are under-specified or unanswerable. An ideal agent would interact by initiating different response types according to the available knowledge sources. However, most current studies either…
  • Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

    Rajkumar Ramamurthy, Prithviraj Ammanabrolu, Kianté Brantley, Jack Hessel, Rafet Sifa, Christian Bauckhage, Hannaneh Hajishirzi, Yejin Choi. ICLR 2023. We tackle the problem of aligning pre-trained large language models (LMs) with human preferences. If we view text generation as a sequential decision-making problem, reinforcement learning (RL) appears to be a natural conceptual framework. However, using RL…
  • LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization

    Kalpesh Krishna, Erin Bransom, Bailey Kuehl, Mohit Iyyer, Pradeep Dasigi, Arman Cohan, Kyle Lo. EACL 2023. While human evaluation remains best practice for accurately judging the faithfulness of automatically-generated summaries, few solutions exist to address the increased difficulty and workload when evaluating long-form summaries. Through a survey of 162 papers…
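
A minimal sketch of the pseudo-demonstration idea from the Z-ICL entry above, assuming nearest-neighbor retrieval over a raw text corpus via a sentence-embedding function and random pairing with label names. The function names, retrieval scheme, and label pairing here are illustrative assumptions from the abstract snippet, not the paper's exact construction.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def build_pseudo_demonstrations(test_input, corpus, embed, labels, k=4):
    """Retrieve the k corpus sentences nearest to the test input and pair
    each with a randomly sampled label name (pseudo-demonstrations).
    `embed` and `labels` are hypothetical inputs for illustration."""
    query = embed(test_input)
    neighbors = sorted(corpus, key=lambda s: cosine(embed(s), query),
                       reverse=True)[:k]
    return [(s, random.choice(labels)) for s in neighbors]

def format_prompt(demos, test_input):
    """Concatenate pseudo-demonstrations and the test input into a prompt,
    so a zero-shot query looks like a few-shot one."""
    blocks = [f"Input: {x}\nLabel: {y}" for x, y in demos]
    blocks.append(f"Input: {test_input}\nLabel:")
    return "\n\n".join(blocks)
```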
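A sketch of complexity-based exemplar selection from the Complexity-Based Prompting entry, assuming complexity is approximated by counting newline-separated reasoning steps and that final answers are voted over the most complex sampled chains; `extract_answer` and the candidate/sample inputs are hypothetical placeholders, not the paper's interface.

```python
from collections import Counter

def complexity(chain: str) -> int:
    """Approximate reasoning complexity as the number of non-empty lines
    (one line per intermediate reasoning step)."""
    return sum(1 for line in chain.splitlines() if line.strip())

def select_exemplars(candidates, n=8):
    """Keep the n (question, chain-of-thought) pairs whose chains have the
    most steps, to use as in-context exemplars."""
    return sorted(candidates, key=lambda qa: complexity(qa[1]),
                  reverse=True)[:n]

def vote_over_complex_chains(samples, extract_answer, top_k=5):
    """Majority-vote over answers extracted from the top_k most complex
    sampled chains (complexity-based voting at decoding time)."""
    top = sorted(samples, key=complexity, reverse=True)[:top_k]
    answers = [extract_answer(chain) for chain in top]
    return Counter(answers).most_common(1)[0][0]
```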
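A sketch of the task-vector idea from the Editing Models with Task Arithmetic entry: a task vector is the element-wise difference between fine-tuned and pre-trained weights, and scaled sums or negations of task vectors are added back to the pre-trained weights. The state-dict representation and the scaling coefficient `lam` are a plain reading of the abstract, not the paper's full recipe.

```python
import torch

def task_vector(pretrained: dict, finetuned: dict) -> dict:
    """tau = theta_finetuned - theta_pretrained, per parameter tensor."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_vectors(pretrained: dict, vectors: list, lam: float = 1.0) -> dict:
    """theta_new = theta_pretrained + lam * sum_i tau_i."""
    edited = {k: v.clone() for k, v in pretrained.items()}
    for tau in vectors:
        for k in edited:
            edited[k] += lam * tau[k]
    return edited

def negate(tau: dict) -> dict:
    """Flip a task vector; adding -tau removes what fine-tuning learned."""
    return {k: -v for k, v in tau.items()}
```

Negating a task vector before adding it back corresponds to the abstract's "mitigating biases" use case, while summing several task vectors composes behavior from multiple fine-tunes.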