Papers

  • Detection and Measurement of Syntactic Templates in Generated Text

    Chantal Shaib, Yanai Elazar, Junyi Jessy Li, Byron C. Wallace. arXiv 2024. Recent work on evaluating the diversity of text generated by LLMs has focused on word-level features. Here we offer an analysis of syntactic features to characterize general repetition in models, beyond frequent n-grams. Specifically, we define syntactic…
  • Evaluating n-Gram Novelty of Language Models Using Rusty-DAWG

    William Merrill, Noah A. Smith, Yanai Elazar. arXiv 2024. How novel are texts generated by language models (LMs) relative to their training corpora? In this work, we investigate the extent to which modern LMs generate n-grams from their training data, evaluating both (i) the probability LMs assign to complete…
  • Evaluating In-Context Learning of Libraries for Code Generation

    Arkil Patel, Siva Reddy, Dzmitry Bahdanau, Pradeep Dasigi. NAACL 2024. Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability. A particularly promising area is their ability to interpret code modules from unfamiliar libraries for solving user-instructed tasks. Recent work…
  • The Bias Amplification Paradox in Text-to-Image Generation

    P. Seshadri, Sameer Singh, Yanai Elazar. NAACL 2024. Bias amplification is a phenomenon in which models increase imbalances present in the training data. In this paper, we study bias amplification in the text-to-image domain using Stable Diffusion by comparing gender ratios in training vs. generated images. We…
  • Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation

    Bar Iluz, Yanai Elazar, Asaf Yehudai, Gabriel Stanovsky. arXiv 2024. Most works on gender bias focus on intrinsic bias -- removing traces of information about a protected group from the model's internal representation. However, these works are often disconnected from the impact of such debiasing on downstream applications…
  • BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models

    Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi. ICLR 2024. Retrieval augmentation addresses many critical problems in large language models such as hallucination, staleness, and privacy leaks. However, running retrieval-augmented language models (LMs) is slow and difficult to scale due to processing large amounts of…
  • MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

    Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chun-yue Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng Gao. ICLR 2024. Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit impressive problem-solving skills in many tasks and domains, but their ability in mathematical reasoning in visual contexts has not been systematically studied. To bridge this gap, we…
  • Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

    Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi. ICLR 2024. Despite their remarkable capabilities, large language models (LLMs) often produce responses containing factual inaccuracies due to their sole reliance on the parametric knowledge they encapsulate. Retrieval-Augmented Generation (RAG), an ad hoc approach that…
  • SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore

    Sewon Min, Suchin Gururangan, Eric Wallace, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer. ICLR 2024. The legality of training language models (LMs) on copyrighted or otherwise restricted data is under intense debate. However, as we show, model performance significantly degrades if trained only on low-risk text (e.g., out-of-copyright books or government…
  • TRAM: Bridging Trust Regions and Sharpness Aware Minimization

    Tom Sherborne, Naomi Saphra, Pradeep Dasigi, Hao Peng. ICLR 2024. By reducing the curvature of the loss surface in the parameter space, Sharpness-aware minimization (SAM) yields widespread robustness improvement under domain transfer. Instead of focusing on parameters, however, this work considers the transferability of…