Papers

  • MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

    Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng Gao. ICLR 2024. Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit impressive problem-solving skills in many tasks and domains, but their ability to perform mathematical reasoning in visual contexts has not been systematically studied. To bridge this gap, we…
  • Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

    Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi. ICLR 2024. Despite their remarkable capabilities, large language models (LLMs) often produce responses containing factual inaccuracies due to their sole reliance on the parametric knowledge they encapsulate. Retrieval-Augmented Generation (RAG), an ad hoc approach that…
  • SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore

    Sewon Min, Suchin Gururangan, Eric Wallace, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer. ICLR 2024. The legality of training language models (LMs) on copyrighted or otherwise restricted data is under intense debate. However, as we show, model performance significantly degrades if trained only on low-risk text (e.g., out-of-copyright books or government…
  • The Expressive Power of Transformers with Chain of Thought

    William Merrill, Ashish Sabharwal. ICLR 2024. Recent theoretical work has identified surprisingly simple reasoning problems, such as checking if two nodes in a graph are connected or simulating finite-state machines, that are provably unsolvable by standard transformers that answer immediately after…
  • TRAM: Bridging Trust Regions and Sharpness Aware Minimization

    Tom Sherborne, Naomi Saphra, Pradeep Dasigi, Hao Peng. ICLR 2024. By reducing the curvature of the loss surface in the parameter space, sharpness-aware minimization (SAM) yields widespread robustness improvement under domain transfer. Instead of focusing on parameters, however, this work considers the transferability of…
  • What's In My Big Data?

    Yanai Elazar, Akshita Bhagia, Ian Magnusson, Abhilasha Ravichander, Dustin Schwenk, Alane Suhr, Pete Walsh, Dirk Groeneveld, Luca Soldaini, Sameer Singh, Hanna Hajishirzi, Noah A. Smith, Jesse Dodge. ICLR 2024. Large text corpora are the backbone of language models. However, we have a limited understanding of the content of these corpora, including general statistics, quality, social factors, and inclusion of evaluation data (contamination). In this work, we propose…
  • CARE: Extracting Experimental Findings From Clinical Literature

    Aakanksha Naik, Bailey Kuehl, Erin Bransom, Doug Downey, Tom Hope. NAACL 2024. Extracting fine-grained experimental findings from literature can provide dramatic utility for scientific applications. Prior work has developed annotation schemas and datasets for limited aspects of this problem, failing to capture the real-world complexity…
  • Estimating the Causal Effect of Early ArXiving on Paper Acceptance

    Yanai Elazar, Jiayao Zhang, David Wadden, Boshen Zhang, Noah A. Smith. CLeaR 2024. What is the effect of releasing a preprint of a paper before it is submitted for peer review? No randomized controlled trial has been conducted, so we turn to observational data to answer this question. We use data from the ICLR conference (2018–2022) and…
  • The precipitation response to warming and CO2 increase: A comparison of a global storm resolving model and CMIP6 models.

    Ilai Guendelman, Timothy M. Merlis, Kai-Yuan Cheng, Lucas M. Harris, Christopher S. Bretherton, Max Bolot, Lin Zhou, Alex Kaltenbaugh, Spencer K. Clark, Stephan Fueglistaler. Geophysical Research Letters, 2024. Global storm-resolving models (GSRMs), which can explicitly resolve some deep convection, are now being integrated for climate timescales. GSRMs are able to simulate more realistic precipitation distributions than traditional CMIP6 models. In this study, we…
  • FigurA11y: AI Assistance for Writing Scientific Alt Text

    Nikhil Singh, Lucy Lu Wang, Jonathan Bragg. IUI 2024. High-quality alt text is crucial for making scientific figures accessible to blind and low-vision readers. Crafting complete, accurate alt text is challenging even for domain experts, as published figures often depict complex visual information and readers…