Papers

  • Kilometer-scale global warming simulations and active sensors reveal changes in tropical deep convection

    Maximilien Bolot, Lucas M. Harris, Kai-Yuan Cheng, Timothy M. Merlis, Peter N. Blossey, Christopher S. Bretherton, Spencer K. Clark, Alex Kaltenbaugh, Linjiong Zhou, Stephan Fueglistaler. npj Climate and Atmospheric Science, 2023. Changes in tropical deep convection with global warming are a leading source of uncertainty for future climate projections. A comparison of the responses of active sensor measurements of cloud ice to interannual variability and next-generation global storm…
  • ACE: A fast, skillful learned global atmospheric model for climate prediction

    Oliver Watt‐Meyer, Gideon Dresdner, J. McGibbon, Spencer K. Clark, Brian Henn, James Duncan, Noah Brenowitz, K. Kashinath, Michael S. Pritchard, B. Bonev, Matthew E. Peters, Christopher S. Bretherton. NeurIPS • Tackling Climate Change with Machine Learning, 2023. Existing ML-based atmospheric models are not suitable for climate prediction, which requires long-term stability and physical consistency. We present ACE (AI2 Climate Emulator), a 200M-parameter, autoregressive machine learning emulator of an existing…
  • IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions

    Wenhao Yu, Meng Jiang, Peter Clark, Ashish Sabharwal. EMNLP, 2023. Although counterfactual reasoning is a fundamental aspect of intelligence, the lack of large-scale counterfactual open-domain question-answering (QA) benchmarks makes it difficult to evaluate and improve models on this ability. To address this void, we…
  • Probabilistic Precipitation Downscaling with Optical Flow-Guided Diffusion

    Prakhar Srivastava, Ruihan Yang, Gavin Kerrigan, Gideon Dresdner, Jeremy McGibbon, Christopher Bretherton, S. Mandt. arXiv, 2023. In climate science and meteorology, local precipitation predictions are limited by the immense computational costs induced by the high spatial resolution that simulation methods require. A common workaround is statistical downscaling (aka superresolution…
  • Self-Refine: Iterative Refinement with Self-Feedback

    Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, K. Hermann, S. Welleck, A. Yazdanbakhsh, Peter Clark. NeurIPS, 2023. Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback…
  • A Logic for Expressing Log-Precision Transformers

    William Merrill, Ashish Sabharwal. NeurIPS, 2023. One way to interpret the reasoning power of transformer-based language models is to describe the types of logical rules they can resolve over some input text. Recently, Chiang et al. (2023) showed that finite-precision transformers can be equivalently…
  • Faith and Fate: Limits of Transformers on Compositionality

    Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, S. Welleck, Xiang Ren, Allyson Ettinger, Zaïd Harchaoui, Yejin Choi. NeurIPS, 2023. Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures on surprisingly trivial problems. This begs the question…
  • Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

    Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hanna Hajishirzi. NeurIPS, 2023. Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs. Reinforcement learning from human feedback (RLHF) - where human preference judgments on LM outputs are transformed into a…
  • How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

    Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelsey MacMillan, Noah A. Smith, Iz Beltagy, Hanna Hajishirzi. NeurIPS, 2023. In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. Despite recent claims that open models can be on par with state-of-the-art proprietary models, these claims are often accompanied…
  • RealTime QA: What's the Answer Right Now?

    Jungo Kasai, Keisuke Sakaguchi, Yoichi Takahashi, Ronan Le Bras, Akari Asai, Xinyan Velocity Yu, Dragomir R. Radev, Noah A. Smith, Yejin Choi, Kentaro Inui. NeurIPS, 2023. We introduce RealTime QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis (weekly in this version). RealTime QA inquires about the current world, and QA systems need to answer questions about…