Papers

Viewing 11-20 of 1033 papers
  • MacGyver: Are Large Language Models Creative Problem Solvers?

    Yufei Tian, Abhilasha Ravichander, Lianhui Qin, Ronan Le Bras, Raja Marjieh, Nanyun Peng, Yejin Choi, Thomas L. Griffiths, Faeze Brahman. NAACL 2024. We explore the creative problem-solving capabilities of modern LLMs in a novel constrained setting. To this end, we create MACGYVER, an automatically generated dataset consisting of over 1,600 real-world problems deliberately designed to trigger innovative…
  • NeuroComparatives: Neuro-Symbolic Distillation of Comparative Knowledge

    Phillip Howard, Junlin Wang, Vasudev Lal, Gadi Singer, Yejin Choi, Swabha Swayamdipta. NAACL 2024. Comparative knowledge (e.g., steel is stronger and heavier than styrofoam) is an essential component of our world knowledge, yet understudied in prior literature. In this paper, we harvest the dramatic improvements in knowledge capabilities of language models…
  • On-the-fly Definition Augmentation of LLMs for Biomedical NER

    Monica Munnangi, Sergey Feldman, Byron C. Wallace, Silvio Amir, Tom Hope, Aakanksha Naik. NAACL 2024. Despite their general capabilities, LLMs still struggle on biomedical NER tasks, which are difficult due to the presence of specialized terminology and lack of training data. In this work we set out to improve LLM performance on biomedical NER in limited data…
  • Personalized Jargon Identification for Enhanced Interdisciplinary Communication

    Yue Guo, Joseph Chee Chang, Maria Antoniak, Erin Bransom, Trevor Cohen, Lucy Lu Wang, Tal August. NAACL 2024. Scientific jargon can impede researchers when they read materials from other domains. Current methods of jargon identification mainly use corpus-level familiarity indicators (e.g., Simple Wikipedia represents plain language). However, researchers' familiarity…
  • Promptly Predicting Structures: The Return of Inference

    Maitrey Mehta, Valentina Pyatkin, Vivek Srikumar. NAACL 2024. Prompt-based methods have been used extensively across NLP to build zero- and few-shot label predictors. Many NLP tasks are naturally structured: that is, their outputs consist of multiple labels which constrain each other. Annotating data for such tasks can…
  • QualEval: Qualitative Evaluation for Model Improvement

    Vishvak Murahari, Ameet Deshpande, Peter Clark, Tanmay Rajpurohit, Ashish Sabharwal, Karthik Narasimhan, Ashwin Kalyan. NAACL 2024. Quantitative evaluation metrics have traditionally been pivotal in gauging the advancements of artificial intelligence systems, including large language models (LLMs). However, these metrics have inherent limitations. Given the intricate nature of real-world…
  • The Bias Amplification Paradox in Text-to-Image Generation

    P. Seshadri, Sameer Singh, Yanai Elazar. NAACL 2024. Bias amplification is a phenomenon in which models increase imbalances present in the training data. In this paper, we study bias amplification in the text-to-image domain using Stable Diffusion by comparing gender ratios in training vs. generated images. We…
  • UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations

    Wenting Zhao, Justin T. Chiu, Jena D. Hwang, Faeze Brahman, Jack Hessel, Sanjiban Choudhury, Yejin Choi, Xiang Lorraine Li, Alane Suhr. NAACL 2024. Language technologies that accurately model the dynamics of events must perform commonsense reasoning. Existing work evaluating commonsense reasoning focuses on making inferences about common, everyday situations. To instead investigate the ability to model…
  • SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals

    Ruihan Yang, Jiangjie Chen, Yikai Zhang, Siyu Yuan, Aili Chen, Kyle Richardson, Yanghua Xiao, Deqing Yang. Technical report, 2024. Language agents powered by large language models (LLMs) are increasingly valuable as decision-making tools in domains such as gaming and programming. However, these agents often face challenges in achieving high-level goals without detailed instructions and…
  • Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation

    Bar Iluz, Yanai Elazar, Asaf Yehudai, Gabriel Stanovsky. arXiv 2024. Most works on gender bias focus on intrinsic bias -- removing traces of information about a protected group from the model's internal representation. However, these works are often disconnected from the impact of such debiasing on downstream applications…