Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
On-the-fly Definition Augmentation of LLMs for Biomedical NER
Despite their general capabilities, LLMs still struggle on biomedical NER tasks, which are difficult due to the presence of specialized terminology and lack of training data. In this work we set out…
Personalized Jargon Identification for Enhanced Interdisciplinary Communication
Scientific jargon can impede researchers when they read materials from other domains. Current methods of jargon identification mainly use corpus-level familiarity indicators (e.g., Simple Wikipedia…
Promptly Predicting Structures: The Return of Inference
Prompt-based methods have been used extensively across NLP to build zero- and few-shot label predictors. Many NLP tasks are naturally structured: that is, their outputs consist of multiple labels…
QualEval: Qualitative Evaluation for Model Improvement
Quantitative evaluation metrics have traditionally been pivotal in gauging the advancements of artificial intelligence systems, including large language models (LLMs). However, these metrics have…
The Bias Amplification Paradox in Text-to-Image Generation
Bias amplification is a phenomenon in which models increase imbalances present in the training data. In this paper, we study bias amplification in the text-to-image domain using Stable Diffusion by…
To Tell The Truth: Language of Deception and Language Models
Text-based false information permeates online discourses, yet evidence of people’s ability to discern truth from such deceptive textual content is scarce. We analyze novel TV game show data where…
UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations
Language technologies that accurately model the dynamics of events must perform commonsense reasoning. Existing work evaluating commonsense reasoning focuses on making inferences about common,…
Let's Get to the Point: LLM-Supported Planning, Drafting, and Revising of Research-Paper Blog Posts
Research-paper blog posts help scientists disseminate their work to a larger audience, but translating long scientific documents into long-form summaries like blog posts raises unique challenges:…
OLMES: A Standard for Language Model Evaluations
Progress in AI is often demonstrated by new models claiming improved performance on tasks measuring model capabilities. Evaluating language models in particular is challenging, as small changes to…
SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals
Language agents powered by large language models (LLMs) are increasingly valuable as decision-making tools in domains such as gaming and programming. However, these agents often face challenges in…