
Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena

Jiangjie Chen, Siyu Yuan, Rong Ye, Kyle Richardson

2023

arXiv

Can Large Language Models (LLMs) simulate human behavior in complex environments? LLMs have recently been shown to exhibit advanced reasoning skills but much of NLP evaluation still relies on static…

SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding

Favyen Bastani, Piper Wolters, Ritwik Gupta, Aniruddha Kembhavi

2023

ICCV

Remote sensing images are useful for a wide variety of planet monitoring applications, from tracking deforestation to tackling illegal fishing. The Earth is extremely diverse -- the amount of…

Making Retrieval-Augmented Language Models Robust to Irrelevant Context

Ori Yoran, Tomer Wolfson, Ori Ram, Jonathan Berant

2023

ICLR

Retrieval-augmented language models (RALMs) hold promise to produce language understanding systems that are factual, efficient, and up-to-date. An important desideratum of RALMs is that…

The Surveillance AI Pipeline

Pratyusha Ria Kalluri, William Agnew, M. Cheng, A. Birhane

2023

arXiv

A rapidly growing number of voices have argued that AI research, and computer vision in particular, is closely tied to mass surveillance. Yet the direct path from computer vision research to…

When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets

Orion Weller, Kyle Lo, David Wadden, Luca Soldaini

2023

arXiv

Using large language models (LMs) for query or document expansion can improve generalization in information retrieval. However, it is unknown whether these techniques are universally beneficial or…

A machine learning parameterization of clouds in a coarse-resolution climate model for unbiased radiation

Brian Henn, Y. R. Jauregui, Spencer K. Clark, C. Bretherton

2023

ESSOAr

Coarse-grid weather and climate models rely particularly on parameterizations of cloud fields, and coarse-grained cloud fields from a fine-grid reference model are a natural target for a…

PromptCap: Prompt-Guided Task-Aware Image Captioning

Yushi Hu, Hang Hua, Zhengyuan Yang, Jiebo Luo

2023

ICCV • Proceedings

Knowledge-based visual question answering (VQA) involves questions that require world knowledge beyond the image to yield the correct answer. Large language models (LMs) like GPT-3 are particularly…

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

Yushi Hu, Benlin Liu, Jungo Kasai, Noah A. Smith

2023

ICCV • Proceedings

Despite thousands of researchers, engineers, and artists actively working on improving text-to-image generation models, systems often fail to produce images that accurately align with the text…

Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations

Nirbhay Modhe, Qiaozi Gao, A. Kalyan, G. Sukhatme

2023

arXiv

Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation by conservative value estimation -- penalizing values of unseen states and actions. Model-free…

Bound by the Bounty: Collaboratively Shaping Evaluation Processes for Queer AI Harms

Organizer of Queer In AI, Nathaniel Dennler, Anaelia Ovalle, Jessica de Jesus de Pinho Pinhal

2023

AIES

Bias evaluation benchmarks and dataset and model documentation have emerged as central processes for assessing the biases and harms of artificial intelligence (AI) systems. However, these auditing…
