Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.


A Logic for Expressing Log-Precision Transformers

William Merrill, Ashish Sabharwal
2023
NeurIPS

One way to interpret the reasoning power of transformer-based language models is to describe the types of logical rules they can resolve over some input text. Recently, Chiang et al. (2023) showed… 

Faith and Fate: Limits of Transformers on Compositionality

Nouha Dziri, Ximing Lu, Melanie Sclar, Yejin Choi
2023
NeurIPS

Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures… 

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

Zeqiu Wu, Yushi Hu, Weijia Shi, Hanna Hajishirzi
2023
NeurIPS

Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs. Reinforcement learning from human feedback (RLHF) - where human… 

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Hanna Hajishirzi
2023
NeurIPS

In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. Despite recent claims that open models can be on par with… 

RealTime QA: What's the Answer Right Now?

Jungo Kasai, Keisuke Sakaguchi, Yoichi Takahashi, Kentaro Inui
2023
NeurIPS

We introduce RealTime QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis (weekly in this version). RealTime QA inquires about the… 

SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality

Cheng-Yu Hsieh, Jieyu Zhang, Zixian Ma, Ranjay Krishna
2023
NeurIPS

In the last year alone, a surge of new benchmarks to measure compositional understanding of vision-language models has permeated the machine learning ecosystem. Given an image, these benchmarks… 

SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks

Bill Yuchen Lin, Yicheng Fu, Karina Yang, Xiang Ren
2023
NeurIPS

We introduce SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition, designed to excel in action planning for complex interactive reasoning tasks. SwiftSage… 

SciRepEval: A Multi-Format Benchmark for Scientific Document Representations

Amanpreet Singh, Mike D'Arcy, Arman Cohan, Sergey Feldman
2023
EMNLP

Learned representations of scientific documents can serve as valuable input features for downstream tasks without further fine-tuning. However, existing benchmarks for evaluating these… 

A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents

Benjamin Newman, Luca Soldaini, Raymond Fok, Kyle Lo
2023
EMNLP

Many real-world applications (e.g., note taking, search) require extracting a sentence or paragraph from a document and showing that snippet to a human outside of the source document. Yet, users may… 

Crystal: Introspective Reasoners Reinforced with Self-Feedback

Jiacheng Liu, Ramakanth Pasunuru, Hannaneh Hajishirzi, Asli Celikyilmaz
2023
EMNLP

Extensive work has shown that the performance and interpretability of commonsense reasoning can be improved via knowledge-augmented reasoning methods, where the knowledge that underpins the…