Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks

Bill Yuchen Lin, Yicheng Fu, Karina Yang, Xiang Ren

2023

NeurIPS

We introduce SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition, designed to excel in action planning for complex interactive reasoning tasks. SwiftSage…

SciRepEval: A Multi-Format Benchmark for Scientific Document Representations

Amanpreet Singh, Mike D'Arcy, Arman Cohan, Sergey Feldman

2023

EMNLP

Learned representations of scientific documents can serve as valuable input features for downstream tasks without further fine-tuning. However, existing benchmarks for evaluating these…

A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents

Benjamin Newman, Luca Soldaini, Raymond Fok, Kyle Lo

2023

EMNLP

Many real-world applications (e.g., note taking, search) require extracting a sentence or paragraph from a document and showing that snippet to a human outside of the source document. Yet, users may…

Crystal: Introspective Reasoners Reinforced with Self-Feedback

Jiacheng Liu, Ramakanth Pasunuru, Hannaneh Hajishirzi, Asli Celikyilmaz

2023

EMNLP

Extensive work has shown that the performance and interpretability of commonsense reasoning can be improved via knowledge-augmented reasoning methods, where the knowledge that underpins the…

Demystifying Prompts in Language Models via Perplexity Estimation

Hila Gonen, Srini Iyer, Terra Blevins, Luke Zettlemoyer

2023

EMNLP Findings

Language models can be prompted to perform a wide variety of zero- and few-shot learning problems. However, performance varies significantly with the choice of prompt, and we do not yet understand…

Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models

Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Yulia Tsvetkov

2023

EMNLP

Language models have graduated from being research prototypes to commercialized products offered as web APIs, and recent works have highlighted the multilingual capabilities of these products. The…

Editing Common Sense in Transformers

Anshita Gupta*, Debanjan Mondal*, Akshay Krishna Sheshadri*, Niket Tandon*

2023

EMNLP

Editing model parameters directly in Transformers makes updating open-source transformer-based models possible without re-training. However, these editing methods have only been evaluated on…

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

Sewon Min, Kalpesh Krishna, Xinxi Lyu, Hannaneh Hajishirzi

2023

EMNLP

Evaluating the factuality of long-form text generated by large language models (LMs) is non-trivial because (1) generations often contain a mixture of supported and unsupported pieces of…

FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions

Hyunwoo Kim, Melanie Sclar, Xuhui Zhou, Maarten Sap

2023

EMNLP

Theory of mind (ToM) evaluations currently focus on testing models using passive narratives that inherently lack interactivity. We introduce FANToM, a new benchmark designed to stress-test ToM…

Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy

Sarah Wiegreffe, Matthew Finlayson, Oyvind Tafjord, Ashish Sabharwal

2023

EMNLP

When pretrained language models (LMs) are applied to discriminative tasks such as multiple-choice questions, they place probability mass on vocabulary tokens that aren't among the given answer…
