Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

Editing Common Sense in Transformers

Anshita Gupta*, Debanjan Mondal*, Akshay Krishna Sheshadri*, Niket Tandon*
2023
EMNLP

Editing model parameters directly in Transformers makes updating open-source transformer-based models possible without re-training. However, these editing methods have only been evaluated on… 

Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models

Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Yulia Tsvetkov
2023
EMNLP

Language models have graduated from being research prototypes to commercialized products offered as web APIs, and recent works have highlighted the multilingual capabilities of these products. The… 

Demystifying Prompts in Language Models via Perplexity Estimation

Hila Gonen, Srini Iyer, Terra Blevins, Luke Zettlemoyer
2023
EMNLP Findings

Language models can be prompted to perform a wide variety of zero- and few-shot learning problems. However, performance varies significantly with the choice of prompt, and we do not yet understand… 

Measuring and Narrowing the Compositionality Gap in Language Models

Ofir Press, Muru Zhang, Sewon Min, Mike Lewis
2023
EMNLP Findings

We investigate the ability of language models to perform compositional reasoning tasks where the overall solution depends on correctly composing the answers to sub-problems. We measure how often… 

Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning

Ximing Lu, Faeze Brahman, Peter West, Yejin Choi
2023
EMNLP

Large language models excel at a variety of language tasks when prompted with examples or instructions. Yet controlling these models through prompting alone is limited. Tailoring language models… 

FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions

Hyunwoo Kim, Melanie Sclar, Xuhui Zhou, Maarten Sap
2023
EMNLP

Theory of mind (ToM) evaluations currently focus on testing models using passive narratives that inherently lack interactivity. We introduce FANToM, a new benchmark designed to stress-test ToM… 

PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents

Kyle Lo, Zejiang Shen, Benjamin Newman, Luca Soldaini
2023
EMNLP

Despite growing interest in applying natural language processing (NLP) and computer vision (CV) models to the scholarly domain, scientific documents remain challenging to work with. They’re often in… 

A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents

Benjamin Newman, Luca Soldaini, Raymond Fok, Kyle Lo
2023
EMNLP

Many real-world applications (e.g., note taking, search) require extracting a sentence or paragraph from a document and showing that snippet to a human outside of the source document. Yet, users may… 

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

Sewon Min, Kalpesh Krishna, Xinxi Lyu, Hannaneh Hajishirzi
2023
EMNLP

Evaluating the factuality of long-form text generated by large language models (LMs) is non-trivial because (1) generations often contain a mixture of supported and unsupported pieces of… 

TaskWeb: Selecting Better Source Tasks for Multi-task NLP

Joongwon Kim, Akari Asai, Gabriel Ilharco, Hannaneh Hajishirzi
2023
EMNLP

Recent work in NLP has shown promising results in training models on large amounts of tasks to achieve better generalization. However, it is not well-understood how tasks are related, and how…