Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.


Plausibly Problematic Questions in Multiple-Choice Benchmarks for Commonsense Reasoning

Shramay Palta, Nishant Balepur, Peter Rankel, Rachel Rudinger
2024
EMNLP Findings

Questions involving commonsense reasoning about everyday situations often admit many possible or plausible answers. In contrast, multiple-choice question (MCQ) benchmarks for commonsense reasoning… 

Scalable Data Ablation Approximations for Language Models through Modular Training and Merging

Clara Na, Ian Magnusson, Ananya Harsh Jha, Pradeep Dasigi
2024
EMNLP

Training data compositions for Large Language Models (LLMs) can significantly affect their downstream performance. However, a thorough data ablation study exploring large sets of candidate data… 

SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories

Ben Bogin, Kejuan Yang, Shashank Gupta, Tushar Khot
2024
EMNLP

Given that Large Language Models (LLMs) have made significant progress in writing code, can they now be used to autonomously reproduce results from research repositories? Such a capability would be… 

IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions

Wenhao Yu, Meng Jiang, Peter Clark, Ashish Sabharwal
2023
EMNLP

Although counterfactual reasoning is a fundamental aspect of intelligence, the lack of large-scale counterfactual open-domain question-answering (QA) benchmarks makes it difficult to evaluate and… 

SciRepEval: A Multi-Format Benchmark for Scientific Document Representations

Amanpreet Singh, Mike D'Arcy, Arman Cohan, Sergey Feldman
2023
EMNLP

Learned representations of scientific documents can serve as valuable input features for downstream tasks without further fine-tuning. However, existing benchmarks for evaluating these… 

A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents

Benjamin Newman, Luca Soldaini, Raymond Fok, Kyle Lo
2023
EMNLP

Many real-world applications (e.g., note taking, search) require extracting a sentence or paragraph from a document and showing that snippet to a human outside of the source document. Yet, users may… 

Crystal: Introspective Reasoners Reinforced with Self-Feedback

Jiacheng Liu, Ramakanth Pasunuru, Hannaneh Hajishirzi, Asli Celikyilmaz
2023
EMNLP

Extensive work has shown that the performance and interpretability of commonsense reasoning can be improved via knowledge-augmented reasoning methods, where the knowledge that underpins the… 

Demystifying Prompts in Language Models via Perplexity Estimation

Hila Gonen, Srini Iyer, Terra Blevins, Luke Zettlemoyer
2023
EMNLP Findings

Language models can be prompted to perform a wide variety of zero- and few-shot learning problems. However, performance varies significantly with the choice of prompt, and we do not yet understand… 

Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models

Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Yulia Tsvetkov
2023
EMNLP

Language models have graduated from being research prototypes to commercialized products offered as web APIs, and recent works have highlighted the multilingual capabilities of these products. The… 

Editing Common Sense in Transformers

Anshita Gupta*, Debanjan Mondal*, Akshay Krishna Sheshadri*, Niket Tandon*
2023
EMNLP

Directly editing model parameters in Transformers makes it possible to update open-source transformer-based models without re-training. However, these editing methods have only been evaluated on…