Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Luca Soldaini, Rodney Kinney, Akshita Bhagia, Kyle Lo
2024
ACL 2024

Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often… 

MARG: Multi-Agent Review Generation for Scientific Papers

Mike D'Arcy, Tom Hope, Larry Birnbaum, Doug Downey
2024
arXiv.org

We study the ability of LLMs to generate feedback for scientific papers and develop MARG, a feedback generation approach using multiple LLM instances that engage in internal discussion. By… 

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Hanna Hajishirzi
2023
NeurIPS

In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. Despite recent claims that open models can be on par with… 

SciRepEval: A Multi-Format Benchmark for Scientific Document Representations

Amanpreet Singh, Mike D'Arcy, Arman Cohan, Sergey Feldman
2023
EMNLP

Learned representations of scientific documents can serve as valuable input features for downstream tasks without further fine-tuning. However, existing benchmarks for evaluating these… 

A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents

Benjamin Newman, Luca Soldaini, Raymond Fok, Kyle Lo
2023
EMNLP

Many real-world applications (e.g., note taking, search) require extracting a sentence or paragraph from a document and showing that snippet to a human outside of the source document. Yet, users may… 

PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents

Kyle Lo, Zejiang Shen, Benjamin Newman, Luca Soldaini
2023
EMNLP

Despite growing interest in applying natural language processing (NLP) and computer vision (CV) models to the scholarly domain, scientific documents remain challenging to work with. They’re often in… 

RCT Rejection Sampling for Causal Estimation Evaluation

Katherine A. Keith, Sergey Feldman, David Jurgens, Rohit Bhattacharya
2023
Transactions on Machine Learning Research

Confounding is a significant obstacle to unbiased estimation of causal effects from observational data. For settings with high-dimensional covariates -- such as text data, genomics, or the… 

CHAMP: Efficient Annotation and Consolidation of Cluster Hierarchies

Arie Cattan, Tom Hope, Doug Downey, Ido Dagan
2023
EMNLP

Various NLP tasks require a complex hierarchical structure over nodes, where each node is a cluster of items. Examples include generating entailment graphs, hierarchical cross-document coreference… 

CARE: Extracting Experimental Findings From Clinical Literature

Aakanksha Naik, Bailey Kuehl, Erin Bransom, Tom Hope
2023
arXiv.org

Extracting fine-grained experimental findings from literature can provide massive utility for scientific applications. Prior work has focused on developing annotation schemas and datasets for… 

LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks

Mihir Parmar, Aakanksha Naik, Himanshu Gupta, Chitta Baral
2023
arXiv.org

Many large language models (LLMs) for medicine have largely been evaluated on short texts, and their ability to handle longer sequences such as a complete electronic health record (EHR) has not been…