An abstract illustration of swirling shapes, meant to denote a futuristic feeling.

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

PaperWeaver: Enriching Topical Paper Alerts by Contextualizing Recommended Papers with User-collected Papers

Yoonjoo LeeHyeonsu B KangMatt LatzkePao Siangliulue

2024

CHI

With the rapid growth of scholarly archives, researchers subscribe to"paper alert"systems that periodically provide them with recommendations of recently published papers that are similar to…

CARE: Extracting Experimental Findings From Clinical Literature

Aakanksha NaikBailey KuehlErin BransomTom Hope

2024

NAACL 2024

Extracting fine-grained experimental findings from literature can provide dramatic utility for scientific applications. Prior work has developed annotation schemas and datasets for limited aspects…

Estimating the Causal Effect of Early ArXiving on Paper Acceptance

Yanai ElazarJiayao ZhangDavid WaddenNoah A. Smith

2024

CLearR

What is the effect of releasing a preprint of a paper before it is submitted for peer review? No randomized controlled trial has been conducted, so we turn to observational data to answer this…

FigurA11y: AI Assistance for Writing Scientific Alt Text

Nikhil SinghLucy Lu WangJonathan Bragg

2024

IUI

High-quality alt text is crucial for making scientific figures accessible to blind and low-vision readers. Crafting complete, accurate alt text is challenging even for domain experts, as published…

OLMo: Accelerating the Science of Language Models

Dirk GroeneveldIz BeltagyPete WalshHanna Hajishirzi

2024

ACL 2024

Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off,…

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Luca SoldainiRodney KinneyAkshita BhagiaKyle Lo

2024

ACL 2024

Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often…

MARG: Multi-Agent Review Generation for Scientific Papers

Mike D'ArcyTom HopeLarry BirnbaumDoug Downey

2024

arXiv.org

We study the ability of LLMs to generate feedback for scientific papers and develop MARG, a feedback generation approach using multiple LLM instances that engage in internal discussion. By…

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

Yizhong WangHamish IvisonPradeep DasigiHanna Hajishirzi

2023

NeurIPS

In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. Despite recent claims that open models can be on par with…

SciRepEval: A Multi-Format Benchmark for Scientific Document Representations

Amanpreet SinghMike D'ArcyArman CohanSergey Feldman

2023

EMNLP

Learned representations of scientific documents can serve as valuable input features for downstream tasks without further fine-tuning. However, existing benchmarks for evaluating these…

A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents

Benjamin NewmanLuca SoldainiRaymond FokKyle Lo

2023

EMNLP

Many real-world applications (e.g., note taking, search) require extracting a sentence or paragraph from a document and showing that snippet to a human outside of the source document. Yet, users may…

Previous11-20Next