Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

Unsupervised Distillation of Syntactic Information from Contextualized Word Representations

Shauli Ravfogel, Yanai Elazar, Jacob Goldberger, Yoav Goldberg
EMNLP • BlackboxNLP Workshop

Contextualized word representations, such as ELMo and BERT, were shown to perform well on various semantic and syntactic tasks. In this work, we tackle the task of unsupervised disentanglement…

PySBD: Pragmatic Sentence Boundary Disambiguation

Nipun Sadvilkar, M. Neumann
EMNLP • NLP-OSS Workshop

In this paper, we present a rule-based sentence boundary disambiguation Python package that works out-of-the-box for 22 languages. We aim to provide a realistic segmenter which can provide logical… 

Document-Level Definition Detection in Scholarly Documents: Existing Models, Error Analyses, and Future Directions

Dongyeop Kang, Andrew Head, Risham Sidhu, Marti A. Hearst
EMNLP • SDP Workshop

The task of definition detection is important for scholarly papers, because papers often make use of technical terminology that may be unfamiliar to readers. Despite prior work on definition… 

The Extraordinary Failure of Complement Coercion Crowdsourcing

Yanai Elazar, Victoria Basmov, Shauli Ravfogel, Reut Tsarfaty
EMNLP • Insights from Negative Results in NLP Workshop

Crowdsourcing has eased and scaled up the collection of linguistic annotation in recent years. In this work, we follow known methodologies of collecting labeled data for the complement coercion… 

A Simple Yet Strong Pipeline for HotpotQA

Dirk Groeneveld, Tushar Khot, Mausam, Ashish Sabharwal

State-of-the-art models for multi-hop question answering typically augment large-scale language models like BERT with additional, intuitively useful capabilities such as named entity recognition,… 

UnifiedQA: Crossing Format Boundaries With a Single QA System

Daniel Khashabi, Sewon Min, Tushar Khot, Hannaneh Hajishirzi
Findings of EMNLP

Question answering (QA) tasks have been posed using a variety of formats, such as extractive span selection, multiple choice, etc. This has led to format-specialized models, and even to an implicit… 

Fact or Fiction: Verifying Scientific Claims

David Wadden, Kyle Lo, Lucy Lu Wang, Hannaneh Hajishirzi

We introduce the task of scientific fact-checking. Given a corpus of scientific articles and a claim about a scientific finding, a fact-checking model must identify abstracts that support or refute… 

TLDR: Extreme Summarization of Scientific Documents

Isabel Cachola, Kyle Lo, Arman Cohan, Daniel S. Weld
Findings of EMNLP

We introduce TLDR generation for scientific papers, a new automatic summarization task with high source compression, requiring expert background knowledge and complex language understanding. To… 

SciSight: Combining faceted navigation and research group detection for COVID-19 exploratory scientific search

Tom Hope, Jason Portenoy, Kishore Vasan, Jevin D. West
EMNLP • Demo

The COVID-19 pandemic has sparked unprecedented mobilization of scientists, already generating thousands of new papers that join a litany of previous biomedical work in related areas. This deluge of… 

"You are grounded!": Latent Name Artifacts in Pre-trained Language Models

Vered Shwartz, Rachel Rudinger, Oyvind Tafjord

Pre-trained language models (LMs) may perpetuate biases originating in their training corpus to downstream models. We focus on artifacts associated with the representation of given names (e.g.,…