Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.


DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs

Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Matt Gardner
2019
NAACL-HLT

Reading comprehension has recently seen rapid progress, with systems matching humans on the most popular datasets for the task. However, a large body of work has highlighted the brittleness of these… 

Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages

Shauli Ravfogel, Yoav Goldberg, Tal Linzen
2019
NAACL

How do typological properties such as word order and morphological case marking affect the ability of neural sequence models to acquire the syntax of a language? Cross-linguistic comparisons of… 

Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But Do Not Remove Them

Hila Gonen, Yoav Goldberg
2019
NAACL

Word embeddings are widely used in NLP for a vast range of tasks. It was shown that word embeddings derived from text corpora reflect gender biases in society. This phenomenon is pervasive and… 

Value-based Search in Execution Space for Mapping Instructions to Programs

Dor Muhlgay, Jonathan Herzig, Jonathan Berant
2019
NAACL

Training models to map natural language instructions to programs, given target world supervision only, requires searching for good programs at training time. Search is commonly done using beam search… 

CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge

Alon Talmor, Jonathan Herzig, Nicholas Lourie, Jonathan Berant
2019
NAACL

When answering a question, people often draw upon their rich world knowledge in addition to the particular context. Recent work has focused primarily on answering questions given some relevant… 

DiscoFuse: A Large-Scale Dataset for Discourse-based Sentence Fusion

Mor Geva, Eric Malmi, Idan Szpektor, Jonathan Berant
2019
NAACL

Sentence fusion is the task of joining several independent sentences into a single coherent text. Current datasets for sentence fusion are small and insufficient for training modern neural models.… 

Evaluating Text GANs as Language Models

Guy Tevet, Gavriel Habib, Vered Shwartz, Jonathan Berant
2019
NAACL

Generative Adversarial Networks (GANs) are a promising approach for text generation that, unlike traditional language models (LMs), does not suffer from the problem of “exposure bias”. However, a… 

Linguistic Knowledge and Transferability of Contextual Representations

Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Noah A. Smith
2019
NAACL

Contextual word representations derived from large-scale neural language models are successful across a diverse set of NLP tasks, suggesting that they encode useful and transferable features of… 

Polyglot Contextual Representations Improve Crosslingual Transfer

Phoebe Mulcaire, Jungo Kasai, Noah A. Smith
2019
NAACL

We introduce a method to produce multilingual contextual word representations by training a single language model on text from multiple languages. Our method combines the advantages of contextual… 

Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation

Amit Mor-Yosef, Ido Dagan, Yoav Goldberg
2019
NAACL

Data-to-text generation can be conceptually divided into two parts: ordering and structuring the information (planning), and generating fluent language describing the information (realization).…