Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
SciBERT: A Pretrained Language Model for Scientific Text
Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SciBERT, a pretrained language model based on BERT (Devlin et al., 2018) to…
Pretrained Language Models for Sequential Sentence Classification
As a step toward better document-level understanding, we explore classification of a sequence of sentences into their corresponding categories, a task that requires understanding sentences in…
Don't paraphrase, detect! Rapid and Effective Data Collection for Semantic Parsing
A major hurdle on the road to conversational interfaces is the difficulty in collecting data that maps language utterances to logical forms. One prominent approach for data collection has been to…
Global Reasoning over Database Structures for Text-to-SQL Parsing
State-of-the-art semantic parsers rely on auto-regressive decoding, emitting one symbol at a time. When tested against complex databases that are unobserved at training time (zero-shot), the parser…
BottleSum: Unsupervised and Self-supervised Sentence Summarization using the Information Bottleneck Principle
The principle of the Information Bottleneck (Tishby et al. 1999) is to produce a summary of information X optimized to predict some other relevant information Y. In this paper, we propose a novel…
WIQA: A dataset for "What if..." reasoning over procedural text
We introduce WIQA, the first large-scale dataset of "What if..." questions over procedural text. WIQA contains three parts: a collection of paragraphs each describing a process, e.g., beach erosion;…
Low-Resource Parsing with Crosslingual Contextualized Representations
Despite advances in dependency parsing, languages with small treebanks still present challenges. We assess recent approaches to multilingual contextual word representations (CWRs), and compare them…
On the Limits of Learning to Actively Learn Semantic Representations
One of the goals of natural language understanding is to develop models that map sentences into meaning representations. However, training such models requires expensive annotation of complex…
Y'all should read this! Identifying Plurality in Second-Person Personal Pronouns in English Texts
Distinguishing between singular and plural "you" in English is a challenging task which has potential for downstream applications, such as machine translation or coreference resolution. While formal…
Universal Adversarial Triggers for Attacking and Analyzing NLP
Adversarial examples highlight model vulnerabilities and are useful for evaluation and interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens that trigger a…