Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing
Research on parsing language to SQL has largely ignored the structure of the database (DB) schema, either because the DB was very simple, or because it was observed at both training and test time.…
ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing
Despite recent advances in natural language processing, many statistical models for processing text perform extremely poorly under domain shift. Processing biomedical and clinical text is a…
Barack's Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling
Modeling human language requires the ability to not only generate fluent text but also encode factual knowledge. However, traditional language models are only capable of remembering facts seen at…
Sentence Mover's Similarity: Automatic Evaluation for Multi-Sentence Texts
For evaluating machine-generated texts, automatic methods hold the promise of avoiding collection of human judgments, which can be expensive and time-consuming. The most common automatic metrics,…
Is Attention Interpretable?
Attention mechanisms have recently boosted performance on a range of NLP tasks. Because attention layers explicitly weight input components' representations, it is also often assumed that attention…
SemEval-2019 Task 10: Math Question Answering
We report on the SemEval 2019 task on math question answering. We provided a question set derived from Math SAT practice exams, including 2778 training questions and 1082 test questions. For a…
Variational Pretraining for Semi-supervised Text Classification
We introduce VAMPIRE, a lightweight pretraining framework for effective text classification when data and computing resources are limited. We pretrain a unigram document model as a variational…
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
Reading comprehension has recently seen rapid progress, with systems matching humans on the most popular datasets for the task. However, a large body of work has highlighted the brittleness of these…
Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets
Several datasets have recently been constructed to expose brittleness in models trained on existing benchmarks. While model performance on these challenge datasets is significantly lower compared…
Iterative Search for Weakly Supervised Semantic Parsing
Training semantic parsers from question-answer pairs typically involves searching over an exponentially large space of logical forms, and an unguided search can easily be misled by spurious logical…