Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
SciA11y: Converting Scientific Papers to Accessible HTML
We present SciA11y, a system that renders inaccessible scientific paper PDFs into HTML. SciA11y uses machine learning models to extract and understand the content of scientific PDFs, and reorganizes…
SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts
Determining coreference of concept mentions across multiple documents is fundamental for natural language understanding. Work on cross-document coreference resolution (CDCR) typically considers…
All That’s ‘Human’ Is Not Gold: Evaluating Human Evaluation of Generated Text
Human evaluations are typically considered the gold standard in natural language generation, but as models' fluency improves, how well can evaluators detect and judge machine-generated text? We run…
From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project
AI has achieved remarkable mastery over games such as Chess, Go, and Poker, and even Jeopardy!, but the rich variety of standardized exams has remained a landmark challenge. Even in 2016, the best…
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks
Language models pretrained on text from a wide variety of sources form the foundation of today's NLP. In light of the success of these broad-coverage models, we investigate whether it is still…
Social Bias Frames: Reasoning about Social and Power Implications of Language
Language has the power to reinforce stereotypes and project social biases onto others. At the core of the challenge is that it is rarely what is stated explicitly, but all the implied meanings that…
Procedural Reading Comprehension with Attribute-Aware Context Flow
Procedural texts often describe processes (e.g., photosynthesis and cooking) that happen over entities (e.g., light, food). In this paper, we introduce an algorithm for procedural reading…
WinoGrande: An Adversarial Winograd Schema Challenge at Scale
The Winograd Schema Challenge (WSC), proposed by Levesque et al. (2011) as an alternative to the Turing Test, was originally designed as a pronoun resolution problem that cannot be solved based on…
Evaluating Question Answering Evaluation
As the complexity of question answering (QA) datasets evolve, moving away from restricted formats like span extraction and multiple-choice (MC) to free-form answer generation, it is imperative to…
AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models
Neural NLP models are increasingly accurate but are imperfect and opaque---they break in counterintuitive ways and leave end users puzzled at their behavior. Model interpretation methods ameliorate…