Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Language (Re)modelling: Towards Embodied Language Understanding
While natural language understanding (NLU) is advancing rapidly, today’s technology differs from human-like language understanding in fundamental ways, notably in its inferior efficiency,…
Nakdan: Professional Hebrew Diacritizer
We present a system for automatic diacritization of Hebrew text. The system combines modern neural models with carefully curated declarative linguistic knowledge and comprehensive manually…
Not All Claims are Created Equal: Choosing the Right Approach to Assess Your Hypotheses
Empirical research in Natural Language Processing (NLP) has adopted a narrow set of principles for assessing hypotheses, relying mainly on p-value computation, which suffers from several known…
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
The ability to control for the kinds of information encoded in neural representations has a variety of use cases, especially in light of the challenge of interpreting these models. We present…
Obtaining Faithful Interpretations from Compositional Neural Networks
Neural module networks (NMNs) are a popular approach for modeling compositionality: they achieve high accuracy when applied to problems in language and vision, while reflecting the compositional…
pyBART: Evidence-based Syntactic Transformations for IE
Syntactic dependencies can be predicted with high accuracy, and are useful for both machine-learned and pattern-based information extraction tasks. However, their utility can be improved. These…
QuASE: Question-Answer Driven Sentence Encoding
Question-answering (QA) data often encodes essential information in many facets. This paper studies a natural question: Can we get supervision from QA data for other tasks (typically, non-QA ones)?…
Recollection versus Imagination: Exploring Human Memory and Cognition via Neural Language Models
We investigate the use of NLP as a measure of the cognitive processes involved in storytelling, contrasting imagination and recollection of events. To facilitate this, we collect and release…
S2ORC: The Semantic Scholar Open Research Corpus
We introduce S2ORC, a large contextual citation graph of English-language academic papers from multiple scientific domains; the corpus consists of 81.1M papers, 380.5M citation edges, and associated…
SciREX: A Challenge Dataset for Document-Level Information Extraction
Extracting information from full documents is an important problem in many domains, but most previous work focuses on identifying relationships within a sentence or a paragraph. It is challenging to…