Allen Institute for AI

Papers

  • Interactive Extractive Search over Biomedical Corpora

    Hillel Taub-Tabib, Micah Shlain, Shoval Sadde, Dan Lahav, Matan Eyal, Yaara Cohen, Yoav Goldberg
    ACL 2020
    We present a system that allows life-science researchers to search a linguistically annotated corpus of scientific texts using patterns over dependency graphs, as well as using patterns over token sequences and a powerful variant of boolean keyword queries. In contrast to previous attempts to…
  • Language (Re)modelling: Towards Embodied Language Understanding

    Ronen Tamari, Chen Shani, Tom Hope, Miriam R. L. Petruck, Omri Abend, Dafna Shahaf
    ACL 2020
    While natural language understanding (NLU) is advancing rapidly, today’s technology differs from human-like language understanding in fundamental ways, notably in its inferior efficiency, interpretability, and generalization. This work proposes an approach to representation and learning based on…
  • Nakdan: Professional Hebrew Diacritizer

    Avi Shmidman, Shaltiel Shmidman, Moshe Koppel, Yoav Goldberg
    ACL 2020
    We present a system for automatic diacritization of Hebrew text. The system combines modern neural models with carefully curated declarative linguistic knowledge and comprehensive manually constructed tables and dictionaries. Besides providing state of the art diacritization accuracy, the system…
  • Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection

    Shauli Ravfogel, Yanai Elazar, Hila Gonen, Michael Twiton, Yoav Goldberg
    ACL 2020
    The ability to control for the kinds of information encoded in neural representation has a variety of use cases, especially in light of the challenge of interpreting these models. We present Iterative Null-space Projection (INLP), a novel method for removing information from neural representations…
  • Obtaining Faithful Interpretations from Compositional Neural Networks

    Sanjay Subramanian, Ben Bogin, Nitish Gupta, Tomer Wolfson, Sameer Singh, Jonathan Berant, Matt Gardner
    ACL 2020
    Neural module networks (NMNs) are a popular approach for modeling compositionality: they achieve high accuracy when applied to problems in language and vision, while reflecting the compositional structure of the problem in the network architecture. However, prior work implicitly assumed that the…
  • pyBART: Evidence-based Syntactic Transformations for IE

    Aryeh Tiktinsky, Yoav Goldberg, Reut Tsarfaty
    ACL 2020
    Syntactic dependencies can be predicted with high accuracy, and are useful for both machine-learned and pattern-based information extraction tasks. However, their utility can be improved. These syntactic dependencies are designed to accurately reflect syntactic relations, and they do not make…
  • QuASE: Question-Answer Driven Sentence Encoding.

    Hangfeng He, Qiang Ning, Dan Roth
    ACL 2020
    Question-answering (QA) data often encodes essential information in many facets. This paper studies a natural question: Can we get supervision from QA data for other tasks (typically, non-QA ones)? For example, can we use QAMR (Michael et al., 2017) to improve named entity recognition? We…
  • Recollection versus Imagination: Exploring Human Memory and Cognition via Neural Language Models

    Maarten Sap, Eric Horvitz, Yejin Choi, Noah A. Smith, James W. Pennebaker
    ACL 2020
    We investigate the use of NLP as a measure of the cognitive processes involved in storytelling, contrasting imagination and recollection of events. To facilitate this, we collect and release HIPPOCORPUS, a dataset of 7,000 stories about imagined and recalled events. We introduce a measure of…
  • S2ORC: The Semantic Scholar Open Research Corpus

    Kyle Lo, Lucy Lu Wang, Mark E Neumann, Rodney Michael Kinney, Daniel S. Weld
    ACL 2020
    We introduce S2ORC, a large contextual citation graph of English-language academic papers from multiple scientific domains; the corpus consists of 81.1M papers, 380.5M citation edges, and associated paper metadata. We provide structured full text for 8.1M open access papers. All inline citation…
  • SciREX: A Challenge Dataset for Document-Level Information Extraction

    Sarthak Jain, Madeleine van Zuylen, Hannaneh Hajishirzi, Iz Beltagy
    ACL 2020
    Extracting information from full documents is an important problem in many domains, but most previous work focuses on identifying relationships within a sentence or a paragraph. It is challenging to create a large-scale information extraction (IE) dataset at the document level since it requires an…