Papers

  • Learning from Task Descriptions

    Orion Weller, Nick Lourie, Matt Gardner, Matthew Peters. EMNLP 2020. Typically, machine learning systems solve new tasks by training on thousands of examples. In contrast, humans can solve new tasks by reading some instructions, with perhaps an example or two. To take a step toward closing this gap, we introduce a framework…
  • MedICaT: A Dataset of Medical Images, Captions, and Textual References

    Sanjay Subramanian, Lucy Lu Wang, Sachin Mehta, Ben Bogin, Madeleine van Zuylen, Sravanthi Parasa, Sameer Singh, Matt Gardner, Hannaneh Hajishirzi. Findings of EMNLP 2020. Understanding the relationship between figures and text is key to scientific document understanding. Medical figures in particular are quite complex, often consisting of several subfigures (75% of figures in our dataset), with detailed text describing their…
  • MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics

    Anthony Chen, Gabriel Stanovsky, S. Singh, Matt Gardner. EMNLP 2020. Posing reading comprehension as a generation problem provides a great deal of flexibility, allowing for open-ended questions with few restrictions on possible answers. However, progress is impeded by existing generation metrics, which rely on token overlap…
  • Multilevel Text Alignment with Cross-Document Attention

    Xuhui Zhou, Nikolaos Pappas, Noah A. Smith. EMNLP 2020. Text alignment finds application in tasks such as citation recommendation and plagiarism detection. Existing alignment methods operate at a single, predefined level and cannot learn to align texts at, for example, sentence and document levels. We propose a…
  • Multi-Step Inference for Reasoning over Paragraphs

    Jiangming Liu, Matt Gardner, Shay B. Cohen, Mirella Lapata. EMNLP 2020. Complex reasoning over text requires understanding and chaining together free-form predicates and logical connectives. Prior work has largely tried to do this either symbolically or with black-box transformers. We present a middle ground between these two…
  • Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs

    Ana Marasović, Chandra Bhagavatula, J. Park, Ronan Le Bras, Noah A. Smith, Yejin Choi. Findings of EMNLP 2020. Natural language rationales could provide intuitive, higher-level explanations that are easily understandable by humans, complementing the more broadly studied lower-level explanations based on gradients or attention weights. We present the first study…
  • Parsing with Multilingual BERT, a Small Treebank, and a Small Corpus

    Ethan C. Chau, Lucy H. Lin, Noah A. Smith. Findings of EMNLP 2020. Pretrained multilingual contextual representations have shown great success, but due to the limits of their pretraining data, their benefits do not apply equally to all language varieties. This presents a challenge for language varieties unfamiliar to these…
  • Plug and Play Autoencoders for Conditional Text Generation

    Florian Mai, Nikolaos Pappas, I. Montero, Noah A. Smith. EMNLP 2020. Text autoencoders are commonly used for conditional generation tasks such as style transfer. We propose methods which are plug and play, where any pretrained autoencoder can be used, and only require learning a mapping within the autoencoder's embedding space…
  • RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models

    Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith. Findings of EMNLP 2020. Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language, which hinders their safe deployment. We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the…
  • The Multilingual Amazon Reviews Corpus

    Phillip Keung, Y. Lu, Gyorgy Szarvas, Noah A. Smith. EMNLP 2020. We present the Multilingual Amazon Reviews Corpus (MARC), a large-scale collection of Amazon reviews for multilingual text classification. The corpus contains reviews in English, Japanese, German, French, Spanish, and Chinese, which were collected between…