Papers
Competency Problems: On Finding and Removing Artifacts in Language Data
Matt Gardner, William Cooper Merrill, Jesse Dodge, Matthew E. Peters, Alexis Ross, Sameer Singh, Noah A. Smith
EMNLP • 2021
Much recent work in NLP has documented dataset artifacts, bias, and spurious correlations between input features and output labels. However, how to tell which features have “spurious” instead of legitimate correlations is typically left unspecified. In this…

Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?
Jieyu Zhao, Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Kai-Wei Chang
ACL-IJCNLP • 2021
Is it possible to use natural language to intervene in a model’s behavior and alter its prediction in a desired way? We investigate the effectiveness of natural language interventions for reading-comprehension systems, studying this in the context of social…

Expected Validation Performance and Estimation of a Random Variable's Maximum
Jesse Dodge, Suchin Gururangan, D. Card, Roy Schwartz, Noah A. Smith
Findings of EMNLP • 2021
Research in NLP is often supported by experimental results, and improved reporting of such results can lead to better understanding and more reproducible science. In this paper we analyze three statistical estimators for expected validation performance, a…

Investigating Transfer Learning in Multilingual Pre-trained Language Models through Chinese Natural Language Inference
Hai Hu, He Zhou, Zuoyu Tian, Yiwen Zhang, Yina Ma, Yanting Li, Yixin Nie, Kyle Richardson
Findings of ACL • 2021
Multilingual transformers (XLM, mT5) have been shown to have remarkable transfer skills in zero-shot settings. Most transfer studies, however, rely on automatically translated resources (XNLI, XQuAD), making it hard to discern the particular linguistic…

ReadOnce Transformers: Reusable Representations of Text for Transformers
Shih-Ting Lin, Ashish Sabharwal, Tushar Khot
ACL • 2021
While large-scale language models are extremely effective when directly fine-tuned on many end-tasks, such models learn to extract information and solve the task simultaneously from end-task supervision. This is wasteful, as the general problem of gathering…

Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation
Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah A. Smith
ICLR • 2021
State-of-the-art neural machine translation models generate outputs autoregressively, where every step conditions on the previously generated tokens. This sequential nature causes inherent decoding latency. Non-autoregressive translation techniques, on the…

Random Feature Attention
Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong
ICLR • 2021
Transformers are state-of-the-art models for a variety of sequence modeling tasks. At their core is an attention function which models pairwise interactions between the inputs at every timestep. While attention is powerful, it does not scale efficiently to…

Symbolic Brittleness in Sequence Models: on Systematic Generalization in Symbolic Mathematics
S. Welleck, Peter West, Jize Cao, Yejin Choi
AAAI • 2021
Neural sequence models trained with maximum likelihood estimation have led to breakthroughs in many tasks, where success is defined by the gap between training and test performance. However, their ability to achieve stronger forms of generalization remains…

S2AND: A Benchmark and Evaluation System for Author Name Disambiguation
Shivashankar Subramanian, Daniel King, Doug Downey, Sergey Feldman
JCDL • 2021
Author Name Disambiguation (AND) is the task of resolving which author mentions in a bibliographic database refer to the same real-world person, and is a critical ingredient of digital library applications such as search and citation analysis. While many AND…

COVR: A test-bed for Visually Grounded Compositional Generalization with real images
Ben Bogin, Shivanshu Gupta, Matt Gardner, Jonathan Berant
EMNLP • 2021
While interest in models that generalize at test time to new compositions has risen in recent years, benchmarks in the visually-grounded domain have thus far been restricted to synthetic images. In this work, we propose COVR, a new test-bed for visually…