Papers
UnifiedQA-v2: Stronger Generalization via Broader Cross-Format Training
Daniel Khashabi, Yeganeh Kordi, Hannaneh Hajishirzi
arXiv • 2022
We present UNIFIEDQA-v2, a QA model built with the same process as UNIFIEDQA, except that it utilizes more supervision – roughly 3× the number of datasets used for UNIFIEDQA. This generally leads to better in-domain and cross-domain results.

FLEX: Unifying Evaluation for Few-Shot NLP
Jonathan Bragg, Arman Cohan, Kyle Lo, Iz Beltagy
NeurIPS • 2021
Few-shot NLP research is highly active, yet conducted in disjoint research threads with evaluation suites that lack challenging-yet-realistic testing setups and fail to employ careful experimental design. Consequently, the community does not know which…

Natural Adversarial Objects
Felix Lau, Nishant Subramani, Sasha Harrison, Aerin Kim, E. Branson, Rosanne Liu
NeurIPS 2021 Data Centric AI Workshop • 2021
Although state-of-the-art object detection methods have shown compelling performance, models often are not robust to adversarial attacks and out-of-distribution data. We introduce a new dataset, Natural Adversarial Objects (NAO), to evaluate the robustness of…

One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval
Akari Asai, Xinyan Yu, Jungo Kasai, Hanna Hajishirzi
NeurIPS • 2021
We present CORA, a Cross-lingual Open-Retrieval Answer Generation model that can answer questions across many languages even when language-specific annotated data or knowledge sources are unavailable. We introduce a new dense passage retrieval algorithm that…

Teach Me to Explain: A Review of Datasets for Explainable Natural Language Processing
Sarah Wiegreffe and Ana Marasović
NeurIPS • 2021
Explainable NLP (ExNLP) has increasingly focused on collecting human-annotated explanations. These explanations are used downstream in three ways: as data augmentation to improve performance on a predictive task, as a loss signal to train models to produce…

Specializing Multilingual Language Models: An Empirical Study
Ethan C. Chau, Noah A. Smith
EMNLP • Workshop on Multilingual Representation Learning • 2021
Pretrained multilingual language models have become a common tool in transferring NLP capabilities to low-resource languages, often with adaptations. In this work, we study the performance, extensibility, and interaction of two such adaptations: vocabulary…
Best Paper Honorable Mention

CDLM: Cross-Document Language Modeling
Avi Caciularu, Arman Cohan, Iz Beltagy, Matthew E. Peters, Arie Cattan, Ido Dagan
Findings of EMNLP • 2021
We introduce a new pretraining approach for language models that are geared to support multi-document NLP tasks. Our cross-document language model (CD-LM) improves masked language modeling for these tasks with two key ideas. First, we pretrain with multiple…

Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Jesse Dodge, Maarten Sap, Ana Marasović, William Agnew, Gabriel Ilharco, Dirk Groeneveld, Matt Gardner
EMNLP • 2021
As language models are trained on ever more text, researchers are turning to some of the largest corpora available. Unlike most other types of datasets in NLP, large unlabeled text corpora are often presented with minimal documentation, and best practices for…

Finetuning Pretrained Transformers into RNNs
Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith
EMNLP • 2021
Transformers have outperformed recurrent neural networks (RNNs) in natural language generation. But this comes with a significant computational cost, as the attention mechanism’s complexity scales quadratically with sequence length. Efficient transformer…

Generative Context Pair Selection for Multi-hop Question Answering
Dheeru Dua, Cicero Nogueira dos Santos, Patrick Ng, Ben Athiwaratkun, Bing Xiang, Matt Gardner, Sameer Singh
EMNLP • 2021
Compositional reasoning tasks, like multi-hop question answering, require making latent decisions to arrive at the final answer, given a question. However, crowdsourced datasets often capture only a slice of the underlying task distribution, which can induce…