Papers
On Consequentialism and Fairness
Dallas Card, Noah A. Smith
Frontiers in AI Journal • 2020
Recent work on fairness in machine learning has primarily emphasized how to define, quantify, and encourage "fair" outcomes. Less attention has been paid, however, to the ethical foundations which underlie such efforts. Among the ethical perspectives that…

Explain like I am a Scientist: The Linguistic Barriers of Entry to r/science
Tal August, Dallas Card, Gary Hsieh, Noah A. Smith, Katharina Reinecke
CHI • 2020
As an online community for discussing research findings, r/science has the potential to contribute to science outreach and communication with a broad audience. Yet previous work suggests that most of the active contributors on r/science are science-educated…

Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan
arXiv • 2020
Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly…

Evaluating NLP Models via Contrast Sets
M. Gardner, Y. Artzi, V. Basmova, J. Berant, B. Bogin, S. Chen, P. Dasigi, D. Dua, Y. Elazar, A. Gottumukkala, N. Gupta, H. Hajishirzi, G. Ilharco, D. Khashabi, K. Lin, J. Liu, N. Liu, P. Mulcaire, Q. Ning, S. Singh, N. Smith, S. Subramanian, R. Tsarfaty, E. Wallace, et al.
arXiv • 2020
Standard test sets for supervised learning evaluate in-distribution generalization. Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on…

Multi-View Learning for Vision-and-Language Navigation
Qiaolin Xia, Xiujun Li, Chunyuan Li, Yonatan Bisk, Zhifang Sui, Yejin Choi, Noah A. Smith
arXiv • 2020
Learning to navigate in a visual environment following natural language instructions is a challenging task because natural language instructions are highly variable, ambiguous, and under-specified. In this paper, we present a novel training paradigm, Learn…

Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, Noah A. Smith
arXiv • 2020
Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language processing. This process, however, is often brittle: even with the same hyperparameter values, distinct random seeds can lead to…

Analyzing Compositionality in Visual Question Answering
Sanjay Subramanian, Sameer Singh, Matt Gardner
NeurIPS • ViGIL Workshop • 2019
Since the release of the original Visual Question Answering (VQA) dataset, several newer datasets for visual reasoning have been introduced, often with the express intent of requiring systems to perform compositional reasoning. Recently, transformer models…

Evaluating Question Answering Evaluation
Anthony Chen, Gabriel Stanovsky, Sameer Singh, Matt Gardner
EMNLP • MRQA Workshop • 2019
As the complexity of question answering (QA) datasets evolve, moving away from restricted formats like span extraction and multiple-choice (MC) to free-form answer generation, it is imperative to understand how well current metrics perform in evaluating QA…

On Making Reading Comprehension More Comprehensive
Matt Gardner, Jonathan Berant, Hannaneh Hajishirzi, Alon Talmor, Sewon Min
EMNLP • MRQA Workshop • 2019
Machine reading comprehension, the task of evaluating a machine’s ability to comprehend a passage of text, has seen a surge in popularity in recent years. There are many datasets that are targeted at reading comprehension, and many systems that perform as…

ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension
Dheeru Dua, Ananth Gottumukkala, Alon Talmor, Sameer Singh, Matt Gardner
EMNLP • MRQA Workshop • 2019
Reading comprehension is one of the crucial tasks for furthering research in natural language understanding. A lot of diverse reading comprehension datasets have recently been introduced to study various phenomena in natural language, ranging from simple…