Papers
See AI2's Award Winning Papers
Learn more about AI2's Lasting Impact Award
Viewing 191-200 of 298 papers
Challenges in Algorithmic Debiasing for Toxic Language Detection
Xuhui Zhou, Maarten Sap, Swabha Swayamdipta, Noah A. Smith, Yejin ChoiEACL • 2021 Biased associations have been a challenge in the development of classifiers for detecting toxic language, hindering both fairness and accuracy. As potential solutions, we investigate recently introduced debiasing methods for text classification datasets and…Challenges in Automated Debiasing for Toxic Language Detection
Xuhui Zhou, Maarten Sap, Swabha Swayamdipta, Noah A. Smith, Yejin ChoiEACL • 2021 Biased associations have been a challenge in the development of classifiers for detecting toxic language, hindering both fairness and accuracy. As potential solutions, we investigate recently introduced debiasing methods for text classification datasets and…Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI
Alon Jacovi, Ana Marasović, Tim Miller, Yoav GoldbergFAccT • 2021 Trust is a central component of the interaction between people and AI, in that 'incorrect' levels of trust may cause misuse, abuse or disuse of the technology. But what, precisely, is the nature of trust in AI? What are the prerequisites and goals of the…GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation
Daniel Khashabi, Gabriel Stanovsky, Jonathan Bragg, Nicholas Lourie, Jungo Kasai, Yejin Choi, Noah A. Smith, Daniel S. WeldarXiv • 2021 Leaderboards have eased model development for many NLP datasets by standardizing their evaluation and delegating it to an independent external repository. Their adoption, however, is so far limited to tasks which can be reliably evaluated in an automatic…Green AI
Roy Schwartz, Jesse Dodge, Noah A. Smith, Oren EtzioniCACM • 2020 The computations required for deep learning research have been doubling every few months, resulting in an estimated 300,000x increase from 2012 to 2018 [2]. These computations have a surprisingly large carbon footprint [38]. Ironically, deep learning was…A Simple Yet Strong Pipeline for HotpotQA
Dirk Groeneveld, Tushar Khot, Mausam, Ashish SabharwalEMNLP • 2020 State-of-the-art models for multi-hop question answering typically augment large-scale language models like BERT with additional, intuitively useful capabilities such as named entity recognition, graph-based reasoning, and question decomposition. However…Easy, Reproducible and Quality-Controlled Data Collection with Crowdaq
Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, IV RobertL.Logan, Ana Marasović, Z. NieEMNLP • Demo • 2020 High-quality and large-scale data are key to success for AI systems. However, large-scale data annotation efforts are often confronted with a set of common challenges: (1) designing a user-friendly annotation interface; (2) training enough annotators…Grounded Compositional Outputs for Adaptive Language Modeling
Nikolaos Pappas, Phoebe Mulcaire, Noah A. SmithEMNLP • 2020 Language models have emerged as a central component across NLP, and a great deal of progress depends on the ability to cheaply adapt them (e.g., through finetuning) to new domains and tasks. A language model's vocabulary---typically selected before training…IIRC: A Dataset of Incomplete Information Reading Comprehension Questions
James Ferguson, Matt Gardner. Hannaneh Hajishirzi, Tushar Khot, Pradeep DasigiEMNLP • 2020 Humans often have to read multiple documents to address their information needs. However, most existing reading comprehension (RC) tasks only focus on questions for which the contexts provide all the information required to answer them, thus not evaluating a…Improving Compositional Generalization in Semantic Parsing
Inbar Oren, Jonathan Herzig, Nitish Gupta, Matt Gardner, Jonathan BerantFindings of EMNLP • 2020 Generalization of models to out-of-distribution (OOD) data has captured tremendous attention recently. Specifically, compositional generalization, i.e., whether a model generalizes to new structures built of components observed during training, has sparked…