Papers

  • Challenges in Automated Debiasing for Toxic Language Detection

    Xuhui Zhou, Maarten Sap, Swabha Swayamdipta, Noah A. Smith, Yejin Choi • EACL • 2021
    Biased associations have been a challenge in the development of classifiers for detecting toxic language, hindering both fairness and accuracy. As potential solutions, we investigate recently introduced debiasing methods for text classification datasets and…
  • Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI

    Alon Jacovi, Ana Marasović, Tim Miller, Yoav Goldberg • FAccT • 2021
    Trust is a central component of the interaction between people and AI, in that 'incorrect' levels of trust may cause misuse, abuse or disuse of the technology. But what, precisely, is the nature of trust in AI? What are the prerequisites and goals of the…
  • GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation

    Daniel Khashabi, Gabriel Stanovsky, Jonathan Bragg, Nicholas Lourie, Jungo Kasai, Yejin Choi, Noah A. Smith, Daniel S. Weld • arXiv • 2021
    Leaderboards have eased model development for many NLP datasets by standardizing their evaluation and delegating it to an independent external repository. Their adoption, however, is so far limited to tasks which can be reliably evaluated in an automatic…
  • Green AI

    Roy Schwartz, Jesse Dodge, Noah A. Smith, Oren Etzioni • CACM • 2020
    The computations required for deep learning research have been doubling every few months, resulting in an estimated 300,000x increase from 2012 to 2018 [2]. These computations have a surprisingly large carbon footprint [38]. Ironically, deep learning was…
  • A Simple Yet Strong Pipeline for HotpotQA

    Dirk Groeneveld, Tushar Khot, Mausam, Ashish Sabharwal • EMNLP • 2020
    State-of-the-art models for multi-hop question answering typically augment large-scale language models like BERT with additional, intuitively useful capabilities such as named entity recognition, graph-based reasoning, and question decomposition. However…
  • Easy, Reproducible and Quality-Controlled Data Collection with Crowdaq

    Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, Robert L. Logan IV, Ana Marasović, Z. Nie • EMNLP Demo • 2020
    High-quality and large-scale data are key to success for AI systems. However, large-scale data annotation efforts are often confronted with a set of common challenges: (1) designing a user-friendly annotation interface; (2) training enough annotators…
  • Grounded Compositional Outputs for Adaptive Language Modeling

    Nikolaos Pappas, Phoebe Mulcaire, Noah A. Smith • EMNLP • 2020
    Language models have emerged as a central component across NLP, and a great deal of progress depends on the ability to cheaply adapt them (e.g., through finetuning) to new domains and tasks. A language model's vocabulary---typically selected before training…
  • IIRC: A Dataset of Incomplete Information Reading Comprehension Questions

    James Ferguson, Matt Gardner, Hannaneh Hajishirzi, Tushar Khot, Pradeep Dasigi • EMNLP • 2020
    Humans often have to read multiple documents to address their information needs. However, most existing reading comprehension (RC) tasks only focus on questions for which the contexts provide all the information required to answer them, thus not evaluating a…
  • Improving Compositional Generalization in Semantic Parsing

    Inbar Oren, Jonathan Herzig, Nitish Gupta, Matt Gardner, Jonathan Berant • Findings of EMNLP • 2020
    Generalization of models to out-of-distribution (OOD) data has captured tremendous attention recently. Specifically, compositional generalization, i.e., whether a model generalizes to new structures built of components observed during training, has sparked…