    • ICCV 2019
      Tianlu Wang, Jieyu Zhao, Mark Yatskar, Kai-Wei Chang, Vicente Ordonez
      In this work, we present a framework to measure and mitigate intrinsic biases with respect to protected variables, such as gender, in visual recognition tasks. We show that trained models significantly amplify the association of target labels with gender beyond what one would expect from biased…
    • ACL 2019
      Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, Yejin Choi
      We present the first comprehensive study on automatic knowledge base construction for two prevalent commonsense knowledge graphs: ATOMIC (Sap et al., 2019) and ConceptNet (Speer et al., 2017). Contrary to many conventional KBs that store knowledge with canonical templates, commonsense KBs only…
    • ACL 2019
      Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi, Noah A. Smith
      We investigate how annotators’ insensitivity to differences in dialect can lead to racial bias in automatic hate speech detection models, potentially amplifying harm against minority populations. We first uncover unexpected correlations between surface markers of African American English (AAE) and…
    • ACL 2019
      Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, Yejin Choi
      Recent work by Zellers et al. (2018) introduced a new task of commonsense natural language inference: given an event description such as "A woman sits at a piano," a machine must select the most likely follow-up: "She sets her fingers on the keys." With the introduction of BERT, near human-level…
    • ACL 2019
      Sewon Min, Eric Wallace, Sameer Singh, Matt Gardner, Hannaneh Hajishirzi, Luke Zettlemoyer
      Multi-hop reading comprehension (RC) questions are challenging because they require reading and reasoning over multiple paragraphs. We argue that it can be difficult to construct large multi-hop RC datasets. For example, even highly compositional questions can be answered with a single hop if they…
    • arXiv 2019
      Peter Clark, Oren Etzioni, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Niket Tandon, Sumithra Bhakthavatsalam, Dirk Groeneveld, Michal Guerquin
      AI has achieved remarkable mastery over games such as Chess, Go, and Poker, and even Jeopardy!, but the rich variety of standardized exams has remained a landmark challenge. Even in 2016, the best AI system achieved merely 59.3% on an 8th Grade science exam challenge (Schoenick et al., 2016). This…
    • arXiv 2019
      Mor Geva, Yoav Goldberg, Jonathan Berant
      Crowdsourcing has been the prevalent paradigm for creating natural language understanding datasets in recent years. A common crowdsourcing practice is to recruit a small number of high-quality workers, and have them massively generate examples. Having only a few workers generate the majority of…
    • arXiv 2019
      Dongfang Xu, Peter Jansen, Jaycie Martin, Zhengnan Xie, Vikas Yadav, Harish Tayyar Madabushi, Oyvind Tafjord, Peter Clark
      Prior work has demonstrated that question classification (QC), recognizing the problem domain of a question, can help answer it more accurately. However, developing strong QC algorithms has been hindered by the limited size and complexity of annotated data available. To address this, we present the…
    • ACL • RepL4NLP 2019
      Matthew E. Peters, Sebastian Ruder, Noah A. Smith
      While most previous work has focused on different pretraining objectives and architectures for transfer learning, we ask how to best adapt the pretrained model to a given target task. We focus on the two most common forms of adaptation, feature extraction (where the pretrained weights are frozen…
    • ACL • BioNLP Workshop 2019
      Mark Neumann, Daniel King, Iz Beltagy, Waleed Ammar
      Despite recent advances in natural language processing, many statistical models for processing text perform extremely poorly under domain shift. Processing biomedical and clinical text is a critically important application area of natural language processing, for which there are few robust…
    • ACL 2019
      Alon Talmor, Jonathan Berant
      A large number of reading comprehension (RC) datasets has been created recently, but little analysis has been done on whether they generalize to one another, and the extent to which existing datasets can be leveraged for improving performance on new ones. In this paper, we conduct such an…
    • ACL 2019
      Ben Bogin, Jonathan Berant, Matt Gardner
      Research on parsing language to SQL has largely ignored the structure of the database (DB) schema, either because the DB was very simple, or because it was observed at both training and test time. In SPIDER, a recently released text-to-SQL dataset, new and complex DBs are given at test time, and so…
    • ACL 2019
      Gabriel Stanovsky, Noah A. Smith, Luke Zettlemoyer
      We present the first challenge set and evaluation protocol for the analysis of gender bias in machine translation (MT). Our approach uses two recent coreference resolution datasets composed of English sentences which cast participants into non-stereotypical gender roles (e.g., "The doctor asked the…
    • arXiv 2019
      Roy Schwartz, Jesse Dodge, Noah A. Smith, Oren Etzioni
      The computations required for deep learning research have been doubling every few months, resulting in an estimated 300,000x increase from 2012 to 2018 [2]. These computations have a surprisingly large carbon footprint [38]. Ironically, deep learning was inspired by the human brain, which is…
    • arXiv 2019
      Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi
      The Winograd Schema Challenge (WSC), proposed by Levesque et al. (2011) as an alternative to the Turing Test, was originally designed as a pronoun resolution problem that cannot be solved based on statistical patterns in large text corpora. However, recent studies suggest that current WSC datasets…
    • UAI 2019
      Jonathan Kuck, Tri Dao, Yuanrun Zheng, Burak Bartan, Ashish Sabharwal, Stefano Ermon
      Randomized hashing algorithms have seen recent success in providing bounds on the model count of a propositional formula. These methods repeatedly check the satisfiability of a formula subject to increasingly stringent random constraints. Key to these approaches is the choice of a fixed family of…
    • ACL 2019
      Souvik Kundu, Tushar Khot, Ashish Sabharwal, Peter Clark
      We propose a novel, path-based reasoning approach for the multi-hop reading comprehension task where a system needs to combine facts from multiple passages to answer a question. Although inspired by multi-hop reasoning over knowledge graphs, our proposed approach operates directly over unstructured…
    • JAMA 2019
      Sergey Feldman, Waleed Ammar, Kyle Lo, Elly Trepman, Madeleine van Zuylen, Oren Etzioni
      Importance: Analyses of female representation in clinical studies have been limited in scope and scale. Objective: To perform a large-scale analysis of global enrollment sex bias in clinical studies. Design, Setting, and Participants: In this cross-sectional study, clinical studies from published…
    • arXiv 2019
      Lucy Lu Wang, Gabriel Stanovsky, Luca Weihs, Oren Etzioni
      A comprehensive and up-to-date analysis of Computer Science literature (2.87 million papers through 2018) reveals that, if current trends continue, parity between the number of male and female authors will not be reached in this century. Under our most optimistic projection models, gender parity is…
    • CVPR 2019
      Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi
      Visual Question Answering (VQA) in its ideal form lets us study reasoning in the joint space of vision and language and serves as a proxy for the AI task of scene understanding. However, most VQA benchmarks to date are focused on questions such as simple counting, visual attributes, and object…