    • ICLR 2020
      Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Scott Wen-tau Yih, Yejin Choi
      Abductive reasoning is inference to the most plausible explanation. For example, if Jenny finds her house in a mess when she returns from work, and remembers that she left a window open, she can hypothesize that a thief broke into her house and caused the mess, as the most plausible explanation…
    • WACV 2020
      Moshiko Raboh, Roei Herzig, Gal Chechik, Jonathan Berant, Amir Globerson
      Understanding the semantics of complex visual scenes involves perception of entities and reasoning about their relations. Scene graphs provide a natural representation for these tasks, by assigning labels to both entities (nodes) and relations (edges). However, scene graphs are not commonly used as… (a toy scene-graph data structure is sketched after this list)
    • arXiv 2020
      Ronan Le Bras, Swabha Swayamdipta, Chandra Bhagavatula, Rowan Zellers, Matthew E. Peters, Ashish Sabharwal, Yejin Choi
      Large neural models have demonstrated human-level performance on language and vision benchmarks such as ImageNet and Stanford Natural Language Inference (SNLI). Yet, their performance degrades considerably when tested on adversarial or out-of-distribution samples. This raises the question of whether…
    • arXiv 2020
      Peter Clark, Oyvind Tafjord, Kyle Richardson
      AI has long pursued the goal of having systems reason over explicitly provided knowledge, but building suitable representations has proved challenging. Here we explore whether transformers can similarly learn to reason (or emulate reasoning), but using rules expressed in language, thus bypassing a…
    • Outstanding Paper Award
      AAAI 2020
      Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi
      The Winograd Schema Challenge (WSC), proposed by Levesque et al. (2011) as an alternative to the Turing Test, was originally designed as a pronoun resolution problem that cannot be solved based on statistical patterns in large text corpora. However, recent studies suggest that current WSC datasets…
    • AAAI 2020
      Tushar Khot, Peter Clark, Michal Guerquin, Peter Jansen, Ashish Sabharwal
      Composing knowledge from multiple pieces of texts is a key challenge in multi-hop question answering. We present a multi-hop reasoning dataset, Question Answering via Sentence Composition (QASC), that requires retrieving facts from a large corpus and composing them to answer a multiple-choice…
    • AAAI 2020
      Kyle Richardson, Hai Hu, Lawrence S. Moss, Ashish Sabharwal
      Do state-of-the-art models for language understanding already have, or can they easily learn, abilities such as boolean coordination, quantification, conditionals, comparatives, and monotonicity reasoning (i.e., reasoning about word substitutions in sentential contexts)? While such phenomena are…
    • SCIL 2020
      Hai Hu, Qi Chen, Kyle Richardson, Atreyee Mukherjee, Lawrence S. Moss, Sandra Kübler
      We present a new logic-based inference engine for natural language inference (NLI) called MonaLog, which is based on natural logic and the monotonicity calculus. In contrast to existing logic-based approaches, our system is intentionally designed to be as lightweight as possible, and operates using… (a toy monotonicity example appears after this list)
    • arXiv 2019
      Alon Talmor, Yanai Elazar, Yoav Goldberg, Jonathan Berant
      Recent success of pre-trained language models (LMs) has spurred widespread interest in the language capabilities that they possess. However, efforts to understand whether LM representations are useful for symbolic reasoning tasks have been limited and scattered. In this work, we propose eight…
    • arXiv 2019
      Kyle Richardson, Ashish Sabharwal
      Open-domain question answering (QA) is known to involve several underlying knowledge and reasoning challenges, but are models actually learning such knowledge when trained on benchmark tasks? To investigate this, we introduce several new challenge tasks that probe whether state-of-the-art QA models…
    • AAAI 2019
      Chaitanya Malaviya, Chandra Bhagavatula, Antoine Bosselut, Yejin Choi
      Automatic KB completion for commonsense knowledge graphs (e.g., ATOMIC and ConceptNet) poses unique challenges compared to the much-studied conventional knowledge bases (e.g., Freebase). Commonsense knowledge graphs use free-form text to represent nodes, resulting in orders of magnitude more nodes…
    • NeurIPS 2019
      Mitchell Wortsman, Ali Farhadi, Mohammad Rastegari
      The success of neural networks has driven a shift in focus from feature engineering to architecture engineering. However, successful networks today are constructed using a small and manually defined set of building blocks. Even in methods of neural architecture search (NAS) the network connectivity…
    • arXiv 2019
      Jonathan Kuck, Tri Dao, Hamid Rezatofighi, Ashish Sabharwal, Stefano Ermon
      Computing the permanent of a non-negative matrix is a core problem with practical applications ranging from target tracking to statistical thermodynamics. However, this problem is also #P-complete, which leaves little hope for finding an exact solution that can be computed efficiently. While the… (a brute-force reference implementation is sketched after this list)
    • arXiv 2019
      Erfan Sadeqi Azer, Daniel Khashabi, Ashish Sabharwal, Dan Roth
      Empirical research in Natural Language Processing (NLP) has adopted a narrow set of principles for assessing hypotheses, relying mainly on p-value computation, which suffers from several known issues. While alternative proposals have been well-debated and adopted in other fields, they remain rarely… (a paired-bootstrap sketch appears after this list)
    • EMNLP 2019
      Tushar Khot, Ashish Sabharwal, Peter Clark
      Multi-hop textual question answering requires combining information from multiple sentences. We focus on a natural setting where, unlike typical reading comprehension, only partial information is provided with each question. The model must retrieve and use additional knowledge to correctly answer…
    • EMNLP • MRQA Workshop 2019
      Matt Gardner, Jonathan Berant, Hannaneh Hajishirzi, Alon Talmor, Sewon Min
      Machine reading comprehension, the task of evaluating a machine’s ability to comprehend a passage of text, has seen a surge in popularity in recent years. There are many datasets that are targeted at reading comprehension, and many systems that perform as well as humans on some of these datasets…
    • EMNLP • MRQA Workshop 2019
      Dheeru Dua, Ananth Gottumukkala, Alon Talmor, Sameer Singh, Matt Gardner
      Reading comprehension is one of the crucial tasks for furthering research in natural language understanding. A lot of diverse reading comprehension datasets have recently been introduced to study various phenomena in natural language, ranging from simple paraphrase matching and entity typing to…
    • EMNLP • MRQA Workshop 2019
      Anthony Chen, Gabriel Stanovsky, Sameer Singh, Matt Gardner
      As the complexity of question answering (QA) datasets evolves, moving away from restricted formats like span extraction and multiple-choice (MC) to free-form answer generation, it is imperative to understand how well current metrics perform in evaluating QA. This is especially important as existing… (a token-F1 sketch appears after this list)
    • EMNLP • MRQA Workshop 2019
      Kevin Lin, Oyvind Tafjord, Peter Clark, Matt Gardner
      A key component of successfully reading a passage of text is the ability to apply knowledge gained from the passage to a new situation. In order to facilitate progress on this kind of reading, we present ROPES, a challenging benchmark for reading comprehension targeting Reasoning Over Paragraph…
    • Best Demo Paper Award
      EMNLP 2019
      Eric Wallace, Jens Tuyls, Junlin Wang, Sanjay Subramanian, Matthew Gardner, Sameer Singh
      Neural NLP models are increasingly accurate but are imperfect and opaque---they break in counterintuitive ways and leave end users puzzled at their behavior. Model interpretation methods ameliorate this opacity by providing explanations for specific model predictions. Unfortunately, existing…
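
A few toy sketches for entries above follow; all are illustrative Python, not code from the papers. First, the scene-graph entry (Raboh et al., WACV 2020) describes assigning labels to entities (nodes) and relations (edges). A minimal sketch of that representation, with invented labels and class/method names:

from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    # entity id -> entity label, e.g. 0 -> "man"
    nodes: dict = field(default_factory=dict)
    # (subject id, object id) -> relation label, e.g. (0, 1) -> "riding"
    edges: dict = field(default_factory=dict)

    def add_entity(self, eid, label):
        self.nodes[eid] = label

    def add_relation(self, subj, rel, obj):
        self.edges[(subj, obj)] = rel

    def triples(self):
        # yield (subject label, relation, object label) triples
        for (s, o), rel in self.edges.items():
            yield (self.nodes[s], rel, self.nodes[o])

g = SceneGraph()
g.add_entity(0, "man")
g.add_entity(1, "horse")
g.add_relation(0, "riding", 1)
print(list(g.triples()))  # [('man', 'riding', 'horse')]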
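
The MonaLog entry (Hu et al., SCIL 2020) builds on the monotonicity calculus. A toy illustration of the core idea, not the MonaLog system itself: under "every", the subject sits in a downward-monotone position (it can be made more specific) while the predicate sits in an upward-monotone one (it can be made more general). The two-word lexicon is invented for the example:

# Invented toy lexicon: word -> more specific / more general term.
HYPONYM  = {"dog": "poodle"}
HYPERNYM = {"dances": "moves"}

def entailments(subj, verb):
    # For "every SUBJ VERB": specializing the subject preserves
    # truth (downward monotone); generalizing the verb preserves
    # truth (upward monotone).
    out = []
    if subj in HYPONYM:
        out.append(f"every {HYPONYM[subj]} {verb}")
    if verb in HYPERNYM:
        out.append(f"every {subj} {HYPERNYM[verb]}")
    return out

print(entailments("dog", "dances"))
# ['every poodle dances', 'every dog moves']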
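
The permanent entry (Kuck et al., arXiv 2019) is about approximating the matrix permanent. For reference, its definition and a brute-force exact computation; the O(n * n!) cost of this approach is what makes approximation attractive for a #P-complete problem:

from itertools import permutations
from math import prod

def permanent(M):
    # perm(M) = sum over permutations s of prod_i M[i][s(i)]:
    # the determinant's formula without the alternating sign,
    # which is exactly what makes it #P-complete to compute.
    n = len(M)
    return sum(prod(M[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

print(permanent([[1, 2], [3, 4]]))  # 1*4 + 2*3 = 10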
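
The hypothesis-assessment entry (Sadeqi Azer et al., arXiv 2019) critiques relying on p-values alone. As one generic alternative, and not necessarily the paper's proposal, here is a paired-bootstrap confidence interval for the accuracy difference between two systems scored on the same test set; the function name and defaults are invented:

import random

def bootstrap_diff_ci(correct_a, correct_b, n_boot=10_000, alpha=0.05):
    # correct_a / correct_b: per-example 0/1 scores of two systems
    # on the same test set. Returns a (1 - alpha) confidence
    # interval for the accuracy difference A - B under paired
    # bootstrap resampling of test examples.
    n = len(correct_a)
    diffs = []
    for _ in range(n_boot):
        idx = [random.randrange(n) for _ in range(n)]
        da = sum(correct_a[i] for i in idx) / n
        db = sum(correct_b[i] for i in idx) / n
        diffs.append(da - db)
    diffs.sort()
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

lo, hi = bootstrap_diff_ci([1, 1, 0, 1, 1, 0], [1, 0, 0, 1, 0, 0])
print(f"95% CI for accuracy difference: [{lo:.2f}, {hi:.2f}]")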
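
Finally, the QA-evaluation entry (Chen et al., MRQA 2019) asks how well current metrics handle free-form answers. The most common such metric is bag-of-tokens F1, as popularized by SQuAD; a minimal version, omitting the usual lowercasing and punctuation normalization:

from collections import Counter

def token_f1(prediction, gold):
    # Bag-of-tokens F1: harmonic mean of token precision and
    # recall between the predicted and gold answer strings.
    pred, ref = prediction.split(), gold.split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(token_f1("a thief broke in", "the thief broke into the house"))  # 0.4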