Question Understanding

The goal of this project is to develop models that understand complex questions across broad domains and answer them using multiple information sources. Our research investigates symbolic and distributed representations that support reasoning over multiple facts and offer explanations for model decisions.
About Question Understanding
  • Try the QDMR CopyNet parser | AI2 Israel, Question Understanding

    Live demo of the QDMR CopyNet parser from the paper Break It Down: A Question Understanding Benchmark (TACL 2020). The parser receives a natural language question as input and returns its Question Decomposition Meaning Representation (QDMR). Each step in the decomposition constitutes a subquestion necessary to answer the original question. More info: https://allenai.github.io/Break/

    Try the demo
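
    To make the format concrete, the Python sketch below shows the shape of a QDMR decomposition: an ordered list of natural-language steps, where "#k" refers to the answer of step k. The question, the step texts, and the class names are illustrative only and are not the parser's actual API.

        # Illustrative QDMR decomposition (not the actual parser interface).
        # Each step is a subquestion; "#k" refers to the answer of step k.
        from dataclasses import dataclass
        from typing import List

        @dataclass
        class QDMRStep:
            index: int   # 1-based step number
            text: str    # natural-language subquestion, may reference "#k"

        @dataclass
        class QDMRDecomposition:
            question: str
            steps: List[QDMRStep]

            def __str__(self) -> str:
                return "; ".join(f"{s.index}) {s.text}" for s in self.steps)

        example = QDMRDecomposition(
            question="What is the population of the capital of France?",
            steps=[
                QDMRStep(1, "return the capital of France"),
                QDMRStep(2, "return the population of #1"),
            ],
        )
        print(example)  # 1) return the capital of France; 2) return the population of #1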
    • Break It Down: A Question Understanding Benchmark

      Tomer Wolfson, Mor Geva, Ankit Gupta, Matt Gardner, Yoav Goldberg, Daniel Deutch, Jonathan Berant. TACL 2020.
      Understanding natural language questions entails the ability to break down a question into the requisite steps for computing its answer. In this work, we introduce a Question Decomposition Meaning Representation (QDMR) for questions. QDMR constitutes the ordered list of steps, expressed through natural language, that are necessary for answering a question. We develop a crowdsourcing pipeline, showing that quality QDMRs can be annotated at scale, and release the Break dataset, containing over 83K pairs of questions and their QDMRs. We demonstrate the utility of QDMR by showing that (a) it can be used to improve open-domain question answering on the HotpotQA dataset, (b) it can be deterministically converted to a pseudo-SQL formal language, which can alleviate annotation in semantic parsing applications. Last, we use Break to train a sequence-to-sequence model with copying that parses questions into QDMR structures, and show that it substantially outperforms several natural baselines.
    • Injecting Numerical Reasoning Skills into Language Models

      Mor Geva, Ankit Gupta, Jonathan Berant. ACL 2020.
      Large pre-trained language models (LMs) are known to encode substantial amounts of linguistic information. However, high-level reasoning skills, such as numerical reasoning, are difficult to learn from a language-modeling objective only. Consequently, existing models for numerical reasoning have used specialized architectures with limited flexibility. In this work, we show that numerical reasoning is amenable to automatic data generation, and thus one can inject this skill into pre-trained LMs, by generating large amounts of data, and training in a multi-task setup. We show that pre-training our model, GENBERT, on this data, dramatically improves performance on DROP (49.3 → 72.3 F1), reaching performance that matches state-of-the-art models of comparable size, while using a simple and general-purpose encoder-decoder architecture. Moreover, GENBERT generalizes well to math word problem datasets, while maintaining high performance on standard RC tasks. Our approach provides a general recipe for injecting skills into large pre-trained LMs, whenever the skill is amenable to automatic data augmentation.
      (A toy sketch of this kind of template-based data generation appears after the publication list below.)
    • Obtaining Faithful Interpretations from Compositional Neural Networks

      Sanjay Subramanian, Ben Bogin, Nitish Gupta, Tomer Wolfson, Sameer Singh, Jonathan Berant, Matt Gardner. ACL 2020.
      Neural module networks (NMNs) are a popular approach for modeling compositionality: they achieve high accuracy when applied to problems in language and vision, while reflecting the compositional structure of the problem in the network architecture. However, prior work implicitly assumed that the structure of the network modules, describing the abstract reasoning process, provides a faithful explanation of the model's reasoning; that is, that all modules perform their intended behaviour. In this work, we propose and conduct a systematic evaluation of the intermediate outputs of NMNs on NLVR2 and DROP, two datasets which require composing multiple reasoning steps. We find that the intermediate outputs differ from the expected output, illustrating that the network structure does not provide a faithful explanation of model behaviour. To remedy that, we train the model with auxiliary supervision and propose particular choices for module architecture that yield much better faithfulness, at a minimal cost to accuracy.
    • CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge

      Alon Talmor, Jonathan Herzig, Nicholas Lourie, Jonathan Berant. NAACL 2019.
      When answering a question, people often draw upon their rich world knowledge in addition to the particular context. Recent work has focused primarily on answering questions given some relevant document or context, and required very little general background. To investigate question answering with prior knowledge, we present COMMONSENSEQA: a challenging new dataset for commonsense question answering. To capture common sense beyond associations, we extract from CONCEPTNET (Speer et al., 2017) multiple target concepts that have the same semantic relation to a single source concept. Crowd-workers are asked to author multiple-choice questions that mention the source concept and discriminate in turn between each of the target concepts. This encourages workers to create questions with complex semantics that often require prior knowledge. We create 12,247 questions through this procedure and demonstrate the difficulty of our task with a large number of strong baselines. Our best baseline is based on BERT-large (Devlin et al., 2018) and obtains 56% accuracy, well below human performance, which is 89%.
    • The Web as a Knowledge-base for Answering Complex Questions

      Alon Talmor, Jonathan Berant. NAACL 2018.
      Answering complex questions is a time-consuming activity for humans that requires reasoning and integration of information. Recent work on reading comprehension made headway in answering simple questions, but tackling complex questions is still an ongoing research challenge. Conversely, semantic parsers have been successful at handling compositionality, but only when the information resides in a target knowledge-base. In this paper, we present a novel framework for answering broad and complex questions, assuming answering simple questions is possible using a search engine and a reading comprehension model. We propose to decompose complex questions into a sequence of simple questions, and compute the final answer from the sequence of answers. To illustrate the viability of our approach, we create a new dataset of complex questions, ComplexWebQuestions, and present a model that decomposes questions and interacts with the web to compute an answer. We empirically demonstrate that question decomposition improves performance from 20.8 precision@1 to 27.5 precision@1 on this new dataset.
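
    Following up on the Injecting Numerical Reasoning Skills paper above, the sketch below illustrates the general idea of generating synthetic numeric training data from templates. The templates, field names, and sampling choices are invented for illustration and are not the generation procedure actually used for GENBERT.

        # Toy template-based generation of numeric QA pairs, in the spirit of
        # GENBERT's automatic data generation (illustrative only; the paper's
        # actual generation procedure is much richer).
        import random

        TEMPLATES = [
            ("What is {a} plus {b}?",  lambda a, b: a + b),
            ("What is {a} minus {b}?", lambda a, b: a - b),
            ("What is {a} times {b}?", lambda a, b: a * b),
        ]

        def generate_examples(n, seed=0):
            rng = random.Random(seed)
            for _ in range(n):
                template, op = rng.choice(TEMPLATES)
                a, b = rng.randint(0, 999), rng.randint(0, 999)
                yield {"question": template.format(a=a, b=b), "answer": str(op(a, b))}

        for ex in generate_examples(3):
            print(ex["question"], "->", ex["answer"])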

    StrategyQA

    2,780 implicit multi-hop reasoning questions

    StrategyQA is a question-answering benchmark focusing on open-domain questions where the required reasoning steps are implicit in the question and should be inferred using a strategy. StrategyQA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs.
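
    For illustration, a single StrategyQA-style record might look like the Python snippet below. The field names are illustrative rather than the exact schema of the released files; the question and decomposition follow the paper's titular "Did Aristotle use a laptop?" example.

        # One StrategyQA-style record (field names are illustrative, not the
        # exact schema of the released files).
        record = {
            "question": "Did Aristotle use a laptop?",
            "answer": False,
            "decomposition": [
                "When did Aristotle live?",
                "When was the laptop invented?",
                "Is #2 before #1?",
            ],
            # Each decomposition step is supported by evidence paragraphs (omitted here).
            "evidence": ["..."],
        }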

    Break

    83,978 examples sampled from 10 question answering datasets over text, images and databases.

    Break is a human annotated dataset of natural language questions and their Question Decomposition Meaning Representations (QDMRs). Break consists of 83,978 examples sampled from 10 question answering datasets over text, images and databases.

    CommonsenseQA

    12,102 multiple-choice questions with one correct answer and four distractor answers

    CommonsenseQA is a new multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answers. It contains 12,102 questions with one correct answer and four distractor answers.
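
    For illustration, each record in this format pairs a question with five answer choices and a single answer key, as in the Python snippet below; the question text and field names are invented for illustration and are not taken from the dataset.

        # Illustrative CommonsenseQA-style record: one correct answer and four
        # distractors (question text and field names invented for illustration).
        record = {
            "question": "Where would you most likely keep fresh milk?",
            "choices": {
                "A": "refrigerator",
                "B": "cupboard",
                "C": "garage",
                "D": "beach",
                "E": "oven",
            },
            "answerKey": "A",
        }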

    ComplexWebQuestions

    34,689 complex questions with their answers, web snippets, and SPARQL queries

    ComplexWebQuestions is a dataset for answering complex questions that require reasoning over multiple web snippets. It contains a large set of complex questions in natural language, and can be used in multiple ways: 1) By interacting with a search engine, which is the focus of our paper (Talmor and Berant, 2018); 2) As a reading comprehension task: we release 12,725,989 web snippets that are relevant for the questions, and were collected during the development of our model; 3) As a semantic parsing task: each question is paired with a SPARQL query that can be executed against Freebase to retrieve the answer.
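
    The snippet below sketches the shape of a single record under this setup; the field names, example question, and placeholder SPARQL are illustrative only and do not reproduce the released format.

        # Illustrative ComplexWebQuestions-style record (field names, question,
        # and SPARQL are placeholders, not the released format).
        record = {
            "question": "What movies were directed by the director of Inception?",
            # Each question is paired with a SPARQL query executable against Freebase.
            "sparql": "SELECT ?x WHERE { ... }",
            # Web snippets relevant to the question (12,725,989 released in total).
            "web_snippets": [{"title": "...", "snippet": "..."}],
            "answers": ["..."],
        }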

    Team

    AI2 Israel Members

    • Jonathan Berant, Research

    Interns

    • Mor Geva, Intern
    • Tomer Wolfson, Intern