All Projects
All Years
Viewing 21-30 of 503 papers
  • ManipulaTHOR: A Framework for Visual Object Manipulation

    Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, R. Mottaghi. arXiv 2021
    The domain of Embodied AI has recently witnessed substantial progress, particularly in navigating agents within their environments. These early successes have laid the building blocks for the community to tackle tasks that require agents to actively interact with objects in their environment. Object manipulation is an established research domain within the robotics community and poses several challenges including manipulator motion, grasping and long-horizon planning, particularly when dealing with oft-overlooked practical setups involving visually rich and complex scenes, manipulation using mobile agents (as opposed to tabletop manipulation), and generalization to unseen environments and objects. We propose a framework for object manipulation built upon the physics-enabled, visually rich AI2-THOR framework and present a new challenge to the Embodied AI community known as ArmPointNav. This task extends the popular point navigation task [2] to object manipulation and offers new challenges including 3D obstacle avoidance, manipulating objects in the presence of occlusion, and multi-object manipulation that necessitates long-term planning. Popular learning paradigms that are successful on PointNav challenges show promise, but leave substantial room for improvement.
  • Bootstrapping Relation Extractors using Syntactic Search by Examples

    Matan Eyal, Asaf Amrami, Hillel Taub-Tabib, Yoav Goldberg. EACL 2021
    The advent of neural networks in NLP brought with it substantial improvements in supervised relation extraction. However, obtaining a sufficient quantity of training data remains a key challenge. In this work we propose a process for bootstrapping training datasets which can be performed quickly by non-NLP-experts. We take advantage of search engines over syntactic graphs (such as Shlain et al. (2020)) which expose a friendly by-example syntax. We use these to obtain positive examples by searching for sentences that are syntactically similar to user input examples. We apply this technique to relations from TACRED and DocRED and show that the resulting models are competitive with models trained on manually annotated data and on data obtained from distant supervision. The models also outperform models trained using NLG data augmentation techniques. Extending the search-based approach with the NLG method further improves the results.
  • First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT

    Benjamin Muller, Yanai Elazar, Benoît Sagot, Djamé Seddah. EACL 2021
    Multilingual pretrained language models have demonstrated remarkable zero-shot cross-lingual transfer capabilities. Such transfer emerges by fine-tuning on a task of interest in one language and evaluating on a distinct language, not seen during the fine-tuning. Despite promising results, we still lack a proper understanding of the source of this transfer. Using a novel layer ablation technique and analyses of the model’s internal representations, we show that multilingual BERT, a popular multilingual language model, can be viewed as the stacking of two sub-networks: a multilingual encoder followed by a task-specific language-agnostic predictor. While the encoder is crucial for cross-lingual transfer and remains mostly unchanged during fine-tuning, the task predictor has little importance on the transfer and can be reinitialized during fine-tuning. We present extensive experiments with three distinct tasks, seventeen typologically diverse languages and multiple domains to support our hypothesis.
  • BERTese: Learning to Speak to BERT

    Adi Haviv, Jonathan Berant, A. Globerson. EACL 2021
    Large pre-trained language models have been shown to encode large amounts of world and commonsense knowledge in their parameters, leading to substantial interest in methods for extracting that knowledge. In past work, knowledge was extracted by taking manually authored queries and gathering paraphrases for them using a separate pipeline. In this work, we propose a method for automatically rewriting queries into “BERTese”, a paraphrase query that is directly optimized towards better knowledge extraction. To encourage meaningful rewrites, we add auxiliary loss functions that encourage the query to correspond to actual language tokens. We empirically show our approach outperforms competing baselines, obviating the need for complex pipelines. Moreover, BERTese provides some insight into the type of language that helps language models perform knowledge extraction.
  • Discourse Understanding and Factual Consistency in Abstractive Summarization

    Saadia Gabriel, Antoine Bosselut, Jeff Da, Ari Holtzman, Jan Buys, Kyle Lo, Asli Celikyilmaz, Yejin Choi. EACL 2021
    We introduce Cooperative Generator-Discriminator Networks (Co-opNet), a general framework for abstractive summarization with distinct modeling of the narrative flow in the output summary. Most current approaches to abstractive summarization, in contrast, are based on datasets whose target summaries are either a single sentence, or a bag of standalone sentences (e.g., extracted highlights of a story), neither of which allows for learning coherent narrative flow in the output summaries. To promote research toward abstractive summarization with narrative flow, we first introduce a new dataset, Scientific Abstract SummarieS (SASS), where the abstracts are used as proxy gold summaries for scientific articles. We then propose Co-opNet, a novel transformer-based framework where the generator works with the discourse discriminator to compose a long-form summary. Empirical results demonstrate that Co-opNet learns to summarize with considerably improved global coherence compared to competitive baselines.
  • Evaluating the Evaluation of Diversity in Natural Language Generation

    Guy Tevet, Jonathan Berant. EACL 2021
    Despite growing interest in natural language generation (NLG) models that produce diverse outputs, there is currently no principled method for evaluating the diversity of an NLG system. In this work, we propose a framework for evaluating diversity metrics. The framework measures the correlation between a proposed diversity metric and a diversity parameter, a single parameter that controls some aspect of diversity in generated text. For example, a diversity parameter might be a binary variable used to instruct crowdsourcing workers to generate text with either low or high content diversity. We demonstrate the utility of our framework by: (a) establishing best practices for eliciting diversity judgments from humans, (b) showing that humans substantially outperform automatic metrics in estimating content diversity, and (c) demonstrating that existing methods for controlling diversity by tuning a "decoding parameter" mostly affect form but not meaning. Our framework can advance the understanding of different diversity metrics, an essential step on the road towards better NLG systems.
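The core measurement described in this abstract, the correlation between a diversity metric and a diversity parameter, can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the choice of distinct-n as the metric, Spearman rank correlation as the statistic, the temperature-like parameter values, and the toy response sets are all assumptions made for this example.

```python
def distinct_n(texts, n=1):
    """Fraction of unique n-grams among all n-grams in a set of texts."""
    ngrams = []
    for text in texts:
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical response sets, keyed by a temperature-like diversity parameter.
samples = {
    0.2: ["the cat sat", "the cat sat", "the cat sat"],
    0.6: ["the cat sat", "the cat slept", "the cat sat"],
    1.0: ["the cat sat", "a dog ran", "birds fly south"],
}

parameters = sorted(samples)
metric_values = [distinct_n(samples[p]) for p in parameters]
rho = spearman(parameters, metric_values)
print(f"Spearman correlation between parameter and distinct-1: {rho:.2f}")
```

Under this reading, a metric that tracks the diversity parameter closely scores near 1, while a metric that ignores it scores near 0.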
  • GooAQ: Open Question Answering with Diverse Answer Types

    Daniel Khashabi, Amos Ng, Tushar Khot, Ashish Sabharwal, Hanna Hajishirzi, Chris Callison-Burch. arXiv 2021
    While day-to-day questions come with a variety of answer types, the current question-answering (QA) literature has failed to adequately address the answer diversity of questions. To this end, we present GOOAQ, a large-scale dataset with a variety of answer types. This dataset contains over 5 million questions and 3 million answers collected from Google. GOOAQ questions are collected semi-automatically from the Google search engine using its autocomplete feature. This results in naturalistic questions of practical interest that are nonetheless short and expressed using simple language. GOOAQ answers are mined from Google’s responses to our collected questions, specifically from the answer boxes in the search results. This yields a rich space of answer types, containing both textual answers (short and long) as well as more structured ones such as collections. We benchmark T5 models on GOOAQ and observe that: (a) in line with recent work, LMs’ strong performance on GOOAQ’s short-answer questions benefits heavily from annotated data; however, (b) their quality in generating coherent and accurate responses for questions requiring long responses (such as ‘how’ and ‘why’ questions) is less reliant on observing annotated data and mainly supported by their pre-training. We release GOOAQ to facilitate further research on improving QA with diverse response types.
  • Natural Instructions: Benchmarking Generalization to New Tasks from Natural Language Instructions

    Swaroop Mishra, Daniel Khashabi, Chitta Baral, Hanna Hajishirzi. arXiv 2021
    Can we enable NLP models to appropriately respond to instructional prompts and consequently generalize to new tasks? To study this question, we leverage the existing NLP datasets and the instructions that were used to crowdsource them to create NATURAL INSTRUCTIONS, a dataset of instructions and task-specific input/output data. This dataset consists of 61 distinct language instructions and about 600k task instances, and is used to evaluate existing state-of-the-art language models (LMs) in addressing new tasks by few-shot prompting of GPT-3 and fine-tuning BART. Our analysis indicates that: (a) the existing models indeed benefit from instructions and hence show improved generalization to new tasks; (b) while models like GPT-3 generally benefit from instructions, the extent of their gains varies across different fields of instructions and also depends on the task being solved; (c) generalization to unseen tasks in NATURAL INSTRUCTIONS remains far from perfect for the state-of-the-art, indicating significant room for more progress in this direction.
  • Enriching a Model's Notion of Belief using a Persistent Memory

    Nora Kassner, Oyvind Tafjord, H. Schutze, P. Clark. arXiv 2021
    Although pretrained language models (PTLMs) have been shown to contain significant amounts of world knowledge, they can still produce inconsistent answers to questions when probed, even after using specialized training techniques to reduce inconsistency. As a result, it can be hard to identify what the model actually "believes" about the world. Our goal is to reduce this problem, so systems are more globally consistent and accurate in their answers. Our approach is to add a memory component (a BeliefBank) that records a model’s answers, and two mechanisms that use it to improve consistency among beliefs. First, a reasoning component (a weighted SAT solver) improves consistency by flipping answers that significantly clash with others. Second, a feedback component re-queries the model but using known beliefs as context. We show that, in a controlled experimental setting, these two mechanisms improve both accuracy and consistency. This is significant as it is a first step towards endowing models with an evolving memory, allowing them to construct a more coherent picture of the world.
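The idea of flipping answers that clash with others can be illustrated with a toy weighted-consistency search. This is only a sketch under stated assumptions, not the paper's solver: it swaps in an exhaustive search over truth assignments for a real weighted SAT solver, and the belief names, confidences, constraint format, and weights are all hypothetical.

```python
from itertools import product


def repair_beliefs(beliefs, constraints):
    """Find the truth assignment with the lowest total penalty.

    beliefs:     name -> (recorded answer, confidence); flipping costs `confidence`.
    constraints: (premise, conclusion, required_value, weight) tuples meaning
                 "if premise is True, conclusion should equal required_value";
                 violating one costs `weight`.
    Exhaustive search, so this is only practical for a handful of beliefs.
    """
    names = sorted(beliefs)
    best_assignment, best_cost = None, float("inf")
    for values in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        # Penalty for overriding the model's recorded answers.
        cost = sum(conf for name, (answer, conf) in beliefs.items()
                   if assignment[name] != answer)
        # Penalty for each violated constraint.
        cost += sum(weight for premise, conclusion, required, weight in constraints
                    if assignment[premise] and assignment[conclusion] != required)
        if cost < best_cost:
            best_assignment, best_cost = assignment, cost
    return best_assignment, best_cost


# Hypothetical recorded answers about one entity, with confidences.
beliefs = {
    "is_bird": (True, 0.9),
    "is_mammal": (True, 0.4),
    "can_fly": (True, 0.8),
}
# Hypothetical mutual-exclusion constraints between two beliefs.
constraints = [
    ("is_bird", "is_mammal", False, 1.0),
    ("is_mammal", "is_bird", False, 1.0),
]

repaired, cost = repair_beliefs(beliefs, constraints)
print(repaired, cost)
```

In this toy setting the low-confidence belief is the one that gets flipped, because overriding it is cheaper than violating either constraint or overriding a high-confidence answer.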
  • MS2: Multi-Document Summarization of Medical Studies

    Jay DeYoung, Iz Beltagy, Madeleine van Zuylen, Bailey Kuehl, Lucy Lu Wang. arXiv 2021
    To assess the effectiveness of any medical intervention, researchers must conduct a time-intensive and highly manual literature review. NLP systems can help to automate or assist in parts of this expensive process. In support of this goal, we release MS^2 (Multi-Document Summarization of Medical Studies), a dataset of over 470K documents and 20K summaries derived from the scientific literature. This dataset facilitates the development of systems that can assess and aggregate contradictory evidence across multiple studies, and is the first large-scale, publicly available multi-document summarization dataset in the biomedical domain. We experiment with a summarization system based on BART, with promising early results. We formulate our summarization inputs and targets in both free text and structured forms and modify a recently proposed metric to assess the quality of our system’s generated summaries. Data and models are available at