Papers

  • Bursting Scientific Filter Bubbles: Boosting Innovation via Novel Author Discovery

    Jason Portenoy, Marissa Radensky, Jevin D. West, E. Horvitz, Daniel S. Weld, Tom Hope. CHI 2022. Isolated silos of scientific research and the growing challenge of information overload limit awareness across the literature and hinder innovation. Algorithmic curation and recommendation, which often prioritize relevance, can further reinforce these…
  • A Search Engine for Discovery of Scientific Challenges and Directions

    D. Lahav, Jon Saad-Falcon, Bailey Kuehl, Sophie Johnson, S. Parasa, N. Shomron, Duen Horng Chau, Diyi Yang, E. Horvitz, Daniel S. Weld, Tom Hope. AAAI 2022. Keeping track of scientific challenges, advances and emerging directions is a fundamental part of research. However, researchers face a flood of papers that hinders discovery of important knowledge. In biomedicine, this directly impacts human lives. To…
  • A Controllable Model of Grounded Response Generation

    Zeqiu Wu, Michel Galley, Chris Brockett, Yizhe Zhang, Xiang Gao, Chris Quirk, Rik Koncel-Kedziorski, Jianfeng Gao, Hannaneh Hajishirzi, Mari Ostendorf, Bill Dolan. AAAI 2022. Current end-to-end neural conversation models inherently lack the flexibility to impose semantic control in the response generation process. This control is essential to ensure that users' semantic intents are satisfied and to impose a degree of specificity…
  • Multi-Modal Answer Validation for Knowledge-Based VQA

    Jialin Wu, Jiasen Lu, Ashish Sabharwal, R. Mottaghi. AAAI 2022. The problem of knowledge-based visual question answering involves answering questions that require external knowledge in addition to the content of the image. Such knowledge typically comes in a variety of forms, including visual, textual, and commonsense…
  • SCROLLS: Standardized CompaRison Over Long Language Sequences

    Uri Shaham, Elad Segal, Maor Ivgi, Avia Efrat, Ori Yoran, Adi Haviv, Ankit Gupta, Wenhan Xiong, Mor Geva, Jonathan Berant, Omer Levy. arXiv 2022. NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild. We introduce SCROLLS, a suite of tasks that require reasoning over long texts. We…
  • DREAM: Uncovering Mental Models behind Language Models

    Yuling Gu, Bhavana Dalvi, Peter Clark. arXiv 2021. To what extent do language models build "mental models" of a scene when answering situated questions (e.g., questions about a specific ethical dilemma)? While cognitive science has shown that mental models play a fundamental role in human problem-solving, it is unclear whether the high question-answering performance of existing LMs is backed by similar model…
  • CommonsenseQA 2.0: Exposing the Limits of AI through Gamification

    Alon Talmor, Ori Yoran, Ronan Le Bras, Chandrasekhar Bhagavatula, Yoav Goldberg, Yejin Choi, Jonathan Berant. NeurIPS 2021. Constructing benchmarks that test the abilities of modern natural language understanding models is difficult – pre-trained language models exploit artifacts in benchmarks to achieve human parity, but still fail on adversarial examples and make errors…
  • FLEX: Unifying Evaluation for Few-Shot NLP

    Jonathan Bragg, Arman Cohan, Kyle Lo, Iz Beltagy. NeurIPS 2021. Few-shot NLP research is highly active, yet conducted in disjoint research threads with evaluation suites that lack challenging-yet-realistic testing setups and fail to employ careful experimental design. Consequently, the community does not know which…
  • Mauve: An Information Divergence Measure Between Neural Text and Human Text

    Krishna Pillutla, Swabha Swayamdipta, Rowan Zellers, John Thickstun, S. Welleck, Yejin Choi, Z. Harchaoui. NeurIPS 2021. As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem. We propose Mauve, a comparison measure for open-ended text generation, which directly compares a… (a brief usage sketch follows this list)
  • MERLOT: Multimodal Neural Script Knowledge Models

    Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, J. S. Park, Jize Cao, Ali Farhadi, Yejin Choi. NeurIPS 2021. As humans, we understand events in the visual world contextually, performing multimodal reasoning across time to make inferences about the past, present, and future. We introduce MERLOT, a model that learns multimodal script knowledge by watching millions of…
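
The Mauve entry above describes a comparison measure between machine and human text. Below is a minimal sketch of how such a score might be computed, assuming the authors' open-source mauve-text package (pip install mauve-text); the two text lists are toy placeholders, not data from the paper, and real comparisons need hundreds of samples per side.

    # A minimal sketch, assuming the open-source `mauve-text` package.
    # Toy placeholder inputs; real use needs large text samples per side.
    import mauve

    human_text = ["The cat sat on the mat.", "It rained all afternoon in Seattle."]
    model_text = ["A cat is sitting on the mat.", "Rain fell over Seattle for hours."]

    # compute_mauve embeds both text sets with a pretrained LM (gpt2-large by
    # default, downloaded on first use), quantizes the embeddings into a shared
    # histogram, and summarizes a divergence curve between the two induced
    # distributions as a single scalar.
    out = mauve.compute_mauve(
        p_text=human_text,
        q_text=model_text,
        num_buckets=2,  # tiny bucket count for this toy input; omit for real data
        verbose=False,
    )
    print(out.mauve)  # in (0, 1]; higher means machine text is closer to human text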