Allen Institute for AI

ARISTO

Build machines that read, learn and reason.

The Aristo Project aims to build systems that demonstrate a deep understanding of the world, integrating technologies for reading, learning, reasoning, and explanation.

Our research integrates multiple AI technologies, including:
  • Natural language processing
  • Information extraction
  • Knowledge representation
  • Machine reasoning
  • Commonsense knowledge

Research Areas

Probing Reasoning with Language Models

Language models (LMs) have dominated much of AI recently. But what kind(s) of reasoning are they capable of? And how can they be taught to do more? We are developing analytical datasets to probe LMs and help answer these questions.

Multihop Reasoning

Many questions require multiple pieces of information to be combined to arrive at an answer. We are developing new multihop models capable of identifying and combining relevant facts to answer such questions.
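
As a rough illustration of what "identifying and combining relevant facts" can mean, the sketch below chains two facts so that the object of the first is the subject of the second. It is a toy under stated assumptions, not one of the project's models: the fact triples and the `two_hop_chains` and `answer` helpers are invented for this example, and real systems retrieve facts from large text corpora rather than from a hand-written list.

```python
from itertools import permutations

# Toy fact store of (subject, relation, object) triples, invented for illustration.
FACTS = [
    ("differential heating of air", "causes", "wind"),
    ("wind", "is used for", "producing electricity"),
    ("sunlight", "is used for", "photosynthesis"),
]


def two_hop_chains(facts, start, end):
    """Return ordered fact pairs (f1, f2) where f1 begins at `start`,
    f2 finishes at `end`, and f1's object is f2's subject."""
    chains = []
    for f1, f2 in permutations(facts, 2):
        if f1[0] == start and f1[2] == f2[0] and f2[2] == end:
            chains.append((f1, f2))
    return chains


def answer(facts, start, choices):
    """Pick the first answer choice reachable from `start` in exactly two hops,
    together with the fact chain that supports it."""
    for choice in choices:
        chains = two_hop_chains(facts, start, choice)
        if chains:
            return choice, chains[0]
    return None, None


choice, chain = answer(
    FACTS,
    "differential heating of air",
    ["photosynthesis", "producing electricity"],
)
print(choice)
for fact in chain:
    print("  because:", " ".join(fact))
```

Neither fact alone links the question to the selected choice; only their composition does, which is what makes the question multihop. The returned chain also doubles as a simple justification for the answer.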

Explanation

An intelligent system should not only answer questions correctly, but also be able to explain why its answers are correct. Such a capability is essential for practical acceptance of AI technology. It is also essential for the broader goals of communicating knowledge to a user, and receiving correction from the user when the system's answer is wrong.

Reasoning about Actions

A key aspect of intelligence is being able to reason about the dynamics of the world. This requires modeling what state the world might be in, and how different actions might affect that state. Such capabilities are essential for understanding what happens during a procedure or process, for planning, and for reasoning about "what if..." scenarios.
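
A minimal sketch of this idea, assuming a world state that simply maps each entity to a location and actions that move an entity from one location to another. The `Move` class and the `apply_action` and `simulate` helpers are hypothetical names chosen for this example and do not come from any Aristo system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    """A hypothetical action: `entity` moves from `source` to `destination`."""
    entity: str
    source: str
    destination: str


def apply_action(state, action):
    """Return the state after `action`. If the entity is not at the action's
    source location (precondition fails), the state is returned unchanged."""
    new_state = dict(state)
    if state.get(action.entity) == action.source:
        new_state[action.entity] = action.destination
    return new_state


def simulate(initial_state, actions):
    """Return the list of states visited, starting with the initial state."""
    states = [dict(initial_state)]
    for action in actions:
        states.append(apply_action(states[-1], action))
    return states


initial = {"water": "soil"}
steps = [Move("water", "soil", "root"), Move("water", "root", "leaf")]

# Normal run: water travels from the soil to the root to the leaf.
print([s["water"] for s in simulate(initial, steps)])

# "What if" the first step never happens? The second step's precondition fails.
print([s["water"] for s in simulate(initial, steps[1:])])
```

Because each action checks a precondition against the current state, dropping an early step changes what later steps can do, which is the kind of dependency that "what if..." questions probe.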

  • Transformers as Soft Reasoners over Language | Aristo

    RuleTaker determines whether statements are True or False based on rules given in natural language.

  • Crossing format boundaries with a single QA system | Aristo

    UnifiedQA is a single pre-trained QA model that performs surprisingly well across 17 QA datasets spanning 4 diverse formats. Fine-tuning UnifiedQA into specialized models results in a new state-of-the-art on 6 datasets, establishing this model as a strong starting point for building QA systems.

    • Everything Happens for a Reason: Discovering the Purpose of Actions in Procedural Text

      Bhavana Dalvi Mishra, Niket Tandon, Antoine Bosselut, Wen-tau Yih, Peter Clark. EMNLP 2019.
      Our goal is to better comprehend procedural text, e.g., a paragraph about photosynthesis, by not only predicting what happens, but why some actions need to happen before others. Our approach builds on a prior process comprehension framework for predicting actions' effects, to also identify…
    • QASC: A Dataset for Question Answering via Sentence Composition

      Tushar Khot, Peter Clark, Michal Guerquin, Paul Edward Jansen, Ashish Sabharwal. AAAI 2020.
      Composing knowledge from multiple pieces of texts is a key challenge in multi-hop question answering. We present a multi-hop reasoning dataset, Question Answering via Sentence Composition (QASC), that requires retrieving facts from a large corpus and composing them to answer a multiple-choice…
    • Probing Natural Language Inference Models through Semantic Fragments

      Kyle Richardson, Hai Na Hu, Lawrence S. Moss, Ashish Sabharwal. AAAI 2020.
      Do state-of-the-art models for language understanding already have, or can they easily learn, abilities such as boolean coordination, quantification, conditionals, comparatives, and monotonicity reasoning (i.e., reasoning about word substitutions in sentential contexts)? While such phenomena are…
    • Modular Representation Underlies Systematic Generalization in Neural Natural Language Inference Models

      Atticus Geiger, Kyle Richardson, Christopher Potts. BlackboxNLP 2020.
      In adversarial (challenge) testing, we pose hard generalization tasks in order to gain insights into the solutions found by our models. What properties must a system have in order to succeed at these hard tasks? In this paper, we argue that an essential factor is the ability to form modular…
    • A Simple Yet Strong Pipeline for HotpotQA

      Dirk Groeneveld, Tushar Khot, Mausam, Ashish Sabharwal. EMNLP 2020.
      State-of-the-art models for multi-hop question answering typically augment large-scale language models like BERT with additional, intuitively useful capabilities such as named entity recognition, graph-based reasoning, and question decomposition. However, does their strong performance on popular…

    QuaRTz Dataset

    3864 questions about open domain qualitative relationships

    QuaRTz is a crowdsourced dataset of 3864 multiple-choice questions about open domain qualitative relationships. Each question is paired with one of 405 different background sentences (sometimes short paragraphs).

    QuaRel Dataset

    2771 story questions about qualitative relationships

    QuaRel is a crowdsourced dataset of 2771 multiple-choice story questions, including their logical forms.

    eQASC: Multihop Explanations for QASC

    98k annotated explanations for the QASC dataset

    This dataset contains 98k 2-hop explanations for questions in the QASC dataset, with annotations indicating if they are valid (~25k) or invalid (~73k) explanations.

    hasPart KB

    A high-quality KB of hasPart relations

    A high-quality knowledge base of ~50k hasPart relationships, extracted from a large corpus of generic statements.

    “Knowing is not enough, we must apply. Willing is not enough, we must do.”
    Johann Wolfgang von Goethe

    Paul Allen's 'Digital Aristotle' sets eyes on accomplishing practical tasks

    KOMO News
    February 5, 2020

    מערכת בינה מלאכותית עברה בהצטיינות יתרה מבחן במדעים של כיתה ח' (An artificial intelligence system passed an eighth-grade science test with high honors)

    Haaretz
    September 6, 2019

    Allen Institute's Aristo AI makes breakthrough, passes eighth-grade science test

    TechSpot
    September 5, 2019

    Allen Institute’s Aristo AI system finally passes an eighth-grade science test

    GeekWire
    September 4, 2019

    How to tutor AI from an ‘F’ to an ‘A’

    Vulcan Inc
    September 4, 2019

    A Breakthrough for A.I. Technology: Passing an 8th-Grade Science Test

    The New York Times
    September 4, 2019

    AI assistants say dumb things, and we’re about to find out why

    MIT Tech Review
    March 14, 2018

    Moving Beyond the Turing Test with the Allen AI Science Challenge

    CACM
    September 4, 2017

    Team

    • Peter Clark, Research
    • Sumithra Bhakthavatsalam, Engineering
    • Bhavana Dalvi, Research
    • Michal Guerquin, Engineering
    • Daniel Khashabi, Young Investigator
    • Tushar Khot, Research
    • Kyle Richardson, Research
    • Ashish Sabharwal, Research
    • Carissa Schoenick, Product
    • Oyvind Tafjord, Research
    • Niket Tandon, Research