ARISTO

Build machines that read, learn and reason.

The Aristo Project aims to build systems that demonstrate a deep understanding of the world, integrating technologies for reading, learning, reasoning, and explanation.

Our research integrates multiple AI technologies, including:
  • Natural language processing
  • Information extraction
  • Knowledge representation
  • Machine reasoning
  • Commonsense knowledge

Research Areas

Probing Reasoning with Language Models

Language models (LMs) have dominated much of AI recently. But what kind(s) of reasoning are they capable of? And how can they be taught to do more? We are developing analytical datasets to probe LMs and help answer these questions.

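As a concrete illustration of this kind of probing, the sketch below feeds a few hand-written probe questions to a small text-to-text QA model and inspects its answers. The model ID (allenai/unifiedqa-t5-small) and the probe items are our own choices for illustration, not details taken from this page.

```python
# A minimal probing sketch: pose targeted reasoning questions to a seq2seq LM
# and read off its answers. Model and probes are illustrative assumptions.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL = "allenai/unifiedqa-t5-small"  # assumption: any text-to-text QA model would do
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

# Each probe isolates one kind of reasoning (taxonomic, hypothetical, ...).
probes = [
    "Is a sparrow a bird?",
    "If all metals conduct electricity and copper is a metal, does copper conduct electricity?",
    "Can a fish live out of water?",
]

for question in probes:
    inputs = tokenizer(question, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=16)
    print(question, "->", tokenizer.decode(output_ids[0], skip_special_tokens=True))
```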

Multihop Reasoning

Many questions require multiple pieces of information to be combined to arrive at an answer. We are developing new multihop models capable of identifying and combining relevant facts to answer such questions.

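The toy sketch below illustrates the multihop pattern itself: no single fact answers "What country is Seattle in?", so two facts must be chained. The fact store and the chaining heuristic are invented for illustration and are not Aristo's actual models.

```python
# Toy two-hop reasoning: answer a question by chaining facts, none of which
# answers the question on its own. Entirely illustrative.
facts = {
    "Seattle": "Seattle is in Washington.",
    "Washington": "Washington is in the United States.",
}

def collect_chain(entity):
    """Follow "X is in Y" facts from an entity, gathering the supporting chain."""
    chain = []
    while entity in facts:
        fact = facts[entity]
        chain.append(fact)
        # Hop to the entity named at the end of the fact ("... is in Y.").
        entity = fact.rsplit(" in ", 1)[1].rstrip(".")
    return chain

# "What country is Seattle in?" requires combining both facts.
print(collect_chain("Seattle"))
# ['Seattle is in Washington.', 'Washington is in the United States.']
```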

Explanation

An intelligent system should not only answer questions correctly, but also be able to explain why its answers are correct. Such a capability is essential for the practical acceptance of AI technology. It is also essential for the broader goals of communicating knowledge to a user and accepting corrections from the user when the system's answer is wrong.

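One simple way to make such explanations concrete is to represent them as entailment trees, where the answer sits at the root and each node follows from the statements beneath it. The structure below is a toy illustration we invented; the EntailmentBank dataset listed further down this page contains the real thing.

```python
# Toy entailment-tree explanation: the conclusion at the root is entailed by
# the statements beneath it. Content and structure are illustrative only.
explanation = {
    "conclusion": "The northern hemisphere has longer days in summer.",
    "entailed_by": [
        {"conclusion": "In summer the northern hemisphere is tilted toward the sun.",
         "entailed_by": []},
        {"conclusion": "A hemisphere tilted toward the sun receives more hours of daylight.",
         "entailed_by": []},
    ],
}

def print_tree(node, depth=0):
    """Print the explanation with one level of indentation per reasoning step."""
    print("  " * depth + node["conclusion"])
    for child in node["entailed_by"]:
        print_tree(child, depth + 1)

print_tree(explanation)
```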

Reasoning about Actions

A key aspect of intelligence is being able to reason about the dynamics of the world. This requires modeling what state the world might be in, and how different actions might affect that state. Such capabilities are essential for understanding what happens during a procedure or process, for planning, and for reasoning about "what if..." scenarios.

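A minimal way to picture this is as state tracking: start from an initial world state, apply each action's effects in turn, and answer "what if..." questions by replaying a modified sequence. The state representation and procedure below are toy assumptions, not Aristo's actual models.

```python
# Toy sketch of reasoning about actions: simulate how a procedure changes
# a simple attribute-value world state. Entirely illustrative.
from copy import deepcopy

def apply_action(state, action):
    """Return the new world state after applying one action's effects."""
    new_state = deepcopy(state)
    new_state.update(action["effects"])
    return new_state

state = {"water": "in pot", "temperature": "cold"}
procedure = [
    {"name": "place pot on stove", "effects": {"pot": "on stove"}},
    {"name": "turn on burner",     "effects": {"temperature": "hot"}},
    {"name": "wait",               "effects": {"water": "boiling"}},
]

# A "what if" question (e.g., skip "turn on burner") can be answered by
# replaying a modified sequence from the same initial state.
for step in procedure:
    state = apply_action(state, step)
    print(step["name"], "->", state)
```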

  • Macaw
    A QA model that outperforms other popular language models while being an order of magnitude smaller

    Macaw is a high-performance question-answering (QA) model that outperforms other popular current language models while being an order of magnitude smaller. This demo lets you explore Macaw's answers and compare them to those of the popular GPT-3 language model on a benchmark set of questions. A minimal usage sketch follows the list of recent papers below.

    Try the demo
  • ProofWriter
    Generating Implications, Proofs, and Abductive Statements over Natural Language

    Like RuleTaker, ProofWriter determines whether statements are True or False based on rules given in natural language, but it also generates the proofs behind its answers.

    Try the demo
    • Multi-Modal Answer Validation for Knowledge-Based VQA

      Jialin Wu, Jiasen Lu, Ashish Sabharwal, R. Mottaghi. AAAI 2022. The problem of knowledge-based visual question answering involves answering questions that require external knowledge in addition to the content of the image. Such knowledge typically comes in a variety of forms, including visual, textual, and commonsense…
    • DREAM: Uncovering Mental Models behind Language Models

      Yuling Gu, Bhavana Dalvi, Peter Clark. arXiv 2021. …(e.g., questions about a specific ethical dilemma)? While cognitive science has shown that mental models play a fundamental role in human problem-solving, it is unclear whether the high question-answering performance of existing LMs is backed by similar model…
    • Dyna-bAbI: unlocking bAbI’s potential with dynamic synthetic benchmarking

      Ronen Tamari, Kyle Richardson, Aviad Sar-Shalom, Noam Kahlon, Nelson H S Liu, Reut Tsarfaty, Dafna Shahaf. arXiv 2021. While neural language models often perform surprisingly well on natural language understanding (NLU) tasks, their strengths and limitations remain poorly understood. Controlled synthetic tasks are thus an increasingly important resource for diagnosing model…
    • BeliefBank: Adding Memory to a Pre-Trained Language Model for a Systematic Notion of Belief

      Nora Kassner, Oyvind Tafjord, H. Schütze, P. Clark. EMNLP 2021. Although pretrained language models (PTLMs) have been shown to contain significant amounts of world knowledge, they can still produce inconsistent answers to questions when probed, even after using specialized training techniques to reduce inconsistency. As a…
    • Explaining Answers with Entailment Trees

      Bhavana Dalvi, Peter A. Jansen, Oyvind Tafjord, Zhengnan Xie, Hannah Smith, Leighanna Pipatanangkura, Peter Clark. EMNLP 2021. Our goal, in the context of open-domain textual question-answering (QA), is to explain answers by not just listing supporting textual evidence (“rationales”), but also showing how such evidence leads to the answer in a systematic way. If this could be done…
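
The Macaw sketch referenced above shows one way to query the model through the Hugging Face transformers library. The model ID (allenai/macaw-large, a smaller sibling of the 11B model behind the demo) and the "$answer$ ; $question$ = ..." slot format follow our reading of the public Macaw release; treat both as assumptions rather than details confirmed on this page.

```python
# Hedged sketch of querying a Macaw checkpoint with Hugging Face transformers.
# Model ID and input slot format are assumptions based on the public release.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL = "allenai/macaw-large"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

# Macaw-style prompt: request the "answer" slot, supply the "question" slot.
prompt = "$answer$ ; $question$ = What gas do plants absorb from the air?"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
# Expected output shape: "$answer$ = carbon dioxide" (exact text may vary)
```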

    The Fermi Challenge

    A challenge dataset of Fermi (estimation) problems, currently beyond the capabilities of modern methods.

    BeliefBank

    Dataset of 4,998 simple facts and 12,147 constraints to test, and improve, a model's accuracy and consistency

    EntailmentBank

    2k multi-step entailment trees, explaining the answers to ARC science questions

    StrategyQA

    2,780 implicit multi-hop reasoning questions

    StrategyQA is a question-answering benchmark focusing on open-domain questions where the required reasoning steps are implicit in the question and should be inferred using a strategy. StrategyQA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs.
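
To make the description above concrete, here is the rough shape of a single StrategyQA-style example as a Python dictionary. The field names and evidence text are our own illustration (the question is the often-cited example from the StrategyQA paper), not the dataset's exact schema.

```python
# Rough shape of one StrategyQA-style item: an implicit multi-hop question,
# its decomposition into explicit sub-questions, and supporting evidence.
# Field names are assumptions for illustration, not the official schema.
example = {
    "question": "Did Aristotle use a laptop?",
    "answer": False,
    "decomposition": [
        "When did Aristotle live?",
        "When was the laptop invented?",
        "Was the laptop invented before the end of Aristotle's life?",
    ],
    "evidence_paragraphs": [
        "Aristotle was a Greek philosopher who lived from 384 to 322 BC.",
        "The first commercially available laptop computers appeared in the early 1980s.",
    ],
}
```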

    “Knowing is not enough, we must apply. Willing is not enough, we must do.”
    Johann Wolfgang von Goethe

    Paul Allen's 'Digital Aristotle' sets eyes on accomplishing practical tasks

    KOMO News
    February 5, 2020
    Read the Article

    AI2 shows off an open, Q&A-focused rival to GPT3

    TechCrunch
    January 24, 2022
    Read the Article

    AI models are becoming better at answering questions, but they’re not perfect

    VentureBeat
    January 21, 2022
    Read the Article

    AI2 releases demo of question-answering model it claims outperforms GPT-3

    GeekWire
    January 21, 2022
    Read the Article

    Multimodal models are fast becoming a reality — consequences be damned

    VentureBeat
    December 21, 2021
    Read the Article

    Allen Institute launches GENIE, a leaderboard for human-in-the-loop language model benchmarking

    VentureBeat
    January 20, 2021
    Read the Article

    An Artificial Intelligence System Passed an Eighth-Grade Science Test with High Distinction (translated from Hebrew)

    Haaretz
    September 6, 2019
    Read the Article

    Allen Institute's Aristo AI makes breakthrough, passes eighth-grade science test

    TechSpot
    September 5, 2019
    Read the Article

    Team

    • Peter Clark, Research
    • Bhavana Dalvi, Research
    • Matt Finlayson, Predoctoral Young Investigator
    • Ashwin Kalyan, Research
    • Tushar Khot, Research
    • Kyle Richardson, Research
    • Ashish Sabharwal, Research
    • Carissa Schoenick, Product
    • Oyvind Tafjord, Research
    • Niket Tandon, Research