Aristo

Building the next generation of systems that can systematically reason, explain, and continually improve over time

Our research includes pioneering work on:
  • Systematic reasoning and explanation
  • Teachable reasoning systems
  • Continual learning with memory-based architectures
  • Knowledge and belief
  • Universal mathematical reasoning

Research Areas

Teachable Reasoning Systems

By interacting with a system's reasoning and giving feedback on it, a user can teach the system so that it continually improves over time, without model retraining.


Neuro-Symbolic Reasoning and Explanation

Solving problems by generating consistent, faithful chains of reasoning using neural components.


Modular Models

By learning to chain together existing models, a system can solve complex problems that are beyond the capabilities of the individual components.


Universal Mathematical Reasoners

Creating models with built-in mathematical reasoning skills that can be rapidly fine-tuned for a wide variety of mathematical tasks.


  • Macaw
    A QA model that outperforms other popular language models while being an order of magnitude smaller

    Macaw is a high-performance question-answering (QA) model capable of outperforming other popular current language models, all while being an order of magnitude smaller. This demo allows you to explore Macaw's answers and compare them to those of the popular GPT-3 language model on a benchmark set of questions.

    Try the demo
  • ProofWriter
    Generating Implications, Proofs, and Abductive Statements over Natural Language

    Like RuleTaker, ProofWriter determines whether statements are True or False based on rules given in natural language, but it also generates proofs for its answers.

    Try the demo
    • DREAM: Improving Situational QA by First Elaborating the Situation

      Yuling Gu, Bhavana Dalvi Mishra, Peter Clark. NAACL 2022.
      When people answer questions about a specific situation, e.g., "I cheated on my mid-term exam last week. Was that wrong?", cognitive science suggests that they form a mental picture of that situation before answering. While we do not know how language models…
    • Cross-Task Generalization via Natural Language Crowdsourcing Instructions

      Swaroop Mishra, Daniel Khashabi, Chitta Baral, Hanna Hajishirzi. ACL 2022.
      Can we enable NLP models to appropriately respond to instructional prompts and consequently generalize to new tasks? To study this question, we leverage the existing NLP datasets and the instructions that were used to crowdsource them to create…
    • Hey AI, Can You Solve Complex Tasks by Talking to Agents?

      Tushar Khot, Kyle Richardson, Daniel Khashabi, Ashish Sabharwal. Findings of ACL 2022.
      Humans often solve complex problems by interacting (in natural language) with existing agents, such as AI assistants, that can solve simpler sub-tasks. These agents themselves can be powerful systems built using extensive resources and privately held data. In…
    • Saturated Transformers are Constant-Depth Threshold Circuits

      William Cooper Merrill, Ashish Sabharwal, Noah A. Smith. TACL 2022.
      Transformers have become a standard neural network architecture for many NLP problems, motivating theoretical analysis of their power in terms of formal languages. Recent work has shown that transformers with *hard* attention are quite limited in power, as…
    • What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment

      Matthew Finlayson, Kyle Richardson, Ashish Sabharwal, Peter Clark. arXiv 2022.
      The instruction learning paradigm—where a model learns to perform new tasks from task descriptions alone—has become popular in general-purpose model research. The capabilities of large transformer models as instruction learners, however, remain poorly under…

    MuSiQue: Multihop Questions via Single-hop Question Composition

    MuSiQue is a multihop reading comprehension dataset with 2-4 hop questions, built by composing seed questions from 5 existing single-hop datasets. The dataset is constructed bottom-up: it systematically selects composable pairs of single-hop questions that are connected, i.e., where one reasoning step requires information from the other. This approach gives greater control over the properties of the resulting k-hop questions, making the dataset substantially less cheatable (e.g., by shortcut-based or single-hop reasoning) and more challenging than prior similar datasets.

    MuSiQue comes in two variants: MuSiQue-Answerable, which contains only answerable questions, and MuSiQue-Full, which contains both answerable and unanswerable questions. In MuSiQue-Full, each answerable question from MuSiQue-Answerable is paired with a closely similar unanswerable question. In MuSiQue-Answerable, the task is to identify the answer and the supporting paragraphs, given a question and a context of up to 20 paragraphs. In MuSiQue-Full, the task is first to determine whether the question is answerable from the given context and, if it is, to identify the answer and the supporting paragraphs.
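The core idea of bottom-up composition, keeping only pairs of single-hop questions where the second hop depends on the first, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the MuSiQue construction pipeline: the `SingleHop`/`TwoHop` types, the `compose` and `composable_pairs` helpers, and the connectedness test (the second question must mention the first question's answer) are all hypothetical, and the bracketed rewrite stands in for the fluent paraphrase a real dataset would need.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SingleHop:
    question: str
    answer: str


@dataclass(frozen=True)
class TwoHop:
    question: str
    answer: str
    steps: Tuple[SingleHop, SingleHop]


def compose(first: SingleHop, second: SingleHop) -> Optional[TwoHop]:
    """Return a 2-hop question if `second` depends on `first`, else None."""
    # Hypothetical connectedness test: the second question must mention the
    # first question's answer, so answering it requires the first hop.
    if not first.answer or first.answer not in second.question:
        return None
    # Replace the bridging entity with a reference to the first question.
    # A real pipeline would produce a fluent paraphrase instead.
    bridge = "[" + first.question.rstrip("?") + "]"
    bridged = second.question.replace(first.answer, bridge, 1)
    # The composed question inherits the answer of the final hop.
    return TwoHop(bridged, second.answer, (first, second))


def composable_pairs(pool: List[SingleHop]) -> List[TwoHop]:
    """Bottom-up selection: keep only connected (first, second) pairs."""
    out: List[TwoHop] = []
    for a in pool:
        for b in pool:
            if a is not b:
                composed = compose(a, b)
                if composed is not None:
                    out.append(composed)
    return out
```

For a pool containing "Who directed Jaws?" (answer "Steven Spielberg") and "Where was Steven Spielberg born?" (answer "Cincinnati"), only one ordering is connected, and it yields "Where was [Who directed Jaws] born?" with answer "Cincinnati". Longer k-hop questions would come from composing repeatedly.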

    The Fermi Challenge

    A challenge dataset of Fermi (estimation) problems, currently beyond the capabilities of modern methods.

    BeliefBank

    A dataset of 4,998 simple facts and 12,147 constraints for testing, and improving, a model's accuracy and consistency.
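To make "consistency" concrete: a constraint links two facts about the same entity (for example, something that is a dog should also be a mammal), and a model is inconsistent when its answers violate such a link. The sketch below shows one simple way to count violations. The encoding (beliefs keyed by `(entity, property)`, constraints as `(p, q, v)` triples) and the `violations`/`consistency` helpers are hypothetical choices for illustration, not the released BeliefBank file format or the paper's exact metric.

```python
from typing import Dict, List, Tuple

# Hypothetical encodings (not the released BeliefBank format):
# a belief maps (entity, property) -> the model's True/False answer, and a
# constraint (p, q, v) reads "if X has p, then 'X has q' should be v".
Beliefs = Dict[Tuple[str, str], bool]
Constraint = Tuple[str, str, bool]


def violations(beliefs: Beliefs, constraints: List[Constraint]) -> List[Tuple[str, Constraint]]:
    """List every (entity, constraint) pair that the model's beliefs contradict."""
    entities = sorted({e for e, _ in beliefs})
    found: List[Tuple[str, Constraint]] = []
    for entity in entities:
        for p, q, v in constraints:
            antecedent = beliefs.get((entity, p))
            consequent = beliefs.get((entity, q))
            # A constraint applies only when its antecedent is believed true
            # and the model has expressed some belief about the consequent.
            if antecedent is True and consequent is not None and consequent != v:
                found.append((entity, (p, q, v)))
    return found


def consistency(beliefs: Beliefs, constraints: List[Constraint]) -> float:
    """Fraction of applicable (entity, constraint) pairs that are satisfied."""
    applicable = 0
    for entity in {e for e, _ in beliefs}:
        for p, q, _ in constraints:
            if beliefs.get((entity, p)) is True and (entity, q) in beliefs:
                applicable += 1
    if applicable == 0:
        return 1.0  # nothing to violate
    return 1.0 - len(violations(beliefs, constraints)) / applicable
```

With beliefs that a poodle is a dog, is not a mammal, and is not a bird, and the constraints "dog implies mammal" and "dog implies not bird", exactly one of the two applicable constraints is violated, so `consistency` returns 0.5.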

    EntailmentBank

    2k multi-step entailment trees explaining the answers to ARC science questions.
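An entailment tree has the hypothesis (the question plus its answer, stated as a sentence) at the root, known facts at the leaves, and intermediate conclusions in between, each entailed by its children taken together. The minimal sketch below represents such a tree and reads off its reasoning steps bottom-up. The `Node` type, the `steps`/`leaves` helpers, and the example sentences are hypothetical and for illustration only; they are not taken from EntailmentBank or its file format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A sentence in an entailment tree. Leaves have no premises; an inner
    node is a conclusion entailed by all of its premises together."""
    text: str
    premises: List["Node"] = field(default_factory=list)


def steps(node: Node) -> List[str]:
    """Return the tree's entailment steps in bottom-up (post-order) order."""
    out: List[str] = []
    for premise in node.premises:
        out.extend(steps(premise))
    if node.premises:
        out.append(" & ".join(p.text for p in node.premises) + " -> " + node.text)
    return out


def leaves(node: Node) -> List[str]:
    """Return the facts the explanation ultimately rests on, left to right."""
    if not node.premises:
        return [node.text]
    return [text for premise in node.premises for text in leaves(premise)]


a = Node("an iron nail is made of iron")
b = Node("iron is a metal")
c = Node("metals conduct electricity")
mid = Node("an iron nail is made of metal", [a, b])
root = Node("an iron nail conducts electricity", [mid, c])

for step in steps(root):
    print(step)
```

Running it prints two steps: first "an iron nail is made of iron & iron is a metal -> an iron nail is made of metal", then "an iron nail is made of metal & metals conduct electricity -> an iron nail conducts electricity". Reading steps in post-order guarantees every premise is established before it is used.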

    “Knowing is not enough, we must apply. Willing is not enough, we must do.”
    Johann Wolfgang von Goethe

    Paul Allen's 'Digital Aristotle' sets eyes on accomplishing practical tasks

    KOMO News
    February 5, 2020

    Perceptron: AI bias can arise from annotation instructions

    TechCrunch
    May 8, 2022

    Is AI2’s Macaw better than GPT-3?

    Analytics India Magazine
    January 28, 2022

    AI2 shows off an open, Q&A-focused rival to GPT3

    TechCrunch
    January 24, 2022

    AI models are becoming better at answering questions, but they’re not perfect

    VentureBeat
    January 21, 2022

    AI2 releases demo of question-answering model it claims outperforms GPT-3

    GeekWire
    January 21, 2022

    Multimodal models are fast becoming a reality — consequences be damned

    VentureBeat
    December 21, 2021

    Allen Institute launches GENIE, a leaderboard for human-in-the-loop language model benchmarking

    VentureBeat
    January 20, 2021

    Team

    • Peter Clark, Research
    • Bhavana Dalvi, Research
    • Matt Finlayson, Predoctoral Young Investigator
    • Yuling Gu, Predoctoral Young Investigator
    • Ashwin Kalyan, Research
    • Tushar Khot, Research
    • Kyle Richardson, Research
    • Ashish Sabharwal, Research
    • Carissa Schoenick, Product
    • Oyvind Tafjord, Research
    • Niket Tandon, Research