Aristo

Building the next generation of systems that can systematically reason, explain, and continually improve over time


Figure: an entailment tree generated from a hypothesis and supporting text.
Our research includes pioneering work on:
  • Systematic reasoning and explanation
  • Teachable reasoning systems
  • Continual learning with memory-based architectures
  • Knowledge and belief
  • Universal mathematical reasoning

Research Areas

Teachable Reasoning Systems

By interacting with and giving feedback on a system's reasoning, a user can teach the system so that it continually improves over time, without any model retraining.
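As a rough illustration of this idea, the sketch below stores user corrections in a persistent memory and recalls them for similar future questions, while the underlying model stays frozen. This is a minimal sketch in the spirit of the TeachMe and MemPrompt work listed below; the names (`TeachableQA`, `answer_fn`) and the similarity-based recall are illustrative assumptions, not Aristo's actual implementation.

```python
# Minimal sketch of a memory-based feedback loop (in the spirit of
# TeachMe / MemPrompt). All names here are illustrative assumptions,
# not Aristo's actual implementation.
from difflib import SequenceMatcher

class TeachableQA:
    def __init__(self, answer_fn):
        self.answer_fn = answer_fn      # frozen model: prompt -> answer
        self.memory = []                # list of (question, user_feedback)

    def _recall(self, question, threshold=0.8):
        """Return stored feedback for the most similar past question, if any."""
        best = max(
            self.memory,
            key=lambda qf: SequenceMatcher(None, qf[0], question).ratio(),
            default=None,
        )
        if best and SequenceMatcher(None, best[0], question).ratio() >= threshold:
            return best[1]
        return None

    def answer(self, question):
        feedback = self._recall(question)
        # Feedback is prepended to the prompt; the model itself is never retrained.
        prompt = f"Hint: {feedback}\n{question}" if feedback else question
        return self.answer_fn(prompt)

    def teach(self, question, feedback):
        """The user corrects the system; the correction persists in memory."""
        self.memory.append((question, feedback))
```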

Modular Models

By learning to chain together existing models, a system can solve complex problems that are beyond the capabilities of the individual components.
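As a toy illustration of such chaining (in the spirit of the Decomposed Prompting work listed below), the sketch chains a question decomposer with a single-hop QA component. Both components are hard-coded stand-ins for separately trained models; the decomposition format with "#1" placeholders is an assumption for the example.

```python
# Minimal sketch of chaining specialized components to solve a problem
# neither handles alone. The two functions below are hard-coded stand-ins
# for separately trained models.
def decompose(question):
    # Assumed sub-model: splits a complex question into ordered sub-questions,
    # where "#1" refers to the answer of the first sub-question.
    return ["Who wrote 'The Hobbit'?",
            "In what year was #1 born?"]

def single_hop_qa(question):
    # Assumed sub-model: answers one simple question.
    facts = {"Who wrote 'The Hobbit'?": "J. R. R. Tolkien",
             "In what year was J. R. R. Tolkien born?": "1892"}
    return facts.get(question, "unknown")

def solve(question):
    answers = []
    for sub_q in decompose(question):
        # Substitute earlier answers into later sub-questions.
        for i, prev in enumerate(answers, start=1):
            sub_q = sub_q.replace(f"#{i}", prev)
        answers.append(single_hop_qa(sub_q))
    return answers[-1]

print(solve("In what year was the author of 'The Hobbit' born?"))  # 1892
```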

Universal Mathematical Reasoners

Creating models with built-in mathematical reasoning skills that can be rapidly fine-tuned for a wide variety of mathematical tasks.

  • Macaw
    A QA model that outperforms other popular language models while being an order of magnitude smaller

    Macaw is a high-performance question-answering (QA) model capable of outperforming other popular current language models, all while being an order of magnitude smaller. This demo allows you to explore Macaw's answers and compare them to those of the popular GPT-3 language model on a benchmark set of questions. (See the usage sketch following this demo list.)

    Try the demo
  • ProofWriter
    Generating Implications, Proofs, and Abductive Statements over Natural Language

    Like RuleTaker, ProofWriter determines whether statements are True or False based on rules given in natural language, and it also generates the proofs of its answers. (See the symbolic sketch following this demo list.)

    Try the demo
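Macaw checkpoints are published on the Hugging Face hub (e.g. allenai/macaw-large), so the demo's behavior can also be reproduced locally. Below is a minimal usage sketch with the transformers library; Macaw is a T5-based model whose input string declares which output "slot" to generate.

```python
# Minimal usage sketch for Macaw via the Hugging Face transformers library.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("allenai/macaw-large")
model = AutoModelForSeq2SeqLM.from_pretrained("allenai/macaw-large")

# Ask for the $answer$ slot given a $question$ slot.
input_ids = tokenizer.encode(
    "$answer$ ; $question$ = What is the color of a cloudy sky?",
    return_tensors="pt",
)
output = model.generate(input_ids, max_length=200)
print(tokenizer.batch_decode(output, skip_special_tokens=True))
# Expected output along the lines of: ['$answer$ = gray']
```

ProofWriter itself is a generative transformer, but the task it performs has a simple symbolic reading: forward-chain over the given rules and record how each new fact is derived, so that a proof can be read back. The sketch below implements only that reading, to make the proof-generation task concrete; it is not the model's mechanism.

```python
# Symbolic sketch of the ProofWriter *task*: forward-chain over rules,
# recording how each new fact was derived so a proof can be read back.
def forward_chain(facts, rules):
    """facts: set of strings; rules: list of (premises, conclusion)."""
    proofs = {f: "given" for f in facts}
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in proofs and all(p in proofs for p in premises):
                proofs[conclusion] = f"from {', '.join(premises)}"
                changed = True
    return proofs

facts = {"Erin is big", "Erin is red"}
rules = [(["Erin is big", "Erin is red"], "Erin is kind"),
         (["Erin is kind"], "Erin is nice")]
for fact, why in forward_chain(facts, rules).items():
    print(f"{fact}  [{why}]")
# "Erin is nice" is derived with a two-step proof chain.
```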
    • Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback

      Yao Fu, Hao-Chun Peng, Tushar Khot, Mirella Lapata. arXiv.org, 2023.
      We study whether multiple large language models (LLMs) can autonomously improve each other in a negotiation game by playing, reflecting, and criticizing. We are interested in this question because if LLMs were able to improve each other, it would imply the…
    • Complexity-Based Prompting for Multi-Step Reasoning

      Yao Fu, Hao-Chun Peng, Ashish Sabharwal, Peter Clark, Tushar Khot. ICLR, 2023.
      We study the task of prompting large-scale language models to perform multi-step reasoning. Existing work shows that when prompted with a chain of thoughts (CoT), sequences of short sentences describing intermediate reasoning steps towards a final answer… (See the sketch following this publication list.)
    • Decomposed Prompting: A Modular Approach for Solving Complex Tasks

      Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal. ICLR, 2023.
      Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn…
    • Transformers Can Be Expressed In First-Order Logic with Majority

      William Merrill, Ashish Sabharwal. arXiv, 2023.
      Characterizing the implicit structure of the computation within neural networks is a foundational problem in the area of deep learning interpretability. Can the inner decision process of neural networks be captured symbolically in some familiar logic? We show…
    • Do language models have coherent mental models of everyday things?

      Yuling Gu, Bhavana Dalvi Mishra, Peter Clark. arXiv, 2022.
      When people think of everyday things like an “egg,” they typically have a mental image associated with it. This commonsense knowledge helps us understand how these everyday things work and how to interact with them. For example, when someone tries to make a…
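As referenced in the Complexity-Based Prompting entry above, the core idea of that paper can be sketched in a few lines: prefer chain-of-thought exemplars and sampled reasoning chains with more steps, and majority-vote over the most complex chains. A minimal sketch follows; `llm_sample` is an assumed stand-in for sampling a (chain, answer) pair from a language model, and counting newline-separated steps is an assumed proxy for complexity.

```python
# Sketch of complexity-based prompting: prefer exemplars and sampled
# reasoning chains with MORE steps, then majority-vote over the most
# complex chains. `llm_sample` is an assumed stand-in for an LLM call.
from collections import Counter

def count_steps(chain):
    # Proxy for reasoning complexity: number of newline-separated steps.
    return chain.count("\n") + 1

def select_exemplars(candidates, k=3):
    """Pick the k most complex (question, chain, answer) exemplars for the prompt."""
    return sorted(candidates, key=lambda ex: count_steps(ex[1]), reverse=True)[:k]

def complexity_weighted_answer(question, llm_sample, n=20, top=10):
    """Sample n chains, keep the `top` most complex, majority-vote their answers."""
    samples = [llm_sample(question) for _ in range(n)]  # each: (chain, answer)
    samples.sort(key=lambda s: count_steps(s[0]), reverse=True)
    votes = Counter(answer for _, answer in samples[:top])
    return votes.most_common(1)[0][0]
```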

    Belief and Reasoning Dataset

    BaRDa: A Belief and Reasoning Dataset that Separates Factual Accuracy and Reasoning Ability

    BaRDa is a new belief and reasoning dataset for evaluating the factual correctness ("truth") and reasoning accuracy ("rationality", or "honesty") of new language models. It was created in collaboration with, and with the support of, the Open Philanthropy organization.

    Lila

    A math reasoning benchmark of over 140K natural language questions annotated with Python programs

    A comprehensive benchmark for mathematical reasoning with over 140K natural language questions annotated with Python programs and natural language instructions. The data set comes with multiple splits: Lila-IID (train, dev, test), Lila-OOD (train, dev, test), and Lila-Robust.
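Because each Lila question carries an executable program, an annotation can be checked by simply running it. The sketch below shows that pattern; the record fields ("question", "program", "answer") and the `solution()` convention are illustrative assumptions about the layout, not the dataset's exact schema.

```python
# Sketch of checking a Lila-style annotation: the annotated Python program
# is executed and its result compared against the gold answer. Field names
# are illustrative assumptions, not the dataset's exact schema.
example = {
    "question": "A pet store has 6 cages with 4 birds in each. How many birds are there?",
    "program": "def solution():\n    return 6 * 4\n",
    "answer": "24",
}

namespace = {}
exec(example["program"], namespace)      # define solution() from the annotation
predicted = namespace["solution"]()
assert str(predicted) == example["answer"], (predicted, example["answer"])
print(predicted)  # 24
```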

    Entailer

    Data for "Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning", EMNLP 2022

    Data for "Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning", EMNLP 2022

    TeachMe

    Supplementary data for "Towards Teachable Reasoning Systems: Using a Dynamic Memory ...", EMNLP 2022

    Supplementary data for "Towards Teachable Reasoning Systems: Using a Dynamic Memory ...", EMNLP 2022

    “Knowing is not enough, we must apply. Willing is not enough, we must do.”
    Johann Wolfgang von Goethe

    This AI Paper Shows How ChatGPT’s Toxicity Can Increase Up To Six-Fold When Assigned A Persona

    Marktechpost
    April 14, 2023
    Read the Article

    'They’re All So Dirty and Smelly:' Study Unlocks ChatGPT's Inner Racist

    Gizmodo
    April 13, 2023
    Read the Article

    ChatGPT can turn toxic just by changing its assigned persona, researchers say

    VentureBeat
    April 12, 2023
    Read the Article

    Researchers discover a way to make ChatGPT consistently toxic

    TechCrunch
    April 12, 2023
    Read the Article

    Researchers From Allen Institute for AI Introduce TeachMe: A Framework To Understand And Correct AI Models

    Marktechpost
    January 17, 2023
    Read the Article

    Allen Institute for Artificial Intelligence Introduces MemPrompt: A New Method to “fix” GPT-3 After Deployment with User Interaction

    Marktechpost
    December 18, 2022
    Read the Article

    Researchers at Allen Institute for AI Built a System Called DREAM-FLUTE to Explore Machine Learning ‘Mental Models’ for Figurative Language

    Marktechpost
    December 1, 2022
    Read the Article

    Researchers at the Allen Institute for AI Propose Līla, a Unified Benchmark for Comprehensive Evaluation of the Mathematical Reasoning Abilities of Artificial Intelligence Systems

    Marktechpost
    November 14, 2022
    Read the Article

    Team

    • Peter Clark, Interim Chief Executive Officer
    • Bhavana Dalvi, Research
    • Matt Finlayson, Predoctoral Young Investigator
    • Yuling Gu, Predoctoral Young Investigator
    • Ashwin Kalyan, Research
    • Tushar Khot, Research
    • Kyle Richardson, Research
    • Ashish Sabharwal, Research
    • Oyvind Tafjord, Research
    • Niket Tandon, Research
    • Sarah Wiegreffe, Young Investigator