Project Plato is focused on extracting visual knowledge from images, diagrams, and videos to enrich knowledge bases that are conventionally derived from textual resources.

Visual Knowledge Extraction and Reasoning

Is it possible to automatically acquire, learn, and represent knowledge from visual data? How can the complementary sources of knowledge derived from visual and textual resources be integrated? How do we go beyond standard visual recognition and incorporate real-world knowledge? Plato focuses on studying such fundamental research challenges.

Key project elements:
  • Learn, represent, and use visual knowledge to go beyond standard image classification and object recognition.
  • Reduce the need for explicit human supervision when building large-scale knowledge acquisition and reasoning systems.
  • Acquire visual common sense to enable open-domain question answering and reasoning.
  • Augment existing textual knowledge bases by integrating them with visual knowledge.

Example Projects


The Bi-directional Attention Flow (BiDAF) network is a multi-stage hierarchical process that represents context at different levels of granularity and uses a bi-directional attention flow mechanism to achieve a query-aware context representation without early summarization.



imSitu is a dataset supporting situation recognition, the problem of producing a concise summary of the situation an image depicts.


XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks

This project aims to improve the efficiency of convolutional neural networks by using binary-precision operations.
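
To illustrate why binary precision can be cheap, the sketch below computes the dot product of two sign vectors using only XNOR and a bit count. It is a minimal illustration, not the project's implementation: the function names (pack_signs, binary_dot) and the choice to pack signs into a Python integer are assumptions made for this example, and the scaling factors that a full binary network would apply are left out.

```python
def pack_signs(values):
    """Pack the signs of a sequence into an int: bit i is 1 when values[i] >= 0.

    Hypothetical helper for this example: each real value is reduced to
    +1 (bit set) or -1 (bit clear).
    """
    bits = 0
    for i, v in enumerate(values):
        if v >= 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors stored as packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 wherever the signs match
    matches = bin(agree).count("1")     # popcount of matching positions
    # Each match contributes +1 and each mismatch -1 to the dot product.
    return 2 * matches - n


def sign_dot_reference(a, b):
    """Plain reference: multiply the signs element by element and sum."""
    sign = lambda v: 1 if v >= 0 else -1
    return sum(sign(x) * sign(y) for x, y in zip(a, b))


if __name__ == "__main__":
    a = [0.5, -1.2, 3.0, -0.1]
    b = [-0.7, -0.4, 2.0, 0.9]
    fast = binary_dot(pack_signs(a), pack_signs(b), len(a))
    print(fast, sign_dot_reference(a, b))
```

The bitwise version replaces n multiplications and additions with one XOR, one negation, and one bit count, which is the kind of saving that motivates binary-precision convolutions.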


Learning to Predict the Effect of Forces

This project aims to predict the movements of objects by estimating scene geometry and the underlying physics.


Diagram Understanding

This project aims to parse diagrams and answer the corresponding questions.



Charades is a dataset that guides our research into unstructured video activity recognition and commonsense reasoning for daily human activities.


Newtonian Image Understanding

This project aims to understand the physics of a scene and the dynamics of objects in images.



VisKE is a VISual Knowledge Extraction and question answering system built using the idea of scalable visual verification of relation phrases.



LEVAN is a fully-automated visual concept learning program that automatically learns everything there is to know about any visual concept.



  • Jonghyun Choi
  • Santosh Divvala
  • Ali Farhadi
  • Abhinav Gupta
  • Winson Han
  • Ani Kembhavi
  • Eric Kolve
  • Roozbeh Mottaghi
  • Mohammad Rastegari
  • Dustin Schwenk
  • Mark Yatskar

“Vision without execution is hallucination.”
—  Thomas Edison
