Viewing 10 papers from 2018 in PRIOR
    • CVPR 2018 (Video)
      Daniel Gordon, Aniruddha Kembhavi, Mohammad Rastegari, Joseph Redmon, Dieter Fox, Ali Farhadi
      We introduce Interactive Question Answering (IQA), the task of answering questions that require an autonomous agent to interact with a dynamic visual environment. IQA presents the agent with a scene and a question, like: “Are there any apples in the fridge?” The agent must navigate around the scene…
    • CVPR 2018
      Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi
      A number of studies have found that today’s Visual Question Answering (VQA) models are heavily driven by superficial correlations in the training data and lack sufficient image grounding. To encourage development of models geared towards the latter, we propose a new setting for VQA where for every…
    • CVPR 2018
      Gunnar Sigurdsson, Cordelia Schmid, Ali Farhadi, Abhinav Gupta, Karteek Alahari
      Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer knowledge between third-person (observer) and first-person (actor). Despite this, learning such…
    • ECCV 2018
      Sachin Mehta, Mohammad Rastegari, Anat Caspi, Linda Shapiro, and Hannaneh Hajishirzi
      We introduce a fast and efficient convolutional neural network, ESPNet, for semantic segmentation of high resolution images under resource constraints. ESPNet is based on a new convolutional module, efficient spatial pyramid (ESP), which is efficient in terms of computation, memory, and power… A hedged sketch of an ESP-style block follows this entry.
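
The following PyTorch code is an illustrative reconstruction of an ESP-style block, not the authors' implementation; names such as ESPBlock, num_branches, and the specific dilation schedule are assumptions. It shows the general recipe the abstract alludes to: a point-wise convolution reduces channels, parallel dilated 3x3 convolutions form the spatial pyramid, and the branch outputs are fused hierarchically before concatenation.

```python
# A minimal, hedged sketch of an ESP-style ("efficient spatial pyramid") block.
# Module and parameter names are illustrative, not taken from the authors' code.
import torch
import torch.nn as nn


class ESPBlock(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, num_branches: int = 4):
        super().__init__()
        assert out_channels % num_branches == 0, "out_channels must split evenly across branches"
        branch_channels = out_channels // num_branches

        # Reduce: a point-wise convolution shrinks the channel dimension (cheap).
        self.reduce = nn.Conv2d(in_channels, branch_channels, kernel_size=1, bias=False)

        # Split/transform: parallel 3x3 convolutions with growing dilation rates
        # enlarge the effective receptive field without adding parameters.
        self.branches = nn.ModuleList([
            nn.Conv2d(branch_channels, branch_channels, kernel_size=3,
                      padding=2 ** k, dilation=2 ** k, bias=False)
            for k in range(num_branches)
        ])
        self.bn_act = nn.Sequential(nn.BatchNorm2d(out_channels), nn.PReLU(out_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        reduced = self.reduce(x)
        outputs = [branch(reduced) for branch in self.branches]

        # Hierarchical feature fusion: add each branch to the running sum before
        # concatenating, which smooths gridding artifacts from dilated convolutions.
        fused = [outputs[0]]
        for out in outputs[1:]:
            fused.append(fused[-1] + out)
        return self.bn_act(torch.cat(fused, dim=1))


if __name__ == "__main__":
    block = ESPBlock(in_channels=64, out_channels=128, num_branches=4)
    print(block(torch.randn(1, 64, 56, 56)).shape)  # -> torch.Size([1, 128, 56, 56])
```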
    • ECCV 2018
      Krishna Kumar Singh, Santosh Kumar Divvala, Ali Farhadi, and Yong Jae Lee
      We propose the idea of transferring common-sense knowledge from source categories to target categories for scalable object detection. In our setting, the training data for the source categories have bounding box annotations, while those for the target categories only have image-level annotations…
    • ECCV 2018 (Video)
      Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, and Aniruddha Kembhavi
      Imagining a scene described in natural language with realistic layout and appearance of entities is the ultimate test of spatial, visual, and semantic world knowledge. Towards this goal, we present the Composition, Retrieval and Fusion Network (Craft), a model capable of learning this knowledge…
    • CVPR 2018
      Jonghyun Choi, Jayant Krishnamurthy, Aniruddha Kembhavi, Ali Farhadi
      Diagrams often depict complex phenomena and serve as a good test bed for visual and textual reasoning. However, understanding diagrams using natural image understanding approaches requires large training datasets of diagrams, which are very hard to obtain. Instead, this can be addressed as a…
    • CVPR 2018
      Kiana Ehsani, Hessam Bagherinezhad, Joe Redmon, Roozbeh Mottaghi, Ali Farhadi
      We study the task of directly modelling a visually intelligent agent. Computer vision typically focuses on solving various subtasks related to visual intelligence. We depart from this standard approach to computer vision; instead we directly model a visually intelligent agent. Our model takes…
    • CVPR 2018
      Kiana Ehsani, Roozbeh Mottaghi, Ali Farhadi
      Objects often occlude each other in scenes; inferring their appearance beyond their visible parts plays an important role in scene understanding, depth estimation, object interaction and manipulation. In this paper, we study the challenging problem of completing the appearance of occluded objects…
    • CVPR 2018
      Rowan Zellers, Mark Yatskar, Sam Thomson, Yejin Choi
      We investigate the problem of producing structured graph representations of visual scenes. Our work analyzes the role of motifs: regularly appearing substructures in scene graphs. We present new quantitative insights on such repeated structures in the Visual Genome dataset. Our analysis shows that… A toy illustration of counting such repeated substructures follows this entry.
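
As a toy illustration of what "regularly appearing substructures" means in practice, one can count how often the same (subject, predicate, object) label combination recurs across scene graphs. The triples below are invented for illustration and do not come from the Visual Genome analysis in the paper.

```python
# Toy motif counting over made-up scene graphs; frequent label-level
# (subject, predicate, object) combinations play the role of "motifs".
from collections import Counter

# Each scene graph is a list of (subject, predicate, object) triples.
scene_graphs = [
    [("man", "wearing", "shirt"), ("man", "riding", "horse"), ("horse", "has", "tail")],
    [("woman", "wearing", "shirt"), ("dog", "has", "tail"), ("man", "wearing", "hat")],
    [("man", "wearing", "shirt"), ("horse", "has", "tail"), ("man", "holding", "phone")],
]

# Count how often each label combination appears across all graphs.
motif_counts = Counter(triple for graph in scene_graphs for triple in graph)

for motif, count in motif_counts.most_common(3):
    print(f"{motif}: appears {count} times")
# ('man', 'wearing', 'shirt'): appears 2 times
# ('horse', 'has', 'tail'): appears 2 times
# ...
```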