Viewing 57 papers in PRIOR
    • ICCV 2019
      Tianlu Wang, Jieyu Zhao, Mark Yatskar, Kai-Wei Chang, Vicente Ordonez
      In this work, we present a framework to measure and mitigate intrinsic biases with respect to protected variables, such as gender, in visual recognition tasks. We show that trained models significantly amplify the association of target labels with gender beyond what one would expect from biased…  (More)
    • CVPR 2019
      Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi
      Visual Question Answering (VQA) in its ideal form lets us study reasoning in the joint space of vision and language and serves as a proxy for the AI task of scene understanding. However, most VQA benchmarks to date are focused on questions such as simple counting, visual attributes, and object…  (More)
    • CVPR 2019
      Sachin Mehta, Mohammad Rastegari, Linda Shapiro, Hannaneh Hajishirzi
      We introduce a light-weight, power efficient, and general purpose convolutional neural network, ESPNetv2, for modeling visual and sequential data. Our network uses group point-wise and depth-wise dilated separable convolutions to learn representations from a large effective receptive field with…  (More)
    • CVPR 2019
      Mohammad Mahdi Derakhshani, Saeed Masoudnia, Amir Hossein Shaker, Omid Mersa, Mohammad Amin Sadeghi, Mohammad Rastegari, Babak N. Araabi
      We present a simple and effective learning technique that significantly improves mAP of YOLO object detectors without compromising their speed. During network training, we carefully feed in localization information. We excite certain activations in order to help the network learn to better localize…  (More)
    • CVPR 2019
      Unnat Jain, Luca Weihs, Eric Kolve, Mohammad Rastegari, Svetlana Lazebnik, Ali Farhadi, Alexander Schwing, Aniruddha Kembhavi
      Collaboration is a necessary skill to perform tasks that are beyond one agent's capabilities. Addressed extensively in both conventional and modern AI, multi-agent collaboration has often been studied in the context of simple grid worlds. We argue that there are inherently visual aspects to…  (More)
    • CVPR 2019
      Huiyu Wang, Aniruddha Kembhavi, Ali Farhadi, Alan Loddon Yuille, Mohammad Rastegari
      Scale variation has been a challenge from traditional to modern approaches in computer vision. Most solutions to scale issues have a similar theme: a set of intuitive and manually designed policies that are generic and fixed (e.g. SIFT or feature pyramid). We argue that the scale policy should be…  (More)
    • CVPR 2019
      Yao-Hung Tsai, Santosh Divvala, Louis-Philippe Morency, Ruslan Salakhutdinov and Ali Farhadi
      Visual relationship reasoning is a crucial yet challenging task for understanding rich interactions across visual concepts. For example, a relationship {man, open, door} involves a complex relation {open} between concrete entities {man, door}. While much of the existing work has studied this…  (More)
    • CVPR 2019
      Mitchell Wortsman, Kiana Ehsani, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi
      Learning is an inherently continuous phenomenon. When humans learn a new task there is no explicit distinction between training and inference. After we learn a task, we keep learning about it while performing the task. What we learn and how we learn it varies during different stages of learning…  (More)
    • CVPR 2019
      Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi
      Visual understanding goes well beyond object recognition. With one glance at an image, we can effortlessly imagine the world beyond the pixels: for instance, we can infer people’s actions, goals, and mental states. While this task is easy for humans, it is tremendously difficult for today’s vision…  (More)
    • ICLR 2019
      Wei Yang, Xiaolong Wang, Ali Farhadi, Abhinav Gupta, Roozbeh Mottaghi
      How do humans navigate to target objects in novel scenes? Do we use the semantic/functional priors we have built over years to efficiently search and navigate? For example, to search for mugs, we search cabinets near the coffee machine and for fruits we try the fridge. In this work, we focus on…  (More)
    • CVPR 2018 (Video)
      Daniel Gordon, Aniruddha Kembhavi, Mohammad Rastegari, Joseph Redmon, Dieter Fox, Ali Farhadi
      We introduce Interactive Question Answering (IQA), the task of answering questions that require an autonomous agent to interact with a dynamic visual environment. IQA presents the agent with a scene and a question, like: “Are there any apples in the fridge?” The agent must navigate around the scene…  (More)
    • CVPR 2018
      Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi
      A number of studies have found that today’s Visual Question Answering (VQA) models are heavily driven by superficial correlations in the training data and lack sufficient image grounding. To encourage development of models geared towards the latter, we propose a new setting for VQA where for every…  (More)
    • CVPR 2018
      Gunnar Sigurdsson, Cordelia Schmid, Ali Farhadi, Abhinav Gupta, Karteek Alahari
      Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer knowledge between third-person (observer) and first-person (actor). Despite this, learning such…  (More)
    • ECCV 2018
      Sachin Mehta, Mohammad Rastegari, Anat Caspi, Linda Shapiro, and Hannaneh Hajishirzi
      We introduce a fast and efficient convolutional neural network, ESPNet, for semantic segmentation of high resolution images under resource constraints. ESPNet is based on a new convolutional module, efficient spatial pyramid (ESP), which is efficient in terms of computation, memory, and power…  (More)
    • ECCV 2018
      Krishna Kumar Singh, Santosh Kumar Divvala, Ali Farhadi, and Yong Jae Lee
      We propose the idea of transferring common-sense knowledge from source categories to target categories for scalable object detection. In our setting, the training data for the source categories have bounding box annotations, while those for the target categories only have image-level annotations…  (More)
    • ECCV 2018 (Video)
      Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, and Aniruddha Kembhavi
      Imagining a scene described in natural language with realistic layout and appearance of entities is the ultimate test of spatial, visual, and semantic world knowledge. Towards this goal, we present the Composition, Retrieval and Fusion Network (Craft), a model capable of learning this knowledge…  (More)
    • CVPR 2018
      Jonghyun Choi, Jayant Krishnamurthy, Aniruddha Kembhavi, Ali Farhadi
      Diagrams often depict complex phenomena and serve as a good test bed for visual and textual reasoning. However, understanding diagrams using natural image understanding approaches requires large training datasets of diagrams, which are very hard to obtain. Instead, this can be addressed as a…  (More)
    • CVPR 2018
      Kiana Ehsani, Hessam Bagherinezhad, Joe Redmon, Roozbeh Mottaghi, Ali Farhadi
      We study the task of directly modelling a visually intelligent agent. Computer vision typically focuses on solving various subtasks related to visual intelligence. We depart from this standard approach to computer vision; instead we directly model a visually intelligent agent. Our model takes…  (More)
    • CVPR 2018
      Kiana Ehsani, Roozbeh Mottaghi, Ali Farhadi
      Objects often occlude each other in scenes; inferring their appearance beyond their visible parts plays an important role in scene understanding, depth estimation, object interaction and manipulation. In this paper, we study the challenging problem of completing the appearance of occluded objects…  (More)
    • CVPR 2018
      Rowan Zellers, Mark Yatskar, Sam Thomson, Yejin Choi
      We investigate the problem of producing structured graph representations of visual scenes. Our work analyzes the role of motifs: regularly appearing substructures in scene graphs. We present new quantitative insights on such repeated structures in the Visual Genome dataset. Our analysis shows that…  (More)
    • ICLR 2017
      Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi
      Machine comprehension (MC), answering a query about a given context paragraph, requires modeling complex interactions between the context and the query. Recently, attention mechanisms have been successfully extended to MC. Typically these methods use attention to focus on a small portion of the…  (More)
    • ICLR 2017
      Minjoon Seo, Sewon Min, Ali Farhadi, Hannaneh Hajishirzi
      In this paper, we study the problem of question answering when reasoning over multiple facts is required. We propose Query-Reduction Network (QRN), a variant of Recurrent Neural Network (RNN) that effectively handles both short-term (local) and long-term (global) sequential dependencies to reason…  (More)
    • ICRA 2017
      Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph Lim, Abhinav Gupta, Fei-Fei Li, and Ali Farhadi
      Two less addressed issues of deep reinforcement learning are (1) lack of generalization capability to new goals, and (2) data inefficiency, i.e., the model requires several (and often costly) episodes of trial and error to converge, which makes it impractical to be applied to real-world scenarios…  (More)
    • CVPR 2017
      Gunnar A Sigurdsson, Santosh Divvala, Ali Farhadi, and Abhinav Gupta
      Actions are more than just movements and trajectories: we cook to eat and we hold a cup to drink from it. A thorough understanding of videos requires going beyond appearance modeling and necessitates reasoning about the sequence of activities, as well as the higher-level constructs such as…  (More)
    • CVPR 2017
      Aniruddha Kembhavi, Minjoon Seo, Dustin Schwenk, Jonghyun Choi, Hannaneh Hajishirzi, and Ali Farhadi
      We introduce the task of Multi-Modal Machine Comprehension (M3C), which aims at answering multimodal questions given a context of text, diagrams and images. We present the Textbook Question Answering (TQA) dataset that includes 1,076 lessons and 26,260 multi-modal questions, taken from middle…  (More)
    • CVPR 2017
      Mark Yatskar, Vicente Ordonez, Luke Zettlemoyer, and Ali Farhadi
      Semantic sparsity is a common challenge in structured visual classification problems; when the output space is complex, the vast majority of the possible predictions are rarely, if ever, seen in the training set. This paper studies semantic sparsity in situation recognition, the task of producing…  (More)
    • CVPR 2017
      Hessam Bagherinezhad, Mohammad Rastegari, and Ali Farhadi
      Porting state of the art deep learning algorithms to resource constrained compute platforms (e.g. VR, AR, wearables) is extremely challenging. We propose a fast, compact, and accurate model for convolutional neural networks that enables efficient learning and inference. We introduce LCNN, a lookup…  (More)
    • CVPR 2017 (Best Paper Honorable Mention)
      Joseph Redmon and Ali Farhadi
      We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. First we propose various improvements to the YOLO detection method, both novel and drawn from prior work. The improved model, YOLOv2, is state-of-the-art on standard detection…  (More)
    • ICCV 2017
      Roozbeh Mottaghi, Connor Schenck, Dieter Fox, Ali Farhadi
      Humans have rich understanding of liquid containers and their contents; for example, we can effortlessly pour water from a pitcher to a cup. Doing so requires estimating the volume of the cup, approximating the amount of water in the pitcher, and predicting the behavior of water when we tilt the…  (More)
    • ICCV 2017
      Yuke Zhu, Daniel Gordon, Eric Kolve, Dieter Fox, Li Fei-Fei, Abhinav Gupta, Roozbeh Mottaghi, Ali Farhadi
      A crucial capability of real-world intelligent agents is their ability to plan a sequence of actions to achieve their goals in the visual world. In this work, we address the problem of visual semantic planning: the task of predicting a sequence of actions from visual observations that transform a…  (More)
    • AAAI 2016 (Best Student Paper Award)
      Babak Saleh, Ahmed Elgammal, Jacob Feldman, and Ali Farhadi
      The human visual system can spot an abnormal image, and reason about what makes it strange. This task has not received enough attention in computer vision. In this paper we study various types of atypicalities in images in a more comprehensive way than has been done before. We propose a new dataset…  (More)
    • AAAI 2016
      Hessam Bagherinezhad, Hannaneh Hajishirzi, Yejin Choi, and Ali Farhadi
      Human vision greatly benefits from the information about sizes of objects. The role of size in several visual reasoning tasks has been thoroughly explored in human perception and cognition. However, the impact of the information about sizes of objects is yet to be determined in AI. We postulate…  (More)
    • NAACL 2016
      Mark Yatskar, Vicente Ordonez, and Ali Farhadi
      Obtaining common sense knowledge using current information extraction techniques is extremely challenging. In this work, we instead propose to derive simple common sense statements from fully annotated object detection corpora such as the Microsoft Common Objects in Context dataset. We show that…  (More)
    • CVPR 2016
      Mark Yatskar, Luke Zettlemoyer, and Ali Farhadi
      This paper introduces situation recognition, the problem of producing a concise summary of the situation an image depicts including: (1) the main activity (e.g., clipping), (2) the participating actors, objects, substances, and locations (e.g., man, shears, sheep, wool, and field) and most…  (More)
    • CVPR 2016
      Roozbeh Mottaghi, Hessam Bagherinezhad, Mohammad Rastegari, and Ali Farhadi
      In this paper, we study the challenging problem of predicting the dynamics of objects in static images. Given a query object in an image, our goal is to provide a physical understanding of the object in terms of the forces acting upon it and its long term motion as response to those forces. Direct…  (More)
    • CVPR 2016 (OpenCV People's Choice Award)
      Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi
      We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts…  (More)
    • CVPR 2016
      Xiaolong Wang, Ali Farhadi, and Abhinav Gupta
      What defines an action like “kicking ball”? We argue that the true meaning of an action lies in the change or transformation an action brings to the environment. In this paper, we propose a novel representation for actions by modeling an action as a transformation which changes the state of the…  (More)
    • CVPR 2016
      Roozbeh Mottaghi, Hannaneh Hajishirzi, and Ali Farhadi
      With the recent progress in visual recognition, we have already started to see a surge of vision related real-world applications. These applications, unlike general scene understanding, are task oriented and require specific information from visual data. Considering the current growth in new…  (More)
    • JCDL 2016
      Christopher Clark and Santosh Divvala
      Figures and tables are key sources of information in many scholarly documents. However, current academic search engines do not make use of figures and tables when semantically parsing documents or presenting document summaries to users. To facilitate these applications we develop an algorithm that…  (More)
    • ICML 2016
      Junyuan Xie, Ross Girshick, and Ali Farhadi
      Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms. Relatively little work has focused on learning representations for clustering. In this paper, we propose Deep Embedded Clustering (DEC), a method…  (More)
    • ECCV 2016
      Aniruddha Kembhavi, Mike Salvato, Eric Kolve, Minjoon Seo, Hannaneh Hajishirzi, and Ali Farhadi
      Diagrams are common tools for representing complex concepts, relationships and events, often when it would be difficult to portray the same information with natural images. Understanding natural images has been extensively studied in computer vision, while diagram understanding has received little…  (More)
    • ECCV 2016
      Roozbeh Mottaghi, Mohammad Rastegari, Abhinav Gupta, and Ali Farhadi
      What happens if one pushes a cup sitting on a table toward the edge of the table? How about pushing a desk against a wall? In this paper, we study the problem of understanding the movements of objects as a result of applying external forces to them. For a given force vector applied to a specific…  (More)
    • ECCV 2016
      Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi
      We propose two efficient approximations to standard convolutional neural networks: Binary-Weight-Networks and XNOR-Networks. In Binary-Weight-Networks, the filters are approximated with binary values resulting in 32× memory saving. In XNOR-Networks, both the filters and the input to…  (More)
    • ECCV 2016
      Gunnar A. Sigurdsson, Gül Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, and Abhinav Gupta
      Computer vision has a great potential to help our daily lives by searching for lost keys, watering flowers or reminding us to take a pill. To succeed with such tasks, computer vision methods need to be trained from real and diverse examples of our daily dynamic scenes. While most of such scenes are…  (More)
    • HCOMP 2016
      Gunnar A. Sigurdsson, Olga Russakovsky, Ali Farhadi, Ivan Laptev, and Abhinav Gupta
      Large-scale annotated datasets allow AI systems to learn from and build upon the knowledge of the crowd. Many crowdsourcing techniques have been developed for collecting image annotations. These techniques often implicitly rely on the fact that a new input image takes a negligible amount of time to…  (More)
    • ECCV 2016
      Noah Siegel, Zachary Horvitz, Roie Levin, Santosh Divvala, and Ali Farhadi
      ‘Which are the pedestrian detectors that yield a precision above 95% at 25% recall?’ Answering such a complex query involves identifying and analyzing the results reported in figures within several research papers. Despite the availability of excellent academic search engines, retrieving such…  (More)
    • ECCV 2016
      Junyuan Xie, Ross Girshick, and Ali Farhadi
      We propose Deep3D, a fully automatic 2D-to-3D conversion algorithm that takes 2D images or video frames as input and outputs stereo 3D image pairs. The stereo images can be viewed with 3D glasses or head-mounted VR displays. Deep3D is trained directly on stereo pairs from a dataset of 3D movies to…  (More)
    • CVPR 2016
      Mahyar Najibi, Mohammad Rastegari, and Larry Davis
      We introduce G-CNN, an object detection technique based on CNNs which works without proposal algorithms. G-CNN starts with a multi-scale grid of fixed bounding boxes. We train a regressor to move and scale elements of the grid towards objects iteratively. G-CNN models the problem of object…  (More)
    • ICCV 2015
      Hamid Izadinia, Fereshteh Sadeghi, Santosh Divvala, Hanna Hajishirzi, Yejin Choi, and Ali Farhadi
      We introduce Segment-Phrase Table (SPT), a large collection of bijective associations between textual phrases and their corresponding segmentations. Leveraging recent progress in object recognition and natural language semantics, we show how we can successfully build a high-quality segment-phrase…  (More)
    • EMNLP 2015
      Minjoon Seo, Hannaneh Hajishirzi, Ali Farhadi, Oren Etzioni, and Clint Malcolm
      This paper introduces GeoS, the first automated system to solve unaltered SAT geometry questions by combining text understanding and diagram interpretation. We model the problem of understanding geometry questions as submodular optimization, and identify a formal problem description likely to be…  (More)
    • AAAI Workshop on Scholarly Big Data 2015
      Christopher Clark and Santosh Divvala
      Identifying and extracting figures and tables along with their captions from scholarly articles is important both as a way of providing tools for article summarization, and as part of larger systems that seek to gain deeper, semantic understanding of these articles. While many "off-the-shelf" tools…  (More)
    • CVPR 2015
      Fereshteh Sadeghi, Santosh Divvala, and Ali Farhadi
      How can we know whether a statement about our world is valid? For example, given a relationship between a pair of entities e.g., 'eat(horse, hay)', how can we know whether this relationship is true or false in general? Gathering such knowledge about entities and their relationships is one of the…  (More)
    • NIPS 2015
      Fereshteh Sadeghi, C. Lawrence Zitnick, and Ali Farhadi
      In this paper, we study the problem of answering visual analogy questions. These questions take the form of image A is to image B as image C is to what. Answering these questions entails discovering the mapping from image A to image B and then extending the mapping to image C and searching for the…  (More)
    • CVPR 2015
      Mohammad Rastegari, Hannaneh Hajishirzi, and Ali Farhadi
      In this paper we present a bottom-up method to instance level Multiple Instance Learning (MIL) that learns to discover positive instances with globally constrained reasoning about local pairwise similarities. We discover positive instances by optimizing for a ranking such that positive (top rank…  (More)
    • ICCV 2015
      Bilge Soran, Ali Farhadi, and Linda Shapiro
      We all have experienced forgetting habitual actions among our daily activities. For example, we probably have forgotten to turn the lights off before leaving a room or turn the stove off after cooking. In this paper, we propose a solution to the problem of issuing notifications on actions that may…  (More)
    • CVPR 2014
      Santosh K. Divvala, Ali Farhadi, and Carlos Guestrin
      Recognition is graduating from labs to real-world applications. While it is encouraging to see its potential being tapped, it brings forth a fundamental challenge to the vision researcher: scalability. How can we learn a model for any concept that exhaustively covers all its appearance variations…  (More)
    • AAAI 2014
      Min Joon Seo, Hannaneh Hajishirzi, Ali Farhadi, and Oren Etzioni
      Automatically solving geometry questions is a longstanding AI problem. A geometry question typically includes a textual description accompanied by a diagram. The first step in solving geometry questions is diagram understanding, which consists of identifying visual elements in the diagram, their…  (More)
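The efficiency-oriented entries above (Binary-Weight-Networks and XNOR-Networks, ECCV 2016) rest on approximating each real-valued filter W with a scaled binary tensor, W ≈ αB with B = sign(W) and α = mean(|W|). A minimal NumPy sketch of that approximation (function name illustrative, not from the paper's code release):

```python
import numpy as np

def binarize_weights(W):
    """Binary-Weight-Network style approximation: W ~= alpha * B,
    where B = sign(W) and alpha = mean(|W|) minimizes ||W - alpha*B||^2."""
    B = np.sign(W).astype(W.dtype)
    B[B == 0] = 1.0          # map sign(0) -> +1 so every entry stays binary
    alpha = np.abs(W).mean()  # optimal scaling factor in the least-squares sense
    return alpha, B

# Storing B at 1 bit per weight instead of float32 is the source of the
# ~32x memory saving mentioned in the abstract.
W = np.array([[0.5, -1.2], [0.3, -0.4]])
alpha, B = binarize_weights(W)
W_approx = alpha * B  # approx [[0.6, -0.6], [0.6, -0.6]]
```

This covers only the weight-binarization half; XNOR-Networks additionally binarize the layer inputs so that convolutions reduce to XNOR and bit-count operations.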