Viewing 451-460 of 519 papers
- Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali FarhadiCVPR • 2016 We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network pre- dicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detec- tors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
- Xiaolong Wang, Ali Farhadi, and Abhinav GuptaCVPR • 2016 What defines an action like “kicking ball”? We argue that the true meaning of an action lies in the change or transformation an action brings to the environment. In this paper, we propose a novel representation for actions by modeling an action as a transformation which changes the state of the environment before the action happens (pre-condition) to the state after the action (effect). Motivated by recent advancements of video representation using deep learning, we design a Siamese network which models the action as a transformation on a high-level feature space. We show that our model gives improvements on standard action recognition datasets including UCF101 and HMDB51. More importantly, our approach is able to generalize beyond learned action categories and shows significant performance improvement on cross-category generalization on our new ACT dataset.
- Roozbeh Mottaghi, Hannaneh Hajishirzi, and Ali FahradiCVPR • 2016 With the recent progress in visual recognition, we have already started to see a surge of vision related real-world applications. These applications, unlike general scene understanding, are task oriented and require specific information from visual data. Considering the current growth in new sensory devices, feature designs, feature learning methods, and algorithms, the search in the space of features and models becomes combinatorial. In this paper, we propose a novel cost-sensitive task-oriented recognition method that is based on a combination of linguistic semantics and visual cues. Our task-oriented framework is able to generalize to unseen tasks for which there is no training data and outperforms state-of-the-art cost-based recognition baselines on our new task-based dataset.
- Christopher Clark and Santosh DivvalaJCDL • 2016 Figures and tables are key sources of information in many scholarly documents. However, current academic search engines do not make use of figures and tables when semantically parsing documents or presenting document summaries to users. To facilitate these applications we develop an algorithm that extracts figures, tables, and captions from documents called "PDFFigures 2.0."Our proposed approach analyzes the structure of individual pages by detecting captions, graphical elements, and chunks of body text, and then locates gures and tables by reasoning about the empty regions within that text. To evaluate our work, we introduce a new dataset of computer science papers, along with ground truth labels for the locations of the gures, tables, and captions within them. Our algorithm achieves impressive results (94% precision at 90% recall) on this dataset surpassing previous state of the art. Further, we show how our framework was used to extract gures from a corpus of over one million papers, and how the resulting extractions were integrated into the user interface of a smart academic search engine, Semantic Scholar (www.semanticscholar.org). Finally, we present results of exploratory data analysis completed on the extracted gures as well as an extension of our method for the task of section title extraction. We release our dataset and code on our project webpage for enabling future research (http://pdgures2.allenai.org).
- Bhavana Dalvi, Sumithra Bhakthavatsalam, Chris Clark, Peter Clark, Oren Etzioni, Anthony Fader, and Dirk GroeneveldAKBC • 2016 Recent work on information extraction has suggested that fast, interactive tools can be highly effective; however, creating a usable system is challenging, and few publicly available tools exist. In this paper we present IKE, a new extraction tool that performs fast, interactive bootstrapping to develop high quality extraction patterns for targeted relations. Central to IKE is the notion that an extraction pattern can be treated as a search query over a corpus. To operationalize this, IKE uses a novel query language that is expressive, easy to understand, and fast to execute - essential requirements for a practical system. It is also the first interactive extraction tool to seamlessly integrate symbolic (boolean) and distributional (similarity-based) methods for search. An initial evaluation suggests that relation tables can be populated substantially faster than by manual pattern authoring while retaining accuracy, and more reliably than fully automated tools, an important step towards practical KB construction. We are making IKE publicly available.
- Carissa Schoenick, Peter Clark, Oyvind Tafjord, Peter Turney, and Oren EtzioniCACM • 2016 The field of Artificial Intelligence has made great strides forward recently, for example AlphaGo's recent victory against the world champion Lee Sedol in the game of Go, leading to great optimism about the field. But are we really moving towards smarter machines, or are these successes restricted to certain classes of problems, leaving other challenges untouched? In 2016, the Allen Institute for Artificial Intelligence (AI2) ran the Allen AI Science Challenge, a competition to test machines on an ostensibly difficult task, namely answering 8th Grade science questions. Our motivations were to encourage the field to set its sights broader and higher by exploring a problem that appears to require modeling, reasoning, language understanding, and commonsense knowledge, to probe the state of the art on this task, and sow the seeds for possible future breakthroughs. The challenge received a strong response, with 780 teams from all over the world participating. What were the results? This article describes the competition and the interesting outcomes of the challenge.
- Saif M. Mohammad, Ekaterina Shutova, and Peter D. TurneySEM • 2016 It is generally believed that a metaphor tends to have a stronger emotional impact than a literal statement; however, there is no quantitative study establishing the extent to which this is true. Further, the mechanisms through which metaphors convey emotions are not well understood. We present the first data-driven study comparing the emotionality of metaphorical expressions with that of their literal counterparts. Our results indicate that metaphorical usages are, on average, significantly more emotional than literal usages. We also show that this emotional content is not simply transferred from the source domain into the target, but rather is a result of meaning composition and interaction of the two domains in the metaphor.
- Junyuan Xie, Ross Girshick, and Ali FarhadiICML • 2016 Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms. Relatively little work has focused on learning representations for clustering. In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks. DEC learns a mapping from the data space to a lower-dimensional feature space in which it iteratively optimizes a clustering objective. Our experimental evaluations on image and text corpora show significant improvement over state-of-the-art methods.
- Aniruddha Kembhavi, Mike Salvato, Eric Kolve, Minjoon Seo, Hannaneh Hajishirzi, and Ali FarhadiECCV • 2016 Diagrams are common tools for representing complex concepts, relationships and events, often when it would be difficult to portray the same information with natural images. Understanding natural images has been extensively studied in computer vision, while diagram understanding has received little attention. In this paper, we study the problem of diagram interpretation, the challenging task of identifying the structure of a diagram and the semantics of its constituents and their relationships. We introduce Diagram Parse Graphs (DPG) as our representation to model the structure of diagrams. We define syntactic parsing of diagrams as learning to infer DPGs for diagrams and study semantic interpretation and reasoning of diagrams in the context of diagram question answering. We devise an LSTM-based method for syntactic parsing of diagrams and introduce a DPG-based attention model for diagram question answering. We compile a new dataset of diagrams with exhaustive annotations of constituents and relationships for about 5,000 diagrams and 15,000 questions and answers. Our results show the significance of our models for syntactic parsing and question answering in diagrams using DPGs.
- Roozbeh Mottaghi, Mohammad Rastegari, Abhinav Gupta, and Ali FarhadiECCV • 2016 What happens if one pushes a cup sitting on a table toward the edge of the table? How about pushing a desk against a wall? In this paper, we study the problem of understanding the movements of objects as a result of applying external forces to them. For a given force vector applied to a specific location in an image, our goal is to predict long-term sequential movements caused by that force. Doing so entails reasoning about scene geometry, objects, their attributes, and the physical rules that govern the movements of objects. We design a deep neural network model that learns long-term sequential dependencies of object movements while taking into account the geometry and appearance of the scene by combining Convolutional and Recurrent Neural Networks. Training our model requires a large-scale dataset of object movements caused by external forces. To build a dataset of forces in scenes, we reconstructed all images in SUN RGB-D dataset in a physics simulator to estimate the physical movements of objects caused by external forces applied to them. Our Forces in Scenes (ForScene) dataset contains 65,000 object movements in 3D which represent a variety of external forces applied to different types of objects. Our experimental evaluations show that the challenging task of predicting long-term movements of objects as their reaction to external forces is possible from a single image. The code and dataset are available at: https://prior.allenai.org/projects/what-happens-if