Viewing 81-100 of 143 videos See AI2’s full collection of videos on our YouTube channel.
    • August 23, 2016

      Matthew Peters

      Distributed representations of words, phrases and sentences are central to recent advances in machine translation, language modeling, semantic similarity, and other tasks. In this talk, I'll explore ways to learn similar representations of search queries, web pages and web sites. The first portion of the talk describes a method to learn a keyword-web page similarity function applicable to web search. It represents the web page as a set of attributes (URL, title, meta description tag, etc) and uses a separate LSTM encoder for each attribute. The network is trained end-to-end from clickthrough logs. The second half of the talk introduces a measure of authority for each web page and jointly learns keyword-keyword, keyword-site and keyword-site-authority relationships. The multitask network leverages a shared representation for keywords and sites and learns a fine grained topic authority (for example is an authority on the topic "Bernie Sanders" but not on "Seattle Mariners").

      Less More
    • August 22, 2016

      Jay Pujara

      Automated question answering, knowledgeable digital assistants, and grappling with the massive data flooding the Web all depend on structured knowledge. Precise knowledge graphs capturing the many, complex relationships between entities are the missing piece for many problems, but knowledge graph construction is notoriously difficult. In this talk, I will chronicle common failures from the first generation of information extraction systems and show how combining statistical NLP signals and semantic constraints addresses these problems. My method, Knowledge Graph Identification (KGI), exploits the key lessons of the statistical relational learning community and uses them for better knowledge graph construction. Probabilistic models are often discounted due to scalability concerns, but KGI translates the problem into a tractable convex objective that is amenable to parallelization. Furthermore, the inferences from KGI have provable optimality and can be updated efficiently using approximate techniques that have bounded regret. I demonstrate state-of-the-art performance of my approach on knowledge graph construction and entity resolution tasks on NELL and Freebase, and discuss exciting new directions for KG construction.

      Less More
    • August 1, 2016

      Dan Garrette

      Learning NLP models from weak forms of supervision has become increasingly important as the field moves toward applications in new languages and domains. Much of the existing work in this area has focused on designing learning approaches that are able to make use of small amounts of human-generated data. In this talk, I will present work on a complementary form of inductive bias: universal, cross-lingual principles of how grammars function. I will develop these intuitions with a series of increasingly complex models based in the Combinatory Categorial Grammar (CCG) formalism: first, a supertagging model that biases towards associative adjacent-category relationships; second, a parsing model that biases toward simpler grammatical analyses; and finally, a novel parsing model, with accompanying learning procedure, that is able to exploit both of these biases by parameterizing the relationships between each constituent label and its supertag context to find trees with a better global coherence. We model grammar with CCG because the structured, logic-backed nature of CCG categories and the use of a small universal set of constituent combination rules are ideally suited to encoding as priors, and we train our models within a Bayesian setting that combines these prior beliefs about how natural languages function with the empirical statistics gleaned from large amounts of raw text. Experiments with each of these models show that when training from only partial type-level supervision and a corpus of unannotated text, employing these universal properties as soft constraints yields empirically better models. Additional gains are obtained by further shaping the priors with corpus-specific information that is estimated automatically from the tag dictionary and raw text.

      Less More
    • July 18, 2016

      Claudio Delli Bovi

      The Open Information Extraction (OIE) paradigm has received much attention in the NLP community over the last decade. Since the earliest days, most OIE approaches have been focusing on Web-scale corpora, which raises issues such as massive amounts of noise. Also, OIE systems can be very different in nature and develop their own type inventories, with no portable ontological structure. This talk steps back and explores both issues by presenting two substantially different approaches to the task: in the first we shift the target of a full-fledged OIE pipeline to a relatively small, dense corpus of definitional knowledge; in the second we try to make sense of different OIE outputs by merging them into a single, unified and fully disambiguated knowledge repository.

      Less More
    • July 15, 2016

      Yuxin Chen

      Sequential information gathering, i.e., selectively acquiring the most useful data, plays a key role in interactive machine learning systems. Such problem has been studied in the context of Bayesian active learning and experimental design, decision making, optimal control and numerous other domains. In this talk, we focus on a class of information gathering tasks, where the goal is to learn the value of some unknown target variable through a sequence of informative, possibly noisy tests. In contrast to prior work, we focus on the challenging, yet practically relevant setting where test outcomes can be conditionally dependent given the hidden target variable. Under such assumptions, common heuristics, such as greedily performing tests that maximize the reduction in uncertainty of the target, often perform poorly. We propose a class of novel, computationally efficient active learning algorithms, and prove strong theoretical guarantees that hold with correlated, possibly noisy tests. Rather than myopically optimize the value of a test (which, in our case, is the expected reduction in prediction error), at each step, our algorithms pick the test that maximizes the gain in a surrogate objective, which is adaptive submodular. This property enables us to utilize an efficient greedy optimization while providing strong approximation guarantees. We demonstrate our algorithms in several real-world problem instances, including a touch-based location task on an actual robotic platform, and an active preference learning task via pairwise comparisons.

      Less More
    • June 21, 2016

      Katrin Erk

      As the field of Natural Language Processing develops, more ambitious semantic tasks are being addressed, such as Question Answering (QA) and Recognizing Textual Entailment (RTE). Solving these tasks requires (ideally) an in-depth representation of sentence structure as well as expressive and flexible representations at the word level. We have been exploring a combination of logical form with distributional as well as resource-based information at the word level, using Markov Logic Networks (MLNs) to perform probabilistic inference over the resulting representations. In this talk, I will focus on the three main components of a system we have developed for the task of Textual Entailment: (1) Logical representation for processing in MLNs, (2) lexical entailment rule construction by integrating distributional information with existing resources, and (3) probabilistic inference, the problem of solving the resulting MLN inference problems efficiently. I will also comment on how I think the ideas from this system can be adapted to Question Answering and the more general task of in-depth single-document understanding.

      Less More
    • June 20, 2016

      Marcus Rohrbach

      Language is the most important channel for humans to communicate about what they see. To allow an intelligent system to effectively communicate with humans it is thus important to enable it to relate information in words and sentences with the visual world. For this a system should be compositional, so it is e.g. not surprised when it encounters a novel object and can still talk about it. It should be able to explain in natural language, why it recognized a given object in an image as certain class, to allow a human to trust and understand it. However, it should not only be able to generate natural language, but also understand it, and locate sentences and linguistic references in the visual world. In my talk, I will discuss how we approach these different fronts by looking at the tasks of language generation about images, visual grounding, and visual question answering. I will conclude with a discussion of the challenges ahead.

      Less More
    • June 13, 2016

      Megasthenis Asteris

      Principal component analysis (PCA) is one of the most popular tools for identifying structure and extracting interpretable information from datasets. In this talk, I will discuss constrained variants of PCA such as Sparse or Nonnegative PCA that are computationally harder, but offer higher data interpretability. I will describe a framework for solving quadratic optimization problems --such as PCA-- under sparsity or other combinatorial constraints. Our method can surprisingly solve such problems exactly when the involved quadratic form matrix is positive semidefinite and low rank. Of course, real datasets are not low-rank, but they can frequently be well approximated by low-rank matrices. For several datasets, we obtain excellent empirical performance and provable upper bounds that guarantee that our objective is close to the unknown optimum.

      Less More
    • June 13, 2016

      Niket Tandon

      There is a growing conviction that the future of computing will crucially depend on our ability to exploit Big Data on the Web to produce significantly more intelligent and knowledgeable systems. This includes encyclopedic knowledge (for factual knowledge) and commonsense knowledge (for more advanced human-like reasoning). The focus of this talk is automatic acquisition of commonsense knowledge using the Web. We require the computers to understand the environment (e.g. the properties of the objects in the environment), the relations between these objects (e.g. handle is part of a bike or that bike is slower than a car), and, the semantics of their interaction (e.g. a man and a woman meet for a dinner in a restaurant in the evening). This talk presents techniques for gathering such commonsense from textual and visual data from the Web.

      Less More
    • May 23, 2016

      Oren Etzioni

      Oren Etzioni, CEO of the Allen Institute for AI, shares his vision for deploying AI technologies for the common good.

      Less More
    • May 17, 2016

      Yi Yang

      With the resurgence of neural networks, low-dimensional dense features have been used in a wide range of natural language processing problems. Specifically, tasks like part-of-speech tagging, dependency parsing and entity linking have been shown to benefit from dense feature representations from both efficiency and effectiveness aspects. In this talk, I will present algorithms for unsupervised domain adaptation, where we train low-dimensional feature embeddings with instances from both source and target domains. I will also talk about how to extend the approach to unsupervised multi-domain adaptation by leveraging metadata domain attributes. I will then introduce a tree-based structured learning model for entity linking, where the model employs a few statistical dense features to jointly detect mentions and disambiguate entities. Finally, I will discuss some promising directions for future research.

      Less More
    • May 9, 2016

      Aditya Khosla

      When glancing at a magazine or browsing the Internet, we are continuously exposed to photographs and images. While some images stick in our minds, others are ignored or quickly forgotten. Artists, advertisers and educators are routinely challenged by the question "what makes a picture memorable?" and must then create an image that speaks to the observer. In this talk, I will show how deep learning algorithms can predict with near-human consistency which images people will remember or forget - and how we can modify images automatically to make them more or less memorable.

      Less More
    • May 3, 2016

      Saurabh Gupta

      In this talk, I will talk about detailed scene understanding from RGB-D images. We approach this problem by studying central computer vision problems like bottom-up grouping, object detection, instance segmentation, pose estimation in context of RGB-D images, and finally aligning CAD models to objects in the scene. This results in a detailed output which goes beyond what most current computer vision algorithms produce, and is useful for real world applications like perceptual robotics, and augmented reality. A central question in this work is how to learn good features for depth images in view of the fact that labeled RGB-D datasets are much smaller than labeled RGB datasets (such as ImageNet) typically used for feature learning. To this end I will describe our technique called "cross-modal distillation" which allows us to leverage easily available annotations on RGB images to learn representations on depth images. In addition, I will also briefly talk about some work on vision and language that I did on an internship at Microsoft Research.

      Less More
    • April 26, 2016

      The successes of deep learning in the past decade on difficult tasks ranging from image processing to speech recognition to game playing is strong evidence for the utility of abstract representations of complex natural sensory data. In this talk I will present the deep canonical correlation analysis (DCCA) model to learn deep representation mappings of each of two data views (e.g., from two different sensory modalities) such that the learned representations are maximally predictive of each other in the sense of correlation. Comparisons with linear CCA and kernel CCA demonstrate that DCCA is capable of finding far more highly correlated nonlinear representations than standard methods. Experiments also demonstrate the utility of the representation mappings learned by DCCA in the scenario where one of the data views is unavailable at test time.

      Less More
    • April 12, 2016

      Percy Liang

      Can we learn if we start with zero examples, either labeled or unlabeled? This scenario arises in new user-facing systems (such as virtual assistants for new domains), where inputs should come from users, but no users exist until we have a working system, which depends on having training data. I will discuss recent work that circumvent this circular dependence by interleaving user interaction and learning.

      Less More
    • April 6, 2016

      Ronan Le Bras

      Most problems, from theoretical problems in combinatorics to real-world applications, comprise hidden structural properties not directly captured by the problem definition. A key to the recent progress in automated reasoning and combinatorial optimization has been to automatically uncover and exploit this hidden problem structure, resulting in a dramatic increase in the scale and complexity of the problems within our reach. The most complex tasks, however, still require human abilities and ingenuity. In this talk, I will show how we can leverage human insights to effectively complement and dramatically boost state-of-the-art optimization techniques. I will demonstrate the effectiveness of the approach with a series of scientific discoveries, from experimental designs to materials discovery.

      Less More
    • April 4, 2016

      Jeffrey Heer

      How might we architect interactive systems that have better models of the tasks we're trying to perform, learn over time, help refine ambiguous user intents, and scale to large or repetitive workloads? In this talk I will present Predictive Interaction, a framework for interactive systems that shifts some of the burden of specification from users to algorithms, while preserving human guidance and expressive power. The central idea is to imbue software with domain-specific models of user tasks, which in turn power predictive methods to suggest a variety of possible actions. I will illustrate these concepts with examples drawn from widely-deployed systems for data transformation and visualization (with reported order-of-magnitude productivity gains) and then discuss associated design considerations and future research directions.

      Less More
    • March 25, 2016

      Ashish Vaswani

      Locally normalized approaches for structured prediction, such as left-to-right parsing and sequence labeling, are attractive because of their simplicity, ease of training, and the flexibility in choosing features from observations. Combined with the power of neural networks, they have been widely adopted for NLP tasks. However, locally normalized models suffer from label bias, where search errors arise during prediction because scores of hypotheses are computed from local decisions. While conditional random fields avoid label bias by scoring hypothesis globally, it is at the cost of training time and limited freedom for specifying features. In this talk, I will present two approaches for overcoming label bias in structured prediction with locally normalized models. In the first approach, I will introduce a framework for learning to identify erroneous hypotheses and discard them at prediction time. Applying this framework to transition-based dependency parsing improves parsing accuracy significantly. In the second approach, I will show that scheduled sampling (Bengio et al.) and a variant can be robust to prediction errors, leading to state-of-the-art accuracies on CCG supertagging with LSTMs and in-domain CCG parsing.

      Less More
    • March 9, 2016

      Manaal Faruqui

      Unsupervised learning of word representations have proven to provide exceptionally effective features in many NLP tasks. Traditionally, construction of word representations relies on the distributional hypothesis, which posits that the meaning of words is evidenced by the contextual words they occur with (Harris, 1954). Although distributional context is fairly good at capturing word meaning, in this talk I'll show that going beyond the distributional hypothesis---by exploiting additional sources of word meaning information---improves the quality of word representations. First, I'll show how semantic lexicons, like WordNet, can be used to obtain better word vector representations. Second, I'll describe a novel graph-based learning framework that uses morphological information to construct large scale morpho-syntactic lexicons. I'll conclude with additional approaches that can be taken to improve word representations.

      Less More
    • March 3, 2016

      Ali Farhadi

      Ali Farhadi discusses the history of computer vision and AI.

      Less More