    • April 26, 2016

      The successes of deep learning in the past decade on difficult tasks ranging from image processing to speech recognition to game playing are strong evidence for the utility of abstract representations of complex natural sensory data. In this talk I will present the deep canonical correlation analysis (DCCA) model, which learns deep representation mappings of each of two data views (e.g., from two different sensory modalities) such that the learned representations are maximally predictive of each other in the sense of correlation. Comparisons with linear CCA and kernel CCA demonstrate that DCCA is capable of finding far more highly correlated nonlinear representations than standard methods. Experiments also demonstrate the utility of the representation mappings learned by DCCA in the scenario where one of the data views is unavailable at test time.
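
      As a rough illustration of the quantity that the linear CCA baseline measures (and that DCCA is described as maximizing over learned mappings), the following is a minimal NumPy sketch of canonical correlations between two views. It is not the speaker's implementation; the function name `canonical_correlations` and the regularization constant `reg` are assumptions made for this example.

```python
import numpy as np

def canonical_correlations(X, Y, reg=1e-6):
    """Canonical correlations between two views (rows are paired samples).

    Illustrative linear CCA only; `reg` is an assumed ridge term that keeps
    the covariance matrices invertible.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        vals, vecs = np.linalg.eigh(S)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    # Singular values of the whitened cross-covariance are the correlations.
    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(T, compute_uv=False)

# Toy usage: the second view is a linear mix of the first, so the views are
# almost perfectly correlated; an unrelated noise view is not.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(500, 3))
view_b = view_a @ rng.normal(size=(3, 3))
print(canonical_correlations(view_a, view_b))
print(canonical_correlations(view_a, rng.normal(size=(500, 3))))
```

      In a deep variant, each view would first pass through its own network and the same correlation would be computed on the network outputs.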

    • April 12, 2016

      Percy Liang

      Can we learn if we start with zero examples, either labeled or unlabeled? This scenario arises in new user-facing systems (such as virtual assistants for new domains), where inputs should come from users, but no users exist until we have a working system, which depends on having training data. I will discuss recent work that circumvents this circular dependence by interleaving user interaction and learning.

    • April 6, 2016

      Ronan Le Bras

      Most problems, from theoretical problems in combinatorics to real-world applications, comprise hidden structural properties not directly captured by the problem definition. A key to the recent progress in automated reasoning and combinatorial optimization has been to automatically uncover and exploit this hidden problem structure, resulting in a dramatic increase in the scale and complexity of the problems within our reach. The most complex tasks, however, still require human abilities and ingenuity. In this talk, I will show how we can leverage human insights to effectively complement and dramatically boost state-of-the-art optimization techniques. I will demonstrate the effectiveness of the approach with a series of scientific discoveries, from experimental designs to materials discovery.

    • April 4, 2016

      Jeffrey Heer

      How might we architect interactive systems that have better models of the tasks we're trying to perform, learn over time, help refine ambiguous user intents, and scale to large or repetitive workloads? In this talk I will present Predictive Interaction, a framework for interactive systems that shifts some of the burden of specification from users to algorithms, while preserving human guidance and expressive power. The central idea is to imbue software with domain-specific models of user tasks, which in turn power predictive methods to suggest a variety of possible actions. I will illustrate these concepts with examples drawn from widely-deployed systems for data transformation and visualization (with reported order-of-magnitude productivity gains) and then discuss associated design considerations and future research directions.

    • March 25, 2016

      Ashish Vaswani

      Locally normalized approaches for structured prediction, such as left-to-right parsing and sequence labeling, are attractive because of their simplicity, ease of training, and the flexibility in choosing features from observations. Combined with the power of neural networks, they have been widely adopted for NLP tasks. However, locally normalized models suffer from label bias, where search errors arise during prediction because scores of hypotheses are computed from local decisions. While conditional random fields avoid label bias by scoring hypotheses globally, they do so at the cost of longer training time and limited freedom in specifying features. In this talk, I will present two approaches for overcoming label bias in structured prediction with locally normalized models. In the first approach, I will introduce a framework for learning to identify erroneous hypotheses and discard them at prediction time. Applying this framework to transition-based dependency parsing improves parsing accuracy significantly. In the second approach, I will show that scheduled sampling (Bengio et al.) and a variant can be robust to prediction errors, leading to state-of-the-art accuracies on CCG supertagging with LSTMs and in-domain CCG parsing.

    • March 9, 2016

      Manaal Faruqui

      Unsupervised learning of word representations has proven to provide exceptionally effective features for many NLP tasks. Traditionally, construction of word representations relies on the distributional hypothesis, which posits that the meaning of words is evidenced by the contextual words they occur with (Harris, 1954). Although distributional context is fairly good at capturing word meaning, in this talk I'll show that going beyond the distributional hypothesis, by exploiting additional sources of word meaning information, improves the quality of word representations. First, I'll show how semantic lexicons, like WordNet, can be used to obtain better word vector representations. Second, I'll describe a novel graph-based learning framework that uses morphological information to construct large-scale morpho-syntactic lexicons. I'll conclude with additional approaches that can be taken to improve word representations.

    • March 3, 2016

      Ali Farhadi

      Ali Farhadi discusses the history of computer vision and AI.

    • March 2, 2016

      Ashish Sabharwal

      The artificial intelligence and machine learning communities have made tremendous strides in the last decade. Yet, the best systems to date still struggle with routine tests of human intelligence, such as standardized science exams posed as-is in natural language, even at the elementary-school level. Can we demonstrate human-like intelligence by building systems that can pass such tests? Unlike typical factoid-style question answering (QA) tasks, these tests challenge a student’s ability to combine multiple facts in various ways, and appeal to broad common-sense and science knowledge. Going beyond arguably shallow information retrieval (IR) and statistical correlation techniques, we view science QA through the lens of combinatorial optimization over a semi-formal knowledge base derived from text. Our structured inference system, formulated as an Integer Linear Program (ILP), turns out to be not only highly complementary to IR methods, but also more robust to question perturbation, as well as substantially more scalable and accurate than prior attempts using probabilistic first-order logic and Markov Logic Networks (MLNs). This talk will discuss fundamental challenges behind the science QA task, the progress we have made, and many challenges that lie ahead.

    • February 16, 2016

      Eric Xing

      The rise of Big Data has led to new demands for Machine Learning (ML) systems to learn complex models with millions to billions of parameters that promise adequate capacity to digest massive datasets and offer powerful predictive analytics (such as high-dimensional latent features, intermediate representations, and decision functions) thereupon. Running ML algorithms at such scales, on a distributed cluster with tens to thousands of machines, often requires significant engineering effort, and one might fairly ask whether such engineering truly falls within the domain of ML research. Taking the view that Big ML systems can indeed benefit greatly from ML-rooted statistical and algorithmic insights, and that ML researchers should therefore not shy away from such systems design, we discuss a series of principles and strategies distilled from our recent effort on industrial-scale ML solutions. These span a continuum from application, to engineering, to theoretical research and development of Big ML systems and architectures, and address how to make them efficient and general, with convergence and scaling guarantees.

    • February 9, 2016

      Rich Caruana


    • January 27, 2016

      Jayant Krishnamurthy

      Lexicon learning is the first step of training a semantic parser for a new application domain, and the quality of the learned lexicon significantly affects both the accuracy and efficiency of the final semantic parser. Existing work on lexicon learning has focused on heuristic methods that lack convergence guarantees and require significant human input in the form of lexicon templates or annotated logical forms. In contrast, the proposed probabilistic models are trained directly from question/answer pairs using EM and the simplest model has a concave objective function that guarantees that EM converges to a global optimum. An experimental evaluation on a data set of 4th grade science questions demonstrates that these models improve semantic parser accuracy (35-70% error reduction) and efficiency (4-25x more sentences per second) relative to prior work, despite using less human input. The models also obtain competitive results on Geoquery without any dataset-specific engineering.

    • January 12, 2016

      Patrice Simard

      For many ML problems, labeled data is readily available. The algorithm is the bottleneck. This is the ML researcher’s paradise! Problems that have fairly stable distributions and can accumulate large quantities of human labels over time have this property: vision, speech, autonomous driving. Problems that have shifting distributions and an infinite supply of labels through history are blessed in the same way: click prediction, data analytics, forecasting. We call these problems the “head” of ML.

      We are interested in another large class of ML problems where data is sparse. For contrast, we call it the “tail” of ML. For example, consider a dialog system for a specific app to recognize specific commands such as: “lights on first floor off”, “patio on”, “enlarge paragraph spacing”, “make appointment with doctor when back from vacation”. Anyone who has attempted building such a system has soon discovered that there are far more ways to issue a command than they originally thought. Domain knowledge, data selection, and custom features are essential to get good generalization performance with small amounts of data. With the right tools, an ML expert can build such a classifier or annotator in a matter of hours. Unfortunately, the current cost of an ML expert (if one is available) is often more than the value produced by a single domain specific model. Getting good results on the tail is not cheap or easy.

      To address this problem, we change our focus from the learner to the teacher. We define Machine Teaching as improving the “teacher” productivity given the “learner”. The teacher is human. The learner is an ML algorithm. Ideally, our approach is “learner agnostic”. Focusing on improving the teacher does not preclude using the best ML algorithm or the best deep representation features and transfer learning. We view Machine Teaching and Machine Learning as orthogonal and complementary approaches. The Machine Teaching metrics are ML metrics divided by human costs, and Machine Teaching focuses on reducing the denominator. This perspective has led to many interesting insights and significant gains in ML productivity.
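
      The ratio described above can be made concrete with a small worked example. This is only a sketch: the function name `teaching_productivity`, the choice of teacher-hours as the cost unit, and all numbers are hypothetical and do not come from the talk.

```python
def teaching_productivity(ml_metric, human_cost):
    """ML metric divided by human cost (here: accuracy per teacher-hour)."""
    if human_cost <= 0:
        raise ValueError("human_cost must be positive")
    return ml_metric / human_cost

# Hypothetical scenario: the learner reaches the same accuracy in both cases,
# but better teaching tools cut the teacher's time from 8 hours to 2 hours.
baseline = teaching_productivity(0.90, human_cost=8.0)
improved = teaching_productivity(0.90, human_cost=2.0)
print(baseline, improved, improved / baseline)
```

      Holding the numerator fixed shows the point of the framing: productivity rises purely because the denominator shrinks, independent of which learner is used.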

    • December 10, 2015

      Chandra Bhagavatula

      In this talk, I will describe two systems designed to extract structured knowledge from unstructured and semi-structured data. First, I'll present an entity linking system for Web tables. Next, I'll talk about a key phrase extraction system that extracts a set of key concepts from a research article. Towards the end of the talk, I will briefly introduce an underlying common problem which connects these two seemingly distinct tasks. I will also present an approach, based on topic modeling, to solve this common underlying problem.

    • November 3, 2015

      Hanie Sedghi

      Learning with big data is akin to finding a needle in a haystack: useful information is hidden in high dimensional data. Optimization methods, both convex and nonconvex, require new thinking when dealing with high dimensional data, and I present two novel solutions.

    • September 14, 2015

      Doug Downey

      In this talk, I will introduce efficient methods for inferring large topic hierarchies. The approach is built upon the Sparse Backoff Tree (SBT), a new prior for latent topic distributions that organizes the latent topics as leaves in a tree. I will show how a document model based on SBTs can effectively infer accurate topic spaces of over a million topics. Experiments demonstrate that scaling to large topic spaces results in much more accurate models, and that SBT document models make use of large topic spaces more effectively than flat LDA. Lastly, I will describe how the models power Atlasify, a prototype exploratory search engine.

    • September 10, 2015

      Shalini Ghosh

      Documents exhibit sequential structure at multiple levels of abstraction (e.g., sentences, paragraphs, sections). These abstractions constitute a natural hierarchy for representing the context in which to infer the meaning of words and larger fragments of text. In this talk, we present CLSTM (Contextual LSTM), an extension of the recurrent neural network LSTM (Long Short-Term Memory) model, where we incorporate hierarchical contextual features (e.g., topics) into the model. The CLSTM models were implemented in the Google DistBelief framework.

    • August 18, 2015

      Iftekhar Naim

      Today we encounter enormous amounts of video data, often accompanied by text descriptions (e.g., cooking videos and recipes, movies and shooting scripts). Extracting meaningful information from these multimodal sequences requires aligning the video frames with the corresponding text sentences. We address the problem of automatically aligning natural language sentences with corresponding video segments without direct human supervision. We first propose two generative models that are closely related to the HMM and IBM Model 1 word alignment models used in statistical machine translation. Next, we propose a latent-variable discriminative alignment model, which outperforms the generative models by incorporating rich features. Our alignment algorithms are applied to align biological wetlab videos with text instructions and movie scenes with shooting scripts.
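
      For readers unfamiliar with the word alignment models mentioned above, here is a minimal sketch of standard IBM Model 1 trained with EM on a toy text-only corpus. It shows the classic translation setting, not the speaker's video-text models; the function name `ibm_model1` and the three-sentence corpus are assumptions chosen for illustration, and the NULL word is omitted for brevity.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] = P(target | source) with EM.

    `pairs` is a list of (source_tokens, target_tokens) sentence pairs.
    """
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts
        total = defaultdict(float)  # expected counts per source word
        # E-step: distribute each target word over the source words.
        for src, tgt in pairs:
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    posterior = t[(e, f)] / norm
                    count[(e, f)] += posterior
                    total[f] += posterior
        # M-step: renormalize expected counts per source word.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return dict(t)

# Toy corpus (hypothetical): German source, English target.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
table = ibm_model1(corpus)
for (e, f), p in sorted(table.items(), key=lambda kv: -kv[1])[:4]:
    print(f"P({e} | {f}) = {p:.3f}")
```

      No alignments are annotated in the corpus; co-occurrence statistics alone push probability mass toward consistent word pairs, which is the sense in which such models learn without direct supervision.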

    • July 30, 2015

      Matt Gardner

      A lot of attention has recently been given to the creation of large knowledge bases that contain millions of facts about people, things, and places in the world. In this talk I present methods for using these knowledge bases to generate features for machine learning models. These methods view the knowledge base as a graph which can be traversed to find potentially predictive information. I show how these methods can be applied to models of knowledge base completion, relation extraction, and question answering.
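
      One simple way to picture "traversing the graph to find predictive information" is to enumerate the relation paths that connect two entities and use each path type as a feature. The sketch below does this with a bounded breadth-first walk. It is an illustration under stated assumptions, not the speaker's method: the function name `path_features`, the feature string format, and the toy triples are all hypothetical.

```python
from collections import defaultdict

def path_features(triples, source, target, max_len=2):
    """Return relation-path features linking `source` to `target`.

    `triples` is an iterable of (head, relation, tail) facts; each path of
    at most `max_len` edges becomes one feature string.
    """
    graph = defaultdict(list)
    for head, rel, tail in triples:
        graph[head].append((rel, tail))

    features = set()
    frontier = [(source, ())]  # (current node, relations followed so far)
    for _ in range(max_len):
        next_frontier = []
        for node, path in frontier:
            for rel, neighbor in graph[node]:
                new_path = path + (rel,)
                if neighbor == target:
                    features.add("-".join(new_path))
                next_frontier.append((neighbor, new_path))
        frontier = next_frontier
    return features

# Toy knowledge base (hypothetical facts used only for illustration).
kb = [
    ("Seattle", "located_in", "Washington"),
    ("Washington", "located_in", "USA"),
    ("Seattle", "country", "USA"),
]
print(path_features(kb, "Seattle", "USA"))
```

      Features of this kind could then feed a classifier that predicts whether a missing relation holds between the two entities, as in knowledge base completion.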

    • July 10, 2015

      Christof Koch

      Human and non-human animals not only act in the world but are capable of conscious experience. That is, it feels like something to have a brain and be cold, angry or see red. I will discuss the scientific progress that has been achieved over the past decades in characterizing the behavioral and the neuronal correlates of consciousness, based on both clinical case studies and laboratory experiments. I will introduce the Integrated Information Theory (IIT) that explains in a principled manner which physical systems are capable of conscious, subjective experience. The theory explains many biological and medical facts about consciousness and its pathologies in humans, can be extrapolated to more difficult cases, such as fetuses, mice, or non-mammalian brains, and has been used to assess the presence of consciousness in individual patients in the clinic. IIT also explains why consciousness evolved by natural selection. The theory predicts that feed-forward networks, such as deep convolutional networks, are not conscious even if they perform tasks that in humans would be associated with conscious experience. Furthermore, and in sharp contrast to widespread functionalist beliefs, IIT implies that digital computers, even if they were to run software faithfully simulating the human brain, would experience next to nothing. That is, while in the biological realm, intelligence and consciousness are intimately related, contemporary developments in AI dissolve that link, giving rise to intelligence without consciousness.

    • April 21, 2015

      Karthik Raman

      In this talk I discuss the challenges of learning from data that results from human behavior. I will present new machine learning models and algorithms that explicitly account for the human decision making process and factors underlying it such as human expertise, skills and needs. The talk will also explore how we can look to optimize human interactions to build robust learning systems with provable performance guarantees. I will also present examples, from the domains of search, recommendation and educational analytics, where we have successfully deployed systems for cost-effectively learning with humans in the loop.
