VideosSee AI2's full collection of videos on our YouTube channel.
Viewing 121-130 of 158 videos
- March 2, 2016 | Ashish SabharwalArtificial intelligence and machine learning communities have made tremendous strides in the last decade. Yet, the best systems to date still struggle with routine tests of human intelligence, such as standardized science exams posed as-is in natural language, even at the elementary-school level. Can we demonstrate human-like intelligence by building systems that can pass such tests? Unlike typical factoid-style question answering (QA) tasks, these tests challenge a student’s ability to combine multiple facts in various ways, and appeal to broad common-sense and science knowledge. Going beyond arguably shallow information retrieval (IR) and statistical correlation techniques, we view science QA from the lens of combinatorial optimization over a semi-formal knowledge base derived from text. Our structured inference system, formulated as an Integer Linear Program (ILP), turns out to be not only highly complementary to IR methods, but also more robust to question perturbation, as well as substantially more scalable and accurate than prior attempts using probabilistic first-order logic and Markov Logic Networks (MLNs). This talk will discuss fundamental challenges behind the science QA task, the progress we have made, and many challenges that lie ahead.
- February 16, 2016 | Eric XingThe rise of Big Data has led to new demands for Machine Learning (ML) systems to learn complex models with millions to billions of parameters that promise adequate capacity to digest massive datasets and offer powerful predictive analytics (such as high-dimensional latent features, intermediate representations, and decision functions) thereupon. In order to run ML algorithms at such scales, on a distributed cluster with 10s to 1000s of machines, it is often the case that significant engineering efforts are required — and one might fairly ask if such engineering truly falls within the domain of ML research or not. Taking the view that Big ML systems can indeed benefit greatly from ML-rooted statistical and algorithmic insights — and that ML researchers should therefore not shy away from such systems design — we discuss a series of principles and strategies distilled from our recent efforton industrial-scale ML solutions that involve a continuum from application, to engineering, and to theoretical research and development of Big ML system and architecture, on how to make them efficient, general, and with convergence and scaling guarantees.
- February 9, 2016 | Rich CaruanaLocally normalized approaches for structured prediction, such as left-to-right parsing and sequence labeling, are attractive because of their simplicity, ease of training, and the flexibility in choosing features from observations. Combined with the power of neural networks, they have been widely adopted for NLP tasks. However, locally normalized models suffer from label bias, where search errors arise during prediction because scores of hypotheses are computed from local decisions. While conditional random fields avoid label bias by scoring hypothesis globally, it is at the cost of training time and limited freedom for specifying features. In this talk, I will present two approaches for overcoming label bias in structured prediction with locally normalized models. In the first approach, I will introduce a framework for learning to identify erroneous hypotheses and discard them at prediction time. Applying this framework to transition-based dependency parsing improves parsing accuracy significantly. In the second approach, I will show that scheduled sampling (Bengio et al.) and a variant can be robust to prediction errors, leading to state-of-the-art accuracies on CCG supertagging with LSTMs and in-domain CCG parsing.
- January 27, 2016 | Jayant KrishnamurthyLexicon learning is the first step of training a semantic parser for a new application domain, and the quality of the learned lexicon significantly affects both the accuracy and efficiency of the final semantic parser. Existing work on lexicon learning has focused on heuristic methods that lack convergence guarantees and require significant human input in the form of lexicon templates or annotated logical forms. In contrast, the proposed probabilistic models are trained directly from question/answer pairs using EM and the simplest model has a concave objective function that guarantees that EM converges to a global optimum. An experimental evaluation on a data set of 4th grade science questions demonstrates that these models improve semantic parser accuracy (35-70% error reduction) and efficiency (4-25x more sentences per second) relative to prior work, despite using less human input. The models also obtain competitive results on Geoquery without any dataset-specific engineering.
- January 12, 2016 | Patrice SimardFor many ML problems, labeled data is readily available. The algorithm is the bottleneck. This is the ML researcher’s paradise! Problems that have fairly stable distributions and can accumulate large quantities of human labels over time have this property: Vision, Speech, Autonomous driving. Problems that have shifting distribution and an infinite supply of labels through history are blessed in the same way: click prediction, data analytics, forecasting. We call these problems the “head” of ML. We are interested in another large class of ML problems where data is sparse. For contrast, we call it the “tail” of ML. For example, consider a dialog system for a specific app to recognize specific commands such as: “lights on first floor off”, “patio on”, “enlarge paragraph spacing”, “make appointment with doctor when back from vacation”. Anyone who has attempted building such a system has soon discovered that there are far more ways to issue a command than they originally thought. Domain knowledge, data selection, and custom features are essential to get good generalization performance with small amounts of data. With the right tools, an ML expert can build such a classifier or annotator in a matter of hours. Unfortunately, the current cost of an ML expert (if one is available) is often more than the value produced by a single domain specific model. Getting good results on the tail is not cheap or easy. To address this problem, we change our focus from the learner to the teacher. We define Machine Teaching as improving the “teacher” productivity given the “learner”. The teacher is human. The learner is an ML algorithm. Ideally, our approach is “learner agnostic”. Focusing on improving the teacher does not preclude using the best ML algorithm or the best deep representation features and transfer learning. We view Machine Teaching and Machine Learning as orthogonal and complementary approaches. The Machine Teaching metrics are ML metrics divided by human costs, and Machine Teaching focuses on reducing the denominator. This perspective has led to many interesting insights and significant gains in ML productivity.
- December 10, 2015 | Chandra BhagavatulaIn this talk, I will describe two systems designed to extract structured knowledge from unstructured and semi-structured data. First, I'll present an entity linking system for Web tables. Next, I'll talk about a key phrase extraction system that extracts a set of key concepts from a research article. Towards the end of the talk, I will briefly introduce an underlying common problem which connects these two seemingly distinct tasks. I will also present an approach, based on topic modeling, to solve this common underlying problem.
- November 3, 2015 | Hanie SedghiLearning with big data is akin to finding a needle in a haystack: useful information is hidden in high dimensional data. Optimization methods, both convex and nonconvex, require new thinking when dealing with high dimensional data, and I present two novel solutions.
- September 14, 2015 | Doug DowneyIn this talk, I will introduce efficient methods for inferring large topic hierarchies. The approach is built upon the Sparse Backoff Tree (SBT), a new prior for latent topic distributions that organizes the latent topics as leaves in a tree. I will show how a document model based on SBTs can effectively infer accurate topic spaces of over a million topics. Experiments demonstrate that scaling to large topic spaces results in much more accurate models, and that SBT document models make use of large topic spaces more effectively than flat LDA. Lastly, I will describe how the models power Atlasify, a prototype exploratory search engine.
- September 10, 2015 | Shalini GhoshDocuments exhibit sequential structure at multiple levels of abstraction (e.g., sentences, paragraphs, sections). These abstractions constitute a natural hierarchy for representing the context in which to infer the meaning of words and larger fragments of text. In this talk, we present CLSTM (Contextual LSTM), an extension of the recurrent neural network LSTM (Long-Short Term Memory) model, where we incorporate hierarchical contextual features (e.g., topics) into the model. The CLSTM models were implemented in the Google DistBelief framework.
- August 18, 2015 | Iftekhar NaimToday we encounter enormous amounts of video data, often accompanied with text descriptions (e.g., cooking videos and recipes, movies and shooting scripts). Extracting meaningful information from these multimodal sequences requires aligning the video frames with the corresponding text sentences. We address the problem of automatically aligning natural language sentences with corresponding video segments without direct human supervision. We first propose two generative models that are closely related to the HMM and IBM 1 word alignment models used in statistical machine translation. Next, we propose a latent-variable discriminative alignment model, which outperforms the generative models by incorporating rich features. Our alignment algorithms are applied to align biological wetlab videos with text instructions and movie scenes with shooting scripts.