See AI2’s full collection of videos on our YouTube channel.
    • March 2, 2016

      Ashish Sabharwal

      Artificial intelligence and machine learning communities have made tremendous strides in the last decade. Yet, the best systems to date still struggle with routine tests of human intelligence, such as standardized science exams posed as-is in natural language, even at the elementary-school level. Can we demonstrate human-like intelligence by building systems that can pass such tests? Unlike typical factoid-style question answering (QA) tasks, these tests challenge a student’s ability to combine multiple facts in various ways, and appeal to broad common-sense and science knowledge. Going beyond arguably shallow information retrieval (IR) and statistical correlation techniques, we view science QA from the lens of combinatorial optimization over a semi-formal knowledge base derived from text. Our structured inference system, formulated as an Integer Linear Program (ILP), turns out to be not only highly complementary to IR methods, but also more robust to question perturbation, as well as substantially more scalable and accurate than prior attempts using probabilistic first-order logic and Markov Logic Networks (MLNs). This talk will discuss fundamental challenges behind the science QA task, the progress we have made, and many challenges that lie ahead.
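      To make the ILP framing concrete, here is a minimal sketch, assuming the open-source PuLP solver and a toy, hand-weighted fact table; the variables, constraints, and scores below are illustrative and not the system described in the talk.

      ```python
      # Illustrative only: pick an answer option and a short chain of facts that
      # best "support" it, by solving a small integer linear program with PuLP.
      from pulp import LpProblem, LpVariable, LpMaximize, LpBinary, lpSum, value

      facts = ["metals conduct electricity", "copper is a metal"]
      options = ["copper wire", "wooden spoon"]
      # Hypothetical support scores (in a real system, derived from the text).
      support = {("metals conduct electricity", "copper wire"): 0.8,
                 ("copper is a metal", "copper wire"): 0.9,
                 ("metals conduct electricity", "wooden spoon"): 0.1,
                 ("copper is a metal", "wooden spoon"): 0.0}

      prob = LpProblem("science_qa", LpMaximize)
      use = {f: LpVariable(f"use_{i}", cat=LpBinary) for i, f in enumerate(facts)}
      pick = {o: LpVariable(f"pick_{j}", cat=LpBinary) for j, o in enumerate(options)}
      link = {(f, o): LpVariable(f"link_{i}_{j}", cat=LpBinary)
              for i, f in enumerate(facts) for j, o in enumerate(options)}

      prob += lpSum(support[f, o] * link[f, o] for f in facts for o in options)
      prob += lpSum(pick.values()) == 1              # choose exactly one answer
      prob += lpSum(use.values()) <= 2               # keep the inference chain short
      for f in facts:
          for o in options:
              prob += link[f, o] <= use[f]           # a link needs its fact ...
              prob += link[f, o] <= pick[o]          # ... and the chosen option

      prob.solve()
      print([o for o in options if value(pick[o]) > 0.5])   # -> ['copper wire']
      ```

      Even in this toy the contrast with pure IR is visible: the score of an answer depends on how several facts chain together, not on any single retrieved sentence.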

    • February 16, 2016

      Eric Xing

      The rise of Big Data has led to new demands for Machine Learning (ML) systems to learn complex models with millions to billions of parameters, promising adequate capacity to digest massive datasets and offer powerful predictive analytics (such as high-dimensional latent features, intermediate representations, and decision functions). Running ML algorithms at such scales, on distributed clusters of tens to thousands of machines, often requires significant engineering effort, and one might fairly ask whether such engineering truly falls within the domain of ML research. Taking the view that Big ML systems can indeed benefit greatly from ML-rooted statistical and algorithmic insights, and that ML researchers should therefore not shy away from such systems design, we discuss a series of principles and strategies distilled from our recent efforts on industrial-scale ML solutions. These efforts span a continuum from application to engineering to theoretical research and development of Big ML systems and architectures, with the aim of making them efficient, general, and backed by convergence and scaling guarantees.

    • February 9, 2016

      Rich Caruana

      Locally normalized approaches for structured prediction, such as left-to-right parsing and sequence labeling, are attractive because of their simplicity, ease of training, and flexibility in choosing features from observations. Combined with the power of neural networks, they have been widely adopted for NLP tasks. However, locally normalized models suffer from label bias, where search errors arise during prediction because scores of hypotheses are computed from local decisions. While conditional random fields avoid label bias by scoring hypotheses globally, this comes at the cost of longer training time and less freedom in specifying features. In this talk, I will present two approaches for overcoming label bias in structured prediction with locally normalized models. In the first approach, I will introduce a framework for learning to identify erroneous hypotheses and discard them at prediction time. Applying this framework to transition-based dependency parsing improves parsing accuracy significantly. In the second approach, I will show that scheduled sampling (Bengio et al.) and a variant can be robust to prediction errors, leading to state-of-the-art accuracies on CCG supertagging with LSTMs and in-domain CCG parsing.
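      As a rough illustration of the second approach, here is a minimal sketch of scheduled sampling (Bengio et al.) for a left-to-right tagger; the `predict` and `update` callables and the decay constant are placeholders, not the LSTM supertagger from the talk.

      ```python
      # Sketch of scheduled sampling: during training, the history fed to the next
      # step is sometimes the gold tag and sometimes the model's own prediction.
      import math
      import random

      def gold_feed_prob(epoch, k=5.0):
          """Inverse-sigmoid decay (Bengio et al.): near 1 early in training,
          decaying toward 0 so the model increasingly sees its own predictions."""
          return k / (k + math.exp(epoch / k))

      def train_on_sentence(words, gold_tags, predict, update, epoch):
          """`predict(word, prev_tag)` and `update(word, prev_tag, gold_tag)` are
          stand-ins for the real model's inference and gradient step."""
          p = gold_feed_prob(epoch)
          prev = "<s>"
          for word, gold in zip(words, gold_tags):
              guess = predict(word, prev)
              update(word, prev, gold)
              # The key decision: what does the next time step condition on?
              prev = gold if random.random() < p else guess
      ```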

    • January 27, 2016

      Jayant Krishnamurthy

      Lexicon learning is the first step of training a semantic parser for a new application domain, and the quality of the learned lexicon significantly affects both the accuracy and efficiency of the final semantic parser. Existing work on lexicon learning has focused on heuristic methods that lack convergence guarantees and require significant human input in the form of lexicon templates or annotated logical forms. In contrast, the proposed probabilistic models are trained directly from question/answer pairs using EM and the simplest model has a concave objective function that guarantees that EM converges to a global optimum. An experimental evaluation on a data set of 4th grade science questions demonstrates that these models improve semantic parser accuracy (35-70% error reduction) and efficiency (4-25x more sentences per second) relative to prior work, despite using less human input. The models also obtain competitive results on Geoquery without any dataset-specific engineering.

    • January 12, 2016

      Patrice Simard

      For many ML problems, labeled data is readily available. The algorithm is the bottleneck. This is the ML researcher’s paradise! Problems that have fairly stable distributions and can accumulate large quantities of human labels over time have this property: vision, speech, autonomous driving. Problems that have shifting distributions and an infinite supply of labels through history are blessed in the same way: click prediction, data analytics, forecasting. We call these problems the “head” of ML.

      We are interested in another large class of ML problems where data is sparse. For contrast, we call it the “tail” of ML. For example, consider a dialog system for a specific app to recognize specific commands such as: “lights on first floor off”, “patio on”, “enlarge paragraph spacing”, “make appointment with doctor when back from vacation”. Anyone who has attempted to build such a system soon discovers that there are far more ways to issue a command than they originally thought. Domain knowledge, data selection, and custom features are essential to get good generalization performance with small amounts of data. With the right tools, an ML expert can build such a classifier or annotator in a matter of hours. Unfortunately, the current cost of an ML expert (if one is available) is often more than the value produced by a single domain-specific model. Getting good results on the tail is not cheap or easy.

      To address this problem, we change our focus from the learner to the teacher. We define Machine Teaching as improving the productivity of the “teacher” given the “learner”. The teacher is human. The learner is an ML algorithm. Ideally, our approach is “learner agnostic”. Focusing on improving the teacher does not preclude using the best ML algorithm or the best deep representation features and transfer learning. We view Machine Teaching and Machine Learning as orthogonal and complementary approaches. The Machine Teaching metrics are ML metrics divided by human costs, and Machine Teaching focuses on reducing the denominator. This perspective has led to many interesting insights and significant gains in ML productivity.
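      As a back-of-the-envelope illustration of that metric (the numbers and helper below are invented for this sketch, not from the talk):

      ```python
      def teaching_productivity(ml_metric, teacher_hours):
          """Machine Teaching metric as framed above: an ML metric (e.g., accuracy
          or F1) divided by the human cost of producing the model."""
          return ml_metric / teacher_hours

      # Two classifiers with the same F1; the one built in 4 teacher-hours instead
      # of 40 is ten times more "productive" in Machine Teaching terms.
      print(teaching_productivity(0.85, 40.0))   # ~0.021 per teacher-hour
      print(teaching_productivity(0.85, 4.0))    # ~0.213 per teacher-hour
      ```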

    • December 10, 2015

      Chandra Bhagavatula

      In this talk, I will describe two systems designed to extract structured knowledge from unstructured and semi-structured data. First, I'll present an entity linking system for Web tables. Next, I'll talk about a key phrase extraction system that extracts a set of key concepts from a research article. Towards the end of the talk, I will briefly introduce an underlying common problem which connects these two seemingly distinct tasks. I will also present an approach, based on topic modeling, to solve this common underlying problem.

    • November 3, 2015

      Hanie Sedghi

      Learning with big data is akin to finding a needle in a haystack: useful information is hidden in high-dimensional data. Optimization methods, both convex and nonconvex, require new thinking when dealing with high-dimensional data, and I will present two novel solutions.

    • September 14, 2015

      Doug Downey

      In this talk, I will introduce efficient methods for inferring large topic hierarchies. The approach is built upon the Sparse Backoff Tree (SBT), a new prior for latent topic distributions that organizes the latent topics as leaves in a tree. I will show how a document model based on SBTs can effectively infer accurate topic spaces of over a million topics. Experiments demonstrate that scaling to large topic spaces results in much more accurate models, and that SBT document models make use of large topic spaces more effectively than flat LDA. Lastly, I will describe how the models power Atlasify, a prototype exploratory search engine.

    • September 10, 2015

      Shalini Ghosh

      Documents exhibit sequential structure at multiple levels of abstraction (e.g., sentences, paragraphs, sections). These abstractions constitute a natural hierarchy for representing the context in which to infer the meaning of words and larger fragments of text. In this talk, we present CLSTM (Contextual LSTM), an extension of the recurrent neural network LSTM (Long Short-Term Memory) model, where we incorporate hierarchical contextual features (e.g., topics) into the model. The CLSTM models were implemented in the Google DistBelief framework.
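      A minimal sketch of the core idea, with a context (topic) vector concatenated to each word embedding before the LSTM; this uses PyTorch purely for illustration (the talk's implementation is in DistBelief), and all dimensions are arbitrary.

      ```python
      # Illustrative contextual LSTM: word embeddings are concatenated with a
      # context vector (e.g., a topic distribution for the enclosing paragraph).
      import torch
      import torch.nn as nn

      class ContextualLSTM(nn.Module):
          def __init__(self, vocab_size=10000, embed_dim=128, topic_dim=50, hidden_dim=256):
              super().__init__()
              self.embed = nn.Embedding(vocab_size, embed_dim)
              self.lstm = nn.LSTM(embed_dim + topic_dim, hidden_dim, batch_first=True)
              self.out = nn.Linear(hidden_dim, vocab_size)  # e.g., next-word prediction

          def forward(self, word_ids, topic_vec):
              # word_ids: (batch, seq_len); topic_vec: (batch, topic_dim)
              emb = self.embed(word_ids)
              ctx = topic_vec.unsqueeze(1).expand(-1, emb.size(1), -1)
              hidden, _ = self.lstm(torch.cat([emb, ctx], dim=-1))
              return self.out(hidden)

      model = ContextualLSTM()
      logits = model(torch.randint(0, 10000, (2, 12)), torch.rand(2, 50))
      ```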

    • August 18, 2015

      Iftekhar Naim

      Today we encounter enormous amounts of video data, often accompanied with text descriptions (e.g., cooking videos and recipes, movies and shooting scripts). Extracting meaningful information from these multimodal sequences requires aligning the video frames with the corresponding text sentences. We address the problem of automatically aligning natural language sentences with corresponding video segments without direct human supervision. We first propose two generative models that are closely related to the HMM and IBM Model 1 word alignment models used in statistical machine translation. Next, we propose a latent-variable discriminative alignment model, which outperforms the generative models by incorporating rich features. Our alignment algorithms are applied to align biological wet-lab videos with text instructions and movie scenes with shooting scripts.
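      For intuition about the generative starting point, here is a toy EM loop in the spirit of IBM Model 1, treating video-segment labels as the source vocabulary and sentence words as the target; the data are invented, and the talk's models add an HMM structure and, later, discriminative features.

      ```python
      # Toy IBM-Model-1-style aligner: learn p(word | video segment) from paired
      # (segment labels, sentence) data, then align each word to its best segment.
      from collections import defaultdict

      pairs = [(["pour", "stir"], "pour the milk then stir the batter".split()),
               (["stir"], "stir gently".split())]
      segments = {s for segs, _ in pairs for s in segs}
      words = {w for _, sent in pairs for w in sent}
      t = {(w, s): 1.0 / len(words) for w in words for s in segments}  # uniform init

      for _ in range(20):
          counts, totals = defaultdict(float), defaultdict(float)
          for segs, sent in pairs:                      # E-step: expected alignments
              for w in sent:
                  z = sum(t[w, s] for s in segs)
                  for s in segs:
                      counts[w, s] += t[w, s] / z
                      totals[s] += t[w, s] / z
          t = {(w, s): counts[w, s] / totals[s]         # M-step: renormalize
               for w in words for s in segments}

      best = {w: max(segments, key=lambda s: t[w, s]) for w in words}
      print(best["stir"], best["pour"])                 # -> stir pour
      ```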

    • July 30, 2015

      Matt Gardner

      A lot of attention has recently been given to the creation of large knowledge bases that contain millions of facts about people, things, and places in the world. In this talk I present methods for using these knowledge bases to generate features for machine learning models. These methods view the knowledge base as a graph which can be traversed to find potentially predictive information. I show how these methods can be applied to models of knowledge base completion, relation extraction, and question answering.
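      A sketch of the graph-traversal idea: enumerate bounded-length relation paths between an entity pair and treat each path type as a feature for a downstream model. The tiny knowledge base below is invented; the talk's methods operate over KBs with millions of facts.

      ```python
      # Enumerate relation paths (sequences of edge labels) of bounded length
      # between two entities; each distinct path type becomes a feature.
      from collections import Counter

      # Tiny invented KB: (subject, relation, object) triples.
      triples = [("Pittsburgh", "city_in", "Pennsylvania"),
                 ("CMU", "located_in", "Pittsburgh"),
                 ("CMU", "university_in", "Pennsylvania"),
                 ("Pennsylvania", "state_in", "USA")]

      graph = {}
      for s, r, o in triples:
          graph.setdefault(s, []).append((r, o))

      def path_features(source, target, max_len=3):
          """Count edge-label sequences connecting source to target."""
          feats = Counter()
          stack = [(source, ())]                 # (current node, path so far)
          while stack:
              node, path = stack.pop()
              if node == target and path:
                  feats["->".join(path)] += 1
              if len(path) < max_len:
                  for rel, nxt in graph.get(node, []):
                      stack.append((nxt, path + (rel,)))
          return feats

      # e.g. Counter({'university_in': 1, 'located_in->city_in': 1})
      print(path_features("CMU", "Pennsylvania"))
      ```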

    • July 10, 2015

      Christof Koch

      Human and non-human animals not only act in the world but are capable of conscious experience. That is, it feels like something to have a brain and be cold, angry or see red. I will discuss the scientific progress that has been achieved over the past decades in characterizing the behavioral and the neuronal correlates of consciousness, both based on clinical case studies as well as laboratory experiments. I will introduce the Integrated Information Theory (IIT) that explains in a principled manner which physical systems are capable of conscious, subjective experience. The theory explains many biological and medical facts about consciousness and its pathologies in humans, can be extrapolated to more difficult cases, such as fetuses, mice, or non-mammalian brains and has been used to assess the presence of consciousness in individual patients in the clinic. IIT also explains why consciousness evolved by natural selection. The theory predicts that feed-forward networks, such as deep convolutional networks, are not conscious even if they perform tasks that in humans would be associated with conscious experience. Furthermore, and in sharp contrast to widespread functionalist beliefs, IIT implies that digital computers, even if they were to run software faithfully simulating the human brain, would experience next to nothing. That is, while in the biological realm, intelligence and consciousness are intimately related, contemporary developments in AI dissolve that link, giving rise to intelligence without consciousness.

    • April 21, 2015

      Karthik Raman

      In this talk I discuss the challenges of learning from data that results from human behavior. I will present new machine learning models and algorithms that explicitly account for the human decision making process and factors underlying it such as human expertise, skills and needs. The talk will also explore how we can look to optimize human interactions to build robust learning systems with provable performance guarantees. I will also present examples, from the domains of search, recommendation and educational analytics, where we have successfully deployed systems for cost-effectively learning with humans in the loop.

    • April 7, 2015

      Erik T. Mueller

      To solve the AI problem, we need to develop systems that go beyond answering fact-based questions. Watson has been hugely successful at answering fact-based questions, but to solve hard AI tasks like passing science tests and understanding narratives, we need to go beyond simple facts. In this talk, I discuss how the systems I have most recently worked on have approached this problem. Watson for Healthcare answers Doctor's Dilemma medical competition questions, and WatsonPaths answers medical test preparation questions. These systems have achieved some success, but there is still a lot more to be done. Based on my experiences working on these systems, I discuss what I think the priorities should be going forward.

    • April 7, 2015

      Dani Yogatama

      The majority of NLP research focuses on improving NLP systems by designing better model classes (e.g., non-linear models, latent variable models). In this talk, I will describe a complementary approach, applicable to several model classes, based on incorporating linguistic bias and optimizing text representations. First, I will present a structured regularizer that is suitable for problems where only some parts of an input are relevant to the prediction task (e.g., sentences in text, entities in scenes of images), along with an efficient algorithm based on the alternating direction method of multipliers to solve the resulting optimization problem. I will then show how such a regularizer can be used to incorporate linguistic structures into a text classification model. In the second part of the talk, I will present our first step towards building a black-box NLP system that automatically chooses the best text representation for a given dataset by treating it as a global optimization problem. I will also briefly describe an improved algorithm that can generalize across multiple datasets for faster optimization. I will conclude by discussing how such a framework can be applied to other NLP problems.
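      One way to picture the structured-sparsity idea is through the proximal step of a group-lasso-style penalty, which zeroes out whole groups of weights (e.g., all features tied to one sentence) at once; this numpy sketch is a generic illustration, not the specific regularizer or ADMM algorithm from the talk.

      ```python
      # Block (group) soft-thresholding: the proximal operator of a group-lasso
      # penalty lambda * sum_g ||w_g||_2. Whole groups are driven exactly to zero,
      # mirroring the idea that only some parts of the input are relevant.
      import numpy as np

      def prox_group_lasso(w, groups, lam):
          """w: weight vector; groups: list of index arrays; lam: threshold."""
          out = w.copy()
          for idx in groups:
              norm = np.linalg.norm(w[idx])
              out[idx] = 0.0 if norm <= lam else (1.0 - lam / norm) * w[idx]
          return out

      w = np.array([0.9, 0.8, 0.05, -0.02, 0.01])
      groups = [np.array([0, 1]), np.array([2, 3, 4])]   # e.g., two "sentences"
      print(prox_group_lasso(w, groups, lam=0.3))        # second group -> all zeros
      ```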

    • March 31, 2015

      In many real-world applications of AI and machine learning, such as natural language processing, computer vision and knowledge base construction, data sources possess a natural internal structure, which can be exploited to improve predictive accuracy. Sometimes the structure can be very large, containing many interdependent inputs and outputs. Learning from data with large internal structure poses many compelling challenges, one of which is that fully-labeled examples (required for supervised learning) are difficult to acquire. This is especially true in applications like image segmentation, annotating video data, and knowledge base construction.

    • March 27, 2015

      Sonal Gupta

      Although most work in information extraction (IE) focuses on tasks that have abundant training data, in practice many IE problems do not have any supervised training data. State-of-the-art supervised techniques like conditional random fields are impractical for such real-world applications because: (1) they require large and expensive labeled corpora; (2) they are difficult to interpret and to analyze for errors, an often-ignored but important capability; and (3) they are hard to calibrate, for example, to reliably produce only high-precision extractions.

    • March 17, 2015

      Congle Zhang

      Most approaches to relation extraction, the task of extracting ground facts from natural language text, are based on machine learning and thus starved by scarce training data. Manual annotation is too expensive to scale to a comprehensive set of relations. Distant supervision, which automatically creates training data, only works with relations that already populate a knowledge base (KB). Unfortunately, KBs such as FreeBase rarely cover event relations (e.g. “person travels to location”). Thus, the problem of extracting a wide range of events — e.g., from news streams — is an important, open challenge.

    • March 12, 2015

      Vicente Ordonez

      Recently, there has been great progress in both computer vision and natural language processing in representing and recognizing semantic units like objects, attributes, named entities, or constituents. These advances provide opportunities to create systems able to interpret and describe the visual world using natural language. This is in contrast to traditional computer vision systems, which typically output a set of disconnected labels, object locations, or annotations for every pixel in an image. The rich visually descriptive language produced by people incorporates world knowledge and human intuition that often cannot be captured by other types of annotations. In this talk, I will present several approaches that explore the connections between language, perception, and vision at three levels: learning how to name objects, generating referring expressions for objects in natural scenes, and producing general image descriptions. These methods provide a framework to augment computer vision systems with linguistic information and to take advantage of the vast amount of text associated with images on the web. I will also discuss some of the intuitions from linguistics and perception behind these efforts and how they potentially connect to the larger goal of creating visual systems that can better learn from and communicate with people.

    • March 11, 2015

      Joel Pfeiffer

      Networks provide an effective representation to model many real-world domains, with edges (e.g., friendships, citations, hyperlinks) representing relationships between items (e.g., individuals, papers, webpages). By understanding common network features, we can develop models of the distribution from which the network was likely sampled. These models can be incorporated into real world tasks, such as modeling partially observed networks for improving relational machine learning, performing hypothesis tests for anomaly detection, or simulating algorithms on large scale (or future) datasets. However, naively sampling networks does not scale to real-world domains; for example, drawing a single random network sample consisting of a billion users would take approximately a decade with modern hardware.
