Viewing 19 videos from 2015. See AI2’s full collection of videos on our YouTube channel.
    • December 10, 2015

      Chandra Bhagavatula

      In this talk, I will describe two systems designed to extract structured knowledge from unstructured and semi-structured data. First, I'll present an entity linking system for Web tables. Next, I'll talk about a key phrase extraction system that extracts a set of key concepts from a research article. Towards the end of the talk, I will briefly introduce an underlying common problem which connects these two seemingly distinct tasks. I will also present an approach, based on topic modeling, to solve this common underlying problem.

    • November 3, 2015

      Hanie Sedghi

      Learning with big data is akin to finding a needle in a haystack: useful information is hidden in high dimensional data. Optimization methods, both convex and nonconvex, require new thinking when dealing with high dimensional data, and I present two novel solutions.

    • September 14, 2015

      Doug Downey

      In this talk, I will introduce efficient methods for inferring large topic hierarchies. The approach is built upon the Sparse Backoff Tree (SBT), a new prior for latent topic distributions that organizes the latent topics as leaves in a tree. I will show how a document model based on SBTs can effectively infer accurate topic spaces of over a million topics. Experiments demonstrate that scaling to large topic spaces results in much more accurate models, and that SBT document models make use of large topic spaces more effectively than flat LDA. Lastly, I will describe how the models power Atlasify, a prototype exploratory search engine.

    • September 10, 2015

      Shalini Ghosh

      Documents exhibit sequential structure at multiple levels of abstraction (e.g., sentences, paragraphs, sections). These abstractions constitute a natural hierarchy for representing the context in which to infer the meaning of words and larger fragments of text. In this talk, we present CLSTM (Contextual LSTM), an extension of the LSTM (Long Short-Term Memory) recurrent neural network model that incorporates hierarchical contextual features (e.g., topics). The CLSTM models were implemented in the Google DistBelief framework.

    • August 18, 2015

      Iftekhar Naim

      Today we encounter enormous amounts of video data, often accompanied by text descriptions (e.g., cooking videos and recipes, movies and shooting scripts). Extracting meaningful information from these multimodal sequences requires aligning the video frames with the corresponding text sentences. We address the problem of automatically aligning natural language sentences with corresponding video segments without direct human supervision. We first propose two generative models that are closely related to the HMM and IBM Model 1 word alignment models used in statistical machine translation. Next, we propose a latent-variable discriminative alignment model, which outperforms the generative models by incorporating rich features. We apply our alignment algorithms to align biological wet lab videos with text instructions and movie scenes with shooting scripts.

    • July 30, 2015

      Matt Gardner

      A lot of attention has recently been given to the creation of large knowledge bases that contain millions of facts about people, things, and places in the world. In this talk I present methods for using these knowledge bases to generate features for machine learning models. These methods view the knowledge base as a graph which can be traversed to find potentially predictive information. I show how these methods can be applied to models of knowledge base completion, relation extraction, and question answering.

    • July 10, 2015

      Christof Koch

      Human and non-human animals not only act in the world but are capable of conscious experience. That is, it feels like something to have a brain and be cold or angry, or to see red. I will discuss the scientific progress that has been achieved over the past decades in characterizing the behavioral and neuronal correlates of consciousness, based on both clinical case studies and laboratory experiments. I will introduce the Integrated Information Theory (IIT), which explains in a principled manner which physical systems are capable of conscious, subjective experience. The theory explains many biological and medical facts about consciousness and its pathologies in humans, can be extrapolated to more difficult cases, such as fetuses, mice, or non-mammalian brains, and has been used to assess the presence of consciousness in individual patients in the clinic. IIT also explains why consciousness evolved by natural selection. The theory predicts that feed-forward networks, such as deep convolutional networks, are not conscious even if they perform tasks that in humans would be associated with conscious experience. Furthermore, and in sharp contrast to widespread functionalist beliefs, IIT implies that digital computers, even if they were to run software faithfully simulating the human brain, would experience next to nothing. That is, while intelligence and consciousness are intimately related in the biological realm, contemporary developments in AI dissolve that link, giving rise to intelligence without consciousness.

    • April 21, 2015

      Karthik Raman

      In this talk I discuss the challenges of learning from data that results from human behavior. I will present new machine learning models and algorithms that explicitly account for the human decision-making process and the factors underlying it, such as human expertise, skills, and needs. The talk will also explore how we can optimize human interactions to build robust learning systems with provable performance guarantees. I will also present examples, from the domains of search, recommendation, and educational analytics, where we have successfully deployed systems for cost-effectively learning with humans in the loop.

    • April 7, 2015

      Erik T. Mueller

      To solve the AI problem, we need to develop systems that go beyond answering fact-based questions. Watson has been hugely successful at answering fact-based questions, but to solve hard AI tasks like passing science tests and understanding narratives, we need to go beyond simple facts. In this talk, I discuss how the systems I have most recently worked on have approached this problem. Watson for Healthcare answers Doctor's Dilemma medical competition questions, and WatsonPaths answers medical test preparation questions. These systems have achieved some success, but there is still a lot more to be done. Based on my experiences working on these systems, I discuss what I think the priorities should be going forward.

    • April 7, 2015

      Dani Yogatama

      The majority of NLP research focuses on improving NLP systems by designing better model classes (e.g., non-linear models, latent variable models). In this talk, I will describe a complementary approach, based on incorporating linguistic bias and optimizing text representations, that is applicable to several model classes. First, I will present a structured regularizer suited to problems in which only some parts of an input are relevant to the prediction task (e.g., sentences in text, entities in scenes of images), along with an efficient algorithm based on the alternating direction method of multipliers to solve the resulting optimization problem. I will then show how such a regularizer can be used to incorporate linguistic structures into a text classification model. In the second part of the talk, I will present our first step towards building a black box NLP system that automatically chooses the best text representation for a given dataset by treating the choice as a global optimization problem. I will also briefly describe an improved algorithm that can generalize across multiple datasets for faster optimization. I will conclude by discussing how such a framework can be applied to other NLP problems.

    • March 31, 2015

      In many real-world applications of AI and machine learning, such as natural language processing, computer vision and knowledge base construction, data sources possess a natural internal structure, which can be exploited to improve predictive accuracy. Sometimes the structure can be very large, containing many interdependent inputs and outputs. Learning from data with large internal structure poses many compelling challenges, one of which is that fully-labeled examples (required for supervised learning) are difficult to acquire. This is especially true in applications like image segmentation, annotating video data, and knowledge base construction.

    • March 27, 2015

      Sonal Gupta

      Although most work in information extraction (IE) focuses on tasks that have abundant training data, in practice, many IE problems do not have any supervised training data. State-of-the-art supervised techniques like conditional random fields are impractical for such real-world applications because: (1) they require large and expensive labeled corpora; (2) it is difficult to interpret them and analyze their errors, an often-ignored but important feature; and (3) they are hard to calibrate, for example, to reliably produce only high-precision extractions.

    • March 17, 2015

      Congle Zhang

      Most approaches to relation extraction, the task of extracting ground facts from natural language text, are based on machine learning and thus starved by scarce training data. Manual annotation is too expensive to scale to a comprehensive set of relations. Distant supervision, which automatically creates training data, only works with relations that already populate a knowledge base (KB). Unfortunately, KBs such as Freebase rarely cover event relations (e.g., “person travels to location”). Thus, the problem of extracting a wide range of events, for example from news streams, is an important, open challenge.

    • March 12, 2015

      Vicente Ordonez

      Recently, there has been great progress in both computer vision and natural language processing in representing and recognizing semantic units like objects, attributes, named entities, or constituents. These advances provide opportunities to create systems able to interpret and describe the visual world using natural language. This is in contrast to traditional computer vision systems, which typically output a set of disconnected labels, object locations, or annotations for every pixel in an image. The rich visually descriptive language produced by people incorporates world knowledge and human intuition that often cannot be captured by other types of annotations. In this talk, I will present several approaches that explore the connections between language, perception, and vision at three levels: learning how to name objects, generating referring expressions for objects in natural scenes, and producing general image descriptions. These methods provide a framework to augment computer vision systems with linguistic information and to take advantage of the vast amount of text associated with images on the web. I will also discuss some of the intuitions from linguistics and perception behind these efforts and how they potentially connect to the larger goal of creating visual systems that can better learn from and communicate with people.

    • March 11, 2015

      Joel Pfeiffer

      Networks provide an effective representation to model many real-world domains, with edges (e.g., friendships, citations, hyperlinks) representing relationships between items (e.g., individuals, papers, webpages). By understanding common network features, we can develop models of the distribution from which the network was likely sampled. These models can be incorporated into real-world tasks, such as modeling partially observed networks for improving relational machine learning, performing hypothesis tests for anomaly detection, or simulating algorithms on large-scale (or future) datasets. However, naively sampling networks does not scale to real-world domains; for example, drawing a single random network sample consisting of a billion users would take approximately a decade with modern hardware.

    • March 3, 2015

      Ankur Parikh

      Being able to effectively model latent structure in data is a key challenge in modern AI research, particularly in Natural Language Processing (NLP) where it is crucial to discover and leverage syntactic and semantic relationships that may not be explicitly annotated in the training set. Unfortunately, while incorporating latent variables to represent hidden structure can substantially increase representation power, the key problems of model design and learning become significantly more complicated. For example, unlike fully observed models, latent variable models can suffer from non-identifiability, making it difficult to distinguish the desired latent structure from the others. Moreover, learning is usually formulated as a non-convex optimization problem, leading to the use of local search heuristics that may become trapped in local optima.

    • February 26, 2015

      Ken Forbus

      Creating systems that can work with people, using natural modalities, as apprentices is a key step towards human-level AI. This talk will describe how my group is combining research on sketch understanding, natural language understanding, and analogical learning within the Companion cognitive architecture to create systems that can reason and learn about science by working with people. Some promising results will be described (e.g. solving conceptual physics problems involving sketches, modeling conceptual change, learning by reading) as well as work in progress (e.g. interactive knowledge capture via analogy).

    • February 5, 2015

      Bhavana Dalvi

      Semi-supervised learning (SSL) has been widely used for over a decade for various tasks, including knowledge acquisition, that lack large amounts of training data. My research proposes a novel learning scenario in which the system knows a few categories in advance, but the rest of the categories are unanticipated and need to be discovered from the unlabeled data. With the availability of enormous unlabeled datasets at low cost, and the difficulty of collecting labeled data for all possible categories, it becomes even more important to adapt traditional semi-supervised learning techniques to such realistic settings.

    • January 7, 2015

      Been Kim

      I will present the Bayesian Case Model (BCM), a general framework for Bayesian case-based reasoning (CBR) and prototype classification and clustering. BCM brings the intuitive power of CBR to a Bayesian generative framework. The BCM learns prototypes, the "quintessential" observations that best represent clusters in a data set, by performing joint inference on cluster labels, prototypes and important features. Simultaneously, BCM pursues sparsity by learning subspaces, the sets of features that play important roles in the characterization of the prototypes. The prototype and subspace representation provides quantitative benefits in interpretability while preserving classification accuracy. Human subject experiments verify statistically significant improvements to participants' understanding when using explanations produced by BCM, compared to those given by prior art.
