See AI2’s full collection of videos on our YouTube channel.
    • April 7, 2015

      Erik T. Mueller

      To solve the AI problem, we need systems that go beyond answering fact-based questions. Watson has been hugely successful at that task, but hard AI problems like passing science tests and understanding narratives require more than simple facts. In this talk, I discuss how the systems I have most recently worked on approach this challenge: Watson for Healthcare answers Doctor's Dilemma medical competition questions, and WatsonPaths answers medical test preparation questions. These systems have achieved some success, but much more remains to be done. Based on my experience with them, I discuss what I think the priorities should be going forward.

    • April 7, 2015

      Dani Yogatama

      The majority of NLP research focuses on improving NLP systems by designing better model classes (e.g., non-linear models, latent variable models). In this talk, I will describe a complementary approach, applicable to several model classes, based on incorporating linguistic bias and optimizing text representations. First, I will present a structured regularizer suited to problems where only some parts of an input are relevant to the prediction task (e.g., sentences in text, entities in scenes of images), along with an efficient algorithm based on the alternating direction method of multipliers (ADMM) for solving the resulting optimization problem. I will then show how such a regularizer can be used to incorporate linguistic structure into a text classification model. In the second part of the talk, I will present our first step towards building a black-box NLP system that automatically chooses the best text representation for a given dataset by treating the choice as a global optimization problem. I will also briefly describe an improved algorithm that generalizes across multiple datasets for faster optimization. I will conclude by discussing how such a framework can be applied to other NLP problems.
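
      As a minimal sketch of the flavor of structured (group-sparse) regularization discussed above, the snippet below implements block soft-thresholding, the proximal step an ADMM-style solver applies to each group of weights. The grouping (one group per sentence) and the numbers are illustrative assumptions, not the talk's actual model.

      ```python
      import numpy as np

      def group_soft_threshold(w, lam):
          """Proximal operator of the group-lasso penalty lam * ||w||_2:
          shrinks a whole group of weights toward zero, zeroing it out
          entirely when its norm falls below lam."""
          norm = np.linalg.norm(w)
          if norm <= lam:
              return np.zeros_like(w)
          return (1.0 - lam / norm) * w

      # Toy example: weight groups tied to three sentences of a document;
      # only groups that carry signal survive the shrinkage.
      groups = [np.array([0.9, -1.2]), np.array([0.05, 0.02]), np.array([0.4, 0.1])]
      print([group_soft_threshold(g, lam=0.3) for g in groups])
      ```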

    • March 31, 2015

      In many real-world applications of AI and machine learning, such as natural language processing, computer vision and knowledge base construction, data sources possess a natural internal structure, which can be exploited to improve predictive accuracy. Sometimes the structure can be very large, containing many interdependent inputs and outputs. Learning from data with large internal structure poses many compelling challenges, one of which is that fully-labeled examples (required for supervised learning) are difficult to acquire. This is especially true in applications like image segmentation, annotating video data, and knowledge base construction.

    • March 27, 2015

      Sonal Gupta

      Although most work in information extraction (IE) focuses on tasks with abundant training data, in practice many IE problems have no supervised training data at all. State-of-the-art supervised techniques like conditional random fields are impractical for such real-world applications because: (1) they require large and expensive labeled corpora; (2) they are difficult to interpret and their errors are hard to analyze, an often-ignored but important consideration; and (3) they are hard to calibrate, for example to reliably produce only high-precision extractions.

    • March 17, 2015

      Congle Zhang

      Most approaches to relation extraction, the task of extracting ground facts from natural language text, are based on machine learning and thus starved by scarce training data. Manual annotation is too expensive to scale to a comprehensive set of relations. Distant supervision, which automatically creates training data, only works with relations that already populate a knowledge base (KB). Unfortunately, KBs such as Freebase rarely cover event relations (e.g. “person travels to location”). Thus, the problem of extracting a wide range of events — e.g., from news streams — is an important, open challenge.

    • March 12, 2015

      Vicente Ordonez

      Recently, there has been great progress in both computer vision and natural language processing in representing and recognizing semantic units like objects, attributes, named entities, or constituents. These advances provide opportunities to create systems able to interpret and describe the visual world using natural language. This is in contrast to traditional computer vision systems, which typically output a set of disconnected labels, object locations, or annotations for every pixel in an image. The rich visually descriptive language produced by people incorporates world knowledge and human intuition that often cannot be captured by other types of annotations. In this talk, I will present several approaches that explore the connections between language, perception, and vision at three levels: learning how to name objects, generating referring expressions for objects in natural scenes, and producing general image descriptions. These methods provide a framework to augment computer vision systems with linguistic information and to take advantage of the vast amount of text associated with images on the web. I will also discuss some of the intuitions from linguistics and perception behind these efforts and how they potentially connect to the larger goal of creating visual systems that can better learn from and communicate with people.

    • March 11, 2015

      Joel Pfeiffer

      Networks provide an effective representation to model many real-world domains, with edges (e.g., friendships, citations, hyperlinks) representing relationships between items (e.g., individuals, papers, webpages). By understanding common network features, we can develop models of the distribution from which the network was likely sampled. These models can be incorporated into real world tasks, such as modeling partially observed networks for improving relational machine learning, performing hypothesis tests for anomaly detection, or simulating algorithms on large scale (or future) datasets. However, naively sampling networks does not scale to real-world domains; for example, drawing a single random network sample consisting of a billion users would take approximately a decade with modern hardware.
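
      A back-of-the-envelope check on that scaling claim (the throughput figure below is an assumption, not a measurement): naively sampling an Erdős–Rényi-style network means one coin flip per pair of nodes, and with a billion nodes the pair count alone is prohibitive.

      ```python
      # Rough cost of naively sampling a random graph over n nodes:
      # one Bernoulli draw per pair of nodes.
      n = 1_000_000_000                 # a billion users
      pairs = n * (n - 1) // 2          # ~5e17 potential edges
      draws_per_second = 1e9            # assumed throughput on modern hardware
      years = pairs / draws_per_second / (3600 * 24 * 365)
      print(f"{pairs:.2e} pairs -> roughly {years:.0f} years of coin flips")
      # ~5.0e17 pairs -> roughly 16 years, i.e., on the order of a decade.
      ```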

    • March 3, 2015

      Ankur Parikh

      Being able to effectively model latent structure in data is a key challenge in modern AI research, particularly in Natural Language Processing (NLP) where it is crucial to discover and leverage syntactic and semantic relationships that may not be explicitly annotated in the training set. Unfortunately, while incorporating latent variables to represent hidden structure can substantially increase representation power, the key problems of model design and learning become significantly more complicated. For example, unlike fully observed models, latent variable models can suffer from non-identifiability, making it difficult to distinguish the desired latent structure from the others. Moreover, learning is usually formulated as a non-convex optimization problem, leading to the use of local search heuristics that may become trapped in local optima.

    • February 26, 2015

      Ken Forbus

      Creating systems that can work with people as apprentices, using natural modalities, is a key step towards human-level AI. This talk will describe how my group is combining research on sketch understanding, natural language understanding, and analogical learning within the Companion cognitive architecture to create systems that can reason and learn about science by working with people. Some promising results will be described (e.g. solving conceptual physics problems involving sketches, modeling conceptual change, learning by reading) as well as work in progress (e.g. interactive knowledge capture via analogy).

    • February 5, 2015

      Bhavana Dalvi

      Semi-supervised learning (SSL) has been widely used for over a decade for various tasks -- including knowledge acquisition -- that lack large amounts of training data. My research proposes a novel learning scenario in which the system knows a few categories in advance, but the rest of the categories are unanticipated and need to be discovered from the unlabeled data. With the availability of enormous unlabeled datasets at low cost, and the difficulty of collecting labeled data for all possible categories, it becomes even more important to adapt traditional semi-supervised learning techniques to such realistic settings.
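
      As a heavily simplified illustration of this learning scenario (a nearest-centroid heuristic, not the method from the talk), the sketch below assigns an unlabeled instance to a known class only when it is similar enough, and otherwise spawns a new, unanticipated class. The labels, vectors, and threshold are invented.

      ```python
      import numpy as np

      def exploratory_assign(x, centroids, threshold=0.8):
          """Assign x to the most similar known class, or create a new class
          when nothing is similar enough (cosine similarity below threshold)."""
          sims = {label: float(np.dot(x, mu) / (np.linalg.norm(x) * np.linalg.norm(mu)))
                  for label, mu in centroids.items()}
          best = max(sims, key=sims.get)
          if sims[best] >= threshold:
              return best
          new_label = f"new_class_{len(centroids) + 1}"
          centroids[new_label] = x.copy()   # seed a new, unanticipated cluster
          return new_label

      centroids = {"city": np.array([1.0, 0.0]), "person": np.array([0.0, 1.0])}
      print(exploratory_assign(np.array([0.9, 0.1]), centroids))   # -> city
      print(exploratory_assign(np.array([0.6, 0.6]), centroids))   # -> new_class_3
      ```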

    • January 7, 2015

      Been Kim

      I will present the Bayesian Case Model (BCM), a general framework for Bayesian case-based reasoning (CBR) and prototype classification and clustering. BCM brings the intuitive power of CBR to a Bayesian generative framework. The BCM learns prototypes, the "quintessential" observations that best represent clusters in a data set, by performing joint inference on cluster labels, prototypes and important features. Simultaneously, BCM pursues sparsity by learning subspaces, the sets of features that play important roles in the characterization of the prototypes. The prototype and subspace representation provides quantitative benefits in interpretability while preserving classification accuracy. Human subject experiments verify statistically significant improvements to participants' understanding when using explanations produced by BCM, compared to those given by prior art.
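
      The sketch below is not the Bayesian model itself, only a crude non-probabilistic caricature of the two quantities BCM infers: a prototype (the cluster member that best matches the cluster's typical feature values) and a subspace (the features on which the cluster is most distinctive). The data layout and feature names are assumptions for illustration.

      ```python
      from collections import Counter

      def prototype_and_subspace(cluster, all_items, top_k=2):
          """cluster / all_items: lists of dicts mapping feature -> value.
          Returns the 'quintessential' member of the cluster and the features
          that most distinguish the cluster from the data as a whole."""
          def typical(items):
              feats = {f for item in items for f in item}
              return {f: Counter(item.get(f) for item in items).most_common(1)[0][0]
                      for f in feats}

          in_cluster, overall = typical(cluster), typical(all_items)
          # Subspace: features whose typical value inside the cluster differs
          # from the typical value over all the data.
          subspace = [f for f in in_cluster if in_cluster[f] != overall.get(f)][:top_k]
          # Prototype: the member that agrees with the cluster's typical values
          # on the most subspace features.
          prototype = max(cluster,
                          key=lambda item: sum(item.get(f) == in_cluster[f] for f in subspace))
          return prototype, subspace

      cluster = [{"color": "green", "shape": "round"}, {"color": "green", "shape": "long"}]
      everything = cluster + [{"color": "red", "shape": "round"}] * 3
      print(prototype_and_subspace(cluster, everything))
      # -> ({'color': 'green', 'shape': 'round'}, ['color'])
      ```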

    • December 4, 2014

      Aria Haghighi

      I discuss three problems in applied natural language processing and machine learning: event discovery from distributed discourse, document content models for information extraction, and relevance engineering for a large-scale personalization engine. The first two are information extraction problems over social media which attempt to utilize richer structure and context for decision making; these sections reflect work from the tail end of my purely academic work. The relevance section will discuss work done while at my former startup Prismatic and will focus on issues arising from productionizing real-time machine learning. Along the way, I'll share my thoughts and experience around productizing research and interesting future directions.

    • December 3, 2014

      Roozbeh Mottaghi

      Scene understanding is one of the holy grails of computer vision, and despite decades of research, it is still considered an unsolved problem. In this talk, I will present a number of methods, which help us take a step further towards the ultimate goal of holistic scene understanding. In particular, I will talk about our work on object detection, 3D pose estimation, and contextual reasoning, and show that modeling these tasks jointly enables better understanding of scenes. At the end of the talk, I will describe our recent work on providing richer descriptions for objects in terms of their viewpoint and sub-category information.

    • November 10, 2014

      Alan Akbik

      The use of deep syntactic information such as typed dependencies has been shown to be very effective in Information Extraction (IE). Despite this potential, the process of manually creating rule-based information extractors that operate on dependency trees is not intuitive for people without an extensive NLP background. In this talk, I present an approach and a graphical tool that allow even novice users to quickly and easily define extraction patterns over dependency trees and directly execute them on a very large text corpus. This enables users to explore a corpus for structured information of interest in a highly interactive and data-guided fashion, and allows them to create extractors for those semantic relations they find interesting. I then present a project in which we use Information Extraction to automatically construct a very large common sense knowledge base. This knowledge base - dubbed "The Weltmodell" - contains common sense facts that pertain to common noun concepts; an example is the concept "coffee", which we know is typically drunk by a person or brought by a waiter. I show how we mine such information from very large amounts of text, how we quantify notions such as typicality and similarity, and discuss some ideas on how such world knowledge can be used to address reasoning tasks.
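
      As a toy illustration of what an extraction pattern over a dependency tree looks like (plain (head, relation, dependent) triples rather than the graphical tool described in the talk), the pattern below extracts subject-verb-object facts; the sentence is invented to echo the coffee example.

      ```python
      # A parsed sentence as dependency triples: (head, relation, dependent),
      # e.g. for "a person drinks hot coffee".
      triples = [("drinks", "nsubj", "person"),
                 ("drinks", "dobj", "coffee"),
                 ("coffee", "amod", "hot")]

      def match_svo(triples):
          """Pattern: a verb with both a subject and a direct object
          yields a (subject, verb, object) fact."""
          subjects = {(head, dep) for head, rel, dep in triples if rel == "nsubj"}
          objects = {(head, dep) for head, rel, dep in triples if rel == "dobj"}
          return [(subj, verb, obj) for verb, subj in subjects
                  for verb2, obj in objects if verb2 == verb]

      print(match_svo(triples))   # -> [('person', 'drinks', 'coffee')]
      ```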

    • November 4, 2014

      Raymond Mooney

      Traditional logical approaches to semantics and newer distributional or vector space approaches have complementary strengths and weaknesses. We have developed methods that integrate logical and distributional models by using a CCG-based parser to produce a detailed logical form for each sentence, and combining the result with soft inference rules derived from distributional semantics that connect the meanings of their component words and phrases. For recognizing textual entailment (RTE) we use Markov Logic Networks (MLNs) to combine these representations, and for Semantic Textual Similarity (STS) we use Probabilistic Soft Logic (PSL). We present experimental results on standard benchmark datasets for these problems and emphasize the advantages of combining logical structure of sentences with statistical knowledge mined from large corpora.

    • October 1, 2014

      Chris Callison-Burch

      I will present my method for learning paraphrases - pairs of English expressions with equivalent meaning - from bilingual parallel corpora, which are more commonly used to train statistical machine translation systems. My method equates pairs of English phrases like "thrown into jail" and "imprisoned" when they share an aligned foreign phrase like "festgenommen". Because bitexts are large, and because a phrase can be aligned to many different foreign phrases, including phrases in multiple foreign languages, the method extracts a diverse set of paraphrases. For "thrown into jail", we not only learn "imprisoned", but also "arrested", "detained", "incarcerated", "jailed", "locked up", "taken into custody", and "thrown into prison", along with a set of incorrect/noisy paraphrases. I'll show a number of methods for filtering out the poor paraphrases, by defining a paraphrase probability calculated from translation model probabilities, and by re-ranking the candidate paraphrases using monolingual distributional similarity measures.
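
      The paraphrase probability mentioned above can be written as a marginalization over shared foreign pivot phrases, p(e2 | e1) = sum over f of p(e2 | f) * p(f | e1). The sketch below computes it from two small phrase tables; the phrases and probabilities are invented for illustration.

      ```python
      # p(foreign | english) and p(english | foreign), as would be estimated
      # from word-aligned bitext; the numbers here are made up.
      p_f_given_e = {"thrown into jail": {"festgenommen": 0.6, "eingesperrt": 0.4}}
      p_e_given_f = {"festgenommen": {"imprisoned": 0.5, "arrested": 0.3, "thrown into jail": 0.2},
                     "eingesperrt":  {"imprisoned": 0.7, "locked up": 0.3}}

      def paraphrase_probs(e1):
          """p(e2 | e1) = sum over foreign pivot phrases f of p(e2 | f) * p(f | e1)."""
          scores = {}
          for f, p_f in p_f_given_e.get(e1, {}).items():
              for e2, p_e2 in p_e_given_f.get(f, {}).items():
                  if e2 != e1:
                      scores[e2] = scores.get(e2, 0.0) + p_e2 * p_f
          return sorted(scores.items(), key=lambda kv: -kv[1])

      print(paraphrase_probs("thrown into jail"))
      # -> imprisoned (~0.58), arrested (~0.18), locked up (~0.12)
      ```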

    • August 5, 2014

      Jonathan Berant

      Machine reading calls for programs that read and understand text, but most current work only attempts to extract facts from redundant web-scale corpora. In this talk, I will focus on a new reading comprehension task that requires complex reasoning over a single document. The input is a paragraph describing a biological process, and the goal is to answer questions that require an understanding of the relations between entities and events in the process. To answer the questions, we first predict a rich structure representing the process in the paragraph. Then, we map the question to a formal query, which is executed against the predicted structure. We demonstrate that answering questions via predicted structures substantially improves accuracy over baselines that use shallower representations.
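
      A toy rendering of the pipeline described above: the predicted process structure is a set of labeled edges between events, and an answer is obtained by executing a small formal query against it. The biology example and relation names are invented.

      ```python
      # Predicted process structure: labeled, directed edges between events.
      structure = {("light absorption", "enables", "electron transport"),
                   ("electron transport", "causes", "ATP synthesis")}

      def query(relation, target):
          """Which events stand in `relation` to `target` in the predicted structure?"""
          return [src for src, rel, dst in structure
                  if rel == relation and dst == target]

      # Formal query for "What enables electron transport?"
      print(query("enables", "electron transport"))   # -> ['light absorption']
      ```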

    • July 25, 2014

      Pedro Domingos

      Building very large commonsense knowledge bases and reasoning with them is a long-standing dream of AI. Today that knowledge is available in text; all we have to do is extract it. Text, however, is extremely messy, noisy, ambiguous, incomplete, and variable. A formal representation of it needs to be both probabilistic and relational, either of which leads to intractable inference and therefore poor scalability. In the first part of this talk I will describe tractable Markov logic, a language that is restricted enough to be tractable yet expressive enough to represent much of the commonsense knowledge contained in text. Even then, transforming text into a formal representation of its meaning remains a difficult problem. There is no agreement on what the representation primitives should be, and labeled data in the form of sentence-meaning pairs for training a semantic parser is very hard to come by. In the second part of the talk I will propose a solution to both these problems, based on concepts from symmetry group theory. A symmetry of a sentence is a syntactic transformation that does not change its meaning. Learning a semantic parser for a language is discovering its symmetry group, and the meaning of a sentence is its orbit under the group (i.e., the set of all sentences it can be mapped to by composing symmetries). Preliminary experiments indicate that tractable Markov logic and symmetry-based semantic parsing can be powerful tools for scalably extracting knowledge from text.
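
      A toy rendering of the "meaning as orbit" idea: given a set of meaning-preserving transformations, the orbit of a sentence is its closure under composing them. The string rewrites below are simplistic stand-ins for real symmetries and are purely illustrative.

      ```python
      from collections import deque

      def orbit(sentence, symmetries):
          """All sentences reachable from `sentence` by composing the given
          meaning-preserving transformations (a breadth-first closure)."""
          seen, frontier = {sentence}, deque([sentence])
          while frontier:
              current = frontier.popleft()
              for transform in symmetries:
                  candidate = transform(current)
                  if candidate not in seen:
                      seen.add(candidate)
                      frontier.append(candidate)
          return seen

      # Toy "symmetries": a lexical substitution and a hard-coded passivization.
      symmetries = [
          lambda s: s.replace("bought", "purchased"),
          lambda s: "the car was bought by Anna" if s == "Anna bought the car" else s,
      ]
      print(orbit("Anna bought the car", symmetries))
      ```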

    • June 4, 2014

      Paul Allen

      Paul Allen discusses his vision for the future of AI and AI2 in this fireside chat moderated by Gary Marcus of New York University at the 10th Anniversary Symposium - Allen Institute for Brain Science. AI2-related discussion begins at 17:30.

    • May 13, 2014

      Bart Selman

      In recent years, there has been tremendous progress in solving large-scale reasoning and optimization problems. Central to this progress has been the ability to automatically uncover hidden problem structure. Nevertheless, for the very hardest computational tasks, human ingenuity still appears indispensable. We show that automated reasoning strategies and human insights can effectively complement each other, leading to hybrid human-computer solution strategies that outperform other methods by orders of magnitude. We illustrate our approach with challenges in scientific discovery in the areas of finite mathematics and materials science.
