• May 18, 2018

    Hany Hassan

    Machine translation has made rapid advances in recent years. Millions of people are using it today in online translation systems and mobile applications in order to communicate across language barriers. The question naturally arises whether such systems can approach or achieve parity with human translations. In this talk, we first describe our recent advances in neural machine translation (NMT) that led to state-of-the-art results on news translation. We then address the problem of how to define and accurately measure human parity in translation. We will discuss our system achieving human performance, as well as the limitations and future directions of current NMT systems.

  • May 8, 2018

    Saining Xie

    With the support of big-data and big-compute, deep learning has reshaped the landscape of research and applications in artificial intelligence. Whilst traditional hand-guided feature engineering in many cases is simplified, the deep network architectures become increasingly more complex. A central question is if we can distill the minimal set of structural priors that can provide us the maximal flexibility and lead us to richer sets of structural primitives that potentially lay the foundations towards the ultimate goal of building general intelligent systems. In this talk I will introduce my Ph.D. work along the aforementioned direction. I will show how we can tackle different real world problems, with carefully designed architectures, guided by simple yet effective structural priors. In particular, I will focus on two structural priors that have proven to be useful in many different scenarios: the multi-scale prior and the sparse-connectivity prior. I will also show examples of learning structural priors from data, instead of hard-wiring them.

  • April 20, 2018

    Kyle Richardson

    In this talk, I will give an overview of research being done at the University of Stuttgart on semantic parser induction and natural language understanding. The main topic, semantic parser induction, relates to the problem of learning to map input text to full meaning representations from parallel datasets. Such resulting “semantic parsers” are often a core component in various downstream natural language understanding applications, including automated question-answering and generation systems. We look at learning within several novel domains and datasets being developed in Stuttgart (e.g., software documentation for text-to-code translation) and under various types of data supervision (e.g., learning from entailment, "polyglot" modeling, or learning from multiple datasets).

  • April 10, 2018

    Jesse Dodge

    Driven by the need for parallelizable hyperparameter optimization methods, we study open loop search methods: sequences that are predetermined and can be generated before a single configuration is evaluated. Examples include grid search, uniform random search, low discrepancy sequences, and other sampling distributions. In particular, we propose the use of k-determinantal point processes in hyperparameter optimization via random search. Compared to conventional uniform random search where hyperparameter settings are sampled independently, a k-DPP promotes diversity. We describe an approach that transforms hyperparameter search spaces for efficient use with a k-DPP. In addition, we introduce a novel Metropolis-Hastings algorithm which can sample from k-DPPs defined over any space from which uniform samples can be drawn, including spaces with a mixture of discrete and continuous dimensions or tree structure. Our experiments show significant benefits when tuning hyperparameters to neural models for text classification, with a limited budget for training supervised learners, whether in serial or parallel.
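
    As a rough illustration of the open loop idea, the sketch below generates every configuration up front with conventional uniform random search, the baseline the abstract compares against. It is not the k-DPP sampler from the talk; the search space, parameter names, and ranges are assumptions made up for this example.

```python
import math
import random

# Hypothetical search space for illustration; names and ranges are assumptions.
SEARCH_SPACE = {
    "learning_rate": ("log_uniform", 1e-5, 1e-1),
    "dropout": ("uniform", 0.0, 0.5),
    "num_layers": ("choice", [1, 2, 3, 4]),
}


def sample_configuration(rng):
    """Draw one configuration, sampling each hyperparameter independently."""
    config = {}
    for name, spec in SEARCH_SPACE.items():
        kind = spec[0]
        if kind == "uniform":
            config[name] = rng.uniform(spec[1], spec[2])
        elif kind == "log_uniform":
            # Sample uniformly in log space, then map back.
            log_low, log_high = math.log(spec[1]), math.log(spec[2])
            config[name] = math.exp(rng.uniform(log_low, log_high))
        elif kind == "choice":
            config[name] = rng.choice(spec[1])
    return config


def open_loop_random_search(k, seed=0):
    """Generate all k configurations before any of them is evaluated."""
    rng = random.Random(seed)
    return [sample_configuration(rng) for _ in range(k)]


# The whole sequence exists before training starts, so the k evaluations
# can be dispatched to parallel workers in any order.
configs = open_loop_random_search(k=8)
```

    Because each draw is independent, nothing stops two configurations from landing close together; that is the lack of diversity a k-DPP is meant to address.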

  • April 2, 2018

    Rama Vedantam

    Understanding how to model vision and language jointly is a long-standing challenge in artificial intelligence. Vision is one of the primary sensors we use to perceive the world, while language is our data structure to represent and communicate knowledge. In this talk, we will take up three lines of attack to this problem: interpretation, grounding, and imagination. In interpretation, the goal will be to get machine learning models to understand an image and describe its contents using natural language in a contextually relevant manner. In grounding, we will connect natural language to referents in the physical world, and show how this can help learn common sense. Finally, we will study how to ‘imagine’ visual concepts completely and accurately across the full range and (potentially unseen) compositions of their visual attributes. We will study these problems from computational as well as algorithmic perspectives and suggest exciting directions for future work.

  • March 30, 2018

    Keisuke Sakaguchi

    Robustness has always been a desirable property for natural language processing. In many cases, NLP models (e.g., parsing) and downstream applications (e.g., MT) perform poorly when the input contains noise such as spelling errors, grammatical errors, and disfluency. In this talk, I will present three recent results on error correction models at the character, word, and sentence level, respectively. For the character level, I propose a semi-character recurrent neural network, which is motivated by a finding in psycholinguistics called the Cmabrigde Uinervtisy (Cambridge University) effect. For word-level robustness, I propose an error-repair dependency parsing algorithm for ungrammatical texts. The algorithm can parse sentences and correct grammatical errors simultaneously. Finally, I propose a neural encoder-decoder model with reinforcement learning for sentence-level error correction. To avoid exposure bias in standard encoder-decoders, the model directly optimizes towards a metric for grammatical error correction performance.
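
    The effect named in the abstract can be checked directly: a jumbled word such as "Cmabrigde" keeps the first letter, the last letter, and the multiset of internal letters of the original. The sketch below only illustrates that observation; it is not the semi-character network from the talk, and the function name `semi_character_key` is hypothetical.

```python
from collections import Counter


def semi_character_key(word):
    """Return (first char, multiset of internal chars, last char).

    Words that differ only in the order of their internal characters
    map to the same key. This is an illustrative helper, not a model.
    """
    if len(word) <= 2:
        return (word[:1], frozenset(), word[-1:])
    internal = Counter(word[1:-1])
    # frozenset of (char, count) pairs is an order-free, hashable multiset.
    return (word[0], frozenset(internal.items()), word[-1])


jumbled_pairs = [("Cmabrigde", "Cambridge"), ("Uinervtisy", "University")]
matches = [semi_character_key(a) == semi_character_key(b) for a, b in jumbled_pairs]
```

    Both pairs from the abstract produce matching keys, which is why a representation that ignores internal character order can be robust to this kind of noise.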

  • March 28, 2018

    Arun Chaganty

    A significant challenge in developing systems for tasks such as knowledge base population, text summarization or question answering is simply evaluating their performance: existing fully-automatic evaluation techniques rely on an incomplete set of “gold” annotations that cannot adequately cover the range of possible outputs of such systems and lead to systematic biases against many genuinely useful system improvements. In this talk, I’ll present our work on how we can eliminate this bias by incorporating on-demand human feedback without incurring the full cost of human evaluation. Our key technical innovation is the design of good statistical estimators that are able to trade off cost for variance reduction. We hope that our work will enable the development of better NLP systems by making unbiased natural language evaluation practical and easy to use.

  • March 26, 2018

    Chenyan Xiong

    Search engines and other information systems have started to evolve from retrieving documents to providing more intelligent information access. However, the evolution is still in its infancy due to computers’ limited ability in representing and understanding human language. This talk will present my work addressing these challenges with knowledge graphs. The first part is about utilizing entities from knowledge graphs to improve search. I will discuss how we build better text representations with entities and how the entity-based text representations improve text retrieval. The second part is about better text understanding through modeling entity salience (importance), as well as how the improved text understanding helps search under both feature-based and neural ranking settings. This talk concludes with future directions towards the next generation of intelligent information systems.

  • March 7, 2018

    Yonatan Belinkov

    Language technology has become pervasive in everyday life, powering applications like Apple’s Siri or Google’s Assistant. Neural networks are a key component in these systems thanks to their ability to model large amounts of data. Contrary to traditional systems, models based on deep neural networks (a.k.a. deep learning) can be trained in an end-to-end fashion on input-output pairs, such as a sentence in one language and its translation in another language, or a speech utterance and its transcription. The end-to-end training paradigm simplifies the engineering process while giving the model flexibility to optimize for the desired task. This, however, often comes at the expense of model interpretability: understanding the role of different parts of the deep neural network is difficult, and such models are often perceived as “black-box”. In this work, I study deep learning models for two core language technology tasks: machine translation and speech recognition. I advocate an approach that attempts to decode the information encoded in such models while they are being trained. I perform a range of experiments comparing different modules, layers, and representations in the end-to-end models. The analyses illuminate the inner workings of end-to-end machine translation and speech recognition systems, explain how they capture different language properties, and suggest potential directions for improving them. The methodology is also applicable to other tasks in the language domain and beyond.

  • March 2, 2018

    Peter Jansen

    Modern question answering systems are able to provide answers to a set of common natural language questions, but their ability to answer complex questions, or provide compelling explanations or justifications for why their answers are correct is still quite limited. These limitations are major barriers in high-impact domains like science and medicine, where the cost of making errors is high, and user trust is paramount. In this talk I'll discuss our recent work in developing systems that can build explanations to answer questions by aggregating information from multiple sources (sometimes called multi-hop inference). Aggregating information is challenging, particularly as the amount of information becomes large due to "semantic drift", or the tendency for inference algorithms to quickly move off-topic when assembling long chains of knowledge. Motivated by our earlier efforts in attempting to latently learn information aggregation for explanation generation (which is currently limited to short inference chains), I will discuss our current efforts to build a large corpus of detailed explanations expressed as lexically-connected explanation graphs to serve as training data for the multi-hop inference task. We will discuss characterizing what's in a science exam explanation, difficulties and methods for large-scale construction of detailed explanation graphs, and the possibility of automatically extracting common explanatory patterns from corpora such as this to support building large explanations (i.e. six or more aggregated facts) for unseen questions through merging, adapting, and adding to known explanatory patterns.

  • February 27, 2018

    Rob Speer and Catherine Havasi

    We are the developers of ConceptNet, a long-running knowledge representation project that originated from crowdsourcing. We demonstrate systems that we’ve made by adding the common knowledge in ConceptNet to current techniques in distributional semantics. This produces word embeddings that are state-of-the-art at semantic similarity in multiple languages, analogies that perform like a moderately-educated human on the SATs, the ability to find relevant distinctions between similar words, and the ability to propose new knowledge-graph edges and “sanity check” them against existing knowledge.

  • February 26, 2018

    Luheng He

    Semantic role labeling (SRL) systems aim to recover the predicate-argument structure of a sentence, to determine “who did what to whom”, “when”, and “where”. In this talk, I will describe my recent SRL work showing that relatively simple and general purpose neural architectures can lead to significant performance gains, including an error reduction of over 40% relative to long-standing pre-neural performance levels. These approaches are relatively simple because they process the text in an end-to-end manner, without relying on the typical NLP pipeline (e.g. POS-tagging or syntactic parsing). They are general purpose because, with only slight modifications, they can be used to learn state-of-the-art models for related semantics problems. The final architecture I will present, which we call Labeled Span Graph Networks (LSGNs), opens up exciting opportunities to build a single, unified model for end-to-end, document-level semantic analysis.

  • February 13, 2018

    Oren Etzioni

    Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence, gave the keynote address at the winter meeting of the Government-University-Industry Research Roundtable (GUIRR) on "Artificial Intelligence and Machine Learning to Accelerate Translational Research".

  • February 12, 2018

    Richard Zhang

    We explore the use of deep networks for image synthesis, both as a graphics goal and as an effective method for representation learning. We propose BicycleGAN, a general system for image-to-image translation problems, with the specific aim of capturing the multimodal nature of the output space. We study image colorization in greater detail and develop automatic and user-guided approaches. Moreover, colorization, as well as cross-channel prediction in general, is a simple but powerful pretext task for self-supervised feature learning. Not only does the network solve the direct graphics task, it also learns to capture patterns in the visual world, even without the benefit of human-curated labels. We demonstrate strong transfer to high-level semantic tasks, such as image classification, and to low-level human perceptual judgments. For the latter, we collect a large-scale dataset of human similarity judgments and find that our method outperforms traditional metrics such as PSNR and SSIM. We also discover that many unsupervised and self-supervised methods transfer strongly, even comparable to fully-supervised methods.

  • January 17, 2018

    Alexander Rush

    Early successes in deep generative models of images have demonstrated the potential of using latent representations to disentangle structural elements. These techniques have, so far, been less useful for learning representations of discrete objects such as sentences. In this talk I will discuss two works on learning different types of latent structure: Structured Attention Networks, a model for learning a soft-latent approximation of discrete structures such as segmentations, parse trees, and chained decisions; and Adversarially Regularized Autoencoders, a new GAN-based autoencoder for learning continuous representations of sentences with applications to textual style transfer. I will end by discussing an empirical analysis of some issues that make latent structure discovery of text difficult.

  • November 21, 2017

    Danqi Chen

    Enabling a computer to understand a document so that it can answer comprehension questions is a central, yet unsolved, goal of NLP. This task of reading comprehension (i.e., question answering over a passage of text) has received a resurgence of interest, due to the creation of large-scale datasets and well-designed neural network models.

  • November 20, 2017

    Jacob Walker

    Understanding the temporal dimension of images is a fundamental part of computer vision. Humans are able to interpret how the entities in an image will change over time. However, it has only been relatively recently that researchers have focused on visual forecasting—getting machines to anticipate events in the visual world before they actually happen. This aspect of vision has many practical implications in tasks ranging from human-computer interaction to anomaly detection. In addition, temporal prediction can serve as a task for representation learning, useful for various other recognition problems.

  • November 17, 2017

    Sun Kim

    PubMed is a biomedical literature search engine, hosting more than 27 million bibliographic records. With the abundance and diversity of information in PubMed, many queries retrieve thousands of documents, making it difficult for users to identify the information relevant to their topic of interest. Unlike more general domains, the language of biomedicine uses abundant technical jargon to describe scientific discoveries and applications. To understand the semantics of biomedical text, it is important to identify not only the meanings of individual words, but also of multi-word phrases appearing in text. Controlled vocabularies may help, but the rapid growth of PubMed makes it hard to keep up with the new information.

  • November 7, 2017

    Mohammad Rasooli

    Transfer methods have been shown to be effective alternatives for developing accurate natural language processing systems in the absence of annotated data in the target language of interest. They are divided into two approaches: 1) annotation projection from translation data using supervised models in resource-rich languages; and 2) direct transfer from resource-rich annotated datasets. In this talk, we review our past work on improving on both approaches by applying scalable machine learning methods. We empirically show how our approach is practical on different natural language processing tasks including dependency parsing, semantic role labeling and sentiment analysis of Twitter text. For our ongoing and future work, we propose to use a holistic approach to model cross-lingual recurrent representations for many languages and tasks.

  • November 6, 2017

    Gary Marcus

    All-purpose, all-powerful AI systems, capable of catering to our every intellectual need, have been promised for six decades, but have thus far not arrived. What will it take to bring AI to something like human-level intelligence? And why haven't we gotten there already? Scientist, author, and entrepreneur Gary Marcus (Founder and CEO of Geometric Intelligence, recently acquired by Uber) explains why deep learning is overrated, and what we need to do next to achieve genuine artificial intelligence.

  • October 30, 2017

    Arman Cohan

    The rapid growth of scientific literature has created a challenge for researchers to remain current with new developments. The existence of surveys summarizing the latest state of the field shows that such information is desirable, yet obtaining such summaries requires painstaking manual efforts. Scientific document summarization aims at addressing this problem by providing a compact representation of new findings and contributions of the published literature. First, I will present methods for improving text summarization of scientific literature by utilizing citations as an alternative to abstracts. In particular, I will talk about how we can address the problem of potential citation inaccuracy by providing context from the reference to the citations. Utilizing these contexts along with the scientific discourse structure, I will present an effective extractive summarization method for capturing various contributions of the target paper. In addition to the rapid growth of biomedical scientific literature, there is an increasing demand for using health-related text, including clinical notes, patient reports, and social media. I will discuss current challenges in health care, which include medical errors and mental health. As an attempt to address some of these challenges, I will show how we can make qualitative comparisons of errors in clinical care through medical narratives. Further, I will focus on mental health and discuss our proposed approaches to perform depression and self-harm risk assessment utilizing social media data.

  • October 16, 2017

    Chuang Gan

    The increasing ubiquity of devices capable of capturing videos has led to an explosion in the amount of recorded video content. Instead of “eyeballing” the videos for potentially useful information, there is therefore a pressing need to develop automatic video analysis and understanding algorithms for various applications. However, understanding videos on a large scale remains challenging: large variations and complexities, time-consuming annotations, and a wide range of involved video concepts. In light of these challenges, my research towards video understanding focuses on designing effective network architectures to learn robust video representations, learning video concepts from weak supervision and building a stronger connection between language and vision. In this talk, I will first introduce a Deep Event Network (DevNet) that can simultaneously detect pre-defined events and localize spatial-temporal key evidence. Then I will show how web crawled videos and images could be utilized for learning video concepts. Finally, I will present our recent efforts to connect visual understanding to language through attractive visual captioning and visual question segmentation.

  • October 4, 2017

    Oren Etzioni

    Does Artificial Intelligence (AI) research result in threats to society, or will it yield beneficial technology? The talk will address these issues by describing the projects and perspective at the Allen Institute for AI (AI2) in Seattle. AI2's mission is "AI for the Common Good," as exemplified by Semantic Scholar, a search engine that utilizes AI to overcome information overload in scientific search.

  • September 15, 2017

    Horacio Saggion

    In the current online Open Science context, scientific data-sets and tools for deep text analysis, visualization and exploitation play a major role. I will present a system developed over the past three years for “deep” analysis and annotation of scientific text collections. After a brief overview of the system and its main components, I will present our current work on the development of a bi-lingual (Spanish and English) fully annotated text resource in the field of natural language processing that we have created with our system. Moreover, a faceted-search and visualization system to explore the created resource will also be discussed.

  • August 16, 2017

    Leo Boytsov

    We explore alternatives to classic term-based retrieval. The ultimate objective is to develop a smarter candidate generation component for question answering (QA) and information retrieval (IR), which can employ similarities that are more expressive than the commonly used TF-IDF ranking function. Achieving this objective requires solving two subproblems: designing simple yet effective similarity functions and developing efficient solutions for k-NN search.
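
    For orientation, the term-based baseline the abstract refers to can be sketched as TF-IDF vectors ranked by exhaustive k-NN search under cosine similarity. This is a minimal illustration, not the method from the talk: the smoothed IDF formula is one common variant chosen here as an assumption, and the toy documents and function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in the corpus are dropped."""
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def build_tfidf(docs):
    """Compute IDF weights and one sparse vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant; other formulations exist).
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    return idf, [tfidf_vector(tokens, idf) for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_search(query, docs, k):
    """Brute-force k-NN: score every document, return indices of the top k."""
    idf, vectors = build_tfidf(docs)
    q = tfidf_vector(query.lower().split(), idf)
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return ranked[:k]


docs = [
    "neural machine translation",
    "semantic role labeling",
    "machine translation evaluation",
]
top = knn_search("machine translation", docs, k=2)
```

    Scoring every document is linear in the collection size, which is why the second subproblem in the abstract, efficient k-NN search, matters once the similarity function is more expensive than a term lookup.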

  • August 9, 2017

    Gabi Stanovsky

    Propositions are statements for which a truth value can be assigned (e.g., “Bob loves Mary”). Since they constitute the primary unit of information conveyed in texts, proposition extraction is often used in NLP algorithms such as question answering, summarization, or recognizing textual entailment. I will begin the talk with an overview of my research, which revolves around the different aspects of proposition extraction: from formalizing requirements and evaluation metrics, through annotation and crowdsourcing techniques, to modeling and automatic prediction. I will then describe two concrete research efforts which exemplify these aspects, while making use of the recent QA-SRL paradigm.

  • Moving Beyond the Turing Test with the Allen AI Science Challenge
    July 25, 2017

    Oren Etzioni

    This video discusses the paper: Moving Beyond the Turing Test with the Allen AI Science Challenge. The field of Artificial Intelligence has made great strides forward recently, for example AlphaGo's recent victory against the world champion Lee Sedol in the game of Go, leading to great optimism about the field. But are we really moving towards smarter machines, or are these successes restricted to certain classes of problems, leaving other challenges untouched? In 2016, the Allen Institute for Artificial Intelligence (AI2) ran the Allen AI Science Challenge, a competition to test machines on an ostensibly difficult task, namely answering 8th Grade science questions. Our motivations were to encourage the field to set its sights broader and higher by exploring a problem that appears to require modeling, reasoning, language understanding, and commonsense knowledge, to probe the state of the art on this task, and sow the seeds for possible future breakthroughs. The challenge received a strong response, with 780 teams from all over the world participating. What were the results? This article describes the competition and the interesting outcomes of the challenge.

  • June 22, 2017

    Arvind Neelakantan

    Knowledge representation and reasoning is one of the central challenges of artificial intelligence, and has important implications in many fields including natural language understanding and robotics. Representing knowledge with symbols, and reasoning via search and logic has been the dominant paradigm for many decades. In this work, we use deep neural networks to learn to both represent symbols and perform reasoning end-to-end from data. By learning powerful non-linear models, our approach generalizes to massive amounts of knowledge and works well with messy real-world data using minimal human effort. First, we show that recurrent neural networks with an attention mechanism achieve state-of-the-art reasoning on a large structured knowledge graph. Next, we develop Neural Programmer, a neural network augmented with discrete operations that can be learned to induce latent programs with backpropagation. We apply Neural Programmer to induce short programs on a natural language question answering dataset that requires reasoning on semi-structured Wikipedia tables. We present what is to our awareness the first weakly supervised, end-to-end neural network model to induce such programs on a real-world dataset. Unlike previous learning approaches to program induction, the model does not require domain-specific grammars, rules, or annotations.

  • June 13, 2017

    Oren Etzioni

    As computer automation is upon us and many jobs will change or be replaced by AIs, AI optimist Oren Etzioni, CEO, Allen Institute for AI, describes the social impacts we must consider as he paints a possible euphonic future state in which jobs will be more creative and fulfilling. About XPRIZE: XPRIZE is an educational 501(c)(3) nonprofit organization whose mission is to bring about radical breakthroughs for the benefit of humanity, thereby inspiring the formation of new industries and the revitalization of markets that are currently stuck due to existing failures or a commonly held belief that a solution is not possible. XPRIZE addresses the world's Grand Challenges by creating and managing large-scale, high-profile, incentivized prize competitions that stimulate investment in research and development worth far more than the prize itself. It motivates and inspires brilliant innovators from all disciplines to leverage their intellectual and financial capital.

  • May 22, 2017

    Abhinav Gupta

    In 2013, we proposed NEIL (Never Ending Image Learner), a computer program to learn visual models and commonsense knowledge from the web. In its first version, NEIL ran for 2.5 years learning 8K concepts, labeling 4.5M images and learning 20K common-sense facts. But it also helped us discover the shortcomings of the current paradigm of learning and reasoning with knowledge. In this talk, I am going to describe our subsequent efforts to overcome these drawbacks. On the learning side, I will talk about how we scale up learning visual models to rare and compositional categories (“wet possum”). Note that the web-search data for compositional categories are noisy and cannot be used “as is” for learning. The core problem in compositional categories is respecting contextuality. The meaning of a primitive category changes based on the concepts it is composed with (red in red wine is different from red in red car). I will talk about how we can respect contextuality while composing categories. On the reasoning side, I will talk about how we can incorporate the learned knowledge graphs in end-to-end learning. Specifically, we will show how these “noisy” knowledge graphs can not only improve classification performance but also provide “explainability” which is crucial for AI systems. I will also show some of our recent work on using knowledge graphs for zero-shot learning (again in an end-to-end manner).

  • May 19, 2017

    Scott Yih

    Building a question answering system to automatically answer natural-language questions is a long-standing research problem. While traditionally unstructured text collections are the main information source for answering questions, the development of large-scale knowledge bases provides new opportunities for open-domain factoid question answering. In this talk, I will present our recent work on semantic parsing, which maps natural language questions to structured queries that can be executed on a graph knowledge base to answer the questions. Our approach defines a query graph that resembles subgraphs of the knowledge base and can be directly mapped to a logical form. With this design, semantic parsing is reduced to query graph generation, formulated as a staged search problem. Compared to existing methods, our solution is conceptually simple and yet outperforms previous state-of-the-art results substantially.

  • May 9, 2017

    Luheng He

    Semantic role labeling (SRL) systems aim to recover the predicate-argument structure of a sentence, to determine essentially “who did what to whom”, “when”, and “where”. We introduce a new deep learning model for SRL that significantly improves the state of the art, along with detailed analyses to reveal its strengths and limitations. We use a deep highway BiLSTM architecture with constrained decoding, while observing a number of recent best practices for initialization and regularization. Our 8-layer ensemble model achieves 83.2 F1 on the CoNLL 2005 test set and 83.4 F1 on CoNLL 2012, roughly a 10% relative error reduction over the previous state of the art. Extensive empirical analysis of these gains shows that (1) deep models excel at recovering long-distance dependencies but can still make surprisingly obvious errors, and (2) there is still room for syntactic parsers to improve these results. These findings suggest directions for future improvements on SRL performance.

  • May 8, 2017

    Derry Wijaya

    One of the ways we can formulate natural language understanding is by treating it as a task of mapping natural language text to its meaning representation: entities and relations anchored to the world. Since verbs express relations over their arguments and adjuncts, a lexical resource about verbs can facilitate natural language understanding by mapping verbs to relations over entities expressed by their arguments and adjuncts in the world. In my thesis work, I semi-automatically construct a large scale verb resource called VerbKB that contains some of these mappings for natural language understanding. A verb lexical unit in VerbKB consists of a verb lexeme or a verb lexeme and a preposition e.g., “live”, “live in”, which is typed with a pair of NELL knowledge base semantic categories that indicates its subject type and its object type e.g., “live in”(person, location). In this talk, I will present the algorithms behind VerbKB that will complement existing resources of verbs such as WordNet and VerbNet and existing knowledge bases about entities such as NELL. VerbKB contains two types of mappings: (1) the mappings from verb lexical units to binary relations in knowledge bases (e.g., the mapping from the verb lexical unit “die at”(person, nonNegInteger) to the binary relation personDiedAtAge) and (2) the mappings from verb lexical units to changes in binary relations in knowledge bases (e.g., the mapping from the verb lexical unit “divorce”(person, person) to the termination of the relation hasSpouse). I will present algorithms for these two mappings and how we extend VerbKB to cover relations beyond existing relations in NELL knowledge base. In the spirit of building multilingual lexical resources for NLP, I will also briefly discuss my recent work in building lexical translations for high-resource and low-resource languages from monolingual or comparable corpora.
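
    The two mapping types described above can be pictured as a small lookup table keyed by typed verb lexical units. The sketch below is only one possible in-memory representation, using the two examples given in the abstract; the class names, the `Effect` enum, and the `lookup` helper are hypothetical and do not reflect how VerbKB is actually stored or built.

```python
from dataclasses import dataclass
from enum import Enum


class Effect(Enum):
    """How a verb lexical unit relates to a knowledge base relation."""
    EXPRESSES = "expresses"    # mapping type (1): the verb expresses the relation
    TERMINATES = "terminates"  # mapping type (2): the verb ends the relation


@dataclass(frozen=True)
class VerbLexicalUnit:
    """A verb (optionally with a preposition) typed by subject and object category."""
    verb: str
    subject_type: str
    object_type: str


@dataclass(frozen=True)
class RelationMapping:
    relation: str
    effect: Effect


# Only the two examples stated in the abstract; not real VerbKB data files.
VERB_MAPPINGS = {
    VerbLexicalUnit("die at", "person", "nonNegInteger"):
        RelationMapping("personDiedAtAge", Effect.EXPRESSES),
    VerbLexicalUnit("divorce", "person", "person"):
        RelationMapping("hasSpouse", Effect.TERMINATES),
}


def lookup(verb, subject_type, object_type):
    """Return the mapping for a typed verb lexical unit, or None if unknown."""
    return VERB_MAPPINGS.get(VerbLexicalUnit(verb, subject_type, object_type))


divorce_mapping = lookup("divorce", "person", "person")
```

    Keying on the full typed unit rather than the bare verb reflects the abstract's point that the same lexeme can map to different relations depending on its subject and object types.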

  • May 2, 2017

    Mark Yatskar

    In this talk, we examine the role of language in enabling grounded intelligence. We consider two applications where language can be used as a scaffold for (a) allowing for the quick acquisition of large-scale common sense knowledge, and (b) enabling broad-coverage recognition of events in images. We present some of the technical challenges with using language-based representations for grounding, such as sparsity, and finally present some social challenges, such as amplified gender bias in models trained on language grounding datasets.

  • April 19, 2017

    Mohit Iyyer

    Creative language (the sort found in novels, film, and comics) contains a wide range of linguistic phenomena, from phrasal and sentential syntactic complexity to high-level discourse structures such as narrative and character arcs. In this talk, I explore how we can use deep learning to understand, generate, and answer questions about creative language. I begin by presenting deep neural network models for two tasks involving creative language understanding: 1) modeling dynamic relationships between fictional characters in novels, for which our models achieve higher interpretability and accuracy than existing work; and 2) predicting dialogue and artwork from comic book panels, in which we demonstrate that even state-of-the-art deep models struggle on problems that require commonsense reasoning. Next, I introduce deep models that outperform all but the best human players on quiz bowl, a trivia game that contains many questions about creative language. Shifting to ongoing work, I describe a neural language generation method that disentangles the content of a novel (i.e., the information or story it conveys) from the style in which it is written. Finally, I conclude by integrating my work on deep learning, creative language, and question answering into a future research plan to build conversational agents that are both engaging and useful.

  • April 18, 2017

    Marti Hearst

    AI2 researchers are making groundbreaking advances in machine interpretation of scientific and educational text and images. In our current research, we are interested in improving educational technology, especially automated and semi-automated guidance systems. In past work, we have been successful in leveraging existing metadata and ontologies to produce highly usable search interfaces, and so in one very new line of work, we are investigating if we can automatically create good practice questions from a preexisting biology ontology. In the first half of this talk, I will describe this very new work, as well as some as yet unexplored goals for future work in this space. AI2 researchers are also producing the world’s best citation search system. In the second half of this talk I will describe some prior NLP and HCI work on analyzing bioscience citation text which might be of interest to the Semantic Scholar team as well as the NLP teams.

  • February 20, 2017

    He He

    The future of virtual assistants, self-driving cars, and smart homes requires intelligent agents that work intimately with users. Instead of passively following orders given by users, an interactive agent must actively collaborate with people through communication, coordination, and user-adaptation. In this talk, I will present our recent work towards building agents that interact with humans. First, we propose a symmetric collaborative dialogue setting in which two agents, each with some private knowledge, must communicate in natural language to achieve a common goal. We present a human-human dialogue dataset that poses new challenges to existing models, and propose a neural model with dynamic knowledge graph embedding. Second, we study the user-adaptation problem in quizbowl, a competitive, incremental question-answering game. We show that explicitly modeling different human behaviors leads to more effective policies that exploit sub-optimal players. I will conclude by discussing opportunities and open questions in learning interactive agents.

  • February 16, 2017

    Christopher Lin

    Research in artificial intelligence and machine learning (ML) has exploded in the last decade, bringing humanity to the cusp of self-driving cars, digital personal assistants, and unbeatable game-playing robots. My research, which spans the areas of AI, ML, Crowdsourcing, and Natural Language Processing (NLP), focuses on an area where machines are still significantly inferior to humans, despite their super-human intelligence in so many other facets of life: the intelligent management of machine learning (iML), or the ability to reason about what they don’t know so that they may independently and efficiently close gaps in knowledge. iML encompasses many important questions surrounding the ML pipeline, including, but not limited to: 1) How can an agent optimally obtain high-quality labels? 2) How can an agent that is trying to learn a new concept sift through all the unlabeled examples that exist in the world to identify exemplary subsets that would make good training and test sets? An agent must be able to identify examples that are positive for that concept. Learning is extremely expensive, if not impossible, if one cannot find representative examples. 3) Given a fixed budget, should an agent try to obtain a large but noisy training set, or a small but clean one? How can an agent achieve more cost-effective learning by carefully considering this tradeoff? In this talk, I will go into depth on the third question. I will first discuss properties of learning problems that affect this tradeoff. Then I will introduce re-active learning, a generalization of active learning that allows for the relabeling of existing examples, and show why traditional active learning algorithms don't work well for re-active learning. Finally, I will introduce new algorithms for re-active learning and show that they perform well on several domains.

  • February 13, 2017

    Wenpeng Yin

    Wenpeng's talk mainly covers his work developing state-of-the-art deep neural networks to learn representations for language units of different granularity, including single words, phrases, sentences, documents and knowledge graphs (KG). Specifically, he addresses these questions: (a) With so many pre-trained word embeddings available, is there an upper bound? What is the cheapest way to get higher-quality word embeddings: more training data, or a more advanced algorithm or objective function? (b) How can we learn representations for phrases that appear continuously as well as discontinuously? How can we derive representations for phrases of arbitrary length? (c) How can we learn sentence representations in supervised, unsupervised, or context-constrained settings? (d) Given a question, how can we distill the document so that its representation is specific to the question? (e) In knowledge graphs such as Freebase, how can we model paths of arbitrary length to solve knowledge graph reasoning problems? These research problems are evaluated on word/phrase similarity, paraphrase identification, question answering, and KG reasoning tasks, among others.

  • January 25, 2017

    Hal Daume

    Machine learning-based natural language processing systems are amazingly effective when plentiful labeled training data exists for the task/domain of interest. Unfortunately, for broad-coverage (both in task and domain) language understanding, we're unlikely to ever have sufficient labeled data, and systems must find some other way to learn. I'll describe a novel algorithm for learning from interactions, and several problems of interest, most notably machine simultaneous interpretation (translation while someone is still speaking). This is all joint work with some amazing (former) students He He, Alvin Grissom II, John Morgan, Mohit Iyyer, Sudha Rao and Leonardo Claudino, as well as colleagues Jordan Boyd-Graber, Kai-Wei Chang, John Langford, Akshay Krishnamurthy, Alekh Agarwal, Stéphane Ross, Alina Beygelzimer and Paul Mineiro.

  • January 18, 2017

    Zhou Yu

    Communication is an intricate dance, an ensemble of coordinated individual actions. Imagine a future where machines interact with us like humans, waking us up in the morning, navigating us to work, or discussing our daily schedules in a coordinated and natural manner. Current interactive systems being developed by Apple, Google, Microsoft, and Amazon attempt to reach this goal by combining a large set of single-task systems. But products like Siri, Google Now, Cortana and Echo still follow pre-specified agendas that cannot transition between tasks smoothly and track and adapt to different users naturally. My research draws on recent developments in speech and natural language processing, human-computer interaction, and machine learning to work towards the goal of developing situated intelligent interactive systems. These systems can coordinate with users to achieve effective and natural interactions. I have successfully applied the proposed concepts to various tasks, such as social conversation, job interview training and movie promotion. My team's proposal on engaging social conversation systems was selected to receive $100,000 from Amazon Inc. to compete in the Amazon Alexa Prize Challenge.

  • November 19, 2016

    Oren Etzioni

    Artificial Intelligence advocate Oren Etzioni makes a case for the life-saving benefits of AI used wisely to improve our way of life. Acknowledging growing fears about AI’s potential for abuse of power, he asks us to consider how to responsibly balance our desire for greater intelligence and autonomy with the risks inherent in this new and growing technology.

  • November 8, 2016

    Manohar Paluri

    Over the past 5 years the community has made significant strides in the field of Computer Vision. Thanks to large-scale datasets, specialized computing in the form of GPUs, and many breakthroughs in modeling better convnet architectures, Computer Vision systems in the wild at scale are becoming a reality. At Facebook AI Research we want to embark on the journey of making breakthroughs in the field of AI and using them for the benefit of connecting people and helping remove barriers to communication. In that regard, Computer Vision plays a significant role, as the media content coming to Facebook is ever increasing and building models that understand this content is crucial in achieving our mission of connecting everyone. In this talk I will give an overview of how we think about problems related to Computer Vision at Facebook and touch on various aspects related to supervised, semi-supervised, and unsupervised learning. I will jump between various research efforts involving representation learning. I will also highlight some large-scale applications and talk about limitations of current systems and how we are planning to tackle them.

  • October 18, 2016

    Kun Xu

    As very large structured knowledge bases have become available, answering natural language questions over structured knowledge facts has attracted increasing research efforts. We tackle this task in a pipeline paradigm, that is, recognizing users’ query intention and mapping the involved semantic items against a given knowledge base (KB). We propose an efficient pipeline framework to model a user’s query intention as a phrase-level dependency DAG, which is then instantiated against a specific KB to construct the final structured query. Our model benefits from the efficiency of structured prediction models and the separation of KB-independent and KB-related modeling. The most challenging problem in the structure instantiation is to ground the relational phrases to KB predicates, which essentially can be treated as a relation classification (RE) task. To learn a robust and generalized representation of the relation, we propose a multi-channel convolutional neural network which works on the shortest dependency path. Furthermore, we introduce a negative sampling strategy to learn the assignment of subjects and objects of a relation.

  • October 18, 2016

    Jacob Andreas

    Language understanding depends on two abilities: an ability to translate between natural language utterances and abstract representations of meaning, and an ability to relate these meaning representations to the world. In the natural language processing literature, these tasks are respectively known as "semantic parsing" and "grounding", and have been treated as essentially independent problems. In this talk, I will present two modular neural architectures for jointly learning to ground language in the world and reason about it compositionally. I will first describe a technique that uses syntactic information to dynamically construct neural networks from composable primitives. The resulting structures, called "neural module networks", can be used to achieve state-of-the-art results on a variety of grounded question answering tasks. Next, I will present a model for contextual referring expression generation, in which contrastive behavior results from a combination of learned semantics and inference-driven pragmatics. This model is again backed by modular neural components, in this case elementary listener and speaker representations. It is able to successfully complete a challenging referring expression generation task, exhibiting pragmatic behavior without ever observing such behavior at training time.

  • September 29, 2016

    Karthik Narasimhan

    In this talk, I will describe two approaches to learning natural language semantics using reward-based feedback. This is in contrast to many NLP approaches that rely on large amounts of supervision, which is often expensive and difficult to obtain. First, I will describe a framework utilizing reinforcement learning to improve information extraction (IE). Our approach identifies alternative sources of information by querying the web, extracting from new sources, and reconciling the extracted values until sufficient evidence is collected. Our experiments on two datasets -- shooting incidents and food adulteration cases -- demonstrate that our system significantly outperforms traditional extractors and a competitive meta-classifier baseline. Second, I will talk about learning control policies for text-based games where an agent needs to understand natural language to operate effectively in a virtual environment. We employ a deep reinforcement learning framework to jointly learn state representations and action policies using game rewards as feedback, capturing semantics of the game states in the process.

  • September 26, 2016

    Shobeir Fakhraei

    Our world is becoming increasingly connected, and so is the data collected from it. To represent, reason about, and model real-world data, it is essential to develop computational models capable of representing the underlying network structures and their characteristics. Domains such as scholarly networks, biology, online social networks, the World Wide Web and information networks, and recommender systems are just a few examples that include explicit or implicit network structures. I have studied and developed computational models for representing and reasoning about rich, heterogeneous, and interlinked data, ranging from feature-based and embedding-based approaches to statistical relational methods that more explicitly model dependencies between interconnected entities. In this talk, I will discuss different methods of modeling node classification and link inference on networks in several domains, and highlight two important aspects: (1) heterogeneous entities and multi-relational structures, and (2) joint inference and collective classification of the unlabeled data. I will also introduce our model for link inference, which serves as a template to encode a variety of information such as structural, biological, social, and contextual interactions in different domains.

  • September 19, 2016

    Anna Rohrbach

    In recent years many challenging problems have emerged in the field of language and vision. Frequently the only form of available annotation is the natural language sentence associated with an image or video. How can we address complex tasks like automatic video description or visual grounding of textual phrases with these weak and noisy annotations? In my talk I will first present our pioneering work on automatic movie description. We collected a large-scale dataset and proposed an approach to learn visual semantic concepts from weak sentence annotations. I will then talk about our approach to grounding arbitrary language phrases in images. It is able to operate in un- and semi-supervised settings (with respect to the localization annotations) by learning to reconstruct the input phrase.

  • September 13, 2016

    Ajay Nagesh

    Information Extraction has become an indispensable tool in our quest to handle the data deluge of the information age. In this talk, we discuss the categorization of complex relational features and outline methods to learn feature combinations through induction. We demonstrate the efficacy of induction techniques in learning rules for the identification of named entities in text – the novelty being the application of induction techniques to learn in a very expressive declarative rule language. Next, we discuss our investigations in the paradigm of distant supervision, which facilitates the creation of large albeit noisy training data. We devise an inference framework in which constraints can be easily specified in learning relation extractors. We reformulate the learning objective in a max-margin framework. To the best of our knowledge, our formulation is the first to optimize multi-variate non-linear performance measures such as F1 for a latent variable structure prediction task. Towards the end, we will briefly touch upon some recent exploratory work to leverage matrix completion methods and novel embedding techniques for predicting a richer fine-grained set of entity types to help in downstream applications such as Relation Extraction and Question Answering.

  • September 7, 2016

    Siva Reddy

    I will present three semantic parsing approaches for querying Freebase in natural language: 1) training only on a raw web corpus, 2) training on question-answer (QA) pairs, and 3) training on both QA pairs and a web corpus. For 1 and 2, we conceptualise semantic parsing as a graph matching problem, where natural language graphs built using CCG/dependency logical forms are transduced to Freebase graphs. For 3, I will present a natural-logic approach for Semantic Parsing. Our methods achieve state-of-the-art results on the WebQuestions and Free917 QA datasets.

  • August 23, 2016

    Matthew Peters

    Distributed representations of words, phrases and sentences are central to recent advances in machine translation, language modeling, semantic similarity, and other tasks. In this talk, I'll explore ways to learn similar representations of search queries, web pages and web sites. The first portion of the talk describes a method to learn a keyword-web page similarity function applicable to web search. It represents the web page as a set of attributes (URL, title, meta description tag, etc.) and uses a separate LSTM encoder for each attribute. The network is trained end-to-end from clickthrough logs. The second half of the talk introduces a measure of authority for each web page and jointly learns keyword-keyword, keyword-site and keyword-site-authority relationships. The multitask network leverages a shared representation for keywords and sites and learns a fine-grained topic authority (for example, a site can be an authority on the topic "Bernie Sanders" but not on "Seattle Mariners").

  • August 22, 2016

    Jay Pujara

    Automated question answering, knowledgeable digital assistants, and grappling with the massive data flooding the Web all depend on structured knowledge. Precise knowledge graphs capturing the many, complex relationships between entities are the missing piece for many problems, but knowledge graph construction is notoriously difficult. In this talk, I will chronicle common failures from the first generation of information extraction systems and show how combining statistical NLP signals and semantic constraints addresses these problems. My method, Knowledge Graph Identification (KGI), exploits the key lessons of the statistical relational learning community and uses them for better knowledge graph construction. Probabilistic models are often discounted due to scalability concerns, but KGI translates the problem into a tractable convex objective that is amenable to parallelization. Furthermore, the inferences from KGI have provable optimality and can be updated efficiently using approximate techniques that have bounded regret. I demonstrate state-of-the-art performance of my approach on knowledge graph construction and entity resolution tasks on NELL and Freebase, and discuss exciting new directions for KG construction.

  • August 1, 2016

    Dan Garrette

    Learning NLP models from weak forms of supervision has become increasingly important as the field moves toward applications in new languages and domains. Much of the existing work in this area has focused on designing learning approaches that are able to make use of small amounts of human-generated data. In this talk, I will present work on a complementary form of inductive bias: universal, cross-lingual principles of how grammars function. I will develop these intuitions with a series of increasingly complex models based on the Combinatory Categorial Grammar (CCG) formalism: first, a supertagging model that biases towards associative adjacent-category relationships; second, a parsing model that biases toward simpler grammatical analyses; and finally, a novel parsing model, with accompanying learning procedure, that is able to exploit both of these biases by parameterizing the relationships between each constituent label and its supertag context to find trees with a better global coherence. We model grammar with CCG because the structured, logic-backed nature of CCG categories and the use of a small universal set of constituent combination rules are ideally suited to encoding as priors, and we train our models within a Bayesian setting that combines these prior beliefs about how natural languages function with the empirical statistics gleaned from large amounts of raw text. Experiments with each of these models show that when training from only partial type-level supervision and a corpus of unannotated text, employing these universal properties as soft constraints yields empirically better models. Additional gains are obtained by further shaping the priors with corpus-specific information that is estimated automatically from the tag dictionary and raw text.

  • July 18, 2016

    Claudio Delli Bovi

    The Open Information Extraction (OIE) paradigm has received much attention in the NLP community over the last decade. Since the earliest days, most OIE approaches have been focusing on Web-scale corpora, which raises issues such as massive amounts of noise. Also, OIE systems can be very different in nature and develop their own type inventories, with no portable ontological structure. This talk steps back and explores both issues by presenting two substantially different approaches to the task: in the first we shift the target of a full-fledged OIE pipeline to a relatively small, dense corpus of definitional knowledge; in the second we try to make sense of different OIE outputs by merging them into a single, unified and fully disambiguated knowledge repository.

  • July 15, 2016

    Yuxin Chen

    Sequential information gathering, i.e., selectively acquiring the most useful data, plays a key role in interactive machine learning systems. This problem has been studied in the context of Bayesian active learning and experimental design, decision making, optimal control and numerous other domains. In this talk, we focus on a class of information gathering tasks, where the goal is to learn the value of some unknown target variable through a sequence of informative, possibly noisy tests. In contrast to prior work, we focus on the challenging, yet practically relevant setting where test outcomes can be conditionally dependent given the hidden target variable. Under such assumptions, common heuristics, such as greedily performing tests that maximize the reduction in uncertainty of the target, often perform poorly. We propose a class of novel, computationally efficient active learning algorithms, and prove strong theoretical guarantees that hold with correlated, possibly noisy tests. Rather than myopically optimize the value of a test (which, in our case, is the expected reduction in prediction error), at each step, our algorithms pick the test that maximizes the gain in a surrogate objective, which is adaptive submodular. This property enables us to utilize an efficient greedy optimization while providing strong approximation guarantees. We demonstrate our algorithms in several real-world problem instances, including a touch-based location task on an actual robotic platform, and an active preference learning task via pairwise comparisons.

  • June 21, 2016

    Katrin Erk

    As the field of Natural Language Processing develops, more ambitious semantic tasks are being addressed, such as Question Answering (QA) and Recognizing Textual Entailment (RTE). Solving these tasks requires (ideally) an in-depth representation of sentence structure as well as expressive and flexible representations at the word level. We have been exploring a combination of logical form with distributional as well as resource-based information at the word level, using Markov Logic Networks (MLNs) to perform probabilistic inference over the resulting representations. In this talk, I will focus on the three main components of a system we have developed for the task of Textual Entailment: (1) Logical representation for processing in MLNs, (2) lexical entailment rule construction by integrating distributional information with existing resources, and (3) probabilistic inference, the problem of solving the resulting MLN inference problems efficiently. I will also comment on how I think the ideas from this system can be adapted to Question Answering and the more general task of in-depth single-document understanding.

  • June 20, 2016

    Marcus Rohrbach

    Language is the most important channel for humans to communicate about what they see. To allow an intelligent system to effectively communicate with humans it is thus important to enable it to relate information in words and sentences with the visual world. For this, a system should be compositional, so that, for example, it is not surprised when it encounters a novel object and can still talk about it. It should be able to explain in natural language why it recognized a given object in an image as a certain class, to allow a human to trust and understand it. However, it should not only be able to generate natural language, but also understand it, and locate sentences and linguistic references in the visual world. In my talk, I will discuss how we approach these different fronts by looking at the tasks of language generation about images, visual grounding, and visual question answering. I will conclude with a discussion of the challenges ahead.

  • June 13, 2016

    Megasthenis Asteris

    Principal component analysis (PCA) is one of the most popular tools for identifying structure and extracting interpretable information from datasets. In this talk, I will discuss constrained variants of PCA such as Sparse or Nonnegative PCA that are computationally harder, but offer higher data interpretability. I will describe a framework for solving quadratic optimization problems (such as PCA) under sparsity or other combinatorial constraints. Our method can surprisingly solve such problems exactly when the involved quadratic form matrix is positive semidefinite and low rank. Of course, real datasets are not low-rank, but they can frequently be well approximated by low-rank matrices. For several datasets, we obtain excellent empirical performance and provable upper bounds that guarantee that our objective is close to the unknown optimum.

  • June 13, 2016

    Niket Tandon

    There is a growing conviction that the future of computing will crucially depend on our ability to exploit Big Data on the Web to produce significantly more intelligent and knowledgeable systems. This includes encyclopedic knowledge (for factual knowledge) and commonsense knowledge (for more advanced human-like reasoning). The focus of this talk is automatic acquisition of commonsense knowledge using the Web. We require computers to understand the environment (e.g., the properties of the objects in the environment), the relations between these objects (e.g., a handle is part of a bike, or a bike is slower than a car), and the semantics of their interaction (e.g., a man and a woman meet for dinner in a restaurant in the evening). This talk presents techniques for gathering such commonsense knowledge from textual and visual data from the Web.

  • May 23, 2016

    Oren Etzioni

    Oren Etzioni, CEO of the Allen Institute for AI, shares his vision for deploying AI technologies for the common good.

  • May 17, 2016

    Yi Yang

    With the resurgence of neural networks, low-dimensional dense features have been used in a wide range of natural language processing problems. Specifically, tasks like part-of-speech tagging, dependency parsing and entity linking have been shown to benefit from dense feature representations from both efficiency and effectiveness aspects. In this talk, I will present algorithms for unsupervised domain adaptation, where we train low-dimensional feature embeddings with instances from both source and target domains. I will also talk about how to extend the approach to unsupervised multi-domain adaptation by leveraging metadata domain attributes. I will then introduce a tree-based structured learning model for entity linking, where the model employs a few statistical dense features to jointly detect mentions and disambiguate entities. Finally, I will discuss some promising directions for future research.

  • May 9, 2016

    Aditya Khosla

    When glancing at a magazine or browsing the Internet, we are continuously exposed to photographs and images. While some images stick in our minds, others are ignored or quickly forgotten. Artists, advertisers and educators are routinely challenged by the question "what makes a picture memorable?" and must then create an image that speaks to the observer. In this talk, I will show how deep learning algorithms can predict with near-human consistency which images people will remember or forget, and how we can modify images automatically to make them more or less memorable.

  • May 3, 2016

    Saurabh Gupta

    In this talk, I will discuss detailed scene understanding from RGB-D images. We approach this problem by studying central computer vision problems like bottom-up grouping, object detection, instance segmentation, and pose estimation in the context of RGB-D images, and finally aligning CAD models to objects in the scene. This results in a detailed output which goes beyond what most current computer vision algorithms produce, and is useful for real-world applications like perceptual robotics and augmented reality. A central question in this work is how to learn good features for depth images, given that labeled RGB-D datasets are much smaller than the labeled RGB datasets (such as ImageNet) typically used for feature learning. To this end, I will describe our technique, called "cross-modal distillation", which allows us to leverage easily available annotations on RGB images to learn representations on depth images. I will also briefly talk about some work on vision and language that I did during an internship at Microsoft Research.

  • April 26, 2016

    Galen Andrew (University of Washington)

    The successes of deep learning in the past decade on difficult tasks ranging from image processing to speech recognition to game playing are strong evidence for the utility of abstract representations of complex natural sensory data. In this talk I will present the deep canonical correlation analysis (DCCA) model to learn deep representation mappings of each of two data views (e.g., from two different sensory modalities) such that the learned representations are maximally predictive of each other in the sense of correlation. Comparisons with linear CCA and kernel CCA demonstrate that DCCA is capable of finding far more highly correlated nonlinear representations than standard methods. Experiments also demonstrate the utility of the representation mappings learned by DCCA in the scenario where one of the data views is unavailable at test time.

  • April 12, 2016

    Percy Liang

    Can we learn if we start with zero examples, either labeled or unlabeled? This scenario arises in new user-facing systems (such as virtual assistants for new domains), where inputs should come from users, but no users exist until we have a working system, which depends on having training data. I will discuss recent work that circumvents this circular dependence by interleaving user interaction and learning.

  • April 6, 2016

    Ronan Le Bras

    Most problems, from theoretical problems in combinatorics to real-world applications, comprise hidden structural properties not directly captured by the problem definition. A key to the recent progress in automated reasoning and combinatorial optimization has been to automatically uncover and exploit this hidden problem structure, resulting in a dramatic increase in the scale and complexity of the problems within our reach. The most complex tasks, however, still require human abilities and ingenuity. In this talk, I will show how we can leverage human insights to effectively complement and dramatically boost state-of-the-art optimization techniques. I will demonstrate the effectiveness of the approach with a series of scientific discoveries, from experimental designs to materials discovery.

  • April 4, 2016

    Jeffrey Heer

    How might we architect interactive systems that have better models of the tasks we're trying to perform, learn over time, help refine ambiguous user intents, and scale to large or repetitive workloads? In this talk I will present Predictive Interaction, a framework for interactive systems that shifts some of the burden of specification from users to algorithms, while preserving human guidance and expressive power. The central idea is to imbue software with domain-specific models of user tasks, which in turn power predictive methods to suggest a variety of possible actions. I will illustrate these concepts with examples drawn from widely-deployed systems for data transformation and visualization (with reported order-of-magnitude productivity gains) and then discuss associated design considerations and future research directions.

  • March 25, 2016

    Ashish Vaswani

    Locally normalized approaches for structured prediction, such as left-to-right parsing and sequence labeling, are attractive because of their simplicity, ease of training, and the flexibility in choosing features from observations. Combined with the power of neural networks, they have been widely adopted for NLP tasks. However, locally normalized models suffer from label bias, where search errors arise during prediction because scores of hypotheses are computed from local decisions. While conditional random fields avoid label bias by scoring hypotheses globally, they do so at the cost of training time and limited freedom in specifying features. In this talk, I will present two approaches for overcoming label bias in structured prediction with locally normalized models. In the first approach, I will introduce a framework for learning to identify erroneous hypotheses and discard them at prediction time. Applying this framework to transition-based dependency parsing improves parsing accuracy significantly. In the second approach, I will show that scheduled sampling (Bengio et al.) and a variant can be robust to prediction errors, leading to state-of-the-art accuracies on CCG supertagging with LSTMs and in-domain CCG parsing.

  • March 9, 2016

    Manaal Faruqui

    Unsupervised learning of word representations has proven to provide exceptionally effective features in many NLP tasks. Traditionally, construction of word representations relies on the distributional hypothesis, which posits that the meaning of words is evidenced by the contextual words they occur with (Harris, 1954). Although distributional context is fairly good at capturing word meaning, in this talk I'll show that going beyond the distributional hypothesis, by exploiting additional sources of word meaning information, improves the quality of word representations. First, I'll show how semantic lexicons, like WordNet, can be used to obtain better word vector representations. Second, I'll describe a novel graph-based learning framework that uses morphological information to construct large scale morpho-syntactic lexicons. I'll conclude with additional approaches that can be taken to improve word representations.

  • March 3, 2016

    Ali Farhadi

    Ali Farhadi discusses the history of computer vision and AI.

  • March 2, 2016

    Ashish Sabharwal

    Artificial intelligence and machine learning communities have made tremendous strides in the last decade. Yet, the best systems to date still struggle with routine tests of human intelligence, such as standardized science exams posed as-is in natural language, even at the elementary-school level. Can we demonstrate human-like intelligence by building systems that can pass such tests? Unlike typical factoid-style question answering (QA) tasks, these tests challenge a student’s ability to combine multiple facts in various ways, and appeal to broad common-sense and science knowledge. Going beyond arguably shallow information retrieval (IR) and statistical correlation techniques, we view science QA through the lens of combinatorial optimization over a semi-formal knowledge base derived from text. Our structured inference system, formulated as an Integer Linear Program (ILP), turns out to be not only highly complementary to IR methods, but also more robust to question perturbation, as well as substantially more scalable and accurate than prior attempts using probabilistic first-order logic and Markov Logic Networks (MLNs). This talk will discuss fundamental challenges behind the science QA task, the progress we have made, and many challenges that lie ahead.

  • Strategies and Principles for Distributed Machine Learning
    February 16, 2016

    Eric Xing

    The rise of Big Data has led to new demands for Machine Learning (ML) systems to learn complex models with millions to billions of parameters that promise adequate capacity to digest massive datasets and offer powerful predictive analytics (such as high-dimensional latent features, intermediate representations, and decision functions). Running ML algorithms at such scales, on a distributed cluster with tens to thousands of machines, often requires significant engineering effort, and one might fairly ask whether such engineering truly falls within the domain of ML research. Taking the view that Big ML systems can indeed benefit greatly from ML-rooted statistical and algorithmic insights, and that ML researchers should therefore not shy away from such systems design, we discuss a series of principles and strategies distilled from our recent effort on industrial-scale ML solutions. These involve a continuum from application, to engineering, to theoretical research and development of Big ML systems and architectures, and address how to make them efficient, general, and equipped with convergence and scaling guarantees.

  • Intelligible Machine Learning Models for HealthCare
    February 9, 2016

    Rich Caruana


  • Probabilistic Models for Learning a Semantic Parser Lexicon
    January 27, 2016

    Jayant Krishnamurthy

    Lexicon learning is the first step of training a semantic parser for a new application domain, and the quality of the learned lexicon significantly affects both the accuracy and efficiency of the final semantic parser. Existing work on lexicon learning has focused on heuristic methods that lack convergence guarantees and require significant human input in the form of lexicon templates or annotated logical forms. In contrast, the proposed probabilistic models are trained directly from question/answer pairs using EM and the simplest model has a concave objective function that guarantees that EM converges to a global optimum. An experimental evaluation on a data set of 4th grade science questions demonstrates that these models improve semantic parser accuracy (35-70% error reduction) and efficiency (4-25x more sentences per second) relative to prior work, despite using less human input. The models also obtain competitive results on Geoquery without any dataset-specific engineering.

  • Machine Teaching
    January 12, 2016

    Patrice Simard

    For many ML problems, labeled data is readily available. The algorithm is the bottleneck. This is the ML researcher’s paradise! Problems that have fairly stable distributions and can accumulate large quantities of human labels over time have this property: Vision, Speech, Autonomous driving. Problems that have shifting distribution and an infinite supply of labels through history are blessed in the same way: click prediction, data analytics, forecasting. We call these problems the “head” of ML.

    We are interested in another large class of ML problems where data is sparse. For contrast, we call it the “tail” of ML. For example, consider a dialog system for a specific app to recognize specific commands such as: “lights on first floor off”, “patio on”, “enlarge paragraph spacing”, “make appointment with doctor when back from vacation”. Anyone who has attempted building such a system has soon discovered that there are far more ways to issue a command than they originally thought. Domain knowledge, data selection, and custom features are essential to get good generalization performance with small amounts of data. With the right tools, an ML expert can build such a classifier or annotator in a matter of hours. Unfortunately, the current cost of an ML expert (if one is available) is often more than the value produced by a single domain specific model. Getting good results on the tail is not cheap or easy.

    To address this problem, we change our focus from the learner to the teacher. We define Machine Teaching as improving the “teacher” productivity given the “learner”. The teacher is human. The learner is an ML algorithm. Ideally, our approach is “learner agnostic”. Focusing on improving the teacher does not preclude using the best ML algorithm or the best deep representation features and transfer learning. We view Machine Teaching and Machine Learning as orthogonal and complementary approaches. The Machine Teaching metrics are ML metrics divided by human costs, and Machine Teaching focuses on reducing the denominator. This perspective has led to many interesting insights and significant gains in ML productivity.

  • Adding Structure to Unstructured and Semi-structured Data
    December 10, 2015

    Chandra Bhagavatula

    In this talk, I will describe two systems designed to extract structured knowledge from unstructured and semi-structured data. First, I'll present an entity linking system for Web tables. Next, I'll talk about a key phrase extraction system that extracts a set of key concepts from a research article. Towards the end of the talk, I will briefly introduce an underlying common problem which connects these two seemingly distinct tasks. I will also present an approach, based on topic modeling, to solve this common underlying problem.

  • Provable Guarantees for Non-convex and Convex Optimization in High Dimensions
    November 3, 2015

    Hanie Sedghi

    Learning with big data is akin to finding a needle in a haystack: useful information is hidden in high dimensional data. Optimization methods, both convex and nonconvex, require new thinking when dealing with high dimensional data, and I present two novel solutions.

  • Large Topic Models: Efficient Inference and Applications
    September 14, 2015

    Doug Downey

    In this talk, I will introduce efficient methods for inferring large topic hierarchies. The approach is built upon the Sparse Backoff Tree (SBT), a new prior for latent topic distributions that organizes the latent topics as leaves in a tree. I will show how a document model based on SBTs can effectively infer accurate topic spaces of over a million topics. Experiments demonstrate that scaling to large topic spaces results in much more accurate models, and that SBT document models make use of large topic spaces more effectively than flat LDA. Lastly, I will describe how the models power Atlasify, a prototype exploratory search engine.

  • Contextual LSTMs: A Step towards Hierarchical Language Modeling
    September 10, 2015

    Shalini Ghosh

    Documents exhibit sequential structure at multiple levels of abstraction (e.g., sentences, paragraphs, sections). These abstractions constitute a natural hierarchy for representing the context in which to infer the meaning of words and larger fragments of text. In this talk, we present CLSTM (Contextual LSTM), an extension of the recurrent neural network LSTM (Long Short-Term Memory) model, where we incorporate hierarchical contextual features (e.g., topics) into the model. The CLSTM models were implemented in the Google DistBelief framework.

  • Unsupervised Alignment of Natural Language with Video
    August 18, 2015

    Iftekhar Naim

    Today we encounter enormous amounts of video data, often accompanied by text descriptions (e.g., cooking videos and recipes, movies and shooting scripts). Extracting meaningful information from these multimodal sequences requires aligning the video frames with the corresponding text sentences. We address the problem of automatically aligning natural language sentences with corresponding video segments without direct human supervision. We first propose two generative models that are closely related to the HMM and IBM Model 1 word alignment models used in statistical machine translation. Next, we propose a latent-variable discriminative alignment model, which outperforms the generative models by incorporating rich features. Our alignment algorithms are applied to align biological wetlab videos with text instructions and movie scenes with shooting scripts.

  • Feature Generation from Knowledge Graphs
    July 30, 2015

    Matt Gardner

    A lot of attention has recently been given to the creation of large knowledge bases that contain millions of facts about people, things, and places in the world. In this talk I present methods for using these knowledge bases to generate features for machine learning models. These methods view the knowledge base as a graph which can be traversed to find potentially predictive information. I show how these methods can be applied to models of knowledge base completion, relation extraction, and question answering.

  • Consciousness in Biological and Artificial Brains
    July 10, 2015

    Christof Koch

    Human and non-human animals not only act in the world but are capable of conscious experience. That is, it feels like something to have a brain and be cold, angry or see red. I will discuss the scientific progress that has been achieved over the past decades in characterizing the behavioral and the neuronal correlates of consciousness, both based on clinical case studies as well as laboratory experiments. I will introduce the Integrated Information Theory (IIT) that explains in a principled manner which physical systems are capable of conscious, subjective experience. The theory explains many biological and medical facts about consciousness and its pathologies in humans, can be extrapolated to more difficult cases, such as fetuses, mice, or non-mammalian brains and has been used to assess the presence of consciousness in individual patients in the clinic. IIT also explains why consciousness evolved by natural selection. The theory predicts that feed-forward networks, such as deep convolutional networks, are not conscious even if they perform tasks that in humans would be associated with conscious experience. Furthermore, and in sharp contrast to widespread functionalist beliefs, IIT implies that digital computers, even if they were to run software faithfully simulating the human brain, would experience next to nothing. That is, while in the biological realm, intelligence and consciousness are intimately related, contemporary developments in AI dissolve that link, giving rise to intelligence without consciousness.

  • Machine Learning with Humans In-the-Loop
    April 21, 2015

    Karthik Raman

    In this talk I discuss the challenges of learning from data that results from human behavior. I will present new machine learning models and algorithms that explicitly account for the human decision making process and factors underlying it such as human expertise, skills and needs. The talk will also explore how we can look to optimize human interactions to build robust learning systems with provable performance guarantees. I will also present examples, from the domains of search, recommendation and educational analytics, where we have successfully deployed systems for cost-effectively learning with humans in the loop.

  • Going Beyond Fact-Based Question Answering
    April 7, 2015

    Erik T. Mueller

    To solve the AI problem, we need to develop systems that go beyond answering fact-based questions. Watson has been hugely successful at answering fact-based questions, but to solve hard AI tasks like passing science tests and understanding narratives, we need to go beyond simple facts. In this talk, I discuss how the systems I have most recently worked on have approached this problem. Watson for Healthcare answers Doctor's Dilemma medical competition questions, and WatsonPaths answers medical test preparation questions. These systems have achieved some success, but there is still a lot more to be done. Based on my experiences working on these systems, I discuss what I think the priorities should be going forward.

  • Bring Your Own Model: Model-Agnostic Improvements in NLP
    April 7, 2015

    Dani Yogatama

    The majority of NLP research focuses on improving NLP systems by designing better model classes (e.g., non-linear models, latent variable models). In this talk, I will describe a complementary approach based on incorporation of linguistic bias and optimization of text representations that is applicable to several model classes. First, I will present a structured regularizer that is suitable for problems where only some parts of an input are relevant to the prediction task (e.g., sentences in text, entities in scenes of images) and an efficient algorithm based on the alternating direction method of multipliers to solve the resulting optimization problem. I will then show how such a regularizer can be used to incorporate linguistic structures into a text classification model. In the second part of the talk, I will present our first step towards building a black box NLP system that automatically chooses the best text representation for a given dataset by treating it as a global optimization problem. I will also briefly describe an improved algorithm that can generalize across multiple datasets for faster optimization. I will conclude by discussing how such a framework can be applied to other NLP problems.

  • Learning from Large, Structured Examples
    March 31, 2015

    Ben London

    In many real-world applications of AI and machine learning, such as natural language processing, computer vision and knowledge base construction, data sources possess a natural internal structure, which can be exploited to improve predictive accuracy. Sometimes the structure can be very large, containing many interdependent inputs and outputs. Learning from data with large internal structure poses many compelling challenges, one of which is that fully-labeled examples (required for supervised learning) are difficult to acquire. This is especially true in applications like image segmentation, annotating video data, and knowledge base construction.

  • Distantly Supervised Information Extraction Using Bootstrapped Patterns
    March 27, 2015

    Sonal Gupta

    Although most work in information extraction (IE) focuses on tasks that have abundant training data, in practice, many IE problems do not have any supervised training data. State-of-the-art supervised techniques like conditional random fields are impractical for such real-world applications because: (1) they require large and expensive labeled corpora; (2) it is difficult to interpret them and analyze errors, an often-ignored but important feature; and (3) they are hard to calibrate, for example, to reliably produce only high-precision extractions.

  • Exploiting Parallel News Streams for Relation Extraction
    March 17, 2015

    Congle Zhang

    Most approaches to relation extraction, the task of extracting ground facts from natural language text, are based on machine learning and thus starved by scarce training data. Manual annotation is too expensive to scale to a comprehensive set of relations. Distant supervision, which automatically creates training data, only works with relations that already populate a knowledge base (KB). Unfortunately, KBs such as Freebase rarely cover event relations (e.g. “person travels to location”). Thus, the problem of extracting a wide range of events (e.g., from news streams) is an important, open challenge.

  • Language and Perceptual Categorization in Computer Vision
    March 12, 2015

    Vicente Ordonez

    Recently, there has been great progress in both computer vision and natural language processing in representing and recognizing semantic units like objects, attributes, named entities, or constituents. These advances provide opportunities to create systems able to interpret and describe the visual world using natural language. This is in contrast to traditional computer vision systems, which typically output a set of disconnected labels, object locations, or annotations for every pixel in an image. The rich visually descriptive language produced by people incorporates world knowledge and human intuition that often cannot be captured by other types of annotations. In this talk, I will present several approaches that explore the connections between language, perception, and vision at three levels: learning how to name objects, generating referring expressions for objects in natural scenes, and producing general image descriptions. These methods provide a framework to augment computer vision systems with linguistic information and to take advantage of the vast amount of text associated with images on the web. I will also discuss some of the intuitions from linguistics and perception behind these efforts and how they potentially connect to the larger goal of creating visual systems that can better learn from and communicate with people.

  • Learning and Sampling Scalable Graph Models
    March 11, 2015

    Joel Pfeiffer

    Networks provide an effective representation to model many real-world domains, with edges (e.g., friendships, citations, hyperlinks) representing relationships between items (e.g., individuals, papers, webpages). By understanding common network features, we can develop models of the distribution from which the network was likely sampled. These models can be incorporated into real world tasks, such as modeling partially observed networks for improving relational machine learning, performing hypothesis tests for anomaly detection, or simulating algorithms on large scale (or future) datasets. However, naively sampling networks does not scale to real-world domains; for example, drawing a single random network sample consisting of a billion users would take approximately a decade with modern hardware.

  • Spectral Probabilistic Modeling and Applications to Natural Language Processing
    March 3, 2015

    Ankur Parikh

    Being able to effectively model latent structure in data is a key challenge in modern AI research, particularly in Natural Language Processing (NLP) where it is crucial to discover and leverage syntactic and semantic relationships that may not be explicitly annotated in the training set. Unfortunately, while incorporating latent variables to represent hidden structure can substantially increase representation power, the key problems of model design and learning become significantly more complicated. For example, unlike fully observed models, latent variable models can suffer from non-identifiability, making it difficult to distinguish the desired latent structure from the others. Moreover, learning is usually formulated as a non-convex optimization problem, leading to the use of local search heuristics that may become trapped in local optima.

  • Multimodal Science Learning
    February 26, 2015

    Ken Forbus

    Creating systems that can work with people, using natural modalities, as apprentices is a key step towards human-level AI. This talk will describe how my group is combining research on sketch understanding, natural language understanding, and analogical learning within the Companion cognitive architecture to create systems that can reason and learn about science by working with people. Some promising results will be described (e.g. solving conceptual physics problems involving sketches, modeling conceptual change, learning by reading) as well as work in progress (e.g. interactive knowledge capture via analogy).

  • Semi-Supervised Learning In Realistic Settings
    February 5, 2015

    Bhavana Dalvi

    Semi-supervised learning (SSL) has been widely used for over a decade for various tasks, including knowledge acquisition, that lack large amounts of training data. My research proposes a novel learning scenario in which the system knows a few categories in advance, but the rest of the categories are unanticipated and need to be discovered from the unlabeled data. With the availability of enormous unlabeled datasets at low cost, and the difficulty of collecting labeled data for all possible categories, it becomes even more important to adapt traditional semi-supervised learning techniques to such realistic settings.

  • Bayesian Case Model — Generative Approach for Case-based Reasoning and Prototype
    January 7, 2015

    Been Kim

    I will present the Bayesian Case Model (BCM), a general framework for Bayesian case-based reasoning (CBR) and prototype classification and clustering. BCM brings the intuitive power of CBR to a Bayesian generative framework. The BCM learns prototypes, the "quintessential" observations that best represent clusters in a data set, by performing joint inference on cluster labels, prototypes and important features. Simultaneously, BCM pursues sparsity by learning subspaces, the sets of features that play important roles in the characterization of the prototypes. The prototype and subspace representation provides quantitative benefits in interpretability while preserving classification accuracy. Human subject experiments verify statistically significant improvements to participants' understanding when using explanations produced by BCM, compared to those given by prior art.

  • Event Discovery, Content Models, and Relevance
    December 4, 2014

    Aria Haghighi

    I discuss three problems in applied natural language processing and machine learning: event discovery from distributed discourse, document content models for information extraction, and relevance engineering for a large-scale personalization engine. The first two are information extraction problems over social media which attempt to utilize richer structure and context for decision making; these sections reflect work from the tail end of my purely academic work. The relevance section will discuss work done while at my former startup Prismatic and will focus on issues arising from productionizing real-time machine learning. Along the way, I'll share my thoughts and experience around productizing research and interesting future directions.

  • Toward Scene Understanding
    December 3, 2014

    Roozbeh Mottaghi

    Scene understanding is one of the holy grails of computer vision, and despite decades of research, it is still considered an unsolved problem. In this talk, I will present a number of methods that help us take a step further towards the ultimate goal of holistic scene understanding. In particular, I will talk about our work on object detection, 3D pose estimation, and contextual reasoning, and show that modeling these tasks jointly enables better understanding of scenes. At the end of the talk, I will describe our recent work on providing richer descriptions for objects in terms of their viewpoint and sub-category information.

  • Open and Exploratory Extraction of Relations (and Common Sense) from Large Text Corpora
    November 10, 2014

    Alan Akbik

    The use of deep syntactic information such as typed dependencies has been shown to be very effective in Information Extraction (IE). Despite this potential, the process of manually creating rule-based information extractors that operate on dependency trees is not intuitive for persons without an extensive NLP background. In this talk, I present an approach and a graphical tool that allows even novice users to quickly and easily define extraction patterns over dependency trees and directly execute them on a very large text corpus. This enables users to explore a corpus for structured information of interest in a highly interactive and data-guided fashion, and allows them to create extractors for those semantic relations they find interesting. I then present a project in which we use Information Extraction to automatically construct a very large common sense knowledge base. This knowledge base - dubbed "The Weltmodell" - contains common sense facts that pertain to proper noun concepts; an example of this is the concept "coffee", for which we know that it is typically drunk by a person or brought by a waiter. I show how we mine such information from very large amounts of text, how we quantify notions such as typicality and similarity, and discuss some ideas for how such world knowledge can be used to address reasoning tasks.

  • Deep Natural Language Semantics by Combining Logical and Distributional Methods using Probabilistic Logic
    November 4, 2014

    Raymond Mooney

    Traditional logical approaches to semantics and newer distributional or vector space approaches have complementary strengths and weaknesses. We have developed methods that integrate logical and distributional models by using a CCG-based parser to produce a detailed logical form for each sentence, and combining the result with soft inference rules derived from distributional semantics that connect the meanings of their component words and phrases. For recognizing textual entailment (RTE) we use Markov Logic Networks (MLNs) to combine these representations, and for Semantic Textual Similarity (STS) we use Probabilistic Soft Logic (PSL). We present experimental results on standard benchmark datasets for these problems and emphasize the advantages of combining the logical structure of sentences with statistical knowledge mined from large corpora.

  • The Battle for the Future of Data Mining
    October 7, 2014

    Oren Etzioni

    Deep learning has catapulted to the front page of the New York Times, formed the core of the so-called 'Google brain', and achieved impressive results in vision, speech recognition, and elsewhere. Yet researchers have offered simple conundrums that deep learning doesn't address. For example, consider the sentence: 'The large ball crashed right through the table because it was made of Styrofoam.' What was made of Styrofoam? The large ball? Or the table? The answer is obviously 'the table', but if we change the word 'Styrofoam' to 'steel', the answer is clearly 'the large ball'. To automatically answer this type of question, our computers require an extensive body of knowledge. We believe that text mining can provide the requisite body of knowledge. My talk will describe work at the new Allen Institute for AI towards building the next generation of text-mining systems.

  • Large-Scale Paraphrasing for Natural Language Generation
    October 1, 2014

    Chris Callison-Burch

    I will present my method for learning paraphrases - pairs of English expressions with equivalent meaning - from bilingual parallel corpora, which are more commonly used to train statistical machine translation systems. My method equates pairs of English phrases like (thrown into jail, imprisoned) when they share an aligned foreign phrase like festgenommen. Because bitexts are large and because a phrase can be aligned to many different foreign phrases, including phrases in multiple foreign languages, the method extracts a diverse set of paraphrases. For thrown into jail, we not only learn imprisoned, but also arrested, detained, incarcerated, jailed, locked up, taken into custody, and thrown into prison, along with a set of incorrect/noisy paraphrases. I'll show a number of methods for filtering out the poor paraphrases, by defining a paraphrase probability calculated from translation model probabilities, and by re-ranking the candidate paraphrases using monolingual distributional similarity measures.

  • Modeling Biological Processes for Reading Comprehension
    August 5, 2014

    Jonathan Berant

    Machine reading calls for programs that read and understand text, but most current work only attempts to extract facts from redundant web-scale corpora. In this talk, I will focus on a new reading comprehension task that requires complex reasoning over a single document. The input is a paragraph describing a biological process, and the goal is to answer questions that require an understanding of the relations between entities and events in the process. To answer the questions, we first predict a rich structure representing the process in the paragraph. Then, we map the question to a formal query, which is executed against the predicted structure. We demonstrate that answering questions via predicted structures substantially improves accuracy over baselines that use shallower representations.

  • Extracting Knowledge from Text with Tractable Markov Logic and Symmetry-Based Semantic Parsing
    July 25, 2014

    Pedro Domingos

    Building very large commonsense knowledge bases and reasoning with them is a long-standing dream of AI. Today that knowledge is available in text; all we have to do is extract it. Text, however, is extremely messy, noisy, ambiguous, incomplete, and variable. A formal representation of it needs to be both probabilistic and relational, either of which leads to intractable inference and therefore poor scalability. In the first part of this talk I will describe tractable Markov logic, a language that is restricted enough to be tractable yet expressive enough to represent much of the commonsense knowledge contained in text. Even then, transforming text into a formal representation of its meaning remains a difficult problem. There is no agreement on what the representation primitives should be, and labeled data in the form of sentence-meaning pairs for training a semantic parser is very hard to come by. In the second part of the talk I will propose a solution to both these problems, based on concepts from symmetry group theory. A symmetry of a sentence is a syntactic transformation that does not change its meaning. Learning a semantic parser for a language is discovering its symmetry group, and the meaning of a sentence is its orbit under the group (i.e., the set of all sentences it can be mapped to by composing symmetries). Preliminary experiments indicate that tractable Markov logic and symmetry-based semantic parsing can be powerful tools for scalably extracting knowledge from text.

  • Paul Allen Discusses AI2 and the Future of AI (Discussion of AI2 begins at 17:30)
    June 4, 2014

    Paul Allen

    Paul Allen discusses his vision for the future of AI and AI2 in this fireside chat moderated by Gary Marcus of New York University at the 10th Anniversary Symposium - Allen Institute for Brain Science. AI2-related discussion begins at 17:30.

  • Crowdsourcing Insights into Problem Structure for Scientific Discovery
    May 13, 2014

    Bart Selman

    In recent years, there has been tremendous progress in solving large-scale reasoning and optimization problems. Central to this progress has been the ability to automatically uncover hidden problem structure. Nevertheless, for the very hardest computational tasks, human ingenuity still appears indispensable. We show that automated reasoning strategies and human insights can effectively complement each other, leading to hybrid human-computer solution strategies that outperform other methods by orders of magnitude. We illustrate our approach with challenges in scientific discovery in the areas of finite mathematics and materials science.

  • Learning and Inference for Natural Language Understanding
    March 31, 2014

    Dan Roth

    Machine Learning and Inference methods have become ubiquitous and have had a broad impact on a range of scientific advances and technologies and on our ability to make sense of large amounts of data. Research in Natural Language Processing has both benefited from and contributed to advancements in these methods and provides an excellent example of some of the challenges we face moving forward. I will describe some of our research in developing learning and inference methods in pursuit of natural language understanding. In particular, I will address what I view as some of the key challenges, including (i) learning models from natural interactions, without direct supervision, (ii) knowledge acquisition and the development of inference models capable of incorporating knowledge and reason, and (iii) scalability and adaptation: learning to accelerate inference during the lifetime of a learning system.

  • The Aha! Moment: From Data to Insight
    February 26, 2014

    Dafna Shahaf

    The amount of data in the world is increasing at incredible rates. Large-scale data has the potential to transform almost every aspect of our world, from science to business; for this potential to be realized, we must turn data into insight. In this talk, I will describe two of my efforts to address this problem computationally. The first project, Metro Maps of Information, aims to help people understand the underlying structure of complex topics, such as news stories or research areas. Metro Maps are structured summaries that can help us understand the information landscape, connect the dots between pieces of information, and uncover the big picture. The second project proposes a framework for automatic discovery of insightful connections in data. In particular, we focus on identifying gaps in medical knowledge: our system recommends directions of research that are both novel and promising.

  • Statistical Text Analysis for Social Science: Learning to Extract International Relations from the News
    February 26, 2014

    Brendan O'Connor

    What can text analysis tell us about society? Corpora of news, books, and social media encode human beliefs and culture. But it is impossible for a researcher to read all of today's rapidly growing text archives. My research develops statistical text analysis methods that measure social phenomena from textual content, especially in news and social media data. For example: How do changes to public opinion appear in microblogs? What topics get censored on the Chinese Internet? What character archetypes recur in movie plots? How do geography and ethnicity affect the diffusion of new language?

  • Smart Machines, and What They Can Still Learn From People
    January 23, 2014

    Gary Marcus

    For nearly half a century, artificial intelligence always seemed as if it were just beyond reach, rarely more, and rarely less, than two decades away. Between Watson, Deep Blue, and Siri, there can be little doubt that progress in AI has been immense, yet "strong AI" in some ways still seems elusive. In this talk, I will give a cognitive scientist's perspective on AI. What have we learned, and what are we still struggling with? Is there anything that programmers of AI can still learn from studying the science of human cognition?

  • AI: A Return to Meaning
    November 5, 2013

    David Ferrucci

    Artificial Intelligence started with small data and rich semantic theories. The goal was to build systems that could reason over logical models of how the world worked; systems that could answer questions and provide intuitive, cognitively accessible explanations for their results. There was a tremendous focus on domain theory construction, formal deductive logics and efficient theorem proving. We had expert systems, rule-bases, forward chaining, backward chaining, modal logics, naïve physics, lisp, prolog, macro theories, micro theories, etc. The problem, of course, was the knowledge acquisition bottleneck; it was too difficult, slow and costly to render all common sense knowledge into an integrated, formal representation that automated reasoning engines could digest. In the meantime, huge volumes of unstructured data became available, compute power became ever cheaper and statistical methods flourished. AI evolved from being predominantly theory-driven to predominantly data-driven. Automated systems generated output using inductive techniques. Training over massive data produced flexible and capable control systems and powerful predictive engines in domains ranging from language translation to pattern recognition, from medicine to economics. Coming from a background in formal knowledge representation and automated reasoning, I could see the writing on the wall: big data and statistical machine learning were changing the face of AI, and quickly. From the very inception of Watson, I put a stake in the ground; we will not even attempt to build rich semantic models of the domain. I imagined it would take 3 years just to come to consensus on the common ontology to cover such a broad domain. Rather, we will use a diversity of shallow text analytics and leverage loose and fuzzy interpretations of unstructured information. We would allow many researchers to build largely independent NLP components and rely on machine learning techniques to balance and combine these loosely federated algorithms to evaluate answers in the context of passages. The approach, with a heck of a lot of good engineering, worked. Watson was arguably the best factoid question-answering system in the world, and Watson Paths could connect questions to answers over multiple steps, offering passage-based "inference chains" from question to answer without a single "if-then rule". But could it explain why an answer is right or wrong? Could it reason over a logical understanding of the domain? Could it automatically learn from language and build the logical or cognitive structures that enable and precede language itself? Could it understand and learn the way we do? No. No. No. No. This talk draws an arc from Theory-Driven AI to Data-Driven AI and positions Watson along that trajectory. It proposes that to advance AI to where we all know it must go, we need to discover how to efficiently combine human cognition, massive data and logical theory formation. We need to bootstrap a fluent collaboration between human and machine that engages logic, language and learning to enable machines to learn how to learn and ultimately deliver on the promise of AI.