Viewing 34 videos from 2016. See AI2’s full collection of videos on our YouTube channel.
    • November 19, 2016

      Oren Etzioni

      Artificial Intelligence advocate Oren Etzioni makes a case for the life-saving benefits of AI used wisely to improve our way of life. Acknowledging growing fears about AI’s potential for abuse of power, he asks us to consider how to responsibly balance our desire for greater intelligence and autonomy with the risks inherent in this new and growing technology.

    • November 8, 2016

      Manohar Pulari

      Over the past five years the community has made significant strides in computer vision. Thanks to large-scale datasets, specialized computing in the form of GPUs, and many breakthroughs in designing better convnet architectures, computer vision systems that work in the wild and at scale are becoming a reality. At Facebook AI Research we want to make breakthroughs in AI and use them to connect people and remove barriers to communication. Computer vision plays a significant role in that mission: the media content coming to Facebook is ever increasing, and building models that understand this content is crucial to our mission of connecting everyone. In this talk I will give an overview of how we think about computer vision problems at Facebook and touch on aspects of supervised, semi-supervised, and unsupervised learning. I will cover several research efforts involving representation learning, highlight some large-scale applications, and discuss the limitations of current systems and how we plan to tackle them.

    • October 18, 2016

      Kun Xu

      As very large structured knowledge bases have become available, answering natural language questions over structured knowledge facts has attracted increasing research efforts. We tackle this task in a pipeline paradigm: first recognizing users’ query intention, then mapping the involved semantic items onto a given knowledge base (KB). We propose an efficient pipeline framework that models a user’s query intention as a phrase-level dependency DAG, which is then instantiated against a specific KB to construct the final structured query. Our model benefits from the efficiency of structured prediction models and from the separation of KB-independent and KB-related modeling. The most challenging problem in structure instantiation is grounding the relational phrases to KB predicates, which can essentially be treated as a relation classification task. To learn a robust and generalized representation of the relation, we propose a multi-channel convolutional neural network that works on the shortest dependency path. Furthermore, we introduce a negative sampling strategy to learn the assignment of subjects and objects of a relation.

    • October 18, 2016

      Jacob Andreas

      Language understanding depends on two abilities: an ability to translate between natural language utterances and abstract representations of meaning, and an ability to relate these meaning representations to the world. In the natural language processing literature, these tasks are respectively known as "semantic parsing" and "grounding", and have been treated as essentially independent problems. In this talk, I will present two modular neural architectures for jointly learning to ground language in the world and reason about it compositionally. I will first describe a technique that uses syntactic information to dynamically construct neural networks from composable primitives. The resulting structures, called "neural module networks", can be used to achieve state-of-the-art results on a variety of grounded question answering tasks. Next, I will present a model for contextual referring expression generation, in which contrastive behavior results from a combination of learned semantics and inference-driven pragmatics. This model is again backed by modular neural components, in this case elementary listener and speaker representations. It is able to successfully complete a challenging referring expression generation task, exhibiting pragmatic behavior without ever observing such behavior at training time.

    • September 29, 2016

      Karthik Narasimhan

      In this talk, I will describe two approaches to learning natural language semantics using reward-based feedback. This is in contrast to many NLP approaches that rely on large amounts of supervision, which is often expensive and difficult to obtain. First, I will describe a framework utilizing reinforcement learning to improve information extraction (IE). Our approach identifies alternative sources of information by querying the web, extracting from new sources, and reconciling the extracted values until sufficient evidence is collected. Our experiments on two datasets -- shooting incidents and food adulteration cases -- demonstrate that our system significantly outperforms traditional extractors and a competitive meta-classifier baseline. Second, I will talk about learning control policies for text-based games where an agent needs to understand natural language to operate effectively in a virtual environment. We employ a deep reinforcement learning framework to jointly learn state representations and action policies using game rewards as feedback, capturing semantics of the game states in the process.

    • September 26, 2016

      Shobeir Fakhraei

      Our world is becoming increasingly connected, and so is the data collected from it. To represent, reason about, and model real-world data, it is essential to develop computational models capable of representing the underlying network structures and their characteristics. Domains such as scholarly networks, biology, online social networks, the World Wide Web and information networks, and recommender systems are just a few examples that include explicit or implicit network structures. I have studied and developed computational models for representing and reasoning about rich, heterogeneous, and interlinked data, spanning feature-based and embedding-based approaches as well as statistical relational methods that more explicitly model dependencies between interconnected entities. In this talk, I will discuss different methods of modeling node classification and link inference on networks in several domains, and highlight two important aspects: (1) heterogeneous entities and multi-relational structures, and (2) joint inference and collective classification of unlabeled data. I will also introduce our model for link inference, which serves as a template to encode a variety of information, such as structural, biological, social, and contextual interactions, in different domains.

    • September 19, 2016

      Anna Rohrbach

      In recent years many challenging problems have emerged in the field of language and vision. Frequently the only form of available annotation is the natural language sentence associated with an image or video. How can we address complex tasks like automatic video description or visual grounding of textual phrases with these weak and noisy annotations? In my talk I will first present our pioneering work on automatic movie description. We collected a large scale dataset and proposed an approach to learn visual semantic concepts from weak sentence annotations. I will then talk about our approach to grounding arbitrary language phrases in images. It is able to operate in un- and semi-supervised settings (with respect to the localization annotations) by learning to reconstruct the input phrase.

    • September 13, 2016

      Ajay Nagesh

      Information Extraction has become an indispensable tool in our quest to handle the data deluge of the information age. In this talk, we discuss the categorization of complex relational features and outline methods to learn feature combinations through induction. We demonstrate the efficacy of induction techniques in learning rules for the identification of named entities in text – the novelty being the application of induction techniques to learn in a very expressive declarative rule language. Next, we discuss our investigations in the paradigm of distant supervision, which facilitates the creation of large albeit noisy training data. We devise an inference framework in which constraints can be easily specified in learning relation extractors. We reformulate the learning objective in a max-margin framework. To the best of our knowledge, our formulation is the first to optimize multi-variate non-linear performance measures such as F1 for a latent variable structure prediction task. Towards the end, we will briefly touch upon some recent exploratory work to leverage matrix completion methods and novel embedding techniques for predicting a richer fine-grained set of entity types to help in downstream applications such as Relation Extraction and Question Answering.

    • September 7, 2016

      Siva Reddy

      I will present three semantic parsing approaches for querying Freebase in natural language: 1) training only on a raw web corpus, 2) training on question-answer (QA) pairs, and 3) training on both QA pairs and a web corpus. For 1 and 2, we conceptualise semantic parsing as a graph matching problem, where natural language graphs built using CCG/dependency logical forms are transduced to Freebase graphs. For 3, I will present a natural-logic approach to semantic parsing. Our methods achieve state-of-the-art results on the WebQuestions and Free917 QA datasets.

    • August 23, 2016

      Matthew Peters

      Distributed representations of words, phrases and sentences are central to recent advances in machine translation, language modeling, semantic similarity, and other tasks. In this talk, I'll explore ways to learn similar representations of search queries, web pages and web sites. The first portion of the talk describes a method to learn a keyword-web page similarity function applicable to web search. It represents the web page as a set of attributes (URL, title, meta description tag, etc) and uses a separate LSTM encoder for each attribute. The network is trained end-to-end from clickthrough logs. The second half of the talk introduces a measure of authority for each web page and jointly learns keyword-keyword, keyword-site and keyword-site-authority relationships. The multitask network leverages a shared representation for keywords and sites and learns a fine-grained topic authority (for example politico.com is an authority on the topic "Bernie Sanders" but not on "Seattle Mariners").

    • August 22, 2016

      Jay Pujara

      Automated question answering, knowledgeable digital assistants, and grappling with the massive data flooding the Web all depend on structured knowledge. Precise knowledge graphs capturing the many, complex relationships between entities are the missing piece for many problems, but knowledge graph construction is notoriously difficult. In this talk, I will chronicle common failures from the first generation of information extraction systems and show how combining statistical NLP signals and semantic constraints addresses these problems. My method, Knowledge Graph Identification (KGI), exploits the key lessons of the statistical relational learning community and uses them for better knowledge graph construction. Probabilistic models are often discounted due to scalability concerns, but KGI translates the problem into a tractable convex objective that is amenable to parallelization. Furthermore, the inferences from KGI have provable optimality and can be updated efficiently using approximate techniques that have bounded regret. I demonstrate state-of-the-art performance of my approach on knowledge graph construction and entity resolution tasks on NELL and Freebase, and discuss exciting new directions for KG construction.

    • August 1, 2016

      Dan Garrette

      Learning NLP models from weak forms of supervision has become increasingly important as the field moves toward applications in new languages and domains. Much of the existing work in this area has focused on designing learning approaches that are able to make use of small amounts of human-generated data. In this talk, I will present work on a complementary form of inductive bias: universal, cross-lingual principles of how grammars function. I will develop these intuitions with a series of increasingly complex models based in the Combinatory Categorial Grammar (CCG) formalism: first, a supertagging model that biases towards associative adjacent-category relationships; second, a parsing model that biases toward simpler grammatical analyses; and finally, a novel parsing model, with accompanying learning procedure, that is able to exploit both of these biases by parameterizing the relationships between each constituent label and its supertag context to find trees with a better global coherence. We model grammar with CCG because the structured, logic-backed nature of CCG categories and the use of a small universal set of constituent combination rules are ideally suited to encoding as priors, and we train our models within a Bayesian setting that combines these prior beliefs about how natural languages function with the empirical statistics gleaned from large amounts of raw text. Experiments with each of these models show that when training from only partial type-level supervision and a corpus of unannotated text, employing these universal properties as soft constraints yields empirically better models. Additional gains are obtained by further shaping the priors with corpus-specific information that is estimated automatically from the tag dictionary and raw text.

    • July 18, 2016

      Claudio Delli Bovi

      The Open Information Extraction (OIE) paradigm has received much attention in the NLP community over the last decade. Since the earliest days, most OIE approaches have been focusing on Web-scale corpora, which raises issues such as massive amounts of noise. Also, OIE systems can be very different in nature and develop their own type inventories, with no portable ontological structure. This talk steps back and explores both issues by presenting two substantially different approaches to the task: in the first we shift the target of a full-fledged OIE pipeline to a relatively small, dense corpus of definitional knowledge; in the second we try to make sense of different OIE outputs by merging them into a single, unified and fully disambiguated knowledge repository.

    • July 15, 2016

      Yuxin Chen

      Sequential information gathering, i.e., selectively acquiring the most useful data, plays a key role in interactive machine learning systems. This problem has been studied in the context of Bayesian active learning and experimental design, decision making, optimal control and numerous other domains. In this talk, we focus on a class of information gathering tasks, where the goal is to learn the value of some unknown target variable through a sequence of informative, possibly noisy tests. In contrast to prior work, we focus on the challenging, yet practically relevant setting where test outcomes can be conditionally dependent given the hidden target variable. Under such assumptions, common heuristics, such as greedily performing tests that maximize the reduction in uncertainty of the target, often perform poorly. We propose a class of novel, computationally efficient active learning algorithms, and prove strong theoretical guarantees that hold with correlated, possibly noisy tests. Rather than myopically optimizing the value of a test (which, in our case, is the expected reduction in prediction error), at each step our algorithms pick the test that maximizes the gain in a surrogate objective, which is adaptive submodular. This property enables us to utilize an efficient greedy optimization while providing strong approximation guarantees. We demonstrate our algorithms in several real-world problem instances, including a touch-based localization task on an actual robotic platform, and an active preference learning task via pairwise comparisons.
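      The myopic baseline mentioned above (greedily performing the test that maximizes the expected reduction in uncertainty about the target) can be sketched in a few lines. This is an illustrative toy version in a discrete Bayesian setting, not the speaker's implementation; all names and the entropy-based objective are assumptions for illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_info_gain(prior, likelihood):
    """Expected reduction in entropy of the target from one test.

    prior:      shape (n_targets,)          -- p(y)
    likelihood: shape (n_outcomes, n_targets) -- p(x | y) for this test
    """
    joint = likelihood * prior          # p(x, y), shape (n_outcomes, n_targets)
    p_x = joint.sum(axis=1)             # marginal p(x)
    gain = entropy(prior)
    for i in range(len(p_x)):
        if p_x[i] > 0:
            posterior = joint[i] / p_x[i]           # p(y | x = i)
            gain -= p_x[i] * entropy(posterior)     # expected residual entropy
    return gain

def greedy_test(prior, tests):
    """Index of the test with maximal myopic information gain."""
    return int(np.argmax([expected_info_gain(prior, lik) for lik in tests]))
```

      In a toy check, a deterministic test is correctly preferred over an uninformative one; the abstract's point is that with correlated, noisy tests this myopic rule can fail badly, which is what motivates the adaptive submodular surrogate objective.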

    • June 21, 2016

      Katrin Erk

      As the field of Natural Language Processing develops, more ambitious semantic tasks are being addressed, such as Question Answering (QA) and Recognizing Textual Entailment (RTE). Solving these tasks requires (ideally) an in-depth representation of sentence structure as well as expressive and flexible representations at the word level. We have been exploring a combination of logical form with distributional as well as resource-based information at the word level, using Markov Logic Networks (MLNs) to perform probabilistic inference over the resulting representations. In this talk, I will focus on the three main components of a system we have developed for the task of Textual Entailment: (1) logical representation for processing in MLNs, (2) lexical entailment rule construction by integrating distributional information with existing resources, and (3) probabilistic inference, the problem of solving the resulting MLN inference problems efficiently. I will also comment on how I think the ideas from this system can be adapted to Question Answering and the more general task of in-depth single-document understanding.

    • June 20, 2016

      Marcus Rohrbach

      Language is the most important channel for humans to communicate about what they see. To allow an intelligent system to effectively communicate with humans, it is thus important to enable it to relate information in words and sentences with the visual world. For this, a system should be compositional, so that, for example, it is not surprised when it encounters a novel object and can still talk about it. It should be able to explain in natural language why it recognized a given object in an image as a certain class, to allow a human to trust and understand it. However, it should not only be able to generate natural language, but also understand it, and locate sentences and linguistic references in the visual world. In my talk, I will discuss how we approach these different fronts by looking at the tasks of language generation about images, visual grounding, and visual question answering. I will conclude with a discussion of the challenges ahead.

    • June 13, 2016

      Megasthenis Asteris

      Principal component analysis (PCA) is one of the most popular tools for identifying structure and extracting interpretable information from datasets. In this talk, I will discuss constrained variants of PCA, such as Sparse or Nonnegative PCA, that are computationally harder but offer higher data interpretability. I will describe a framework for solving quadratic optimization problems, such as PCA, under sparsity or other combinatorial constraints. Surprisingly, our method can solve such problems exactly when the involved quadratic form matrix is positive semidefinite and low rank. Of course, real datasets are not low-rank, but they can frequently be well approximated by low-rank matrices. For several datasets, we obtain excellent empirical performance and provable upper bounds that guarantee that our objective is close to the unknown optimum.
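      As a concrete reference point for the sparse-PCA setting above, here is a minimal sketch of a common heuristic for the same objective, truncated power iteration with hard thresholding. This is not the exact low-rank framework from the talk; the function names and parameters are illustrative assumptions.

```python
import numpy as np

def truncated_power_iteration(A, k, iters=200, seed=0):
    """Heuristic sparse PCA: power iteration with hard thresholding.

    A: symmetric positive semidefinite matrix (e.g. a covariance matrix).
    k: number of nonzero entries allowed in the component.
    Returns a unit vector x with at most k nonzeros that (locally)
    maximizes the quadratic form x^T A x.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x = A @ x
        # Keep only the k largest-magnitude coordinates, zero the rest.
        x[np.argsort(np.abs(x))[:-k]] = 0.0
        x /= np.linalg.norm(x)
    return x
```

      On a covariance matrix whose leading eigenvector is genuinely sparse, the iteration recovers both the support and the direction; with real data it is only a local heuristic, which is exactly the gap the talk's exact low-rank framework addresses.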

    • June 13, 2016

      Niket Tandon

      There is a growing conviction that the future of computing will crucially depend on our ability to exploit Big Data on the Web to produce significantly more intelligent and knowledgeable systems. This includes encyclopedic knowledge (for factual knowledge) and commonsense knowledge (for more advanced human-like reasoning). The focus of this talk is the automatic acquisition of commonsense knowledge from the Web. We require computers to understand the environment (e.g. the properties of objects in the environment), the relations between these objects (e.g. a handle is part of a bike, or a bike is slower than a car), and the semantics of their interaction (e.g. a man and a woman meet for dinner in a restaurant in the evening). This talk presents techniques for gathering such commonsense knowledge from textual and visual data on the Web.

    • May 23, 2016

      Oren Etzioni

      Oren Etzioni, CEO of the Allen Institute for AI, shares his vision for deploying AI technologies for the common good.

    • May 17, 2016

      Yi Yang

      With the resurgence of neural networks, low-dimensional dense features have been used in a wide range of natural language processing problems. Specifically, tasks like part-of-speech tagging, dependency parsing and entity linking have been shown to benefit from dense feature representations from both efficiency and effectiveness aspects. In this talk, I will present algorithms for unsupervised domain adaptation, where we train low-dimensional feature embeddings with instances from both source and target domains. I will also talk about how to extend the approach to unsupervised multi-domain adaptation by leveraging metadata domain attributes. I will then introduce a tree-based structured learning model for entity linking, where the model employs a few statistical dense features to jointly detect mentions and disambiguate entities. Finally, I will discuss some promising directions for future research.

    • May 9, 2016

      Aditya Khosla

      When glancing at a magazine or browsing the Internet, we are continuously exposed to photographs and images. While some images stick in our minds, others are ignored or quickly forgotten. Artists, advertisers and educators are routinely challenged by the question "what makes a picture memorable?" and must then create an image that speaks to the observer. In this talk, I will show how deep learning algorithms can predict with near-human consistency which images people will remember or forget, and how we can modify images automatically to make them more or less memorable.

    • May 3, 2016

      Saurabh Gupta

      In this talk, I will discuss detailed scene understanding from RGB-D images. We approach this problem by studying central computer vision problems like bottom-up grouping, object detection, instance segmentation, and pose estimation in the context of RGB-D images, and finally aligning CAD models to objects in the scene. This results in a detailed output that goes beyond what most current computer vision algorithms produce, and is useful for real-world applications like perceptual robotics and augmented reality. A central question in this work is how to learn good features for depth images, given that labeled RGB-D datasets are much smaller than the labeled RGB datasets (such as ImageNet) typically used for feature learning. To this end, I will describe our technique called "cross-modal distillation", which allows us to leverage easily available annotations on RGB images to learn representations on depth images. In addition, I will also briefly talk about some work on vision and language that I did during an internship at Microsoft Research.

    • April 26, 2016

      The successes of deep learning in the past decade on difficult tasks, ranging from image processing to speech recognition to game playing, are strong evidence for the utility of abstract representations of complex natural sensory data. In this talk I will present the deep canonical correlation analysis (DCCA) model, which learns deep representation mappings of each of two data views (e.g., from two different sensory modalities) such that the learned representations are maximally predictive of each other in the sense of correlation. Comparisons with linear CCA and kernel CCA demonstrate that DCCA is capable of finding far more highly correlated nonlinear representations than standard methods. Experiments also demonstrate the utility of the representation mappings learned by DCCA in the scenario where one of the data views is unavailable at test time.
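      For context, the linear CCA baseline that DCCA is compared against has a closed-form solution: whiten each view, then take an SVD of the whitened cross-covariance, whose singular values are the canonical correlations. A minimal numpy sketch (function and variable names are illustrative assumptions):

```python
import numpy as np

def _inv_sqrt(M):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def linear_cca(X, Y, k, reg=1e-6):
    """Classical linear CCA between two paired views.

    X: (n, dx), Y: (n, dy) -- paired observations of the two views.
    Returns projection matrices Wx, Wy and the top-k canonical correlations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularized within-view covariances and the cross-view covariance.
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    # Whiten each view, then SVD of the whitened cross-covariance.
    T = _inv_sqrt(Cxx) @ Cxy @ _inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(T)
    Wx = _inv_sqrt(Cxx) @ U[:, :k]
    Wy = _inv_sqrt(Cyy) @ Vt[:k].T
    return Wx, Wy, s[:k]
```

      DCCA replaces the fixed linear projections Wx and Wy with learned deep networks, trained so that the same correlation objective is maximized at the networks' outputs.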

    • April 12, 2016

      Percy Liang

      Can we learn if we start with zero examples, either labeled or unlabeled? This scenario arises in new user-facing systems (such as virtual assistants for new domains), where inputs should come from users, but no users exist until we have a working system, which depends on having training data. I will discuss recent work that circumvents this circular dependence by interleaving user interaction and learning.

    • April 6, 2016

      Ronan Le Bras

      Most problems, from theoretical problems in combinatorics to real-world applications, comprise hidden structural properties not directly captured by the problem definition. A key to the recent progress in automated reasoning and combinatorial optimization has been to automatically uncover and exploit this hidden problem structure, resulting in a dramatic increase in the scale and complexity of the problems within our reach. The most complex tasks, however, still require human abilities and ingenuity. In this talk, I will show how we can leverage human insights to effectively complement and dramatically boost state-of-the-art optimization techniques. I will demonstrate the effectiveness of the approach with a series of scientific discoveries, from experimental designs to materials discovery.

    • April 4, 2016

      Jeffrey Heer

      How might we architect interactive systems that have better models of the tasks we're trying to perform, learn over time, help refine ambiguous user intents, and scale to large or repetitive workloads? In this talk I will present Predictive Interaction, a framework for interactive systems that shifts some of the burden of specification from users to algorithms, while preserving human guidance and expressive power. The central idea is to imbue software with domain-specific models of user tasks, which in turn power predictive methods to suggest a variety of possible actions. I will illustrate these concepts with examples drawn from widely-deployed systems for data transformation and visualization (with reported order-of-magnitude productivity gains) and then discuss associated design considerations and future research directions.

    • March 25, 2016

      Ashish Vaswani

      Locally normalized approaches for structured prediction, such as left-to-right parsing and sequence labeling, are attractive because of their simplicity, ease of training, and the flexibility in choosing features from observations. Combined with the power of neural networks, they have been widely adopted for NLP tasks. However, locally normalized models suffer from label bias, where search errors arise during prediction because scores of hypotheses are computed from local decisions. While conditional random fields avoid label bias by scoring hypotheses globally, this comes at the cost of training time and limited freedom in specifying features. In this talk, I will present two approaches for overcoming label bias in structured prediction with locally normalized models. In the first approach, I will introduce a framework for learning to identify erroneous hypotheses and discard them at prediction time. Applying this framework to transition-based dependency parsing improves parsing accuracy significantly. In the second approach, I will show that scheduled sampling (Bengio et al.) and a variant can be robust to prediction errors, leading to state-of-the-art accuracies on CCG supertagging with LSTMs and in-domain CCG parsing.
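      The scheduled sampling idea credited above to Bengio et al. amounts to a decaying coin flip between feeding the model the gold previous token and its own prediction during training. A minimal sketch using the inverse-sigmoid decay from that paper; the parameter k and function names are illustrative assumptions, not the speaker's variant.

```python
import numpy as np

def gold_prob(step, k=100.0):
    """Inverse-sigmoid decay for the probability of feeding the gold
    (ground-truth) previous token: starts near 1, decays toward 0."""
    return k / (k + np.exp(step / k))

def choose_input(gold_token, predicted_token, step, rng, k=100.0):
    """Scheduled sampling: with probability gold_prob(step) feed the gold
    token, otherwise feed the model's own previous prediction."""
    if rng.random() < gold_prob(step, k):
        return gold_token
    return predicted_token
```

      Early in training the model almost always sees gold history (stable learning); later it increasingly conditions on its own predictions, which is what makes it robust to its own errors at test time.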

    • March 9, 2016

      Manaal Faruqui

      Unsupervised learning of word representations has proven to provide exceptionally effective features in many NLP tasks. Traditionally, construction of word representations relies on the distributional hypothesis, which posits that the meaning of words is evidenced by the contextual words they occur with (Harris, 1954). Although distributional context is fairly good at capturing word meaning, in this talk I'll show that going beyond the distributional hypothesis, by exploiting additional sources of word meaning information, improves the quality of word representations. First, I'll show how semantic lexicons, like WordNet, can be used to obtain better word vector representations. Second, I'll describe a novel graph-based learning framework that uses morphological information to construct large scale morpho-syntactic lexicons. I'll conclude with additional approaches that can be taken to improve word representations.
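      One well-known way to use a semantic lexicon like WordNet to improve word vectors, in the spirit of the abstract above, is to iteratively average each vector with its lexicon neighbors while anchoring it to the original distributional vector. This is a simplified sketch with uniform weights, not the exact published procedure; names and the alpha parameter are illustrative assumptions.

```python
import numpy as np

def retrofit(vectors, lexicon, iters=10, alpha=1.0):
    """Nudge word vectors toward their lexicon neighbors while staying
    close to the original distributional vectors.

    vectors: dict word -> np.ndarray (original distributional vectors)
    lexicon: dict word -> list of neighbor words (e.g. WordNet synonyms)
    """
    new = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iters):
        for w, nbrs in lexicon.items():
            nbrs = [n for n in nbrs if n in new]
            if w not in new or not nbrs:
                continue
            # alpha anchors the word to its original vector;
            # each lexicon neighbor pulls on it with equal weight.
            total = alpha * vectors[w] + sum(new[n] for n in nbrs)
            new[w] = total / (alpha + len(nbrs))
    return new
```

      After a few sweeps, lexicon neighbors end up closer together in the vector space while each vector stays near where the distributional statistics placed it.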

    • March 3, 2016

      Ali Farhadi

      Ali Farhadi discusses the history of computer vision and AI.

    • March 2, 2016

      Ashish Sabharwal

      Artificial intelligence and machine learning communities have made tremendous strides in the last decade. Yet, the best systems to date still struggle with routine tests of human intelligence, such as standardized science exams posed as-is in natural language, even at the elementary-school level. Can we demonstrate human-like intelligence by building systems that can pass such tests? Unlike typical factoid-style question answering (QA) tasks, these tests challenge a student’s ability to combine multiple facts in various ways, and appeal to broad common-sense and science knowledge. Going beyond arguably shallow information retrieval (IR) and statistical correlation techniques, we view science QA from the lens of combinatorial optimization over a semi-formal knowledge base derived from text. Our structured inference system, formulated as an Integer Linear Program (ILP), turns out to be not only highly complementary to IR methods, but also more robust to question perturbation, as well as substantially more scalable and accurate than prior attempts using probabilistic first-order logic and Markov Logic Networks (MLNs). This talk will discuss fundamental challenges behind the science QA task, the progress we have made, and many challenges that lie ahead.

    • February 16, 2016

      Eric Xing

      The rise of Big Data has led to new demands for Machine Learning (ML) systems that learn complex models with millions to billions of parameters, promising adequate capacity to digest massive datasets and offer powerful predictive analytics (such as high-dimensional latent features, intermediate representations, and decision functions). In order to run ML algorithms at such scales, on a distributed cluster with tens to thousands of machines, significant engineering effort is often required, and one might fairly ask whether such engineering truly falls within the domain of ML research. Taking the view that Big ML systems can indeed benefit greatly from ML-rooted statistical and algorithmic insights, and that ML researchers should therefore not shy away from such systems design, we discuss a series of principles and strategies distilled from our recent efforts on industrial-scale ML solutions, spanning a continuum from application to engineering to theoretical research and development of Big ML systems and architectures, on how to make them efficient, general, and equipped with convergence and scaling guarantees.

    • February 9, 2016

      Rich Caruana


    • January 27, 2016

      Jayant Krishnamurthy

      Lexicon learning is the first step of training a semantic parser for a new application domain, and the quality of the learned lexicon significantly affects both the accuracy and efficiency of the final semantic parser. Existing work on lexicon learning has focused on heuristic methods that lack convergence guarantees and require significant human input in the form of lexicon templates or annotated logical forms. In contrast, the proposed probabilistic models are trained directly from question/answer pairs using EM and the simplest model has a concave objective function that guarantees that EM converges to a global optimum. An experimental evaluation on a data set of 4th grade science questions demonstrates that these models improve semantic parser accuracy (35-70% error reduction) and efficiency (4-25x more sentences per second) relative to prior work, despite using less human input. The models also obtain competitive results on Geoquery without any dataset-specific engineering.

    • January 12, 2016

      Patrice Simard

For many ML problems, labeled data is readily available; the algorithm is the bottleneck. This is the ML researcher’s paradise! Problems that have fairly stable distributions and can accumulate large quantities of human labels over time have this property: vision, speech, autonomous driving. Problems that have shifting distributions and an infinite supply of labels through history are blessed in the same way: click prediction, data analytics, forecasting. We call these problems the “head” of ML.

      We are interested in another large class of ML problems where data is sparse. For contrast, we call it the “tail” of ML. For example, consider a dialog system for a specific app that must recognize specific commands such as: “lights on first floor off”, “patio on”, “enlarge paragraph spacing”, “make appointment with doctor when back from vacation”. Anyone who has attempted to build such a system quickly discovers that there are far more ways to issue a command than they originally thought. Domain knowledge, data selection, and custom features are essential to get good generalization performance with small amounts of data. With the right tools, an ML expert can build such a classifier or annotator in a matter of hours. Unfortunately, the current cost of an ML expert (if one is available) is often more than the value produced by a single domain-specific model. Getting good results on the tail is not cheap or easy.

      To address this problem, we change our focus from the learner to the teacher. We define Machine Teaching as improving the “teacher” productivity given the “learner”. The teacher is human. The learner is an ML algorithm. Ideally, our approach is “learner agnostic”. Focusing on improving the teacher does not preclude using the best ML algorithm or the best deep representation features and transfer learning. We view Machine Teaching and Machine Learning as orthogonal and complementary approaches. The Machine Teaching metrics are ML metrics divided by human costs, and Machine Teaching focuses on reducing the denominator. This perspective has led to many interesting insights and significant gains in ML productivity.
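The proposed metric, an ML metric divided by human cost, can be made concrete with a toy comparison (the numbers below are hypothetical, not from the talk): a slightly less accurate model built far faster by a domain expert with good teaching tools can dominate on accuracy per teacher-hour.

```python
def teaching_efficiency(ml_metric, human_hours):
    # Machine Teaching metric: an ML metric divided by human cost.
    return ml_metric / human_hours

# Hypothetical workflows for the same tail problem.
expert_pipeline = teaching_efficiency(0.92, 40.0)  # ML expert, 40 h of work
teaching_tool = teaching_efficiency(0.89, 4.0)     # domain expert with tooling, 4 h

print(expert_pipeline < teaching_tool)  # → True: the tool wins per teacher-hour
```

Framing productivity this way makes "reduce the denominator" a first-class research goal, independent of which learner sits underneath.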
