Matt Gardner received his Ph.D. from the Language Technologies Institute at Carnegie Mellon University in November 2015. His research interests include machine reading and reasoning, extracting information from text and using it to answer questions. At home, he likes to play harp and flute duets with his wife, read and play with his kids, and play soccer with friends.
Whereas people learn many different types of knowledge from diverse experiences over many years, most current machine learning systems acquire just a single function or data model from just a single data set. We propose a never-ending learning paradigm for machine learning, to better reflect the more ambitious and encompassing type of learning performed by humans. As a case study, we describe the Never-Ending Language Learner (NELL), which achieves some of the desired properties of a never-ending learner, and we discuss lessons learned. NELL has been learning to read the web 24 hours/day since January 2010, and so far has acquired a knowledge base with over 80 million confidence-weighted beliefs (e.g., servedWith(tea, biscuits)). NELL has also learned millions of features and parameters that enable it to read these beliefs from the web. Additionally, it has learned to reason over these beliefs to infer new beliefs, and is able to extend its ontology by synthesizing new relational predicates. NELL can be tracked online at http://rtw.ml.cmu.edu, and followed on Twitter at @CMUNELL.
We explore some of the practicalities of using random walk inference methods, such as the Path Ranking Algorithm (PRA), for the task of knowledge base completion. We show that the random walk probabilities computed (at great expense) by PRA provide no discernible benefit to performance on this task, and so they can safely be dropped. This result allows us to define a simpler algorithm for generating feature matrices from graphs, which we call subgraph feature extraction (SFE). In addition to being conceptually simpler than PRA, SFE is much more efficient, reducing computation by an order of magnitude, and more expressive, allowing for much richer features than just paths between two nodes in a graph. We show experimentally that this technique gives substantially better performance than PRA and its variants, improving mean average precision from .432 to .528 on a knowledge base completion task using the NELL knowledge base.
This work focuses on the task of finding latent vector representations of the words in a corpus. In particular, we address the issue of what to do when there are multiple languages in the corpus. Prior work has, among other techniques, used canonical correlation analysis to project pre-trained vectors in two languages into a common space. We propose a simple and scalable method that is inspired by the notion that the learned vector representations should be invariant to translation between languages. We show empirically that our method outperforms prior work on multilingual tasks, matches the performance of prior work on monolingual tasks, and scales linearly with the size of the input data (and thus the number of languages being embedded).
Much work in recent years has gone into the construction of large knowledge bases (KBs), such as Freebase, DBPedia, NELL, and YAGO. While these KBs are very large, they are still very incomplete, necessitating the use of inference to fill in gaps. Prior work has shown how to make use of a large text corpus to augment random walk inference over KBs. We present two improvements to the use of such large corpora to augment KB inference. First, we present a new technique for combining KB relations and surface text into a single graph representation that is much more compact than graphs used in prior work. Second, we describe how to incorporate vector space similarity into random walk inference over KBs, reducing the feature sparsity inherent in using surface text. This allows us to combine distributional similarity with symbolic logical inference in novel and effective ways. With experiments on many relations from two separate KBs, we show that our methods significantly outperform prior work on KB inference, both in the size of problem our methods can handle and in the quality of predictions made.
We have recently shown how to combine random walk inference over knowledge bases with vector space representations of surface forms, improving performance on knowledge base inference. In this paper, we formalize the connection of our prior work to logical inference rules, giving some general observations about methods for incorporating vector space representations into symbolic logic systems. Additionally, we present some promising preliminary work that extends these techniques to learning open-domain relations for the purpose of answering multiple choice questions, achieving 67% accuracy on a small test set.
Automatically constructed Knowledge Bases (KBs) are often incomplete and there is a genuine need to improve their coverage. Path Ranking Algorithm (PRA) is a recently proposed method which aims to improve KB coverage by performing inference directly over the KB graph. For the first time, we demonstrate that addition of edges labeled with latent features mined from a large dependency parsed corpus of 500 million Web documents can significantly outperform previous PRA-based approaches on the KB inference task. We present extensive experimental results validating this finding. The resources presented in this paper are publicly available.
Particle swarm optimization (PSO) has previously been parallelized primarily by distributing the computation corresponding to particles across multiple processors. In these approaches, the only benefit of additional processors is an increased swarm size. However, in many cases this is not efficient when scaled to very large swarm sizes (on very large clusters). Current methods cannot answer well the question: “How can 1000 processors be fully utilized when 50 or 100 particles is the most efficient swarm size?” In this paper we attempt to answer that question with a speculative approach to the parallelization of PSO that we refer to as SEPSO.
In our approach, we refactor PSO such that the computation needed for iteration t+1 can be done concurrently with the computation needed for iteration t. Thus we can perform two iterations of PSO at once. Even with some amount of wasted computation, we show that this approach to parallelization in PSO often outperforms the standard parallelization of simply adding particles to the swarm. SEPSO produces results that are exactly equivalent to PSO; that is, SEPSO is a new method of parallelization and not a new PSO algorithm or variant.
However, given this new parallelization model, we can relax the requirement of exactly reproducing PSO in an attempt to produce better results. We present several such relaxations, including keeping the best speculative position evaluated instead of the one corresponding to the standard behavior of PSO, and speculating several iterations ahead instead of just one. We show that these methods dramatically improve the performance of parallel PSO in many cases, giving speed ups of up to six compared to previous parallelization techniques.
Particle swarm optimization (PSO) has previously been parallelized only by adding more particles to the swarm or by parallelizing the evaluation of the objective function. However, some functions are more efficiently optimized with more iterations and fewer particles. Accordingly, we take inspiration from speculative execution performed in modern processors and propose speculative evaluation in PSO (SEPSO). Future positions of the particles are speculated and evaluated in parallel with current positions, performing two iterations of PSO at once.
We also propose another way of making use of these speculative particles, keeping the best position found instead of the position that PSO actually would have taken. We show that for a number of functions, speculative evaluation gives dramatic improvements over adding additional particles to the swarm.
Particle Swarm Optimization (PSO) has typically been used with small swarms of about 50 particles. However, PSO is more efficiently parallelized with large swarms. We formally describe existing topologies and identify variations which are better suited to large swarms in both sequential and parallel computing environments. We examine the performance of PSO for benchmark functions with respect to swarm size and topology. We develop and demonstrate a new PSO variant which leverages the unique strengths of large swarms. "Hearsay PSO" allows for information to flow quickly through the swarm, even with very loosely connected topologies. These loosely connected topologies are well suited to large scale parallel computing environments because they require very little communication between particles. We consider the case where function evaluations are expensive with respect to communication as well as the case where function evaluations are relatively inexpensive. We also consider a situation where local communication is inexpensive compared to external communication, such as multicore systems in a cluster.