# Tushar Khot

## Research

Tushar Khot is a Research Scientist at AI2, specializing in probabilistic graphical models for structured data. His research focuses on modeling structured data using first-order logic and probabilistic models. He received his PhD from the University of Wisconsin-Madison in 2014 and his B.Tech from NIT Trichy in 2006. Prior to his PhD, he worked at Google Bangalore for two years.

## Papers

## Question Answering via Integer Programming over Semi-Structured Knowledge

**Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Peter Clark, Oren Etzioni and Dan Roth**IJCAI*2016*Answering science questions posed in natural language is an important AI challenge. Answering such questions often requires non-trivial inference and knowledge that goes beyond factoid retrieval. Yet, most systems for this task are based on relatively shallow Information Retrieval (IR) and statistical correlation techniques operating on large unstructured corpora. We propose a structured inference system for this task, formulated as an Integer Linear Program (ILP), that answers natural language questions using a semi-structured knowledge base derived from text, including questions requiring multi-step inference and a combination of multiple facts. On a dataset of real, unseen science questions, our system significantly outperforms (+14%) the best previous attempt at structured reasoning for this task, which used Markov Logic Networks (MLNs). It also improves upon a previous ILP formulation by 17.7%. When combined with unstructured inference methods, the ILP system significantly boosts overall performance (+10%). Finally, we show our approach is substantially more robust to a simple answer perturbation compared to statistical correlation methods. Less

## Learning Continuous-Time Bayesian Networks in Relational Domains: A Non-Parametric Approach

**Shuo Yang, Tushar Khot, Kristian Kersting and Sriraam Natarajan**AAAI | 2016Many real world applications in medicine, biology, communication networks, web mining, and economics, among others, involve modeling and learning structured stochastic processes that evolve over continuous time. Existing approaches, however, have focused on propositional domains only. Without extensive feature engineering, it is difficult---if not impossible---to apply them within relational domains where we may have varying number of objects and relations among them. We therefore develop the first relational representation called Relational Continuous-Time Bayesian Networks (RCTBNs) that can address this challenge. It features a nonparametric learning method that allows for efficiently learning the complex dependencies and their strengths simultaneously from sequence data. Our experimental results demonstrate that RCTBNs can learn as effectively as state-of-the-art approaches for propositional tasks while modeling relational tasks faithfully. Less

## Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions

**Peter Clark, Oren Etzioni, Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Turney**AAAI | 2016What capabilities are required for an AI system to pass standard 4th Grade Science Tests? Previous work has examined the use of Markov Logic Networks (MLNs) to represent the requisite background knowledge and interpret test questions, but did not improve upon an information retrieval (IR) baseline. In this paper, we describe an alternative approach that operates at three levels of representation and reasoning: information retrieval, corpus statistics, and simple inference over a semi-automatically constructed knowledge base, to achieve substantially improved results. We evaluate the methods on six years of unseen, unedited exam questions from the NY Regents Science Exam (using only non-diagram, multiple choice questions), and show that our overall system’s score is 71.3%, an improvement of 23.8% (absolute) over the MLN-based method described in previous work. We conclude with a detailed analysis, illustrating the complementary strengths of each method in the ensemble. Our datasets are being released to enable further research. Less

## Exploring Markov Logic Networks for Question Answering

**Tushar Khot, Niranjan Balasubramanian, Eric Gribkoff, Ashish Sabharwal, Peter Clark, Oren Etzioni**EMNLP | 2015Elementary-level science exams pose significant knowledge acquisition and reasoning challenges for automatic question answering. We develop a system that reasons with knowledge derived from textbooks, represented in a subset of first-order logic. Automatic extraction, while scalable, often results in knowledge that is incomplete and noisy, motivating use of reasoning mechanisms that handle uncertainty. Markov Logic Networks (MLNs) seem a natural model for expressing such knowledge, but the exact way of leveraging MLNs is by no means obvious. We investigate three ways of applying MLNs to our task. First, we simply use the extracted science rules directly as MLN clauses and exploit the structure present in hard constraints to improve tractability. Second, we interpret science rules as describing prototypical entities, resulting in a drastically simplified but brittle network. Our third approach, called Praline, uses MLNs to align lexical elements as well as define and control how inference should be performed in this task. Praline demonstrates a 15% accuracy boost and a 10x reduction in runtime as compared to other MLN-based methods, and comparable accuracy to word-based baseline approaches. Less

## Knowledge-Based Probabilistic Logic Learning

**Phillip Odom, Tushar Khot, Reid Porter, and Sriraam Natarajan**AAAI | 2015Advice giving has been long explored in artificial intelligence to build robust learning algorithms. We consider advice giving in relational domains where the noise is systematic. The advice is provided as logical statements that are then explicitly considered by the learning algorithm at every update. Our empirical evidence proves that human advice can effectively accelerate learning in noisy structured domains where so far humans have been merely used as labelers or as designers of initial structure of the model. Less

## Gradient-based Boosting for Statistical Relational Learning: The Markov Logic Network and Missing Data Cases

**Tushar Khot, Sriraam Natarajan, Kristian Kersting and Jude Shavlik**Machine Learning | 2015Recent years have seen a surge of interest in Statistical Relational Learning (SRL) models that combine logic with probabilities. One prominent and highly expressive SRL model is Markov Logic Networks (MLNs), but the expressivity comes at the cost of learning complexity. Most of the current methods for learning MLN structure follow a two-step approach where first they search through the space of possible clauses (i.e. structures) and then learn weights via gradient descent for these clauses. We present a functional-gradient boosting algorithm to learn both the weights (in closed form) and the structure of the MLN simultaneously. Moreover most of the learning approaches for SRL apply the closed-world assumption, i.e., whatever is not observed is assumed to be false in the world. We attempt to open this assumption.We extend our algorithm for MLN structure learning to handle missing data by using an EM-based approach and show this algorithm can also be used to learn Relational Dependency Networks and relational policies. Our results in many domains demonstrate that our approach can effectively learn MLNs even in the presence of missing data.Less

## Relational One-Class Classification: A Non-Parametric Approach

**Tushar Khot, Sriraam Natarajan, Jude Shavlik**AAAI | 2014One-class classification approaches have been proposed in the literature to learn classifiers from examples of only one class. But these approaches are not directly applicable to relational domains due to their reliance on a feature vector or a distance measure. We propose a non-parametric relational one-class classification approach based on first-order trees. We learn a tree-based distance measure that iteratively introduces new relational features to differentiate relational examples. We update the distance measure so as to maximize the one-class classification performance of our model. We also relate our model definition to existing work on probabilistic combination functions and density estimation. We experimentally show that our approach can discover relevant features and outperform three baseline approaches.

## Gradient-based Boosting for Statistical Relational Learning: The Relational Dependency Network Case.

**Sriraam Natarajan, Tushar Khot, Kristian Kersting, Bernd Gutmann and Jude Shavlik.**Machine Learning | 2012Dependency networks approximate a joint probability distribution over multiple random variables as a product of conditions distributions. Relational Dependency Networks (RDNs) are graphical models that extend dependency networks to relational domains. This higher expressivity, however, comes at the expense of a more complex model-selection problem: an unbounded number of relational abstraction levels might need to be explored. Whereas current learning approaches for RDNs learn a single probability tree per random variable, we propose to turn the problem into a series of relational function-approximation problems using gradient-based boosting. In doing so, one can easily induce highly complex features over several iterations and in turn estimate quickly a very expressive model. Our experimental results in several different data sets show that this boosting method results in efficient learning of RDNs when compared to state-of-the-art statistical relational learning approaches.

## Learning Markov Logic Networks via Functional Gradient Boosting

**Tushar Khot, Sriraam Natarajan, Kristian Kersting, Jude Shavlik**ICDM | 2011Recent years have seen a surge of interest in Statistical Relational Learning (SRL) models that combine logic with probabilities. One prominent example is Markov Logic Networks (MLNs). While MLNs are indeed highly expressive, this expressiveness comes at a cost. Learning MLNs is a hard problem and therefore has attracted much interest in the SRL community. Current methods for learning MLNs follow a twostep approach: first, perform a search through the space of possible clauses and then learn appropriate weights for these clauses. We propose to take a different approach, namely to learn both the weights and the structure of the MLN simultaneously. Our approach is based on functional gradient boosting where the problem of learning MLNs is turned into a series of relational functional approximation problems. We use two kinds of representations for the gradients: clausebased and tree-based. Our experimental evaluation on several benchmark data sets demonstrates that our new approach can learn MLNs as good or better than those found with state-ofthe- art methods, but often in a fraction of the time.