    • EMNLP 2016
      Rebecca Sharp, Mihai Surdeanu, Peter Jansen, and Peter Clark

      A common model for question answering (QA) is that a good answer is one that is closely related to the question, where relatedness is often determined using general-purpose lexical models such as word embeddings. We argue that a better approach is to look for answers that are related to the question in a relevant way, according to the information need of the question, which may be determined through task-specific embeddings. With causality as a use case, we implement this insight in three steps. First, we generate causal embeddings cost-effectively by bootstrapping cause-effect pairs extracted from free text using a small set of seed patterns. Second, we train dedicated embeddings over this data, by using task-specific contexts, i.e., the context of a cause is its effect. Finally, we extend a state-of-the-art reranking approach for QA to incorporate these causal embeddings. We evaluate the causal embedding models both directly with a causal implication task, and indirectly, in a downstream causal QA task using data from Yahoo! Answers. We show that explicitly modeling causality improves performance in both tasks. In the QA task our best model achieves 37.3% P@1, significantly outperforming a strong baseline by 7.7% (relative).
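
      A minimal sketch of the second step, assuming gensim's Word2Vec as a stand-in for the authors' training code: each bootstrapped cause-effect pair is fed to a skip-gram model as a two-token sentence with a window of 1, so a cause's only training context is its effect. The pair list and hyperparameters below are hypothetical.

      ```python
      # Sketch only: task-specific (causal) embeddings via skip-gram over
      # bootstrapped cause-effect pairs, not the authors' actual pipeline.
      from gensim.models import Word2Vec

      # Hypothetical pairs, e.g. extracted with seed patterns such as "X causes Y".
      cause_effect_pairs = [
          ("smoking", "cancer"),
          ("rain", "flooding"),
          ("virus", "infection"),
      ]

      # Each pair becomes a two-token "sentence"; window=1 makes the paired term
      # the only context, sg=1 selects the skip-gram architecture.
      sentences = [[cause, effect] for cause, effect in cause_effect_pairs]
      model = Word2Vec(sentences, vector_size=50, window=1, sg=1, min_count=1, epochs=50)

      # Similarity in this space reflects causal association rather than the
      # general-purpose relatedness of standard embeddings.
      print(model.wv.similarity("smoking", "cancer"))
      ```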

    • EMNLP 2016
      Jayant Krishnamurthy, Oyvind Tafjord, and Aniruddha Kembhavi

      Situated question answering is the problem of answering questions about an environment such as an image or diagram. This problem requires jointly interpreting a question and an environment using background knowledge to select the correct answer. We present Parsing to Probabilistic Programs (P3), a novel situated question answering model that can use background knowledge and global features of the question/environment interpretation while retaining efficient approximate inference. Our key insight is to treat semantic parses as probabilistic programs that execute nondeterministically and whose possible executions represent environmental uncertainty. We evaluate our approach on a new, publicly-released data set of 5000 science diagram questions, outperforming several competitive classical and neural baselines.

    • COLING 2016
      Peter Jansen, Niranjan Balasubramanian, Mihai Surdeanu, and Peter Clark

      QA systems have been making steady advances in the challenging elementary science exam domain. In this work, we develop an explanation-based analysis of knowledge and inference requirements, which supports a fine-grained characterization of the challenges. In particular, we model the requirements based on appropriate sources of evidence to be used for the QA task. We create requirements by first identifying suitable sentences in a knowledge base that support the correct answer, then using these to build explanations, filling in any necessary missing information. These explanations are used to create a fine-grained categorization of the requirements. Using these requirements, we compare a retrieval and an inference solver on 212 questions. The analysis validates the gains of the inference solver, demonstrating that it answers more questions requiring complex inference, while also providing insights into the relative strengths of the solvers and knowledge sources. We release the annotated questions and explanations as a resource with broad utility for science exam QA, including determining knowledge base construction targets, as well as supporting information aggregation in automated inference.

    • NIPS 2016
      Been Kim, Sanmi Koyejo, and Rajiv Khanna

      Example-based explanations are widely used in the effort to improve the interpretability of highly complex distributions. However, prototypes alone are rarely sufficient to represent the gist of the complexity. In order for users to construct better mental models and understand complex data distributions, we also need criticism to explain what is not captured by prototypes. Motivated by the Bayesian model criticism framework, we develop MMD-critic which efficiently learns prototypes and criticism, designed to aid human interpretability. A human subject pilot study shows that the MMD-critic selects prototypes and criticism that are useful to facilitate human understanding and reasoning. We also evaluate the prototypes selected by MMD-critic via a nearest prototype classifier, showing competitive performance compared to baselines.
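
      A compact sketch of the selection idea, assuming an RBF kernel, greedy maximization of an MMD-based coverage objective for prototypes, and criticism points taken where the kernel witness function shows the largest gap between the data and the prototypes. This illustrates the approach; it is not the released MMD-critic implementation.

      ```python
      import numpy as np

      def rbf_kernel(X, Y, gamma=1.0):
          # Pairwise squared distances, then the Gaussian kernel.
          d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
          return np.exp(-gamma * d2)

      def select_prototypes(K, m):
          # Greedily add the point that most increases the MMD-based objective
          # J(S) = 2/(n|S|) sum_{i in [n], j in S} k(x_i, z_j) - 1/|S|^2 sum_{j,j' in S} k(z_j, z_j').
          n = K.shape[0]
          colsum = K.mean(axis=0)  # (1/n) sum_i k(x_i, x_j) for each candidate j
          selected = []
          for _ in range(m):
              best_val, best_j = -np.inf, None
              for j in range(n):
                  if j in selected:
                      continue
                  S = selected + [j]
                  val = 2.0 * colsum[S].sum() / len(S) - K[np.ix_(S, S)].sum() / len(S) ** 2
                  if val > best_val:
                      best_val, best_j = val, j
              selected.append(best_j)
          return selected

      def select_criticism(K, prototypes, c):
          # Witness: mean kernel similarity to all data minus mean similarity to prototypes.
          witness = K.mean(axis=0) - K[prototypes].mean(axis=0)
          ranked = np.argsort(-np.abs(witness))
          return [j for j in ranked if j not in prototypes][:c]

      X = np.random.RandomState(0).randn(200, 2)
      K = rbf_kernel(X, X)
      prototypes = select_prototypes(K, m=5)
      criticism = select_criticism(K, prototypes, c=3)
      print(prototypes, criticism)
      ```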

    • NIPS 2016
      Shengjia Zhao, Enze Zhou, Ashish Sabharwal, and Stefano Ermon

      A key challenge in sequential decision problems is to determine how many samples are needed for an agent to make reliable decisions with good probabilistic guarantees. We introduce Hoeffding-like concentration inequalities that hold for a random, adaptively chosen number of samples. Our inequalities are tight under natural assumptions and can greatly simplify the analysis of common sequential decision problems. In particular, we apply them to sequential hypothesis testing, best arm identification, and sorting. The resulting algorithms rival or exceed the state of the art both theoretically and empirically.
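
      For background (standard results only, not the paper's exact adaptive bounds): classical Hoeffding controls a fixed sample size n, and a crude union bound over every possible stopping point shows why an adaptively chosen number of samples needs a dedicated inequality.

      ```latex
      % Classical Hoeffding, for i.i.d. X_i in [0,1] and a *fixed* sample size n:
      \[
        \Pr\bigl(\,\lvert \bar{X}_n - \mu \rvert \ge \epsilon\,\bigr)
        \;\le\; 2\exp\!\bigl(-2 n \epsilon^2\bigr).
      \]
      % If the sample size is chosen adaptively (a stopping rule), the fixed-n bound does
      % not apply directly; a crude union bound over all possible stopping points gives
      \[
        \Pr\bigl(\exists\, n \ge 1 : \lvert \bar{X}_n - \mu \rvert \ge \epsilon\bigr)
        \;\le\; \sum_{n=1}^{\infty} 2\exp\!\bigl(-2 n \epsilon^2\bigr)
        \;=\; \frac{2\,e^{-2\epsilon^2}}{1 - e^{-2\epsilon^2}},
      \]
      % a much looser guarantee than the paper's Hoeffding-like inequalities for a
      % random, adaptively chosen number of samples are designed to provide.
      ```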

    • Ethics 2016
      Amitai Etzioni and Oren Etzioni

      The growing number of 'smart' instruments, those equipped with AI, has raised concerns because these instruments make autonomous decisions; that is, they act beyond the guidelines provided to them by their programmers. Hence, the question the makers and users of smart instruments (e.g., driverless cars) face is how to ensure that these instruments will not engage in unethical conduct (not to be conflated with illegal conduct). The article suggests that to proceed we need a new kind of AI program—oversight programs—that will monitor, audit, and hold operational AI programs accountable.

    • AI Magazine 2016
      Peter Clark and Oren Etzioni

      Given the well-known limitations of the Turing Test, there is a need for objective tests to both focus attention on, and measure progress towards, the goals of AI. In this paper we argue that machine performance on standardized tests should be a key component of any new measure of AI, because attaining a high level of performance requires solving significant AI problems involving language understanding and world modeling — critical skills for any machine that lays claim to intelligence. In addition, standardized tests have all the basic requirements of a practical test: they are accessible, easily comprehensible, clearly measurable, and offer a graduated progression from simple tasks to those requiring deep understanding of the world. Here we propose this task as a challenge problem for the community, summarize our state-of-the-art results on math and science tests, and provide supporting datasets (see www.allenai.org/data.html).

    • AAAI 2016
      Ashish Sabharwal, Horst Samulowitz, and Gerald Tesauro

      We study a novel machine learning (ML) problem setting of sequentially allocating small subsets of training data amongst a large set of classifiers. The goal is to select a classifier that will give near-optimal accuracy when trained on all data, while also minimizing the cost of misallocated samples. This is motivated by large modern datasets and ML toolkits with many combinations of learning algorithms and hyperparameters. Inspired by the principle of “optimism under uncertainty,” we propose an innovative strategy, Data Allocation using Upper Bounds (DAUB), which robustly achieves these objectives across a variety of real-world datasets. We further develop substantial theoretical support for DAUB in an idealized setting where the expected accuracy of a classifier trained on n samples can be known exactly. Under these conditions we establish a rigorous sub-linear bound on the regret of the approach (in terms of misallocated data), as well as a rigorous bound on suboptimality of the selected classifier. Our accuracy estimates using real-world datasets only entail mild violations of the theoretical scenario, suggesting that the practical behavior of DAUB is likely to approach the idealized behavior.
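
      A hypothetical sketch of the optimism-under-uncertainty loop described above: each learner's observed learning curve is extrapolated to an optimistic bound on its full-data accuracy, and the next batch of training data goes to the learner with the highest bound. The projection rule, batch size, and toy learning curves below are placeholders, not DAUB's specifics.

      ```python
      import numpy as np

      def optimistic_bound(sizes, scores, full_size):
          """Optimistic projection of full-data accuracy from the observed curve.
          A simple clipped linear extrapolation; DAUB's actual projection is more careful."""
          if len(scores) < 2:
              return 1.0                           # nothing observed yet: be maximally optimistic
          slope = (scores[-1] - scores[-2]) / (sizes[-1] - sizes[-2])
          slope = max(slope, 0.0)                  # accuracy assumed non-decreasing in data
          return min(1.0, scores[-1] + slope * (full_size - sizes[-1]))

      def daub_like_allocation(learners, train_and_score, full_size, batch=100, budget=2000):
          history = {name: ([], []) for name in learners}   # per learner: (sizes, scores)
          spent = 0
          while spent < budget:
              # Allocate the next batch to the learner with the best optimistic projection.
              name = max(learners, key=lambda n: optimistic_bound(*history[n], full_size))
              sizes, scores = history[name]
              n_next = (sizes[-1] if sizes else 0) + batch
              scores.append(train_and_score(name, n_next))   # retrain on n_next samples
              sizes.append(n_next)
              spent += batch
          # Select the learner whose accuracy on its largest allocated subset is best.
          return max(learners, key=lambda n: history[n][1][-1] if history[n][1] else 0.0)

      # Toy usage: synthetic learning curves stand in for real training-and-evaluation runs.
      curves = {"svm": lambda n: 0.60 + 0.30 * (1 - np.exp(-n / 400)),
                "trees": lambda n: 0.50 + 0.45 * (1 - np.exp(-n / 900))}
      print(daub_like_allocation(list(curves), lambda name, n: curves[name](n), full_size=5000))
      ```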

    • AAAI 2016
      Carolyn Kim, Ashish Sabharwal, and Stefano Ermon

      We consider the problem of sampling from a discrete probability distribution specified by a graphical model. Exact samples can, in principle, be obtained by computing the mode of the original model perturbed with exponentially many i.i.d. random variables. We propose a novel algorithm that views this as a combinatorial optimization problem and searches for the extreme state using a standard integer linear programming (ILP) solver, appropriately extended to account for the random perturbation. Our technique, GumbelMIP, leverages linear programming (LP) relaxations to evaluate the quality of samples and prune large portions of the search space, and can thus scale to large tree-width models beyond the reach of current exact inference methods. Further, when the optimization problem is not solved to optimality, our method yields a novel approximate sampling technique. We empirically demonstrate that our approach parallelizes well, our exact sampler scales better than alternative approaches, and our approximate sampler yields better quality samples than a Gibbs sampler and a low-dimensional perturbation method.
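
      The reduction the abstract alludes to is, in spirit, the Gumbel-max trick: perturb each state's unnormalized log-weight with i.i.d. Gumbel noise and return the argmax, which is an exact sample. The toy below enumerates a four-state distribution by brute force; GumbelMIP instead hands the (exponentially large) argmax to an ILP solver.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      log_weights = np.log(np.array([0.1, 0.2, 0.3, 0.4]))   # toy unnormalized log-weights

      def gumbel_max_sample():
          # One i.i.d. Gumbel(0,1) perturbation per state, then take the mode.
          gumbel = -np.log(-np.log(rng.uniform(size=log_weights.shape)))
          return int(np.argmax(log_weights + gumbel))

      samples = [gumbel_max_sample() for _ in range(100_000)]
      print(np.bincount(samples) / len(samples))   # approaches [0.1, 0.2, 0.3, 0.4]
      ```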

    • AAAI 2016
      Shengjia Zhao, Sorathan Chaturapruek, Ashish Sabharwal, and Stefano Ermon

      Many recent algorithms for approximate model counting are based on a reduction to combinatorial searches over random subsets of the space defined by parity or XOR constraints. Long parity constraints (involving many variables) provide strong theoretical guarantees but are computationally difficult. Short parity constraints are easier to solve but have weaker statistical properties. It is currently not known how long these parity constraints need to be. We close the gap by providing matching necessary and sufficient conditions on the required asymptotic length of the parity constraints. Further, we provide a new family of lower bounds and the first non-trivial upper bounds on the model count that are valid for arbitrarily short XORs. We empirically demonstrate the effectiveness of these bounds on model counting benchmarks and in a Satisfiability Modulo Theory (SMT) application motivated by the analysis of contingency tables in statistics.
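
      A brute-force toy of the XOR-hashing idea these bounds concern (real counters replace the enumeration with a SAT/SMT solver): each random parity constraint cuts the solution set roughly in half, so the largest number m of constraints a solution survives gives a rough 2^m estimate of the model count. The density parameter controls the constraint length that the paper analyzes; a single noisy trial is shown, whereas practical estimators repeat and aggregate.

      ```python
      import random

      random.seed(0)
      n_vars = 10
      # A toy "formula": an explicit set of satisfying assignments over n_vars bits.
      solutions = {tuple(random.randrange(2) for _ in range(n_vars)) for _ in range(200)}

      def random_xor(n, density=0.5):
          # Each variable joins the parity constraint with probability `density`
          # (this controls the XOR length the paper's bounds are about).
          return [i for i in range(n) if random.random() < density], random.randrange(2)

      def satisfies(assignment, xor):
          variables, parity = xor
          return sum(assignment[i] for i in variables) % 2 == parity

      def survives(m):
          # Does any solution satisfy m freshly drawn random parity constraints?
          xors = [random_xor(n_vars) for _ in range(m)]
          return any(all(satisfies(s, x) for x in xors) for s in solutions)

      m = 0
      while survives(m + 1):
          m += 1
      print("estimate ~ 2 **", m, "=", 2 ** m, "   true count:", len(solutions))
      ```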

    • AAAI 2016
      Shuo Yang, Tushar Khot, Kristian Kersting, and Sriraam Natarajan

      Many real world applications in medicine, biology, communication networks, web mining, and economics, among others, involve modeling and learning structured stochastic processes that evolve over continuous time. Existing approaches, however, have focused on propositional domains only. Without extensive feature engineering, it is difficult---if not impossible---to apply them within relational domains, where we may have a varying number of objects and relations among them. We therefore develop the first relational representation, called Relational Continuous-Time Bayesian Networks (RCTBNs), that can address this challenge. It features a nonparametric learning method that allows for efficiently learning the complex dependencies and their strengths simultaneously from sequence data. Our experimental results demonstrate that RCTBNs can learn as effectively as state-of-the-art approaches for propositional tasks while modeling relational tasks faithfully.

    • AAAI 2016
      Peter Clark, Oren Etzioni, Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, and Peter Turney

      What capabilities are required for an AI system to pass standard 4th Grade Science Tests? Previous work has examined the use of Markov Logic Networks (MLNs) to represent the requisite background knowledge and interpret test questions, but did not improve upon an information retrieval (IR) baseline. In this paper, we describe an alternative approach that operates at three levels of representation and reasoning: information retrieval, corpus statistics, and simple inference over a semi-automatically constructed knowledge base, to achieve substantially improved results. We evaluate the methods on six years of unseen, unedited exam questions from the NY Regents Science Exam (using only non-diagram, multiple choice questions), and show that our overall system’s score is 71.3%, an improvement of 23.8% (absolute) over the MLN-based method described in previous work. We conclude with a detailed analysis, illustrating the complementary strengths of each method in the ensemble. Our datasets are being released to enable further research.
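
      A hypothetical sketch of the corpus-statistics layer: score each answer option by a pointwise mutual information (PMI) style association between question terms and option terms, estimated from co-occurrence counts in a background corpus. The counts, smoothing, and aggregation here are placeholders; the abstract does not commit to this exact statistic.

      ```python
      import math

      def pmi(count_xy, count_x, count_y, n_windows):
          """PMI(x, y) = log [ P(x, y) / (P(x) P(y)) ], with add-one smoothing."""
          p_xy = (count_xy + 1) / n_windows
          p_x = (count_x + 1) / n_windows
          p_y = (count_y + 1) / n_windows
          return math.log(p_xy / (p_x * p_y))

      def score_option(question_terms, option_terms, counts, cooc, n_windows):
          # Average pairwise association between question terms and option terms.
          scores = [pmi(cooc.get((q, o), 0), counts.get(q, 0), counts.get(o, 0), n_windows)
                    for q in question_terms for o in option_terms]
          return sum(scores) / max(len(scores), 1)

      # Toy counts standing in for a real corpus index.
      counts = {"photosynthesis": 900, "sunlight": 1200, "gravity": 2000}
      cooc = {("photosynthesis", "sunlight"): 300}
      question = ["photosynthesis"]
      for option in (["sunlight"], ["gravity"]):
          print(option, round(score_option(question, option, counts, cooc, n_windows=10**6), 2))
      ```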

    • WSDM 2016
      Bhavana Dalvi, Aditya Mishra, and William W. Cohen

      In an entity classification task, topic or concept hierarchies are often incomplete. Previous work by Dalvi et al. has shown that in non-hierarchical semi-supervised classification tasks, the presence of such unanticipated classes can cause semantic drift for seeded classes. The Exploratory learning method was proposed to solve this problem; however, it is limited to flat classification tasks. This paper develops exploratory learning methods for hierarchical classification tasks. We experimented with subsets of the NELL ontology and text, and HTML table datasets derived from the ClueWeb09 corpus. Our method (OptDAC-ExploreEM) outperforms the existing Exploratory EM method, and its naive extension (DAC-ExploreEM), in terms of seed class F1 on average by 10% and 7% respectively.

    • Vanderbilt 2016
      Amitai Etzioni and Oren Etzioni

      AI programs make numerous decisions on their own, lack transparency, and may change frequently. Hence, the article shows, unassisted human agents — such as auditors, accountants, inspectors, and police — cannot ensure that AI-guided instruments will abide by the law. Human agents need the assistance of AI oversight programs that analyze and oversee the operational AI programs. The article then asks whether operational AI programs should be programmed to enable human users to override them without undermining the legal order. The article next points out that AI operational programs provide very high surveillance capacities, and that AI oversight programs are hence essential for protecting individual rights in the cyber age. The article closes by discussing the argument that AI-guided instruments, e.g., robots, endanger much more than the legal order — that they may turn on their makers, or even destroy humanity.

    • ICCV 2015
      Hamid Izadinia, Fereshteh Sadeghi, Santosh Divvala, Hanna Hajishirzi, Yejin Choi, and Ali Farhadi

      We introduce Segment-Phrase Table (SPT), a large collection of bijective associations between textual phrases and their corresponding segmentations. Leveraging recent progress in object recognition and natural language semantics, we show how we can successfully build a high-quality segment-phrase table using minimal human supervision. More importantly, we demonstrate the unique value unleashed by this rich bimodal resource, for both vision as well as natural language understanding. First, we show that fine-grained textual labels facilitate contextual reasoning that helps in satisfying semantic constraints across image segments. This feature enables us to achieve state-of-the-art segmentation results on benchmark datasets. Next, we show that the association of high-quality segmentations to textual phrases aids in richer semantic understanding and reasoning of these textual phrases. Leveraging this feature, we motivate the problem of visual entailment and visual paraphrasing, and demonstrate its utility on a large dataset.

    • EMNLP 2015
      Minjoon Seo, Hannaneh Hajishirzi, Ali Farhadi, Oren Etzioni, and Clint Malcolm

      This paper introduces GeoS, the first automated system to solve unaltered SAT geometry questions by combining text understanding and diagram interpretation. We model the problem of understanding geometry questions as submodular optimization, and identify a formal problem description likely to be compatible with both the question text and diagram. GeoS then feeds the description to a geometric solver that attempts to determine the correct answer. In our experiments, GeoS achieves a 49% score on official SAT questions, and a score of 61% on practice questions. Finally, we show that by integrating textual and visual information, GeoS boosts the accuracy of dependency and semantic parsing of the question text.

    • AAAI • Workshop on Scholarly Big Data 2015
      Christopher Clark and Santosh Divvala

      Identifying and extracting figures and tables along with their captions from scholarly articles is important both as a way of providing tools for article summarization, and as part of larger systems that seek to gain deeper, semantic understanding of these articles. While many "off-the-shelf" tools exist that can extract embedded images from these documents, e.g. PDFBox, Poppler, etc., these tools are unable to extract tables, captions, and figures composed of vector graphics. Our proposed approach analyzes the structure of individual pages of a document by detecting chunks of body text, and locates the areas wherein figures or tables could reside by reasoning about the empty regions within that text. This method can extract a wide variety of figures because it does not make strong assumptions about the format of the figures embedded in the document, as long as they can be differentiated from the main article's text. Our algorithm also demonstrates a caption-to-figure matching component that is effective even in cases where individual captions are adjacent to multiple figures. Our contribution also includes methods for leveraging particular consistency and formatting assumptions to identify titles, body text and captions within each article. We introduce a new dataset of 150 computer science papers along with ground truth labels for the locations of the figures, tables and captions within them. Our algorithm achieves 96% precision at 92% recall when tested against this dataset, surpassing previous state of the art. We release our dataset, code, and evaluation scripts on our project website for enabling future research.
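
      A toy geometric sketch of the empty-region idea (vertical extents only, no PDF parsing): gaps between detected body-text chunks become candidate figure/table regions, and each caption is matched to the nearest candidate. Real pages need full 2-D boxes, column handling, and the consistency heuristics described above; the coordinates below are made up.

      ```python
      def candidate_regions(text_chunks, page_height, min_gap=50):
          """text_chunks: (top, bottom) y-extents of body-text blocks; returns empty bands."""
          bounds = [(0, 0)] + sorted(text_chunks) + [(page_height, page_height)]
          gaps = []
          for (_, prev_bottom), (next_top, _) in zip(bounds, bounds[1:]):
              if next_top - prev_bottom >= min_gap:
                  gaps.append((prev_bottom, next_top))
          return gaps

      def match_captions(captions, regions):
          """captions: {label: y-position}; assign each caption to the closest empty band."""
          def dist(y, region):
              top, bottom = region
              return 0 if top <= y <= bottom else min(abs(y - top), abs(y - bottom))
          return {label: min(regions, key=lambda r: dist(y, r)) for label, y in captions.items()}

      chunks = [(90, 300), (380, 700)]                 # y-extents of two body-text blocks
      regions = candidate_regions(chunks, page_height=792)
      print(regions)                                   # page margins plus the 300-380 band
      print(match_captions({"Figure 1": 360}, regions))
      ```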

    • Proceedings of IAAI 2015
      Peter Clark

      While there has been an explosion of impressive, data-driven AI applications in recent years, machines still largely lack a deeper understanding of the world to answer questions that go beyond information explicitly stated in text, and to explain and discuss those answers. To reach this next generation of AI applications, it is imperative to make faster progress in areas of knowledge, modeling, reasoning, and language. Standardized tests have often been proposed as a driver for such progress, with good reason: Many of the questions require sophisticated understanding of both language and the world, pushing the boundaries of AI, while other questions are easier, supporting incremental progress. In Project Aristo at the Allen Institute for AI, we are working on a specific version of this challenge, namely having the computer pass Elementary School Science and Math exams. Even at this level there is a rich variety of problems and question types, the most difficult requiring significant progress in AI. Here we propose this task as a challenge problem for the community, and are providing supporting datasets. Solutions to many of these problems would have a major impact on the field so we encourage you: Take the Aristo Challenge!

    • AAAI • Workshop on Scholarly Big Data 2015
      Marco Valenzuela, Vu Ha, and Oren Etzioni

      We introduce the novel task of identifying important citations in scholarly literature, i.e., citations that indicate that the cited work is used or extended in the new effort. We believe this task is a crucial component in algorithms that detect and follow research topics and in methods that measure the quality of publications. We model this task as a supervised classification problem at two levels of detail: a coarse one with two classes (important vs. non-important), and a more detailed one with four importance classes. We annotate a dataset of approximately 450 citations with this information, and release it publicly. We propose a supervised classification approach that addresses this task with a battery of features that range from citation counts to where the citation appears in the body of the paper, and show that our approach achieves a precision of 65% at a recall of 90%.
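
      A hypothetical sketch of the coarse two-class setup with a couple of the feature types mentioned (how often the work is cited in the body and where the citations appear); the feature values, labels, and classifier below are made up and far simpler than the paper's feature battery.

      ```python
      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      # Each row: [citations to the work in the body text,
      #            1 if cited in a method/result section else 0,
      #            relative position of the first mention in the paper].
      X = np.array([
          [5, 1, 0.10], [1, 0, 0.05], [7, 1, 0.30], [1, 0, 0.02],
          [3, 1, 0.50], [2, 0, 0.04], [6, 1, 0.20], [1, 0, 0.90],
      ])
      y = np.array([1, 0, 1, 0, 1, 0, 1, 0])          # 1 = important, 0 = incidental

      clf = LogisticRegression()
      print(cross_val_score(clf, X, y, cv=4).mean())   # toy cross-validated accuracy
      ```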

    • NAACL 2015
      Rebecca Sharp, Peter Jansen, Mihai Surdeanu, and Peter Clark

      Monolingual alignment models have been shown to boost the performance of question answering systems by "bridging the lexical chasm" between questions and answers. The main limitation of these approaches is that they require semi-structured training data in the form of question-answer pairs, which is difficult to obtain in specialized domains or low-resource languages. We propose two inexpensive methods for training alignment models solely using free text, by generating artificial question-answer pairs from discourse structures. Our approach is driven by two representations of discourse: a shallow sequential representation, and a deep one based on Rhetorical Structure Theory. We evaluate the proposed model on two corpora from different genres and domains: one from Yahoo! Answers and one from the biology domain, and two types of non-factoid questions: manner and reason. We show that these alignment models trained directly from discourse structures imposed on free text improve performance considerably over an information retrieval baseline and a neural network language model trained on the same data.
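
      A hypothetical sketch of the shallow, sequential variant of the idea: split sentences at causal discourse markers and treat the clause before the marker as the question side and the clause after it as the answer side of an artificial pair; such pairs would then feed an alignment model. The marker list, splitting rule, and examples are illustrative only.

      ```python
      import re

      CAUSAL_MARKERS = ("because", "therefore", "as a result")

      def artificial_qa_pairs(text):
          pairs = []
          for sentence in re.split(r"(?<=[.!?])\s+", text):
              for marker in CAUSAL_MARKERS:
                  if marker in sentence.lower():
                      left, right = re.split(marker, sentence, maxsplit=1, flags=re.IGNORECASE)
                      pairs.append((left.strip(" ,"), right.strip(" .")))  # (question side, answer side)
                      break
          return pairs

      text = ("Plants wilt because they lose water faster than their roots absorb it. "
              "The dam was reinforced, therefore the valley did not flood.")
      print(artificial_qa_pairs(text))
      ```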
