Peter Clark


Dr. Peter Clark is a Senior Research Director and founding member of the Allen Institute for AI (AI2), and also served as its Interim CEO from 2022 to 2023. He leads AI2’s Aristo Project, a team of 15 people developing AI agents that can systematically reason, explain, and continually improve over time, particularly in the context of scientific discovery. He received his Ph.D. in 1991 and has worked in AI for over 30 years. He has published over 250 papers and has received several awards, including four Best Paper awards (AAAI, EMNLP x 2, AKBC), a Boeing Associate Technical Fellowship (2004), and Senior Membership of AAAI.

Semantic Scholar · Google Scholar · Contact


  • Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions

    Peter Clark, Oren Etzioni, Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Turney. AAAI | 2016

    What capabilities are required for an AI system to pass standard 4th Grade Science Tests? Previous work has examined the use of Markov Logic Networks (MLNs) to represent the requisite background knowledge and interpret test questions, but did not improve upon an information retrieval (IR) baseline. In this paper, we describe an alternative approach.

  • Semantic Role Labeling for Process Recognition Questions

    S. Louvan, C. Naik, V. Lynn, A. Arun, N. Balasubramanian, P. Clark. Proc. 1st Int Workshop on Capturing Scientific Knowledge (SciKnow'15) | 2015

    We consider a 4th grade level question answering task. We focus on a subset involving recognizing instances of physical, biological, and other natural processes. Many processes involve similar entities and are hard to distinguish using simple bag-of-words representations alone. Simple semantic roles such as Input, Result, and Enabler can often capture the most critical bits of information about processes. Our QA system scores answers by aligning semantic roles in the question against the roles in the knowledge. Empirical evaluation shows that manually generated roles provide a 12% relative improvement in accuracy over a simpler bag-of-words representation. However, automatic role identification is noisy and doesn’t provide gains even with distant supervision and domain adaptation modifications to account for the limited training data. In addition, we conducted an error analysis of the QA system when using the manual roles. We find representational gaps, i.e., cases where critical information doesn’t fit into any of the current roles, as well as entailment issues that motivate deeper reasoning beyond simple role-based alignment for future work.

  • Answering Elementary Science Questions by Constructing Coherent Scenes using Background Knowledge

    Yang Li, Peter Clark. EMNLP | 2015

    Much of what we understand from text is not explicitly stated. Rather, the reader uses his/her knowledge to fill in gaps and create a coherent, mental picture or “scene” depicting what text appears to convey. The scene constitutes an understanding of the text, and can be used to answer questions that go beyond the text. Our goal is to answer elementary science questions, where this requirement is pervasive: a question will often give a partial description of a scene and ask the student about implicit information. We show that by using a simple “knowledge graph” representation of the question, we can leverage several large-scale linguistic resources to provide missing background knowledge, somewhat alleviating the knowledge bottleneck in previous approaches. The coherence of the best resulting scene, built from a question/answer-candidate pair, reflects the confidence that the answer candidate is correct, and thus can be used to answer multiple choice questions. Our experiments show that this approach outperforms competitive algorithms on several datasets tested. The significance of this work is thus to show that a simple “knowledge graph” representation allows a version of “interpretation as scene construction” to be made viable.
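
The scene-construction idea can be sketched in a few lines. This is a toy illustration, not the paper's system: the background triples, the terms, and the edge-counting coherence score are all hypothetical stand-ins for the large-scale linguistic resources and coherence model the abstract describes. Each answer candidate is merged with the question terms, and the candidate whose combined "scene" is linked by the most background-knowledge edges wins.

```python
# Toy background knowledge: (term1, relation, term2) triples.
# In the paper these would come from large-scale linguistic resources.
BACKGROUND = {
    ("sun", "causes", "evaporation"),
    ("evaporation", "produces", "water vapor"),
    ("water vapor", "forms", "clouds"),
    ("wind", "moves", "clouds"),
}

def coherence(terms):
    """Score a scene by how many background edges connect its terms."""
    terms = set(terms)
    return sum(1 for a, _, b in BACKGROUND if a in terms and b in terms)

def answer(question_terms, candidates):
    """Pick the candidate that yields the most coherent combined scene."""
    return max(candidates, key=lambda c: coherence(question_terms + [c]))

q = ["sun", "water vapor"]
# "evaporation" links to both question terms; "wind" links to neither.
best = answer(q, ["evaporation", "wind"])
```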

  • Exploring Markov Logic Networks for Question Answering

    Tushar Khot, Niranjan Balasubramanian, Eric Gribkoff, Ashish Sabharwal, Peter Clark, Oren Etzioni. EMNLP | 2015

    Elementary-level science exams pose significant knowledge acquisition and reasoning challenges for automatic question answering. We develop a system that reasons with knowledge derived from textbooks, represented in a subset of first-order logic. Automatic extraction, while scalable, often results in knowledge that is incomplete and noisy, motivating use of reasoning mechanisms that handle uncertainty. Markov Logic Networks (MLNs) seem a natural model for expressing such knowledge, but the exact way of leveraging MLNs is by no means obvious. We investigate three ways of applying MLNs to our task. First, we simply use the extracted science rules directly as MLN clauses and exploit the structure present in hard constraints to improve tractability. Second, we interpret science rules as describing prototypical entities, resulting in a drastically simplified but brittle network. Our third approach, called Praline, uses MLNs to align lexical elements as well as define and control how inference should be performed in this task. Praline demonstrates a 15% accuracy boost and a 10x reduction in runtime as compared to other MLN-based methods, and comparable accuracy to word-based baseline approaches.

  • Higher-order Lexical Semantic Models for Non-factoid Answer Reranking

    Daniel Fried, Peter Jansen, Gustave Hahn-Powell, Mihai Surdeanu, Peter Clark. Transactions of the Association for Computational Linguistics | 2015

    Lexical semantic models provide robust performance for question answering, but, in general, can only capitalize on direct evidence seen during training. For example, monolingual alignment models acquire term alignment probabilities from semi-structured data such as question-answer pairs; neural network language models learn term embeddings from unstructured text. All this knowledge is then used to estimate the semantic similarity between question and answer candidates. We introduce a higher-order formalism that allows all these lexical semantic models to chain direct evidence to construct indirect associations between question and answer texts, by casting the task as the traversal of graphs that encode direct term associations. Using a corpus of 10,000 questions from Yahoo! Answers, we experimentally demonstrate that higher-order methods are broadly applicable to alignment and language models, across both word and syntactic representations. We show that an important criterion for success is controlling for the semantic drift that accumulates during graph traversal. All in all, the proposed higher-order approach improves five out of the six lexical semantic models investigated, with relative gains of up to +13% over their first-order variants.
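
A minimal sketch of the higher-order idea follows. The graph and its weights are invented for illustration: direct term-association probabilities form a graph, indirect association between two terms accumulates over short paths, and a per-hop damping factor (a crude stand-in for the paper's semantic-drift control) penalizes longer chains.

```python
def higher_order_association(graph, source, target, max_hops=2, damping=0.5):
    """Sum path products of direct association probabilities up to max_hops,
    damping each additional hop to limit semantic drift."""
    total = 0.0
    frontier = [(source, 1.0, 0)]  # (node, accumulated weight, hops so far)
    while frontier:
        node, weight, hops = frontier.pop()
        for nbr, p in graph.get(node, {}).items():
            w = weight * p * (damping ** hops)  # longer chains count less
            if nbr == target:
                total += w
            elif hops + 1 < max_hops:
                frontier.append((nbr, w, hops + 1))
    return total

# Toy direct-association graph: graph[a][b] = P(b associated with a).
graph = {
    "rain": {"cloud": 0.6, "wet": 0.4},
    "cloud": {"storm": 0.5},
    "wet": {"storm": 0.2},
}
# "rain" and "storm" have no direct edge, but associate through "cloud"/"wet":
# 0.6*0.5*0.5 + 0.4*0.2*0.5 = 0.19
score = higher_order_association(graph, "rain", "storm")
```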

  • Learning Knowledge Graphs for Question Answering through Conversational Dialog

    Ben Hixon, Peter Clark, Hannaneh Hajishirzi. NAACL | 2015

    We describe how a question-answering system can learn about its domain from conversational dialogs. Our system learns to relate concepts in science questions to propositions in a fact corpus, stores new concepts and relations in a knowledge graph (KG), and uses the graph to solve questions. We are the first to acquire knowledge for question-answering from open, natural language dialogs without a fixed ontology or domain model that predetermines what users can say. Our relation-based strategies complete more successful dialogs than a query expansion baseline, our task-driven relations are more effective for solving science questions than relations from general knowledge sources, and our method is practical enough to generalize to other domains.

  • Spinning Straw into Gold: Using Free Text to Train Monolingual Alignment Models for Non-factoid Question Answering

    Rebecca Sharp, Peter Jansen, Mihai Surdeanu, Peter Clark. NAACL | 2015

    Monolingual alignment models have been shown to boost the performance of question answering systems by "bridging the lexical chasm" between questions and answers. The main limitation of these approaches is that they require semi-structured training data in the form of question-answer pairs, which is difficult to obtain in specialized domains or low-resource languages. We propose two inexpensive methods for training alignment models solely using free text, by generating artificial question-answer pairs from discourse structures. Our approach is driven by two representations of discourse: a shallow sequential representation, and a deep one based on Rhetorical Structure Theory. We evaluate the proposed model on two corpora from different genres and domains: one from Yahoo! Answers and one from the biology domain, and two types of non-factoid questions: manner and reason. We show that these alignment models trained directly from discourse structures imposed on free text improve performance considerably over an information retrieval baseline and a neural network language model trained on the same data.
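
The artificial-pair idea can be illustrated with a single shallow discourse marker. This is a hypothetical simplification of the shallow sequential representation the abstract mentions (the pattern and wording are ours, not the paper's): a sentence containing "because" is split into an effect and a cause, yielding one reason-type question-answer pair for alignment training.

```python
def qa_from_discourse(sentence):
    """Turn 'X because Y' into an artificial (question, answer) training pair.
    Returns None when the discourse marker is absent."""
    if " because " in sentence:
        effect, cause = sentence.split(" because ", 1)
        # The generated question is deliberately crude; alignment training
        # only needs the lexical pairing, not fluent English.
        return (f"Why {effect.strip().lower()}?", cause.strip())
    return None

pair = qa_from_discourse("The plant died because it received no water")
no_pair = qa_from_discourse("The plant died overnight")
```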

  • Elementary School Science and Math Tests as a Driver for AI: Take the Aristo Challenge!

    Peter Clark. Proceedings of IAAI | 2015

    While there has been an explosion of impressive, data-driven AI applications in recent years, machines still largely lack a deeper understanding of the world to answer questions that go beyond information explicitly stated in text, and to explain and discuss those answers. To reach this next generation of AI applications, it is imperative to make faster progress in areas of knowledge, modeling, reasoning, and language. Standardized tests have often been proposed as a driver for such progress, with good reason: Many of the questions require sophisticated understanding of both language and the world, pushing the boundaries of AI, while other questions are easier, supporting incremental progress. In Project Aristo at the Allen Institute for AI, we are working on a specific version of this challenge, namely having the computer pass Elementary School Science and Math exams. Even at this level there is a rich variety of problems and question types, the most difficult requiring significant progress in AI. Here we propose this task as a challenge problem for the community, and are providing supporting datasets. Solutions to many of these problems would have a major impact on the field so we encourage you: Take the Aristo Challenge!

  • Automatic Construction of Inference-Supporting Knowledge Bases

    Peter Clark, Niranjan Balasubramanian, Sumithra Bhakthavatsalam, Kevin Humphreys, Jesse Kinkead, Ashish Sabharwal, Oyvind Tafjord. AKBC | 2014

    While there has been tremendous progress in automatic database population in recent years, most of human knowledge does not naturally fit into a database form. For example, knowledge that "metal objects can conduct electricity" or "animals grow fur to help them stay warm" requires a substantially different approach to both acquisition and representation. This kind of knowledge is important because it can support inference e.g., (with some associated confidence) if an object is made of metal then it can conduct electricity; if an animal grows fur then it will stay warm. If we want our AI systems to understand and reason about the world, then acquisition of this kind of inferential knowledge is essential. In this paper, we describe our work on automatically constructing an inferential knowledge base, and applying it to a question-answering task. Rather than trying to induce rules from examples, or enter them by hand, our goal is to acquire much of this knowledge directly from text. Our premise is that much inferential knowledge is written down explicitly, in particular in textbooks, and can be extracted with reasonable reliability. We describe several challenges that this approach poses, and innovative, partial solutions that we have developed. Finally we speculate on the longer-term evolution of this work.

  • Modeling Biological Processes for Reading Comprehension

    Jonathan Berant, Vivek Srikumar, Pei-Chun Chen, Brad Huang, Christopher D. Manning, Abby Vander Linden, Brittany Harding, Peter Clark. EMNLP | 2014

    Machine reading calls for programs that read and understand text, but most current work only attempts to extract facts from redundant web-scale corpora. In this paper, we focus on a new reading comprehension task that requires complex reasoning over a single document. The input is a paragraph describing a biological process, and the goal is to answer questions that require an understanding of the relations between entities and events in the process. To answer the questions, we first predict a rich structure representing the process in the paragraph. Then, we map the question to a formal query, which is executed against the predicted structure. We demonstrate that answering questions via predicted structures substantially improves accuracy over baselines that use shallower representations.

  • Discourse Complements Lexical Semantics for Non-factoid Answer Reranking

    Jansen, P., Surdeanu, M., Clark, P. ACL | 2014

    We propose a robust answer reranking model for non-factoid questions that integrates lexical semantics with discourse information, driven by two representations of discourse: a shallow representation centered around discourse markers, and a deep one based on Rhetorical Structure Theory. We evaluate the proposed model on two corpora from different genres and domains: one from Yahoo! Answers and one from the biology domain, and two types of non-factoid questions: manner and reason. We experimentally demonstrate that the discourse structure of non-factoid answers provides information that is complementary to lexical semantic similarity between question and answer, improving performance up to 24% (relative) over a state-of-the-art model that exploits lexical semantic similarity alone. We further demonstrate excellent domain transfer of discourse information, suggesting these discourse features have general utility to non-factoid question answering.

  • A Study of the Knowledge Base Requirements for Passing an Elementary Science Test

    Clark, P., Harrison, P., Balasubramanian, N. AKBC, Workshop on Automatic KB Construction | 2013

    Our long-term interest is in machines that contain large amounts of general and scientific knowledge, stored in a "computable" form that supports reasoning and explanation. As a medium-term focus for this, our goal is to have the computer pass a fourth-grade science test, anticipating that much of the required knowledge will need to be acquired semi-automatically. This paper presents the first step towards this goal, namely a blueprint of the knowledge requirements for an early science exam, and a brief description of the resources, methods, and challenges involved in the semi-automatic acquisition of that knowledge. The result of our analysis suggests that as well as fact extraction from text and statistically driven rule extraction, three other styles of automatic knowledge-base construction (AKBC) would be useful: acquiring definitional knowledge, direct "reading" of rules from texts that state them, and, given a particular representational framework (e.g., qualitative reasoning), acquisition of specific instances of those models from text (e.g., specific qualitative models).

  • Learning Biological Processes with Global Constraints

    Berant, J., Manning, C., Clark, P., Harding, B., Lewis, J. EMNLP | 2013

    Biological processes are complex phenomena involving a series of events that are related to one another through various relationships. Systems that can understand and reason over biological processes would dramatically improve the performance of semantic applications involving inference such as question answering (QA) — specifically "How?" and "Why?" questions. In this paper, we present the task of process extraction, in which events within a process and the relations between the events are automatically extracted from text. We represent processes by graphs whose edges describe a set of temporal, causal and co-reference event-event relations, and characterize the structural properties of these graphs (e.g., the graphs are connected). Then, we present a method for extracting relations between the events, which exploits these structural properties by performing joint inference over the set of extracted relations. On a novel dataset containing 148 descriptions of biological processes (released with this paper), we show significant improvement comparing to baselines that disregard process structure.

  • Extracting Meronyms for a Biology Knowledge Base Using Distant Supervision

    Ling, X., Weld, D., Clark, P. AKBC | 2013

    Knowledge of objects and their parts, meronym relations, are at the heart of many question-answering systems, but manually encoding these facts is impractical. Past researchers have tried hand-written patterns, supervised learning, and bootstrapped methods, but achieving both high precision and recall has proven elusive. This paper reports on a thorough exploration of distant supervision to learn a meronym extractor for the domain of college biology. We introduce a novel algorithm, generalizing the "at least one" assumption of multi-instance learning to handle the case where a fixed (but unknown) percentage of bag members are positive examples. Detailed experiments compare strategies for mention detection, negative example generation, leveraging out-of-domain meronyms, and evaluate the benefit of our multi-instance percentage model.

  • Semi-Markov Phrase-based Monolingual Alignment

    Yao, X., Van Durme, B., Callison-Burch, C., Clark, P. EMNLP | 2013

    We introduce a novel discriminative model for phrase-based monolingual alignment using a semi-Markov CRF. Our model achieves state-of-the-art alignment accuracy on two phrase-based alignment datasets (RTE and paraphrase), while doing significantly better than other strong baselines in both non-identical alignment and phrase-only alignment. Additional experiments highlight the potential benefit of our alignment model to RTE, paraphrase identification and question answering, where even a naive application of our model's alignment score approaches the state of the art.

  • A Lightweight and High Performance Monolingual Word Aligner

    Yao, X., Van Durme, B., Callison-Burch, C., Clark, P. ACL | 2013

    Fast alignment is essential for many natural language tasks. But in the setting of monolingual alignment, previous work has not been able to align more than one sentence pair per second. We describe a discriminatively trained monolingual word aligner that uses a Conditional Random Field to globally decode the best alignment with features drawn from source and target sentences. Using just part-of-speech tags and WordNet as external resources, our aligner gives state-of-the-art result, while being an order-of-magnitude faster than the previous best performing system.
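
As a point of reference for what such aligners build on, here is a deliberately minimal sketch (not the paper's discriminatively trained CRF, and with no POS or WordNet features): each source token is greedily linked to the first unused target token with an identical surface form, which is the single strongest feature full monolingual aligners start from.

```python
def align_exact(source, target):
    """Greedy exact-match monolingual word alignment.
    Returns a list of (source_index, target_index) links."""
    used, links = set(), []
    tgt = target.split()
    for i, s in enumerate(source.split()):
        for j, t in enumerate(tgt):
            if j not in used and s.lower() == t.lower():
                links.append((i, j))
                used.add(j)  # one-to-one: each target token linked once
                break
    return links

# Only "man" matches exactly; a real aligner would also link
# "is eating" to "eats" via lemma and semantic features.
links = align_exact("A man is eating", "The man eats food")
```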

  • Automatic Coupling of Answer Extraction and Information Retrieval

    Yao, X., Van Durme, B., Clark, P. ACL | 2013

    Information Retrieval (IR) and Answer Extraction are often designed as isolated or loosely connected components in Question Answering (QA), with repeated overengineering on IR, and not necessarily performance gain for QA. We propose to tightly integrate them by coupling automatically learned features for answer extraction to a shallow-structured IR model. Our method is very quick to implement, and significantly improves IR for QA (measured in Mean Average Precision and Mean Reciprocal Rank) by 10%-20% against an uncoupled retrieval baseline in both document and passage retrieval, which further leads to a downstream 20% improvement in QA F1.

  • Answer Extraction as Sequence Tagging with Tree Edit Distance

    Yao, X., Van Durme, B., Callison-Burch, C., Clark, P. NAACL | 2013

    Our goal is to extract answers from preretrieved sentences for Question Answering (QA). We construct a linear-chain Conditional Random Field based on pairs of questions and their possible answer sentences, learning the association between questions and answer types. This casts answer extraction as an answer sequence tagging problem for the first time, where knowledge of shared structure between question and source sentence is incorporated through features based on Tree Edit Distance (TED). Our model is free of manually created question and answer templates, fast to run (processing 200 QA pairs per second excluding parsing time), and yields an F1 of 63.3% on a new public dataset based on prior TREC QA evaluations. The developed system is open-source, and includes an implementation of the TED model that is state of the art in the task of ranking QA pairs.

  • Inquire Biology: A Textbook that Answers Questions

    Chaudhri, V., Cheng, B., Overholtzer, A., Roschelle, J., Spaulding, A., Clark, P., Greaves, M., Gunning, D. AI Magazine | 2013

    Inquire Biology is a prototype of a new kind of intelligent textbook—one that answers students’ questions, engages their interest, and improves their understanding. Inquire provides unique capabilities via a knowledge representation that captures conceptual knowledge from the textbook and uses inference procedures to answer students’ questions. Students ask questions by typing free‐form natural language queries, by entering concepts of interest, or by selecting passages of text. The system then attempts to answer the question and also generates suggested questions related to the query or selection. The questions supported by the system were chosen to be educationally useful, for example: what is the structure of X?; compare X and Y?; how does X relate to Y? In user studies, students found this question‐answering capability to be extremely useful while reading and while doing problem solving. In an initial controlled experiment, community college students using the Inquire Biology prototype outperformed students using either a hardcopy or conventional E‐book version of the same biology textbook. Additional research is needed to fully develop Inquire, but the initial prototype clearly demonstrates the promise of applying knowledge representation technology to electronic textbooks.

  • SemEval-2013 Task 7: The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge

    Dzikovska, M., Nielsen, R., Brew, C., Leacock, C., Giampiccolo, D., Bentivogli, L., Clark, P., Dagan, I., Dang, H. Proc. 7th International Workshop on Semantic Evaluation (SemEval-2013) | 2013

    We present the results of the Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge, aiming to bring together researchers in educational NLP technology and textual entailment. The task of giving feedback on student answers requires semantic inference and therefore is related to recognizing textual entailment. Thus, we offered to the community a 5-way student response labeling task, as well as 3-way and 2-way RTE-style tasks on educational data. In addition, a partial entailment task was piloted. We present and compare results from 9 participating teams, and discuss future directions.

  • Constructing a Textual KB from a Biology TextBook

    Clark, P., Harrison, P., Balasubramanian, N., Etzioni, O. AKBC, Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction | 2012

    As part of our work on building a "knowledgeable textbook" about biology, we are developing a textual question-answering (QA) system that can answer certain classes of biology questions posed by users. In support of that, we are building a "textual KB" - an assembled set of semi-structured assertions based on the book - that can be used to answer users' queries, can be improved using global consistency constraints, and can be potentially validated and corrected by domain experts. Our approach is to view the KB as systematically caching answers from a QA system, and the QA system as assembling answers from the KB, the whole process kickstarted with an initial set of textual extractions from the book text itself. Although this research is only in a preliminary stage, we summarize our progress and lessons learned to date.

  • An Entailment-Based Approach to the QA4MRE Challenge

    Clark, P., Harrison, P., Yao, X. Proc. CLEF 2012 (Conference and Labs of the Evaluation Forum) - QA4MRE Lab | 2012

    This paper describes our entry to the 2012 QA4MRE Main Task (English dataset). The QA4MRE task poses a significant challenge as the expression of knowledge in the question and answer (in the document) typically substantially differs. Ultimately, one would need a system that can perform full machine reading – creating an internal model of the document’s meaning – to achieve high performance.

  • Answering Biology Questions using Textual Reasoning

    Clark, P., Harrison, P., Balasubramanian, N. Proc. of the Pacific Northwest Regional NLP Workshop (NW-NLP 2012) | 2012

  • The Seventh PASCAL Recognizing Textual Entailment Challenge

    Bentivogli, L., Clark, P., Dagan, I., Dang, H., Giampiccolo, D. Proc. 2011 Text Analysis Conference (TAC) | 2011

    This paper presents the Seventh Recognizing Textual Entailment (RTE-7) challenge. This year’s challenge replicated the exercise proposed in RTE-6, consisting of a Main Task, in which Textual Entailment is performed on a real corpus in the Update Summarization scenario; a Main subtask aimed at detecting novel information; and a KBP Validation Task, in which RTE systems had to validate the output of systems participating in the KBP Slot Filling Task. Thirteen teams participated in the Main Task (submitting 33 runs) and 5 in the Novelty Detection Subtask (submitting 13 runs). The KBP Validation Task was undertaken by 2 participants, who submitted 5 runs. The ablation test experiment, introduced in RTE-5 to evaluate the impact of knowledge resources used by the systems participating in the Main Task and extended also to tools in RTE-6, was also repeated in RTE-7.

  • The Sixth PASCAL Recognizing Textual Entailment Challenge

    Bentivogli, L., Clark, P., Dagan, I., Dang, H., Giampiccolo, D. Proc. 2010 Text Analysis Conference (TAC) | 2010

    This paper presents the Sixth Recognizing Textual Entailment (RTE-6) challenge. This year a major innovation was introduced, as the traditional Main Task was replaced by a new task, similar to the RTE-5 Search Pilot, in which Textual Entailment is performed on a real corpus in the Update Summarization scenario. A subtask was also proposed, aimed at detecting novel information. To continue the effort of testing RTE in NLP applications, a KBP Validation Pilot Task was set up, in which RTE systems had to validate the output of systems participating in the KBP Slot Filling Task. Eighteen teams participated in the Main Task (48 submitted runs) and 9 in the Novelty Detection Subtask (22 submitted runs). As for the Pilot, 10 runs were submitted by 3 participants. Finally, the exploratory effort started in RTE-5 to perform resource evaluation through ablation tests was not only reiterated in RTE-6, but also extended to tools.

  • Project Halo Update - Progress Toward Digital Aristotle

    D. Gunning, V. Chaudhri, P. Clark, K. Barker, et al. AI Magazine | 2010

    In the Winter 2004 issue of AI Magazine, we reported Vulcan Inc.’s first step toward creating a question-answering system called “Digital Aristotle.” The goal of that first step was to assess the state of the art in applied knowledge representation and reasoning (KR&R) by asking AI experts to represent 70 pages from the advanced placement (AP) chemistry syllabus and to deliver knowledge-based systems capable of answering questions from that syllabus. This paper reports the next step toward realizing a Digital Aristotle: we present the design and evaluation results for a system called AURA, which enables domain experts in physics, chemistry, and biology to author a knowledge base and that then allows a different set of users to ask novel questions against that knowledge base. These results represent a substantial advance over what we reported in 2004, both in the breadth of covered subjects and in the provision of sophisticated technologies in knowledge representation and reasoning, natural language processing, and question answering to domain experts and novice users.

  • BLUE-Lite: A Knowledge-Based Lexical Entailment System for RTE6

    P. Clark, P. Harrison. Proceedings of 2010 Text Analysis Conference (TAC) | 2010 | Presentation Slides

    In this paper we present our RTE6 system, BLUE-Lite, and the results of experiments with it. Unlike our earlier RTE5 system, called BLUE, BLUE-Lite uses only a lexical ("bag of words") representation of the sentences. To compare lexical items, BLUE-Lite exploits linguistic and world knowledge drawn from WordNet and the DIRT paraphrase database. To take context into account, BLUE-Lite also looks in the preceding sentence (with reduced confidence) if an H word does not match T. In addition, the entailment threshold is varied between topics to account for the fact that some topics are harder to find entailments in than others. Our results show that WordNet, DIRT, and these two techniques all improved performance (producing an overall F=0.44), and also that a relatively simple baseline ("match all but one") without any of these techniques achieved a surprisingly high score (F=0.40). Finally, we discuss the role of structural information, why it is challenging to yield advantage from it (in particular in this year's challenge), but why ultimately it must be taken into account for further improvements in performance.
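
The bag-of-words entailment strategy described above can be sketched as follows. This is an illustrative toy, not BLUE-Lite itself: a small hand-made synonym table stands in for WordNet and DIRT, and a single fixed threshold stands in for the per-topic thresholds the abstract mentions. H is judged entailed when enough of its words are covered by T, directly or via a known lexical link.

```python
# Hypothetical lexical-knowledge table standing in for WordNet/DIRT links.
SYNONYMS = {"plane": {"airplane", "aircraft"}, "big": {"large"}}

def covered(h_word, t_words):
    """Is this hypothesis word matched in T, directly or via a lexical link?"""
    if h_word in t_words:
        return True
    return bool(SYNONYMS.get(h_word, set()) & t_words)

def entails(text, hypothesis, threshold=0.75):
    """Lexical entailment: fraction of H words covered by T vs. a threshold."""
    t_words = set(text.lower().split())
    h_words = hypothesis.lower().split()
    coverage = sum(covered(w, t_words) for w in h_words) / len(h_words)
    return coverage >= threshold

# All four H words are covered ("big"->"large", "plane"->"airplane").
result = entails("the large airplane landed safely", "the big plane landed")
```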

  • Exploiting Paraphrases and Deferred Sense Commitment to Interpret Questions More Reliably

    P. Clark, P. Harrison. Proc. COLING'10 | 2010

    Creating correct, semantic representations of questions is essential for applications that can use formal reasoning to answer them. However, even within a restricted domain, it is hard to anticipate all the possible ways that a question might be phrased, and engineer reliable processing modules to produce a correct semantic interpretation for the reasoner. In our work on posing questions to a biology knowledge base, we address this brittleness in two ways: First, we exploit the DIRT paraphrase database to introduce alternative phrasings of a question; Second, we defer word sense and semantic role commitment until question answering. Resulting ambiguities are then resolved by interleaving additional interpretation with question-answering, allowing the combinatorics of alternatives to be controlled and domain knowledge to guide paraphrase and sense selection. Our evaluation suggests that the resulting system is able to understand exam-style questions more reliably.

  • Machine Reading as a Process of Partial Question-Answering

    P. Clark, P. Harrison. Proc. NAACL Workshop on Formalisms and Methodology for Learning by Reading (FAM-LbR) | 2010 | Presentation Slides

    This paper explores the close relationship between question answering and machine reading, and how the active use of reasoning to answer (and in the process, disambiguate) questions can also be applied to reading declarative texts, where a substantial proportion of the text’s contents is already known to (represented in) the system. In question answering, a question may be ambiguous, and it may only be in the process of trying to answer it that the "right" way to disambiguate it becomes apparent. Similarly in machine reading, a text may be ambiguous, and may require some process to relate it to what is already known. Our conjecture in this paper is that these two processes are similar, and that we can modify a question answering tool to help "read" new text that augments existing system knowledge. Specifically, interpreting a new text T can be recast as trying to answer, or partially answer, the question "Is it true that T?", resulting in both appropriate disambiguation and connection of T to existing knowledge. Some preliminary investigation suggests this might be useful for proposing knowledge base extensions, extracted from text, to a knowledge engineer.

  • An Inference-Based Approach to Recognizing Entailment

P. Clark, P. Harrison. Proceedings of 2009 Text Analysis Conference (TAC) | 2009 | Presentation Slides

    For this year's RTE challenge we have continued to pursue a (somewhat) "logical" approach to recognizing entailment, in which our system, called BLUE (Boeing Language Understanding Engine) first creates a logic-based representation of a text T and then performs simple inference (using WordNet and the DIRT inference rule database) to try and infer a hypothesis H.

  • Large-Scale Extraction and Use of Knowledge From Text

P. Clark, P. Harrison. Proc. Fifth Int Conf on Knowledge Capture (KCap) | 2009 | Downloadable DART Database | Presentation Slides

Many AI tasks, in particular natural language processing, require a large amount of world knowledge to create expectations, assess plausibility, and guide disambiguation. However, acquiring this world knowledge remains a formidable challenge. Building on ideas by Schubert, we have developed a system called DART (Discovery and Aggregation of Relations in Text) that extracts simple, semi-formal statements of world knowledge (e.g., "airplanes can fly", "people can drive cars") from text by abstracting from a parser's output, and we have used it to create a database of 23 million propositions of this kind. An evaluation of the DART database on two language processing tasks (parsing and textual entailment) shows that it improves performance, and a human evaluation shows that over half the facts in it are considered true or partially true, rising to 70% for facts seen with high frequency. The significance of this work is two-fold: First it has created a new, publicly available knowledge resource for language processing and other data interpretation tasks, and second it provides empirical evidence of the utility of this type of knowledge, going beyond Schubert et al.'s earlier evaluations, which were based solely on human inspection of its contents.

  • Naturalness vs. Predictability: A Key Debate in Controlled Languages

P. Clark, P. Harrison, W. Murray, J. Thompson. Proc. 2009 Workshop on Controlled Natural Languages (CNL) | 2009 | Presentation Slides

    In this paper we describe two quite different philosophies used in developing controlled languages (CLs): A "naturalist" approach, in which CL interpretation is treated as a simpler form of full natural language processing; and a "formalist" approach, in which the CL interpretation is “deterministic” (context insensitive) and the CL is viewed more as an English-like formal specification language. Despite the philosophical and practical differences, we suggest that a synthesis can be made in which a deterministic core is embedded in a naturalist CL, and illustrate this with our own controlled language CPL.

  • A Study of Machine Reading from Multiple Texts

P. Clark, J. Thompson. AAAI Spring Symposium on Learning by Reading and Learning to Read | 2009 | Presentation Slides

    A system that seeks to build a semantically coherent representation from multiple texts requires (at least) three things: a representation language that is sufficiently expressive to capture the information conveyed by the text; a natural language engine that can interpret text and generate semantic representations in that language with reasonable reliability; and a knowledge integration capability that can integrate information from different texts and from background knowledge into a coherent whole. In this paper we present a case study of these requirements for interpreting four different paragraphs of text (from different sources), each describing how a two-stroke combustion engine behaves. We identify the challenges involved in meeting these requirements and how they might be addressed. One key feature that emerges is the need for extensive background knowledge to guide the interpretation, disambiguate, and fill in gaps. The resulting contribution of this paper is a deeper understanding of the overall machine reading task.

  • Recognizing Textual Entailment with Logical Inference

P. Clark, P. Harrison. Proc. of 2008 Text Analysis Conference (TAC) | 2008 | Presentation Slides

With the goal of producing explainable entailment decisions, and ultimately having the computer "understand" the sentences it is processing, we have been pursuing a (somewhat) "logical" approach to recognizing entailment. First our system performs semantic interpretation of the sentence pairs. Then, it tries to determine if the (logic for) the H sentence subsumes (i.e., is implied by) some inference-elaborated version of the T sentence, using WordNet (including logical representations of its sense definitions) and the DIRT paraphrase database as its sources of knowledge. For pairs where it can conclude or refute entailment, the system often produces explanations which appear insightful, but also sometimes produces explanations which are clearly erroneous. In this paper we present our system and illustrate its good and bad behaviors. While the good behaviors are encouraging, the primary challenges continue to be: lack of lexical and world knowledge; poor quality of existing knowledge; and limitations of using a deductive style of reasoning with imprecise knowledge. Our best scores were 56.5% (2-way task) and 48.1% (3-way task).

  • Augmenting WordNet for Deep Understanding of Text

P. Clark, C. Fellbaum, J. Hobbs, P. Harrison, J. Thompson, W. Murray. Proc. SIGSEM Symposium on Text Processing (STEP) | 2009 | Presentation Slides

    One of the big challenges in understanding text, i.e., constructing an overall coherent representation of the text, is that much information needed in that representation is unstated (implicit). Thus, in order to "fill in the gaps" and create an overall representation, language processing systems need a large amount of world knowledge, and creating those knowledge resources remains a fundamental challenge. In our current work, we are seeking to augment WordNet as a knowledge resource for language understanding in several ways: adding in formal versions of its word sense definitions (glosses); classifying the morphosemantic links between nouns and verbs; encoding a small number of "core theories" about WordNet's most commonly used terms; and adding in simple representations of scripts. Although this is still work in progress, we describe our experiences so far with what we hope will be a significantly improved resource for the deep understanding of language.

  • Boeing's NLP System and the Challenges of Semantic Representation

P. Clark, P. Harrison. Proc. SIGSEM Symposium on Text Processing (STEP) | 2008 | Presentation Slides

    We describe Boeing's NLP system, BLUE, comprising a pipeline of a parser, a logical form (LF) generator, an initial logic generator, and further processing modules. The initial logic generator produces logic whose structure closely mirrors the structure of the original text. The subsequent processing modules then perform, with somewhat limited scope, additional transformations to convert this into a more usable representation with respect to a specific target ontology, better able to support inference.

  • Using and Extending WordNet to Support Question-Answering

P. Clark, C. Fellbaum, J. Hobbs. Proc. Fourth Global WordNet Conference (GWC) | 2008

    Over the last few years there has been increased research in automated question-answering from text, including questions whose answer is implied, rather than explicitly stated, in the text. WordNet has played a central role in many such systems (e.g., 21 of the 26 teams in the recent PASCAL RTE3 challenge used WordNet), and thus WordNet is being increasingly stretched to play more semantic tasks in applications. As part of our current research, we are exploring some of the new demands which question-answering places on WordNet, and how it might be further extended to meet them. In this paper, we present some of these new requirements, and some of the extensions that we are currently making to WordNet in response.

  • Capturing and Answering Questions Posed to a Knowledge-Based System

P. Clark, J. Chaw, K. Barker, V. Chaudhri, P. Harrison, J. Fan, B. John, B. Porter, A. Spaulding, J. Thompson, P. Yeh. Proc. 4th Int Conf on Knowledge Capture (KCap) | 2007 | Presentation Slides

    As part of an ongoing project, Project Halo, our goal is to build a system capable of answering questions posed by novice users to a formal knowledge base. In our current context, the knowledge base covers selected topics in physics, chemistry, and biology, with AP (advanced high school) level examination questions. The task is challenging because the original questions are linguistically complex and are often incomplete (assume unstated knowledge), and because the users do not have prior knowledge of the system's ontology. Our solution involves two parts: a controlled language interface, in which users reformulate the original natural language questions in a simplified version of English, and a novel problem solver that can elaborate initially inadequate logical interpretations of a question by using relevant pieces of knowledge in the knowledge base. An extensive evaluation of the work in 2006 showed that this approach is feasible and that complex, multisentence questions can be posed and answered successfully, thus illustrating novel ways of dealing with the knowledge capture impedance between users and a formal knowledge base, while also revealing challenges that still remain.

  • Putting Semantics into WordNet's "Morphosemantic" Links

C. Fellbaum, A. Osherson, P. Clark. Proc. 3rd Language & Technology Conference (LTC) | 2007

To add to WordNet's contents, and specifically to aid automatic reasoning with WordNet, we classify and label the current relations among derivationally and semantically related noun-verb pairs. Manual inspection of thousands of pairs shows that the form-meaning mappings of affixes are not one-to-one and show far less regularity than expected. We determine a set of semantic relations found across a number of morphologically defined noun-verb pair classes.

  • On the Role of Lexical and World Knowledge in RTE3

P. Clark, W. R. Murray, J. Thompson, P. Harrison, J. Hobbs, C. Fellbaum. Proc. 2007 ACL-PASCAL Workshop on Textual Entailment and Paraphrasing | 2007

    To score well in RTE3, and even more so to create good justifications for entailments, substantial lexical and world knowledge is needed. With this in mind, we present an analysis of a sample of the RTE3 positive entailment pairs, to identify where and what kinds of world knowledge are needed to fully identify and justify the entailment, and discuss several existing resources and their capacity for supplying that knowledge. We also briefly sketch the path we are following to build an RTE system (Our implementation is very preliminary, scoring 50.9% at the time of RTE). The contribution of this paper is thus a framework for discussing the knowledge requirements posed by RTE and some exploration of how these requirements can be met.

  • Reading to Learn: An Investigation into Language Understanding

P. Clark, P. Harrison, J. Thompson, R. Wojcik, T. Jenkins, D. Israel. Proc. AAAI 2007 Spring Symposium on Machine Reading | 2007 | Presentation Slides

    One of the most important methods by which human beings learn is by reading. While in its full generality, the reading task is still too difficult a capability to be implemented in a computer, significant (if partial) approaches to the task are now feasible. Our goal in this project was to study issues and develop solutions for this task by working with a reduced version of the problem, namely working with text written in a simplified version of English (a Controlled Language) rather than full natural language. Our experience and results reveal that even this reduced version of the task is still challenging, and we have uncovered several major insights into this challenge. We describe our work and analysis, present a synthesis and evaluation of our work, and make several recommendations for future work in this area. Our conclusion is that ultimately, to bridge the “knowledge gap”, a pipelined approach is inappropriate, and that to address the knowledge requirements for good language understanding an iterative (bootstrapped) approach is the most promising way forward.

  • From WordNet to a Knowledge Base

P. Clark, P. Harrison, T. Jenkins, J. Thompson, R. Wojcik. Proc. AAAI 2006 Spring Symposium on Formalizing and Compiling Background Knowledge | 2006 | Presentation Slides

    At Boeing we have been attempting to use WordNet – an online lexical resource – for machine reasoning, using both its taxonomic information and other information (e.g., parts relations). While we get some leverage, it is clear that WordNet is drastically limited in the types of knowledge it contains. In this paper, we will describe our work in this area, some of our attempts to expand on WordNet, and present a vision for what we believe a future WordNet-like resource should look like – a large knowledge base with vastly richer representational structures – and the potential such a resource would offer for machine reasoning.

  • Acquiring and Using World Knowledge using a Restricted Subset of English

P. Clark, P. Harrison, T. Jenkins, J. Thompson, R. Wojcik. The 18th International FLAIRS Conference (FLAIRS) | 2005 | Presentation Slides

Many AI applications require a base of world knowledge to support reasoning. However, construction of such inference-capable knowledge bases, even if constrained in coverage, remains one of the major challenges of AI. Authoring knowledge in formal logic is too complex a task for many users, while knowledge authored in unconstrained natural language is generally too difficult for computers to understand. However, there is an intermediate position, which we are pursuing, namely authoring knowledge in a restricted subset of natural language. Our claim is that this approach hits a “sweet spot” between the former two extremes, being both usable by humans and understandable by machines. We have developed such a language (called CPL, Computer-Processable Language), an interpreter, and a reasoner, and have used them to encode approximately 1000 "commonsense" rules (a mixture of general and domain-specific). The knowledge base is being used experimentally for semantic retrieval of video clips based on their captions, also expressed in CPL. In this paper, we describe CPL, its interpretation, and its use for reasoning, and discuss the strengths and weaknesses of restricted natural language as the basis for knowledge representation.

  • A Portable Process Language

P. Clark, D. Morley, V. Chaudhri, K. Myers. Proc. ICAPS Workshop on the Role of Ontologies in AI Planning and Scheduling | 2005

    Process representation languages designed to support execution have evolved to support specialized reasoning capabilities like action selection and task decomposition, but do not readily support inferences that one might need for explanation or question answering. In this paper, we report on a process language, PPL, that we have designed to serve as a bridge between a representation designed for execution and a representation designed for applications such as question answering and explanation generation. Through its use of a propositional-style representation of process structure, PPL can enable the use of generalized reasoning methods for those purposes. PPL is novel in that it directly encodes the process "flow chart" in a neutral, KIF-like syntax, allowing other modules to introspect on the process structure.

  • Project Halo: Towards a Digital Aristotle

N. Friedland, G. Matthews, M. Witbrock, D. Baxter, J. Curtis, B. Shepard, P. Miraglia, J. Angele, S. Staab, E. Moench, H. Oppermann, D. Wenke, D. Israel, V. Chaudhri, B. Porter, K. Barker, J. Fan, J. Chaw, P. Yeh, D. Tecuci, P. Clark. AI Magazine | 2004

    Vulcan Inc.’s Project Halo is a multi-staged effort to create a Digital Aristotle, an application that will encompass much of the world's scientific knowledge and be capable of applying sophisticated problem-solving to answer novel questions. Vulcan envisions two primary roles for the Digital Aristotle: as a tutor to instruct students in the sciences, and as an interdisciplinary research assistant to help scientists in their work.

  • A Question-Answering System for AP Chemistry: Assessing KR&R Technologies

K. Barker, V. Chaudhri, S. Chaw, P. Clark, J. Fan, D. Israel, S. Mishra, B. Porter, P. Romero, D. Tecuci, P. Yeh. Proc. 9th International Conf on Knowledge Representation and Reasoning (KR) | 2004

    Basic research in knowledge representation and reasoning (KR&R) has steadily advanced over the years, but it has been difficult to assess the capability of fielded systems derived from this research. In this paper, we present a knowledge-based question-answering system that we developed as part of a broader effort by Vulcan Inc. to assess KR&R technologies, and the result of its assessment. The challenge problem presented significant new challenges for knowledge representation, compared with earlier such assessments, due to the wide variability of question types that the system was expected to answer. Our solution integrated several modern KR&R technologies, in particular semantically well-defined frame systems, automatic classification methods, reusable ontologies, a methodology for knowledge base construction, and a novel extension of methods for explanation generation. The resulting system exhibited high performance, achieving scores for both accuracy and explanation which were comparable to human performance on similar tests. While there are qualifications to this result, it is a significant achievement and an informative data point about the state of the art in KR&R, and reflects significant progress by the field.

  • Graph-Based Acquisition of Expressive Knowledge

V. Chaudhri, K. Murray, J. Pacheco, P. Clark, B. Porter, P. Hayes. Proc. European Knowledge Acquisition Workshop (EKAW'04) | 2004

    Capturing and exploiting knowledge is at the heart of several important problems such as decision making, the semantic web, and intelligent agents. The captured knowledge must be accessible to subject matter experts so that the knowledge can be easily extended, queried, and debugged. In our previous work to meet this objective, we created a knowledge authoring system based on graphical assembly from components that allowed acquisition of an interestingly broad class of axioms. In this paper, we explore the question: can we expand the axiom classes acquired by building on our existing graphical methods and still retain simplicity so that people with minimal training in knowledge representation can use it? Specifically, we present techniques used to capture ternary relations, classification rules, constraints, and if-then rules.

  • Towards a Quantitative, Platform-Independent Analysis of Knowledge Systems

N. Friedland, P. Allen, M. Witbrock, G. Matthews, N. Salay, P. Miraglia, J. Angele, S. Staab, D. Israel, V. Chaudhri, B. Porter, K. Barker, P. Clark. Proc. 9th International Conf on Knowledge Representation and Reasoning (KR'04) | 2004

The Halo Pilot, a six-month effort to evaluate the state-of-the-art in applied Knowledge Representation and Reasoning (KRR) systems, collaboratively developed a taxonomy of failures with the goal of creating a common framework of metrics against which we could measure inter- and intra-system failure characteristics of each of the three Halo knowledge applications. This platform-independent taxonomy was designed with the intent of maximizing its coverage of potential failure types; providing the necessary granularity and precision to enable clear categorization of failure types; and providing a productive framework for short and longer term corrective action.

  • A Knowledge-Driven Approach to Text Meaning Processing

P. Clark, P. Harrison, J. Thompson. Proc. of the HLT Workshop on Text Meaning Processing | 2003 | Presentation Slides

    Our goal is to be able to answer questions about text that go beyond facts explicitly stated in the text, a task which inherently requires extracting a “deep” level of meaning from that text. Our approach treats meaning processing fundamentally as a modeling activity, in which a knowledge base of common-sense expectations guides interpretation of text, and text suggests which parts of the knowledge base might be relevant. In this paper, we describe our ongoing investigations to develop this approach into a usable method for meaning processing.

  • Knowledge-Driven Text Interpretation and Question-Answering: Some Current Activities at Boeing Mathematics and Computing Technology

P. Clark, P. Harrison, T. Jenkins, J. Thompson, R. Wojcik. Proc. AAAI Spring Symposium on New Directions in Question Answering | 2003

At Boeing, we are currently developing methods for knowledge-driven text interpretation and question answering, based on matching analyzed input text against strong, background expectations from a knowledge base (Clark et al. 2002). Our goal is to be able to answer questions about text which go beyond facts explicitly stated in the text. For example, the statement (1) “China launched a meteorological satellite into orbit Wednesday” suggests that (among other things) there was a rocket launch; China probably owns the satellite; the satellite is for monitoring weather; the orbit is around Earth; etc., although none of these facts are explicit in the text.

  • Enabling Domain Experts to Convey Questions to a Machine: A Modified, Template-Based Approach

P. Clark, V. Chaudhri, S. Mishra, J. Thomere, K. Barker, B. Porter. 2nd International Conference on Knowledge Capture (KCap'03) | 2003 | Presentation Slides

    In order for a knowledge capture system to be effective, it needs to not only acquire general domain knowledge from experts, but also capture the specific problem-solving scenarios and questions which those experts are interested in solving using that knowledge. For some tasks, this latter aspect of knowledge capture is straightforward. In other cases, in particular for systems aimed at a wide variety of tasks, the question-posing aspect of knowledge capture can be a challenge in its own right. In this paper, we present the approach we have developed to address this challenge, based on the creation of a catalog of domain-independent question types and the extension of question template methods with graphical tools. Our goal was that domain experts could directly convey complex questions to a machine, in a form which it could then reason with. We evaluated the resulting system over several weeks, and in this paper we report some important lessons learned from this evaluation, revealing several interesting strengths and weaknesses of the approach.

  • A Semantic Infosphere

M. Uschold, P. Clark, F. Dickey, C. Fung, S. Smith, S. Uczekaj, M. Wilke, S. Bechhofer, I. Horrocks. 2nd International Semantic Web Conference (ISWC'03) | 2003 | Presentation Slides

We describe a prototype implementation of a semantic filtering capability added to an existing XML-based publish and subscribe infrastructure. An ontology is used to provide vocabulary for expressing both 1) the semantic annotations that characterize the published documents and 2) the subscriptions specifying the class of documents to be routed to a given client. A description logic (DL) classifier is used to determine which subscribers an incoming document is routed to. We outline the key elements of the ontology for the battlefield domain and give some sample annotations and subscriptions. This is the basis for describing a number of scenarios showing how this filtering capability could be used in practice. We critically analyze the suitability of a DL language and reasoner in general, and the particular implementation choices (DAML+OIL, FaCT and OilEd) for performing this task. A key result of the work is to demonstrate the importance of testing semantics-based technologies on practical problems. We discovered a number of new and interesting areas for future work, which in turn can direct the focus of the research community.

  • A Knowledge Acquisition Tool for Course of Action Analysis

K. Barker, J. Blythe, G. Borchardt, V. Chaudhri, P. Clark, P. Cohen, J. Fitzgerald, K. Forbus, Y. Gil, B. Katz, J. Kim, G. King, S. Mishra, C. Morrison, K. Murray, C. Otstott, B. Porter, R. Schrag, T. Uribe, J. Usher, P. Yeh. Proc. 5th Innovative Applications of Artificial Intelligence (IAAI'03) | 2003

    We present the novel application of a general-purpose knowledge-based system, SHAKEN, to the specific task of acquiring knowledge for military Course of Action (COA) analysis. We show how SHAKEN can capture and reuse expert knowledge for COA critiquing, which can then be used to produce high-level COA assessments through declarative inference and simulation. The system has been tested and evaluated by domain experts, and we report on the results. The generality of the approach makes it applicable to task analysis and knowledge capture in other domains. The primary objective of this work is to demonstrate the application of the knowledge acquisition technology to the task of COA analysis. Developing a system deployable in an operational environment is the subject of future work.

  • Knowledge Patterns

P. Clark, J. Thompson, and B. Porter. In Handbook of Ontologies, eds: S. Staab, R. Studer, Berlin: Springer | 2003

    When building a knowledge base, one frequently repeats similar versions of general theories in multiple, more specific theories. For example, when building the Botany Knowledge Base [Porter et al., 1988], we embedded a theory of production in representations of photosynthesis, mitosis, growth, and many other botanical processes. Typically, a general theory is incorporated into more specific ones by an inheritance mechanism. However, this works poorly in two situations: when the general theory applies to a specific theory in more than one way, and when only a selected portion of the general theory is applicable.

  • A Knowledge-Rich Approach to Understanding Text about Aircraft Systems

P. Clark, L. Duncan, H. Holmback, T. Jenkins, J. Thompson. Proceedings of Human Language Technologies Conference (HLT'02) | 2002 | Presentation Slides

As part of a longer-term goal to construct an aerospace knowledge base (KB), we are developing techniques for interpreting text about aircraft systems and then adding those interpretations to the KB. A major challenge in this task is that much of what is written builds on unstated, shared, general knowledge about aircraft, and such prior knowledge is needed to fully understand the text. To address this challenge, we are using a more general KB about aircraft to create strong, prior expectations about what might be stated in that text, then treating the language understanding task as one of incrementally extending and refining that prior knowledge. The KB constrains the possible interpretations of the text, allowing it to be placed in the appropriate context and helping identify when statements can be taken literally or need to be coerced or modified to be understood correctly. In this paper we present this approach and discuss its underlying assumptions and range of applicability. The significance of this work is twofold: It illustrates the critical role background knowledge plays in fully understanding language, and it provides a simple model for how that understanding can take place, based on the iterative refinement of a representation using information extracted from text.

  • A Web-based Ontology Browsing and Editing System

J. Thomere, K. Barker, V. Chaudhri, P. Clark, M. Eriksen, S. Mishra, B. Porter, A. Rodriguez. Proceedings of IAAI'02 | 2002

    Making logic-based AI representations accessible to ordinary users has been an ongoing challenge for the successful deployment of knowledge bases. Past work to meet this objective has resulted in a variety of ontology editing tools and task-specific knowledge-acquisition methods. In this paper, we describe a Web-based ontology browsing and editing system with the following features: (a) well-organized English-like presentation of concept descriptions and (b) use of graphs to enter concept relationships, add/delete lists, and analogical correspondences. No existing tool supports these features. The system is Web-based and its user interface uses a mixture of HTML and Java. It has undergone significant testing and evaluation in the context of a real application.

  • The Many Faces of the Semantic Web

P. Clark, M. Uschold. IEEE Intelligent Systems | 2002

  • Knowledge Entry as the Graphical Assembly of Components

P. Clark, J. Thompson, K. Barker, B. Porter, V. Chaudhri, A. Rodriguez, J. Thomere, S. Mishra, Y. Gil, P. Hayes, T. Reichherzer. Proc. 1st Int Conf on Knowledge Capture (K-Cap'01) | 2001 | Presentation Slides

    Despite some successes, the lack of tools to allow subject matter experts to directly enter, query, and debug formal domain knowledge in a knowledge-base still remains a major obstacle to their deployment. Our goal is to create such tools, so that a trained knowledge engineer is no longer required to mediate the interaction. This paper presents our work on the knowledge entry part of this overall knowledge capture task, which is based on several claims: that users can construct representations by connecting pre-fabricated, representational components, rather than writing low-level axioms; that these components can be presented to users as graphs; and the user can then perform composition through graph manipulation operations. To operationalize this, we have developed a novel technique of graphical dialog using examples of the component concepts, followed by an automated process for generalizing the user’s graphically-entered assertions into axioms. We present these claims, our approach, the system (called SHAKEN) that we are developing, and an evaluation of our progress based on having users encode knowledge using the system.

  • A Library of Generic Concepts for Composing Knowledge Bases

K. Barker, B. Porter, P. Clark. Proc. 1st Int Conf on Knowledge Capture (K-Cap'01) | 2001

    Building a knowledge base for a given domain traditionally involves a subject matter expert and a knowledge engineer. One of the goals of our research is to eliminate the knowledge engineer. There are at least two ways to achieve this goal: train domain experts to write axioms (i.e., turn them into knowledge engineers) or create tools that allow users to build knowledge bases without having to write axioms. Our strategy is to create tools that allow users to build knowledge bases through instantiation and assembly of generic knowledge components from a small library.

  • Representing Roles and Purpose

J. Fan, K. Barker, B. Porter, P. Clark. Proc. 1st Int Conf on Knowledge Capture (K-Cap'01) | 2001

    Ontology designers often distinguish Entities (things that are) from Events (things that happen). It is not obvious how this division admits Roles (things that are, but only in the context of things that happen). For example, Person might be considered an Entity, while Employee is a Role. A Person remains a Person independent of the Events in which he participates. Someone is an Employee only by virtue of participating in an Employment Event. The problem of how to represent Roles is not new, but there is little consensus on a solution. In this paper, we present an ontology that finds a place for Roles as well as a representation that allows Roles to be related to Entities and Events to express the teleological notion of purpose.

  • Exploiting a Thesaurus-Based Semantic Net for Knowledge-Based Search

    P. Clark, J. Thompson, H. Holmback, L. DuncanProc. 12th Conf on Innovative Applications of AI (AAAI/IAAI'2000) | 2000 | Presentation Slides

    With the growth of on-line information, the need for better resource location services is growing rapidly. A popular goal is to conduct search in terms of concepts, rather than words; however, this approach is frequently thwarted by the high up-front cost of building an adequate ontology (conceptual vocabulary) in the first place. In this paper we describe a knowledge-based Expert Locator application (for identifying human experts relevant to a particular problem or interest), which addresses this issue by using a large, pre-built, technical thesaurus as an initial ontology, combined with simple AI techniques of search, subsumption computation, and language processing. The application has been deployed and in use in our local organization since June, 1999, and a second, larger application was deployed in March 2000. We present the Expert Locator and the AI techniques it uses, and then we evaluate and discuss the application. The significance of this work is that it demonstrates how years of work by library science in thesaurus-building can be leveraged using AI methods, to construct a practical resource location service in a short period of time.

  • Semantic Integration of Heterogeneous Information Sources Using a Knowledge-Based System

    T. Adams, J. Dullea, P. Clark, S. Sripada, and T. BarrettProc. 5th Int Conf on CS and Informatics (CS&I'2000) | 2000

    A growing number of decision support applications require the ability to formulate ad hoc queries that access information from heterogeneous information sources. The Boeing Company recently conducted a study to investigate the feasibility of applying a high-performance knowledge base to interface the information sources to the decision support system. A commercially available knowledge-based development environment provided a number of useful features for this problem, including an expressive knowledge representation language, an efficient inference engine, and a large repository of common-sense knowledge. Our objectives were to determine if the pre-stored knowledge would significantly reduce the time required for knowledge engineering in large scale applications, to determine if the domain model was expressive enough to support the wide range of ad hoc queries typical of our applications, and to determine if the common-sense knowledge would extend the inference power beyond simply combining information from the available data sources.

  • Three Approaches for Knowledge Sharing: A Comparative Analysis

    M. Uschold, R. Jasper, and P. ClarkProc. 12th Workshop on Knowledge Acquisition, Modeling, and Management (KAW'99) | 1999

    Our broad, overall goal is to enable cost-effective sharing of design knowledge between knowledge-based engineering software systems. To achieve this, we have identified and explored three different approaches for knowledge sharing, which we present in this paper: (i) Sharing services via point-to-point translation (ii) Neutral interchange formats (iii) Neutral authoring

  • A Knowledge-Based Approach to Question-Answering

    P. Clark, J. Thompson, and B. PorterAAAI'99 Fall Symposium on Question-Answering Systems | 1999 | Presentation Slides

    Our long-term research goal is to create systems capable of answering a wide variety of questions, including questions which were unanticipated at the time the system was constructed, and questions tailored to novel scenarios which the user is interested in. Our approach is to augment on-line text with a knowledge-based question-answering component, capable of reasoning about a scenario which the user provides, and synthesizing customized answers at run-time to his/her questions.

  • KB-PHaSE: A Knowledge-Based Training Tool for a Space Experiment

    P. Clark, J. Thompson, and M. L. DittmarTechnical Report SSGTECH-98-035, Maths and Computing Technology, The Boeing Company, Seattle, WA | 1998

    This document summarizes a short (3 month) project exploring the use of knowledge-based training tools for Space Station payload experiments, and developing a demonstration prototype (called KB-PHaSE). The purpose of the knowledge-base is to augment existing training material with a 'dynamic component', whereby the user can ask questions during a training session and receive customized, situation-specific answers, generated automatically from the knowledge base. We describe the application domain, the system from the user's point of view, and the structure of the knowledge-base of the underlying application. Finally, we reflect on the benefits and challenges of this approach.

  • Mediated Information Access: Components, Technologies and Architectures

    P. Clark, F. Lochovsky, R. Speigle, A. SunTechnical Report SSGTECH-98-020, Maths and Computing Technology, The Boeing Company, Seattle, WA | 1998
  • Ontology Reuse and Application

    Mike Uschold, Mike Healy, Keith Williamson, Peter Clark, Steven WoodsProceedings of the International Conference on Formal Ontology and Information Systems - FOIS'98 (Frontiers in AI and Applications) | 1998

    In this paper, we describe an investigation into the reuse and application of an existing ontology for the purpose of specifying and formally developing software for aircraft design. Our goals were to clearly identify the processes involved in the task, and assess the cost-effectiveness of reuse. Our conclusions are that (re)using an ontology is far from an automated process, and instead requires significant effort from the knowledge engineer.

  • Building Concept Representations from Reusable Components

    P. Clark and B. PorterAAAI'97 (Best Paper Award) | 1997

    Our goal is to build knowledge-based systems capable of answering a wide variety of questions, including questions that are unanticipated when the knowledge base is built. For systems to achieve this level of competence and generality, they require the ability to dynamically construct new concept representations, and to do so in response to the questions and tasks posed to them.

  • Using Access Paths to Guide Inference with Conceptual Graphs

    P. Clark and B. PorterProc. Int Conf on Conceptual Structures - ICCS'97 (Lecture Notes in AI) | 1997 | Presentation Slides

    Conceptual Graphs (CGs) are a natural and intuitive notation for expressing first-order logic statements. However, the task of performing inference with a large-scale CG knowledge base remains largely unexplored. Although basic inference operators are defined for CGs, few methods are available for guiding their applications during automated reasoning. Given the expressive power of CGs, this can result in inference being intractable.

  • The Neutral Representation Project

    Mike Barley, Peter Clark, Keith Williamson, Steve WoodsProc. AAAI Spring Symposium on Ontological Engineering | 1997

    The evolving complexity of many modern artifacts, such as aircraft, has led to a serious fragmentation of knowledge among software systems required for their design and manufacture. In the case of aircraft design, views of the same generic design knowledge are redundantly encoded in multiple software systems, each system using its own idiosyncratic ontology, and each system containing that knowledge in an implicit, task- and vendor-specific form. This situation is expensive, due to the high cost of developing from scratch, maintaining and keeping synchronized the many systems used in design.

  • Improving Image Classification by Combining Statistical, Case-Based and Model-Based Prediction Methods

    P. Clark, C. Feng, S. Matwin, K. FungFundamenta Informaticae | 1997

    Evidence for image classification can be considered to come from two sources: traditional statistical information derived algorithmically from image data, and model-based evidence arising from previous expertise and experience in a given application domain. This paper presents a study of classification techniques based on both these sources (traditional algorithmic and model-based), and illustrates how they can be combined.

  • Building Domain Representations from Components

    P. Clark and B. PorterAI Technical Report 06-332, Dept CS, Univ. Texas at Austin | 1996

    A major cause of the knowledge-engineering bottleneck is the inability to transfer representational fragments from one knowledge base to another due to the idiosyncratic nature of the domain-specific representations. In this paper, we show that representations can be built automatically by composing abstract, reusable components. Moreover, we describe how representations of specific situations, that arise during problem solving, can be assembled 'on demand', guided by a query for a particular piece of information.

  • The KM Knowledge Representation Language

    P. Clark and B. PorterDept. CS, Univ. Texas at Austin | 1996 | Reference Manual
  • A Compositional Approach to Representing Planning Operators

    P. Clark, B. Porter, and D. BatoryAI Technical Report 06-331, Dept CS, Univ. Texas at Austin | 1996

    AI has frequently been criticized for being "stuck in the microworld" because of the common inability of AI systems to cope with the complexity of real domains. Often, adding details removes regularity, transforming a representation from a few simple structures to a large, unwieldy collection of specialized ones. This paper addresses this problem in the context of representing planning operators (domain-specific knowledge about the effects of actions in a domain) for use by AI planning systems.

  • Using qualitative models to guide inductive learning

    P. Clark and S. MatwinIn P. Utgoff, ed., Proc. Tenth Int. Machine Learning Conference (ML-93) | 1993

    This paper presents a method for using qualitative models to guide inductive learning. Our objectives are to induce rules which are not only accurate but also explainable with respect to the qualitative model, and to reduce learning time by exploiting domain knowledge in the learning process. Such explainability is essential both for practical application of inductive technology and for integrating the results of learning back into an existing knowledge base. We apply this method to two process control problems: a water tank network and an ore grinding process used in the mining industry. Surprisingly, in addition to achieving explainability, the classification accuracy of the induced rules is also increased. We show how the value of the qualitative models can be quantified in terms of their equivalence to additional training examples, and finally discuss possible extensions.

  • Learning domain theories using abstract background knowledge

    P. Clark and S. MatwinIn P. Brazdil, ed., Proc. Sixth European Conference on Machine Learning (ECML-93) | 1993

    Substantial machine learning research has addressed the task of learning new knowledge given a (possibly incomplete or incorrect) domain theory, but leaves open the question of where such domain theories originate. In this paper we address the problem of constructing a domain theory from more general, abstract knowledge which may be available.

  • The Syntax vs. Semantics Debate Revisited?

    P. ClarkIn N. Cercone and G. McCalla, eds., Computational Intelligence | 1993
  • Learning domain theories using abstract background knowledge

    P. Clark and S. MatwinTech Report TR-92-35, Dept CS, Ottawa Univ., Canada | 1992
  • Lazy partial evaluation: An integration of explanation-based generalisation and partial evaluation

    P. Clark and R. HolteIn D. Sleeman and P. Edwards, eds., Proc. Ninth Int. Machine Learning Conference (ML-92) | 1992
  • Generalised Backjumping

    P. Clark and R. HolteTech. Report TR-92-20, Dept. CS, Univ. Ottawa | 1992
  • Applications of Machine Learning

    P. Clark, B. Cestnik, C. Sammut, and J. StenderIn Y. Kodratoff, ed., Machine Learning - EWSL-91 | 1991
  • A Model of Argumentation and its Application in a Cooperative Expert System

    P. ClarkPhD thesis, Strathclyde University, Glasgow, UK | 1991
  • Rule induction with CN2: Some recent improvements

    P. Clark and R. BoswellIn Y. Kodratoff, ed., Machine Learning - EWSL-91 | 1991
  • Machine learning: Techniques and recent developments

    P. ClarkIn A. R. Mirzai, ed., Artificial Intelligence: Concepts and Applications in Engineering | 1990
  • Learning from Imperfect Data

    P. Brazdil and P. ClarkIn P. B. Brazdil and K. Konolige, eds., Machine Learning, Meta-reasoning and Logics | 1990
  • Representing knowledge as arguments: Applying expert systems technology to judgemental problem-solving

    P. ClarkIn T. R. Addis and R. M. Muir, eds., Research and Development in Expert Systems VII | 1990
  • Nonmonotonic reasoning, argumentation and machine learning

    P. ClarkTechnical Report TIMLG-38, Turing Institute, Glasgow, UK | 1990
  • A Comparison of Rule and Exemplar-based Learning Systems

    P. ClarkIn P. B. Brazdil and K. Konolige, eds., Machine Learning, Meta-reasoning and Logics | 1990
  • Exemplar-based reasoning in geological prospect appraisal

    P. ClarkTIRM-89-034, Turing Institute, Glasgow, UK | 1989
  • Knowledge representation in machine learning

    P. ClarkIn Y. Kodratoff and A. Hutchinson, eds., Machine and Human Learning | 1989
  • The CN2 Induction Algorithm

    P. Clark and T. NiblettMachine Learning | 1989
  • Representing arguments as background knowledge for constraining generalisation

    P. ClarkIn D. Sleeman, ed., Proc. Third European Working Session on Learning (EWSL-88) | 1988
  • Representing arguments as background knowledge for the justification of case-based inferences

    P. ClarkIn E. L. Rissland and J. A. King, eds., Proc. AAAI-88 Workshop on Case-Based Reasoning | 1988
  • PROTOS: A Rational Reconstruction

    P. ClarkTuring Institute Tech Report | 1987
  • Induction in Noisy Domains

    P. Clark and T. NiblettIn I. Bratko and N. Lavrac, eds., Progress in Machine Learning: Proc. 2nd European ML Conference (EWSL-87) | 1987
  • Learning if-then rules in noisy domains

    P. Clark and T. NiblettIn B. Phelps, ed., Interactions in Artificial Intelligence and Statistical Methods | 1987
  • FADES: An Expert System for Fault Analysis and Diagnosis

    C. Wood and P. ClarkTIRM 87-024, Turing Institute | 1987
  • Towards an improved domain representation for planning

    P. ClarkMaster's thesis, Edinburgh Univ., Edinburgh, UK | 1985