Viewing 511-519 of 519 papers
- Peter Jansen, Mihai Surdeanu, and Peter ClarkACL • 2014 We propose a robust answer reranking model for non-factoid questions that integrates lexical semantics with discourse information, driven by two representations of discourse: a shallow representation centered around discourse markers, and a deep one based on Rhetorical Structure Theory. We evaluate the proposed model on two corpora from different genres and domains: one from Yahoo! Answers and one from the biology domain, and two types of non-factoid questions: manner and reason. We experimentally demonstrate that the discourse structure of nonfactoid answers provides information that is complementary to lexical semantic similarity between question and answer, improving performance up to 24% (relative) over a state-of-the-art model that exploits lexical semantic similarity alone. We further demonstrate excellent domain transfer of discourse information, suggesting these discourse features have general utility to non-factoid question answering.
- Xuchen Yao, Benjamin Van Durme, Chris Callision-Burch, and Peter ClarkACL • 2013 Fast alignment is essential for many natural language tasks. But in the setting of monolingual alignment, previous work has not been able to align more than one sentence pair per second. We describe a discriminatively trained monolingual word aligner that uses a Conditional Random Field to globally decode the best alignment with features drawn from source and target sentences. Using just part-of-speech tags and WordNet as external resources, our aligner gives state-of-the-art result, while being an order-of-magnitude faster than the previous best performing system.
- Xuchen Yao, Benjamin Van Durme, and Peter ClarkACL • 2013 Information Retrieval (IR) and Answer Extraction are often designed as isolated or loosely connected components in Question Answering (QA), with repeated overengineering on IR, and not necessarily performance gain for QA. We propose to tightly integrate them by coupling automatically learned features for answer extraction to a shallow-structured IR model. Our method is very quick to implement, and significantly improves IR for QA (measured in Mean Average Precision and Mean Reciprocal Rank) by 10%-20% against an uncoupled retrieval baseline in both document and passage retrieval, which further leads to a downstream 20% improvement in QA F1.
- Xuchen Yao, Benjamin Van Durme, Chris Callision-Burch, and Peter ClarkNAACL • 2013 Our goal is to extract answers from preretrieved sentences for Question Answering (QA). We construct a linear-chain Conditional Random Field based on pairs of questions and their possible answer sentences, learning the association between questions and answer types. This casts answer extraction as an answer sequence tagging problem for the first time, where knowledge of shared structure between question and source sentence is incorporated through features based on Tree Edit Distance (TED). Our model is free of manually created question and answer templates, fast to run (processing 200 QA pairs per second excluding parsing time), and yields an F1 of 63.3% on a new public dataset based on prior TREC QA evaluations. The developed system is open-source, and includes an implementation of the TED model that is state of the art in the task of ranking QA pairs.
- Xuchen Yao, Benjamin Van Durme, Chris Callision-Burch, and Peter ClarkEMNLP • 2013 We introduce a novel discriminative model for phrase-based monolingual alignment using a semi-Markov CRF. Our model achieves stateof-the-art alignment accuracy on two phrasebased alignment datasets (RTE and paraphrase), while doing significantly better than other strong baselines in both non-identical alignment and phrase-only alignment. Additional experiments highlight the potential benefit of our alignment model to RTE, paraphrase identification and question answering, where even a naive application of our model's alignment score approaches the state of the art.
- Xiao Ling, Dan Weld, and Peter ClarkAKBC • 2013 Knowledge of objects and their parts, meronym relations, are at the heart of many question-answering systems, but manually encoding these facts is impractical. Past researchers have tried hand-written patterns, supervised learning, and bootstrapped methods, but achieving both high precision and recall has proven elusive. This paper reports on a thorough exploration of distant supervision to learn a meronym extractor for the domain of college biology. We introduce a novel algorithm, generalizing the "at least one" assumption of multi-instance learning to handle the case where a fixed (but unknown) percentage of bag members are positive examples. Detailed experiments compare strategies for mention detection, negative example generation, leveraging out-of-domain meronyms, and evaluate the benefit of our multi-instance percentage model.
- Aju Thalappillil Scaria, Jonathan Berant, Mengqiu Wang, Christopher D. Manning, Justin Lewis, Brittany Harding, and Peter ClarkEMNLP • 2013 Biological processes are complex phenomena involving a series of events that are related to one another through various relationships. Systems that can understand and reason over biological processes would dramatically improve the performance of semantic applications involving inference such as question answering (QA) — specifically "How?" and "Why?" questions. In this paper, we present the task of process extraction, in which events within a process and the relations between the events are automatically extracted from text. We represent processes by graphs whose edges describe a set of temporal, causal and co-reference event-event relations, and characterize the structural properties of these graphs (e.g., the graphs are connected). Then, we present a method for extracting relations between the events, which exploits these structural properties by performing joint inference over the set of extracted relations. On a novel dataset containing 148 descriptions of biological processes (released with this paper), we show significant improvement comparing to baselines that disregard process structure.
- Peter Clark, Phil Harrison, and Niranjan BalasubramanianCIKM • AKBC • 2013 Our long-term interest is in machines that contain large amounts of general and scientific knowledge, stored in a "computable" form that supports reasoning and explanation. As a medium-term focus for this, our goal is to have the computer pass a fourth-grade science test, anticipating that much of the required knowledge will need to be acquired semi-automatically. This paper presents the first step towards this goal, namely a blueprint of the knowledge requirements for an early science exam, and a brief description of the resources, methods, and challenges involved in the semiautomatic acquisition of that knowledge. The result of our analysis suggests that as well as fact extraction from text and statistically driven rule extraction, three other styles of automatic knowledge-base construction (AKBC) would be useful: acquiring definitional knowledge, direct "reading" of rules from texts that state them, and, given a particular representational framework (e.g., qualitative reasoning), acquisition of specific instances of those models from text (e..g, specific qualitative models).
- Peter Clark, Phil Harrison, Niranjan Balasubramanian, and Oren EtzioniNAACL-HLT • AKBC Workshop • 2012 As part of our work on building a "knowledgeable textbook" about biology, we are developing a textual question-answering (QA) system that can answer certain classes of biology questions posed by users. In support of that, we are building a "textual KB" - an assembled set of semi-structured assertions based on the book - that can be used to answer users' queries, can be improved using global consistency constraints, and can be potentially validated and corrected by domain experts. Our approach is to view the KB as systematically caching answers from a QA system, and the QA system as assembling answers from the KB, the whole process kickstarted with an initial set of textual extractions from the book text itself. Although this research is only in a preliminary stage, we summarize our progress and lessons learned to date.