February 5, 2019
Abstract: The social sciences are at a crossroads. The great challenges of our time are human in nature (terrorism, climate change, the use of natural resources, and the nature of work) and require robust social science to understand their sources and consequences. Yet the lack of reproducibility and replicability evident in many fields is even more acute in the study of human behavior, both because of the difficulty of sharing confidential data and because of the lack of scientific infrastructure. Much of the core infrastructure is manual and ad hoc in nature, threatening the legitimacy and utility of social science research.
A major challenge is search and discovery. The vast majority of social science data and outputs cannot be easily discovered by other researchers, even when nominally deposited in the public domain. A new generation of automated search tools could help researchers discover how data are being used, in what research fields, with what methods, with what code, and with what findings. Automation can also be used to reward researchers who validate the results and contribute additional information about use, fields, methods, code, and findings. In sum, the use of data depends critically on knowing how they have been produced and used before. The required elements are what the data measure, what research has been done, by which researchers, with what code, and with what results.
In this presentation I describe the work that we are doing to build and develop automated tools to create the equivalent of an Amazon.com or TripAdvisor for the access and use of confidential microdata.
January 25, 2019
Time is an important dimension when we describe the world because the world is evolving over time and many facts are time-sensitive. Understanding time is thus an important aspect of natural language understanding and many applications may rely on it, e.g., information retrieval, summarization, causality, and question answering.
In this talk, I will mainly focus on a key component of this problem: temporal relation extraction. The task has long been challenging because the actual timestamps of events are rarely expressed explicitly, so their temporal order has to be inferred from lexical cues, from reading between the lines, and often from strong background knowledge. Additionally, collecting enough high-quality annotations to train machine learning algorithms for this task is difficult, which makes the task even harder to investigate. I tackled this task from three perspectives: structured learning, common sense, and data collection. Together these improved the state of the art by approximately 20% in absolute F1. My current system, CogCompTime, is available at this online demo: http://groupspaceuiuc.com/temporal/. In the future, I expect to expand my research in these directions to other core problems in AI such as incidental supervision, semantic parsing, and knowledge representation.
January 11, 2019
In this talk I will introduce a new model for encoding knowledge graphs and generating texts from them. Graphical knowledge representations are ubiquitous in computing, but pose a challenge for text generation techniques due to their non-hierarchical structure and collapsing of long-distance dependencies. Moreover, automatically extracted knowledge is noisy, and so requires that a text generation model be robust. To address these issues, I introduce a novel attention-based encoder-decoder model for knowledge-graph-to-text generation. This model extends the popular Transformer for text encoding to function over graph-structured inputs. The result is a powerful, general model for graph encoding which can incorporate global structural information when contextualizing vertices in their local neighborhoods. Through detailed automatic and human evaluations, I demonstrate the value of conditioning text generation on graph-structured knowledge, as well as the superior performance of the proposed model compared to recent work.