Videos

See AI2's full collection of videos on our YouTube channel.
  • Syntactic Search by Example – ACL 2020

    July 6, 2020  |  Micah Shlain
    Micah Shlain discusses work on syntactic search at AI2 Israel. Check out our system: https://allenai.github.io/spike/
  • Learning and Applications of Paraphrastic Representations for Natural Language

    June 18, 2020  |  John Wieting
    Representation learning has had a tremendous impact in machine learning and natural language processing (NLP), especially in recent years. Learned representations provide useful features needed for downstream tasks, allowing models to incorporate knowledge from billions of tokens of text. The result is better performance and generalization on many important problems of interest. This talk focuses on the problem of learning paraphrastic representations for units of language spanning from sub-words to full sentences – the latter being a focal point. Our primary goal is to learn models that can encode arbitrary word sequences into a vector with the property that sequences with similar semantics are near each other in the learned vector space, and that this property transfers across domains.

    We first show several simple but effective models to learn word and sentence representations on noisy paraphrases automatically extracted from bilingual corpora. These models outperform contemporary models on a variety of semantic evaluations. We then propose techniques to enable deep networks to learn effective semantic representations, addressing a limitation of our prior work. We also automatically construct a large paraphrase corpus that improves the performance of all our studied models, especially those using deep architectures, and has found use in a variety of generation tasks such as paraphrase generation and style transfer.

    We next propose models for multilingual paraphrastic sentence representations. Again, we first propose a simple and effective approach that outperforms more complicated methods on cross-lingual sentence similarity and mining bitext. We then propose a generative model that concentrates semantic information into a single interlingua representation and pushes information responsible for linguistic variation to separate language-specific representations. We show that this model has improved performance on both monolingual and cross-lingual tasks over prior work and successfully disentangles these two sources of information.

    Finally, we apply our representations to the task of fine-tuning neural machine translation systems using minimum risk training. The conventional approach is to use BLEU (Papineni et al., 2002), since that is commonly used for evaluation. However, we found that using an embedding model to evaluate similarity allows the range of possible scores to be continuous and, as a result, introduces fine-grained distinctions between similar translations. The result is better performance on both human evaluations and BLEU score, along with faster convergence during training.
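
    A minimal sketch of the core objective described above, assuming stand-in (untrained) word vectors rather than the talk's trained models: a sentence is encoded by averaging word embeddings, and semantic similarity is read off as cosine similarity. With embeddings trained on paraphrase pairs mined from bilingual corpora, paraphrases end up near each other in this space.

```python
# Toy sketch: average word vectors as a sentence encoder, cosine similarity as
# the semantic score. The random embeddings below are placeholders for vectors
# trained on paraphrase data.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "a", "feline", "rested"]
emb = {w: rng.normal(size=50) for w in vocab}

def encode(sentence: str) -> np.ndarray:
    """Average the embeddings of known words (a simple, strong baseline)."""
    vecs = [emb[w] for w in sentence.lower().split() if w in emb]
    return np.mean(vecs, axis=0)

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# With trained paraphrastic embeddings, this pair scores higher than unrelated sentences.
print(cosine(encode("the cat sat on the mat"), encode("a feline rested on a mat")))
```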
  • Neuro-symbolic Learning Algorithms for Automated Reasoning

    April 30, 2020  |  Forough Arabshahi
    Humans possess impressive problem-solving and reasoning capabilities, be it mathematical, logical or commonsense reasoning. Computer scientists have long had the dream of building machines with reasoning and problem-solving abilities similar to those of humans. Currently, there are three main challenges in realizing this dream. First, the designed system should be able to extrapolate and reason in scenarios that are much harder than what it has seen before. Second, the system’s decisions/actions should be interpretable, so that humans can easily verify whether the decisions are due to reasoning skills or to artifacts/sparsity in the data. Finally, even if the decisions are easily interpretable, the system should include some way for the user to efficiently teach the correct reasoning when it makes an incorrect decision. In this talk, I will discuss how we can address these challenges using instructable neuro-symbolic reasoning systems. Neuro-symbolic systems bridge the gap between two major directions in artificial intelligence research: symbolic systems and neural networks. We will see how these hybrid models exploit the interpretability of symbolic systems to obtain explainability. Moreover, combined with our developed neural networks, they extrapolate to harder reasoning problems. Finally, these systems can be directly instructed by humans in natural language, resulting in sample-efficient learning in data-sparse scenarios.
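
    A toy illustration (not the speaker's system) of how a neuro-symbolic evaluator pairs a readable symbolic structure with pluggable neural components: the expression tree below is the interpretable, human-correctable part, and each operator could be backed by a learned module; exact functions stand in for those modules here.

```python
# Toy neuro-symbolic evaluation: the expression tree is the symbolic,
# interpretable part; MODULES stands in for learned per-operator networks.
from dataclasses import dataclass

@dataclass
class Node:
    op: str            # "add", "mul", or "const"
    args: tuple = ()
    value: float = 0.0

MODULES = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def evaluate(node: Node) -> float:
    """Recursively evaluate the tree; the trace itself is the explanation."""
    if node.op == "const":
        return node.value
    left, right = (evaluate(a) for a in node.args)
    return MODULES[node.op](left, right)

# (2 + 3) * 4; deeper trees probe extrapolation beyond the training depth.
expr = Node("mul", (Node("add", (Node("const", value=2.0), Node("const", value=3.0))),
                    Node("const", value=4.0)))
print(evaluate(expr))  # 20.0
```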
  • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

    January 6, 2020  |  Colin Raffel
    Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this talk, I will discuss our recent paper where we explored the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compared pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieved state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. I will wrap up by discussing some of our ongoing and future work on transfer learning for NLP.
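
    A small sketch of the unified text-to-text format the talk describes: every task becomes plain input text mapped to plain target text, usually with a short task prefix, so one encoder-decoder model and one loss cover them all. The prefixes follow the paper's convention; the example sentences are made up.

```python
# Each task is recast as (input text, target text); the prefix tells the model
# which task it is solving.
examples = [
    ("translation",   "translate English to German: That is good.", "Das ist gut."),
    ("acceptability", "cola sentence: The car drived fast.", "unacceptable"),
    ("similarity",    "stsb sentence1: A man is singing. sentence2: A person sings.", "4.6"),
    ("summarization", "summarize: state authorities dispatched emergency crews tuesday ...", "authorities sent crews ..."),
]

for task, source, target in examples:
    print(f"{task:>13}: {source!r} -> {target!r}")
```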
  • Towards AI Complete Question Answering: Combining Text-based, Unanswerable and World Knowledge Questions

    December 11, 2019  |  Anna Rogers
    The recent explosion in question answering research has produced a wealth of both reading comprehension and commonsense reasoning datasets. Combining them presents a different kind of challenge: deciding not simply whether information is present in the text, but also whether a confident guess could be made for the missing information. We present QuAIL, the first RC dataset to combine text-based, world knowledge and unanswerable questions, and to provide question type annotation that enables diagnosing the reasoning strategies of a given QA system. QuAIL contains 15K multiple-choice questions for 800 texts in 4 domains. Crucially, it offers both general and context-specific questions, the answers to which are unlikely to be found in the pretraining data of large models like BERT. We show that QuAIL poses substantial challenges to the current state-of-the-art systems, with a 30% drop in accuracy compared to the most similar existing dataset, and we discuss methodological issues in creating such datasets.
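
    A minimal sketch of what the question-type annotation supports: scoring a multiple-choice QA system with an accuracy breakdown per reasoning type. The field names and type labels below are illustrative, not QuAIL's exact schema.

```python
# Per-question-type accuracy for a multiple-choice QA system (toy predictions).
from collections import defaultdict

examples = [
    {"question_type": "text_based",      "gold": 1, "prediction": 1},
    {"question_type": "world_knowledge", "gold": 0, "prediction": 2},
    {"question_type": "unanswerable",    "gold": 3, "prediction": 3},
]

hits, totals = defaultdict(int), defaultdict(int)
for ex in examples:
    totals[ex["question_type"]] += 1
    hits[ex["question_type"]] += int(ex["prediction"] == ex["gold"])

for qtype, total in totals.items():
    print(f"{qtype}: {hits[qtype] / total:.2f} accuracy")
```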
  • Learning Dynamics of LSTM Language Models

    November 20, 2019  |  Naomi Saphra
    Research has shown that neural models implicitly encode linguistic features, but there has been little work exploring how these encodings arise as the models are trained. I will be presenting work on the learning dynamics of neural language models from a variety of angles. Using Singular Vector Canonical Correlation Analysis to probe the evolution of syntactic, semantic, and topic representations, we find that part-of-speech is learned earlier than topic, and that during training the recurrent layers become more similar to those of a tagger while the embedding layers become less similar. I will also discuss how these results connect to the compositionality of LSTM learning dynamics, with synthetic experimental evidence that language models rely on short memorized segments in learning general long-range relations between words. I will explore links between syntactic relations and mathematical properties of the interactions between words.
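
    A compact sketch of the SVCCA probe mentioned above (Raghu et al., 2017), assuming two activation matrices recorded over the same inputs: each matrix is reduced with an SVD, then canonical correlation analysis measures how similar the reduced subspaces are. Random activations stand in for a real model's layers.

```python
# SVCCA sketch: SVD-reduce each (neurons x datapoints) activation matrix,
# then report the mean canonical correlation between the reduced views.
import numpy as np

def svcca(acts1: np.ndarray, acts2: np.ndarray, keep: int = 20, eps: float = 1e-8) -> float:
    def top_directions(acts):
        acts = acts - acts.mean(axis=1, keepdims=True)
        u, s, vt = np.linalg.svd(acts, full_matrices=False)
        return np.diag(s[:keep]) @ vt[:keep]          # (keep, datapoints)

    x, y = top_directions(acts1), top_directions(acts2)
    n = x.shape[1]
    sxx = x @ x.T / n + eps * np.eye(x.shape[0])
    syy = y @ y.T / n + eps * np.eye(y.shape[0])
    sxy = x @ y.T / n

    def inv_sqrt(m):
        w, v = np.linalg.eigh(m)
        return v @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ v.T

    corrs = np.linalg.svd(inv_sqrt(sxx) @ sxy @ inv_sqrt(syy), compute_uv=False)
    return float(np.clip(corrs, 0.0, 1.0).mean())

rng = np.random.default_rng(0)
layer_a = rng.normal(size=(100, 500))                       # e.g. an LSTM layer early in training
layer_b = layer_a + 0.1 * rng.normal(size=(100, 500))       # the "same" layer later
print(svcca(layer_a, layer_b))                              # near 1.0 for highly similar layers
```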
  • Extremely Large Neural Memory of Unstructured Knowledge

    November 20, 2019  |  Minjoon Seo
    The web is a massive collection of mostly unstructured knowledge. My recent research has focused on organizing every piece of information in a web-scale corpus and making it readily accessible so that it can be efficiently utilized for creating a language understanding system that requires world knowledge (e.g. question answering). I approach the problem by creating an extremely large neural memory where the entire text corpus is discretized into billions of atomic pieces of information and hashed with key vectors. This allows us to random-access specific word-level information in the corpus quickly and accurately. In the first part of my talk, I will highlight the direct usage of the neural memory in factual question answering (e.g. SQuAD, Natural Questions), where I show that the “memorification” of the corpus leads to at least 100x faster inference with better accuracy. In the second part, I will discuss a more advanced usage as a future research direction that utilizes an interactive memory controller, which could hint at how we can approach language understanding tasks that need to jointly consider several different pieces of information spread over the web.
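
    A minimal sketch of the key-vector memory idea, with a toy bag-of-random-projections encoder standing in for the trained encoders (and for the approximate nearest-neighbor index a web-scale memory would need): answer phrases are stored once as key vectors, and a question becomes a query vector scored against all keys with a single inner-product sweep.

```python
# Toy phrase memory: precompute key vectors for candidate answer phrases,
# then answer a question by nearest-neighbor search over those keys.
import numpy as np

rng = np.random.default_rng(0)
DIM = 64
projection = rng.normal(size=(1000, DIM))   # stand-in for a trained text encoder

def encode(text: str) -> np.ndarray:
    vec = np.zeros(DIM)
    for token in text.lower().split():
        vec += projection[hash(token) % 1000]
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

phrases = ["Barack Obama", "Honolulu, Hawaii", "August 4, 1961", "the White House"]
keys = np.stack([encode(p) for p in phrases])        # built offline, queried online

scores = keys @ encode("Where was Barack Obama born?")
# The toy encoder only rewards lexical overlap; the trained encoders in the talk
# are what place the answer phrase ("Honolulu, Hawaii") nearest to the question.
for score, phrase in sorted(zip(scores, phrases), reverse=True):
    print(f"{score:+.3f}  {phrase}")
```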
  • Medical Question Answering: Dealing with the complexity and specificity of consumer health questions and visual questions

    November 12, 2019  |  Dr. Asma Ben Abacha
    Consumer health questions pose specific challenges to automated answering. Two of the salient aspects are their higher linguistic and semantic complexity when compared to open-domain questions, and the more pronounced need for reliable information. In this talk I will present two main approaches to dealing with this increased complexity, recognizing question entailment and question summarization, recently published in BMC Bioinformatics and at ACL 2019, respectively. In particular, our question entailment approach to question answering (QA) showed that restricting the answer sources to only reliable resources led to an improvement in QA performance, and our summarization experiments showed the relevance of data augmentation methods for abstractive question summarization. I’ll also talk about the MEDIQA shared task on question entailment, textual inference and medical question answering that we recently organized at ACL-BioNLP. In the second part of the talk, I will address more specifically questions about medications and present our latest study and dataset on medication QA. Finally, I’ll describe our recent endeavors in visual question answering (VQA) from radiology images and the medical VQA challenge (VQA-Med) editions for 2019 and 2020 that we organize as part of ImageCLEF.
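
    A minimal sketch of answering by question entailment: a new consumer-health question is matched against previously answered questions, and the answer of the best entailed question is returned. Token overlap stands in here for the trained entailment classifier, and the tiny FAQ is made up.

```python
# Question-entailment QA sketch: route a new question to the answer of the
# most strongly entailed, already-answered question.
faq = {
    "What are the side effects of ibuprofen?": "Common side effects include stomach pain and nausea.",
    "How is strep throat treated?": "Strep throat is usually treated with antibiotics.",
}

def entailment_score(question: str, faq_question: str) -> float:
    """Stand-in for a trained RQE model: Jaccard overlap of lowercased tokens."""
    a, b = set(question.lower().split()), set(faq_question.lower().split())
    return len(a & b) / len(a | b)

def answer(question: str, threshold: float = 0.2) -> str:
    best = max(faq, key=lambda fq: entailment_score(question, fq))
    if entailment_score(question, best) < threshold:
        return "No reliable answer found."
    return faq[best]

print(answer("Can ibuprofen cause side effects and what are they?"))
```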
  • Boosting innovation and discovery of ideas

    August 7, 2019  |  Tom Hope
    The explosion of available idea repositories -- scientific papers, patents, product descriptions -- represents an unprecedented opportunity to accelerate innovation and lead to a wealth of discoveries. Given the scale of the problem and its ever-expanding nature, there is a need for intelligent automation to assist in the process of discovery. In this talk, I will present our work toward addressing this challenging problem. We developed an approach for boosting people’s creativity by helping them discover analogies -- abstract structural connections between ideas. We learn to decompose innovation texts into functional models that describe the components and goals of inventions, and use them to build a search engine supporting expressive inspiration queries. In ideation studies, our inspirations helped people generate better ideas with significant improvement over standard search. We also construct a commonsense ontology of purposes and mechanisms of products, mapping the landscape of ideas. I will also describe a novel machine learning framework we developed in order to identify innovation in patents, where labels are extremely hard to obtain. In our setting, called Ballpark Learning, we are only given groups of instances with coarse constraints over label averages. We demonstrate encouraging results in classification and regression tasks across several domains.
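
    A toy sketch of the Ballpark Learning setting mentioned above, on synthetic data: no instance-level labels are available, only coarse bounds on each group's average label, and a simple logistic scorer is fit so that its per-group mean prediction falls inside those bounds.

```python
# Ballpark-style learning on synthetic data: fit instance-level scores using
# only interval constraints on group-average labels.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
y = (X @ true_w > 0).astype(float)                 # hidden labels, never seen directly

order = np.argsort(X @ true_w)
groups = [order[:100], order[100:]]                # one mostly-negative, one mostly-positive bag
bounds = [(max(y[g].mean() - 0.1, 0.0), min(y[g].mean() + 0.1, 1.0)) for g in groups]

w = np.zeros(5)
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))             # per-instance predicted probability
    grad = np.zeros(5)
    for g, (lo, hi) in zip(groups, bounds):
        mean_p = p[g].mean()
        # error is the distance of the group mean outside its allowed interval
        err = (mean_p - hi) if mean_p > hi else (mean_p - lo) if mean_p < lo else 0.0
        grad += err * ((p[g] * (1 - p[g])) @ X[g]) / len(g)
    w -= 1.0 * grad

p = 1.0 / (1.0 + np.exp(-(X @ w)))
for g, (lo, hi) in zip(groups, bounds):
    print(f"group mean prediction {p[g].mean():.2f}, allowed interval [{lo:.2f}, {hi:.2f}]")
```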
  • Extracting T cell function and differentiation characteristics

    July 23, 2019  |  Jeff Hammerbacher
    Many promising cancer immunotherapy treatment protocols rely on efficient and increasingly extensive methods for manipulating human immune cells. T cells are a frequent target of the laboratory and clinical research driving the development of such protocols, as they are most often the effectors of the cytotoxic activity that makes these treatments so potent. However, the cytokine signaling network that drives the differentiation and function of such cells is complex and difficult to replicate on a large scale in model biological systems. Abridged versions of these networks have been established over decades of research, but it remains challenging to define their global structure, as the classification of T cell subtypes operating in these networks, the mechanics of their formation, and the purpose of the signaling molecules they secrete are all controversial, with a slowly expanding understanding emerging in the literature over time. To aid in the quantification of this understanding, we are developing a methodology for identifying references to well-known cytokines, transcription factors, and T cell types in the literature, as well as classifying the relationships between the three, in an attempt to determine which cytokines initiate the transcription programs that lead to various cell states, in addition to the secretion profiles associated with those states. Entity recognition for this task is performed using SciSpacy, and classification of the relations between these entities is based on an LSTM trained using Snorkel, where weak supervision is established through a variety of classification heuristics and distant supervision is provided via previously published immunology databases.
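
    A small sketch in the spirit of the weak-supervision setup described above (dictionary entity spotting plus labeling-function-style heuristics), rather than the actual SciSpacy/Snorkel pipeline: sentences mentioning a cytokine and a T cell type receive a noisy relation label from simple pattern votes. The entity lists and patterns are illustrative.

```python
# Weak supervision sketch: spot cytokine / T-cell-type mentions, then let
# simple heuristics vote on an "induces differentiation" relation label.
import re

CYTOKINES = {"IL-6", "IL-12", "TGF-beta", "IFN-gamma"}
CELL_TYPES = {"Th17", "Th1", "Treg", "CD8+ T cell"}

def find_pairs(sentence):
    cyts = [c for c in CYTOKINES if c.lower() in sentence.lower()]
    cells = [t for t in CELL_TYPES if t.lower() in sentence.lower()]
    return [(c, t) for c in cyts for t in cells]

# Labeling functions: each votes INDUCES (1), NOT (-1), or abstains (0).
def lf_drives(sent, cyt, cell):
    return 1 if re.search(r"(drive|induce|promote)s?\b", sent, re.I) else 0

def lf_negation(sent, cyt, cell):
    return -1 if re.search(r"\b(not|no|fails? to)\b", sent, re.I) else 0

def weak_label(sent, cyt, cell):
    votes = sum(lf(sent, cyt, cell) for lf in (lf_drives, lf_negation))
    return 1 if votes > 0 else -1 if votes < 0 else 0

sentence = "IL-6 together with TGF-beta drives differentiation of naive cells into Th17 cells."
for cyt, cell in find_pairs(sentence):
    print(cyt, cell, weak_label(sentence, cyt, cell))
```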