Videos
See AI2's full collection of videos on our YouTube channel.
- October 28, 2020 | Daniel Khashabi | A study of the cost-efficiency of local perturbations for model training.
- October 13, 2020 | DeepLearning.AI | Heroes of NLP is a video interview series featuring Andrew Ng, the founder of DeepLearning.AI, in conversation with thought leaders in NLP. Watch Andrew lead an enlightening discussion of how these industry and academic experts started in AI, their previous and current research projects, how their understanding of AI has changed through the decades, and what advice they can offer learners of NLP. This interview features Andrew Ng and Oren Etzioni, CEO of the Allen Institute for AI.
- October 1, 2020 | Stanford HAI | In this latest Directors’ Conversation, HAI Denning Family Co-director John Etchemendy’s guest is Oren Etzioni, Allen Institute for Artificial Intelligence CEO, company founder, and professor of computer science. Here the two discuss the language prediction model GPT-3, a better approach to an AI Turing test, and the real signs that we’re approaching AGI.
- July 13, 2020 | Daniel Khashabi | A survey of hypothesis assessment tools in NLP and how they compare. Further details can be found in the paper 'Not All Claims are Created Equal: Choosing the Right Approach to Assess Your Hypotheses'. https://www.semanticscholar.org/paper/Not-All-Claims-are-Created-Equal%3A-Choosing-the-to-Azer-Khashabi/ac6ce24b5d0bde9ba220120dac40d2ddc69458b7
- July 6, 2020 | Micah Schlain | Micah Schlain discusses the work on syntactic search happening at AI2 Israel. Check out our system: https://allenai.github.io/spike/
- June 18, 2020 | John Wieting | Representation learning has had a tremendous impact in machine learning and natural language processing (NLP), especially in recent years. Learned representations provide useful features needed for downstream tasks, allowing models to incorporate knowledge from billions of tokens of text. The result is better performance and generalization on many important problems of interest. This talk focuses on the problem of learning paraphrastic representations for units of language spanning from sub-words to full sentences – the latter being a focal point. Our primary goal is to learn models that can encode arbitrary word sequences into a vector with the property that sequences with similar semantics are near each other in the learned vector space, and that this property transfers across domains. We first show several simple, but effective, models to learn word and sentence representations on noisy paraphrases automatically extracted from bilingual corpora. These models outperform contemporary models on a variety of semantic evaluations. We then propose techniques to enable deep networks to learn effective semantic representations, addressing a limitation of our prior work. We also automatically construct a large paraphrase corpus that improves the performance of all our studied models, especially those using deep architectures, and has found uses for a variety of generation tasks such as paraphrase generation and style-transfer. We next propose models for multilingual paraphrastic sentence representations. Again, we first propose a simple and effective approach that outperforms more complicated methods on cross-lingual sentence similarity and mining bitext. We then propose a generative model that concentrates semantic information into a single interlingua representation and pushes information responsible for linguistic variation to separate language-specific representations. We show that this model has improved performance on both monolingual and cross-lingual tasks over prior work and successfully disentangles these two sources of information. Finally, we apply our representations to the task of fine-tuning neural machine translation systems using minimum risk training. The conventional approach is to use BLEU (Papineni et al., 2002), since that is commonly used for evaluation. However, we found that using an embedding model to evaluate similarity allows the range of possible scores to be continuous and, as a result, introduces fine-grained distinctions between similar translations. The result is better performance on both human evaluations and BLEU score, along with faster convergence during training.
- April 30, 2020 | Forough Arabshahi | Humans possess impressive problem solving and reasoning capabilities, be it mathematical, logical or commonsense reasoning. Computer scientists have long had the dream of building machines with similar reasoning and problem solving abilities as humans. Currently, there are three main challenges in realizing this dream. First, the designed system should be able to extrapolate and reason in scenarios that are much harder than what it has seen before. Second, the system’s decisions/actions should be interpretable, so that humans can easily verify if the decisions are due to reasoning skills or artifacts/sparsity in data. Finally, even if the decisions are easily interpretable, the system should include some way for the user to efficiently teach the correct reasoning when it makes an incorrect decision. In this talk, I will discuss how we can address these challenges using instructable neuro-symbolic reasoning systems. Neuro-symbolic systems bridge the gap between two major directions in artificial intelligence research: symbolic systems and neural networks. We will see how these hybrid models exploit the interpretability of symbolic systems to obtain explainability. Moreover, combined with our developed neural networks, they extrapolate to harder reasoning problems. Finally, these systems can be directly instructed by humans in natural language, resulting in sample-efficient learning in data-sparse scenarios.
- January 6, 2020 | Colin Raffel | Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this talk, I will discuss our recent paper where we explored the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compared pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieved state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. I will wrap up by discussing some of our ongoing and future work on transfer learning for NLP.
- Towards AI Complete Question Answering: Combining Text-based, Unanswerable and World Knowledge Questions | December 11, 2019 | Anna Rogers | The recent explosion in question answering research produced a wealth of both reading comprehension and commonsense reasoning datasets. Combining them presents a different kind of challenge: deciding not simply whether information is present in the text, but also whether a confident guess could be made for the missing information. We present QuAIL, the first RC dataset to combine text-based, world knowledge and unanswerable questions, and to provide question type annotation that would enable diagnostics of the reasoning strategies by a given QA system. QuAIL contains 15K multi-choice questions for 800 texts in 4 domains. Crucially, it offers both general and context-specific questions, the answers for which are unlikely to be found in pretraining data of large models like BERT. We show that QuAIL poses substantial challenges to the current state-of-the-art systems, with a 30% drop in accuracy compared to the most similar existing dataset, and we discuss methodological issues in creating such datasets.
- November 20, 2019 | Naomi Saphra | Research has shown that neural models implicitly encode linguistic features, but there has been little work exploring how these encodings arise as the models are trained. I will be presenting work on the learning dynamics of neural language models from a variety of angles. Using Singular Vector Canonical Correlation Analysis to probe the evolution of syntactic, semantic, and topic representations, we find that part-of-speech is learned earlier than topic; that recurrent layers become more similar to those of a tagger during training; and that embedding layers become less similar. I will also discuss how these results connect to the compositionality of LSTM learning dynamics, with synthetic experimental evidence that language models rely on short memorized segments in learning general long-range relations between words. I will explore links between syntactic relations and mathematical properties of the interactions between words.