About AllenNLP
The AllenNLP team envisions language-centered AI that equitably serves humanity. We work to improve NLP systems' performance and accountability, and advance scientific methodologies for evaluating and understanding those systems. We deliver high-impact research of our own and masterfully engineered open-source tools to accelerate NLP research around the world.
Featured Software
AllenNLP Library
A natural language processing platform for building state-of-the-art models. A complete platform for solving natural language processing tasks in PyTorch.
AI2 Tango
A Python library for choreographing your machine learning research. Construct machine learning experiments out of repeatable, reusable steps.
Recent Papers
Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection
Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi, Noah A. Smith
NAACL • 2022
Warning: this paper discusses and contains content that is offensive or upsetting. The perceived toxicity of language can vary based on someone’s identity and beliefs, but this variation is often ignored when collecting toxic language datasets, resulting in…
Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Lavinia Dunagan, Jacob Morrison, Alexander R. Fabbri, Yejin Choi, Noah A. Smith
NAACL • 2022
Natural language processing researchers have identified limitations of evaluation methodology for generation tasks, with new questions raised about the validity of automatic metrics and of crowdworker judgments. Meanwhile, efforts to improve generation models…
DEMix Layers: Disentangling Domains for Modular Language Modeling
Suchin Gururangan, Michael Lewis, Ari Holtzman, Noah A. Smith, Luke Zettlemoyer
NAACL • 2022
We introduce a new domain expert mixture (DEMIX) layer that enables conditioning a language model (LM) on the domain of the input text. A DEMIX layer is a collection of expert feedforward networks, each specialized to a domain, that makes the LM modular…
Few-Shot Self-Rationalization with Natural Language Prompts
Ana Marasović, Iz Beltagy, Doug Downey, Matthew E. Peters
Findings of NAACL • 2022
Self-rationalization models that predict task labels and generate free-text elaborations for their predictions could enable more intuitive interaction with NLP systems. These models are, however, currently trained with a large amount of human-written free…
NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics
Ximing Lu, S. Welleck, Peter West, Liwei Jiang, Jungo Kasai, Daniel Khashabi, Ronan Le Bras, Lianhui Qin, Youngjae Yu, Rowan Zellers, Noah A. Smith, Yejin Choi
NAACL • 2022
The dominant paradigm for neural text generation is left-to-right decoding from autoregressive language models. Constrained or controllable generation under complex lexical constraints, however, requires foresight to plan ahead feasible future paths. Drawing…
Recent Datasets
Qasper
Question Answering on Research Papers
A dataset containing 1585 papers with 5049 information-seeking questions asked by regular readers of NLP papers, and answered by a separate set of NLP practitioners.
IIRC: A Dataset of Incomplete Information Reading Comprehension Questions
13K reading comprehension questions on Wikipedia paragraphs that require following links in those paragraphs to other Wikipedia pages.
IIRC is a crowdsourced dataset of information-seeking questions that require models to identify and then retrieve necessary information missing from the original context. Each original context is a paragraph from English Wikipedia that comes with a set of links to other Wikipedia pages. Answering a question requires finding the appropriate links to follow and retrieving the relevant information from the linked pages.
ZEST: ZEroShot learning from Task descriptions
ZEST is a benchmark for zero-shot generalization to unseen NLP tasks, with 25K labeled instances across 1,251 different tasks.
ZEST tests whether NLP systems can perform unseen tasks in a zero-shot way, given a natural language description of the task. It is an instantiation of our proposed framework "learning from task descriptions". The tasks include classification, typed entity extraction and relationship extraction, and each task is paired with 20 different annotated (input, output) examples. ZEST's structure allows us to systematically test whether models can generalize in five different ways.
MOCHA
A benchmark for training and evaluating generative reading comprehension metrics.
Posing reading comprehension as a generation problem provides a great deal of flexibility, allowing for open-ended questions with few restrictions on possible answers. However, progress is impeded by existing generation metrics, which rely on token overlap and are agnostic to the nuances of reading comprehension. To address this, we introduce a benchmark for training and evaluating generative reading comprehension metrics: MOdeling Correctness with Human Annotations. MOCHA contains 40K human judgement scores on model outputs from 6 diverse question answering datasets and an additional set of minimal pairs for evaluation. Using MOCHA, we train an evaluation metric: LERC, a Learned Evaluation metric for Reading Comprehension, to mimic human judgement scores.
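To illustrate why token overlap can be "agnostic to the nuances of reading comprehension", here is a minimal sketch of a SQuAD-style token-overlap F1 score. It is not MOCHA or LERC code: the function name `token_f1`, the whitespace tokenization, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two lowercased, whitespace-tokenized strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers match; one empty answer scores zero.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example answers, not taken from any dataset.
reference = "he was tired"
paraphrase = "the man felt exhausted"  # same meaning, no shared tokens
contradiction = "he was not tired"     # opposite meaning, heavy overlap

print(token_f1(paraphrase, reference))     # 0.0
print(token_f1(contradiction, reference))  # about 0.857
```

In this toy case the contradicting answer outscores the correct paraphrase, which is the kind of failure that a metric trained on human judgement scores is meant to reduce.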
Recent Press
Why Historical Language Is a Challenge for Artificial Intelligence
November 16, 2021
The curse of neural toxicity: AI2 and UW researchers help computers watch their language
March 6, 2021
Green AI
November 18, 2020
Your favorite A.I. language tool is toxic
September 29, 2020
Deep Learning’s Climate Change Problem
June 17, 2020
Why are so many AI systems named after Muppets?
December 11, 2019
ICS Partnership with AI2 Leads to a New Toolkit and Best Demo Paper Award
November 19, 2019
AI/NLP Research Partnership with Allen Institute for AI
September 30, 2019
Podcasts
NLP Highlights
NLP Highlights is AllenNLP’s podcast for discussing recent and interesting work related to natural language processing. Hosts from the AllenNLP team at AI2 offer short discussions of papers and occasionally interview authors about their work.
You can also find NLP Highlights on Apple Podcasts, Spotify, PlayerFM, or Stitcher.