About AllenNLP

The AllenNLP team envisions language-centered AI that equitably serves humanity. We work to improve NLP systems' performance and accountability, and advance scientific methodologies for evaluating and understanding those systems. We deliver high-impact research of our own and masterfully engineered open-source tools to accelerate NLP research around the world.

Featured Software

AI2 Tango

A Python library for choreographing your machine learning research. Construct machine learning experiments out of repeatable, reusable steps.

AllenNLP Library

A natural language processing platform for building state-of-the-art models. A complete platform for solving natural language processing tasks in PyTorch.

Recent Papers

Detection and Measurement of Syntactic Templates in Generated Text
Chantal Shaib, Yanai Elazar, Junyi Jessy Li, Byron C. Wallace • arXiv • 2024
Recent work on evaluating the diversity of text generated by LLMs has focused on word-level features. Here we offer an analysis of syntactic features to characterize general repetition in models, beyond frequent n-grams. Specifically, we define syntactic…

Evaluating n-Gram Novelty of Language Models Using Rusty-DAWG
William Merrill, Noah A. Smith, Yanai Elazar • arXiv • 2024
How novel are texts generated by language models (LMs) relative to their training corpora? In this work, we investigate the extent to which modern LMs generate n-grams from their training data, evaluating both (i) the probability LMs assign to complete…

Evaluating In-Context Learning of Libraries for Code Generation
Arkil Patel, Siva Reddy, Dzmitry Bahdanau, Pradeep Dasigi • NAACL • 2024
Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability. A particularly promising area is their ability to interpret code modules from unfamiliar libraries for solving user-instructed tasks. Recent work…

The Bias Amplification Paradox in Text-to-Image Generation
P. Seshadri, Sameer Singh, Yanai Elazar • NAACL • 2024
Bias amplification is a phenomenon in which models increase imbalances present in the training data. In this paper, we study bias amplification in the text-to-image domain using Stable Diffusion by comparing gender ratios in training vs. generated images. We…

Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation
Bar Iluz, Yanai Elazar, Asaf Yehudai, Gabriel Stanovsky • arXiv • 2024
Most works on gender bias focus on intrinsic bias -- removing traces of information about a protected group from the model's internal representation. However, these works are often disconnected from the impact of such debiasing on downstream applications…

Recent Datasets

Qasper

Question Answering on Research Papers

A dataset containing 1,585 papers with 5,049 information-seeking questions asked by regular readers of NLP papers, and answered by a separate set of NLP practitioners.

IIRC: A Dataset of Incomplete Information Reading Comprehension Questions

13K reading comprehension questions on Wikipedia paragraphs that require following links in those paragraphs to other Wikipedia pages.

IIRC is a crowdsourced dataset of information-seeking questions that require models to identify and then retrieve necessary information that is missing from the original context. Each original context is a paragraph from English Wikipedia that comes with a set of links to other Wikipedia pages. Answering the questions requires finding the appropriate links to follow and retrieving the relevant information from the linked pages.

ZEST: ZEroShot learning from Task descriptions

ZEST is a benchmark for zero-shot generalization to unseen NLP tasks, with 25K labeled instances across 1,251 different tasks.

ZEST tests whether NLP systems can perform unseen tasks in a zero-shot way, given a natural language description of the task. It is an instantiation of our proposed framework "learning from task descriptions". The tasks include classification, typed entity extraction and relationship extraction, and each task is paired with 20 different annotated (input, output) examples. ZEST's structure allows us to systematically test whether models can generalize in five different ways.

MOCHA

A benchmark for training and evaluating generative reading comprehension metrics.

Posing reading comprehension as a generation problem provides a great deal of flexibility, allowing for open-ended questions with few restrictions on possible answers. However, progress is impeded by existing generation metrics, which rely on token overlap and are agnostic to the nuances of reading comprehension. To address this, we introduce a benchmark for training and evaluating generative reading comprehension metrics: MOdeling Correctness with Human Annotations. MOCHA contains 40K human judgement scores on model outputs from 6 diverse question answering datasets and an additional set of minimal pairs for evaluation. Using MOCHA, we train an evaluation metric: LERC, a Learned Evaluation metric for Reading Comprehension, to mimic human judgement scores.
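To see why token overlap can mislead, here is a minimal Python sketch of a token-level F1 score, one common kind of overlap metric. The function name `token_f1` and the example sentences are illustrative assumptions; they are not taken from MOCHA or LERC.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between two lowercased, whitespace-tokenized strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty strings match exactly; otherwise there is no overlap.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: each shared token counts at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical answers to a question whose reference answer is given below.
reference = "he was angry at his brother"
wrong_but_similar = "he was angry at his sister"      # incorrect answer
right_but_reworded = "his brother made him furious"   # correct paraphrase

print(token_f1(wrong_but_similar, reference))
print(token_f1(right_but_reworded, reference))
```

In this sketch the incorrect answer scores higher than the correct paraphrase, because the score depends only on shared tokens and not on meaning.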

Recent Press

As AI tools get smarter, they’re growing more covertly racist, experts find

The Guardian
March 16, 2024

Chatbot AI makes racist judgements on the basis of dialect

Nature
March 13, 2024

AI chatbots use racist stereotypes even after anti-racism training

New Scientist
March 7, 2024

AI’s Climate Impact Goes beyond Its Emissions

Scientific American
December 7, 2023

Peeking Inside Pandora’s Box: Unveiling the Hidden Complexities of Language Model Datasets with ‘What’s in My Big Data’? (WIMBD)

Marktechpost
November 5, 2023

Your Personal Information Is Probably Being Used to Train Generative AI Models

Scientific American
October 19, 2023

AI Is Becoming More Powerful—but Also More Secretive

Wired
October 19, 2023

Inside the secret list of websites that make AI like ChatGPT sound smart

The Washington Post
April 19, 2023

Podcasts

NLP Highlights
NLP Highlights is AllenNLP’s podcast for discussing recent and interesting work related to natural language processing. Hosts from the AllenNLP team at AI2 offer short discussions of papers and occasionally interview authors about their work.

You can also find NLP Highlights on Apple Podcasts, Spotify, PlayerFM, or Stitcher.
