About AllenNLP

The AllenNLP team envisions language-centered AI that equitably serves humanity. We work to improve NLP systems' performance and accountability, and advance scientific methodologies for evaluating and understanding those systems. We deliver high-impact research of our own and masterfully engineered open-source tools to accelerate NLP research around the world.

Featured Software

AI2 Tango

A Python library for choreographing your machine learning research. Construct machine learning experiments out of repeatable, reusable steps.
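The idea of building an experiment from repeatable, reusable steps can be illustrated with a minimal plain-Python sketch. This is not Tango's API: `StepCache`, `tokenize`, and `count_tokens` are hypothetical names invented for illustration, and the sketch only assumes that a step's result can be cached under a key derived from the step's name and its inputs.

```python
import hashlib
import json


class StepCache:
    """Hypothetical in-memory cache of step results (not part of Tango)."""

    def __init__(self):
        self.results = {}
        self.runs = 0  # how many times a step body actually executed

    def run(self, func, **kwargs):
        # Derive a deterministic key from the step name and its arguments.
        key_source = json.dumps(
            {"step": func.__name__, "args": kwargs}, sort_keys=True
        )
        key = hashlib.sha256(key_source.encode("utf-8")).hexdigest()
        if key not in self.results:
            self.results[key] = func(**kwargs)
            self.runs += 1
        return self.results[key]


def tokenize(text):
    """A toy step: split text on whitespace."""
    return text.split()


def count_tokens(tokens):
    """A toy step that consumes the output of another step."""
    return len(tokens)


cache = StepCache()
tokens = cache.run(tokenize, text="a small reusable step")
n = cache.run(count_tokens, tokens=tokens)
n_again = cache.run(count_tokens, tokens=tokens)  # served from the cache
```

Calling a step again with identical inputs returns the stored result instead of recomputing it, which is what makes a step repeatable in this toy setting.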


AllenNLP Library

A natural language processing platform for building state-of-the-art models. A complete platform for solving natural language processing tasks in PyTorch.

  • Data-Efficient Finetuning Using Cross-Task Nearest Neighbors

    Hamish Ivison, Noah A. Smith, Hannaneh Hajishirzi, Pradeep Dasigi. ACL Findings, 2023. Language models trained on massive prompted multitask datasets like T0 (Sanh et al., 2021) or FLAN (Wei et al., 2021a) can generalize to tasks unseen during training. We show that training on a carefully chosen subset of instances can outperform training on…
  • HINT: Hypernetwork Instruction Tuning for Efficient Few- and Zero-Shot Generalisation

    Hamish Ivison, Akshita Bhagia, Yizhong Wang, Hannaneh Hajishirzi, Matthew E. Peters. ACL, 2023. Recent NLP models have shown the remarkable ability to effectively generalise 'zero-shot' to new tasks using only natural language instructions as guidance. However, many of these approaches suffer from high computational costs due to their reliance on…
  • Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and Evaluation

    Marius Mosbach, Tiago Pimentel, Shauli Ravfogel, D. Klakow, Yanai Elazar. Findings of ACL 2023. Few-shot fine-tuning and in-context learning are two alternative strategies for task adaptation of pre-trained language models. Recently, in-context learning has gained popularity over fine-tuning due to its simplicity and improved out-of-domain…
  • Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback

    Yao Fu, Hao-Chun Peng, Tushar Khot, Mirella Lapata. arXiv.org, 2023. We study whether multiple large language models (LLMs) can autonomously improve each other in a negotiation game by playing, reflecting, and criticizing. We are interested in this question because if LLMs were able to improve each other, it would imply the…
  • Complexity-Based Prompting for Multi-Step Reasoning

    Yao Fu, Hao-Chun Peng, Ashish Sabharwal, Peter Clark, Tushar Khot. ICLR, 2023. We study the task of prompting large-scale language models to perform multi-step reasoning. Existing work shows that when prompted with a chain of thoughts (CoT), sequences of short sentences describing intermediate reasoning steps towards a final answer…


Question Answering on Research Papers

A dataset containing 1,585 papers with 5,049 information-seeking questions asked by regular readers of NLP papers, and answered by a separate set of NLP practitioners.

A Dataset of Incomplete Information Reading Comprehension Questions

13K reading comprehension questions on Wikipedia paragraphs that require following links in those paragraphs to other Wikipedia pages.

IIRC is a crowdsourced dataset of information-seeking questions that require models to identify and then retrieve necessary information missing from the original context. Each original context is a paragraph from English Wikipedia that comes with a set of links to other Wikipedia pages. Answering a question requires finding the appropriate links to follow and retrieving the relevant information from the linked pages.

ZEST: ZEroShot learning from Task descriptions

ZEST is a benchmark for zero-shot generalization to unseen NLP tasks, with 25K labeled instances across 1,251 different tasks.

ZEST tests whether NLP systems can perform unseen tasks in a zero-shot way, given a natural language description of the task. It is an instantiation of our proposed framework "learning from task descriptions". The tasks include classification, typed entity extraction and relationship extraction, and each task is paired with 20 different annotated (input, output) examples. ZEST's structure allows us to systematically test whether models can generalize in five different ways.


MOCHA

A benchmark for training and evaluating generative reading comprehension metrics.

Posing reading comprehension as a generation problem provides a great deal of flexibility, allowing for open-ended questions with few restrictions on possible answers. However, progress is impeded by existing generation metrics, which rely on token overlap and are agnostic to the nuances of reading comprehension. To address this, we introduce a benchmark for training and evaluating generative reading comprehension metrics: MOdeling Correctness with Human Annotations. MOCHA contains 40K human judgement scores on model outputs from 6 diverse question answering datasets and an additional set of minimal pairs for evaluation. Using MOCHA, we train an evaluation metric: LERC, a Learned Evaluation metric for Reading Comprehension, to mimic human judgement scores.
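The weakness of token-overlap metrics described above can be seen in a small sketch. The function below is a generic token-level F1 of the kind commonly used to score question answering output. It is not LERC and is not taken from MOCHA, and the three example sentences are made up for illustration.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 on lowercased, whitespace-split answers."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers match; one empty answer does not.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Made-up examples, not drawn from any dataset.
reference = "she did not take the medicine"
wrong_answer = "she did take the medicine"  # opposite meaning
paraphrase = "she refused her pills"        # similar meaning, new words

wrong_score = token_f1(wrong_answer, reference)
paraphrase_score = token_f1(paraphrase, reference)
```

Here the answer with the opposite meaning scores about 0.91, while the paraphrase scores 0.2, so the overlap score ranks the wrong answer far above the acceptable one. A metric trained on human judgement scores is meant to close this kind of gap.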

Inside the secret list of websites that make AI like ChatGPT sound smart

The Washington Post
April 19, 2023

AI can help address climate change—as long as it doesn’t exacerbate it

Fast Company
February 15, 2023

How to Detect AI-Generated Text, According to Researchers

February 8, 2023

Could AI help you to write your next paper?

October 31, 2022

How to shrink AI’s ballooning carbon footprint

July 19, 2022

These simple changes can make AI research much more energy efficient

MIT Tech Review
July 6, 2022

Measuring AI’s Carbon Footprint

IEEE Spectrum
June 26, 2022

Why Historical Language Is a Challenge for Artificial Intelligence

November 16, 2021


  • NLP Highlights

    NLP Highlights is AllenNLP’s podcast for discussing recent and interesting work related to natural language processing. Hosts from the AllenNLP team at AI2 offer short discussions of papers and occasionally interview authors about their work.

    You can also find NLP Highlights on Apple Podcasts, Spotify, PlayerFM, or Stitcher.