The AllenNLP team envisions language-centered AI that equitably serves humanity. We work to improve NLP systems' performance and accountability, and to advance scientific methodologies for evaluating and understanding those systems. We deliver high-impact research of our own and masterfully engineered open-source tools to accelerate NLP research around the world.
A natural language processing platform for building state-of-the-art models: a complete platform for solving natural language processing tasks in PyTorch.
A Python library for choreographing your machine learning research: construct machine learning experiments out of repeatable, reusable steps.
- Zeqiu Wu, Michel Galley, Chris Brockett, Yizhe Zhang, Xiang Gao, Chris Quirk, Rik Koncel-Kedziorski, Jianfeng Gao, Hannaneh Hajishirzi, Mari Ostendorf, Bill Dolan. AAAI • 2022. Current end-to-end neural conversation models inherently lack the flexibility to impose semantic control in the response generation process. This control is essential to ensure that users' semantic intents are satisfied and to impose a degree of specificity…
- Jonathan Bragg, Arman Cohan, Kyle Lo, Iz Beltagy. NeurIPS • 2021. Few-shot NLP research is highly active, yet conducted in disjoint research threads with evaluation suites that lack challenging-yet-realistic testing setups and fail to employ careful experimental design. Consequently, the community does not know which…
- Felix Lau, Nishant Subramani, Sasha Harrison, Aerin Kim, E. Branson, Rosanne Liu. NeurIPS • 2021. Although state-of-the-art object detection methods have shown compelling performance, models often are not robust to adversarial attacks and out-of-distribution data. We introduce a new dataset, Natural Adversarial Objects (NAO), to evaluate the robustness of…
- Akari Asai, Xinyan Yu, Jungo Kasai, Hanna Hajishirzi. NeurIPS • 2021. We present CORA, a Cross-lingual Open-Retrieval Answer Generation model that can answer questions across many languages even when language-specific annotated data or knowledge sources are unavailable. We introduce a new dense passage retrieval algorithm that…
- Sarah Wiegreffe and Ana Marasović. NeurIPS • 2021. Explainable NLP (ExNLP) has increasingly focused on collecting human-annotated explanations. These explanations are used downstream in three ways: as data augmentation to improve performance on a predictive task, as a loss signal to train models to produce…
Recent Datasets
Question Answering on Research Papers
A dataset containing 1585 papers with 5049 information-seeking questions asked by regular readers of NLP papers, and answered by a separate set of NLP practitioners.
13K reading comprehension questions on Wikipedia paragraphs that require following links in those paragraphs to other Wikipedia pages.
IIRC is a crowdsourced dataset of information-seeking questions that require models to identify, and then retrieve, information that is missing from the original context. Each context is a paragraph from English Wikipedia accompanied by a set of links to other Wikipedia pages; answering a question requires following the appropriate links and retrieving the relevant missing information from the linked pages.
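The retrieval behavior IIRC asks for can be sketched in a few lines. The field names and example text below are illustrative assumptions for exposition, not the dataset's actual schema.

```python
# A minimal sketch of the IIRC setup, assuming a simplified schema:
# an original context paragraph, its outgoing links, a question, and
# the text of the linked pages. All content here is invented for
# illustration.
instance = {
    "context": "The ship was launched in 1912 and named after the county.",
    "links": {"the county": "Hampshire"},  # anchor text -> linked page title
    "question": "In which country is the county the ship was named after?",
    "linked_pages": {
        "Hampshire": "Hampshire is a county in South East England.",
    },
}

def retrieve_supporting_text(instance):
    """Follow each link in the context and gather the linked-page text
    that is missing from the original paragraph -- the retrieval step
    a model must perform before answering."""
    return [
        instance["linked_pages"][title]
        for title in instance["links"].values()
        if title in instance["linked_pages"]
    ]

support = retrieve_supporting_text(instance)
```

Note that the answer ("England") appears only in the retrieved text, never in the original paragraph, which is what distinguishes IIRC from single-paragraph reading comprehension.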
ZEST is a benchmark for zero-shot generalization to unseen NLP tasks, with 25K labeled instances across 1,251 different tasks.
ZEST tests whether NLP systems can perform unseen tasks in a zero-shot way, given a natural language description of the task. It is an instantiation of our proposed framework "learning from task descriptions". The tasks include classification, typed entity extraction, and relationship extraction, and each task is paired with 20 different annotated (input, output) examples. ZEST's structure allows us to systematically test whether models can generalize in five different ways.
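The "learning from task descriptions" setup above can be illustrated with a toy instance. The field names, example content, and prompt format are assumptions for illustration; ZEST's actual schema differs, and each real task carries 20 annotated examples rather than the two shown here.

```python
# Illustrative sketch of ZEST's task-description setup (all content
# invented for exposition, abbreviated to two examples).
task = {
    # A natural-language description defines the unseen task.
    "description": "Is the bird in the question found in North America?",
    # Each task is paired with annotated (input, output) examples,
    # used for evaluation rather than training.
    "examples": [
        ("Where do bald eagles nest?", "Yes"),
        ("What does the kiwi eat?", "No"),
    ],
}

def zero_shot_prompt(task, new_input):
    """Build a prompt from the task description alone -- a zero-shot
    system must generalize from the description, not from labeled
    examples of this task."""
    return f"Task: {task['description']}\nInput: {new_input}\nOutput:"

prompt = zero_shot_prompt(task, "How fast can a roadrunner run?")
```

The key constraint is that the model never trains on this task's examples; the description alone must carry the task semantics.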
A benchmark for training and evaluating generative reading comprehension metrics.
Posing reading comprehension as a generation problem provides a great deal of flexibility, allowing for open-ended questions with few restrictions on possible answers. However, progress is impeded by existing generation metrics, which rely on token overlap and are agnostic to the nuances of reading comprehension. To address this, we introduce a benchmark for training and evaluating generative reading comprehension metrics: MOdeling Correctness with Human Annotations (MOCHA). MOCHA contains 40K human judgement scores on model outputs from 6 diverse question answering datasets and an additional set of minimal pairs for evaluation. Using MOCHA, we train an evaluation metric, LERC (a Learned Evaluation metric for Reading Comprehension), to mimic human judgement scores.
Recent Press
The curse of neural toxicity: AI2 and UW researchers help computers watch their language
March 6, 2021
November 18, 2020
Your favorite A.I. language tool is toxic
September 29, 2020
Deep Learning’s Climate Change Problem
June 17, 2020
Why are so many AI systems named after Muppets?
December 11, 2019