Papers

See AI2's Award Winning Papers

Learn more about AI2's Lasting Impact Award

Viewing 1-10 of 991 papers

ADaPT: As-Needed Decomposition and Planning with Language Models
Archiki Prasad, Alexander Koller, Mareike Hartmann, Peter Clark, Ashish Sabharwal, Mohit Bansal, Tushar KhotNAACL Findings • 2024 Large Language Models (LLMs) are increasingly being used for interactive decision-making tasks requiring planning and adapting to the environment. Recent works employ LLMs-as-agents in broadly two ways: iteratively determining the next action (iterative…
Related:
Demo Code
Evaluating In-Context Learning of Libraries for Code Generation
Arkil Patel, Siva Reddy, Dzmitry Bahdanau, Pradeep DasigiNAACL • 2024 Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability. A particularly promising area is their ability to interpret code modules from unfamiliar libraries for solving user-instructed tasks. Recent work…
Leveraging Code to Improve In-context Learning for Semantic Parsing
Ben Bogin, Shivanshu Gupta, Peter Clark, Ashish SabharwalNAACL • 2024 In-context learning (ICL) is an appealing approach for semantic parsing due to its few-shot nature and improved generalization. However, learning to parse to rare domain-specific languages (DSLs) from just a few demonstrations is challenging, limiting the…
Related:
Code
QualEval: Qualitative Evaluation for Model Improvement
Vishvak Murahari, Ameet Deshpande, Peter Clark, Tanmay Rajpurohit, Ashish Sabharwal, Karthik Narasimhan, Ashwin KalyanNAACL • 2024 Quantitative evaluation metrics have traditionally been pivotal in gauging the advancements of artificial intelligence systems, including large language models (LLMs). However, these metrics have inherent limitations. Given the intricate nature of real-world…
Related:
Code
Improving Language Models with Advantage-based Offline Policy Gradients
Ashutosh Baheti, Ximing Lu, Faeze Brahman, Ronan Le Bras, Maarten Sap, Mark O. RiedlICLR • 2024 Language Models (LMs) achieve substantial language capabilities when finetuned using Reinforcement Learning with Human Feedback (RLHF). However, RLHF is an unstable and data-hungry process that continually requires new high-quality LM-generated data for…
Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs
Shashank Gupta, Vaishnavi Shrivastava, A. Deshpande, A. Kalyan, Peter Clark, Ashish Sabharwal, Tushar KhotICLR • 2024 Recent works have showcased the ability of LLMs to embody diverse personas in their responses, exemplified by prompts like 'You are Yoda. Explain the Theory of Relativity.' While this ability allows personalization of LLMs and enables human behavior…
Related:
Dataset Code
BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models
Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh HajishirziICLR • 2024 Retrieval augmentation addresses many critical problems in large language models such as hallucination, staleness, and privacy leaks. However, running retrieval-augmented language models (LMs) is slow and difficult to scale due to processing large amounts of…
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chun-yue Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng GaoICLR • 2024 Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit impressive problem-solving skills in many tasks and domains, but their ability in mathematical reasoning in visual contexts has not been systematically studied. To bridge this gap, we…
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh HajishirziICLR • 2024 Despite their remarkable capabilities, large language models (LLMs) often produce responses containing factual inaccuracies due to their sole reliance on the parametric knowledge they encapsulate. Retrieval-Augmented Generation (RAG), an ad hoc approach that…
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
Sewon Min, Suchin Gururangan, Eric Wallace, Hannaneh Hajishirzi, Noah A. Smith, Luke ZettlemoyerICLR • 2024 The legality of training language models (LMs) on copyrighted or otherwise restricted data is under intense debate. However, as we show, model performance significantly degrades if trained only on low-risk text (e.g., out-of-copyright books or government…

1
2
3
•••
100

Natural Language Processing

Computer Vision

AI for the Environment

Experimentation and Communication

Research

Research

Papers

ADaPT: As-Needed Decomposition and Planning with Language Models

Evaluating In-Context Learning of Libraries for Code Generation

Leveraging Code to Improve In-context Learning for Semantic Parsing

QualEval: Qualitative Evaluation for Model Improvement

Improving Language Models with Advantage-based Offline Policy Gradients

Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs

BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore