Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite

Jonathan Bragg, Mike D'Arcy, Nishant Balepur, Daniel S. Weld
2025
arXiv

AI agents hold great real-world promise, with the potential to revolutionize scientific productivity by automating literature reviews, replicating experiments, analyzing data, and even proposing new… 

Data Contamination Report from the 2024 CONDA Shared Task

Oscar Sainz, Iker García-Ferrero, Alon Jacovi, Jinglin Yang
2024
arXiv

The 1st Workshop on Data Contamination (CONDA 2024) focuses on all relevant aspects of data contamination in natural language processing, where data contamination is understood as situations where… 

OLMES: A Standard for Language Model Evaluations

Yuling Gu, Oyvind Tafjord, Bailey Kuehl, Hanna Hajishirzi
2024
arXiv

Progress in AI is often demonstrated by new models claiming improved performance on tasks measuring model capabilities. Evaluating language models in particular is challenging, as small changes to… 

Making Retrieval-Augmented Language Models Robust to Irrelevant Context

Ori Yoran, Tomer Wolfson, Ori Ram, Jonathan Berant
2023
ICLR

Retrieval-augmented language models (RALMs) hold promise to produce language understanding systems that are factual, efficient, and up-to-date. An important desideratum of RALMs is that… 

Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents

Catherine Chen, Zejiang Shen, Dan Klein, Kyle Lo
2023
Findings of ACL

Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers. Layout-infused LMs are often evaluated on… 

From Centralized to Ad-Hoc Knowledge Base Construction for Hypotheses Generation

Shaked Launer-Wachs, Hillel Taub-Tabib, Jennie Tokarev Madem, Y. Shamay
2023
Journal of Biomedical Informatics

Objective: To demonstrate and develop an approach enabling individual researchers or small teams to create their own ad-hoc, lightweight knowledge bases tailored for specialized scientific interests,… 

Answering Questions by Meta-Reasoning over Multiple Chains of Thought

Ori Yoran, Tomer Wolfson, Ben Bogin, Jonathan Berant
2023
EMNLP

Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer. Often, multiple… 

Lexical Generalization Improves with Larger Models and Longer Training

Elron Bandel, Yoav Goldberg, Yanai Elazar
2022
Findings of EMNLP

While fine-tuned language models perform well on many tasks, they were also shown to rely on superficial surface features such as lexical overlap. Excessive utilization of such heuristics can lead to… 

Linear Adversarial Concept Erasure

Shauli Ravfogel, Michael Twiton, Yoav Goldberg, Ryan Cotterell
2022
ICML

We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept, in order to prevent linear predictors from recovering the concept. We model this problem as… 

A Dataset for N-ary Relation Extraction of Drug Combinations

Aryeh Tiktinsky, Vijay Viswanathan, Danna Niezni, Yoav Goldberg
2022
NAACL

Combination therapies have become the standard of care for diseases such as cancer, tuberculosis, malaria and HIV. However, the combinatorial set of available multi-drug treatments creates a…