Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite
AI agents hold great real-world promise, with the potential to revolutionize scientific productivity by automating literature reviews, replicating experiments, analyzing data, and even proposing new…
Data Contamination Report from the 2024 CONDA Shared Task
The 1st Workshop on Data Contamination (CONDA 2024) focuses on all relevant aspects of data contamination in natural language processing, where data contamination is understood as situations where…
OLMES: A Standard for Language Model Evaluations
Progress in AI is often demonstrated by new models claiming improved performance on tasks measuring model capabilities. Evaluating language models in particular is challenging, as small changes to…
Making Retrieval-Augmented Language Models Robust to Irrelevant Context
Retrieval-augmented language models (RALMs) hold promise to produce language understanding systems that are factual, efficient, and up-to-date. An important desideratum of RALMs is that…
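To make the setting concrete, here is a minimal Python sketch of the retrieval-augmented pattern such models follow: retrieve passages relevant to the question, then condition the language model's answer on them. The `retrieve` and `generate` callables are hypothetical stand-ins for a retriever and an LM, not an API from the paper.

```python
def ralm_answer(question, retrieve, generate, k=3):
    """Minimal retrieval-augmented generation loop (illustrative only).

    `retrieve(question, k)` is assumed to return a list of k passage
    strings; `generate(prompt)` is assumed to return the LM's answer.
    """
    passages = retrieve(question, k=k)
    context = "\n\n".join(passages)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)
```

The robustness question the paper studies arises exactly here: if `retrieve` returns irrelevant passages, the generated answer can degrade.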
Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents
Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers. Layout-infused LMs are often evaluated on…
From Centralized to Ad-Hoc Knowledge Base Construction for Hypotheses Generation
Objective: To demonstrate and develop an approach enabling individual researchers or small teams to create their own ad-hoc, lightweight knowledge bases tailored for specialized scientific interests,…
Answering Questions by Meta-Reasoning over Multiple Chains of Thought
Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer. Often, multiple…
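As a rough sketch of the multi-chain setting, the Python below samples several chains of thought and aggregates their final answers by majority vote (the common self-consistency baseline); the paper's contribution is to meta-reason over the content of the chains rather than only vote. The `sample_chain` callable is a hypothetical stand-in for a CoT-prompted LM.

```python
from collections import Counter

def multi_chain_answer(question, sample_chain, n_chains=5):
    """Sample several chains of thought and take the majority answer.

    `sample_chain(question)` is assumed to return a
    (reasoning_steps, final_answer) pair from an underlying LM.
    """
    answers = [sample_chain(question)[1] for _ in range(n_chains)]
    return Counter(answers).most_common(1)[0][0]  # most frequent answer
```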
Lexical Generalization Improves with Larger Models and Longer Training
While fine-tuned language models perform well on many tasks, they have also been shown to rely on superficial surface features such as lexical overlap. Excessive reliance on such heuristics can lead to…
Linear Adversarial Concept Erasure
We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept, in order to prevent linear predictors from recovering the concept. We model this problem as…
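For intuition, here is a simplified Python sketch of linear concept erasure: fit a linear probe for the concept and project embeddings onto the orthogonal complement of its weight direction. This single-probe projection is only a relative of the paper's method, which formulates erasure as an adversarial minimax game; the embedding matrix `X` and binary labels `y` are assumed inputs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def erase_linear_concept(X, y):
    """Project embeddings X off a learned linear concept direction.

    Fits a linear probe for the binary concept labels y, then removes
    the component of each embedding along the probe's weight vector,
    so a linear predictor can no longer use that direction.
    """
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    w = probe.coef_[0] / np.linalg.norm(probe.coef_[0])  # unit direction
    P = np.eye(X.shape[1]) - np.outer(w, w)              # projector
    return X @ P  # concept-neutralized embeddings
```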
A Dataset for N-ary Relation Extraction of Drug Combinations
Combination therapies have become the standard of care for diseases such as cancer, tuberculosis, malaria and HIV. However, the combinatorial set of available multi-drug treatments creates a…