Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

Back to Square One: Bias Detection, Training and Commonsense Disentanglement in the Winograd Schema

Yanai Elazar, Hongming Zhang, Yoav Goldberg, Dan Roth

2021

EMNLP

The Winograd Schema (WS) has been proposed as a test for measuring commonsense capabilities of models. Recently, pre-trained language model-based approaches have boosted performance on some WS…

Contrastive Explanations for Model Interpretability

Alon Jacovi, Swabha Swayamdipta, Shauli Ravfogel, Yoav Goldberg

2021

EMNLP

Contrastive explanations clarify why an event occurred in contrast to another. They are more inherently intuitive to humans to both produce and comprehend. We propose a methodology to produce…

Parameter Norm Growth During Training of Transformers

William Merrill, Vivek Ramanujan, Yoav Goldberg, Noah A. Smith

2021

EMNLP

The capacity of neural networks like the widely adopted transformer is known to be very high. Evidence is emerging that they learn successfully due to inductive bias in the training routine,…

Transformer Feed-Forward Layers Are Key-Value Memories

Mor Geva, R. Schuster, Jonathan Berant, Omer Levy

2021

EMNLP

Feed-forward layers constitute two-thirds of a transformer model’s parameters, yet their role in the network remains underexplored. We show that feed-forward layers in transformer-based language…

Value-aware Approximate Attention

Ankit Gupta, Jonathan Berant

2021

EMNLP

Following the success of dot-product attention in Transformers, numerous approximations have been recently proposed to address its quadratic complexity with respect to the input length. However, all…

What's in your Head? Emergent Behaviour in Multi-Task Transformer Models

Mor Geva, Uri Katz, Aviv Ben-Arie, Jonathan Berant

2021

EMNLP

The primary paradigm for multi-task training in natural language processing is to represent the input with a shared pre-trained language model, and add a small, thin network (head) per task. Given…

Finding needles in a haystack: Sampling Structurally-diverse Training Sets from Synthetic Data for Compositional Generalization

Inbar Oren, Jonathan Herzig, Jonathan Berant

2021

EMNLP

Modern semantic parsers suffer from two principal limitations. First, training requires expensive collection of utterance-program pairs. Second, semantic parsers fail to generalize at test time to…

COVR: A test-bed for Visually Grounded Compositional Generalization with real images

Ben Bogin, Shivanshu Gupta, Matt Gardner, Jonathan Berant

2021

EMNLP

While interest in models that generalize at test time to new compositions has risen in recent years, benchmarks in the visually-grounded domain have thus far been restricted to synthetic images. In…

Question Decomposition with Dependency Graphs

Matan Hasson, Jonathan Berant

2021

AKBC

QDMR is a meaning representation for complex questions, which decomposes questions into a sequence of atomic steps. While state-of-the-art QDMR parsers use the common sequence-to-sequence (seq2seq)…

Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies

Mor Geva, Daniel Khashabi, Elad Segal, Jonathan Berant

2021

TACL

A key limitation in current datasets for multi-hop reasoning is that the required steps for answering the question are mentioned in it explicitly. In this work, we introduce STRATEGYQA, a question…
