# AI2 ISRAEL

The Allen Institute for AI Israel office was founded in 2019 in Sarona, Tel Aviv. AI2's mission is to contribute to humanity through high-impact AI research and engineering.

AI2 Israel continues our mission of AI for the Common Good through groundbreaking research in natural language processing and machine learning, all in close association with the AI2 home office in Seattle, Washington.

## Our Focus

The focus of AI2 Israel is bringing people closer to information by creating and applying advanced language-centered AI. Our scientific approach combines strong linguistics-oriented foundations, state-of-the-art machine learning, and top-notch engineering with user-oriented design.

For application domains, we focus on understanding and answering complex questions, filling in commonsense gaps in text, and enabling robust extraction of structured information from text. This is an integral part of AI2’s vision of pushing the boundaries of the algorithmic understanding of human language and advancing the common good through AI.

## Team

• Yoav Goldberg, Research Director, AI2 Israel
• Ron Yachini, Chief Operating Officer, AI2 Israel
• Jonathan Berant, Research
• Yaara Cohen, Engineering
• Matan Eyal, Research & Engineering
• Tom Hope, Young Investigator
• Yael Rachmut, Operations
• Micah Shlain, Research & Engineering
• Alon Talmor, Research
• Hillel Taub-Tabib, Research & Engineering
• Reut Tsarfaty, Research

## Current Openings

AI2 Israel is a non-profit offering exceptional opportunities for researchers and engineers to develop AI for the common good. We are currently looking for outstanding software engineers and research engineers. Candidates should send their CV to: ai2israel-cv@allenai.org

## Research Areas

#### DIY Information Extraction

Data scientists have a set of tools to work with structured data in tables. But how does one extract meaning from textual data? While NLP provides some solutions, they all require expertise in either machine learning, linguistics, or both. How do we expose advanced AI and text mining capabilities to domain experts who do not know ML or CS?

#### Question Understanding

The goal of this project is to develop models that understand complex questions in broad domains, and answer them from multiple information sources. Our research revolves around investigating symbolic and distributed representations that facilitate reasoning over multiple facts and offer explanations for model decisions.

#### Missing Elements

Current natural language processing technology aims to process what is explicitly mentioned in text. But what about the elements that are left out of the text, yet are easily and naturally inferred by a human reader? Can our computer programs identify and infer such elements too? In this project, we develop benchmarks and models to endow NLP applications with this capacity.

#### AI Gamification

The goal of this project is to involve the public in the development of better AI models. We use stimulating games alongside state-of-the-art AI models to create an appealing experience for non-scientific users. We aim to improve the ways data is collected for AI training as well as surface strengths and weaknesses of current models.

## Demos

• ### SPIKE-CORD

Extractive search over CORD-19 with 3 powerful query modes | AI2 Israel, DIY Information Extraction

SPIKE-CORD is a powerful sentence-level, context-aware, and linguistically informed extractive search system for exploring the CORD-19 corpus.

Try the demo
• ### Break

Try the QDMR CopyNet parser | AI2 Israel, Question Understanding

Live demo of the QDMR CopyNet parser from the paper Break It Down: A Question Understanding Benchmark (TACL 2020). The parser receives a natural language question as input and returns its Question Decomposition Meaning Representation (QDMR). Each step in the decomposition constitutes a subquestion necessary to answer the original question. More info: https://allenai.github.io/Break/

Try the demo
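To make the QDMR format concrete, here is a minimal Python sketch. The question and its decomposition are illustrative examples written in the style of the Break paper (not taken from its data), and `referenced_steps` is a hypothetical helper that extracts the `#k` back-references linking each step to the results of earlier steps.

```python
import re

def referenced_steps(decomposition):
    """For each step, return the indices of the earlier steps it refers to via '#k'."""
    return [
        [int(m) for m in re.findall(r"#(\d+)", step)]
        for step in decomposition
    ]

# An illustrative question and a QDMR-style decomposition:
question = "How many field goals were scored in the first quarter?"
qdmr = [
    "return field goals",
    "return #1 that were scored in the first quarter",
    "return the number of #2",
]

print(referenced_steps(qdmr))  # [[], [1], [2]]
```

Each step is a subquestion over the results of earlier steps, so the reference structure forms a small dependency graph over the decomposition.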
## Recent Papers

• ### Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies

Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, Jonathan Berant. TACL, 2021.
A key limitation in current datasets for multi-hop reasoning is that the required steps for answering the question are mentioned in it explicitly. In this work, we introduce STRATEGYQA, a question answering (QA) benchmark where the required reasoning steps are implicit in the question, and should be inferred using a strategy. A fundamental challenge in this setup is how to elicit such creative questions from crowdsourcing workers, while covering a broad range of potential strategies. We propose a data collection procedure that combines term-based priming to inspire annotators, careful control over the annotator population, and adversarial filtering for eliminating reasoning shortcuts. Moreover, we annotate each question with (1) a decomposition into reasoning steps for answering it, and (2) Wikipedia paragraphs that contain the answers to each step. Overall, STRATEGYQA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs. Analysis shows that questions in STRATEGYQA are short, topic-diverse, and cover a wide range of strategies. Empirically, we show that humans perform well (87%) on this task, while our best baseline reaches an accuracy of ∼66%.
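The example structure described in the abstract can be sketched as a small Python data class. The field names are illustrative, not the released file format; the question comes from the paper's title, while the decomposition and evidence identifiers are plausible stand-ins rather than entries copied from the dataset.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class StrategyQAExample:
    """Sketch of one example: an implicit-reasoning question with its annotations."""
    question: str
    answer: bool                          # strategy questions are yes/no
    decomposition: List[str]              # reasoning steps; '#k' refers to step k
    evidence_paragraphs: List[List[str]]  # supporting paragraph ids, one list per step

example = StrategyQAExample(
    question="Did Aristotle use a laptop?",
    answer=False,
    decomposition=[
        "When did Aristotle live?",
        "When was the laptop invented?",
        "Is #2 before #1?",
    ],
    evidence_paragraphs=[["Aristotle-para1"], ["Laptop-para3"], []],
)

# Each reasoning step is paired with its evidence list.
assert len(example.decomposition) == len(example.evidence_paragraphs)
```

The point of the format is that none of the steps appear in the question itself; they must be inferred as a strategy for answering it.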
• ### SmBoP: Semi-autoregressive Bottom-up Semantic Parsing

The de-facto standard decoding method for semantic parsing in recent years has been to autoregressively decode the abstract syntax tree of the target program using a top-down depth-first traversal. In this work, we propose an alternative approach: a Semi-autoregressive Bottom-up Parser (SmBoP) that constructs at decoding step $t$ the top-$K$ sub-trees of height $\leq t$. Our parser enjoys several benefits compared to top-down autoregressive parsing. First, since sub-trees in each decoding step are generated in parallel, the theoretical runtime is logarithmic rather than linear. Second, our bottom-up approach learns representations with meaningful semantic sub-programs at each step, rather than semantically vague partial trees. Last, SmBoP includes Transformer-based layers that contextualize sub-trees with one another, allowing us, unlike traditional beam-search, to score trees conditioned on other trees that have been previously explored. We apply SmBoP on Spider, a challenging zero-shot semantic parsing benchmark, and show that SmBoP is competitive with top-down autoregressive parsing. On the test set, SmBoP obtains an EM score of $60.5\%$, similar to the best published score for a model that does not use database content, which is at $60.6\%$.
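The semi-autoregressive bottom-up idea can be illustrated with a toy Python sketch. This is not the SmBoP model: arithmetic expression trees stand in for program sub-trees, the scoring function is trivial, and `K` and `OPS` are illustrative choices. What the sketch preserves is the decoding scheme: at each step, all pairs of beam sub-trees are combined in parallel and only the top-K results survive, so trees of height ≤ t exist after t steps.

```python
import itertools

K = 4
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def step(beam, score):
    """One decoding step: combine all pairs of beam sub-trees, keep the top-K."""
    candidates = list(beam)  # a sub-tree may also survive unchanged
    for (t1, v1), (t2, v2) in itertools.product(beam, repeat=2):
        for name, op in OPS.items():
            candidates.append(((name, t1, t2), op(v1, v2)))
    return sorted(candidates, key=score, reverse=True)[:K]

# Leaves are height-0 sub-trees (constants); the toy score favors values near 10.
beam = [("2", 2), ("3", 3), ("5", 5)]
score = lambda tree_value: -abs(tree_value[1] - 10)
for _ in range(2):  # two steps -> sub-trees of height <= 2
    beam = step(beam, score)
print(beam[0][1])  # best value reached
```

Because every candidate at a step is built from already-complete sub-trees, each beam entry is a meaningful sub-program rather than a partial prefix, which is the representational benefit the abstract describes.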
• ### MultiModalQA: Complex Question Answering over Text, Tables and Images

Ankit Gupta, Jonathan Berant. ICLR, 2021.
When answering complex questions, people can seamlessly combine information from visual, textual and tabular sources. While interest in models that reason over multiple pieces of evidence has surged in recent years, there has been relatively little work on question answering models that reason across multiple modalities. In this paper, we present MULTIMODALQA (MMQA): a challenging question answering dataset that requires joint reasoning over text, tables and images. We create MMQA using a new framework for generating complex multi-modal questions at scale, harvesting tables from Wikipedia, and attaching images and text paragraphs using entities that appear in each table. We then define a formal language that allows us to take questions that can be answered from a single modality, and combine them to generate cross-modal questions. Last, crowdsourcing workers take these automatically generated questions and rephrase them into more fluent language. We create 29,918 questions through this procedure, and empirically demonstrate the necessity of a multi-modal multi-hop approach to solve our task: our multi-hop model, ImplicitDecomp, achieves an average F1 of 51.7 over cross-modal questions, substantially outperforming a strong baseline that achieves 38.2 F1, but still lags significantly behind human performance, which is at 90.1 F1.
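The question-composition step described in the abstract can be sketched in a few lines of Python. This is not the MMQA generation pipeline: the `[inner]` placeholder, the `compose` helper, and both example questions are hypothetical. The sketch only shows the core idea that a cross-modal question embeds a single-modality question into a slot of another, so answering it requires hopping across sources (the abstract notes that crowdworkers then rephrase such automatically generated questions into fluent language).

```python
def compose(outer_template, inner_question):
    """Embed a single-modality question into the [inner] slot of another."""
    return outer_template.replace("[inner]", inner_question)

# Hypothetical single-modality pieces:
table_q = "the film that won Best Picture in 1997"            # resolvable from a table
image_template = "What animal appears on the poster of [inner]?"  # needs an image

print(compose(image_template, table_q))
# -> "What animal appears on the poster of the film that won Best Picture in 1997?"
```

Answering the composed question requires first resolving the table sub-question, then reasoning over the image, which is exactly the multi-hop, multi-modal behavior the dataset is built to test.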
• ### Bootstrapping Relation Extractors using Syntactic Search by Examples

Matan Eyal, Asaf Amrami, Hillel Taub-Tabib, Yoav Goldberg. EACL, 2021.
The advent of neural networks in NLP brought with it substantial improvements in supervised relation extraction. However, obtaining a sufficient quantity of training data remains a key challenge. In this work we propose a process for bootstrapping training datasets which can be performed quickly by non-NLP-experts. We take advantage of search engines over syntactic graphs (such as Shlain et al. (2020)) which expose a friendly by-example syntax. We use these to obtain positive examples by searching for sentences that are syntactically similar to user input examples. We apply this technique to relations from TACRED and DocRED and show that the resulting models are competitive with models trained on manually annotated data and on data obtained from distant supervision. The models also outperform models trained using NLG data augmentation techniques. Extending the search-based approach with the NLG method further improves the results.
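The bootstrapping idea can be illustrated with a toy Python sketch. This is not the SPIKE system or the paper's pipeline: the corpus, the path strings, and the `bootstrap` helper are illustrative stand-ins. The sketch only shows the retrieval principle: index each sentence by the syntactic path between its two entities, then treat sentences that share a seed example's path as positive training examples for the relation.

```python
# A hypothetical pre-parsed corpus: (sentence, dependency path between entities).
corpus = [
    ("Marie Curie was born in Warsaw.",   "nsubj:pass<born>obl"),
    ("Tesla was founded by Elon Musk.",   "nsubj:pass<founded>obl"),
    ("Alan Turing was born in London.",   "nsubj:pass<born>obl"),
    ("Paris is the capital of France.",   "nsubj<capital>nmod"),
]

def bootstrap(seed_path):
    """Collect sentences whose entity-entity path matches the seed example's path."""
    return [sentence for sentence, path in corpus if path == seed_path]

# Using a 'born in' seed retrieves the structurally similar sentences.
positives = bootstrap("nsubj:pass<born>obl")
print(len(positives))  # 2
```

The appeal for non-NLP-experts is that the seed is just an example sentence: the by-example query syntax hides the dependency formalism behind it.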
• ### First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT

Benjamin Muller, Yanai Elazar, Benoît Sagot, Djamé SeddahEACL2021
Multilingual pretrained language models have demonstrated remarkable zero-shot cross-lingual transfer capabilities. Such transfer emerges by fine-tuning on a task of interest in one language and evaluating on a distinct language, not seen during the fine-tuning. Despite promising results, we still lack a proper understanding of the source of this transfer. Using a novel layer ablation technique and analyses of the model's internal representations, we show that multilingual BERT, a popular multilingual language model, can be viewed as the stacking of two sub-networks: a multilingual encoder followed by a task-specific language-agnostic predictor. While the encoder is crucial for cross-lingual transfer and remains mostly unchanged during fine-tuning, the task predictor has little importance on the transfer and can be reinitialized during fine-tuning. We present extensive experiments with three distinct tasks, seventeen typologically diverse languages and multiple domains to support our hypothesis.
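The layer-ablation technique can be sketched with a toy Python example. This is not the paper's code: simple arithmetic functions stand in for transformer layers, and the impact measure is just the change in a scalar output. The sketch shows the mechanics of the probe: replace one layer with the identity and compare the ablated model's behavior against the intact one, layer by layer.

```python
# Toy "model": a stack of layer functions applied in order.
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

def run(stack, x):
    """Apply the layer stack to an input."""
    for layer in stack:
        x = layer(x)
    return x

def ablate(stack, i):
    """Return a copy of the stack with layer i replaced by the identity."""
    return stack[:i] + [lambda x: x] + stack[i + 1:]

base = run(layers, 5)  # ((5 + 1) * 2) - 3 = 9
impacts = [abs(run(ablate(layers, i), 5) - base) for i in range(len(layers))]
print(impacts)  # [2, 6, 3]: how much removing each layer changes the output
```

In the paper's setting the same comparison is made on task performance, which is how the ablation separates the transfer-critical encoder layers from the reinitializable predictor layers.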

## Recent Press


An Artificial Intelligence System Passed an 8th Grade Science Test with High Distinction

Haaretz
September 6, 2019

The Secret Price of Artificial Intelligence

ynet
August 12, 2019