Papers
See AI2's Award Winning Papers
Learn more about AI2's Lasting Impact Award
Viewing 231-240 of 298 papers
Multi-View Learning for Vision-and-Language Navigation
Qiaolin Xia, Xiujun Li, Chunyuan Li, Yonatan Bisk, Zhifang Sui, Yejin Choi, Noah A. SmitharXiv • 2020 Learning to navigate in a visual environment following natural language instructions is a challenging task because natural language instructions are highly variable, ambiguous, and under-specified. In this paper, we present a novel training paradigm, Learn…Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, Noah A. Smith arXiv • 2020 Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language processing. This process, however, is often brittle: even with the same hyperparameter values, distinct random seeds can lead to…Analyzing Compositionality in Visual Question Answering
Sanjay Subramanian, Sameer Singh, Matt GardnerNeurIPS • ViGIL Workshop • 2019 Since the release of the original Visual Question Answering (VQA) dataset, several newer datasets for visual reasoning have been introduced, often with the express intent of requiring systems to perform compositional reasoning. Recently, transformer models…Evaluating Question Answering Evaluation
Anthony Chen, Gabriel Stanovsky, Sameer Singh, Matt GardnerEMNLP • MRQA Workshop • 2019 As the complexity of question answering (QA) datasets evolve, moving away from restricted formats like span extraction and multiple-choice (MC) to free-form answer generation, it is imperative to understand how well current metrics perform in evaluating QA…On Making Reading Comprehension More Comprehensive
Matt Gardner, Jonathan Berant, Hannaneh Hajishirzi, Alon Talmor, Sewon MinEMNLP • MRQA Workshop • 2019 Machine reading comprehension, the task of evaluating a machine’s ability to comprehend a passage of text, has seen a surge in popularity in recent years. There are many datasets that are targeted at reading comprehension, and many systems that perform as…ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension
Dheeru Dua, Ananth Gottumukkala, Alon Talmor, Sameer Singh, Matt GardnerEMNLP • MRQA Workshop • 2019 Reading comprehension is one of the crucial tasks for furthering research in natural language understanding. A lot of diverse reading comprehension datasets have recently been introduced to study various phenomena in natural language, ranging from simple…AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models
Eric Wallace, Jens Tuyls, Junlin Wang, Sanjay Subramanian, Matthew Gardner, Sameer SinghEMNLP • 2019 Neural NLP models are increasingly accurate but are imperfect and opaque---they break in counterintuitive ways and leave end users puzzled at their behavior. Model interpretation methods ameliorate this opacity by providing explanations for specific model…Do NLP Models Know Numbers? Probing Numeracy in Embeddings
Eric Wallace, Yizhong Wang, Sujian Li, Sameer Singh, Matt GardnerEMNLP • 2019 The ability to understand and work with numbers (numeracy) is critical for many complex reasoning tasks. Currently, most NLP models treat numbers in text in the same way as other tokens---they embed them as distributed vectors. Is this enough to capture…Efficient Navigation with Language Pre-training and Stochastic Sampling
Xiujun Li, Chunyuan Li, Qiaolin Xia, Yonatan Bisk, Asli Celikyilmaz, Jianfeng Gao, Noah Smith, Yejin ChoiEMNLP • 2019 Core to the vision-and-language navigation (VLN) challenge is building robust instruction representations and action decoding schemes, which can generalize well to previously unseen instructions and environments. In this paper, we report two simple but highly…Global Reasoning over Database Structures for Text-to-SQL Parsing
Ben Bogin, Matt Gardner, Jonathan BerantEMNLP • 2019 State-of-the-art semantic parsers rely on auto-regressive decoding, emitting one symbol at a time. When tested against complex databases that are unobserved at training time (zero-shot), the parser often struggles to select the correct set of database…