Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
How Good Are My Predictions? Efficiently Approximating Precision-Recall Curves for Massive Datasets
Large scale machine learning produces massive datasets whose items are often associated with a confidence level and can thus be ranked. However, computing the precision of these resources requires…
Learning to Predict Citation-Based Impact Measures
Citations implicitly encode a community's judgment of a paper's importance and thus provide a unique signal by which to study scientific impact. Efforts in understanding and refining this signal are…
Should Artificial Intelligence Be Regulated?
New technologies often spur public anxiety, but the intensity of concern about the implications of advances in artificial intelligence (AI) is particularly noteworthy. Several respected scholars and…
The AI2 system at SemEval-2017 Task 10 (ScienceIE): semi-supervised end-to-end entity and relation extraction
This paper describes our submission for the ScienceIE shared task (SemEval-2017 Task 10) on entity and relation extraction from scientific papers. Our model is based on the end-to-end relation…
YOLO9000: Better, Faster, Stronger
We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. First we propose various improvements to the YOLO detection method, both…
LCNN: Lookup-based Convolutional Neural Network
Porting state of the art deep learning algorithms to resource constrained compute platforms (e.g. VR, AR, wearables) is extremely challenging. We propose a fast, compact, and accurate model for…
Commonly Uncommon: Semantic Sparsity in Situation Recognition
Semantic sparsity is a common challenge in structured visual classification problems; when the output space is complex, the vast majority of the possible predictions are rarely, if ever, seen in the…
Are You Smarter Than A Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension
We introduce the task of Multi-Modal Machine Comprehension (M3C), which aims at answering multimodal questions given a context of text, diagrams and images. We present the Textbook Question…
Asynchronous Temporal Fields for Action Recognition
Actions are more than just movements and trajectories: we cook to eat and we hold a cup to drink from it. A thorough understanding of videos requires going beyond appearance modeling and…
Target-driven visual navigation in indoor scenes using deep reinforcement learning
Two less addressed issues of deep reinforcement learning are (1) lack of generalization capability to new goals, and (2) data inefficiency, i.e., the model requires several (and often costly)…