Award Winning Papers
NLPositionality: Characterizing Design Biases of Datasets and Models
Sebastin Santy, Jenny T. Liang, Ronan Le Bras, Katharina Reinecke, Maarten Sap
ACL • 2023
Design biases in NLP systems, such as performance differences for different populations, often stem from their creator's positionality, i.e., views and lived experiences shaped by identity and background. Despite the prevalence and risks of design biases…

Do Androids Laugh at Electric Sheep? Humor “Understanding” Benchmarks from The New Yorker Caption Contest
Jack Hessel, Ana Marasović, Jena D. Hwang, Lillian Lee, Jeff Da, Rowan Zellers, Robert Mankoff, Yejin Choi
ACL • 2023
We challenge AI models to “demonstrate understanding” of the sophisticated multimodal humor of The New Yorker Caption Contest. Concretely, we develop three carefully circumscribed tasks for which it suffices (but is not necessary) to grasp potentially…

Visual Programming: Compositional visual reasoning without training
Tanmay Gupta, Aniruddha Kembhavi
CVPR • 2023
We present VISPROG, a neuro-symbolic approach to solving complex and compositional visual tasks given natural language instructions. VISPROG avoids the need for any task-specific training. Instead, it uses the in-context learning ability of large language…

The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks
Nikil Selvam, Sunipa Dev, Daniel Khashabi, Tushar Khot, Kai-Wei Chang
ACL • 2023
How reliably can we trust the scores obtained from social bias benchmarks as faithful indicators of problematic social biases in a given language model? In this work, we study this question by contrasting social biases with non-social biases stemming from…

Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker
Melanie Sclar, Sachin Kumar, Peter West, Alane Suhr, Yejin Choi, Yulia Tsvetkov
ACL • 2023
Theory of Mind (ToM), the ability to reason about the mental states of other people, is a key element of our social intelligence. Yet, despite their ever more impressive performance, large-scale neural language models still lack…

LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
Kalpesh Krishna, Erin Bransom, Bailey Kuehl, Mohit Iyyer, Pradeep Dasigi, Arman Cohan, Kyle Lo
EACL • 2023
While human evaluation remains best practice for accurately judging the faithfulness of automatically-generated summaries, few solutions exist to address the increased difficulty and workload when evaluating long-form summaries. Through a survey of 162 papers…

CiteSee: Augmenting Citations in Scientific Papers with Persistent and Personalized Historical Context
Joseph Chee Chang, Amy X. Zhang, Jonathan Bragg, Andrew Head, Kyle Lo, Doug Downey, Daniel S. Weld
CHI • 2023
When reading a scholarly article, inline citations help researchers contextualize the current article and discover relevant prior work. However, it can be challenging to prioritize and make sense of the hundreds of citations encountered during literature…

Queer In AI: A Case Study in Community-Led Participatory AI
Organizers Of Queer in AI, Anaelia Ovalle, Arjun Subramonian, Ashwin Singh, Claas Voelcker, Danica J. Sutherland, Davide Locatelli, Eva Breznik, Filip Klubička, Hang Yuan, Hetvi J, Huan Zhang, Jaidev Shriram, Kruno Lehman, Luca Soldaini, Maarten Sap, Marc Peter Deisenroth, Maria Leonor Pacheco, Maria Ryskina, Martin Mundt, Melvin Selim Atay, Milind Agarwal, Nyx McLean, Pan Xu, A Pranav, Raj Korpan, Ruchira Ray, Sarah Mathew, Sarthak Arora, St John, Tanvi Anand, Vishakha Agrawal, William Agnew, Yanan Long, Zijie J. Wang, Zeerak Talat, Avijit Ghosh, Nathaniel Dennler, Michael Noseworthy, Sharvani Jha, Emi Baylor, Aditya Joshi, Natalia Y. Bilenko, Andrew McNamara, Raphael Gontijo-Lopes, Alex Markham, Evyn Dǒng, Jackie Kay, Manu Saraswat, Nikhil Vytla, Luke Stark
FAccT • 2023
We present Queer in AI as a case study for community-led participatory design in AI. We examine how participatory design and intersectional tenets started and shaped this community's programs over the years. We discuss different challenges that emerged in the…

Abstract Visual Reasoning with Tangram Shapes
Anya Ji, Noriyuki Kojima, N. Rush, Alane Suhr, Wai Keen Vong, Robert D. Hawkins, Yoav Artzi
EMNLP • 2022
We introduce KiloGram, a resource for studying abstract visual reasoning in humans and machines. Drawing on the history of tangram puzzles as stimuli in cognitive science, we build a richly annotated dataset that, with > 1k distinct stimuli, is orders of…
Best Long Paper Award

CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation
Abhilasha Ravichander, Matt Gardner, Ana Marasović
EMNLP • 2022
The full power of human language-based communication cannot be realized without negation. All human languages have some form of negation. Despite this, negation remains a challenging phenomenon for current natural language understanding systems. To facilitate…