Papers

Viewing 11-20 of 155 papers
  • LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks

    Mihir Parmar, Aakanksha Naik, Himanshu Gupta, Disha Agrawal, Chitta Baral. arXiv.org, 2023. Many large language models (LLMs) for medicine have largely been evaluated on short texts, and their ability to handle longer sequences such as a complete electronic health record (EHR) has not been systematically explored. Assessing these models on long…
  • The Surveillance AI Pipeline

    Pratyusha Ria Kalluri, William Agnew, M. Cheng, Kentrell Owens, Luca Soldaini, A. Birhane. arXiv, 2023. A rapidly growing number of voices have argued that AI research, and computer vision in particular, is closely tied to mass surveillance. Yet the direct path from computer vision research to surveillance has remained obscured and difficult to assess. This…
  • When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets

    Orion Weller, Kyle Lo, David Wadden, Dawn J Lawrie, Benjamin Van Durme, Arman Cohan, Luca Soldaini. arXiv, 2023. Using large language models (LMs) for query or document expansion can improve generalization in information retrieval. However, it is unknown whether these techniques are universally beneficial or only effective in specific settings, such as for particular…
  • Bound by the Bounty: Collaboratively Shaping Evaluation Processes for Queer AI Harms

    Organizers of Queer in AI, Nathaniel Dennler, Anaelia Ovalle, Ashwin Singh, Luca Soldaini, Arjun Subramonian, Huy Tu, William Agnew, Avijit Ghosh, Kyra Yee, Irene Font Peradejordi, Zeerak Talat, Mayra Russo, Jessica de Jesus de Pinho Pinhal. AIES, 2023. Bias evaluation benchmarks and dataset and model documentation have emerged as central processes for assessing the biases and harms of artificial intelligence (AI) systems. However, these auditing processes have been criticized for their failure to integrate…
  • Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents

    Catherine Chen, Zejiang Shen, Dan Klein, Gabi Stanovsky, Doug Downey, Kyle Lo. Findings of ACL, 2023. Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers. Layout-infused LMs are often evaluated on documents with familiar layout features (e.g., papers from the…
  • Riveter: Measuring Power and Social Dynamics Between Entities

    Maria Antoniak, Anjalie Field, Jimin Mun, Melanie Walsh, Lauren F. Klein, Maarten Sap. ACL, 2023. Riveter provides a complete easy-to-use pipeline for analyzing verb connotations associated with entities in text corpora. We prepopulate the package with connotation frames of sentiment, power, and agency, which have demonstrated usefulness for capturing…
  • Words as Gatekeepers: Measuring Discipline-specific Terms and Meanings in Scholarly Publications

    Li Lucy, Jesse Dodge, David Bamman, Katherine A. Keith. Findings of ACL, 2023. Scholarly text is often laden with jargon, or specialized language that can facilitate efficient in-group communication within fields but hinder understanding for out-groups. In this work, we develop and validate an interpretable approach for measuring…
  • ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews

    Mike D'Arcy, Alexis Ross, Erin Bransom, Bailey Kuehl, Jonathan Bragg, Tom Hope, Doug Downey. arXiv.org, 2023. Revising scientific papers based on peer feedback is a challenging task that requires not only deep scientific knowledge and reasoning, but also the ability to recognize the implicit requests in high-level feedback and to choose the best of many possible ways…
  • Perspective: Large Language Models in Applied Mechanics

    Neal R. Brodnik, Samuel Carton, Caelin Muir, Satanu Ghosh, Doug Downey, M. Echlin, T. Pollock, S. Daly. Journal of Applied Mechanics, 2023. Large language models (LLMs), such as ChatGPT and PaLM, are able to perform sophisticated text comprehension and generation tasks with little or no training. Alongside their broader societal impacts, these capabilities carry great promise for the physical…
  • A Controllable QA-based Framework for Decontextualization

    Benjamin Newman, Luca Soldaini, Raymond Fok, Arman Cohan, Kyle Lo. arXiv, 2023. Many real-world applications require surfacing extracted snippets to users, whether motivated by assistive tools for literature surveys or document cross-referencing, or the need to mitigate and recover from model-generated inaccuracies. Yet these passages can…