Papers

Viewing 91-100 of 292 papers
  • WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation

    Alisa Liu, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
    Findings of EMNLP, 2022
    A recurring challenge of crowdsourcing NLP datasets at scale is that human writers often rely on repetitive patterns when crafting examples, leading to a lack of linguistic diversity. We introduce a novel approach for dataset creation based on worker and AI…
  • Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection

    Suchin Gururangan, Dallas Card, Sarah K. Drier, Emily K. Gade, Leroy Z. Wang, Zeyu Wang, Luke Zettlemoyer, Noah A. Smith
    EMNLP, 2022
    Language models increasingly rely on massive web dumps for diverse text data. However, these sources are rife with undesirable content. As such, resources like Wikipedia, books, and news often serve as anchors for automatically selecting web text most…
  • Modeling the Machine Learning Multiverse

    Samuel J Bell, Onno P. Kampman, Jesse Dodge, Neil D. Lawrence
    NeurIPS, 2022
    Amid mounting concern about the reliability and credibility of machine learning research, we present a principled framework for making robust and generalizable claims: the multiverse analysis. Our framework builds upon the multiverse analysis [1] introduced…
  • BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Teven Le Scao, Angela Fan, Christopher Akiki, Elizabeth-Jane Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, A. Luccioni, François Yvon, Matthias Gallé, J. Tow, Alexander M. Rush, Stella Rose Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, Iz Beltagy, Huu Nguyen, Lucile Saulnier, Samson Tan, Pedro Ortiz Suarez, Victor Sanh, Hugo Laurençon, Yacine Jernite, Julien Launay, Margaret Mitchell, Colin Raffel, Aaron Gokaslan, Adi Simhi, Aitor Soroa Etxabe, A. F. Aji, Amit Alfassy, Anna Rogers, Ariel Kreisberg Nitzav, Canwen Xu, Chenghao Mou, Chris C. Emezue, Christopher Klamm, Colin Leong, Daniel Alexander van Strien, David Ifeoluwa Adelani, Dragomir Radev, Eduardo G. Ponferrada, Efrat Levkovizh, Ethan Kim, E. Natan, F. Toni, Gérard Dupont, Germán Kruszewski, Giada Pistilli, Hady ElSahar, Hamza Benyamina, Hieu Tran, Ian Yu, Idris Abdulmumin, Isaac Johnson, Itziar Gonzalez-Dios, Javier de la Rosa, Jenny Chim, Jesse Dodge, Jian Zhu, Jonathan Chang, Jorg Frohberg, Josephine L. Tobing, J. Bhattacharjee, Khalid Almubarak, Kimbo Chen, Kyle Lo, Leandro von Werra, Leon Weber, Long Phan, Loubna Ben Allal, L. Tanguy, Manan Dey, M. Muñoz, Maraim Masoud, María Grandury, Mario Šaško, Max Huang, Maximin Coavoux, Mayank Singh, Mike Tian-Jian Jiang, Minh Chien Vu, M. A. Jauhar, Mustafa Ghaleb, Nishant Subramani, Nora Kassner, Nurulaqilla Khamis, Olivier Nguyen, Omar Espejel, Ona de Gibert, Paulo Villegas, Peter Henderson, Pierre Colombo, Priscilla Amuok, Quentin Lhoest, Rheza Harliman, Rishi Bommasani, R. López, R. Ribeiro, Salomey Osei, Sampo Pyysalo, Sebastian Nagel, Shamik Bose, Shamsuddeen Hassan Muhammad, Shanya Sharma, S. Longpre, Somaieh Nikpoor, Stanislav Silberberg, S. Pai, S. Zink, Tiago Timponi Torrent, Timo Schick, Tristan Thrush, V. Danchev, Vassilina Nikoulina, Veronika Laippala, Violette Lepercq, V. Prabhu, Zaid Alyafeai, Zeerak Talat, Arun Raja, Benjamin Heinzerling, Chenglei Si, Elizabeth Salesky, Sabrina J. Mielke, Wilson Y. Lee, Abheesht Sharma, Andrea Santilli, Antoine Chaffin, Arnaud Stiegler, Debajyoti Datta, Eliza Szczechla, Gunjan Chhablani, Han Wang, Harshit Pandey, Hendrik Strobelt, Jason Alan Fries, Jos Rozen, Leo Gao, Lintang Sutawika, M Saiful Bari, Maged S. Al-shaibani, Matteo Manica, Nihal V. Nayak, Ryan Teehan, Samuel Albanie, Sheng Shen, Srulik Ben-David, Stephen H. Bach, Taewoon Kim, T. Bers, Thibault Févry, Trishala Neeraj, Urmish Thakker, Vikas Raunak, Xiang Tang, Zheng Xin Yong, Zhiqing Sun, Shaked Brody, Y. Uri, Hadar Tojarieh, Adam Roberts, Hyung Won Chung, Jaesung Tae, Jason Phang, Ofir Press, Conglong Li, D. Narayanan, Hatim Bourfoune, J. Casper, Jeff Rasley, Max Ryabinin, Mayank Mishra, Minjia Zhang, M. Shoeybi, Myriam Peyrounette, N. Patry, Nouamane Tazi, Omar Sanseviero, Patrick von Platen, Pierre Cornette, Pierre François Lavallée, R. Lacroix, Samyam Rajbhandari, Sanchit Gandhi, Shaden Smith, S. Requena, Suraj Patil, Tim Dettmers, Ahmed Baruwa, Amanpreet Singh, Anastasia Cheveleva, Anne-Laure Ligozat, Arjun Subramonian, Aurélie Névéol, Charles Lovering, Daniel H Garrette, D. Tunuguntla, Ehud Reiter, Ekaterina Taktasheva, E. Voloshina, Eli Bogdanov, Genta Indra Winata, Hailey Schoelkopf, Jan-Christoph Kalo, Jekaterina Novikova, J. Forde, Jordan Clive, Jungo Kasai, Ken Kawamura, Liam Hazan, Marine Carpuat, Miruna Clinciu, Najoung Kim, Newton Cheng, Oleg Serikov, Omer Antverg, Oskar van der Wal, Rui Zhang, Ruochen Zhang, Sebastian Gehrmann, S. Pais, Tatiana Shavrina, Thomas Scialom, Tian Yun, Tomasz Limisiewicz, Verena Rieser, V. Protasov, V. Mikhailov, Yada Pruksachatkun, Yonatan Belinkov, Zachary Bamberger, Zdeněk Kasner, Alice Rueda, Amanda Pestana, A. Feizpour, Ammar Khan, Amy Faranak, A. Santos, A. Hevia, Antigona Unldreaj, Arash Aghagol, Arezoo Abdollahi, A. Tammour, Azadeh HajiHosseini, Bahareh Behroozi, B. Ajibade, B. Saxena, Carlos Muñoz Ferrandis, Danish Contractor, D. Lansky, Davis David, Douwe Kiela, D. A. Nguyen, Edward Tan, Emily Baylor, Ezinwanne Ozoani, Fatim T Mirza, Frankline Ononiwu, Habib Rezanejad, H.A. Jones, Indrani Bhattacharya, Irene Solaiman, Irina Sedenko, I. Nejadgholi, J. Passmore, Joshua Seltzer, Julio Bonis Sanz, Karën Fort, L. Dutra, Mairon Samagaio, Maraim Elbadri, M. Mieskes, Marissa Gerchick, Martha Akinlolu, Michael McKenna, Mike Qiu, M. Ghauri, Mykola Burynok, Nafis Abrar, Nazneen Rajani, Nour Elkott, N. Fahmy, O. Samuel, Ran An, R. Kromann, Ryan Hao, S. Alizadeh, Sarmad Shubber, Silas L. Wang, Sourav Roy, S. Viguier, Thanh-Cong Le, Tobi Oyebade, T. Le, Yoyo Yang, Z. Nguyen, Abhinav Ramesh Kashyap, Alfredo Palasciano, A. Callahan, Anima Shukla, Antonio Miranda-Escalada, A. Singh, Benjamin Beilharz, Bo Wang, C. Brito, Chenxi Zhou, Chirag Jain, Chuxin Xu, Clémentine Fourrier, Daniel León Periñán, Daniel Molano, Dian Yu, Enrique Manjavacas, Fabio Barth, Florian Fuhrimann, Gabriel Altay, Giyaseddin Bayrak, Gully A. Burns, Helena U. Vrabec, I. Bello, Isha Dash, J. Kang, John Giorgi, J. Golde, J. Posada, Karthi Sivaraman, Lokesh Bulchandani, Lu Liu, Luisa Shinzato, Madeleine Hahn de Bykhovetz, Maiko Takeuchi, Marc Pàmies, M. A. Castillo, Marianna Nezhurina, Mario Sanger, M. Samwald, Michael Cullan, Michael Weinberg, M. Wolf, Mina Mihaljcic, Minna Liu, M. Freidank, Myungsun Kang, Natasha Seelam, N. Dahlberg, N. Broad, N. Muellner, Pascale Fung, Patricia Haller, R. Chandrasekhar, R. Eisenberg, Robert Martin, Rodrigo L. Canalli, Rosaline Su, Ruisi Su, Samuel Cahyawijaya, Samuele Garda, Shlok S Deshmukh, Shubhanshu Mishra, Sid Kiblawi, Simon Ott, Sinee Sang-aroonsiri, Srishti Kumar, Stefan Schweter, S. Bharati, T. A. Laud, Théo Gigant, Tomoya Kainuma, Wojciech Kusa, Yanis Labrak, Yashasvi Bajaj, Y. Venkatraman, Yifan Xu, Ying Xu, Yun-chao Xu, Z. Tan, Zhongli Xie, Zifan Ye, M. Bras, Younes Belkada, Thomas Wolf
    arXiv, 2022
    Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and…
  • Quantifying the narrative flow of imagined versus autobiographical stories.

    Maarten Sap, A. Jafarpour, Yejin Choi, Noah A. Smith, J. Pennebaker, E. Horvitz
    Proceedings of the National Academy of Sciences of the United States of America, 2022
    Lifelong experiences and learned knowledge lead to shared expectations about how common situations tend to unfold. Such knowledge of narrative event flow enables people to weave together a story. However, comparable computational tools to evaluate the flow of…
  • Just-DREAM-about-it: Figurative Language Understanding with DREAM-FLUTE

    Yuling Gu, Yao Fu, Valentina Pyatkin, Ian Magnusson, Bhavana Dalvi Mishra, Peter Clark
    EMNLP • The Third Workshop on Figurative Language Processing, 2022
    Figurative language (e.g., “he flew like the wind”) is challenging to understand, as it is hard to tell what implicit information is being conveyed from the surface form alone. We hypothesize that to perform this task well, the reader needs to mentally…
  • SciFact-Open: Towards open-domain scientific claim verification

    David Wadden, Kyle Lo, Bailey Kuehl, Arman Cohan, Iz Beltagy, Lucy Lu Wang, Hannaneh Hajishirzi
    EMNLP, 2022
    While research on scientific claim verification has led to the development of powerful systems that appear to approach human performance, these approaches have yet to be tested in a realistic setting against large corpora of scientific literature. Moving to…
  • Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering

    Jiacheng Liu, Skyler Hallinan, Ximing Lu, Pengfei He, S. Welleck, Hannaneh Hajishirzi, Yejin Choi
    Conference on Empirical Methods in Natural Language Processing, 2022
    Knowledge underpins reasoning. Recent research demonstrates that when relevant knowledge is provided as additional context to commonsense question answering (QA), it can substantially enhance the performance even on top of state-of-the-art. The fundamental…
  • Transparency Helps Reveal When Language Models Learn Meaning

    Zhaofeng Wu, Will Merrill, Hao Peng, Iz Beltagy, Noah A. Smith
    arXiv, 2022
    Many current NLP systems are built from language models trained to optimize unsupervised objectives on large amounts of raw text. Under what conditions might such a procedure acquire meaning? Our systematic experiments with synthetic data reveal that, with…
  • Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

    R. Ramamurthy, Prithviraj Ammanabrolu, Kianté Brantley, Jack Hessel, R. Sifa, C. Bauckhage, Hannaneh Hajishirzi, Yejin Choi
    arXiv, 2022
    We tackle the problem of aligning pre-trained large language models (LMs) with human preferences. If we view text generation as a sequential decision-making problem, reinforcement learning (RL) appears to be a natural conceptual framework. However, using RL…