Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

MultiVerS: Improving scientific claim verification with weak supervision and full-document context

David Wadden, Kyle Lo, Lucy Lu Wang, Hannaneh Hajishirzi
2022
Findings of NAACL

The scientific claim verification task requires an NLP system to label scientific documents which Support or Refute an input claim, and to select evidentiary sentences (or rationales) justifying… 

Paragraph-based Transformer Pre-training for Multi-Sentence Inference

Luca Di Liello, Siddhant Garg, Luca Soldaini, Alessandro Moschitti
2022
NAACL

Inference tasks such as answer sentence selection (AS2) or fact verification are typically solved by fine-tuning transformer-based models as individual sentence-pair classifiers. Recent studies show… 

Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities

Zejiang Shen, Kyle Lo, Lauren Yu, Doug Downey
2022
arXiv

With the advent of large language models, methods for abstractive summarization have made great strides, creating potential for use in applications to aid knowledge workers processing unwieldy… 

Data Governance in the Age of Large-Scale Data-Driven Language Technology

Yacine Jernite, Huu Nguyen, Stella Rose Biderman, Margaret Mitchell
2022
FAccT

The recent emergence and adoption of Machine Learning technology, and specifically of Large Language Models, has drawn attention to the need for systematic and transparent management of language… 

Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity

Sheshera Mysore, Arman Cohan, Tom Hope
2022
NAACL

We present a new scientific document similarity model based on matching fine-grained aspects of texts. To train our model, we exploit a naturally-occurring source of supervision: sentences in the… 

VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups

Zejiang Shen, Kyle Lo, Lucy Lu Wang, Doug Downey
2022
TACL

Accurately extracting structured content from PDFs is a critical first step for NLP over scientific papers. Recent work has improved extraction accuracy by incorporating elementary layout… 

Improving the Generalizability of Depression Detection by Leveraging Clinical Questionnaires

Thong Nguyen, Andrew Yates, Ayah Zirikly, Arman Cohan
2022
ACL

Automated methods have been widely used to identify and analyze mental health conditions (e.g., depression) from various sources of information, including social media. Yet, deployment of such… 

Zero- and Few-Shot NLP with Pretrained Language Models

Iz Beltagy, Arman Cohan, Robert Logan IV, Sameer Singh
2022
ACL, tutorial

The ability to efficiently learn from little-to-no data is critical to applying NLP to tasks where data collection is costly or otherwise difficult. This is a challenging setting both academically… 

Penguins Don't Fly: Reasoning about Generics through Instantiations and Exceptions

Emily Allaway, Jena D. Hwang, Chandra Bhagavatula, Yejin Choi
2022
arXiv

Generics express generalizations about the world (e.g., “birds can fly”). However, they are not universally true – while sparrows and penguins are both birds, only sparrows can fly and penguins… 

Generating Scientific Claims for Zero-Shot Scientific Fact Checking

Dustin Wright, David Wadden, Kyle Lo, Lucy Lu Wang
2022
ACL

Automated scientific fact checking is difficult due to the complexity of scientific language and a lack of significant amounts of training data, as annotation requires domain expertise. To address…