Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
ABC: Attention with Bounded-memory Control
Transformer architectures have achieved state-of-the-art results on a variety of sequence modeling tasks. However, their attention mechanism comes with quadratic complexity in sequence length,…
Cross-Task Generalization via Natural Language Crowdsourcing Instructions
Can we enable NLP models to appropriately respond to instructional prompts and consequently generalize to new tasks? To study this question, we leverage the existing NLP datasets and the…
Extracting Latent Steering Vectors from Pretrained Language Models
Prior work on controllable text generation has focused on learning how to control language models through trainable decoding, smart-prompt design, or fine-tuning based on a desired objective. We…
Generated Knowledge Prompting for Commonsense Reasoning
Despite their ability to capture large amounts of knowledge during pretraining, large-scale language models often benefit from incorporating external knowledge bases, especially on commonsense…
Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets
Natural language processing models often exploit spurious correlations between task-independent features and labels in datasets to perform well only within the distributions they are trained on,…
Generating Scientific Definitions with Controllable Complexity
Unfamiliar terminology and complex language can present barriers to understanding science. Natural language processing stands to help address these issues by automatically defining unfamiliar terms.…
Is GPT-3 Text Indistinguishable from Human Text? SCARECROW: A Framework for Scrutinizing Machine Text
Modern neural text generation systems can produce remarkably fluent and grammatical texts. While earlier language models suffered from repetition and syntactic errors, the errors made by contemporary…
Reframing Instructional Prompts to GPTk's Language
How can model designers turn task instructions into effective prompts for language models? Backed by extensive empirical analysis on GPT-3, we observe important features for successful instructional…
Saturated Transformers are Constant-Depth Threshold Circuits
Transformers have become a standard neural network architecture for many NLP problems, motivating theoretical analysis of their power in terms of formal languages. Recent work has shown that…
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question has yet to be answered: how does a model achieve extrapolation at inference time for sequences that…
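As the title of this last paper indicates, its approach biases attention with linear penalties rather than positional embeddings. The following is a minimal, hypothetical NumPy sketch of that general idea: each head's query–key scores are shifted by a head-specific slope times the query–key distance. The function names, slope schedule, and causal setup here are illustrative assumptions, not the paper's reference implementation.

```python
# Hypothetical sketch of attention with linear distance biases (ALiBi-style).
# Assumes a causal, single-sequence setting; shapes and the slope schedule are
# illustrative assumptions, not taken from the paper's released code.
import numpy as np

def head_slopes(num_heads):
    # Assumed geometric schedule of per-head slopes, e.g. 1/2, 1/4, ..., 1/256 for 8 heads.
    return np.array([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])

def linear_bias_attention(q, k, v, slopes):
    # q, k, v: (num_heads, seq_len, head_dim); no positional embeddings are added to the inputs.
    num_heads, seq_len, head_dim = q.shape
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)          # (heads, seq, seq)

    # Linear bias: penalize each key in proportion to its distance from the query,
    # scaled by a fixed per-head slope.
    distance = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]  # j - i (negative for past keys)
    bias = slopes[:, None, None] * distance[None, :, :]

    # Causal mask: queries may not attend to future positions.
    causal_mask = np.triu(np.full((seq_len, seq_len), -np.inf), k=1)
    scores = scores + bias + causal_mask

    # Softmax over keys.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Usage example with random inputs.
heads, seq_len, dim = 8, 16, 32
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(heads, seq_len, dim)) for _ in range(3))
out = linear_bias_attention(q, k, v, head_slopes(heads))
print(out.shape)  # (8, 16, 32)
```

Because the bias depends only on relative distance, the same penalty pattern extends naturally to sequences longer than those seen in training, which is the extrapolation setting the abstract raises.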