Research - Foundations of AI

Some of our most influential work has been in the realm of foundational AI research. From fundamental improvements to language model behavior to understanding web-scale datasets, our researchers ask hard questions to uncover critical findings about the state of artificial intelligence.

Understanding LLM training data

Large text corpora are the backbone of today’s language models, but we have a limited understanding of the content of these datasets. Our research in this area seeks to uncover important facts about these popular corpora, including general statistics, quality, social factors, and potential contamination by evaluation data.
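
As a concrete illustration of the last point, the sketch below checks one simple contamination signal: whether a training document reproduces long word n-grams from an evaluation example. The helper names, example strings, n-gram length, and overlap threshold are all hypothetical placeholders, not a description of any specific Ai2 pipeline.

```python
from typing import Set

def ngrams(text: str, n: int = 8) -> Set[str]:
    """Return the set of word n-grams in a lowercased text."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_fraction(train_doc: str, eval_example: str, n: int = 8) -> float:
    """Fraction of the evaluation example's n-grams that also appear in the training document."""
    eval_grams = ngrams(eval_example, n)
    if not eval_grams:
        return 0.0
    return len(eval_grams & ngrams(train_doc, n)) / len(eval_grams)

# Hypothetical example: the training document contains the evaluation text verbatim.
train_doc = "the quick brown fox jumps over the lazy dog near the river bank today"
eval_example = "the quick brown fox jumps over the lazy dog near the river"
if overlap_fraction(train_doc, eval_example, n=8) > 0.5:  # 0.5 is an illustrative threshold
    print("possible contamination")
```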

Retrieval-augmented generation

Retrieval-augmented generation (RAG) improves the responses of large language models by allowing them to access an additional authoritative knowledge source outside of their training data. Our work in this field offers ways to make RAG more scalable, more performant, and more respectful of data concerns such as copyright.
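
The sketch below illustrates the basic RAG loop under simplifying assumptions: passages are ranked by a toy embedding similarity, and the top matches are prepended to the prompt before generation. The embedding, retrieval, and prompt-building helpers are hypothetical, not Ai2's implementation; a real system would use a learned encoder and send the final prompt to a language model.

```python
from typing import List

def embed(text: str) -> List[float]:
    """Toy bag-of-letters embedding; a real system would call a learned encoder."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Rank corpus passages by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda passage: cosine(q, embed(passage)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, passages: List[str]) -> str:
    """Prepend the retrieved passages so the model can ground its answer in them."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "Retrieval-augmented generation grounds answers in external documents.",
    "Large text corpora are used to pretrain language models.",
    "Novel user interfaces can help people read scientific literature.",
]
query = "How does retrieval-augmented generation ground its answers?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)  # In a full pipeline, this prompt would be sent to a language model.
```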

Human-AI interaction

To realize the full promise of AI, it is critical to design user interfaces that support effective collaboration with human users. This research area explores a variety of novel interfaces that maximize the helpfulness of AI when engaging with scientific literature, supporting better access, rapid yet deep interactive information gathering, and more.

Theoretical insights about LLMs

While language models are somewhat opaque, it is possible to apply theoretical techniques to analyze their intrinsic capabilities and limitations, yielding important fundamental insights about such systems.

Intelligent language agents

Beyond answering questions, language models can act as intelligent agents, interacting autonomously with an external environment to perform complex tasks. Our research focuses on having such agents plan and learn in these environments in order to rapidly improve their performance.
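
A minimal agent loop, sketched below with a toy environment and a stand-in policy, shows the observe-act-feedback cycle such agents run. The environment, action names, and reward scheme are hypothetical and merely stand in for the language-model-driven agents described above.

```python
from typing import Tuple

class CountingEnv:
    """Toy environment: the task is complete once the count reaches the target."""
    def __init__(self, target: int = 3):
        self.target = target
        self.count = 0

    def observe(self) -> str:
        """Return a textual observation of the current state."""
        return f"count={self.count}, target={self.target}"

    def step(self, action: str) -> Tuple[float, bool]:
        """Apply an action and return (reward, done)."""
        if action == "increment":
            self.count += 1
        done = self.count >= self.target
        return (1.0 if done else 0.0), done

def policy(observation: str) -> str:
    """Stand-in for a language model choosing the next action from a textual observation."""
    return "increment"

env = CountingEnv(target=3)
done = False
while not done:
    observation = env.observe()
    action = policy(observation)      # an LLM agent would generate this action as text
    reward, done = env.step(action)   # feedback the agent could learn from
print("task complete")
```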

Systematic reasoning with language

While language models are innately good at question answering, our research has developed new methods that enable them to reason systematically and arrive at conclusions in a sound and explainable way.