
Making AI citations count with Asta

October 8, 2025

Ai2


When you ask an AI system a question and get back a detailed answer, you probably already know there are real people behind that knowledge. Underpinning every AI response are hours of work by scientists who conducted experiments, authors who wrote papers, researchers who shared their discoveries with the world, and countless others.

But here's the problem: these people aren’t always properly or precisely credited when AI systems use and synthesize their work to respond to queries. In academic research, we track how often papers get cited—it's how we measure scientific impact. But even when models cite papers in their responses, those citations aren't counted anywhere.

We're taking a first step to change that. Today, we're releasing data that shows which scientific papers our agentic platform for research and discovery, Asta, relies on most when answering questions. Think of it as creating a citation count for the AI era—a way to measure which research is actually powering AI answers across thousands of queries.

Why this matters

Consider a researcher who spends years on a breakthrough, publishes their findings, and then watches AI systems use that work to answer questions, with no record that their research enabled those answers.

This isn't just about hurt feelings. In the scientific world, getting cited – referenced by other researchers – can make or break a career. Citations influence who gets hired, promoted, or funded. They're the currency of scientific reputation.

Right now, when AI systems like ChatGPT or Asta cite a paper in their answers, those citations vanish into the void. Unlike traditional academic citations that get tracked in databases like Google Scholar and our Semantic Scholar, nobody's keeping score of which papers are driving AI responses. We think that should change.

What we're doing about it

Starting today, we're releasing statistics showing which papers Asta cites most often when responding to questions. We'll update this data every week, and anyone can access it for free.

Asta helps scientists by reading through mountains of research papers and writing detailed reports that answer their questions, complete with citations. This approach, known as Retrieval-Augmented Generation (RAG), works in two steps: first, the system retrieves relevant source articles from our large database of scientific papers; second, it uses a model to compose a report that synthesizes information from those retrieved sources and cites them.
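
To make the two steps concrete, here's a minimal sketch in Python. The `search_index` and `llm` objects are hypothetical stand-ins for illustration; Asta's actual retrieval and generation components aren't shown here.

```python
# A minimal sketch of the two-step RAG loop described above.
# `search_index` and `llm` are hypothetical stand-ins for Asta's
# actual retrieval index and language model.

def answer_query(query: str, search_index, llm, k: int = 20) -> dict:
    # Step 1: retrieve the k most relevant papers for the query.
    papers = search_index.search(query, top_k=k)

    # Step 2: ask the model to synthesize a report that cites
    # the retrieved sources by their bracketed IDs.
    context = "\n\n".join(
        f"[{p['corpus_id']}] {p['title']}: {p['abstract']}" for p in papers
    )
    prompt = (
        f"Using only the sources below, answer the question and cite "
        f"sources by their bracketed IDs.\n\nSources:\n{context}\n\n"
        f"Question: {query}"
    )
    report = llm.generate(prompt)

    # Because citations come from the retrieval step, every cited
    # paper is known and can be logged.
    return {"report": report, "cited_candidates": [p["corpus_id"] for p in papers]}
```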

This has become the standard approach for AI systems that answer questions about scientific literature. Similar "deep research" functionality can be found in OpenAI's Deep Research, Elicit, SciSpace, and FutureHouse's tools.

The RAG approach is necessary because scientific papers relevant to users' questions often come out after a model has been trained. Even for older papers that were in the training data, crucial details typically aren't captured by LLMs, which tend to encode more commonly-expressed general knowledge rather than specific experimental findings.

While RAG doesn't solve the harder problem of tracing how training data shaped a model's knowledge, it does make it plain which sources are being explicitly retrieved and cited in the outputs. And that means we can track and publicly credit those papers.

Every time someone uses Asta’s Summarize literature tool*, we now track which papers get cited and make that information public. (Note that we can only track papers in our Semantic Scholar database.) So far, we've logged data from 113,292 queries over more than 7 months, recording 4,951,364 citations across 2,072,623 different papers.
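
For a sense of what this aggregation involves, here's a minimal sketch that tallies citations from a per-query log. The JSON-lines format and field name are assumptions made for illustration, not Asta's actual schema.

```python
import json
from collections import Counter

def aggregate_citations(log_path: str) -> Counter:
    """Count how often each paper is cited across all logged queries.

    Assumes a hypothetical JSON-lines log where each record lists the
    Semantic Scholar corpus IDs cited in one query's report.
    """
    counts = Counter()
    with open(log_path) as f:
        for line in f:
            record = json.loads(line)
            counts.update(record["cited_corpus_ids"])
    return counts

# counts.most_common(5) would give the five most-cited papers.
```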

It's worth noting that the authors in our data are just textual names—we don't currently attempt to determine which names refer to the same individual when authors share a name or publish under multiple names. We hope to address this in the coming months by integrating an improved version of the author disambiguation algorithm used in Semantic Scholar.

*We only track data from users who’ve opted into detailed metrics sharing.

What we're learning

The most popular papers reflect who's using the tool. Right now, most Asta users are AI researchers (with biomedical scientists as the next biggest group). Asta’s top five most cited papers are seminal natural language processing studies from the past 10 years. Further, looking at Asta’s top-cited papers in Medicine and Materials Science, we see that several of the top-ranked papers in those fields are also AI-related.

[Tables of Asta's top-cited papers: Overall, Medicine, Materials Science]


Naturally, as with any metric, if these rankings became a target for researchers, Goodhart’s law would apply. That is to say, if researchers started optimizing their work to get cited by AI systems such as Asta rather than focusing on genuine scientific contribution, the citation counts would lose their meaning as indicators of impact, and we would have to consider ways to prevent such gaming.

Are we making the "rich get richer"? One worry about AI literature tools is that they might keep recommending the same famous papers over and over, while overlooking equally good work from lesser-known researchers. This concern echoes broader research on preferential attachment and cumulative advantage in citation networks, where highly-cited papers tend to accumulate even more citations over time.

The data suggests the opposite is happening. The table below shows that Asta's citations are less skewed toward the most popular papers than citations from human authors are. For papers cited both by Asta and by at least one paper published in 2025, we compared their citation counts from Asta with their citation counts from papers published in 2025 to date, as recorded in Semantic Scholar. At higher percentiles—representing the most highly-cited work—the gap widens dramatically: human authors are far more likely than Asta to cite the most prominent papers.

| Percentile rank | Share of Asta citations (%) | Share of citations from 2025 papers (%) |
|---|---|---|
| 90th | 34.0 | 63.0 |
| 95th | 23.4 | 51.4 |
| 99th | 9.2 | 30.6 |
| 99.5th | 6.2 | 24.1 |

For example, the top 10% most-cited papers account for 34.0% of all Asta citations, compared to 63.0% of citations from papers published in 2025. The disparity is even more pronounced at the top 1%: these highly-cited papers receive 30.6% of citations from papers but only 9.2% from Asta, more than a threefold difference.
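
For the curious, here's one way such concentration shares can be computed from per-paper citation counts. This is a minimal sketch under our own assumptions: each paper has a prominence ranking (by citation count) and a per-stream citation count, and the array names are hypothetical; the exact definitions behind the released numbers may differ in detail.

```python
import numpy as np

def top_share(popularity: np.ndarray, stream: np.ndarray, pct: float) -> float:
    """Share of one citation stream going to papers at or above
    the given popularity percentile.

    popularity[i]: citation count used to rank paper i's prominence.
    stream[i]:     citations paper i received from the stream being
                   measured (e.g. Asta, or papers published in 2025).
    """
    threshold = np.percentile(popularity, pct)
    top = popularity >= threshold
    return float(stream[top].sum() / stream.sum())

# Usage: top_share(s2_counts, asta_counts, 90) would give the
# "top 10% of papers receive X% of Asta citations" figure, given
# the appropriate per-paper count arrays.
```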

This suggests that Asta may actually be helping surface relevant but less famous work that human researchers typically overlook. Rather than amplifying existing inequalities in scientific recognition, Asta appears to be distributing its citations more evenly based on relevance to users' questions, not on a paper's existing popularity.

Of course, this doesn't mean citation inequality isn't a concern—it just means that, at least in this dataset, we're not seeing evidence that AI tools are making it worse. One caveat is that Asta's citations reflect not only how the system works, but also the queries that users pose. Under a different query workload, we may see different patterns.

The retraction problem. Asta occasionally cites papers that have been retracted (pulled from the scientific record because of errors or misconduct). Important recent reporting has drawn attention to this fact—a number of AI systems built to aid in scientific research inadvertently cite retracted papers. 

We investigated this phenomenon in Asta by checking our citations log against the RetractionWatch database, which tracks problematic papers**. Out of roughly five million citations, 5,448 went to retracted papers. That's only 0.11%, meaning it's rare—but not zero. Even occasionally surfacing retracted content could lead to mistakes, so we're working hard on fixing this by flagging or removing retracted papers from our results.

**We used the 2025/09/25 snapshot of the database.
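
A simplified version of this check might look like the following sketch. We assume a CSV snapshot of the RetractionWatch database and a list of cited DOIs; the column name is an assumption about the export format, not something specified here.

```python
import csv

def load_retracted_dois(snapshot_path: str) -> set:
    """Load DOIs of retracted papers from a RetractionWatch CSV snapshot.
    The column name is an assumption about the export format."""
    with open(snapshot_path, newline="", encoding="utf-8") as f:
        return {
            row["OriginalPaperDOI"].strip().lower()
            for row in csv.DictReader(f)
            if row.get("OriginalPaperDOI")
        }

def retraction_rate(cited_dois: list, retracted: set) -> tuple:
    """Return (count, fraction) of citations pointing to retracted papers."""
    hits = sum(1 for doi in cited_dois if doi.strip().lower() in retracted)
    return hits, hits / len(cited_dois)

# e.g. retraction_rate(all_cited_dois, load_retracted_dois("rw_snapshot.csv"))
```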

What this doesn’t solve—yet

Being transparent about citations is a good start, but it doesn’t solve the whole problem. The remaining challenge is trickier: AI models are trained on trillions of words from millions of sources, and every single piece of text leaves a tiny fingerprint on the AI model's parameters—the billions of numerical values that determine how it thinks and responds. Right now, we don't have a foolproof way to trace which training materials influenced a particular AI’s answers.

Our citation data only captures papers that Asta explicitly references in its reports—it doesn't account for all the knowledge baked into the AI model's parameters during training. This is a hard, unsolved technical problem.

The research community is pursuing several promising approaches to tackle this. These include mathematical techniques based on influence functions, new architectures like our FlexOlmo that use mixture-of-experts designs to explicitly separate contributions from different data sources, and tools like our OLMoTrace that let users quickly find the training text most similar to an AI’s output. 
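
As a toy illustration of the last idea, matching an output span back to training text can be done by indexing training documents by their n-grams. This drastically simplifies what OLMoTrace actually does (efficient exact-match search over the full training corpus), but it shows the flavor of the approach.

```python
from collections import defaultdict

def build_ngram_index(docs: dict, n: int = 8) -> dict:
    """Map every n-gram of n tokens to the IDs of training docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            index[tuple(tokens[i:i + n])].add(doc_id)
    return index

def trace_output(output: str, index: dict, n: int = 8) -> set:
    """Return training docs sharing at least one n-gram with the model output."""
    tokens = output.split()
    matches = set()
    for i in range(len(tokens) - n + 1):
        matches |= index.get(tuple(tokens[i:i + n]), set())
    return matches
```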

Together with the wider AI community, we’ll continue to refine and develop new approaches to properly credit the data that shapes AI behavior.

Looking ahead

The future of AI should include credit where credit is due, as others have long argued. Ours is just a first step toward a bigger vision: a world where people who create content can share their work freely, confident that they'll get credit when AI systems use it and that this credit will be publicly tracked across those systems’ answers.

We hope other AI companies will follow suit and release similar data—imagine if we could see which scientific papers are most influential across all the major AI systems. It would give us an unprecedented view of what's shaping scientific understanding in real time.

Want to explore the data yourself? We'll be updating it regularly here, and we'd love to hear your thoughts on what other features would be useful.