How do researchers actually use AI-powered science tools? Lessons from 250,000+ queries
February 27, 2026
Ai2
What do researchers actually do with AI-powered science tools? Turns out, their habits aren’t always in line with what agentic tool developers – including our own Asta team – expect. Our new open dataset of over 258,000 real researcher queries reveals that scientists aren't just using AI to search or synthesize; they're rewriting the rules of what search even means, submitting queries seven times longer than traditional searches, navigating results out of order, and importing tricks they've learned from general-purpose chatbots into tools that were never designed for them. The gap between how these tools were built and how researchers actually use them is wide—and, for the builders of AI tools, instructive.
Today we’re releasing the Asta Interaction Dataset (AID)—258,935 queries and 432,059 clickstream interactions from researchers using Asta, our AI-powered research assistant integrated with Semantic Scholar (S2). Collected over six months (February–August 2025) from users across dozens of disciplines, AID captures not just what researchers ask, but how they engage with the results: which sections they expand, which citations they click, which reports they revisit days later, and so on.
To our knowledge, this is the largest open dataset of how researchers interact with AI-powered scientific tools. Prior reports on AI tool usage – from Anthropic, OpenAI, Perplexity, and others – share only aggregate analyses without the underlying data. Existing public conversation datasets like LMSYS-Chat-1M, WildChat, and OpenAssistant contain general-purpose LLM conversations, but none are specific to scientific research tools or include rich clickstream signals. We’re releasing the full query text, interaction logs, and a reusable query taxonomy because we believe the community needs shared, open data to make progress on understanding how researchers actually use these tools.
In this post, we walk through a few of the things we found.
Asta: Two AI-powered research interfaces
Asta is an open research assistant platform integrated with S2, a major academic search engine. It exposes two AI-powered interfaces:
- PaperFinder (PF): An AI-enhanced literature search tool that returns a ranked list of papers with lightweight LLM-generated synthesis. (In Asta, this powers the Find papers feature.)
- ScholarQA (SQA): A scientific question-answering tool that produces structured, multi-section reports with inline citations, essentially an automated literature summary generated on demand. (In Asta, this powers the Generate a report feature.)
Both tools use retrieval-augmented generation (RAG) over a scholarly corpus, grounding all claims in retrieved papers via inline citations. As a baseline, we also compare against traditional S2 keyword search.
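To make the grounding pattern concrete, here is a deliberately minimal sketch of retrieve-then-cite: rank a toy corpus by token overlap with the query, then attach an inline citation marker to each retrieved passage. The corpus, scoring function, and output format are illustrative only; Asta's actual retrieval and generation pipelines are far more sophisticated.

```python
# Toy corpus standing in for a scholarly index; IDs are made up.
TOY_CORPUS = [
    {"corpus_id": 101, "text": "Transformers use self-attention to model long-range dependencies."},
    {"corpus_id": 202, "text": "Retrieval-augmented generation grounds model outputs in retrieved documents."},
    {"corpus_id": 303, "text": "Citation graphs link papers through their references."},
]

def retrieve(query, corpus, k=2):
    """Rank documents by simple token overlap with the query."""
    q_tokens = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda d: len(q_tokens & set(d["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

# Each retrieved claim is emitted with an inline citation marker,
# mirroring the grounding both PF and SQA enforce.
hits = retrieve("how does retrieval-augmented generation ground outputs", TOY_CORPUS)
for h in hits:
    print(f"{h['text']} [{h['corpus_id']}]")
```

The key property is that every surfaced claim carries a pointer back into the retrieved set, so a reader can verify it against the source paper.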
A note on privacy: We take protecting user data very seriously. In Asta, users can choose to share their de-identified interactions for inclusion in public research datasets—our study draws exclusively from users who opted in. For these opted-in interactions, we use hashed report identifiers with no user IDs, and remove queries flagged by an LLM as containing PII (less than 1%).
Queries are longer, more complex, and more demanding
Users of AI-powered tools submit dramatically longer and more complex queries compared to those submitted to traditional academic search engines:
| Metric | PaperFinder | ScholarQA | Semantic Scholar (traditional) |
|---|---|---|---|
| Avg. constraints per query | 0.60 | 0.82 | 0.15 |
| Avg. entities per query | 4.00 | 5.14 | 2.25 |
| Avg. relations per query | 2.17 | 2.68 | 1.20 |
| Avg. query length (words) | 17.04 | 36.96 | 5.35 |
SQA queries are seven times longer than traditional S2 searches. And this isn’t just verbosity: queries contain more entities, more relationships, and more explicit constraints.
Interestingly, even traditional S2 queries have gotten more complex between 2022 and 2025: average query length grew from 4.8 to over 6 words, and the fraction of queries with at least one constraint rose from 7% to 10%. This suggests users increasingly expect search systems to handle more complex queries, likely shaped by their exposure to AI-powered tools.
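The length metric above is easy to reproduce on any query log. Here is a minimal sketch for the average-words-per-query computation; the sample queries are illustrative, not drawn from AID, and the entity, relation, and constraint counts in the table require an annotation model and aren't shown.

```python
# Illustrative queries; the real ones live in AID's released Parquet files.
sample_queries = {
    "s2_keyword": ["transformer attention mechanisms", "graph neural networks survey"],
    "scholarqa": [
        "What methods exist for grounding LLM-generated scientific claims "
        "in retrieved literature, and how do they compare on citation accuracy?"
    ],
}

def avg_query_length(queries):
    """Average whitespace-delimited word count across a list of queries."""
    return sum(len(q.split()) for q in queries) / len(queries)

for source, queries in sample_queries.items():
    print(f"{source}: {avg_query_length(queries):.1f} words")
```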
What are researchers actually asking for?
To make sense of this query diversity, we developed a new taxonomy covering query intents, phrasing styles, and search criteria types. We constructed it through an iterative process in which human reviewers and an LLM alternated passes of inspection and labeling. The distribution of these labels reveals that users treat AI research tools as collaborative partners in the research process rather than just as systems that help them explore the literature.
Beyond keywords: what researchers actually type into AI tools
Some of the most revealing findings came from simply reading what users type into the search box. Beyond the standard taxonomy, we found query patterns that show users probing the boundaries of what AI research tools can do. These behaviors reflect phrasing strategies shaped by general-purpose LLMs:
| Pattern | Tool | Example Query | Why It’s Interesting |
|---|---|---|---|
| Template Filling | PF | “fill this tabel with 10 jurnal bellow:…” [table template with citations] | Users paste structured templates (tables, forms) and expect the AI to populate them with literature data—treating the research tool as a data entry assistant. |
| Template Filling | SQA | “for sacubitril find all: ‘IUPAC Name: CAS Number: Molecular Formula:…’” [15+ fields] | Users submit structured extraction tasks with 15+ fields, expecting the tool to act as a fact-extraction pipeline over the literature. |
| Explicit Prompting | SQA | “You are an expert research assistant specializing in computational geosciences and machine learning.” | Users apply prompt engineering techniques (system prompts, persona assignment) learned from general-purpose LLMs, even though our tool doesn’t support custom system prompts. |
| Explicit Prompting | PF | “Find papers…The model must be capable of…” | Users add markdown-style emphasis (bold, caps) to stress constraints, revealing expectations shaped by conversational AI. |
| Persona Adoption | SQA | “Think of yourself as experienced professor…Please write me a phd proposal…devour Turnitin detection bots” | Some users ask the tool to adopt an expert persona and even attempt to circumvent plagiarism detection—a behavior shaped by general-purpose LLM interactions. |
| Collaborative Writing | SQA | “I’m working on my paper…” [LaTeX section] “add papers from TSE, TOSEM, ICSE” | Users paste their in-progress LaTeX manuscripts and ask the tool to find and insert citations from specific venues—using it as a collaborative writing partner. |
| Research Lineage | PF | “What are latest advances in research fields of these three papers?” [3 DOIs] | Users paste DOIs and ask the tool to trace the research lineage forward, treating it as a citation graph explorer. |
| Refinding | PF | “hey whats the name of the paper that did a study on how people use llms by allowing the public to use their tokens on paid llms…” | Users describe half-remembered papers in conversational language, using the tool as a “tip-of-my-tongue” paper finder—a task traditional search handles poorly. |
| Refinding | PF | “…paper using BERT that says we cant just look at top-k…which paper says this” | Users recall a specific claim from a paper they read before and ask the tool to identify the source—a sophisticated citation recovery task. |
These patterns reveal a key insight: users expect AI research tools to function as collaborative research partners with capabilities similar to general-purpose chatbots. They bring habits from general-purpose LLMs – such as prompt engineering, persona assignment, template filling, and collaborative writing – into a domain-specific platform. Some of these imported behaviors raise obvious concerns—the dataset includes queries that attempt to circumvent plagiarism detection. We include them because understanding how users actually behave, not just how we hope they behave, is the point.
How users engage with results
We also analyzed what users do after submitting a query. The engagement patterns differ sharply from traditional search.
Results as persistent artifacts
One of our most striking findings is that users treat AI-generated outputs as persistent artifacts rather than ephemeral search results. Over 50% of SQA users and 42% of PF users revisit previous reports—substantially more than the rate of near-duplicate query submission (~19% and ~15%, respectively). Users come back to their results hours or days later, suggesting they bookmark and reference these outputs as part of their ongoing research workflow. This has direct implications for how we think about generated content: if users are returning to these outputs, we need better ways to help them manage and build on past reports and, more critically, mechanisms for keeping them current as new literature appears.
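Comparing revisits against near-duplicate resubmissions requires some notion of query similarity. The exact measure we used is described in the technical report; as a rough stand-in, here is a simple character-level similarity check built on Python's standard library, with a hypothetical threshold of 0.9.

```python
from difflib import SequenceMatcher

def is_near_duplicate(q1, q2, threshold=0.9):
    """Illustrative near-duplicate check: normalized character-level
    similarity. AID's actual de-duplication method may differ."""
    q1, q2 = q1.lower().strip(), q2.lower().strip()
    return SequenceMatcher(None, q1, q2).ratio() >= threshold

# A lightly edited resubmission still counts as a near-duplicate.
print(is_near_duplicate(
    "deep learning for protein folding",
    "Deep learning for protein folding?",
))
```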
Non-linear reading in SQA
SQA’s structured report format enables rich, non-linear reading behaviors. We found that:
- Users skip the introduction 43% of the time, jumping directly to sections of interest
- Over 52% of reports involve non-consecutive section expansions
- Users frequently navigate backwards and return to earlier sections from later ones
This supports our design choice of collapsible sections with TL;DR summaries, which lets users efficiently triage which sections deserve deeper reading. It also suggests that future designs would benefit from supporting non-linear navigation, rather than assuming sequential consumption.
The dataset: A resource for the community
We’re publicly releasing AID to give the community the shared, open data it needs to study how researchers use AI tools. Here’s what makes it distinctive:
- Scale: 258,935 queries and 432,059 clickstream interactions from a 6-month period (Feb–Aug 2025)
- Rich interaction signals: Beyond query text, the dataset includes section expansions, link clicks, evidence clicks, report section titles, cited paper IDs, and shown search result positions—enabling analysis of the full user interaction journey, not just queries
- Domain focus: Unlike broad-domain datasets (LMSYS-Chat, WildChat, etc.), AID is specifically from researchers using scientific tools, making it directly relevant for studying AI-assisted research workflows
- Open taxonomy: Alongside the dataset, we release our full query intent taxonomy with definitions and examples, providing a reusable framework for classifying queries to AI research assistants
AID is released as six Parquet files (queries, section expansions, S2 link clicks, report section titles, report corpus IDs, and PF shown results). If you're building AI tools for researchers, this data will probably surprise you. It surprised us.
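Getting started should be a matter of pointing pandas at the files. The file names below are placeholders for the six tables listed above; check the actual release for the exact names and schemas.

```python
from pathlib import Path

import pandas as pd

# Hypothetical file names for AID's six Parquet tables -- consult the
# dataset release for the real names and column schemas.
AID_FILES = [
    "queries.parquet",
    "section_expansions.parquet",
    "s2_link_clicks.parquet",
    "report_section_titles.parquet",
    "report_corpus_ids.parquet",
    "pf_shown_results.parquet",
]

def load_aid(data_dir="."):
    """Load whichever AID tables are present in data_dir, keyed by table name."""
    tables = {}
    for name in AID_FILES:
        path = Path(data_dir) / name
        if path.exists():
            tables[name.removesuffix(".parquet")] = pd.read_parquet(path)
    return tables

tables = load_aid()
print(f"Loaded {len(tables)} of {len(AID_FILES)} tables")
```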
Read our technical report for additional details.