Researchers are familiar with the challenge of keeping up to date with the latest publications. The density and sheer quantity of text in scientific literature can make this daunting. To address this challenge, many methods have been developed for quickly assessing the relevance of literature. One common method is skimming, where researchers glance across the pages to look for key information in figures, headings, and paragraphs.

Our latest project began by asking ourselves, "How might we help researchers to skim more effectively?"

Semantic Scholar addresses this challenge through a multi-institutional collaboration we call the Semantic Reader Project. In this project, human-computer interaction (HCI) researchers, natural language processing (NLP) experts, and user experience (UX) practitioners built both research prototypes and a rapidly maturing free scientific PDF reader called Semantic Reader, which shows scientific papers augmented by artificial intelligence (AI) features. The collaboration on Semantic Reader's new skimming feature makes for a noteworthy case study in iterative design.

From Research to Product

The skimming feature started as an internship project by Raymond Fok, a PhD student at the University of Washington, who interned at Semantic Scholar. He researched and developed Scim, an early PDF reader with automatically generated highlights. The prototype was tested across two studies, with a total of 31 participants and 458 papers available. The study results suggest that the Scim system helps participants gain a high-level understanding of papers and draws their attention to details that might otherwise be skipped. Participants noted this is particularly helpful in dense passages and with papers in unfamiliar domains. In the diary study, 70.4% of participants responded that the Scim system helped them skim the paper. These results led our team to discuss ways we might release the feature to a broader audience. For more details about the Scim system, please read Scim: Intelligent Skimming Support for Scientific Papers, published at IUI 2023.

Because of the success of this early research and design, this project was passed to our product team at Semantic Scholar. The product team is responsible for building the public-facing Semantic Reader, which is often informed by the early-stage research conducted by our interns and research colleagues. By meeting with Raymond and the other researchers who created the Scim system, we gained valuable learnings and gathered feedback on subsequent iterations.

Key learnings from the early research prototypes:

The practice of skimming during reading and common habits.
Highlight affordances: highlights with a colored background are preferred over designs that lowlight less important text or underline text.
Distribution of highlights: users preferred highlights to be spread throughout the text, rather than condensed into one area. Particular attention should be paid to the middle of the paper.
Content diversity: users have a variety of goals when reading, and therefore highlights should contain various content types such as objectives, methods, results, and novelty statements.
Customization: users expect to be able to customize the content types, frequency, and display of highlights.
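The distribution learning above can be illustrated with a small sketch. This is not the Scim or Semantic Reader implementation; it only assumes that some model has already assigned each candidate sentence a relevance score. The names Candidate and spread_highlights, and the parameters num_bins and per_bin, are hypothetical. The idea is to split the paper into equal slices and take the best candidates from each slice, so the middle of the paper is never left empty just because the introduction scored higher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    index: int    # sentence position in reading order (0-based)
    score: float  # hypothetical relevance score from some upstream model
    text: str


def spread_highlights(candidates, total_sentences, num_bins=4, per_bin=1):
    """Pick the top-scoring candidates from each equal-sized slice of the paper."""
    if total_sentences <= 0 or num_bins <= 0:
        return []
    bins = [[] for _ in range(num_bins)]
    for c in candidates:
        # Map the sentence position to a slice; clamp out-of-range positions.
        b = min(c.index * num_bins // total_sentences, num_bins - 1)
        bins[b].append(c)
    chosen = []
    for bucket in bins:
        bucket.sort(key=lambda c: c.score, reverse=True)
        chosen.extend(bucket[:per_bin])
    # Return the selection in reading order.
    return sorted(chosen, key=lambda c: c.index)
```

Raising per_bin increases the number of highlights while keeping them spread out, which is one possible way to back a "frequency" control.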

With these insights, we decided to launch an "earliest testable" version of the skimming feature: a bare-bones version designed to be developed and released quickly so that we could validate it and learn from user behavior.

Earliest Testable

In the fall of 2022, we released our first iteration, which included 9,259 of the most-viewed computer science arXiv papers in Semantic Reader. This first iteration had a total of 92,500 unique user impressions on Semantic Reader from December 5, 2022 through September 7, 2023. This increase in users and papers helped us to more accurately assess the strengths and weaknesses of the feature. In order to launch quickly, we removed support for faceted highlights, the sidebar, scrollbar visualization, and all controls besides turning the feature on or off. We followed the distribution and accuracy recommendations found in the prior work. This gave us a chance to refine the model and identify potential errors.

We also launched a survey on Semantic Reader to collect feedback about the prototype. Out of the 45 responses, 82.23% said they were Satisfied or Very Satisfied with the feature. We also saw that most users returned to use the feature again, despite it being disabled by default. Comments we received included:

Add the feature on more articles
More flexibility controlling the color and contrast of highlights
Errors such as highlights that stretch across columns of the paper

With this feedback, we decided to proceed with the feature and address the bugs and suggestions we received from the survey.

Multi-Faceted Highlights and Snippets

While this earliest release was exciting, it was only the beginning. We still had many features from the early Scim research we wanted to test. The next two features we chose to address were the multi-faceted display of highlights and the sidebar snippets.

The early research showed that common types of information readers seek include:

Objectives/Goals
Methods
Results
Novel Statements

We wanted to support readers in finding this information with automatically generated skimming highlights. After testing the new models, we found that Novel Statements often overlapped with the Objectives and Methods highlighting. Therefore, we chose to remove the Novel Statements facet, despite its success in the early research.
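One simple way to quantify this kind of overlap is sketched below. It is an illustration rather than the analysis our team ran: it assumes each facet's highlights are represented as a set of sentence ids, and the function name overlap_fraction and the example ids are hypothetical.

```python
def overlap_fraction(facet, others):
    """Share of the sentence ids in `facet` that also appear in any set in `others`."""
    if not facet:
        return 0.0
    covered = facet & set().union(*others)
    return len(covered) / len(facet)


# Made-up sentence ids for one paper, for illustration only.
novel_statements = {3, 8, 15, 21}
objectives = {3, 4}
methods = {15, 21, 30}

share = overlap_fraction(novel_statements, [objectives, methods])
```

A high share across many papers would suggest that a facet adds little beyond the others, which is the kind of signal that can justify dropping it.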

We also chose to add in this iteration a sidebar, which contained the text of each highlight. The Scim system's user research showed that several participants liked to navigate the highlights using the extracted text in the sidebar.

Designing a Flexible and Accessible Display

Although colored highlights seemed like a natural choice for communicating different types of information in the text, they could be inaccessible to individuals who use assistive technology such as screen readers, those who have difficulty seeing low-contrast visuals, or users with certain types of color blindness. Looking through the lens of inclusive design, we decided that a one-size-fits-all approach would not work for this feature.

We created a customizable interface that would allow users to adjust the display of Semantic Reader's skimming highlights based on their needs. Display adjustments we've built in include:

Adjust highlight contrast
Show margin labels that indicate type of highlight without relying on color
Control the number of highlights
Turn on/off types of highlighting
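The four adjustments above could be modeled roughly as follows. This is a minimal sketch, not Semantic Reader's actual code: the SkimmingSettings fields, the 0.0 to 1.0 contrast scale, the default values, and the (facet, score, text) tuple shape are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

FACETS = ("objective", "method", "result")


@dataclass
class SkimmingSettings:
    contrast: float = 0.5             # hypothetical scale: 0.0 (faint) to 1.0 (strong)
    show_margin_labels: bool = False  # label the facet in text, not only by color
    max_highlights: int = 20          # cap on how many highlights are shown
    enabled_facets: set = field(default_factory=lambda: set(FACETS))


def visible_highlights(highlights, settings):
    """Filter (facet, score, text) tuples, given in reading order, by the settings."""
    kept = [h for h in highlights if h[0] in settings.enabled_facets]
    # Keep the highest-scoring highlights up to the cap, then restore reading order.
    ranked = sorted(range(len(kept)), key=lambda i: kept[i][1], reverse=True)
    top = ranked[:settings.max_highlights]
    return [kept[i] for i in sorted(top)]


def render(highlight, settings):
    """Return the text for a highlight, prefixed with its facet label when enabled."""
    facet, _, text = highlight
    label = f"[{facet.capitalize()}] " if settings.show_margin_labels else ""
    return label + text
```

Keeping the label as text rather than color alone means the facet is still conveyed to readers who cannot distinguish the highlight colors.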

By focusing on accessibility, we created a set of customizable options that aim to let all users adapt the interface to their needs. These changes are reflected in the latest release of the Semantic Reader skimming system, released on October 6, 2023. The feature is available on 480,000 English-language computer science papers from arXiv. Semantic Reader is currently available only on desktop devices.

Limitations & Future Work

Using principles of accessible design and development along with usability research, we aim to make skimming as accessible to users with assistive technology as we can. Due to ongoing challenges with parsing PDFs into HTML for screen readers, Semantic Reader is not fully accessible yet. However, in its current state, screen reader users can skim the paper by reading the extracted highlighted text in the side panel.

Today, skimming is available on a subset of Semantic Reader papers (480,000 of the ~6M papers available via Semantic Reader). We'd initially planned to release it more widely but discovered that errors in our scientific PDF parsing system were far more prevalent in non-computer science papers and in PDFs generated from non-LaTeX-based sources. We are working on improving our scientific PDF parsing systems for a wider variety of papers and hope to make skimming more broadly available in the future.

We also have a full-scale usability study of the Semantic Reader underway, during which we aim to learn about how its features interact with one another, uncover usability issues, and collect additional user feedback. These results will help us assess the design and modeling changes that have been made since the early HCI studies and plan for future iterations of the feature.

We plan to continue the cross-discipline partnership in the Semantic Reader project as we continue to test new and experimental ways to help researchers read and understand scientific papers.

You can try Semantic Reader's skimming feature today by visiting our example paper, or any other of the 480,000 English-language computer science papers from arXiv available on Semantic Reader. Please note that you must be on a desktop device to use Semantic Reader.

Special thanks to the collaborators on this project: Raymond Fok, Hita Kambhamettu, Luca Soldaini, Jonathan Bragg, Kyle Lo, Andrew Head, Marti A. Hearst, Daniel S. Weld, Kelsey MacMillan, YenSung Chen, Cecile Nguyen, Eric Marsh, Huy Tran, Tyler Murray, Chloe Anastasiades, Smita Rao, Evie Cheng, Jenna Sparks, Bailey Kuehl, Erin Bransom, and Jordan Buckley.

Cassidy Trier is a Senior Product Designer at Ai2, where she designs human and AI interactions.

Case study: Iterative design for skimming support
