Introducing Atlantes: the first AI-powered GPS model for real-time global scale maritime intelligence

April 28, 2025

Henry Herzog, Yawen Zhang, Patrick Beukema, Josh Hansen - Ai2

Introducing Atlantes

Atlantes is a powerful new suite of AI models that use GPS data to analyze vessel behavior in real time. It's been the backbone of Skylight, our platform for maritime intelligence and marine conservation, since its release in Fall of 2024, and today we're open-sourcing it so others can learn from and build on it. We have released the model weights alongside the training and inference code, as well as the exact container image that is currently used in production. Atlantes is highly efficient and cost-effective to host, processing over 5B GPS messages/day on just 5 T4 GPUs.

Motivation

It is easy to forget that most of the planet is made up of water. The vast majority of goods are transported by sea, and nearly half of the planet depends on the oceans for food. A reliable, intricate, and massive web of shipping activity keeps the planet well fed, well stocked, and moving. But our current use of the ocean is not sustainable and we are putting the health of our oceans at risk. To effectively conserve and sustainably manage this critical resource, we must maintain a watchful eye on maritime activity.

Communication on the high seas

Ships are in constant communication with one another, using an onboard broadcasting system called the Automatic Identification System (AIS). Developed in the 1990s, AIS enables ships to navigate safely, even in poor visibility. Because it is autonomous and continuous, there is a steady stream of GPS coordinates available from nearly every vessel in the world. If you want to understand what is happening on the high seas, AIS provides a rich and indispensable source of information. But interpreting these data is not an easy task. The sheer volume of data, more than 120M unique messages/day, is beyond human analysis. On top of the volume, the sequences are irregular and noisy. These GPS messages are recorded by satellites, transmitted back to Earth, then copied to our servers for inference, all in the span of minutes from every location on the planet.

To convert all this data into actionable information, expert analysts today have to carefully examine weeks or even months of a vessel's history across many thousands of GPS messages. This process often involves extensive manual inspection of the GPS features (speed, proximity to shore, even pictures of the vessel if they are available). Atlantes is a system of GPS transformers that attempts to mimic this expert analysis. These models were trained via supervision against dozens of expert analysts' classifications across many millions of individually labeled GPS messages.

Modeling strategy

For many applications, people want to understand vessel behavior as it happens; they don’t want to know what happened hours, days, or weeks ago. From a modeling perspective, the challenge is to reliably interpret a vessel’s behavior with minimal information (ideally just the last GPS message and a small amount of the historical trajectory) so it can all be processed in near real time.

GPS sequences are not like natural language. On the one hand, the vocabulary is much simpler. A message is composed of a latitude (float), a longitude (float), and a timestamp (float/int). These data are implicitly vectorized and therefore can be modeled without complex embedding strategies. However, GPS sequences are also very noisy and often filled with irregular gaps. Imagine reading a book with words, sentences, or even entire paragraphs redacted.
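To make the "implicitly vectorized" point concrete, here is a minimal sketch of a message as three numbers and of a track as a list of numeric rows. The names `GpsMessage` and `to_vectors` are hypothetical and are not taken from the Atlantes code. Keeping the time gap as an explicit feature is one assumed way to expose irregular sampling to a model, not a description of how Atlantes does it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GpsMessage:
    """One position report with the three raw fields named in the text."""
    lat: float        # degrees north, -90..90
    lon: float        # degrees east, -180..180
    timestamp: float  # seconds since the Unix epoch


def to_vectors(track: List[GpsMessage]) -> List[Tuple[float, float, float]]:
    """Turn a time-ordered track into (lat, lon, seconds_since_previous) rows.

    The first row gets a gap of 0.0 because it has no predecessor.
    """
    rows = []
    prev_t = track[0].timestamp if track else 0.0
    for msg in track:
        rows.append((msg.lat, msg.lon, msg.timestamp - prev_t))
        prev_t = msg.timestamp
    return rows


# Illustrative data only: two closely spaced fixes, then a six-hour gap.
track = [
    GpsMessage(40.0, -30.0, 0.0),
    GpsMessage(40.0, -29.9, 600.0),
    GpsMessage(40.0, -29.0, 22200.0),
]
vectors = to_vectors(track)
gaps = [row[2] for row in vectors]
```

Here the third row carries a gap of 21,600 seconds, so the "redacted paragraph" shows up as a number rather than being smoothed away by resampling.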

AIS was specifically designed for collision avoidance (“Hey I am right here”); it was not designed to precisely communicate a specific complex behavior (“Hey, I just ran out of fuel, and I am drifting without control”). Also, unlike natural language, the white space between words varies dramatically even within a single sentence. At the same time, variability between successive tokens is minimal. Vessels at time t are typically doing the same thing they were doing at time t-1. Consider a vessel that is transiting the Atlantic. For many days it will be traveling in what is essentially a straight line at a constant speed and heading. Contrast this with language, where words vary significantly across a sentence and there is (hopefully) little repetition. Written as a sentence, a GPS sequence would read more like: “The The The The quick quick quick quick brown brown brown brown fox fox fox fox fox”.

Transiting sequences, like that of the cargo ship crossing the Atlantic above, are straightforward to interpret without sophisticated modeling; a simple speed filter is sufficient. However, consider the complexity of the entire population of GPS sequences across hundreds of vessel categories (e.g. fishing, rescue, ferry, sailboats). If your goal is to distinguish a complex behavior like fishing from sailing or surveying with minimal latency, machine learning and AI become indispensable.
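As a rough illustration of what such a speed filter might look like, the sketch below computes the average speed of each leg from raw (latitude, longitude, Unix seconds) fixes and calls a track "transiting" when every leg stays above a cut-off. The 8-knot threshold and the function names are assumptions chosen for the example; they do not come from Atlantes or Skylight.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0      # mean Earth radius
KM_PER_NAUTICAL_MILE = 1.852
TRANSIT_MIN_KNOTS = 8.0       # hypothetical cut-off, for illustration only


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    h = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def speed_knots(p, q):
    """Average speed between two (lat, lon, unix_seconds) fixes, in knots."""
    hours = (q[2] - p[2]) / 3600.0
    if hours <= 0:
        raise ValueError("fixes must be in strictly increasing time order")
    return haversine_km(p[0], p[1], q[0], q[1]) / KM_PER_NAUTICAL_MILE / hours


def is_transiting(track, min_knots=TRANSIT_MIN_KNOTS):
    """True if the track has at least one leg and every leg meets the cut-off."""
    if len(track) < 2:
        return False
    return all(speed_knots(p, q) >= min_knots for p, q in zip(track, track[1:]))


# Illustrative tracks: hourly fixes along latitude 40 N.
cargo = [(40.0, -30.0 + 0.3 * i, 3600.0 * i) for i in range(5)]     # steady eastward run
drifting = [(40.0, -30.0 + 0.01 * i, 3600.0 * i) for i in range(5)]  # barely moving
```

With these made-up tracks, `is_transiting(cargo)` is true and `is_transiting(drifting)` is false. A rule this simple cannot separate fishing from surveying or sailing, which is the gap the learned model is meant to fill.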

We found modern transformers are well suited to this task, with some modifications to handle the noise and irregularity described above. The ATLAS architecture that makes up our Atlantes suite of models consists of three key components: continuous point embedding (CPE) layers, 1D CNN layers, and a transformer encoder. Unlike previous GPS sequence modeling architectures, there is no need for any interpolation or feature engineering. The model can learn the desired patterns from the raw sequences of AIS data. This makes the model relatively simple to maintain from an engineering perspective and, from a research perspective, more easily adaptable to GPS problems in other domains.

Scaling the training data

Given the lack of large-scale GPS annotation libraries, we built our own for labeling vessel behaviors. We developed annotation software and hired two dozen expert maritime analysts capable of classifying GPS sequences. In total, their work resulted in over 15 million labeled GPS messages with extremely high precision and attention to detail.

Example of labeling GPS messages. Atlantes can process 28 activity classifications per second.

Running efficiently at global scale

It is computationally inefficient and wasteful to run inference on every single GPS message from every single vessel. To reduce computational load, we only run the full model inference when we detect a potential change in behavior. We detect these changes via a CPU-friendly, extremely lightweight statistical model based on change point detection of the vessel’s speed and course over ground. If we detect a change, or if the number of messages since the last inference has surpassed a maximum threshold (n=50 messages), then we run inference on the most recent message received. That way, we can limit the number of inferences to only those that matter, while still preserving low-latency classification of the most critical information (changepoints).
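The gating logic can be sketched as a small per-vessel object that decides, message by message, whether the full model should run. The post does not specify the statistical detector, so the rule below is a stand-in: it flags a change when speed departs from a short rolling mean or when course jumps between consecutive messages. The class name, window length, and both thresholds are assumptions; only the 50-message cap comes from the text.

```python
from collections import deque
from statistics import fmean

MAX_MESSAGES = 50  # from the text: force an inference after 50 messages


def course_diff_deg(a, b):
    """Smallest absolute angle between two courses, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


class InferenceGate:
    """Per-vessel gate; a simplified stand-in for the real change detector."""

    def __init__(self, speed_jump_knots=3.0, course_jump_deg=45.0, window=5):
        self.speed_jump_knots = speed_jump_knots  # hypothetical threshold
        self.course_jump_deg = course_jump_deg    # hypothetical threshold
        self._speeds = deque(maxlen=window)       # recent speeds form the baseline
        self._last_course = None
        self._since_inference = 0

    def update(self, speed_knots, course_deg):
        """Record one message; return True if the full model should run."""
        self._since_inference += 1
        changed = False
        # Speed check: only once the baseline window is full.
        if len(self._speeds) == self._speeds.maxlen:
            changed = abs(speed_knots - fmean(self._speeds)) > self.speed_jump_knots
        # Course check: compare against the previous message, wrapping at 360.
        if self._last_course is not None:
            changed = changed or (
                course_diff_deg(course_deg, self._last_course) > self.course_jump_deg
            )
        if changed:
            self._speeds.clear()  # start a fresh baseline after a change
        self._speeds.append(speed_knots)
        self._last_course = course_deg
        if changed or self._since_inference >= MAX_MESSAGES:
            self._since_inference = 0
            return True
        return False


# Illustrative stream: 60 steady messages, then the vessel slows and turns.
gate = InferenceGate()
stream = [(12.0, 90.0)] * 60 + [(2.0, 200.0)] * 10
fired = [i for i, (s, c) in enumerate(stream, start=1) if gate.update(s, c)]
```

On this made-up stream the gate fires twice: once at message 50 because of the cap, and once at message 61 when speed and course change together. The other 68 messages never reach the GPU model.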

The lightweight design of the model, just 4.7M parameters, makes it computationally efficient and suitable for deployment on modest hardware. We run the entire system on just 5 T4s and 20 CPUs. This size also enabled extensive unit and integration testing built into continuous integration and continuous deployment, which is typically impractical for larger deep learning-based models. ATLAS has already been deployed for several months, processing the full volume of AIS data (more than 110 million messages per day).
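For a sense of scale, the figures above can be turned into per-second rates with a few lines of arithmetic. This is a back-of-the-envelope sketch that treats "more than 110 million" as exactly 110 million and assumes traffic is spread evenly over the day. The inference floor is approximate: it assumes every vessel's 50-message counter keeps filling and ignores change-triggered inferences, which only add to the total.

```python
SECONDS_PER_DAY = 24 * 60 * 60

messages_per_day = 110_000_000  # lower bound quoted in the text
max_gap_messages = 50           # inference is forced after 50 messages
gpu_count = 5                   # T4 GPUs quoted in the text

# Average ingest rate if traffic were perfectly uniform.
messages_per_second = messages_per_day / SECONDS_PER_DAY

# Approximate floor on full-model inferences: one per 50 messages.
min_inferences_per_second = messages_per_day / max_gap_messages / SECONDS_PER_DAY

# The same floor, shared evenly across the GPUs.
min_inferences_per_gpu_second = min_inferences_per_second / gpu_count
```

Under these assumptions the system ingests about 1,273 messages per second, while the forced-inference floor is only about 25 model calls per second in total, or roughly 5 per GPU.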

From research to impact

To achieve high performance, it was critical to design the model in close cooperation with a diverse network of expert analysts who depend on this information for decision-making and action. We iterated and refined this model in a staging environment in close collaboration with those experts before we shipped it to production. This has already proven invaluable. In one case, it helped Argentinian authorities intercept a foreign vessel engaged in suspected illegal fishing within minutes of detection. The vessel was fined, marking the first time Argentina has issued an international penalty against a foreign vessel using AI-powered evidence.

This model is one component in a complex and global effort to monitor the planet's oceans for sustainability and conservation. Real-time insights require classifying behavior with very limited information, and care must be taken to ensure that the intelligence we provide is both accurate and worth taking action on. We hope that by open-sourcing the models, we can enable other researchers and environmentalists to better understand the strengths and limitations of this approach and also foster more widespread adoption of maritime transparency.

For more technical details about the model and machine learning strategy, you can check out the links below:
