SatlasPretrain Models: foundation models for satellite and aerial imagery
Piper Wolters / April 23, 2024
The remote sensing field spans a wide variety of applications, from monitoring deforestation and its drivers, to assessing bicycle and pedestrian infrastructure in urban areas, to detecting vessels in marine protected areas. Each use case requires specific imagery types and labels, and often means training a new model from scratch. For remote sensing tasks to be solved efficiently and accurately, it is critical that large-scale pre-trained models are open and available for others to use as a starting point for their research and applications.
Open Code, Models, and Demos
We release large-scale pre-trained geospatial models that can be fine-tuned on downstream tasks, leading to faster training and improved performance compared to training from other initializations. The model weights and source code for SatlasPretrain models are released on GitHub, with comprehensive documentation and examples. These models are released under ODC-BY.
The satlaspretrain-models pip package can be used to easily import SatlasPretrain models into your codebase. Additionally, we have integrated some of the SatlasPretrain backbone models into the popular geospatial package, torchgeo, for seamless integration into many geospatial workflows.
We provide 1) a demo showing how to fine-tune a SatlasPretrain model on the EuroSAT classification task using the satlaspretrain-models pip package, and 2) a torchgeo demo that walks through how to load SatlasPretrain weights into a model, download a dataset, initialize a trainer, and fine-tune a model on the UCMerced classification task.
Large-Scale Pre-training
SatlasPretrain models were trained on SatlasPretrain, a large-scale dataset built specifically for pre-training remote sensing models. This dataset covers 50x more of the Earth than the previous largest dataset (fMoW), and spans different seasonal conditions. It consists of a comprehensive collection of Sentinel-2, Sentinel-1, Landsat-8/9, and aerial imagery, along with 302 million labels. The labels are spread across seven task modalities and over a hundred unique tasks, including land cover segmentation, crop type classification, and building detection.
This work was published at the International Conference on Computer Vision (ICCV) 2023.
SatlasPretrain contains diverse images and labels.
SatlasPretrain Model Architecture
The SatlasPretrain model architecture consists of three main components: backbone, feature pyramid network, and prediction head.
For models trained on multi-image input, the backbone is applied to each individual image, and then max pooling is applied in the temporal dimension, i.e., across the multiple aligned images. Single-image models take an individual image as input.
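The multi-image path described above can be sketched in a few lines of NumPy. This is only an illustration of the pooling step, not the released implementation: `toy_backbone` is a hypothetical stand-in (2x2 average pooling) for a real Swin or ResNet backbone, and the array shapes are assumptions chosen for the example.

```python
import numpy as np


def toy_backbone(image: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a real backbone: 2x2 average pooling.

    image has shape (channels, height, width); height and width must be even.
    """
    c, h, w = image.shape
    return image.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def encode_image_series(images: np.ndarray) -> np.ndarray:
    """Encode a series of aligned images into a single feature map.

    images has shape (time, channels, height, width).
    """
    # Apply the same backbone to each image independently.
    per_image = np.stack([toy_backbone(img) for img in images], axis=0)
    # Max pool over the temporal axis (axis 0), collapsing the series.
    return per_image.max(axis=0)


rng = np.random.default_rng(0)
series = rng.random((4, 3, 8, 8))  # 4 aligned 3-channel images, 8x8 pixels
features = encode_image_series(series)
print(features.shape)  # (3, 4, 4)
```

Because the maximum is taken element-wise across time, the output has the same shape as the features of a single image, so the later stages of the model do not need to know how many images were in the series.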
The satlaspretrain-models package allows you to load a pre-trained backbone, or a pre-trained backbone plus feature pyramid network, along with randomly initialized prediction heads for various task types. The available backbones include Swin-v2-Base, Swin-v2-Tiny, ResNet50, and ResNet152.
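As a rough sketch of what loading looks like, the helper below wraps the package call in a function. The names `Weights`, `get_pretrained_model`, `Head.CLASSIFY`, and the identifier `Sentinel2_SwinB_SI_RGB` are written from memory of the project README and should be treated as assumptions to check against the installed version; `load_satlas_classifier` itself is a hypothetical helper, not part of the package.

```python
def load_satlas_classifier(num_categories: int,
                           model_identifier: str = "Sentinel2_SwinB_SI_RGB"):
    """Return a pre-trained backbone + FPN with a fresh classification head.

    The package API used here is assumed from the project README; verify it
    against the version you have installed.
    """
    # Imported lazily so this sketch can be imported without the package
    # installed (pip install satlaspretrain-models).
    import satlaspretrain_models

    # Assumed: Weights() manages downloading the released checkpoints.
    weights_manager = satlaspretrain_models.Weights()
    return weights_manager.get_pretrained_model(
        model_identifier,
        fpn=True,  # also load the pre-trained feature pyramid network
        head=satlaspretrain_models.Head.CLASSIFY,  # randomly initialized head
        num_categories=num_categories,
    )
```

Under these assumptions, `load_satlas_classifier(num_categories=10)` would fetch the checkpoint on first use and return a model ready for fine-tuning on a ten-class task such as EuroSAT.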
SatlasPretrain Model Architecture.
Foundation Models for Multiple Sensors
Imagery produced by different sensors has varying resolutions, styles, and noise, so it is important to match the modality of a downstream task to a foundation model trained on the same modality. We train and release pre-trained models with SatlasPretrain labels on each of the following image modalities: Sentinel-2, Sentinel-1, Landsat-8/9, and aerial imagery.
The four image modalities supported by the SatlasPretrain dataset and the SatlasPretrain foundation models.
SatlasPretrain foundation models have already proven useful in producing high-quality, global geospatial outputs for the Satlas website, such as tree cover density and solar farm polygons.
We hope that these accessible models will lead to faster training and improved performance for a variety of downstream remote sensing applications. In the near future, we will be building more high-quality models for problems in the earth monitoring and climate change space, and researching ways to improve the efficiency and performance of remote sensing models in general.