Multi-Sensor Remote Sensing for Urban Mapping and Change Detection Using Deep Learning

Time: Fri 2024-12-13 09.00

Location: D37, Lindstedtsvägen 5, Stockholm

Video link: https://kth-se.zoom.us/j/65114181594

Language: English

Subject area: Geodesy and Geoinformatics, Geoinformatics

Doctoral student: Sebastian Hafner, Geoinformatik

Opponent: Professor Paolo Gamba, University of Pavia, Italy

Supervisor: Professor Yifang Ban, Geoinformatik

QC241126

Abstract

Driven by rapid population growth, urbanization is progressing at an unprecedented rate in many places around the world. Earth observation (EO) has become a vital tool for monitoring urbanization on a global scale. Modern satellite missions, in particular, provide new opportunities for urban mapping and change detection (CD) through high-resolution imagery and frequent revisits. These missions have enabled multi-modal approaches that integrate data from different satellites, such as Sentinel-1 Synthetic Aperture Radar (SAR) and Sentinel-2 MultiSpectral Instrument (MSI). Concurrently, EO data analysis has evolved from traditional machine learning methods to deep learning (DL) models, particularly Convolutional Neural Networks (ConvNets). However, current DL methods for urban mapping and CD face several challenges: reliance on large labeled datasets for supervised training, limited transferability of DL models across geographic regions, effective integration of multi-modal EO data, and the use of satellite image time series (SITS) for CD. To address these challenges, this thesis aims to develop novel DL methods for robust urban mapping and CD using multi-source EO data.

First, a semi-supervised learning (SSL) method is introduced, leveraging multi-modal Sentinel-1 SAR and Sentinel-2 MSI data to improve the geographic transferability of urban mapping models. This method employs a dual-stream ConvNet architecture to map built-up areas separately from SAR and optical images. Assuming that both modalities should yield consistent maps, the method introduces an unsupervised loss that penalizes discrepancies between the two maps on unlabeled data. Extensive evaluation using annotations from the SpaceNet 7 multi-temporal building monitoring dataset demonstrates that this SSL approach (F1 score 0.694) outperforms several supervised approaches (F1 scores ranging from 0.574 to 0.651). Furthermore, it produces built-up area maps that rival or surpass global human settlement maps like GHS-BUILT-S2 and WSF 2019.
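The cross-modal consistency idea can be illustrated with a minimal sketch. The abstract does not state the exact form of the unsupervised loss, so the mean squared difference used here is an assumption, and the function name `consistency_loss` and the flattened-list representation of the probability maps are hypothetical.

```python
def consistency_loss(p_sar, p_opt):
    """Mean squared difference between two built-up probability maps.

    p_sar, p_opt: flattened per-pixel built-up probabilities in [0, 1],
    one predicted from the SAR stream and one from the optical stream.
    The squared-difference form is an assumption made for illustration.
    """
    if len(p_sar) != len(p_opt) or not p_sar:
        raise ValueError("maps must be non-empty and of equal size")
    return sum((a - b) ** 2 for a, b in zip(p_sar, p_opt)) / len(p_sar)


# Two streams that agree incur no penalty on an unlabeled patch ...
agree = consistency_loss([0.2, 0.8], [0.2, 0.8])
# ... while disagreement on one of two pixels is penalized.
disagree = consistency_loss([1.0, 0.0], [0.0, 0.0])
print(agree, disagree)
```

Because this term needs no reference labels, it can in principle be evaluated on imagery from regions without annotations, which is what makes it usable for improving geographic transferability.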

For urban CD, a new network architecture is proposed for fusing bi-temporal Sentinel-1 SAR and Sentinel-2 MSI image pairs. This architecture uses a dual-stream design to process each modality through separate ConvNets before combining the extracted features at a later stage. The proposed strategy outperforms other ConvNet-based approaches with both uni-modal and multi-modal data. Additionally, it achieves state-of-the-art (SOTA) performance on the Onera Satellite CD dataset (F1 score 0.600).
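The F1 scores reported throughout this abstract follow the standard definition, the harmonic mean of precision and recall, F1 = 2TP / (2TP + FP + FN). A minimal sketch for binary change masks is given below; the function name `f1_score`, the flattened-list input, and the convention of returning 0.0 when there are no positives at all are assumptions for illustration, not details taken from the thesis.

```python
def f1_score(pred, ref):
    """F1 score for two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, r in zip(pred, ref) if p and r)        # true positives
    fp = sum(1 for p, r in zip(pred, ref) if p and not r)    # false positives
    fn = sum(1 for p, r in zip(pred, ref) if not p and r)    # false negatives
    denom = 2 * tp + fp + fn
    # Assumed convention: no predicted and no reference positives gives 0.0.
    return 2 * tp / denom if denom else 0.0


# Toy example with 5 pixels: TP = 2, FP = 1, FN = 1, so F1 = 4 / 6.
predicted_change = [1, 1, 0, 0, 1]
reference_change = [1, 0, 0, 1, 1]
print(f1_score(predicted_change, reference_change))
```

Since changed pixels are typically a small minority in urban CD, F1 on the change class is more informative than overall accuracy, which would be dominated by unchanged pixels.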

Building on this, a second network architecture was developed to adapt the transferability improvement approach for urban CD. This approach uses bi-temporal Sentinel-1 SAR and Sentinel-2 MSI image pairs and outputs urban changes using a difference decoder while mapping built-up areas with a semantic decoder. As in the urban mapping method, inconsistencies in built-up area maps across modalities are penalized on unlabeled data. Evaluation on the SpaceNet 7 dataset, enhanced with Sentinel-1 SAR and Sentinel-2 MSI data, shows that the method performs well under limited label conditions, achieving an F1 score of 0.555 with all available labels, and delivering reasonable CD results (F1 score of 0.491) even with only 10% of the labeled data. In contrast, supervised multi-modal methods and SSL methods using optical data failed to exceed an F1 score of 0.402 under this condition.

A third urban CD method focuses on detecting changes in consecutive images of SITS (i.e., continuous urban CD). This method introduces a temporal feature refinement module that uses self-attention to enhance ConvNet-based multi-temporal representations of buildings. Additionally, a multi-task integration module employing Markov networks is proposed to generate optimal building map time series based on segmentation and dense change outputs. The proposed method effectively identifies urban changes in high-resolution SITS from PlanetScope (F1 score 0.551) and Gaofen-2 (F1 score 0.440), demonstrating superior performance compared to bi-temporal and multi-temporal urban CD and segmentation methods on two challenging datasets.
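To make the role of self-attention along the time axis concrete, the following sketch applies scaled dot-product self-attention to the feature vectors of a single pixel across T acquisition dates. It is not the thesis's temporal feature refinement module: the learned query, key, and value projections of a real attention layer are replaced by the identity, and the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(features):
    """Refine a pixel's time series of feature vectors.

    features: list of T vectors, each with D floats (one per timestamp).
    Queries, keys, and values are the raw features themselves (identity
    projections), an assumption made to keep the sketch self-contained.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    refined = []
    for query in features:
        # Similarity of this timestamp to every timestamp in the series.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        weights = softmax(scores)
        # Each refined feature is a weighted average over all timestamps.
        refined.append([sum(w * value[j] for w, value in zip(weights, features))
                        for j in range(dim)])
    return refined


# Three timestamps, two feature channels; the last date differs from the rest.
series = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
for vector in temporal_self_attention(series):
    print([round(v, 3) for v in vector])
```

Each output vector mixes information from all dates, weighted by similarity, so a timestamp's representation is informed by the rest of the series rather than by a single image.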

Finally, the thesis develops a baseline network for multi-hazard building damage detection using the xBD dataset, which contains bi-temporal images captured before and after natural disasters. The study examines model transferability across disaster types by employing a comprehensive dataset split and proposes incorporating disaster-specific information into the baseline model to account for disaster-specific damage characteristics. The disaster-adaptive model demonstrates improved generalization to unseen events compared to several competing methods.

This thesis addresses key challenges in urban mapping and urban CD, including multi-hazard building damage detection. By advancing methods that leverage multi-sensor EO data and DL techniques, this thesis makes major contributions to timely and reliable urban data production, thereby supporting sustainable urban planning and the monitoring of urban Sustainable Development Goal (SDG) indicators.

urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-356875