Satellite and UAV Imagery for Flood Mapping and Damage Assessment in Mozambique using Machine Learning
Time: Thu 2025-06-12 09.30
Location: D3, Lindstedtsvägen 5, Stockholm
Video link: https://kth-se.zoom.us/j/67206163625
Language: English
Subject area: Geodesy and Geoinformatics, Geoinformatics
Doctoral student: Manuel Nhangumbe, Geoinformatik
Opponent: Professor Alfonso Vitti, University of Trento, Trento, Italy
Supervisor: Professor Yifang Ban, Geoinformatik
QC 20250523
Abstract
Floods are becoming increasingly frequent and impactful worldwide, with their severity intensifying due to climate change. This growing threat has made all countries more vulnerable to natural disasters. Over the past few decades, Mozambique has been particularly affected by several tropical cyclones (TCs). In 2019, following the devastation caused by TCs Idai and Kenneth, Mozambique became the first country in southern Africa to be struck by two cyclones in the same rainy season. In 2023, it was hit twice by the same cyclone, TC Freddy, which was also the longest-lasting tropical cyclone on record.
Given the extent of the damage caused by such events, there is an urgent need for efficient and cost-effective methods to map both flooded and flood-prone areas. These methods are essential for aiding local authorities in disaster preparedness, planning, and impact mitigation. Moreover, they play a vital role in providing information that supports evidence-based decision-making for sustainable development. Several remote sensing (RS) approaches have been proposed for post-flood assessment, including those based on machine learning (ML) and deep learning (DL). While effective, these approaches often require large amounts of annotated data and are typically task-specific, limiting their scalability and adaptability, especially in data-scarce regions.
In this study, we investigate the use of multi-temporal Sentinel-1 (S1) Synthetic Aperture Radar (SAR) and Sentinel-2 (S2) Multi-Spectral Instrument (MSI) data, along with other data sources, to develop scalable, cost-effective, and computationally efficient methods for near real-time flood mapping and flood damage assessment (DA) in Mozambique. Additionally, we explore the use of Geo-Foundation Models (GFMs) on small datasets for flood mapping and DA, including ML-based alternatives to DL approaches.
As such, three approaches for flood mapping are proposed. The first is a fully automated method for near real-time flood mapping, utilizing multi-temporal S1 data acquired over Beira municipality and the Macomia district. It identifies flooded areas by computing the difference between images acquired before and after the flooding event, followed by Otsu’s thresholding method for automatic flood area extraction. The second approach employs both supervised and unsupervised ML methods, such as Support Vector Machines (SVM) and K-Means clustering, leveraging a dataset provided by DrivenData, which was launched as part of a competition for flood mapping using SAR data. This dataset, based on S1, includes VH and VV imagery and labeled data from 13 countries worldwide. By harnessing the processing capability of the Google Earth Engine (GEE) platform, both approaches are presented as alternatives to traditional DL methods due to their cost-effectiveness and low computational power requirements. The third approach involves fine-tuning a GFM, named Clay, on the DrivenData dataset for the task of flood mapping. Foundation Models (FMs) refer to models that are pre-trained on broad datasets, typically using large-scale self-supervision, and can be adapted (e.g., fine-tuned) for a wide range of downstream tasks. Clay was initially pre-trained for segmentation, classification, and biomass information extraction using a variety of sensors such as S1, S2, and Landsat. These models are reshaping how traditional ML and DL approaches are trained, significantly reducing the amount of time and data required for training while maintaining high standards of result quality.
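The first approach (image differencing followed by Otsu's thresholding) can be illustrated with a minimal, pure-Python sketch. It is not the thesis implementation, which runs on GEE. The function names, the 64-bin histogram, the sign convention (post-event minus pre-event backscatter in dB), and the rule that flooded pixels are those whose backscatter dropped below the threshold are all assumptions made for illustration.

```python
from typing import List, Sequence


def otsu_threshold(values: Sequence[float], bins: int = 64) -> float:
    """Return the threshold that maximizes between-class variance (Otsu)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # constant input: no meaningful split exists
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        # Clamp so that the maximum value falls into the last bin.
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))

    best_var, best_t = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var = between_var
            best_t = lo + (i + 1) * width  # upper edge of bin i
    return best_t


def flood_mask(pre_db: Sequence[float], post_db: Sequence[float]) -> List[bool]:
    """Flag pixels whose backscatter dropped sharply after the event.

    Assumption: open water lowers SAR backscatter, so flooded pixels
    have a strongly negative (post - pre) difference.
    """
    diff = [post - pre for pre, post in zip(pre_db, post_db)]
    threshold = otsu_threshold(diff)
    return [d < threshold for d in diff]
```

With flattened pre- and post-event pixel lists of equal length, `flood_mask(pre_db, post_db)` returns one boolean per pixel. On real imagery the same logic would normally be applied to speckle-filtered rasters rather than raw lists.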
Furthermore, we explored the use of S2 MSI data to generate a land cover (LC) map of the study area and estimate the percentage of flooded areas within each LC class. The results demonstrate that the combination of S1 and S2 data is a reliable approach for near real-time flood mapping and damage assessment. Using the first approach, we automatically mapped flooded areas with an overall accuracy of about 87–88% and kappa of 0.73–0.75. The second approach also produced satisfactory results, revealing that VH polarization and the combination of VV+VH performed better than using VV polarization alone. Specifically, in Cambodia and Bolivia, VH polarization yielded Intersection over Union (IoU) values ranging from 0.819 to 0.856. Predictions for Beira using VH imagery resulted in an IoU of 0.568, which represents a reasonable outcome. The third approach achieved an IoU exceeding 0.92 and an F1-score above 0.96, outperforming the winning DL solution from the DrivenData competition, which attained an IoU of 0.8072 when the dataset was initially released.
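As a rough illustration of how the percentage of flooded area per LC class can be derived by overlaying a flood mask on a land cover map, consider the sketch below. The function name and the flattened per-pixel inputs are hypothetical, and it assumes every pixel covers the same ground area, so that pixel counts stand in for area.

```python
from collections import Counter
from typing import Dict, Sequence


def flooded_share_by_class(
    land_cover: Sequence[str], flooded: Sequence[bool]
) -> Dict[str, float]:
    """Percentage of pixels flagged as flooded within each land cover class.

    Assumption: all pixels have equal area, so counts are proportional to area.
    """
    totals: Counter = Counter()
    wet: Counter = Counter()
    for lc_class, is_flooded in zip(land_cover, flooded):
        totals[lc_class] += 1
        if is_flooded:
            wet[lc_class] += 1
    return {c: 100.0 * wet[c] / totals[c] for c in totals}
```

Classes with no flooded pixels are reported as 0.0 rather than omitted, which keeps the output aligned with the full list of LC classes.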
The LC classification was validated by randomly collecting over 600 points for each LC class, achieving an overall accuracy of 90–95% with a kappa value of 0.80–0.94. These results enabled us to identify areas prone to flooding and regions where floodwaters recede more quickly, providing valuable insights for improved planning. Additionally, we determined the percentage of flooded LC categories such as Agriculture, Mangrove, and Built-up areas, as their destruction has significant implications for food security and socio-economic development.
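For readers less familiar with the validation metrics quoted here, the following sketch shows how overall accuracy and Cohen's kappa are conventionally computed from a confusion matrix built on validation points. The function name and the example matrix are illustrative assumptions, not figures from the thesis.

```python
from typing import List, Sequence, Tuple


def overall_accuracy_and_kappa(cm: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Compute overall accuracy and Cohen's kappa from a square confusion matrix.

    cm[i][j] is the number of validation points of reference class i
    that were assigned to class j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    if n == 0:
        raise ValueError("confusion matrix is empty")
    observed = sum(cm[i][i] for i in range(k)) / n
    row_totals: List[int] = [sum(cm[i]) for i in range(k)]
    col_totals: List[int] = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa
```

For a hypothetical two-class matrix `[[45, 5], [10, 40]]`, the observed agreement is 0.85 and the chance agreement is 0.50, giving a kappa of 0.70.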
Furthermore, to obtain more detailed insights into the damage in Beira, we deployed Clay for the task of Building Damage Classification (BDC), fine-tuning it on the EDDA dataset. The EDDA dataset, released in 2023, consists of geo-referenced drone imagery captured in Beira after TC Idai. The fine-tuned model achieved a validation IoU of 0.829, compared with a validation IoU of 0.567 for a U-Net implementation.
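The IoU and F1 figures reported in this abstract follow the usual definitions for binary masks, which can be sketched as below. The function name and the convention of returning 1.0 when both masks are empty are assumptions for illustration.

```python
from typing import Sequence, Tuple


def iou_and_f1(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """IoU and F1 for the positive class of two equally long binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    union = tp + fp + fn
    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0, 1.0
    iou = tp / union
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1
```

For a single class the two metrics are tied by F1 = 2·IoU / (1 + IoU), so an IoU of 0.92 corresponds to an F1 of roughly 0.958, in line with the pairing of values reported for the third flood mapping approach.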
Therefore, the contribution of this thesis lies in providing practical, data-efficient solutions that enhance local disaster management capabilities and community resilience. We have demonstrated that while ML methods are efficient and cost-effective for near real-time flood mapping, particularly when combined with Sentinel data, GFMs offer improved accuracy (even with a small dataset), albeit with slightly higher computational requirements.