Generative Deep Learning in Remote Sensing for Wildfire Monitoring
Time: Thu 2025-06-12 14.30
Location: D31, Lindstedtsvägen 5, Stockholm
Video link: https://kth-se.zoom.us/j/64930192359
Language: English
Subject area: Geodesy and Geoinformatics, Geoinformatics
Doctoral student: Eric Brune, Geoinformatics
Opponent: Professor (visiting) Pedram Ghamisi, Lancaster University, England; Group Leader, Machine Learning, Helmholtz-Zentrum Dresden-Rossendorf, Freiberg, Germany
Supervisor: Professor Yifang Ban, Geoinformatics
QC 20250522
Abstract
Wildfires present escalating global risks, intensified by climate change, demanding effective monitoring strategies. While satellite remote sensing is well suited to this task, it faces limitations related to sensor capabilities. High-resolution optical sensors such as the Sentinel-2 MultiSpectral Instrument (MSI) (10-20 m) provide detailed spatial information but have infrequent revisit times of around five days and cannot see through clouds or smoke. Conversely, moderate-resolution sensors such as Terra/Aqua MODIS offer daily coverage but lack the spatial detail (250-500 m) needed for precise burned area mapping. Synthetic Aperture Radar (SAR) sensors such as Sentinel-1 provide all-weather imaging but are affected by speckle noise and complex signal interactions, making interpretation difficult. This thesis explores how generative deep learning, specifically conditional Diffusion Models (DM), can help overcome these fundamental challenges in satellite-based wildfire monitoring by synthesizing analysis-ready, high-resolution information. Diffusion models have shown proficiency in learning complex data distributions and generating high-fidelity samples, making them well suited to data synthesis and translation tasks.
The goal of this thesis is to generate high-resolution (≤30 m) optical representations of wildfires and to map burned areas, overcoming two different sensor limitations. This goal is pursued through two specific objectives. The first objective is to develop and evaluate a multi-task DM capable of fusing moderate-resolution, high-frequency optical data with high-resolution, lower-frequency optical data to generate daily, high-resolution representations of post-fire conditions, including both super-resolved imagery and burned area segmentation maps. The second objective is to design and assess a conditional DM for translating all-weather SAR data into optical-like imagery of post-fire scenes, enabling accurate downstream burned area segmentation even when actual optical data is unavailable.
To meet the first objective, a novel multi-task conditional diffusion architecture, FireSR-DDPM, was developed. It uses a U-Net structure within the Denoising Diffusion Probabilistic Model (DDPM) framework and is conditioned on post-fire MODIS imagery (Red, NIR, SWIR bands) and pre-fire Sentinel-2 MSI data. FireSR-DDPM generates both an eight-fold super-resolved post-fire image, approaching native Sentinel-2 MSI resolution, and a simultaneous burned area segmentation mask via parallel decoder paths from a shared encoder. The multi-task design enables synergistic learning: spatial detail from super-resolution aids segmentation, while semantic context from segmentation guides image generation. A feature affinity loss term additionally promotes consistency between the internal representations learned by the two decoder branches, improving the effectiveness of the joint optimization for super-resolution and segmentation. Trained and validated on 1,079 Canadian wildfire events (≥ 2,000 ha, 2017-2022) with National Burned Area Composite (NBAC) polygons as reference, FireSR-DDPM showed substantial performance improvements on a 2023 hold-out test set. It achieved high segmentation accuracy (F1 = 0.8983, IoU = 0.8153) and improved perceptual quality in super-resolution (LPIPS = 0.1134), clearly surpassing single-task and sequential baselines. The model's ability to generate multiple stochastic outputs from the same input was also used to derive empirical confidence maps for the segmentation results without requiring separate calibration.
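Two mechanisms above lend themselves to a compact illustration: the feature affinity loss coupling the decoder branches, and the confidence maps derived by averaging multiple stochastic segmentation samples. The numpy sketch below shows one plausible form of each; the function names and the exact affinity formulation are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def feature_affinity_loss(feat_sr, feat_seg):
    """Penalize disagreement between the pairwise spatial affinity
    (Gram-style) matrices of the super-resolution and segmentation
    decoder features. Both inputs have shape (C, H, W)."""
    def affinity(f):
        v = f.reshape(f.shape[0], -1)                       # (C, H*W)
        v = v / (np.linalg.norm(v, axis=0, keepdims=True) + 1e-8)
        return v.T @ v                                      # (H*W, H*W)
    a, b = affinity(feat_sr), affinity(feat_seg)
    return float(np.mean((a - b) ** 2))

def empirical_confidence(sample_masks):
    """Per-pixel confidence as the fraction of stochastic diffusion
    samples predicting 'burned' (mask value 1)."""
    return np.mean(np.stack(sample_masks, axis=0), axis=0)
```

With identical branch features the affinity loss is zero, and the confidence map reduces to a per-pixel vote fraction across samples, which is why no separate calibration step is needed.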
For the second objective, a computationally efficient transformer-based DM, Swin-U-DiT, was proposed for SAR-to-optical translation. The architecture combines a hierarchical U-Net structure with recent developments in Transformers: its building blocks integrate the efficiency of the Swin Transformer's windowed self-attention with the U-DiT concept of applying attention to spatially downsampled internal feature representations. This design considerably lowers the computational cost of self-attention compared to standard Vision Transformer approaches while retaining strong performance. Conditioned via channel-wise concatenation on pre-fire Sentinel-1 SAR (VV, VH), post-fire Sentinel-1 SAR (VV, VH), and pre-fire Sentinel-2 MSI data, Swin-U-DiT learns to generate the corresponding post-fire Sentinel-2 MSI reflectance image. Evaluated on 335 Canadian fires from 2022, Swin-U-DiT produced images with substantially higher fidelity (Fréchet Inception Distance (FID) = 44.3, LPIPS = 0.304) than a standard Pix2Pix GAN baseline. Importantly, the practical usefulness of the generated imagery was confirmed through downstream evaluation: using the Swin-U-DiT translated images as input to a fixed segmentation U-Net (pre-trained on real Sentinel-2 MSI data) improved burned area segmentation from F1 = 0.697 (using only SAR and pre-fire optical inputs) to 0.804. A key finding was the model's efficiency: this performance gain was achieved with only three DDIM sampling steps, corresponding to a processing time of under five minutes for a 250 km x 100 km scene on a single consumer-grade GPU (NVIDIA RTX 3080) and confirming its suitability for near-real-time regional monitoring.
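The few-step efficiency result hinges on DDIM's deterministic (eta = 0) update rule, which jumps between widely spaced timesteps using only the model's noise prediction. The numpy sketch below illustrates few-step DDIM sampling under an assumed cumulative-alpha schedule; `eps_model` stands in for the trained Swin-U-DiT network and `cond` mirrors, but does not implement, the channel-wise concatenated SAR/optical conditioning. All names are illustrative, not the thesis code.

```python
import numpy as np

def ddim_sample(eps_model, cond, shape, alphas_cumprod, n_steps=3, seed=0):
    """Deterministic DDIM sampling with very few steps.
    eps_model(x, cond, t) predicts the noise in x at timestep t."""
    T = len(alphas_cumprod)
    # A few evenly spaced timesteps from T-1 down to 0
    steps = np.linspace(T - 1, 0, n_steps + 1).round().astype(int)
    x = np.random.default_rng(seed).standard_normal(shape)  # start from noise
    for t, t_next in zip(steps[:-1], steps[1:]):
        a_t, a_next = alphas_cumprod[t], alphas_cumprod[t_next]
        eps = eps_model(x, cond, t)
        x0 = (x - np.sqrt(1 - a_t) * eps) / np.sqrt(a_t)    # predicted clean image
        x = np.sqrt(a_next) * x0 + np.sqrt(1 - a_next) * eps  # eta = 0 update
    return x
```

With three steps, each tile requires only three network evaluations instead of hundreds, which is what makes regional-scale near-real-time inference feasible.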
In conclusion, both research objectives were successfully addressed. The novel contributions include: (i) the integration of 8x super-resolution and segmentation within a single generative multi-task DM (FireSR-DDPM); (ii) the design of an efficient SAR-to-Optical translation architecture (Swin-U-DiT) combining principles from Swin Transformer and U-DiT within a diffusion framework; and (iii) the demonstration that high downstream task performance can be achieved with very few diffusion sampling steps, improving practical feasibility. These methods represent advancements for operational wildfire monitoring. Future work includes extending model training to diverse global biomes, incorporating sequence modeling for analyzing fire progression dynamics, and exploring model distillation for further inference speed improvements.