Mobile Phone Data Analytics to Support Disaster and Disease Outbreak Response
Time: Wed 2024-11-27 13.00
Location: Kollegiesalen, Brinellvägen 8, Stockholm
Video link: https://kth-se.zoom.us/j/67011152784
Language: English
Subject area: Geodesy and Geoinformatics, Geoinformatics
Doctoral student: Silvino Pedro Cumbane, Geoinformatik
Opponent: Professor John Östh, Oslo Metropolitan University
Supervisor: Professor Yifang Ban, Geoinformatik; Associate Professor Gyözö Gidofalvi, Geoinformatik
Abstract
Natural disasters result in devastating losses in human life, environmental assets, and personal, regional, and national economies. The availability of diverse big data, such as satellite images, Global Positioning System (GPS) traces, mobile Call Detail Records (CDR), and social media posts, in conjunction with advances in data analytic techniques (e.g., data mining and big data processing, machine learning and deep learning), can facilitate the extraction of geospatial information that is critical for rapid and effective disaster response. However, disaster response system development usually requires the integration of data from different sources (streaming data sources and data sources at rest) with different characteristics and types, which consequently have different processing needs. Deciding which processing framework to use for a specific type of big data and a given task is usually a challenge for researchers in the disaster management field. While many tasks can be accomplished with population and movement data, a key and arguably the most important task in disaster management is to analyze the displacement of the population during and after a disaster. Therefore, in this thesis, the knowledge and framework resulting from a literature review were used to select tools and processing strategies for analyzing population displacement (the forced movement or relocation of people from their original homes) after a disaster. This analysis serves as a use case of the framework and illustrates the value and challenges (e.g., gaps in data due to power outages) of using CDR data analysis to support disaster management.
Displaced populations were inferred by analyzing the change in the home cell tower of each anonymized mobile phone subscriber before and after a disaster using CDR data. The effectiveness of the proposed method was evaluated using remote sensing-based building damage assessment data and Displacement Tracking Matrix (DTM) data from individuals’ survey responses at shelters after a severe cyclone in Beira city, central Mozambique, in March 2019. The results show an encouraging correlation coefficient (over 0.7) between the number of arrivals in each neighborhood estimated using CDR data and the number reported in the DTM. In addition, CDR-based analysis derives the spatial distribution of displaced populations with high coverage of people, i.e., including not only people in shelters but everyone who used a mobile phone before and after the disaster. Moreover, the results suggest that population displacement can be estimated whenever CDR data are available after a disaster. These estimates can inform response activities and, for example, help reduce waterborne diseases (e.g., diarrheal disease) and diseases associated with crowding (e.g., acute respiratory infections) in shelters and host communities.
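The home cell-tower comparison described above can be sketched as follows. This is only an illustration under stated assumptions, not the method used in the thesis: the record layout (subscriber, hour of day, tower), the night-time window, and the rule "home tower = most frequently used night-time tower" are all hypothetical choices made for the example.

```python
from collections import Counter, defaultdict

# Assumed heuristic: activity between 20:00 and 05:59 indicates the home location.
NIGHT_HOURS = set(range(20, 24)) | set(range(0, 6))


def infer_home_towers(records):
    """Map each subscriber to the tower used most often at night.

    `records` is an iterable of (subscriber_id, hour_of_day, tower_id)
    tuples; this layout is hypothetical.
    """
    counts = defaultdict(Counter)
    for subscriber, hour, tower in records:
        if hour in NIGHT_HOURS:
            counts[subscriber][tower] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def displaced_subscribers(before, after):
    """Return {subscriber: (old_home, new_home)} for subscribers who are
    observed in both periods and whose inferred home tower changed."""
    home_before = infer_home_towers(before)
    home_after = infer_home_towers(after)
    return {
        s: (home_before[s], home_after[s])
        for s in home_before.keys() & home_after.keys()
        if home_before[s] != home_after[s]
    }


def arrivals_per_tower(moves):
    """Count displaced subscribers arriving at each new home tower."""
    return Counter(new for _, new in moves.values())
```

Subscribers who appear in only one of the two periods are ignored here, which mirrors the point that only people who used a phone both before and after the disaster can be covered. Aggregating arrivals from towers to neighborhoods would need an additional tower-to-neighborhood mapping that is not shown.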
Although COVID-19 is not a post-disaster disease, it is an acute respiratory illness that can be severe. Assuming that its characteristics are similar to those of an acute respiratory infection following a disaster, a deep learning approach was tested to predict the spread of COVID-19. The tested approach consists of a multilayer bidirectional long short-term memory (BiLSTM) network. To train the model to predict daily COVID-19 cases in low-income countries, mobility trend data from Google, temperature, and relative humidity were used. The performance of the proposed multilayer BiLSTM was evaluated by comparing its root mean squared error (RMSE) with that of a multilayer LSTM (with the same settings as the BiLSTM) in four developing countries, namely Mozambique, Rwanda, Nepal, and Myanmar. The proposed multilayer BiLSTM outperformed the multilayer LSTM in all four countries. It was also evaluated by comparing its RMSE with that of multilayer LSTM models and of ARIMA- and stacked LSTM-based models in eight countries, namely Italy, Turkey, Australia, Brazil, Canada, Egypt, Japan, and the UK. Finally, the proposed multilayer BiLSTM model was evaluated at the city level by comparing its average relative error (ARE) with that of four other models, namely the LSTM-based model considering multilayer architecture, Google Cloud Forecasting, the LSTM-based model with mobility data only, and the LSTM-based model with mobility, temperature, and relative humidity data, over seven periods (of 28 days each) in six highly populated regions in Japan, namely Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka. The proposed multilayer BiLSTM model outperformed the multilayer LSTM model and other previous models by up to 1.6 and 0.6 times in terms of RMSE and ARE, respectively. Therefore, the proposed model enables more accurate forecasting of COVID-19 cases. This can support governments and health authorities in their decisions, mainly in developing countries with limited resources.
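The two error measures used in the comparisons above can be sketched as follows. RMSE follows its standard definition. The abstract does not define ARE, so the form below (mean of absolute error divided by the observed value, skipping days with zero observed cases) is an assumption for illustration only.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def average_relative_error(actual, predicted):
    """Mean of |actual - predicted| / actual.

    Assumed definition: days with zero observed cases are skipped
    to avoid division by zero.
    """
    if len(actual) != len(predicted):
        raise ValueError("sequences must be of equal length")
    ratios = [abs(a - p) / a for a, p in zip(actual, predicted) if a != 0]
    if not ratios:
        raise ValueError("no non-zero observations to evaluate")
    return sum(ratios) / len(ratios)
```

Lower values are better for both measures, so two forecasting models can be compared by computing each measure on the same held-out period.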
In addition to understanding the disease spread dynamics, rapid implementation of control measures is critical in the case of a post-disaster outbreak, as it is crucial to stopping the spread of the disease. However, such implementation depends on informed decisions. Therefore, in order to support decision-makers, a data-driven approach for estimating the spatio-temporal exposure risk of locations using mobile phone data was tested. The approach used anonymized CDR from one of the biggest mobile network operators in Mozambique to estimate the daily origin-destination (OD) matrices. The daily OD matrices were estimated at province level since the available daily COVID-19 cases (validation data) are at that level. COVID-19 was used as a proxy for a post-disaster disease due to the unavailability of daily real-world data on a disease following a natural disaster in Mozambique. The estimated daily OD matrices were then used to construct daily directed weighted networks, in which the nodes represent provinces and the edges represent the people flowing between each pair of provinces. Then, three centrality measures, namely weighted in-degree centrality, improved in-degree centrality, and weighted PageRank, were used to estimate the daily exposure risk of each province. The results were evaluated by computing the Spearman’s rank correlation between the risk score estimated using the daily reported COVID-19 cases and the exposure risk estimated using the three measures. The comparison revealed that, overall, the weighted PageRank algorithm is the best of the three measures at estimating exposure risk. Accordingly, three Poisson regression models were implemented to model the relationship between the COVID-19 cases in each province and the corresponding exposure risk estimated using the three centrality measures. The results showed that the coefficients of the models estimated using the maximum likelihood method are statistically significant (p-value <0.05).
This means that the exposure risk does in fact influence the number of COVID-19 cases. Since the coefficients of the models are positive, we conclude that the number of COVID-19 cases in each province increases with an increase in the spatial exposure risk. The analysis was also conducted at district level in the Greater Maputo Area (GMA), which is located in the southern part of Mozambique and consists of all Maputo city districts (except Kanyaka), Matola city, and the Matola-Rio, Boane, and Marracuene districts. However, due to the unavailability of daily COVID-19 cases at district level, the evaluation was done by comparing the daily exposure risk estimated using the three centrality measures with the distribution of different types of points of interest, namely commercial, education, financial, government, healthcare, public, sport, and transport. The results revealed a good Spearman’s rank correlation between education, financial, and transport related points of interest and the three centrality measures. Government related points of interest showed the lowest correlation with the three centrality measures. The remaining types of points of interest showed medium-low to medium-high Spearman’s correlation coefficients with the three centrality measures. Therefore, anonymized CDR in conjunction with the weighted PageRank algorithm can help decision-makers estimate the exposure risk in case of an outbreak and hence reduce the impact of a disease on human lives by imposing informed interventions to contain and delay its spread.
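The centrality-based exposure risk described above can be sketched as follows for a single daily OD matrix. This is a minimal illustration under stated assumptions, not the thesis implementation: the matrix layout (od[i][j] = people flowing from location i to location j), the damping factor of 0.85, and the particular weighted PageRank variant (transition probabilities proportional to outgoing flows) are all choices made for the example. The improved in-degree centrality is not shown because the abstract does not define it.

```python
def weighted_in_degree(od):
    """Total inflow into each location, ignoring within-location flows."""
    n = len(od)
    return [sum(od[i][j] for i in range(n) if i != j) for j in range(n)]


def weighted_pagerank(od, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a directed weighted network.

    A location passes its score to destinations in proportion to its
    outgoing flows (assumed variant). Scores sum to 1.
    """
    n = len(od)
    out_strength = [sum(od[i][j] for j in range(n) if j != i) for i in range(n)]
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Score held by locations with no outflow is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if out_strength[i] == 0)
        new_rank = []
        for j in range(n):
            inflow = sum(
                rank[i] * od[i][j] / out_strength[i]
                for i in range(n)
                if i != j and out_strength[i] > 0
            )
            new_rank.append((1 - damping) / n + damping * (inflow + dangling / n))
        converged = sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol
        rank = new_rank
        if converged:
            break
    return rank
```

Ranking locations by either score gives a daily exposure-risk ordering that could then be compared with an ordering based on reported cases, for example with a rank correlation.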