Data-Driven Graphical Modelling and Applications in Public Transportation
Time: Fri 2025-01-17 10.00
Location: F3 (Flodis), Lindstedtsvägen 26 & 28, Stockholm
Video link: https://kth-se.zoom.us/j/67216916457
Language: English
Subject area: Transport Science, Transport Systems
Doctoral student: Qi Zhang, Transportplanering
Opponent: Associate Professor Carlos M. Lima Azevedo, Department of Technology, Management and Economics, Technical University of Denmark
Supervisor: Docent Zhenliang Ma, Transportplanering; Professor Erik Jenelius, Transportplanering, Centrum för transportstudier, CTS
QC 20241203
Abstract
Efficient public transportation is crucial for reducing traffic congestion, cutting carbon emissions, and ensuring fair access to jobs and services. Modern technology now gives access to large amounts of public transport data, including passenger movements, vehicle trajectories, and other sensor-generated information. The knowledge hidden in this data has significant potential to enhance transportation planning, operations, and control. However, effectively representing and organizing such data, and extracting useful information from it to address public transportation issues, remains challenging.
Graphical models have gained significant attention for their strengths in data representation, knowledge interconnection, and the visualization of complex structures. Knowledge graphs and causal graphs are two distinct types of graphical models that are widely applied in various domains (e.g., social network analysis, drug discovery, and recommendation systems). Knowledge graphs are good at organizing and connecting massive amounts of data and knowledge, revealing complex relationships, and enabling knowledge mining and inference (answering 'what' and 'how' questions). Causal graphs are powerful for identifying and analyzing causal relationships, allowing for a deeper understanding of the underlying mechanisms that drive observed data patterns (answering 'why' questions).
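To make the distinction concrete, the sketch below stores a tiny knowledge graph as (head, relation, tail) triples that can be queried for 'what' questions. It also stores a tiny causal graph as a map from each variable to its direct causes, whose ancestors give candidate answers to 'why' questions. All station names, relations, and delay variables are hypothetical, and this is an illustrative toy rather than the data structures used in the thesis.

```python
# Hypothetical knowledge-graph facts as (head, relation, tail) triples.
triples = [
    ("StationA", "frequent_trip_to", "StationB"),
    ("StationA", "frequent_trip_to", "StationC"),
    ("StationB", "on_line", "Line1"),
    ("StationC", "on_line", "Line2"),
]

def query(triples, head, relation):
    """Return all tails linked to `head` by `relation` (a 'what' question)."""
    return [t for h, r, t in triples if h == head and r == relation]

# Hypothetical causal graph: each variable maps to its direct causes (parents).
causal_parents = {
    "passenger_load": [],
    "dwell_time": ["passenger_load"],
    "headway_deviation": [],
    "arrival_delay": ["dwell_time", "headway_deviation"],
}

def ancestors(graph, node):
    """Collect all direct and indirect causes of `node` (candidate 'why' answers)."""
    seen = set()
    stack = list(graph[node])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph[parent])
    return seen

print(query(triples, "StationA", "frequent_trip_to"))  # ['StationB', 'StationC']
print(sorted(ancestors(causal_parents, "arrival_delay")))
# ['dwell_time', 'headway_deviation', 'passenger_load']
```

The triple store answers lookups about what is connected to what. The causal graph instead encodes directed cause-effect links, so tracing parents upward explains where a delay may originate.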
This thesis proposes two data-driven graphical models (a knowledge graph and a causal graph) and explores their applications in public transportation. It constructs a mobility knowledge graph to represent and organize mobility data and to mine travel patterns between stations, and it validates the graph's value in trip destination inference and user-station attention estimation. Then, to gain a deeper understanding of transportation operations, the thesis develops causal discovery models for static data that infer causal relationships and generate causal graphs for analysing the variables causing bus delays. Based on the causal graph, it quantifies the contribution of each variable while accounting for the causal relationships, supporting the development of targeted strategies to mitigate delays. Finally, the thesis develops a time series causal discovery model to understand bus delay propagation patterns and effects within the public transportation system from a system perspective.
Papers I and II focus on data organization and knowledge inference: they construct a mobility knowledge graph (MKG) and explore its applications in public transportation. Paper I introduces the concept of the MKG and proposes a framework for constructing it from smart card data, capturing spatiotemporal travel patterns between stations with both rule-based and neural network-based decomposition methods. It validates the MKG framework and demonstrates its value in inferring trip destinations from tap-in records alone. Paper II explores another transportation application, proposing a method to estimate the 'real' user-station attention from partially observed station visit count data. It uses the MKG to capture latent spatiotemporal travel dependencies between stations, which enhances the estimation by addressing missing values and cold-start problems. The framework is validated with both synthetic and real-world data, demonstrating the value of the MKG in user-station attention estimation.
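As a rough illustration of how station-to-station travel patterns can support destination inference from tap-in records, the sketch below builds a weighted origin-destination graph from complete historical trips. It then predicts the destination of a tap-in-only record as the most frequent destination from the same origin in the same hour. This frequency heuristic is an assumption chosen for illustration. It is not the rule-based or neural network-based decomposition methods of Paper I, and the trip records and station names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical historical trips with both tap-in and tap-out:
# (origin station, destination station, tap-in hour).
history = [
    ("S1", "S2", 8), ("S1", "S2", 8), ("S1", "S3", 8),
    ("S1", "S3", 17), ("S1", "S3", 17), ("S2", "S1", 17),
]

def build_od_graph(trips):
    """Map (origin, hour) to a Counter of destinations; counts act as edge weights."""
    graph = defaultdict(Counter)
    for origin, destination, hour in trips:
        graph[(origin, hour)][destination] += 1
    return graph

def infer_destination(graph, origin, hour):
    """Predict the most frequent destination for a tap-in-only record, or None if unseen."""
    counts = graph.get((origin, hour))  # .get does not create empty entries
    if not counts:
        return None  # cold start: no history for this origin and hour
    return counts.most_common(1)[0][0]

od_graph = build_od_graph(history)
print(infer_destination(od_graph, "S1", 8))   # S2
print(infer_destination(od_graph, "S1", 17))  # S3
print(infer_destination(od_graph, "S3", 9))   # None
```

The `None` branch marks the cold-start case mentioned above, where a pure frequency lookup has nothing to offer.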
Papers III-VI focus on causal graphs and their applications in public transportation. Before conducting the causal analysis of bus delays, Paper III presents an empirical study examining the heterogeneous effects of various factors on bus arrival delays. Paper IV focuses on operational variables and develops causal discovery methods for static data to analyse the variables causing bus delays, evaluating their performance from both statistical data fitting and causal interpretation perspectives. It identifies the optimal causal discovery method for analysing the causes of bus delays. Building on the causal graph generated in Paper IV, Paper V develops a causality-based Shapley value approach that quantifies the contribution of each variable to bus delays to support efficient transportation decision-making. The results are cross-validated against conventional models (e.g., regression models) to reveal the differences between correlation-based and causality-based analysis approaches. Moreover, Paper VI develops a time series causal discovery model that infers causal relationships between bus stops and generates a spatiotemporal delay propagation causal graph from bus stop delay time series. It then applies complex network theory to analyse bus delay propagation patterns and effects within the public transportation system.
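For readers unfamiliar with Shapley values, the sketch below computes exact, standard (non-causal) Shapley values by enumerating every coalition of variables. Each variable receives its weighted average marginal contribution, $\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,[v(S \cup \{i\}) - v(S)]$. A causality-based variant such as the one in Paper V would additionally define the value function $v$ using the causal graph, which is not reproduced here. The variable names and the delay value function are invented for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition that excludes each player."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = Fraction(0)
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

# Hypothetical value function: delay (minutes) explained by a coalition of
# variables, including an interaction between dwell time and passenger load.
def delay_explained(s):
    v = 0
    v += 2 if "dwell_time" in s else 0
    v += 1 if "headway_deviation" in s else 0
    v += 1 if "passenger_load" in s else 0
    v += 2 if {"dwell_time", "passenger_load"} <= s else 0
    return v

players = ["dwell_time", "headway_deviation", "passenger_load"]
phi = shapley_values(players, delay_explained)
print({k: int(v) for k, v in phi.items()})
# {'dwell_time': 3, 'headway_deviation': 1, 'passenger_load': 2}
```

By the efficiency property, the values sum to the value of the full coalition (6 here), and the interaction term is split equally between `dwell_time` and `passenger_load`. Exact enumeration grows exponentially with the number of variables, so larger problems typically rely on sampling approximations.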