Towards Unsupervised, Analysable and Scalable Node Embedding Models for Transaction Networks
Time: Wed 2025-12-10 09.00
Location: F3 (Flodis), Lindstedtsvägen 26
Video link: https://kth-se.zoom.us/j/64433421713
Language: English
Subject area: Computer Science
Doctoral student: Ciwan Ceylan, Robotik, perception och lärande, RPL
Opponent: Associate Professor Davide Mottin, Aarhus University
Supervisor: Professor Danica Kragic Jensfelt, Collaborative Autonomous Systems
QC 20251118
Abstract
The ability to efficiently learn embeddings—low-dimensional vector representations of complex data—has been central to recent advances in machine learning. Network data models, which represent entities (nodes) and their relationships (edges), provide a powerful framework for studying diverse systems, from social interactions and infrastructure to molecular biology. Both research and practical applications have benefited greatly from progress in embedding learning, with node embeddings in particular enabling downstream tasks such as node classification, clustering, anomaly detection, graph alignment, and link prediction.
However, not all network types have seen equal progress. In particular, embedding models for transaction networks, which are formed by digital payments, transfers, and exchanges, remain underdeveloped despite their significant potential for applications such as financial crime detection. Several methodological challenges persist in learning node embeddings for transaction networks, because a model must capture the key modalities of these networks while also meeting essential desiderata. This thesis considers three such desiderata: models should be unsupervised, to address the lack of labelled data; analysable, to ensure interpretability in unsupervised settings; and scalable, to handle the size and complexity of real-world transaction networks.
Guided by these goals, the thesis introduces node embedding models designed to capture three essential transaction network modalities: edge flow, edge directionality, and multi-scale structure. In doing so, it provides both methodological advances and analytical insights. Four key findings are that: (i) it is possible to learn node embeddings that represent transaction flow, something not previously demonstrated; (ii) nodes that only receive transactions (so-called "sinks") degrade embedding quality, but this can be mitigated by combining directed and undirected propagation; (iii) standard message-passing methods can lead to rank deficiency, harming embedding quality, which can be resolved through a new technique called message aggregation; and (iv) embeddings can be made interpretable, with each feature corresponding to a meaningful aspect of the network.
A persistent practical challenge in transaction network research, and a major reason for its limited progress, is the scarcity of accessible datasets, owing to the security and privacy concerns surrounding financial data. This thesis circumvents this issue by focusing on the underlying methodological challenges of node embedding modelling for transaction networks. Extensive empirical evaluations are conducted both on proxy datasets, comprising communication and social networks that share the same key modalities as real-world banking data, and on publicly available cryptocurrency and simulated transaction network datasets, which enable broader validation of the proposed models.