
Learning Representations for Tandem Mass Spectra

Self-Supervised Methods and Inductive Biases

Time: Fri 2026-04-17 13.15

Location: Pascal, Gamma-6, Tomtebodavägen 23, Solna

Language: English

Subject area: Biotechnology

Doctoral student: Alfred Nilsson, Gene Technology

Opponent: Professor Björn Wallner, Linköping University

Supervisor: Lukas Käll, Gene Technology, Science for Life Laboratory (SciLifeLab), SeRC (Swedish e-Science Research Centre)


QC 2026-03-27

Abstract

Mass spectrometry (MS) is central to modern proteomics, enabling the analysis of proteins and peptides by their mass-to-charge ratio (m/z). Tandem mass spectra (MS2) record peptide fragmentation patterns and form the basis for peptide sequence identification. Database search has long dominated this task, but deep learning has opened new routes to interpreting spectra directly. This thesis investigates how neural networks can learn representations of MS2 spectra, following two complementary research directions.
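
To make "fragmentation patterns" concrete, here is a minimal Python sketch of the textbook b/y-ion arithmetic for singly charged fragments. It uses standard monoisotopic masses and ignores modifications, neutral losses and higher fragment charge states. The function names are hypothetical, and none of this is code from the thesis.

```python
# Monoisotopic residue masses in daltons (cysteine unmodified; I and L are isobaric).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, present once in the intact peptide and in every y ion
PROTON = 1.007276   # proton mass, added once per charge


def precursor_mz(peptide: str, charge: int) -> float:
    """m/z of the intact peptide ion [M + zH]^z+."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b ions (N-terminal prefixes) and y ions (C-terminal suffixes)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)           # b_i: first i residues
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # y_i: last i residues
    return b_ions, y_ions


b, y = fragment_ions("PEPTIDE")
# Adjacent b ions differ by exactly one residue mass, so the mass
# differences between peaks in a spectrum spell out the sequence.
print([round(b2 - b1, 4) for b1, b2 in zip(b, b[1:])])
```

The residue-mass spacing between peaks of the same ion series is what makes pairwise mass differences informative for sequencing.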

First, selected self-supervised pretraining strategies are evaluated in controlled downstream experiments, using encoders pretrained on unlabeled MS2 corpora. Self-distillation yields global spectrum embeddings that implicitly encode aspects of peptide chemical properties. Masked autoencoding provides modest gains in de novo sequencing, in terms of both optimization and accuracy. However, these gains fall short of the performance of state-of-the-art supervised de novo sequencing.
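
The masked-autoencoding objective can be illustrated with a small NumPy sketch. It rests on several assumptions: a spectrum is an (n_peaks, 2) array of (m/z, intensity) rows, whole peaks are hidden, zeros stand in for a learned mask token, and the loss is mean squared error on the hidden peaks only. The function names and the toy peak list are hypothetical, and the thesis's actual masking scheme, reconstruction target and loss may differ.

```python
import numpy as np


def mask_peaks(peaks: np.ndarray, mask_ratio: float, rng: np.random.Generator):
    """Hide a random subset of peaks.

    peaks: array of shape (n_peaks, 2) holding (m/z, intensity) rows.
    Returns the corrupted copy and a boolean array marking the hidden peaks.
    """
    n = peaks.shape[0]
    n_masked = max(1, int(round(mask_ratio * n)))
    hidden = np.zeros(n, dtype=bool)
    hidden[rng.choice(n, size=n_masked, replace=False)] = True
    corrupted = peaks.copy()
    corrupted[hidden] = 0.0  # stand-in for a learned [MASK] embedding
    return corrupted, hidden


def masked_reconstruction_loss(pred: np.ndarray, target: np.ndarray,
                               hidden: np.ndarray) -> float:
    """Mean squared error computed only on the hidden peaks."""
    diff = pred[hidden] - target[hidden]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
# Toy peak list (illustrative values, not real data).
spectrum = np.array([[175.119, 0.3], [262.151, 0.8], [361.219, 1.0],
                     [474.303, 0.5], [587.387, 0.2]])
corrupted, hidden = mask_peaks(spectrum, mask_ratio=0.4, rng=rng)
# A trained encoder-decoder would map `corrupted` to a prediction; here an
# untrained "model" that echoes its input scores poorly on the hidden rows.
loss = masked_reconstruction_loss(corrupted, spectrum, hidden)
```

Restricting the loss to hidden positions forces the encoder to infer missing peaks from the visible ones. This is the signal that pretraining on unlabeled corpora relies on.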

Second, we introduce Pairwise Attention, a transformer architecture that incorporates a domain-aligned relational inductive bias by conditioning attention on pairwise mass differences between peaks. This yields consistent performance improvements on standard de novo sequencing benchmarks and strong generalization across datasets.
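
One way to condition attention on pairwise mass differences is to add a bias to the attention logits, computed from the m/z gap between each pair of peaks. The NumPy sketch below assumes single-head attention, absolute differences |m/z_j - m/z_i|, and Gaussian radial-basis features centred on a few residue masses, projected to a scalar bias. The featurisation, the chosen centres and all function names are hypothetical, and the actual Pairwise Attention parameterisation may differ.

```python
import numpy as np

# Hypothetical RBF centres: residue masses of G, A, S, P, V in daltons.
CENTRES = np.array([57.02146, 71.03711, 87.03203, 97.05276, 99.06841])


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pairwise_mass_bias(mz: np.ndarray, w: np.ndarray, width: float = 0.02) -> np.ndarray:
    """Scalar attention bias for every peak pair from |m/z_j - m/z_i|.

    Each absolute difference is expanded in Gaussian radial basis functions
    centred on CENTRES, then projected to one scalar with weights w.
    """
    diff = np.abs(mz[None, :] - mz[:, None])                               # (n, n)
    rbf = np.exp(-((diff[..., None] - CENTRES) ** 2) / (2 * width ** 2))   # (n, n, k)
    return rbf @ w                                                         # (n, n)


def pairwise_attention(x, mz, Wq, Wk, Wv, w_bias):
    """Scaled dot-product attention plus an additive mass-difference bias."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    logits = q @ k.T / np.sqrt(q.shape[-1]) + pairwise_mass_bias(mz, w_bias)
    attn = softmax(logits, axis=-1)
    return attn @ v, attn


rng = np.random.default_rng(0)
n, d = 4, 8
mz = np.array([200.0, 257.02146, 328.05857, 500.0])  # toy m/z values
x = rng.normal(size=(n, d))                           # toy peak embeddings
Wq, Wk, Wv = (rng.normal(scale=1 / np.sqrt(d), size=(d, d)) for _ in range(3))
out, attn = pairwise_attention(x, mz, Wq, Wk, Wv, w_bias=np.full(len(CENTRES), 5.0))
```

In this toy example, peaks 0 and 1 are one glycine apart and peaks 1 and 2 one alanine apart, so those pairs receive a larger logit. One motivation for such a bias is that the model need not learn to compare m/z values from scratch in order to find residue-mass spacings.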

Overall, the results show that self-supervised learning can recover meaningful structure from raw MS2 data, while architectural inductive biases currently offer the most robust and reliable gains for de novo peptide sequencing.
