
Towards Scalable Machine Learning with Privacy Protection

Time: Tue 2023-11-21 10.00

Location: D31, Lindstedtsvägen 5, Stockholm

Language: English

Subject area: Computer Science, Information and Communication Technology

Doctoral student: Dominik Fay, Reglerteknik

Opponent: Professor Antti Honkela, Department of Computer Science, University of Helsinki, Helsinki, Finland

Supervisor: Professor Mikael Johansson, Reglerteknik; Professor Tobias J. Oechtering, Teknisk informationsvetenskap; Assistant professor Jens Sjölund, Department of Information Technology, Division of Systems and Control, Uppsala University, Uppsala, Sweden




The increasing size and complexity of datasets have accelerated the development of machine learning models and exposed the need for more scalable solutions. This thesis explores challenges associated with large-scale machine learning under data privacy constraints. With the growth of machine learning models, traditional privacy methods such as data anonymization are becoming insufficient. Thus, we delve into alternative approaches, such as differential privacy.

Our research addresses the following core areas in the context of scalable privacy-preserving machine learning: First, we examine the implications of data dimensionality on privacy in the application of medical image analysis. We extend the classification algorithm Private Aggregation of Teacher Ensembles (PATE) to handle high-dimensional labels, and demonstrate that dimensionality reduction can be used to improve privacy. Second, we consider the impact of hyperparameter selection on privacy. Here, we propose a novel adaptive technique for hyperparameter selection in differentially private gradient-based optimization. Third, we investigate sampling-based solutions to scale differentially private machine learning to datasets with a large number of records. We study the privacy-enhancing properties of importance sampling, highlighting that it can outperform uniform sub-sampling not only in terms of sample efficiency but also in terms of privacy.
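The privacy gain from sub-sampling alluded to above can be illustrated with the standard amplification-by-subsampling bound for pure differential privacy: a mechanism that is ε-DP on the full dataset becomes log(1 + q(e^ε − 1))-DP when it is run on a uniformly subsampled fraction q of the records. This is a minimal sketch of that generic textbook bound, not the thesis's importance-sampling analysis; the function name is illustrative.

```python
import math

def amplified_epsilon(eps: float, q: float) -> float:
    """Standard privacy amplification by uniform subsampling.

    A mechanism satisfying eps-DP on the full dataset satisfies
    log(1 + q * (exp(eps) - 1))-DP when each record is included
    independently with probability q (0 < q <= 1).
    """
    return math.log(1.0 + q * (math.exp(eps) - 1.0))
```

For small q the amplified guarantee behaves like q·ε, which is why sub-sampling lets large datasets be processed many times under a modest overall privacy budget; the thesis's importance-sampling results sharpen this kind of trade-off beyond the uniform case.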

The three techniques developed in this thesis improve the scalability of machine learning while ensuring robust privacy protection, and aim to offer solutions for the effective and safe application of machine learning to large datasets.