
On Implicit Smoothness Regularization in Deep Learning

Time: Thu 2024-11-07 15.00

Location: Kollegiesalen, Brinellvägen 8, Stockholm

Video link: https://kth-se.zoom.us/j/62717697317

Language: English

Subject area: Computer Science

Doctoral student: Matteo Gamba, Robotics, Perception and Learning (RPL)

Opponent: Professor Christopher Zach, Chalmers University of Technology

Supervisor: Mårten Björkman, Robotics, Perception and Learning (RPL)



Abstract

State-of-the-art neural networks provide a rich class of function approximators, fueling the remarkable success of gradient-based deep learning on complex high-dimensional problems, ranging from natural language modeling to image and video generation and understanding. Modern deep networks enjoy sufficient expressive power to shatter common classification benchmarks, as well as interpolate noisy regression targets. At the same time, the same models are able to generalize well while perfectly fitting noisy training data, even in the absence of external regularization constraining model expressivity. Efforts towards making sense of the observed benign overfitting behaviour uncovered its occurrence in overparameterized linear regression as well as kernel regression, extending classical empirical risk minimization to the study of minimum-norm interpolators. Existing theoretical understanding of the phenomenon identifies two key factors affecting the generalization ability of interpolating models. First, overparameterization – corresponding to the regime in which a model counts more parameters than the number of constraints imposed by the training sample – effectively reduces model variance in proximity of the training data. Second, the structure of the learner – which determines how patterns in the training data are encoded in the learned representation – controls the ability to separate signal from noise when attaining interpolation. Analyzing the above factors for deep finite-width networks respectively entails characterizing the mechanisms driving feature learning and norm-based capacity control in practical settings, thus posing a challenging open problem. The present thesis explores the problem of capturing the effective complexity of finite-width deep networks trained in practice, through the lens of model function geometry, focusing on factors implicitly restricting model complexity. First, model expressivity is contrasted with effective nonlinearity for models undergoing double descent, highlighting the constrained effective complexity afforded by overparameterization. Second, the geometry of interpolation is studied in the presence of noisy targets, observing robust interpolation over volumes of size controlled by model scale. Third, the observed behavior is formally tied to parameter-space curvature, connecting parameter-space geometry to that of the input space. Finally, the thesis concludes by investigating whether the findings translate to the context of self-supervised learning, relating the geometry of representations to downstream robustness, and highlighting trends in keeping with neural scaling laws. The present work isolates input-space smoothness as a key notion for characterizing the effective complexity of model functions expressed by overparameterized deep networks.
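For readers unfamiliar with the minimum-norm interpolators mentioned above, the following NumPy sketch illustrates the classical setting the abstract refers to: overparameterized linear regression, where the minimum ℓ2-norm solution fits noisy training targets exactly yet can still generalize. This is an illustrative example only, not code or data from the thesis; all variable names, dimensions, and noise levels are assumptions chosen for the sketch.

# Illustrative sketch (not from the thesis): minimum l2-norm interpolation
# in overparameterized linear regression, the setting where benign
# overfitting was first analyzed. Dimensions and noise level are arbitrary.
import numpy as np

rng = np.random.default_rng(0)

n, p = 50, 500                       # fewer samples than parameters (overparameterized)
X = rng.standard_normal((n, p))
theta_true = np.zeros(p)
theta_true[:5] = 1.0                 # low-dimensional signal
y = X @ theta_true + 0.1 * rng.standard_normal(n)   # noisy training targets

# Among all theta satisfying X @ theta = y, the pseudoinverse returns the
# one of minimum Euclidean norm: theta_hat = X^+ y.
theta_hat = np.linalg.pinv(X) @ y

print("training residual:", np.linalg.norm(X @ theta_hat - y))   # ~0: exact interpolation
X_test = rng.standard_normal((1000, p))
y_test = X_test @ theta_true
print("test error:", np.mean((X_test @ theta_hat - y_test) ** 2))

The thesis studies how analogous implicit restrictions on effective complexity, phrased in terms of input-space smoothness, arise for finite-width deep networks rather than for this linear model.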

urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-354917