
Manifolds of Learning

Time: Fri 2026-02-06 14.00

Location: F3 (Flodis), Lindstedtsvägen 26 & 28, Stockholm

Language: English

Subject area: Applied and Computational Mathematics

Doctoral student: Vahid Shahverdi, Algebra, Combinatorics and Topology

Opponent: Mireille Boutin

Supervisor: Kathlén Kohn, Algebra, Combinatorics and Topology; Joakim Andén, Mathematics (Division)



Abstract

Neural networks are central to modern machine learning, with applications that range from computer vision to natural language processing. Despite their success, their mathematical foundations are still poorly understood. At the heart of every such model lies a training procedure that amounts to solving a highly nonconvex optimization problem with many potential solutions, yet optimization algorithms often find parameters that not only fit the data but also generalize well to unseen samples. Why neural networks exhibit this favorable behavior, and how architectural choices influence it, remain fundamental open questions that call for new mathematical tools.

This thesis puts forward one promising approach, neuroalgebraic geometry, a research program that studies neural networks through the lens of algebraic geometry. In this framework, nonlinearities such as activation functions are replaced with algebraic counterparts, for instance polynomials, so that the resulting models become amenable to rigorous algebro-geometric analysis. Since polynomials are universal approximators, a limit argument extends the methods developed in neuroalgebraic geometry beyond the polynomial realm, bridging the gap between algebraic models and practical neural networks.

Through neuroalgebraic geometry, we study the function space parameterized by a given neural network architecture, which we refer to as the neuromanifold. Its dimension and volume reflect how rich the model is and how well it can generalize from data. Singular points, places where the neuromanifold is not regular, characterize implicit biases that arise during training. The analysis of the parametrization map relates to the identifiability of neural networks, a property that is essential for the design of equivariant architectures in which symmetries of the data are encoded into the model. Finally, viewing optimization through this geometric lens connects the landscape of the loss function to the underlying structure of the neuromanifold. The algebraic setting makes these analyses tractable, since the ambient space of the neuromanifold becomes finite-dimensional.
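
As a minimal sketch of this setup (the notation below is ours, assuming a two-layer network with a monomial activation, and is only meant to fix ideas), the neuromanifold is the image of the parametrization map that sends weights to functions:

\[
\mu \colon (W_1, W_2) \longmapsto f_{W_1,W_2}, \qquad
f_{W_1,W_2}(x) = W_2\,\sigma(W_1 x), \qquad \sigma(t) = t^d \text{ applied coordinatewise}.
\]

Every coordinate of \(f_{W_1,W_2}\) is then a homogeneous polynomial of degree \(d\) in \(x\), so the neuromanifold \(\mathcal{M} = \operatorname{im}(\mu)\) sits inside the finite-dimensional vector space of such polynomial maps. Its dimension, degree, and singular points are well-defined algebro-geometric quantities, and training with the squared loss becomes the minimization of a data-dependent quadratic function over \(\mathcal{M}\).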

The main goal of this thesis is to analyze two important architectures, multilayer perceptrons (MLPs) and convolutional neural networks (CNNs), through the lens of algebraic geometry.

In Paper A, a position paper, we introduce and motivate the emerging research area of neuroalgebraic geometry. We construct a dictionary between algebro-geometric concepts (such as dimension, degree, and singularities) and key machine learning phenomena (including sample complexity, expressivity, and implicit bias). Along the way, the paper provides a concise literature overview and argues for new connections at the intersection of algebraic geometry and machine learning.

In Paper B, we investigate linear convolutional networks with single-channel, one-dimensional filters. By examining their neuromanifold, we determine its dimension and singularities. Furthermore, considering optimization with the squared loss, we show that, once all strides are larger than one, the critical points in parameter space that correspond to spurious points are not attractors for gradient-based optimization.
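
To make the linear, single-channel, one-dimensional setting concrete, here is a minimal numerical sketch (our own illustration, restricted to stride-one layers for simplicity; the thesis also treats larger strides): composing the layers yields a single convolution whose filter is the convolution of the layer filters, and the coefficients of that composite filter sweep out the neuromanifold as the layer filters vary.

    import numpy as np

    # Two stride-one convolutional layers with single-channel, 1D filters.
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal(3)   # filter of the first layer
    w2 = rng.standard_normal(4)   # filter of the second layer

    # End to end, the linear network computes one convolution whose filter is
    # the convolution of the layer filters (equivalently, the product of the
    # polynomials having those coefficients).
    end_to_end = np.convolve(w2, w1)   # composite filter, length 3 + 4 - 1

    # Sanity check: applying the layers to a signal (with 'full' boundary
    # handling) agrees with applying the composite filter directly.
    x = rng.standard_normal(10)
    assert np.allclose(np.convolve(w2, np.convolve(w1, x)),
                       np.convolve(end_to_end, x))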

In Paper C, which continues the investigation from Paper B, we introduce a recursive algorithm that generates the polynomial equations defining the Zariski closure of the neuromanifold of linear convolutional networks. We further provide the exact number of (complex) critical points that arise when training these networks with squared loss and generic data.

In Paper D, we examine linear invariant and equivariant networks under permutation groups. We determine the dimension, degree, and singular locus of the neuromanifold for these models. We then analyze the number of (complex) critical points that can arise during training. Furthermore, we show that the neuromanifold of linear equivariant networks comprises many irreducible components that cannot be parameterized by a single fixed architecture, and thus the choice of architecture determines which irreducible component we parameterize.

In Paper E, which is our first exploration of non-linear activation functions, we analyze single-channel, one-dimensional-filter convolutional networks with monomial activation functions. We show that the corresponding neuromanifold, once projectivized, is birational to a Segre–Veronese variety, a well-known object in classical algebraic geometry. We then describe its algebraic invariants, such as dimension and degree, and characterize its singular points, including their types. Finally, we provide an exact formula for the number of (complex) critical points arising when training with generic data under squared loss optimization.
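
For readers less familiar with this classical object (a standard definition; the specific factors and multidegree arising from these networks are given in the paper itself): the Segre–Veronese variety of multidegree \((d_1,\dots,d_k)\) is the image of the embedding

\[
\mathbb{P}^{n_1} \times \cdots \times \mathbb{P}^{n_k} \hookrightarrow
\mathbb{P}\bigl(\operatorname{Sym}^{d_1}(\mathbb{C}^{n_1+1}) \otimes \cdots \otimes \operatorname{Sym}^{d_k}(\mathbb{C}^{n_k+1})\bigr), \qquad
([v_1],\dots,[v_k]) \longmapsto [\,v_1^{\otimes d_1} \otimes \cdots \otimes v_k^{\otimes d_k}\,],
\]

that is, the variety of partially symmetric rank-one tensors.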

In Paper F, we investigate both MLPs and CNNs with generic polynomial activation functions. We prove that no continuous symmetry exists in either model, i.e., the generic fiber of the parameterization is finite. Consequently, the dimension of each neuromanifold coincides with the number of parameters. Furthermore, we show that in both models certain subnetworks correspond to singular points of the neuromanifold. Finally, for CNNs the parameters associated with these subnetworks are not critically exposed, whereas for MLPs they are critically exposed, meaning that they appear as critical points of the loss function with nonzero probability over the data distribution.
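
The dimension statement can be illustrated with a toy computation (our own construction, not code from the thesis): for a small MLP with the quadratic activation \(\sigma(t) = t^2 + t\), the Jacobian of the map from parameters to the coefficients of the output polynomials has full rank, so the local dimension of the neuromanifold equals the number of parameters.

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    W1 = sp.Matrix(2, 2, list(sp.symbols('w11 w12 w21 w22')))   # first-layer weights
    W2 = sp.Matrix(2, 2, list(sp.symbols('v11 v12 v21 v22')))   # second-layer weights
    params = list(W1) + list(W2)                                # 8 parameters in total

    sigma = lambda t: t**2 + t                                  # a (generic) quadratic activation
    outputs = W2 * (W1 * sp.Matrix([x1, x2])).applyfunc(sigma)  # 2-2-2 MLP without biases

    # Coefficients of every monomial in every output coordinate.
    monomials = [x1**2, x1*x2, x2**2, x1, x2]
    coeffs = [sp.Poly(sp.expand(f), x1, x2).coeff_monomial(m)
              for f in outputs for m in monomials]

    # Jacobian of the parametrization "parameters -> coefficients",
    # evaluated exactly at one concrete parameter value.
    J = sp.Matrix(coeffs).jacobian(params)
    point = dict(zip(params, [1, 0, 0, 1, 1, 2, 3, 4]))
    print(J.subs(point).rank(), len(params))   # prints "8 8": full rank

Full rank at a single point already implies full rank at a generic parameter, which is consistent with the finite-fiber statement above.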

Although the main thrust of this thesis concerns the geometry and optimization of neural networks, the perspective developed here, namely framing learning problems as algebraic optimization over structured sets, also motivated a new approach to a classical problem in signal processing.

In Paper G, we address the problem of multireference alignment (MRA), in which multiple noisy observations of a one-dimensional signal are available, each subjected to an unknown circular shift. The objective is to reconstruct the underlying signal up to a circular shift. We introduce a novel algorithm that minimizes a distance function defined on the manifold of signals whose second-order moments agree with those estimated from the observations. We analyze the optimization problem in both the finite- and infinite-data regimes. In the latter, we show that the true signal is always a critical point of the loss function, and our empirical results suggest that it is a global minimum.
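
For concreteness, the following small sketch illustrates the MRA data model and the kind of shift-invariant second-order statistic that the estimated moments provide (our own illustration using the power spectrum; the signal length, noise level, and variable names are ours, and this is not the reconstruction algorithm of the paper):

    import numpy as np

    rng = np.random.default_rng(1)
    L, n, noise = 16, 50_000, 0.5        # signal length, number of observations, noise std

    x = rng.standard_normal(L)           # ground-truth signal (known here only for the demo)

    # MRA observations: an unknown circular shift plus additive Gaussian noise.
    shifts = rng.integers(0, L, size=n)
    obs = np.stack([np.roll(x, s) for s in shifts]) + noise * rng.standard_normal((n, L))

    # The power spectrum |DFT|^2 is invariant to circular shifts, so it can be
    # averaged across observations; subtracting L * noise**2 removes the bias
    # caused by the additive noise.
    ps_est = np.mean(np.abs(np.fft.fft(obs, axis=1))**2, axis=0) - L * noise**2
    ps_true = np.abs(np.fft.fft(x))**2

    # The estimate concentrates around the true second-order statistic, which is
    # what pins down the manifold of candidate signals consistent with the data.
    print(np.max(np.abs(ps_est - ps_true)) / np.max(ps_true))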

urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-374085