Scientific Reports. 2025 Sep 29;15:33500. doi: 10.1038/s41598-025-17098-8

Type-II neural symmetry detection with Lie theory

Alex Gabel 1,2, Rick Quax 2, Efstratios Gavves 1
PMCID: PMC12480656  PMID: 41022871

Abstract

Understanding symmetries within data is crucial for explainability and for enhancing model efficiency in artificial intelligence. This work investigates an approach to neural symmetry detection, specifically leveraging the mathematical framework of Lie theory. Our approach projects data into a low-dimensional latent space, where symmetry transformations can be efficiently applied. By leveraging the matrix exponential, we accurately capture both affine and non-affine transformations, with improved data augmentation and model selection as potential applications. Our method also estimates transformation magnitude distributions, providing deeper insights into the geometric structure of data. Experiments conducted on augmented MNIST demonstrate the effectiveness of our approach in detecting complex symmetries with multiple transformations. This work paves the way for more interpretable and parameter-efficient AI models by identifying structural priors that align with the inherent symmetries in data.

Keywords: Deep learning, Symmetry detection, Lie theory

Subject terms: Mathematics and computing, Computational science

Introduction

Symmetries play a fundamental role in both physics and mathematics, providing deep insights into the laws governing natural phenomena and the structure of mathematical objects. In physics, for example, symmetries underlie conservation laws1 and are pivotal in formulating theories such as relativity2 and both classical3 and quantum mechanics4,5. Similarly, in mathematics, symmetries help in understanding geometric structures and solving complex equations6. Despite their significance, the integration of symmetry principles into artificial intelligence (AI) remains limited. While certain neural architectures (e.g., convolutional7 or group-equivariant neural networks8,9) inherently exploit a specific symmetry (translations or a specific group transformation, respectively), a universal approach for detecting and leveraging a broader class of symmetry transformations remains elusive. This is a significant shortcoming, because exploiting symmetries can lead to substantial improvements in model efficiency, interpretability, and performance across various AI applications. In particular, we emphasize its potential use for automatic model selection through structural bias learning in the context of geometric deep learning, which we will refer to as Type-II Deep Learning (cf. Type-II Bayes). In other words, given a neural network f with weights θ, the network could have been engineered8–11 or learnt to self-impose12 a specific symmetry as a bias through a weight-sharing scheme S, such that, for the set of weights θ_S determined by the scheme, its associated neural network f_S satisfies:

graphic file with name 41598_2025_17098_Article_Equ1.gif 1

with a suitable prior distribution over possible schemes; theoretical aspects regarding an appropriate measure over weight-sharing choices are left open to future scrutiny. Practically, a Type-II Neural Network can be thought of as an architecture that not only adjusts its weights during training, but also has the ability to optimize its weight-sharing scheme for the task at hand.

In this paper, we tackle deep learning-aided symmetry detection, or neural symmetry detection. One may be interested in learning the fundamental geometrical properties of the distribution of data for various applications, ranging from data exploration and topological data analysis13,14 to model selection and structural prior learning15,16. Motivations include understanding the nature of the data in a broader explainable AI pipeline17 (in contrast to processing big data with a black box) and improving model efficiency by ensuring a suitable inductive bias, which constrains the hypothesis space while retaining or even improving the expressive power of neural architectures18. Traditionally, advancements in deep learning have relied on computational scaling through the accrual of progressively larger data sets19. As this becomes impractical, recent approaches20 have opted for reducing the cost of training or the overall size of the model while maintaining competitive performance. In the context of the latter, the irrelevancies of a given task, usually formalized mathematically as symmetry groups21, can be exploited by geometric methods that either lead to weight-sharing schemes within the neural network9,22,23 or compensate with additional computational operations when the input undergoes the expected transformation10,24,25. Modern approaches to symmetry detection focus on learning the most likely symmetry group that relates points in a data set. In previous work26,27, the matrix exponential required to quantify the continuous symmetry transformation was approximated in various ways, sacrificing accuracy for tractability.

We propose instead to evaluate the matrix exponential exactly while avoiding the computationally expensive operation in pixel space by moving to a low-dimensional latent space, using a deep autoencoder-like architecture. This offers more control over the behaviour of latent vectors and features by enforcing continuous transformations such as rotations between them, which has been shown to improve the interpretability of features learned by deep models (left column, Fig. 1). Other work that models data geometrically in latent space has interesting connections to neuroscience and information propagation in the brain (top right of Fig. 1, e.g., "topographic" modelling28). In this work, in order to learn and control the amount of symmetry bias in our model, geometric information is completely stored in the transformation T applied in latent space (e.g., for a Lie group, T = exp(tG)), allowing for the application to arbitrarily structured data. This bottleneck in the symmetry detection algorithm removes the need for the engineer to pick a G-equivariant neural network for symmetry detection. The transformations are parameterized efficiently, and an extension to non-affine transformations is also possible, leading to more interesting settings such as the broader class of conformal transformations and more general diffeomorphisms. This has the added benefit of learning suitable representations and has a possible extension to defining the connectivity matrix in the context of structural prior learning (see Section 2.1) and the construction of more general diffeomorphism-equivariant networks. Separating the learning of the transformation distribution from that of the generator is another challenge, which we address by introducing a separate network to estimate the transformation magnitudes, creating a seamless end-to-end pipeline. Compared to other works in this field, this model is efficient (a low parameter count for the symmetry generator and better-than-brute-force scaling) and interpretable (generators are transferable).

Fig. 1.

Fig. 1

Taxonomy of Lie-Autoencoding Models. Usual approaches exploit the structure of data by using an appropriate neural network for dimensionality reduction. Here, we differentiate between "agnostic" (MLPs) and (geometrically-)biased autoencoders (e.g., CNNs). G is used to denote a (geometric) prior, but it need not necessarily be a G-equivariant model. This work investigates models with MLP AEs (left column), while most approaches explored in the literature can be situated in the abstract-biased class (bottom right). Here, n denotes the characteristic latent vector size.

The proposed neural symmetry detector model enforces consistency losses to ensure the encoder-decoder pair remains non-trivial, even when the transformation magnitude t is set to zero. At the core of the model, transformations are parameterized using a basis (e.g., affine or quadratic, the latter having the ability to model infinitesimal Special Conformal Transformations), and their magnitudes are estimated by a separate neural network that takes data pairs as input (see Fig. 2). To align transformations in latent and pixel space, we introduce a Taylor expansion-based approximation for matching their estimates, bypassing the computational expense of directly calculating matrix exponentials. This model is evaluated on SyMNIST, a task involving augmented MNIST digit pairs designed to detect symmetry transformations applied during augmentation and estimate the underlying magnitude distribution. We also provide results for SuperSyMNIST, in which the digits in pairs have the same label but are different, including multiple one-parameter transformations and larger canvases. We provide baselines and comparisons with other models, introducing an unsupervised technique (G-matching) to link latent-space transformations to pixel-space transformations. Experimental results highlight its performance and suggest avenues for further work, including applications to downstream tasks such as Type-II Neural Networks. We also provide all the code for the latent model, which is designed in a modular way to allow for out-of-the-box tweaking and experimenting. Finally, we note that some of these ideas are already reflected in related literature, such as Noether's Razor29, latent space symmetry discovery for non-linear symmetry transformations using GANs30 or otherwise31, attempts at learning weight-sharing schemes using permutation matrices32, and the automatic topology design system AutoML33.

Fig. 2.

Fig. 2

Latent model architecture. The input pair is passed to the t-network, which predicts the magnitude of the transformation (top branch). Simultaneously, the pair is encoded (middle branch) and a consistency loss enforces that the latents are connected by a matrix multiplication with the result of the discretized exponential map. The generator is updated through the coefficients of the chosen basis. Multiple generators can also be applied in series, introducing a label n for each.

Method

Symmetry generators and structural priors

We start with some background knowledge on symmetry detection using the formalism of Lie theory, following an approach by34 that was recently applied to the same problem in31. For a more thorough introduction to the topic, we refer the interested reader to35 and6, with a focus on representation theory and differential equations, respectively. Lastly, we briefly discuss the application of detecting symmetries to structural prior learning for model selection.

Lie groups and Lie algebras

The transformations that describe the symmetries are assumed to form Lie groups. This means they are sufficiently smooth (k-times differentiable, where k is usually chosen to be infinity), closed under composition, associative, have a neutral element, and have smooth inverses. These transformations can be defined by the way in which they act on objects, namely x′ = T(x, t), with object x. In order to apply differential operators to this object, we can treat it as a (canonical) coordinate vector field: each point is assigned its spatial coordinates as values, and the resulting vector field is smooth. This also introduces a parameter t, which is related to the magnitude of the transformation, forming what is usually called a one-parameter group. For rotations, this parameter will correspond to the angle; for translations, it will be the distance, etc. Because of continuity in the parameter, we can perform a Taylor expansion of the transformation T(x, t) for small values of t:

$$T(\mathbf{x}, t) = \mathbf{x} + t\,\left.\frac{\partial T(\mathbf{x}, t)}{\partial t}\right|_{t=0} + \mathcal{O}(t^2) \tag{2}$$

We apply the First Fundamental Theorem of Lie6,34 in order to make the following claim: the first-order term ∂T/∂t|_{t=0} defines the transformation and is related to what is known as the generator of the transformation. Intuitively, this correspondence between action and generator is due to the constraints imposed by the transformation function being a Lie group. The generator can thus be written as a differential operator as follows:

$$G = f_x(x, y)\,\partial_x + f_y(x, y)\,\partial_y \tag{3}$$

That is, if one solves the characteristic differential equations that define the generator, the original transformation function is obtained. More specifically, the solution is the family of functions topologically connected to the identity transformation through continuity in t. The generator is an element of the Lie algebra of the transformation group and is related to the original transformation by what is called the exponential map. This nomenclature emphasizes the connection between the differentiation performed in Equation 2 and exponentiation, easily seen when solving the characteristic equation6, i.e.,

$$\frac{\partial T(\mathbf{x}, t)}{\partial t} = G\,T(\mathbf{x}, t) \tag{4}$$

as the solution is T(x, t) = e^{tG} x with e^{tG} = Σ_{k=0}^{∞} (tG)^k / k!, where the integer power of G is defined by applying the differential operator iteratively. Since we are applying it to a pixelated image or arbitrary signal, we must evaluate G in some basis by choosing an interpolation scheme; we discuss this in a later section. The inverse procedure, which extracts the generator from the action as shown in Equation 2, is also referred to as the logarithmic map.
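As a minimal numerical sketch of this exponential map (not the paper's implementation), the truncated power series e^{tG} = Σ (tG)^k/k! applied to the rotation generator recovers the familiar rotation matrix:

```python
import numpy as np

def mat_exp_series(A, terms=30):
    # Truncated power series e^A = sum_k A^k / k!; adequate for small ||A||.
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

# Linear part of the rotation generator x d/dy - y d/dx acting on (x, y).
G = np.array([[0.0, -1.0],
              [1.0,  0.0]])
t = 0.7
T = mat_exp_series(t * G)
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
```

Here `T` agrees with the closed-form rotation matrix `R` to numerical precision, illustrating how exponentiating a generator yields the finite transformation.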

We focus on one-parameter groups for two reasons: ease of implementation, and the fact that one such inductive bias is already incredibly powerful. One need not look further than CNNs to conclude that identifying translation as a symmetry of a dataset immediately leads to equivariant models that are superbly successful in practice. Multiple transformations also require additional considerations related to the algebra itself, such as closure under commutators36, an extension we also explored for 3-parameter groups.
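The one-parameter group axioms above can be checked numerically for the canonical example of planar rotations; this small sketch (our illustration, not from the paper) verifies closure, the neutral element, and smooth inverses:

```python
import numpy as np

def R(t):
    # Planar rotation by angle t: the canonical one-parameter group.
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

s, t = 0.3, 0.5
# Closure under composition: R(s) R(t) = R(s + t), with addition of parameters.
closure = np.allclose(R(s) @ R(t), R(s + t))
# Neutral element at t = 0.
identity = np.allclose(R(0.0), np.eye(2))
# Smooth inverse: R(t)^{-1} = R(-t).
inverse = np.allclose(np.linalg.inv(R(t)), R(-t))
```

The additivity of the parameter under composition is exactly what makes the exponential-map picture T(t) = e^{tG} consistent.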

Connectivity and equivariance: a Type-II NN roadmap

Learning connectivity matrices for deep equivariant models, regardless of whether a symmetry group related to solving the task is given a priori, has been of particular interest to the (geometric) deep learning community9,10,15,16,32. It is worth noting that generators can be related to the connectivity matrix, explicitly so for translations, where the shift matrix determines a power series that tiles the weight matrix accordingly. Formally, this involves picking the right representation of the symmetry group of interest, mapping the continuous differential-operator formalism described above to linear maps (matrices). We write the partial derivatives in the compact notation, e.g., ∂_x ≡ ∂/∂x.

Practically, for the one-pixel shift matrix S, we can write, for the weight matrix W:

$$W = \sum_{k} w_k\, S^k \tag{5}$$

The above equation defines the weight matrix of one such convolutional layer, with updatable weights w_k. A collection of multiple such power series applied in succession and interlaced with non-linear activation functions constitutes the neural network. Schematically, we have

$$\mathrm{NN}(\mathbf{x}) = \sigma_L\left(W_L \cdots\, \sigma_1\left(W_1 \mathbf{x} + \mathbf{b}_1\right) \cdots + \mathbf{b}_L\right) \tag{6}$$

with a total number of layers L, biases b_l, and activation functions σ_l. Note that this finite power series is not able to capture the parametrization of fully-connected layers, as all the elements in the basis need to be related by matrix powers. However, as with the Fourier transform of the Dirac delta function, it might be necessary to include infinitely many terms in order to correctly converge to an element of the basis of matrices as a vector space, i.e., the null matrix with a single entry equal to one.
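The claim that a power series in the shift matrix tiles a convolutional weight matrix can be sketched concretely; this small example (ours, with an arbitrary 3-tap kernel) builds W = Σ_k w_k S^k and checks that applying it equals a circular convolution:

```python
import numpy as np

n = 8
# One-pixel circular shift matrix S: (S x)[i] = x[(i - 1) mod n].
S = np.roll(np.eye(n), 1, axis=0)

# Three shared weights tile the n x n weight matrix via a power series in S,
# reproducing a circular-convolution (kernel-size-3) layer.
w = np.array([0.25, 0.5, 0.25])
W = sum(w_k * np.linalg.matrix_power(S, k) for k, w_k in enumerate(w))

x = np.random.default_rng(0).normal(size=n)
y = W @ x
# The same result via explicit circular shifts of the input:
y_ref = w[0] * x + w[1] * np.roll(x, 1) + w[2] * np.roll(x, 2)
```

Only three free parameters appear in the n × n matrix W: the weight sharing is entirely dictated by the translation generator.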

In previous work, a main issue was overcoming the computational complexity associated with a high pixel count, especially in learning the exponent of a matrix exponential. A second potential issue is introducing strong spatial correlations as an a priori constraint, defeating the purpose of learning transformations from scratch. If it is already known that the data is spatially structured, one could introduce continuous coordinates12 or use spatial convolutions10 immediately, without conveniently ignoring the possibility of spatially unstructured data. This issue is partially alleviated with the matrix exponential method, as learning a generator that corresponds to the zero matrix leads to the identity matrix, a trivial operation.

Neural symmetry detection

The biggest technical issue to overcome in the symmetry detection task is learning two separate properties of the data set with a single pipeline: one collective (the generator) and one pair-dependent (the transformation magnitude). For each dataset, one model is trained and ideally captures the most salient symmetry transformation relating the data points to each other. The learned symmetry can then be used in downstream tasks. In previous work, especially before the triumph of deep learning techniques, methods such as gradient descent37 and expectation-maximization26 were used. In this work, we are interested in neural symmetry detection, taking inspiration from works leveraging neural networks as function approximators27,31.

Defining the task: SyMNIST and GalaxSym

A symmetry is a transformation that leaves a certain quantity of interest unchanged. In order to define the symmetry under consideration, we must state what is being kept "the same". For the experiments that follow, we consider the classification task and the symmetry that keeps the underlying label identical. The overarching problem, therefore, is learning transformations that map instances of the same class (i.e., data points with the same underlying label) to each other. For the experiments, we introduce the SyMNIST and GalaxSym tasks. Each dataset consists of image data paired with an augmented version of the original image. The classification labels are not used for prediction. The goal is two-fold: (i) extract the type of transformation G that was applied, and (ii) estimate the distribution of transformation magnitudes t of the seen transformations in the data, which are the "parameters" of the transformation (e.g., rotation angle, scaling factor, etc.). We also have SuperSyMNIST and SuperGalaxSym extensions, in which the second image is still augmented in the usual way but stems from a different root image with the same label (see Fig. 3).
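A SyMNIST-style pair can be generated as follows; this is a hedged, self-contained sketch that substitutes `scipy.ndimage.rotate` on a synthetic 28 × 28 array for the paper's torchvision-on-MNIST pipeline, and uses a uniform angle distribution as an example choice:

```python
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(0)

def make_symnist_pair(img, max_angle=30.0):
    # Sample a transformation magnitude t (here a rotation angle, in degrees)
    # from a chosen distribution; the detector must recover both the type of
    # transformation and the distribution of t, without seeing t directly.
    t = rng.uniform(-max_angle, max_angle)
    img_t = rotate(img, angle=t, reshape=False, order=1)
    return img, img_t, t

# Stand-in for an MNIST digit; a real run would load torchvision MNIST.
digit = rng.random((28, 28))
x1, x2, t = make_symnist_pair(digit)
```

Swapping the sampling line for another distribution (multi-modal, skewed, etc.) and the transform for scaling or translation yields the other task variants described above.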

Fig. 3.

Fig. 3

Visual comparison of different models and tasks. The first row shows results from a linear model AE (i.e., single layer MLP), the second row shows a deep AE applied to SyMNIST (augmented sample pairs). The third and fourth rows show SuperSyMNIST, where different digits with the same label are given to the model. The final row illustrates a 3-parameter group of transformations with their corresponding targets and model outputs. The model has learned to capture the overall shape and transformations well.

Data availability:    The datasets analyzed were the widely available MNIST, which can be accessed through many libraries such as torchvision.datasets.MNIST (MNIST can also be found at: https://www.kaggle.com/datasets/hojjatk/mnist-dataset.), and the Galaxy-10 DECaLS dataset (Available at: https://zenodo.org/records/10845026/files/Galaxy10_DECals.h5.). The latter's images originate from the DESI Legacy Imaging Surveys and were labeled by Galaxy Zoo. The original 128-by-128 RGB images were averaged over the color channels to obtain a grayscale version. The torchvision38,39 library was used to apply affine transformations to the MNIST images. For non-affine transformations, i.e., the SCT, the torch.nn.functional.grid_sample method was used.

Parametrizing the generator

We can parametrize the generator with any given basis for the functional form of its components, allowing a broad range of symmetry transformations to be modelled40. In other words, regression is performed on the coefficients of a basis, which can be chosen freely. We wish to be able to detect the "typical" symmetries considered in the symmetry detection literature, such as rotation, scaling, and translation, which we refer to as the canonical symmetries, a subset of the affine transformations. This choice, including coefficient sparsity, also places a prior on p(S) (cf. Eqn. 1), the distribution or hypothesis space of weight-sharing schemes S. We therefore pick a quadratic basis, which includes the affine transformations, such that the functions f_x and f_y from (3) have the following form:

$$f_x(x, y) = a_c + a_x x + a_y y + a_{xx} x^2 + a_{xy} xy + a_{yy} y^2 \tag{7}$$
$$f_y(x, y) = b_c + b_x x + b_y y + b_{xx} x^2 + b_{xy} xy + b_{yy} y^2 \tag{8}$$

with learnable coefficients a and b. For clarity, in the above we replaced the integers in the subscripts by c (for the constant terms, related to translations), x, and y accordingly. The above quadratic basis can capture not only the canonical symmetries but others as well (such as shears, compositions, and special conformal transformations). Note that one can pick an arbitrarily complicated basis for the expressions given above. This is the major appeal of this approach, and we hope to explore different bases in future work.

Example 1: Canonical transformations    In order to encode the three canonical symmetries, one should use coefficients that yield G = ∂_x, G = x∂_y − y∂_x, and G = x∂_x + y∂_y, which are the generators of translation (in the x-direction), counterclockwise rotation about the origin, and isotropic scaling w.r.t. the origin, respectively. Additionally, one can write down generators for shearing, anisotropic scaling, or combinations of translation and any other of the above transformations.
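For the linear parts of these canonical generators, the flow of the vector field v(x) = Ax is exactly exp(tA); this sketch (our illustration, using `scipy.linalg.expm`) checks rotation and isotropic scaling on a sample point:

```python
import numpy as np
from scipy.linalg import expm

# Linear parts of the canonical generators as 2 x 2 matrices acting on (x, y):
# rotation  x d/dy - y d/dx  ->  velocity (dx/dt, dy/dt) = (-y, x)
# scaling   x d/dx + y d/dy  ->  velocity (x, y), i.e., the identity matrix.
A_rot = np.array([[0.0, -1.0],
                  [1.0,  0.0]])
A_scale = np.eye(2)

t = 0.4
p = np.array([1.0, 2.0])

rotated = expm(t * A_rot) @ p    # rotate p by angle t about the origin
scaled = expm(t * A_scale) @ p   # scale p isotropically by e^t
```

The translation generator ∂_x has no linear part in this 2 × 2 picture; it appears as a constant term and is handled by the interpolation matrices discussed below.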

Example 2: Special conformal transformations    A non-affine transformation that plays an important role in theoretical physics and mathematics is the angle-preserving special conformal transformation (SCT), which has a natural connection to Möbius transformations and conformal field theories41. The SCT can be thought of as a combination of an inversion w.r.t. the unit circle, followed by a translation by a vector b, and finally another inversion (we refer the reader to the supplementary material for an example of its effect on a pixelated smiley face). The vector b plays the role of the parameter here, and taking infinitesimally small values leads to the following expressions for the generators of the SCT: (x² − y²)∂_x + 2xy∂_y and 2xy∂_x + (y² − x²)∂_y, with the vector pointing in the x- and y-directions, respectively.
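The inversion-translation-inversion picture of the SCT can be verified numerically; this sketch (ours, using the sign convention in which the middle step subtracts b) compares the composition with the standard closed form:

```python
import numpy as np

def invert(p):
    # Inversion with respect to the unit circle: p -> p / |p|^2.
    return p / np.dot(p, p)

def sct(p, b):
    # SCT as inversion, translation (here by -b), and a second inversion.
    return invert(invert(p) - b)

def sct_closed_form(p, b):
    # Standard closed form: p' = (p - b |p|^2) / (1 - 2 b.p + |b|^2 |p|^2).
    p2 = np.dot(p, p)
    denom = 1.0 - 2.0 * np.dot(b, p) + np.dot(b, b) * p2
    return (p - b * p2) / denom

p = np.array([0.6, -0.3])
b = np.array([0.1, 0.2])
```

Expanding the closed form to first order in b recovers the quadratic generators quoted above, which is precisely why they fit inside the quadratic basis of Eqs. (7)-(8).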

Interpolation scheme

To apply the operators to a grid, one must write the partial derivatives as matrices. We use the Shannon-Whittaker interpolation, as is done in37 and27. This automatically assumes the function to be interpolated is periodic. We note that this scheme introduces some aliasing for transformations of low-resolution images, which forms one of the notable limitations of the current model. One could investigate other choices, such as bicubic interpolation, although similar results were obtained in our experiments. Nevertheless, we pick this interpolation scheme for its ability to perform the transformations of interest using matrix-vector multiplication. Let I be some real-valued signal. For a discrete set of n points on the real line, with I_i = I(x_i) for all samples i from 1 to n, the Shannon-Whittaker interpolation reconstructs the signal for all real x as

graphic file with name 41598_2025_17098_Article_Equ9.gif 9

To obtain numerical expressions (matrices) for the partial derivatives, Q can be differentiated with respect to its input. This then describes continuous changes in the one-dimensional spatial coordinate at all n points, i.e., an n × n matrix D. The above can be extended to two dimensions by taking the Kronecker product of the one-dimensional result with the identity matrix, I ⊗ D and D ⊗ I, mirroring the flattening operation applied to the input images. The parametrized generator for the 2D affine case, for example, looks like:

$$G = a_c D_{\partial_x} + b_c D_{\partial_y} + a_x D_{x\partial_x} + b_x D_{x\partial_y} + a_y D_{y\partial_x} + b_y D_{y\partial_y} \tag{10}$$

where the D matrices represent the operators ∂_x, ∂_y, x∂_x, x∂_y, y∂_x, and y∂_y, respectively. This can easily be extended to arbitrarily dimensional data by adding more factors to the above matrices, as was done above for the quadratic basis. One can see that performing this operation in pixel space scales poorly with signal length (or image width) n.
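A periodic, band-limited differentiation matrix and its Kronecker-product extension can be sketched as follows; we use the classical spectral-methods construction on an even grid as a stand-in for the paper's Shannon-Whittaker-derived matrices (an assumption, not the exact implementation):

```python
import numpy as np

# Periodic band-limited differentiation matrix on the grid x_j = 2*pi*j / N
# (classical spectral-differentiation construction, exact for low frequencies).
N = 16
h = 2 * np.pi / N
x = h * np.arange(N)

D = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        if i != j:
            D[i, j] = 0.5 * (-1) ** (i - j) / np.tan((i - j) * h / 2)

d_sin = D @ np.sin(x)   # derivative of sin on the grid: equals cos(x)

# 2D extension via a Kronecker product, mirroring row-major flattening:
# kron(I, D) differentiates along the fast (column, i.e., x) index.
I = np.eye(N)
Dx = np.kron(I, D)
F = np.sin(x)[None, :] * np.ones((N, 1))    # f(x, y) = sin(x) on the grid
dF = (Dx @ F.ravel()).reshape(N, N)         # d/dx f = cos(x) everywhere
```

The n² × n² size of `Dx` for an n × n image is exactly the poor pixel-space scaling mentioned above, which motivates moving the exponential into a latent space.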

Latent model

Learning symmetries using a latent-space bottleneck has multiple motivations and has been explored in other recent research30. First, learning symmetries from scratch requires learning the relationships between pixels, if any exist, in the data, and this does not scale well with input size, as the space of possible connectivities is factorial in nature (cf. the permutation group). Second, and relatedly, the size of the generator that needs to be matrix-exponentiated in order to preserve the continuous nature of the symmetry transformations scales poorly. Finally, the encoder should ideally learn to remove unimportant information from the data, such as background information in an image classification setting. We opt for an adapted autoencoder architecture, whose latent space we use to learn the transformations.

In our model, the autoencoder design allows the model to keep relevant information in the latent space and transform it according to a transformation (the result of the exponential map) that it shares with all other input pairs (Fig. 2). The model takes an image as input and reconstructs the transformed image as output. In order to obtain an estimate for the parameter (e.g., the rotation angle), a separate network is trained together with the autoencoder. The parameter estimate is passed to the matrix exponential function, whose result is then used to matrix-multiply the latent patch(es). Finally, the latent patch(es) are decoded, and a reconstruction loss is enforced on the output-transformed image pair. The learnable parts of the model can be grouped as follows:

  1. The encoder

  2. The decoder

  3. The t-network

  4. The parametrized generator
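These four parts can be sketched in PyTorch as follows; all layer sizes, the MLP shapes, and the random matrix basis are illustrative assumptions rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn

class LatentLieDetector(nn.Module):
    """Hedged sketch of the four learnable parts of the latent model."""

    def __init__(self, n_pix=784, d_lat=16, n_basis=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_pix, 128), nn.ReLU(), nn.Linear(128, d_lat))
        self.decoder = nn.Sequential(
            nn.Linear(d_lat, 128), nn.ReLU(), nn.Linear(128, n_pix))
        # t-network: estimates the transformation magnitude from the pair.
        self.t_net = nn.Sequential(
            nn.Linear(2 * n_pix, 64), nn.ReLU(), nn.Linear(64, 1))
        # Parametrized generator: learnable coefficients over a fixed
        # (here random, placeholder) basis of d_lat x d_lat matrices.
        self.coeffs = nn.Parameter(torch.randn(n_basis))
        self.register_buffer('basis', torch.randn(n_basis, d_lat, d_lat))

    def forward(self, x1, x2):
        t = self.t_net(torch.cat([x1, x2], dim=-1))       # (B, 1)
        c = self.coeffs / self.coeffs.norm()              # unit-norm coefficients
        G = torch.einsum('k,kij->ij', c, self.basis)      # latent generator
        T = torch.linalg.matrix_exp(t[:, :, None] * G)    # batched exp(tG)
        z1t = (T @ self.encoder(x1)[:, :, None]).squeeze(-1)
        return self.decoder(z1t), z1t, self.encoder(x2), t

model = LatentLieDetector()
x1, x2 = torch.randn(4, 784), torch.randn(4, 784)
x2_hat, z1t, z2, t = model(x1, x2)
```

Because the exponential is taken of a small d_lat × d_lat matrix, `torch.linalg.matrix_exp` stays cheap regardless of image resolution.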

The loss function consists of a part that ensures both the n-dimensional pixel-space and d-dimensional latent-space vectorized data pairs transform to each other. For images x_1 and x_2, one can write these as:

$$\mathcal{L}_{\text{pix}} = \left\| \mathbf{x}_2 - D\!\left(e^{tG} E(\mathbf{x}_1)\right) \right\|^2 \tag{11}$$
$$\mathcal{L}_{\text{lat}} = \left\| E(\mathbf{x}_2) - e^{tG} E(\mathbf{x}_1) \right\|^2 \tag{12}$$

These are the pixel-space and latent-space consistency losses, respectively. The model allows for various numbers of latent patches (cf. channels) to be transformed in parallel in the latent space.

G-matching:    We also include a loss term that enforces the generator in the latent space to be close to the one in pixel space. Since calculating the matrix exponential in pixel space does not scale well w.r.t. data size, a different approach is needed. Hence, this G-matching term uses the coefficients learned in the latent space (Eq. 7) and places them in a generator for pixel space, using a Taylor expansion to compare its effect on the vectorized input; formally,

$$\mathcal{L}_{G} = \left\| \mathbf{x}_2 - \left(\mathbb{1} + tG' + \tfrac{t^2}{2}G'^2\right)\mathbf{x}_1 \right\|^2 \tag{13}$$

where the prime denotes the basis evaluated in pixel space, i.e., using the interpolation matrices described above. If one wishes to enforce sparsity on the coefficients of the terms in the generator, a sparsity (LASSO) loss can be added in order to enforce correct behavior in the symbolic regression. This sparsity assumption helps with the understandability and interpretability of the coefficients, but is also a prior on S.
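Why the Taylor surrogate is a reasonable substitute for the full pixel-space exponential can be checked numerically; this sketch (ours, with an arbitrary random stand-in for G′) compares first- and second-order truncations against `scipy.linalg.expm` for a small magnitude t:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n = 50
Gp = rng.normal(size=(n, n)) / n    # stand-in for the pixel-space generator G'
x = rng.normal(size=n)
t = 0.05

exact = expm(t * Gp) @ x
# Taylor surrogates, computed with matrix-vector products only:
order1 = x + t * (Gp @ x)
order2 = order1 + 0.5 * t**2 * (Gp @ (Gp @ x))
err1 = np.linalg.norm(exact - order1)
err2 = np.linalg.norm(exact - order2)
```

Both truncations avoid forming the dense exponential, and the second-order term shrinks the error further, consistent with the O(t³) remainder.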

Furthermore, we include the standard reconstruction loss term for each of the inputs in the data pair individually, L_rec, to train the autoencoder. This loss term simply encourages the autoencoder to reconstruct individual image inputs well, as is usual with autoencoders; it thus ignores the exponential map on the latent space. The total loss, therefore, is:

$$\mathcal{L} = \mathcal{L}_{\text{pix}} + \mathcal{L}_{\text{lat}} + \mathcal{L}_{G} + \mathcal{L}_{\text{rec}} \tag{14}$$

In order to avoid the scale ambiguity in the exponent of the matrix exponential (namely, e^{tG} = e^{(ct)(G/c)} for any c ≠ 0), the generator is normalized by enforcing the coefficient vector to have unit norm during training, i.e., ‖a‖ = 1, where a is the vector made up of the basis coefficients. A consequence of this choice is that the t-network is expected to produce zero-valued outputs for image pairs that are not path-connected.
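The gauge freedom that this normalization removes is easy to exhibit; this small check (our illustration) shows that rescaling the generator and the magnitude in opposite directions leaves the exponential unchanged:

```python
import numpy as np
from scipy.linalg import expm

# The pair (t, G) is only identified up to rescaling: exp(tG) = exp((ct)(G/c)).
G = np.array([[0.0, -2.0],
              [2.0,  0.0]])
t, c = 0.3, 5.0
same = np.allclose(expm(t * G), expm((c * t) * (G / c)))

# Fixing the gauge: normalize the coefficient vector to unit norm.
coeffs = np.array([0.0, -2.0, 2.0, 0.0])
coeffs_unit = coeffs / np.linalg.norm(coeffs)
```

With the coefficients pinned to the unit sphere, all of the scale information is pushed into t, so the t-network's outputs become directly interpretable as magnitudes.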

Multiple transformations:    In order to allow for more complicated transformations that do not form one-parameter groups, a straightforward extension of the proposed model can be accomplished by adding more factors of matrix exponentials. This allows for transformations connected to the identity in a continuous way, charting out the space of possible transformations defined by the parameters associated with the various generators. If one wishes to enforce the Lie algebra structure, one can include, by design, multiplication by a closure factor involving the commutators of the generators36. This implementation is present in the codebase. Since no conclusive results were obtained from experiments with this setup, and for the sake of clarity of presentation, we leave this for future work. We note that reconstructions were still very good, so further research in this direction should prove fruitful. Another option would be to enforce this algebraic constraint in the loss.
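The need for products of exponentials (rather than a single exponential of a sum) arises exactly when the generators fail to commute; this sketch (ours, using the 3 × 3 homogeneous representation of planar rotation and x-translation) makes the distinction concrete:

```python
import numpy as np
from scipy.linalg import expm

# Rotation and x-translation generators in the 3 x 3 homogeneous representation.
G_rot = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 0.0]])
G_tx = np.array([[0.0, 0.0, 1.0],
                 [0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0]])

t1, t2 = 0.5, 1.0
# Non-commuting generators must be composed as a product of exponentials;
# the exponential of the sum is, in general, a different transformation.
product = expm(t1 * G_rot) @ expm(t2 * G_tx)
summed = expm(t1 * G_rot + t2 * G_tx)
commutator = G_rot @ G_tx - G_tx @ G_rot
```

The non-zero commutator is the algebraic object that a closure constraint (whether built in by design or enforced in the loss) has to keep track of.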

Computational complexity

The computational complexity of the latent Lie symmetry detector depends on the choice of loss formulation. With the first-order Taylor term enforcing G-matching, we require a matrix exponential of O(d³) compared to O(n³) for brute force, where d is the size of the latent patch and n is the total number of pixels or features in each data point. In the latent model, there are now three additional neural networks, which scale predominantly with input size in this setting. Depending on the tapering of the MLP, the complexity is always better than O(n²). Activating the second-order Taylor term in the loss introduces an O(n³) scaling for the matrix-matrix multiplication. Either this term can be omitted or more efficient procedures for calculating it can be implemented in future work. Nevertheless, the computational burden is still lower than that of calculating a matrix exponential in pixel space. (We refer the interested reader to the comparisons in the supplemental material.)

Results

Implementation and the SyMNIST/GalaxSym tasks

Here, the implementation of the latent model is briefly sketched; for further implementation details, we refer the reader to the Appendix (A). We note that code will be made available with extensive comments and documentation. Transformations are applied using the affine function from the torchvision library38 and torch's grid_sample for non-affine transformations. The images can also be placed on a larger canvas, which is particularly useful when translations are part of the applied augmentations. The magnitude of the transformation is the parameter sampled from the chosen distribution. This procedure allows for flexibility in testing a neural symmetry detector, as the distribution can be arbitrary and, in theory, so can the transformations. In these experiments, we focus on detecting combinations of various affine transformations and a non-affine transformation, the SCT. We focus on applying our model to the SyMNIST task, but we note that the latent model works equally well with Galaxy-10 DECaLS data. Samples for both SyMNIST and SuperSyMNIST are shown in Fig. 3.

The proposed model offers several advantages over previous approaches. Unlike earlier works that rely on supervised learning for estimating symmetries27, or are restricted to small image patches26, our method enables unsupervised detection of both affine and non-affine transformations directly from images. Additionally, the proposed framework can handle arbitrary and multi-modal distributions of transformation magnitudes and multiple one-parameter group transformations, which significantly broadens its applicability compared to methods that assume uniform or low-modal (typically 2 or 3) transformation magnitude distributions42,43. Furthermore, the model directly evaluates the learned symmetries rather than relying on downstream tasks for validation15,32. By learning both the underlying generators and the distributions of transformation magnitudes, our approach provides a comprehensive solution for symmetry detection that is robust, transferable across datasets, and capable of handling complex transformations. Finally, we experimented with abstract (2-dimensional) latents and did not observe as high a quality of reconstructions as for the structured (patchified) latents, most likely due to the low dimension of the latent space. This issue could potentially be fixed in future work by separating the shape (2-dimensional) from the residuals (finer details). (A figure showcasing the 2-dimensional latents is provided in the supplemental materials.)

Transformation magnitude distribution

Histograms of learned transformation magnitudes reveal the correct modes for same-sample pairs (Fig. 4), but performance degrades for dissimilar pairs, i.e., SuperSyMNIST. This is most likely because MNIST digits are aligned differently, are not perfectly centered, and because the MSE loss captures semantic similarity in pixel space poorly, especially for small transformation magnitudes. The distributions appear to capture aliasing artifacts in the small-angle and translation setting (Fig. 4b). Note the broadening of the range (in radians) of the learned distribution when sampling angles in the rotation setting, which breaks down for larger values (Fig. 4a). Peaks are also visible at the origin, even when no pair corresponds to such an identity transformation (as in Fig. 4d and e). We qualitatively observe good shape reconstruction for uniform distributions with small angles and scaling transformations (Fig. 4a and c). Clearly, such a qualitative assessment is not rigorous enough for a thorough analysis.

Fig. 4.

Fig. 4

Learnt transformation magnitude histograms for various transformations. Magnitudes for more complicated distributions, compositions, and non-canonical transformations were also learnt. Results for (d) and (e) were produced by categorical sampling of t, the others by uniform sampling.

The Wasserstein distance between the normalized distributions provides a quantitative metric for the learned distributions. Looking at test-time samples, we observe interesting behaviour when traversing t-space (Fig. 5). Certain transitions between modes are seemingly continuous in pixel space, but anomalies can be found in certain models. For a 5-modal distribution, the model seems to learn intrinsic symmetries in the digits as a way to rotate them (top). We also show a drawback of the model, namely that exact distributions are quite hard to learn, in particular uniform distributions in the SuperSyMNIST case (bottom). In the latter, this is probably because the MSE loss is not the most suitable objective for this task. On average, however, the model seems to recover the correct transformation, and the reconstructions at test time look accurate, especially with respect to the overall shape.
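The quantitative comparison can be sketched with scipy's one-dimensional Wasserstein distance. The 3-modal ground truth, the noisy stand-in for the t-network's outputs, and the min-max normalization below are all illustrative assumptions, not the paper's exact protocol:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
t_true = rng.choice([-0.5, 0.0, 0.5], size=2000)         # 3-modal categorical samples
t_learned = t_true + rng.normal(scale=0.02, size=2000)   # stand-in for t-network output

def normalize(t):
    """Map samples to [0, 1] so distributions are compared on a common scale."""
    return (t - t.min()) / (t.max() - t.min() + 1e-12)

# Empirical 1-D Wasserstein distance between the normalized sample sets.
d = wasserstein_distance(normalize(t_true), normalize(t_learned))
```

A small `d` indicates the recovered magnitude distribution matches the sampling distribution mode-for-mode.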

Fig. 5.

Fig. 5

Top row: 5-modal SyMNIST with rotation. (Left) Image-target pair and output of the model used for the auxiliary task, including the sampling versus the recovered distribution of normalized transformation parameters Inline graphic (above). A traversal through t-space and reconstructions for unseen transformations, corresponding to red dotted lines in the t-distribution plot (below). (Right) Fine-grained variation of t emphasizes topological anomalies, with transitions being less problematic for digits such as “1”, “2”, and “0” but more apparent for symmetric digits like “7”. Plots are for equally spaced Inline graphic. Bottom row: (Left) Visual comparison of SuperSyMNIST for uniform SCT distributions (non-affine model, Inline graphic), highlighting versatility but sensitivity to sharp distribution edges. (Right) Continuous variation of t for Inline graphic.

Symmetry detection performance

Visually, the transformations at test time look good and similar to the ground truth. In most failure cases, the location and overall shape are still correct, but the more detailed elements are missing, blurry, or in an incorrect location. By plotting the vector fields associated with the learnt generator, we can visualize the flow of the transformation (Fig. 7, top). To compare against the ground-truth generator, we split the coefficients into drift, diffusion, and non-affine terms. This makes it easier to identify a correct transformation, such as a rotation, that has drifted due to a spurious non-zero translation coefficient. Additionally, we can inspect the latent patches the model learnt and evaluate the transformations on a sample patch (Fig. 7, bottom). In cases where a rotation was not recovered, we see evidence of alternative solutions to the symmetry detection task: for certain flow fields and low transformation magnitudes, the correct transformation can be mimicked, e.g., by pushing and pulling the sides of the image using a shear rather than a rotation (Fig. 7, bottom left). From inspecting the latents, it is also not obvious whether the model has learnt any geometrically structured signals. Qualitatively, in GalaxSym, the pixel-level transformations look exceptional both within and just outside the t-distribution (Fig. 6). Generators are closest to the ground truth for translation and scaling, but rarely correct for rotation (checked by flagging imaginary eigenvalues, which signal a center of rotation, in the diffusion matrix); perhaps the alpha-matching term requires more tuning. The LASSO loss term had only a marginal effect on performance, hampering reconstruction quality for large values of Inline graphic, perhaps a poor combination with the choice of coefficient normalization.
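The drift/diffusion split and the imaginary-eigenvalue check can be sketched for the affine part of a generator (the non-affine terms are omitted here for brevity; the helper names are hypothetical):

```python
import numpy as np

def split_affine_generator(L):
    """Split a 2x3 affine generator [A | b] into its diffusion (linear)
    part A and drift (translation) part b, as used to diagnose a learnt
    generator, e.g., a rotation with a spurious non-zero translation."""
    A, b = L[:, :2], L[:, 2]
    return A, b

def looks_like_rotation(A, tol=1e-8):
    """A center of rotation shows up as complex-conjugate eigenvalues of
    the diffusion matrix, i.e., eigenvalues with non-zero imaginary part."""
    return bool(np.any(np.abs(np.linalg.eigvals(A).imag) > tol))

rot = np.array([[0.0, -1.0, 0.1],
                [1.0,  0.0, 0.0]])   # so(2) generator with a small drift term
shear = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0]])  # a shear can mimic small rotations

A_rot, b_rot = split_affine_generator(rot)
A_shear, _ = split_affine_generator(shear)
```

The shear's eigenvalues are purely real (both zero), which is why the flag separates a true rotation from the shear-based mimicry described above.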

Fig. 6.

Fig. 6

Visualization of learnt transformations on three GalaxSym and one SuperGalaxSym tasks: rotation, scale, shift, and super-rotation (similar to no augmentation) respectively. The model clearly learns to reproduce the transformation well in pixel space. In the SuperGalaxSym case, it seems to learn scaling as well as rotation, reflecting the distribution of sizes of galaxies in a given class.

Alpha-matching

First- and second-order Taylor expansions were compared to assess the effectiveness of the alpha-matching loss. While higher-order terms improve pixel-space approximations, the gains are limited for large parameter values. This connects to the Noether Prior and Type-II networks, underscoring the importance of correct inductive biases.

With alpha-matching (cf. Eq. 13), a first-order Taylor expansion was used to estimate the pixel-space transformation. This has the obvious drawback that the model is expected to perform better for small values of the sampled transformation parameter. If one instead adds a second-order term, extending the support of the matrix exponential around the identity, slight improvements in the output images are obtained; there was, however, no improvement in the range of parameter values. What we do observe is a shift towards the correct generator for higher values of Inline graphic (Fig. 7). Not only does the correct generator appear for optimal Inline graphic, but the orbit of reconstructed digits at test time is more robust.
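The truncation behaviour can be illustrated with the rotation generator, for which the exact exponential is a 2D rotation matrix. This is a generic sketch of truncated Taylor expansions of the matrix exponential, not the paper's loss implementation:

```python
import numpy as np
from scipy.linalg import expm

G = np.array([[0.0, -1.0],
              [1.0,  0.0]])          # rotation generator: exp(tG) is a 2D rotation

def taylor_exp(G, t, order):
    """Truncated Taylor expansion of exp(tG) around the identity,
    as used in the alpha-matching term (order 1 or 2)."""
    out = np.eye(G.shape[0]) + t * G
    if order >= 2:
        out += 0.5 * (t ** 2) * (G @ G)
    return out

# The second-order term shrinks the error, but both truncations
# degrade as t moves away from the identity.
errors = {}
for t in (0.1, 0.5, 1.0):
    exact = expm(t * G)
    errors[t] = (np.linalg.norm(taylor_exp(G, t, 1) - exact),
                 np.linalg.norm(taylor_exp(G, t, 2) - exact))
```

This mirrors the observation in the text: order 2 widens the region around the identity where the approximation is faithful, without removing the large-t breakdown.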

Fig. 7.

Fig. 7

(Top) Comparison of reconstructed images (first and fourth rows, for t-values regularly spaced between Inline graphic and Inline graphic) for a model trained on rotations with Inline graphic, and visualizations of generator flow fields (second and third rows) for different values of Inline graphic, using order-1 (first two rows) and order-2 (last two rows) Taylor approximations. (Bottom) When omitted, or when Inline graphic is too small (here: Inline graphic), alpha-matching cannot tie the transformation in pixel space to the one in latent space, possibly resulting in a mismatch, such as a shift when a rotation was desired. (Actual latents and predicted t-values, including the action of the exponential map on line segments and Gaussian blobs, are shown for an order-2 model trained on rotated GalaxSym pairs.)

Complex transformations

Results on non-affine transformations, such as the SCT (Fig. 5, bottom), demonstrate the model’s capability to handle complex symmetries. For composed transformations (e.g., rotation and scaling), the model detects multimodal distributions and captures the combined effects accurately: for compositions, we sampled from a categorical distribution over both rotation angle (Inline graphic or 45 degrees) and scaling factor (0.5 or 1.5), resulting in a 4-modal distribution (Fig. 4e). Finally, the model is also capable of handling multiple-parameter groups in latent space, as illustrated in Fig. 2. It is possible either to include a closure factor by introducing an induced one-parameter group, i.e., Inline graphic, or to omit it entirely. For these multi-parameter cases, only the first-order Taylor term is implemented. The results for auxiliary reconstruction are very good, especially for SyMNIST (SuperSyMNIST reconstructions are shown in Fig. 3, bottom row), while splitting the learnt generator into the expected one-parameter groups is not always guaranteed, since other superpositions of correct generators are theoretically possible.
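The composed sampling and the superposition of generators can be sketched as follows. The angle set {0, 45} degrees is an assumption (one value is not legible in the extracted text), and the rotation and isotropic-scaling generators commute, so their composition reduces to a single exponential of the weighted sum:

```python
import numpy as np
from scipy.linalg import expm

G_rot = np.array([[0.0, -1.0],
                  [1.0,  0.0]])      # rotation generator
G_scale = np.eye(2)                  # isotropic scaling generator

# Categorical sampling of both magnitudes gives a 4-modal joint distribution.
rng = np.random.default_rng(0)
theta = np.deg2rad(rng.choice([0.0, 45.0]))   # rotation angle (assumed set)
s = np.log(rng.choice([0.5, 1.5]))            # exp(s*I) scales by e^s

# Since G_rot and G_scale commute, the composed transformation is a single
# exponential of their superposition (a two-parameter group element).
T = expm(theta * G_rot + s * G_scale)
```

For non-commuting generators the split into individual one-parameter groups is not unique, which matches the caveat above that other superpositions of correct generators are theoretically possible.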

Discussion

While correct generators were hard to obtain, the pixel-space transformations match the augmentations very well, including the number of modes in the distribution of outputs from the t-network. From qualitative analysis of the results for translated samples, we conclude that the symmetry generator is recovered best in that setting, perhaps due to an observed slight bias towards keeping the drift terms non-zero. Rotations, scaling, shear, and SCTs are less conclusive, but recoverable for the right settings of the regularizers, in particular for sufficiently high values and order of the alpha-matching term. In particular, for small to medium transformation magnitudes (but not so small as to fall below the aliasing threshold), SyMNIST augmented with a one-parameter group transformation and a correctly tuned Inline graphic yields the correct symmetry generator in most cases. Reconstructions and transformation magnitude distributions are quantitatively very good, in particular the latter. Despite the drawbacks of the MSE reconstruction loss, such as blurriness and misalignment with the theoretical objective, we observe relatively good results even in the SuperSyMNIST setting, suggesting good to excellent performance on the overall shape of an object. This is a promising result, assuming one can disentangle texture and high frequencies from overall pose and, presumably, low spatial frequencies; it is most likely rooted in the role of the reconstruction as an auxiliary task, with the MSE loss having a smoothing effect. Additionally, we note that the model seems to learn “Platonic” digits, namely recognizable digits with a different font or style than the expected one, that are transformed appropriately (this was not so clear in the GalaxSym dataset). Despite correct transformations being learnt in pixel space, the latent transformations remain hard to get exactly right with a single set of hyperparameters for multiple transformations; they seem to still be underconstrained.
From the Inline graphic-matching experiments, we deduce that this parameter is the most crucial to tune correctly, and we provide some quantitative results showcasing behavior beyond its optimal value. We expect that this term helps the coefficients flow towards a basin where the correct pixel-space transformations are reachable, although placing too much weight on this term might be counter-productive for large values of the parameters.

Finally, we comment on the choice of interpolation scheme and on aliasing issues in latent space. Experimenting with different interpolation choices, we do not observe major differences in the results shown above. Nevertheless, the codebase provides options to change the interpolation scheme to allow further experimentation in this direction. The latent patches do not seem to have learnt any geometric structure detectable by eye, despite the traversals in magnitude space being relatively stable and, in all cases, interpretable when decoded to pixel space.

Supplementary Information


Appendix: Experimental details

For SyMNIST, the encoder and decoder were fully-connected MLPs with 512, 256, 128, and 64 neurons in the hidden layers and LeakyReLU (0.2) activation functions. The output layer of the encoder, and therefore the input layer of the decoder, had size 81 for most experiments, or 25 for a few others (e.g., Fig. 4). The Adam optimizer44 was used with a learning rate of 0.001 and a batch size of 1024. All results are shown for 1 channel in latent space, but more channels are possible and can improve reconstruction quality. Wider and deeper MLPs and various numbers of latent channels (1, 4, 16, and 64) were all checked; 16 seemed to work best for the MSE loss on reconstructions, but there was no dramatic increase in predictive performance regarding the generators. LayerNorm was also used. All these options, including the ones discussed in the main text, can easily be toggled in the code accompanying this paper (https://github.com/alexgabel/LiePaper). For consistency, the same hyperparameters were used, namely Inline graphic, unless specified otherwise. For the GalaxSym experiments, we opted for an identical architecture with a single channel, since this produced good reconstructions, and 81 latent pixels. The alpha-matching regularizer was 0.001 for all results except for the shifted pairs, where a value of 1.0 yielded cleaner samples. Transformation magnitudes were uniformly sampled, with maxima of 15 pixels, 90 degrees, and a 1.5 scaling factor for translation, rotation, and scaling respectively. For the final rows (SuperGalaxSym), the alpha-matching regularizer was set to 1000.0.
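The encoder's forward pass can be sketched with the reported layer widths. This is an illustrative NumPy stand-in for the PyTorch model: the weights are random placeholders, LayerNorm is omitted for brevity, and the initialization scheme is an assumption:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    """LeakyReLU with the negative slope of 0.2 used in the experiments."""
    return np.where(x > 0, x, slope * x)

rng = np.random.default_rng(0)
# Hidden widths 512, 256, 128, 64 and an 81-dimensional latent (9x9 patch),
# matching the SyMNIST configuration described above.
sizes = [28 * 28, 512, 256, 128, 64, 81]
weights = [rng.normal(scale=1.0 / np.sqrt(a), size=(a, b))
           for a, b in zip(sizes[:-1], sizes[1:])]

def encode(x):
    """Forward pass: LeakyReLU hidden layers, linear output to the latent patch."""
    for W in weights[:-1]:
        x = leaky_relu(x @ W)
    return x @ weights[-1]

z = encode(rng.normal(size=(4, 784)))   # a batch of 4 flattened 28x28 images
```

The decoder mirrors this architecture with the layer sizes reversed.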

Author contributions

A.G. did the main research (coding, experiments, mathematical analysis) and wrote the article. As supervisors, E.G. and R.Q. helped write the main text by providing revisions of certain sections, comments and suggestions. E.G. and R.Q. helped suggest applications and variations of the experimental setup.

Data availability

The dataset analysed is the widely available MNIST, which can be accessed through many libraries such as torchvision.datasets.MNIST. It can also be found at https://www.kaggle.com/datasets/hojjatk/mnist-dataset1. It contains 60,000 training images and 10,000 test images of handwritten digits, each in greyscale with a resolution of 28x28 pixels. The torchvision library was used to apply affine transformations to the MNIST images. For non-affine transformations, such as the SCT, the torch.nn.functional.grid_sample method was used.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-17098-8.

References

  • 1.Noether, E. Invariante Variationsprobleme. Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse. (1918);1918:235-57. Available from: http://eudml.org/doc/59024.
  • 2.Misner, C. W., Thorne, K. S., & Wheeler, J. A. Gravitation. W. H. Freeman; (1973).
  • 3.Arnold, V. I. Mathematical methods of classical mechanics. vol. 60. Springer; (1989).
  • 4.Nakahara, M. Geometry, Topology and Physics. Graduate Student Series in Physics. Bristol, UK: Institute of Physics Publishing; (2003). Available from: http://www.slac.stanford.edu/spires/find/hep/www?key=7208855.
  • 5.Frankel, T. The Geometry of Physics: An Introduction. 3rd ed. Cambridge University Press; (2011).
  • 6.Olver, P. J. Applications of Lie Groups to Differential Equations. Graduate Texts in Mathematics. Springer New York; (1993). Available from: https://books.google.nl/books?id=sI2bAxgLMXYC.
  • 7.Fukushima, K. Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position. Biological Cybernetics.36, 193–202 (1980). [DOI] [PubMed] [Google Scholar]
  • 8.Cohen, T., & Welling, M. Group equivariant convolutional networks. In: International conference on machine learning. PMLR; (2016). p. 2990-9.
  • 9.Finzi, M., Welling, M., & Wilson, A. G. A Practical Method for Constructing Equivariant Multilayer Perceptrons for Arbitrary Matrix Groups; (2021).
  • 10.Kondor, R., & Trivedi, S. On the Generalization of Equivariance and Convolution in Neural Networks to the Action of Compact Groups; (2018).
  • 11.Bekkers, E. J. B-Spline CNNs on Lie Groups; (2021). Available from: arXiv:1909.12057.
  • 12.Moskalev, A., Sepliarskaia, A., Sosnovik, I., & Smeulders, A. LieGG: Studying Learned Lie Group Generators. In: Advances in Neural Information Processing Systems; (2022).
  • 13.Carlsson, G. E. Topology and data. Bulletin of the American Mathematical Society. (2009);46:255-308. Available from: https://api.semanticscholar.org/CorpusID:1472609.
  • 14.Wasserman, L. Topological Data Analysis; (2016).
  • 15.Zhou, A., Knowles, T., & Finn, C. Meta-learning symmetries by reparameterization. arXiv preprint arXiv:2007.02933. (2020).
  • 16.van der Ouderaa, T. F., Immer, A., & van der Wilk, M. Learning Layer-wise Equivariances Automatically using Gradients. In: Thirty-seventh Conference on Neural Information Processing Systems; (2023) .
  • 17.Murdoch, W. J., Singh, C., Kumbier, K., Abbasi-Asl, R., & Yu, B. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences. (2019) Oct;116(44):22071-22080. Available from: 10.1073/pnas.1900654116. [DOI] [PMC free article] [PubMed]
  • 18.Cohen, T., & Welling, M. Group Equivariant Convolutional Networks. In: Balcan MF, Weinberger KQ, editors. Proceedings of The 33rd International Conference on Machine Learning. vol. 48 of Proceedings of Machine Learning Research. New York, New York, USA: PMLR; (2016). p. 2990-9. Available from: https://proceedings.mlr.press/v48/cohenc16.html.
  • 19.L’Heureux, A., Grolinger, K., Elyamany, H. F., & Capretz, M. A. M. Machine Learning With Big Data: Challenges and Approaches. IEEE Access. (2017);5:7776-97.
  • 20.Zhou, C. et al. LIMA: Less Is More for Alignment; (2023).
  • 21.Knigge, D. M., Romero, D. W., & Bekkers, E. J. Exploiting Redundancy: Separable Group Convolutional Networks on Lie Groups; (2022).
  • 22.van der Ouderaa, T. F. A., & van der Wilk, M. Learning Invariant Weights in Neural Networks; (2022).
  • 23.van der Ouderaa, T. F. A., Immer, A., & van der Wilk, M. Learning Layer-wise Equivariances Automatically using Gradients; (2023). [Google Scholar]
  • 24.Bekkers, E. J. et al. Roto-Translation Covariant Convolutional Networks for Medical Image Analysis. (2018). [DOI] [PubMed]
  • 25.Romero, D. W., & Lohit, S. Learning Partial Equivariances From Data. In: Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, editors. Advances in Neural Information Processing Systems. vol. 35. Curran Associates, Inc.; (2022). p. 36466-78. Available from: https://proceedings.neurips.cc/paper_files/paper/2022/file/ec51d1fe4bbb754577da5e18eb54e6d1-Paper-Conference.pdf.
  • 26.Sohl-Dickstein, J., Wang, C. M., & Olshausen, B. A. An unsupervised algorithm for learning lie group transformations. arXiv preprint arXiv:1001.1027. 2010.
  • 27.Dehmamy, N., Walters, R., Liu, Y., Wang, D. & Yu, R. Automatic symmetry discovery with lie algebra convolutional network. Advances in Neural Information Processing Systems.34, 2503–15 (2021). [Google Scholar]
  • 28.Keller, T.A., & Welling, M. Neural Wave Machines: Learning Spatiotemporally Structured Representations with Locally Coupled Oscillatory Recurrent Neural Networks. In: Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J, editors. Proceedings of the 40th International Conference on Machine Learning. vol. 202 of Proceedings of Machine Learning Research. PMLR; 2023. p. 16168-89. Available from: https://proceedings.mlr.press/v202/keller23a.html.
  • 29.van der Ouderaa, T. F. A., van der Wilk, M., & de Haan, P. Noether’s razor: Learning Conserved Quantities; (2024). Available from: arXiv:2410.08087.
  • 30.Yang, J., Dehmamy, N., Walters, R., & Yu, R. Latent Space Symmetry Discovery; (2024). Available from: arXiv:2310.00105.
  • 31.Gabel, A. et al. Learning Lie Group Symmetry Transformations with Neural Networks. (2023).
  • 32.van der Linden, P. A., García-Castellanos, A., Vadgama, S., Kuipers, T. P., & Bekkers, E. J. Learning Symmetries via Weight-Sharing with Doubly Stochastic Tensors; (2024). Available from: arXiv:2412.04594.
  • 33.Elsken, T., Metzen, J. H., & Hutter, F. Neural Architecture Search: A Survey. Journal of Machine Learning Research. (2019);20(55):1-21. Available from: http://jmlr.org/papers/v20/18-598.html.
  • 34.Oliveri, F. Lie Symmetries of Differential Equations: Classical Results and Recent Contributions. Symmetry.06, 2 (2010). [Google Scholar]
  • 35.Fulton, W., & Harris, J. Representation Theory: A First Course. Graduate Texts in Mathematics. Springer New York; (1991). Available from: https://books.google.nl/books?id=6GUH8ARxhp8C.
  • 36.Roman, A., Forestano, R. T., Matchev, K. T., Matcheva, K., & Unlu, E. B. Oracle-Preserving Latent Flows; (2023).
  • 37.Rao, R., & Ruderman, D. Learning Lie groups for invariant visual perception. Advances in neural information processing systems. (1998);11.
  • 38.Marcel, S., & Rodriguez, Y. Torchvision: the machine-vision package of torch. In: Proceedings of the 18th ACM International Conference on Multimedia. MM ’10. New York, NY, USA: Association for Computing Machinery; (2010). p. 1485-1488. Available from: 10.1145/1873951.1874254.
  • 39.TorchVision maintainers and contributors. TorchVision: PyTorch’s Computer Vision library. GitHub. https://github.com/pytorch/vision (2016).
  • 40.Gabel, A., Quax, R., & Gavves, E. Data-driven Lie point symmetry detection for continuous dynamical systems. Machine Learning: Science and Technology. 5(1):015037. Available from: 10.1088/2632-2153/ad2629 (2024).
  • 41.Di Francesco, P., Mathieu, P., & Sénéchal, D. Conformal Field Theory. Graduate Texts in Contemporary Physics. New York, NY: Springer; (1997). Available from: https://cds.cern.ch/record/639405.
  • 42.Benton, G., Finzi, M., Izmailov, P. & Wilson, A. G. Learning invariances in neural networks from training data. Advances in neural information processing systems.33, 17605–16 (2020). [Google Scholar]
  • 43.Singhal, U., Esteves, C., Makadia, A., & Yu, S. X. Learning to Transform for Generalizable Instance-wise Invariance. (2024).
  • 44.Kingma, D. P., & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. (2014).

