Abstract
Two-dimensional materials with flat electronic bands are promising for realising exotic quantum phenomena such as unconventional superconductivity and nontrivial topology. However, exploring their vast chemical space is a significant challenge. Here we introduce elf, an unsupervised convolutional autoencoder that encodes electronic band structure images into fingerprint vectors, enabling the autonomous clustering of materials by electronic properties beyond traditional chemical paradigms. Unsupervised visualisation of the fingerprint space then uncovers hidden chemical trends and identifies promising candidates based on similarities to well-studied exemplars. This approach complements high-throughput ab initio methods by rapidly screening candidates and guiding further investigations into the mechanisms underlying flat-band physics. The elf autoencoder is a powerful tool for autonomous discovery of unexplored flat-band materials, enabling unbiased identification of compounds with desirable electronic properties across the 2D chemical space.
Subject terms: Electronic properties and materials, Electronic structure
Machine-learning-based methods have become one of the key technologies for materials discovery. In this work, the authors introduce an unsupervised convolutional autoencoder that clusters materials by electronic band features rather than crystal structures and apply it to search for two-dimensional flat-band materials.
Introduction
High-throughput computational methods based on machine learning are rapidly becoming the paradigm for next-generation materials discovery. These methods encompass the prediction of novel materials1–8 as well as automated synthesis8–10. However, testing the stability and properties of a vast array of potential materials at the lab scale remains a significant bottleneck. Consequently, AI-based approaches to classify patterns among identified materials using their characteristic fingerprints (machine-learnable vector representations of material properties) are urgently needed. These techniques are crucial for understanding the emergent properties in computationally generated materials, such as flat band formation11, band topology, superconductivity, photovoltaic potential, catalytic behaviour, and more. Additionally, they enable predicted materials to be linked to compounds with experimentally confirmed properties. Using this approach, large sets of predicted materials can be analysed simultaneously, allowing candidates with the most promising properties to be efficiently flagged for further investigation.
Several previous studies have attempted to use unsupervised machine learning to explore the space of materials’ electronic band structures. For instance, by selecting materials within a particular symmetry group and algorithmically exploring the feature space of energy eigenvalues along some of the high-symmetry lines in reciprocal space, materials can be mapped onto a t-distributed stochastic neighbour embedding (t-SNE) representation12,13. Alternatively, a parameter space of Hamiltonians can be navigated using an unsupervised path-finding algorithm, allowing them to be classified topologically14. However, the supervised selection of feature space introduces bias, and fully unsupervised electronic band structure fingerprints will be beneficial for automating the discovery of materials with desirable electronic properties, such as flat bands.
Flat or nearly-flat bands, which are states with approximately the same energy, have attracted considerable attention for hosting exotic strongly correlated physics15–19. In two dimensions, ‘plane flat bands’ extend in both kx and ky directions, forming an extended planar manifold in reciprocal space. The suppression of kinetic energy in these bands facilitates strong electron-electron interactions, enabling phenomena such as chiral plasmons20, unconventional superconductivity, first observed in twisted bilayer graphene21, Chern insulator states22, and more.
This paper presents an approach to generate materials fingerprints based solely on electronic band structures from density functional theory (DFT) databases. Using a convolutional autoencoder (CAE), elf, we autonomously generate fingerprints to cluster materials by electronic band features without bias from crystal structure. The input to elf consists of band structure images automatically generated during preprocessing. Representing band structures as images has only minimal bias11 while offering several benefits. First, it leverages the strength of CAEs optimised for image processing. Second, it can analyse the vast corpus of research papers where band structures are presented as figures. Third, the visual format is intuitive for human researchers, enhancing AI interpretability. By employing unsupervised clustering algorithms, elf identifies fingerprints of 2D flat-band materials, uncovering novel chemical and electronic feature groups, thus extending the known flat-band paradigm. Two-dimensional embedded clustering plots map the chemistry of 2D flat-band materials, serving as a blueprint for high-throughput analysis of electronic and chemical properties. Scalable to any database containing band structure data, elf enables the autonomous exploration of computationally generated materials. Its robust encoding supports strong predictions of emergent electronic properties of grouped materials, particularly when accompanied by well-studied compounds in the same cluster, all at a very low computational cost. By detecting duplicates, elucidating chemical patterns, and clustering materials by electronic properties, elf provides a versatile alternative to structure-based fingerprints like CrystalNNFingerprint23. Applied here to 2D materials with flat bands identified in Ref. 11 using 2Dmatpedia24, one of the largest open 2D materials databases, this method complements high-throughput ab initio approaches, accelerating discovery across the vast 2D chemical space.
Results
Several attempts were recently reported to detect flat band materials in databases using computational screening and data mining techniques, creating repositories of well-documented 2D and 3D candidates11,25–28. Building on these efforts, we started this work from one such pool of 2127 flat band materials identified by Bhattacharya et al.11 using the 2Dmatpedia database24. For each of the flat-band materials, the band structure image data were encoded by a trained CAE elf, and the latent space representation was flattened to produce a 98-dimensional fingerprint vector. The training and subsequent fingerprint extraction process is shown in Fig. 1, with random Gaussian noise applied only during the training. We employed a ResNet18 architecture for both the encoder and decoder of elf, as its deep structure allows learning nuanced features of band structure images in comparison to shallow networks. Several ResNet models with different input image sizes and latent space dimensions were tested (further details in Supplementary Note 2). Optimum performance and accuracy for the chosen model were obtained with (224 × 224) input image size and (7 × 7 × 2 channels) latent space dimensions (further details in the Methods section ‘Network Training and Fingerprint’). Furthermore, by introducing random noise to the regions of input images during training, the network is forced to learn physically sensible connections between electronic state lines in the ‘noised’ regions, relying only on the shapes of the surrounding bands. This more challenging task helps to prevent overfitting by encouraging the learning of more general band structure features that are robust to small perturbations.
Fig. 1. Elf network and fingerprint analysis pipeline for autoencoded band structure fingerprints.
The convolutional autoencoder is trained to reproduce electronic band structure images within an energy range of ±4 eV relative to the Fermi level. The process involves encoding these images into a compressed latent space representation. This compressed representation serves as the material fingerprint. The diagram outlines the steps of training the network, applying random noise to input images during training to enhance the learning of robust band structure features, and using the encoded representation for clustering materials based on their electronic properties. The reconstructed band structure of 2dm-1 (IrF2) is shown as an example, demonstrating the network’s ability to accurately capture and reproduce band structures even in the presence of noise.
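The size of the fingerprint follows directly from the quoted dimensions, and the arithmetic can be checked with a minimal sketch. It assumes the standard ResNet downsampling factor of 32 between the input and the last convolutional stage, and a channel-first latent layout; both are assumptions about elf rather than details confirmed in the text. A random array stands in for the output of a trained encoder, and the function name is hypothetical.

```python
import numpy as np

# Sizes quoted in the text: 224 x 224 input image, 7 x 7 x 2 latent representation.
INPUT_SIZE = 224
TOTAL_STRIDE = 32      # assumption: standard ResNet downsampling factor
LATENT_CHANNELS = 2


def latent_to_fingerprint(latent: np.ndarray) -> np.ndarray:
    """Flatten a (channels, height, width) latent array into a 1D fingerprint."""
    return latent.reshape(-1)


latent_side = INPUT_SIZE // TOTAL_STRIDE   # 224 // 32 = 7

# Stand-in for an encoder output; a trained encoder would supply this array.
rng = np.random.default_rng(0)
dummy_latent = rng.random((LATENT_CHANNELS, latent_side, latent_side))

fingerprint = latent_to_fingerprint(dummy_latent)
print(fingerprint.shape)   # (98,)
```

Flattening keeps every latent value, so two materials can be compared with any vector distance on their 98 components.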
The set of fingerprints generated by the trained elf was then clustered using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN)29. The optimal parameters for HDBSCAN were determined using the optimization procedure outlined in the Methods section ‘Multi-stage clustering module’. Specifically, the minimum cluster size (Nc) and the minimum sample size (Ns) were set to 5 and 2, respectively. This clustering process identified 50 distinct clusters, while 1662 materials remained unclassified.
To visualise the distribution of materials in the machine-learned fingerprint space, we employed Uniform Manifold Approximation and Projection (UMAP)30. UMAP was applied with a Nearest Neighbour (NN) parameter of 10 and Minimum Distance (MD) parameter of 0.1, to reduce the high-dimensional fingerprint space to a 2D representation. Figure 2 presents the UMAP embedding, displaying the clustered materials in the reduced-dimensional space.
Fig. 2. Visualization of the electronic fingerprint space and emergent chemical trends.
Clustering using the HDBSCAN and DBSCAN algorithms was combined with UMAP dimensionality reduction. a Phylogenetic condensed tree of the HDBSCAN clusters showing the relative sizes of the clusters. Clusters with similar defining features are grouped within the same coloured boxes. b–h Examples of band structures mapped onto UMAP plot for the machine-learned fingerprints, with Number of Neighbours (NN) = 10 and Minimum Distance (MD) = 0.1. The clusters determined by HDBSCAN are marked with an opaque colour, while the transparent shades from DBSCAN identify global trends and major types of band structures. The band structures of a few exemplars from a group of clusters are highlighted nearby to visualise their distinctive features: b Post-transition metal compounds and transition metal chalcogenides, with vanishing, indirect, or band overlap materials (potential semi-metallic phases). c 1-2 eV band gap materials with frequent crossings. d Wide band gap (>3 eV) semiconductors. e Insulators and large indirect band gap materials. f Metal oxides and halides. Insulators with strips of plane flat bands at the Fermi level. g Metal and transition metal halides with dispersive valence bands and plane-flat conduction bands. Within these groups, the band gap decreases from bottom (cluster 22) to top (cluster 18). h Flat bands with metallic band structures.
To further identify major groups of materials based on their electronic structure similarities, we applied an additional layer of clustering using the density-based (DBSCAN) algorithm31. This second clustering step was performed on the UMAP projected coordinates of the materials, excluding those that were left unclassified by HDBSCAN. The DBSCAN parameters were set to a maximum nearest neighbour distance (ϵ) on the UMAP projected plane of 0.8 and minimum cluster size (Smin) of 5. This visualisation process created the shaded regions overlaid on the clusters in Fig. 2, highlighting the major classes of band structures present in the dataset.
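The second clustering stage can be illustrated with a minimal, dependency-free DBSCAN sketch operating on 2D coordinates. This is a simplified textbook version written for illustration, not the implementation used in this work; treating the minimum cluster size Smin as the minimum number of neighbours of a core point (`min_pts`) is an assumption, and the toy points are hypothetical.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each 2D point with a cluster index, or -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = noise            # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:     # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels


# Hypothetical 2D coordinates: two tight groups and one isolated point.
group_a = [(0.0, 0.0), (0.3, 0.0), (0.0, 0.3), (0.3, 0.3), (0.15, 0.15)]
group_b = [(5.0, 5.0), (5.3, 5.0), (5.0, 5.3), (5.3, 5.3), (5.15, 5.15)]
outlier = [(10.0, 0.0)]

labels = dbscan(group_a + group_b + outlier, eps=0.8, min_pts=5)
print(labels)   # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
```

With ϵ = 0.8, points closer than this distance on the projected plane are chained into the same group, while the isolated point is left unassigned.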
During the density-based clustering process, HDBSCAN assigns a probability score to each data point. This score indicates the likelihood of the data point (material) belonging to its assigned cluster. This probability effectively measures the range of density cut-off values for which a given data point remains part of its assigned cluster throughout the clustering process. By sorting the materials within each cluster based on their membership probability, it is possible to identify ‘exemplar’ materials that are most representative of the properties characteristic of their respective clusters. These exemplar materials exhibit the highest membership probabilities within their clusters. A full list of clusters with the materials ordered by membership probability can be found in https://huggingface.co/datasets/2Dmatters/Elf_encoded_flat_band_materials/tree/main.
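The exemplar selection described above amounts to sorting each cluster by membership probability. A minimal sketch, assuming that cluster labels and probabilities are available as parallel lists and that unclassified materials carry the label -1 (a common convention, assumed here); the material identifiers and probability values are hypothetical placeholders.

```python
from collections import defaultdict


def rank_exemplars(material_ids, labels, probabilities, top_n=3):
    """Return the top_n highest-probability members of each cluster."""
    clusters = defaultdict(list)
    for material_id, label, prob in zip(material_ids, labels, probabilities):
        if label == -1:                  # skip unclassified materials
            continue
        clusters[label].append((prob, material_id))
    return {
        label: [mid for _, mid in sorted(members, key=lambda m: -m[0])[:top_n]]
        for label, members in clusters.items()
    }


# Hypothetical identifiers, labels and probabilities, for illustration only.
ids = ["mat-A", "mat-B", "mat-C", "mat-D", "mat-E"]
labels = [0, 0, 1, -1, 0]
probs = [0.70, 1.00, 0.90, 0.00, 0.85]

print(rank_exemplars(ids, labels, probs, top_n=2))
# {0: ['mat-B', 'mat-E'], 1: ['mat-C']}
```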
To visualise the separation and evolution of different band structure types across the UMAP embedding, we plotted band structure images corresponding to several exemplar materials from DBSCAN clusters alongside the UMAP chart (Fig. 2). Together, the identified clusters and the UMAP embedding form an electronic band structure genome for 2D flat-band materials, which serves as a comprehensive map of the diverse electronic properties found in these materials. This genome enables further exploration and a better understanding of the relationships among different flat-band materials based on their band structure characteristics. To further visualize the relationships among different types of band structures, we present a phylogenetic tree (Fig. 2a) that shows their hierarchical organisation. The hierarchy of the clusters is determined internally by the HDBSCAN algorithm, with additional details discussed in the Supplementary Note 3.
The leaves of the tree are opaque-coloured (same as in UMAP) squares, the size of which shows the relative size of the clusters. Adjacent leaves belonging to the same DBSCAN group are further grouped in the tree shown in transparent shades of the same colours. We can see that clusters 18, 30-31-32-33, 26-27 and 14-15-17, which form isolated groups on the right in UMAP, also form a separate branch in the tree. Clusters 45-46-47-48-49, 38-39-40, 36-37, 28-29, 13-16 and 24-25 also form adjacent leaves in the tree highlighting their similar origin.
Analysis of global trends
The final stage of clustering using DBSCAN reveals distinct chemical patterns among the 2D flat-band materials. For example, clusters 26 and 27 predominantly contain halides, while clusters 24, 41, 25, and 8 are rich in transition metal chalcogenides. Other notable chemical trends are highlighted on the UMAP embedding in Fig. 2.
Interestingly, these chemical insights emerge solely from the unsupervised learning of electronic band structure features, without explicit input of the atomic composition. This can be attributed to the fact that materials containing elements from the same group in the periodic table often possess similar valence orbital structures, leading to comparable band arrangements and properties. Our convolutional autoencoder elf effectively captures these chemical patterns by learning the intrinsic similarities in the band structures.
However, the clustering is not solely determined by the chemical composition. The precise features of the band structure, such as band gaps, crossings, and gradients, act as additional classification constraints. Consequently, each cluster contains materials with similar band structure characteristics, despite potential variations in their chemical composition or crystal structure. This is exemplified in clusters 38-39-40 (Fig. 3c, Dataset: https://huggingface.co/datasets/2Dmatters/Elf_encoded_flat_band_materials), where materials within each cluster exhibit characteristic band gaps, albeit with deviations in the chemical composition. The convolutional nature of the autoencoder allows for a suitable margin of distortions and shifts in the electronic bands while still preserving the overall similarity. Materials with more pronounced deviations from the characteristic band structure of a cluster (exemplar band structure) are generally assigned lower membership probabilities.
Fig. 3. Homogeneity of band structure features within clusters.
Band structures of the top three exemplar materials of clusters 37 (a), 24 (b), and 20 (c).
The UMAP embedding in Fig. 2 reveals a global partitioning of the flat-band materials based on their electronic band structure features. Notably, metallic, semi-metallic, semiconducting, and insulating states are well separated in the latent space. This clear partitioning validates the effectiveness of the elf-based fingerprinting approach in capturing meaningful electronic features. Furthermore, several clusters (e.g., 47–49 in Fig. 2d and 23-22-18 in Fig. 2g) are characterized by the presence of dense, flat bands near the Fermi energy. These clusters are particularly interesting from a materials discovery perspective, as they may host stronger electron-electron correlations and potentially exhibit exotic phenomena. The identification of these sub-groups demonstrates the power of the unsupervised learning approach in uncovering materials with desirable electronic properties.
Local trends
Within the individual clusters, HDBSCAN ensures a high degree of similarity among the band structure features of the clustered materials. Cluster 24, shown in Fig. 3b, is a prime example of this. The cluster consists primarily of semiconductors, with Hg and Cd-based materials exhibiting additional complexity in their band structures. Depending on their specific structure, these materials can exhibit semimetallic behaviour32, or a strain-tunable band gap, as observed in HgSe and HgTe, which can gradually transition into a topologically insulating phase33. The unique band features near the Fermi level in cluster 24 suggest that these materials could potentially lead to a range of useful (opto)electronic applications. Notably, most of the materials in this cluster were predicted to be stable as 2D monolayers, making them promising candidates for van der Waals heterostructures34–36.
One of the key advantages of our band structure-based fingerprinting approach is its ability to identify promising candidate materials that share electronic properties with well-studied compounds, even if their chemical compositions differ. By clustering materials based on their band structure similarity, we can flag computationally predicted compounds that have yet to be experimentally investigated but are likely to exhibit desirable properties.
For instance, cluster 46 contains 2dm-1072 Bi2O3, a wide-band-gap semiconductor frequently used in heterostructures for its optoelectronic properties37,38. Interestingly, the cluster also includes 2dm-3090 ZnMoO4 and 2dm-3226 Tl2SiSe3, which display a nearly identical band gap and distribution of flat bands, although the properties of two-dimensional ZnMoO4 and Tl2SiSe3 have yet to be investigated in the literature. Based on their electronic similarity to Bi2O3, they are flagged as promising candidates for similar optoelectronic applications, warranting further investigation.
Another example of this predictive power is demonstrated by cluster 20. This cluster is anchored by the well-known semiconductor InAs (2dm-2474), which is considered one of the prime candidates for next-generation (opto)electronic devices due to its high mobility, large surface area, and direct band gap39. The cluster also contains several less-studied materials, such as AlBi (2dm-2252), a Rashba semiconductor40, and a group of thallium-based pnictides (2dm-2650 TlBi, 2dm-2847 TlSb and 2dm-2672 TlP), all of which display strikingly similar band structures to InAs (Fig. 3c). Notably, these materials share a square-octagonal lattice (Fig. 4), a structural motif known to host topological states41,42 and flat-band mediated correlated electron phenomena43. We predict that materials comprising cluster 20 will exhibit similarly promising electronic properties to InAs and are worthy of further investigation.
Fig. 4. Schematics of square-octagonal lattice.
a Symmetric square-octagon lattice structure. b Skewed square-octagon lattice structure63.
Interestingly, 2dm-2650 TlBi and 2dm-2847 TlSb in this cluster exhibit a unique asymmetrically skewed square-octagonal structure (Fig. 4b), a lattice that has not been previously reported to host flat bands. Despite this structural distortion, these materials remain close to the other symmetric square-octagon lattice materials in our fingerprint space. This underscores the ability of our approach to cluster materials with similar electronic properties even in the presence of structural variation.
This structural flexibility extends to the clustering of materials with entirely different crystal structures based on their common band features. For example, clusters 46-49 contain a variety of wide band gap (≈3 eV) flat-band semiconductors with different crystal structures. Notably, cluster 46 includes 2dm-3090 ZnMoO4, which exhibits a unique edge-sharing zigzag octahedra chain sublattice, a material whose flat-band formation mechanisms are yet to be understood.
While some of the observed flat-band clusters can be readily explained by well-known flat-band physics, such as the localisation of electron wavefunctions in the f-orbitals of lanthanides and actinides27, many others emerge beyond the known flat-band paradigm. For instance, there are bilayer flat-band structures commonly consisting of two stacked monolayers that individually exhibit flat bands, with well-studied examples including stacked square and Kagome arrangements. However, stacked centred-orthorhombic-square chains have only been reported very recently11.
By lowering the minimum cluster size to Nc = 4, we identify a cluster of four chemically similar materials: NdF3 (2dm-321), TbF3 (2dm-441), YF3 (2dm-553), and SmF3 (2dm-875). All of these compounds exhibit a bilayer sublattice structure composed of stacked centred-orthorhombic-square chains and possess flat electronic bands. The mechanisms leading to the emergence of flat bands in this type of lattice have yet to be uncovered.
Cluster 37, shown in Fig. 3a, contains a mixture of group 8-9 transition metals with group 14 elements (Si/Pb/Sn) with the AB4 and AB3 stoichiometries. The two groups of materials with these stoichiometries exhibit distinct tetragonal structures with different point groups (4/mm and 422, respectively). Further inspection of the orbital-projected band structures for these materials reveals that the nearly plane-flat bands arise from mixtures between the A and B elements, whose element groups are shared for both stoichiometries. This suggests that the plane-flat bands in this cluster likely result from a non-trivial interplay between the shared chemistry of these materials and the tetragonal D4 abstract symmetry group (shared by both 4/mm and 422 point groups) under which both lattices fall. More investigation is required to understand this interplay in detail, adding to the known non-trivial mechanisms that result in flat electronic bands beyond the traditional picture of orbital overlap within a single element sublattice44.
Clusters 26-27 are populated by alkaline metal halides which exhibit dense sets of particularly flat bands just below the Fermi level. These materials represent large band gap insulators.
Comparison to structure fingerprints
While our auto-encoded fingerprint is highly effective at clustering materials based on their electronic properties, it is not completely robust to structural distortions. Small shifts in a lattice can alter the high-symmetry points in reciprocal space, which in turn affect the band structure image used as input to the network. However, the likelihood of this occurring in practice is minimal. We further inspected the clusters (see https://huggingface.co/datasets/2Dmatters/Elf_encoded_flat_band_materials/tree/main), and found that the ability to extend beyond structural similarity and cluster materials based on their emergent electronic features is a general capability of our approach.
To directly compare our elf fingerprints to structure-based fingerprints, we used CrystalNNFingerprint (CNNF)23 to generate structural fingerprints of the flat-band sublattices, following our previous work11, for each of the 2127 flat-band materials studied. We then clustered these CNNF fingerprints using HDBSCAN, which resulted in a total of 45 clusters. To visualize the relationship between structural (CNNF) and electronic fingerprints (elf), we assigned each material in the elf UMAP space its corresponding CNNF cluster label. This allows us to identify any localized groupings of CNNF clusters within the electronic fingerprint space, thereby revealing correlations between structural and band-structure similarity. The resulting plot is shown in Fig. 5, with the identified structural motifs within each cluster listed on the right.
Fig. 5. Clusters of iso-structural materials marked in the UMAP projected elf space.
This plot helps to visualise the extent to which electronic fingerprints also cluster similar structures together. The structural fingerprint CNNF was used to find the similarity among structures. Out of 45 iso-structural groups, 18 also form clusters in the elf UMAP space, showing the correlation between electronic and crystal structures.
We find that out of the 45 CNNF clusters, 18 form localized groups of more than 5 materials within the electronic fingerprint space. This clearly demonstrates that certain structural motifs tend to give rise to similar band structures. For example, in Fig. 5, cluster 13 predominantly contains Kagome sublattices, cluster 7 hosts dice lattices, cluster 35 is composed of honeycomb lattices, and cluster 25 features square lattices. These lattice-specific groups form fairly isolated islands in the elf space.
This highlights the model’s capacity to differentiate non-trivial flat bands. In the absence of spin-orbit coupling, topological flat bands can be identified by a characteristic touching point with a dispersive band at a high-symmetry k-point. Additionally, topological flat bands arising from special lattices like Kagome or Lieb have a characteristic pattern in the band structure, which can also be identified in their tight-binding models45. Figure 5 illustrates that sublattices like Kagome and Dice form well-defined, low-spread clusters, confirming that these features are effectively learned and incorporated into our elf. Further examples include cluster 20 in Fig. 2, which contains materials with square-octagonal lattices known to support topological flat bands46, and cluster 16, which exhibits the Lieb-square lattice.
However, it is important to note that the number of CNNF clusters forming distinct groups in electronic fingerprint space is much lower than the total number of CNNF clusters. This implies that many materials sharing similar sublattice structures can indeed exhibit very different band structures. This can be attributed to the variations in the orbital composition of the materials, as well as differences in the atomic sizes and electronegativities of the constituent elements, which also play a role in determining the electronic distribution around the structural motifs.
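One way to quantify this overlap is to count, for each structural cluster, how many of its members land in the same region of the electronic fingerprint space. The sketch below uses membership of a shared electronic group as a simple proxy for "localized"; this criterion, the function name, and the toy labels are all assumptions made for illustration, since the exact procedure is not specified here.

```python
from collections import Counter, defaultdict


def localized_structure_groups(cnnf_labels, elf_groups, min_members=6):
    """Map each CNNF cluster to its most populated elf group, if large enough."""
    counts = defaultdict(Counter)
    for cnnf, elf in zip(cnnf_labels, elf_groups):
        if cnnf == -1 or elf == -1:      # skip unclassified materials
            continue
        counts[cnnf][elf] += 1
    localized = {}
    for cnnf, per_group in counts.items():
        elf, n = per_group.most_common(1)[0]
        if n >= min_members:             # i.e. more than 5 materials together
            localized[cnnf] = (elf, n)
    return localized


# Hypothetical labels: structural cluster 0 is concentrated in electronic
# group "A", while clusters 1 and 2 are too small or too scattered.
cnnf_labels = [0] * 7 + [1] * 3 + [2] * 6
elf_groups = ["A"] * 7 + ["B"] * 3 + ["A", "B", "C", "A", "B", "C"]

print(localized_structure_groups(cnnf_labels, elf_groups))
# {0: ('A', 7)}
```

Structural clusters that are absent from the result are those whose members spread over several electronic groups, which corresponds to similar sublattices producing dissimilar band structures.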
Duplicate detection
In general, when 2Dmatpedia generates “bottom-up” materials via element substitution, the structure is allowed to relax to equilibrium bond lengths and angles without changing the crystal symmetries. Conversely, based on 2Dmatpedia’s “top-down” generation mechanism, we should expect that some layers exfoliated from unique 3D structures in the Materials Project will be equivalent in two dimensions. Subsequently, this could generate identical materials when the bottom-up element substitution chains intersect such that the constituent elements are also the same. These are duplicate materials, and most would have been removed by 2Dmatpedia using structure-matching tools available from the pymatgen library47.
Our algorithm enables the automatic detection of such duplicates, as demonstrated in Fig. 3b. In this example, we see two entries of ZnSe in cluster 24 exhibiting nearly identical band structures. Our elf detected several other pairs of duplicate materials with similar band structures, differing only by small structural distortions. The 2Dmatpedia IDs of these materials are listed in Table 1, with each duplicate pair sharing the same chemical formula. Using this approach, we found up to 40 potential duplicate entries (listed in https://huggingface.co/datasets/2Dmatters/Elf_encoded_flat_band_materials/tree/main). However, further investigation is necessary to determine the stability of these materials within the accuracy of DFT calculations.
Table 1.
Examples of the duplicates identified by elf
| Chemical formula | 2Dmatpedia ID (first entry) | 2Dmatpedia ID (second entry) |
|---|---|---|
| In2S | 2dm-1845 | 2dm-1986 |
| Tl2Te | 2dm-1554 | 2dm-1521 |
| PI3 | 2dm-495 | 2dm-2009 |
| Te2Se | 2dm-1823 | 2dm-1596 |
| SrLaCl5 | 2dm-5260 | 2dm-5422 |
| AsBr3 | 2dm-2624 | 2dm-4881 |
| I3N | 2dm-2010 | 2dm-726 |
| ZnSe | 2dm-2113 | 2dm-2321 |
We recommend verifying the comparative stabilities of the identified duplicate pairs and removing the less stable entries from the 2Dmatpedia database. Our fingerprint, based solely on electronic band structures, enables the identification of fundamentally equivalent materials that differ only by small structural distortions, setting it apart from structure-based methods. This capability is particularly valuable for maintaining accurate and concise materials databases, which is a prerequisite for high-throughput computational materials discovery.
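A simple screening rule for such duplicates can be sketched as follows: flag every pair of entries that shares both a chemical formula and an electronic cluster. This rule is an assumption made for illustration and is not necessarily the criterion applied in this work. The two ZnSe entries and their cluster come from the text above; the remaining entries are hypothetical placeholders.

```python
from collections import defaultdict
from itertools import combinations


def candidate_duplicates(entries):
    """Return ID pairs that share a chemical formula and a cluster label."""
    groups = defaultdict(list)
    for material_id, formula, cluster in entries:
        if cluster == -1:                # unclassified entries are not compared
            continue
        groups[(formula, cluster)].append(material_id)
    pairs = []
    for ids in groups.values():
        pairs.extend(combinations(sorted(ids), 2))
    return pairs


entries = [
    ("2dm-2113", "ZnSe", 24),   # from the text: two ZnSe entries in cluster 24
    ("2dm-2321", "ZnSe", 24),
    ("hyp-1", "XY2", 3),        # hypothetical: same formula, different clusters
    ("hyp-2", "XY2", 7),
    ("hyp-3", "Z", -1),         # hypothetical: unclassified entry
]

print(candidate_duplicates(entries))
# [('2dm-2113', '2dm-2321')]
```

Pairs flagged this way are only candidates; their relative stability still needs to be checked before any entry is removed.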
Conclusions
In this work, we have proposed a material fingerprinting method based on electronic band structures and demonstrated its advantages over structure-based methods, complementing existing techniques employed for material similarity search. We applied our fingerprinting and clustering framework (elf) to two-dimensional materials exhibiting flat bands to determine chemical and electronic property trends, elucidating multiple chemical and structure groups for further investigation. This fully unsupervised approach is a stepping stone in the realisation of the autonomous materials discovery paradigm.
Similarity search in material properties has become one of the main challenges in modern materials science. Very recently, Google DeepMind significantly increased the number of known stable crystals with GNoME (Graph Neural Networks for Materials Exploration)1, releasing an unprecedented number of candidate materials. In this work, we have demonstrated for the first time that material fingerprints deep-learned from electronic band-structure features are a robust tool for linking computationally generated materials to already synthesised compounds exhibiting important emergent properties. This will help widen the bottleneck between material prediction and synthesis of the most promising candidate materials, a crucial task as we move into the new paradigm of AI-driven materials discovery.
In future work, we plan to apply our approach to analyse larger databases, such as the Materials Project, containing 3D materials, and investigate different electronic phenomena emerging from non-trivial band features such as high Tc superconductivity and novel topological phases. Additionally, optimising the material clustering pipeline will be essential, with a focus on methods that prioritise the underlying physics of the materials. One possible approach could involve using dimensionality reduction algorithms to establish broader material groups before applying clustering algorithms in the full fingerprint space. This approach could itself be automated with machine learning, provided key physical properties remain central.
Methods
Network training and fingerprint
Previous studies have employed simple fingerprint vectors directly extracted from the electronic band structure of a material48,49 to cluster materials’ electronic properties. However, when applying similar techniques to the subset of flat-band materials from 2Dmatpedia, we found that the results were dominated by noise, with materials relatively uniformly spread across fingerprint space. This issue mainly arises from the reliance of those techniques on integrated variables of all electronic bands in some energy range, like the density of states (DoS). As a result, materials sharing meaningful electronic properties may be far apart in fingerprint space if they happen to have a different number of bands passing through some energy range.
To address this issue, we propose a fingerprint based solely on the electronic band structure features of a material. The autoencoder we trained to encode the band feature fingerprints is based on the ResNet series of convolutional neural networks50,51, which have found extensive applications in feature extraction problems across the medical and physical sciences52,53. We used the first 16 layers of ResNet18 as an encoder and ‘transposed’ it by replacing convolutions with deconvolutions to obtain the decoder. The resulting network is fully convolutional and features skip connections and batch normalization layers, which are characteristic of ResNet models and enable the deep model to converge. We also tested ResNet34 and ResNet50 as backbones for elf, but did not observe a significant improvement; see Supplementary Note 2 for details.
To train the network, we plotted band structure data within a ±4 eV range of the Fermi energy, binarized the plots, and resized them to 224 × 224 pixel images. Limiting the energy range to this region around the Fermi energy focuses the network on the crucial features of the band structure and prevents band structures from appearing so distinct from one another that clustering power is reduced.
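As an illustration of this preprocessing step, the following Python sketch rasterises band energies directly into a 224 × 224 binary matrix. It is not the pipeline used to produce the results: the function name `rasterise_bands`, the point-wise drawing (rather than plotting and resizing an image) and the toy bands are all assumptions made for the example.

```python
import numpy as np

def rasterise_bands(energies, e_fermi=0.0, window=4.0, size=224):
    """Turn band energies of shape (n_bands, n_kpoints) into a binary image.

    Rows encode energy (row 0 is E_F + window), columns encode k-path position.
    Hypothetical helper for illustration only.
    """
    energies = np.asarray(energies, dtype=float) - e_fermi
    n_k = energies.shape[1]
    image = np.zeros((size, size), dtype=np.uint8)

    # Map each k-point index to a pixel column.
    cols = np.round(np.linspace(0, size - 1, n_k)).astype(int)

    for band in energies:
        inside = np.abs(band) <= window  # discard points outside the energy window
        # Map energies in [-window, +window] to rows in [size - 1, 0].
        rows = np.round((window - band[inside]) / (2 * window) * (size - 1)).astype(int)
        image[rows, cols[inside]] = 1
    return image

# Toy input: a flat band at 0.5 eV, a dispersive band, and a band outside the window.
k = np.linspace(0.0, np.pi, 500)
bands = np.stack([0.5 * np.ones_like(k), 3.0 * np.cos(k), 6.0 * np.ones_like(k)])
img = rasterise_bands(bands)
```

In this toy case the flat band fills a single pixel row, while the band at 6 eV leaves no trace because it lies outside the ±4 eV window.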
The input to the network is a 224 × 224 matrix of zeros and ones, representing the pixels of the band structure image. The network accepts three input channels of size 224 × 224, corresponding to the RGB colour values of the pixels, but these are redundant for our black-and-white images. The network then predicts an output matrix of the same size as the input. When this output matrix is plotted, we obtain a prediction of the input band structure, based only on the information from the compressed latent layer representation in the centre of the network.
During the optimization process, the network is trained to minimise the binary cross-entropy (BCE) loss54 between input and output images. To generate an accurate prediction of the input image from the much smaller set of numbers in the latent representation, the network is forced to encode compressed features of the band structure image in this latent space. Additionally, during training, we applied random noise to regions of the input images with a probability of 0.5.
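A minimal sketch of the two ingredients described above, the BCE loss and the noise augmentation, is given below. The source does not specify the form of the noise, so the single random square patch, its size (`patch=32`) and the helper names are assumptions for illustration only.

```python
import numpy as np

def binary_cross_entropy(target, prediction, eps=1e-7):
    """Mean BCE between a binary target image and predicted pixel probabilities."""
    prediction = np.clip(prediction, eps, 1.0 - eps)  # avoid log(0)
    loss = target * np.log(prediction) + (1 - target) * np.log(1 - prediction)
    return float(-np.mean(loss))

def add_region_noise(image, rng, p_apply=0.5, patch=32):
    """With probability p_apply, overwrite one random square patch with random bits.

    The patch-based noise model is an assumption, not the authors' stated scheme.
    """
    noisy = image.copy()
    if rng.random() < p_apply:
        size = image.shape[0]
        r, c = rng.integers(0, size - patch + 1, size=2)
        noisy[r:r + patch, c:c + patch] = rng.integers(0, 2, size=(patch, patch))
    return noisy

rng = np.random.default_rng(0)
clean = np.zeros((224, 224), dtype=np.uint8)
clean[100, :] = 1                      # toy image: a single flat band
noisy = add_region_noise(clean, rng)   # network input during training
# An uninformative prediction of 0.5 for every pixel gives a loss of ln 2.
loss_uniform = binary_cross_entropy(clean, np.full(clean.shape, 0.5))
```

In a denoising setup of this kind, the loss would be evaluated between the clean image and the reconstruction of the noisy input.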
To achieve a balance between the flexibility needed for high reconstruction accuracy and the dimensionality reduction required for improved clustering, we chose a flattened length of 98 for the latent space. The latent space size was set to a 7 × 7 matrix with two parallel channels. With this network architecture and the application of noise during training, we observed the training loss stabilise at 0.282 (using BCE) after 30 epochs. Performance of the network on the validation set remained within 5% of the training loss throughout the training process.
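The 7 × 7 spatial size of the latent matrix can be checked with simple shape arithmetic. The sketch below assumes the standard ResNet-18 stride schedule (a strided stem convolution, a max-pooling layer and three strided residual stages); the kernel, stride and padding values are those of the usual ResNet-18 definition, not values quoted in this work.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a convolution or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1

size = 224
size = conv_out(size, kernel=7, stride=2, padding=3)  # stem convolution: 224 -> 112
size = conv_out(size, kernel=3, stride=2, padding=1)  # max pooling: 112 -> 56
for _ in range(3):                                    # strided stages: 56 -> 28 -> 14 -> 7
    size = conv_out(size, kernel=3, stride=2, padding=1)

latent_channels = 2                                   # two parallel channels, as in the text
flattened = latent_channels * size * size             # 2 * 7 * 7 = 98
```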
We can interpret the physical features learned by the network by inspecting its latent space representations. Due to the purely convolutional architecture of the network, we expect a soft correlation between specific regions of the input image and specific regions of the latent space matrix. To visualise this, we ran the band structure of 2dm-1 (IrF2) through the encoder and systematically varied the value of one dimension of the resulting encoded representation by an amount Δ. The full set of slightly altered latent space values was then decoded, so that any effect of the change can be observed in the features of the reconstructed image.
We changed the latent dimension of channel 2 at matrix position (2,2) by a value Δ in the range 0.5 to 0.9, and the dimension at matrix position (3,0) was changed by Δ in the range of 0.25 to 1.1. The resulting reconstructions are displayed in Fig. 6, with Δ = 0 corresponding to the material’s original band structure.
Fig. 6. Decoded band structures showing the effect of changing one dimension of 2dm-1’s latent space.
Panels a and b correspond to Δ changes in the latent dimensions [2,2] and [3,0] respectively, of channel 2. The ranges of Δ have been chosen to display the alteration of a single feature in the band structure. With larger |Δ| values, different possible band crossings and splittings can emerge in the same region.
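The latent-space sweep behind Fig. 6 can be sketched as follows. The random latent array stands in for an encoded band structure, the helper `perturb_latent` is hypothetical, and treating "channel 2" as zero-based index 1 is an assumption about the indexing convention; decoding each perturbed array with the trained decoder is not shown.

```python
import numpy as np

def perturb_latent(latent, channel, row, col, deltas):
    """Return one copy of `latent` per delta, with a single entry shifted by delta."""
    sweep = []
    for delta in deltas:
        altered = latent.copy()
        altered[channel, row, col] += delta
        sweep.append(altered)
    return np.stack(sweep)

rng = np.random.default_rng(0)
latent = rng.normal(size=(2, 7, 7))   # stand-in for an encoded band structure
deltas = np.linspace(0.5, 0.9, 5)     # range quoted for matrix position (2,2)
batch = perturb_latent(latent, channel=1, row=2, col=2, deltas=deltas)
# Each element of `batch` would then be passed through the decoder.
```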
We observe that, because of the learned features, the autoencoder can generalise to generate entirely sensible band structures of materials that, in theory, do not exist, by simple manipulation of the latent space. This helps to elucidate the meanings of the individual latent space dimensions. Moreover, due to the compression, there is generally overlap in the latent space regions, and this overlap can help obtain band structures of two seemingly different material groups by continuously tuning some parameters of the latent vector. However, that exercise is beyond the scope of this article.
Multi-stage clustering module
To classify the machine-learned fingerprints, we employed a completely unsupervised multi-stage algorithm. HDBSCAN29 was first used to discern regions of high density in the fingerprint space and suggest a hierarchical structure for these clusters. This serves as a stringent identifier of band structure feature similarity among the materials. HDBSCAN, being density-based, facilitates much more general cluster shapes compared to the common ellipsoid-based k-means method. Additionally, it allows us to obtain hierarchical cluster information without being as sensitive to noise in the data (from structural distortions).
To offer a complementary and independent view of our 98-dimensional fingerprint space, the Uniform Manifold Approximation and Projection (UMAP) algorithm was used. UMAP excels at dimensionality reduction while preserving the local and global distance relations of points30. This sets it apart from other approaches such as Locally Linear Embedding (LLE)55 and Hessian Eigenmaps56. Furthermore, to visualise relations between different clusters, we used another clustering technique, DBSCAN, which allows even arbitrarily shaped connections to create larger groups.
The t-SNE algorithm57 provides an alternative to UMAP, as it also preserves both the local and global structure of the feature space58. In our analysis, we did not observe a significant improvement in the embedding space when using t-SNE compared to UMAP. Moreover, UMAP has been extensively applied in bioinformatics research, showing results broadly consistent with t-SNE while offering superior run times and better preservation of global structures59,60, such as distances between cell types. Hence, the UMAP algorithm was chosen for our analysis, while the comparative embedding results from t-SNE can be found in Supplementary Note 1.
The Minkowski distance with exponent p = 0.2 was used during the clustering process, as this metric is known to scale better than Euclidean (p = 2) and Manhattan (p = 1) metrics to high dimensional vector spaces61.
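For concreteness, the Minkowski distance of order p between two vectors u and v is (Σ_i |u_i − v_i|^p)^(1/p). The short sketch below implements it with NumPy; the function name and the toy vectors are illustrative. Note that for p < 1 the triangle inequality no longer holds in general, so the measure is strictly a quasi-metric, as the last lines show.

```python
import numpy as np

def minkowski(u, v, p):
    """Minkowski distance of order p > 0 between two vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.sum(np.abs(u - v) ** p) ** (1.0 / p))

d_euclid = minkowski([0, 0], [3, 4], p=2)      # Euclidean: 5
d_manhattan = minkowski([0, 0], [3, 4], p=1)   # Manhattan: 7
d_fractional = minkowski([0, 0], [3, 4], p=0.2)

# Triangle inequality check for p = 0.2 on three corner points of a unit square.
a, b, c = [0, 0], [1, 0], [1, 1]
direct = minkowski(a, c, p=0.2)                          # (1 + 1)^5 = 32
via_b = minkowski(a, b, p=0.2) + minkowski(b, c, p=0.2)  # 1 + 1 = 2
```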
The two primary free parameters of HDBSCAN (minimum cluster size, Nc, and minimum sample size, Ns) were optimised by considering their effect on three metrics quantifying the quality of the resulting clustering solution. These were the number of clusters formed, the number of unclassified materials, and the ‘density-based clustering validation’ (DBCV) index62. The DBCV index evaluates the compactness of a clustering solution by comparing the sparseness of clusters (based on the point in the cluster with the largest core distance measure) with the inter-cluster separation. Thus, a higher value indicates more compact, well-separated clusters and an overall better clustering solution.
Ns and Nc were both varied from 2 to 11, and the metrics above were calculated for the resulting clustering solution. These are displayed as colour maps in Fig. 7. Initially, the number of unclassified materials increases, indicating the presence of many difficult-to-classify materials that get forced out of clusters as the clustering parameters become more stringent, requiring larger and more compact clusters. These behaviours are typical of material clustering solutions11 using this approach. Considering these factors, we chose Nc = 5 and Ns = 2. This achieves a relatively large DBCV index while minimising the number of unclassified materials and keeping the number of clusters bounded enough to effectively represent the major band structure groups among flat-band materials.
Fig. 7. Four metrics for assessing validity of the clustering solutions.
The metrics are plotted as a function of HDBSCAN's minimum cluster size and minimum sample size variables. The number of clusters formed and the number of unclustered materials together indicate how fine-grained the similarities are between all the materials in a given cluster. The DBCV and S-Dbw indices directly quantify the quality of a given clustering solution with a higher score indicating a better solution with clusters that are less diffuse and more separated from each other.
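The parameter scan over Nc and Ns can be outlined as below. This is only a skeleton: `cluster_fn` is a hypothetical callable that returns one integer label per material, with -1 marking unclassified points (the convention used by HDBSCAN implementations), and `dummy_cluster_fn` is a stand-in so that the example runs without a clustering library. The DBCV index, which would be evaluated for each solution alongside these counts, is omitted.

```python
from itertools import product

import numpy as np

def summarise_labels(labels):
    """Count clusters and unclassified points, with -1 denoting noise."""
    labels = np.asarray(labels)
    n_unclassified = int(np.sum(labels == -1))
    n_clusters = len(set(labels[labels != -1].tolist()))
    return n_clusters, n_unclassified

def scan_parameters(fingerprints, cluster_fn, values=range(2, 12)):
    """Evaluate cluster_fn on every (Nc, Ns) pair, here both from 2 to 11."""
    results = {}
    for n_c, n_s in product(values, values):
        labels = cluster_fn(fingerprints, min_cluster_size=n_c, min_samples=n_s)
        results[(n_c, n_s)] = summarise_labels(labels)
    return results

def dummy_cluster_fn(x, min_cluster_size, min_samples):
    # Stand-in only: marks the first `min_samples` points as noise
    # and assigns everything else to a single cluster.
    labels = np.zeros(len(x), dtype=int)
    labels[:min_samples] = -1
    return labels

fingerprints = np.zeros((20, 98))  # placeholder for 98-dimensional fingerprints
results = scan_parameters(fingerprints, dummy_cluster_fn)
```

In practice `cluster_fn` would wrap a real HDBSCAN implementation, and the resulting dictionary could be reshaped into the 10 × 10 colour maps of the kind shown in Fig. 7.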
For UMAP, a Nearest Neighbour (NN) parameter of 10 and Minimum Distance (MD) of 0.1 were found to give the optimal 2D representation and general agreement with the HDBSCAN clusters. Finally, for DBSCAN, the parameter ϵ = 25 was chosen which allowed identifying visibly separate regions in the UMAP projection. UMAP was mainly a visualisation tool in this work. For clustering, we used HDBSCAN, which prevents the output from being highly sensitive to dimensionality reduction parameters, such as the UMAP Nearest Neighbour parameter, for which it is difficult to predict the precise effect that changes will have on the shape of the embedding space.
Acknowledgements
This research was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant Agreement No. 865590) and the Research Council UK [BB/X003736/1]. Q.Y. acknowledges the funding from Royal Society University Research Fellowship URF\R1\221096 and UK Research and Innovation Grant [EP/X017575/1].
Author contributions
H.P. and T.W. equally contributed to elf architecture and programming, data analysis and interpretation. I.T. and H.Z. provided support in data analysis and AI/ML architecture. Q.Y., A.M., and A.B. conceived the research plan and supervised the project. All authors participated in discussions and contributed to writing the manuscript.
Peer review
Peer review information
Communications Physics thanks Matías Núñez and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Data availability
All relevant data are available at our Hugging Face repository: Elf_encoded_flat_band_materials.
Code availability
The code used to generate the results discussed in this work is available from our public Hugging Face repository: Elf_encoder.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Henry Kelbrick Pentz, Thomas Warford.
Contributor Information
Qian Yang, Email: qian.yang@manchester.ac.uk.
Anupam Bhattacharya, Email: anupamcounting@gmail.com.
Artem Mishchenko, Email: artem.mishchenko@manchester.ac.uk.
Supplementary information
The online version contains supplementary material available at 10.1038/s42005-025-01936-2.
References
- 1. Merchant, A. et al. Scaling deep learning for materials discovery. Nature 624, 80–85 (2023).
- 2. Rashid, A., Lazarev, M., Kazeev, N., Novoselov, K. & Ustyuzhanin, A. Review on automated 2D material design. 2D Mater. 11, 032002 (2024).
- 3. Jain, A. Machine learning in materials research: developments over the last decade and challenges for the future. ChemRxiv https://chemrxiv.org/engage/chemrxiv/article-details/65d934059138d23161da5fab (2024).
- 4. Lyngby, P. & Thygesen, K. S. Ab initio property characterisation of thousands of previously unexplored 2D materials. 2D Mater. 11, 035030 (2024).
- 5. Cheetham, A. K. & Seshadri, R. Artificial Intelligence Driving Materials Discovery? Perspective on the Article: Scaling Deep Learning for Materials Discovery. Chem. Mater. 36, 3490–3495 (2024).
- 6. Alverson, M. et al. Generative adversarial networks and diffusion models in material discovery. Digital Discov. 3, 62–80 (2024).
- 7. Wang, H. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60 (2023).
- 8. Leeman, J. et al. Challenges in high-throughput inorganic materials prediction and autonomous synthesis. PRX Energy 3, 011002 (2024).
- 9. Adam, D. The automated lab of tomorrow. Proc. Natl Acad. Sci. 121, e2406320121 (2024).
- 10. Szymanski, N. J. et al. An autonomous laboratory for the accelerated synthesis of novel materials. Nature 624, 86–91 (2023).
- 11. Bhattacharya, A., Timokhin, I., Chatterjee, R., Yang, Q. & Mishchenko, A. Deep learning approach to genome of two-dimensional materials with flat electronic bands. npj Computational Mater. 9, 101 (2023).
- 12. Nunez, M. Exploring materials band structure space with unsupervised machine learning. Computational Mater. Sci. 158, 117–123 (2019).
- 13. Núñez, M., Weht, R. & Núñez-Regueiro, M. Searching for electronically two dimensional metals in high-throughput ab initio databases. Computational Mater. Sci. 182, 109747 (2020).
- 14. Scheurer, M. S. & Slager, R.-J. Unsupervised machine learning and band topology. Phys. Rev. Lett. 124, 226401 (2020).
- 15. Checkelsky, J. G., Bernevig, B. A., Coleman, P., Si, Q. & Paschen, S. Flat bands, strange metals and the Kondo effect. Nat. Rev. Mater. 9, 509–526 (2024).
- 16. Leykam, D., Andreanov, A. & Flach, S. Artificial flat band systems: from lattice models to experiments. Adv. Phys. X 3, 1473052 (2018).
- 17. Törmä, P., Peotta, S. & Bernevig, B. A. Superconductivity, superfluidity and quantum geometry in twisted multilayer systems. Nat. Rev. Phys. 4, 528–542 (2022).
- 18. Rhim, J.-W. & Yang, B.-J. Singular flat bands. Adv. Phys. X 6, 1901606 (2021).
- 19. Sun, K., Gu, Z., Katsura, H. & Das Sarma, S. Nearly flatbands with nontrivial topology. Phys. Rev. Lett. 106, 236803 (2011).
- 20. Huang, T. et al. Observation of chiral and slow plasmons in twisted bilayer graphene. Nature 605, 63–68 (2022).
- 21. Cao, Y. et al. Unconventional superconductivity in magic-angle graphene superlattices. Nature 556, 43–50 (2018).
- 22. Choi, Y. et al. Correlation-driven topological phases in magic-angle twisted bilayer graphene. Nature 589, 536–541 (2021).
- 23. Zimmermann, N. E. & Jain, A. Local structure order parameters and site fingerprints for quantification of coordination environment and crystal structure similarity. RSC Adv. 10, 6063–6081 (2020).
- 24. Zhou, J. et al. 2DMatPedia, an open computational database of two-dimensional materials from top-down and bottom-up approaches. Sci. Data 6, 86 (2019).
- 25. Duan, J. et al. Cataloging high-quality two-dimensional van der Waals materials with flat bands. Adv. Funct. Mater. 34, 2313067 (2024).
- 26. Neves, P. M. et al. Crystal net catalog of model flat band materials. npj Computational Mater. 10, 39 (2024).
- 27. Regnault, N. et al. Catalogue of flat-band stoichiometric materials. Nature 603, 824–828 (2022).
- 28. Zhang, X., Zhao, Y.-M., Song, Z. & Shen, L. Physically explainable statistical learning of flat bands in stoichiometric materials from the periodic table. Phys. Rev. Mater. 7, 064804 (2023).
- 29. Campello, R. J. G. B., Moulavi, D. & Sander, J. Density-based clustering based on hierarchical density estimates. In Advances in Knowledge Discovery and Data Mining, 160–172 (PAKDD, 2013).
- 30. McInnes, L., Healy, J. & Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2018).
- 31. Ester, M., Kriegel, H.-P., Sander, J. & Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 226–231 (ACM, 1996).
- 32. Lanzillo, N. A., Roy, S. & Nayak, S. K. Quantum confinement and quasiparticle corrections in α-HgS from first principles. Surf. Sci. 636, 54–58 (2015).
- 33. Li, J. et al. Two-dimensional topological insulators with tunable band gaps: Single-layer HgTe and HgSe. Sci. Rep. 5, 14115 (2015).
- 34. Imran, M. et al. Highly efficient and stable inverted perovskite solar cells with two-dimensional ZnSe deposited using a thermal evaporator for electron collection. J. Mater. Chem. A 6, 22713–22720 (2018).
- 35. Xiong, A. & Zhou, X. Tunable electronic and optical properties of novel ZnSe/AlP van der Waals heterostructure. Mater. Res. Express 6, 075907 (2019).
- 36. Zhang, J., Sun, Y., Ye, S., Song, J. & Qu, J. Heterostructures in two-dimensional CdSe nanoplatelets: synthesis, optical properties, and applications. Chem. Mater. 32, 9490–9507 (2020).
- 37. Zhou, J. et al. Robust photocatalytic activity of two-dimensional h-BN/Bi2O3 heterostructure quantum sheets. RSC Adv. 12, 13535–13547 (2022).
- 38. Mei, J., Liao, T., Ayoko, G. A. & Sun, Z. Two-dimensional bismuth oxide heterostructured nanosheets for lithium- and sodium-ion storages. ACS Appl. Mater. Interfaces 11, 28205–28212 (2019).
- 39. Yu, W., Li, J., Wu, Y., Lu, J. & Zhang, Y. Systematic investigation of the mechanical, electronic, and interfacial properties of high mobility monolayer InAs from first-principles calculations. Phys. Chem. Chem. Phys. 25, 10769–10777 (2023).
- 40. Wu, K. et al. Two-dimensional giant tunable Rashba semiconductors with two-atom-thick buckled honeycomb structure. Nano Lett. 21, 740–746 (2020).
- 41. Wunderlich, P., Ferrari, F. & Valentí, R. Detecting topological phases in the square–octagon lattice with statistical methods. Eur. Phys. J. 138, 336 (2023).
- 42. Bao, A., Tao, H.-S., Liu, H.-D., Zhang, X. & Liu, W.-M. Quantum magnetic phase transition in square-octagon lattice. Sci. Rep. 4, 6918 (2014).
- 43. Nunes, L. H. & Smith, C. M. Flat-band superconductivity for tight-binding electrons on a square-octagon lattice. Phys. Rev. B 101, 224514 (2020).
- 44. Peotta, S., Huhtinen, K.-E. & Törmä, P. Quantum geometry in superfluidity and superconductivity. Preprint at https://arxiv.org/abs/2308.08248 (2023).
- 45. Liu, H., Meng, S. & Liu, F. Screening two-dimensional materials with topological flat bands. Phys. Rev. Mater. 5, 084203 (2021).
- 46. Mukherjee, A. & Singh, B. Topological flat bands and higher-order topology in square-octagon lattice. Preprint at https://arxiv.org/abs/2410.04515 (2024).
- 47. Ong, S. P. et al. Python materials genomics (pymatgen): A robust, open-source Python library for materials analysis. Computational Mater. Sci. 68, 314–319 (2013).
- 48. Isayev, O. et al. Materials cartography: representing and mining materials space using structural and electronic fingerprints. Chem. Mater. 27, 735–743 (2015).
- 49. Knøsgaard, N. R. & Thygesen, K. S. Representing individual electronic states for machine learning GW band structures of 2D materials. Nat. Commun. 13, 468 (2022).
- 50. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778 (IEEE, 2016).
- 51. Horizon2333. Imagenet autoencoder. https://github.com/Horizon2333/imagenet-autoencoder GitHub repository (2021).
- 52. Yu, X. & Wang, S.-H. Abnormality diagnosis in mammograms by transfer learning based on ResNet18. Fundam. Inform. 168, 219–230 (2019).
- 53. Odusami, M., Maskeliūnas, R., Damaševičius, R. & Krilavičius, T. Analysis of features of Alzheimer’s disease: Detection of early stage from functional brain changes in magnetic resonance images using a finetuned ResNet18 network. Diagnostics 11, 1071 (2021).
- 54. Creswell, A., Arulkumaran, K. & Bharath, A. A. On denoising autoencoders trained to minimise binary cross-entropy. Preprint at https://arxiv.org/abs/1708.08487 (2017).
- 55. Roweis, S. T. & Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000).
- 56. Donoho, D. L. & Grimes, C. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proc. Natl Acad. Sci. 100, 5591–5596 (2003).
- 57. Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
- 58. Kobak, D. & Linderman, G. C. Initialization is critical for preserving global data structure in both t-SNE and UMAP. Nat. Biotechnol. 39, 156–157 (2021).
- 59. Xiang, R. et al. A comparison for dimensionality reduction methods of single-cell RNA-seq data. Front. Genet. 12, 646936 (2021).
- 60. Roca, C. P. et al. A cross entropy test allows quantitative statistical comparison of t-SNE and UMAP representations. Cell Rep. Methods 3, 100390 (2023).
- 61. Aggarwal, C. C., Hinneburg, A. & Keim, D. A. On the surprising behavior of distance metrics in high dimensional space. In Database Theory – ICDT, 420–434 (ICDT, 2001).
- 62. Moulavi, D., Jaskowiak, P. A., Campello, R. J., Zimek, A. & Sander, J. Density-based clustering validation. In Proceedings of the 2014 SIAM international conference on data mining, 839–847 (SIAM, 2014).
- 63. Momma, K. & Izumi, F. VESTA 3 for three-dimensional visualization of crystal, volumetric and morphology data. J. Appl. Crystallogr. 44, 1272–1276 (2011).