Scientific Reports. 2024 Dec 30;14:31564. doi: 10.1038/s41598-024-68920-8

A machine learning based classifier for topological quantum materials

Ashiqur Rasul 1, Md Shafayat Hossain 2, Ankan Ghosh Dastider 1, Himaddri Roy 1, M Zahid Hasan 2, Quazi D M Khosru 1
PMCID: PMC11686355  PMID: 39738190

Abstract

Prediction and discovery of new materials with desired properties are at the forefront of quantum science and technology research. A major bottleneck in this field is the computational resources and time complexity required to find new materials from ab initio calculations. In this work, an effective and robust deep learning-based model is proposed by incorporating persistent homology with a graph neural network; it offers an accuracy of 91.4% and an F1 score of 88.5% in classifying topological versus non-topological materials, outperforming other state-of-the-art classifier models. Additionally, out-of-distribution and newly discovered topological materials can be classified using our method with high confidence. The incorporation of the graph neural network encodes the underlying relations between atoms into the model based on their crystalline structures, and thus proves to be an effective method to represent and process non-Euclidean data such as molecules with a relatively shallow network. The persistent homology pipeline in the proposed neural network integrates a topological analysis of crystal structures into the deep learning model, enhancing both robustness and performance. Our classifier can serve as an efficacious tool for predicting the topological class, thereby enabling a high-throughput search for fascinating topological materials.

Subject terms: Topological matter, Condensed-matter physics

Introduction

Recent years have witnessed the rise of machine learning, enabling a multitude of new applications, ranging from accelerated drug discovery to personalized advertising [1-3]. Machine learning and, more specifically, deep learning-based techniques are becoming popular in materials science thanks to their high accuracy, computational speed, and ease of use in comparison with ab initio calculations [4]. Although our computational power has increased manifold, the task of discovering new materials and exploring their types, properties, and structures is still significantly time-consuming and computationally expensive [5-7]. Deep learning, on the other hand, is capable of providing the same result several orders of magnitude faster than traditional methods [8,9]. The key requirement for applying deep learning-based methods is the availability of a large-scale dataset. Thankfully, density functional theory [10] based computational databases comprising ab initio and symmetry calculations have become available in recent years, enabling the application of deep learning in materials science [5,11-18].

Data-driven intelligent models learn certain features or descriptors of materials from the provided data and can make decisions based on those in an automated system [6,19,20]. Encoding the properties, i.e., structural and other relevant information, of various materials is a critical step in building an adroit deep learning model. In fact, the question of how to represent the atoms of a molecule led to a surge of recent interest in graph convolutional neural networks [21,22]. Notably, Xie et al. [23] developed a generalized crystal graph convolutional neural network (CGCNN) to predict material properties by embedding molecular information into a graph neural network, which led to further developments in this field [20,24-26]. For example, Karamad et al. [26] incorporated atomic orbital interaction features into their graph neural network model, outperforming the CGCNN one. Although various machine learning and deep learning models for predicting material properties can be found in the literature, robust frameworks that can predict a particular class of material from atomistic data remain limited. Here we build a model focusing on the topological class of materials. The discovery of topological materials introduced unprecedented new physics, and predicting new topological materials remains a major technological goal [27,28]. There are a few frameworks geared towards this frontier: Claussen et al.'s [29] model, built upon a gradient-boosted trees algorithm [30], can predict the DFT-computed topology of a given material. Acosta et al.'s [31] compressed sensing-based statistical-learning approach can create a two-dimensional map in which trivial insulators and quantum spin Hall insulators are separated into different domains. The deep neural network presented in the work of Sun et al. [32] can predict the topological invariant of one-dimensional four-band and two-dimensional two-band insulators based on the momentum-space Hamiltonian. The Hamiltonian is also used as an input by Zhang et al. [33], where a supervised neural network is trained to distinguish various topological phases of topological band insulators.

Deep learning-based automatic prediction of topological materials requires an efficient representation of the material and a scheme to extract useful features for the network. Persistent homology, a method within the broader field of topological data analysis that explores qualitative features based on geometry and topology [34], is a strong candidate for these requirements. It has already found numerous applications in science and engineering, including biological information analysis [35,36], prediction of chemical stability [37], and crystalline compound representation [7]. For modeling crystals, the inclusion of many-atom interactions is crucial. Atom-specific persistent homology (ASPH) can capture such interactions and thereby extract the true topological information of the materials, providing strong features for the deep learning model [7]. In this work, we integrate ASPH with a graph convolutional neural network to build an efficient classification model for predicting the topological class of a material. We find that the use of atom-based chemical information as the learning feature boosts the overall classification performance significantly. We first benchmark the performance of our model with well-known materials and then inspect a few recently discovered materials to justify our claim. Satisfactory performance in both 2-class (topological vs. non-topological/trivial) and 3-class (topological insulator, semimetal, and trivial) classifications, along with relevant comparisons with other studies, proves the effectiveness of the proposed model.

Results

Model architecture

A graph neural network, combined with a persistent homology model, encodes the structural, chemical, and topological information of the material. The graph representation of the crystal acts as a structural descriptor, whereas the persistent homological feature vector acts as a topological descriptor. The model can therefore be divided into two separate branches operating in parallel: one performs graph representation learning, while the other performs an algebraic homological analysis on the material crystal structures to produce a topological feature vector. These two representations in two different spaces are finally combined to produce the complete representation of the whole crystal system. This conjoined feature vector is then fed into a deep neural network to produce the final prediction. The entire mechanism of merging homological and graph features to reach the final result is presented in Fig. 1.

Figure 1. Schematic of the composite network executed in our model. Features are extracted from the simultaneously progressing graph neural network and atom-specific persistent homology pipelines. These features are concatenated and fed to the predictor deep neural network model, ensuring proper utilization of all the necessary features supplied by the two pipelines and thereby achieving the best possible prediction result from the proposed architecture.

Crystal graph generation

Graph representations of data can convey the basic structure of a system thanks to their ability to capture relations between entities. The basic components of a graph are nodes and edges, where the edges denote connections between the nodes. When considering a crystal as a graph, each atom is represented as a node, while the interactions among the atoms are represented as edges. Thus, atoms and their bonding information can be encoded into a graph structure. However, graphs are unstructured and non-Euclidean in nature, so accessing and working with this information is challenging and requires scaling down to a lower dimension. A key advantage of a graph convolutional neural network is that it can work with varying and non-ordered node connections. It is therefore significantly advantageous over the traditional convolutional neural network, which only works with regularly structured Euclidean data, as illustrated by the sketch below.
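To make the graph encoding concrete, here is a minimal sketch of how a small crystal fragment could be packed into a PyTorch Geometric `Data` object. The feature dimensions and values are placeholders, not the actual encodings used in this work.

```python
import torch
from torch_geometric.data import Data

# Illustrative sketch: a 3-atom crystal fragment encoded as a graph.
# x holds per-atom (node) feature vectors; edge_index lists bonds in
# both directions; edge_attr carries per-bond features such as an
# expanded interatomic distance. All dimensions here are placeholders.
x = torch.randn(3, 92)                      # 3 atoms, 92-dim atom features
edge_index = torch.tensor([[0, 1, 1, 2],    # source nodes
                           [1, 0, 2, 1]])   # target nodes (undirected bonds)
edge_attr = torch.randn(4, 41)              # one feature vector per directed edge

crystal_graph = Data(x=x, edge_index=edge_index, edge_attr=edge_attr)
print(crystal_graph)  # Data(x=[3, 92], edge_index=[2, 4], edge_attr=[4, 41])
```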

As illustrated in Fig. 2, we implement the graph neural network portion of our model as a graph classification network that adopts a graph convolutional network algorithm. Its basic framework [23] entails a transformation of the input atomic information into a graph, where the nodes and edges represent the atoms and the bonds, respectively [22], and the graph convolutional neural network is trained on our dataset to predict the topological class of the material. The network contains convolutional and pooling layers: a convolutional layer applies filters to the input vector to capture the positions of features, whereas the pooling layer creates a reduced (pooled) representation focused on the most useful features. The pooling layer also makes the model tolerant to small distortions at the input. For the properties of the $i$-th atom (represented by node $i$), a feature vector $v_i$ is formed. Similarly, each edge is embedded with a feature vector $u_{(i,j)_k}$, where the $k$-th bond connects atoms $i$ and $j$. At the training stage, the node feature vectors are updated as [23]:

$$v_i^{(t+1)} = v_i^{(t)} + \sum_{j,k} \sigma\!\left( z_{(i,j)_k}^{(t)} W_f^{(t)} + b_f^{(t)} \right) \odot g\!\left( z_{(i,j)_k}^{(t)} W_s^{(t)} + b_s^{(t)} \right). \tag{1}$$

Here $\odot$ and $\sigma$ denote element-wise multiplication and the sigmoid function, respectively; $W^{(t)}$ denotes a weight matrix and $b^{(t)}$ a bias at the $t$-th convolution stage. To account for the differences in interaction among neighbours, convolution is performed after concatenating the neighbour feature vectors: $z_{(i,j)_k}^{(t)} = v_i^{(t)} \oplus v_j^{(t)} \oplus u_{(i,j)_k}$, where $\oplus$ denotes concatenation.
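The update of Eq. (1) can be expressed compactly in PyTorch. The sketch below is an illustrative reimplementation, not the authors' code: following the original CGCNN, the function $g$ is taken to be a softplus, and the batch normalization placement is an assumption.

```python
import torch
import torch.nn as nn

class CGConvLayer(nn.Module):
    """Sketch of the gated graph convolution in Eq. (1).

    For each edge (i, j)_k, the concatenated vector
    z = v_i (+) v_j (+) u_(i,j)k passes through two linear maps: a
    sigmoid "filter" branch and a softplus "core" branch, whose
    element-wise product is summed over neighbors and added to v_i.
    """

    def __init__(self, atom_dim: int, edge_dim: int):
        super().__init__()
        z_dim = 2 * atom_dim + edge_dim
        self.filter_fc = nn.Linear(z_dim, atom_dim)  # W_f, b_f
        self.core_fc = nn.Linear(z_dim, atom_dim)    # W_s, b_s
        self.bn = nn.BatchNorm1d(atom_dim)

    def forward(self, v, edge_index, u):
        src, dst = edge_index                        # edge k connects i=src, j=dst
        z = torch.cat([v[src], v[dst], u], dim=-1)   # z_(i,j)k
        gated = torch.sigmoid(self.filter_fc(z)) * nn.functional.softplus(self.core_fc(z))
        agg = torch.zeros_like(v).index_add_(0, src, gated)  # sum over neighbors j, k
        return self.bn(v + agg)                      # residual update of Eq. (1)
```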

Figure 2. Crystal graph convolutional neural network (CGCNN) pipeline used in this study. (a) A molecule is represented by a crystal graph consisting of an adjacency matrix and edge and node attribute matrices. Here the atoms of a crystal form the nodes, whereas the bonds are represented by edges. (b) The crystal graph is propagated through a series of graph convolutional layers, followed by a graph pooling layer and, lastly, a fully connected layer to produce the desired output.

For graph generation, each atom in the crystal is represented as a node, and the chemical bonds between atoms are represented as edges in the crystal graph. The first step in this process is to determine the connectivity among atoms in the structure, taking into account potential bonding distances and the positions of neighboring atoms. Inspired by Isayev et al. [38], we employed atom-centered Voronoi–Dirichlet polyhedra [39] to partition the crystal geometry. Interatomic interactions were deemed plausible if the atoms shared a Voronoi face and the interatomic distance was within the Cordero covalent radii [40]. Satisfying these two criteria discards weak interactions among atoms, such as van der Waals forces, and creates a three-dimensional graph for the corresponding crystal structure. These atomic interactions are then represented in the adjacency matrix, which captures the global connectivity of the atoms within the crystal.

Individual elements are encoded into a one-hot feature vector incorporating chemical characteristics such as group and period position in the periodic table, electronegativity, covalent bond radius, electron affinity, and atomic volume. For example, encoding the discrete attributes group and period number for hydrogen (H) results in a one-hot encoded vector of dimension 27, where the first and 19th indices are 1 and all other indices are 0. Continuous variables such as electronegativity and first ionization energy are divided into 10 segments and converted into a one-hot encoded feature vector of dimension 10. Edge feature embeddings are generated by transforming the interatomic distances between the atoms. In the subsequent three stages of the graph convolutional network, each nodal feature vector is updated based on the aggregation of its neighboring atoms' features. The model considers a maximum of 12 neighbors per atom within a maximum radius of 15 Å. The convolutional layers consist of a series of linear (fully connected) layers with sigmoid activation functions and batch normalization layers, followed by pooling layers intended for dimensionality reduction. These features are then fed to a single hidden layer of dimension 128, eventually producing the output features extracted from the material's graph information.
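The following sketch illustrates the one-hot encoding scheme described above for hydrogen. The bin boundaries for the continuous attributes are assumptions, since the text does not state them.

```python
import numpy as np

def one_hot_discrete(value, n_categories):
    """One-hot encode a discrete attribute such as group or period number."""
    vec = np.zeros(n_categories)
    vec[value - 1] = 1.0
    return vec

def one_hot_binned(value, lo, hi, n_bins=10):
    """Divide a continuous attribute (e.g. electronegativity) into
    n_bins equal segments and one-hot encode the bin index."""
    idx = max(0, min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1))
    vec = np.zeros(n_bins)
    vec[idx] = 1.0
    return vec

# Hydrogen: group 1, period 1 -> a 27-dim vector (18 group slots + 9
# period slots) with the 1st and 19th indices set to 1, as in the text.
h_discrete = np.concatenate([one_hot_discrete(1, 18),   # group block
                             one_hot_discrete(1, 9)])   # period block
h_electroneg = one_hot_binned(2.20, lo=0.5, hi=4.0)     # assumed Pauling range
atom_features = np.concatenate([h_discrete, h_electroneg])
```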

Atom-specific persistent homology (ASPH)

In order to build a functional neural network for predicting the topological classes of materials, we need to extract the necessary features related to the complex crystal geometry as well as the interactions between atoms. However, the features provided by traditional crystal descriptors are not sufficient to efficiently train a neural network, because they are over-simplified and lack the required topological information. This necessitates a more accurate crystalline representation and a feature extraction approach capable of portraying the overall crystal domain interactions.

We find persistent homology to be an effective tool for this purpose. Persistent homology can successfully encode the multi-scale crystalline geometric information embedded in the topological invariants. Besides crystal geometry, it is also necessary to consider atomic diversity and crystal periodicity to distinguish one crystal from others of a similar kind. Atom-specific persistent homology (ASPH) is beneficial here, as it provides global topological features from the graphs generated by the crystal graph generators. ASPH, an algebraic topology-based method, uniquely represents crystal structures by introducing a local atomic-level representation of a molecule using a global topological tool. This model constructs conjugated sets of atoms along with corresponding simplicial complexes and topological spaces, capturing both pairwise and many-body interactions. In this way, it effectively reveals the topology-property relationship of atoms at various scales. The topological information, gathered from the neighborhood of each atom in the crystal using the simplicial homology groups (cycles, boundaries) of the ASPH model, can enhance the neural network's output [41].

To better understand how ASPH combines atom-wise information with topological invariants to generate an individual topological fingerprint, we first analyze the Betti number generation based on the ranks of the simplicial homology groups [7]. To ease computational complexity, a collection of simplices is used to describe the original complex crystal shape. Each building block is a $k$-simplex, where $k+1$ is the number of affinely independent points, i.e., vertices, in that simplex. From these simplices, the $k$-cycle group $Z_k$ and the $k$-boundary group $B_k$ are formed. The quotient group produced by $Z_k$ modulo $B_k$ is the $k$-th homology group, $H_k = Z_k / B_k$. $H_k$ is the most significant object here because its rank is the desired $k$-th Betti number; the persistent Betti numbers are used directly as topological fingerprints in the later stages of the deep learning model's training.
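For concreteness, two standard textbook examples (not specific to crystals) illustrate these definitions:

```latex
% Betti numbers \beta_k = \operatorname{rank} H_k, with H_k = Z_k / B_k,
% for two familiar spaces (standard algebraic topology facts):
\begin{align*}
\text{circle } S^1:&\quad \beta_0 = 1 \text{ (one component)},\quad
                         \beta_1 = 1 \text{ (one loop)},\quad \beta_2 = 0,\\
\text{torus } T^2:&\quad \beta_0 = 1,\quad
                         \beta_1 = 2 \text{ (two independent loops)},\quad
                         \beta_2 = 1 \text{ (one enclosed void)}.
\end{align*}
```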

Next, to extract element-specific interaction features, all possible element pairs of the crystal composition are considered. Each element-specific pair collection of atoms inside a crystal is symbolized as $P_{\alpha,i}^{\beta}$, where $\alpha$ and $\beta$ are two element types of the crystal composition, and the $i$-th central atom of type $\alpha$ is surrounded by atoms of type $\beta$. Then, the unit cell is enlarged to the extent that the distance between any atom in the original unit cell and the boundary atoms is no smaller than a cutoff radius $r_{co}$. In this enlarged unit cell, a point cloud of atoms within $r_{co}$ is selected as the region of interest to generate the homology groups and persistence barcodes based on the Betti numbers. The point cloud region $R_i^{\alpha,\beta}$ is defined as [7]:

$$R_i^{\alpha,\beta}(r_{co}) = \left\{\, r_j^{\beta} \;\middle|\; \left\lVert r_i^{\alpha} - r_j^{\beta} \right\rVert < r_{co};\; r_j^{\beta}, r_i^{\alpha} \in P_{\alpha,i}^{\beta};\; j \in \{1, 2, \ldots, N\} \,\right\}, \tag{2}$$

where $N$ is the number of atoms in the pair collection $P_{\alpha,i}^{\beta}$. In our atom-specific persistent homology pipeline, shown in Fig. 3, crystal structure information is gathered from the corresponding Vienna Ab initio Simulation Package (VASP) [42,43] POSCAR files, and the unit cells are enlarged to encompass a cutoff radius of 8 Å. Next, Betti numbers and persistence barcodes are generated for each structure using the Ripser package [44]. Betti numbers and persistence barcodes provide global geometric information for the topological embeddings: Betti 0 counts isolated components, Betti 1 corresponds to circles (loops), and Betti 2 denotes cavities/voids in the geometric space of the crystal structure. The topological features of an atom are conjoined with the neighboring atoms' topological features using five statistical quantities: the minimum, maximum, mean, standard deviation, and sum of each of the birth, death, and persistence-length values, producing the atom-specific persistent homological feature vectors. It is worth mentioning that persistent homology and persistence barcodes remain invariant under subtle perturbations of the atomic positions. Consequently, the topological features extracted from individual crystal structures exhibit translation and scale invariance, contributing to the robustness of the classifier model.
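A compact sketch of this stage, assuming the Ripser Python API (ripser.py): the neighbor selection of Eq. (2) followed by the barcode statistics described above. The zero-padding of empty diagrams is an assumption, as the text does not say how that case is handled.

```python
import numpy as np
from ripser import ripser

def region_point_cloud(r_central, positions_beta, r_cut=8.0):
    """Eq. (2): select the type-beta atoms of the enlarged cell that lie
    within r_cut of a central type-alpha atom at r_central."""
    dists = np.linalg.norm(positions_beta - r_central, axis=1)
    return positions_beta[dists < r_cut]

def asph_statistics(point_cloud, max_dim=2):
    """Persistence barcodes via Ripser, summarized by the five statistics
    (min, max, mean, std, sum) of the birth, death, and persistence-length
    values in each homology dimension (Betti 0, 1, 2)."""
    dgms = ripser(point_cloud, maxdim=max_dim)['dgms']
    feats = []
    for dgm in dgms:                       # one diagram per Betti dimension
        dgm = dgm[np.isfinite(dgm[:, 1])]  # drop the infinite H0 bar
        if len(dgm) == 0:
            feats.extend([0.0] * 15)       # zero-padding is an assumption
            continue
        births, deaths = dgm[:, 0], dgm[:, 1]
        for vals in (births, deaths, deaths - births):
            feats.extend([vals.min(), vals.max(), vals.mean(),
                          vals.std(), vals.sum()])
    return np.array(feats)
```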

Figure 3. Atom-specific persistent homology (ASPH) pipeline implemented in parallel with the CGCNN pipeline. The crystal cell structure is extracted from the database, followed by cell enlargement and Betti number generation. Persistence barcodes are then produced to provide the topological ASPH feature vector.

Figure 4 shows the high-dimensional topological feature vectors projected onto a two-dimensional plane with the help of the t-distributed stochastic neighbor embedding (t-SNE) algorithm. Intriguingly, topological semimetals, topological insulators, and trivial materials are segregated in the projection space. It is evident from the two panels that the persistent homological features embody a differential distribution of data points that can aid in predicting the precise property of a material. It is also worth mentioning that the topological insulator data points are disseminated across the entire space. These observations justify our use of topological descriptors for classifying materials into their respective topological or trivial classes.
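A projection like the one in Fig. 4 can be reproduced with scikit-learn's t-SNE. The file names and perplexity below are illustrative assumptions, not values taken from this work.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical arrays: one ASPH feature vector per material, plus a
# class label (0 = trivial, 1 = semimetal, 2 = topological insulator).
features = np.load('asph_features.npy')   # shape (n_materials, n_features)
labels = np.load('labels.npy')            # shape (n_materials,)

# Project the high-dimensional features onto 2D for visualization.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
# embedding[:, 0] and embedding[:, 1] give the 2D coordinates, colored by labels.
```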

Figure 4. Atom-specific persistent homology feature vectors projected onto a two-dimensional plane using t-distributed stochastic neighbor embedding. Chemical feature embeddings are displayed on the left (a), and topological features are represented on the right (b).

Classification results

The structural features obtained from the graph convolutional network and the topological descriptors from atom-specific persistent homology are integrated and processed through a deep neural network to perform the final prediction. A detailed illustration of the ensemble model is depicted in Fig. 5. We have tested our proposed model in predicting both three classes (trivial, semimetal, or topological insulator) and binary classes (topological vs. non-topological). The proposed model is trained on 80% of the dataset and then applied to the test set, i.e., 20% of the dataset comprising 2000 materials. A detailed description of the methodology followed in this study is presented in the Methods section. The test set was completely unseen by our model during the training phase. On the test set, our proposed model yields an accuracy of 91.4% for the binary classification task and around 80% for the three-class classification, with F1 scores of around 88.5% and 78.2%, respectively. This result outperforms the other state-of-the-art prediction models by around 7% in accuracy for binary classification and around 2% for the three-class task. The disparate performance gains in binary and multi-class classification can be attributed to the persistent homological features of the proposed model: from Fig. 4b, it is clear that persistent homological features can differentiate topological (semimetal and topological insulator) from non-topological (trivial) materials, but they fail to segregate the three individual classes, as evident from the topological insulators (represented by red dots) spread all over the feature space. Examples of materials from the two classes and the respective confidence scores are presented in Table 1. As seen there, the developed model can predict the true class of the material in question with a reasonably high confidence score. We also investigated a few newly discovered materials that were not included in the training dataset, as seen in Table 2, to test the ability of our model; the proposed model successfully predicts the topological class of these unseen materials with high confidence scores.

Figure 5. Detailed implementation of our proposed model. The high-dimensional persistent homological feature vector is reduced in dimension through a series of linear layers, followed by batch normalization and rectified linear activation layers. This vector is concatenated with the graph neural network vector and propagated through a deep neural network to produce the final result.

Table 1. Classification performance on well-known materials.

Trivial material Confidence score Topological material Confidence score
CdWO4 0.9949 Sb2Te3 0.8650
GaTe 0.9976 LaSi 0.9163
YVO4 0.9498 Ca3Ni7B2 0.9603
SF6 0.9843 Sb2Te3 0.8650
As4S5 0.9945 LaH2 0.9564
TePbF6 0.9974 Ba5Al5Sn 0.9851
NaSbF4 0.9962 Pr3Ga 0.9990
HgTe 0.9710 Bi2Se3 0.6644
Cs2Cu2Sb2Se5 0.9683 DyNbO4 0.9890
BaF2 0.9970 LuNi2Sn 0.9375
LiEu3O4 0.9544 FeB 0.9098
BaMgF4 0.9999 Nb3Ga 0.992
CsTaI6 0.9985 LuTiSi 0.9915
BaLaCaTe3 0.9834 LiVS2 0.8794
BaTiO4 0.9999 Dy2Fe2Si2C 0.9705

Table 2. Classification performance on newly discovered materials.

Trivial material Confidence score Topological material Confidence score
Bi4I4 0.9982 RbV3Sb5 0.7433
Sr2SnO4 0.9989 In2Ni3Se2 0.999
RbLa(MoO4)2 0.8991 CsV3Sb5 0.7341
P4PtF12 0.9129 KV3Sb5 0.7441
GaAg(PSe3)2 0.9990 ZrPRu 0.9473
K3ReC4(N2O)2 0.999 Pr(GeRu)2 0.9946
HgHNO4 0.9999 CeAu 0.9820
LiCaAs 0.9999 VAsRh 0.9985
KPt2Se3 0.9831 Ho(CuGe)2 0.9943
LiCuO 0.9225 Ti2BRh6 0.9999
PAuS4 0.8890 EuAu5 0.9999
LuH2ClO2 0.9970 Sr2TbReO6 0.9049

To compare the proposed model with other state-of-the-art classifiers, we calculate accuracy, precision, recall, and F1 scores on the test set for our developed model as well as for the crystal graph convolutional neural network (CGCNN), the orbital graph convolutional neural network (OGCNN), GATGNN [45], DeeperGATGNN [46], and the Materials Graph Network (MEGNet) [47]. Figure 6 captures the quantitative comparisons between these models. Applying OGCNN and CGCNN to our test set yields accuracies of 83% and 85%, respectively. Enabled by the combination of atom-specific information and CGCNN, our model scores 91.4% accuracy on the test set, outperforming CGCNN and OGCNN by 6.4% and 8.3%, respectively. Moreover, transformer- and attention-based models have been the center of attention for quite some time; our proposed model surpasses graph attention models such as GATGNN [45] and DeeperGATGNN [46] by an appreciable margin. The precision, recall, and F1 scores also improve significantly in our proposed model compared to the other models. We expect our model will enable an efficient classification of and search for new topological materials.
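All four metrics can be computed with scikit-learn, which the Methods section cites for evaluation. The sketch below is schematic; the averaging mode for the multi-class case is an assumption, as the text does not specify it.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def classification_report(y_true, y_pred, average='macro'):
    """Compute the four metrics compared in Fig. 6 for hypothetical
    ground-truth (y_true) and predicted (y_pred) label arrays.
    average='macro' is an assumed choice for the three-class task."""
    return {
        'accuracy': accuracy_score(y_true, y_pred),
        'precision': precision_score(y_true, y_pred, average=average),
        'recall': recall_score(y_true, y_pred, average=average),
        'f1': f1_score(y_true, y_pred, average=average),
    }
```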

Figure 6. Comparison of model performance in predicting the topological class across different classification metrics. (a) Performance of the different models in binary classification (topological vs. trivial); (b) performance in the three-class classification task (trivial, topological semimetal, and topological insulator).

Discussion

In summary, our proposed model contributes to the exploration of novel topological materials by exploiting persistent homological features and integrating them with a graph convolutional neural network. The novelty of this work encompasses robust prediction of the topological nature of crystals along with efficient incorporation of graph features with persistent homological information. Although graph neural networks have been used rigorously in material property prediction, the impact of graph representations and graph convolutional features in encoding the topological information of a material has remained largely unexplored. We observe in this work that the structural and neighborhood connectivity information encoded in the graphs of materials can be employed to predict their topological nature. Furthermore, augmenting the graph information with atom-specific persistent homological features of individual materials can improve the prediction. Crystals, being unstructured non-Euclidean data, are difficult to assimilate into a deep learning pipeline; graph features along with persistent homological features provide an effective way to do this, reflecting the fact that these features are rotation and translation invariant. The proposed model has been tested against state-of-the-art graph neural network prediction models and shows promising results, with an accuracy of over 91% and an F1 score of 88.5%. Additionally, the model's performance has been examined on both well-known and newly discovered topological materials, bearing testimony to its robustness. Considering recent advancements in topological material physics and its anticipated applications in spintronics, optics, and electronics, our model presents a novel approach to studying topological materials that bypasses time- and resource-intensive density functional theory calculations, and it is expected to aid the search for new topological materials.

Methods

Dataset

The dataset for this study has been collected from the Topological Quantum Chemistry database (https://www.topologicalquantumchemistry.org/, https://www.cryst.ehu.es/) [48-50]. The final dataset consists of 33,800 material entries: around 15,000 topologically trivial materials, 5600 topological insulators, and 13,250 topological semimetals. The topological material database was collected through web scraping from the Topological Quantum Chemistry website. Crystallographic Information File (CIF) files [51], along with other physical and topological properties such as the space group and topological invariant, were also collected as input. The collected dataset was cleansed to eliminate empty CIF files and erroneous parameters. Next, the dataset was converted to the VASP POSCAR format via the Atomic Simulation Environment for compatibility with our model. Histogram plots of the frequency of elements in each class of topological materials are presented in Fig. 7. There is no discernible difference between the element distributions of topological insulators, topological semimetals, and topologically trivial materials. We therefore conclude that the presence of a certain class of elements does not bear statistical significance for the topological property prediction of a material.
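A minimal sketch of this CIF-to-POSCAR conversion step using the Atomic Simulation Environment; the directory layout is hypothetical, and the blanket exception handling stands in for the cleansing of empty or erroneous CIF files described above.

```python
from pathlib import Path
from ase.io import read, write

# Hypothetical layout: one scraped CIF per material under cifs/.
for cif_path in Path('cifs').glob('*.cif'):
    try:
        atoms = read(str(cif_path))          # parse the CIF into an ASE Atoms object
    except Exception:
        continue                             # skip empty or corrupt files
    out_dir = Path('poscars') / cif_path.stem
    out_dir.mkdir(parents=True, exist_ok=True)
    write(str(out_dir / 'POSCAR'), atoms, format='vasp')  # VASP POSCAR format
```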

Figure 7. Histogram demonstrating the elemental distribution in topological semimetals, trivial materials, and topological insulators.

Further insight into the physically differentiable characteristics of topological and trivial materials can be obtained from Fig. 8. Figure 8a and b depict the discernible differences in bandgap and crystal structure between trivial and non-trivial materials. Evidently, topological materials are low-bandgap materials with predominantly cubic or hexagonal crystal symmetry. Topologically trivial materials, on the other hand, are more likely to have an orthorhombic or monoclinic structure with a wide range of bandgaps.

Figure 8. Statistical analysis performed on topological and trivial materials. The distribution of the band gap is exhibited in (a), while (b) depicts the variation in crystal structure for the three material classes.

Implementation details

The graph convolutional neural network and atom-specific persistent homology-based ensemble model is implemented in PyTorch [52]. PyTorch Geometric [53], a library specifically designed for implementing graph neural networks, has been adopted in this work. For the graph convolutional neural network, the atom feature length and neighbor feature length are both 64, with one hidden layer of 128-dimensional hidden features. A maximum of 12 neighbors within a cutoff radius of 15 Å is considered when generating the graphs. Three graph convolutional layers are incorporated in the model; in each, the concatenated atomic and neighbor features are passed through a fully connected layer and a batch normalization layer with a succeeding gating function and sigmoid activation.

In the persistent homological feature generation process, the given cells are enlarged to a cutoff radius of 8 Å and the Betti numbers for each structure are calculated accordingly. Persistence barcodes of the dataset are generated using the Ripser package [44], and five statistical features, i.e., the minimum, maximum, mean, standard deviation, and sum of the birth, death, and persistence-length values, are extracted. In our implementation, a 3115-dimensional persistent homological feature vector is produced for each material; these features are processed through a series of fully connected layers, batch normalization layers, and rectified linear unit activation layers, eventually being reduced to a dimension of 64 and merged with the output features of the graph convolutional neural network. Dropout layers are used as regularizers to ensure homogeneity of the information distribution among the units. The concatenated features of dimension 192 are finally passed through a shallow classification layer with a log-softmax activation function to determine the topological class of each material.
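The fusion head described above can be sketched as follows. The 3115-, 64-, 128-, and 192-dimensional sizes come from the text; the intermediate width and dropout rate are assumptions.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Sketch of the fusion head: the 3115-dim ASPH vector is reduced
    to 64 dims through linear/batch-norm/ReLU blocks (with dropout as
    a regularizer), concatenated with the 128-dim graph embedding, and
    passed through a shallow log-softmax classification layer."""

    def __init__(self, asph_dim=3115, graph_dim=128, n_classes=3):
        super().__init__()
        self.asph_mlp = nn.Sequential(
            nn.Linear(asph_dim, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(512, 64), nn.BatchNorm1d(64), nn.ReLU(),
        )
        self.classifier = nn.Linear(64 + graph_dim, n_classes)  # 192 -> classes

    def forward(self, asph_feats, graph_feats):
        fused = torch.cat([self.asph_mlp(asph_feats), graph_feats], dim=-1)
        return torch.log_softmax(self.classifier(fused), dim=-1)
```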

The model minimizes the negative log-likelihood loss function [54] using the adaptive moment estimation (Adam) optimizer [55] with a learning rate of 0.001 and zero weight decay, and it is trained for 50 epochs. The learning rate is controlled to reach the minima of the loss landscape using step learning rate scheduling, with 10-fold reductions at the 30- and 40-epoch milestones. Accuracy, precision, recall, and F1 scores for model evaluation have been calculated using the scikit-learn library [56].
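This training configuration maps directly onto standard PyTorch utilities. In the schematic loop below, `model` and `train_loader` are assumed to exist (see the sketches above); it is not the authors' script.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

criterion = torch.nn.NLLLoss()                      # negative log-likelihood loss
optimizer = Adam(model.parameters(), lr=1e-3, weight_decay=0.0)
scheduler = MultiStepLR(optimizer, milestones=[30, 40], gamma=0.1)  # 10x drops

for epoch in range(50):                             # 50 training epochs
    for asph_feats, graph_feats, targets in train_loader:
        optimizer.zero_grad()
        log_probs = model(asph_feats, graph_feats)  # log-softmax outputs
        loss = criterion(log_probs, targets)
        loss.backward()
        optimizer.step()
    scheduler.step()                                # step schedule at 30 and 40
```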

Training scheme

The deep learning models of this work have been implemented on computers of the Robert Noyce Simulation Laboratory at Bangladesh University of Engineering and Technology. The proposed model, along with the reference models, was trained on an 8-core Intel Core i7 CPU and an Nvidia GTX 1650 GPU for 50 epochs with a batch size of 256. The dataset of 33,800 samples was split into 80-10-10 ratios for training, testing, and validation, respectively. Validation was applied to ensure the stability and robustness of the results, and dropout layers were introduced to avoid overfitting [57].

Acknowledgements

A financial grant to procure VASP was provided by the Bangladesh University of Engineering and Technology (BUET). Computational facilities provided by BUET and Princeton University are duly acknowledged. The M.Z.H. group acknowledges primary support from the US Department of Energy, Office of Science, National Quantum Information Science Research Centers, Quantum Science Center (at Oak Ridge National Laboratory) and Princeton University; scanning tunneling microscopy instrumentation support from the Gordon and Betty Moore Foundation (GBMF9461) and support with theory work; and support from the US Department of Energy under the Basic Energy Sciences program (grant no. DOE/BES DE-FG-02-05ER46200) for the theory and sample characterization work, including photoemission spectroscopy.

Author contributions

M.S.H. and Q.D.M.K. conceived the project. A.R. and A.G.D. initiated the research work presented here. A.R. designed and trained the models. The analysis has been performed by A.R., A.G.D., and M.S.H. A.G.D., H.R., and A.R. prepared the draft with contributions from M.S.H., M.Z.H., and Q.D.M.K. The project has been supervised by M.S.H., M.Z.H., and Q.D.M.K.

Data availability

The data that support the findings of this study are available on the website https://topologicalquantumchemistry.com/. Furthermore, the complete dataset can be obtained from the corresponding author upon reasonable request.

Code availability

The code, model description, and training configurations used in this work will be made available at https://github.com/xercxis/P_zeta.git.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Vamathevan, J. et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 18, 463–477. 10.1038/s41573-019-0024-5 (2019).
2. Ekins, S. et al. Exploiting machine learning for end-to-end drug discovery and development. Nat. Mater. 18, 435–441. 10.1038/s41563-019-0338-z (2019).
3. Choi, J.-A. & Lim, K. Identifying machine learning techniques for classification of target advertising. ICT Express 6, 175–180. 10.1016/j.icte.2020.04.012 (2020).
4. Schmidt, J., Marques, M. R., Botti, S. & Marques, M. A. Recent advances and applications of machine learning in solid-state materials science. npj Comput. Mater. 5, 1–36. 10.1038/s41524-019-0221-0 (2019).
5. Jha, D. et al. Enhancing materials property prediction by leveraging computational and experimental data using deep transfer learning. Nat. Commun. 10, 1–12. 10.1038/s41467-019-13297-w (2019).
6. Chibani, S. & Coudert, F.-X. Machine learning approaches for the prediction of materials properties. APL Mater. 8, 080701. 10.1063/5.0018384 (2020).
7. Jiang, Y. et al. Topological representations of crystalline compounds for the machine-learning prediction of materials properties. npj Comput. Mater. 7, 1–8. 10.1038/s41524-021-00493-w (2021).
8. Peano, V., Sapper, F. & Marquardt, F. Rapid exploration of topological band structures using deep learning. Phys. Rev. X 11, 021052. 10.1103/PhysRevX.11.021052 (2021).
9. Giustino, F. et al. The 2021 quantum materials roadmap. J. Phys. Mater. 3, 042006. 10.1088/2515-7639/abb74e (2021).
10. Kohn, W. & Sham, L. J. Self-consistent equations including exchange and correlation effects. Phys. Rev. 140, A1133 (1965).
11. Tao, Q., Xu, P., Li, M. & Lu, W. Machine learning for perovskite materials design and discovery. npj Comput. Mater. 7, 1–18. 10.1038/s41524-021-00495-8 (2021).
12. Faber, F. A. et al. Prediction errors of molecular machine learning models lower than hybrid DFT error. J. Chem. Theory Comput. 13, 5255–5264. 10.1021/acs.jctc.7b00577 (2017).
13. Xue, D. et al. Accelerated search for materials with targeted properties by adaptive design. Nat. Commun. 7, 1–9. 10.1038/ncomms11241 (2016).
14. Ward, L. et al. Including crystal structure attributes in machine learning models of formation energies via Voronoi tessellations. Phys. Rev. B 96, 024104. 10.1103/PhysRevB.96.024104 (2017).
15. Legrain, F., Carrete, J., van Roekeghem, A., Curtarolo, S. & Mingo, N. How chemical composition alone can predict vibrational free energies and entropies of solids. Chem. Mater. 29, 6220–6227. 10.1021/acs.chemmater.7b00789 (2017).
16. Stanev, V. et al. Machine learning modeling of superconducting critical temperature. npj Comput. Mater. 4, 1–14. 10.1038/s41524-018-0085-8 (2018).
17. Ramprasad, R., Batra, R., Pilania, G., Mannodi-Kanakkithodi, A. & Kim, C. Machine learning in materials informatics: Recent applications and prospects. npj Comput. Mater. 3, 1–13. 10.1038/s41524-017-0056-5 (2017).
18. Seko, A., Hayashi, H., Nakayama, K., Takahashi, A. & Tanaka, I. Representation of compounds for machine-learning prediction of physical properties. Phys. Rev. B 95, 144110. 10.1103/PhysRevB.95.144110 (2017).
19. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444. 10.1038/nature14539 (2015).
20. Chen, C., Ye, W., Zuo, Y., Zheng, C. & Ong, S. P. Graph networks as a universal machine learning framework for molecules and crystals. Chem. Mater. 31, 3564–3572. 10.1021/acs.chemmater.9b01294 (2019).
21. Wu, Z. et al. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 10.1109/TNNLS.2020.2978386 (2020).
22. Fung, V., Zhang, J., Juarez, E. & Sumpter, B. G. Benchmarking graph neural networks for materials chemistry. npj Comput. Mater. 7, 1–8. 10.1038/s41524-021-00554-0 (2021).
23. Xie, T. & Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 120, 145301. 10.1103/PhysRevLett.120.145301 (2018).
24. Park, C. W. & Wolverton, C. Developing an improved crystal graph convolutional neural network framework for accelerated materials discovery. Phys. Rev. Mater. 4, 063801. 10.1103/PhysRevMaterials.4.063801 (2020).
25. Louis, S.-Y. et al. Graph convolutional neural networks with global attention for improved materials property prediction. Phys. Chem. Chem. Phys. 22, 18141–18148. 10.1039/D0CP01474E (2020).
26. Karamad, M. et al. Orbital graph convolutional neural network for material property prediction. Phys. Rev. Mater. 4, 093801. 10.1103/PhysRevMaterials.4.093801 (2020).
27. Moore, J. E. The birth of topological insulators. Nature 464, 194–198. 10.1038/nature08916 (2010).
28. Qi, X.-L. & Zhang, S.-C. Topological insulators and superconductors. Rev. Mod. Phys. 83, 1057. 10.1103/RevModPhys.83.1057 (2011).
29. Claussen, N., Bernevig, B. A. & Regnault, N. Detection of topological materials with machine learning. Phys. Rev. B 101, 245117. 10.1103/PhysRevB.101.245117 (2020).
30. Friedman, J. H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001).
31. Acosta, C. M. et al. Analysis of topological transitions in two-dimensional materials by compressed sensing. Preprint at arXiv:1805.10950 (2018).
32. Sun, N., Yi, J., Zhang, P., Shen, H. & Zhai, H. Deep learning topological invariants of band insulators. Phys. Rev. B 98, 085402. 10.1103/PhysRevB.98.085402 (2018).
33. Zhang, P., Shen, H. & Zhai, H. Machine learning topological invariants with neural networks. Phys. Rev. Lett. 120, 066401. 10.1103/PhysRevLett.120.066401 (2018).
34. Otter, N., Porter, M. A., Tillmann, U., Grindrod, P. & Harrington, H. A. A roadmap for the computation of persistent homology. EPJ Data Sci. 6, 1–38. 10.1140/epjds/s13688-017-0109-5 (2017).
35. Cang, Z. & Wei, G.-W. Integration of element specific persistent homology and machine learning for protein-ligand binding affinity prediction. Int. J. Numer. Methods Biomed. Eng. 34, e2914. 10.1002/cnm.2914 (2018).
36. Gameiro, M. et al. A topological measurement of protein compressibility. Jpn. J. Ind. Appl. Math. 32, 1–17. 10.1007/s13160-014-0153-5 (2015).
37. Xia, K., Feng, X., Tong, Y. & Wei, G. W. Persistent homology for the quantitative prediction of fullerene stability. J. Comput. Chem. 36, 408–422. 10.1002/jcc.23816 (2015).
38. Isayev, O. et al. Universal fragment descriptors for predicting properties of inorganic crystals. Nat. Commun. 8, 15679. 10.1038/ncomms15679 (2017).
39. Blatov, V. A. Voronoi–Dirichlet polyhedra in crystal chemistry: Theory and applications. Crystallogr. Rev. 10, 249–318. 10.1080/08893110412331323170 (2004).
40. Cordero, B. et al. Covalent radii revisited. Dalton Trans. 10.1039/B801115J (2008).
41. Jiang, Y. et al. Topological representations of crystalline compounds for the machine-learning prediction of materials properties. npj Comput. Mater. 10.1038/s41524-021-00493-w (2021).
42. Kresse, G. & Hafner, J. Ab initio molecular dynamics for liquid metals. Phys. Rev. B 47, 558 (1993).
43. Kresse, G. & Furthmüller, J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 6, 15–50. 10.1016/0927-0256(96)00008-0 (1996).
44. Tralie, C., Saul, N. & Bar-On, R. Ripser.py: A lean persistent homology library for Python. J. Open Source Softw. 3, 925 (2018).
45. Louis, S.-Y. et al. Graph convolutional neural networks with global attention for improved materials property prediction. Phys. Chem. Chem. Phys. 22, 18141–18148. 10.1039/D0CP01474E (2020).
46. Omee, S. S. et al. Scalable deeper graph neural networks for high-performance materials property prediction. Patterns 3, 100491. 10.1016/j.patter.2022.100491 (2022).
47. Chen, C., Ye, W., Zuo, Y., Zheng, C. & Ong, S. P. Graph networks as a universal machine learning framework for molecules and crystals. Chem. Mater. 31, 3564–3572. 10.1021/acs.chemmater.9b01294 (2019).
48. Bradlyn, B. et al. Topological quantum chemistry. Nature 547, 298–305. 10.1038/nature23268 (2017).
49. Vergniory, M. et al. A complete catalogue of high-quality topological materials. Nature 566, 480–485. 10.1038/s41586-019-0954-4 (2019).
50. Vergniory, M. G. et al. All topological bands of all nonmagnetic stoichiometric materials. Science 376, eabg9094 (2022).
51. Hall, S. R., Allen, F. H. & Brown, I. D. The Crystallographic Information File (CIF): A new standard archive file for crystallography. Acta Crystallogr. A 47, 655–685 (1991).
52. Paszke, A. et al. PyTorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 8024–8035. 10.5555/3454287.3455008 (2019).
53. Fey, M. & Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. Preprint at arXiv:1903.02428 (2019).
54. Theodoridis, S. Classification: A Tour of the Classics 2nd edn. (Academic Press, London, 2020).
55. Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR (2015).
56. Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
57. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).
