Abstract

Crystal structure prediction is one of the major unsolved problems in materials science. Traditionally, this problem is formulated as a global optimization problem for which global search algorithms are combined with first-principles free energy calculations to predict the ground-state crystal structure of a given material composition. These ab initio algorithms are currently too slow for predicting complex material structures. Inspired by the AlphaFold algorithm for protein structure prediction, herein, we propose AlphaCrystal, a crystal structure prediction algorithm that combines a deep residual neural network model for predicting the atomic contact map of a target material followed by three-dimensional (3D) structure reconstruction using genetic algorithms. Extensive experiments on 20 benchmark structures showed that our AlphaCrystal algorithm can predict structures close to the ground truth structures, which can significantly speed up the crystal structure prediction and handle relatively large systems.
Introduction
The periodic crystal structures of inorganic materials determine many of the unique and exotic functions of devices such as lithium batteries, quantum computers, solar panels, and chemical catalysts. While it is easy to compose a material with a chemically reasonable formula or to generate millions of formulas with charge neutrality and electronegativity balance using modern generative machine learning algorithms such as MatGAN,1 it is notoriously challenging to predict the crystal structure from a given chemical composition,2 even though the structure is required to check the thermodynamic and mechanical stability of the material or its synthesizability.3−5 With the crystal structure of a chemical substance, many physicochemical properties can be predicted reliably and routinely using first-principles calculations or machine learning models.6
Due to its importance in chemistry and condensed matter physics, crystal structure prediction has been investigated intensively for more than 30 years.7−16 In the crystal structure prediction (CSP) problem,17 the goal is to find a ground-state structure (in terms of all atomic coordinates of the atoms in a unit cell) with the lowest free energy for a given chemical composition (or a chemical system with variable compositions) at a given pressure–temperature condition. It is assumed that atomic configurations with lower free energy correspond to a more stable arrangement of atoms and the materials will be more synthesizable. One of the simplest and most widely used approaches for CSP is the template-based or element substitution approach in which an existing crystal structure with a similar formula is first identified and then some atoms are replaced with other types of elements. The replacement can either be based on personal heuristics or guided by machine-learned substitution rules in terms of element combination patterns11,18 or atomic fingerprints that describe coordination topology15 or other chemical patterns19 around unique crystallographic sites. Template-based approaches are widely used in discovering new materials such as lithium-ion cathode materials19 and heteroanionic compounds.20 But they have a major limitation in their inability to generate new crystal structure types.
The majority of work on the CSP problem is focused on ab initio approaches, which try to search the atomic configuration space to locate the ground-state structure guided by the first-principles calculations of the free energy of candidate structures.8−14 These approaches use a variety of search/optimization algorithms such as random sampling, simulated annealing, minima hopping, basin hopping, metadynamics, genetic algorithms, and particle swarm optimization to achieve systematic search while overcoming the local minima due to energy barriers in the search landscape. They have been successfully applied to discover a series of new materials as summarized in refs (16, 21). To improve the sampling efficiency and save the costly DFT calculations, a variety of strategies have been proposed such as exploiting symmetry22 and pseudosymmetry,17 smart variation operators, clustering, and machine learning interatomic potentials with active learning.23
Despite their wide successes, the scalability and applicability of these ab initio CSP algorithms are severely limited by their dependence on costly DFT calculations of free energies for sampled structures. A quick check of the success stories reported in the literature16,21,24 shows that most of the discovered crystal materials are binary materials or those with fewer than 20 atoms in the unit cell. Our experience with these software packages shows that the algorithms tend to spend a large number of DFT calculations just to reach the neighborhood of the ground-state structure, which may be addressed by seeding them with an approximate structure close to the target. With a limited DFT calculation budget, how to efficiently sample the atom configurations becomes a key issue, and the scalability of CSP remains unsolved.
Here, we propose a novel deep knowledge-guided ab initio approach for crystal structure prediction, which is inspired by the recent successes of deep learning approaches for protein structure prediction25−27 led by the famous AlphaFold.26 To our knowledge, our AlphaCrystal algorithm is the first method for crystal contact map prediction in CSP. We use deep residual neural networks28 for contact prediction, which learn the intricate bonding relationships between atoms. The advantage of AlphaCrystal is that it can exploit the rich atom interaction distributions and other geometric patterns or motifs29 existing in a large number of known crystal structures to predict the atomic contact map. This complex hidden knowledge can be learned as deep physical knowledge by our deep neural network and then exploited in the contact map prediction and atomic coordinate reconstruction process. We train the deep neural network using a subset of materials with solved structures from the Materials Project (MP) database and then test it on a set of test samples. Our experimental results show that our method, when trained with 80% of the MP samples, can achieve almost 100% contact map accuracy for 48% of the test set.
Our contributions can be summarized as follows:
We propose AlphaCrystal, a deep learning and genetic algorithm-based approach for crystal structure prediction using the predicted atomic contact map as a knowledge-guided methodology for addressing the crystal structure prediction problem.
We evaluated our algorithm over 20 benchmark crystal targets and found that it can predict their contact maps with high accuracy, which further leads to successful prediction of their crystal structures.
We applied AlphaCrystal to predict 10 nontrivial crystal structures and verified their stability by DFT calculations.
Methods
AlphaCrystal Framework for Contact Map-Based Crystal Structure Prediction
In our previous work,30 we showed that given a correct contact map along with the space group and lattice constants, a genetic algorithm can be used to reconstruct the atomic coordinates (and hence the structure) with high accuracy. Here, we propose a deep learning-based model for predicting the contact map of a material given only its composition. Based on this contact map predictor, we propose AlphaCrystal, a new framework for knowledge-guided crystal structure prediction as shown in Figure 1. The architecture is composed of three main modules: (1) a contact map predictor, a space group predictor,31 and a lattice constant predictor;32 (2) the contact map-based atomic coordinate reconstruction algorithm;30 and (3) DFT relaxation-based local search or free energy-based ab initio search.
Figure 1.
AlphaCrystal framework for contact map-based crystal structure prediction.
Deep Learning Model for Crystal Material Atomic Contact Prediction
One of the major components of the AlphaCrystal algorithm is the deep residual network-based predictor of contact maps. As shown in Figure 2, the whole network is composed of three parts. The first part uses a sequence of stacked one-dimensional (1D) residual network layers to learn convolved atom site features. The input to this module is the sequence of L element symbols in the input formula, where L is the number of atoms in the unit cell. Each element is represented by 11 features: Mendeleev number, unpaired electrons, ionization energies, covalent radius, heat of formation, dipole polarizability, average ionic radius, group number and row number in the periodic table, Pauling electronegativity, and atomic number.
Figure 2.
Deep neural network model for crystal material contact map prediction.
The second part of our contact map predictor is the conversion of convolved site features into pairwise feature maps with dimensions of L × L × 3n (outer concatenation), where n = 128. The third module is composed of a sequence of stacked two-dimensional (2D) residual network layers, which maps the paired site features to predicted contact maps. Moreover, a batch normalization33 and a nonlinear transformation34 follow each convolutional layer. Batch normalization makes the training faster and more stable by recentering and rescaling the inputs of each layer at each minibatch. The output of the 1D residual network is a 2D matrix with dimensions of L × n, which holds the hierarchically learnt convolved site features. These learnt site features are converted to a three-dimensional (3D) matrix that serves as the input to the 2D residual network.
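The outer concatenation step can be illustrated with a minimal sketch. The text only states that L site vectors of length n become an L × L × 3n tensor; the choice of the three concatenated pieces below (site i, the midpoint site, and site j) is an assumption borrowed from protein contact map predictors, and the function name `outer_concatenation` is hypothetical.

```python
def outer_concatenation(site_features):
    """Turn L per-site vectors of length n into an L x L grid of 3n-long pair vectors.

    Assumption: the pair (i, j) is described by the features of site i,
    of the midpoint site (i + j) // 2, and of site j.
    """
    length = len(site_features)
    pair_features = []
    for i in range(length):
        row = []
        for j in range(length):
            mid = (i + j) // 2  # assumed midpoint convention
            row.append(site_features[i] + site_features[mid] + site_features[j])
        pair_features.append(row)
    return pair_features


# Three sites with n = 2 features each give a 3 x 3 grid of 6-long vectors.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pairs = outer_concatenation(features)
print(len(pairs), len(pairs[0]), len(pairs[0][2]))
```

With n = 128 the same construction would give the 3n = 384 channels per atom pair that feed the 2D residual network.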
Residual Network Block
Figure 2 (right pane) shows the architecture of the residual network block used in our two residual network modules. In each block, there are two convolutional layers, a batch normalization, and two nonlinear transformations. The nonlinear transformation is performed by the ReLU activation function max(X, 0).34 Let F(X_l) denote the output of the block for the input X_l; then X_{l+1} = max(F(X_l) + X_l, 0), that is, the sum of X_l and F(X_l) is passed through the nonlinear transformation. We use nine building blocks for each module in our main architecture. The number of filters is doubled every three blocks. The initial numbers of filters for the first and second modules are 32 and 256, respectively.
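As a small worked illustration of the block description above, the sketch below computes the filter count of each of the nine blocks and applies the residual update elementwise. It assumes that "doubled per three blocks" means one doubling after every group of three blocks; the helper names `filter_schedule` and `residual_update` are hypothetical, and the block function F is replaced by a precomputed list of values.

```python
def filter_schedule(initial_filters, num_blocks=9, blocks_per_stage=3):
    """Filter count per block, doubling after every `blocks_per_stage` blocks (assumed reading)."""
    return [initial_filters * 2 ** (b // blocks_per_stage) for b in range(num_blocks)]


def residual_update(x, f_x):
    """X_{l+1} = max(F(X_l) + X_l, 0), applied elementwise to two equal-length lists."""
    return [max(fx + xi, 0.0) for xi, fx in zip(x, f_x)]


print(filter_schedule(32))   # [32, 32, 32, 64, 64, 64, 128, 128, 128]
print(filter_schedule(256))  # [256, 256, 256, 512, 512, 512, 1024, 1024, 1024]
print(residual_update([1.0, -2.0, 0.5], [0.5, 1.0, -1.0]))  # [1.5, 0.0, 0.0]
```

Under this reading, the first module ends with 128 filters, which agrees with the stated site feature width n = 128.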
Contact Map Generation and Loss Function
We use the following rule to convert a crystal structure into a contact map matrix M: for each pair of atoms A and B in the unit cell (with indices i and j), if their distance is within the range of [covalent radius_A + covalent radius_B − 0.4 Å, covalent radius_A + covalent radius_B + 0.4 Å], then there is a bond between atom A and atom B and the corresponding M[i,j] is set to 1; otherwise, it is set to 0. When both atoms are metal atoms, we also set their contact map entry to 0.
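The conversion rule can be sketched in a few lines of Python. This is a simplified illustration rather than the authors' implementation: it assumes Cartesian coordinates in Å, ignores periodic images of the atoms, leaves the diagonal at 0, and uses illustrative covalent radii; the function name `contact_map` and all inputs are hypothetical.

```python
import math


def contact_map(elements, coords, covalent_radius, metals, tol=0.4):
    """Binary contact map from pairwise distances and covalent radii (all in Å)."""
    n = len(elements)
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Metal-metal pairs are never counted as bonded.
            if elements[i] in metals and elements[j] in metals:
                continue
            distance = math.dist(coords[i], coords[j])
            radius_sum = covalent_radius[elements[i]] + covalent_radius[elements[j]]
            if radius_sum - tol <= distance <= radius_sum + tol:
                m[i][j] = m[j][i] = 1
    return m


# Illustrative radii and a made-up linear Na-Cl-Na arrangement.
radii = {"Na": 1.66, "Cl": 1.02}
cmap = contact_map(
    ["Na", "Cl", "Na"],
    [(0.0, 0.0, 0.0), (2.8, 0.0, 0.0), (5.6, 0.0, 0.0)],
    radii,
    metals={"Na"},
)
print(cmap)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

A production version would need minimum-image distances under the lattice, which this sketch deliberately leaves out.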
Since a contact map is a binary matrix, we use the cross-entropy loss as the loss function for neural network training. It is defined as follows
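$$ L_{\mathrm{CE}} = -\sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right] \qquad (1) $$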
where N is the maximum length of the formula, which is set to 12 or 24 in our experiments; y_i is the true contact map label at position i; and ŷ_i is the predicted probability score at position i.
Training and Dealing with Crystals of Different Atom Sites
To deal with varying sizes, we set the maximum number of atoms in a formula as L, which is set to 12 and 24 in our experiments. When a formula has fewer atoms, we create tensors by padding zeros. We sort all samples by their atom number and then partition them into minibatches so that for each minibatch, the sizes are similar.
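A minimal sketch of the padding and size-sorted batching described above follows. The sample layout (a list of per-atom feature vectors), the helper names `pad_features` and `size_sorted_batches`, and the batch size are assumptions made for illustration.

```python
def pad_features(features, max_atoms):
    """Append all-zero feature rows until the sample has `max_atoms` rows."""
    if len(features) > max_atoms:
        raise ValueError("sample has more atoms than max_atoms")
    n_feat = len(features[0])
    return features + [[0.0] * n_feat for _ in range(max_atoms - len(features))]


def size_sorted_batches(samples, batch_size):
    """Sort samples by atom count, then cut consecutive minibatches of similar size."""
    ordered = sorted(samples, key=len)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# Four toy samples with 3, 1, 2, and 2 atoms and two features per atom.
samples = [[[1.0, 1.0]] * 3, [[2.0, 2.0]], [[3.0, 3.0]] * 2, [[4.0, 4.0]] * 2]
batches = size_sorted_batches(samples, batch_size=2)
print([[len(s) for s in batch] for batch in batches])  # [[1, 2], [2, 3]]
print(pad_features(samples[1], max_atoms=4))
```

Grouping samples of similar size keeps the amount of zero padding inside each minibatch small.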
Predictors for Space Group and Lattice Constants
For each formula, we use CryspNet31 to predict the top two crystal systems and the top 5 space groups for each crystal system. We then use MLatticeABC32 to predict the lattice constants for each formula. Next, we use the deep neural network model as shown in Figure 2 to predict the contact map.
3D Crystal Structure Reconstruction Algorithm
With all of the predicted information, including the contact map, the space group, and the lattice constants, we then use CMCrystal,30 a genetic algorithm for contact map-based atomic position reconstruction, to predict the crystal structure for a given formula. We set the number of evaluations to 100,000, that is, 1000 generations of the GA with a population size of 100. The mutation rate is set to 0.001. Compared to the previous version of the CMCrystal algorithm, we have added a term to the GA optimization objective function, which is the fitness of valid bonds. It is defined as follows
| 2 |
where short bonds are defined as any bond with a length less than the sum of the covalent radii of the two neighboring atoms minus 0.4 Å; valid bonds are those with lengths within the range of [covalent radius_A + covalent radius_B − 0.4 Å, covalent radius_A + covalent radius_B + 0.4 Å] for the atom pair A and B. The final fitness is defined as the product of the contact map fitness and the valid bond fitness.
Evaluation Metrics
The objective function for contact map-based structure reconstruction is defined as the dice coefficient, which is shown in the following equation
$$ \mathrm{Dice} = \frac{2\,|A \cap B|}{|A| + |B|} \approx \frac{2\,(A \bullet B)}{\mathrm{Sum}(A) + \mathrm{Sum}(B)} \qquad (3) $$
where A is the predicted contact map matrix and B is the true contact map of a given composition; both contain only 1/0 entries. A ∩ B denotes the common elements of A and B, |·| represents the number of elements in a matrix, • denotes the dot product, and Sum(·) is the sum of all matrix elements. The dice coefficient essentially measures the overlap of two matrix samples, with values ranging from 0 to 1 and 1 indicating perfect overlap. We also call this performance measure contact map accuracy.
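For binary matrices the dot-product form of the dice coefficient, 2 (A • B) / (Sum(A) + Sum(B)), is straightforward to compute. The sketch below is an illustration with made-up 3 × 3 maps; the function name `dice_coefficient` is hypothetical, and returning 1.0 for two empty maps is an assumed convention that the text does not specify.

```python
def dice_coefficient(pred, true):
    """Dice overlap of two equally sized 0/1 matrices given as nested lists."""
    overlap = sum(p * t for p_row, t_row in zip(pred, true) for p, t in zip(p_row, t_row))
    total = sum(map(sum, pred)) + sum(map(sum, true))
    if total == 0:
        return 1.0  # assumed convention: two empty maps overlap perfectly
    return 2.0 * overlap / total


predicted = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # four contact entries
reference = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # two contact entries
print(dice_coefficient(predicted, reference))  # 2 * 2 / (4 + 2)
print(dice_coefficient(reference, reference))  # 1.0
```

Because only shared 1-entries count toward the overlap, the large number of shared 0-entries in a sparse contact map does not inflate the score.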
To evaluate the reconstruction performance of different algorithms, we can use the dice coefficient as one evaluation criterion, which, however, does not indicate the final structure similarity between the predicted structure and the true target structure. To address this, we define the root mean square distance (RMSD) and mean absolute error (MAE) of two structures as follows
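$$ \mathrm{RMSD}(v, w) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \lVert v_i - w_i \rVert^{2}} \qquad (4) $$
$$ \mathrm{MAE}(v, w) = \frac{1}{n} \sum_{i=1}^{n} \lVert v_i - w_i \rVert \qquad (5) $$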
where n is the number of independent atoms in the target crystal structure. For symmetrized cif structures, n is the number of independent atoms of the set of Wyckoff equivalent positions. For regular cif structures, it is the total number of atoms in the compared structure. v_i and w_i are the corresponding atoms in the predicted crystal and the target crystal structure. It should be pointed out that in the experiments of this study, the only constraint for the optimization is the contact map, so the predicted atom coordinates may be oriented differently from the target atoms in terms of coordinate systems. To handle this, we compute the RMSD and MAE for all possible coordinate system matchings, such as (x,y,z → x,y,z), (x,y,z → x,z,y), etc., and report the lowest RMSD and MAE.
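The search over coordinate system matchings can be sketched as follows. This is an illustrative simplification under several assumptions: atoms are already paired by index, the per-atom deviation is the Euclidean distance between coordinate triples, only the six axis permutations are tried (no sign flips or origin shifts), and no periodic wrapping is applied. The function names are hypothetical.

```python
import math
from itertools import permutations


def rmsd(v, w):
    """Root mean square distance between paired coordinate triples."""
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in zip(v, w)) / len(v))


def mae(v, w):
    """Mean distance between paired coordinate triples (assumed Euclidean)."""
    return sum(math.dist(a, b) for a, b in zip(v, w)) / len(v)


def best_over_axis_orders(pred, target):
    """Lowest RMSD and MAE over all six reorderings of the predicted axes."""
    best_rmsd, best_mae = float("inf"), float("inf")
    for order in permutations(range(3)):
        permuted = [tuple(p[k] for k in order) for p in pred]
        best_rmsd = min(best_rmsd, rmsd(permuted, target))
        best_mae = min(best_mae, mae(permuted, target))
    return best_rmsd, best_mae


# The prediction equals the target with the y and z axes swapped.
target = [(0.1, 0.2, 0.3), (0.5, 0.5, 0.0)]
pred = [(0.1, 0.3, 0.2), (0.5, 0.0, 0.5)]
print(rmsd(pred, target) > 0.0)             # True: the raw comparison penalizes the swap
print(best_over_axis_orders(pred, target))  # (0.0, 0.0)
```

Note that the two minima are taken independently here, so the reported RMSD and MAE may come from different axis orders; whether the original evaluation does the same is not stated in the text.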
We also calculate root mean square (RMS) distances as a performance measure of the structure prediction using Pymatgen’s StructureMatcher module. We use the get_rms_dist function with a fractional length tolerance ltol of 0.6, a site tolerance stol of 0.6, and an angle tolerance in degrees angle_tol of 20 to compute the displacement between two structures. These threshold values are much larger than the defaults because of the large discrepancies between the predicted structures and the ground truth ones.
DFT Validation of Predicted Structures
The predicted structures were relaxed using density functional theory (DFT) as implemented in the Vienna Ab initio Simulation Package (VASP)35−38 with projector augmented wave (PAW) pseudopotentials.39,40 A plane-wave cutoff energy of 400 eV was used with the Perdew–Burke–Ernzerhof (PBE) exchange–correlation functional of the generalized gradient approximation (GGA).41,42 The structural optimization was performed with energy and force convergence criteria of 1.0 × 10^−5 eV/atom and 10^−2 eV/Å, respectively. The Brillouin zone integrations were carried out with Γ-centered Monkhorst–Pack k-meshes.
Results and Discussion
Training and Test Data
The contact map predictor in the present study is trained and tested using the MP database, which is a database of inorganic crystal structures with DFT-calculated properties consisting of almost all elements in the periodic table and is freely accessible through the REST API interface. A total of 126,336 unique crystal structure data points queried in November 2020 (consisting of 46,781 synthesized crystals associated with the ICSD identifiers and 77,734 theoretically proposed virtual crystals) were used for our learning model.
The training data set is downloaded from the Materials Project using the Pymatgen API. If a formula has multiple structures, we keep only the crystal structure with the lowest formation energy. Materials that contain only metal elements are removed in this study. We set the maximum number of atoms in the unit cell to 12, which yields 11,355 samples.
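The three selection rules above can be sketched as a small filter over plain dictionaries. This is an assumption-laden illustration, not the authors' data pipeline: the entry keys (`formula`, `elements`, `n_atoms`, `formation_energy`), the function name `select_training_entries`, the order in which the rules are applied, and all numbers in the toy entries are made up, and no Materials Project API call is shown.

```python
def select_training_entries(entries, metals, max_atoms=12):
    """Keep the lowest-energy structure per formula, then drop all-metal and oversized cells."""
    lowest = {}
    for entry in entries:
        current = lowest.get(entry["formula"])
        if current is None or entry["formation_energy"] < current["formation_energy"]:
            lowest[entry["formula"]] = entry
    selected = []
    for entry in lowest.values():
        if all(element in metals for element in entry["elements"]):
            continue  # materials made only of metal elements are removed
        if entry["n_atoms"] > max_atoms:
            continue  # unit cell larger than the size limit
        selected.append(entry)
    return selected


# Toy entries with made-up energies (eV/atom), for illustration only.
entries = [
    {"formula": "NaCl", "elements": ["Na", "Cl"], "n_atoms": 2, "formation_energy": -2.1},
    {"formula": "NaCl", "elements": ["Na", "Cl"], "n_atoms": 8, "formation_energy": -1.9},
    {"formula": "CuZn", "elements": ["Cu", "Zn"], "n_atoms": 2, "formation_energy": -0.05},
    {"formula": "Al2O3", "elements": ["Al", "O"], "n_atoms": 30, "formation_energy": -3.4},
]
kept = select_training_entries(entries, metals={"Na", "Cu", "Zn", "Al"})
print([(e["formula"], e["n_atoms"]) for e in kept])  # [('NaCl', 2)]
```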
Overall Contact Map Prediction Performance
In this experiment, we hold out 1136 known materials in the data set as the test set and use the remaining 10,219 samples as the training set for training our deep neural network model for contact map prediction. We set the number of epochs to 125, use the Adam optimizer43 to update the model parameters, and set the learning rate to 0.0001. After training, the model is used to predict the contact maps of the test samples, and their contact map accuracy scores are plotted in Figure 3. The average and standard deviation of the prediction accuracy are 0.927 and 0.090, respectively. Notably, 461 out of 1136 test samples (about 40.6%) have their contact map predicted with 100% accuracy. For more than 91% of the test samples, the contact map accuracy is higher than 80%, indicating that the deep contact map predictor has captured the bonding relationships of atoms in the crystal structures. Figure 4 shows two examples of the true contact maps and the predicted contact maps for Dy4S4Cl4 and As8Ir4.
Figure 3.
Contact map accuracy score distribution of predicted contact maps of test samples. For more than 40% of the 1136 test samples, the contact map accuracy is 100%.
Figure 4.
Contact maps for Dy4S4Cl4 and As8Ir4. Yellow cells indicate bonds, while black cells show no bonds.
To further examine the contact map prediction performance, Table 1 shows the contact map accuracy for structures of different space groups with 6 to 12 atoms in their unit cells. We find that for binary materials, the contact map accuracy scores range from 0.6 (Mg2P8) to 1.0 (Ag2F4). The number of independent sites is not the only determining factor: Pd4S8 has only two independent sites, but its contact map accuracy is 0.694, which is lower than those of other binary structures with three independent sites such as As8Ir4 and Ge4F8. For ternary materials, Table 1 shows that our model can also achieve high contact map accuracy for Si4Pt4Se4 and Ta4N4O4.
Table 1. Performances of AlphaCrystal in Terms of Contact Map Prediction Accuracy.
| target | mp_id | no. of independent sites | no. of atoms in the unit cell | no. of variables | space group | contact map accuracy |
|---|---|---|---|---|---|---|
| Ag2F4 | mp-7715 | 2 | 6 | 6 | 14 | 1.0 |
| Mg2P8 | mp-384 | 3 | 10 | 9 | 14 | 0.6 |
| Ru2F8 | mp-974434 | 3 | 10 | 9 | 14 | 0.86 |
| As8Ir4 | mp-15649 | 3 | 12 | 9 | 14 | 0.819 |
| Ge4F8 | mp-7595 | 3 | 12 | 9 | 19 | 0.819 |
| Pd4S8 | mp-13682 | 2 | 12 | 6 | 61 | 0.694 |
| Dy4S4Cl4 | mp-561307 | 3 | 12 | 9 | 14 | 0.875 |
| Si4Pt4Se4 | mp-1103261 | 3 | 12 | 9 | 29 | 1.0 |
| Ta4N4O4 | mp-4165 | 3 | 12 | 9 | 14 | 1.0 |
Contact Map-Based Crystal Structure Prediction: Benchmark Results
Here, we evaluate how the contact maps predicted by our deep neural network model can be used to reconstruct the crystal structures using our CMCrystal algorithm,30 in which a genetic algorithm is used to search the fractional coordinates of the crystals with the specified space group and contact map by minimizing the contact map distance error. We select 10 target structures, as shown in Table 2, and then predict their contact maps using our neural network model. Next, we use the CryspNet algorithm to predict their crystal systems and top 5 space groups. We then use MLatticeABC to predict the lattice parameters. However, we find that CryspNet is not reliable enough to always include the ground truth space group among its top 5 predictions. Considering that there are only a limited number of space groups, it is possible to exhaustively combine each of the possible space groups with the predicted contact map and lattice parameters to reconstruct the structures and then pick the structure with the lowest DFT-calculated formation energy. This can be done using 230 jobs (corresponding to the 230 possible space groups) on Linux clusters in parallel. For simplicity, here, we directly specify the ground truth space groups and combine them with the predicted contact maps and lattice parameters to do structure reconstruction using the CMCrystal algorithm. Then, we calculate the corresponding RMSD, MAE, and RMS errors for all the reconstructed structures, as shown in Table 2 (last three columns).
Table 2. Structure Prediction Performance of AlphaCrystal with Ground Truth Space Groups.
| target | mp_id | no. of atoms in the unit cell | given space group | target space group | predicted contact map accuracy | reconstructed contact map accuracy | RMSD | MAE | RMS |
|---|---|---|---|---|---|---|---|---|---|
| Cr3O5 | mp-1096920 | 8 | 1 | 1 | 0.939 | 0.917 | 0.411 | 0.314 | 0.387 |
| Pb4O4 | mp-550714 | 8 | 29 | 29 | 0.696 | 0.952 | 0.376 | 0.324 | 0.398 |
| Co4P8 | mp-14285 | 12 | 14 | 14 | 0.727 | 0.968 | 0.196 | 0.156 | none |
| Ir4N8 | mp-415 | 12 | 14 | 14 | 0.773 | 0.952 | 0.145 | 0.128 | none |
| V2Cl10 | mp-1101909 | 12 | 2 | 2 | 0.848 | 0.889 | 0.212 | 0.156 | 0.466 |
| Co2As2S2 | mp-553946 | 6 | 31 | 31 | 0.939 | 0.857 | 0.196 | 0.146 | 0.404 |
| V2O1F7 | mp-765500 | 10 | 1 | 1 | 0.939 | 1.0 | 0.336 | 0.255 | none |
| V4O4F4 | mp-754589 | 12 | 92 | 92 | 0.803 | 0.889 | 0.196 | 0.171 | 0.382 |
| Fe4As4Se4 | mp-1101894 | 12 | 14 | 14 | 1.0 | 1.0 | 0.193 | 0.163 | 0.531 |
| Mn4Cu4P4 | mp-20203 | 12 | 62 | 62 | 0.879 | 0.941 | 0.146 | 0.117 | 0.560 |
There are several interesting observations. First, we found that for all these binary and ternary benchmark materials, our algorithm achieves good contact map prediction accuracy, as shown in column 6 of Table 2, ranging from 0.696 to 1.0. The structure prediction performances are shown in column 8, with the lowest RMSD of 0.145 for Ir4N8. In terms of MAE, the best performance is on Mn4Cu4P4 with an MAE of 0.117, even though its predicted contact map accuracy of 0.879 is not the highest. We also tried to calculate the root mean square distance as defined by the Pymatgen routine and found that the calculation fails for some of the structures, while for others, the distances are not consistent with our RMSD/MAE results, possibly because the deviations of the predicted structures from the ground truth structures are too large for that algorithm to handle. Another interesting observation comes from the comparison of the predicted contact map accuracy and the reconstruction contact map accuracy, which evaluates how well the contact map of the predicted structure matches the contact map of the ground truth structure. We find that in most cases, the GA-based optimization algorithm CMCrystal can achieve high accuracy in reconstructing the predicted contact maps, with accuracy ranging from 0.857 to 1.0.
Figure 5 shows three target structures and their predicted ones. For Pb4O4, our algorithm achieves a contact map accuracy of 95.24% and an RMSD of 0.376. For the target structure V4O4F4, even though the contact map accuracy is lower (88.89%), the RMSD error is lower with a score of 0.196, and the overall structures are similar. A similar RMSD score has also been achieved for the target Co2As2S2.
Figure 5.
Benchmark crystal structures predicted by AlphaCrystal. (a–c) Target structures. (d) Predicted structure of Pb4O4 with contact map accuracy = 95.24% and RMSD = 0.376; (e) predicted structure of V4O4F4 with contact map accuracy = 88.89% and RMSD = 0.196; and (f) predicted structure of Co2As2S2 with contact map accuracy = 85.71% and RMSD = 0.196.
Discovery of New Structures Using AlphaCrystal
In our previous work, we developed MatGAN,1 a deep generative machine learning model for large-scale generation of new hypothetical material compositions with chemical validity and high potential of being stable. Here, we use MatGAN to generate 5 million hypothetical material compositions and then apply a charge neutrality check and an electronegativity balance check. Then, we train a composition-based formation energy predictor using Roost, a composition- and graph-based predictor. We then filter out those candidates with La or Ac elements. We use the trained formation energy predictor to screen the top 100 compositions with the lowest predicted formation energies, fewer than 12 atoms, and 2 or 3 elements in the compound.
For the selected 100 candidate materials, we use CryspNet,31 a composition-based deep neural network predictor for crystal systems and space groups, to predict the top 2 crystal systems. For each predicted crystal system, we predict the top 5 space groups for the candidate. Thus, for each composition, we have 10 candidate structures of different space groups. For each such structure candidate, we apply the MLatticeABC algorithm to predict its lattice constants a, b, and c (Table 3).
Table 3. Structural Information and Formation Energy of Seven Predicted New Structures.
| material | space group | a | b | c | α | β | γ | Eform (eV/atom) |
|---|---|---|---|---|---|---|---|---|
| Al3As4 | 215 | 5.3666 | 5.3666 | 5.3666 | 90 | 90 | 90 | –0.045 |
| CrCu3S4 | 215 | 5.459 | 5.459 | 5.459 | 90 | 90 | 90 | –0.356 |
| CrRh3S4 | 164 | 3.5322 | 3.5322 | 11.3559 | 90 | 90 | 120 | –0.287 |
| Ge3P4 | 164 | 3.9036 | 3.9036 | 14.2124 | 90 | 90 | 120 | 0.057 |
| Li3LaS4 | 225 | 6.685 | 6.685 | 6.685 | 90 | 90 | 90 | –0.991 |
| Li3MnS4 | 225 | 8.3957 | 8.3957 | 8.3957 | 90 | 90 | 90 | –0.822 |
| Li3ZnS4 | 215 | 5.7787 | 5.7787 | 5.7787 | 90 | 90 | 90 | –0.481 |
For the above 100 × 10 = 1000 candidate structures, we use the contact map predictor to predict their contact maps and use the CMCrystal30 algorithm to predict their crystal structures. For each of the 10 candidate structures of a given formula, we use a graph neural network-based model to predict its formation energy and pick the structure with the minimum formation energy as the final structure for that formula. Out of the 100 predicted structures, we pick the top 7 with the lowest predicted formation energy for DFT relaxation and phonon calculation to further determine their stability. The structures resulting from these calculations are shown in Figure 6. Out of the seven candidate structures, one structure is dynamically stable, as its phonon dispersion shows no imaginary phonon frequencies (Figure 7).
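The two-stage selection described above (lowest predicted energy per formula, then the lowest-energy formulas overall) can be sketched as follows. The tuple layout, the function names `final_structures` and `top_k`, the placeholder formulas, and the energies are all hypothetical; the predicted energies would come from the graph neural network model, which is not shown.

```python
def final_structures(candidates):
    """Map each formula to its (space_group, energy) candidate with the lowest energy."""
    best = {}
    for formula, space_group, energy in candidates:
        if formula not in best or energy < best[formula][1]:
            best[formula] = (space_group, energy)
    return best


def top_k(best, k):
    """Return the k formulas whose final structures have the lowest predicted energy."""
    ranked = sorted(best.items(), key=lambda item: item[1][1])
    return [(formula, sg, energy) for formula, (sg, energy) in ranked[:k]]


# Placeholder formulas and made-up predicted formation energies (eV/atom).
candidates = [
    ("A2B", 225, -0.30),
    ("A2B", 215, -0.45),
    ("AB3", 164, -0.10),
    ("ABC", 62, -0.80),
    ("ABC", 14, -0.75),
]
best = final_structures(candidates)
print(top_k(best, k=2))  # [('ABC', 62, -0.8), ('A2B', 215, -0.45)]
```

In the setting of the text, each formula would contribute 10 candidates and k would be 7.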
Figure 6.
Predicted crystal structures after DFT relaxation by AlphaCrystal. (a) Al3As4 (Eform: −0.045 eV/atom). (b) CrCu3S4 (Eform: −0.356 eV/atom). (c) CrRh3S4 (Eform: −0.287 eV/atom). (d) Li3LaS4 (Eform: −0.991 eV/atom). (e) Li3MnS4 (Eform: −0.822 eV/atom). (f) Li3ZnS4 (Eform: −0.481 eV/atom).
Figure 7.
Structure and phonon dispersion of Li3MnS4, which is likely to be dynamically stable. (a) Structure (formation energy: −0.822 eV/atom). (b) Phonon dispersion.
Discussion
We have shown that deep learning models can be trained to predict atomic pairwise relationships, which can be further used to reconstruct atomic coordinates using genetic algorithms. A possible limitation is that the contact map itself may not be sufficient to guide the search of the coordinates for complex material structures. The pairwise distance matrix may be more informative than the binary contact map, as is the case in protein structure prediction. Moreover, this contact map-based CSP algorithm may be combined with an energy-based global optimization algorithm, especially with the progress of machine learning-based potentials.44
Conclusions
We propose AlphaCrystal, a deep residual neural network approach for crystal structure prediction that first predicts the contact map of atom pairs for a given material composition and then uses it to predict the crystal structure with a genetic algorithm. In contrast to the minimization of free energy during atomic configuration search in conventional ab initio CSP methods, our method takes advantage of the physical or geometric constraints (such as the symmetry of atom positions) of the existing crystal structures in material repositories. Our experiments show that our AlphaCrystal algorithm is able to reconstruct crystal structures for a large number of materials with diverse space groups by optimizing the placement of the atoms using contact map matching as the objective for the given space group and stoichiometry. We also applied y-scrambling to shuffle the structures of the compositions and found that the model trained with the shuffled data set lost its contact map prediction power. Our predicted structures are close enough to the target crystal structures that they can be used to seed the costly free energy minimization-based CSP algorithms for further structure refinement. While we have demonstrated the feasibility of contact map-based prediction of structures from formulas, we recognize that structure reconstruction from the contact maps alone is not sufficient for successful structure prediction for many formulas. Adding distance constraints may be the next step. Overall, we believe that AlphaCrystal can be a new kind of deep knowledge-guided approach for large-scale prediction of crystal structures, which is very useful in the high-throughput discovery of new materials using modern generative material design models.45
Acknowledgments
Research reported in this work was supported in part by NSF under grants 2110033, 1940099, and 1905775. The views, perspective, and content do not necessarily represent the official views of NSF.
Data Availability Statement
The data that support the findings of this study are openly available in the Materials Project database at http://www.materialsproject.org. The source code is available from https://github.com/usccolumbia/AlphaCrystal.
Author Present Address
∥ Visa Inc., Houston, Texas, United States
Author Contributions
Conceptualization: J.H.; methodology: J.H., Y.Z., W.Y., and Q.L.; software: Y.Z., W.Y., J.H., Q.L., and Y.S.; validation: J.H., E.S., Y.Z., and W.Y.; investigation: J.H., Y.Z., W.Y., Q.L., Y.S., E.S., and R.D.; resources: J.H.; data curation: J.H., Y.Z., and W.Y.; writing—original draft preparation: J.H., Y.Z., and E.S.; writing—review and editing: J.H., Y.Z., E.S., and R.D.; visualization: J.H., Y.Z., W.Y., and E.S.; supervision: J.H.; and funding acquisition: J.H.
The authors declare no competing financial interest.
References
- Dan Y.; Zhao Y.; Li X.; Li S.; Hu M.; Hu J. Generative adversarial networks (GAN) based efficient sampling of chemical composition space for inverse design of inorganic materials. npj Comput. Mater. 2020, 6, 84. 10.1038/s41524-020-00352-0.
- Maddox J. Crystals from first principles. Nature 1988, 335, 201. 10.1038/335201a0.
- Jang J.; Gu G. H.; Noh J.; Kim J.; Jung Y. Structure-Based Synthesizability Prediction of Crystals Using Partially Supervised Learning. J. Am. Chem. Soc. 2020, 142, 18836–18843. 10.1021/jacs.0c07384.
- Frey N. C.; Wang J.; Vega Bellido G. I.; Anasori B.; Gogotsi Y.; Shenoy V. B. Prediction of synthesis of 2D metal carbides and nitrides (MXenes) and their precursors with positive and unlabeled machine learning. ACS Nano 2019, 13, 3031–3041. 10.1021/acsnano.8b08014.
- Aykol M.; Hegde V. I.; Hung L.; Suram S.; Herring P.; Wolverton C.; Hummelshøj J. S. Network analysis of synthesizable materials discovery. Nat. Commun. 2019, 10, 2018. 10.1038/s41467-019-10030-5.
- Xie T.; Grossman J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 2018, 120, 145301. 10.1103/PhysRevLett.120.145301.
- Tsuneyuki S.; Tsukada M.; Aoki H.; Matsui Y. First-principles interatomic potential of silica applied to molecular dynamics. Phys. Rev. Lett. 1988, 61, 869. 10.1103/PhysRevLett.61.869.
- Bush T. S.; Catlow C. R. A.; Battle P. D. Evolutionary programming techniques for predicting inorganic crystal structures. J. Mater. Chem. 1995, 5, 1269–1272. 10.1039/jm9950501269.
- Glass C. W.; Oganov A. R.; Hansen N. USPEX—Evolutionary crystal structure prediction. Comput. Phys. Commun. 2006, 175, 713–720. 10.1016/j.cpc.2006.07.020.
- Woodley S. M.; Catlow R. Crystal structure prediction from first principles. Nat. Mater. 2008, 7, 937–946. 10.1038/nmat2321.
- Hautier G.; Fischer C.; Ehrlacher V.; Jain A.; Ceder G. Data mined ionic substitutions for the discovery of new compounds. Inorg. Chem. 2011, 50, 656–663. 10.1021/ic102031h.
- Wang Y.; Lv J.; Zhu L.; Ma Y. CALYPSO: A method for crystal structure prediction. Comput. Phys. Commun. 2012, 183, 2063–2070. 10.1016/j.cpc.2012.05.008.
- Curtis F.; Li X.; Rose T.; Vazquez-Mayagoitia A.; Bhattacharya S.; Ghiringhelli L. M.; Marom N. GAtor: a first-principles genetic algorithm for molecular crystal structure prediction. J. Chem. Theory Comput. 2018, 14, 2246–2264. 10.1021/acs.jctc.7b01152.
- Avery P.; Toher C.; Curtarolo S.; Zurek E. XtalOpt Version r12: An open-source evolutionary algorithm for crystal structure prediction. Comput. Phys. Commun. 2019, 237, 274–275. 10.1016/j.cpc.2018.11.016.
- Ryan K.; Lengyel J.; Shatruk M. Crystal structure prediction via deep learning. J. Am. Chem. Soc. 2018, 140, 10158–10168. 10.1021/jacs.8b03913.
- Oganov A. R.; Pickard C. J.; Zhu Q.; Needs R. J. Structure prediction drives materials discovery. Nat. Rev. Mater. 2019, 4, 331–348. 10.1038/s41578-019-0101-8.
- Lyakhov A. O.; Oganov A. R.; Stokes H. T.; Zhu Q. New developments in evolutionary structure prediction algorithm USPEX. Comput. Phys. Commun. 2013, 184, 1172–1182. 10.1016/j.cpc.2012.12.009.
- Fischer C. C.; Tibbetts K. J.; Morgan D.; Ceder G. Predicting crystal structure by merging data mining with quantum mechanics. Nat. Mater. 2006, 5, 641–646. 10.1038/nmat1691.
- Shen J.-X.; Horton M.; Persson K. A. A charge-density-based general cation insertion algorithm for generating new Li-ion cathode materials. npj Comput. Mater. 2020, 6, 161. 10.1038/s41524-020-00422-3.
- He J.; Yao Z.; Hegde V. I.; Naghavi S. S.; Shen J.; Bushick K. M.; Wolverton C. Computational Discovery of Stable Heteroanionic Oxychalcogenides ABXO (A, B= Metals; X= S, Se, and Te) and Their Potential Applications. Chem. Mater. 2020, 32, 8229–8242. 10.1021/acs.chemmater.0c01902.
- Wang Y.; Lv J.; Li Q.; Wang H.; Ma Y. CALYPSO method for structure prediction and its applications to materials discovery. In Handbook of Materials Modeling: Applications: Current and Emerging Materials; Springer, Cham, 2020; pp 2729–2756.
- Pretti E.; Shen V. K.; Mittal J.; Mahynski N. A. Symmetry-Based Crystal Structure Enumeration in Two Dimensions. J. Phys. Chem. A 2020, 124, 3276–3285. 10.1021/acs.jpca.0c00846.
- Podryabinkin E. V.; Tikhonov E. V.; Shapeev A. V.; Oganov A. R. Accelerating crystal structure prediction by machine-learning interatomic potentials with active learning. Phys. Rev. B: Condens. Matter Mater. Phys. 2019, 99, 064114. 10.1103/PhysRevB.99.064114.
- Zhang L.; Wang Y.; Lv J.; Ma Y. Materials discovery at high pressures. Nat. Rev. Mater. 2017, 2, 17005. 10.1038/natrevmats.2017.5.
- Di Lena P.; Nagata K.; Baldi P. Deep architectures for protein contact map prediction. Bioinformatics 2012, 28, 2449–2457. 10.1093/bioinformatics/bts475.
- Senior A. W.; Evans R.; Jumper J.; Kirkpatrick J.; Sifre L.; Green T.; Qin C.; Žídek A.; Nelson A. W.; Bridgland A.; et al. Improved protein structure prediction using potentials from deep learning. Nature 2020, 577, 706–710. 10.1038/s41586-019-1923-7.
- Zheng W.; Li Y.; Zhang C.; Pearce R.; Mortuza S.; Zhang Y. Deep-learning contact-map guided protein structure prediction in CASP13. Proteins: Struct., Funct., Bioinf. 2019, 87, 1149–1164. 10.1002/prot.25792.
- He K.; Zhang X.; Ren S.; Sun J. Identity Mappings in Deep Residual Networks. In European Conference on Computer Vision; Springer, Cham, 2016; pp 630–645.
- Zhu Z.; Wu P.; Wu S.; Xu L.; Xu Y.; Zhao X.; Wang C.-Z.; Ho K.-M. An Efficient Scheme for Crystal Structure Prediction Based on Structural Motifs. J. Phys. Chem. C 2017, 121, 11891–11896. 10.1021/acs.jpcc.7b02486.
- Hu J.; Yang W.; Dong R.; Li Y.; Li X.; Li S. Contact Map Based Crystal Structure Prediction Using Global Optimization. 2020, arXiv:2008.07016. arXiv.org e-Print archive. https://arxiv.org/abs/2008.07016.
- Liang H.; Stanev V.; Kusne A. G.; Takeuchi I. CRYSPNet: Crystal Structure Predictions via Neural Network. 2020, arXiv:2003.14328. arXiv.org e-Print archive. https://arxiv.org/abs/2003.14328.
- Li Y.; Yang W.; Dong R.; Hu J. MLatticeABC: Generic Lattice Constant Prediction of Crystal Materials using Machine Learning. 2020, arXiv:2010.16099. arXiv.org e-Print archive. https://arxiv.org/abs/2010.16099.
- Ioffe S.; Szegedy C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. 2015, arXiv:1502.03167. arXiv.org e-Print archive. https://arxiv.org/abs/1502.03167.
- Krizhevsky A.; Sutskever I.; Hinton G. E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. 10.1145/3065386.
- Kresse G.; Hafner J. Ab initio molecular dynamics for liquid metals. Phys. Rev. B 1993, 47, 558–561. 10.1103/PhysRevB.47.558.
- Kresse G.; Hafner J. Ab initio molecular-dynamics simulation of the liquid-metal–amorphous-semiconductor transition in germanium. Phys. Rev. B 1994, 49, 14251–14269. 10.1103/PhysRevB.49.14251.
- Kresse G.; Furthmüller J. Efficiency of ab initio Total Energy Calculations for Metals and Semiconductors Using a Plane-Wave Basis Set. Comput. Mater. Sci. 1996, 6, 15–50. 10.1016/0927-0256(96)00008-0.
- Kresse G.; Furthmüller J. Efficient Iterative Schemes for ab initio Total-Energy Calculations Using a Plane-Wave Basis Set. Phys. Rev. B 1996, 54, 11169–11186. 10.1103/PhysRevB.54.11169.
- Blöchl P. E. Projector Augmented-Wave Method. Phys. Rev. B 1994, 50, 17953–17979. 10.1103/PhysRevB.50.17953.
- Kresse G.; Joubert D. From Ultrasoft Pseudopotentials to the Projector Augmented-Wave Method. Phys. Rev. B 1999, 59, 1758–1775. 10.1103/PhysRevB.59.1758.
- Perdew J. P.; Burke K.; Ernzerhof M. Generalized Gradient Approximation Made Simple. Phys. Rev. Lett. 1996, 77, 3865–3868. 10.1103/PhysRevLett.77.3865.
- Perdew J. P.; Burke K.; Ernzerhof M. Generalized Gradient Approximation Made Simple [Phys. Rev. Lett. 77, 3865 (1996)]. Phys. Rev. Lett. 1997, 78, 1396. 10.1103/PhysRevLett.78.1396.
- Kingma D. P.; Ba J. Adam: A Method for Stochastic Optimization. 2014, arXiv:1412.6980. arXiv.org e-Print archive. https://arxiv.org/abs/1412.6980.
- Chen C.; Ong S. P. A universal graph deep learning interatomic potential for the periodic table. Nat. Comput. Sci. 2022, 2, 718–728. 10.1038/s43588-022-00349-3.
- Fu N.; Wei L.; Song Y.; Li Q.; Xin R.; Omee S. S.; Dong R.; Siriwardane E. M. D.; Hu J. Material transformers: deep learning language models for generative materials design. Mach. Learn.: Sci. Technol. 2023, 4, 015001. 10.1088/2632-2153/acadcd.