iScience. 2024 Dec 20;28(1):111659. doi: 10.1016/j.isci.2024.111659

Generative design of crystal structures by point cloud representations and diffusion model

Zhelin Li 1,2, Rami Mrad 1,2, Runxian Jiao 1,2, Guan Huang 1,2, Jun Shan 1,2, Shibing Chu 1,2,3,, Yuanping Chen 1,2,∗∗
PMCID: PMC11763582  PMID: 39868038

Summary

Efficiently generating energetically stable crystal structures has long been a challenge in material design, primarily due to the immense number of possible arrangements of atoms in a crystal lattice. To facilitate the discovery of stable materials, we present a framework for the generation of synthesizable materials that leverages a point cloud representation to encode intricate structural information. At the heart of this framework lies a diffusion model as its foundational pillar. To gauge the efficacy of our approach, we employed it to reconstruct input structures from our training datasets, rigorously validating its high reconstruction performance. Furthermore, we demonstrate the profound potential of point cloud-based crystal diffusion (PCCD) by generating materials with an emphasis on their synthesizability. Our research stands as a noteworthy contribution to the advancement of materials design and synthesis through the cutting-edge avenue of generative design, rather than conventional substitution or experience-based discovery.

Subject areas: Natural sciences, Physics, Computer science, Materials science

Graphical abstract


Highlights

  • We present a materials generation framework for bulk materials based on a diffusion model

  • We use a point cloud representation to encode atomic position information

  • We generate batches of crystals, many of which are potentially stable

  • We validate our results with first-principles calculations



Introduction

The continuous advancement of technology hinges significantly on the development of materials science, making it essential to unravel the complex relationships between molecular or crystal structures and their properties. Currently, two main methods are used for designing crystal structures: altering existing materials using scientific intuition, empirical principles, or global optimization algorithms,1 and mining material databases such as the Materials Project (MP)2 via high-throughput virtual screening,3 which has shown great success in various applications. However, the computational expense associated with density functional theory (DFT) calculations renders an exhaustive search of the theoretical material space infeasible.4 In recent years, there has been a notable surge in research dedicated to harnessing artificial intelligence (AI) for the exploration of undiscovered materials.5,6,7,8,9,10 However, within the field of crystallography, the predominant application of machine learning (ML) techniques is the prediction of material properties, such as composition, band gap, or formation energy.11,12,13 Consequently, the utilization of ML algorithms for crystal generation remains relatively nascent, underscoring the pressing need for the further development of artificial intelligence-generated content (AIGC) within the realm of crystallography.

In the field of material exploration, generative models have been proven to be particularly effective.7 Over the past few years, two fundamental models have been widely applied: the generative adversarial network (GAN)14 and the variational autoencoder (VAE).15 Currently, an array of studies has been dedicated to structure generation, drawing on the capabilities of these two models. An example is the study conducted by Jordan Hoffmann et al.,16 in which voxel representation was employed for crystals, a VAE was utilized for voxel data generation, and a U-Net model was subsequently applied for voxel classification. Zekun Ren et al.5 employed VAE for the reverse design of materials. Kim et al.7 utilized a GAN model to explore structures within the Mg-Mn-O ternary system, while Baekjun Kim et al.8 employed a Wasserstein generative adversarial network (WGAN) in their quest to discover crystalline porous materials. These research endeavors highlight the versatility and promise of generative models in the context of material discovery and design.

Recently, there has been a significant emergence of models for generating generic crystal structures. A notable example is the Crystal Diffusion Variational Autoencoder (CDVAE) developed by Tian Xie et al.,6 which successfully integrates a diffusion model with a VAE for crystal generation. Furthermore, the Cond-CDVAE model17 extends this approach by allowing the incorporation of user-defined material and physical parameters, such as composition and pressure. Another major breakthrough in this domain is MatterGen,18 which is capable of generating stable materials with specified chemical compositions, symmetries, and mechanical, electronic, and magnetic properties.

Nevertheless, most models still face the challenge of improving the quality of their generated results.19 Jonathan Ho et al.20 introduced a generative model known as the denoising diffusion probabilistic model (DDPM). Notably, various research teams, such as OpenAI,21,22 NVIDIA,23 and Google,24 have achieved significant breakthroughs in the application25 of this model. Considering its excellent generative capability, we aim to investigate its latent potential in the domain of structure generation and its capacity to enhance the creative aspects of generative design. Additionally, to minimize computational expenses and tailor the diffusion modeling, we propose a point cloud representation26 to encode atom sites, element information, and lattice constants.

In this paper, we introduce a streamlined deep learning framework for crystal generation: point cloud-based crystal diffusion (PCCD). To test the model’s reliability, we intentionally added noise to our dataset and then used the PCCD to reconstruct the majority of the inputs with only minor deviations. Furthermore, we calculated the energy above hull (Ehull) per atom for a set of crystal structures generated by PCCD, revealing that many of these structures had low energy values, indicating their potential significance. In addition, our analysis revealed structures that are not in the database or that have a stable phonon structure, emphasizing the ability of PCCD to generate potentially significant crystal structures.

Results and discussion

Reconstruction

In the PCCD, the training of the diffusion model involves the incremental addition of noise, with the model essentially learning how to peel noise from the corrupted data. In an ideal scenario, saving the data from the database, along with the added noise, should enable the eventual reconstruction of these original data without noise. To validate the model’s effectiveness, we selected a batch of structures from the database as the test set, applied 1000 iterations of noise addition, and stored the results. These noise-augmented results were then used as inputs for the PCCD instead of true random numbers. In theory, removing the noise over 1000 denoising steps should restore the original data. To ensure accurate atomic site matching, the statistics presented are based on 868 samples, as only structures with matching atom counts can be compared. For the purpose of predicting atomic coordinates, we align each atom in the predicted crystal structure with its counterpart in the original crystal, given that both structures have the same total number of atoms. The distance between each atom in one structure and each atom in the other structure is calculated, taking into account translational symmetry. This symmetry allows atoms in the original crystal to be matched with atoms in adjacent cells of the predicted crystal, effectively aligning coordinates such as (0,0,0) in one structure with (1,1,1) in the other when the distance is zero. Finally, a greedy algorithm is employed to perform the matching after all distances have been determined. We then compared the restored data to the original dataset, as illustrated in Figure 1 (with detailed information provided in Table 1), providing a robust assessment of the model’s reconstruction performance. This experiment serves as a rigorous validation of the model’s capabilities.
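The distance computation under translational symmetry and the subsequent greedy matching can be sketched as follows. This is an illustrative NumPy sketch, not the actual PCCD code: it works directly in fractional coordinates (ignoring the lattice metric) and wraps each coordinate difference into [−0.5, 0.5] so that, for example, (0,0,0) and (1,1,1) coincide.

```python
import numpy as np

def periodic_distance_matrix(frac_a, frac_b):
    """Pairwise distances between two sets of fractional coordinates,
    accounting for translational symmetry via a minimum-image wrap."""
    diff = frac_a[:, None, :] - frac_b[None, :, :]
    diff -= np.round(diff)  # wrap each component into [-0.5, 0.5]
    return np.linalg.norm(diff, axis=-1)

def greedy_match(dist):
    """Greedily pair atoms by ascending distance; assumes both
    structures contain the same number of atoms."""
    n = dist.shape[0]
    pairs, used_a, used_b = [], set(), set()
    for i, j in sorted(((i, j) for i in range(n) for j in range(n)),
                       key=lambda ij: dist[ij]):
        if i not in used_a and j not in used_b:
            pairs.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return pairs
```

With this wrap, an atom at (0,0,0) in the original matches an atom at (1,1,1) in the reconstruction at distance zero, exactly as described above.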

Figure 1.


Reconstruction results of the PCCD

(A) The parity plots for lattice lengths of reconstructed materials and original materials.

(B) The parity plots for the atomic positions of the reconstructed materials and original materials.

(C) Heatmap of the atom number relationship between the reconstructed materials and original materials.

(D) Boxplot of the lattice length relative error with density distribution.

(E) Boxplot of the relative errors of the atomic x, y, and z coordinates with respect to the density distribution.

Table 1.

Data details for reconstruction results

                a         b         c         x         y         z
Efficiency      88.48%    90.55%    89.40%    70.83%    71.60%    72.02%
Upper limits    3.81%     3.66%     3.57%     10.05%    5.44%     6.44%
Lower limits    −5.12%    −5.23%    −4.90%    −6.50%    −3.84%    −4.09%

Efficiency calculations according to the boxplots in Figures 1D and 1E. The upper and lower limits are shown for the boxplots in Figures 1D and 1E, respectively.

Given that we did not specify the number of atoms in the PCCD, the accuracy of predicting the atom count serves as a direct indicator of the model’s performance. In Figure 1C, a heatmap illustrates the relationship between the sum of atoms in the original data and the data predicted by the PCCD. Notably, a clear diagonal line represents accurate predictions, and the accuracy rate is 67.81% (868 out of 1280 samples).

Among these 868 samples with accurately predicted atom counts, we further calculated the relative errors for the lattice lengths, defined as (|â| − |a|)/|a| and analogously for b and c, where |a|, |b|, |c| are the original lattice lengths and |â|, |b̂|, |ĉ| are the predicted lattice lengths (as shown in Figure 1A), as well as for the atomic coordinates (depicted in Figure 1B). A significant portion of these errors is visibly clustered around the y=x line. To gain deeper insights into their distribution, we conducted a detailed analysis using boxplots for both aspects, as presented in Figures 1D and 1E. Figure 1D displays the boxplot for the relative errors of the lattice parameters a, b, and c. The acceptable ranges for a, b, and c typically fall within upper limits of 3.81%, 3.66%, and 3.57%, respectively, and lower limits of −5.12%, −5.23%, and −4.90%, respectively. The effective rates for these parameters are as follows: 88.48% for lattice parameter a, 90.55% for lattice parameter b, and 89.40% for lattice parameter c. Alongside the boxplots, the kernel density function plots help illustrate the concentration of the data. In Figure 1E, we present the boxplot for the relative errors of the x, y, and z coordinates of each atom. The typical acceptable ranges for the x, y, and z coordinates fall within upper limits of 10.05%, 5.44%, and 6.44%, respectively, and lower limits of −6.50%, −3.84%, and −4.09%, respectively. The effective rates for these coordinates are calculated as 70.83% for x, 71.60% for y, and 72.02% for z. Most of the errors are relatively small, and they can be readily corrected during DFT geometry optimization. This analysis indicates that the framework is already functional and effective. However, we also conducted a more in-depth analysis to explore the objective factors that may influence model errors.
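The effective-rate bookkeeping above reduces to computing signed relative errors and the fraction of samples that fall within the boxplot whisker limits. A minimal sketch of that computation (illustrative only; the limit values in practice come from Table 1):

```python
import numpy as np

def relative_error(predicted, original):
    """Signed relative error, e.g., (|a_pred| - |a|) / |a| for lattice lengths."""
    predicted = np.asarray(predicted, dtype=float)
    original = np.asarray(original, dtype=float)
    return (predicted - original) / original

def effective_rate(errors, lower, upper):
    """Fraction of samples whose relative error lies inside the
    [lower, upper] whisker limits of the corresponding boxplot."""
    errors = np.asarray(errors, dtype=float)
    return float(np.mean((errors >= lower) & (errors <= upper)))
```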

A significant contributing factor to the performance limitations of PCCD lies in the preprocessing stage before training. To facilitate the normalization of lattice vectors and enhance reversibility, all lattice vector data (the data in the third channel) were divided by 15. Consequently, the model’s capacity was restricted to generating values within the range of −1 to 1, which, in turn, limited its ability to predict lattice vectors. Specifically, the model could predict only lattice vectors with a maximum length of 15 Å. As a consequence, structures featuring lattice vectors exceeding this maximum could not be accurately predicted. Upon calculating the relative errors for the lattice lengths of each structure across all samples (a total of 1280), we observed less favorable outcomes due to this limitation. The boxplots vividly illustrate that the effective rates for lattice vector length were only 73.98% for the a lattice parameter, 75.39% for b, and 71.64% for c. This highlights the significant correspondence between the errors in the atom count and lattice vector length predictions.

Notably, many of these errors were associated with structures featuring at least one lattice vector longer than the maximum. This suggests a substantial interrelation between the atom count and lattice vector predictions, despite their presence in different data channels. Furthermore, the comparison of the x, y, and z positions of atoms, as shown in the three panels of Figure 1B, reveals that a portion of the data clustered around the position (0,1). These data points were excluded when calculating statistics, as they were considered erroneous. However, it is important to note that crystal cells are periodic, and such data points are essentially equivalent to (0,0) or (1,1). This periodicity factor contributes to lower accuracy in the statistical analysis.

In order to describe the matching relationships between structures more accurately, we selected several statistical metrics that are suitable for crystal structure prediction (CSP).27 For each pair of crystals composed of a reconstructed structure and its original structure, we calculated their energy distance (Figure 2A), orbital field matrix distance (Figure 2B), CrystalNN fingerprint distance (Figure 2C), superpose distance (Figure 2D), RMS anonymous distance (Figure 2E), and graph edit distance (Figure 2F). Boxplots and kernel density function graphs reflect their distributions. Figure 2 presents these statistics. For the energy distance, the reconstructed values cluster close to 0, indicating an efficient crystal reconstruction process; the boxplot in Figure 2A shows that the largest portion of the reconstructed data lies near 0. Figure 2B illustrates the orbital field matrix distance, where the density of the reconstructed data peaks at approximately 0, again indicating efficient reconstruction. The CrystalNN fingerprint distance is an ML-based approach that compares the coordination environments of neighboring atoms in the same or similar crystal structures; the distribution in Figure 2C confirms the similarity of the reconstructed data. Figure 2D shows the superpose distance, which compares the similarity of periodic structures and reflects the efficacy of the model’s training. The RMS anonymous distance in Figure 2E further supports the model, confirming the similarity of the reconstructed data under this metric. Finally, the graph edit distance in Figure 2F, the last performance metric we chose, compares the numbers of edges and nodes; its peak at 0 again demonstrates the structural similarity achieved by the reconstruction.

Figure 2.


Statistics for matched structures

(A–F) Boxplots and kernel density functions of the Energy distance (A), Orbital Field Matrix distance (B), CrystalNN Fingerprint distance (C), Superpose distance (D), RMS Anonymous distance (E) and Graph Edit distance (F) for matched crystals.

Generated results

In comparison to models tailored for specific material components, such as the Mg-Mn-O or VxOy systems,7,28 PCCD demonstrates superior generalization capabilities. This means that we can effectively generate crystal structures composed of any combination of elements, provided that the total number of elements does not exceed three. As depicted in Figure 3, PCCD enables the generation of unary systems (Figure 3A), binary systems (Figure 3B), and ternary systems (Figure 3C), demonstrating its versatility and broad applicability. These findings also prove the diversity of this framework (Figures 3D–3F).

Figure 3.


Examples of generated crystals

(A) A sample of the predicted data for the Si system.

(B) A sample of the predicted data: a 3×3 supercell for the H-O system (H2O).

(C) Predicted data: the unit cell for the Mg-Mn-O system (Mg2Mn3O8).

(D) Unit cell generated for CaZn3O4.

(E) Unit cell generated for La2ZnO4.

(F) Unit cell generated for MgFe2O4.

For statistical validation and comparative analysis with other models, we generated three distinct batches of structures. The first batch (batch #1) allows all possible elements excluding noble gases and radioactive elements. The second batch (batch #2) comprises rare earth elements, alkaline earth elements, transition metals, and oxygen, selected based on their ability to display characteristic properties that set them apart. The third batch (batch #3) comprises only commonly used elements. Following an initial screening process, we identified 1809, 746, and 120 structures in each batch, respectively. Of these, 1680 (92.87%), 669 (89.68%), and 108 (90%) structures were not present in the database.

We employed the Vienna Ab initio Simulation Package (VASP)29,30 to calculate the total energy. The generalized gradient approximation (GGA)31 given by the Perdew–Burke–Ernzerhof (PBE) parametrization32 was used to describe exchange–correlation interactions. Furthermore, we utilized the pymatgen package to calculate the Ehull per atom33 and found that approximately 39.44%, 61.80%, and 66.67% of the structures, respectively, exhibited values less than 0.25 eV/atom (Figure 4 and Table 2). In comparison, according to Yong Zhao’s PGCGM,4 out of 1579 structures, 106 had values less than 0.25 eV/atom (5.3%). Sungwon Kim’s model7 and Juhwan Noh’s model, known as iMatGen,28 are two earlier models that made significant contributions to GAN- and VAE-based generation. Both sets of authors assert that a structure with an Ehull less than 80 meV/atom can be considered relatively stable. In their respective papers, Sungwon Kim’s work obtained 113 results with an Ehull per atom less than 80 meV/atom from 6000 generated structures, while iMatGen achieved 40 such results from 10,981 structures, ratios of 1.8% and 0.36%, respectively. In contrast, we identified 160, 122, and 41 structures that met this criterion from the generated structures in the three batches (8.90%, 16.35%, and 34.17%, respectively).

Figure 4.


The distribution of Ehull per atom for the generated data (three batches)

The label “Exist” means the generated structure is present in the database, while the label “New” indicates that it is outside of the database. Batch #1 contains structures with any components. Batch #2 contains structures that consist of rare earth elements, alkaline earth elements, transition metal elements, and oxygen. Batch #3 contains structures that consist of common elements.

Table 2.

Percent of structures with energy above the hull per atom lower than the given standard among these batches and some other models

Ehull           Batch #1   Batch #2   Batch #2 (without rare earth elements)   Batch #3   PGCGM4   Sungwon Kim’s model7   iMatGen28
0.25 eV/atom    39.44%     61.80%     89.66%                                   66.67%     5.3%     –                      –
80 meV/atom     8.90%      16.35%     60.92%                                   34.17%     –        1.8%                   0.36%

It is important to note that in batch #2, the definition and calculation method of Ehull may result in statistically anomalous values, either falsely high or low, particularly due to the limited representation of structures containing rare earth elements in the database. Consequently, we excluded structures containing rare earth elements from batch #2 and recalculated the statistics, as presented in Table 2. Further details can be found in the supplemental information. These findings suggest that the diffusion model may, to some extent, outperform GANs or VAEs in this field. It is worth highlighting that despite being a simplified version designed to explore the potential of the diffusion model and point cloud usage in the field of materials, our framework, akin to a pretrained model, has demonstrated comparable or even superior effectiveness in various aspects when compared to many other existing models.

Moreover, we chose three materials (Ca2SnO4, LiMg6, and MgSc2O4) from the generated structures for further investigation (as shown in Figure 5). We utilized DS-PAW34 for structural relaxation calculations and band structure assessments. Importantly, all three materials were successfully optimized via DFT calculations. Subsequently, we conducted phonon structure calculations for these selected materials, and all of them demonstrated structural stability. Among the three compounds, MgSc2O4 and LiMg6 are not reported in the MP database, and they are difficult to obtain through simple elemental substitution. Furthermore, Cheng et al.,35 who used the PCCD to discover magnesium-aluminum alloys, demonstrated that PCCD can generate structures effectively. This finding not only validates the use of PCCD in the discovery and design of materials but also opens avenues for future research in materials science.

Figure 5.


Graphical depiction of the structures and their DFT calculations

Crystal representation and band and phonon band structures of Ca2SnO4 (A), LiMg6 (B), and MgSc2O4 (C).

Methods

At the core of our approach is the utilization of a diffusion model as the foundational model, as illustrated in Figure 6. We leverage U-Net36 as the backbone of PCCD, a well-established architecture frequently employed for tasks such as classification and segmentation.37,38

Figure 6.


Sketch map of the PCCD

(A) Training phase process with data manipulation section. First, crystals are transformed to a point cloud data type, followed by the addition of noise to the data, which enables the PCCD to perform observation and learning.

(B) Generation phase with retrieval data operation. The method starts by feeding the PCCD random data and composition conditions and then passes to the data extraction and finishing with generating structures.

Two main methods are commonly used to represent a 3D object: voxels and point clouds. Voxel-based representation is thorough but resource intensive. In contrast, point clouds, much like sparse matrices, are more efficient and reduce resource usage. Some prior works have claimed to use point clouds,7 but they essentially used individual points to lower computational costs. Our approach treats point clouds and lattice constants as three-channel entities akin to RGB in the computer vision (CV) field. We then employ clustering to determine the positions, element composition, and lattice.

Drawing from the notable achievements of diffusion models in the field of CV, we are motivated to extend their application to the generation of crystal structures. In this paradigm, we envision each crystal structure as akin to a patch in an image. To explore this innovative approach further, we integrate the point cloud representation technique with the power of diffusion models within the PCCD. This fusion of methods is designed to leverage the inherent advantages of both approaches. The diffusion model, renowned for its ability to capture intricate dependencies in data, holds promise for encoding the structural nuances of crystal formations. Moreover, the use of point cloud data representation, akin to a cloud of 3D points, serves to describe atomic positions and their attributes efficiently. By combining these two methodologies, we seek to harness their collective potential to revolutionize the generation and understanding of crystal structures.

Data preprocessing

Our material data were sourced from the Materials Project (MP).2 In this extensive database, our selection process targeted structures with ternary, binary, or unary compositions that feature a maximum of 16 atom sites. This filtering yielded a comprehensive dataset comprising 52,028 distinct materials. The dataset encompasses a wealth of information, including the POSCAR file, band gap, magnetism, crystal system, magnetic ordering, etc., for each of these structures. However, for model efficiency, we opted to narrow our focus to the band gap and magnetic ordering as the primary control variables. This decision, in conjunction with our use of the POSCAR files as training data, was made to streamline and lighten the model while ensuring the retention of essential variables for our specific research objectives.

As mentioned previously, we initially gathered various properties and POSCAR files of each crystal before training. The primary objective revolves around transforming the POSCAR data into a three-channel format encompassing atom positions, element information, and lattice constants, as illustrated in Figure 7. Each of these channels comprises 128 items, effectively representing each structure as a 3×128×3 (C×W×H) matrix. The first channel is dedicated to atomic site information, where we distribute 128 points within the space. It is essential to clarify that the positions here are relative coordinates akin to those in the POSCAR file; the lattice vectors have not been determined at this stage. In essence, we use 128 items, or several sets of data, at this point in the process. To determine the absolute positions of these points, it is necessary to multiply them by the three lattice vectors obtained after processing the third channel. The data in the second channel correspond one-to-one with those in the first channel. Prior to generating or training samples, we input up to three elements; each item in this channel contains three data values, which represent the likelihood of these three elements being associated with each atom. The data in the third channel do not correspond one-to-one with those in the first two channels. In fact, we want to obtain only six parameters, α, β, γ, a, b, and c, from this channel, which can be converted to three lattice vectors. To match the shape of the other channels, we expand them to 128 items by copying. In theory, after training, two distinct groups of data are generated. We can then obtain the three vectors by employing clustering techniques, determining the means of each group, and performing calculations.
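As a concrete illustration, the described format can be packed as follows. This sketch reflects our own reading of the text: the 3×128×3 shape, the relative coordinates, the three-element likelihoods, and the expansion of the lattice data by copying are stated above, while the exact padding of unused items and the tiling order of the lattice rows are assumptions made for illustration (the division by 15 is described in the reconstruction section).

```python
import numpy as np

N_ITEMS = 128
MAX_LATTICE = 15.0  # normalization constant stated in the text (angstroms)

def encode_structure(frac_coords, elem_probs, lattice):
    """Pack a crystal into a 3 x 128 x 3 (C x W x H) point-cloud tensor.

    frac_coords: (n_atoms, 3) fractional atomic positions (POSCAR-style)
    elem_probs:  (n_atoms, 3) likelihood of each of the <= 3 input elements
    lattice:     (3, 3) lattice vectors in angstroms

    Channel 0: relative atomic positions; channel 1: element likelihoods
    (one-to-one with channel 0); channel 2: normalized lattice vectors
    expanded to 128 items by copying.
    """
    x = np.zeros((3, N_ITEMS, 3))
    n = len(frac_coords)
    x[0, :n] = frac_coords
    x[1, :n] = elem_probs
    reps = -(-N_ITEMS // 3)  # ceil(128 / 3) copies of the 3 lattice rows
    x[2] = np.tile(lattice / MAX_LATTICE, (reps, 1))[:N_ITEMS]
    return x
```

Because of the division by 15, any lattice component longer than 15 Å would map outside [−1, 1], which is exactly the limitation discussed in the reconstruction results.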

Figure 7.


Data format for the framework (e.g., MgMnO3)

The first channel represents the position, the second channel represents the element information, and the third channel determines the lattice constants.

Generation model

Our generation model is based on the diffusion model, which is essentially a parameterized Markov chain. It is trained using variational inference to produce samples that closely match the data distribution after finite time.20

The diffusion model comprises two distinct processes, the training process and the generation process, often referred to as the sampling process, as illustrated in Figure 8A. These processes work in tandem to enable the generation of data samples that align with the underlying distribution of the training data. The training process can be briefly described as a procedure in which noise is progressively introduced to the data while the model learns to characterize this added noise. In contrast, the sampling process involves the gradual application of the trained model to denoise pure noise data. These data, in essence, are treated as source data with superimposed noise, and the model works to refine and clarify them.

Figure 8.


Schematic depiction of the PCCD architecture with generation and training processes

(A) The data flow of the training process and generation process.

(B①) A step in the model's training process corresponds to (A①). (B②) A step in the denoising process corresponds to (A②).

The training process begins with $x_0$ and gradually adds noise $\epsilon_1, \epsilon_2, \ldots, \epsilon_{T-1}, \epsilon_T$ to $x_0$, resulting in $x_1, x_2, \ldots, x_{T-1}, x_T$. Assuming that $x_0 \sim q(x_0)$ and that each noise term $\epsilon_t$ follows a normal distribution, then, for $t \geq 1$:

$$q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \qquad q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\, \sqrt{1-\beta_t}\,x_{t-1},\, \beta_t \mathbf{I}\right)$$ (Equation 1)

We follow the definition of J. Ho et al.20 Here, we define a constant variance schedule $\beta_1, \ldots, \beta_T$, where $\beta_t$ increases as $t$ increases. According to the reparameterization trick, Equation 1 can also be expressed as:

$$x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\epsilon$$ (Equation 2)

where $\epsilon \sim \mathcal{N}(0, \mathbf{I})$. We can thus obtain $x_t$ from $x_{t-1}$ through this probabilistic step. For simplicity, we define $\alpha_t = 1-\beta_t$, $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, and $\bar{\beta}_t = \prod_{s=1}^{t} \beta_s$. By applying Equation 2 recursively, we obtain, at any time $t$:

$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad q(x_t \mid x_0) = \mathcal{N}\left(x_t;\, \sqrt{\bar{\alpha}_t}\,x_0,\, (1-\bar{\alpha}_t)\mathbf{I}\right)$$ (Equation 3)

and the reverse process begins with $p(x_T) = \mathcal{N}(x_T;\, 0, \mathbf{I})$; this process denoises gradually as $p_\theta(x_0) = \int p_\theta(x_{0:T})\,dx_{1:T}$. In the reverse process $p_\theta$, we know the variance of every step but do not know the means $\mu_\theta$:

$$p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t), \qquad p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\, \mu_\theta(x_t, t),\, \beta_t \mathbf{I}\right)$$ (Equation 4)

Therefore, we need to determine $\mu_\theta$. It can be derived that20,39:

$$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right)$$ (Equation 5)

After the parameterization in Equation 5, for any $t \in [1, T]$:

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right) + \sigma_t z$$ (Equation 6)

$\epsilon_\theta$ is the model that needs to be trained, and $z \sim \mathcal{N}(0, \mathbf{I})$. This means that we can obtain $x_{t-1}$ from $x_t$ via $\epsilon_\theta$.

In a one-step noise addition process (Figure 8B①), the noise is composed of random numbers following a normal distribution. The mean and variance of this noise depend on the time $t$ and the preceding data $x_{t-1}$. Simultaneously, the PCCD actively learns the characteristics of this noise. During each iteration, the model receives data with noise, and its primary task is to predict the most recent noise addition. Consequently, we end up with two types of noise: one generated from a probabilistic approach and the other predicted by our deep learning model. By comparing these two noise sources, we can calculate a loss, which serves as feedback to the model, facilitating its adjustment and improvement. This iterative process continues until the model effectively learns to reproduce the noise characteristics, achieving accurate denoising.

In the noise-removal process, as shown in Figure 8B②, we only have data with noise, and our objective is to estimate and separate the noise from the data. In this way, we can separate the current noise and the previous data. At the macro level, this is a disorderly to orderly process (Figure S1).
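The one-step noising and one-step denoising described above can be sketched numerically. This is an illustrative NumPy sketch of Equations 2, 3, and 6, not the PCCD implementation; the linear β schedule endpoints are assumptions (taken as typical DDPM defaults). A quick consistency check: with ε = 0 at every step, applying Equation 2 iteratively reproduces the closed-form Equation 3.

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Variance schedule beta_1..beta_T with beta increasing in t, plus
    alpha_t = 1 - beta_t and alpha_bar_t = prod_{s<=t} alpha_s.
    Arrays are 0-indexed: index t holds the values for step t+1."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def q_sample(x0, t, alpha_bars, eps):
    """Closed-form forward noising (Equation 3):
    x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps."""
    ab = alpha_bars[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps

def p_sample_step(xt, t, betas, alphas, alpha_bars, eps_pred, z, sigma):
    """One reverse denoising step (Equation 6), where eps_pred is the
    noise predicted by the trained model eps_theta(x_t, t)."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (xt - coef * eps_pred) / np.sqrt(alphas[t])
    return mean + sigma * z
```

In training, `eps_pred` would come from the U-Net; the loss compares it against the noise actually drawn when forming `x_t`.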

In this context, we employ a U-Net model (Figure S2) to predict and separate noise from the data. U-Net, initially introduced in 2015,36 is a well-established model in the CV field that was notably acclaimed for its exceptional performance in image segmentation tasks. Our U-Net model is configured with five sets of upsampling and downsampling layers. To enhance its capacity to capture intrinsic data correlations, we incorporated intra-data correlation. This augmentation allows the model to effectively learn and predict noise, contributing to the denoising process.

Conclusion

We introduced a framework employing the denoising diffusion probabilistic model (DDPM) and point cloud representation for crystal structure generation. This versatile framework enables the generation of crystal structures composed of up to three elements and featuring up to 16 atom sites by specifying the elemental composition. To assess the framework’s validity, we successfully reconstructed a batch of structures randomly sampled from the training dataset, confirming its reliability. Furthermore, we applied this framework to generate a batch of structures comprising rare earth elements, alkaline earth elements, transition metal elements, and oxygen as an illustrative example. For the three batches of crystals generated by the PCCD, the percentages of structures with Ehull/atom less than 0.25 eV/atom were 39.44%, 61.80%, and 66.67%, respectively, and those with Ehull/atom less than 80 meV/atom were 8.90%, 16.35%, and 34.17%, respectively. Structures with certain special components are more abundant. In addition, the stability of several structures has been confirmed through phonon structure analysis (e.g., Ca2SnO4, LiMg6, and MgSc2O4). Consequently, we demonstrated the efficacy of utilizing the DDPM and point cloud representations in crystal structure generation, which was validated by DFT high-throughput calculations. This framework serves as a foundational step, offering potential for further enhancement and the development of larger models for inverse crystal design. Furthermore, this approach serves to expand the database of crystals.

Limitations of the study

The constraints supported by PCCD are still few, and truly controllable generation is not yet achievable. Introducing space group constraints to explore the chemical space of materials may be a valuable approach, not only to enhance the validity, novelty, and stability of the generated materials but also to improve the efficiency and effectiveness of the generative process in future research. Furthermore, the model should be advanced in the future to support inverse design capabilities for a broader range of chemical elements (beyond three), desired material properties, and experimental observations. Such development could facilitate the systematic exploration and discovery of materials with targeted functionalities, thereby addressing key challenges in materials science.

Resource availability

Lead contact

Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Shibing Chu (c@ujs.edu.cn).

Materials availability

This study did not generate new unique reagents.

Data and code availability

  • This paper analyzes existing, publicly available data. The accession information for these datasets is listed in the key resources table.

  • All original code has been deposited at the GitHub repository (https://github.com/lzhelin/CrystalDiffusion) and is publicly available.

  • All additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

Acknowledgments

This work gratefully acknowledges the National Natural Science Foundation of China (No. 11904137, 12074150 and 12174157) and the financial support from Jiangsu University (No. 4111190003). We gratefully acknowledge HZWTECH for providing computation facilities.

Author contributions

Z.L.: conceptualization, method, software, investigation, formal analysis, model validation, writing original draft; R.M.: editing, model validation and investigation; R.J., G.H., and J.S.: DFT calculation guide; S.C. and Y.C.: conceptualization, writing review, funding acquisition, resources and supervision.

Declaration of interests

The authors declare no competing interests.

STAR★Methods

Key resources table

REAGENT or RESOURCE SOURCE IDENTIFIER
Deposited data

The Materials Project Jain et al.2 https://doi.org/10.1063/1.4812323
https://next-gen.materialsproject.org/

Software and algorithms

CrystalDiffusion This paper; original code for reported results. https://github.com/lzhelin/CrystalDiffusion
https://zenodo.org/records/10570395
CSPBenchMetrics Wei et al.27 https://doi.org/10.1016/j.commatsci.2024.112802
https://github.com/usccolumbia/CSPBenchMetrics
VASP VASP Software GmbH https://www.vasp.at/

Method details

Data collection

For this study, we curated the dataset from the Materials Project, focusing on structures with ternary, binary, or single-element compositions and limiting the number of atomic sites to a maximum of 16. This filtering process resulted in a dataset of 52,028 materials. Each entry in this dataset includes detailed information such as the POSCAR file, band gap, magnetic properties, crystal system, and magnetic ordering, among other key attributes.
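The filtering criteria above (at most three elements, at most 16 atomic sites) can be sketched as a simple predicate applied to candidate entries. The field names `nelements` and `nsites` follow common Materials Project conventions, but the records below are illustrative assumptions, not the actual API response format:

```python
# Sketch of the dataset filter described above: keep structures with
# ternary, binary, or single-element compositions and at most 16 sites.
# The dict fields (nelements, nsites) are assumed, illustrative names.

def keep_entry(entry, max_elements=3, max_sites=16):
    """Return True if a candidate structure passes the dataset filter."""
    return entry["nelements"] <= max_elements and entry["nsites"] <= max_sites

candidates = [
    {"formula": "MgMnO3",    "nelements": 3, "nsites": 5},   # ternary, 5 sites: keep
    {"formula": "NaAlSi2O6", "nelements": 4, "nsites": 10},  # quaternary: drop
    {"formula": "LiMg6",     "nelements": 2, "nsites": 7},   # binary: keep
]
dataset = [c for c in candidates if keep_entry(c)]
```

Applying the same predicate to the full Materials Project dump yields the 52,028-entry dataset described above.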

Model training details

In PCCD, we employ the U-Net for noise prediction (Figure S2). It primarily comprises four up-sampling stages and four down-sampling stages, with each stage consisting of multiple convolutional layers and self-attention layers.

As depicted in Equation 3, for each step t during the training process, we can calculate xt once the noise ϵt ∼ N(0, 1) is given. The objective of the U-Net is to estimate the noise term ϵt from xt. For the U-Net loss (Figure S3), we use the mean absolute error (MAE, Equation 7) to quantify the discrepancy between the U-Net output and ϵt. Further hyperparameter details for training are given in Table S1 of the supplemental information.

The loss function is as follows.

$$\mathrm{Loss} = \mathbb{E}_{t,\, x_0,\, \epsilon}\!\left[\left\lVert \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\; t\right)\right\rVert_1\right]$$ (Equation 7)

Here, t is the timestep, x0 is the original (training) data without noise, and ϵ is a random matrix with the same shape as x0, drawn from a standard normal distribution.
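A minimal numpy sketch of the forward noising step and the MAE objective of Equation 7, with a placeholder in place of the trained U-Net predictor ϵθ (the linear beta schedule is an illustrative assumption; the paper's actual schedule may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule (illustrative values, not necessarily the paper's).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)          # cumulative product: \bar{alpha}_t

def forward_noise(x0, t, eps):
    """x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps (closed-form noising)."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def mae_loss(eps, eps_pred):
    """Mean absolute error between true and predicted noise (Equation 7)."""
    return np.mean(np.abs(eps - eps_pred))

x0 = rng.random((128, 3))               # one point cloud channel: 128 points in 3D
eps = rng.standard_normal(x0.shape)
xt = forward_noise(x0, t=500, eps=eps)
loss = mae_loss(eps, eps_pred=np.zeros_like(eps))  # placeholder predictor
```

In training, `eps_pred` would be the U-Net output for `(xt, t)`, and the MAE would be backpropagated through the network.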

Data expression example

Taking MgMnO3 as an example (Figure 7), the data are generated by our diffusion model. In the first channel, the 128 data points can clearly be classified into five categories: (0, 0.5, 0.5), (0.5, 0, 0.5), (0.5, 0.5, 0), (0, 0, 0), and (0.5, 0.5, 0.5). We cluster them with Density-Based Spatial Clustering of Applications with Noise (DBSCAN), because the number of groups in our data is not defined in advance. From this channel we know that one atom sits at (0, 0, 0), one atom sits at the center of the crystal body, and three atoms sit at the centers of the crystal faces; however, no element or lattice information has been given yet. The structure could be a carbon system or a Ca-Ti-O system, and it could be a triclinic system or a hexagonal system. The second channel gives the element information for the 128 points one by one. As seen in Figure 7, the first channel divides into five classes of data while the second divides into three; because the two channels share the same point indices, the three element classes further partition the five position categories by one-to-one correspondence (Figure 7 shows how the classes are aligned). At this point, the element and the relative coordinate in the unit cell of every atom are confirmed. Before training, the element information was supplied by inputting a list (Mg, Mn, O), encoded in the second channel as (1, 0, 0), (0, 1, 0), and (0, 0, 1) for Mg, Mn, and O, respectively. The shape of the lattice (the crystal system) is still unknown at this stage; it could be a cubic system or a trigonal system. Ignoring the first two channels, we process the third channel directly.
Theoretically, the third channel can be aggregated into two categories: the lengths a, b, c and the angles α, β, γ of the lattice. During training and data preprocessing, all samples share the same template for the third channel, so no additional clustering algorithm is needed; we simply average every column of the front half (α, β, and γ) and of the back half (a, b, and c). The data we obtain are approximately (0.50, 0.50, 0.50) and (0.25, 0.25, 0.25). (All values are below 1 because, before training, the lattice lengths were normalized by dividing by 15 Å and the angles were expressed in radians and divided by 2π.) The three lattice vectors can then be calculated as follows.

$$\mathbf{a} = a\,(1,\ 0,\ 0)$$ (Equation 8)
$$\mathbf{b} = b\,(\cos\gamma,\ \sin\gamma,\ 0)$$ (Equation 9)
$$\mathbf{c} = c\left(\cos\beta,\ \frac{\cos\alpha - \cos\beta\cos\gamma}{\sin\gamma},\ \frac{\sqrt{1 + 2\cos\alpha\cos\beta\cos\gamma - \cos^2\alpha - \cos^2\beta - \cos^2\gamma}}{\sin\gamma}\right)$$ (Equation 10)
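Equations 8–10 can be checked directly in code; the sketch below rebuilds the lattice vectors from (a, b, c, α, β, γ) and, for the cubic cell recovered in this example, reproduces the 3.75 Å axes:

```python
import numpy as np

def lattice_vectors(a, b, c, alpha, beta, gamma):
    """Build lattice vectors from lengths (Angstrom) and angles (radians),
    following Equations 8-10."""
    ca, cb, cg, sg = np.cos(alpha), np.cos(beta), np.cos(gamma), np.sin(gamma)
    va = a * np.array([1.0, 0.0, 0.0])
    vb = b * np.array([cg, sg, 0.0])
    # z-component of c from the volume relation under the triclinic convention
    cz = np.sqrt(1.0 + 2.0 * ca * cb * cg - ca**2 - cb**2 - cg**2) / sg
    vc = c * np.array([cb, (ca - cb * cg) / sg, cz])
    return va, vb, vc

# Cubic example from the text: a = b = c = 3.75 Angstrom, all angles 90 degrees.
va, vb, vc = lattice_vectors(3.75, 3.75, 3.75, *([np.pi / 2] * 3))
# va, vb, vc are approximately (3.75, 0, 0), (0, 3.75, 0), (0, 0, 3.75)
```

The same function handles non-orthogonal cells, since Equations 8–10 are the general triclinic construction.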

The result is (3.75, 0, 0), (0, 3.75, 0), (0, 0, 3.75), from which we further determine that it is a cubic system with a side length of 3.75 Å. Integrating all of the above, we summarize:

  • 1.

    It’s a cubic system with side length of 3.75Å.

  • 2.

    The formula of this structure is MgMnO3.

  • 3.

    For every unit cell, there will be an oxygen atom on each face, a magnesium atom on each corner, and a manganese atom at the center.
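The first-channel clustering used in this example can be sketched with a minimal DBSCAN implementation. Periodic boundary conditions are ignored here for simplicity, and `eps`, `min_samples`, and the noise scale are illustrative choices, not the paper's settings:

```python
import numpy as np

def dbscan(points, eps=0.05, min_samples=4):
    """Minimal DBSCAN: returns one cluster label per point (-1 = noise)."""
    n = len(points)
    labels = np.full(n, -1)
    # Pairwise Euclidean distances (periodic images are not considered here).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_samples:
            continue                      # already assigned, or not a core point
        labels[i] = cluster               # start a new cluster at core point i
        queue = list(neighbors[i])
        while queue:                      # breadth-first density expansion
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_samples:
                    queue.extend(neighbors[j])
        cluster += 1
    return labels

# The five position groups from the MgMnO3 example, with small synthetic jitter.
centers = np.array([[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0],
                    [0.0, 0.0, 0.0], [0.5, 0.5, 0.5]])
rng = np.random.default_rng(1)
cloud = np.vstack([c + 0.003 * rng.standard_normal((25, 3)) for c in centers])
labels = dbscan(cloud)
n_groups = len(set(labels) - {-1})        # five position groups expected
```

Because the number of groups is not known in advance, a density-based method such as this is preferred over k-means, which requires the cluster count up front.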

Generation processes example

Also taking MgMnO3 as an example (Figure S1), different colors of points denote different atoms (the second data channel); the third channel is ignored for better visualization. The whole inference process takes 1,000 steps. At t = 999, which is the last step of training and the first step of generation, the data are random numbers drawn from a normal distribution, and the points at this step are disorganized. As t approaches 0, points of the same color gradually come together; from t = 200, there are already rudiments of clustering. At t = 0, five clear groups can be seen, meaning that this structure has these five atoms, with both positions and elements determined.
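The 1,000-step reverse process can be sketched with the standard DDPM sampling update. A zero predictor stands in for the trained U-Net, so this loop only illustrates the control flow, not the learned denoising; the beta schedule and σt = √βt choice are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear beta schedule over 1000 steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def predict_noise(x, t):
    """Placeholder for the trained U-Net epsilon_theta(x_t, t)."""
    return np.zeros_like(x)

# Start from pure Gaussian noise at t = T - 1 and denoise down to t = 0.
x = rng.standard_normal((128, 3))
for t in range(T - 1, -1, -1):
    eps = predict_noise(x, t)
    # Posterior mean of x_{t-1} given x_t and the predicted noise.
    mean = (x - (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bar[t]) * eps) \
        / np.sqrt(alphas[t])
    z = rng.standard_normal(x.shape) if t > 0 else 0.0  # no noise at the last step
    x = mean + np.sqrt(betas[t]) * z    # sigma_t = sqrt(beta_t), a common choice
```

With a trained predictor in place of `predict_noise`, the final `x` would be the generated point cloud whose colored groups correspond to the atoms described above.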

DFT configuration

The structures were optimized with density functional theory (DFT) calculations carried out using the Vienna ab initio simulation package (VASP). The Perdew–Burke–Ernzerhof (PBE) functional within the generalized gradient approximation (GGA) was used for the exchange–correlation functional. The kinetic energy cutoff for the plane-wave basis set was set to 520 eV, and the electronic wavefunctions were obtained using the projector augmented-wave method. The Monkhorst–Pack k-mesh grids were selected with vaspkit.
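The settings described above correspond to an INCAR along the following lines. Only ENCUT = 520 and the PBE (GGA) functional come from the text; the remaining tags and values are illustrative assumptions for a typical GGA-PBE structure relaxation:

```text
# Representative INCAR sketch (only ENCUT and GGA = PE are taken from the text)
PREC   = Accurate
ENCUT  = 520        # plane-wave kinetic energy cutoff, eV
GGA    = PE         # Perdew-Burke-Ernzerhof (PBE) exchange-correlation
IBRION = 2          # conjugate-gradient ionic relaxation (assumed)
ISIF   = 3          # relax ions, cell shape, and volume (assumed)
EDIFF  = 1E-6       # electronic convergence criterion (assumed)
ISMEAR = 0          # Gaussian smearing (assumed)
SIGMA  = 0.05       # smearing width, eV (assumed)
```

The k-point mesh itself would be written to KPOINTS, here generated by vaspkit as stated above.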

Quantification and statistical analysis

There are no quantification or statistical analyses to include in this paper.

Published: December 20, 2024

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.isci.2024.111659.

Contributor Information

Shibing Chu, Email: c@ujs.edu.cn.

Yuanping Chen, Email: chenyp@ujs.edu.cn.

Supplemental information

Document S1. Figures S1–S3 and Table S1
mmc1.pdf (301.4KB, pdf)

References

  • 1.Wang Y., Lv J., Zhu L., Ma Y. CALYPSO: A method for crystal structure prediction. Comput. Phys. Commun. 2012;183:2063–2070. doi: 10.1016/j.cpc.2012.05.008. [DOI] [Google Scholar]
  • 2.Jain A., Ong S.P., Hautier G., Chen W., Richards W.D., Dacek S., Cholia S., Gunter D., Skinner D., Ceder G., Persson K.A. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. Apl. Mater. 2013;1:011002. doi: 10.1063/1.4812323. [DOI] [Google Scholar]
  • 3.Pyzer-Knapp E.O., Suh C., Gómez-Bombarelli R., Aguilera-Iparraguirre J., Aspuru-Guzik A. What Is High-Throughput Virtual Screening? A Perspective from Organic Materials Discovery. Annu. Rev. Mater. Res. 2015;45:195–216. doi: 10.1146/annurev-matsci-070214-020823. [DOI] [Google Scholar]
  • 4.Zhao Y., Siriwardane E.M.D., Wu Z., Fu N., Al-Fahdi M., Hu M., Hu J. Physics guided deep learning for generative design of crystal materials with symmetry constraints. npj Comput. Mater. 2023;9:38. doi: 10.1038/s41524-023-00987-9. [DOI] [Google Scholar]
  • 5.Ren Z., Tian S.I.P., Noh J., Oviedo F., Xing G., Li J., Liang Q., Zhu R., Aberle A.G., Sun S., et al. An invertible crystallographic representation for general inverse design of inorganic crystals with targeted properties. Matter. 2022;5:314–335. doi: 10.1016/j.matt.2021.11.032. [DOI] [Google Scholar]
  • 6.Xie T., Fu X., Ganea O.-E., Barzilay R., Jaakkola T. Crystal Diffusion Variational Autoencoder for Periodic Material Generation. arXiv. 2021 doi: 10.48550/arXiv.2110.06197. Preprint at. [DOI] [Google Scholar]
  • 7.Kim S., Noh J., Gu G.H., Aspuru-Guzik A., Jung Y. Generative Adversarial Networks for Crystal Structure Prediction. ACS Cent. Sci. 2020;6:1412–1420. doi: 10.1021/acscentsci.0c00426. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Kim B., Lee S., Kim J. Inverse design of porous materials using artificial neural networks. Sci. Adv. 2020;6 doi: 10.1126/sciadv.aax9324. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Sanchez-Lengeling B., Aspuru-Guzik A. Inverse molecular design using machine learning: Generative models for matter engineering. Science. 2018;361:360–365. doi: 10.1126/science.aat2663. [DOI] [PubMed] [Google Scholar]
  • 10.Merchant A., Batzner S., Schoenholz S.S., Aykol M., Cheon G., Cubuk E.D. Scaling deep learning for materials discovery. Nature. 2023;624:80–85. doi: 10.1038/s41586-023-06735-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Xie T., Grossman J.C. Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. Phys. Rev. Lett. 2018;120 doi: 10.1103/PhysRevLett.120.145301. [DOI] [PubMed] [Google Scholar]
  • 12.Liu Y., Zhao T., Yang G., Ju W., Shi S. The onset temperature (Tg) of AsxSe1−x glasses transition prediction: A comparison of topological and regression analysis methods. Comput. Mater. Sci. 2017;140:315–321. [Google Scholar]
  • 13.Fernandez M., Boyd P.G., Daff T.D., Aghaji M.Z., Woo T.K. Rapid and Accurate Machine Learning Recognition of High Performing Metal Organic Frameworks for CO2 Capture. J. Phys. Chem. Lett. 2014;5:3056–3060. doi: 10.1021/jz501331m. [DOI] [PubMed] [Google Scholar]
  • 14.Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y. Generative Adversarial Nets. arXiv. 2014 doi: 10.48550/arXiv.1406.2661. Preprint at. [DOI] [Google Scholar]
  • 15.Kingma D.P., Welling M. Auto-Encoding Variational Bayes. arXiv. 2013 doi: 10.48550/arXiv.1312.6114. Preprint at. [DOI] [Google Scholar]
  • 16.Hoffmann J., Maestrati L., Sawada Y., Tang J., Sellier J.M., Bengio Y. Data-Driven Approach to Encoding and Decoding 3-D Crystal Structures. arXiv. 2019 doi: 10.48550/arXiv.1909.00949. Preprint at. [DOI] [Google Scholar]
  • 17.Luo X., Wang Z., Gao P., Lv J., Wang Y., Chen C., Ma Y. Deep learning generative model for crystal structure prediction. arXiv. 2024 doi: 10.48550/arXiv.2403.10846. Preprint at. [DOI] [Google Scholar]
  • 18.Zeni C., Pinsler R., Zügner D., Fowler A., Horton M., Fu X., Shysheya S., Crabbé J., Sun L., Smith J., et al. MatterGen: a generative model for inorganic materials design. arXiv. 2023 doi: 10.48550/arXiv.2312.03687. Preprint at. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Cao Y., Li S., Liu Y., Yan Z., Dai Y., Yu P.S., Sun L. A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT. arXiv. 2023 doi: 10.48550/arXiv.2303.04226. Preprint at. [DOI] [Google Scholar]
  • 20.Ho J., Jain A., Abbeel P. Denoising Diffusion Probabilistic Models. arXiv. 2020 doi: 10.48550/arXiv.2006.11239. Preprint at. [DOI] [Google Scholar]
  • 21.Ramesh A., Dhariwal P., Nichol A., Chu C., Chen M. Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv. 2022 doi: 10.48550/arXiv.2204.06125. Preprint at. [DOI] [Google Scholar]
  • 22.Nichol A., Jun H., Dhariwal P., Mishkin P., Chen M. Point-E: A System for Generating 3D Point Clouds from Complex Prompts. arXiv. 2022 doi: 10.48550/arXiv.2212.08751. Preprint at. [DOI] [Google Scholar]
  • 23.Balaji Y., Nah S., Huang X., Vahdat A., Song J., Zhang Q., Kreis K., Aittala M., Aila T., Laine S., et al. eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers. arXiv. 2022 doi: 10.48550/arXiv.2211.01324. Preprint at. [DOI] [Google Scholar]
  • 24.Kawar B., Zada S., Lang O., Tov O., Chang H., Dekel T., Mosseri I., Irani M. Imagic: Text-Based Real Image Editing with Diffusion Models. arXiv. 2022 doi: 10.48550/arXiv.2210.09276. Preprint at. [DOI] [Google Scholar]
  • 25.Yang L., Zhang Z., Song Y., Hong S., Xu R., Zhao Y., Zhang W., Cui B., Yang M.-H. Diffusion Models: A Comprehensive Survey of Methods and Applications. arXiv. 2022 doi: 10.48550/arXiv.2209.00796. Preprint at. [DOI] [Google Scholar]
  • 26.Ebrahimi T., Alexiou E. Vol. 10396. 2017. On the performance of metrics to predict quality in point cloud representations; pp. 282–297. (Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series). [Google Scholar]
  • 27.Wei L., Li Q., Omee S.S., Hu J. Towards quantitative evaluation of crystal structure prediction performance. Comput. Mater. Sci. 2024;235 doi: 10.1016/j.commatsci.2024.112802. [DOI] [Google Scholar]
  • 28.Noh J., Kim J., Stein H.S., Sanchez-Lengeling B., Gregoire J.M., Aspuru-Guzik A., Jung Y. Inverse Design of Solid-State Materials via a Continuous Representation. Matter. 2019;1:1370–1384. [Google Scholar]
  • 29.Kresse G., Furthmüller J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B Condens. Matter. 1996;54:11169–11186. doi: 10.1103/physrevb.54.11169. [DOI] [PubMed] [Google Scholar]
  • 30.Kresse G. Ab initio molecular dynamics for liquid metals. J. Non-Cryst. Solids. 1995;47:558. doi: 10.1016/0022-3093(95)00355-X. [DOI] [PubMed] [Google Scholar]
  • 31.Kresse G., Joubert D. From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B. 1999;59:1758–1775. [Google Scholar]
  • 32.Perdew J.P., Burke K., Ernzerhof M. Generalized Gradient Approximation Made Simple. Phys. Rev. Lett. 1996;77:3865–3868. doi: 10.1103/PhysRevLett.77.3865. [DOI] [PubMed] [Google Scholar]
  • 33.Ong S.P., Richards W.D., Jain A., Hautier G., Kocher M., Cholia S., Gunter D., Chevrier V.L., Persson K.A., Ceder G. Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Comput. Mater. Sci. 2013;68:314–319. [Google Scholar]
  • 34.Blöchl P.E. Projector augmented-wave method. Phys. Rev. B. 1994;50:17953–17979. doi: 10.1103/PhysRevB.50.17953. [DOI] [PubMed] [Google Scholar]
  • 35.Cheng S., Li Z., Zhang H., Yan X., Chu S. Discovery of magnesium-aluminum alloys by generative model and automatic differentiation approach. Model. Simulat. Mater. Sci. Eng. 2024;32 doi: 10.1088/1361-651X/ad38d0. [DOI] [Google Scholar]
  • 36.Ronneberger O., Fischer P., Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv. 2015 doi: 10.48550/arXiv.1505.04597. Preprint at. [DOI] [Google Scholar]
  • 37.Cho S.-J., Ji S.-W., Hong J.-P., Jung S.-W., Ko S.-J. Rethinking Coarse-to-Fine Approach in Single Image Deblurring. arXiv. 2021 doi: 10.48550/arXiv.2108.05054. Preprint at. [DOI] [Google Scholar]
  • 38.Franani A.O. Analysis of the performance of U-Net neural networks for the segmentation of living cells. arXiv. 2022 doi: 10.48550/arXiv.2210.01538. Preprint at. [DOI] [Google Scholar]
  • 39.Luo C. Understanding Diffusion Models: A Unified Perspective. arXiv. 2022 doi: 10.48550/arXiv.2208.11970. Preprint at. [DOI] [Google Scholar]
