Abstract

Grain boundaries (GBs) in two-dimensional (2D) materials are known to dramatically impact material properties, ranging from the physical, chemical, and mechanical to the electronic and optical. Predicting a range of physically realistic GB structures for 2D materials is critical to exercising control over their properties. This, however, is nontrivial given the vast structural and configurational (defect) search space between lateral 2D sheets with varying misfits. Here, in a departure from traditional evolutionary search methods, we introduce a workflow that combines a graph neural network (GNN) with an evolutionary algorithm for the discovery and design of novel 2D lateral interfaces. We use a representative 2D material, blue phosphorene (BP), and identify 2D GB structures to test the efficacy of our GNN model. The GNN was trained with a computationally inexpensive machine-learned bond-order potential (Tersoff formalism) and density functional theory (DFT). Systematic downsampling of the training data sets indicates that our model can predict structural energies with a mean absolute error under 0.5% using sparse (<2000) DFT-generated energy labels for training. We further couple the GNN model with a multiobjective genetic algorithm (MOGA) and demonstrate that the GNN accurately predicts GB structures during the evolutionary search. Our method is generalizable and material agnostic and is anticipated to accelerate the discovery of 2D GB structures.
Keywords: graph neural networks, genetic algorithm, 2D materials, grain boundary, blue phosphorene, machine learning, first-principles simulation
Introduction
Grain boundaries (GBs) and defects in two-dimensional (2D) materials have a significant impact on material properties compared to pristine monolayers, such as the decrease in tensile strength associated with the inflection of a 2D sheet at a GB1 and changes to charge carrier transmission in the GB region of graphene.2 GBs and defects are also difficult to control during synthesis,3 such as chemical vapor deposition (CVD) of graphene,4 molybdenum disulfide,5 and tungsten disulfide.6 Also of high interest is the fabrication of lateral heterostructures (LHS) for novel electronic applications,7−9 including semiconducting devices such as bipolar transistors10 and field-effect transistors.11 Therefore, understanding the relationship between a desired property and the atomic structure at a lateral interface is fundamental to avoiding performance loss due to a defect or, conversely, to taking advantage of the property tuning afforded by LHS and 2D materials. However, before computational tools can be systematically applied to study the material properties of interface defects or LHS, one must search for energetically stable atomic structures, as the formation energy is the key indicator of a material's stability. Traditional methods for new material discovery rely heavily on trial and error, which is extremely inefficient.
In our previous study,12 we introduced a workflow based on a genetic algorithm (GA) for systematically searching through the possible GBs or interfaces of 2D materials. We used graphene as the benchmark material owing to the wide breadth of research on graphene GBs, including mechanical properties studied by Zhang et al.,1 electronic properties studied by Yazyev et al.,2 and CVD synthesis of graphene by Tan et al.4 We showed that the workflow produces energetically stable GB structures for graphene while preserving a prescribed level of diversity. A statistical study of the topological features of the GB structures also showed that the workflow yields information consistent with previous research results.3,13 However, despite its many advantages, the GA has a major limitation when reliable surrogates for energy evaluation are unavailable. Density functional theory (DFT) simulations offer accurate predictions of the energy of low-dimensional structures, but integrating DFT with the GA search is computationally prohibitive. Although libraries of empirical potentials14−17 are available as surrogates, many are designed for pristine 2D sheets and do not generalize to defective 2D materials. Thus, one can only take full advantage of the GA workflow for a stable GB structure search when the energy predictions are reliable.
Machine learning (ML) methods such as the Graph Neural Network (GNN) have been implemented to predict the formation energy of nanoscale systems and material properties such as the band gap and Fermi energy18−23 with prediction errors comparable to DFT calculations. 2D materials can be naturally described as graphs, where the atoms are the graph nodes and the chemical bonds are the edges. However, to the best of our knowledge, there has been no implementation of a GNN for energy prediction targeting defective 2D structures. As seen in Figure 1, the simulation supercell of a 2D interface is formed by the GB region in the middle and two pristine bulk regions with rotation angles (θL and θR for the left and right sides, respectively) on either side. We implement this methodology for generating GB structures in the GA. However, when searching at a fixed orientation, meaning θL and θR are fixed in a GA search, only the GB region changes, and the majority of the supercell remains identical because of the fixed bulk regions. This makes it harder for the GNN model to distinguish different structures. GNN models19,20,22,24−26 reported in the literature typically train and test on systems with multiple element types, for which a different atomic number can be assigned to each node as the node feature. In our case, when searching for GBs of a 2D material, all atoms are of the same element, and the GNN gains no information when the atomic number is used as the node feature. This further increases the difficulty of training an accurate GNN model for energy prediction of single-element 2D GBs.
Figure 1.
Top view of an example simulation supercell of a GB structure from the GA search. Heptagons and pentagons are labeled in yellow and blue, respectively. The bulk region of the supercell is shown in red, and the GB region atoms and boundary are shown in blue. The bulk region is formed by pristine BP sheets rotated at a certain angle (θL, θR) with respect to the zigzag orientation. The atoms and the boundary of the bulk region are colored in black.
To overcome these obstacles and develop a reliable surrogate for the GA search workflow introduced in previous work,12 we first transfer the generated 2D GB structures into graphs and build a GNN model whose convolution layers are based on the Graph Isomorphism Network (GIN),27 which was proposed by Xu et al. and shown to have the highest discriminative power among classic GNNs.
The data sets for training this model were generated and utilized in two ways. Considering the high cost of DFT simulations, we first demonstrate that our model can replicate the energy hypersurface of a Tersoff potential fitted for blue phosphorene (BP).28 We generated a data set that included GB structures that were geometrically optimized (relaxed) by the Tersoff potential and structures that were not (unrelaxed). The total energy of all structures was evaluated using the Tersoff potential, and this data set is therefore referred to as the complete Tersoff data set. Because both relaxed and unrelaxed structures are included, the total energies cover a broad range, which allows our GNN model to train on diverse structures and improves the prediction accuracy for lower-energy structures. As discussed in the “Complete Tersoff Data Set and Model Performance” section, by gradually increasing the number of structures used for training, we found that 950 unrelaxed structures and 950 relaxed structures were sufficient for the GNN model to reach an optimum. With this training set, the mean absolute error (MAE) of the Tersoff-potential energy prediction was 1.617 eV for unrelaxed and 1.507 eV for relaxed structures.
After the GNN model’s accuracy was validated on the Tersoff-generated data set, we trained the GNN model on the DFT energy hypersurface for a more accurate prediction of the total energy. We sampled structures from the complete Tersoff data set as discussed in the “Single Shot DFT Data Set and Model Performance” section. Approximately 2200 unrelaxed structures and 2200 structures relaxed using the Tersoff potential were selected, and the total energy of each sampled structure was evaluated by single shot DFT calculations. We therefore name this data set the single shot DFT data set (SS DFT data set). As with the complete Tersoff data set, we confirmed that 950 structures of each type are sufficient for training on the SS DFT data set. Our GNN model achieved a percentage error of approximately 0.5% for the total energy prediction on the test set, which shows that the GNN model “learned” the DFT energy hypersurface with good accuracy.
We then combined the GNN model with the GA search. To avoid trapping the search in a local minimum, we introduced a novelty measure of the structures as one of the fitness parameters and performed a multiobjective GA (MOGA) search,29 as discussed in the Methods section. We compared MOGA searches using the GNN model as the surrogate against MOGA searches based on the Tersoff potential. The results for the two surrogates were consistent: searches with the two approaches converged to the same structure.
Similarly, the GNN model trained with the SS DFT data set was used as the MOGA surrogate for the search. Considering that the GA is a stochastic search method, three independent searches were performed and analyzed as discussed in the “DFT Trained GNN for the MOGA Search” section. For these three searches, we extracted the structures in the Pareto front and obtained their total energies using DFT calculations as the reference for comparison. By comparing the MAE of the GNN model with the DFT total energy differences between the structures, we confirmed that the GNN model can distinguish among the different GB structures generated during the MOGA search. Previous work by Kirklin et al.30 assessed the general accuracy of DFT against experimental results by calculating the MAE of formation energies. Based on the structures extracted from the Pareto fronts of the searches, we also compared the GNN-predicted formation energies with the DFT-calculated ones, and the resulting MAE is comparable with the above-mentioned MAE for DFT-evaluated formation energies. This indicates that our GNN model provides a good estimate of DFT total energy values at orders of magnitude lower computational cost, thus enabling an evolutionary search of 2D GBs that was previously prohibitive with DFT simulations.
Methods
GNN Model Architecture
The first step in developing the GNN model was creating the 2D sheets with GB defects. As shown in Figure 1, defects are formed within a GB region in the middle of the simulation supercell, while most of the simulation cell is dominated by the pristine 2D nanomaterial. The bulk regions prevent the influence of neighboring GBs and satisfy a semi-infinite boundary condition. However, this methodology for constructing the GB structure also introduces similarities among all the GB structures generated for a fixed combination of interfacing nanosheets. Compared to previous research implementing GNNs for property prediction,18−23 our data set includes deceiving structures with similar atomic arrangements. When searching for atomic structures of a GB, there is no atomic number difference between the atoms, which makes the structures harder to distinguish. The total energy of each structure is the target value for the GNN model, making this a graph-level regression problem. Therefore, the ability to tell graphs apart and to represent distinct GB structures by different high-dimensional vectors in the latent space is crucial. Among GNN operators, we took advantage of the theoretically proven expressive power of the GIN model from Xu et al.,27 in which multilayer perceptrons (MLP) are used to update the node embedding at each graph convolution layer, as shown in eq 1:
$$V_i^{(k)} = \mathrm{MLP}^{(k)}\!\left(\left(1 + \epsilon^{(k)}\right) V_i^{(k-1)} + \sum_{j \in \mathcal{N}(i)} V_j^{(k-1)}\right) \tag{1}$$
where $V_i^{(k)}$ is the feature vector of the $i$th node in the $k$th layer, $\mathcal{N}(i)$ is the set of neighbors of node $i$, and $\epsilon^{(k)}$ is a learnable scalar. The initial feature vectors of the nodes are constructed according to eq 2:
$$V_i^{(0)} = \left(X, Y, Z, x_i, y_i, z_i\right) \tag{2}$$
Since a cuboid supercell is used, the vector $(X, Y, Z)$ is composed of the supercell dimensions in Cartesian coordinates. The vector $(x_i, y_i, z_i)$ is composed of the fractional coordinates of the $i$th atom, where each coordinate is normalized with respect to the corresponding size of the supercell. The node feature is thus constructed as the concatenation of these two vectors. The initial node vectors $V_i^{(0)}$ are then processed with $r$ convolution layers, and the connectivity of the nodes in the graph is learned and stored in high-dimensional node embedding vectors $V_i$. To gather the information from all the nodes in the graph, a pooling layer described by eq 3 below is used to generate the feature vector representing the whole graph:
$$V_g = \sum_{i=1}^{N} V_i \tag{3}$$
Here, $V_g$ is the graph-level feature vector that gathers the information from the node embedding vectors of all $N$ nodes. It was shown in previous work by Xu et al.27 that pooling layers using the average or maximum value may fail to distinguish certain distinct graphs. Therefore, we implement the summation pooling layer for maximum expressive power.
The last part of the GNN architecture is a set of fully connected layers, which decode the connectivity information from the vector $V_g$ generated by the previous layers and predict the target value. The schematic of the architecture is shown in Figure 2: the structures are first transferred into graphs with atomic position information attached to each node as described above; then $r$ layers of graph convolution aggregate the neighboring information for each node and generate the high-dimensional node embeddings; finally, the node embedding vectors are fed into a global pooling layer to generate the vector that represents the whole graph, and the information contained in this vector is mapped into a single total energy prediction by an MLP.
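For concreteness, the conversion of a structure into such a graph can be sketched in Python with the ASE and PyTorch Geometric libraries used in this work; the file name and the 2.6 Å bond cutoff below are illustrative assumptions rather than values taken from this study.

```python
import torch
from ase.io import read
from torch_geometric.data import Data

def structure_to_graph(filename, cutoff=2.6):
    """Convert an atomic structure file into a PyG graph.

    Node features follow eq 2: the Cartesian supercell dimensions
    (X, Y, Z) concatenated with each atom's fractional coordinates
    (x_i, y_i, z_i). The cutoff (in Angstrom) is an assumed value.
    """
    atoms = read(filename)
    cell = atoms.cell.lengths()               # (X, Y, Z) of the cuboid supercell
    frac = atoms.get_scaled_positions()       # fractional coordinates
    x = torch.tensor([[*cell, *f] for f in frac], dtype=torch.float)

    # Edges: connect atoms closer than the cutoff, respecting periodicity.
    dmat = atoms.get_all_distances(mic=True)  # minimum-image distances
    n = len(atoms)
    src, dst = [], []
    for a in range(n):
        for b in range(n):
            if a != b and dmat[a, b] < cutoff:
                src.append(a)
                dst.append(b)
    edge_index = torch.tensor([src, dst], dtype=torch.long)
    return Data(x=x, edge_index=edge_index)

# graph = structure_to_graph("gb_structure.xyz")  # placeholder file name
```

The double loop is O(N²) but adequate for supercells of a few hundred atoms.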
Figure 2.
Schematic of the GNN model architecture. The architecture is divided into three parts. Details are introduced in the GNN Model Architecture section.
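Putting the pieces together, a minimal PyTorch Geometric sketch of this three-part architecture is shown below. The layer count and MLP width match the hyperparameters reported later for the Tersoff data set, but details such as the placement of dropout and activations are our assumptions.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GINConv, global_add_pool

class GBEnergyGNN(nn.Module):
    """GIN-based graph regressor: r convolutions -> sum pooling -> MLP."""

    def __init__(self, in_dim=6, hidden=600, num_layers=8):
        super().__init__()
        self.convs = nn.ModuleList()
        for k in range(num_layers):
            mlp = nn.Sequential(               # two-layer MLP per GIN layer (eq 1)
                nn.Linear(in_dim if k == 0 else hidden, hidden),
                nn.ReLU(),
                nn.Linear(hidden, hidden),
            )
            self.convs.append(GINConv(mlp, train_eps=True))
        self.head = nn.Sequential(             # two fully connected layers
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Dropout(0.25),
            nn.Linear(hidden, 1),              # single total-energy prediction
        )

    def forward(self, x, edge_index, batch):
        for conv in self.convs:
            x = torch.relu(conv(x, edge_index))
        vg = global_add_pool(x, batch)         # summation pooling (eq 3)
        return self.head(vg).squeeze(-1)
```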
Multiobjective Searching GA
As mentioned in our previous work,12 the GA search of 2D GB structures relies on empirical potentials, implemented with the large-scale atomic/molecular massively parallel simulator (LAMMPS)31 code, which allows structural relaxations for all structures generated during the search. These relaxations help the search escape local minima and avoid generating similar GB structures that would reduce diversity. However, the GNN model trained using the above-mentioned method does not support geometry optimization, as it is trained only for single shot total energy evaluation of a defective 2D material structure. To overcome this problem, we added a novelty score for each structure as a fitness parameter in addition to the formation energy. With multiple fitness parameters considered during the search, we updated the original GA to a MOGA search method.29 Among available algorithms, we use the nondominated sorting genetic algorithm II (NSGA-II)32 implemented in the distributed evolutionary algorithms in Python (DEAP)33 package. At each generation, individuals are selected considering both novelty and formation energy. Two archives of structures are updated after each generation: the first keeps a record of the Pareto front over all individuals, while the second records the structures with the lowest formation energy. The formation energy Ef of each structure is evaluated using eq 4:
$$E_f = \frac{E_t - N e}{2L} \tag{4}$$
where $E_t$ is the evaluated total energy, $N$ is the total number of atoms in the GB structure, $e$ is the energy per atom of pristine blue phosphorene, $L$ is the length of the GB, and the factor of 2 accounts for the two GBs in one supercell.
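The multiobjective selection and archiving can be set up with DEAP roughly as follows; this is a sketch under the assumption that evaluation functions for the two objectives exist, not the exact implementation used in this work.

```python
from deap import base, creator, tools

# Two objectives: minimize formation energy (eq 4), maximize novelty (eq 5).
creator.create("FitnessMulti", base.Fitness, weights=(-1.0, 1.0))
creator.create("Individual", list, fitness=creator.FitnessMulti)

toolbox = base.Toolbox()
toolbox.register("select", tools.selNSGA2)  # NSGA-II nondominated sorting

pareto_archive = tools.ParetoFront()        # archive 1: Pareto front
best_archive = tools.HallOfFame(10)         # archive 2: best formation energies

# Inside the generational loop (evaluation functions assumed to exist):
# for ind in population:
#     ind.fitness.values = (formation_energy(ind), novelty_score(ind))
# population = toolbox.select(population + offspring, k=len(population))
# pareto_archive.update(population)
# best_archive.update(population)
```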
To measure the novelty of structures, the eigenvalue similarity method from Koutra et al.34 was implemented as formulated in the following equation:
$$V_{\mathrm{novelty}} = \sqrt{\sum_{i=1}^{k} \left(\lambda_{1i} - \lambda_{2i}\right)^2} \tag{5}$$
for which $\lambda_1$ and $\lambda_2$ are the spectra of the graph Laplacians of the two graphs being compared, calculated from $\mathbf{L} = \mathbf{D} - \mathbf{A}$, where $\mathbf{D}$ is the diagonal degree matrix and $\mathbf{A}$ is the adjacency matrix. The $k$ in the equation is selected according to eq 6:
$$k = \min_{j} \left\{ k_j \;\middle|\; \frac{\sum_{i=1}^{k_j} \lambda_{ji}}{\sum_{i} \lambda_{ji}} > 0.9 \right\} \tag{6}$$
where $j = 1, 2$ labels the two graphs being compared. For each graph, the smallest integer $k_j$ is found such that the truncated spectrum retains more than 90% of the total eigenvalue sum, and the smaller of the two values is used as $k$. Graphs with similar connectivity have $V_{\mathrm{novelty}}$ close to 0, and increasing values indicate that the two structures are increasingly dissimilar. The novelty is evaluated between each structure and the other structures in the best-structure archive and the current population, and the average value is taken as the final novelty score. Using this novelty score as one of the search fitness parameters helps maintain diversity during the search. With this definition of novelty, we performed the MOGA search using both the Tersoff potential with the LAMMPS code and the GNN surrogates. Representative examples of structures and their corresponding novelty scores are shown in the Supporting Information.
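For illustration, eqs 5 and 6 translate into a few lines of NumPy/NetworkX as sketched below; the descending ordering of the spectra follows common usage of the eigenvalue similarity method and is our assumption.

```python
import networkx as nx
import numpy as np

def truncation_index(eigs, fraction=0.9):
    """Smallest k such that the partial eigenvalue sum exceeds
    `fraction` of the total (eq 6)."""
    csum = np.cumsum(eigs)
    return int(np.searchsorted(csum, fraction * csum[-1])) + 1

def novelty(g1: nx.Graph, g2: nx.Graph) -> float:
    """Eigenvalue similarity between two graphs (eq 5)."""
    spectra = []
    for g in (g1, g2):
        lap = nx.laplacian_matrix(g).toarray()         # L = D - A
        eigs = np.sort(np.linalg.eigvalsh(lap))[::-1]  # descending spectrum
        spectra.append(eigs)
    # eq 6: truncate both spectra at the smaller k of the two graphs
    k = min(truncation_index(s) for s in spectra)
    diff = spectra[0][:k] - spectra[1][:k]
    return float(np.sqrt(np.sum(diff ** 2)))
```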
LAMMPS and DFT Simulation Details
For energy evaluation and minimization of structures in LAMMPS (May 27, 2021 version), a Tersoff potential28 fitted for phosphorene was used. The minimization was stopped when the forces on all atoms fell below 10⁻⁶ eV/Å or after a maximum of 10⁵ evaluations of energy or forces.
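For reference, this relaxation step can be driven through the LAMMPS Python interface roughly as shown below; the data and potential file names are placeholders, while the force tolerance and the maximum number of evaluations match the values quoted above.

```python
from lammps import lammps  # LAMMPS Python wrapper

lmp = lammps()
lmp.commands_string("""
units           metal
boundary        p p p
read_data       gb_structure.data       # placeholder structure file
pair_style      tersoff
pair_coeff      * * P.tersoff P         # placeholder potential file for phosphorene
min_style       cg
# stop when all force components fall below 1e-6 eV/Angstrom,
# or after 1e5 energy/force evaluations
minimize        0.0 1.0e-6 100000 100000
""")
energy = lmp.get_thermo("pe")           # relaxed potential energy
```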
The DFT simulations were performed with the projector augmented-wave (PAW) method35 implemented in the VASP36 code (5.4.1). The PBE functional37 was used with a kinetic energy cutoff of 500 eV. Considering the small variations in structure size due to the different widths of the GB region, the k-point spacing was set to approximately 0.035 Å⁻¹ along the periodic direction of all structures, and a single k-point was used along the nonperiodic directions. The global break condition for the electronic SC-loop was set to 10⁻⁶ eV. To simulate the semi-infinite plane of the 2D sheet for the GB, pristine BP sheets with a width of approximately 17 Å were added to each side of the GB. A vacuum with a thickness of 8 Å was added above and below the 2D sheet to avoid the influence of periodic neighboring images.
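These settings map onto the ASE Vasp calculator roughly as sketched below; the k-point count is derived from the quoted ~0.035 Å⁻¹ spacing assuming the 2π convention and that the GB line runs along the first cell vector, both of which are our assumptions.

```python
import math
from ase.io import read
from ase.calculators.vasp import Vasp

atoms = read("gb_structure.vasp")       # placeholder input file

# Number of k-points along the periodic (first) cell vector for a spacing
# of ~0.035 1/Angstrom, assuming the 2*pi convention for reciprocal space.
length = atoms.cell.lengths()[0]
nk = max(1, math.ceil(2 * math.pi / (0.035 * length)))

calc = Vasp(
    xc="pbe",        # PBE functional
    encut=500,       # kinetic energy cutoff (eV)
    ediff=1e-6,      # electronic SC-loop break condition (eV)
    kpts=(nk, 1, 1), # single k-point along nonperiodic directions
)
atoms.calc = calc
e_total = atoms.get_potential_energy()  # single shot total energy
```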
Code and Model Implementation
All the GA and GNN codes were written in Python 3.7, and the PyTorch v1.1138 and PyTorch Geometric v2.0.439 libraries were used to implement the GNN model.
Results and Discussion
In this section, the training and testing of the GNN model described in the previous section are discussed. Considering the high computational cost of DFT simulations for generating data sets, we first used the Tersoff potential with the LAMMPS code to generate an unrelaxed Tersoff data set of 5000 GB structures, with the maximum height, midsection width, and interatomic distances of all structures controlled. By gradually reducing the training set size, we confirmed that 400 is the minimum training set size that avoids a dramatic increase in the test error. We also verified that random sampling of the training structures has an insignificant impact on the error compared with the even (energy-stratified) sampling technique. To test whether our model can replicate the energy hypersurface of the Tersoff potential, 5000 additional geometrically optimized (relaxed) structures were generated with the Tersoff potential. We named this data set containing both relaxed and unrelaxed structures (10 000 in total) the complete Tersoff data set. Based on this data set, we confirmed that 950 unrelaxed structures along with 950 relaxed structures are the minimum training set size for our model to achieve average MAEs of 1.507 and 1.617 eV for relaxed and unrelaxed structures, respectively. Next, ∼2200 Tersoff-relaxed structures and ∼2200 unrelaxed structures were selected, and their total energies were calculated with single shot DFT calculations; this data set is henceforth referred to as the SS DFT data set. As before, the minimum training set size was 950 unrelaxed and 950 relaxed structures, with which a percentage MAE of less than 0.5% was achieved on the SS DFT data set. Finally, we performed MOGA searches with the GNN model trained on the complete Tersoff data set and again with the SS DFT data set. By analyzing the search results, we demonstrate that our model can replicate the Tersoff energy hypersurface and provides a good estimate of DFT energy values.
Unrelaxed Tersoff Data Set and Downsampling
To test the performance of our model and avoid wasting computational resources, a test data set was generated using the Tersoff potential before any DFT data set was generated. We first randomly created GB structures, as shown in Figure 1, consistent with the GA-generated structures where only the midsection varies. The Tersoff potential28 was then used to evaluate the energies of these structures. For the structures in the data set shown below, both the left and right bulk regions have rotation angles equal to 19.11°. The width of the GB region was randomly set between 1 and 9 Å, and the height difference of the atomic coordinates was limited to a maximum of 2 Å. The interatomic distance between atoms within the cell was also constrained to be larger than 2.14 Å, which is 2 times the covalent radius of phosphorus40 as tabulated in the Atomic Simulation Environment package.41 A total of 5000 structures were generated with this method, and their total energies were evaluated using an in-house Tersoff potential28 with the LAMMPS code. We found that the total energy distribution follows a bell curve, as shown in Figure 3, with energies ranging from −221.83 to −190.94 eV.
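The interatomic distance constraint can be checked as sketched below; generation of the candidate coordinates is omitted, the helper name is ours, and periodic images are ignored for brevity.

```python
import numpy as np
from ase.data import covalent_radii

P_RADIUS = covalent_radii[15]      # covalent radius of phosphorus (~1.07 Angstrom)
MIN_DIST = 2 * P_RADIUS            # ~2.14 Angstrom minimum interatomic distance

def distances_ok(positions: np.ndarray) -> bool:
    """Reject candidate GB structures with atoms closer than MIN_DIST.

    `positions` is an (N, 3) array of Cartesian coordinates.
    """
    delta = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(delta, axis=-1)
    np.fill_diagonal(dist, np.inf)  # ignore self-distances
    return bool(dist.min() >= MIN_DIST)
```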
Figure 3.
Total energy distribution of the unrelaxed Tersoff data set and the average MAE on the test sets for different training set sizes used on this data set.
Using this Tersoff-generated test data set, we determined the minimum number of structures required for training the GNN model by gradually decreasing the training set size. Starting at 4000 structures (80% of all structures), this number was decreased step by step until only 100 structures remained for training. At each step, the remaining structures were evenly split into a validation set and a test set.
For the model hyperparameters, eight convolutional layers were found to achieve the best performance and were adopted to maximize the connectivity information from the graph while avoiding oversmoothing,42,43 which occurs when nodes are connected to too many neighbors and small differences between convoluted node features lead to a performance drop. For each GIN convolution layer, a two-layer MLP with 600 neurons per layer is used to generate the node embedding. Two fully connected layers were attached after the global pooling layer to make the final prediction. For training, the L1 loss and Adam optimizer44 were used, and the initial learning rate was set to 1 × 10⁻⁴. Step-decay learning rate annealing was implemented, in which the learning rate was multiplied by a factor of 0.8 every 5 epochs. The GNN model was trained for a total of 100 epochs, with a training batch size of 128 structures and a dropout rate of 0.25 for both the convolution layers and the fully connected layers to avoid overfitting. All hyperparameters are summarized in Table S1 in the Supporting Information.
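These training settings correspond directly to standard PyTorch components, as in the condensed sketch below; the model and data set objects are assumed to exist.

```python
import torch
from torch_geometric.loader import DataLoader

# `model` is the GIN regressor and `train_set` a list of PyG Data objects
# (both assumed to exist); hyperparameters follow the values quoted above.
loader = DataLoader(train_set, batch_size=128, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.8)
loss_fn = torch.nn.L1Loss()                 # L1 loss = MAE

for epoch in range(100):
    model.train()
    for batch in loader:
        optimizer.zero_grad()
        pred = model(batch.x, batch.edge_index, batch.batch)
        loss = loss_fn(pred, batch.y)       # batch.y holds total energies
        loss.backward()
        optimizer.step()
    scheduler.step()                        # decay lr by 0.8 every 5 epochs
```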
For each training set size, three independent trainings of the GNN were performed. For each, the data set was randomly shuffled before being split into new training, validation, and test sets. Over the 100 epochs of training, the model with the best performance on the validation set was chosen, and its MAE on the test set was recorded. The average MAE of the three independent trainings at each training set size is plotted against the training set size in Figure 3. As one can see, the MAE increases sharply once the training set size falls below 500; with 400 training structures, the model achieves an MAE of 2.051 eV.
To better understand the impact of the training set size and its distribution on the results, we further analyzed the total energy distribution for the smaller training sets. Figure 4 shows the total energy distributions for small training set sizes (100–400) for one of the three iterations. The energy range covered by the sparse training sets is narrower than that of the complete data set. To avoid a possible performance reduction due to poor sampling across the energy range, adjustments were made to the sampling method: all structures were first ranked by their energy values and divided into 30 intervals, and training sets of different sizes were then assembled by sampling evenly within each interval, as sketched below. Figure 4 also shows the average MAE of three independent trainings for which the structures were sampled with this method. The MAE for the smallest training set decreases; however, the improvement diminishes for larger training sets compared with the original random sampling. This suggests that when only a small training set is affordable, sampling evenly across the total energy spectrum improves the results, whereas for larger training sets random sampling can be used without any performance loss. For both sampling methods, the MAE changes dramatically when 400–500 structures are used for training.
Figure 4.
Total energy distributions for sparse training set sizes equal to (a) 100 structures, (b) 200 structures, (c) 300 structures, and (d) 400 structures; MAE on the test set vs training set size. Training structures are evenly sampled from each energy interval.
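The energy-stratified sampling described above can be expressed compactly as follows; the 30 intervals match the text, while the function signature is illustrative.

```python
import numpy as np

def stratified_sample(energies: np.ndarray, n_train: int, n_bins: int = 30):
    """Sample `n_train` indices evenly across `n_bins` energy intervals."""
    order = np.argsort(energies)           # rank structures by total energy
    bins = np.array_split(order, n_bins)   # 30 contiguous energy intervals
    per_bin = n_train // n_bins
    rng = np.random.default_rng()
    picks = [rng.choice(b, size=min(per_bin, len(b)), replace=False)
             for b in bins]
    return np.concatenate(picks)
```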
Complete Tersoff Data Set and Model Performance
A sparse training data set is crucial to determine the minimum number of calculations needed when a first-principles method is used to generate a reliable data set. In addition, to accurately capture the energy hypersurface predicted by the Tersoff potential and DFT, it is necessary to include structures spanning a broad range of total energies in the data set. For the previous data set shown in Figure 3, the average energy is −207.58 eV and the difference between the maximum and minimum total energies is approximately 30.89 eV. To generate relaxed structures for training, new structures were first randomly generated as discussed in the previous section, and each structure was then relaxed using the Tersoff potential with the LAMMPS package. After relaxation, we transferred the structures into graphs and used the subgraph and bridge concepts from graph theory to check whether a structure had low connectivity. If bridges or subgraphs exist in the graph, corresponding to a single-bond connection or a complete fracture between the left and right bulk sides, respectively, the structure was removed, because such low-connectivity structures are not physically viable 2D nanosheets with a GB. Example structures with low connectivity and the details for ruling them out are discussed in the Supporting Information. Otherwise, the relaxed structure and its energy were added to the data set. This process was iterated until 5000 relaxed structures were generated. The combined data set containing 5000 unrelaxed and 5000 relaxed structures is herein referred to as the complete Tersoff data set. The total energy histogram for the complete Tersoff data set is shown in Figure 5. Details of the minimization can be found in the “LAMMPS and DFT Simulation Details” subsection of the Methods section.
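The bridge and subgraph filter maps directly onto NetworkX, as sketched below; graph construction is assumed to follow the cutoff-based procedure described in the Methods section.

```python
import networkx as nx

def is_physically_connected(g: nx.Graph) -> bool:
    """Reject low-connectivity GB structures.

    A disconnected graph (separate subgraphs) corresponds to a complete
    fracture between the two bulk sides; a bridge corresponds to a single
    bond holding the sheet together. Both are filtered out.
    """
    if not nx.is_connected(g):
        return False
    if nx.has_bridges(g):
        return False
    return True
```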
Figure 5.
Energy distribution for the complete Tersoff data set and average of MAE obtained on the test sets plotted against the training set size for Tersoff relaxed and unrelaxed structures. All energy values are evaluated by the Tersoff potential with the LAMMPS code.
As seen in the inset of Figure 5, the energy range for the relaxed structures in the complete Tersoff data set is from −269.4 to −228.2 eV, with a mean value of −247.5 eV and a standard deviation of 6.3 eV. For the unrelaxed structures, the energies fall within −228.7 to −190.9 eV, with a mean value of −208.9 eV and a standard deviation of 5.9 eV, slightly smaller than that of the relaxed structures. Although the energy range of 40.6 eV for the relaxed structures is slightly larger than the 37.77 eV for the unrelaxed structures, the distributions of the two types are similar and there is no energy overlap between them. Therefore, the relaxed and unrelaxed structures were treated as two separate data sets, and the GNN was trained with an equal number of structures of each type.
Figure 5 also shows the average MAE as the number of structures used for training was increased from 400 to 1500. The remaining structures were split into test and validation sets, and within each set the numbers of relaxed and unrelaxed structures were kept equal. Three training iterations were performed for each training set size, and for each iteration the structures were randomly selected from the data set. Over all the training epochs, the model that performed best on the validation set was chosen and its MAE on the test set was recorded. The average MAE over the three training iterations is plotted against the number of training structures in Figure 5. From this plot, one can see that, compared with the data set containing only unrelaxed structures, training on the complete Tersoff data set requires more structures to achieve the best performance. The MAE reached a plateau when approximately 950 structures of each type were used for training, with average MAEs of 1.507 and 1.617 eV for the relaxed and unrelaxed structures, respectively. Figure 6a is the parity plot for one of the training iterations using 950 relaxed and 950 unrelaxed structures; for this iteration, the MAEs for relaxed and unrelaxed structures were 1.01 and 1.69 eV, respectively. With lower-energy structures added to the data set, the GNN model was able to predict a much broader range of total energies, thereby replicating a larger area of the energy hypersurface.
Figure 6.
Parity plots on the test set when using 950 structures of each type for training on (a) the complete Tersoff data set and (b) the single shot DFT data set.
Single Shot DFT Data Set and Model Performance
Considering the high cost of DFT geometry optimization, structures were sampled from the complete Tersoff data set. The structures were first sorted by their Tersoff-evaluated energies and divided into two subsets, one relaxed and one unrelaxed. Each subset was divided into 30 intervals based on energy, and structures were sampled evenly within these intervals to guarantee diversity across the energy range. Approximately 2200 structures relaxed using the Tersoff potential and another ∼2200 unrelaxed structures were selected to ensure enough structures for training, validation, and testing. The energies of all sampled structures were then evaluated with single shot DFT calculations, the details of which are given in the “LAMMPS and DFT Simulation Details” subsection of the Methods section. We label the data set of these structures and the resulting DFT-evaluated total energies as the SS DFT data set. The total energy range of the structures in this data set is from −375.32 to −305.22 eV. For training with the SS DFT data set, we increased the number of convolution layers from 8 to 10, increased the number of neurons in the convolution and fully connected layers to 1024, and increased the cutoff value used to determine the connectivity between atoms to 3.3 Å, since DFT simulations capture longer-range interactions than the Tersoff potential. The influence of this cutoff value is discussed in the Supporting Information.
Adopting the same training and evaluation method as in the last section, we determined that 950 relaxed and 950 unrelaxed structures are adequate for training on the SS DFT data set. With this training set size, the average MAE on the test sets is 1.71 eV (0.53%) for relaxed and 1.33 eV (0.38%) for unrelaxed structures. Figure 6b shows one of the parity plots between the DFT total energy and the GNN model prediction, where the percentage MAE was 0.55% and 0.41% for relaxed and unrelaxed structures, respectively. These small errors demonstrate that our GNN model can accurately predict the total energy on the SS DFT data set, enabling an accurate evolutionary search.
The motivation for using the Tersoff-relaxed structures was to generate a large data set based on the DFT energy hypersurface while reducing the computational cost of DFT geometry optimization. However, in Figure 6b, one may notice that some of the Tersoff-relaxed structures have a higher DFT-evaluated energy than the unrelaxed structures and that the data points in the parity plot are clustered. Both the reversal of the total energy ordering between relaxed and unrelaxed structures and the clustering of the data points arise because we sampled the structures from the complete Tersoff data set for the SS DFT energy calculations, as mentioned at the beginning of this section: the difference between the energy hypersurfaces of the Tersoff potential and DFT leads to these effects. A more detailed explanation of the differences between the hypersurfaces can be found in the Supporting Information, which includes a histogram and parity plot of DFT energies for a small data set of 81 DFT fully relaxed structures and 200 unrelaxed structures. Despite these effects, the SS DFT data set covers a broad range of total energies on the DFT energy hypersurface, and the GNN model trained on it can accurately predict the total energy of BP structures with GBs, as discussed above.
Evolutionary Search with GNN
GNN and Tersoff Surrogate Comparison
Considering that the MOGA is a stochastic search method, six independent searches were performed to compare the results between the different surrogates: three using the Tersoff potential as the surrogate and three using the GNN trained on the complete Tersoff data set. For each search, a hall of fame (HOF) archive containing the structures with the best formation energies was updated at each generation, and the search was stopped when the maximum generation number of 5000 was reached. Of the three searches using the Tersoff potential with LAMMPS, two converged to the structure shown in Figure 7, which has the lowest formation energy in the archive; this result is consistent with all three searches using the GNN surrogate. The remaining Tersoff-based search converged to a Stone–Wales defect; however, the structure in Figure 7 was also archived in the HOF of that search. This test demonstrates that the GNN model can successfully replicate the energy hypersurface of the Tersoff potential. To obtain an accurate but also inexpensive surrogate, we next tested the methodology with a DFT-generated data set.
Figure 7.
Lowest-formation-energy structure found by all three MOGA searches using the GNN model trained on the complete Tersoff data set as a surrogate, and by two of the three MOGA searches using the Tersoff potential.
DFT Trained GNN for the MOGA Search
Due to its high cost, DFT cannot be used directly to evaluate energies during the MOGA search; therefore, a comparison like that in the previous section was not possible. To validate the search performance of the GNN model trained on the SS DFT data set, three independent searches using the GNN model were performed, and the structures on the Pareto front of each search were extracted after the search ended. Their energies were evaluated using single shot DFT calculations, and the total energy differences between all pairs of structures were computed and averaged; we label this average value d. The MAE of the GNN predictions for the total energies of these structures was also calculated. The comparison of the two values for the three searches is shown in Table 1. The three most stable structures obtained during the three MOGA searches were then fully relaxed using DFT, giving formation energies of 0.181, 0.012, and 0.042 eV/Å. These values are comparable with those of black phosphorene (0.09–0.24 eV/Å)45 and much smaller than those of graphene (0.28–0.8 eV/Å).46 Additional details and a discussion of the structures are included in the Supporting Information. As shown in Table 1, the average total energy difference d between the structures on the Pareto front is at least 5 times larger than the MAE of the GNN prediction, demonstrating that the GNN model can distinguish different structures on the Pareto front by their energy difference. The accuracy of the GNN model is therefore sufficient for the MOGA search. We also evaluated the formation energies of all the structures using eq 7:
$$E_f = \frac{E_{\mathrm{tot}} - N\mu}{2L} \tag{7}$$
where $E_{\mathrm{tot}}$ is the total energy evaluated with DFT or the GNN model, $\mu$ is the energy per atom of the pristine BP sheet, $N$ is the number of atoms in the GB structure, and $L$ is the GB length as in eq 4. We calculated the MAE between the GNN and DFT formation energies, normalized over the number of atoms for comparison with previously reported results, and refer to it as the formation energy MAE in Table 1. In the work by Kirklin et al.,30 the general accuracy of DFT-evaluated formation energies was assessed by calculating the MAE between DFT predictions and experimental data. Our formation energy MAE values between the GNN model and DFT are comparable to the 0.096 eV/atom MAE obtained by Kirklin et al., which indicates that our model provides a good estimate of DFT energy values at orders of magnitude lower computational time.
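The quantities in Table 1 can be computed as sketched below. Because eq 7 is stated per unit GB length while Table 1 reports eV/atom, the formation energy MAE here uses the excess energy Etot − Nμ normalized by the atom count, which is our reading of the normalization described in the text.

```python
from itertools import combinations
import numpy as np

def average_pairwise_difference(e_dft):
    """d: mean |E_i - E_j| over all pairs of Pareto-front structures."""
    diffs = [abs(a - b) for a, b in combinations(e_dft, 2)]
    return float(np.mean(diffs))

def formation_energy_mae(e_gnn, e_dft, n_atoms):
    """Per-atom MAE between GNN and DFT formation energies (eV/atom).

    The chemical-potential term N*mu cancels in the GNN-DFT difference,
    so only total energies and atom counts are needed.
    """
    e_gnn, e_dft, n_atoms = map(np.asarray, (e_gnn, e_dft, n_atoms))
    return float(np.mean(np.abs(e_gnn - e_dft) / n_atoms))
```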
Table 1. Average Difference of DFT Total Energy (d) and Mean Absolute Error of GNN (MAE) for Three Independent MOGA Searches Using the GNN Surrogate.
| search number | d (eV) | MAE (eV) | formation energy MAE (eV/atom) |
|---|---|---|---|
| 1 | 13.20 | 2.09 | 0.030 |
| 2 | 10.59 | 2.00 | 0.030 |
| 3 | 35.23 | 1.81 | 0.025 |
Conclusion
In this work, a GNN model to predict the energy of 2D GBs in BP was implemented and trained. By implementing GIN convolution layers, we aimed for the maximum expressive power of the GNN model so that it can distinguish 2D sheets with different GB structures in the middle section. For training the GNN model, a Tersoff data set was first generated, and the number of structures needed to train the GNN model to the desired accuracy was determined. After the training set size was determined for the complete Tersoff data set, part of the randomly generated structures and the Tersoff-minimized structures were fed into DFT simulations to generate a data set with DFT-evaluated energies. The optimal training set size was again determined, and percentage errors below 0.5% were obtained using 950 unrelaxed and 950 Tersoff-relaxed structures for training. With the validated GNN model, we performed MOGA searches for BP GB structures. We confirmed that, with the GNN model trained on the complete Tersoff data set, the GA search tended to converge to the same result as the Tersoff potential, again showing that the trained model can replicate the energy hypersurface of the Tersoff potential. We also performed MOGA searches with the GNN model trained on the SS DFT data set, extracted the structures on the Pareto fronts, and evaluated their total energies using DFT. The MAE for the formation energies of those structures is comparable to the previously reported MAE of DFT simulations, indicating that our model provides a good estimate of DFT energy values at orders of magnitude lower computational cost. By comparing the average DFT total energy difference with the MAE of the GNN predictions for those structures, we confirmed that the DFT-trained GNN can distinguish different energies and structures on the Pareto front, supporting the use of MOGA searches to predict geometries of 2D lateral interfacing nanosheets with GB defects. The GNN model predicts energies purely from atomic positions, and this method, tested here on BP, is generalizable to other 2D materials. For future work, we aim to develop a GNN architecture that can also predict the forces on atoms as derivatives of the energy, which will make the GNN model more versatile and improve the MOGA search performance. We believe this method will accelerate the discovery and property studies of 2D GBs.
Acknowledgments
The use of the Center for Nanoscale Materials, an Office of Science user facility, was supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under Contract DE-AC02-06CH11357. This research used resources of the National Energy Research Scientific Computing Center, which was supported by the Office of Science of the U.S. Department of Energy under Contract DE-AC02-05CH11231. The authors acknowledge the support from the Argonne LDRD and UIC faculty start-up funds. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences Data, Artificial Intelligence, and Machine Learning at DOE Scientific User Facilities program under Award 34532.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsami.3c01161.
Additional details for transferring GB structures into graphs; method for filtering out the disconnected GB structures; histogram and training results of the DFT R+U data set; results of the MOGA search based on the GNN that trained on the SS DFT data set; example structures and corresponding novelty score; hyperparameters used for training; and the challenges and future work of the GNN model (PDF)
Author Contributions
J.Z. developed the GNN architecture and the MOGA search method and ran all simulations and training. A.K. conceived the idea of using a GNN for defective 2D sheets. S.K.R.S.S. contributed to the downsampling method. C.M.L. contributed to the method for generating data sets. S.K.R.S.S. and C.M.L. supervised the research. C.M.L. helped J.Z. with writing the manuscript.
The authors declare no competing financial interest.
Notes
Data and Code Availability. The data sets generated and the code for training GNN are available in the Jupyter Notebook located at https://github.com/JannarZ/gnn_bp_gb_tersoff and https://github.com/JannarZ/gnn_bp_gb_dft. The complete Tersoff data set and the SS DFT data set can also be accessed through the link https://figshare.com/s/5a0199a77fcf49348a4d. The data generated during the MOGA search and a sample code for performing the MOGA search are available at the GitHub repository: https://github.com/JannarZ/moga_gnn_search.
References
- Zhang J.; Zhao J.; Lu J. Intrinsic Strength and Failure Behaviors of Graphene Grain Boundaries. ACS Nano 2012, 6 (3), 2704–2711. 10.1021/nn3001356.
- Yazyev O. V.; Louie S. G. Electronic Transport in Polycrystalline Graphene. Nat. Mater. 2010, 9 (10), 806–809. 10.1038/nmat2830.
- Banhart F.; Kotakoski J.; Krasheninnikov A. V. Structural Defects in Graphene. ACS Nano 2011, 5, 26–41. 10.1021/nn102598m.
- Tan C.; Rodríguez-López J.; Parks J. J.; Ritzert N. L.; Ralph D. C.; Abruña H. D. Reactivity of Monolayer Chemical Vapor Deposited Graphene Imperfections Studied Using Scanning Electrochemical Microscopy. ACS Nano 2012, 6 (4), 3070–3079. 10.1021/nn204746n.
- Najmaei S.; Yuan J.; Zhang J.; Ajayan P.; Lou J. Synthesis and Defect Investigation of Two-Dimensional Molybdenum Disulfide Atomic Layers. Acc. Chem. Res. 2015, 48 (1), 31–40. 10.1021/ar500291j.
- Jeong H. Y.; Jin Y.; Yun S. J.; Zhao J.; Baik J.; Keum D. H.; Lee H. S.; Lee Y. H. Heterogeneous Defect Domains in Single-Crystalline Hexagonal WS2. Adv. Mater. 2017, 29 (15), 1605043. 10.1002/adma.201605043.
- Duan X.; Wang C.; Shaw J. C.; Cheng R.; Chen Y.; Li H.; Wu X.; Tang Y.; Zhang Q.; Pan A.; Jiang J.; Yu R.; Huang Y.; Duan X. Lateral Epitaxial Growth of Two-Dimensional Layered Semiconductor Heterojunctions. Nat. Nanotechnol. 2014, 9 (12), 1024–1030. 10.1038/nnano.2014.222.
- Sun Q.; Dai Y.; Ma Y.; Yin N.; Wei W.; Yu L.; Huang B. Design of Lateral Heterostructure from Arsenene and Antimonene. 2D Mater. 2016, 3 (3), 035017. 10.1088/2053-1583/3/3/035017.
- Li Q.; Ma X.; Zhang L.; Wan X. G.; Rao W. Theoretical Design of Blue Phosphorene/Arsenene Lateral Heterostructures with Superior Electronic Properties. J. Phys. D: Appl. Phys. 2018, 51 (25), 255304. 10.1088/1361-6463/aac563.
- Lin C. Y.; Zhu X.; Tsai S. H.; Tsai S. P.; Lei S.; Shi Y.; Li L. J.; Huang S. J.; Wu W. F.; Yeh W. K.; Su Y. K.; Wang K. L.; Lan Y. W. Atomic-Monolayer Two-Dimensional Lateral Quasi-Heterojunction Bipolar Transistors with Resonant Tunneling Phenomenon. ACS Nano 2017, 11 (11), 11015–11023. 10.1021/acsnano.7b05012.
- Hong W.; Shim G. W.; Yang S. Y.; Jung D. Y.; Choi S.-Y. Improved Electrical Contact Properties of MoS2-Graphene Lateral Heterostructure. Adv. Funct. Mater. 2019, 29 (6), 1807550. 10.1002/adfm.201807550.
- Zhang J.; Srinivasan S.; Sankaranarayanan S. K. R. S.; Lilley C. M. Evolutionary Inverse Design of Defects at Graphene 2D Lateral Interfaces. J. Appl. Phys. 2021, 129 (18), 185302. 10.1063/5.0046469.
- Lee G.-D.; Wang C. Z.; Yoon E.; Hwang N. M.; Kim D. Y.; Ho K. M. Diffusion, Coalescence, and Reconstruction of Vacancy Defects in Graphene Layers. Phys. Rev. Lett. 2005, 95 (20), 205501. 10.1103/PhysRevLett.95.205501.
- Jiang J.-W.; Rabczuk T.; Park H. S. A Stillinger-Weber Potential for Single-Layered Black Phosphorus, and the Importance of Cross-Pucker Interactions for a Negative Poisson’s Ratio and Edge Stress-Induced Bending. Nanoscale 2015, 7, 6059. 10.1039/C4NR07341J.
- Zhang X.; Xie H.; Hu M.; Bao H.; Yue S.; Qin G.; Su G. Thermal Conductivity of Silicene Calculated Using an Optimized Stillinger-Weber Potential. Phys. Rev. B 2014, 89, 054310. 10.1103/PhysRevB.89.054310.
- Zhang Y.-Y.; Pei Q.-X.; Sha Z.-D.; Zhang Y.-W. A Molecular Dynamics Study of the Mechanical Properties of H-BCN Monolayer Using a Modified Tersoff Interatomic Potential. Phys. Lett. A 2019, 383, 2821–2827. 10.1016/j.physleta.2019.05.055.
- Islam A. S. M. J.; Islam M. S.; Ferdous N.; Park J.; Bhuiyan A. G.; Hashimoto A. Anomalous Temperature Dependent Thermal Conductivity of Two-Dimensional Silicon Carbide. Nanotechnology 2019, 30, 445707. 10.1088/1361-6528/ab3697.
- Fung V.; Zhang J.; Juarez E.; Sumpter B. G. Benchmarking Graph Neural Networks for Materials Chemistry. npj Comput. Mater. 2021, 7 (1), 1–8. 10.1038/s41524-021-00554-0.
- Karamad M.; Magar R.; Shi Y.; Siahrostami S.; Gates I. D.; Farimani A. B. Orbital Graph Convolutional Neural Network for Material Property Prediction. Phys. Rev. Mater. 2020, 4 (9), 093801. 10.1103/PhysRevMaterials.4.093801.
- Xie T.; Grossman J. C. Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. Phys. Rev. Lett. 2018, 120 (14), 145301. 10.1103/PhysRevLett.120.145301.
- Louis S.-Y.; Zhao Y.; Nasiri A.; Wang X.; Song Y.; Liu F.; Hu J. Graph Convolutional Neural Networks with Global Attention for Improved Materials Property Prediction. Phys. Chem. Chem. Phys. 2020, 22 (32), 18141–18148. 10.1039/D0CP01474E.
- Park C. W.; Wolverton C. Developing an Improved Crystal Graph Convolutional Neural Network Framework for Accelerated Materials Discovery. Phys. Rev. Mater. 2020, 4 (6), 063801. 10.1103/PhysRevMaterials.4.063801.
- Zhou J.; Cui G.; Hu S.; Zhang Z.; Yang C.; Liu Z.; Wang L.; Li C.; Sun M. Graph Neural Networks: A Review of Methods and Applications. AI Open 2020, 1, 57–81. 10.1016/j.aiopen.2021.01.001.
- Choudhary K.; DeCost B. Atomistic Line Graph Neural Network for Improved Materials Property Predictions. npj Comput. Mater. 2021, 7 (1), 1–8. 10.1038/s41524-021-00650-1.
- Pandey S.; Qu J.; Stevanovic V.; St. John P.; Gorai P. A Graph Neural Network for Predicting Energy and Stability of Known and Hypothetical Crystal Structures. ChemRxiv 2021. 10.26434/chemrxiv.14428865.v1.
- Dai M.; Demirel M. F.; Liang Y.; Hu J. M. Graph Neural Networks for an Accurate and Interpretable Prediction of the Properties of Polycrystalline Materials. npj Comput. Mater. 2021, 7 (1), 1–9. 10.1038/s41524-021-00574-w.
- Xu K.; Hu W.; Leskovec J.; Jegelka S. How Powerful Are Graph Neural Networks? arXiv 2019. 10.48550/arXiv.1810.00826.
- Koneru A.; Batra R.; Manna S.; Loeffler T. D.; Chan H.; Sternberg M.; Avarca A.; Singh H.; Cherukara M. J.; Sankaranarayanan S. K. R. S. Multi-Reward Reinforcement Learning Based Bond-Order Potential to Study Strain-Assisted Phase Transitions in Phosphorene. J. Phys. Chem. Lett. 2022, 13, 1886–1893. 10.1021/acs.jpclett.1c03551.
- Murata T.; Ishibuchi H. MOGA: Multi-Objective Genetic Algorithms. Proceedings of 1995 IEEE International Conference on Evolutionary Computation; IEEE, 1995; Vol. 1, pp 289–294.
- Kirklin S.; Saal J. E.; Meredig B.; Thompson A.; Doak J. W.; Aykol M.; Rühl S.; Wolverton C. The Open Quantum Materials Database (OQMD): Assessing the Accuracy of DFT Formation Energies. npj Comput. Mater. 2015, 1, 15010. 10.1038/npjcompumats.2015.10.
- Plimpton S. Fast Parallel Algorithms for Short-Range Molecular Dynamics. J. Comput. Phys. 1995, 117, 1–19. 10.1006/jcph.1995.1039.
- Deb K.; Pratap A.; Agarwal S.; Meyarivan T. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6 (2), 182–197. 10.1109/4235.996017.
- Fortin F.-A.; De Rainville F.-M.; Gardner M.-A.; Parizeau M.; Gagné C. DEAP: Evolutionary Algorithms Made Easy. J. Mach. Learn. Res. 2012, 13 (70), 2171–2175.
- Koutra D.; Parikh A.; Ramdas A.; Xiang J. Algorithms for Graph Similarity and Subgraph Matching. Proc. Ecol. Inference Conf. 2011, 17.
- Kresse G.; Joubert D. From Ultrasoft Pseudopotentials to the Projector Augmented-Wave Method. Phys. Rev. B 1999, 59 (3), 1758–1775. 10.1103/PhysRevB.59.1758.
- Kresse G.; Furthmüller J. Efficient Iterative Schemes for Ab Initio Total-Energy Calculations Using a Plane-Wave Basis Set. Phys. Rev. B 1996, 54 (16), 11169–11186. 10.1103/PhysRevB.54.11169.
- Perdew J. P.; Burke K.; Ernzerhof M. Generalized Gradient Approximation Made Simple. Phys. Rev. Lett. 1996, 77 (18), 3865–3868. 10.1103/PhysRevLett.77.3865.
- Paszke A.; Gross S.; Massa F.; Lerer A.; Bradbury J.; Chanan G.; Killeen T.; Lin Z.; Gimelshein N.; Antiga L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems 32; 2019. https://proceedings.neurips.cc/paper/2019/hash/bdbca288fee7f92f2bfa9f7012727740-Abstract.html (accessed May 27, 2022).
- Fey M.; Lenssen J. E. Fast Graph Representation Learning with PyTorch Geometric. arXiv 2019. 10.48550/arXiv.1903.02428.
- Cordero B.; Gómez V.; Platero-Prats A. E.; Revés M.; Echeverría J.; Cremades E.; Barragán F.; Alvarez S. Covalent Radii Revisited. Dalton Trans. 2008, (21), 2832–2838. 10.1039/b801115j.
- Hjorth Larsen A.; Jørgen Mortensen J.; Blomqvist J.; Castelli I. E.; Christensen R.; Dułak M.; Friis J.; Groves M. N.; Hammer B.; Hargus C.; Hermes E. D.; Jennings P. C.; Bjerre Jensen P.; Kermode J.; Kitchin J. R.; Leonhard Kolsbjerg E.; Kubal J.; Kaasbjerg K.; Lysgaard S.; Bergmann Maronsson J.; Maxson T.; Olsen T.; Pastewka L.; Peterson A.; Rostgaard C.; Schiøtz J.; Schütt O.; Strange M.; Thygesen K. S.; Vegge T.; Vilhelmsen L.; Walter M.; Zeng Z.; Jacobsen K. W. The Atomic Simulation Environment - A Python Library for Working with Atoms. J. Phys.: Condens. Matter 2017, 29, 273002. 10.1088/1361-648X/aa680e.
- Li Q.; Han Z.; Wu X. Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning. Thirty-Second AAAI Conference on Artificial Intelligence; AAAI, 2018.
- Cai C.; Wang Y. A Note on Over-Smoothing for Graph Neural Networks. arXiv 2020. 10.48550/arXiv.2006.13318.
- Kingma D. P.; Ba J. L. Adam: A Method for Stochastic Optimization. 3rd International Conference on Learning Representations (ICLR); 2015.
- Guo Y.; Zhou S.; Zhang J.; Bai Y.; Zhao J. Atomic Structures and Electronic Properties of Phosphorene Grain Boundaries. 2D Mater. 2016, 3 (2), 025008. 10.1088/2053-1583/3/2/025008.
- Zhang J.; Zhao J. Structures and Electronic Properties of Symmetric and Nonsymmetric Graphene Grain Boundaries. Carbon 2013, 55, 151–159. 10.1016/j.carbon.2012.12.021.