Pattern Learning Electronic Density of States

Byung Chul Yeo; Donghun Kim; Chansoo Kim; Sang Soo Han

doi:10.1038/s41598-019-42277-9

2019 Apr 10;9:5879. doi: 10.1038/s41598-019-42277-9

PMCID: PMC6458116 PMID: 30971723

Abstract

Electronic density of states (DOS) is a key factor in condensed matter physics and materials science that determines the properties of metals. First-principles density-functional theory (DFT) calculations have typically been used to obtain the DOS despite their considerable computational cost. Herein, we report a fast machine-learning method, based on principal component analysis, for predicting the DOS patterns of not only bulk structures but also surface structures in multi-component alloy systems. Within this framework, we use only four features to define the composition, atomic structure, and surfaces of alloys: the d-orbital occupation ratio, the coordination number, the mixing factor, and the inverse of the Miller indices. While the DFT method scales as O(N³), where N is the number of electrons in the system, our pattern learning method is independent of the number of electrons. Furthermore, our method provides a pattern similarity of 91–98% compared to DFT calculations. This indicates that our learning method can be an alternative that breaks the trade-off between accuracy and speed that is well known in the field of electronic structure calculations.

Introduction

Electronic density of states (DOS) plays a tremendously important role in determining the properties of metals¹. Researchers in the fields of solid-state and condensed matter physics carefully diagnose density distributions of free electrons in metals to understand scientific concepts that are hidden in such density distributions (e.g., the d-band center theory)² and to develop new materials^3,4.

Quantum mechanical approaches (e.g., density functional theory) shed light on the nature of electrons in metals, and first-principles density functional theory (DFT) calculations are successful methods to obtain the electronic DOS of metals. Although quantum mechanical methods provide high accuracy, they have the disadvantage of a severe computational workload, which originates from the complexity of many-body systems⁵. Thus, many researchers are seeking a fast method to predict electronic structures of materials with high accuracy^6–9.

Within quantum mechanical frameworks, the high computational cost limits the system size that can be studied. To circumvent such causality-based frameworks, an inductive method can be realized by utilizing data and statistical learning algorithms^10–17. Recently, machine-learning approaches have been pursued to address different quantum mechanical problems^18,19, and in particular, to predict the electronic structures of alloys, e.g., to predict the DOS values at the Fermi level²⁰ or the d-band centers²¹. However, to date, these attempts have been limited to the prediction of only a single value, and no machine-learning technique is available for the prediction of DOS patterns that include both the value and shape.

Herein, we propose a new perspective on the representation of the DOS, in which one-dimensional continuous curves are regarded as multi-dimensional digital data. Using principal component analysis, we identified highly correlated DOS patterns for various metal systems and proposed features to determine the correlation between the DOS patterns and the atomic structures of materials in a linear subspace. We successfully reproduced the DOS patterns of alloys usually found by quantum mechanical approaches, at a computational cost that is independent of the number of electrons in the system. Furthermore, our method incurs only a small loss of accuracy (accuracy > 90%) compared to DFT calculations. The DOS pattern learning method can provide a breakthrough in the trade-off between accuracy and speed, which is well known in the field of electronic structure calculations. Moreover, the approach is applicable for predicting DOS patterns of not only bulk structures but also surfaces in multi-component alloy systems.

Results

When mapping DOS patterns from the atomic structures of alloys, there is a mathematical puzzle, i.e., the number of input material labels (e.g., compositions, crystal structures, and lattice parameters) is much smaller than the number of output DOS values at the corresponding energy levels. Accordingly, we first compressed the output information by digitizing an analog signal of the DOS in a rectangular window to one multi-dimensional vector, as shown in Fig. 1a. Next, we applied principal component analysis (PCA), an unsupervised learning technique, to reduce the high-dimensional data to a low-dimensional data set^22,23. Then, we could build a model to represent the DOS patterns.
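
As a minimal sketch of this digitization step, the following Python snippet rasterizes a DOS curve into a binary M × N image and flattens it into one image vector. The function name `dos_to_image_vector`, the one-entry-per-energy-column encoding, and the toy Gaussian curve are assumptions made for illustration; only the rectangular window and the M × N grid are taken from the text.

```python
import numpy as np

def dos_to_image_vector(energies, dos, m_cols=100, n_rows=100,
                        e_range=(-10.0, 5.0), dos_range=(0.0, 3.0)):
    """Rasterize a DOS curve into a flattened binary image with m_cols * n_rows entries.

    Assumption: each energy interval (column) marks exactly one DOS level (row).
    """
    # Centers of the M energy intervals inside the window.
    e_edges = np.linspace(e_range[0], e_range[1], m_cols + 1)
    e_centers = 0.5 * (e_edges[:-1] + e_edges[1:])
    # Sample the curve at the interval centers and clip it to the DOS window.
    rho = np.clip(np.interp(e_centers, energies, dos), dos_range[0], dos_range[1])
    # Index of the DOS level that the curve hits in each energy column.
    frac = (rho - dos_range[0]) / (dos_range[1] - dos_range[0])
    level = np.minimum((frac * n_rows).astype(int), n_rows - 1)
    image = np.zeros((m_cols, n_rows))
    image[np.arange(m_cols), level] = 1.0
    return image.ravel()

# Toy DOS curve (hypothetical, not DFT data).
energies = np.linspace(-10.0, 5.0, 1501)
dos = 1.5 * np.exp(-((energies + 2.0) ** 2))
x = dos_to_image_vector(energies, dos)
```

With the default 100 × 100 grid, `x` has 10,000 entries, which is the dimension on which the PCA operates.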

Figure 1. Scheme of the pattern learning (PL) method for learning and predicting electronic DOSs. (a) Conversion of a DOS pattern from a continuous energy function in a rectangular window to a digital image vector with M × N entries. (b) Learning process of PCs of A_xB_1−x alloys with their DOS patterns. x_i is a row vector where M and N correspond to the grid size of the DOS window, and $\bar{x}$ is the average value of the entries in the row vectors. As a training system for learning, five compositions (A, A_0.75B_0.25, A_0.50B_0.50, A_0.25B_0.75, and B) are considered on the left side. A covariance matrix, Y, is constructed in the middle. PCA determines the eigenvectors, which are the PCs, and the eigenvalues of the training data set, which are shown on the right side. (c) The prediction process of an unknown DOS pattern for an arbitrary alloy, A_xB_1−x. The process involves several steps: (1) estimation of PC coefficients using features, including n_d, CN, and F_mix; (2) estimation of a new DOS image vector; (3) production and utilization of the DOS probability matrix; and (4) prediction of the DOS pattern for the test alloy, A_xB_1−x, using a probability matrix.

Learning process of DOS patterns

In the learning process of the DOS pattern (ρ), PCA was employed, for which we implemented Python code using the matrix operation package NumPy²⁴. Mathematically, this code finds the linearly independent eigenvectors along which the variance of the data is maximal. Prior to the analysis, DOS image vectors were digitized in a rectangular window. In our study, we considered an energy range from −10 eV to 5 eV and a DOS range from 0 to 3. We standardized the DOS image vectors of the training data by obtaining the normalized matrix Y in which the i-th row (y_i) of Y is $x_{i} - \bar{x}$ , where $\bar{x}$ is the mean of each column vector of X. Then, the eigenvectors, u_p = ${(u_{1}, u_{2}, \dots, u_{M \times N})}_{p}$ , and the corresponding eigenvalues, λ_p, were calculated from the covariance matrix, S = Y^TY, according to Eq. (1):

$$S u_{p} = λ_{p} u_{p} \tag{1}$$

Here, the eigenvectors are called principal components (PCs), and the corresponding eigenvalues describe the data variance along the PCs.
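
The learning step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `learn_pcs` and the random toy matrix are hypothetical, and the image dimension is kept small so that the full matrix S = Y^TY can be diagonalized directly.

```python
import numpy as np

def learn_pcs(X, n_pcs):
    """Return the mean vector, leading eigenvalues, and PCs of the training matrix X.

    X has one DOS image vector per row (shape: n_samples x D).
    """
    x_bar = X.mean(axis=0)                  # mean of each column of X
    Y = X - x_bar                           # centered matrix, rows y_i = x_i - x_bar
    S = Y.T @ Y                             # S = Y^T Y as in the text
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh: S is symmetric; ascending order
    order = np.argsort(eigvals)[::-1][:n_pcs]
    return x_bar, eigvals[order], eigvecs[:, order]

# Toy data standing in for five training image vectors (hypothetical values).
rng = np.random.default_rng(0)
X = rng.random((5, 12))
x_bar, lam, U = learn_pcs(X, n_pcs=4)
```

For realistic grids (e.g., D = 10,000) one would more likely obtain the same eigenvectors from a singular value decomposition of Y rather than forming the D × D matrix explicitly.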

The original vector x can be reconstructed by using the following Eq. (2):

$$x \approx \sum_{p = 1}^{P} (y^{T} u_{p}) u_{p} + \sum_{p = 1}^{P} ({\bar{x}}^{T} u_{p}) u_{p} = \sum_{p = 1}^{P} α_{p} u_{p} \tag{2}$$

where P is the number of PCs and p is their index. Thus, the coefficient α_p of each eigenvector can be computed as $y^{T} u_{p} + {\bar{x}}^{T} u_{p}$ , and it corresponds to the coordinate value in the linear subspace spanned by the PCs.

In the learning process using the PCA, we identified the linear subspace for which the orthogonal projections of the image vector, x, have a maximum variance, and we learned the eigenvectors, u, of the training systems in the linear subspace (Fig. 1b). The original image vectors can be reconstructed by $\sum_{p = 1}^{P} α_{p} u_{p}$ .
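
The coefficients and the reconstruction of Eq. (2) can be checked numerically with a short sketch. The toy data are hypothetical, and the PCs are obtained here from a singular value decomposition of Y, whose right singular vectors coincide with the eigenvectors of S = Y^TY; this is a convenience choice for the example, not necessarily what the original code does.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((5, 12))            # toy training image vectors, one per row
x_bar = X.mean(axis=0)
Y = X - x_bar

# Right singular vectors of Y are the eigenvectors of S = Y^T Y.
_, _, Vt = np.linalg.svd(Y, full_matrices=False)
U = Vt[:4].T                       # keep P = 4 PCs as columns

x = X[2]                           # one training image vector
y = x - x_bar
alpha = y @ U + x_bar @ U          # alpha_p = y^T u_p + x_bar^T u_p
x_rec = U @ alpha                  # sum_p alpha_p u_p
```

Because y + x̄ = x, each α_p equals x^T u_p, so `x_rec` is the orthogonal projection of x onto the subspace spanned by the retained PCs.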

Predicting process of DOS patterns

During the predicting process for the DOS pattern (a new image vector, x′) of a test alloy, as shown in Fig. 1c, we estimated the new coefficients, ${α'}_{p}$ , via a linear interpolation between the α_p of the two training systems that are most similar to the test composition, where features relevant to the electron occupation and atomic configuration were considered (Supplementary Figs S1 and S2). Using $\sum_{p = 1}^{P} {α'}_{p} u_{p}$ , we obtained a new image vector, x′, and transformed x′ into the DOS probability matrix, X′, the elements of which are the probabilities of each DOS level at the given energy interval.
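
A minimal sketch of the coefficient interpolation for a single feature is given below. The helper name `interpolate_coefficients`, the use of the Ni fraction as a stand-in for the feature value, and the coefficient table are all hypothetical; the sketch also assumes that the two "most similar" training systems are the ones bracketing the test feature value.

```python
import numpy as np

def interpolate_coefficients(feature_train, alpha_train, feature_test):
    """Linearly interpolate PC coefficients between the two bracketing training systems."""
    feature_train = np.asarray(feature_train, dtype=float)
    alpha_train = np.asarray(alpha_train, dtype=float)
    order = np.argsort(feature_train)
    f, a = feature_train[order], alpha_train[order]
    # Upper neighbour of the test value; clipped so that lo/hi are always valid.
    hi = int(np.clip(np.searchsorted(f, feature_test), 1, len(f) - 1))
    lo = hi - 1
    t = (feature_test - f[lo]) / (f[hi] - f[lo])
    return (1.0 - t) * a[lo] + t * a[hi]

# Hypothetical feature values (Ni fraction as a proxy) for the five training alloys.
feature_train = [0.0, 0.25, 0.5, 0.75, 1.0]
# Hypothetical coefficients alpha_p (rows: training alloys, columns: three PCs).
alpha_train = np.array([[4.0, 1.0, 0.0],
                        [3.0, 0.5, 0.2],
                        [2.0, 0.0, 0.4],
                        [1.0, -0.5, 0.2],
                        [0.0, -1.0, 0.0]])
alpha_new = interpolate_coefficients(feature_train, alpha_train, 0.625)
```

For the test value 0.625, which lies midway between the training values 0.5 and 0.75, the estimate is the average of the two neighbouring coefficient rows.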

To predict DOS patterns, only a single DOS value must be determined for a given energy interval. Thus, we defined the DOS probability matrix originating from the DOS image vector. The DOS image vector, $x' = ({x'}_{1}, {x'}_{2}, \dots, {x'}_{M \times N})$ , calculated from the PCs and the estimated coefficients, was transformed to the DOS image matrix, I′, with M columns and N rows in a grid-based rectangular window, the size of which is the same as that used in the learning process (Supplementary Fig. S6). To define the DOS probability matrix, we considered only positive entries in I′, and the other entries were regarded as zero. Moreover, we normalized all of the entries of the DOS levels at each energy interval. Then, we defined the DOS probability matrix, X′, with M columns and N rows, as given by

$${X'}_{m, n} = \frac{{x'}_{m, n}}{\sum_{n} {x'}_{m, n}} \tag{3}$$

where ${x'}_{m, n}$ is the positive entry value of the column vector in X′, and m and n are the matrix indices.

To predict the DOS pattern with X′, one should determine a single DOS value at each energy interval. Therefore, we obtained the estimated DOS, which is ρ′ and is given by

$$ρ' = \sum_{m = 1}^{M} ρ' (E_{m}) = \sum_{m = 1}^{M} \sum_{n = 1}^{N} {{X'}_{m, n} \cdot ρ_{n} (E_{m})} \tag{4}$$

where E_m is the m^th energy interval, and ρ_n is the n^th DOS level.
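
Equations (3) and (4) can be sketched as below. The function name `predict_dos`, the storage of the matrix as an array indexed `[m, n]`, the use of bin centres as the DOS levels ρ_n, and the handling of energy intervals without any positive entry are assumptions for this example.

```python
import numpy as np

def predict_dos(x_new, m_cols, n_rows, dos_range=(0.0, 3.0)):
    """Turn a predicted image vector into a probability matrix and one DOS value per energy."""
    image = np.asarray(x_new, dtype=float).reshape(m_cols, n_rows)   # I', indexed [m, n]
    positive = np.clip(image, 0.0, None)            # non-positive entries are set to zero
    col_sum = positive.sum(axis=1, keepdims=True)
    # Eq. (3): normalize over the DOS levels n at each energy interval m.
    # Assumption: intervals with no positive entry are left at zero.
    prob = np.divide(positive, col_sum, out=np.zeros_like(positive), where=col_sum > 0)
    edges = np.linspace(dos_range[0], dos_range[1], n_rows + 1)
    levels = 0.5 * (edges[:-1] + edges[1:])         # rho_n, taken as bin centres
    # Inner sum of Eq. (4): rho'(E_m) = sum_n X'_{m,n} * rho_n.
    return prob, prob @ levels

# Toy 3 x 3 image vector (hypothetical values).
x_new = [0.2, 0.6, -0.1,
         0.0, 0.0, 0.0,
         -0.3, 1.0, 1.0]
prob, rho = predict_dos(x_new, m_cols=3, n_rows=3)
```

In this toy case the three energy intervals receive the DOS values 1.25, 0.0, and 2.0, i.e., the probability-weighted averages of the level centres 0.5, 1.5, and 2.5.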

Application into binary alloy systems

To test our pattern learning (PL) method, it was first applied to a Cu-Ni system. Thermodynamically, this alloy system forms a complete solid solution, indicating that the Cu and Ni atoms in the alloys are homogeneously mixed in a face-centered cubic (fcc) structure regardless of the composition. Thus, it is expected that the DOS of the alloy system follows the intrinsic electronic structures of Cu and Ni crystals and that the composition can be a key feature for the representation of the DOS patterns. Therefore, we define the d-orbital electron occupation rate (n_d) as an alloy composition-dependent feature that represents local DOS patterns of the d-orbitals. Moreover, pristine Cu and Ni and all of their alloys have an fcc crystal structure, indicating that the effect of the atomic structure on the DOS pattern is not significant. Accordingly, to predict the DOS patterns in this system, we considered only n_d as a feature. After training the DOS patterns for various Cu-Ni compositions {Cu, Cu_0.75Ni_0.25, Cu_0.5Ni_0.5, Cu_0.25Ni_0.75, Ni}, we predicted the DOS of Cu_0.375Ni_0.625 as the test alloy (Fig. 2a) by considering three PCs. A comparison with the DFT results revealed that our method achieved a pattern similarity of 95% (σ = 0.95). Moreover, the calculation time is less than 1 minute even on 1 core of an Intel Xeon CPU, whereas the DFT method requires approximately 2 hours on 16 cores of the CPU.

Figure 2. Prediction results of the PL method in binary alloy systems. (a) DOS pattern of Cu_0.375Ni_0.625 as a test of the Cu-Ni alloy system. Its atomic structure is shown in Fig. S3. The energy range (E − E_f) is from E = −10 eV to E = 5 eV, and the DOS range is from 0.0 to ±3.0, where the positive region is for the up-spin, and the negative region is for the down-spin. Black corresponds to the DFT method, and pink corresponds to the learning method using only one feature of n_d. (b) DOS pattern of Cu_0.375Fe_0.625 as a test of the Cu-Fe alloy system. Its atomic structure is shown in Fig. S3. Black corresponds to the DFT method, pink corresponds to the learning method using the n_d feature, and violet corresponds to the learning method using all features including n_d, CN, and F_mix.

In contrast to the Cu-Ni system, the Cu-Fe system was also considered because the crystal structures of Cu and Fe are different and their alloys do not exhibit a complete solid solution. This implies that features based on the atomic structures in addition to n_d are required for the DOS representation. We introduced the coordination number (CN) and a mixing factor (F_mix) as features to distinguish the atomic structures. The CN was obtained by dividing the number of all bonds between two atoms by the total number of atoms in the system, where the bonds were calculated using the covalent atomic radii. F_mix indicates the ratio of the number of different pair bonds in the alloy system to the total number of bonds. Using F_mix, one can distinguish the atomic distributions in alloy systems that have the same CN (Supplementary Fig. S2).
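
The two structural features can be illustrated with a small sketch. Several details are assumptions rather than statements from the text: the function name `cn_and_fmix`, the bond criterion (distance below a tolerance times the sum of covalent radii), the neglect of periodic images, and the radii themselves are hypothetical. In addition, the text does not say whether a bond is counted once or once per end atom; the sketch counts each bond from both ends, so that an ideal fcc crystal would give CN = 12, in line with the normalization by 12 used later for the ternary case.

```python
from itertools import combinations
import numpy as np

def cn_and_fmix(positions, species, radii, tol=1.2):
    """Return (CN, F_mix) for a finite cluster of atoms (no periodic images)."""
    positions = np.asarray(positions, dtype=float)
    n_bonds = 0
    n_mixed = 0
    for i, j in combinations(range(len(species)), 2):
        # Assumed bond criterion based on covalent radii.
        cutoff = tol * (radii[species[i]] + radii[species[j]])
        if np.linalg.norm(positions[i] - positions[j]) <= cutoff:
            n_bonds += 1
            n_mixed += int(species[i] != species[j])
    cn = 2.0 * n_bonds / len(species)                 # each bond counted from both ends
    f_mix = n_mixed / n_bonds if n_bonds else 0.0     # unlike-pair bonds / all bonds
    return cn, f_mix

# Toy square of alternating A and B atoms (hypothetical geometry and radii).
positions = [(0, 0, 0), (2, 0, 0), (2, 2, 0), (0, 2, 0)]
species = ["A", "B", "A", "B"]
radii = {"A": 1.0, "B": 1.0}
cn, f_mix = cn_and_fmix(positions, species, radii)
```

In this toy square every atom has two neighbours of the other species, so CN = 2 and F_mix = 1; swapping two atoms so that like atoms sit side by side would keep CN unchanged but lower F_mix, which is the distinction F_mix is meant to capture.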

To represent the DOS patterns for the test data, the coefficient ${α'}_{p}$ should be determined. Since the eigenvectors obtained after PCA correspond to the PC vectors, the distributed coefficients lying on identical eigenvectors were correlated with each other. Thus, we generated linear regression lines between the α_p of the training data, focusing on the linear regression line between the two training data nearest to the test composition. Then, using the features of the training and test systems, we estimated the contributions of n_d, CN, and F_mix to ${α'}_{p}$ ( $α_{p}^{' n_{d}}, α_{p}^{' CN}, α_{p}^{' F_{mix}}$ ) for the test system using the linear regression line (Fig. 1c). We defined the set of features as Φ = {n_d, CN, F_mix}. Here, it was assumed that the three features have equal weights so that

$${α'}_{p} = \sum_{ϕ \in Φ} β_{ϕ} \cdot {α'}_{p}^{ϕ} \tag{5}$$

where β_φ is 1/3 for all the features. A detailed description of the estimation of the coefficients is also provided in Section 3 of the Supplementary Information (Table S1 and Fig. S7).
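
The equal-weight combination of Eq. (5) amounts to averaging the per-feature estimates. In the following sketch, the function name `combine_feature_estimates` and the numerical estimates are hypothetical; each per-feature estimate is assumed to have been obtained beforehand from the corresponding regression line.

```python
import numpy as np

def combine_feature_estimates(alpha_by_feature, betas=None):
    """Combine per-feature coefficient estimates with weights beta (default: equal)."""
    names = sorted(alpha_by_feature)
    if betas is None:
        betas = {name: 1.0 / len(names) for name in names}   # beta = 1/3 for three features
    return sum(betas[name] * np.asarray(alpha_by_feature[name], dtype=float)
               for name in names)

# Hypothetical per-feature estimates of alpha'_p for three PCs.
alpha_by_feature = {
    "n_d":   [3.0, 0.6, -0.3],
    "CN":    [2.4, 0.0, 0.3],
    "F_mix": [3.6, 0.3, 0.0],
}
alpha_new = combine_feature_estimates(alpha_by_feature)
```

Passing a different `betas` dictionary would allow unequal feature weights, although the text assumes equal weights.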

Using these three features (n_d, CN, and F_mix), the DOS of Cu_0.375Fe_0.625 was predicted (Fig. 2b). The use of only n_d leads to a pattern similarity of 78%, while the use of all three features improves the pattern similarity to 95%. Even in the previously examined Cu-Ni system, consideration of CN and F_mix in addition to n_d slightly improves the pattern similarity to 96% for Cu_0.375Ni_0.625 (Supplementary Fig. S8). Furthermore, we calculated DOS patterns for new test data, Cu_0.625Ni_0.375 and Cu_0.625Fe_0.375, and obtained pattern similarities of 95% and 97%, respectively (Supplementary Fig. S9).

To highlight the novelty of our PL method, we additionally calculated the DOS patterns by a linear interpolation of the DOS patterns of the two nearest neighbors without PCA. Compared to the DFT calculation, the linear interpolation method shows pattern similarities of 90% for Cu_0.375Ni_0.625 and 88% for Cu_0.375Fe_0.625 (Supplementary Fig. S10), which are lower than those obtained by our PL method.

Application into multi-component alloy systems

To extend our method to multi-component alloy systems, we also developed a method to represent the DOS patterns of ternary systems, using the example of the Cu-Ni-Pt system (Fig. 3). Figure 3a shows a triangular composition diagram of the Cu-Ni-Pt system, where a total of 15 compositions were considered as the training set: 3 pure, 9 binary, and 3 ternary. Similar to the previous binary cases, by determining the coefficients (α_p) of the PCs for a ternary test composition, one can represent its DOS pattern. First, we selected the three training compositions that were located closest to the test composition and calculated the distances (d) between the test composition and the three training compositions (Fig. 3b). Then, the ${α'}_{p}$ for the DOS representation were estimated by using the features and α_p at the training compositions, where it was assumed on physical grounds that the DOS pattern of the test composition is represented with the highest weight for the training composition that is nearest to the test composition. According to the coherent-potential approximation (CPA)^25,26, which has been extensively used in calculating electronic structures of various alloy systems, the effective or coherent potential lattice can be represented by the average behavior of the A-B binary alloy. However, the single-site nature of the CPA limits its applicability to systems with negligible short-range order and local lattice relaxation effects²⁷. When estimating the ${α'}_{p}$ for the DOS representation of the test ternary alloy on the basis of this theory, we consider the three training compositions that are nearest to the test composition, rather than the pure compositions. The atomic structures at the training compositions that are nearest to the test composition include atomic distribution information (e.g., atomic ordering or lattice relaxation) that is more similar to the test alloy than that of the pure metals or other compositions.

Figure 3. Estimation of coefficients and prediction results of the PL method in ternary alloy systems. (a) Triangular diagram of the Cu-Ni-Pt system representing the training data (circle) and test data (star). The equation for the calculation of the PC coefficients for the test data is shown at the bottom of the figure: the equation is based on the coefficients and their weights for training alloys that most closely match the test alloy composition. (b) Maps of the weights of the coefficients of the PC vectors for the test composition (Cu_0.03Ni_0.03Pt_0.94). The weights depend on the distance between the test composition and each training composition, and they also depend on the difference of three features (n_d, CN, and F_mix) between the training and test data. (c) DOS pattern of the Cu_0.03Ni_0.03Pt_0.94 test alloy. (d) DOS pattern of the Cu_0.32Ni_0.34Pt_0.34 test alloy. Their atomic structures are shown in Supplementary Fig. S3. In (c,d), black corresponds to the DFT method, and pink corresponds to the learning method using all features including n_d, CN, and F_mix.

The basic idea for an A-B-C ternary case is similar to that of the binary case. In this case, we define the set of features as Φ = {n_d,A, n_d,B, n_d,C, CN_norm, F_mix}. The number of feature values for n_d depends on the number of elements in the multi-components case. Here, we also considered the differences (d_ij) in the feature values between the test and the adjacent three training compositions (Supplementary Fig. S11) as given by:

$$d_{i j} = \sum_{ϕ \in Φ} {(ϕ^{i} - ϕ^{j})}^{2} \tag{6}$$

where i and j are the selected data of the A-B-C alloy system, and n_d,A, n_d,B, n_d,C, CN_norm, and F_mix are feature values corresponding to the data. Of the three material features (n_d, CN, and F_mix), the n_d and F_mix values range from 0 to 1, whereas CN is greater than 1. To bring all features into the same range, we considered the normalized value of CN (CN_norm), obtained by dividing the CN value by 12, which is based on the fact that the maximum CN value in the alloy system is 12 for an fcc structure. When the composition and crystal structure of a test alloy are more similar to the training data, the differences in the feature values decrease. We defined Ω as the set of the nearest three training data, ν as the test data, and ν′ as the training data. To estimate the PC coefficient ( ${α'}_{p, ν}$ ) for the test data, three weights (w) of the coefficients of the three training systems were calculated based on d_ij using Eq. (7):

$$w_{ν ν'} = \frac{d_{ν ν'}^{- 1}}{\sum_{ν' \in Ω} d_{ν ν'}^{- 1}} \tag{7}$$

The range of $w_{ν ν'}$ is from 0 to 1. Then, the estimated PC coefficients for the test alloy were calculated by

$${α'}_{p, ν} = \sum_{ν' \in Ω} w_{ν ν'} \cdot α_{p, ν'} \tag{8}$$

For the example of the Cu-Ni-Pt system shown in Fig. 3, the n_d, CN, and F_mix of the training and test data are summarized in Supplementary Table S2. This approach was tested for two compositions, Cu_0.03Ni_0.03Pt_0.94 (Fig. 3c) and Cu_0.32Ni_0.34Pt_0.34 (Fig. 3d), and we determined that our method achieves a pattern similarity of 96%.
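
The inverse-distance weighting of Eqs. (6)–(8) can be sketched as follows. The function name `ternary_coefficients` and all numerical values are hypothetical. Two simplifications are assumed: the three nearest training data are selected by the feature distance d itself rather than by a separate distance in the composition diagram, and an exact match (d = 0) simply returns the matching training coefficients.

```python
import numpy as np

def ternary_coefficients(phi_test, phi_train, alpha_train, n_nearest=3):
    """Estimate PC coefficients of a test alloy from its nearest training alloys.

    phi_* hold the feature vectors (n_d,A, n_d,B, n_d,C, CN_norm, F_mix).
    """
    phi_test = np.asarray(phi_test, dtype=float)
    phi_train = np.asarray(phi_train, dtype=float)
    alpha_train = np.asarray(alpha_train, dtype=float)
    d = ((phi_train - phi_test) ** 2).sum(axis=1)    # Eq. (6)
    nearest = np.argsort(d)[:n_nearest]              # the set Omega
    if d[nearest[0]] == 0.0:
        return alpha_train[nearest[0]]               # exact match: 1/d is undefined
    inv = 1.0 / d[nearest]
    w = inv / inv.sum()                              # Eq. (7), weights sum to 1
    return w @ alpha_train[nearest]                  # Eq. (8)

# Hypothetical features of four training alloys and one test alloy.
phi_train = [[0.1, 0.0, 0.0, 0.0, 0.0],
             [0.1, 0.1, 0.0, 0.0, 0.0],
             [0.2, 0.0, 0.0, 0.0, 0.0],
             [0.3, 0.0, 0.0, 0.0, 0.0]]
phi_test = [0.0, 0.0, 0.0, 0.0, 0.0]
alpha_train = [[7.0], [14.0], [21.0], [100.0]]       # one PC for brevity
alpha_new = ternary_coefficients(phi_test, phi_train, alpha_train)
```

Here the three nearest training alloys have d = 0.01, 0.02, and 0.04, giving weights 4/7, 2/7, and 1/7, so that the nearest training alloy contributes most, as assumed in the text.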

Using a similar procedure, our method can be readily extended to quaternary or quinary alloy systems. As an example, we considered the quinary system of Cu-Ni-Pt-Fe-Cr, which is an extension of the ternary Cu-Ni-Pt system discussed in Fig. 3. We trained the DOS patterns for the 15 ternary Cu-Ni-Pt compositions in Fig. 3 and the 3 quaternary/quinary Cu-Ni-Pt-Fe-Cr compositions {Cu_0.315Ni_0.315Pt_0.25Fe_0.12, Cu_0.315Ni_0.315Pt_0.25Cr_0.12, and Cu_0.315Ni_0.315Pt_0.25Fe_0.06Cr_0.06}, and then predicted the DOS patterns of Cu_0.315Ni_0.315Pt_0.25Fe_0.03Cr_0.09 and Cu_0.315Ni_0.315Pt_0.25Fe_0.09Cr_0.03 as test quinary systems; the atomic structures of the training and test quaternary/quinary systems are shown in Supplementary Fig. S4. Compared to the DFT results, our method shows a high pattern similarity: 97% for Cu_0.315Ni_0.315Pt_0.25Fe_0.03Cr_0.09 and 96% for Cu_0.315Ni_0.315Pt_0.25Fe_0.09Cr_0.03 (Supplementary Fig. S12). This demonstrates the high utility of our method. In particular, it can be applied not only to the field of metallic catalysis but also to that of high entropy alloys (HEAs), which usually consist of five or more metallic elements. Here, we need to compare our PL method with the CPA^25,26, which has been extensively used in calculating the electronic structures of multicomponent random solid solutions. Recently, the electronic structures of various HEAs were calculated by the exact muffin-tin orbitals (EMTO) method in combination with the CPA^27,28. The EMTO-CPA is undoubtedly an accurate and efficient method. However, the computational complexity of the EMTO-CPA scales linearly with the number of atoms in the supercell²⁹, while our PL method is independent of the number of atoms in the supercell (the details are discussed in the Discussion section). This implies that as the supercell size increases, our PL method can be more efficient than the EMTO-CPA method.

Application into surface structures

By expanding the scheme that was applied to bulk structures, we studied the representation of the DOS patterns for surface structures of alloys. In particular, we used our method to represent the DOS patterns of high-index surfaces based on those of low-index surfaces. Here, it is important to find a feature to define the surface structures, with which we can estimate the PC coefficients for a test surface. In Fig. 4a, a high-index surface, (211), can be regarded as a surface that connects two low-index surfaces, (011) and (111). Moreover, the step alignment of atoms on the (211) surface plane is generated by a combination of the atom alignments on the (011) and (111) surface planes. In this regard, we employed a lattice plane vector based on the Miller indices, which consist of three integers, h, k, and l, where the lattice plane vector is written as (hkl). Then, we defined the lattice plane that intercepts three points, ${\vec{L}}_{1}$ /h, ${\vec{L}}_{2}$ /k, and ${\vec{L}}_{3}$ /l, where ${\vec{L}}_{1}$ , ${\vec{L}}_{2}$ , and ${\vec{L}}_{3}$ are the lattice vectors in a conventional unit cell. Therefore, we considered the inverses of the Miller indices, 1/h, 1/k, and 1/l, as the features regarding the surface plane orientations, and they are denoted by h′, k′, and l′, respectively. If one of the Miller indices is zero, the feature value is set to zero to avoid an infinite value.

Figure 4. Scheme of the PL method for the DOS representation of surface structures and the predicted results. (a) Two-dimensional cleaved lattice structure to represent the plane vectors of the (011) and (111) low-index surfaces and the (211) high-index surface. Here, red, green, and violet nodes represent the atoms on the surface layer for the (211), (111), and (011) plane vectors, respectively. The dotted lines represent the periodicities of the (111) and (011) lattice vectors. (b) Three-dimensional cubic diagram of the lattice plane vectors in the coordinate system for the inverses of the Miller indices h′, k′, and l′ representing the training (black circle) and test (red star) data. (c) DOS pattern of the Cu (211) surface. (d) DOS pattern of the Cu_0.375Ni_0.625 (211) surface. In (c,d), black corresponds to the DFT method, and pink corresponds to the learning method (use of three PCs) using the inverse value of the Miller index as a feature. The atomic structures of each surface can be found in Supplementary Fig. S4.

Figure 4b shows the three-dimensional (h′, k′, l′) vector space representing the inverses of the Miller indices for the training and test surfaces. The training samples include seven lattice plane vectors for which all of the Miller indices are lower than 2: {(001), (010), (100), (011), (101), (110), (111)}. These vectors correspond to low-index surface plane vectors. Then, after adding the origin vector, (000), and connecting all vectors of the training samples and the origin, a cubic geometrical figure can be obtained. The test sample is a vector with a Miller index larger than one, which corresponds to a high-index surface plane vector. Thus, the high-index surface plane vectors can lie on an edge or face of the cube. In Fig. 4a, the alignment of atoms on the (211) surface plane is a combination of the atom alignments on the (011) and (111) surface planes. Therefore, we can estimate the DOS pattern for the (211) surface from those for the (011) and (111) surfaces. Here, for the three vectors, the Miller indices k and l are the same, indicating that they can be distinguished using only the Miller index h. During the predicting process of our method (Fig. 1c), we used only the h′ value to determine the PC coefficients, ${α'}_{p (211)}$ , by linearly interpolating between the two coefficients for the (011) and (111) surfaces after performing the PCA using all training DOS data.
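
The surface feature and the interpolation for the (211) surface can be illustrated with a short sketch. The helper names `inverse_miller` and `interpolate_surface_alpha` and the coefficient values are hypothetical; only the feature definition (inverse Miller index, with zero mapped to zero) and the interpolation along h′ follow the text.

```python
import numpy as np

def inverse_miller(hkl):
    """Map Miller indices (h, k, l) to (h', k', l'); a zero index maps to zero."""
    return tuple(0.0 if i == 0 else 1.0 / i for i in hkl)

def interpolate_surface_alpha(hkl_test, hkl_a, alpha_a, hkl_b, alpha_b, axis=0):
    """Linearly interpolate PC coefficients along one inverse-Miller coordinate."""
    f_test, f_a, f_b = (inverse_miller(v)[axis] for v in (hkl_test, hkl_a, hkl_b))
    t = (f_test - f_a) / (f_b - f_a)
    return (1.0 - t) * np.asarray(alpha_a, dtype=float) + t * np.asarray(alpha_b, dtype=float)

# Hypothetical PC coefficients of the (011) and (111) training surfaces.
alpha_011 = [2.0, -1.0, 0.4]
alpha_111 = [4.0, 1.0, 0.0]
alpha_211 = interpolate_surface_alpha((2, 1, 1), (0, 1, 1), alpha_011, (1, 1, 1), alpha_111)
```

For (211) the feature vector is (0.5, 1, 1), which lies midway along the cube edge between (011) → (0, 1, 1) and (111) → (1, 1, 1), so in this sketch the estimated coefficients are the average of the two training coefficient sets.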

To validate our method for surface structures, we tested (211) surfaces of the pure Cu metal (Fig. 4c) and the Cu_0.375Ni_0.625 alloy (Fig. 4d). For the Cu case, our method provided a pattern similarity of 93% compared to the DFT calculation (Fig. 4c), where we considered only three DOS data for the Cu(001), (011), and (111) surfaces as the training data. When predicting the DOS pattern for the (211) surface of the Cu_0.375Ni_0.625 alloy (see Supplementary Fig. S13), we considered the (001), (011), and (111) surfaces of five Cu-Ni alloys: {Cu, Cu_0.75Ni_0.25, Cu_0.5Ni_0.5, Cu_0.25Ni_0.75, Ni}. We first predicted the DOS patterns for the (001), (011), and (111) surfaces of the Cu_0.375Ni_0.625 alloy with three features (n_d, CN, and F_mix), which is similar to the method used in the bulk case. Then, we performed PCA one more time on the DOS patterns for the low-index surfaces of the Cu_0.375Ni_0.625 alloy that were obtained after the first pattern learning step. Finally, using the inverse value of the Miller index as a feature, we predicted the DOS pattern for the (211) surface of the Cu_0.375Ni_0.625 alloy and obtained a pattern similarity of 97% (Fig. 4d). In DFT calculations, slab calculations are much more time-consuming than bulk calculations. However, our method provides a similar calculation speed for both bulk and surface systems. For example, for the Cu(211) and Cu_0.375Ni_0.625(211) surfaces, the DFT calculation takes 2 hours on 36 cores of the CPU, while our method takes less than 5 minutes even on 1 core of the CPU.

Discussion

Although high-performance computing machines have been used practically thus far, we could still tackle large-scale first-principles calculations of more than hundreds of atoms using the limited computing power. Regarding the computation cost, it is well known that the DFT method scales as O(N³), where N is the number of electrons in the system³⁰. Indeed, a similar trend was observed in this work (Fig. 5a). However, our method is remarkably faster than DFT and requires only 1 minute regardless of N, although it depends on the training data size since our method needs to scan the entire training data. In addition, tight binding (TB) and density functional TB (DFTB) methods, which are approximate quantum mechanical approaches, are undoubtedly efficient methods to calculate electronic structures of materials. However, they require massive calculations for eigenvalue distributions of very large matrices^31–34. Thus, their computational complexity scales linearly with the dimension of the matrix, which depends on the number of atoms or orbitals in the alloy system³⁵. In contrast, our PL method is independent of the system size, indicating that our method can be faster than the TB or DFTB method.

Figure 5. Performance of the PL method compared with the DFT method. (a) Comparison of the calculation speeds of the learning method (cyan) and DFT (yellow) as a function of the number of electrons in the alloy systems. The learning method scales as O(1), indicating no dependence on the system size, whereas the DFT scales as O(N³). The calculation times for 1 core of CPU time and 80 alloy systems were considered. (b) The pattern similarity (σ) of the learning method for 10 test alloys in various binary alloy systems: Cu-Ni, Cu-Ru, Cu-Pd, Cu-Pt, Ni-Ru, Ni-Pd, Ni-Pt, Ru-Pd, Ru-Pt, and Pd-Pt. The yellow region is highlighted to show the σ range of our learning model (91–98%).

Moreover, Fig. 5b shows the accuracy of our method for various binary alloy systems composed of 5 transition metals (Cu, Ni, Ru, Pd, Pt); the pattern similarity is as high as 91–98%. Our method outperforms DFT calculations in terms of the calculation speed, and it loses little information compared to the DFT electronic structures. This clearly indicates that our method can be an alternative that breaks the trade-off between accuracy and speed, which is well known in the field of electronic structure calculations. Compared to the TB or DFTB method, our PL method has another advantage. Such TB approaches basically need a large number of training DFT data to determine many TB parameters. For example, to accurately calculate the electronic structure of bulk Rh, 29 TB parameters fitted to ~7,400 training data were required³⁶. In contrast, our PL method uses only limited training data. As an example, the DOS (96% similarity) of an arbitrary Cu-Ni binary alloy can be obtained with only 5 training data (Supplementary Fig. S8).

One of the novelties of this work is that the electronic DOS, which was originally a function of the energy level, can be expressed with a simple model as a linear combination of a few (three or four) PC bases. Here, we highlight the use of only three or four PCs. Although the available number of PCs in a learning process of DOS patterns is as large as the dimension of the DOS image vectors, only a few PCs contribute strongly to the representation of the DOS, where the contribution can be evaluated with the eigenvalue of each PC. Thus, four PCs are enough to capture the diversity of the DOS patterns in the training data. For example, in the binary alloy systems of Fig. 2 where five training data are considered, the contribution of each PC to the representation of the DOS pattern is 32.9% for the 1st PC, 25.7% for the 2nd PC, 22.2% for the 3rd PC, and 19.1% for the 4th PC, which indicates that the contribution of the remaining PCs is minuscule (less than 0.1%). This clearly shows that only four PCs in the PCA are sufficient to represent DOS patterns.
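
The contribution of each PC can be evaluated as the ratio of its eigenvalue to the sum of all eigenvalues. The sketch below, with the hypothetical helper `explained_variance_ratio` and random toy data, also illustrates a general property of PCA that is consistent with the numbers above: for five training vectors, the centered matrix Y has rank at most four, so at most four eigenvalues can be non-zero. The eigenvalues are taken from the small 5 × 5 matrix YY^T, which shares its non-zero eigenvalues with S = Y^TY.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of the total variance carried by each PC, largest first."""
    Y = X - X.mean(axis=0)
    # Y Y^T (n_samples x n_samples) has the same non-zero eigenvalues as Y^T Y.
    eigvals = np.linalg.eigvalsh(Y @ Y.T)[::-1]
    eigvals = np.clip(eigvals, 0.0, None)      # remove tiny negative round-off
    return eigvals / eigvals.sum()

# Five toy training image vectors (hypothetical values, not DOS data).
rng = np.random.default_rng(0)
X = rng.random((5, 50))
ratio = explained_variance_ratio(X)
```

In this toy case the first four ratios sum to one within round-off and the fifth is numerically zero, whatever the dimension of the image vectors.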

The performance (calculation speed and accuracy) of our PL method is affected by the number of PCs and the grid size. Interestingly, due to the overfitting problem³⁷, the use of three PCs provides the most accurate DOS patterns in Cu-Ni alloy systems (Supplementary Fig. S14), although at least four PCs are required to fully represent the DOS patterns in the training data, as discussed in the above paragraph. The grid size also affects the pattern similarity (accuracy) and calculation time of our method (Supplementary Fig. S14). The use of a finer grid provides a higher accuracy, although the improvement in the pattern similarity for a grid denser than a 100 × 100 grid (M = 100) is not significant. However, for M > 100, the calculation time becomes significantly longer. Thus, we should employ appropriate values for the number of PCs and the grid size to guarantee a high pattern similarity (>90%) and a low calculation time (<1 min).

To our knowledge, this work presents the first machine-learning approach for calculating electronic DOS patterns (both value and shape) with high accuracy and speed. Moreover, our approach can handle a variety of spectrum image data of materials (X-ray photoelectron spectroscopy, X-ray diffraction, Raman spectra, etc.). Toward an era of data-driven material design, the importance of material databases will continue to increase; however, the accumulation of data will be a serious bottleneck. In this regard, the fast generation of material databases will be key in the future. Applying our PCA-based method to various image-type data will provide rapid and accurate predictions of various material properties in place of DFT calculations or experimental measurements. Therefore, we anticipate that our model will accelerate the construction of large-scale material databases as well as the design of materials in various fields such as catalysts and electronic devices^38,39.

Methods

Data selection

To represent the DOS pattern of a test alloy system with the learning model developed in this work, relevant training data are required. The data include alloy compositions, crystal structures, and DOS patterns. In general, more training data provide a more accurate representation. However, since the main purpose of this work is to introduce a new scheme for obtaining DOS patterns by a machine-learning method, we used a limited data set. In a binary A-B system, we considered five data sets (two pure structures and three alloy structures). For the pure cases, we used the experimental crystal structure. For the alloy systems, the A_0.25B_0.75, A_0.5B_0.5, and A_0.75B_0.25 compositions were considered, and the crystal structures of these three compositions were determined from the thermodynamic phase diagram of the alloy system. If an intermetallic phase exists at the alloy composition, we preferentially considered the intermetallic crystal structure (e.g., L1₀ for Pt_0.5Ni_0.5, and L1₂ for Pt_0.25Ni_0.75 and Pt_0.75Ni_0.25). For the cases where no intermetallic phase exists, randomly mixed structures (i.e., solid-solution phases) were built on the crystal structures of pure A and pure B, and of these two structures we selected the more stable one as determined by the DFT calculations. The compositions and atomic structures used as training and test data in the bulk and slab studies are described in Supplementary Figs S3 and S4, respectively. In the slab studies, the composition of the surface layers was taken to be the same as A_xB_1−x. The DOS patterns of the training structures were also obtained from the DFT calculations.

DFT calculation of electronic structures

All electronic structure calculations were performed using the Vienna ab initio simulation package (VASP)^40,41. The exchange-correlation energy was described by the revised Perdew-Burke-Ernzerhof (RPBE) exchange functional^42,43. The electronic wave functions were expanded in a plane-wave basis set with a kinetic energy cutoff of 520 eV. The effect of the core electrons was modeled by projector augmented-wave (PAW) potentials⁴⁴. The Brillouin zone was sampled using a Monkhorst-Pack k-point mesh of 8 × 8 × 8 for bulk structures and 4 × 4 × 1 for slab structures. The bulk crystal structures were modeled using a 2 × 2 × 2 supercell (e.g., fcc: 32 atoms and bcc: 16 atoms), and the slab structures were modeled as periodic four-layer cells. In the slab calculations, a vacuum spacing larger than 15 Å was used to prevent inter-slab interactions, and the topmost surface layer and the sub-surface layer of the computational cell were geometrically relaxed until the maximum force on each atom was less than 0.05 eV Å⁻¹. The DOS patterns were obtained after the geometry optimization. For simplicity, we focused on the local DOS of the d orbitals in the metals, and every DOS was normalized by the number of atoms in the periodic system and described as DOS = f (E − E_f), where E − E_f is the energy relative to the Fermi level (E_f). In addition, spin polarization was turned on in the DFT calculations to account for the magnetic properties of the metals. In representing the DOS patterns via our learning model, we applied the model separately to the up spin and the down spin.
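The post-processing described above (shifting the energy axis to the Fermi level and normalizing by the number of atoms) can be sketched as follows. This is an illustrative sketch only: the function name `dos_on_grid`, the energy window `e_min`/`e_max`, the number of grid points `m`, and the synthetic Gaussian DOS are all assumptions and do not come from this work. For a spin-polarized calculation, the same function would be called once per spin channel.

```python
import numpy as np

def dos_on_grid(energies, dos, e_fermi, n_atoms, e_min, e_max, m):
    """Shift energies to the Fermi level, normalize per atom, and resample.

    e_min, e_max (eV, relative to the Fermi level) and m define a
    hypothetical uniform energy window; they are illustrative choices.
    """
    rel_energy = np.asarray(energies, dtype=float) - e_fermi  # E - E_f
    dos_per_atom = np.asarray(dos, dtype=float) / n_atoms
    grid = np.linspace(e_min, e_max, m)
    # np.interp needs increasing sample points, which holds for a sorted
    # energy axis; values outside the sampled range are set to zero.
    rho = np.interp(grid, rel_energy, dos_per_atom, left=0.0, right=0.0)
    return grid, rho

# Synthetic d-band-like peak for a 32-atom cell (placeholder numbers).
energies = np.linspace(-5.0, 15.0, 201)
dos = 32.0 * np.exp(-(energies - 4.0) ** 2)
grid, rho = dos_on_grid(energies, dos, e_fermi=5.0, n_atoms=32,
                        e_min=-10.0, e_max=5.0, m=100)
```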

Pattern similarity calculation

The pattern similarity of our learning model was calculated through a comparison with the DFT results, in which the l²-norm was used. The pattern similarity σ is defined as follows:

σ = 1 - \frac{\sqrt{\sum_{m = 1}^{M} {| ρ' (E_{m}) - ρ (E_{m}) |}^{2}}}{\sqrt{\sum_{m = 1}^{M} {| ρ (E_{m}) |}^{2}}}

where ρ′ and ρ are the DOS patterns obtained by our learning method and calculated by the DFT method, respectively, and the sums run over the M energy grid points E_m. The closer σ is to 1, the more accurate our method is.
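The definition of σ translates directly into a few lines of code. The sketch below assumes both patterns are sampled on the same M energy grid points; the function name `pattern_similarity` and the test arrays are hypothetical.

```python
import numpy as np

def pattern_similarity(rho_pred, rho_dft):
    """Pattern similarity: 1 - ||rho_pred - rho_dft||_2 / ||rho_dft||_2."""
    rho_pred = np.asarray(rho_pred, dtype=float)
    rho_dft = np.asarray(rho_dft, dtype=float)
    if rho_pred.shape != rho_dft.shape:
        raise ValueError("both patterns must be sampled on the same grid")
    reference_norm = np.linalg.norm(rho_dft)
    if reference_norm == 0.0:
        raise ValueError("reference DOS is identically zero")
    return 1.0 - np.linalg.norm(rho_pred - rho_dft) / reference_norm

rho_dft = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
sigma_exact = pattern_similarity(rho_dft, rho_dft)         # identical patterns
sigma_scaled = pattern_similarity(0.9 * rho_dft, rho_dft)  # uniform 10% error
```

Note that σ is not bounded below by zero: if the error norm exceeds the norm of the reference pattern, σ becomes negative.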

Supplementary information

Supplementary Information (PDF, 931.7 KB)

Acknowledgements

This work was supported by the Creative Materials Discovery Program through the National Research Foundation of Korea (NRF-2016M3D1A1021140). We acknowledge the financial support of the Korea Institute of Science and Technology (Grant No. 2E28000). This work was also supported by the Samsung Research Funding & Incubation Center of Samsung Electronics under Project Number SRFC-MA1801-03.

Author Contributions

B.C.Y. and S.S.H. conceived and designed the research. D.K. and C.K. provided theoretical support. B.C.Y. performed the research. B.C.Y., D.K., C.K., and S.S.H. analyzed the data. B.C.Y. and S.S.H. wrote the manuscript with feedback from all authors. S.S.H. managed the project.

Data Availability

The data that support the plots within this paper and other findings of this study are available from the corresponding author on request.

Competing Interests

The authors declare no competing interests.

Footnotes

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information accompanies this paper at 10.1038/s41598-019-42277-9.

References

1. Martin RM. Electronic Structure: Basic Theory and Practical Methods (Cambridge Univ. Press, 2004).
2. Nørskov JK, Bligaard T, Rossmeisl J, Christensen CH. Towards the computational design of solid catalysts. Nat. Chem. 2009;1:37–46. doi: 10.1038/nchem.121.
3. Seo D, Shin H, Kang K, Kim H, Han SS. First-principles design of hydrogen dissociation catalysts based on isoelectronic metal solid solutions. J. Phys. Chem. Lett. 2014;5:1819–1824. doi: 10.1021/jz500496e.
4. Ma X, Li Z, Achenie LEK, Xin H. Machine-learning-augmented chemisorption model for CO2 electroreduction catalyst screening. J. Phys. Chem. Lett. 2015;6:3528–3533. doi: 10.1021/acs.jpclett.5b01660.
5. Ratcliff LE, et al. Challenges in large scale quantum mechanical calculations. WIREs Comput. Mol. Sci. 2017;7:1–24. doi: 10.1002/wcms.1290.
6. Galli G. Quantum molecular dynamics simulations. Curr. Opin. Solid State Mater. Sci. 1996;1:864–874. doi: 10.1016/S1359-0286(96)80114-8.
7. Saad Y, Chelikowsky JR, Shontz SM. Numerical methods for electronic structure calculations of materials. SIAM Rev. 2010;52:3–54. doi: 10.1137/060651653.
8. Goedecker S. Linear scaling electronic structure methods. Rev. Mod. Phys. 1999;71:1085–1123. doi: 10.1103/RevModPhys.71.1085.
9. Ordejón P. Order-N tight-binding methods for electronic-structure and molecular dynamics. Comput. Mater. Sci. 1998;12:157–191. doi: 10.1016/S0927-0256(98)00027-5.
10. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–444. doi: 10.1038/nature14539.
11. Jordan MI, Mitchell TM. Machine learning: Trends, perspectives, and prospects. Science. 2015;349:255–260. doi: 10.1126/science.aaa8415.
12. Carleo G, Troyer M. Solving the quantum many-body problem with artificial neural networks. Science. 2017;355:602–606. doi: 10.1126/science.aag2302.
13. Biamonte J, et al. Quantum machine learning. Nature. 2017;549:195–202. doi: 10.1038/nature23474.
14. Carrasquilla J, Melko RG. Machine learning phases of matter. Nat. Phys. 2017;13:431–434. doi: 10.1038/nphys4035.
15. van Nieuwenburg EPL, Liu Y, Huber SD. Learning phase transitions by confusion. Nat. Phys. 2017;13:435–440. doi: 10.1038/nphys4037.
16. Snyder JC, Rupp M, Hansen K, Müller K-R, Burke K. Finding density functionals with machine learning. Phys. Rev. Lett. 2012;108:253002. doi: 10.1103/PhysRevLett.108.253002.
17. Ghiringhelli LM, Vybiral J, Levchenko SV, Draxl C, Scheffler M. Big data of materials science: Critical role of the descriptor. Phys. Rev. Lett. 2015;114:105503. doi: 10.1103/PhysRevLett.114.105503.
18. Brockherde F, et al. Bypassing the Kohn-Sham equations with machine learning. Nat. Commun. 2017;8:872. doi: 10.1038/s41467-017-00839-3.
19. Arsenault L, Lopez-Bezanilla A, Millis AJ. Machine learning for many-body physics: The case of the Anderson impurity model. Phys. Rev. B. 2014;90:155136. doi: 10.1103/PhysRevB.90.155136.
20. Schütt KT, Glawe H, Brockherde F, Sanna A, Gross EKU. How to represent crystal structures for machine learning: Towards fast prediction of electronic properties. Phys. Rev. B. 2014;89:205118. doi: 10.1103/PhysRevB.89.205118.
21. Takigawa I, Shimizu K, Tsuda K, Takakusagi S. Machine-learning prediction of the d-band center for metals and bimetals. RSC Adv. 2016;6:52587–52595. doi: 10.1039/C6RA04345C.
22. Mueller T, Kusne AG, Ramprasad R. Machine learning in materials science: Recent progress and emerging applications. Rev. Comput. Chem. 2016;29:186–273.
23. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning 534–552 (Springer, 2009).
24. Ramani V, et al. Massively multiplex single-cell Hi-C. Nat. Meth. 2017;14:263–266. doi: 10.1038/nmeth.4155.
25. Soven P. Coherent-potential model of substitutional disordered alloys. Phys. Rev. 1967;156:809–813. doi: 10.1103/PhysRev.156.809.
26. Gyorffy BL. Coherent-potential approximation for a nonoverlapping-muffin-tin potential model of random substitutional alloys. Phys. Rev. B. 1972;5:2382–2384. doi: 10.1103/PhysRevB.5.2382.
27. Tian F, Varga LK, Chen N, Delczeg L, Vitos L. Ab initio investigation of high-entropy alloys of 3d elements. Phys. Rev. B. 2013;87:075144. doi: 10.1103/PhysRevB.87.075144.
28. Tian F, Varga LK, Chen N, Shen J, Vitos L. Ab initio design of elastically isotropic TiZrNbMoVx high-entropy alloys. J. Alloys Compd. 2014;599:19–25. doi: 10.1016/j.jallcom.2014.01.237.
29. Peil OE, Ruban AV, Johansson B. Self-consistent supercell approach to alloys with local environment effects. Phys. Rev. B. 2012;85:165140. doi: 10.1103/PhysRevB.85.165140.
30. Whitfield JD, Love PJ, Aspuru-Guzik A. Computational complexity in electronic structure. Phys. Chem. Chem. Phys. 2013;15:397–411. doi: 10.1039/C2CP42695A.
31. Cleri F, Rosato V. Tight-binding potentials for transition metals and alloys. Phys. Rev. B. 1993;48:22–33. doi: 10.1103/PhysRevB.48.22.
32. Usman M, Broderick CA, Lindsay A, O’Reilly EP. Tight-binding analysis of the electronic structure of dilute bismide alloys of GaP and GaAs. Phys. Rev. B. 2011;84:245202. doi: 10.1103/PhysRevB.84.245202.
33. Mukherjee S, Morán-López JL, Kumar V, Bennemann KH. Electronic theory for surface segregation in CuxNi1−x alloy. Phys. Rev. B. 1982;25:730–737. doi: 10.1103/PhysRevB.25.730.
34. Wahiduzzaman M, et al. DFTB parameters for the periodic table: Part 1, electronic structure. J. Chem. Theory Comput. 2013;9:4006–4017. doi: 10.1021/ct4004959.
35. Hams A, De Raedt H. Fast algorithm for finding the eigenvalue distribution of very large matrices. Phys. Rev. E. 2000;62:4365–4377. doi: 10.1103/PhysRevE.62.4365.
36. Barreteau C, Spanjaard D. Electronic structure and total energy of transition metals from an spd tight-binding method: Application to surfaces and clusters of Rh. Phys. Rev. B. 1998;58:9721–9731. doi: 10.1103/PhysRevB.58.9721.
37. Domingos P. A few useful things to know about machine learning. Commun. ACM. 2012;55:78–87. doi: 10.1145/2347736.2347755.
38. Kolb B, Lentz LC, Kolpak AM. Discovering charge density functionals and structure-property relationships with PROPhet: A general framework for coupling machine learning and first-principles methods. Sci. Rep. 2017;7:1–9. doi: 10.1038/s41598-017-01251-z.
39. Hill J, et al. Materials science with large-scale data and informatics: Unlocking new opportunities. MRS Bulletin. 2016;41:399–409. doi: 10.1557/mrs.2016.93.
40. Kresse G, Joubert D. From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B. 1999;59:1758–1775. doi: 10.1103/PhysRevB.59.1758.
41. Kresse G, Furthmüller J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 1996;6:15–50. doi: 10.1016/0927-0256(96)00008-0.
42. Perdew JP, Burke K, Ernzerhof M. Generalized gradient approximation made simple. Phys. Rev. Lett. 1996;77:3865–3868. doi: 10.1103/PhysRevLett.77.3865.
43. Hammer B, Hansen LB, Nørskov JK. Improved adsorption energetics within density-functional theory using revised Perdew-Burke-Ernzerhof functionals. Phys. Rev. B. 1999;59:7413–7421. doi: 10.1103/PhysRevB.59.7413.
44. Blöchl PE. Projector augmented-wave method. Phys. Rev. B. 1994;50:17953–17979. doi: 10.1103/PhysRevB.50.17953.
