MethodsX. 2020 Nov 5;7:101127. doi: 10.1016/j.mex.2020.101127

Spectroscopy learning: A machine learning method for study diatomic vibrational spectra including dissociation behavior

Shanshan Long a, Jia Fu a, Jun Jian a, Zhixiang Fan a, Qunchao Fan a, Feng Xie b, Yi Zhang c, Jie Ma d
PMCID: PMC7680772  PMID: 33251122

Abstract

Molecular spectroscopy plays an important role in the study of physical and chemical phenomena at the atomic level. However, accurate vibrational spectra are difficult to obtain directly from theory or experiment, especially for vibrational levels near the dissociation energy. In our previous study (the Variational Algebraic Method), the dissociation energy and low-lying energy levels were used to predict the ro-vibrational spectra of some diatomic systems. In this work:

1) We extend that method to a more rigorous machine learning approach that combines model-driven and data-driven elements (the Spectroscopy Learning Method).

2) Information extracted from a wide range of existing data, such as heat capacity, can be used.

3) Reliable vibrational spectra and dissociation energies are predicted with the help of heat capacity, and the reliability of the method is verified on the ground states of CO and Br2.

Keywords: Spectral prediction, Dissociation energy, Machine learning

Graphical abstract



Specifications Table
Subject area: Physics and Astronomy
More specific subject area: Spectroscopy
Method name: Spectroscopy Learning Method (SLM)
Name and reference of original method: Variational Algebraic Method (VAM); Y. Zhang, W. Sun, J. Fu, Q. Fan, J. Ma, L. Xiao, S. Jia, H. Feng, H. Li, A Variational Algebraic Method used to study the full vibrational spectra and dissociation energies of some specific diatomic systems, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 117 (2014) 442–448.
Resource availability

Method details

For diatomic molecules, spectral lines in the short-range region can be measured experimentally, whereas lines in the mid- to long-range region and the dissociation energy are difficult to measure. Spectroscopic parameters of the CO molecule are much needed in astrochemistry research [1], and the Br2 molecule remains of interest for industrial applications [2, 3]. These two species are therefore promising candidates for studying vibrational spectra. In recent years, machine learning has been used to construct reliable high-dimensional functions from data [4] and has performed well on problems in quantum mechanics and statistical mechanics [5], [6], [7]. In this paper, model-driven and data-driven methods are combined to predict accurate vibrational spectra of diatomic molecules, including the dissociation energy, from limited experimental information such as energy levels and heat capacity.

The method has three main parts.

1) Turning the spectroscopy problem into a machine learning optimization problem. First, a reasonable parametric model that can describe all details of the vibrational spectrum is constructed to avoid underfitting (Section 1). Second, a machine learning strategy is used to address overfitting (Section 2).

2) Introducing two main approaches to control overfitting. First, the shape and size of the parameter space are limited (Section 3). Second, test data containing information of a different kind (such as heat capacity) are used to verify the predictive power of the selected model (Section 4).

3) Using a greedy algorithm to search for the optimal model in the parameter space (Section 5).

1. Model analysis from and beyond quantum models

Within the Born–Oppenheimer approximation (BOA), the electronic Schrödinger equation of a diatomic molecule is

$$\hat{H}\psi(r,R) = E\psi(r,R) \tag{1}$$

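where $\hat{H}$ is the Hamiltonian of the N-electron system, E is the electronic eigenvalue and ψ(r,R) is the total wave function of the system. The Hamiltonian is

$$\hat{H} = \hat{T} + \hat{V} = -\frac{\hbar^2}{2}\sum_{\alpha}\frac{1}{m_\alpha}\nabla_\alpha^2 - \frac{\hbar^2}{2m_e}\sum_{i}\nabla_i^2 + \sum_{\alpha}\sum_{\beta<\alpha}\frac{Z_\alpha Z_\beta e^2}{r_{\alpha\beta}} - \sum_{\alpha}\sum_{i}\frac{Z_\alpha e^2}{r_{i\alpha}} + \sum_{j}\sum_{i>j}\frac{e^2}{r_{ij}} \tag{2}$$

in which $Z_\alpha$ and $Z_\beta$ are the charge numbers of nuclei α and β, $m_\alpha$ and $m_e$ are the nuclear and electron masses, and $r_{\alpha\beta}$, $r_{i\alpha}$ and $r_{ij}$ are the nucleus–nucleus, electron–nucleus and electron–electron distances.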

The vibrational energies and wave functions are then obtained from the radial nuclear Schrödinger equation

$$\left\{-\frac{\hbar^2}{2\mu}\frac{d^2}{dr^2} + V(r) + \frac{[J(J+1)-\Lambda^2]\hbar^2}{2\mu r^2}\right\}\varphi(r) = E_{vJ}\,\varphi(r) \tag{3}$$

where J, Λ and v are the total angular momentum quantum number, the absolute value of the projection of the electronic orbital angular momentum onto the internuclear axis (which labels the electronic state), and the vibrational quantum number, respectively; μ is the reduced mass and $E_{vJ}$ is the rovibrational energy. The potential energy function V(r) can be expanded to eighth order about the equilibrium position:

$$V(r) = \sum_{n=2}^{n_{\max}=8}\frac{1}{n!}f_n(r-r_e)^n = \sum_{n=2}^{8}\frac{1}{n!}f_n x^n \tag{4}$$

where the constant and first-order terms are set to 0 by the choice of origin, $x = r - r_e$, and $f_n = V^{(n)}(r_e) = \left(\frac{d^nV}{dr^n}\right)_{r=r_e}$ is the n-th order force constant.
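As a small illustration of Eq. (4), the truncated Taylor potential can be evaluated directly from a set of force constants. This is a minimal Python sketch; the function name `taylor_potential` and the example force constants are hypothetical, and the caller is assumed to supply $f_n$ and x in consistent units.

```python
from math import factorial


def taylor_potential(x, force_constants):
    """Eq. (4): V(x) = sum_n f_n * x**n / n!, with x = r - r_e.

    force_constants maps the order n (2 <= n <= 8) to f_n = V^(n)(r_e);
    the constant and linear terms are absent, as in Eq. (4).
    """
    for n in force_constants:
        if not 2 <= n <= 8:
            raise ValueError(f"order {n} is outside the range 2..8 of Eq. (4)")
    return sum(f_n * x**n / factorial(n) for n, f_n in force_constants.items())


# Hypothetical force constants: a harmonic term plus a cubic correction.
print(taylor_potential(0.1, {2: 1900.0, 3: -5000.0}))
```

With only $f_2$ present, the sketch reduces to the harmonic potential $f_2x^2/2$, which is the $\hat{H}_0$ part used in the perturbation treatment that follows.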

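Applying second-order perturbation theory, the Hamiltonian in Eq. (3) is split as

$$\hat{H} = \hat{H}_0 + \hat{H}' \tag{5}$$

where

$$\hat{H}_0 = -\frac{\hbar^2}{2\mu}\frac{d^2}{dx^2} + \frac{1}{2}f_2x^2, \qquad \hat{H}' = \frac{1}{6}f_3x^3 + \cdots + \frac{1}{40320}f_8x^8 + \frac{[J(J+1)-\Lambda^2]\hbar^2}{2\mu r^2},$$

and the final level energy is

$$E_{vJ} = E_v^{(0)} + H'_{vv} + \sum_{m\neq v}\frac{|H'_{vm}|^2}{E_v^{(0)} - E_m^{(0)}} \tag{6}$$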
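Once the unperturbed harmonic energies $E_m^{(0)}$ and the matrix elements $H'_{vm}$ are known in a truncated basis, Eq. (6) is a short sum. The Python sketch below assumes those matrix elements have already been computed (they are inputs here, not derived from Eq. (5)) and that the unperturbed levels are non-degenerate, as harmonic-oscillator levels are; `second_order_energy` is a hypothetical name.

```python
def second_order_energy(v, e0, h_prime):
    """Eq. (6): E_v ~ E_v^(0) + H'_vv + sum_{m != v} |H'_vm|^2 / (E_v^(0) - E_m^(0)).

    e0      -- unperturbed (harmonic) energies E_m^(0), m = 0..N-1
    h_prime -- N x N matrix of H' in the unperturbed basis, truncated to N states
    Assumes the unperturbed levels are non-degenerate.
    """
    correction = sum(
        abs(h_prime[v][m]) ** 2 / (e0[v] - e0[m])
        for m in range(len(e0))
        if m != v
    )
    return e0[v] + h_prime[v][v] + correction


# Two-level toy check against exact diagonalization of [[0, 1], [1, 10]].
e_pt = second_order_energy(0, [0.0, 10.0], [[0.0, 1.0], [1.0, 0.0]])
e_exact = 5.0 - (25.0 + 1.0) ** 0.5
print(e_pt, e_exact)
```

The toy check shows the expected behavior: the second-order estimate (−0.1) is close to the exact lowest eigenvalue (about −0.099) when the coupling is small compared with the level gap.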

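Neglecting the rotational part, the vibrational energies of the diatomic molecule are

$$E_v = \omega_0 + (\omega_e + \omega_{e0})\left(v+\tfrac{1}{2}\right) - \omega_e x_e\left(v+\tfrac{1}{2}\right)^2 + \omega_e y_e\left(v+\tfrac{1}{2}\right)^3 + \omega_e z_e\left(v+\tfrac{1}{2}\right)^4 + \cdots \tag{7}$$

where $\omega_0, \omega_{e0}, \omega_e x_e, \omega_e y_e, \omega_e z_e, \ldots$ are the spectroscopic coefficients.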
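Eq. (7) is a power series in (v + 1/2), so a spectrum can be generated from any coefficient list. In this minimal Python sketch (hypothetical names and constants) the minus sign of the $\omega_e x_e$ term is stored in the coefficient itself, which is the convention the matrix form of Section 3 also needs.

```python
def vibrational_energy(v, coeffs):
    """Eq. (7) as a power series in (v + 1/2).

    coeffs = [w0, we + we0, -wexe, weye, weze, ...]: the minus sign of the
    anharmonic term is stored in the coefficient itself.
    """
    h = v + 0.5
    return sum(c * h**k for k, c in enumerate(coeffs))


def spacings(coeffs, v_max):
    """Spacings E_{v+1} - E_v for v = 0..v_max-1."""
    e = [vibrational_energy(v, coeffs) for v in range(v_max + 1)]
    return [upper - lower for lower, upper in zip(e, e[1:])]


# Hypothetical constants in cm^-1 (harmonic term plus one anharmonic term).
toy = [0.0, 2000.0, -10.0]
print(vibrational_energy(0, toy), spacings(toy, 3))
```

For these toy constants the spacings shrink linearly (1980, 1960, 1940 cm−1); the higher-order coefficients are what let the series bend towards the dissociation limit.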

The dissociation energy is a function only of the last three vibrational energies [8]:

$$D_e^{cal} \approx E_{v_{\max}} + \frac{\Delta E_{v_{\max},v_{\max}-1}^{2}}{\Delta E_{v_{\max},v_{\max}-2} - \Delta E_{v_{\max},v_{\max}-1}} \tag{8}$$
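Eq. (8) can be checked numerically. The Python sketch below assumes the notation $\Delta E_{a,b} = E_a - E_b$ (our reading; the text does not define it explicitly), so the denominator equals $E_{v_{\max}-1} - E_{v_{\max}-2}$. With this reading, the last three calculated CO levels of Table 2 (v = 75, 76, 77) reproduce the tabulated $D_e^{cal}$ = 90,679.099 cm−1 to about 0.001 cm−1; `dissociation_energy` is a hypothetical name.

```python
def dissociation_energy(levels):
    """Eq. (8): D_e from the last three levels E_{vmax-2}, E_{vmax-1}, E_{vmax}.

    Assumes Delta E_{a,b} = E_a - E_b, so the denominator
    Delta E_{vmax,vmax-2} - Delta E_{vmax,vmax-1} equals E_{vmax-1} - E_{vmax-2}.
    """
    e_vm2, e_vm1, e_vm = levels[-3:]
    d_last = e_vm - e_vm1    # Delta E_{vmax, vmax-1}
    d_two = e_vm - e_vm2     # Delta E_{vmax, vmax-2}
    return e_vm + d_last**2 / (d_two - d_last)


# Last three calculated CO levels from Table 2 (v = 75, 76, 77), in cm^-1.
print(dissociation_energy([90609.901, 90667.917, 90677.513]))
```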

Eqs. (6) and (7) show that the perturbative result has the same form as Herzberg's energy expansion [9] and the Dunham expansion [10] derived with WKB theory. Compare with the Taylor expansion

$$f(x) = \frac{f(x_0)}{0!} + \frac{f'(x_0)}{1!}(x-x_0) + \frac{f''(x_0)}{2!}(x-x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x-x_0)^n + R_n(x) \tag{9}$$

That is, E(v) can be expanded as a power series about v = −1/2, i.e., in powers of (v + 1/2). Any complex physical effect is reflected in the expansion coefficients, also known as molecular constants [11], which are usually obtained by least-squares fitting of experimental data. For the energy spectra of diatomic molecules, many expansion terms are needed in the long-range region, so more spectroscopic constants and polynomial terms are required. The problem is that, as the number of polynomial terms increases, the fitting flexibility grows rapidly and masks the physical effects of other factors, resulting in overfitting. Conversely, if the relevant high-order constants and uncontrollable errors are discarded, the quality of the fit drops and underfitting appears. Therefore, the least-squares method is not applicable here [8], and a new method is needed to study molecular vibration in the long-range region.
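The overfitting side of this argument can be made concrete with a toy example. The Python sketch below is not the authors' method: it takes a hypothetical two-term "true" spectrum of the form of Eq. (7), adds an assumed alternating ±0.5 cm−1 measurement error to six low levels, and passes an exact six-parameter polynomial through them (Lagrange interpolation). The training residual is zero, yet the predicted level at v = 40 is off by millions of cm−1.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through the n points (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def true_energy(v):
    # Hypothetical "true" spectrum: a two-term form of Eq. (7), in cm^-1.
    h = v + 0.5
    return 2000.0 * h - 10.0 * h**2


# Six low levels, each carrying a hypothetical +/-0.5 cm^-1 measurement error.
train_v = list(range(6))
xs = [v + 0.5 for v in train_v]
ys = [true_energy(v) + 0.5 * (-1) ** v for v in train_v]

# Six parameters for six levels: the training residual is exactly zero ...
train_err = max(abs(lagrange_eval(xs, ys, x) - y) for x, y in zip(xs, ys))
# ... but the prediction far from the data is dominated by the fitted noise.
v_far = 40
pred_err = abs(lagrange_eval(xs, ys, v_far + 0.5) - true_energy(v_far))
print(train_err, pred_err)
```

A perfect fit to the training levels therefore says nothing about the long-range levels, which is why the following sections rely on held-out data and a constrained parameter space.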

2. A data-driven approach based on machine learning

Machine learning uses existing data (experience) to determine a model (its parameters) that can predict unknown data. Very complex functions can be introduced, such as deep neural networks (DNN), recurrent neural networks (RNN) and convolutional neural networks (CNN), which serve general purposes such as image classification. Their working principle is similar to that of the neural network shown in Fig. 1.

Fig. 1.

A typical artificial neural network that builds a relationship between X and Y.

In general, machine learning means using a data set carefully to determine the right mapping X → Y. To avoid underfitting, the model must have enough parameters to represent the underlying relationship. To control overfitting, the data are divided into a training set and a test set: the training set is used for learning (determining the parameters), and the test set is used to verify the learning results. Overfitting can be further controlled by regularization methods that provide a sufficient but limited parameter space for an effective model search [4].

Similarly, Eq. (7) has few parameters and high flexibility, and information related to diatomic vibrational behavior (such as low-lying levels, dissociation energy and heat capacity) can be used as training and test sets. As mentioned above, however, overfitting remains the core challenge.

3. Limiting the shape and range of the parameter space

For a diatomic system, Eq. (7) can be written in matrix form:

$$AX = E \tag{10}$$

where

$$A_{vk} = \left(v+\tfrac{1}{2}\right)^k,\quad X = \begin{pmatrix}\omega_0\\ \omega_e\\ \omega_e x_e\\ \vdots\end{pmatrix},\quad E = \begin{pmatrix}E_{v_1}+\delta E_{v_1}\\ E_{v_2}+\delta E_{v_2}\\ E_{v_3}+\delta E_{v_3}\\ \vdots\end{pmatrix} \tag{11}$$

with k = 0, 1, 2, … and the signs of Eq. (7) absorbed into the entries of X. X is the vector of spectroscopic constants, the parameters to be determined in the spectroscopy learning process. Each "real" energy level in E is split into two parts: $E_{v_i}$ is the experimentally measured value and δE [12] is a small variational term that offsets possible experimental error. From Eq. (10), the molecular constants are

$$X = A^{-1}E \tag{12}$$

and the range of the parameter space, ΔX, is constrained by the experimental error:

$$\Delta X = A^{-1}\Delta E \tag{13}$$

Occam's razor is used to further constrain the form of X: the simplest model with enough expressive power is preferred. Usually, the dimension of X is initially set to 5; if five parameters cannot represent the details of the data, six are used, and so on.
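Eqs. (10)–(13) amount to solving a small square Vandermonde-type system. The Python sketch below is a minimal, dependency-free illustration under stated assumptions: one parameter per chosen level (a square A), the sign convention of Eq. (11), and hypothetical helper names (`solve_linear`, `design_matrix`, `spectral_constants`, `parameter_range`). Eq. (13) is evaluated for one particular error vector ΔE rather than as a full error region.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular system: choose other levels")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def design_matrix(levels):
    """Eq. (11): A_vk = (v + 1/2)**k, square (one parameter per chosen level)."""
    return [[(v + 0.5) ** k for k in range(len(levels))] for v in levels]


def spectral_constants(levels, energies, shifts=None):
    """Eq. (12): X = A^-1 (E + dE) for the chosen levels."""
    shifts = shifts or [0.0] * len(levels)
    rhs = [e + d for e, d in zip(energies, shifts)]
    return solve_linear(design_matrix(levels), rhs)


def parameter_range(levels, energy_errors):
    """Eq. (13): Delta X = A^-1 Delta E for one given error vector Delta E."""
    return solve_linear(design_matrix(levels), energy_errors)


true_x = [10.0, 2170.0, -13.3, 0.03, -0.0001]  # hypothetical constants, cm^-1
chosen = [0, 3, 6, 9, 12]
e_chosen = [sum(c * (v + 0.5) ** k for k, c in enumerate(true_x)) for v in chosen]
x_fit = spectral_constants(chosen, e_chosen)
dx = parameter_range(chosen, [0.01] * 5)
print(x_fit, dx)
```

A uniform error of 0.01 cm−1 on every chosen level only shifts $\omega_0$, whereas errors that differ from level to level propagate into the higher-order constants; this is the sense in which Eq. (13) ties the size of the parameter space to the experimental error.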

4. Preparing the data sets

Three different types of data are used to build the dataset.

1) The experimental vibrational energies $E_{v,exp}$. If X has five components, five energy levels suffice to solve Eq. (10); the remaining experimental levels can be used as a validation set. For example, 42 experimental vibrational energies are available for the ground electronic state of CO. If the expansion order of the vibrational energy (Eq. (7)) is m (initially 5), there are $C_{42}^{m}$ ways to choose, from the 42 known levels, the subset used for the calculation. These levels therefore form part of the training data.

2) Heat capacity [13] ($C_{mol}$) is introduced to enrich the training data. The molar vibrational heat capacity can be obtained experimentally and is closely related to the levels:

$$C_{mol}^{cal} = \frac{N_A}{kT^2}\left(\langle E_v^2\rangle - \langle E_v\rangle^2\right) \tag{14}$$

where $N_A$ is the Avogadro constant, k the Boltzmann constant, T the temperature, and ⟨·⟩ denotes a Boltzmann average over the vibrational levels.

3) The dissociation energy ($D_e$ in Eq. (8)). This quantity may have a very large uncertainty or lack experimental data altogether. From a probabilistic point of view, predicting it correctly greatly strengthens the reliability of the forecast, so it is used as test data.
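Eq. (14) can be evaluated directly from a list of levels. The Python sketch below rewrites it as $C = R\,\mathrm{Var}(x)$ with $x = E\,hc/(k_BT)$, which is algebraically identical because $R = N_Ak$; it assumes levels in cm−1, non-degenerate levels and a sum over the listed bound levels only. The constants are CODATA values; the function name is hypothetical. A harmonic spectrum must reproduce the Einstein heat-capacity function, which serves as a check.

```python
import math

C2 = 1.438776877  # second radiation constant hc/k_B, cm*K
R = 8.314462618   # molar gas constant N_A*k_B, J/(mol*K)


def vib_heat_capacity(levels_cm, temperature):
    """Eq. (14): C = N_A/(k T^2) * (<E^2> - <E>^2), Boltzmann-averaged over the levels.

    Rewritten as C = R * Var(x) with x = E*hc/(k_B T); levels in cm^-1, T in K,
    result in J/(mol K). Only the listed (bound) levels enter the average.
    """
    xs = [e * C2 / temperature for e in levels_cm]
    x_min = min(xs)
    weights = [math.exp(-(x - x_min)) for x in xs]  # shift avoids underflow
    z = sum(weights)
    mean = sum(w * x for w, x in zip(weights, xs)) / z
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, xs)) / z
    return R * var


# Harmonic check: for E_v = w(v + 1/2) the result should match the Einstein function.
w, T = 2170.0, 1000.0  # hypothetical harmonic constant (cm^-1) and temperature (K)
c = vib_heat_capacity([w * (v + 0.5) for v in range(200)], T)
u = w * C2 / T
print(c, R * u * u * math.exp(u) / math.expm1(u) ** 2)
print(math.comb(42, 5))  # number of five-level subsets of 42 levels
```

The last line shows why exhaustive search over the level subsets of item 1) is impractical: $C_{42}^{5}$ = 850,668.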

5. Learning by optimizing

Parameter X is now constrained by Eq. (13), but many values remain possible. The learning step optimizes the following objectives:

$$X^* = \arg\min_X \left\lVert E^{exp} - AX\right\rVert \tag{15}$$
$$X^* = \arg\min_X \left\lVert D_e^{exp} - D_e^{cal}(X)\right\rVert \tag{16}$$
$$X^* = \arg\min_X \left\lVert C_{mol}^{exp} - C_{mol}^{cal}(X)\right\rVert \tag{17}$$

where $D_e^{cal}$ and $C_{mol}^{cal}$ are determined by X through Eq. (8) and Eq. (14), respectively. The distances are defined as

$$\overline{\Delta E} = \frac{1}{m}\sum_{v=0}^{m-1}\left|E_{v,exp} - E_{v,cal}\right|^2 \tag{18}$$

for the vibrational energies, and

$$\overline{\Delta D_e} = \left|D_e^{cal} - D_e^{exp}\right| \tag{19}$$
$$\overline{\Delta C_{mol}} = \left|C_{mol}^{cal} - C_{mol}^{exp}\right| \tag{20}$$

for the dissociation energy and heat capacity, respectively. Eqs. (15)–(17) can all be used to obtain X*; however, to predict the vibrational spectrum without the experimental dissociation energy, only Eqs. (15) and (17) can be used. In that case $D_e$ is treated as an unknown parameter, and the heat capacity is introduced as an additional physical criterion to determine it.
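The distances of Eqs. (18)–(20) are straightforward to compute. The Python sketch below follows Eq. (18) as printed (a mean of squared deviations, with no square root) and applies it to the v = 0–2 rows and the $D_e$ row of Table 2; the function names are hypothetical, and the thresholds in the last line are the ones used in step 3) of the procedure in Section 5.

```python
def delta_e_bar(e_exp, e_cal):
    """Eq. (18) as printed: (1/m) * sum over the m levels of |E_exp - E_cal|^2."""
    if len(e_exp) != len(e_cal) or not e_exp:
        raise ValueError("need two non-empty level lists of equal length")
    return sum(abs(a - b) ** 2 for a, b in zip(e_exp, e_cal)) / len(e_exp)


def delta_de_bar(de_cal, de_exp):
    """Eq. (19): absolute dissociation-energy deviation."""
    return abs(de_cal - de_exp)


def delta_cmol_bar(c_cal, c_exp):
    """Eq. (20): absolute heat-capacity deviation."""
    return abs(c_cal - c_exp)


# v = 0-2 rows and the De row of Table 2, in cm^-1.
e_exp = [1081.701, 3225.042, 5341.833]
e_cal = [1081.756, 3225.036, 5341.831]
d_e = delta_e_bar(e_exp, e_cal)
d_de = delta_de_bar(90679.099, 90679.1)
print(d_e, d_de, d_e < 0.5 and d_de < 10.0)
```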

In practice, a greedy algorithm adjusts the parameters of X* one by one, as follows:

1) For a given De, five low-order parameters are used as the initial guess for the size of X.

2) From the m available experimental levels, select 5 arbitrarily and solve Eq. (10) for the 5 parameters of step 1). There are $C_m^5$ possible selections, but in practice only a few need to be tried to find a satisfactory solution.

3) Check whether the parameters from step 2) satisfy the criteria $\overline{\Delta E} < 0.5$ cm−1 (Eq. (18)) and $\overline{\Delta D_e} < 10$ cm−1 (Eq. (19)). Notably, these criteria also ensure that the final errors of the solutions obtained from different initial selections in step 2) are all very small.

4) If step 3) is satisfied, the calculation ends. Otherwise, keep 5 parameters and add a small variational term δE (usually 1 cm−1) to the first selected level: the two shifted levels $E_{v_m} - \delta E_{v_m}$ and $E_{v_m} + \delta E_{v_m}$ each give a new X, which is checked as in step 3). If there is no improvement, δE is halved (δE → δE/2); halving continues until convergence (usually 0.001 cm−1) or an upper limit of 10 halvings is reached. The next four energy levels are then adjusted in the same way.

5) If step 3) is still not satisfied, increase the number of parameters by 1 (to 6 at this stage) and repeat steps 2) to 4) until step 3) is satisfied.

6) Once step 5) is complete, the heat capacity is calculated from the spectrum with Eq. (14). Repeating the procedure with different De values in step 1) gives the heat capacity error as a function of De, from which the optimal De (the first inflection point) is determined.

7) Compare De with experiment to assess the quality of the prediction.
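Steps 2) to 5) can be sketched as a short loop. The Python code below is one reading of the procedure, not the authors' implementation. Its assumptions: level subsets are tried in lexicographic order rather than arbitrarily, and only the first `max_subsets` are tried; each level gets at most `max_steps` ±δE trials, with δE halved whenever neither trial lowers Eq. (18); and `extra_ok` is a hypothetical hook where the $\overline{\Delta D_e} < 10$ cm−1 check of Eq. (19) (using Eq. (8) and the trial De of step 1)) would be plugged in. All names are hypothetical.

```python
import itertools


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def energy(v, x):
    """Eq. (7), with the signs of the anharmonic terms absorbed into x."""
    return sum(c * (v + 0.5) ** k for k, c in enumerate(x))


def delta_e_bar(e_exp, x):
    """Eq. (18) as printed, over every experimental level (validation included)."""
    return sum((e - energy(v, x)) ** 2 for v, e in e_exp.items()) / len(e_exp)


def fit_subset(subset, e_exp, shifts):
    """Step 2): solve Eq. (10) for the chosen levels, each shifted by its dE."""
    a = [[(v + 0.5) ** k for k in range(len(subset))] for v in subset]
    return solve_linear(a, [e_exp[v] + shifts[v] for v in subset])


def greedy_fit(e_exp, n_params=5, tol_e=0.5, extra_ok=lambda x: True,
               max_subsets=20, de_start=1.0, de_min=0.001, max_steps=10):
    """Sketch of steps 2)-5); e_exp maps v -> experimental level in cm^-1."""
    levels = sorted(e_exp)
    while n_params <= len(levels):
        for subset in itertools.islice(
                itertools.combinations(levels, n_params), max_subsets):
            shifts = {v: 0.0 for v in subset}
            x = fit_subset(subset, e_exp, shifts)
            score = delta_e_bar(e_exp, x)
            if score < tol_e and extra_ok(x):          # step 3)
                return x
            for v in subset:                           # step 4), level by level
                de = de_start
                for _ in range(max_steps):
                    trials = []
                    for sign in (-1.0, 1.0):           # E_v - dE and E_v + dE
                        trial = dict(shifts)
                        trial[v] += sign * de
                        tx = fit_subset(subset, e_exp, trial)
                        trials.append((delta_e_bar(e_exp, tx), trial, tx))
                    best = min(trials, key=lambda t: t[0])
                    if best[0] < score:                # improvement: keep shift
                        score, shifts, x = best
                    else:                              # no improvement: halve dE
                        de /= 2.0
                        if de < de_min:
                            break
                if score < tol_e and extra_ok(x):      # step 3) again
                    return x
        n_params += 1                                  # step 5): one more constant
    raise RuntimeError("no model satisfied the step 3) criteria")
```

Usage, on hypothetical data: levels generated from a six-constant version of Eq. (7) cannot meet the 0.5 cm−1 criterion with five constants, so the loop moves on to six (step 5)), whereas levels generated from five constants are fitted at once.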

6. Method validation

Experimental values of the dissociation energy of the ground electronic state of CO reported over the years are listed in Table 1.

Table 1.

Dissociation energy of the ground electronic state of the CO molecule [14].

De (cm−1) | Year | Method
55,821.120 | 1936 | spectrum
70,976.136 | 1939 | electron impact
75,815.428 | 1947 | theoretical calculation
81,461.247 | 1943 | spectrum
89,615.437 | 1945 | spectrum
90,679.1 (a) | 2014 | spectrum

(a) Newly added from [15].

The full vibrational spectrum was then predicted with each of these experimental dissociation energies, as shown in Fig. 2. The choice of De strongly influences the predicted vibrational energies, especially for levels near the dissociation limit. The best agreement between measurement and the present calculation is obtained with De = 90,679.1 cm−1 [15], as shown in Table 2.

Fig. 2.

The full vibrational spectra corresponding to different dissociation energies for the ground electronic state of CO.

Table 2.

Vibrational energies (cm−1) of the CO molecule in the ground electronic state.

v Ev(exp) [16] Ev(cal) v Ev(cal)
0 1081.701 1081.756 42 69,159.054
1 3225.042 3225.036 43 70,251.348
2 5341.833 5341.831 44 71,319.269
3 7432.210 7432.210 45 72,362.715
4 9496.241 9496.242 46 73,381.565
5 11,533.994 11,533.995 47 74,375.680
6 13,545.540 13,545.541 48 75,344.898
7 15,530.954 15,530.954 49 76,289.034
8 17,490.307 17,490.307 50 77,207.878
9 19,423.677 19,423.677 51 78,101.196
10 21,331.141 21,331.141 52 78,968.723
11 23,212.778 23,212.778 53 79,810.166
12 25,068.668 25,068.668 54 80,625.202
13 26,898.893 26,898.893 55 81,413.472
14 28,703.535 28,703.535 56 82,174.582
15 30,482.679 30,482.679 57 82,908.102
16 32,236.407 32,236.407 58 83,613.561
17 33,964.805 33,964.805 59 84,290.446
18 35,667.957 35,667.957 60 84,938.200
19 37,345.949 37,345.949 61 85,556.217
20 38,998.865 38,998.865 62 86,143.843
21 40,626.788 40,626.788 63 86,700.371
22 42,229.802 42,229.802 64 87,225.037
23 43,807.989 43,807.989 65 87,717.022
24 45,361.428 45,361.428 66 88,175.441
25 46,890.196 46,890.196 67 88,599.345
26 48,394.370 48,394.370 68 88,987.720
27 49,874.020 49,874.020 69 89,339.474
28 51,329.216 51,329.216 70 89,653.443
29 52,760.022 52,760.022 71 89,928.381
30 54,166.498 54,166.498 72 90,162.961
31 55,548.698 55,548.698 73 90,355.764
32 56,906.672 56,906.672 74 90,505.279
33 58,240.461 58,240.460 75 90,609.901
34 59,550.101 59,550.099 76 90,667.917
35 60,835.619 60,835.616 77 90,677.513
36 62,097.034 62,097.029
37 63,334.355 63,334.347
38 64,547.581 64,547.568
39 65,736.698 65,736.681
40 66,901.681 66,901.660
41 68,042.490 68,042.469
De(exp) = 90,679.1 [15]; De(cal) = 90,679.099

We then treat the dissociation energy as unknown and use the relative error between the calculated ($C_{mol}^{cal}$) and experimental ($C_{mol}^{exp}$) heat capacities as the criterion for finding the dissociation energy that best meets our requirements. As shown in Fig. 3(a), the more accurate the dissociation energy, the more reliable the calculated vibrational energies. Again, the best choice is De = 90,679.1 cm−1 [15].

Fig. 3.

The relative errors between the theoretical and experimental vibrational molar heat capacities for different dissociation energies of the ground electronic state of CO: (a) at T = 400, 500, 600 and 700 K; (b) at T = 500 K [17]; (c) at T = 1200 K [17].

The dependence of the heat capacity on the dissociation energy provides a way to obtain the dissociation energy and a good criterion for verifying the reliability of the method. As shown in Fig. 3(b) (T = 500 K), the relative error of the heat capacity decreases gradually as the dissociation energy increases, and the dissociation energy at the first inflection point is close to the latest experimental value (90,679.1 cm−1). Since the heat capacity has its own uncertainty, the details of the curve after the first turning point (De > 91,179.1 cm−1) can be ignored, so the value of 91,179.1 cm−1 found here is an estimate with an absolute error within 500 cm−1 (5.5‰), better than the second-best dissociation energy value in Table 1. As shown in Fig. 3(c), a second temperature (T = 1200 K) gives De = 91,579.1 cm−1, slightly worse than the result at 500 K.
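Picking the "first turning point" from a scanned error curve can be automated. The text does not define the point precisely, so the Python sketch below is an assumption: it scans De in ascending order and returns the first value at which the heat-capacity error fails to drop by more than a tolerance `tol` (with `tol = 0` this is the first local minimum). The function name and the example curve are hypothetical and are not data from Fig. 3 or Fig. 5.

```python
def first_turning_point(de_values, rel_errors, tol=0.0):
    """Return the first De at which the heat-capacity error stops decreasing.

    Reads the "first turning point" as the first point where the error fails to
    drop by more than tol (tol = 0 gives the first local minimum). de_values must
    be sorted in ascending order.
    """
    if len(de_values) != len(rel_errors) or len(de_values) < 2:
        raise ValueError("need at least two (De, error) pairs of equal length")
    for i in range(len(rel_errors) - 1):
        if rel_errors[i] - rel_errors[i + 1] <= tol:
            return de_values[i]
    return de_values[-1]  # still decreasing: extend the scanned De range


# Hypothetical error curve (not data from Fig. 3 or Fig. 5).
print(first_turning_point([1.0, 2.0, 3.0, 4.0, 5.0], [5.0, 3.0, 1.0, 1.5, 0.8]))
```

Raising `tol` stops the scan earlier, at the onset of a plateau, which is one way to account for the experimental uncertainty of the heat capacity mentioned above.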

To further verify the effectiveness and practicability of the method, a similar analysis is carried out for the ground electronic state of Br2. Several candidate values were selected near the experimental dissociation energy (16,057 cm−1 [18]), each yielding a set of vibrational energies, as shown in Fig. 4. The dissociation energy is determined as for CO and, with the vibrational molar heat capacity as the criterion, is found to be 16,165 cm−1. In addition, the result at a second temperature (T = 2400 K) in Fig. 5(c) is consistent with that at T = 3800 K in Fig. 5(b), again giving 16,165 cm−1.

Fig. 4.

The full vibrational spectra corresponding to different dissociation energies for the ground electronic state of Br2.

Fig. 5.

The relative errors between the theoretical and experimental vibrational molar heat capacities for different dissociation energies of the ground electronic state of Br2: (a) at T = 1500–4000 K; (b) at T = 3800 K [17]; (c) at T = 2400 K [17].

Declaration of Competing Interests

The authors confirm that there are no conflicts of interest.

Acknowledgements

This research is supported by the Open Foundation of Key Laboratory of Advanced Reactor Engineering and Safety (Grant No. ares-2019-01), the Ministry of Education "Chunhui Plan" (Grant No. Z2016160), National Natural Science Foundation of China (Grant No. 11904295, 61722507), the Sichuan Education Department Project (Grant No. 17ZA0369), the Fund for Sichuan Distinguished Scientists of China (Grant No. 2019JDJQ0050), the State Key Laboratory Open Fund of Quantum Optics and Quantum Optics Devices, Laser Spectroscopy Laboratory (Grant No. KF201811).

Footnotes

Direct Submission or Co-Submission: Co-Submission SAA-D-20-00197

Contributor Information

Shanshan Long, Email: longssyx@163.com.

Jia Fu, Email: fujia@mail.xhu.edu.cn.

Zhixiang Fan, Email: fanzhixiang235@126.com.

References

  • 1. Savin D.W., Bhaskar R.G., Vissapragada S., Urbain X. On the energetics of the HCO+ + C → CH+ + CO reaction and some astrochemical implications. Astrophys. J. 2017;844(2):154–158.
  • 2. Vosteen B.W., Kanefke R., Koser H. Bromine-enhanced mercury abatement from combustion flue gases: recent industrial applications and laboratory research. VGB Powertech. 2006;86(3):70–75.
  • 3. Pilloud F., Pouransari N., Renard L., Steidle R. Bromine recycling in the chemical industry: an example of circular economy. Chimia (Aarau) 2019;73(9):737–742. doi: 10.2533/chimia.2019.737.
  • 4. Goodfellow I., Bengio Y., Courville A. MIT Press; 2016. Deep Learning.
  • 5. Wu D., Wang L., Zhang P. Solving statistical mechanics using variational autoregressive networks. Phys. Rev. Lett. 2019;122(8). doi: 10.1103/PhysRevLett.122.080602.
  • 6. Levine Y., Sharir O., Cohen N., Shashua A. Quantum entanglement in deep learning architectures. Phys. Rev. Lett. 2019;122(6). doi: 10.1103/PhysRevLett.122.065301.
  • 7. Mills K., Spanner M., Tamblyn I. Deep learning and the Schrödinger equation. Phys. Rev. A. 2017;96(4).
  • 8. Zhang Y., Sun W., Fu J., Fan Q., Ma J., Xiao L., Jia S., Feng H., Li H. A variational algebraic method used to study the full vibrational spectra and dissociation energies of some specific diatomic systems. Spectrochim. Acta Part A. 2014;117:442–448. doi: 10.1016/j.saa.2013.08.043.
  • 9. Herzberg G. Reitell Press; 2008. Molecular Spectra and Molecular Structure, Vol. I.
  • 10. Dunham J.L. The energy levels of a rotating vibrator. Phys. Rev. 1932;41(6):721–731.
  • 11. Christen D., Hüttner W. Springer; 2017. Molecular Constants Mostly from Microwave, Molecular Beam, and Sub-Doppler Laser Spectroscopy: Paramagnetic Diatomic Molecules (Radicals).
  • 12. Zhang Y., Sun W., Fu J., Fan Q., Feng H., Li H. Investigations of vibrational levels and dissociation energies of diatomic systems using a variational algebraic method. Acta Phys. Sin. 2012;61(13):114–121.
  • 13. Fu J., Fan Q., Liu G., Li H., Xu Y., Fan Z., Zhang Y. Influence of different micro-vibrational behavior on the thermodynamic properties of SO gas. Comput. Theor. Chem. 2017;1115:136–143.
  • 14. Volkenstein M. Science Press; 1960. The Structure and Physical Properties of Molecules.
  • 15. Kępa R., Ostrowska-Kopeć M., Piotrowska I., Zachwieja M., Hakalla R., Szajna W., Kolek P. Ångström (B1Σ+ → A1Π) 0–1 and 1–1 bands in isotopic CO molecules: further investigations. J. Phys. B: At. Mol. Opt. Phys. 2014;47(4).
  • 16. Coxon J.A., Hajigeorgiou P.G. Born–Oppenheimer breakdown in the ground state of carbon monoxide: a direct reduction of spectroscopic line positions to analytical radial Hamiltonian operators. Can. J. Phys. 1992;70(1):40–54.
  • 17. Chase M.W. American Institute of Physics; 1998. NIST-JANAF Thermochemical Tables.
  • 18. Focsa C., Li H., Bernath P.F. Characterization of the ground state of Br2 by laser-induced fluorescence Fourier transform spectroscopy of the B3Π0+u–X1Σg+ system. J. Mol. Spectrosc. 2000;200(1):104–119. doi: 10.1006/jmsp.1999.8039.
