ABSTRACT
We develop an automatic peak fitting algorithm using the Bayesian information criterion (BIC) fitting method with confidence-interval estimation in spectral decomposition. First, spectral decomposition is carried out by adopting the Bayesian exchange Monte Carlo method for various artificial spectral data, and the confidence interval of fitting parameters is evaluated. From the results, an approximated model formula that expresses the confidence interval of parameters and the relationship between the peak-to-peak distance and the signal-to-noise ratio is derived. Next, for real spectral data, we compare the confidence interval of each peak parameter obtained using the Bayesian exchange Monte Carlo method with the confidence interval obtained from the BIC-fitting with the model selection function and the proposed approximated formula. We thus confirm that the parameter confidence intervals obtained using the two methods agree well. It is therefore possible to not only simply estimate the appropriate number of peaks by BIC-fitting but also obtain the confidence interval of fitting parameters.
KEYWORDS: X-ray photoelectron spectroscopy, spectral decomposition, pseudo-Voigt function, Bayesian estimation, exchange Monte Carlo method
CLASSIFICATION: 404 Materials informatics / Genomics, 502 Electron spectroscopy

1. Introduction
High-throughput measurements have become increasingly important for the efficient development of science and technology, and there is an urgent need to accumulate large amounts of spectral data. In X-ray photoelectron spectroscopy (XPS), which is a time-consuming characterization technique, the use of high-intensity synchrotron radiation and a high-sensitivity detector enables a rapid accumulation of large amounts of spectral data [1–3]. Matsumura et al. performed peak shift analysis of high-throughput XPS spectra using the expectation-maximization algorithm [4]. High-throughput data processing is therefore required for efficient spectral data analysis.
Peak fitting is performed in the analysis of XPS spectra. Such fitting is usually carried out using the gradient method. This technique faces three main problems. The first is that the technique tends to find a local solution, the second is that the number of peaks cannot be estimated, and the third is that the confidence interval of fitting parameters cannot be evaluated. In the gradient method, the initial value of the parameter must first be given, and the result is readily affected by the initial value. Peak fitting requires the number of peaks to be determined at the beginning, but the gradient method does not show how many peaks are appropriate. The method is also susceptible to spectral noise, and although it is intuitively understood that the confidence interval of the fitting parameter is wide when there is much noise, there is no framework for evaluating the confidence interval.
By incorporating informatics knowledge, we previously developed a low-cost and efficient method of obtaining appropriate models in terms of not only the fitting of parameters but also the number of peaks, even though the developed method is based on the gradient method [5]. Having many initial models allows us to search for pseudo-global solutions, and a Bayesian information criterion (BIC) allows us to obtain a model with an appropriate number of peaks. We refer to this technique as BIC-fitting in the present paper. However, it remains difficult to evaluate the confidence intervals of the fitting parameters with this technique.
A spectrum decomposition technique based on Bayesian estimation has been proposed for quantitative evaluation of the confidence interval of fitting parameters [6]. This technique solves all three problems described above. By carrying out the model selection to a given spectrum on the basis of Bayesian estimation, we may be able to estimate not only peak parameters such as the peak position but also the number of peaks. Furthermore, when adopting this technique, global solutions can be searched for efficiently by performing optimization using algorithms in what is called the exchange Monte Carlo (EMC) method, even in the case that there is a local optimal solution. Through Bayesian estimation, all peak parameters can be optimized, and the confidence intervals of the parameters can also be determined using the standard deviation (STD) of the Bayesian posterior distribution. However, the EMC method has a huge computational cost and is difficult to use when analyzing many spectra.
In this study, therefore, we develop an algorithm to calculate the confidence interval of fitting parameters obtained by the EMC method from the results of BIC-fitting. By generating various spectral data on a computer and applying Bayesian estimation, we obtain the behavior of model selection and the STD of the Bayesian posterior distribution for each peak parameter by computer simulation. In particular, in this paper, we cover the peak-to-peak distance between two peaks and the signal-to-noise (S/N) ratio of spectral data. As a result, we succeed in deriving an approximated model formula representing the relationship of the STD of the posterior distribution with the peak-to-peak distance and S/N ratio.
We also apply the approximated model formula to real spectra. The confidence interval of the peak parameters obtained using the EMC method is compared with that obtained by applying BIC-fitting to the approximated formula. As a result, it is confirmed that the parameter confidence interval obtained by the EMC method can be reproduced by BIC-fitting and using the approximated formula. This approximated formula is applicable to not only BIC-fitting but also other optimization methods and can be used to estimate the parameter confidence interval to the same extent as when adopting the EMC method.
2. Calculation methods
2.1. Fitting model: pseudo-Voigt function
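We first describe the model function used in this study. We consider fitting spectral data $D = \{(x_i, y_i)\}_{i=1}^{N}$, where $N$ is the number of spectral data points, by summing pseudo-Voigt functions $V(x; h_k, \mu_k, w_k, r_k)$:
$$f(x; \theta) = \sum_{k=1}^{K} V(x; h_k, \mu_k, w_k, r_k) \tag{1}$$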
The pseudo-Voigt function is frequently used in spectral decomposition. We here adopt the sum type of the pseudo-Voigt function, defined as a linear combination of the Gaussian and Lorentzian functions:
$$V(x; h_k, \mu_k, w_k, r_k) = h_k \left[ r_k L(x; \mu_k, w_k) + (1 - r_k) G(x; \mu_k, w_k) \right] \tag{2}$$
$$G(x; \mu_k, w_k) = \exp\left[ -\ln 2 \left( \frac{x - \mu_k}{w_k} \right)^2 \right] \tag{3}$$
$$L(x; \mu_k, w_k) = \frac{1}{1 + \left( (x - \mu_k)/w_k \right)^2} \tag{4}$$
where $K$ is the number of peaks. The fitting parameters are $\theta = \{h_k, \mu_k, w_k, r_k\}_{k=1}^{K}$, where $h_k$ is the peak height, $\mu_k$ is the peak position, $w_k$ is the half width at half maximum (HWHM) of the peak, and $r_k$ is the Lorentz–Gauss mixing ratio of the pseudo-Voigt functions. In the peak fitting of XPS, the appropriate basis function is the Voigt function, defined by the convolution of a Lorentzian function derived from the natural width and a Gaussian function derived from the instrument. In practice, however, the pseudo-Voigt function, an approximated form of the Voigt function, is commonly used because peak fitting with the Voigt function is computationally difficult [7].
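As a minimal sketch of the model function described above, the sum-type pseudo-Voigt function and the multi-peak model can be written as follows. The function names `pseudo_voigt` and `spectrum_model` are illustrative and do not come from the original work, and the convention that the mixing ratio weights the Lorentzian part is an assumption.

```python
import numpy as np


def pseudo_voigt(x, h, mu, w, r):
    """Sum-type pseudo-Voigt peak.

    h: peak height, mu: peak position, w: HWHM, r: Lorentz-Gauss mixing ratio.
    Assumption: r weights the Lorentzian part and (1 - r) the Gaussian part.
    """
    z = (np.asarray(x, dtype=float) - mu) / w
    gauss = np.exp(-np.log(2.0) * z**2)   # equals 1/2 at |x - mu| = w
    lorentz = 1.0 / (1.0 + z**2)          # equals 1/2 at |x - mu| = w
    return h * (r * lorentz + (1.0 - r) * gauss)


def spectrum_model(x, peaks):
    """Sum of pseudo-Voigt peaks; `peaks` is a list of (h, mu, w, r) tuples."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for h, mu, w, r in peaks:
        total += pseudo_voigt(x, h, mu, w, r)
    return total
```

Because both components fall to one half at a distance of one HWHM from the peak position, the combined peak keeps the same HWHM for any mixing ratio, which is why a single width parameter per peak suffices.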
A least-squares method is often used to optimize fitting parameters. In this method, parameters are obtained so as to minimize the error function $E(\theta)$ representing the difference between the model function $f(x; \theta)$ and the spectral data $D$:
$$E(\theta) = \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - f(x_i; \theta) \right)^2 \tag{5}$$
In spectral decomposition, this problem becomes a nonlinear least-squares problem, and it is difficult to derive such an optimum solution analytically. It is therefore common to find the parameter that minimizes the error function based on the gradient method. However, there is a problem that the fitting result is easily trapped into a local solution depending on the selection of initial values. In addition, it is impossible to objectively determine the number of peaks from the data. The gradient method also has a problem that the confidence interval of the fitting parameter cannot be obtained. Bayesian estimation can solve these problems as we will see below [6].
2.2. Bayesian spectral deconvolution
Bayesian estimation is a framework in which the process of generating data in a probabilistic model is formulated and an estimation is made by tracing back the causal relationship using the Bayesian theorem [6,8]. By combining Bayesian estimation with the exchange Monte Carlo (EMC) method, we may be able to not only perform spectral deconvolution but also obtain confidence intervals for fitting parameters through Bayesian posterior probabilities. It is also possible to select a good model by comparing the Bayesian free energies of different models with different numbers of peaks. In this study, we call this method the Bayesian EMC method. Details are shown in Appendix A.
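To illustrate the idea of the exchange Monte Carlo method, the following toy sketch runs several Metropolis chains at different inverse temperatures and swaps neighboring states. It is not the implementation used in this study: it samples a single scalar parameter, assumes a flat prior, and requires all inverse temperatures to be positive. The function name `exchange_monte_carlo` and its arguments are hypothetical.

```python
import numpy as np


def exchange_monte_carlo(energy, betas, n_steps, step_size, rng, x0=0.0):
    """Toy replica-exchange sampler for a scalar parameter.

    Replica l targets exp(-betas[l] * energy(x)) (flat prior assumed).
    Returns the samples of the last replica, i.e. the one at betas[-1].
    """
    betas = np.asarray(betas, dtype=float)
    x = np.full(betas.size, float(x0))
    e = np.array([energy(v) for v in x])
    samples = np.empty(n_steps)
    for t in range(n_steps):
        # Metropolis update inside each replica
        for l in range(betas.size):
            prop = x[l] + step_size * rng.normal()
            e_prop = energy(prop)
            if np.log(rng.random()) < -betas[l] * (e_prop - e[l]):
                x[l], e[l] = prop, e_prop
        # Exchange between neighboring inverse temperatures
        for l in range(betas.size - 1):
            log_ratio = (betas[l + 1] - betas[l]) * (e[l + 1] - e[l])
            if np.log(rng.random()) < log_ratio:
                x[l], x[l + 1] = x[l + 1], x[l]
                e[l], e[l + 1] = e[l + 1], e[l]
        samples[t] = x[-1]
    return samples
```

The standard deviation of the returned samples plays the role of the posterior STD that is used as the confidence interval in this paper; replicas at small inverse temperature move freely and help the target replica escape local solutions.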
2.3. BIC-fitting
It is usually difficult to analytically evaluate the Bayesian free energy for model selection because it requires a multiple integral over the parameter space. The BIC is obtained by approximating this integral under the assumption that the likelihood function can be approximated by a Gaussian distribution in all parameters. The BIC is expressed as the sum of a likelihood term and a penalty term, and the model is selected on the basis of the trade-off between goodness of fit and model complexity.
We have developed a low-cost and efficient method of obtaining appropriate models in terms of not only the fitting parameters but also the number of peaks using many initial models and the BIC [5]. The method searches many initial fitting models by changing the degree of smoothing, and then optimizes the peak parameters using the modified Levenberg–Marquardt method [9–11], which is one of the gradient methods. The goodness of the optimized models is ranked on the basis of the BIC, written as
$$\mathrm{BIC} = -2 \ln \mathcal{L}_{\max} + M \ln N \tag{6}$$
where $\mathcal{L}_{\max}$ is the maximum likelihood calculated from the likelihood between the measured spectrum and the model function obtained as a result of optimization, $M$ is the number of parameters included in the model function, and $N$ is the number of data points. When we ignore the background, $M = 4K$ is obtained using the number of peaks $K$, because each peak has four parameters. The logarithm of the maximum likelihood can be obtained as
| (7) |
| (8) |
Using the BIC values of optimized models as a criterion for model selection, we can select a simple model with reasonably good agreement and a moderate number of peaks. We hereafter refer to this technique as BIC-fitting in this paper.
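A minimal sketch of BIC-based ranking is given below. It uses the standard Gaussian-noise form of the maximum log-likelihood with the noise variance set to its maximum-likelihood estimate, which may differ in detail from Eqs. (7) and (8). The function names and the choice of four parameters per peak (background ignored) are assumptions made for illustration.

```python
import numpy as np


def gaussian_bic(y, y_fit, n_params):
    """BIC for a fit with i.i.d. Gaussian residuals.

    Assumption: the noise variance is replaced by its maximum-likelihood
    estimate, the mean squared residual.
    """
    y = np.asarray(y, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    n = y.size
    sigma2 = np.mean((y - y_fit) ** 2)
    log_l_max = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return -2.0 * log_l_max + n_params * np.log(n)


def select_by_bic(y, fits):
    """Rank candidate fits; `fits` maps number of peaks K to fitted values."""
    scores = {k: gaussian_bic(y, y_fit, 4 * k) for k, y_fit in fits.items()}
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

When two candidate models reproduce the data equally well, the penalty term alone decides, so the model with fewer peaks is selected.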
BIC-fitting can perform spectral fitting and model selection, but cannot obtain confidence intervals for fitting parameters. The purpose of this study is to extend the BIC-fitting method so that the confidence intervals of the fitting parameters can be obtained at the same time, using the results of simulation by the Bayesian EMC method.
Models used in spectral decomposition are singular models, in which the correspondence between the parameters and the model function is not one-to-one [12,13]. In this case, the penalty term of the BIC may differ from that in the exact evaluation of the free energy, and the approximation underlying the BIC may affect the result of model selection. In Section 4, we compare the model obtained using the Bayesian EMC method with the model obtained from BIC-fitting, targeting the analysis of the measured XPS spectrum, and we discuss the effectiveness of BIC-fitting.
3. Simulation with artificial spectra
In the present study, we use the Bayesian spectral decomposition framework described in Section 2 to clarify the effects of the peak-to-peak distance in the true spectra and the S/N ratio of the data on the confidence interval of the estimated parameters. In this section, we discuss the simulations performed for verification.
3.1. Settings
In the simulation, we use spectral data artificially measured by computer simulation. For the data set $D = \{(x_i, y_i)\}_{i=1}^{N}$, we take the number of data $N = 301$ and $x_i$ between [0.0, 3.0] in steps of 0.01. Assuming that the number of peaks $K = 2$, we define the true spectral function used for data generation as
$$f^*(x) = \sum_{k=1}^{2} V(x; h_k^*, \mu_k^*, w_k^*, r_k^*) \tag{9}$$
Here, the true parameters are , and . We also fix and set at various peak-to-peak distances for discussion. We generate data in 32 patterns for the range . We add noise that follows a Gaussian distribution with zero mean and variance to the data and prepare 14 patterns of values in the range [0.0005, 10.0]. Hence, the total number of prepared data sets is . Three examples of artificially measured spectral data are shown in Figure 1, where Figure 1(a–c) present spectral data with settings of , , and , respectively.
Figure 1.

Three examples of artificially measured spectra with Gaussian noise. The solid line is the true curve and the dots are the artificially measured spectral data: (a) and , (b) and , and (c) and .
The S/N ratio of the data can be defined using the value of the noise level $\sigma_{\mathrm{noise}}$. The peak height of the true spectrum is $h^* = 1.0$, and the signal intensity is thus 1.0. In this study, we define the S/N ratio as $h^*/\sigma_{\mathrm{noise}} = 1/\sigma_{\mathrm{noise}}$. The range of the S/N ratios in the simulation is [0.1, 2000].
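The data generation described in this subsection can be sketched as follows. The grid (301 points on [0.0, 3.0]) and the 14 noise levels are taken from the text and Table 1. The default first peak position, HWHM, and mixing ratio are placeholder values chosen for illustration only, and the function names are hypothetical.

```python
import numpy as np

# Noise levels listed in Table 1 (14 patterns)
NOISE_LEVELS = [0.0005, 0.001, 0.002, 0.005, 0.01, 0.02, 0.05,
                0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0]


def pseudo_voigt(x, h, mu, w, r):
    """Sum-type pseudo-Voigt; assumption: r weights the Lorentzian part."""
    z = (x - mu) / w
    return h * (r / (1.0 + z**2) + (1.0 - r) * np.exp(-np.log(2.0) * z**2))


def signal_to_noise(sigma_noise, height=1.0):
    """S/N ratio defined as peak height divided by the noise STD."""
    return height / sigma_noise


def make_spectrum(delta, sigma_noise, rng, mu1=1.25, h=1.0, w=0.1, r=0.5):
    """Two identical peaks separated by `delta`, plus Gaussian noise.

    mu1, w and r are illustrative placeholders, not the paper's settings.
    """
    x = np.linspace(0.0, 3.0, 301)          # steps of 0.01
    truth = pseudo_voigt(x, h, mu1, w, r) + pseudo_voigt(x, h, mu1 + delta, w, r)
    y = truth + rng.normal(0.0, sigma_noise, size=x.size)
    return x, truth, y
```

Combining 32 peak-to-peak distances with the 14 noise levels gives 32 × 14 = 448 data sets, each of which would be passed to the Bayesian estimation.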
We perform a Bayesian estimation for all data sets. We assume that the candidate numbers of peaks are one and two. The prior distribution for the number of peaks $K$ is thus defined as
$$p(K) = \frac{1}{2}, \quad K \in \{1, 2\} \tag{10}$$
The prior distribution for each parameter is set as
| (11) |
| (12) |
| (13) |
| (14) |
where , and are respectively the gamma distribution, Gaussian distribution, and uniform distribution:
| (15) |
| (16) |
| (17) |
The prior distribution of the parameters of interest can be given by the analyst. However, it is possible to predict an appropriate distribution shape by considering the characteristics of the spectrum. For example, the peak height and width must be positive. We can infer the approximate range of peak height and width values by looking at the structure of the spectrum. We have adopted the gamma distribution as the distribution function with those characteristics. On the other hand, the peak position can move either to the positive or negative side. We then adopted the Gaussian distribution without any special boundary in the prior distribution. Since it is clear that the peak position is between 1 and 2, we set a Gaussian distribution with an average of 1.5 and an STD of 0.2 as in Eq. (12). The Lorentz–Gauss mixing ratio can range from 0 to 1 by definition. In this simulation, we decided not to impose any further constraints and adopted a uniform distribution for the Lorentz–Gauss mixing ratio.
As settings of the Monte Carlo method, we use 50,000 Monte Carlo steps (MCSs) as a burn-in and then use 30,000 MCSs for sampling.
The inverse temperatures in the EMC method are defined as [14]
| (18) |
The value of and the total number of inverse temperatures in the equation are set according to the level of noise added to the data (Table 1).
Table 1.
Settings of the inverse temperature corresponding to Eq. (18).
| Noise level $\sigma_{\mathrm{noise}}$ | 0.0005 | 0.001 | 0.002 | 0.005 | 0.01 | 0.02 | 0.05 | 0.1 | 0.2 | 0.5 | 1 | 2 | 5 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S/N | 2000 | 1000 | 500 | 200 | 100 | 50 | 20 | 10 | 5 | 2 | 1 | 0.5 | 0.2 | 0.1 |
| Parameter value in Eq. (18) | 1.2 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 |
| Number of inverse temperatures | 128 | 64 | 64 | 64 | 44 | 44 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 |
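For orientation, a commonly used inverse-temperature ladder in exchange Monte Carlo is geometric: the first replica is placed at zero (sampling from the prior) and the last at one (sampling from the posterior). The sketch below assumes this form; whether it coincides exactly with Eq. (18) is an assumption, and the names `inverse_temperatures`, `gamma`, and `n_replicas` are illustrative.

```python
import numpy as np


def inverse_temperatures(gamma, n_replicas):
    """Geometric inverse-temperature ladder (assumed form).

    beta_1 = 0 and beta_l = gamma ** (l - n_replicas) for l = 2..n_replicas,
    so that the last replica has beta = 1.
    """
    level = np.arange(1, n_replicas + 1)
    beta = gamma ** (level - n_replicas).astype(float)
    beta[0] = 0.0
    return beta
```

Under this assumption, the settings in Table 1 read as follows: low-noise data need more replicas (and a ratio closer to one) because the posterior is sharper and neighboring replicas must overlap enough for exchanges to be accepted.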
3.2. Results of Bayesian EMC method and derivation of approximated formula
We first show the results of model selection through Bayesian estimation. The results of model selection corresponding to the spectral data in Figure 1 are shown in Figure 2. The straight line represents the free energy and the histogram represents the posterior probability of each number of peaks $K$. It is seen that the correct number of peaks $K = 2$ is estimated in Figure 2(a), whereas $K = 1$ is estimated in Figure 2(b). In the case of Figure 2(b), the peak-to-peak distance is too small to extract the information needed to divide the spectrum into two peaks at the given S/N ratio. In the case of Figure 2(c), there is no significant difference between $K = 1$ and $K = 2$, and we cannot estimate the number of peaks because the S/N ratio is too small. Therefore, by performing model selection, we expect that we can determine the peak-to-peak distance and S/N ratio that are necessary for estimating the correct structure from the spectral data.
Figure 2.

(a)–(c) Results of model selection by Bayesian estimation respectively corresponding to spectral data in Figure 1(a)–(c).
We then perform model selection for various peak-to-peak distances and S/N ratios. Results are shown in Figure 3. The abscissa represents the S/N ratio and the ordinate represents the peak-to-peak distance . The values in the figure indicate the posterior probability for . It is seen that when S/N < 0.5, the number of peaks cannot be estimated regardless of the peak-to-peak distance. Furthermore, we can clearly classify the region where or . We next discuss the posterior distributions for the parameters . The posterior probabilities for each parameter when Bayesian estimation is performed on the spectral data in Figure 1(a) are shown in Figure 4. The dashed lines in the figures indicate the true parameter values corresponding to the artificially measured spectral data. In the case of the peak position, the histogram shows the differences in peak positions and from their true values and , respectively. The results show that each parameter is estimated with good accuracy in that the distribution roughly includes the true value and appears similarly to a Gaussian distribution. As exceptions, the HWHM and Lorentz–Gauss mixing ratio for are distributed away from the true values. This might be due to the properties of the pseudo-Voigt functions and that there is large variability around the true values depending on the artificially measured spectral data. Details are given in Appendix B.
Figure 3.

Results of model selection for various peak-to-peak distances and S/N ratios. The values in the figure indicate the posterior probability for .
Figure 4.

Posterior distribution of each parameter when Bayesian estimation is performed on the spectral data in Figure 1(a). Dashed lines indicate the true parameter values used to generate the spectral data.
On the basis of the above results, we examine the STDs of the posterior distributions of various values and S/N ratios to obtain the confidence intervals of these parameters. A plot of the relationship between the S/N ratio and the STD of the posterior distribution for is shown in Figure 5. The results suggest that estimation of the Lorentz–Gauss mixing ratio is the most difficult because its STD is larger than the STDs of the other parameters. We also find that there is the relationship between the STD and the S/N ratio for any peak parameter. An exception is that the Lorentz–Gauss mixing ratio deviates from the relationship for S/N < . We believe that this is because the STD of in the posterior distribution cannot exceed the STD in the prior distribution defined by Eq. (14) . The STD as a function of the peak-to-peak distance for S/N = 100.0 is shown in Figure 6. The results indicate that the STD is stable for all parameters when > 0.4. Conversely, when , the STD is larger for smaller , indicating that the estimation is unstable. The estimation thus becomes unstable when is small because the two peaks overlap. According to this analysis, peak overlap begins to affect parameter estimation when is less than about 4 times the HWHM.
Figure 5.

STD of the posterior distribution as a function of the S/N ratio for Δ = 0.5. Triangles and inverted triangles are respectively the STDs of the parameters for the first and second peaks.
Figure 6.

STD as a function of the peak-to-peak distance for S/N = 100.0. Triangles and inverted triangles are respectively the STDs of the parameters for the first and second peaks.
As a result of simulations for various artificial spectral data, we found the following features. First, when the peak-to-peak distance is large (especially ), the STD of any parameter has the relation . Next, when was plotted against the peak-to-peak distance for each parameter, we found that any curve with an arbitrary noise level can be approximated by a single curve. Also, as decreases, diverges to positive infinity. Although a function with such characteristics is not unique, it can be expressed as a power function + baseline as one of the candidates. However, it is necessary to adjust the position of the asymptote where the values diverge, the curvature of the curve, and the position where it reaches the baseline. Considering such requirements, we suggest an approximated formula as follows:
| (19) |
where is the true peak height and is the peak-to-peak distance scaled by the HWHM of the true peak . Considering the characteristics of each peak parameter , we define the scaled STDs as
| (20) |
and performed regression using Eq. (19). The fitting parameters in Eq. (19) are . Figure 7 shows schematic diagrams of values obtained using Eq. (19). Figure 7(a) shows as a function of S/N for and , where the peak-to-peak distance is sufficiently large. Figure 7(b) shows as a function of the peak-to-peak distance for several conditions of , when we set the prefactor of Eq. (19) . By adjusting the parameters and , we can express the STD of any peak parameter. A description of how to optimize the features and parameters of these functions is given in Appendix D. In the present experiments, we fix . Although there is no reason to fix the value of and it was decided heuristically, we confirmed that the fitting result was very good as shown in Figure 8. Using the same value for all parameters simplifies the formula and improves usability. The remaining three features and are used for regression, and we obtained them as shown in Table 2. The regression results are shown in Figure 8. The results show that approximate regression is achieved under all conditions. In addition, the range of operable S/N ratios of this equation differs between peak parameters; the ranges are presented in Appendix D.
Figure 7.

Schematic diagrams of values obtained using Eq. (19). (a) Scaled STD as a function of S/N for and , where the peak-to-peak distance is sufficiently large: and . (b) as a function of the peak-to-peak distance for several conditions of when we set the prefactor of Eq. (19) .
Figure 8.

Results of regression using the fitting function in Eq. (19) of the STDs of the posterior distributions for each parameter. Parameters are the (a) Lorentz–Gauss mixing ratio , (b) peak position , (c) peak height , and (d) HWHMs of the peaks .
Table 2.
Values of the fitted parameters in Eq. (19).
| Parameter | Bj | μj | Dj (fixed) | Ej |
|---|---|---|---|---|
| r | 1.708 | −4.626 | 2.5 | −1.394 |
| μ | 0.324 | −3.271 | 2.5 | −0.216 |
| h | 0.355 | −5.756 | 2.5 | −0.540 |
| w | 0.504 | −3.158 | 2.5 | −0.760 |
As will be described later in detail, even when the Bayesian EMC method is not used, the confidence interval of the peak parameter can be estimated by using Eq. (19), after we obtain a fitted spectrum by an optimization method such as the BIC-fitting method.
As a further use of this formula, when the S/N ratio of the measured spectrum is known, we can estimate the peak-to-peak distance that achieves a certain confidence interval. Alternatively, when the peak-to-peak distance of the measured spectrum is known from the chemical shift, we can estimate the S/N ratio required to obtain the desired confidence interval of the parameter. This makes it possible to use Eq. (19) in experimental planning such as the setting of measurement time and energy resolution for individual measurements.
4. Simulation using a real spectrum
In this section, we analyze real XPS spectra to confirm the practicability of the approximated formula in the previous section. As an example of a real spectrum, we select a valence spectrum of SiO2 from the XPS spectrum databases provided in COMPRO software [15] (Figure 9). The binding energy varies from −10 to 40 eV with an energy step of 0.1 eV, resulting in 501 data points. There is a strong peak assigned to O2s in the vicinity of eV. There is a peak structure derived from the hybridization of O2p, Si3s, and Si3p at eV. We apply the Bayesian EMC method and BIC-fitting to this spectrum.
Figure 9.

Fitted spectra from Bayesian estimation (a) and BIC-fitting (b) for the experimental valence spectrum of SiO2. Open circles are the experimental spectrum, the orange line is the fitted spectrum, the green line is the background, and the black lines are all peaks above the background.
4.1. Model function with background
A real spectrum has a background. In addition to the superposition of the peaks of the pseudo-Voigt function of Eq. (1), the Shirley background is added to the model function [5]:
| (21) |
| (22) |
where and are respectively the intensity of the spectrum on the high-binding-energy side and the low-binding-energy side in the analysis range. and are respectively the areas of peak intensity from the high-binding-energy side to and the area of the peak intensity from the low-binding-energy side to .
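A minimal sketch of a Shirley-type background computed from the peak component of the model is shown below. It assumes the common form in which the background rises from the low-binding-energy intensity to the high-binding-energy intensity in proportion to the accumulated peak area; this may differ in detail from Eqs. (21) and (22). The function and argument names are hypothetical.

```python
import numpy as np


def shirley_background(x, peak_sum, i_low, i_high):
    """Shirley-type background from the background-free peak intensity.

    x: binding energies in ascending order.
    peak_sum: summed peak intensity (without background) on the same grid.
    i_low, i_high: spectrum intensity at the low and high binding-energy ends.
    Assumption: background = i_low + (i_high - i_low) * A_low / (A_low + A_high),
    where A_low and A_high are the peak areas below and above each point.
    """
    x = np.asarray(x, dtype=float)
    p = np.asarray(peak_sum, dtype=float)
    # Trapezoidal area of each grid interval, then cumulative area from the low side
    segments = 0.5 * (p[1:] + p[:-1]) * np.diff(x)
    area_low = np.concatenate(([0.0], np.cumsum(segments)))
    total_area = area_low[-1]               # equals A_low + A_high at every point
    return i_low + (i_high - i_low) * area_low / total_area
```

Because the background depends on the peaks themselves, it has to be recomputed whenever the peak parameters change during optimization or sampling.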
4.2. Settings of the Bayesian EMC method and BIC-fitting
In carrying out the Bayesian EMC method, we set the candidate number of peaks as . We set the prior distributions of each peak parameter as
| (23) |
| (24) |
| (25) |
| (26) |
| (27) |
| (28) |
The values of the inverse temperature in the EMC method are and in Eq. (18). As a setting of the Bayesian EMC method, 80,000 MCSs are used for the burn-in and a subsequent 80,000 MCSs for the sampling. The computational conditions in BIC-fitting are the same as those in the literature [5].
4.3. Results and discussion
By the Bayesian EMC method, we can estimate the number of peaks as previously mentioned. By plotting the marginal likelihood and free energy as a function of the number of peaks $K$, as shown in Figure 10(a), we estimated that $K = 4$ with a probability of 97%. BIC-fitting can also be used to estimate the number of peaks. Figure 10(b) shows the BIC values as a function of the number of peaks $K$, and the model with the smallest BIC is found at $K = 4$. Because the model is singular, the results of model selection using the BIC and the free energy do not always agree [12,13], but for this real spectrum the same number of peaks is selected by the two methods. Note that the computation by the Bayesian EMC method takes 10.5 h, whereas BIC-fitting is completed in less than 3 min.
Figure 10.

Results of model selection through Bayesian estimation (a) and BIC-fitting (b) for the experimental valence spectrum of SiO2. The red circle in (b) indicates the model with the minimum BIC.
Figure 9(a) shows the fitted spectrum of the optimum solution obtained from the sampling results for $K = 4$ by the Bayesian EMC method. The spectrum obtained by BIC-fitting is shown in Figure 9(b). We find that both methods reproduce the original spectrum well, and that the positions and shapes of individual peaks are similar. It is notable that BIC-fitting derived a model equivalent to the optimal solution obtained using the Bayesian EMC method, despite its limited search space.
Using the sampling results of the Bayesian EMC method, we obtain the confidence interval from a posterior probability of each parameter. Principal component analysis (PCA) is performed to find the trend of sampled model groups. Figure 11 shows a two-dimensional heat map of the first and second principal components obtained by PCA with respect to the peak positions of the four peaks. Most models belong to the group enclosed by a red circle in the figure, and optimum solutions are included in this group. Models with features different from those of the optimal solution also appear with a posterior probability as high as 0.2%. We decide to exclude such minority models obtained using the PCA results and then evaluate the posterior probability.
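The PCA step can be sketched with a singular value decomposition of the centered samples. The rule used here to separate the majority group (a fixed radius around the median in the PC1–PC2 plane) is a hypothetical choice made for illustration; the text does not specify how the minority models were identified. Function names are likewise illustrative.

```python
import numpy as np


def pca_scores(samples, n_components=2):
    """Project sampled parameter vectors onto their leading principal components.

    samples: array of shape (n_samples, n_features), e.g. the four peak positions.
    Returns the scores and the explained-variance ratio of each kept component.
    """
    data = np.asarray(samples, dtype=float)
    centered = data - data.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


def majority_mask(scores, radius):
    """Hypothetical rule: keep samples within `radius` of the median score."""
    center = np.median(scores, axis=0)
    return np.linalg.norm(scores - center, axis=1) <= radius
```

The posterior STD of each parameter would then be computed only from the samples for which the mask is true.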
Figure 11.

Two-dimensional histogram of PC1 and PC2 obtained by PCA of EMC sampling.
We here focus on two peaks near the binding energies and eV. These two peaks are similar in height and overlap each other. We set the IDs of the two peaks to and . Specifically, Figure 12 shows the posterior probability densities of their peak parameters . Looking at Figure 12, the shapes of the distributions are almost Gaussian for the peak position, height, and width, indicating that the sampling was performed appropriately. The distribution of the Lorentz–Gauss mixing ratio is widely scattered within the defined region, which suggests that the ratio is difficult to estimate. The mixing ratio is related to the shape of the tail of the pseudo-Voigt function. Our results suggest that the shape of the tail of the peak is difficult to estimate, because the noise of the target spectrum is relatively high. The results of evaluating the STDs of the posterior probability distributions of these peaks are given in Table 3.
Figure 12.

Posterior probability densities of peak parameters for two peaks located at about and eV for the valence spectrum of SiO2.
Table 3.
Calculated confidence intervals of the posterior distribution for all peak parameters and those estimated using Eq. (19).
| Parameter | Confidence interval for peak k = 1 | Confidence interval for peak k = 2 | Confidence interval estimated using Eq. (19) and the BIC-fitting model |
|---|---|---|---|
| | 0.22 | 0.21 | 0.34 |
| | 2.35 | 3.69 | 5.48 |
| | 0.33 | 0.32 | 0.40 |
| $r$ | 0.12 | 0.27 | (not calculable) |
The confidence intervals of the parameters of the two peaks are then estimated from the approximated formula (19), using only the peak parameters obtained by BIC-fitting; the Bayesian EMC method is not needed for this step. The approximated formula assumes that the heights and widths of the two peaks are identical, but this is not so for real spectra. In applying the approximated formula to the real spectrum, we use the average of the two peak heights as the peak height in Eq. (19) and the average of the two peak widths as the HWHM. The STD of the noise is estimated as the root mean square of the residual between the original spectrum and the fitted spectrum. We further use the peak-to-peak distance obtained from the fitted spectrum.
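The quantities that enter the approximated formula can be collected from a fitted spectrum as in the following sketch. It only prepares the inputs (noise STD, S/N ratio, and peak-to-peak distance scaled by the HWHM) and does not evaluate Eq. (19) itself. The function name and the dictionary keys are hypothetical.

```python
import numpy as np


def formula_inputs(y, y_fit, peak_a, peak_b):
    """Inputs of the approximated formula from a two-peak fit.

    peak_a, peak_b: dicts with keys "h" (height), "mu" (position), "w" (HWHM).
    """
    residual = np.asarray(y, dtype=float) - np.asarray(y_fit, dtype=float)
    sigma_noise = float(np.sqrt(np.mean(residual**2)))   # RMS of the residual
    h_mean = 0.5 * (peak_a["h"] + peak_b["h"])            # average peak height
    w_mean = 0.5 * (peak_a["w"] + peak_b["w"])            # average HWHM
    delta = abs(peak_b["mu"] - peak_a["mu"])              # peak-to-peak distance
    return {
        "sigma_noise": sigma_noise,
        "snr": h_mean / sigma_noise,
        "delta": delta,
        "delta_scaled": delta / w_mean,
    }
```

Note that the RMS residual is a reasonable noise estimate only when the fit is good; systematic misfit would inflate it and lower the estimated S/N ratio.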
From the peak parameters of the two peaks obtained by BIC-fitting, we have . The estimated S/N ratio is therefore .
The estimated confidence interval is given in the right-hand column of Table 3. The approximated formula reproduces well the actual confidence interval obtained from the posterior distribution. Because the S/N ratio is less than 10, the Lorentz–Gauss mixing ratio $r$ is outside the applicable range of the approximated formula (19), and its confidence interval cannot be calculated. The results confirm that, when the fitting is good, the gradient method combined with the approximated formula (19) gives a confidence interval comparable to that estimated by the Bayesian EMC method. The approximated formula (19) is also applicable to cases where the heights of the two peaks differ more strongly. Details are shown in Appendix E.
Approximated formula (19) was derived assuming that the spectrum consists of two pseudo-Voigt functions. When a spectrum consists of more than two peaks, these two functions correspond to a pair of nearest-neighbor peaks. However, we must be careful when a target peak is sandwiched between two peaks that are almost equally spaced, because both tails of the target peak then overlap with those of the neighboring peaks. The Lorentz–Gauss mixing ratio contributes to the shape of the tail of the peak. Therefore, it is difficult to estimate the Lorentz–Gauss mixing ratio of an inner target peak whose tails cannot both be clearly distinguished. In principle, the Bayesian EMC method can provide the confidence intervals of arbitrary parameters even when a target peak is sandwiched between two peaks, but we then have to consider which parameters should be used to construct a model formula for the STD. This remains future work.
5. Conclusions
We developed a BIC-fitting method with confidence-interval estimation in spectral decomposition. By adopting the Bayesian EMC method, we may be able to not only estimate the number of peaks but also optimize peak parameters, such as the Lorentz–Gauss mixing ratio, in addition to the peak intensity, peak position, and peak width. Using this method, we may also be able to obtain the confidence interval through the STD of the Bayesian posterior distribution. We set various peak-to-peak distances and S/N ratios to generate data on a computer, and then applied Bayesian estimation to obtain the behavior of model selection and the STD of Bayesian posterior distributions for each peak parameter through computer simulation. As a result, an approximated formula expressing the relationship between the obtained STD and the peak-to-peak distance or S/N ratio was derived. In terms of practical use, we confirmed the usefulness of the approximated formula for a real valence spectrum of SiO2. The confidence interval of each parameter was estimated using the peak parameter obtained by BIC-fitting, and it was confirmed that the value agreed well with the confidence interval obtained directly from the posterior probability obtained using the Bayesian EMC method. In short, even with low-cost optimization methods such as BIC-fitting, we can now estimate confidence intervals of fitting parameters that are comparable to those estimated by high-cost Bayesian EMC methods. Using the approximated formula derived in this study, we may be able to estimate the S/N ratio required to obtain the desired parameters with the desired confidence interval, which will be useful in experimental planning such as the setting of measurement time and energy resolution for individual measurements.
In this study, we treated the peak shapes of the XPS spectra as pseudo-Voigt functions. Strictly, the appropriate basis function is a Voigt function, defined as the convolution of a Lorentzian function arising from the natural width and a Gaussian function arising from instrumental broadening. We will consider peak fitting based on convolved Voigt functions in future work.
Appendix A.
Bayesian spectral deconvolution
In this section, we describe Bayesian spectral decomposition used to fit the spectral data and estimate the number of peaks.
A.1 Bayesian estimation
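Bayesian estimation is a framework in which the process of generating data is formulated as a probabilistic model and an estimation is made by tracing back the causal relationship using Bayes' theorem. We first define a probabilistic model. It is assumed that the spectral data $y_i$ are generated by adding noise $\varepsilon_i$ to the spectrum function $f(x_i;\theta)$:
$$y_i = f(x_i;\theta) + \varepsilon_i \tag{A-1}$$
We assume that $\varepsilon_i$ follows a Gaussian distribution with zero mean and variance $\sigma^2$, and the conditional probability of the spectral data is then formulated as
$$p(y_i \mid x_i, \theta, K) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{\left(y_i - f(x_i;\theta)\right)^2}{2\sigma^2}\right] \tag{A-2}$$
Assuming that data are independent and identically distributed, using the error function $E(\theta) = \frac{1}{2N}\sum_{i=1}^{N}\left(y_i - f(x_i;\theta)\right)^2$, we obtain the probability of the data set $D = \{(x_i, y_i)\}_{i=1}^{N}$ as
$$p(D \mid \theta, K) = \prod_{i=1}^{N} p(y_i \mid x_i, \theta, K) = \left(2\pi\sigma^2\right)^{-N/2} \exp\left[-\frac{N}{\sigma^2} E(\theta)\right] \tag{A-3}$$
For the Gaussian probability model, the maximum likelihood estimation method that maximizes $p(D \mid \theta, K)$ is equivalent to the well-known least-squares method.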
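The equivalence between maximum likelihood and least squares under Gaussian noise can be checked numerically. The following is a minimal sketch, not the authors' code: the single Gaussian peak `gaussian_peak`, the grid of candidate peak centers, and all numerical values are assumptions made only for this illustration.

```python
import numpy as np

def gaussian_peak(x, height, center, width):
    """Hypothetical single-peak spectrum function used only for this demo."""
    return height * np.exp(-0.5 * ((x - center) / width) ** 2)

def neg_log_likelihood(y, model, sigma):
    """Negative log-likelihood for i.i.d. zero-mean Gaussian noise of STD sigma."""
    n = y.size
    sse = np.sum((y - model) ** 2)
    return 0.5 * n * np.log(2.0 * np.pi * sigma**2) + sse / (2.0 * sigma**2)

rng = np.random.default_rng(0)
x = np.linspace(-5.0, 5.0, 201)
sigma = 0.05  # assumed noise STD
y = gaussian_peak(x, 1.0, 0.3, 1.0) + rng.normal(0.0, sigma, x.size)

# Scan candidate peak centers; height and width are held fixed for simplicity.
centers = np.linspace(-1.0, 1.0, 401)
sse = np.array([np.sum((y - gaussian_peak(x, 1.0, c, 1.0)) ** 2) for c in centers])
nll = np.array([neg_log_likelihood(y, gaussian_peak(x, 1.0, c, 1.0), sigma)
                for c in centers])

best_ls = centers[np.argmin(sse)]  # least-squares estimate
best_ml = centers[np.argmin(nll)]  # maximum-likelihood estimate
```

Because the negative log-likelihood is a constant plus the sum of squared residuals divided by twice the noise variance, both criteria select the same grid point.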
In the Bayesian estimation, we regard not only the data $D$ but also the parameter set $\theta$ and the number of peaks $K$ as random variables; i.e., we construct a model of the joint probability distribution $p(D, \theta, K)$:
$$p(D, \theta, K) = p(D \mid \theta, K)\, p(\theta \mid K)\, p(K) \tag{A-4}$$
Here, $p(\theta \mid K)$ and $p(K)$ are prior distributions, and they should be set in advance.
We estimate the parameter $\theta$ using the posterior distribution $p(\theta \mid D, K)$:
$$p(\theta \mid D, K) = \frac{p(D \mid \theta, K)\, p(\theta \mid K)}{p(D \mid K)} \tag{A-5}$$
where
$$p(D \mid K) = \int p(D \mid \theta, K)\, p(\theta \mid K)\, d\theta \tag{A-6}$$
The technique of obtaining the parameter that maximizes the posterior distribution is maximum a posteriori (MAP) estimation, and the resulting parameter is called the MAP estimator. Bayesian estimation has the advantage that both the optimum value of the parameter and the confidence interval can be evaluated by obtaining the width of the posterior distribution $p(\theta \mid D, K)$.
We use the posterior probability $p(K \mid D)$ to estimate the number of peaks $K$. This can be done by adopting the procedure of probability marginalization:
$$p(K \mid D) = \frac{p(D \mid K)\, p(K)}{\sum_{K'} p(D \mid K')\, p(K')} \tag{A-7}$$
where $\sum_{K'}$ represents the sum over the candidate numbers of peaks $K'$. The denominator in Eq. (A-7) does not depend on $K$ because it is summed over all numbers of peaks. We thus obtain
$$p(K \mid D) \propto p(D \mid K)\, p(K) \tag{A-8}$$
Therefore, to obtain the posterior probability with respect to the number of peaks, we need to calculate $p(D \mid K)$. The Bayesian free energy is defined as
$$F(K) = -\log p(D \mid K) \tag{A-9}$$
If $p(K)$ is uniformly distributed, the maximization of the posterior probability $p(K \mid D)$ is equivalent to the minimization of the Bayesian free energy $F(K)$. The number of peaks can therefore be estimated by minimizing the Bayesian free energy.
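As a small illustration of this model-selection step, the sketch below converts a set of free energies into posterior probabilities over the number of peaks, assuming that the free energy is the negative logarithm of the marginal likelihood. The function name `peak_number_posterior` and the free-energy values are hypothetical and are not results from this study.

```python
import numpy as np

def peak_number_posterior(free_energies, log_prior=None):
    """Posterior over candidate models from their Bayesian free energies.

    Assumes F = -log(marginal likelihood); a uniform prior is used when
    log_prior is None.
    """
    f = np.asarray(free_energies, dtype=float)
    log_post = -f if log_prior is None else -f + np.asarray(log_prior, dtype=float)
    log_post = log_post - log_post.max()  # subtract the maximum for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

candidate_K = np.array([1, 2, 3, 4])
F = np.array([512.4, 431.0, 434.7, 439.2])  # hypothetical free energies, illustration only

posterior = peak_number_posterior(F)
best_K = candidate_K[np.argmax(posterior)]  # same as the K that minimizes F
```

With a uniform prior, the most probable number of peaks is simply the one with the smallest free energy, and the normalized probabilities indicate how decisively it is preferred.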
A.2 Exchange Monte Carlo method
A practical problem in spectral decomposition through Bayesian estimation is how to evaluate the posterior probability. The posterior distribution of the parameters is difficult to handle because it does not take a known form, such as a Gaussian distribution. For example, it is difficult to determine the MAP estimators because of the presence of local solutions. Furthermore, determining the confidence intervals of the parameters requires the shape of the distribution itself, which is much more difficult to obtain.
The exchange Monte Carlo (EMC) method [8] solves the above problems. In this study, the method described below is called the Bayesian EMC method, which is a Markov-chain Monte Carlo method. We prepare a plurality of distributions,
| (A-10) |
| (A-11) |
in which the inverse temperature is introduced into the posterior distribution, and simultaneously perform sampling in parallel. That is, the target distribution becomes the simultaneous distribution
| (A-12) |
Here, the inverse temperature is and is the number of prepared temperatures. is the fitting parameters for inverse temperature . For , matches the posterior distribution .
The algorithm of the EMC method comprises two updates.
Metropolis sampling for each temperature. For each inverse temperature , we sample from the distribution of using the Metropolis algorithm.
State exchange between adjacent temperatures. We exchange states between adjacent temperatures and set . We determine whether to exchange on the basis of the probability
| (A-13) |
| (A-14) |
As a result, the parameter obtained from the -th inverse temperature can be regarded as sampling from the posterior distribution . Thus, by repeating the above procedure sufficiently many times and recording , we obtain a sample sequence from the posterior distribution and we can determine the MAP estimators and confidence interval.
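The two updates can be sketched as follows. This is a minimal illustration on a toy one-dimensional problem, not the implementation used in this study: the double-well `energy` (a stand-in for the scaled fitting error), the uniform `log_prior`, the temperature ladder, the step size, and the burn-in length are all assumptions chosen for the demo.

```python
import numpy as np

def energy(theta):
    """Toy double-well energy with minima at theta = -1 and theta = +1."""
    return 8.0 * (theta**2 - 1.0) ** 2

def log_prior(theta):
    """Uniform prior on [-3, 3] (an assumption for this sketch)."""
    return 0.0 if -3.0 <= theta <= 3.0 else -np.inf

def exchange_monte_carlo(betas, n_steps, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-3.0, 3.0, size=len(betas))  # one state per temperature
    samples = np.empty((n_steps, len(betas)))
    for t in range(n_steps):
        # 1) Metropolis update at each inverse temperature.
        for l, beta in enumerate(betas):
            prop = theta[l] + step * rng.normal()
            log_ratio = (-beta * (energy(prop) - energy(theta[l]))
                         + log_prior(prop) - log_prior(theta[l]))
            if np.log(rng.uniform()) < log_ratio:
                theta[l] = prop
        # 2) State exchange between adjacent inverse temperatures.
        for l in range(len(betas) - 1):
            log_ratio = ((betas[l + 1] - betas[l])
                         * (energy(theta[l + 1]) - energy(theta[l])))
            if np.log(rng.uniform()) < log_ratio:
                theta[l], theta[l + 1] = theta[l + 1], theta[l]
        samples[t] = theta
    return samples

# Inverse temperatures from 0 (prior) to 1 (posterior).
betas = np.concatenate(([0.0], np.geomspace(0.02, 1.0, 9)))
samples = exchange_monte_carlo(betas, n_steps=4000)
posterior_samples = samples[1000:, -1]  # discard burn-in, keep the beta = 1 chain
```

In this toy case a single Metropolis chain at the lowest temperature would tend to stay in one well, whereas the exchange step lets the chain at inverse temperature 1 visit both. The MAP estimate and the STD can then be read off `posterior_samples`.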
There are two advantages of the EMC method. The first is that heating and annealing effects can be introduced by stochastically exchanging samples between adjacent temperatures during the sampling of each distribution. We thereby escape local solutions and efficiently search for global solutions. This is an important advantage for the accurate estimation of MAP estimators and confidence intervals. The second is that the free energy can also be calculated using the results of the EMC method. We introduce the inverse temperature into the free energy:
| (A-15) |
At the high-temperature limit , it holds that . The desired free energy is expressed as
| (A-16) |
| (A-17) |
is therefore an integral of the expected value under the probability distribution with respect to the inverse temperature in the range from 0 to 1. In addition, because the results of the EMC method provide samples at each inverse temperature , the expected value can be calculated. Therefore, by using the piecewise quadrature method, we can calculate the free energy and perform model selection [6,16].
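The integration over the inverse temperature can be illustrated on a toy problem with a known answer. The sketch below uses a simple trapezoidal rule as one possible piecewise quadrature; it may differ from the scheme used in Refs. [6,16]. The model (standard normal prior, quadratic energy) is an assumption chosen because the tempered distribution can then be sampled exactly, which stands in for the EMC samples at each temperature.

```python
import numpy as np

def free_energy_by_quadrature(betas, mean_energies):
    """Trapezoidal integral of the mean energy over the inverse temperature."""
    b = np.asarray(betas, dtype=float)
    e = np.asarray(mean_energies, dtype=float)
    return float(np.sum(0.5 * (e[1:] + e[:-1]) * np.diff(b)))

# Toy model (assumption): prior N(0, 1) and energy E(theta) = theta^2 / 2.
# The tempered distribution at inverse temperature beta is N(0, 1 / (1 + beta)).
rng = np.random.default_rng(0)
betas = np.linspace(0.0, 1.0, 21)
mean_energies = []
for beta in betas:
    theta = rng.normal(0.0, 1.0 / np.sqrt(1.0 + beta), size=20000)
    mean_energies.append(np.mean(0.5 * theta**2))

F_est = free_energy_by_quadrature(betas, mean_energies)
F_exact = 0.5 * np.log(2.0)  # analytic value of -log Z for this toy model
```

In a real analysis the mean energies would be computed from the samples recorded at each inverse temperature during the EMC run, and a finer temperature grid reduces the quadrature error.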
Figure A-1.

Results of model selection and posterior distributions and . (a)–(e) only differ in the initial random seeds used in generating the spectral data sets under the same conditions as those in Figure 1(a).
Figure A-2.

Two-dimensional distribution of parameters . The diagonal components of the figure show the histogram of the posterior distribution of each parameter. The lower off-diagonal components are two-dimensional distributions, whereas the upper off-diagonal components show the coefficients of correlation between two parameters. The dotted lines indicate the true parameter values used to generate the spectral data.
Appendix B.
Effects of changes in the data set
We here discuss how the data set affects the posterior distribution of parameter shown in Figure 4. Specifically, we generate five data sets in which only the noise (initial random seed) is changed under the same condition as that in Figure 1(a), and we then perform Bayesian estimation for each data set. The results of model selection and the posterior distributions of the parameters and are shown in Fig. A-1. The results for the zero seed are the same as those in Figure 2(a) and 4. It is seen that the results of model selection are stable regardless of the data set. In Fig. A-1, deviation from the true values for and in Figure 4 is strongly affected by the noise added to the data. To reduce this deviation, spectral data must be measured so that the S/N ratio improves. It also seems that the deviations from the true values are correlated between parameters. For example, there is a negative correlation between and ; when shifts to a larger value, the corresponding shifts to a smaller value. We have discussed this result in detail in Appendix C.
Appendix C.
Correlation structure in the posterior distribution of the parameters
The two-dimensional distribution of parameters is shown in Fig. A-2. The diagonal components of the figure show the histogram of the posterior distribution for each parameter. The lower off-diagonal components show two-dimensional distributions, whereas the upper off-diagonal components represent correlation coefficients defined by
| (A-18) |
where and are respectively the STDs of and data, whereas is the covariance between and data. is almost uncorrelated with other parameters, whereas the parameters , and are strongly correlated with each other. This correlation is considered to be an intrinsic property of the model that comes from the constraint that the change in the spectral shape is small. Considering the various correlations, it may be possible to estimate a property derived from a spectral parameter, such as a peak area intensity.
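A correlation matrix of this kind can be computed directly from posterior samples. The sketch below is an illustration only: the three columns of `samples` are synthetic draws from a correlated Gaussian, standing in for posterior samples of three peak parameters, and `cov_true` is a hypothetical covariance.

```python
import numpy as np

def correlation_matrix(samples):
    """Correlation coefficients: covariance divided by the product of the STDs.

    samples has shape (n_samples, n_parameters).
    """
    samples = np.asarray(samples, dtype=float)
    centered = samples - samples.mean(axis=0)
    cov = centered.T @ centered / (samples.shape[0] - 1)
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)

rng = np.random.default_rng(1)
# Hypothetical covariance: parameters 0 and 1 are negatively correlated,
# parameter 2 is independent of both.
cov_true = [[1.0, -0.8, 0.0],
            [-0.8, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
samples = rng.multivariate_normal(np.zeros(3), cov_true, size=5000)

r = correlation_matrix(samples)
```

The result agrees with `np.corrcoef(samples, rowvar=False)`, which can be used directly in practice.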
Appendix D.
Features of the regression function and the optimization method of the parameters
The functional form of Eq. (19) with fitting parameters and is
| (A-19) |
is the reciprocal of the S/N ratio. is associated with the asymptote of the value of the STD at which the scaled peak-to-peak distance is sufficiently large. is a negative value related to the curvature of the function. The larger the value, the more rapidly the STD increases as decreases. is a parameter for adjusting the scale in the -axis direction. is the position of the asymptote on the axis, and the normal deviation diverges to positive infinity as approaches ; i.e., the domain of this function is .
The fitting parameters are optimized using the trust region reflective algorithm [17]. The parameters are optimized so as to minimize the square error on a logarithmic scale with a constraint condition. We fix , as described in the text, and impose the constraints and . We also assume that the range of peak-to-peak distances to be optimized is . The ranges of the S/N ratio to be optimized are for , for , for , and for .
We perform the optimization using the curve_fit function of the SciPy library in Python 3.6.
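A bounded fit on a logarithmic scale with the trust region reflective algorithm can be set up as sketched below. The functional form `std_model` is a hypothetical stand-in that only mimics the qualitative behavior described above (a finite asymptote at large distance and divergence at a fixed position); it is not Eq. (19). The fixed position `D_FIXED`, the bounds, the initial guess, and the synthetic data are likewise assumptions for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

D_FIXED = 0.2  # hypothetical fixed position of the divergence

def std_model(x, a, b, c):
    """Hypothetical stand-in for the regression function, NOT Eq. (19).

    Tends to a for large x and diverges as x approaches D_FIXED when b < 0.
    """
    return a * (1.0 + ((x - D_FIXED) / c) ** b)

def log_std_model(x, a, b, c):
    """Logarithm of the model, so residuals are squared errors on a log scale."""
    return np.log(std_model(x, a, b, c))

# Synthetic "STD versus scaled peak-to-peak distance" data with 1 % scatter.
rng = np.random.default_rng(0)
x = np.linspace(0.5, 4.0, 40)
true_params = (0.02, -1.5, 1.0)
y = std_model(x, *true_params) * np.exp(rng.normal(0.0, 0.01, x.size))

popt, pcov = curve_fit(
    log_std_model, x, np.log(y),
    p0=(0.05, -1.0, 2.0),
    bounds=([1e-6, -10.0, 1e-3], [10.0, -1e-3, 100.0]),  # enforces a > 0, b < 0, c > 0
    method="trf",  # trust region reflective, supports bound constraints
)
```

Fitting the logarithm of the model to the logarithm of the data gives small and large STD values comparable weight, which matters when the values span several orders of magnitude.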
Appendix E.
Application of approximated formula (19) to a combination of a large peak and a weak peak
In a real spectrum, it is common for a satellite peak to overlap a main peak. Therefore, with application to real spectra in mind, we simulated the case where the heights of the two peaks differ. First, artificial spectra were generated under two conditions, and the posterior distributions obtained by the Bayesian EMC method were investigated in detail. Next, we systematically changed the peak height and investigated whether approximated formula (19) for the confidence interval of the parameters can be applied.
First, we show two calculation examples. Two artificial spectra were generated under two conditions as shown in Figs. E-1 (a) and (b). The conditions for the artificial spectra are as follows. The peak heights are for Fig. E-1(a) and for Fig. E-1(b). Other peak parameters are common, the peak positions are , the HWHMs are , and the Lorentz–Gauss mixing ratios are . The background is not taken into consideration. The STD of Gaussian noise is . We applied the Bayesian EMC method to each spectrum. As a result, a model with two peaks was selected for both spectra. The fitted results of Bayesian estimation for each spectrum are shown in Figs. E-1 (a) and (b).
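An artificial spectrum of this kind can be generated as sketched below. The sketch assumes the sum-type pseudo-Voigt form (a weighted sum of a Lorentzian and a Gaussian sharing the same HWHM) and no background. The peak parameters and the noise level in the example are hypothetical values chosen for illustration, not the conditions used for Fig. E-1.

```python
import numpy as np

def pseudo_voigt(x, height, center, hwhm, eta):
    """Sum-type pseudo-Voigt: eta * Lorentzian + (1 - eta) * Gaussian.

    Both components share the same HWHM and have unit height at the center.
    """
    lorentz = 1.0 / (1.0 + ((x - center) / hwhm) ** 2)
    gauss = np.exp(-np.log(2.0) * ((x - center) / hwhm) ** 2)
    return height * (eta * lorentz + (1.0 - eta) * gauss)

def artificial_spectrum(x, peaks, noise_std, seed=0):
    """Sum of pseudo-Voigt peaks plus Gaussian noise of constant amplitude."""
    rng = np.random.default_rng(seed)
    clean = sum(pseudo_voigt(x, *p) for p in peaks)
    noisy = clean + rng.normal(0.0, noise_std, size=x.shape)
    return noisy, clean

# Hypothetical example: a large main peak and a weak neighboring peak.
x = np.linspace(-10.0, 10.0, 601)
peaks = [(1.0, -1.0, 1.0, 0.5),   # (height, position, HWHM, mixing ratio)
         (0.3, 1.5, 1.0, 0.5)]
noise_std = 0.01
noisy, clean = artificial_spectrum(x, peaks, noise_std)
```

Because the noise amplitude is the same everywhere, the weaker peak has the lower effective S/N ratio.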
Figures E-2 and E-3 show the posterior distributions of the fitting parameters corresponding to the spectra in Figs. E-1(a) and E-1(b), respectively. Figure E-2 shows that each parameter is estimated with good accuracy, in that each distribution roughly includes the true value and resembles a Gaussian distribution. The confidence intervals of the parameters of the second peak are broader than those of the first peak, except for the peak height. The small second peak has a relatively low S/N ratio because Gaussian noise of the same amplitude is applied over the entire spectral range. As a result, the parameters of the second peak are more difficult to estimate, and their confidence intervals are wider. In contrast, the posterior distributions of the two peak heights have similar confidence intervals, because the noise amplitude is the same for both peaks. Figure E-3 shows that the posterior distributions of the second-peak parameters are poorly defined. It is thus difficult to estimate the parameters of a small peak, probably because the second peak is so low that it is very strongly affected by noise.
Next, to investigate the applicability of approximated formula (19), we prepared artificially measured spectra in which the height of the second peak was systematically changed. The conditions of the spectra were almost the same as those in Fig. E-1 (a), but the heights of the second peak were changed as . We applied the Bayesian EMC method to each spectrum and calculated the confidence intervals for each fitting parameter after filtered by PCA as in Section 4. Figure E-4 shows the confidence intervals as a function of the height of the second peak . We also plotted the values of confidence intervals calculated using approximated formula (19) using the true peak parameters. From this figure, it is confirmed that even if the heights of the two peaks are different, the approximated formula (19) can reproduce the actual confidence intervals obtained by the Bayesian EMC method. However, when , the confidence interval of the parameter of the second peak does not agree well with the approximated formula. Note that it is difficult to apply the approximated formula (19) when the parameters are difficult to estimate owing to the strong effect of noise.
Figure E-1.

Two examples of artificial spectra that have two peaks with different heights: (a) , (b) . The other parameters are the same. Open circles are the artificial spectrum, the orange line is the fitted spectrum, and the black lines are the peak components.
Figure E-2.

Posterior distribution of each parameter when Bayesian estimation is performed on the spectral data in Fig. E-1(a). The dashed lines indicate the true parameter values used to generate the spectral data.
Figure E-3.

Posterior distribution of each parameter when Bayesian estimation is performed on the spectral data in Fig. E-1(b). The dashed lines indicate the true parameter values used to generate the spectral data.
Figure E-4.

Confidence intervals of peak parameters as a function of the height of the second peak for peaks 1 (a) and 2 (b). (sim) indicates values simulated by the Bayesian EMC method, (approx) indicates values calculated with the approximated formula (19).
Funding Statement
This work was supported by JST CREST under grant number [JPMJCR1761] and by a JSPS KAKENHI Grant-in-Aid for Scientific Research (C) under grant number [19K2154].
Disclosure statement
No potential conflict of interest was reported by the authors.
References
[1] Kobayashi K, Yabashi M. High resolution-high energy x-ray photoelectron spectroscopy using third-generation synchrotron radiation source, and its application to Si-high k insulator systems. Appl Phys Lett. 2003;83:1005–1007.
[2] Takata Y, Yabashi M, Tamasaku K, et al. Development of hard X-ray photoelectron spectroscopy at BL29XU in SPring-8. Nucl Instrum Methods Phys Res, Sec A. 2005;547:50–55.
[3] Hook AL, Anderson DG, Langer R, et al. High throughput methods applied in biomaterial development and discovery. Biomaterials. 2010;31:187–198.
[4] Matsumura T, Nagamura N, Akaho S, et al. Spectrum adapted expectation-maximization algorithm for high-throughput peak shift analysis. Sci Technol Adv Mater. 2019;20:733–745.
[5] Shinotsuka H, Yoshikawa H, Murakami R, et al. Automated information compression of XPS spectrum using information criteria. J Electron Spectrosc Relat Phenom. 2020;239:146903.
[6] Nagata K, Sugita S, Okada M. Bayesian spectral deconvolution with the exchange Monte Carlo method. Neural Netw. 2012;28:82–89.
[7] Hesse R, Streubel P, Szargan R. Product or sum: comparative tests of Voigt, and product or sum of Gaussian and Lorentzian functions in the fitting of synthetic Voigt-based X-ray photoelectron spectra. Surf Interface Anal. 2007;39:381–391.
[8] Hukushima K, Nemoto K. Exchange Monte Carlo method and application to spin glass simulations. J Phys Soc Jpn. 1996;65:1604–1608.
[9] Levenberg K. A method for the solution of certain non-linear problems in least squares. Q Appl Math. 1944;2:164–168.
[10] Marquardt DW. An algorithm for least squares estimation of nonlinear parameters. J Soc Ind Appl Math. 1963;11:431–441.
[11] Fletcher R. A Modified Marquardt Subroutine for Nonlinear Least Squares. Harwell: AERE-R 6799; 1971.
[12] Watanabe S. Algebraic analysis for nonidentifiable learning machines. Neural Comput. 2001;13:899–933.
[13] Watanabe S. Algebraic geometrical methods for hierarchical learning machines. Neural Netw. 2001;14:1049–1060.
[14] Nagata K, Watanabe S. Asymptotic behavior of exchange ratio in exchange Monte Carlo method. Neural Netw. 2008;21:980–988.
[15] Yoshihara K. The introduction of common data processing system version 12. J Surf Anal. 2017;23:138–148.
[16] Ogata Y. A Monte Carlo method for an objective Bayesian procedure. Ann Inst Stat Math. 1990;42:403–433.
[17] Branch MA, Coleman TF, Li Y. A subspace, interior, and conjugate gradient method for large-scale bound-constrained minimization problems. SIAM J Sci Comput. 1999;21:1–23.
