2024 Oct 12;52(1):652–661. doi: 10.1002/mp.17445

Prediction of electron‐solid interaction parameters using machine learning

Fatemeh Akbari 1,2,
PMCID: PMC11699995  PMID: 39395202

Abstract

Background

Electron backscattering coefficient and electron‐stopping power are essential concepts in many disciplines, from radiation to materials science, semiconductor manufacturing, and space exploration. They enable precise calculations, measurements, and simulations of electron interactions with matter, which contribute to advancing science, technology, and safety in a variety of applications. The availability of these data is fundamental to scientific research to validate hypotheses, conduct experiments, and explore new theories. A relatively novel machine learning approach has demonstrated notable success in enhancing data quality and completeness, significantly contributing to the facilitation of data discovery.

Purpose

Using fundamental material property data, the stacking ensemble machine learning (EML) technique was established in this study to generate electron‐solid interaction parameters for any target material over a wide range of energies. The final stacking EML was built using the base and meta learners bagging regressor (BR), K‐nearest neighbors (k‐NN), random forest (RF), support vector regression (SVR), and eXtreme Gradient Boosting (XGB).

Methods

In this study, two publicly available databases with a total of 4030 data points were used. The training datasets contain 785 and 525 data points for the electron backscattering coefficient and stopping power, respectively, whereas the testing datasets contain 262 and 175 data points. Five features were used as input variables to train different individual algorithms and their combinations. On both the training and test datasets, the model was evaluated using different error metrics, including R‐squared (R²), mean‐absolute‐error (MAE), root‐mean‐squared‐error (RMSE), and mean‐absolute‐percentage‐error (MAPE).

Results

Our model evaluation tests revealed that combining RF and XGB with a k‐NN meta‐learner outperformed other algorithms. The analysis of error metrics demonstrated a very close fit to all samples in each training dataset. Furthermore, predictions made by the model on unseen test data indicated accurate estimations of new backscattering and stopping power data.

Conclusions

The developed model achieved high prediction accuracy for various target materials across the broad electron energy spectrum. The outcomes demonstrate the effectiveness of machine learning methodology and the chosen models' suitability for addressing substantial physics challenges.

Keywords: backscattering coefficient, machine learning, stopping power

1. INTRODUCTION

The electron backscattering coefficient, defined as the ratio of the number of backscattered electrons to the total number of incident electrons, and the stopping power, that is, the rate at which the electron transfers its energy to the medium through which it is passing, are of particular importance in a variety of fields. These include applications in scanning electron microscopy, electron microlithography, Auger electron spectroscopy, radiation physics, radiation biology, Monte‐Carlo (MC) computations, and dose calculation in medical physics. 1 , 2 , 3 Notably, these parameters play a crucial role in precisely determining the dose deposited near inhomogeneities where backscattering modifies the spatial energy distribution pattern. Moreover, they contribute to the design of multi‐layer detector structures, facilitating the selection of suitable materials and the optimization of layer thicknesses to enhance signal detection in radiation dosimetry. 33 These data can be obtained using different methods: (a) The Bethe–Bloch formula 4 is one of the most widely used formulas for calculating the stopping power of charged particles, including electrons, in various materials. The Bethe–Bloch formula considers the material's density, the electron's charge and mass, and the electron's velocity. However, it is accurate only for electrons with relatively high energies. (b) Monte Carlo simulations like CASINO 5 , 6 and EGSnrc 7 , 8 are potent tools for calculating electron‐stopping power and backscattering coefficients. These simulations employ probabilistic methods to model the trajectory of individual electrons as they traverse a material. At each simulation step, the energy loss is computed by referencing theoretical expressions for stopping power. Nevertheless, they have limitations and challenges. Running simulations with high statistical accuracy may require significant computational resources, including time and memory. The quality of the input data, such as the cross‐sections of the electron interactions and the characteristics of the materials, also significantly affects their accuracy. (c) Theoretical calculations such as density functional theory (DFT) 9 , 10 can be used to estimate these parameters for specific materials and electron energies. However, these calculations are frequently more complex and computationally demanding than other methods. (d) Experimental measurements in various setups, such as time‐of‐flight detectors or high‐purity germanium (HPGe) detectors, 11 , 12 can be used to measure the energy loss of electrons per unit linear distance through an absorber. Experimental techniques can also be used to determine electron backscattering coefficients using a backscattering electron detector, such as a backscattered electron imaging detector in a scanning electron microscope (SEM). 1 Experimental observations are essential for providing a benchmark for evaluating analytical studies and simulations, but they are restricted to specific energy ranges and targets. (e) Semi‐empirical models such as stopping and range of ions in matter (SRIM) 13 combine elements of both theory and measured data to estimate stopping powers. However, such models are only fits to incomplete experimental observations, or they are valid only over a limited energy range.
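
In symbols, the two quantities defined at the start of this section can be written as below. This is only a notational sketch: η and S are the symbols used in this article, whereas N_B (number of backscattered electrons), N_0 (number of incident electrons), E (electron kinetic energy), and s (path length in the medium) are labels assumed here for illustration, as is the sign convention that makes S positive for an electron that loses energy.

```latex
\eta = \frac{N_{\mathrm{B}}}{N_{0}}, \qquad S = -\frac{\mathrm{d}E}{\mathrm{d}s}
```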

The limitations and challenges of the current methods call for a novel approach to predicting the missing data. Machine learning (ML) techniques have proven highly effective in addressing specific and relevant physics challenges, as evidenced by recent applications in the field. For instance, in the domain of predicting stopping powers, Parfitt et al. 14 trained a random forest (RF) regression algorithm using over 34 000 experimental measurements of 522 ion‐target combinations across the energy range of 1 to 105 keV/u available on the IAEA website. 15 The R², or coefficient of determination, quantifies the proportion of the variance in the data that is explained by the model, with values close to 1 indicating a good fit. Calculated for both the training and testing datasets, the obtained close‐to‐1 R² values suggest that the model utilized in that study is highly effective in explaining and predicting the variance in the stopping power data. Guo et al. 16 deployed a deep‐learning‐based stopping power model using 16 907 experimental data with ion kinetic energy within the range of 5–1000 keV/u, available on the IAEA website. 15 Akbari et al. 17 developed an ensemble stacking model using 40 044 experimental stopping power measurements of 593 unique ion‐target combinations across the energy range of 10⁻⁵ to 985 MeV, available on the IAEA website. The high R² values, approaching 1, on both the training and testing datasets indicate that their ML model captures the underlying patterns in the data well and has the potential to accurately predict new, unseen data. Mehnaz et al. 18 used an ensemble ML method to predict electron‐stopping power. Model training was based on the experimental data for only 12 elements and energies from 1 eV to 100 keV, available in the Joy database. 19 The developed model can generate data for 54 elements. The large dataset on ion‐target stopping powers paved the way for applying ML in this field. 14 , 16 , 17 However, only one study has examined the application of ML to electron‐stopping power, and the model created in that study was restricted to producing data for elemental targets only. 18

On the other hand, to the best of our knowledge, no comparable work has ever used ML to produce electron backscattering data. This study aims to investigate the implementation of ML to produce data for electron backscattering coefficients (η) and electron‐stopping power (S) for any elements and compounds over a broad range of electron energies. I rely on the latest databases containing up‐to‐date experimental observations to achieve this goal.

2. MATERIALS AND METHODS

2.1. Dataset

A database on electron‐solid interactions was assembled by Joy in 1995 (updated in 2008). 19 It contains summaries of all the experimental stopping powers measured in various laboratories in the form of tables and figures from 1995 to 2008. Joy also collected electron backscattering coefficients for 60 different bulk materials, limited to normal incidence and energies up to 100 keV, for use by the electron microscopy community. However, a new comprehensive database for electron backscattering from solids has recently been published; it includes all relevant data from Joy's database and expands it with thin film data, various incidence angles, a wider energy range (up to 15 MeV), and recent publications. The data composition, acquisition method, format, and potential applications are discussed in detail in the database article. 20

In this work, I used electron backscattering coefficient data for normal incidence on bulk targets (1047 data points) from the Akbari database and stopping powers data (700 data points) from Joy's database. 19 , 20 The entire dataset for each quantity was randomly divided into a training dataset (75%) and a testing dataset (25%). The testing dataset was kept aside to be used to provide an unbiased evaluation of a final model's accuracy. Table 1 summarizes the databases used in this investigation and describes the quantities within the data and a list of elements and compounds in each database.
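
The split sizes quoted in the abstract (785/262 and 525/175) follow directly from a 75%/25% division of 1047 and 700 data points. The sketch below reproduces that arithmetic with a simple shuffled split. The function name `split_dataset`, the fixed seed, and the rounding-down rule are assumptions made for illustration; the exact splitting routine used in the study is not specified in the text.

```python
import random


def split_dataset(records, train_fraction=0.75, seed=0):
    """Shuffle a copy of `records` and split it into (train, test) lists.

    Hypothetical helper: the training size is rounded down, which is an
    assumption that happens to reproduce the sizes quoted in the abstract.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder records: only the counts matter for this illustration.
backscatter_records = list(range(1047))
stopping_records = list(range(700))

bs_train, bs_test = split_dataset(backscatter_records)
sp_train, sp_test = split_dataset(stopping_records)

print(len(bs_train), len(bs_test))  # 785 262
print(len(sp_train), len(sp_test))  # 525 175
```

Keeping the test portion untouched until the final evaluation is what allows it to serve as an unbiased check of the selected model.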

TABLE 1. Description of the databases and quantities within the data.

| Database | Quantity | Existing targets | Energy range | Quantity range | Total data points |
| --- | --- | --- | --- | --- | --- |
| Akbari (1942–2022) 20 | Backscattering coefficient (η) | 49 elements: Ag, As, Al, Au, B, Ba, Be, Bi, C, Ca, Cd, Co, Cr, Cu, Fe, Ga, Ge, Hf, Hg, In, Ir, K, La, Li, Mg, Mn, Mo, Na, Nb, Ni, Pb, Pd, Pt, Si, Sm, Sn, Sr, Ta, Te, Tl, Tl, Ti, U, V, W, Y, Zn, Zr. 18 compounds: CuAu, SiO2, UO2, PbF2, ZnS, AlSi, TaC, TiC, ITO, V2O5, ZrC, IZO | 1 eV–15 MeV | 0.2–54.8% | 3330 a |
| Joy (1898–2008) 19 | Stopping power (S) | 12 elements: Ag, Al, Au, C, Cr, Cu, Ge, Ni, Pb, Pd, Pt, Si. 14 compounds: CuAu, Ice, SiO2, ZnS, Al2O3, GaAs, GaSb, InSb, MgO, MoS2, SiC, ZnSe, ZnTe | 1 eV–92.3 keV | 10⁻⁴–13.8 eV/Å | 700 |

a This project utilized 1047 data points related to normal incidence on bulk targets.

2.2. Model

2.2.1. Data pre‐processing

Effective data pre‐processing helps ensure that ML models can discover patterns and relationships in the data, leading to better predictive performance. The specific pre‐processing steps required depend on the nature of the data and the ML algorithm to be used. Our data cleaning and pre‐processing to ensure uniform data formatting involved renaming files and folders and verifying that all quantities were expressed in consistent units. A comprehensive check was conducted for omitted values, duplicate examples, incorrect labels, and erroneous values prior to feeding the data into the models. To mitigate potential numerical challenges arising from variables with varying orders of magnitude, continuous numeric variables were rescaled to a common range between 0 and 1. This adjustment promotes equitable weighting and prevents features with different units or ranges from dominating the model training process.
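
The rescaling of a numeric feature to the range 0 to 1 described above is commonly done with min-max scaling. The following is a minimal sketch of that idea, not the study's actual code: the helper names are hypothetical, and fitting the minimum and maximum on the training data only (then reusing them for new data) is an assumption about good practice rather than something stated in the text. The three example values are the approximate densities of Al, Cu, and Au in g/cm³.

```python
def fit_min_max(column):
    """Return the (min, max) of a training column; hypothetical helper."""
    return min(column), max(column)


def min_max_scale(column, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    span = hi - lo
    if span == 0:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in column]
    return [(value - lo) / span for value in column]


# Illustrative densities in g/cm^3 (Al, Cu, Au).
train_density = [2.70, 8.96, 19.3]
lo, hi = fit_min_max(train_density)
scaled = min_max_scale(train_density, lo, hi)
print(scaled)
```

Values outside the fitted range (for example, a denser material in the test set) would map outside the 0 to 1 interval, which is one reason to inspect feature ranges before prediction.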

To verify and validate the data's integrity after cleaning and pre‐processing: (a) Basic statistical properties were compared between the original and cleaned datasets to identify any significant discrepancies, (b) random samples were extracted from both datasets and compared to ensure their representativeness and identify any inconsistencies, (c) unique identifiers were assigned to each record and were tracked through the cleaning process, confirming a one‐to‐one correspondence between records in the cleaned and original datasets, (d) a subset of the cleaned data was reverted to its original form and compared with the corresponding subset of the original data, (e) detailed documentation, describing the applied data cleaning and pre‐processing, was maintained and regularly reviewed to ensure its accurate reflection of the processes applied to the original dataset.

2.2.2. Feature selection

Feature selection is a critical step in ML that involves choosing a subset of the most relevant features (variables) from the dataset while discarding irrelevant or redundant ones. Proper feature selection can lead to more efficient and interpretable models, shorter training times, reduced overfitting, and improved model performance. Too many features often lead to overfitting, in which the model fits the training data too closely and performs poorly on other datasets. Furthermore, a larger feature set usually requires a larger sample size; otherwise, heavy regularization is needed. Therefore, guided by the literature, we selected descriptors of the material properties with the most significant effect on predicting the backscattering coefficient and stopping power of electrons. These features are electron energy, E; target (average) atomic mass, A; target (average) atomic number, Z; target density, ρ; and target period number. In this work, the filter method (Pearson's correlation and Spearman's correlation) was chosen among the different techniques and strategies for feature selection. Since discovering potentially complex and unknown relationships between the dependent and independent variables is an important step in ML, Pearson's correlation was used to evaluate the linear association, and Spearman's correlation the monotonic, possibly non‐linear, association between each feature and the target variable.
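
To make the difference between the two filter statistics concrete, the sketch below computes both on a small synthetic example in which the target grows as the cube of the feature. It is a plain-Python illustration with hypothetical function names, not the study's implementation; Spearman's coefficient is obtained here as Pearson's coefficient applied to ranks, with tied values sharing their average rank.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


feature = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [v ** 3 for v in feature]  # monotonic but non-linear

print(round(pearson(feature, target), 3))
print(round(spearman(feature, target), 3))
```

For this monotonic but curved relationship, the rank-based coefficient reaches 1 while the linear coefficient stays below 1, which is the kind of discrepancy the two statistics are meant to reveal.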

2.3. Training

Different individual regression algorithms, including bagging regressor (BR), K‐nearest neighbors (k‐NN), RF, support vector regression (SVR), and eXtreme Gradient Boosting (XGB), were first tested. An ensemble technique, which combines multiple learning algorithms, was then used to obtain improved predictive performance. 21 , 22 , 23 , 24 A total of thirty ensembles, each consisting of a different combination of three regressors (two base models and one meta‐model), were developed using Python 3.10 25 and the scikit‐learn package, a powerful ML module built upon the SciPy framework. The predictions from the base models become the input for a higher‐level model, often referred to as the meta‐model. The meta‐model takes the predictions from the base models as its features and learns to make a final prediction. Hyper‐parameter optimization was performed using GridSearchCV to find the values that resulted in the most accurate predictions. 26 , 27 To avoid overfitting, a 10‐fold cross‐validation was used to tune the hyper‐parameters. Each of the individual algorithms is briefly described below.
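
The figure of thirty ensembles is consistent with choosing two of the five algorithms as base models (10 unordered pairs) and one of the remaining three as the meta‐model. The short enumeration below illustrates that count. The assumption that the meta‐model always differs from both base models is an inference from the stated total, not something the text spells out.

```python
from itertools import combinations

learners = ["BR", "k-NN", "RF", "SVR", "XGB"]

# Each ensemble: an unordered pair of base models plus a distinct meta-model.
ensembles = [
    (base_pair, meta)
    for base_pair in combinations(learners, 2)
    for meta in learners
    if meta not in base_pair
]

print(len(ensembles))  # 30
print((("RF", "XGB"), "k-NN") in ensembles)  # True
```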

2.3.1. Bagging regressor

Bagging, or bootstrap aggregation, is used to reduce variance in noisy datasets. It involves randomly selecting subsets of the training data with replacement and training multiple independent models. The final prediction is an average of these models, resulting in a more stable estimate. 22
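
The bootstrap-and-average idea can be sketched in a few lines. In this illustration the base learner is a least-squares straight line, chosen only for brevity; the base learner used inside the study's bagging regressor is not specified in the text, and all function names below are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Degenerate resample (all x identical): fall back to the mean.
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def bagged_predict(xs, ys, x_new, n_models=50, seed=0):
    """Average the predictions of models fit to bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_models):
        # Sample n indices with replacement (one bootstrap resample).
        idx = [rng.randrange(n) for _ in range(n)]
        slope, intercept = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(slope * x_new + intercept)
    return sum(predictions) / len(predictions)


# Noisy synthetic data around y = 2x + 1.
noise = random.Random(1)
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + noise.gauss(0.0, 0.5) for x in xs]
print(bagged_predict(xs, ys, x_new=4.0))
```

Because each resample sees a slightly different version of the data, averaging their predictions smooths out the influence of individual noisy points.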

2.3.2. K‐nearest neighbors

k‐NN identifies a specific number of training samples closest to a new data point and makes predictions based on their labels. The number of samples can be either a user‐defined constant or adapted based on the local point density. 28
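
As a minimal sketch of nearest-neighbor regression with a fixed, user-defined number of neighbors, the function below averages the target values of the k closest training points. The Euclidean distance, the unweighted average, and the function name are assumptions for illustration; the study's tuned settings are not listed in the text.

```python
import math


def knn_predict(train_features, train_targets, query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    by_distance = sorted(
        (math.dist(features, query), target)
        for features, target in zip(train_features, train_targets)
    )
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / len(nearest)


train_features = [(0.0,), (1.0,), (2.0,), (10.0,)]
train_targets = [0.0, 1.0, 2.0, 10.0]

# The three nearest points to 1.1 are 1.0, 2.0, and 0.0.
print(knn_predict(train_features, train_targets, (1.1,), k=3))  # 1.0
```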

2.3.3. Random forest

The RF algorithm harnesses the strength of multiple decision trees using bagging. It introduces randomness by sampling training data and features during tree construction. Rather than relying on a single tree, RF aggregates the predictions of all trees (by averaging, in regression), improving accuracy and resilience compared to individual trees. 29

2.3.4. Support vector regression

SVR differs from traditional linear regression by seeking a hyperplane that best represents the data while allowing for a specified margin of error. SVR provides flexibility in setting error tolerance and identifies an appropriate hyperplane for data fitting. 30

2.3.5. eXtreme gradient boosting

XGB combines predictions from weak learners, often decision trees, to create a robust predictive model. Known for outstanding performance, XGB is versatile and practical across diverse data types and tasks. 22 , 23
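
The learners described above can be combined into a stacking ensemble with scikit‐learn. The sketch below mirrors the structure reported as best in this work (two tree-based base models and a k‐NN meta‐model) under several loudly stated assumptions: the data are synthetic, scikit‐learn's `GradientBoostingRegressor` stands in for XGB so that the example depends only on scikit‐learn and NumPy, and the hyper-parameter grid and scoring choice are placeholders rather than the values used in the study.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data: five features scaled to [0, 1], one target.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 5))
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.05 * rng.normal(size=200)

# 75% / 25% split, as described in the Dataset section.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=25, random_state=0)),
        # Assumption: gradient boosting from scikit-learn replaces XGB here.
        ("gb", GradientBoostingRegressor(n_estimators=50, random_state=0)),
    ],
    # The meta-model is trained on cross-validated base-model predictions.
    final_estimator=KNeighborsRegressor(),
    cv=3,
)

# Placeholder grid; 10-fold cross-validation as stated in the text.
search = GridSearchCV(
    stack,
    param_grid={"final_estimator__n_neighbors": [3, 5]},
    cv=10,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

test_r2 = search.best_estimator_.score(X_test, y_test)
print(search.best_params_, round(test_r2, 3))
```

Training the meta‐model on out-of-fold predictions, which `StackingRegressor` does internally through its `cv` argument, keeps it from simply memorizing base-model outputs on data those models have already seen.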

2.4. Evaluation

Model evaluation is a comprehensive process that involves selecting appropriate evaluation metrics, assessing model performance on multiple datasets, and making informed decisions about model selection, hyperparameter tuning, and deployment. Proper evaluation is essential to build effective and reliable ML models. We evaluated the prediction performance of our models on training and testing datasets using four commonly applied error metrics: R², RMSE, MAE, and MAPE. Each metric provided distinct insights and was calculated ten times on the training dataset and on the testing dataset, the latter of which was not employed during the model development phase. The following equations were utilized for metrics calculation. Furthermore, ML predictions of electron backscattering coefficients and stopping powers were plotted alongside their corresponding experimental values to enhance the visualization of prediction performance.

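$$
\begin{aligned}
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left|y_{\mathrm{true}} - y_{\mathrm{predict}}\right| \\
\mathrm{MAPE} &= \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_{\mathrm{true}} - y_{\mathrm{predict}}}{y_{\mathrm{true}}}\right| \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{\mathrm{true}} - y_{\mathrm{predict}}\right)^{2}} \\
R^{2} &= 1 - \frac{\sum_{i=1}^{n}\left(y_{\mathrm{true}} - y_{\mathrm{predict}}\right)^{2}}{\sum_{i=1}^{n}\left(y_{\mathrm{true}} - \bar{y}_{\mathrm{true}}\right)^{2}}
\end{aligned}
$$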
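
The four metrics defined above translate directly into code. The following plain-Python sketch is an illustration with made-up numbers, not the evaluation script used in the study; the function names are chosen here for readability.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error (undefined if any true value is 0)."""
    return 100.0 / len(y_true) * sum(
        abs((t - p) / t) for t, p in zip(y_true, y_pred)
    )


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r_squared(y_true, y_pred):
    """Coefficient of determination."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]

print(mae(y_true, y_pred))                   # 2.0
print(round(mape(y_true, y_pred), 3))        # 10.625
print(round(rmse(y_true, y_pred), 4))        # 2.1213
print(round(r_squared(y_true, y_pred), 3))   # 0.964
```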

Additionally, a customized EGSnrc Monte Carlo code named “backscatter clrp 31 ” for calculating backscattering coefficients was employed in this study. This code is adept at conveniently providing values for backscattering coefficients, energy spectra, and angular distributions of backscattered charged particles tailored to the specific structure under consideration. Within the code's package, the sample cross‐section data file contains comprehensive information covering electrons, positrons, and photons within the energy range of 1 keV to 2 MeV. This data set incorporates considerations for Rayleigh scattering and density effects. The code validation against experimental data confirmed the simulation results to be within 4%. The computation time required to calculate electron backscattering varies depending on factors such as electron energy, number of histories, and material thickness. For instance, on a standard single‐processor computer, the CPU time for a scenario involving a monoenergetic pencil beam comprising five million electrons with a kinetic energy of 1000 keV, incident normally on a bulk Pb target, is approximately 945.5 s.

For electron‐stopping power data at energies above 1 keV, calculations were performed using the ESTAR program from the National Institute of Standards and Technology (NIST). 32

3. RESULTS

3.1. Dataset analysis

A comprehensive dataset investigation is imperative to uncover underlying patterns and potential challenges before initiating the model training process to ensure the best possible outcomes. A thorough analysis of the cleaned dataset was conducted following the essential steps of data cleaning and pre‐processing.

Figures 1 and 2 provide valuable insights into the dataset's characteristics regarding the backscattering coefficient and stopping power experimental data relating to various ranges of electron energy, quantity of interest, and atomic number. Notably, this graphical representation reveals a significant concentration of experiments conducted at lower energy ranges. This concentration draws attention to a critical observation: a substantial shortage of data points for higher energies. This observation is essential for several reasons. (1) Data availability: The limited data for higher energies underscores the potential challenges in accurately predicting outcomes at these energy ranges due to the need for more training examples. (2) Model biases: The imbalance in data distribution among energy ranges may introduce biases into the model's performance. It may excel at predicting outcomes for lower‐energy scenarios but struggle when dealing with higher‐energy cases. (3) Data collection strategy: concentrating experiments at lower energies raises questions about the data collection strategy. It prompts an exploration into why experiments predominantly focus on lower energies and whether additional data collection efforts are warranted for higher‐energy scenarios. These figures further complement this analysis by illustrating the frequency distribution of existing data concerning the backscattering coefficient, stopping power, and atomic number. As depicted in the figures, the most extensively investigated materials are characterized by low atomic numbers.

FIGURE 1.

Distribution of backscattering coefficient data as a function of energy, backscattering coefficient value, and atomic number. Energy is represented on a logarithmic scale.

FIGURE 2.

Distribution of stopping power data as a function of energy, stopping power value, and atomic number. Energy is represented on a logarithmic scale.

3.2. Feature selection

Figures 3a and 4a illustrate the correlation coefficients between the selected features and backscattering and stopping power, respectively. Generally, the strength of correlation can be interpreted based on the absolute values of the coefficients. A correlation coefficient close to 1 or −1 indicates a strong relationship, while a coefficient close to 0 suggests a weak relationship. The relatively high values of both Pearson and Spearman correlation coefficients between the backscattering coefficient and features (Figure 3a) imply a stronger linear relationship between the two variables. On the other hand, the high Pearson and low Spearman correlation shown in Figure 4a suggests that the relationship might not be strictly linear. This situation arises when outliers or non‐linear patterns influence the relationship between variables.

FIGURE 3.

Correlation of features with the electron backscattering coefficient (a). Predicted values made by the best model for all samples versus actual experimental values on the training data (b) and the unseen data (c).

FIGURE 4.

Correlation of features with electron‐stopping power data (a). Predicted values made by the best model for all samples versus actual experimental values on the training data (b), and the unseen data (c).

Moreover, relationships among different features within the dataset were evaluated. Although features with higher correlation coefficients are inherently more effective in contributing to accurate predictions, all available features were retained for model development. This choice was a deliberate one, made to leverage the collective information within the dataset, recognizing that each feature, despite potential correlations, can offer unique insights and value to the predictive models.

3.3. Model performance

To create effective predictive models, we generated five distinct individual models and thirty ensemble models by combining various sets of three individual algorithms. These models were rigorously assessed by measuring their accuracy using computed error metrics. Upon a comprehensive evaluation and ranking of each model's accuracy in terms of MAPE on the test dataset, a model ensemble composed of RF and XGBoost as base learners, complemented by a k‐Nearest Neighbors meta‐learner, emerged as the most accurate predictor for both backscattering coefficients and stopping powers. To provide a clear picture of the model's performance, Table 2 presents the average error metrics from five separate runs employing this top‐performing ensemble model. The maximum MAE and RMSE for unseen backscattering data were calculated as 1.427% and 2.215%, respectively. These values are considered acceptable within the wide range of backscattering coefficients, which extends up to 54.8%. Similarly, the maximum MAE and RMSE for unseen stopping power data were 0.3495 and 0.5840 eV/Å, respectively. These figures also fall within an acceptable range, considering the broad spectrum of stopping powers, reaching up to 13.08 eV/Å. MAPE returns errors as a percentage, facilitating the interpretation of the error values. It should be noted that a good score for MAPE is dependent on the problem. This dependency arises from the metric's nature, particularly when dealing with datasets that contain values close to zero. Due to its asymmetry property, this metric gives greater weight to the smaller true values. In general, a MAPE value below 20% is considered indicative of good performance, signifying that the predictions are acceptable in terms of accuracy. Figure 2 illustrates that a significant portion of the currently available experimental data is distributed at lower values. Despite a maximum value of 13.08 eV/Å, there are relatively few experimental measurements in this range. The inherent bias in the data could potentially influence the training of the model, favoring better performance in low stopping power regions, thus causing a reduction in MAPE, which places emphasis on lower values. Because this metric averages over all data points, the MAPE obtained in this work is expected to decrease as additional experimental data, particularly at higher energies, are included.
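
The sensitivity of MAPE to small true values can be seen with a short calculation: the same absolute error yields very different percentage errors depending on the magnitude of the true value. The numbers below are invented for illustration and are not taken from the datasets used in this work.

```python
def absolute_percentage_error(y_true, y_pred):
    """Percentage error of a single prediction relative to the true value."""
    return 100.0 * abs((y_true - y_pred) / y_true)


abs_error = 0.05  # same absolute error for every case, in eV/angstrom
true_values = (0.1, 1.0, 10.0)

errors = [
    absolute_percentage_error(t, t + abs_error) for t in true_values
]
for t, e in zip(true_values, errors):
    print(f"true value = {t:5.1f}  percentage error = {e:.1f}%")

print(f"MAPE over the three points = {sum(errors) / len(errors):.1f}%")
```

An error of 0.05 corresponds to roughly 50% at a true value of 0.1 but only about 0.5% at a true value of 10, so a dataset concentrated at low values can dominate the averaged MAPE.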

TABLE 2. Error metrics calculated on all samples using the best model.

| Quantity | Dataset | R² | MAE a | RMSE a | MAPE (%) |
| --- | --- | --- | --- | --- | --- |
| Backscattering coefficient | Training | 0.9975 | 0.5017 | 0.6967 | 3.157 |
| Backscattering coefficient | Testing | 0.9696 | 1.427 | 2.215 | 7.340 |
| Stopping power | Training | 0.9842 | 0.1652 | 0.2982 | 20.45 |
| Stopping power | Testing | 0.9294 | 0.3495 | 0.5840 | 22.61 |

Abbreviations: MAE, mean‐absolute‐error; RMSE, root‐mean‐squared‐error.

a Unit is % and eV/Å for the backscattering coefficient and stopping power, respectively.

Furthermore, Figures 3b,c and 4b,c illustrate the model's predictive ability by comparing the predicted values against experimental values. These figures depict higher R‐squared values for electron backscattering coefficient data, signifying the model's strong predictive capacity. The fact that the R² values are somewhat lower for electron‐stopping power data can be attributed to the smaller dataset size, as a more extensive database inherently tends to yield higher R² values due to the increased number of data points available for model training and evaluation.

Figure 5 offers a comprehensive comparison that displays the predictive capabilities of our ML model in estimating electron backscattering coefficients. The blue signs represent the model's predictions, while the red symbols denote the available experimental measurements. The green solid line also represents calculations obtained through the EGSnrc. 31 This comparison encompasses several examples, such as Pb, Mo, and Cu. It is important to emphasize that the predicted values span a broad spectrum of energies, covering a wider range than the experimental and calculated data confined to specific energy ranges. Furthermore, we extended the reach of our ML model to materials that lacked documented quantities in the database. Figure 5 includes an illustrative instance of such cases featuring CdTe, which holds significance in applications related to radiation detection in medical physics. 33

FIGURE 5.

Comparison of machine learning predicted electron backscattering coefficients (blue signs) with the available measured data (red symbol) and those calculated using EGSnrc (green line).

Similarly, Figure 6 provides an insightful contrast between measured and predicted electron‐stopping powers within various target materials like C and Cu. NIST 32 data were added for comparison, albeit providing approximations with uncertainties typically around 10%. Once again, we spotlight CdTe and Al2O3 as examples for which prior investigations were unavailable, and all data points depicted in the figure were generated using our model. These figures underscore our ML model's versatility and predictive accuracy, particularly in scenarios where experimental or calculated data are limited or non‐existent.

FIGURE 6.

Comparison of machine learning predicted electron‐stopping power (blue markers) with the measured data (red symbol), if available. Stopping power values from NIST 32 are represented by the green line.

Plots in Figures 5 and 6 do not incorporate uncertainties to maintain clarity. Statistical uncertainties in simulations typically remain below 1%; however, substantial variations exist among different experiments. In the literature, diverse uncertainties, ranging up to 10%, have been reported for both the backscattering coefficient and stopping power. Furthermore, it's common for experiments to report only overall uncertainties, frequently without providing individual error bars.

4. DISCUSSION

In various scientific disciplines, researchers often employ a combination of both experimental and calculated data to gain a comprehensive understanding of phenomena. Experimental data furnishes empirical evidence, while calculated data bridges gaps and investigates theoretical scenarios. Nevertheless, in many applications, the availability of experimental data is limited, either due to the impracticality of conducting exhaustive physical experiments or constraints related to available instrumentation. Additionally, data derived from simulations can be resource‐intensive in terms of computing time and storage. Given these constraints, ML models may effectively uncover significant patterns within the dataset. For the approaches employed in this study, simulation time for a single energy can extend up to 15 min, whereas ML data training time ranges from a few minutes for individual models to approximately 40 min for ensemble stacking models. However, once trained, querying the ML model for predictions is generally rapid, taking only a few seconds. This enables real‐time predictions for a wide range of energies.

This research introduces an ML approach for generating electron backscattering coefficients and stopping powers, which find applications in clinical and industrial contexts. I employed diverse error metrics to assess the models, providing valuable insights into their performance. The results indicate that the models fit the training dataset well and exhibit relatively sufficient accuracy when applied to new, unseen data. This capability enables them to produce values and reproduce plot shapes effectively.

It is worth noting that, according to my literature review, no prior studies predicted electron backscattering data, making direct comparisons with existing literature impossible. Regarding electron‐stopping power, making a fair comparison with the only similar published study is challenging due to using different evaluation metrics. This discrepancy makes it difficult to definitively determine which model performs better. Nevertheless, my approach offers a broader utility since it can predict missing stopping power data for all elements and compound targets, unlike similar research limited to 54 elements. NIST provides comprehensive electron‐stopping power data, calculated from the theory of Bethe, for various materials and electron energies ranging from 1 keV to 1 GeV. 32 However, the uncertainties of the calculated data for energies below 100 keV can be up to 10% due to the lack of shell corrections, which are required at lower energies.

Despite the significant insights gained from this preliminary study, it is crucial to underscore that further attention and exploration in future research are warranted. The impact of the database on ML model performance, particularly in the context of sparsely and noisily collected experimental data, calls for a thorough examination of strategies to enhance model quality. While the current results contribute valuable findings, it is important to recognize that the study is in its initial stages and has potential for advancement.

Considering the complexities associated with noisy data, exploring noise‐aware models, such as Gaussian processes or Bayesian neural networks, is a promising avenue for future investigation. These models explicitly account for noise and uncertainty during training, potentially improving the robustness of ML models to experimental uncertainties. In addition, data augmentation techniques, such as adding random noise to numerical features, shuffling feature values, or duplicating rows with variations, are a potential strategy to enhance model robustness, particularly when data availability is limited. While not directly supported by the current results, these techniques could be crucial in overcoming challenges associated with data scarcity and noise. Furthermore, combining Monte Carlo simulations with ML is a compelling direction that could address computational challenges, enhance accuracy, and yield valuable insights across various scientific and engineering domains. This integration could provide a powerful toolbox for researchers dealing with complex systems. It is important to emphasize that these suggestions are presented as hypotheses or speculative ideas for future exploration.
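One of the augmentation ideas mentioned above, adding random noise to numerical features, can be sketched as follows. This is a minimal illustration under stated assumptions, not a method used in this study: the function `augment_with_noise`, the feature layout (atomic number, density, energy), the 2% relative noise level, and the target values are all hypothetical.

```python
import random


def augment_with_noise(features, targets, noisy_columns,
                       n_copies=2, rel_sigma=0.02, seed=0):
    """Return the original rows plus noisy copies of each row.

    For every column index in `noisy_columns`, a copy's value x becomes
    x * (1 + eps) with eps drawn from a normal distribution N(0, rel_sigma).
    Other columns and the target value are copied unchanged.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    aug_x = [list(row) for row in features]
    aug_y = list(targets)
    for row, y in zip(features, targets):
        for _ in range(n_copies):
            new_row = list(row)
            for j in noisy_columns:
                new_row[j] *= 1.0 + rng.gauss(0.0, rel_sigma)
            aug_x.append(new_row)
            aug_y.append(y)
    return aug_x, aug_y


# Assumed feature layout: [atomic number, density (g/cm^3), energy (keV)].
# Target values are placeholders for illustration only.
features = [[13, 2.70, 10.0], [29, 8.96, 20.0]]
targets = [0.15, 0.30]

# Perturb only the energy column; jittering a discrete quantity such as
# atomic number would not be physically meaningful.
aug_x, aug_y = augment_with_noise(features, targets, noisy_columns=[2])
print(len(aug_x), len(aug_y))
```

If such augmentation were tried, it would be applied to the training split only, so that noisy copies of a test point cannot leak into training.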

While such efforts could significantly impact the field, they should be implemented with caution. Predicted values carry inherent uncertainties and assumptions, especially when they extend beyond the range of the existing data. Researchers should therefore assess the reliability of predicted values critically and acknowledge the associated uncertainties when incorporating them into their research.
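A simple way to act on this caution is to flag queries whose features fall outside the range covered by the training data. The sketch below is a hypothetical helper, not part of this study; the function name `out_of_range_features` and the sample rows are assumptions chosen for illustration.

```python
def out_of_range_features(train_rows, query):
    """Return indices of features in `query` outside the training min/max."""
    flagged = []
    for j, value in enumerate(query):
        column = [row[j] for row in train_rows]
        if value < min(column) or value > max(column):
            flagged.append(j)
    return flagged


# Assumed feature layout: [atomic number, energy (keV)]; values are illustrative.
train_rows = [[13, 10.0], [29, 20.0], [79, 100.0]]

inside = out_of_range_features(train_rows, [47, 50.0])    # within both ranges
outside = out_of_range_features(train_rows, [47, 500.0])  # energy too high
print(inside, outside)
```

This per-feature check is coarse: a query can lie inside every individual range and still be far from any training point, so an empty result does not by itself establish that a prediction is reliable.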

5. CONCLUSIONS

This paper introduces a novel approach for accurately estimating electron backscattering coefficients and stopping power values across a diverse range of target materials. The study demonstrated that predictions generated by ML models could reduce the need for resource‐intensive and costly numerical simulations and laboratory experiments.

The outcomes of the ML approach in this study are broadly applicable across domains including medical physics, physics, materials science, and engineering. These findings can help researchers investigate the fundamental physics underpinning electron‐solid interactions. Furthermore, they have the potential to drive advancements in detectors, materials, and technologies that rely on these phenomena.

It is crucial to emphasize that ML is not intended to replace experimental measurements or computational calculations. Instead, it serves as a valuable complement and augmentation to our capabilities, providing an alternative avenue for enhancing models based on existing data.

CONFLICT OF INTEREST STATEMENT

The author declares no conflicts of interest.

ACKNOWLEDGMENTS

The author has nothing to report.

Akbari F. Prediction of electron‐solid interaction parameters using machine learning. Med Phys. 2025;52:652–661. doi: 10.1002/mp.17445

DATA AVAILABILITY STATEMENT

The author will share data upon request.

REFERENCES

  • 1. Goldstein JI, Newbury DE, Michael JR, Ritchie NW, Scott JHJ, Joy DC. Scanning Electron Microscopy and X‐ray Microanalysis. Springer; 2017.
  • 2. Akbari F, Shvydka D. Electron backscattering for signal enhancement in a thin‐film CdTe radiation detector. Med Phys. 2022;49(10):6654‐6665.
  • 3. Buffa FM, Verhaegen F. Backscatter and dose perturbations for low‐to medium‐energy electron point sources at the interface between materials with different atomic numbers. Radiat Res. 2004;162(6):693‐701.
  • 4. Tai H, Bichsel H, Wilson JW, Shinn JL, Cucinotta FA, Badavi FF. Comparison of stopping power and range databases for radiation transport study. NASA Technical Paper 3644. 1997.
  • 5. Hovington P, Drouin D, Gauvin R. CASINO: a new Monte Carlo code in C language for electron beam interaction – Part I: description of the program. Scanning. 1997;19(1):1‐14.
  • 6. Hovington P, Drouin D, Gauvin R, Joy DC, Evans N. CASINO: a new Monte Carlo code in C language for electron beam interactions – part III: stopping power at low energies. Scanning. 1997;19(1):29‐35.
  • 7. Ali ESM, Rogers DWO. Benchmarking EGSnrc in the kilovoltage energy range against experimental measurements of charged particle backscatter coefficients. Phys Med Biol. 2008;53(6):1527.
  • 8. Ali ESM, Rogers DWO. Energy spectra and angular distributions of charged particles backscattered from solid targets. J Phys D: Appl Phys. 2008;41:9.
  • 9. Echenique P, Uranga M. Density Functional Theory of Stopping Power. Interaction of Charged Particles with Solids and Surfaces. Springer; 1991:39‐71.
  • 10. Shandiz MA. Monte Carlo and density functional theory simulation of electron energy loss spectra (Order No. 28265944). Available from ProQuest Dissertations & Theses Global Closed Collection, 2014.
  • 11. Fontana CL, Chen C‐H, Crespillo ML, et al. Stopping power measurements with the Time‐of‐Flight (ToF) technique. Nucl Instrum Methods Phys Res, Sect B. 2016;366:104‐116.
  • 12. Roy T, Tessier F, McEwen M. A system for the measurement of electron stopping powers: proof of principle using a pure β‐emitting source. Radiat Phys Chem. 2018;149:134‐141.
  • 13. Ziegler JF. SRIM 2013. Available from http://www.srim.org
  • 14. Parfitt WA, Jackman RB. Machine learning for the prediction of stopping powers. Nucl Instrum Methods Phys Res, Sect B. 2020;478:21‐33.
  • 15. Paul H. Stopping power of matter for ions graphs, data, comments and programs. (2015). Accessed October 2024. https://nds.iaea.org/stopping
  • 16. Guo X, Wang H, Li C, Zhao S, Jin K, Xue J. Development of an electronic stopping power model based on deep learning and its application in ion range prediction. Chin Phys B. 2022;31(7):073402.
  • 17. Akbari F, Taghizadeh S, Shvydka D, Sperling NN, Parsai EI. Predicting electronic stopping powers using stacking ensemble machine learning method. Nucl Instrum Methods Phys Res, Sect B. 2023;538:8‐16.
  • 18. Mehnaz, Yang L, Da B, Ding Z. Ensemble machine learning methods: predicting electron stopping powers from a small experimental database. Phys Chem Chem Phys. 2021;23(10):6062‐6074.
  • 19. Joy DC. A database on electron‐solid interactions. Scanning. 1995;17(5):270‐275.
  • 20. Akbari F. A comprehensive open‐access database of electron backscattering coefficients for energies ranging from 0.1 KeV to 15 MeV. Med Phys. 2023;50:5920‐5929.
  • 21. Dietterich TG, editor. Ensemble Methods in Machine Learning. International Workshop on Multiple Classifier Systems. Springer; 2000.
  • 22. Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F. A review on ensembles for the class imbalance problem: bagging‐, boosting‐, and hybrid‐based approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews). 2011;42(4):463‐484.
  • 23. Schapire RE. The boosting approach to machine learning: An overview. Nonlinear estimation and classification. Lecture Notes in Statistics, Vol 171. Springer, New York, NY; 2003:149‐171. doi: 10.1007/978-0-387-21579-2_9
  • 24. Zhang C, Ma Y. Ensemble Machine Learning: Methods and Applications. Springer; 2012.
  • 25. Python. Accessed October 2024. https://www.python.org/
  • 26. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit‐learn: machine learning in Python. J Mach Learn Res. 2011;12:2825‐2830.
  • 27. SciPy. Accessed October 2024. https://scipy.org/
  • 28. Zhang Z. Introduction to machine learning: k‐nearest neighbors. Ann Transl Med. 2016;4(11).
  • 29. Breiman L. Random forests. Machine Learning. 2001;45(1):5‐32.
  • 30. Awad M, Khanna R. Efficient learning machines: Theories, concepts, and applications for engineers and system designers. Apress Berkeley, CA; 2015:67‐80.
  • 31. Ali ESM, Rogers DWO. Benchmarking EGSnrc in the kilovoltage energy range against experimental measurements of charged particle backscatter coefficients. Phys Med Biol. 2008;53:18.
  • 32. NIST ESTAR database. Accessed October 2024. http://physics.nist.gov/PhysRefData/Star/Text/ESTAR.html
  • 33. Akbari F, Parsai EI, Shvydka D. Large Area Thin‐Film CdTe as the Next‐Generation X‐Ray Detector for Medical Imaging Applications. High‐Z Materials for X‐ray Detection: Material Properties and Characterization Techniques. Springer; 2023:23‐41.

Articles from Medical Physics are provided here courtesy of Wiley
