Scientific Reports. 2025 Dec 26;16:3211. doi: 10.1038/s41598-025-33209-x

Data-driven emulation of modal aerosol microphysics via neural operator-based modeling

Zhe Bai 1, Damian Rouson 1
PMCID: PMC12830702  PMID: 41454002

Abstract

The complexity and small characteristic scales of aerosol microphysical processes pose a major challenge for accurate and efficient Earth system simulations at regional and global scales. In this work, we construct and evaluate a surrogate model: the aerosol deep operator network (ADON), a physics-inspired dual-net architecture for emulating the aerosol microphysics parameterization suite in version 2 of the Energy Exascale Earth System Model (E3SMv2). The current version of the surrogate model is trained on a dataset comprising 9.8 million samples obtained from a global E3SMv2 simulation with a horizontal resolution of about one degree under cloud-free conditions. Incorporating domain spatial and temporal coordinates, as well as principal components extracted from the training data, the dual-net surrogate model effectively captures the intricate representations of aerosols and their relationship with atmospheric state variables, achieving an R-squared score over Inline graphic for all the lognormal aerosol modes in the extrapolated regime. The validated model provides feature importance of input variables and their impact on the predictive capacity of the surrogate model in relation to the E3SM. The computational cost of online inference deployed on CPUs and GPUs with lower precision highlights ADON’s efficiency and potential for robust predictive modeling in large-scale Earth system computations.

Subject terms: Engineering, Mathematics and computing, Physics

Introduction

Aerosol particles play a fundamental role in regulating Earth system processes. As one of the biggest sources of uncertainty, aerosol particles influence weather dynamics by modulating radiation, cloud processes, and precipitation patterns1–4. For example, by reflecting or absorbing sunlight, aerosols directly modify the energy flow through the atmosphere. Indirectly, aerosols act as cloud condensation nuclei, influencing the formation of clouds and precipitation, and they provide nutrients to the remote oceanic environment, thereby affecting ocean biogeochemistry. The extent and nature of these influences depend strongly on particle characteristics such as shape, size, composition, and mixing state, making the microphysical aspects of the aerosol life cycle a complex yet critical component in Earth system modeling.

Conventional numerical models that solve differential equations to simulate aerosol life cycles at the global scale typically operate at horizontal resolutions ranging from several tens to several hundreds of kilometers, requiring the use of parameterizations to describe the evolution of microphysical properties of aerosol populations instead of individual particles. In recent decades, various global atmospheric models have used modal aerosol parameterizations, in which the size distribution of the aerosol population at each time and location of interest is described by a relatively small number of lognormal probability density functions5–8. Further improving model accuracy without substantially increasing computational cost remains an important area of ongoing research9–12.

Surrogate models leveraging data-driven methods have emerged as a promising approach to emulate, analyze, and optimize complex physical systems. Such systems are typically characterized by multi-scale and multi-physics intricacies constrained by conservation and constitutive laws, as expressed in partial differential equations (PDEs). Classic first-principle simulations use analytical or computational tools to solve these equations, incorporating varying domain geometries, initial and boundary conditions (IBCs), and input parameters. However, addressing such problems using traditional methods, for example finite element methods, incurs a substantial computational cost because an independent simulation must be conducted for each setup of parameters, IBCs, or geometries, which is prohibitive for real-time inference. Reduced-order methods, on the other hand, aim to construct projection-based models on existing datasets to approximate the solutions and accelerate computations at the expense of fidelity and generalization performance13–16. Machine learning (ML) tools are actively being developed to infer solutions to PDEs across diverse applications including fluid dynamics, atmospheric physics and chemistry, fusion energy, and additive manufacturing17–20, to name but a few; however, most existing approaches are limited to the design space of the training data, accommodating a fixed set of input parameters or IBCs, and may not readily generalize to new geometry scenarios. Harder et al. introduced a shallow physics-informed neural network (PINN) to emulate the M7 aerosol microphysics parameterization package in the Hamburg Aerosol Model (HAM)21; however, there remains significant potential to enhance the accuracy of such multivariable predictive surrogate models.

Recently, neural operators including DeepONet22 and the Fourier neural operator23 have been proposed as a new class of data-driven models that learn the solution operator of parametric PDEs in a mesh-convergent manner and predict complex dynamics for diverse applications in science and engineering, including fluids, plasma, and Earth system modeling24–27. Extending the universal approximation theorem28, DeepONet generalizes finite-dimensional neural networks to learn the mapping operator between infinite-dimensional function spaces, and provides an intuitive model architecture that allows for a continuous representation of the target output functions that is independent of resolution. Employing the Adaptive Fourier Neural Operator29 framework, FourCastNet30 was proposed to provide global predictions of fast-timescale ERA531 variables on a regular latitude and longitude grid, such as the surface wind speed, precipitation, and atmospheric water vapor. However, this approach faces challenges when multiple processes evolve and interact on an irregular grid, especially when data are missing in certain regions of a global forecast.

In this study, we construct and evaluate a physics-based Aerosol Deep Neural Operator Network (ADON) that emulates the parameterization suite of aerosol microphysics processes as part of the Energy Exascale Earth System Model version 2 (E3SMv2)32. The solution operator network learns the functional of the volume mixing ratio (VMR) tendency in the aerosol microphysics process through the nonlinear mapping of input variables and forcing as well as encoded spatial and temporal coordinates of the discretized grids. Furthermore, we improve the model’s predictive accuracy by incorporating principal components extracted from the training data, effectively correlating the input function encoding with the dominant variances and patterns (in Inline graphic) of the microphysics process. The contributions of this work are summarized as follows:

  • We propose a data-driven surrogate model, ADON, that accurately emulates the VMR tendency of aerosol mass and number concentrations due to aerosol microphysical processes, based on the training data on cloud-free grid cells generated from a global model simulation with the E3SM atmosphere component.

  • The proposed architecture integrates deep learning with domain spatial and temporal coordinates, as well as a physics basis extracted from the training data of the parameterization suite, and applies broadly to simulation data with irregular grids or incomplete spatial coverage.

  • The ADON emulator shows good extrapolation capabilities for all the variables in the same forecasting window spanning an entire year. We compare the computational cost of inference time with reduced (single) precision for improved performance while preserving computation fidelity.

The remainder of the paper is organized as follows. The Results section discusses the mode-based tendency predictions of all the aerosol variables for both in-distribution and out-of-distribution regimes, as well as the computational cost for online inference. The Methods section introduces the aerosol microphysics parameterization suite, the network architecture of our proposed model, and the extended principal-component-based model.

Results

We consider the E3SM using the modal method to describe aerosol properties. Based on the NN training and hyperparameter optimization explained in the Methods section, we obtain the best ADON and ADON-PCA models to be used for the inference of the aerosol microphysics emulation. This section presents the inference results of the volume mixing ratio (VMR) tendency using ADON-PCA, compares the performance of ADON and ADON-PCA to the baseline PINN model21, and reports an ablation study of the proposed model architecture with quantitative analysis.

Inference model performance on lognormal aerosol modes

The aerosol microphysics parameterization suite emulated in this study is part of EAMv2, the atmosphere component of E3SMv2, a global atmospheric model that combines a dynamical core solving fluid dynamics on a cubed-sphere grid with parameterizations of subgrid-scale processes. We focus on the prediction of changes in aerosol number and mass concentrations (including affected chemical gases) due to the formation of new particles, condensation or evaporation of some of the chemical components, coagulation of aerosol particles, and mass and number transfer between different lognormal aerosol modes. The aerosol size distribution is represented by four lognormal modes: the Aitken, accumulation, coarse, and primary carbon modes. The aerosol microphysics considered includes nucleation, coagulation, condensation, and aging processes. Due to these microphysical processes, the particle number and mass are redistributed among the different modes. Multiple aerosol compositions are considered: sulfate (SO$_4$), black carbon (BC), primary organic carbon, secondary organic carbon, dust, sea salt, and marine organic matter.

Figure 2 presents the relative mean absolute error and the uncertainty of predictions from the ML emulator ADON-PCA, compared to the E3SM outputs for the same test time window at 00:30:00 UTC over the entire year of 2010. The curve denoted by black markers represents the mean relative error computed across five independent runs with different random initializations. The narrow spread among these realizations suggests minimal uncertainty and high consistency in the inference results. Notably, the inference model for the aerosol number concentration of the Aitken mode (Inline graphic) exhibits a larger error, while the relative Inline graphic error for the other variables of significance (e.g., the accumulation mode of aerosol number concentration (Inline graphic), mass concentration of BC (Inline graphic) and sulfate (Inline graphic), and the Aitken mode of sulfate mass concentration (Inline graphic)) remains below 0.06. Because the diameter of Aitken mode particles ranges from Inline graphic nm, they are highly sensitive to local sources with uncertain parameters, which makes it challenging to capture the coupled physical and chemical processes with limited simulation data for validation. Figure 2b illustrates the stable inference performance throughout the year for both the in-distribution and out-of-distribution regimes, which demonstrates the robust generalization capability of the surrogate model. We observe a significant overlap in the error plot between the accumulation mode (Inline graphic) and primary carbon mode (Inline graphic), which aligns with the conservation law assumption that the net change of these two modes is constrained to a zero-sum condition in the coagulation and aging processes. This behavior indicates the effectiveness of our model in accurately predicting the multi-variable tendencies, despite the absence of explicit constraints in the loss function. Additional results for modal aerosol variables are included in the supplementary information.
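The error metric and the spread across runs described above can be illustrated with a minimal sketch. The exact normalization of the relative mean absolute error is not stated in the text; the version below assumes the sum of absolute errors divided by the sum of absolute target values, and the five "runs" are made-up toy numbers rather than data from the paper.

```python
from statistics import mean, stdev

def relative_mae(pred, true):
    """Sum of absolute errors divided by the sum of absolute targets (assumed definition)."""
    num = sum(abs(p - t) for p, t in zip(pred, true))
    den = sum(abs(t) for t in true)
    return num / den

# Toy target tendencies and five hypothetical emulator runs (illustrative only).
target = [1.0, -2.0, 4.0, 0.5]
runs = [
    [1.1, -2.0, 3.9, 0.5],
    [1.0, -1.9, 4.1, 0.5],
    [0.9, -2.1, 4.0, 0.6],
    [1.0, -2.0, 4.2, 0.4],
    [1.1, -2.1, 3.9, 0.5],
]

errors = [relative_mae(r, target) for r in runs]
mean_error = mean(errors)   # the "black marker" value for one test slice
spread = stdev(errors)      # run-to-run uncertainty for the same slice
```

Repeating this for every test time slice would give one mean and one spread per slice, which is the kind of curve with an uncertainty band that Fig. 2b reports.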

Fig. 2.

(a) Global view of VMR tendency for the key components of atmospheric aerosol physics distribution: accumulation-mode aerosol particle concentrations (1/kmol-air) over 1800 s as predicted by the E3SM (middle) and our ML emulator (right) in the model layer closest to the Earth’s surface. (b) Relative mean absolute error and uncertainties of the aerosol microphysics tendency predictions from the ML emulator compared to E3SM outputs for the test samples at 00:30:00 UTC over one full year. (c) Vertical profiles of relative mean absolute error and standard deviations across atmospheric pressure for the forecast of aerosol particle number and mass concentration changes in the out-of-distribution regime.

Diagnostic analysis of ML inference for physical insight

A detailed examination is conducted on both the less accurate and more accurate surrogate models to enhance the model interpretability. Figure 2c shows the relative mean absolute error across atmospheric pressure levels for the forecast of accumulation-mode and Aitken-mode aerosol particle number concentrations (1/kmol-air), as well as the mass concentration (mol/mol-air) of black carbon and sulfate in the out-of-distribution regime.

Larger errors are observed at stratospheric levels, accompanied by significant variance in the surrogate model’s inference results. Additional challenges arise in the middle troposphere (especially around the pressure range of 300–550 hPa) for the Aitken-mode particle number concentration predictions; this is the primary region contributing to the error of the surrogate model for Inline graphic numInline graphic. As illustrated in Fig. 3, the discrepancies are particularly pronounced near cloud boundaries, where limited training data are available to constrain the local aerosol dynamics. This data sparsity reduces predictive accuracy in regions such as Antarctica and South America, where the ML surrogate model fails to capture the extreme magnitudes of Aitken mode aerosol number concentrations.

Fig. 3.

Examination of Aitken mode aerosol particle number concentration predictions in the middle troposphere layer: (a) E3SM simulation, (b) ADON-PCA emulator. A larger discrepancy is observed in regions near cloud boundaries with limited or sparse data samples. The ML surrogate model produces a less accurate forecast, missing extremes over Antarctica and South America.

With the more accurate surrogate for the accumulation-mode particle number concentration, we examine the relevance of individual input features in capturing the aerosol dynamics. Figure 4 presents the ten highest-ranked input features, obtained with Integrated Gradients33 applied to the ADON-PCA model in the lower atmosphere at pressure levels ranging from 958.0 to 998.5 hPa. Atmospheric temperature and aerosol particle size make the most substantial contributions, followed by the accumulation mode of sulfate and secondary organic aerosol gases in the pre-aerosol-microphysics mixing ratios. These results are consistent with the prevailing theoretical understanding in aerosol dynamics that Aitken mode particles contribute to the growth of accumulation mode particles through coagulation processes34,35. Furthermore, the condensation of low-volatility gaseous compound precursors onto Aitken mode particles facilitates their subsequent transition into the accumulation mode36,37.
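For readers unfamiliar with Integrated Gradients, the sketch below shows the attribution rule on a toy model: each input's attribution is its distance from a baseline multiplied by the gradient averaged along the straight path from baseline to input. The toy function, its analytic gradient, the zero baseline, and the number of integration steps are all assumptions chosen for illustration; this is not the ADON-PCA model or the authors' attribution pipeline.

```python
import math

def model(x):
    """Toy differentiable scalar model standing in for an emulator output."""
    return math.tanh(0.8 * x[0] - 0.3 * x[1]) + 0.5 * x[2] ** 2

def grad(x):
    """Analytic gradient of the toy model with respect to its three inputs."""
    s = 1.0 - math.tanh(0.8 * x[0] - 0.3 * x[1]) ** 2
    return [0.8 * s, -0.3 * s, x[2]]

def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of the Integrated Gradients path integral."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg_grad)]

x = [1.0, 2.0, -1.5]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)

# Rank features by absolute attribution, largest first.
ranking = sorted(range(len(x)), key=lambda i: abs(attributions[i]), reverse=True)
```

A useful sanity check is the completeness property: the attributions should sum to the difference between the model output at the input and at the baseline, up to the quadrature error.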

Fig. 4.

Boxplot of feature importance for the inference of the accumulation mode aerosol particles in the lower atmosphere at pressure levels between 958.0 and 998.5 hPa. Atmospheric temperature and aerosol particle size play significant roles in the ADON model, followed by the accumulation mode of sulfate, secondary organic aerosol gas, and particles of the Aitken mode through coagulation and condensation.

Quantitative evaluation of ML surrogates vs. E3SM simulation

We evaluate the performance of the ADON-PCA model on a test case at the last time slice (12/30) of the year, which lies beyond the training time regime. Figure S.2 shows the global map of the number concentration of accumulation mode aerosol particles from the ML emulation using the trained ADON-PCA model versus the E3SM simulation as the ground truth. We compare the distributions across atmospheric pressure levels, ranging from the upper atmosphere at 61 hectopascals (hPa) to the lower atmosphere at 998.5 hPa. Our ML emulation demonstrates strong agreement with the E3SM simulation for the VMR tendency Inline graphic numInline graphic, although the magnitude exhibits substantial variation from Inline graphic to Inline graphic close to the Earth’s surface. Similarly, Fig. S.3 illustrates the global distribution of accumulation mode black carbon and sulfate aerosol concentrations as predicted by our emulation, with no visually discernible differences from the results obtained from the E3SM simulation.

Figure 5 presents the density heatmap of the coefficient of determination $R^2$ for E3SM and ADON-PCA emulation results for all the aerosol mode variable tendencies tested for the same time slice. We observe that all the ML predicted variables exhibit a high correlation to the E3SM targets, with 19 out of 20 variables achieving an $R^2$ higher than 0.98. This strong correlation persists despite the significant variation in the VMR tendency magnitudes of the aerosol modes, ranging from Inline graphic in Inline graphicmomInline graphic, to Inline graphic in Inline graphic numInline graphic, indicating strong predictive performance across a broad spectrum of variable magnitudes.
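As a reminder of what the coefficient of determination measures, the following minimal sketch computes it from paired targets and predictions using the standard definition (one minus the ratio of residual to total sum of squares). The numbers are toy values, and whether the paper evaluates the score on raw or transformed tendencies is not stated here, so that choice is left open.

```python
def r_squared(pred, true):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, true))
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot

# Toy reference values and emulator predictions (illustrative only).
true = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
score = r_squared(pred, true)
```

Because the score is normalized by the variance of the targets, it stays comparable across variables whose magnitudes differ by many orders, which is why it suits the wide range of tendencies discussed above.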

Fig. 5.

Density heatmap of the coefficient of determination for E3SM and ML emulations of all the aerosol VMR tendencies tested for 00:30:00 UTC, 12/30/2010. All aerosol modes exhibit consistently high $R^2$ scores, indicating strong predictive performance across a broad spectrum of variable magnitudes.

Computational performance across models

The ML training and inference experiments were performed on Perlmutter, a multi-node supercomputer maintained by the National Energy Research Scientific Computing Center (NERSC)38. All tests were run on a CPU or GPU node. One CPU node is a large-memory node running an AMD EPYC 7763 3.0 GHz with 64 physical cores, 2 threads per core, and 2 sockets per node, with 512 GB of DDR4 memory; one GPU node has 1 AMD EPYC 7763 CPU with 256 GB of DDR4 memory and 4 NVIDIA A100 GPUs. Inference time on a GPU node is measured using one A100 GPU.

Table 1 presents the overall $R^2$ score, relative mean absolute error (RMAE), RMSE, and $R^2$ of representative variables of significance, comparing our proposed models, ADON and ADON-PCA, against the deep fully connected network (D-FCN) in the branch net and the physics-informed neural network (PINN) of Ref.21 as benchmarks. ADON-PCA demonstrates superior accuracy, achieving an overall $R^2$ of 0.9926, an RMAE of 0.07, and an RMSE of 0.1991, reducing RMSE by Inline graphic and Inline graphic compared to D-FCN and PINN, respectively. Notably, ADON-PCA shows a significantly higher $R^2$ for Inline graphic numInline graphic than both D-FCN and PINN in the same case. All four models predict Inline graphic well, with ADON showing a slight advantage over the others. Detailed quantification of the relative Inline graphic error for all the mode variables using these four surrogate models is included in the supplementary information. Among the evaluated surrogate models, ADON and ADON-PCA show the best overall performance, with ADON-PCA exhibiting superior accuracy in predicting the VMR tendency of the Aitken-mode number concentrations.

Table 1.

Performance comparison between our proposed models ADON and ADON-PCA, the deep fully connected network (D-FCN), and the PINN21 as benchmarks.

| Model | Overall R2 | RMAE | RMSE | Num_a1 R2 | Num_a2 R2 | bc_a1 R2 | so4_a1 R2 |
|---|---|---|---|---|---|---|---|
| ADON | 0.9898 | 0.0802 | 0.2741 | 0.9969 | 0.9194 | 0.9946 | 0.9997 |
| ADON-PCA | 0.9926 | 0.0700 | 0.1991 | 0.9986 | 0.9576 | 0.9959 | 0.9995 |
| D-FCN | 0.9904 | 0.0872 | 0.3010 | 0.9966 | 0.9021 | 0.9982 | 0.9995 |
| PINN | 0.9334 | 0.3008 | 0.4918 | 0.9641 | 0.7386 | 0.9812 | 0.9971 |
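The relative RMSE reductions quoted in the text did not survive extraction, but they can be re-derived from the RMSE column of Table 1. The short calculation below is plain arithmetic on the tabulated values, assuming the reduction is measured relative to the baseline model's RMSE; the authors' rounding may differ.

```python
# RMSE values copied from Table 1.
rmse = {"ADON": 0.2741, "ADON-PCA": 0.1991, "D-FCN": 0.3010, "PINN": 0.4918}

def relative_reduction(new, old):
    """Fractional decrease of `new` relative to `old`."""
    return (old - new) / old

vs_dfcn = relative_reduction(rmse["ADON-PCA"], rmse["D-FCN"])  # roughly 0.34
vs_pinn = relative_reduction(rmse["ADON-PCA"], rmse["PINN"])   # roughly 0.60
```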

We compare the computational cost of online inference deployed on Perlmutter CPUs and with GPU offloading for the aerosol mode predictions in Table S.2. The inference time is also reported for predicting the entire volume of a single time slice, comparing performance under reduced precision, e.g., single versus double precision for the same test sample. Notably, the GPU accelerates online inference by 14x compared to the CPU, decreasing the cost from slightly over 1 s to 0.07 s. Without loss of inference accuracy, single precision further achieves a Inline graphic speedup on the GPU and Inline graphic on the CPU compared to the default double precision format. For consistency, the accuracy results reported in other sections use double precision.

Discussion

Our data-driven surrogate model accurately emulates the VMR tendency of aerosol particle concentrations on cloud-free grid cells, demonstrating high correlations and low relative errors compared to the E3SM. The emulator ADON shows good extrapolation capabilities to test data spanning an entire year, including samples outside the distribution of the training parameter regimes. Conservation laws can be effectively approximated through comprehensive model training, while aerosol variables characterized by sparse spatial representation are prone to higher error margins. GPU acceleration speeds up the training and inference processes, highlighting the emulator’s efficiency and potential for rapid prediction applications.

Limitations and future directions

The current surrogate model focuses on characterizing the microphysics process under cloud-free conditions in a single year of E3SM simulation. Future work includes extending the training and inference model to a multiyear study to test its long-term forecast performance. Further development of the ADON model will also incorporate aerosol-cloud interactions to emulate more complicated atmospheric physics. Integrating simulation and observational data during training or evaluation through data assimilation may constrain the surrogate model to remain consistent with observed reality. Additionally, inference with the deep learning model is currently implemented in PyTorch. Building on the recent explorations of simpler neural networks for modeling aerosols and cloud microphysics using the Fiats deep learning library39, ongoing research is exploring the implementation of the ADON architecture in Fiats (“Functional inference and training for surrogates”; “Fortran inference and training for science.”). Fiats uses functional programming patterns by providing pure inference and training procedures, a requirement for invocation inside Fortran 2023 do concurrent constructs40. Several compilers can automatically parallelize do concurrent on CPUs or GPUs. Supporting such invocations offers a convenient method for parallel batch inference calculations that ports across CPU and GPU hardware without code modification. Using a deep learning library written in Fortran also simplifies the bindings that facilitate linking the library into Fortran applications such as E3SM for production runs. The library interface, for example, need not provide the data structure translations required of multi-language solutions such as those that wrap a C++ backend41,42.

Materials and methods

E3SM and its aerosol microphysics parameterization suite

The aerosol microphysics parameterization suite being emulated in this study is a part of E3SMv2’s atmosphere component commonly referred to as EAMv232. Previous studies have compared E3SM against observational and reanalysis data in a standard Coupled Model Intercomparison Project (CMIP) model evaluation43, with satellite cloud observations44, and ground-based Atmospheric Radiation Measurement (ARM) observations45. In the latest version E3SMv2, a comprehensive evaluation reports further improved realism for aspects of clouds, precipitation sensitivity that lie closer to observationally constrained ranges32. EAMv2 is a numerical model designed for multi-year to multi-decadal simulations of the physical and chemical processes occurring in the Earth’s atmosphere using a 3D domain covering the entire globe and the altitude range from the Earth surface to about 64 km46. EAMv2 consists of a dynamical core that solves the fluid dynamics equations on a cubed-sphere horizontal mesh and a pressure-based terrain-following vertical coordinate, as well as various parameterizations of subgrid-scale processes such as turbulence, clouds, convection, radiation, and aerosol life cycles.

The abundance of aerosol particles in the atmosphere is simulated in EAMv2 by solving time evolution equations of particle mass and number concentrations. Given the substantial variations in particle size and composition, seven chemical components and four lognormal size distribution functions (“modes”) are considered for each grid cell. The design choices regarding the allocation of chemical components across aerosol modes are illustrated in Ref.47. This configuration of the aerosol parameterization package is referred to as MAM4. Furthermore, aerosol particles in EAMv2 are divided into two sub-populations of different attachment states: the interstitial aerosols are those found outside cloud droplets, while the cloud-borne aerosols are those embedded in cloud droplets. EAMv2 solves separate mass and number concentration equations for the different compositions, modes, and attachment states mentioned above, resulting in a total of 25 unknowns in each grid cell for the interstitial aerosol mass and number concentrations, plus another 25 for the cloud-borne aerosols. Comprehensive descriptions of the aerosol life cycles in EAMv2 are available in previously published studies8,9,47. Aerosol microphysics is a subset of physical processes that affects both interstitial and cloud-borne aerosols but in different ways. The inclusion of detailed prognostic equations for cloud-borne aerosols in MAM4 and in its predecessors, as described by Liu et al.8, represents a level of process representation that is not commonly implemented in other global aerosol models, such as ECHAM-HAM. Given that this is our first effort to emulate aerosol parameterizations in E3SM, and to ensure comparability with earlier studies21, cloud-borne aerosols are excluded by focusing solely on cloud-free grid cells in the E3SM simulations.
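To make the modal representation concrete, the sketch below evaluates a size distribution built from lognormal modes and checks that integrating it over log-diameter recovers the total particle number. The two modes and all of their parameters (number, geometric mean diameter in micrometers, geometric standard deviation) are hypothetical values picked for illustration; they are not MAM4 settings.

```python
import math

def lognormal_mode(d, number, d_g, sigma_g):
    """dN/dlnD of one lognormal mode with total number `number`,
    geometric mean diameter `d_g`, and geometric standard deviation `sigma_g`."""
    ln_sigma = math.log(sigma_g)
    z = (math.log(d) - math.log(d_g)) / ln_sigma
    return number / (math.sqrt(2.0 * math.pi) * ln_sigma) * math.exp(-0.5 * z * z)

# Hypothetical (number, d_g, sigma_g) per mode; not values from E3SM.
modes = {
    "aitken": (1000.0, 0.03, 1.6),
    "accumulation": (500.0, 0.15, 1.8),
}

def total_distribution(d):
    """Superposition of all modes at diameter d."""
    return sum(lognormal_mode(d, *params) for params in modes.values())

def integrate_number(n_bins=4000, d_min=1.0e-4, d_max=100.0):
    """Midpoint-rule integral of dN/dlnD over ln D."""
    lo, hi = math.log(d_min), math.log(d_max)
    h = (hi - lo) / n_bins
    return h * sum(total_distribution(math.exp(lo + (i + 0.5) * h)) for i in range(n_bins))

total_number = integrate_number()
```

With a fixed width per mode, tracking only the number and the per-species mass of each mode is enough to describe the population, which is why the prognostic unknowns are counted per mode and composition rather than per particle.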

While EAMv2’s dynamical core solves spatially 3D PDEs and most of the parameterizations involve spatial derivatives in the vertical direction, the aerosol microphysics parameterizations are formulated as ordinary differential equations (ODEs) that involve derivatives with respect to time. At each EAM timestep, the aerosol microphysics code advances these ODEs by one timestep to calculate the changes in aerosol mass and number concentrations (as well as the mass concentrations of a few affected chemical gases) due to the formation of new particles, condensation or evaporation of some of the chemical components, coagulation of aerosol particles, and mass and number transfer between different lognormal aerosol modes. In EAM, the time integration of aerosol microphysics is done largely in isolation from the dynamical core and most of the other parameterizations. Here, we say “isolation” in the sense that sequential splitting (i.e., Lie-Trotter splitting) is applied between aerosol microphysics and the rest of EAM, so that the aerosol microphysics code takes a set of initial conditions provided by EAM, advances the ODEs by one timestep considering only the microphysics processes, and then provides to EAM the new concentrations at the end of the timestep for the calculation of other processes. Because there are other parts of EAM that significantly affect aerosol concentrations, the initial conditions used by the aerosol microphysics code in the next timestep are significantly different from the ending concentrations of the current timestep predicted by the microphysics ODEs. For this reason, and because the aerosol microphysics code solves ODEs instead of PDEs, when using the E3SM simulation output data for the training and testing of emulators, we consider the data in different grid cells and different time slices as different data samples. Consequently, the model is not constrained by the presence of irregular grids or incomplete grid-box coverage spatially, allowing for broader applicability to simulation datasets.
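Sequential (Lie-Trotter) splitting can be sketched on a toy scalar problem in which two processes act on the same quantity. Below, a decay process and a constant source are each advanced in turn over the same timestep, each starting from the state left by the other. The two processes and their rate constants are hypothetical stand-ins, not EAM parameterizations; only the 1800 s step length is borrowed from the text.

```python
import math

def decay_step(x, dt, k=1.0e-4):
    """Process A: exact solution of dx/dt = -k x over one step."""
    return x * math.exp(-k * dt)

def source_step(x, dt, s=2.0e-4):
    """Process B: exact solution of dx/dt = s over one step."""
    return x + s * dt

def lie_trotter(x0, dt, n_steps):
    """Advance A then B sequentially in every timestep."""
    x = x0
    for _ in range(n_steps):
        x = decay_step(x, dt)   # A sees the state handed over by B
        x = source_step(x, dt)  # B sees the state just updated by A
    return x

def exact(x0, t, k=1.0e-4, s=2.0e-4):
    """Closed-form solution of the coupled problem dx/dt = -k x + s."""
    return s / k + (x0 - s / k) * math.exp(-k * t)

t_end = 3600.0
coarse = lie_trotter(1.0, 1800.0, 2)   # two 1800 s steps
fine = lie_trotter(1.0, 18.0, 200)     # same interval, 100x smaller steps
reference = exact(1.0, t_end)
```

The split solution differs from the coupled one by an error that shrinks as the step is reduced. The sketch also shows why each microphysics call can be treated as an independent sample: its input is whatever state the other processes handed over, not its own previous output.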

We also note that the only exception to the Lie-Trotter splitting, from the aerosol microphysics code’s perspective, is that the chemical production rate of one of the gas species (i.e., H$_2$SO$_4$) is taken into account when the ODEs for condensation-evaporation are formulated and numerically solved. This detail is relevant for the choice of input features described in the next subsection on data collection.

Data collection

The data used for training and testing emulators of aerosol microphysics in E3SMv2 were obtained by performing a global simulation using E3SMv2’s atmosphere and land components32 under present-day conditions, i.e., using prescribed climatological sea surface temperature and sea ice extent as well as other external forcing conditions that are representative of the years from 2005 to 2014. The simulation was 16 months long, including a four-month spin-up followed by another 12 months of integration. Instantaneous values of the input and output fields of the aerosol microphysics calculation were archived every five model days at 00 UTC. Each time slice of data comprises 72 vertical levels and 21,600 grid columns, amounting to approximately 1.2–1.3 million grid points under cloud-free conditions. The data are stored in double-precision format. We extract the input and output variables related to aerosol microphysics from the simulation, aiming to emulate the changes in aerosol and gas mixing ratios over 1800 seconds under cloud-free conditions. We choose to emulate the process under cloud-free conditions as an initial step toward exploring the effect of surrogate modeling in this domain. The atmospheric conditions driving the aerosol microphysics include air temperature Inline graphic, pressure Inline graphic, pressure layer thickness Inline graphic, geopotential height Inline graphic, planetary boundary layer (PBL) height Inline graphic, and specific humidity Inline graphic. The input variables include the mixing ratios of 6 gas species and 25 interstitial aerosol tracers before the time integration of aerosol microphysics, and the sulfuric acid gas (H$_2$SO$_4$) tendency from the gas-phase chemistry process. Based on physical understanding and simulation evaluation, we remove some redundant or unimportant variables from the emulation. As a result, we use 20 essential input parameters and variables to characterize the aerosol microphysics process, with the tendencies of 31 variables as output. Additionally, the aerosol dry diameter, wet diameter, and wet density are also considered.
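The sample counts quoted in this paper can be cross-checked with a few lines of arithmetic. The sketch below only combines numbers stated in the text (72 levels, 21,600 columns, 1.2–1.3 million cloud-free points per slice, eight training slices, 9.8 million training samples); the implied cloud-free fraction is a derived quantity, not a figure reported by the authors.

```python
levels = 72
columns = 21_600
points_per_slice = levels * columns  # all grid points in one time slice

# Cloud-free grid points per slice, as quoted in the text.
cloud_free_low, cloud_free_high = 1.2e6, 1.3e6
fraction_low = cloud_free_low / points_per_slice
fraction_high = cloud_free_high / points_per_slice

# Eight training slices should bracket the quoted 9.8 million samples.
training_slices = 8
samples_low = training_slices * cloud_free_low
samples_high = training_slices * cloud_free_high
reported_samples = 9.8e6
```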

A total of 9.8 million samples collected from the 8 time slices in the dataset are used for training the surrogate models. It is important to note that the variables of interest exhibit a wide dynamic range, with magnitudes spanning from Inline graphic (e.g., Inline graphic, mol/mol-air) to Inline graphic (e.g., Inline graphic, 1/kmol-air), such as in the case of accumulation-mode number concentration. This substantial disparity in scale introduces potential precision issues and numerical instability when incorporated into multi-variable models. Therefore, before performing normalization, we first reduce the skewness of the input distribution by employing a power transformation, which outperforms the Yeo-Johnson transformation48 for this dataset. Specifically, we implement a signed power transformation as Inline graphic (e.g., Inline graphic) so that the power transformation applies only to the magnitude while the original sign and zeros are preserved. We then standardize the processed data by shifting the mean to 0 and scaling the standard deviation to 1. The dataset, which covers a whole year, is split into training, validation, and test sets: eight time slices spaced 45 days apart are used as training data, one time slice is used as validation data for hyperparameter tuning, and all the others are used as test cases to evaluate the capability of the model in the out-of-distribution regime.
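The preprocessing described above (a sign-preserving power transform followed by standardization) can be sketched as follows. The exponent of 0.25 is a hypothetical choice because the value used in the paper was lost in extraction, and the sample values are made up to span many orders of magnitude.

```python
import math
from statistics import mean, pstdev

def signed_power(x, exponent=0.25):
    """Apply the power to |x| only, keeping the sign of x; zero maps to zero."""
    return math.copysign(abs(x) ** exponent, x)

def inverse_signed_power(y, exponent=0.25):
    """Undo signed_power so predictions can be mapped back to physical units."""
    return math.copysign(abs(y) ** (1.0 / exponent), y)

def standardize(values):
    """Shift to zero mean and scale to unit standard deviation."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values], mu, sd

# Toy tendencies with mixed signs and widely different magnitudes (illustrative only).
raw = [-1.0e-12, 0.0, 3.0e-15, 2.0e-9, -5.0e-18]
transformed = [signed_power(v) for v in raw]
scaled, mu, sd = standardize(transformed)
recovered = [inverse_signed_power(v) for v in transformed]
```

Unlike a logarithm, this transform is defined for negative values and for exact zeros, which matters for tendencies that can take either sign.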

Aerosol deep operator network (ADON)

Inspired by the Universal Approximation Theorem extended in DeepONet, we aim to learn from data the nonlinear operator that maps the state before the aerosol microphysics process to the state after it. Specifically, we design two primary sub-networks in our ADON architecture, as illustrated in Fig. 1. The branch network takes in the aerosol variables $u_t$ at time step t in the cloud-free cells, and the trunk network processes the domain coordinates $y$, comprising the latitude, longitude, level, and time of the variable input. The network is trained to learn an operator:

$$ G_\theta(u_t)(y) = \sum_{k=1}^{p} b_k(u_t)\,\tau_k(y) \tag{1} $$

where $\theta$ is the set of network parameters, $b_k$ and $\tau_k$ are the k-th outputs of the branch and trunk networks, and p denotes the number of branches or the output dimension with all branches sharing the same parameters. The trunk-branch network can be seen as a trunk network with each weight in the last layer parameterized by another branch network: the two feature vectors obtained from the branch and trunk nets are multiplied element-wise and summed to make the prediction. In the offline training stage, we use the mean absolute error (MAE) between the true value of $G(u_t)(y)$ and the ML prediction for the input $(u_t, y)$ as the metric in the loss function,

$$ \mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left| G_\theta\big(u_t^{(i)}\big)\big(y^{(i)}\big) - G\big(u_t^{(i)}\big)\big(y^{(i)}\big) \right| \tag{2} $$

where N is the number of training samples. The MAE is more robust to the large variation in the data distribution. Regularization is used to prevent overfitting and improve generalization during training. We use grid search to tune the network hyperparameters, including the layer size, hidden dimension, activation function, learning rate, batch size, dropout rate, and weight decay, to minimize the loss on the validation data.
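The branch-trunk combination and the MAE loss described above can be sketched in NumPy as follows. This is an illustrative forward pass for a single output quantity, not the authors' implementation: the layer widths are borrowed from Table 2, while the single trunk hidden layer, the random initialization, the tanh form of GELU, and all function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def init_mlp(sizes):
    """Random weights and zero biases for a fully connected network."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """GELU on hidden layers, linear output layer."""
    for w, b in params[:-1]:
        x = gelu(x @ w + b)
    w, b = params[-1]
    return x @ w + b

p = 21                                    # shared feature dimension of both nets
branch = init_mlp([39, 256, 384, 128, p]) # encodes aerosol and atmospheric inputs
trunk = init_mlp([4, 64, p])              # encodes latitude, longitude, level, time

def predict(u, y):
    """Element-wise product of branch and trunk features, summed over p."""
    return np.sum(mlp(branch, u) * mlp(trunk, y), axis=-1)

def mae(pred, target):
    """Mean absolute error used as the training loss."""
    return np.mean(np.abs(pred - target))

# Hypothetical batch of 8 samples with random inputs and targets.
u = rng.normal(size=(8, 39))
y = rng.normal(size=(8, 4))
target = rng.normal(size=8)
loss = mae(predict(u, y), target)
```

A training loop would differentiate `loss` with respect to both sets of parameters; an automatic differentiation framework is the natural choice for that step and is left out here.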

Fig. 1.

Schematic illustration of the aerosol deep operator network (ADON) model. The Branch Net encodes the transformed aerosol variables and atmospheric states; the Trunk Net embeds the discretized grid, temporal information, and (optional) PCA basis extracted from the training set. The dual-net architecture is used in both training and inference to learn the solution operator of the aerosol microphysics process.

Modified trunk net based on PCA basis (ADON-PCA)

As described in ADON, we incorporate the spatial and temporal coordinates in the trunk net so that the surrogate model has geographical information when learning the microphysics process. To further explore the underlying tendency patterns, we first compute the principal components of the training output variables, denoted $s$, such that

$$ s \approx \sum_{i=1}^{r} a_i\,\phi_i \tag{3} $$

where $\phi_i$ and $a_i$ denote the i-th mode (basis function) and the corresponding PCA coefficient, and r is the number of retained modes. Next, we embed the PCA basis into the trunk net along with the spatial and temporal coordinates of the sampled cells. This modified trunk-branch structure connects the input variables and their tendency output in the lifted latent space through the physics-informed embeddings. Table 2 presents the architectures of the branch and trunk networks in ADON, where the optional PCA basis is concatenated with the output layer in the trunk net. The two networks employ the same batch size and activation function to ensure consistency in training. We compare ADON, ADON-PCA, and other ML models on the test set with respect to online inference accuracy and compute time using reduced precision.

Table 2.

ADON network architectures: (a) branch network, (b) trunk network with (optional) PCA basis connected in the output layer. Consistent batch sizes and activation functions are maintained across both networks.

Branch net architecture | Value | Trunk net architecture (w/ PCA basis) | Value
Number of inputs | 39 | Input dimension | 4, [20]
Nodes per hidden layer | 256, 384, 128 | Nodes per hidden layer | 64
Number of outputs | 21 | Number of outputs | 21
Batch size | 256 | Batch size | 256
Activation function | GELU | Activation function | GELU
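The PCA step used for ADON-PCA, computing modes and coefficients from the training outputs, can be sketched with a singular value decomposition. This is a generic illustration under stated assumptions: the function name, the synthetic data, and the choice of three modes are hypothetical, and how the resulting basis is wired into the trunk net is not shown.

```python
import numpy as np

def pca_basis(outputs, n_modes):
    """Return the mean, leading PCA modes, coefficients, and explained variance.

    outputs: array of shape (n_samples, n_vars) holding training output variables.
    """
    mean = outputs.mean(axis=0)
    centered = outputs - mean
    # Rows of vt are orthonormal principal directions, ordered by variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    modes = vt[:n_modes]                 # (n_modes, n_vars)
    coeffs = centered @ modes.T          # (n_samples, n_modes)
    explained = (s[:n_modes] ** 2).sum() / (s ** 2).sum()
    return mean, modes, coeffs, explained

# Synthetic outputs (hypothetical): 10 variables driven by 3 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 10))
outputs = latent @ mixing + 0.01 * rng.normal(size=(500, 10))

mean, modes, coeffs, explained = pca_basis(outputs, n_modes=3)
recon = mean + coeffs @ modes            # truncated reconstruction from the modes
```

Because the synthetic data has three dominant factors, three modes recover it almost exactly; for real tendency data the number of retained modes would be chosen from the explained-variance curve.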

Integrated gradients (IG)

We utilize the method of Integrated Gradients33 to quantify the attribution of input features and thereby explain the predictions of the developed deep learning model. Integrated Gradients computes the integral of gradients along a straight-line path from a baseline input to the actual input, with the attribution of feature i computed as:

$$ \mathrm{IG}_i(x) = (x_i - x'_i) \int_{0}^{1} \frac{\partial G\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\, d\alpha \tag{4} $$

and the attributions sum to the difference between the output of the DNN function G at the input x and at the corresponding baseline $x'$, such that

$$ \sum_{i} \mathrm{IG}_i(x) = G(x) - G(x') \tag{5} $$

We approximate the integral numerically by computing the Riemann sum,

$$ \mathrm{IG}_i(x) \approx (x_i - x'_i)\, \frac{1}{m} \sum_{k=1}^{m} \frac{\partial G\big(x' + \tfrac{k}{m}\,(x - x')\big)}{\partial x_i} \tag{6} $$

where m is the number of steps used to discretize the path from baseline to input. After applying Integrated Gradients to the inference model, we obtain the attribution factors that indicate the importance of each input feature to the model’s predicted output variables.
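The Riemann-sum approximation of Integrated Gradients can be sketched on a toy function with a known gradient. The function `G`, its weights, the zero baseline, and the step count below are all hypothetical stand-ins for the trained network; the point is only to show the path discretization and the completeness check (attributions summing to the output difference).

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, m=50):
    """Riemann-sum approximation of Integrated Gradients with m path steps."""
    alphas = np.arange(1, m + 1) / m                     # k/m for k = 1..m
    path = baseline + alphas[:, None] * (x - baseline)   # points along the straight line
    grads = np.stack([grad_fn(point) for point in path]) # gradient at each path point
    return (x - baseline) * grads.mean(axis=0)

# Toy differentiable model (hypothetical): G(x) = tanh(w . x).
w = np.array([0.5, -1.0, 2.0])

def G(x):
    return np.tanh(w @ x)

def grad_G(x):
    return (1.0 - np.tanh(w @ x) ** 2) * w

x = np.array([1.0, 0.5, -0.3])
baseline = np.zeros(3)

attr = integrated_gradients(grad_G, x, baseline, m=200)

# Completeness: the attributions should sum to G(x) - G(baseline)
# up to the discretization error of the Riemann sum.
gap = abs(attr.sum() - (G(x) - G(baseline)))
```

For a neural network, `grad_fn` would be supplied by automatic differentiation, and increasing `m` reduces the gap between the summed attributions and the output difference.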

Acknowledgements

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Scientific Discovery through Advanced Computing (SciDAC) program, via a partnership in Earth System Model Development between the Advanced Scientific Computing Research (ASCR) and the Biological and Environmental Research (BER). We would like to thank several people for valuable discussions about E3SM, aerosol data curation and variable insights including Hui Wan, Ann Almgren, and Kai Zhang. Z.B. also gratefully acknowledges support from the U.S. Department of Energy, Office of Science, SciDAC/Advanced Scientific Computing Research under Award Number DE-AC02-05CH11231. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract Number DE-AC02-05CH11231.

Author contributions

Z.B. conceived the experiments. Z.B. and D.R. conducted the experiments. Z.B. and D.R. analyzed the results. All authors reviewed the manuscript.

Data availability

The data and source code used for the findings of this study will be openly available on GitHub at https://github.com/zhbai/AerosolML upon publication for reproducibility.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-33209-x.

References

  • 1. Bellouin, N. et al. Bounding global aerosol radiative forcing of climate change. Rev. Geophys. 58, e2019RG000660 (2020).
  • 2. Dagan, G., Yeheskel, N. & Williams, A. I. Radiative forcing from aerosol-cloud interactions enhanced by large-scale circulation adjustments. Nat. Geosci. 16, 1092–1098 (2023).
  • 3. Li, J. et al. Scattering and absorbing aerosols in the climate system. Nat. Rev. Earth Environ. 3, 363–379 (2022).
  • 4. Albrecht, B. A. Aerosols, cloud microphysics, and fractional cloudiness. Science 245, 1227–1230 (1989).
  • 5. Vignati, E., Wilson, J. & Stier, P. M7: An efficient size-resolved aerosol microphysics module for large-scale aerosol transport models. J. Geophys. Res. Atmos. 109, 1 (2004).
  • 6. Easter, R. C. et al. Mirage: Model description and evaluation of aerosols and trace gases. J. Geophys. Res. Atmos. 109, 1 (2004).
  • 7. Stier, P. et al. The aerosol-climate model echam5-ham. Atmos. Chem. Phys. 5, 1125–1156 (2005).
  • 8. Liu, X. et al. Toward a minimal representation of aerosols in climate models: description and evaluation in the community atmosphere model cam5. Geosci. Model Dev. 5, 709–739. 10.5194/gmd-5-709-2012 (2012).
  • 9. Liu, X. et al. Description and evaluation of a new four-mode version of the modal aerosol module (mam4) within version 5.3 of the community atmosphere model. Geosci. Model Dev. 9, 505–522 (2016).
  • 10. Chen, A. et al. Improving e3sm land model photosynthesis parameterization via satellite sif, machine learning, and surrogate modeling. J. Adv. Model. Earth Syst. 15, e2022MS003135 (2023).
  • 11. Yarger, D., Wagman, B. M., Chowdhary, K. & Shand, L. Autocalibration of the e3sm version 2 atmosphere model using a pca-based surrogate for spatial fields. J. Adv. Model. Earth Syst. 16, e2023MS003961 (2024).
  • 12. Chinta, S., Gao, X. & Zhu, Q. Machine learning driven sensitivity analysis of e3sm land model parameters for wetland methane emissions. J. Adv. Model. Earth Syst. 16, e2023MS004115 (2024).
  • 13. Noack, B. R., Morzynski, M. & Tadmor, G. Reduced-Order Modelling for Flow Control Vol. 528 (Springer, 2011).
  • 14. Quarteroni, A. et al. Reduced Order Methods for Modeling and Computational Reduction Vol. 9 (Springer, 2014).
  • 15. Bai, Z. & Peng, L. Non-intrusive nonlinear model reduction via machine learning approximations to low-dimensional operators. Adv. Model. Simul. Eng. Sci. 8, 28 (2021).
  • 16. Rozza, G., Stabile, G. & Ballarin, F. Advanced Reduced Order Methods and Applications in Computational Fluid Dynamics (SIAM, 2022).
  • 17. Duncan, J. P. et al. Application of the ai2 climate emulator to e3smv2’s global atmosphere model, with a focus on precipitation fidelity. J. Geophys. Res. Mach. Learn. Comput. 1, e2024JH000136 (2024).
  • 18. Vinuesa, R. & Brunton, S. L. Enhancing computational fluid dynamics with machine learning. Nat. Comput. Sci. 2, 358–366 (2022).
  • 19. Dasbach, S. & Wiesen, S. Towards fast surrogate models for interpolation of tokamak edge plasmas. Nucl. Mater. Energy 34, 101396 (2023).
  • 20. Kumar, S. et al. Machine learning techniques in additive manufacturing: a state of the art review on design, processes and production control. J. Intell. Manuf. 34, 21–55 (2023).
  • 21. Harder, P. et al. Physics-informed learning of aerosol microphysics. Environ. Data Sci. 1, e20 (2022).
  • 22. Lu, L., Jin, P., Pang, G., Zhang, Z. & Karniadakis, G. E. Learning nonlinear operators via deeponet based on the universal approximation theorem of operators. Nat. Mach. Intell. 3, 218–229 (2021).
  • 23. Li, Z. et al. Fourier Neural Operator for Parametric Partial Differential Equations. http://arxiv.org/abs/2010.08895 (2020).
  • 24. Azizzadenesheli, K. et al. Neural operators for accelerating scientific simulations and design. Nat. Rev. Phys. 6, 320–328 (2024).
  • 25. Poletti, K., Offner, S. S. & Ward, R. A. Modeling turbulent and self-gravitating fluids with fourier neural operators. APL Mach. Learn. 3, 1 (2025).
  • 26. Gopakumar, V. et al. Plasma surrogate modelling using fourier neural operators. Nucl. Fusion 64, 056025 (2024).
  • 27. Bora, A. et al. Learning Bias Corrections for Climate Models Using Deep Neural Operators. http://arxiv.org/abs/2302.03173 (2023).
  • 28. Hornik, K., Stinchcombe, M. & White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359–366 (1989).
  • 29. Guibas, J. et al. Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. http://arxiv.org/abs/2111.13587 (2021).
  • 30. Pathak, J. et al. Fourcastnet: A Global Data-Driven High-Resolution Weather Model Using Adaptive Fourier Neural Operators. http://arxiv.org/abs/2202.11214 (2022).
  • 31. Hersbach, H. et al. The era5 global reanalysis. Q. J. R. Meteorol. Soc. 146, 1999–2049 (2020).
  • 32. Golaz, J.-C. et al. The doe e3sm model version 2: Overview of the physical model and initial model evaluation. J. Adv. Model. Earth Syst. 14, e2022MS003156 (2022).
  • 33. Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In International Conference on Machine Learning 3319–3328 (PMLR, 2017).
  • 34. Kulmala, M. et al. Formation and growth rates of ultrafine atmospheric particles: a review of observations. J. Aerosol Sci. 35, 143–176 (2004).
  • 35. Seinfeld, J. H. & Pandis, S. N. Atmospheric Chemistry and Physics: From Air Pollution to Climate Change (Wiley, 2016).
  • 36. McMurry, P. H. A review of atmospheric aerosol measurements. Atmos. Environ. 34, 1959–1999 (2000).
  • 37. Wyant, M. C., Bretherton, C. S., Wood, R., Blossey, P. N. & McCoy, I. L. High free-tropospheric aitken-mode aerosol concentrations buffer cloud droplet concentrations in large-eddy simulations of precipitating stratocumulus. J. Adv. Model. Earth Syst. 14, e2021MS002930 (2022).
  • 38. National Energy Research Scientific Computing (NERSC). Perlmutter Architecture. https://docs.nersc.gov/systems/perlmutter/architecture/.
  • 39. Rouson, D. et al. Cloud microphysics training and aerosol inference with the Fiats deep learning library. In 2025 Improving Scientific Software Conference. https://ucar-sea.github.io/SEA-ISS-2025-Cloud-microphysics-training (University Corporation for Atmospheric Research, 2025).
  • 40. Rouson, D. et al. Automatically parallelizing batch inference on deep neural networks using fiats and fortran 2023 “do concurrent”. In International Conference on High Performance Computing 135–147 (Springer, 2025).
  • 41. Institute of Computing for Climate Science at the University of Cambridge. FTorch. https://github.com/Cambridge-ICCS/FTorch.
  • 42. Institute of Computing for Climate Science at the University of Cambridge. Fortran-TF-Lib. https://github.com/Cambridge-ICCS/fortran-tf-lib.
  • 43. Golaz, J.-C. et al. The doe e3sm coupled model version 1: Overview and evaluation at standard resolution. J. Adv. Model. Earth Syst. 11, 2089–2129 (2019).
  • 44. Zhang, Y. et al. Evaluation of clouds in version 1 of the e3sm atmosphere model with satellite simulators. J. Adv. Model. Earth Syst. 11, 1253–1268 (2019).
  • 45. Christensen, M. W. et al. Evaluation of aerosol-cloud interactions in e3sm using a lagrangian framework. Atmos. Chem. Phys. 23, 2789–2812 (2023).
  • 46. Rasch, P. J. et al. An overview of the atmospheric component of the energy exascale earth system model. J. Adv. Model. Earth Syst. 11, 2377–2411. 10.1029/2019MS001629 (2019).
  • 47. Wang, H. et al. Aerosols in the e3sm version 1: New developments and their impacts on radiative forcing. J. Adv. Model. Earth Syst. 12, e2019MS001851 (2020).
  • 48. Yeo, I.-K. & Johnson, R. A. A new family of power transformations to improve normality or symmetry. Biometrika 87, 954–959 (2000).


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
