A Benchmark Dataset for Machine Learning Surrogates of Pore-Scale CO2-Water Interaction

Alhasan Abdellatif; Hannah P Menke; Julien Maes; Ahmed H Elsheikh; Florian Doster

doi:10.1038/s41597-025-05794-z

2026 Mar 6;13:621. doi: 10.1038/s41597-025-05794-z

PMCID: PMC13096352 PMID: 41792182

Abstract

Accurately capturing the complex interaction between CO₂ and water in porous media at the pore scale is essential for various geoscience applications, including carbon capture and storage (CCS). We introduce a dataset generated from high-fidelity numerical simulations of this interaction. The dataset consists of 624 2D samples, each of size 512 × 512 with a resolution of 35 μm, covering 100 time steps under a constant CO₂ injection rate. It includes various levels of heterogeneity, represented by different grain sizes with random variation in spacing, offering a testbed for developing predictive models. The dataset provides high-resolution temporal and spatial information for benchmarking machine learning models.

Subject terms: Geochemistry, Geophysics

Background & Summary

CO₂ transport through porous media plays a critical role in both natural and engineered processes, including subsurface carbon sequestration¹,², enhanced oil recovery³, and groundwater management⁴. The challenge lies in accurately characterizing the movement and saturation of CO₂, which is influenced by the complex interactions between fluid phases and the geological heterogeneity of the porous structure⁵. As CO₂ is injected into underground formations, its movement through the pore spaces of geological materials, such as sandstone or basaltic reservoirs, dictates how efficiently it can be stored over long periods. This transport process is influenced by various factors, including capillary forces and chemical interactions between CO₂, brine, and the mineral matrix.

Various approaches are used to understand and predict CO₂ transport in porous media. Laboratory techniques, such as core flooding experiments⁶, yield effective bulk properties like permeability and residual saturation. Advanced imaging methods, like X-ray micro-tomography⁷, allow visualization of pore-scale phenomena but have limitations, especially for dynamic processes. Numerical simulations, including lattice Boltzmann⁸, pore-network modeling⁹, and direct numerical simulation¹⁰, offer more precise estimates of the fluid properties, but at a significant computational cost.

Machine learning (ML) models are emerging as valuable tools for predicting CO₂ behavior in porous media, serving as efficient surrogates for computationally expensive simulations. Recent work highlights ML's potential to estimate properties such as pressure build-up and saturation levels quickly and accurately¹¹–¹⁶. The principle of these models is to learn the relationship between inputs (such as physical properties of porous media and engineering parameters) and outputs (such as spatial and temporal fluid changes). Once trained on a set of representative samples, these models can generalize to unseen inputs, such as new permeability fields or different injection scenarios, at a fraction of the simulation cost.

However, it remains difficult to obtain datasets that are large and diverse enough to train robust models that generalize well across scenarios. For example, current datasets are often constrained to relatively small scales, with maximum mesh sizes of 256 × 256¹⁷–²², which limits the ability of models to capture the fine-grained patterns needed for accurate predictions in complex formations. Another key limitation is that most datasets designed for machine learning models focus on predicting the final state (e.g., after the injection duration) rather than intermediate states¹⁸–²⁰. Models trained on such data cannot learn the dynamic evolution of the process over time, which is crucial for understanding transient CO₂ behavior in real-world geological scenarios.

In this paper, we introduce a high-resolution dataset designed for benchmarking machine learning models that predict CO₂ behavior during multiphase flow in porous media. The dataset comprises 624 two-dimensional samples, each of size 512 × 512 pixels with a spatial resolution of 35 μm, capturing the interplay between CO₂ and water over 100 equally spaced temporal snapshots under a constant CO₂ injection rate. A distinctive feature of this dataset is its varying levels of heterogeneity, represented through different grain sizes, which simulate realistic geological variability. The dataset offers the temporal and spatial granularity needed for developing and benchmarking machine learning models.

Methods

Geometry Preprocessing

The pore structures are generated with the open-source notebook DrawMicromodels.ipynb (https://github.com/hannahmenke/DrawMicromodels, commit 5e0f947), which perturbs a regular triangular lattice of mean grain radius R₀ by three heterogeneity amplitudes $\{\mathrm{raddevmax}, \mathrm{xdevmax}, \mathrm{ydevmax}\}$. For the n-th grain,

\begin{aligned} R_{n} &= R_{0} \left(1 + \delta_{n}^{(R)}\right), \\ x_{n} &= x_{n}^{\mathrm{lattice}} + L_{x} \delta_{n}^{(x)}, \\ y_{n} &= y_{n}^{\mathrm{lattice}} + L_{y} \delta_{n}^{(y)}, \end{aligned}

where each perturbation term δ ∈ [−a, a] is sampled from a uniform distribution whose half-width a is the level-dependent deviation listed in Table 1. Five levels are defined, ranging from well-sorted media (Level 1) to highly heterogeneous media (Level 5).

Table 1.

Quantitative definition of the five heterogeneity levels (dimensionless amplitudes).

Level	1	2	3	4	5
raddevmax	0.05	0.10	0.15	0.20	0.25
xdevmax	0.02	0.04	0.06	0.08	0.10
ydevmax	0.02	0.04	0.06	0.08	0.10
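The perturbation rule and the amplitudes in Table 1 can be sketched in a few lines of NumPy. This is an illustrative sketch, not the DrawMicromodels notebook itself: the function name `perturb_grains`, the small square patch of grain centres used in place of the triangular lattice, and the values passed for the length scales `lx` and `ly` (which the text does not define) are all assumptions.

```python
import numpy as np

# Half-widths (raddevmax, xdevmax, ydevmax) for each heterogeneity level, from Table 1.
LEVEL_AMPLITUDES = {
    1: (0.05, 0.02, 0.02),
    2: (0.10, 0.04, 0.04),
    3: (0.15, 0.06, 0.06),
    4: (0.20, 0.08, 0.08),
    5: (0.25, 0.10, 0.10),
}


def perturb_grains(x_lattice, y_lattice, r0, level, lx, ly, rng):
    """Perturb grain radii and centres with uniform noise in [-a, a]."""
    raddev, xdev, ydev = LEVEL_AMPLITUDES[level]
    x_lattice = np.asarray(x_lattice, dtype=float)
    y_lattice = np.asarray(y_lattice, dtype=float)
    n = x_lattice.size
    radii = r0 * (1.0 + rng.uniform(-raddev, raddev, size=n))
    x = x_lattice + lx * rng.uniform(-xdev, xdev, size=n)
    y = y_lattice + ly * rng.uniform(-ydev, ydev, size=n)
    return radii, x, y


# Hypothetical usage: a 3 x 3 patch of centres stands in for the real lattice,
# and lx = ly = 200 are placeholder length scales.
rng = np.random.default_rng(0)
gx, gy = np.meshgrid(np.arange(3) * 200.0, np.arange(3) * 200.0)
radii, x, y = perturb_grains(gx.ravel(), gy.ravel(), r0=80.0, level=3,
                             lx=200.0, ly=200.0, rng=rng)
```

Passing the random generator explicitly keeps the sweep reproducible, which matters when the same base geometry has to be regenerated later.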


Physical motivation

The radius variation mimics sedimentary sorting, while positional jitter reproduces local compaction and packing irregularities observed in outcrop sandstones (C_V ≈ 0.05-0.25). Increasing these amplitudes therefore widens the pore–throat distribution and the capillary contrast, both of which are known to control CO₂-water displacement dynamics.

Parametric sweep and augmentation

For each level we perform a deterministic sweep over R₀ ∈ {70, 80, 90} and target porosities ϕ ∈ {0.20, 0.25, 0.30, 0.35, 0.40, 0.45}, producing 5 × 3 × 6 = 90 base images. Twelve images that displayed percolation shortcuts were discarded after visual inspection, leaving 78 accepted bases. Each 1024 × 1024 image is then cropped into four non-overlapping 512 × 512 quadrants, and each quadrant is also mirrored vertically. This yields the final ensemble of 78 × 4 × 2 = 624 geometries used in this study, examples of which are shown in Fig. 1. Exposing ML models to this range of grain size distributions and spatial configurations is intended to improve generalization to unseen porous media: the inter-sample sweep pushes surrogates toward scale-invariant descriptors, while the intra-sample jitter trains them to handle local anomalies. Both are important for keeping predictions accurate across different geological formations. Each geometry is 512 × 512 pixels with a physical resolution of 35 μm per pixel, and all geometries are available in HDF5 format along with the simulations.
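The counting and the crop-and-mirror augmentation described above can be checked with a short sketch. The helper name `crop_and_mirror` is hypothetical, the random binary image is a stand-in for a real base geometry, and reading "mirrored vertically" as a top-to-bottom flip (`np.flipud`) is an assumption.

```python
import numpy as np


def crop_and_mirror(base):
    """Split a square image into four quadrants and append their mirrored copies."""
    h, w = base.shape
    hh, hw = h // 2, w // 2
    quadrants = [base[i:i + hh, j:j + hw] for i in (0, hh) for j in (0, hw)]
    # Assumption: "mirrored vertically" is taken to mean a top-to-bottom flip.
    return quadrants + [np.flipud(q) for q in quadrants]


# Sample counts quoted in the text.
n_base = 5 * 3 * 6            # levels x mean radii x target porosities
n_accepted = n_base - 12      # twelve bases discarded after visual inspection
n_total = n_accepted * 4 * 2  # four quadrants, each with a mirrored copy

# Stand-in base image (random pores and grains), only to exercise the helper.
base = np.random.default_rng(0).integers(0, 2, size=(1024, 1024))
samples = crop_and_mirror(base)
```

Each accepted base therefore contributes eight 512 × 512 geometries, four originals and four mirrored copies.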

Fig. 1 — Some examples of domain geometries corresponding to different patterns of heterogeneity. The heterogeneity level increases from left to right.

Multi-phase flow at the pore-scale

Understanding CO₂ injection into water-filled porous media at the pore scale is critical for designing effective carbon storage strategies, especially in tight reservoirs where pore structures are highly heterogeneous and capillary forces dominate. At this scale, the interplay between fluid properties, pore geometry, and interfacial dynamics significantly influences the distribution and transport of CO₂. These micro-scale interactions can lead to complex displacement patterns, including snap-off, coalescence, and ganglion migration, that are difficult or impossible to capture with conventional Darcy-scale constitutive functions such as saturation-dependent capillary pressure and relative permeabilities. Robust Darcy-scale models, however, are key to predicting CO₂ migration and storage efficiency.

The two-phase flow simulations in this study were conducted using GeoChemFoam¹⁰, an advanced open-source numerical simulator developed at the Institute of GeoEnergy Engineering at Heriot-Watt University. GeoChemFoam is based on the OpenFOAM framework and is specifically designed to investigate pore-scale processes critical to energy transition and carbon storage.

GeoChemFoam uses the algebraic Volume-of-Fluid method²³ to solve multiphase flow. The velocity u and the pressure p solve the single-field Navier-Stokes Equations (NSE):

\nabla \cdot u = 0,

\rho \left( \frac{\partial u}{\partial t} + u \cdot \nabla u \right) = - \nabla p + \nabla \cdot S + f_{st},

where:

ρ = α₁ρ₁ + α₂ρ₂ is the fluid density,
u is the velocity,
$S = \mu (\nabla u + \nabla u^{T})$ is the viscous stress,
μ = α₁μ₁ + α₂μ₂ is the fluid viscosity,
p is the pressure,
f_st is the surface tension force,
α_i is the phase volume fraction, and
i = 1, 2 refers to the phase index.

The surface tension force is approximated using the Continuous Surface Force (CSF) model²³:

f_{st} = \sigma \kappa \nabla \alpha_{1},

where:

σ is the interfacial tension, and
$\kappa = \nabla \cdot \left( \frac{\nabla \alpha_{1}}{|\nabla \alpha_{1}|} \right)$ is the interface curvature.
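To make the CSF expressions concrete, the sketch below evaluates the curvature and the surface tension force on a uniform 2D grid with central differences. It is not GeoChemFoam's finite-volume implementation: the function name `csf_force`, the regularization `eps`, and the smoothed circular test blob are assumptions made for illustration, and the sign convention follows the formula as written here.

```python
import numpy as np


def csf_force(alpha1, sigma, dx, eps=1e-12):
    """Return (kappa, f_x, f_y) for f_st = sigma * kappa * grad(alpha1)."""
    d_dy, d_dx = np.gradient(alpha1, dx)           # axis 0 is y, axis 1 is x
    norm = np.sqrt(d_dx**2 + d_dy**2) + eps        # eps avoids division by zero
    nx, ny = d_dx / norm, d_dy / norm              # unit vector along grad(alpha1)
    # kappa = div(grad(alpha1) / |grad(alpha1)|)
    kappa = np.gradient(nx, dx, axis=1) + np.gradient(ny, dx, axis=0)
    return kappa, sigma * kappa * d_dx, sigma * kappa * d_dy


# Hypothetical test field: a smoothed disc of radius 30 cells centred at (64, 64).
n, radius, width = 128, 30.0, 3.0
yy, xx = np.mgrid[0:n, 0:n].astype(float)
r = np.hypot(xx - 64.0, yy - 64.0)
alpha1 = 0.5 * (1.0 - np.tanh((r - radius) / width))

kappa, fx, fy = csf_force(alpha1, sigma=0.03, dx=1.0)
kappa_at_interface = kappa[64, 94]  # a point on the interface, 30 cells right of centre
```

For a disc whose indicator decreases outward, the magnitude of the curvature at the interface should be close to 1/radius, which gives a quick sanity check on any discretization of κ.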

The phase indicator function α₁ solves the phase transport equation:

\frac{\partial \alpha_{1}}{\partial t} + \nabla \cdot (\alpha_{1} u) + \nabla \cdot (\alpha_{1} \alpha_{2} u_{r}) = 0.

To reduce interface smearing, an artificial compression term is introduced by replacing u_r with a compressive velocity²³.

Each geometry is a domain of 512 × 512 voxels with a resolution and depth of 35 μm. We perform a two-phase flow simulation in which CO₂ is injected from the left boundary into a fully water-filled model, as shown in Fig. 2, at a flow rate of 1 × 10⁻⁸ m³/s, corresponding to a capillary number of approximately 5 × 10⁻⁶. The CO₂ properties are set to $\mu_{CO_{2}} = 7.37 \times 10^{-8}\ \mathrm{m^{2}/s}$ and $\rho_{CO_{2}} = 3.84 \times 10^{2}\ \mathrm{kg/m^{3}}$. The water properties are ρ_water = 1 × 10³ kg/m³ and μ_water = 1 × 10⁻⁶ m²/s. Both viscosities are quoted in m²/s, i.e. as kinematic viscosities. The interfacial tension between the phases is 0.03 N/m and the contact angle is θ = 45°. Each simulation was run to a total time of 1 s with a write interval of 0.01 s and a convergence tolerance of 1 × 10⁻⁸.
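A few derived quantities follow directly from the simulation parameters above. The arithmetic below is a sketch under two assumptions: that the quoted viscosities are kinematic (as their units of m²/s suggest), so that dynamic viscosity is their product with density, and that the initial state is not counted among the written snapshots.

```python
# Domain size from the pixel count and resolution.
pixel_size = 35e-6                      # m
n_pixels = 512
domain_length = n_pixels * pixel_size   # m, side length of the square domain

# Number of written snapshots from the total time and write interval.
total_time, write_interval = 1.0, 0.01  # s
n_snapshots = round(total_time / write_interval)

# Dynamic viscosities, assuming the quoted values are kinematic (m^2/s).
nu_co2, rho_co2 = 7.37e-8, 3.84e2       # m^2/s, kg/m^3
nu_water, rho_water = 1e-6, 1e3         # m^2/s, kg/m^3
mu_co2 = nu_co2 * rho_co2               # Pa.s
mu_water = nu_water * rho_water         # Pa.s
viscosity_ratio = mu_co2 / mu_water     # CO2 relative to water

print(f"domain side: {domain_length * 1e3:.2f} mm, snapshots: {n_snapshots}")
print(f"mu_CO2 = {mu_co2:.3e} Pa.s, mu_water = {mu_water:.3e} Pa.s")
```

Under these assumptions the snapshot count matches the 100 time steps stored per sample, and the injected CO₂ is far less viscous than the resident water.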

Fig. 2 — Visualization of CO₂ injection in porous media initially saturated with water. The CO₂ is injected from the left boundary, displacing the water phase as it migrates through the pore space.

Fig. 3 shows the CO₂ migration pattern for different heterogeneities as it displaces water at different time steps. Over time, the CO₂ saturation front expands, displaying distinct channelized patterns and regions of accumulation. These patterns reflect the interaction between capillary forces, viscous forces, and the underlying geological features. The time-lapse progression also reveals the impact of grain size and pore structure on flow dynamics, emphasizing the importance of micro-scale processes in controlling large-scale behavior. The pressure, capillary pressure, and vertical velocity fields for different geometries are shown in Figs. 4, 5, and 6, respectively.

Fig. 4 — Pressure field at different injection duration. Each row shows an example of the 5 heterogeneity levels in the dataset.

Fig. 5 — Capillary pressure field at different injection duration. Each row shows an example of the 5 heterogeneity levels in the dataset.

Fig. 6 — Vertical velocity field at different injection duration. Each row shows an example of the 5 heterogeneity levels in the dataset.

Data Records

The dataset is available on Dryad (doi:10.5061/dryad.jm63xsjn5)²⁴ and is organized into 10 folders: each of the 5 heterogeneity levels has a folder for its original geometries and a folder for their vertically flipped versions (2 × 5 = 10). The simulation samples are provided in HDF5 format. Each file includes water saturation (α_water), pressure (p), capillary pressure (pc), horizontal velocity (U_x), vertical velocity (U_y), and a binary image of the physical domain (where pores are denoted by 1 and grains by 0), as detailed in Table 2, which also lists the keys required to access the data. The water saturation α_water lies in the range [0, 1]; hence, the CO₂ saturation field can be computed as $\alpha_{CO_{2}} = (1 - \alpha_{water}) \times \mathrm{img}$, where img denotes the binary physical domain. Additionally, CSV files containing values for porosity, permeability, and relative permeability are provided, with details presented in Table 3.

Table 2.

Overview of the dataset files, including flow velocity components, pressure fields, and physical domain representations with corresponding sizes and descriptions.

File Name	Key	Size	Description
*.hdf5	Ux	100 × 512²	x-component of flow velocity
	Uy	100 × 512²	y-component of flow velocity
	alpha_water	100 × 512²	water saturation field over time
	img	512²	physical domain
	p	100 × 512²	pressure field
	pc	100 × 512²	capillary pressure field


Keys are provided for accessing hdf5 files.
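As a minimal sketch of how the keys in Table 2 might be used, the snippet below reads one file with h5py and derives the CO₂ saturation from the relation given in the text. The function names, the assumption that time-dependent fields are stored as arrays of shape (100, 512, 512), and the tiny synthetic arrays used for the demonstration are all illustrative and not taken from the dataset documentation.

```python
import numpy as np


def co2_saturation(alpha_water, img):
    """alpha_CO2 = (1 - alpha_water) * img, with img broadcast over the time axis."""
    return (1.0 - np.asarray(alpha_water, dtype=float)) * np.asarray(img)


def load_sample(path):
    """Read every field listed in Table 2 from one HDF5 file into a dict."""
    import h5py  # third-party; imported here so co2_saturation works without it

    keys = ("Ux", "Uy", "alpha_water", "img", "p", "pc")
    with h5py.File(path, "r") as f:
        return {key: f[key][...] for key in keys}


# Synthetic stand-in for a sample: two time steps on a 2 x 2 domain with one grain pixel.
img = np.array([[1, 0],
                [1, 1]])
alpha_water = np.array([[[1.0, 0.0], [0.4, 1.0]],
                        [[0.5, 0.0], [0.0, 0.8]]])
alpha_co2 = co2_saturation(alpha_water, img)
```

Multiplying by `img` keeps grain pixels at zero CO₂ saturation, so they are not mistaken for CO₂-filled pores when the water field is zero there. For a real file, `sample = load_sample(path)` followed by `co2_saturation(sample["alpha_water"], sample["img"])` would give the full time series.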

Table 3.

List of files describing porosity and relative permeability values.

File Name	Description
poroPerm.csv	Time, porosity, permeability (m²), the characteristic pore length L, the Reynolds number Re, and the Darcy velocity U_D at the beginning of the simulation before any CO₂ is injected into the model.
relperm.csv	Porosity, permeability (m²), and the capillary number of each phase (Ca₁ for water and Ca₂ for CO₂) at the beginning of the simulation. The saturation of water S_w, the relative permeability of water k_rw, and the relative permeability of CO₂ k_wo are shown for each output timestep.


Technical Validation

The GeoChemFoam solver used for flow simulation has been validated against experimental data²⁵. A convergence tolerance of 1 × 10⁻⁸ was used for all samples.

To assess the dataset’s utility for improving model generalization, three models of a U-Net architecture²⁶ were trained on datasets of varying levels of heterogeneity. Each model was trained to predict future CO₂ saturation by mapping a sequence of four consecutive saturation maps to the subsequent four timesteps. During evaluation, these models were applied in an autoregressive fashion to generate long-term predictions up to 60 timesteps. Model A was trained on the full dataset (5-Levels), model B was trained on a subset containing four of the five levels (4-Levels), and model C was trained on a subset with only the first level (1-Level). All models were then evaluated on samples from the fifth level, unseen by models B and C. For this analysis, all input samples were resized to 256 × 256 pixels, and predictions were made for the first 60 timesteps.
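The autoregressive evaluation described above can be sketched independently of any particular network. In the snippet below, `rollout` feeds each block of four predicted frames back in as the next input. The `persistence_model` function is a hypothetical placeholder for a trained U-Net, and counting the 60 timesteps as predicted frames after the four seed frames is an assumption.

```python
import numpy as np


def rollout(model, seed_frames, n_steps, window=4):
    """Predict n_steps frames by repeatedly mapping `window` frames to the next `window`."""
    history = np.asarray(seed_frames, dtype=float)[-window:]
    chunks = []
    for _ in range(-(-n_steps // window)):  # ceil(n_steps / window) model calls
        history = model(history)            # predictions become the next input
        chunks.append(history)
    return np.concatenate(chunks, axis=0)[:n_steps]


def persistence_model(frames):
    """Hypothetical stand-in for a trained network: repeats the last input frame."""
    return np.repeat(frames[-1:], len(frames), axis=0)


# Toy usage on random 8 x 8 "saturation maps".
seed = np.random.default_rng(0).random((4, 8, 8))
pred = rollout(persistence_model, seed, n_steps=60)
```

Because each call consumes the previous call's output, errors can accumulate along the rollout, which is why the per-timestep error is reported alongside the average.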

The results, summarized in Table 4, indicate a clear benefit to training on a more diverse dataset. The 4-Levels model achieved a lower average Mean Squared Error (MSE) across the test samples (0.0254) than the 1-Level model (0.0320), a reduction of about 21%, even though neither model saw the fifth level during training. The 5-Levels model, having been trained on the test data, served as a benchmark and predictably achieved the lowest average MSE (0.0145). A direct visual comparison of the predicted simulations against the ground truth, shown in Fig. 7, corroborates these quantitative findings. The error maps in Fig. 8 show the same trend, with progressively lower absolute error from the 1-Level to the 5-Levels model. However, the per-sample MSE plots in Fig. 9 reveal that this improvement was not uniform across all samples; in some cases, the 4-Levels model performed similarly to, or slightly worse than, the 1-Level model. This suggests that while training on more varied data helps the model learn more general rules, it can also introduce biases that hinder performance on specific out-of-distribution samples. The primary conclusion is that increased training data diversity leads to better average generalization, though not necessarily to improvement on every individual sample.

Table 4.

Summary statistics for model performance on the unseen fifth level.

Model Name	Mean MSE	Final Step MSE	Std Dev
5-Levels	0.014484	0.009853	0.004364
4-Levels	0.025410	0.023486	0.007635
1-Level	0.032036	0.037971	0.008166
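The three columns of Table 4 can be computed from predicted and reference saturation fields along the following lines. The text does not state exactly how the statistics were aggregated, so this sketch assumes the mean is taken over pixels, then timesteps, then samples, and that the standard deviation is taken across per-sample MSE values. The function name `mse_summary` and the toy arrays are illustrative.

```python
import numpy as np


def mse_summary(pred, target):
    """Summarize MSE for arrays of shape (n_samples, n_steps, H, W)."""
    per_step = ((pred - target) ** 2).mean(axis=(2, 3))  # (n_samples, n_steps)
    per_sample = per_step.mean(axis=1)                   # (n_samples,)
    return {
        "mean_mse": float(per_sample.mean()),
        "final_step_mse": float(per_step[:, -1].mean()),
        "std_dev": float(per_sample.std()),  # assumption: spread across samples
    }


# Toy check: two samples with constant errors of 0.1 and 0.3 respectively.
target = np.zeros((2, 5, 4, 4))
pred = np.stack([np.full((5, 4, 4), 0.1), np.full((5, 4, 4), 0.3)])
summary = mse_summary(pred, target)
```

With constant errors of 0.1 and 0.3, the per-sample MSE values are 0.01 and 0.09, so the summary can be verified by hand.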


Fig. 7 — Qualitative comparison of model predictions against the target simulation for a sample from the test set.

Fig. 8 — Prediction error maps for each model at different timesteps.

Fig. 9 — Mean Squared Error (MSE) over simulation timesteps for various samples of level 5.

Acknowledgements

This work is funded by the Engineering and Physical Sciences Research Council’s ECO-AI Project grant (reference number EP/Y006143/1), with additional financial support from the PETRONAS Centre of Excellence in Subsurface Engineering and Energy Transition (PACESET).

Author contributions

Conceptualization and methodology, H.P.M., J.M., A.A., A.H.E.; visualization and writing, A.A., H.P.M.; formal analysis, A.A., H.P.M., A.H.E.; funding acquisition, A.H.E., F.D., H.P.M.; supervision, A.H.E., F.D., H.P.M. All authors have read and agreed to the published version of the manuscript.

Code availability

The input files used to simulate CO₂ flow are built using GeoChemFoam¹⁰ and are available at https://github.com/ai4netzero/generating_co2_flow. The code is written in Python 3.11.9, and the requirements are listed in the readme file. GeoChemFoam can be downloaded from https://github.com/GeoChemFoam/GeoChemFoam-5.1 and has been validated against experimental data²⁵.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Alhasan Abdellatif, Hannah P. Menke.

Contributor Information

Alhasan Abdellatif, Email: alhasanabdellatif@gmail.com.

Hannah P. Menke, Email: h.menke@hw.ac.uk

References

1. Xiao, Y., Xu, T. & Pruess, K. The effects of gas-fluid-rock interactions on CO₂ injection and storage: insights from reactive transport modeling. Energy Procedia 1(1), 1783–1790 (2009).
2. Guiltinan, E. J., Santos, J. E., Cardenas, M. B., Espinoza, D. N. & Kang, Q. Two-phase fluid flow properties of rough fractures with heterogeneous wettability: Analysis with lattice Boltzmann simulations. Water Resources Research 57(1), e2020WR027943 (2021).
3. Xu, R., Prodanović, M. & Landry, C. Pore-scale study of water adsorption and subsequent methane transport in clay in the presence of wettability heterogeneity. Water Resources Research 56(10), e2020WR027568 (2020).
4. Cassiraga, E. F., Fernández-Garcia, D. & Gómez-Hernández, J. J. Performance assessment of solute transport upscaling methods in the context of nuclear waste disposal. International Journal of Rock Mechanics and Mining Sciences 42(5-6), 756–764 (2005).
5. Dentz, M., Le Borgne, T., Englert, A. & Bijeljic, B. Mixing, spreading and reaction in heterogeneous media: A brief review. Journal of Contaminant Hydrology 120, 1–17 (2011).
6. Mohammed, N. et al. Investigating the flow behaviour of CO₂ and N₂ in porous medium using core flooding experiment. Journal of Petroleum Science and Engineering 208, 109753 (2022).
7. Huang, R., Herring, A. L. & Sheppard, A. Investigation of supercritical CO₂ mass transfer in porous media using X-ray micro-computed tomography. Advances in Water Resources 171, 104338 (2023).
8. Gao, J. et al. Reactive transport in porous media for CO₂ sequestration: Pore scale modeling using the lattice Boltzmann method. Computers & Geosciences 98, 9–20 (2017).
9. Xiong, Q., Baychev, T. G. & Jivkov, A. P. Review of pore network modelling of porous media: Experimental characterisations, network constructions and applications to reactive transport. Journal of Contaminant Hydrology 192, 101–117 (2016).
10. Maes, J. & Menke, H. P. GeoChemFoam: Direct modelling of flow and heat transfer in micro-CT images of porous media. Heat and Mass Transfer 58(11), 1937–1947 (2022).
11. Zhu, Y. & Zabaras, N. Bayesian deep convolutional encoder–decoder networks for surrogate modeling and uncertainty quantification. Journal of Computational Physics 366, 415–447 (2018).
12. Zhong, Z., Sun, A. Y. & Jeong, H. Predicting CO₂ plume migration in heterogeneous formations using conditional deep convolutional generative adversarial network. Water Resources Research 55(7), 5830–5851 (2019).
13. Wang, K. et al. A physics-informed and hierarchically regularized data-driven model for predicting fluid flow through porous media. Journal of Computational Physics 443, 110526 (2021).
14. Wen, G., Hay, C. & Benson, S. M. CCSNet: a deep learning modeling suite for CO₂ storage. Advances in Water Resources 155, 104009 (2021).
15.Wen, G., Li, Z., Azizzadenesheli, K., Anandkumar, A., & Benson, S. M. U-FNO—An enhanced Fourier neural operator-based deep-learning model for multiphase flow. Advances in Water Resources, 163, 104180. Elsevier (2022).
16. Wen, G., Li, Z., Azizzadenesheli, K., Anandkumar, A. & Benson, S. M. Real-time high-resolution CO₂ geological storage prediction using nested Fourier neural operators. Energy & Environmental Science 16(4), 1732–1741 (2023).
17. Wang, Y. D., Chung, T., Armstrong, R. T. & Mostaghimi, P. ML-LBM: predicting and accelerating steady state flow simulation in porous media with convolutional neural networks. Transport in Porous Media 138(1), 49–75 (2021).
18. Feng, W. & Huang, H. Fast prediction of immiscible two-phase displacements in heterogeneous porous media with convolutional neural network. Advances in Applied Mathematics and Mechanics 13(1), 140–162 (2021).
19. Wang, Z. et al. Pore-scale modeling of multiphase flow in porous media using a conditional generative adversarial network (cGAN). Physics of Fluids 34(12) (2022).
20. Ko, D. D., Ji, H. & Ju, Y. S. Prediction of pore-scale flow in heterogeneous porous media from periodic structures using deep learning. AIP Advances 13(4) (2023).
21. Meng, Y., Jiang, J., Wu, J. & Wang, D. Transformer-based deep learning models for predicting permeability of porous media. Advances in Water Resources 179, 104520 (2023).
22. Poels, Y., Minartz, K., Bansal, H. & Menkovski, V. Accelerating Simulation of Two-Phase Flows with Neural PDE Surrogates. arXiv preprint arXiv:2405.17260 (2024).
23. Rusche, H. Computational fluid dynamics of dispersed two-phase flow at high phase fractions. Ph.D. thesis, University of London (2002).
24. Abdellatif, A., Menke, H. P., Maes, J., Elsheikh, A. H. & Doster, F. Benchmark dataset for pore-scale CO₂-water interaction [Dataset]. Dryad. 10.5061/dryad.jm63xsjn5 (2025).
25. Zhao, B. et al. Comprehensive comparison of pore-scale models for multiphase flow in porous media. Proceedings of the National Academy of Sciences 116(28), 13799–13806 (2019).
26. Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, 234–241 (Springer, Cham, 2015).

