Scientific Reports. 2025 Aug 17;15:30154. doi: 10.1038/s41598-025-15710-5

A study of the radon seasonality with temporal dummy variables

Alessandro Pignatelli 1, Giulia Romoli 1,#, Veronica Vignoli 1,#
PMCID: PMC12358539  PMID: 40820040

Abstract

Radon, a naturally occurring radioactive gas, has garnered significant attention due to its health risks and its potential role as a seismic indicator. Variations in radon levels have been observed in correlation with seismic activity, suggesting that radon could serve as an early warning signal for earthquakes. However, accurately forecasting radon concentrations remains challenging due to the influence of various factors, including meteorological conditions and seasonal fluctuations. Considerable effort has been dedicated to investigating the use of regression models to predict radon levels by incorporating meteorological parameters such as temperature, humidity, and atmospheric pressure. The aim of this study is to improve the modeling of baseline radon concentrations by removing periodic sources of variability (primarily environmental and seasonal effects), rather than directly focusing on radon anomaly detection. Accurate background modeling is a prerequisite for reliable anomaly detection, as it enables a clearer distinction between normal fluctuations and potential anomalies arising from endogenous factors, including seismic or structural phenomena. In particular, we show that the impact of meteorological parameters on radon prediction can be effectively replaced by a seasonal regression model based on temporal dummy variables, without significant loss in predictive accuracy. This method offers a promising alternative for radon modeling, enabling early estimation of average radon levels and facilitating the timely identification of anomalous behavior. Our findings suggest that seasonal regression models based on dummy variables provide a robust and accurate framework for forecasting radon, with potential implications for improved seismic monitoring. Importantly, any future application to anomaly detection must also account for structural changes at the measurement site, which may independently affect radon emissions.

Subject terms: Scientific data, Geophysics, Statistics

Introduction

This study focuses on 222Rn, a naturally occurring radioactive isotope of radon. Its presence in the environment results from the decay of uranium and thorium in the Earth’s crust, and the subsequent migration of radon from the lithosphere towards the Earth’s surface and atmosphere. Over the past decades, radon emanation, exhalation, and diffusion mechanisms have been extensively studied, due to its significance as both a health hazard and a geophysical tracer1,2. Prolonged exposure to high concentrations of radon has been linked to lung cancer3, highlighting the importance of measuring and monitoring radon levels in indoor environments. In addition, radon is acknowledged to be an efficient marker for dynamic geological processes4, providing valuable insights into seismic activity, groundwater monitoring, volcanic eruptions, and tectonic movements.

Radon has a half-life of 3.824 days and is detectable primarily through advection, with key carriers including carbon dioxide (CO2), methane (CH4), and nitrogen (N2). The study of radon is complicated by the stochastic nature of its emissions and the wide range of factors that can affect its exhalation and diffusion (Fig. 1). It is widely acknowledged that meteorological parameters, including temperature, rainfall, barometric pressure, relative humidity, and wind variations, play a significant role in shaping the dynamics of radon release. Indeed, radon measurements are known to exhibit seasonal variability, driven by climatic factors that collectively cause dynamic fluctuations in radon levels: temperature affects radon diffusion, with warmer conditions enhancing gas transport; heavy rainfall reduces soil permeability, limiting radon’s ability to escape from the ground into the air; low atmospheric pressure increases the gradient between soil and atmosphere, driving radon release, while moist air tends to trap radon near the ground, limiting its dispersion; strong winds dilute and disperse radon, reducing concentrations near the ground by mixing it with the surrounding air. All of these factors result in radon seasonal patterns, and numerous efforts have been made to isolate the impact of weather-related parameters, with the aim of improving the accuracy of radon monitoring and its potential applications in geophysical studies and risk assessments. Table 1 summarizes some contributions proposed in the literature from the 2000s to the present. Indoor, outdoor, and soil radon concentrations have been monitored over various time intervals and regions around the world.
Models capable of replicating the observed seasonality have been proposed by selecting subsets of meteorological parameters and applying different analysis techniques of increasing complexity, from correlation analysis5–11 to Multiple Linear Regression (MLR) models12–16 and Multilayer Perceptron (MLP) neural networks17–21. The wide range of solutions proposed, both in number and type of meteorological parameters identified as the most influential on radon concentrations, highlights the challenge of accurately quantifying the impact of weather on radon. While most proposed methods rely on linear, constant-parameter models, the general consensus is that radon concentrations show significant temporal and spatial variability, primarily influenced by variations in geological structures and climatic conditions. This variability points to potential cross-correlation among the selected predictors, and the influence of additional, unaccounted-for factors, like tidal forces13 and tectonic activity. The proposed additional mechanisms involve both the direct influence of tidal forces on the water table and the mixture of gases within the rocks (e.g. Inline graphic, water vapor, radon), as well as the indirect deformations caused by Earth tides, which promote rock relaxation and create pathways for gas transport. These processes are highly dependent on site-specific subsurface geological and hydro-geological conditions, adding further complexity to the assessment of the impact of weather on radon.

Fig. 1.

Key factors influencing radon emanation, exhalation, and diffusion processes22. After radon is produced from the decay of radium within solid grains, it migrates towards lower concentration regions near the Earth’s surface, eventually entering the atmosphere. Radon flux is governed by geophysical, geological, and atmospheric factors. The intricate relationships among these variables entangle the understanding and prediction of radon concentration levels over time.

Table 1.

Summary of some contributions from the literature (references are specified in the first column) in which the seasonality of radon was modelled by selecting a set of meteorological parameters (T, temperature (soil and/or air); P, barometric pressure; H, relative humidity; R, rainfall; W, wind speed and/or direction; S, solar radiation) and a specific technique (see “Analysis” column: MLR, multiple linear regression; PCR, principal component regression; BDT, boosted decision trees; MLP, multilayer perceptron). The table also shows the region in which the radon measurement was made, the time interval during which it was monitored, and the results of the analysis (see “Location”, “Time”, and “Outcome” columns, respectively).

Reference (year, ref.) | Radon | Location | Time | Number of weather parameters marked (among T, P, H, R, W, S) | Analysis | Outcome
2001 (ref. 5) | Indoor | UK | <1y | 4 | Correlation analysis | Radon primarily dependent on barometric pressure, vapor pressure*, and wind variations
2002 (ref. 12) | Indoor | UK | 2y | 4 | MLR | Most of the radon variation explained by temperature, less by wind speed, rainfall, and pressure
2003 (ref. 6) | Indoor, outdoor | Brazil | 1y | 2 | Correlation analysis | Radon inverse correlation with temperature and rainfall, after correction for a lag time
2003 (ref. 23) | Outdoor (borehole) | Slovenia | 1.5y | 3 | Regression trees | Regression trees outperform other regression models, radon anomalies before or during earthquakes
2005 (ref. 24) | Indoor, outdoor | USA | 1y | 5 | Comparative analysis | Radon heavily influenced by indoor and outdoor temperature differentials and rainfall
2006 (ref. 13) | Indoor | UK | <1y | 3 | Correlation analysis, MLR | Weak radon correlations with rainfall and mean daily temperature
2006 (ref. 7) | Outdoor | Italy | 3y | 3 | Correlation analysis | Radon negative correlation with temperature, positive correlation with pressure and relative humidity
2008 (ref. 25) | Soil, indoor | Norway | <1y | 4 | Comparative analysis | Radon variations predominantly caused by changes in air temperature, less by air pressure
2010 (ref. 17) | Outdoor (borehole) | Slovenia | 2y | 3 | MLP | Radon highest sensitivity to soil and air temperature, lowest to rainfall. Prediction of seismic events
2010 (ref. 8) | Indoor | Italy | 1y | 5 | Correlation analysis | Radon strong correlation with rainfall and humidity, difference between building floors
2014 (ref. 14) | Soil | India | 2y | 4 | MLR | Radon primarily dependent on relative humidity, correlation with seismic activity
2014 (ref. 18) | Indoor | Serbia | <1y | 6 | BDT, MLP | BDT method is better than MLP, soil temperature is the most important parameter
2015 (ref. 10) | Indoor | Italy | 4y | 3 | Correlation analysis | Radon significant positive correlation with temperature, less with pressure; negative with rainfall
2015 (ref. 9) | Indoor | USA | 1y | 4 | Correlation analysis | Radon negative correlation with indoor humidity, outdoor temperature and wind speed
2016 (ref. 15) | Outdoor | Italy | 4y | 4 | MLR, PCR | Radon primarily dependent on temperature and humidity, no clear correlation with volcanic activity
2019 (ref. 16) | Outdoor | Germany | 2y | 5 | MLR | Soil wetness strongly affects radon, different influencing mechanisms of soil and air temperature on radon
2020 (ref. 19) | Soil | Italy | 7y | 3 | MLP | Radon forecasting and anomaly detection; volcanic unrest affects radon emissions**
2021 (ref. 20) | Soil | Pakistan | <1y | 3 | MLP, MLR, Decision Trees | MLP outperforms other techniques, radon anomalies around earthquakes’ occurrences
2023 (ref. 21) | Soil | India | 3y | 3 | Wavelets, MLR, MLP | MLP outperforms other techniques, radon anomalies correlated with seismic events

*Computed from relative humidity and temperature.

**Also fumarolic tremor, background seismicity and Inline graphic gas concentration used to train the neural network.

In this paper, we analyze long-term and continuous radon emissions recorded by the IRON (Italian Radon mOnitoring Network), which consists of multiple radon monitoring stations distributed across the Italian peninsula (see “Radon monitoring stations” section for details on the IRON installation sites and instrumentation used for this study). The radon time series are examined alongside hourly meteorological data (refer to “Meteorological stations” section for information about the meteorological station locations and data). Leveraging these extensive datasets, we demonstrate that long-term radon levels can be accurately estimated using linear regression models that incorporate temporal dummy functions designed to capture seasonal patterns, achieving performance comparable to models that rely on meteorological parameters. It should be emphasized that our approach is strictly statistical in nature and does not incorporate any modeling of the underlying physical mechanisms governing radon variability.

Results

The IRON has been monitoring radon emissions across Italy since 2009. This study analyzed 18 radon time series, selecting those with at least two and a half years of data and corresponding meteorological records. As detailed in “Regression analysis” section, we applied multiple regression analyses to the radon time series, a summary of which is presented in Table 2. First, we used temperature, pressure, rainfall, relative humidity, wind strength, and solar radiation meteorological variables as predictors to model variations in the radon time series based on meteorological conditions. As alternative approaches, we used dummy variables representing monthly, fortnightly, weekly, and half-weekly intervals (with a separate regression for each interval) to capture temporal patterns in radon fluctuations. More details about the dummy-based regression models are described in “Regression analysis” section. The predictive accuracy of each model was assessed using the Mean Absolute Percentage Error (MAPE) evaluated over extended test periods, similar to the approach used by Ambrosino19. The results of regression analyses applied to each of the 18 radon time series are shown in Tables 3 and 4. As an example, the regressions applied to the ISAR time series are shown in Fig. 2. Finally, Figs. 3 and 4 show the probability density functions and the cumulative distributions of MAPE values for the different cases.

Table 2.

Summary of the regression models used for analysis of radon time series.

Regression model Predictors Dependent variable
Meteorological model 6 meteorological variables Radon
Half-weekly model 104 half-weekly dummy variables Radon
Weekly model 52 weekly dummy variables Radon
Fortnightly model 26 fortnightly dummy variables Radon
Monthly model 12 monthly dummy variables Radon
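
As a rough illustration of the dummy-based models in Table 2 (the exact specification is given in the “Regression analysis” section), the following Python sketch fits a monthly model and evaluates its MAPE on a held-out period. It assumes one dummy per calendar month and no intercept, in which case each least-squares coefficient equals the mean radon level observed in that month. The function names and the synthetic series are hypothetical and serve only to make the example runnable.

```python
from collections import defaultdict
from datetime import date, timedelta
import math

def fit_monthly_dummies(dates, radon):
    """Least-squares fit of radon on 12 monthly dummies (assumed: no intercept).

    With mutually exclusive, exhaustive dummies and no intercept, the OLS
    coefficient of each dummy is the mean of the observations in that month.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for d, y in zip(dates, radon):
        sums[d.month] += y
        counts[d.month] += 1
    return {month: sums[month] / counts[month] for month in sums}

def predict(coeffs, dates):
    """Predicted radon level: the coefficient of the month each date falls in."""
    return [coeffs[d.month] for d in dates]

def mape(observed, predicted):
    """Mean Absolute Percentage Error, in percent."""
    errors = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(errors) / len(errors)

# Synthetic daily series in arbitrary units (hypothetical, annual cycle only)
start = date(2018, 1, 1)
dates = [start + timedelta(days=i) for i in range(3 * 365)]
radon = [100 + 30 * math.cos(2 * math.pi * d.timetuple().tm_yday / 365.25)
         for d in dates]

# Train on the first two years, test on the third
split = 2 * 365
coeffs = fit_monthly_dummies(dates[:split], radon[:split])
train_mape = mape(radon[:split], predict(coeffs, dates[:split]))
test_mape = mape(radon[split:], predict(coeffs, dates[split:]))
```

Replacing `d.month` with a fortnight, week, or half-week index would give the finer-resolution variants, at the cost of more coefficients to estimate.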

Table 3.

Summary of the train Mean Absolute Percentage Error (MAPE) values resulting from the five regression models applied to the radon time series. The mean and the median values for each column (model) are also reported.

Station Meteo model Half-weekly model Weekly model Fortnightly model Monthly model
AQU 27.7820 17.9739 18.4055 19.1432 21.6786
BADI 7.8455 6.1138 6.1216 6.1795 6.4609
BAT1 31.4061 22.3017 22.4146 22.6624 23.5177
BAT2 25.3646 18.2463 18.5067 19.1186 21.0716
CDCA 31.0440 21.1901 21.2111 21.5050 21.6961
CMDO 14.9466 14.3038 14.3766 14.6659 14.7001
CMPL 13.1382 11.3898 11.5913 12.0082 12.8327
CTTR 11.2221 8.7434 8.7961 8.8331 8.9027
DUR 10.9860 7.5774 7.5914 7.5724 7.6837
FRME_E146 36.7184 22.6046 22.8946 25.3004 24.7512
FRME_H146 13.1882 9.7227 9.7959 10.0501 9.7003
GUAR2 25.1024 19.9661 20.0449 20.0886 21.1411
ISAR 11.8780 9.0291 9.2322 9.7397 11.0554
MMN 48.3965 39.1922 40.1925 40.5053 41.6651
MMNG 30.9141 19.5807 19.9170 20.2785 20.2163
MTR1 30.8632 21.7546 22.1629 22.6970 23.7447
NRCA 22.3031 14.5267 14.5600 14.6166 14.6356
SSFR 72.3525 51.0414 51.9348 53.7294 59.8330
Median 25.2335 18.1101 18.4561 19.1309 20.8941
Mean 26.0383 18.9150 19.1254 19.5422 20.6712

Table 4.

Summary of the test Mean Absolute Percentage Error (MAPE) values resulting from the five regression models applied to the radon time series. The mean and the median values for each column (model) are also reported.

Station Meteo model Half-weekly model Weekly model Fortnightly model Monthly model
AQU 36.0843 24.2457 24.3190 24.2303 26.6852
BADI 17.3287 16.4086 16.3971 16.4291 16.6514
BAT1 24.7033 38.7609 38.5701 38.2332 37.1515
BAT2 90.0601 96.0027 95.7498 95.6324 92.4582
CDCA 26.6328 19.2363 19.1746 17.9338 18.0511
CMDO 13.4836 13.1286 13.1059 13.0816 12.7315
CMPL 22.7445 17.4978 17.3654 16.3228 16.1692
CTTR 17.1341 15.5653 15.5348 15.5279 15.5077
DUR 6.2596 10.1388 10.1361 9.7264 8.5610
FRME_E146 30.1782 13.0719 12.3976 11.5404 11.6641
FRME_H146 16.0860 13.1607 13.0251 12.9233 12.6491
GUAR2 20.2106 17.3486 17.1175 17.0071 17.9196
ISAR 12.5604 11.2503 10.8161 11.4446 9.7241
MMN 47.8595 36.0497 35.9958 36.5308 37.4817
MMNG 32.6649 18.0612 17.3993 17.8657 18.0950
MTR1 24.0605 22.7551 22.7559 22.3802 20.6080
NRCA 19.1912 19.6213 19.4909 19.3692 19.7476
SSFR 133.5172 125.4396 125.8500 126.6282 127.7041
Median 23.4025 17.7795 17.6481 17.4364 17.9267
Mean 32.8200 29.3191 29.2074 29.0851 28.8579

Fig. 2.

Comparison of regression models for the ISAR station. The black line represents the observed data, the red line shows the model predictions during the training period, and the blue line shows the predictions during the test period. Subfigure a presents the regression model that uses meteorological parameters as predictors. Subfigure b shows the model incorporating half-weekly dummy variables, while subfigure c uses weekly dummies. Subfigure d displays the model based on fortnightly dummy variables, and subfigure e illustrates the regression using monthly dummy variables.

Fig. 3.

Kernel-smoothed Probability Density Function (PDF) estimate of test MAPE values. To control the smoothness of the resulting density curve, the bandwidth was calculated by the normal approximation method (Silverman’s rule of thumb). There is a shift between the MAPE peak of the meteorological model and those of the dummy-based models. On the other hand, the overall distributions of the two approaches exhibit a very similar pattern.

Fig. 4.

Empirical cumulative distribution function (CDF) of test MAPE values. As observed, the elbow point in the meteorological approach shows a displacement compared to the ones in the dummy functions cases. However, the overall distributions (particularly in the tails) appear very similar across all approaches.

To compare the accuracies of the meteorological regression model and the four dummy-based models in a statistically meaningful way, we applied two different non-parametric statistical tests: the Smirnov test and the Wilcoxon signed-rank test (see “Statistical tests” section). The resulting p-values are summarized in Table 5.

Table 5.

Resulting p-values from statistical tests comparing the meteorological regression model with dummy-based models.

Half-weekly model Weekly model Fortnightly model Monthly model
Smirnov 0.4255 0.4255 0.4255 0.4255
Wilcoxon right-tailed 0.0203 0.0183 0.0164 0.0083
Wilcoxon two-tailed 0.0386 0.0347 0.0311 0.0156
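
To make the right-tailed Wilcoxon entry concrete, the sketch below recomputes it for the monthly model from the test MAPE columns of Table 4. It rests on several assumptions that are not stated in the text: differences are taken as meteorological minus dummy, the p-value uses a normal approximation with continuity correction, and there are no ties or zero differences. The function name is hypothetical.

```python
import math

# Test MAPE values from Table 4, same station order in both lists
meteo = [36.0843, 17.3287, 24.7033, 90.0601, 26.6328, 13.4836, 22.7445,
         17.1341, 6.2596, 30.1782, 16.0860, 20.2106, 12.5604, 47.8595,
         32.6649, 24.0605, 19.1912, 133.5172]
monthly = [26.6852, 16.6514, 37.1515, 92.4582, 18.0511, 12.7315, 16.1692,
           15.5077, 8.5610, 11.6641, 12.6491, 17.9196, 9.7241, 37.4817,
           18.0950, 20.6080, 19.7476, 127.7041]

def signed_rank_right_tailed(x, y):
    """Right-tailed Wilcoxon signed-rank test of paired samples x and y.

    Returns (W+, p): W+ is the sum of the ranks of the positive differences
    x - y, and p is the upper-tail p-value from a normal approximation with
    continuity correction (assumed here; no ties or zero differences).
    """
    diffs = [a - b for a, b in zip(x, y)]
    # Rank the absolute differences from smallest (rank 1) to largest
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)
    n = len(diffs)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - 0.5 - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal
    return w_plus, p

w_plus, p_right = signed_rank_right_tailed(meteo, monthly)
```

For these two columns the positive rank sum is 141 out of a maximum of 171, and the one-sided p-value is about 0.008, in line with the right-tailed entry for the monthly model in Table 5.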

Discussion

In this study, we compared the performance of a meteorological regression model with four alternative models incorporating dummy variables. It is important to emphasize that the primary objective of this analysis was to obtain statistical evidence supporting the hypothesis that the use of dummy variables does not compromise predictive accuracy relative to models based on meteorological predictors. A preliminary qualitative analysis of the MAPE values reported in Tables 3 and 4 further supports this hypothesis. Indeed, models employing meteorological variables tend to exhibit higher prediction errors than those using dummy variables.

Although a model comparison was not the primary focus of our work, we evaluated the differences between various dummy variable configurations. As shown in the results, these differences are relatively small for both mean and median MAPEs (below Inline graphic). Therefore, no single configuration can be universally recommended, and the choice largely depends on the characteristics of the data and the specific objectives of the analysis. In general, the selection of the temporal resolution for dummy variables should reflect a balance between the degree of variability observed in the time series (based on prior exploratory analysis) and the risk of introducing multicollinearity. When intra-month variability is limited, a monthly approach may be sufficient and preferable due to its simplicity and robustness. Conversely, in the presence of rapid and high-amplitude fluctuations, a finer temporal resolution (such as half-week intervals) may provide a more accurate representation of the seasonal pattern.

A series of statistical tests was conducted to determine whether the dummy-based models could be considered statistically equivalent to the meteorological model in predicting radon concentrations. All p-values obtained from the Smirnov test exceed 0.42, indicating the absence of statistically significant differences between the cumulative MAPE distributions of the meteorological model and those of the various dummy-based models. In other words, the overall error distributions of the meteorological model and each dummy-based model are similar, with no strong evidence of disparity.

Notably, we obtain identical p-values for the different Smirnov tests. This can be explained as follows. First, the Smirnov statistic is defined as the maximum vertical distance between the Empirical Cumulative Distribution Functions (ECDFs) of two samples (see “Statistical tests” section). Because ECDFs are step functions, small perturbations in the input data may not alter the maximum distance between them. As a result, the test statistic remains unchanged across slightly different datasets. In addition, the p-value associated with the Smirnov statistic is typically computed using an asymptotic approximation derived under the null hypothesis that the two samples originate from the same continuous distribution. As discussed in ref. 26, this approximation becomes more accurate as sample size increases. For small samples, it may produce identical p-values for a range of statistic values due to limited resolution and rounding in the asymptotic formula.
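
The step-function argument can be checked directly on the test MAPE values of Table 4. The sketch below (with a hypothetical helper name) computes the Smirnov statistic by hand for the meteorological model against the weekly and the monthly models. With 18 values per sample, the statistic can only take values that are multiples of 1/18, so different comparisons can easily land on the same value.

```python
def smirnov_statistic(x, y):
    """Maximum vertical distance between the ECDFs of samples x and y."""
    n, m = len(x), len(y)
    d = 0.0
    for t in sorted(set(x) | set(y)):  # ECDFs only change at data points
        fx = sum(v <= t for v in x) / n
        fy = sum(v <= t for v in y) / m
        d = max(d, abs(fx - fy))
    return d

# Test MAPE values from Table 4, one entry per station
meteo = [36.0843, 17.3287, 24.7033, 90.0601, 26.6328, 13.4836, 22.7445,
         17.1341, 6.2596, 30.1782, 16.0860, 20.2106, 12.5604, 47.8595,
         32.6649, 24.0605, 19.1912, 133.5172]
weekly = [24.3190, 16.3971, 38.5701, 95.7498, 19.1746, 13.1059, 17.3654,
          15.5348, 10.1361, 12.3976, 13.0251, 17.1175, 10.8161, 35.9958,
          17.3993, 22.7559, 19.4909, 125.8500]
monthly = [26.6852, 16.6514, 37.1515, 92.4582, 18.0511, 12.7315, 16.1692,
           15.5077, 8.5610, 11.6641, 12.6491, 17.9196, 9.7241, 37.4817,
           18.0950, 20.6080, 19.7476, 127.7041]

d_weekly = smirnov_statistic(meteo, weekly)
d_monthly = smirnov_statistic(meteo, monthly)

# Express each statistic as a number of ECDF steps of height 1/18
steps_weekly = round(d_weekly * len(meteo))
steps_monthly = round(d_monthly * len(meteo))
```

Both comparisons give a statistic of 5/18 (about 0.278), even though the weekly and monthly MAPE values differ station by station. Any p-value computed from the statistic and the sample sizes alone is therefore the same for both.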

Further insights were gained from the two-tailed and right-tailed Wilcoxon signed-rank tests, with the right-tailed test taking as its null hypothesis that the dummy-based models’ errors are at least as large as those of the meteorological model. The p-values from both sets of tests are all below 0.05, indicating that the dummy-based models may even outperform the meteorological model. However, this consideration extends well beyond the scope of our analysis. In our view, the mere equivalence of the two models is already a highly significant scientific finding.

The apparent discrepancy between the “no difference” outcome of the Smirnov test and the difference detected by the Wilcoxon test—a situation that can occasionally arise in statistical analysis—can be attributed to the differing sensitivities of the two methods. The Smirnov test is more responsive to differences in the overall geometry of the MAPE distribution, whereas the Wilcoxon test is primarily sensitive to shifts in central tendency, particularly median values. To clarify this point, we plotted both the probability density functions and the cumulative distribution functions of the MAPE values for the different models (Figs. 3 and 4), which visually support this interpretation. Nevertheless, the results of all statistical tests align with our main objective, reinforcing the conclusion that the dummy-based regression models do not suffer any loss in predictive accuracy compared to the model employing meteorological variables as predictors.

While it is widely acknowledged that the relationship between radon and meteorological parameters is nonlinear, suggesting a shift towards machine learning approaches (such as neural networks) for radon forecasting, it is important to note that the success of neural network models is highly dependent on their architecture and data processing procedures. In particular, when exogenous variables such as meteorological parameters are involved, the performance of neural networks becomes highly sensitive to the quality, availability, and predictive reliability of the inputs. Furthermore, deep learning methods are computationally intensive and time-consuming, making the selection of relevant variables a critical step before implementing such approaches. We plan to apply a neural network approach using dummy functions in the future. Before engaging in the design of more complex architectures, however, it was essential to assess whether the use of simple, interpretable seasonal dummy variables alone could provide a comparable level of accuracy. This preliminary step was necessary to justify the added complexity of future models and to verify whether meteorological variables are indispensable for long-term radon forecasting. In this study, we demonstrated that long-term radon forecasting can be effectively achieved using convenient seasonal dummy variables, offering an efficient and practical alternative to more complex machine learning techniques. Future work will explore whether neural network architectures, potentially hybridized with seasonal dummy inputs, can further improve forecasting performance. The goal is to determine if such hybrid models can enhance predictive accuracy without compromising the operational feasibility and computational efficiency required for practical applications.

While the results support the effectiveness of our approach, some limitations should be acknowledged. First, the method is purely statistical and does not account for the physical processes underlying radon production, migration, and emission. This simplification may reduce its applicability in contexts where such processes are dominant or where site-specific geological or hydrological conditions introduce complex, non-seasonal dynamics. Moreover, the use of temporal dummy variables assumes the stability of seasonal patterns over time, an assumption that could be violated in the presence of structural changes at the monitoring site or long-term environmental shifts. Although our statistical tests on independent test sets do not suggest overfitting, models with higher-resolution dummy variables involve a greater number of parameters. Despite these limitations, the proposed dummy-based regression approach provides a simple, transparent, and computationally efficient method to model long-term radon variability. Its ability to reproduce seasonal patterns without relying on external meteorological inputs makes it particularly valuable in contexts where such data are unavailable or unreliable, and it offers a solid foundation for subsequent anomaly detection analyses.

Methods

The starting dataset

This section provides information about the dataset used for the analysis, consisting of radon and meteorological time series.

Radon monitoring stations

Since 2009, the IRON network27 has been providing near real-time measurements of radon emissions from various stations located throughout Italy, mainly concentrated in the Central-Southern Apennines (see Fig. 5). The stations included in this study differ in both their installation types and the radon detectors employed. Specifically, radon has been measured passively using proprietary INGV instruments based on Lucas cell detectors (hereafter simply Lucas) and AER-C Algade© (http://www.algade.com/) detectors. For Lucas, radon gas diffuses into the detector’s flask, whose inner wall is coated with silver-activated zinc sulfide (ZnS), which serves as the scintillating material that detects radon progeny. The Lucas acquisition window is configured to about 2 hours (115 minutes of data acquisition followed by 5 minutes of standby time). The minimum detectable concentration ranges from 3 to Inline graphic, depending on the electronics used and variations in the deposition of the ZnS scintillating layer within the Lucas cell. AER-C is a small commercial solid-state radon detector. The sensitivity of this instrument ranges between 15 and Inline graphic per pulse per hour. The acquisition window has been configured to 4 hours and measurements have been adjusted for local absolute humidity28.

Fig. 5.

Locations of the IRON stations used in this study (green dots), distributed across central and southern Italy (refer to Table 6 for latitude/longitude coordinates of each station). The figure was produced using MATLAB® version 2024a.

As detailed in Table 6, the majority of detector installations are indoor (8 of the 18 time series) and shelter (6). Indoor detectors are located in the basement of a building, typically in a room with as little ventilation as possible and with restricted access, in order to reduce the influence of any anthropogenic activities. In the case of shelter installation, the radon instrument is housed in a small shelter alongside other seismic and/or geodetic monitoring equipment. In borehole (3) and cavity (1) installations, the radon detector is placed in boreholes less than 2 meters deep and in underground cavities, such as aqueducts, tunnels or mines, respectively.

Table 6.

List of IRON stations from which radon time series data were collected. For each station, the table provides the station name, the installation type, the radon detector used (Lucas cells, AER-C), the start and end dates of the corresponding time series, and the total number of effective acquisition days (after excluding periods when the detector was off).

Station Site Instrument Start date End date # days
AQU Shelter AERC 2019-06-29 2021-12-31 917
BADI Shelter LUCAS 2014-08-28 2020-06-01 1784
BAT1 Borehole AERC 2018-01-13 2021-06-29 1038
BAT2 Borehole AERC 2018-01-12 2021-07-06 1041
CDCA Shelter LUCAS 2014-01-23 2020-06-03 2145
CMDO Indoor AERC 2018-02-23 2021-12-31 1388
CMPL Indoor LUCAS 2015-08-01 2019-06-25 1253
CTTR Indoor LUCAS 2014-01-01 2021-12-02 2593
DUR Shelter AERC 2019-04-16 2021-12-31 991
FRME_E146 Indoor LUCAS 2014-01-01 2017-01-14 978
FRME_H146 Indoor LUCAS 2017-08-05 2021-12-31 1532
GUAR2 Shelter AERC 2018-02-03 2021-12-31 1397
ISAR Indoor AERC 2018-12-11 2021-12-31 1109
MMN Indoor LUCAS 2014-01-01 2021-09-03 2741
MMNG Indoor LUCAS 2014-01-01 2021-11-26 2308
MTR1 Cavity AERC 2018-07-04 2021-12-31 1214
NRCA Shelter LUCAS 2016-08-26 2021-12-31 1807
SSFR Borehole LUCAS 2014-01-01 2021-12-31 1976

Starting from the available IRON dataset29,30, this study focuses on radon data corrected for internal humidity dependency. Among these, radon time series with corresponding meteorological data were retrieved from a PostgreSQL relational database (specifically designed and implemented to support IRON31,32). Since the installation of different instruments at the same station can generate inconsistent time series, we treated as separate series those defined by different station-instrument pairs. We then selected radon time series spanning at least two and a half years. Finally, we excluded the series from the following stations:

  • RDP, RDPT, RPD1, RDP2: These series are highly discontinuous due to significant human activity and ongoing construction work in the area.

  • MURB: This series exhibits an excessively steep trend toward the end, making it completely unpredictable.

As a result, the final dataset consists of 18 time series: two from the FRME station (denoted as FRME_E146 and FRME_H146) and the remaining 16 from distinct stations, which are referred to simply by their station acronyms.

Table 6 provides the start and end dates for each downloaded radon time series. Some data gaps exist, corresponding to periods when no data were acquired. The last column of Table 6 indicates the number of effective days, excluding the intervals when the radon detector was inactive. Despite the data gaps, all selected time series are long enough to capture the seasonal variations of radon. Figure 6 highlights the consistency of data collection across the stations, showing the number of days per month in the respective time series.

Fig. 6.

(Top) Heat map displaying the total number of days per month during which each station collected data throughout its entire deployment period. The intensity of the color indicates the frequency of data collection, with brighter shades representing more measurements. For example, the “CTTR” station shows 248 days of data collection in October over a span of 8 years, indicating that it operated every day of October each year. (Bottom) A heat map similar to the top one, but highlighting the number of times the station recorded at least one measurement in a given month throughout its entire deployment. For example, the “CDCA” station shows the number 6 for July, meaning that over its 7-year deployment, there was at least one data point recorded in July for 6 of those years.

The radon levels used in this study were smoothed using a 15-day moving average, a procedure previously adopted for managing IRON acquisitions10,11. Each data point was smoothed by a mean calculated over a sliding window of length 15 days across neighboring elements, centered about the current and previous data points.
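As an illustration of this smoothing step, the following minimal Python sketch computes a 15-point moving average. It assumes one value per day with no gaps, centers the window on the current point, and lets the window shrink at the two ends of the series. The function name `moving_average` and the sample values are hypothetical and are not taken from the IRON processing code.

```python
from statistics import fmean


def moving_average(values, window=15):
    """Centered moving average; the window shrinks near the series edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)                 # first index inside the window
        hi = min(len(values), i + half + 1)   # one past the last index
        smoothed.append(fmean(values[lo:hi]))
    return smoothed


# Made-up daily radon values with a weekly wiggle, for illustration only.
daily_radon = [100.0 + (day % 7) for day in range(60)]
smoothed_radon = moving_average(daily_radon)
```

The output has the same length as the input, so the smoothed series stays aligned with the original dates.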

Meteorological stations

Radon measurements were analyzed together with the time series of the following meteorological variables: temperature (Inline graphic), pressure (mb), rainfall (mm/h), relative humidity (Inline graphic), solar radiation (Inline graphic), and wind strength (kn). Temperature was measured along with radon at each IRON station, while the other meteorological data were collected from 3B Meteo meteorological stations located near the IRON stations (see Table 7). On average, the meteorological stations are approximately Inline graphic away from the corresponding IRON stations, with the farthest distance being 22 km. Dedicated procedures were developed to automatically retrieve data from the 3B Meteo stations on a daily basis. Weather data were collected hourly, resampled to match the time intervals of the radon measurements, and then smoothed with a 15-day moving average, as for the radon data.

Table 7.

Location names and latitude/longitude coordinates (in degrees) of the IRON stations (left columns) and 3B-Meteo meteorological stations (right columns) used in the analysis. The last column specifies the distance R (in km) between the IRON station and the corresponding meteorological station. For the AQU and CTTR time series, meteorological data were downloaded from two 3B-Meteo stations.

Station IRON station location Lat [°] Lon [°] Meteo station location Lat [°] Lon [°] R [km]
AQU Preturo (AQ) 42.383 13.316 Rocca di Cambio (AQ) 42.214 13.462 22.29
Aquila Periferia Ovest (AQ) 42.394 13.265 4.36
BADI Badiali (PG) 43.509 12.244 Città di Castello (PG) 43.457 12.231 5.85
BAT1 Gubbio (PG) 43.381 12.435 Pietralunga (PG) 43.465 12.426 9.30
BAT2 Pietralunga (PG) 43.370 12.409 Pietralunga (PG) 43.465 12.426 10.61
CDCA Città Di Castello (PG) 43.458 12.233 Città Di Castello (PG) 43.457 12.231 0.15
CMDO Montedoro (CL) 37.463 13.822 Montedoro (CL) 37.454 13.815 1.23
CMPL Campoli Appennino (FR) 41.737 13.678 Campoli Appennino (FR) 41.735 13.682 0.40
CTTR Cittareale (RI) 42.617 13.159 Cittareale (RI) 42.589 13.090 6.48
Norcia (PG) 42.790 13.151 19.19
DUR Duronia (CB) 41.650 14.466 Duronia (CB) 41.659 14.458 1.29

FRME_E146 Forme (AQ) 42.111 13.442 Massa d’albe (AQ) 42.107 13.394 3.98
FRME_H146 Forme (AQ) 42.111 13.442 Massa d’albe (AQ) 42.107 13.394 3.98
GUAR2 Guarcino (FR) 41.794 13.312 Guarcino (FR) 41.800 13.311 0.71
ISAR Ischia Castello (NA) 40.731 13.964 Barano d’Ischia (NA) 40.709 13.914 4.90
MMN Mormanno Faro (CS) 39.899 15.990 Mormanno (CS) 39.880 15.980 2.35
MMNG Mormanno Ghiro (CS) 39.885 16.025 Mormanno (CS) 39.880 15.980 3.90
MTR1 Monte Trocchio (FR) 41.467 13.863 Cassino (FR) 41.486 13.833 3.26
NRCA Pie La Rocca (PG) 42.833 13.114 Norcia (PG) 42.790 13.090 5.23
SSFR Sassoferrato (AN) 43.436 12.782 Sassoferrato (AN) 43.430 12.857 6.12
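
The distance R in Table 7 can be approximated from the listed coordinates with the haversine formula. The sketch below assumes a spherical Earth of radius 6371 km. The method actually used to compute R is not stated in the text, and because the table coordinates are rounded to three decimals, the result need not reproduce the tabulated values exactly.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# CMDO row of Table 7: IRON station vs. 3B-Meteo station coordinates.
cmdo_distance = haversine_km(37.463, 13.822, 37.454, 13.815)
```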

Regression analysis

Regression analysis is a statistical technique used to quantify and model the relationship between a dependent variable and one or more independent variables. The simplest form of regression is a linear model, where the dependent variable is expressed as a linear combination of the input variables:

$$y = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n \tag{1}$$

In a supervised learning framework, the goal is to estimate the parameters $\beta_i$ that best describe this relationship, ensuring that the predicted values of $y$ are as close as possible to the observed data. The most common method is least squares estimation, which minimizes the sum of squared residuals across all data points.
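
To make the least squares criterion concrete, the following sketch fits the single-predictor special case in closed form. The intercept is included here as an assumption of the example (it can be seen as the coefficient of a constant input equal to 1); the function name `fit_line` and the numbers are hypothetical.

```python
def fit_line(x, y):
    """Least squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """The quantity that least squares minimizes."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up data lying close to the line y = 1 + 2x.
x_demo = [0.0, 1.0, 2.0, 3.0, 4.0]
y_demo = [1.1, 2.9, 5.2, 6.8, 9.0]
b0, b1 = fit_line(x_demo, y_demo)
```

Any other choice of intercept and slope gives a larger value of `sum_squared_residuals` on the same data, which is what "best describe" means in this setting.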

In this study, we used two distinct regression approaches to analyze radon time series, focusing specifically on capturing periodic variations on an annual timescale, since radon levels are known to exhibit only diurnal and yearly periodicity11. In the first approach, we used the meteorological variables listed in the “Meteorological stations” section as predictors. In the second approach, we introduced dummy variables to better account for radon temporal patterns. A temporal dummy variable is a binary categorical variable that segments a time series into distinct seasonal components, allowing the regression model to account for periodic variations of the dependent variable35. We introduced four types of dummy functions: monthly, fortnightly, weekly, and half-weekly. Each function takes the value 1 during its corresponding time period and 0 otherwise. For example, the monthly dummy function for January is set to 1 for all January observations, regardless of the year, and 0 for all other months.
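
The January example can be written as a short sketch. The helper name `monthly_dummies` is hypothetical; it simply encodes the rule stated above, returning one indicator per calendar month.

```python
from datetime import date


def monthly_dummies(day):
    """Return 12 indicators; entry m-1 is 1 when `day` falls in month m."""
    return [1 if day.month == m else 0 for m in range(1, 13)]


# Two January observations from different years share the same dummy row.
row_2019 = monthly_dummies(date(2019, 1, 15))
row_2021 = monthly_dummies(date(2021, 1, 3))
```

Exactly one entry of each row is 1, so the twelve dummies partition the observations by month regardless of the year.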

The two approaches just presented result in five distinct regression models:

  1. Meteorological model: capturing temporal variations of radon as a response to meteorological conditions,
    $$Rn(t) = \beta_T\,T(t) + \beta_P\,P(t) + \beta_R\,R(t) + \beta_H\,H(t) + \beta_S\,S(t) + \beta_W\,W(t) \tag{2}$$
    where $\beta_T$, $\beta_P$, $\beta_R$, $\beta_H$, $\beta_S$, and $\beta_W$ represent the regression coefficients associated with temperature $T$, pressure $P$, rainfall $R$, relative humidity $H$, solar radiation $S$, and wind strength $W$, respectively.
  2. Monthly model: capturing seasonal variations on a monthly scale,
    $$Rn(t) = \sum_{m=1}^{12} \beta_m\,D_m(t) \tag{3}$$
    where $D_m$ represents the dummy variable for month m, which takes the value 1 if the observation belongs to month m and 0 otherwise.
  3. Fortnightly model: accounting for biweekly cycles in radon concentrations,
    $$Rn(t) = \sum_{f=1}^{26} \beta_f\,D_f(t) \tag{4}$$
    where $D_f$ represents the fortnightly dummy variable, dividing the year into 26 two-week periods.
  4. Weekly model: identifying weekly fluctuations in radon levels,
    $$Rn(t) = \sum_{w=1}^{52} \beta_w\,D_w(t) \tag{5}$$
    where $D_w$ denotes the weekly dummy variable, which takes the value 1 for observations in week w and 0 otherwise.
  5. Half-weekly model: capturing finer temporal variations,
    $$Rn(t) = \sum_{h=1}^{104} \beta_h\,D_h(t) \tag{6}$$
    where $D_h$ represents a half-weekly dummy variable, dividing the year into 104 half-week intervals.
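
The four dummy-based models can be sketched with a single helper that maps a date to its period of the year. The sketch rests on several assumptions that are not stated in the text: periods are equal slices of a 365-day year (day 366 of a leap year is merged into the last slice), the dummies are mutually exclusive, and no separate intercept is fitted. Under these assumptions the least squares coefficient of each dummy reduces to the mean of the training values in that period. All names (`period_index`, `fit_dummy_model`, `predict`) are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean

PERIODS_PER_YEAR = {"monthly": 12, "fortnightly": 26, "weekly": 52, "half-weekly": 104}


def period_index(day, scheme):
    """Map a date to a 0-based period index within the year."""
    if scheme == "monthly":
        return day.month - 1
    n_periods = PERIODS_PER_YEAR[scheme]
    # Assumption: equal slices of a 365-day year; leap day 366 joins the last slice.
    day_of_year = min(day.timetuple().tm_yday, 365) - 1
    return day_of_year * n_periods // 365


def fit_dummy_model(days, radon, scheme):
    """Least squares on exclusive dummies without intercept = per-period means."""
    groups = defaultdict(list)
    for day, value in zip(days, radon):
        groups[period_index(day, scheme)].append(value)
    return {period: fmean(values) for period, values in groups.items()}


def predict(coefficients, day, scheme):
    """Forecast for `day`; raises KeyError if its period was never observed."""
    return coefficients[period_index(day, scheme)]


# Made-up training data: two January values from different years, one July value.
train_days = [date(2020, 1, 1), date(2021, 1, 2), date(2020, 7, 1)]
train_radon = [10.0, 20.0, 50.0]
monthly_coefficients = fit_dummy_model(train_days, train_radon, "monthly")
january_forecast = predict(monthly_coefficients, date(2022, 1, 20), "monthly")
```

Because the predictors depend only on the calendar date, a forecast for any future day is available as soon as the model is fitted, without waiting for meteorological measurements.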

These regression models were applied to each of the 18 radon time series (Table 6), and the results were compared to assess the effectiveness of dummy variables in reconstructing the average radon trend. Each regression was trained on the first 80% of the data and tested on the remaining 20% to ensure robust evaluation. It is important to highlight that the test periods were specifically chosen to occur at the end of each time series, allowing us to assess the models’ forecasting performance on future, unseen data. Table 8 specifies the training and test time intervals for each station. The table also specifies which seasons are covered by the test time interval for at least one month, confirming that the models were tested across different seasonal conditions. The Mean Absolute Percentage Error (MAPE) was used to assess model performance. MAPE provides a standardized measure of predictive accuracy and is defined as follows:

$$\mathrm{MAPE} = \frac{100}{N} \sum_{t=1}^{N} \left| \frac{A_t - F_t}{A_t} \right| \tag{7}$$

where $A_t$ is the actual value, $F_t$ is the forecast value and $N$ is the total number of points.
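
The evaluation procedure can be sketched as a chronological split followed by the MAPE computation. The sketch assumes that the series is already sorted in time, that MAPE is expressed in percent, and that no actual value is zero; the function names and numbers are hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series into leading train and trailing test parts."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent; actual values must be non-zero."""
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Made-up series of ten time-ordered values.
series = [100.0, 110.0, 120.0, 130.0, 125.0, 115.0, 105.0, 100.0, 100.0, 200.0]
train, test = chronological_split(series)

# Made-up forecasts for the two test points.
test_error = mape(test, [110.0, 180.0])
```

Keeping the test block at the end of the series, rather than sampling it at random, is what makes the error a measure of forecasting performance.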

Table 8.

Regression training and test time intervals for each IRON station. Train and test time intervals correspond to 80% and 20% of total coverage, respectively. The test seasons column specifies which seasons (check marks) are covered by the test time interval for at least one month.

Station Train data interval Test data interval Test seasons
Start date End date Start date End date Winter Spring Summer Autumn
AQU 2019-06-29 2021-06-30 2021-06-30 2021-12-31 Inline graphic Inline graphic
BADI 2014-08-28 2019-06-11 2019-06-11 2020-06-01 Inline graphic Inline graphic Inline graphic Inline graphic
BAT1 2018-01-13 2020-12-08 2020-12-08 2021-06-29 Inline graphic Inline graphic
BAT2 2018-01-12 2021-01-11 2021-01-11 2021-07-06 Inline graphic Inline graphic
CDCA 2014-01-23 2018-11-22 2018-11-22 2020-06-03 Inline graphic Inline graphic Inline graphic
CMDO 2018-02-23 2021-03-14 2021-03-14 2021-12-31 Inline graphic Inline graphic Inline graphic Inline graphic
CMPL 2015-08-01 2018-09-30 2018-09-30 2019-06-25 Inline graphic Inline graphic Inline graphic
CTTR 2014-01-01 2020-06-18 2020-06-19 2021-12-02 Inline graphic Inline graphic Inline graphic Inline graphic
DUR 2019-04-16 2021-04-30 2021-04-30 2021-12-31 Inline graphic Inline graphic Inline graphic
FRME_E146 2014-01-01 2016-07-03 2016-07-03 2017-01-14 Inline graphic Inline graphic
FRME_H146 2017-08-05 2021-02-12 2021-02-12 2021-12-31 Inline graphic Inline graphic Inline graphic Inline graphic
GUAR2 2018-02-03 2021-03-29 2021-03-29 2021-12-31 Inline graphic Inline graphic Inline graphic
ISAR 2018-12-11 2021-06-04 2021-06-04 2021-12-31 Inline graphic Inline graphic
MMN 2014-01-01 2020-01-25 2020-01-25 2021-09-03 Inline graphic Inline graphic Inline graphic Inline graphic
MMNG 2014-01-01 2019-10-12 2019-10-13 2021-11-26 Inline graphic Inline graphic Inline graphic Inline graphic
MTR1 2018-07-04 2021-04-27 2021-04-27 2021-12-31 Inline graphic Inline graphic Inline graphic
NRCA 2016-08-26 2021-01-05 2021-01-05 2021-12-31 Inline graphic Inline graphic Inline graphic
SSFR 2014-01-01 2019-02-01 2019-02-01 2020-05-31 Inline graphic Inline graphic Inline graphic Inline graphic

Statistical tests

The MAPE values on the test datasets were compared using two different statistical tests: the Smirnov test (also known as the two-sample Kolmogorov-Smirnov test; KS) and the Wilcoxon signed-rank test.

The Smirnov test is a nonparametric statistical test33 that quantifies the difference between two Empirical Cumulative Distribution Functions (ECDFs). In this study, we applied it to compare the ECDF of the MAPE values from the meteorological model with the ECDF of the MAPE values from each dummy model. By measuring the maximum absolute difference between the two ECDFs, the test helps assess whether the error distributions of the dummy models deviate significantly from that of the meteorological model. Mathematically, given two cumulative distribution functions $F_1(x)$ and $F_2(x)$, the Smirnov statistic $D$ is defined as:

$$D = \sup_x \left| F_1(x) - F_2(x) \right| \tag{8}$$

The null hypothesis $H_0$ states that the two distributions are identical. The test returns a p-value that determines whether to reject $H_0$ at a given significance level (0.05 in this case). Since the Smirnov test is sensitive to differences in both the location and shape of distributions, it is useful for detecting deviations in the overall error structure.
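
The statistic itself is simple to compute from two samples, as the following sketch shows. It evaluates both ECDFs at every pooled sample value and keeps the largest gap. Only the statistic is computed here; the p-value would normally be obtained from a statistical library (for example `scipy.stats.ks_2samp`). The function name and the MAPE values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between the ECDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The ECDFs only change at sample values, so checking those is sufficient.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        ecdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap


# Made-up MAPE values (in percent) for two models across a few stations.
mape_meteo = [12.0, 18.5, 25.1, 30.2, 41.0]
mape_monthly = [11.4, 19.0, 24.8, 33.5, 39.7]
d_value = ks_statistic(mape_meteo, mape_monthly)
```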

The Wilcoxon signed-rank test is a widely used nonparametric procedure34 that, instead of comparing full distributions, tests for a zero-median difference between two sampled populations with paired observations. In the two-tailed version of the test, the null hypothesis $H_0$ states that the median of the differences between the two paired samples is zero, meaning there is no significant difference in central tendency between the two distributions. In our study, the two-tailed Wilcoxon test was used to compare the MAPE values of the meteorological model with those of each dummy-based model. A p-value below the significance threshold (0.05) indicates a significant difference between the two models.

For this study, we also applied the one-tailed (right-sided) version of the Wilcoxon test to determine whether the dummy models outperform the meteorological one, with the null hypothesis $H_0$ stating that the median of the dummy model errors is greater than or equal to the median of the meteorological model errors. A low p-value (below 0.05) would indicate that the meteorological model produces significantly higher errors than the dummy models, whereas a higher p-value would suggest no strong evidence against equivalence.
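
A minimal sketch of the signed-rank test is given below. It computes exact p-values by counting sign assignments, which is feasible for samples as small as the 18 series considered here. Several choices are assumptions of this example rather than details from the study: differences are taken as first sample minus second sample, zero differences are dropped, and tied absolute differences receive average ranks. In practice a library routine such as `scipy.stats.wilcoxon` would typically be used; all names and values below are hypothetical.

```python
def signed_ranks(x, y):
    """Non-zero paired differences and their doubled ranks (ties averaged)."""
    diffs = [xi - yi for xi, yi in zip(x, y) if xi != yi]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * len(diffs)  # ranks times 2, so averaged ranks stay integers
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        # Average of the 1-based ranks i+1 .. j+1 is (i + j + 2) / 2.
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2
        i = j + 1
    return diffs, ranks2


def wilcoxon_exact(x, y):
    """Return (W+, p_less, p_greater, p_two_sided) for paired samples."""
    diffs, ranks2 = signed_ranks(x, y)
    w_plus2 = sum(r for d, r in zip(diffs, ranks2) if d > 0)
    total2 = sum(ranks2)
    # counts[s]: number of sign assignments whose positive doubled-rank sum is s.
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]
    n_assignments = 2 ** len(ranks2)
    p_less = sum(counts[: w_plus2 + 1]) / n_assignments   # P(W+ <= observed)
    p_greater = sum(counts[w_plus2:]) / n_assignments     # P(W+ >= observed)
    p_two_sided = min(1.0, 2 * min(p_less, p_greater))
    return w_plus2 / 2, p_less, p_greater, p_two_sided


# Made-up paired MAPE values: dummy model first, meteorological model second.
mape_dummy = [10.2, 15.1, 20.4, 8.9, 30.0, 12.3]
mape_meteo = [11.0, 17.5, 19.8, 12.4, 35.2, 14.0]
w_plus, p_less, p_greater, p_two_sided = wilcoxon_exact(mape_dummy, mape_meteo)
```

With the differences defined as dummy minus meteorological, a small `p_less` is evidence that the dummy model errors tend to be lower, while `p_two_sided` corresponds to the two-tailed comparison.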

Each of these tests contributes a different perspective: the Smirnov statistic checks for overall distribution differences, while the Wilcoxon test focuses on median differences. Together, they offer a comprehensive statistical evaluation of whether the dummy models can be considered interchangeable with the meteorological model.

Acknowledgements

We sincerely thank all the individuals involved in the IRON network, whose efforts were essential for the collection of the data used in this study. We are particularly thankful to Dr. Gianfranco Galli for his valuable revisions to the parts related to the instrumentation. We are grateful to the reviewers for their insightful and constructive comments, which greatly contributed to improving the clarity, rigor, and overall quality of the manuscript. We also extend our gratitude to 3B Meteo for supplying the meteorological data that were crucial for the analysis conducted. Finally, we acknowledge the financial support from the PIANETA DINAMICO-Sibilla project, which partially funded this research.

Author contributions

The entire article, along with the detailed conclusions, resulted from a collective brainstorming effort, and the writing and revision processes were equally collaborative, as was the code implemented to perform the analysis. More specifically, A.P. conceived the core idea of the article and, regarding the coding, focused on generalizing the code for regression, while V.V. adapted it to the specific analysis and added the section related to plots and statistical testing. G.R. contributed to the review of the state of the art and the comparison of different methodological approaches, and worked on the automated procedures for assessing data coverage and continuity. G.R. and V.V. ensured the selection of stations without data gaps and prepared the tables and figures included in the paper.

Data availability

The IRON radon time series used for this study are available at https://doi.org/10.6084/m9.figshare.27292554.v1. Please refer to the corresponding author for additional requests.

Declarations

Competing interests

The authors declare that they have no competing financial or non-financial interests that could have influenced the work presented in this study.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Giulia Romoli and Veronica Vignoli have contributed equally to this work.

References

  • 1. Gillmore, G. K., Crockett, R. G. & Przylibski, T. IGCP project 571: Radon, health and natural hazards. Nat. Hazards Earth Syst. Sci. 10, 2051–2054 (2010).
  • 2. Gillmore, G. K., Perrier, F. E. & Crockett, R. G. M. Radon, Health and Natural Hazards: a signpost for assessment and protection in the 21st century. In Radon, Health and Natural Hazards (Geological Society of London, 2018).
  • 3. Gillmore, G. K., Crockett, R. G. M. & Phillips, P. S. Radon as a carcinogenic built-environmental pollutant. In Radon, Health and Natural Hazards (Geological Society of London, 2018).
  • 4. Pulinets, S. et al. Radon variability as a result of interaction with the environment. Atmosphere 15, 167 (2024).
  • 5. Marley, F. Investigation of the air pressure characteristics influencing the variability of radon gas and radon progeny in domestic vernacular buildings. Health Phys. 81, 57–69 (2001).
  • 6. Magalhães, M., Amaral, E., Sachett, I. & Rochedo, E. Radon-222 in brazil: an outline of indoor and outdoor measurements. J. Environ. Radioact. 67, 131–143 (2003).
  • 7. Desideri, D., Roselli, C., Feduzi, L. & Assunta Meli, M. Monitoring the atmospheric stability by using radon concentration measurements: A study in a central Italy site. J. Radioanal. Nucl. Chem. 270, 523–530 (2006).
  • 8. De Francesco, S., Tommasone, F. P., Cuoco, E. & Tedesco, D. Indoor radon seasonal variability at different floors of buildings. Radiat. Meas. 45, 928–934 (2010).
  • 9. Xie, D., Liao, M. & Kearfott, K. J. Influence of environmental factors on indoor radon concentration levels in the basement and ground floor of a building-a case study. Radiat. Meas. 82, 52–58 (2015).
  • 10. Piersanti, A., Cannelli, V. & Galli, G. Long term continuous radon monitoring in a seismically active area. Ann. Geophys. 58, S0437–S0437 (2015).
  • 11. Siino, M., Scudero, S., Cannelli, V., Piersanti, A. & D’Alessandro, A. Multiple seasonality in soil radon time series. Sci. Rep. 9, 8610 (2019).
  • 12. Rowe, J. E., Kelly, M. & Price, L. E. Weather system scale variation in radon-222 concentration of indoor air. Sci. Total Environ. 284, 157–166 (2002).
  • 13. Groves-Kirkby, C., Denman, A., Crockett, R., Phillips, P. & Gillmore, G. Identification of tidal and climatic influences within domestic radon time-series from Northamptonshire, UK. Sci. Total Environ. 367, 191–202 (2006).
  • 14. Jaishi, H. P., Singh, S., Tiwari, R. P. & Tiwari, R. C. Analysis of soil radon data in earthquake precursory studies. Ann. Geophys. 57, S0544–S0544 (2014).
  • 15. Laiolo, M. et al. The effects of environmental parameters on diffuse degassing at Stromboli volcano: Insights from joint monitoring of soil co2 flux and radon activity. J. Volcanol. Geotherm. Res. 315, 65–78 (2016).
  • 16. Yang, J. et al. Modeling of radon exhalation from soil influenced by environmental parameters. Sci. Total Environ. 656, 1304–1311 (2019).
  • 17. Torkar, D., Zmazek, B., Vaupotič, J. & Kobal, I. Application of artificial neural networks in simulating radon levels in soil gas. Chem. Geol. 270, 1–8 (2010).
  • 18. Maletić, D. M. et al. Comparison of multivariate classification and regression methods for the indoor radon measurements. Nucl. Technol. Radiat. Protect. 29, 17–23 (2014).
  • 19. Ambrosino, F., Sabbarese, C., Roca, V., Giudicepietro, F. & Chiodini, G. Analysis of 7-years radon time series at Campi Flegrei area (Naples, Italy) using artificial neural network method. Appl. Radiat. Isot. 163, 109239 (2020).
  • 20. Haider, T. et al. Identification of radon anomalies induced by earthquake activity using intelligent systems. J. Geochem. Explor. 222, 106709 (2021).
  • 21. Jaishi, H. P., Singh, S., Tiwari, R. P. & Tiwari, R. C. Analysis of subsurface soil radon with the environmental parameters and its relation with seismic events. J. Geol. Soc. India 99, 847–858 (2023).
  • 22. Hassan, N. M. et al. Radon migration process and its influence factors; review. Jpn. J. Health Phys. 44, 218–231 (2009).
  • 23. Zmazek, B., Todorovski, L., Džeroski, S., Vaupotič, J. & Kobal, I. Application of decision trees to the analysis of soil radon data for earthquake prediction. Appl. Radiat. Isot. 58, 697–706 (2003).
  • 24. Kitto, M. Interrelationship of indoor radon concentrations, soil-gas flux, and meteorological parameters. J. Radioanal. Nucl. Chem. 264, 381–385 (2005).
  • 25. Sundal, A. V., Valen, V., Soldal, O. & Strand, T. The influence of meteorological parameters on soil radon levels in permeable glacial sediments. Sci. Total Environ. 389, 418–428 (2008).
  • 26. Massey, F. J. Jr. The Kolmogorov–Smirnov test for goodness of fit. J. Am. Stat. Assoc. 46, 68–78. https://doi.org/10.2307/2280095 (1951).
  • 27. Cannelli, V., Piersanti, A., Galli, G. & Melini, D. Italian radon monitoring network (iron): A permanent network for near real-time monitoring of soil radon emission in Italy. Ann. Geophys. 61, 444 (2018).
  • 28. Galli, G., Cannelli, V., Nardi, A. & Piersanti, A. Implementing soil radon detectors for long term continuous monitoring. Appl. Radiat. Isot. 153, 108813 (2019).
  • 29. Piersanti, A. et al. Soil radon time series from the Italian radon monitoring network (iron). Sci. Data 12(415), 1–7. https://doi.org/10.1038/s41597-025-04735-0 (2025).
  • 30. Piersanti, A. et al. Soil radon and meteo timeseries from the Italian Radon mOnitoring Network (IRON) 2009–2021. Figshare Dataset. https://doi.org/10.6084/m9.figshare.27292554.v1 (2025).
  • 31. Cannelli, V. Iron-db: a database for the italian radon monitoring network. Rapporti Tecnici-INGV. https://doi.org/10.13127/rpt/371 (2017).
  • 32. Vignoli, V. & Pignatelli, A. A new RDBMS and structure for the iron database. Rapporti Tecnici-INGV. https://doi.org/10.13127/rpt/481 (2024).
  • 33. Berger, V. W. & Zhou, Y. Kolmogorov–Smirnov Test: Overview (Statistics reference online, Wiley statsref, 2014).
  • 34. Gibbons, J. & Chakraborti, S. Nonparametric Statistical Inference (2011).
  • 35. Draper, N. R. & Smith, H. Applied Regression Analysis 3rd ed. (John Wiley and Sons, Inc., New York, 1998). ISBN 0-471-17082-8. See Chapter 14.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
