A systematic approach to modeling monthly maximum temperature and total rainfall in Kenya

Kevin Otieno; Linda Chaba; Collins Odhiambo; Bernard Omolo

doi:10.1038/s41598-025-12810-0

2025 Aug 28;15:31758. doi: 10.1038/s41598-025-12810-0


PMCID: PMC12394451 PMID: 40877330

Abstract

Goodness of fit (GOF) tests for selecting probability distributions of climatic variables are pervasive in the statistical literature. However, approaches that combine multiple tests remain underutilized despite evidence supporting their improved precision. Increasingly erratic climatic conditions pose severe threats to economic stability, necessitating robust statistical methods for climate modeling. To address this need, this study evaluates probability distributions for climatic variables using a comprehensive approach that combines multiple tests. A scoring system ranked each distribution’s performance across tests, with a composite score indicating the best fit. To assess robustness, a sensitivity analysis on the best-performing distribution examined the influence of partitioning the data into segments of different lengths (block sizes). The results show that the generalized extreme value (GEV) distribution consistently outperforms the other candidate distributions for both temperature and rainfall data across multiple metrics. Extended block sizes capture long-term climatic patterns but introduce greater uncertainty due to fewer data points, while shorter block sizes tend to overfit. Intermediate block sizes provide a balance, producing reliable parameter estimates and stable return levels. These findings underscore the importance of selecting suitable block sizes and confirm the robustness of the GEV distribution for climate modeling. The study contributes to improved methodologies for risk assessment and climate adaptation strategies, particularly in regions such as Kenya.

Keywords: Goodness of fit tests, Probability distributions, Temperature, Rainfall, Block size

Subject terms: Hydrology, Statistics, Climate and Earth system modelling

Introduction

Kenya’s increasing exposure to the effects of climate variability is a pressing issue, with erratic rainfall patterns and rising temperatures significantly affecting its key sectors. Agriculture, a backbone of Kenya’s economy^1,2, is particularly vulnerable, as unpredictable weather disrupts planting and harvesting cycles, reduces yields, and exacerbates food insecurity. Infrastructure also faces challenges, with extreme weather events such as floods and droughts damaging roads, bridges, and other critical systems. The cumulative effect of these climate-induced challenges undermines the country’s overall economic stability, highlighting the urgent need for robust mitigation and adaptation strategies.

The effects of climate variability are particularly evident in regions like Marsabit, where prolonged droughts and heavy rainfall lead to severe consequences. Droughts reduce water availability, hinder crop growth, and limit pastures, leading to crop failures and livestock losses, exacerbating food insecurity^3–5. In contrast, intense rainfall causes soil erosion, farmland flooding, and infrastructure damage, imposing significant financial burdens on the government for repairs and diverting resources from development projects.

These recurring events underscore the urgent need for sustainable strategies, such as climate-resilient agricultural practices, improved water management systems, and robust infrastructure design. Investments in early warning systems and community-based adaptation measures are also critical to mitigating the impacts on vulnerable populations.

Climatic variables such as rainfall and temperature can be better understood through probability distributions, which provide valuable tools for analyzing climate patterns⁶. Globally, researchers have identified region- and time-dependent distributions for these variables, with models such as GEV, Gamma, log-normal, and Weibull frequently recommended for climatic data. Notable studies include those by Sharma and Singh⁷, Dzupire et al.⁸, Athulya and James⁹, Ozonur et al.¹⁰, Ximenes et al.¹¹, Hussain et al.¹², Singirankabo and Iyamuremye¹³ and Agbonaye and Izinyon¹⁴. For example, Ximenes et al.¹¹ found Gamma and Weibull to be optimal for monthly precipitation in Northeast Brazil, while Douka and Karacostas¹⁵ identified GEV and log-normal as suitable for extreme precipitation in Thessaloniki, Greece. The differences in the best-fitting distributions between¹¹ and¹⁵ can be attributed to the different geographical locations of the two study areas. The two studies also covered different periods and temporal resolutions: the data for Greece comprised monthly precipitation records from 1988 to 2017, whereas the study on Northeast Brazil used hourly rainfall data from 1947 to 2003. These studies, together with the summary in Table 1, demonstrate the importance of selecting appropriate probability distributions for accurate climate modeling.

Table 1.

Literature results of probability density functions (PDF) fitted to rainfall data.

Author	Region	Rainfall variable	PDF assessed	Best PDF
⁷	Pantnagar, India	Annual maximum daily rainfall	Normal, LN, Gamma, Weibull, Pearson, GEV	LN, Gamma
¹⁴	South Eastern Nigeria	Annual maximum daily rainfall	EV1, GEV, GPA, LN, Pearson type III and log Pearson type III	GEV
¹⁶	Jericho, Ibadan, Nigeria	Daily rainfall	Exponential, Gamma, Normal and Poisson	Exponential
¹⁷	Japan	Annual maximum hourly rainfall	Normal, LN, Gumbel, G2, P3 and LP3	LP3
¹⁸	Bangladesh	Annual maximum daily rainfall	Normal, N2, N3, N4 and N5	Normal
¹⁹	Narok Town, Kenya	Annual maximum daily rainfall	Normal, LN, Weibull, Gamma, Gumbel, Exponential and Pareto	GEV, Gumbel, Gamma
²⁰	Colombia	Annual maximum hourly rainfall	Gumbel, LP3, P3, Normal, GEV	GEV
⁶	Qatar	Annual maximum daily rainfall	Normal, LN, Gamma, Gumbel, Log-logistic, GEV, Pearson, LP3, Beta, Weibull and General Pareto	GEV
²¹	Wilayah Persekutuan, Malaysia	Annual maximum hourly rainfall	Exponential, Gamma, Weibull and Mixed Exponential	Mixed Exponential


LN - Lognormal, GPA - Generalized Pareto distribution, EV1 - Extreme value type I distribution, G2 - Gamma 2, GEV - Generalized Extreme Value, P3 - Pearson type 3, LP3 - Log-Pearson type 3, N2 - Mixtures of two normal, N3 - Mixtures of three normal, N4 - Mixtures of four normal, N5 - Mixtures of five normal.

Extensive research has also been conducted to identify the best-fitting probability distributions for temperature data. Key studies include those by Athulya and James⁹, Dzupire et al.⁸, Hasan²², Hossain²³, Hussain et al.¹² and Ozonur et al.¹⁰. These studies have explored various distributions, including the normal, log-normal, Gamma, and Weibull distributions. For instance, Hussain et al.¹² identified the Generalized Pareto (GP), Extreme Value (EV), and GEV models as suitable for modeling temperature data. Similarly, Hasan²² employed ten continuous distributions, including the exponential, Gamma, Log-Gamma, Beta, normal, log-normal, Erlang, power function, Rayleigh, and Weibull distributions, with the Beta distribution emerging as the best fit for the temperature data.

This study aims to identify the most appropriate probability distributions for modeling monthly maximum temperatures and total monthly rainfall in Kenya. The analysis is based on a comprehensive data set covering the last 73 years, capturing the impacts of recent climatic changes. By incorporating these extensive and up-to-date data, the study ensures that the models account for evolving climate patterns. For instance, accurate descriptions of climatic data provide a better understanding of the probability distributions of maximum temperatures and total rainfall, which helps capture the frequency and intensity of climatic events, such as heat waves and heavy downpours. These models also enhance predictive capabilities by leveraging historical trends and recent shifts, improving forecasting accuracy and facilitating better preparation for future climatic scenarios. Additionally, by identifying the underlying distributions, the study supports data-driven decision-making, providing a critical foundation for risk assessment and resource allocation in agriculture, water management, and disaster response sectors.

The study makes a significant contribution to modeling climatic events through three key focus areas. First, it provides a comprehensive theoretical framework for understanding and applying statistical distributions in hydrology and climate studies. The framework offers precise definitions of commonly used distributions, facilitating their identification and application to various climatic datasets. It also includes robust parameter estimation methodologies that ensure accurate modeling of climatic variables. Furthermore, the study outlines strategies for selecting extreme values tailored to specific extreme value distributions, enabling the precise focus on significant climatic events.

Second, the research emphasizes the application of GOF tests to identify the most suitable probability distributions for climatic data. Detailed discussions on the implementation of GOF tests enhance the accuracy and reliability of the models. This methodological rigor improves the alignment of models with observed data and bolsters their credibility for practical applications in risk assessment and decision-making.

Lastly, the study emphasizes the significance of temporal pattern analysis through block size selection, a crucial factor in statistical modeling that directly affects how well temporal patterns in climatic data are captured. A sensitivity analysis assesses the impact of varying block sizes on the GEV distribution. This analysis combines graphical methods, GOF tests, return level estimates for various periods, and confidence intervals. By examining the effect of block size on model performance and extreme-value forecasts, it provides valuable insights into the stability and reliability of the GEV distribution across varying temporal resolutions.

The paper is structured as follows. The “Methods” section provides a detailed description of the data, the procedure for selecting candidate probability distributions, parameter estimation methods, and the implementation of GOF tests, including the combined approach of multiple GOF tests. The “Results and discussion” section presents summary statistics, results from the selection of candidate distributions, findings from the GOF tests, and insights from the sensitivity analysis. Finally, the “Conclusion” section summarizes the key findings and their implications for climate modeling and risk assessment.

Methods

Data

The monthly maximum temperature (Tmax) and total precipitation (Prep) data for Kenya, covering the period 1950–2022, were sourced from the World Bank Climate Change Knowledge Portal²⁴. The precipitation data (Prep), measured in millimeters, represents the total accumulation of monthly rainfall. This provides a comprehensive measure of rainfall intensity and distribution across different months. The temperature data (Tmax), recorded in degrees Celsius, captures the highest daily maximum temperature observed each month, offering valuable insights into extreme temperature events.

Selection of candidate probability distributions

A review of existing literature identified probability distributions commonly applied in hydrological studies: exponential, Gamma, Weibull, log-normal, logistic, Gumbel, GPD, and GEV, as referenced in^7–12,14,16–18,20. Similarly, for temperature data, these distributions, in addition to a normal distribution, were identified as suitable candidates, supported by findings from²² and other related studies. Table 2 describes each probability distribution function. These distributions were selected due to their suitability in modeling skewed, heavy-tailed, or extreme data characteristics commonly found in climatic datasets. The Cullen and Frey graph²⁵ was used to preliminarily assess the shape characteristics of the data, guiding the selection of appropriate distributions for further analysis.

Table 2.

Description of various probability distribution functions.

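Distributions	Probability density functions	Parameters
Normal	$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$	$\mu$: mean; $\sigma$: standard deviation
Lognormal	$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left[-\frac{(\ln x-\mu)^2}{2\sigma^2}\right],\ x > 0$	$\mu$: scale parameter; $\sigma$: shape parameter
Weibull	$f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} \exp\left[-\left(\frac{x}{\lambda}\right)^{k}\right],\ x \ge 0$	$k$: shape parameter; $\lambda$: scale parameter
GEV	$f(x) = \frac{1}{\sigma}\, t(x)^{\xi+1} e^{-t(x)},\ t(x) = \left[1+\xi\frac{x-\mu}{\sigma}\right]^{-1/\xi},\ 1+\xi\frac{x-\mu}{\sigma} > 0$	$\mu$: location parameter; $\xi$: shape parameter; $\sigma$: scale parameter
Exponential	$f(x) = \lambda e^{-\lambda x},\ x \ge 0$	$\lambda$: rate parameter
Gamma	$f(x) = \frac{x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}},\ x > 0$	$\alpha$: shape parameter; $\beta$: scale parameter
Logistic	$f(x) = \frac{e^{-(x-\mu)/s}}{s\left[1+e^{-(x-\mu)/s}\right]^{2}}$	$\mu$: location parameter; $s$: scale parameter
Gumbel	$f(x) = \frac{1}{\beta} \exp\left[-\left(z + e^{-z}\right)\right],\ z = \frac{x-\mu}{\beta}$	$\mu$: location parameter; $\beta$: scale parameter
Uniform	$f(x) = \frac{1}{b-a},\ a \le x \le b$	$a$: lower bound; $b$: upper bound
GPD	$f(x) = \frac{1}{\sigma}\left[1+\xi\frac{x-\mu}{\sigma}\right]^{-(1/\xi+1)},\ x \ge \mu,\ 1+\xi\frac{x-\mu}{\sigma} > 0$	$\mu$: location parameter; $\sigma$: scale parameter; $\xi$: shape parameter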


Parameter estimation

In statistical modeling, parameter estimation is essential because most model parameters are unknown. Commonly employed methods include the Method of Moments, L-moments, Maximum Likelihood Estimation (MLE), and LH-moments, as noted in studies by Al Mamoon and Rahman⁶ and Haddad and Rahman²⁶. In this paper, we employ the MLE method for parameter estimation across the analyzed distributions, as it is one of the most widely applied and robust methods. MLE is favored for its consistency and efficiency, particularly in large samples: it selects the parameter values that maximize the likelihood of the observed data and, in terms of asymptotic properties, often yields more reliable results than the Method of Moments, L-moments, and LH-moments. Research, including foundational studies by Fisher²⁷, Zong²⁸ and Naghettini²⁹, has demonstrated that MLE’s variance and bias are comparatively low, thereby enhancing its suitability across a broad range of distributions. These qualities make MLE well suited to environmental datasets, including temperature and rainfall measurements, where precision and robustness are critical.
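The idea behind MLE can be illustrated with the exponential distribution, the one candidate whose estimate has a simple closed form (rate = 1 / sample mean). The sketch below is illustrative only: the data values and function names are hypothetical, and the paper does not state which software was used. Three-parameter models such as the GEV have no closed-form solution and require numerical optimization of the same kind of log-likelihood.

```python
import math

def exponential_mle(data):
    """Closed-form MLE of the exponential rate: 1 / sample mean."""
    return len(data) / sum(data)

def exponential_loglik(rate, data):
    """Log-likelihood n*ln(rate) - rate*sum(x) for non-negative data."""
    return len(data) * math.log(rate) - rate * sum(data)

# Hypothetical monthly rainfall totals (mm), for illustration only.
rain = [12.4, 35.9, 50.9, 81.9, 140.2]
rate_hat = exponential_mle(rain)

# The log-likelihood peaks at the MLE and drops on either side of it.
for rate in (0.5 * rate_hat, rate_hat, 2.0 * rate_hat):
    print(f"rate={rate:.5f}  loglik={exponential_loglik(rate, rain):.3f}")
```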

Goodness of fit tests

The suitability of each probability distribution was assessed using a suite of GOF tests, including the Kolmogorov-Smirnov (KS), Anderson-Darling (AD), Cramér-von Mises (CvM), and Chi-Square tests. These tests evaluate the alignment between the theoretical and empirical distributions, with the KS test focusing on overall distributional fit^15,30, AD and CvM emphasizing tail behavior^15,26,31–33, and Chi-Square examining frequency alignment¹⁹. Additional evaluation was performed using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) to balance model complexity and fit^10,12,22,26, along with the Root Mean Square Error (RMSE) to quantify predictive accuracy¹⁴.
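The three distance-based statistics can be computed directly from the fitted CDF evaluated at the sorted observations. The sketch below uses the standard computing formulas for KS, CvM and AD; the sample values, the normal fit and the function names are hypothetical, and p-values (which depend on the fitted family) are left out.

```python
import math

def ks_statistic(u):
    """KS distance D; u holds fitted-CDF values F(x_(i)) in ascending order."""
    n = len(u)
    return max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))

def cvm_statistic(u):
    """Cramer-von Mises W^2 = 1/(12n) + sum (u_i - (2i-1)/(2n))^2."""
    n = len(u)
    return 1 / (12 * n) + sum((ui - (2 * i + 1) / (2 * n)) ** 2
                              for i, ui in enumerate(u))

def ad_statistic(u):
    """Anderson-Darling A^2; the log terms weight both tails more heavily."""
    n = len(u)
    s = sum((2 * i + 1) * (math.log(u[i]) + math.log(1 - u[n - 1 - i]))
            for i in range(n))
    return -n - s / n

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Hypothetical monthly maximum temperatures (deg C), for illustration only.
sample = sorted([24.1, 25.3, 25.9, 26.2, 26.4, 27.0, 27.8, 28.5])
mu = sum(sample) / len(sample)
sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / len(sample))
u = [normal_cdf(x, mu, sigma) for x in sample]

print("KS :", round(ks_statistic(u), 4))
print("CvM:", round(cvm_statistic(u), 4))
print("AD :", round(ad_statistic(u), 4))
```

Smaller values indicate a closer match between the fitted and empirical distributions, which is why the ranking later in the paper treats lower statistics as better.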

Comprehensive scoring methodology

The literature indicates a lack of suitable GOF tests designed to effectively distinguish between empirical and theoretical distributions³⁴. Numerous studies have shown that the best-fit probability distribution can vary significantly between different regions, even for the same variable³². In response to these challenges, we adopt a comprehensive scoring methodology, as outlined in previous studies^14,17,22,35. This method employs an integrated scoring approach that incorporates multiple GOF tests, information criteria, and graphical analyses to ensure a robust selection of the optimal probability distribution model. Each distribution model is subjected to several GOF tests, with a scoring system applied whereby the best-performing model in each test receives rank 1, the next best rank 2, and so on. To enhance the rigor of the selection process, each model’s rank is determined independently for each GOF test and then summed across all tests to produce a composite score, so the model with the lowest total score is ranked first overall. For graphical assessments, rankings are informed by visual inspection of density plots and quantile-quantile (Q–Q) plots, providing additional insight into the best-fitting model.
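The scoring procedure can be sketched as follows, using the rainfall statistics reported in Table 4 (b) and the two graphical ranks reported in Table 5 (b). The sketch assumes that each numeric criterion is ranked by ascending value (lower is better); under that assumption it reproduces the total scores in Table 5 (b). The variable and function names are illustrative.

```python
# Rainfall metrics transcribed from Table 4 (b); lower is better in every column.
metrics = {
    "Log-normal":  {"KS": 0.0491, "AD": 2.2209, "CvM": 0.3347, "Chi2": 12.34,
                    "AIC": 8731.72, "BIC": 8741.27, "RMSE": 69.72},
    "Exponential": {"KS": 0.2020, "AD": 58.9464, "CvM": 10.6643, "Chi2": 198.86,
                    "AIC": 9039.54, "BIC": 9044.31, "RMSE": 81.22},
    "Gamma":       {"KS": 0.0594, "AD": 3.4029, "CvM": 0.6878, "Chi2": 36.40,
                    "AIC": 8717.89, "BIC": 8727.44, "RMSE": 62.96},
    "Weibull":     {"KS": 0.0765, "AD": 7.6860, "CvM": 1.3937, "Chi2": 70.78,
                    "AIC": 8762.53, "BIC": 8772.08, "RMSE": 63.38},
    "GEV":         {"KS": 0.0315, "AD": 1.1799, "CvM": 0.1891, "Chi2": 17.13,
                    "AIC": 8713.87, "BIC": 8728.20, "RMSE": 58.86},
    "GPD":         {"KS": 0.6694, "AD": 341.7810, "CvM": 68.2021, "Chi2": 273.02,
                    "AIC": 4286.20, "BIC": 4294.42, "RMSE": 77.84},
}

# Visual ranks (density plot, Q-Q plot) copied from Table 5 (b).
graphical = {"Log-normal": (2, 3), "Exponential": (5, 5), "Gamma": (1, 2),
             "Weibull": (3, 4), "GEV": (4, 1), "GPD": (6, 6)}

def rank_by(metric):
    """Rank 1 goes to the distribution with the smallest value of `metric`."""
    ordered = sorted(metrics, key=lambda name: metrics[name][metric])
    return {name: rank for rank, name in enumerate(ordered, start=1)}

totals = {name: sum(graphical[name]) for name in metrics}
for metric in ("KS", "AD", "CvM", "Chi2", "AIC", "BIC", "RMSE"):
    for name, rank in rank_by(metric).items():
        totals[name] += rank

# Lowest composite score first.
overall = sorted(totals, key=totals.get)
for position, name in enumerate(overall, start=1):
    print(position, name, totals[name])
```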

Results and discussion

This section presents the statistical results of the analysis. The modeling assumes that the observations are independent and identically distributed (iid). To verify adherence to this assumption, we tested for stationarity using the Augmented Dickey-Fuller (ADF) test, randomness using the Wald-Wolfowitz runs test, and independence using the Ljung-Box test, applying the same significance level to all tests. The results indicated that the data were stationary and random but exhibited autocorrelation; therefore, the data were aggregated using block analysis.
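Two of these diagnostics are simple enough to sketch from their textbook definitions. The code below implements a Wald-Wolfowitz runs test about the median (normal approximation) and the Ljung-Box Q statistic; the example series is made up, the function names are illustrative, and the ADF test is omitted because it needs a regression and tabulated critical values. Under the null hypothesis, Q is compared with a chi-square distribution whose degrees of freedom equal the number of lags.

```python
import math
from statistics import median

def runs_test(series):
    """Wald-Wolfowitz runs test about the median; returns (z, two-sided p)."""
    m = median(series)
    signs = [x > m for x in series if x != m]   # values equal to the median are dropped
    n1 = sum(signs)                  # observations above the median
    n2 = len(signs) - n1             # observations below the median
    n = n1 + n2
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))

def ljung_box_q(series, lags):
    """Ljung-Box Q = n(n+2) * sum_{k=1..lags} rho_k^2 / (n-k)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, lags + 1):
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

# Made-up persistent series: a long low spell followed by a long high spell.
persistent = [1.0] * 10 + [3.0] * 10
z, p = runs_test(persistent)
print(f"runs test: z={z:.3f}, p={p:.5f}")          # few runs, so z is strongly negative
print(f"Ljung-Box Q (1 lag): {ljung_box_q(persistent, 1):.3f}")
```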

Summary statistics

Table 3 shows the descriptive statistics for the monthly maximum temperature and total monthly rainfall for Kenya.

Table 3.

Summary statistics for the monthly maximum temperature (°C) and total monthly rainfall (mm).

Variable	Maximum temperature	Total rainfall
Observations	876	876
Average	26.23	63.97
Standard deviation (SD)	1.27	42.72
Minimum	23.16	2.46
q25	25.32	35.90
Median	26.23	50.90
q75	27.15	81.88
Maximum	29.97	280.32
Skewness	0.12	1.46
Kurtosis	2.43	5.43


The maximum temperature (Tmax) series of 876 observations has an average of 26.23 °C with low variability (standard deviation = 1.27 °C) and a range from 23.16 °C to 29.97 °C. The interquartile range, 25.32 °C to 27.15 °C, highlights a concentration around the median of 26.23 °C, with a near-symmetrical distribution (skewness = 0.12) and a relatively flat shape (kurtosis = 2.43, below the value of 3 for a normal distribution). The findings resonate with previous studies^1,2, which indicate that while temperature variability at the national level tends to be low due to data aggregation, an increase in temperature has been observed in most regions across the country.

In contrast, total rainfall (Prep) exhibits much higher variability, with a mean of 63.97 mm and a standard deviation of 42.72 mm, ranging from 2.46 mm to 280.32 mm. This wide range reflects the variability and extreme nature of rainfall. The quartiles (q25 = 35.90, q75 = 81.88), together with a median of 50.90 that lies below the mean, indicate a right-skewed distribution (skewness = 1.46), while the high kurtosis (5.43, above the value of 3 for a normal distribution) points to heavy tails, signifying extreme events. The findings also align with the evidence in^1,2.

Choice of candidate distributions

For the temperature data in Fig. 1a, the Cullen and Frey graph shows that the distribution approximates the normal region with a slight platykurtic shape, identifying the normal, uniform, log-normal, Gamma, Weibull, and logistic distributions as potential candidates. Studies, such as¹², have shown that extreme value distributions are suitable for modeling temperature data; therefore, these distributions were also considered potential candidates. In the rainfall data in Fig. 1b, the distribution exhibits positive skewness and high kurtosis, suggesting alignment with distributions such as log-normal, Gamma, Weibull, and exponential. Given the presence of extreme values, models that account for extreme behavior, specifically the GPD and GEV distributions, were also included in the analysis.

Fig. 1 — Cullen and Frey plots for assessing the best-fit distribution of (a) maximum temperature (°C) and (b) total rainfall (mm).
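A Cullen and Frey graph places the sample at the point (squared skewness, kurtosis) and compares it with the points or curves occupied by theoretical families. The sketch below computes these coordinates from population moments and lists a few well-known fixed reference points; the function name is illustrative, and the estimator used by the authors (for example, a bias-corrected one) is not stated in the text.

```python
def moment_shape(values):
    """Return (skewness, kurtosis) from population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Fixed (squared skewness, kurtosis) points of some families on the graph.
reference_points = {
    "normal": (0.0, 3.0),
    "uniform": (0.0, 1.8),
    "logistic": (0.0, 4.2),
    "exponential": (4.0, 9.0),
}

# Coordinates implied by the Table 3 summary statistics.
table3 = {"maximum temperature": (0.12, 2.43), "total rainfall": (1.46, 5.43)}
for name, (skewness, kurtosis) in table3.items():
    print(f"{name}: squared skewness={skewness ** 2:.4f}, kurtosis={kurtosis}")
```

The temperature point (about 0.014, 2.43) sits close to the normal point but slightly below it, which matches the platykurtic description above, while the rainfall point (about 2.13, 5.43) lies in the skewed, heavier-tailed region.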

Model fitting was conducted using MLE for parameter estimation. For extreme value distributions, the Block Maxima (BM) and Peak Over Threshold (POT) approaches were used to obtain the block maxima and threshold exceedances required to fit the GEV and GPD distributions, respectively. The BM approach is widely used in extreme value analysis to capture maximum events within defined time intervals, such as annual maxima, and it is commonly applied to environmental and climate data^30,36,37. For the POT method, which is well suited to modeling excesses over a specified threshold, the Mean Residual Life (MRL) plot was generated as shown in Fig. 2, and visual inspection was used to determine an appropriate threshold for each variable^13,37. The blue curve in Fig. 2 represents the observed mean excess values, the red lines denote the upper and lower confidence limits, and the threshold defines the limit for identifying extreme events³⁸. In Fig. 2a, a threshold in the range of 50 to 150 mm is suitable, as it provides a stable mean excess with narrower confidence intervals. This indicates that values above this threshold exhibit behavior suitable for modeling with a GPD. For temperature, the MRL plot in Fig. 2b did not suggest a clear threshold, so an initial threshold was chosen in the region where the confidence intervals remain relatively narrow, indicating reliable estimates. Beyond approximately 28 °C, the confidence intervals begin to widen slightly, indicating increased uncertainty in the mean excess values at higher thresholds. The GPD parameters were estimated from the observations exceeding the chosen threshold.

Fig. 2 — Mean residual life (MRL) plots for evaluating threshold selection in (a) total rainfall (mm) and (b) maximum temperature (°C).
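The two ways of extracting extremes can be sketched in a few lines. The block-maxima helper keeps the largest value in each complete, non-overlapping block, and the mean-excess helper gives one point of an MRL plot: the average amount by which observations exceed a candidate threshold. The series and function names below are hypothetical, and confidence bands are not computed.

```python
def block_maxima(series, block_size):
    """Maximum of each complete, non-overlapping block of `block_size` values."""
    n_blocks = len(series) // block_size       # a trailing partial block is dropped
    return [max(series[i * block_size:(i + 1) * block_size])
            for i in range(n_blocks)]

def mean_excess(series, threshold):
    """Average of (x - threshold) over observations above the threshold."""
    excesses = [x - threshold for x in series if x > threshold]
    return sum(excesses) / len(excesses) if excesses else float("nan")

# Two hypothetical years of monthly rainfall totals (mm).
rain = [20, 15, 60, 150, 110, 30, 25, 22, 18, 70, 120, 55,
        12, 10, 45, 180, 95, 28, 30, 20, 25, 90, 140, 60]

print("annual maxima     :", block_maxima(rain, 12))
print("semi-annual maxima:", block_maxima(rain, 6))
for u in (50, 100, 150):
    print(f"threshold {u} mm: mean excess = {mean_excess(rain, u):.1f}")
```

In an MRL plot the mean excess is drawn against the threshold, and a threshold is usually picked from the range where the curve is roughly linear and its confidence band is still narrow.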

Graphical assessments and GOF tests results

Graphical assessments

Density and Q–Q plots were generated to compare the observed data with several fitted theoretical distributions. For temperature data, the density plot in Fig. 3 shows that the GEV, Gamma, and log-normal distributions provide the best fit, capturing both the central peak and tail behavior. The normal, Weibull, and logistic distributions also perform reasonably well but exhibit slight deviations in the tails. In contrast, the uniform distribution shows significant discrepancies, particularly in the extremes, suggesting its unsuitability for modeling extreme temperature events. The Q–Q plots in Fig. 4 reveal that most distributions demonstrate deviations in the tails, with the GEV and normal distributions showing the closest adherence to the theoretical quantiles. Among the fitted distributions, the GEV, normal, log-normal, and Gamma distributions provide the best fit in that order, followed by the logistic and Weibull distributions, which exhibit moderate deviations. In contrast, the GPD and uniform distributions exhibit a substantial lack of fit, particularly at the lower and upper tails. This visual approach to identifying the best-fitting distribution is inherently subjective and, therefore, cannot be relied upon solely. To enhance robustness, these results were complemented with findings from other GOF tests to improve the reliability of distribution selection.

Fig. 3 — Density plots of observed and simulated maximum temperature (°C) data to assess the performance of probability distributions.

Fig. 4 — Quantile–Quantile (Q–Q) plots for comparing the fit of eight probability distributions to maximum temperature (°C).

Similarly, for the rainfall data in Fig. 5, the GEV, Gamma, and log-normal distributions show the closest alignment with the observed data, effectively capturing the shape and spread of the distribution. The Weibull distribution provides a moderate fit, performing well in the central range but diverging in the tails. In contrast, the exponential and GPD distributions exhibit substantial deviations, failing to represent the empirical distribution accurately, especially at the extremes. The Q–Q plots in Fig. 6 reinforce these findings, with the GEV and Gamma distributions displaying the best adherence to the theoretical quantile line, followed by the log-normal and Weibull distributions. The exponential and GPD distributions exhibit the weakest performance. These results are consistent with previous studies, such as²¹, which identified the GEV distribution as the most appropriate model for extreme rainfall events.

Fig. 6 — Q–Q plots to compare the fit of six probability distributions for total rainfall (mm) data.

GOF tests

The GOF analysis in Table 4 (a) identifies the GEV distribution as the most suitable model for the maximum temperature data. The GEV distribution achieves the lowest KS (0.0297), AD (0.8890), and CvM (0.1335) statistics, accompanied by the highest p-values (0.4206, 0.4211, and 0.4442), indicating a strong alignment with the observed data. It also produces the lowest Chi-square statistic (3.5969, p = 0.9637) and the lowest AIC (2898.30) and BIC (2912.63) among the distributions fitted to the full sample; the GPD reports lower values, but it was fitted only to observations exceeding the threshold, so its information criteria are not directly comparable. The GEV’s RMSE (1.5694), however, is higher than those of the normal, log-normal, Gamma, logistic, and Weibull distributions. Other distributions, such as the normal, log-normal, and Gamma, provide moderate fits, with non-significant GOF statistics and slightly lower RMSE values but higher AIC and BIC values than the GEV. Conversely, the Weibull, Uniform, Logistic, and GPD distributions exhibit poor performance, with high test statistics, low p-values, and significant deviations from the observed data. The Uniform and GPD distributions show extreme misalignment, as evidenced by infinite AD statistics, high Chi-square values, and the highest RMSE scores, confirming their unsuitability for modeling maximum temperature data.

Table 4.

Goodness of fit test results for temperature and rainfall distributions.

Distribution	KS tests		AD tests		CvM tests		Chi square test		Information criterion
Distribution	Statistic	p value	Statistic	p value	Statistic	p value	Statistic	p value	AIC	BIC	RMSE
(a) Temperature distributions
Normal	0.0401	0.1202	1.4212	0.1965	0.2031	0.2617	16.3101	0.0911	2910.44	2919.99	1.3321
LogNormal	0.0342	0.2559	1.2911	0.2353	0.1898	0.2882	16.5991	0.0837	2907.32	2916.87	1.3335
Gamma	0.0362	0.2006	1.2962	0.2336	0.1883	0.2913	16.2235	0.0934	2907.81	2917.36	1.3324
Weibull	0.0626	0.0021	7.7696	0.0001	1.0687	0.0017	51.381	0.0000	3012.31	3021.86	1.4798
Uniform	0.0457	0.0517	Inf	0.0000	12.1632	0.0000	389.4772	0.0000	2954.14	2963.69	1.819
Logistic	0.1993	0.0000	2.745	0.037	0.3695	0.0871	36.0075	0.0001	3365.02	3374.57	1.3876
GEV	0.0297	0.4206	0.889	0.4211	0.1335	0.4442	3.5969	0.9637	2898.30	2912.63	1.5694
GPD	1.000	0.000	Inf	0.000	237.66	0.000	Inf	0.0000	1965.84	1974.98	25.0933
(b) Rainfall distributions
LogNormal	0.0491	0.0292	2.2209	0.0697	0.3347	0.1083	12.34	0.2628	8731.72	8741.27	69.72
Exponential	0.2020	0.0000	58.9464	0.0000	10.6643	0.0000	198.86	0.0001	9039.54	9044.31	81.22
Gamma	0.0594	0.0041	3.4029	0.0172	0.6878	0.0136	36.40	0.0000	8717.89	8727.44	62.96
Weibull	0.0765	0.0001	7.6860	0.0002	1.3937	0.0003	70.78	0.0000	8762.53	8772.08	63.38
GEV	0.0315	0.3487	1.1799	0.2753	0.1891	0.2897	17.13	0.0716	8713.87	8728.20	58.86
GPD	0.6694	0.0000	341.7810	0.0000	68.2021	0.0000	273.02	0.0000	4286.20	4294.42	77.84


For the rainfall data in Table 4 (b), the GEV distribution also emerges as the most robust model, as reflected in the highest p-values for the KS (0.3487), AD (0.2753), and CvM (0.2897) tests, indicating minimal deviation from the observed data. Furthermore, among the distributions fitted to the full sample, the GEV achieves the lowest AIC (8713.87) and the second-lowest BIC (8728.20, after the Gamma distribution’s 8727.44), highlighting its parsimony and suitability for modeling rainfall patterns. Its superior predictive accuracy is evident from the lowest RMSE value (58.86), reinforcing its reliability. For the Chi-square test, the log-normal distribution had the lowest statistic (12.34), indicating a better fit on that criterion. Yuan et al.¹⁷ reported a similar finding when they used Chi-square tests to evaluate the best fit for the frequency analysis of annual maximum hourly precipitation. In contrast, the GPD and exponential distributions perform poorly, with p-values close to zero, high Chi-square statistics, and elevated RMSE values, indicating substantial deviation and limited applicability for modeling rainfall data.
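A quick consistency check on the information criteria follows from their definitions, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, which give BIC − AIC = k(ln n − 2) for a model with k parameters fitted to n observations. The sketch below applies this to values from Table 4. It assumes k = 3 for the GEV and k = 2 for the normal and for a GPD with a fixed threshold; that last parameter count is an assumption, not something stated in the text.

```python
import math

def bic_minus_aic(k, n):
    """Gap implied by AIC = 2k - 2lnL and BIC = k*ln(n) - 2lnL."""
    return k * (math.log(n) - 2)

def implied_sample_size(gap, k):
    """Invert the gap formula to recover the n behind a reported AIC/BIC pair."""
    return math.exp(gap / k + 2)

n_full = 876  # monthly observations, 1950-2022

# Reported BIC - AIC gaps from Table 4 versus the theoretical gap for n = 876.
print("GEV temperature :", round(2912.63 - 2898.30, 2), "vs", round(bic_minus_aic(3, n_full), 2))
print("Normal          :", round(2919.99 - 2910.44, 2), "vs", round(bic_minus_aic(2, n_full), 2))
print("GEV rainfall    :", round(8728.20 - 8713.87, 2), "vs", round(bic_minus_aic(3, n_full), 2))

# The GPD gaps are smaller, which points to fits on fewer observations.
print("GPD temperature n ~", round(implied_sample_size(1974.98 - 1965.84, 2)))
print("GPD rainfall n    ~", round(implied_sample_size(4294.42 - 4286.20, 2)))
```

Under these assumptions the full-sample fits agree with n = 876, while the GPD values imply roughly 710 and 450 observations. This is consistent with the GPD being fitted to threshold exceedances only, and it is why its AIC and BIC cannot be compared directly with those of the other distributions.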

A comprehensive scoring method was used to further evaluate the best-fitting distributions, with findings presented in Table 5. The analysis for temperature distributions in Table 5 (a) revealed that the GEV consistently outperformed the others, as also observed in³⁹, achieving the top overall rank with the lowest total score (17). This was supported by its first-place ranks in key tests, including the KS, AD, CvM, and Chi-square tests. The Gamma and log-normal distributions ranked second and third, respectively, demonstrating moderate fits across multiple metrics. However, distributions such as the Weibull, Uniform, Logistic, and GPD performed poorly, accumulating much higher total scores (54 to 60) and displaying suboptimal results in the density and Q–Q plots.

Table 5.

Goodness of fit rankings for temperature and rainfall distributions.

Distribution	KS test	AD test	CVM test	CHI test	AIC	BIC	RMSE	Density plot	QQ plot	Total score	Overall rank
(a) Temperature distributions
Normal	4	4	4	3	5	5	1	4	2	32	4
Log-normal	2	2	3	4	3	3	3	3	3	26	3
Gamma	3	3	2	2	4	4	2	1	4	25	2
Weibull	6	6	6	6	7	7	5	5	6	54	5
Uniform	5	8	7	7	6	6	7	6	8	60	8
Logistic	7	5	5	5	8	8	4	8	5	55	7
GEV	1	1	1	1	2	2	6	2	1	17	1
GPD	8	7	8	8	1	1	8	7	7	55	6
(b) Rainfall distributions
Log-normal	2	2	2	1	4	4	4	2	3	24	3
Exponential	5	5	5	5	6	6	6	5	5	48	6
Gamma	3	3	3	3	3	2	2	1	2	22	2
Weibull	4	4	4	4	5	5	3	3	4	36	4
GEV	1	1	1	2	2	3	1	4	1	16	1
GPD	6	6	6	6	1	1	5	6	6	43	5


For rainfall distributions, the ranking analysis in Table 5 (b) confirms that the GEV distribution is again the top performer, ranking first with the lowest total score of 16. These findings are supported by Agbonaye and Izinyon¹⁴, Al Mamoon and Rahman⁶, Alam et al.¹⁸, Coronado-Hernández et al.³⁶, Fadhilah et al.²¹, Ghosh et al.⁴⁰, Ng et al.³⁵ and Yuan et al.¹⁷. Its strength was evident across most GOF tests, where it outperformed or closely matched the best-performing distribution in each category. The Gamma distribution ranked second (total score 22), showing a strong overall fit with balanced performance across metrics. Log-normal followed in third place (24), excelling in certain tests, such as the Chi-square test, but lagging in others, such as AIC, BIC, and RMSE. In contrast, the Weibull distribution demonstrated a weaker fit (36), while the GPD (43) and exponential (48) distributions received the worst composite scores. The GPD ranked last on every GOF test and graphical assessment; its first-place AIC and BIC ranks reflect a fit to threshold exceedances only.

Sensitivity analysis

To evaluate the robustness of the GEV distribution’s fit to the temperature and rainfall data, a sensitivity analysis was performed using various block sizes designed to capture diverse temporal patterns and extremes. Block size refers to the length of each group in a series of independent groups of observations³⁸. According to Coles³⁸, block sizes are often selected to capture a specific period. In this work, the block sizes included annual, seasonal, monthly, 5-year, 10-year, 12-month moving averages, 6-month intervals, and 4-month intervals. Annual blocks, where maximum values were extracted per year, followed the methodologies outlined in^38,41. Seasonal blocks were based on quarterly aggregations, as indicated by⁴² and⁴¹. Monthly blocks were used to capture monthly maxima, as discussed in⁴³ and⁴². For longer-term patterns, multi-year blocks of 5-year and 10-year intervals were established, consistent with approaches adopted in studies such as⁴⁴. A 12-month moving average window assessed rolling maxima, highlighting shifts in trends. Event-based blocks focused on the most extreme events by isolating total rainfall above the 95th percentile, following the techniques used in⁴⁵. For intermediate seasonality, semi-annual blocks divided each year into January–June and July–December intervals, consistent with approaches used by^42,43,46. Furthermore, a regional seasonal classification for Kenya was used to account for local climatic variations, with blocks corresponding to the “Hot and Dry”, “Long Rainy”, “Cool”, and “Short Rainy” seasons, building on the framework proposed by⁴⁷. For each block length, maximum values were extracted and the GEV parameters were estimated; the estimates are presented in Table 6.

Table 6.

ML estimates and significance of location, scale, and shape parameter for temperature and rainfall distribution.

Model	Parameter	Maximum temperature			Total rainfall
Model	Parameter	Estimate	Std Error	p value	Estimate	Std Error	p value
Annual	Location	27.818	0.1015		128.1784	4.8262
Annual	Scale	0.805	0.0690		35.9544	3.5314
Annual	Shape	−0.325	0.0537		−0.0563	0.0974	0.5633
Quarterly	Location	26.437	0.0739		68.5266	2.4114
Quarterly	Scale	1.129	0.0534		34.8286	1.8871
Quarterly	Shape	−0.250	0.0424		0.1231	0.0593
Monthly	Location	25.764	0.0462		42.9879	1.0338
Monthly	Scale	1.236	0.0330		26.7123	0.8179
Monthly	Shape	−0.248	0.0221		0.1837	0.0295
5-Year	Location	28.404	0.1846		182.8691	10.2237
5-Year	Scale	0.622	0.1298		33.6435	7.3215
5-Year	Shape	−0.255	0.1806	0.1580	−0.1164	0.2184	0.5942
10-Year	Location	28.649	0.2487		217.9374	14.7674
10-Year	Scale	0.538	0.1914		34.3022	11.3699
10-Year	Shape	−0.143	0.4396	0.7458	−0.4127	0.3434	0.2295
12-Month moving average	Location	25.780	0.0460		42.8250	1.0363
12-Month moving average	Scale	1.220	0.0330		26.6061	0.8195
12-Month moving average	Shape	−0.242	0.0230		0.1827	0.0297
Event-based	Location	28.641	0.0382		170.5155	2.5342
Event-based	Scale	0.204	0.0316		13.9905	2.2711
Event-based	Shape	0.245	0.1757	0.1627	0.3974	0.1753
Semi-annual	Location	26.976	0.0927		100.9257	3.5479
Semi-annual	Scale	1.009	0.0669		36.9574	2.6486
Semi-annual	Shape	−0.267	0.0564		0.0049	0.0749	0.9481
Seasons	Location	26.487	0.0825		62.9546	2.1866
Seasons	Scale	1.308	0.0595		32.2959	1.7431
Seasons	Shape	−0.350	0.0293		0.1850	0.0543

Significant values are in bold.
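The block construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `block_maxima` and `rolling_maxima` are hypothetical, the input series is synthetic, and the rule of dropping a trailing incomplete block is an assumption. That assumption is chosen because, for 876 monthly observations, it reproduces the block counts n reported in Table 7.

```python
from typing import List, Sequence


def block_maxima(values: Sequence[float], block_len: int) -> List[float]:
    """Maxima of consecutive, non-overlapping blocks of `block_len` observations.

    Assumption: a trailing block shorter than `block_len` is dropped.
    """
    n_blocks = len(values) // block_len
    return [max(values[i * block_len:(i + 1) * block_len]) for i in range(n_blocks)]


def rolling_maxima(values: Sequence[float], window: int) -> List[float]:
    """Maximum over every full sliding window of `window` observations."""
    return [max(values[i:i + window]) for i in range(len(values) - window + 1)]


# Synthetic stand-in for 73 years of monthly observations (not the study's data).
monthly = [float((7 * i) % 101) for i in range(73 * 12)]

# Number of maxima produced by each block definition (block length in months).
block_counts = {
    "Monthly": len(block_maxima(monthly, 1)),
    "Quarterly": len(block_maxima(monthly, 3)),
    "Semi-annual": len(block_maxima(monthly, 6)),
    "Annual": len(block_maxima(monthly, 12)),
    "5-Year": len(block_maxima(monthly, 60)),
    "10-Year": len(block_maxima(monthly, 120)),
    "12-Month moving": len(rolling_maxima(monthly, 12)),
}
print(block_counts)
```

The event-based and regional-season blocks are not covered here because they depend on a percentile threshold and a calendar mapping that this sketch does not attempt to reproduce.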

For both rainfall and temperature data, the parameter estimates differ notably between block sizes, particularly in the shape parameter, which defines tail behavior. For rainfall, the annual, 5-year, and 10-year blocks exhibited non-significant negative shape parameters, pointing to a Weibull class of distribution as reported in³⁰ but with uncertain tail estimates for these broader temporal aggregations; the semi-annual shape estimate (0.0049) was close to zero and also not significant. In contrast, the monthly, quarterly, event-based, and seasonal blocks yielded significant positive shape parameters, reflecting the heavy-tailed Fréchet class of distributions with well-defined extremal patterns. This is in agreement with Moccia et al.³³, although the findings of Onwuegbuche et al.⁴⁸ and Singirankabo et al.³⁷ revealed that Gumbel is the optimal distribution. The location and scale parameters were consistently significant across all block sizes, indicating reliable estimation of central tendency and variability. The event-based block for rainfall, with the highest shape estimate (0.3974), suggested a heavier tail and a higher propensity for extreme rainfall events compared to other blocks. For temperature data, location and scale parameters were also consistently significant across all blocks, confirming stable estimates of central tendency and variability. However, the shape parameter was not significant for the 5-year, 10-year, and event-based models, indicating uncertainty in tail estimates, which is likely due to the limited number of data points or the irregular occurrence of extreme events. In contrast, the quarterly, monthly, and seasonal models produced significant shape parameters, suggesting that they provide more robust and reliable tail estimates for predicting rare and extreme values in both temperature and rainfall.
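The significance statements above can be checked from the estimates and standard errors in Table 6. The sketch below assumes a two-sided Wald z-test (estimate divided by standard error, compared with a standard normal); the source does not name the test, but this choice reproduces the tabulated p values to about three decimals. The 0.05 threshold and the function names `wald_p_value` and `gev_family` are illustrative assumptions.

```python
import math


def wald_p_value(estimate: float, std_error: float) -> float:
    """Two-sided p value for H0: parameter = 0, using a normal approximation."""
    z = abs(estimate / std_error)
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


def gev_family(shape: float, p_value: float, alpha: float = 0.05) -> str:
    """Label the GEV sub-family suggested by the shape estimate.

    Assumption: alpha = 0.05; a non-significant shape is reported as such
    rather than assigned to the Weibull or Frechet class.
    """
    if p_value >= alpha:
        return "shape not distinguishable from 0 (Gumbel not rejected)"
    return "Frechet (heavy tail)" if shape > 0 else "Weibull (bounded upper tail)"


# Shape estimates and standard errors copied from Table 6.
rows = {
    "5-Year temperature": (-0.255, 0.1806),
    "Annual rainfall": (-0.0563, 0.0974),
    "Monthly rainfall": (0.1837, 0.0295),
}
for name, (xi, se) in rows.items():
    p = wald_p_value(xi, se)
    print(f"{name}: p = {p:.4f} -> {gev_family(xi, p)}")
```

Labeling a non-significant shape as "not distinguishable from 0" is a design choice of this sketch; the text above instead reads the sign of the point estimate.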

The model diagnostic tests in Table 7 reveal that the 10-year and 5-year blocks provide the best fit for both rainfall and temperature data, achieving among the lowest AIC and BIC values (e.g., AIC = 74.406 and 146.985, respectively, for rainfall; for temperature, only the event-based model has a lower AIC, at 16.764), indicating strong model parsimony and minimal information loss. These longer blocks effectively capture long-term extreme trends but rely on fewer data points (n = 7 and 14), which increases uncertainty in parameter estimates due to larger variances, as demonstrated by⁴⁶. This finding aligns with studies by^38,41, which emphasize the effectiveness of larger blocks in capturing long-term climatic trends by averaging out short-term fluctuations, thereby focusing on extreme patterns. Event-based and annual blocks also perform well for rainfall, with low AIC and BIC values, reflecting their stability in representing extreme events with adequate data, as supported by⁴². In contrast, higher-frequency blocks, such as monthly and 12-month moving average models, exhibit much higher AIC and BIC values for both rainfall and temperature, suggesting potential overfitting and inefficiency in capturing extreme patterns, a limitation also noted by⁴³. Mid-range blocks, including quarterly, semi-annual, and seasonal, achieve moderate AIC and BIC values for both datasets, offering a balanced approach that captures seasonal variability while maintaining sufficient stability for reliable parameter estimation. This perspective is supported by studies such as^15,42,46, which highlight the value of intermediate temporal scales in balancing the trade-offs between long-term trend analysis and sufficient data representation.

Table 7.

Model performance metrics for maximum temperature (°C) and total rainfall (mm) across different blocks.

Model	Block	Temperature: LogLikelihood	Temperature: AIC	Temperature: BIC	Temperature: n	Rainfall: LogLikelihood	Rainfall: AIC	Rainfall: BIC	Rainfall: n
1	Annual	−84.807	175.613	182.485	73	−374.640	755.281	762.152	73
2	Quarterly	−456.207	918.413	929.444	292	−1519.203	3044.407	3055.437	292
3	Monthly	−1446.151	2898.303	2912.629	876	−4353.937	8713.874	8728.200	876
4	5-Year	−13.415	32.830	34.747	14	−70.493	146.985	148.902	14
5	10-Year	−6.223	18.446	18.284	7	−34.203	74.406	74.244	7
6	12-Month moving average	−1420.455	2846.911	2861.199	865	−4295.304	8596.607	8610.895	865
7	Event-based	−5.382	16.764	21.977	42	−195.366	396.732	402.085	44
8	Semi-annual	−210.050	426.100	435.051	146	−758.480	1522.961	1531.911	146
9	Seasons	−480.508	967.016	978.046	292	−1507.363	3020.726	3031.756	292
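The AIC and BIC columns of Table 7 follow directly from the log-likelihood and the number of block maxima n. The sketch below uses the standard definitions AIC = 2k − 2ℓ and BIC = k ln(n) − 2ℓ with k = 3 GEV parameters (location, scale, shape); the function names are illustrative, and the input values are copied from Table 7.

```python
import math


def aic(log_lik: float, k: int = 3) -> float:
    """Akaike information criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, n: int, k: int = 3) -> float:
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_lik


# (log-likelihood, n) for maximum temperature, copied from Table 7.
temperature_fits = {
    "Annual": (-84.807, 73),
    "5-Year": (-13.415, 14),
    "10-Year": (-6.223, 7),
}
for block, (log_lik, n) in temperature_fits.items():
    print(f"{block}: AIC = {aic(log_lik):.3f}, BIC = {bic(log_lik, n):.3f}")
```

Because the log-likelihood is a sum over the n block maxima, its magnitude grows with n, which is worth keeping in mind when reading the AIC and BIC values alongside the n columns.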

In addition, we computed the return levels for different return periods to determine how the various models estimate the extremes. The return level represents the magnitude of an event expected to be equaled or exceeded, on average, once within a specified return period^38,48. The findings in Fig. 7 for temperature and rainfall data reveal distinct patterns across models when estimating extremes at various return periods. For temperature in Fig. 7a, the 10-year and 5-year models consistently produce the highest return levels, maintaining stability across increasing return periods as observed in⁴⁸, indicating their robustness in estimating extreme values over longer intervals. In contrast, models with finer resolutions, such as monthly and 12-month moving averages, yield lower return levels with modest increases as the return period grows, suggesting a limited capacity to capture rare extremes. The quarterly and semi-annual models show moderate return levels, providing a balanced estimation that captures both seasonal variability and long-term trends. For rainfall in Fig. 7b, a similar pattern emerges, with the 10-year, 5-year, and seasonal models achieving the highest and most stable return levels, while finer models like monthly and 12-month moving averages display lower return levels and less pronounced growth across return periods. The event-based model exhibits high initial return levels but plateaus at longer return periods, indicating potential limitations in capturing prolonged extremes. Overall, the 10-year, 5-year, and seasonal models appear to be the most consistent for temperature and rainfall extremes.
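For a fitted GEV, the return level has a closed form. The sketch below uses the standard quantile expression from Coles³⁸, z = μ − (σ/ξ)[1 − y^(−ξ)] with y = −ln(1 − 1/T), and the Gumbel limit z = μ − σ ln(y) when ξ = 0. It assumes the return period T is counted in blocks (years for the annual model); the source does not state how return periods were aligned across block sizes, and the function name is illustrative.

```python
import math


def gev_return_level(mu: float, sigma: float, xi: float, return_period: float) -> float:
    """Level exceeded on average once every `return_period` blocks under a GEV fit."""
    if return_period <= 1:
        raise ValueError("return_period must be greater than 1 block")
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(xi) < 1e-9:
        # Gumbel limit of the GEV quantile function.
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


# Annual-block GEV estimates for maximum temperature, copied from Table 6.
mu, sigma, xi = 27.818, 0.805, -0.325

levels = {T: gev_return_level(mu, sigma, xi, T) for T in (2, 10, 50, 100)}
for T, z in levels.items():
    print(f"{T}-year return level: {z:.2f}")

# With a negative shape the distribution has a finite upper endpoint,
# so return levels flatten as T grows.
upper_endpoint = mu - sigma / xi
print(f"upper endpoint: {upper_endpoint:.2f}")
```

The flattening toward the upper endpoint is one way to read the stable return-level curves of models with negative shape estimates.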

Fig. 7. Return level plots for different block sizes for (a) maximum temperature (°C) and (b) total rainfall (mm).

Finally, we used density plots to check how each model captures the distribution of maximum temperature and total rainfall. In the temperature plot in Fig. 8a, the 10-year, 5-year, and event-based models display the most concentrated curves, suggesting a narrower range with more pronounced extremes. Models with higher temporal resolutions, like monthly and 12-month moving averages, exhibit wider density curves, indicating a broader distribution that captures more frequent fluctuations but is less focused on extremes. The quarterly and semi-annual models fall between these two groups, striking a balance between stability and variability. For rainfall data in Fig. 8b, a similar pattern emerges: the 10-year and 5-year models show steeper, more concentrated curves, indicating that they effectively capture rare, high-magnitude events. In contrast, finer-resolution models, such as monthly and 12-month moving averages, have flatter curves, capturing a wider range of data with less emphasis on extremes.
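The concentration of the fitted curves can be related to the scale estimates in Table 6. The sketch below evaluates the standard GEV density for two temperature fits and compares their peak heights on a grid; it is an illustration under the assumption that the density curves in Fig. 8 are fitted GEV densities, which the source does not state explicitly, and `gev_pdf` is an illustrative name.

```python
import math


def gev_pdf(x: float, mu: float, sigma: float, xi: float) -> float:
    """GEV density with location mu, scale sigma, and shape xi."""
    s = (x - mu) / sigma
    if abs(xi) < 1e-9:
        t = math.exp(-s)  # Gumbel limit
    else:
        arg = 1.0 + xi * s
        if arg <= 0.0:
            return 0.0  # outside the support of the distribution
        t = arg ** (-1.0 / xi)
    return (t ** (xi + 1.0)) * math.exp(-t) / sigma


# Temperature estimates (location, scale, shape) copied from Table 6.
fits = {
    "10-Year": (28.649, 0.538, -0.143),
    "Monthly": (25.764, 1.236, -0.248),
}

step = 0.01
grid = [15.0 + step * i for i in range(2001)]  # 15 to 35 degrees C

peak_10yr = max(gev_pdf(x, *fits["10-Year"]) for x in grid)
peak_monthly = max(gev_pdf(x, *fits["Monthly"]) for x in grid)
# Riemann-sum check that the monthly density integrates to about 1 on the grid.
area_monthly = sum(gev_pdf(x, *fits["Monthly"]) for x in grid) * step

print(f"peak density, 10-Year: {peak_10yr:.3f}")
print(f"peak density, Monthly: {peak_monthly:.3f}")
```

A smaller scale estimate gives a taller, narrower curve, which matches the more concentrated densities described for the 10-year model relative to the monthly model.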

Fig. 8. Density plots for different block sizes for (a) maximum temperature (°C) and (b) total rainfall (mm).

Conclusion

In this study, we have assessed various probability distributions for modeling maximum temperature and total rainfall data using a systematic and comprehensive approach that combines several GOF tests and graphical tools. In addition, we have identified the optimal block size for the GEV distribution using return levels across different periods, as well as log-likelihood, AIC, and BIC. Insights from GOF tests highlighted that the GEV, Gamma, and log-normal distributions were well-suited for both maximum temperature and total rainfall datasets, as they consistently aligned with empirical data. On the other hand, distributions such as uniform, Weibull, and logistic showed a poor fit across multiple metrics, underscoring their limitations in capturing the complexities of climatic variables. The GEV distribution emerged as the optimal model for rainfall and temperature data, consistently outperforming others in key metrics such as the AIC, BIC, and RMSE. It also demonstrated superior performance in GOF tests, including the KS, AD, and CVM tests. This strong performance affirms the robustness of the GEV distribution in modeling climatic extremes and its capacity to provide reliable insights into long-term trends.

Block size analysis revealed the effectiveness of longer temporal aggregations, such as 10-year and 5-year blocks, which produced stable and high return levels across return periods, effectively capturing long-term extreme trends. However, these longer blocks increased uncertainty in parameter estimates due to fewer data points. In contrast, intermediate blocks, such as quarterly and seasonal, struck a balance by capturing seasonal variations while maintaining stability and reliable parameter estimates with moderate AIC and BIC values. High-frequency blocks, such as monthly and 12-month moving averages, although rich in data, exhibited higher AIC and BIC values, suggesting potential overfitting and inefficiency in representing extreme values.

The results of this study are important for Kenya and the wider East African region, where the adopted methodology can also be applied. The combined GOF approach supports better forecasting of temperature and rainfall, which is crucial for risk assessment and the development of climate adaptation strategies. With this knowledge, predictions of and preparations for catastrophic events, such as floods, droughts, or rising temperatures, can be improved. With better forecasts, policymakers and the government can improve infrastructure for water catchment systems and strengthen agricultural activities through proper planning and disaster preparedness.

However, a key limitation of this study is its focus on individual probability distributions for temperature and rainfall without explicitly addressing the interdependence between these variables. Since temperature and rainfall are inherently related, accurate risk assessments and effective climate adaptation strategies require consideration of their associations. Extensive research has been conducted on the dependence between temperature and rainfall; therefore, future studies should prioritize exploring dependence structures within a multivariate framework using the fitted probability distributions identified in this study. Advanced approaches such as copula models or joint distribution analyses could provide deeper insights into the interactions between these variables, particularly under extreme climatic conditions. Such efforts would significantly enhance the reliability of climate models and their applicability to integrated risk assessment frameworks.

To build on this work, future research should focus on applying this methodology at finer spatial scales using real datasets from various regions in Kenya. Conducting probability distribution analyses at regional levels, incorporating block size analysis, and integrating data from multiple weather stations could yield region-specific insights into seasonal rainfall patterns, further informing targeted climate adaptation strategies. From a policy perspective, the results underscore the need for data-driven strategies that take into account both individual and joint variability of climatic variables. Policymakers should leverage these insights to design robust adaptation measures, such as enhancing agricultural planning, improving water resource management, and enhancing infrastructure resilience tailored to Kenya’s specific climate challenges.

Acknowledgements

The authors acknowledge with gratitude the support from Strathmore Institute of Mathematical Sciences, Strathmore University and the DAAD [ST32 - PKZ: 91789473] in the production of this manuscript.

Author contributions

K.O., B.O. and L.C. conceived the project. K.O. performed the analysis and drafted the manuscript with substantial contributions from B.O., L.C., and C.O. All authors have read and approved the final version of the manuscript.

Data availability

The data that support the findings of this study are accessible to registered users (free registration) on the World Bank, Climate Change Knowledge Portal (https://climateknowledgeportal.worldbank.org/).

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. GOK. Kenya Climate Smart Agriculture Strategy, 2017–2026 (Ministry of Agriculture, Livestock and Fisheries, 2017).
2. Jalango, D. et al. Climate smart agriculture investment plan for Kenya. In Accelerating Impacts of CGIAR Climate Research for Africa (AICCRA) (2022).
3. Nyika, J. M. Climate change situation in Kenya and measures towards adaptive management in the water sector. In Research Anthology on Environmental and Societal Impacts of Climate Change, 1857–1872 (IGI Global, 2022).
4. Ngure, M. W., Wandiga, S. O., Olago, D. O. & Oriaso, S. O. Climate change stressors affecting household food security among Kimandi–Wanyaga smallholder farmers in Murang’a County, Kenya. Open Agric. 6, 587–608 (2021).
5. Mkonda, M. Y. & He, X. Are rainfall and temperature really changing? Farmer’s perceptions, meteorological data, and policy implications in the Tanzanian semi-arid zone. Sustainability 9, 1412 (2017).
6. Al Mamoon, A. & Rahman, A. Selection of the best fit probability distribution in rainfall frequency analysis for Qatar. Nat. Hazards 86, 281–296 (2017).
7. Sharma, M. A. & Singh, J. B. Use of probability distribution in rainfall analysis. N. Y. Sci. J. 3, 40–49 (2010).
8. Dzupire, N. C., Ngare, P. & Odongo, L. A copula based bi-variate model for temperature and rainfall processes. Sci. Afr. 8, e00365 (2020).
9. Athulya, P. & James, K. Best fit probability distributions for monthly radiosonde weather data. Int. J. Adv. Manag. Technol. Eng. Sci. 7, 24–31 (2017).
10. Ozonur, D., Pobocikova, I. & de Souza, A. Statistical analysis of monthly rainfall in central west Brazil using probability distributions. Model. Earth Syst. Environ. 7, 1979–1989 (2021).
11. Ximenes, P. S. M. P., Silva, A. S. A., Ashkar, F. & Stosic, T. Best-fit probability distribution models for monthly rainfall of northeastern Brazil. Water Sci. Technol. 84, 1541–1556 (2021).
12. Hussain, B. et al. Interdependence between temperature and precipitation: Modeling using copula method toward climate protection. Model. Earth Syst. Environ. 8, 2753–2766 (2022).
13. Singirankabo, E. & Iyamuremye, E. Modelling extreme rainfall events in Kigali city using generalized Pareto distribution. Meteorol. Appl. 29, e2076 (2022).
14. Agbonaye, A. & Izinyon, O. Best-fit probability distribution model for rainfall frequency analysis of three cities in south eastern Nigeria. Niger. J. Environ. Sci. Technol. (NIJEST) 1, 34–42 (2017).
15. Douka, M. & Karacostas, T. Statistical analyses of extreme rainfall events in Thessaloniki, Greece. Atmos. Res. 208, 60–77 (2018).
16. Oseni, B. A. & Ayoola, F. J. Fitting the statistical distribution for daily rainfall in Ibadan, based on chi-square and Kolmogorov–Smirnov goodness-of-fit tests. West Afr. J. Ind. Acad. Res. 7, 93–100 (2013).
17. Yuan, J., Emura, K., Farnham, C. & Alam, M. A. Frequency analysis of annual maximum hourly precipitation and determination of best fit probability distribution for regions in Japan. Urban Clim. 24, 276–286 (2018).
18. Alam, M. A., Farnham, C. & Emura, K. Best-fit probability models for maximum monthly rainfall in Bangladesh using Gaussian mixture distributions. Geosciences 8, 138 (2018).
19. Houessou-Dossou, E. A. Y., Mwangi Gathenya, J., Njuguna, M. & Abiero Gariy, Z. Flood frequency analysis using participatory GIS and rainfall data for two stations in Narok town, Kenya. Hydrology 6, 90 (2019).
20. Coronado-Hernández, Ó. E., Merlano-Sabalza, E., Díaz-Vergara, Z. & Coronado-Hernández, J. R. Selection of hydrological probability distributions for extreme rainfall events in the regions of Colombia. Water 12, 1397 (2020).
21. Fadhilah, Y. et al. Fitting the best-fit distribution for the hourly rainfall amount in the Wilayah Persekutuan. Jurnal Teknologi 46, 49–58 (2007).
22. Hasan, R. H. R. Estimating the best-fitted probability distribution for monthly maximum temperature at the Sylhet station in Bangladesh. J. Math. Stat. Stud. 2, 60–67 (2021).
23. Hossain, M. Fitting the probability distribution of monthly maximum temperature of some selected stations from the northern part of Bangladesh. Int. J. Ecol. Econ. Stat. 39, 80–91 (2018).
24. WorldBank. Climate change knowledge portal (2024). Accessed 16 Sept 2023.
25. Cullen, A. C. & Frey, H. C. Probabilistic Techniques in Exposure Assessment (1999).
26. Haddad, K. & Rahman, A. Selection of the best fit flood frequency distribution and parameter estimation procedure: A case study for Tasmania in Australia. Stoch. Environ. Res. Risk Assess. 25, 415–428 (2011).
27. Fisher, R. A. On the mathematical foundations of theoretical statistics. In Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, vol. 222, 309–368 (1922).
28. Zong, Z. Information-Theoretic Methods for Estimating of Complicated Probability Distributions Vol. 207 (Elsevier, 2006).
29. Naghettini, M. Fundamentals of Statistical Hydrology (Springer, 2017).
30. Chikobvu, D. & Chifurira, R. Modelling of extreme minimum rainfall using generalised extreme value distribution for Zimbabwe. S. Afr. J. Sci. 111, 01–08 (2015).
31. Sukrutha, A., Dyuthi, S. R. & Desai, S. Probability distribution for monthly precipitation data in India. arXiv preprint arXiv:1708.03144 (2017).
32. Lima, A. O. et al. Extreme rainfall events over Rio de Janeiro state, Brazil: Characterization using probability distribution functions and clustering analysis. Atmos. Res. 247, 105221 (2021).
33. Moccia, B., Mineo, C., Ridolfi, E., Russo, F. & Napolitano, F. Probability distributions of daily rainfall extremes in Lazio and Sicily, Italy, and design rainfall inferences. J. Hydrol. Reg. Stud. 33, 100771 (2021).
34. Razali, N. M. et al. Power comparisons of Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors and Anderson–Darling tests. J. Stat. Model. Anal. 2, 21–33 (2011).
35. Ng, J. et al. Investigation of the best fit probability distribution for annual maximum rainfall in Kelantan river basin. In IOP Conference Series: Earth and Environmental Science, vol. 476, 012118 (IOP Publishing, 2020).
36. Coronado-Hernández, Ó. E., Merlano-Sabalza, E., Díaz-Vergara, Z. & Coronado-Hernández, J. R. Selection of hydrological probability distributions for extreme rainfall events in the regions of Colombia. Water 12, 1397 (2020).
37. Singirankabo, E., Iyamuremye, E., Habineza, A. & Nelson, Y. Statistical modelling of maximum temperature in Rwanda using extreme value analysis. Open J. Math. Sci. 7, 180–195 (2023).
38. Coles, S. Basics of statistical modeling. In An Introduction to Statistical Modeling of Extreme Values 18–44 (2001).
39. Ng, J. et al. Statistical modelling of extreme temperature in peninsular Malaysia. In IOP Conference Series: Earth and Environmental Science, vol. 1022, 012072 (IOP Publishing, 2022).
40. Ghosh, S., Roy, M. K. & Biswas, S. C. Determination of the best fit probability distribution for monthly rainfall data in Bangladesh. Am. J. Math. Stat. 6, 170–174 (2016).
41. Villarini, G., Smith, J. A., Serinaldi, F. & Ntelekos, A. A. Analyses of seasonal and annual maximum daily discharge records for central Europe. J. Hydrol. 399, 299–312 (2011).
42. Hasan, H., Radi, N. A. & Kassim, S. Modeling of extreme temperature using generalized extreme value (GEV) distribution: A case study of Penang. Proc. World Congr. Eng. 1, 181–186 (2012).
43. Ender, M. & Ma, T. Extreme value modeling of precipitation in case studies for China. Int. J. Sci. Innov. Math. Res. (IJSIMR) 2, 23–36 (2014).
44. Fowler, H. & Kilsby, C. A regional frequency analysis of United Kingdom extreme rainfall from 1961 to 2000. Int. J. Climatol. J. R. Meteorol. Soc. 23, 1313–1334 (2003).
45. Gilleland, E., Ribatet, M. & Stephenson, A. G. A software review for extreme value analysis. Extremes 16, 103–119 (2013).
46. Özari, Ç., Eren, Ö. & Saygin, H. A new methodology for the block maxima approach in selecting the optimal block size. Tehnički vjesnik 26, 1292–1296 (2019).
47. Musyoka, M. M. Spatial–Temporal Characteristics of Rainfall Events in Kenya. Ph.D. thesis, University of Nairobi (2020).
48. Onwuegbuche, F. C. et al. Application of extreme value theory in predicting climate change induced extreme rainfall in Kenya. Int. J. Stat. Probab. 8, 85–94 (2019).
