Bayesian modelling of oxygen saturation (SpO2) of cardiac patients using the asymmetric generalized error distribution

Tayu Nigusie Abebe; Ayele Taye Goshu

doi:10.1186/s12887-025-06094-6

2025 Oct 2;25:727. doi: 10.1186/s12887-025-06094-6


PMCID: PMC12492735 PMID: 41039283

Abstract

Background

Oxygen saturation (SpO₂) is a crucial parameter for monitoring the health of cardiac patients. It measures the percentage of hemoglobin in the blood that is saturated with oxygen. This study aims to analyze longitudinal SpO₂ levels and identify their determinants among cardiac patients.

Methods

A Bayesian linear mixed-effects model with the asymmetric generalized error distribution (AGED) was used to analyze the data. The data comprise 323 children diagnosed with cardiac disease. The AGED outperformed the other candidate distributions, indicating that it is a robust and effective choice for analyzing the data.

Results

The estimated shape parameters of the AGED are significant: $\hat{\alpha} = 2.44$ (95% CI: 2.31, 2.58), which reflects the degree of asymmetry, and $\hat{\lambda} = 3.32$ (95% CI: 3.18, 3.46), which is associated with the peakedness of the distribution. The findings reveal that corrective surgery, pulmonary hypertension, cardiomyopathy, anemia, nutritional status, and hemoglobin levels are significantly associated with oxygen saturation (SpO₂). Pulmonary hypertension, cardiomyopathy, and undernutrition are found to lower SpO₂. In contrast, higher hemoglobin and corrective surgery are significantly associated with higher SpO₂. The AGED fitted the data well and proved important for analyzing data characterized by asymmetry and excess kurtosis.

Keywords: Asymmetry, Cardiac disease, Children, Linear mixed model, Oxygen saturation, Patients

Introduction

Oxygen saturation (SpO₂) is a crucial parameter for monitoring the health of children with cardiac disease [1]. It measures the percentage of hemoglobin in the blood that is saturated with oxygen. In children with cardiac disease, maintaining adequate oxygen levels is essential for ensuring overall health [2].

Accurate and continuous monitoring of SpO₂ is vital for several reasons, including assessment of cardiac function, evaluation of disease progression, guidance for clinical interventions, and early detection of complications [3]. It helps clinicians assess disease severity and guide treatment decisions, such as the timing of surgical interventions or the need for supplemental oxygen [3]. Furthermore, SpO₂ monitoring is critical because it can indicate complications such as pulmonary hypertension or residual shunts [3, 4].

Despite its usefulness, interpreting SpO₂ in children with cardiac disease requires careful consideration. Normal SpO₂ levels in healthy children range from 95 to 100%, indicating efficient oxygen uptake in the lungs and adequate delivery [4]. However, in children with cardiac disease, this process is often disrupted by structural or functional abnormalities in the heart [5]. Factors such as anemia and the presence of abnormal hemoglobin can affect SpO₂ [4, 6]. Even minor deviations from normal levels can have significant implications for growth, development, and survival [6].

This study aims to explore SpO₂ measures among cardiac patients. By understanding the relationship between SpO₂ and various covariates, healthcare providers can optimize care strategies to improve outcomes for this vulnerable population [4, 5]. This understanding can lead to more effective treatment plans and ultimately enhance quality of life [6].

To achieve this objective, a linear mixed-effects model is used. In these models, the response is a function of fixed effects, unobservable individual-specific random effects, and an error term [7]. A mixed-effects model, in which both the fixed and random effects contribute linearly to the response variable, is known as the Linear Mixed-Effects Model (LMM) [8].

The linear mixed-effects model (LMM) is not sufficient for modelling non-normal data [8, 9]. A typical assumption of the LMM is normality of the random error, which is often unrealistic and overly restrictive [9, 10]. The distribution of the random error may become asymmetric in the presence of outliers, so that the normality assumption cannot be met. An asymmetric distribution is then needed to accommodate the deviation from normality. The idea of using the generalized error distribution (GED) instead of the classical normal distribution in linear mixed modelling was suggested and demonstrated by Markos and Ayele [11]. Here we apply this idea to the new asymmetric generalized error distribution (AGED) developed by Tayu and Ayele [12], which is capable of modelling data with various degrees of asymmetry and outliers.

The aim of this study is to model longitudinal SpO₂ measures in a Bayesian framework using the AGED and to identify determinants of SpO₂ among cardiac patients under clinical follow-up at the Cardiac Center. The linear mixed-effects model with the asymmetric generalized error distribution is integrated into the analysis.

Research method

Research philosophical worldviews

The post-positivist philosophical worldview is adopted for this study; it is applicable to both quantitative and qualitative research methodologies [13]. This philosophy focuses on identifying and assessing the causes that determine outcomes, through the observation and measurement of objective reality or through theory verification.

Study design

An observational study design is employed [14], in which researchers investigate the effects of certain factors without manipulating who is, or is not, exposed to them. Specifically, a retrospective cohort design is used, where the cohort is identified from past records and outcomes are measured from that point forward. We chose a retrospective cohort design because a prospective study would be costly to conduct [15]. We apply inclusion and exclusion criteria and minimize loss to follow-up to reduce potential selection bias.

Study setting

The Cardiac Center-Ethiopia, located in Addis Ababa, is chosen for the study. The Center is a non-profit organization that serves children with cardiac conditions and receives referrals from nearby states. Currently, the Center is the largest provider of cardiac surgery in Ethiopia.

Study population and sampling

The study population is children with cardiac disease under follow-up at the Center. Children under the age of 18 years are recruited for the study. The study encompasses patients admitted to the Center and followed up from September 2021 to August 2022. A simple random sampling technique [16] with a checklist is used to select a representative sample. The inclusion criteria are as follows: children under the age of 18 years who have been diagnosed with cardiac disease and are under follow-up at the Center. Participants must have been admitted and followed at the Center between September 2021 and August 2022 and must have attended at least three follow-up visits during this period. Patients who attended fewer than three follow-up visits or whose medical records contain incomplete or missing information are excluded.

Sample size determination

Before data are collected, the sample size must be determined. The sample size is determined based on financial affordability, time constraints, and the data analysis method [16]. The sample size ($n$) must at the very least meet three criteria: an optimal predictor effect, a small absolute difference, and precise estimation [17]. A sample size that meets all three criteria provides the optimum value required for the study [17].

Because the study is designed to fit a model, we ensure that the sample size is adequate in terms of the number of participants and events relative to the number of predictor parameters. Given the number of candidate predictors for inclusion in the model and suitable input values taken from a previous study, the required sample size can be determined [17, 18]. The calculation targets an expected shrinkage of 0.90 for the predictors included in the model.

Therefore, the optimum sample size required for the study is 323 participants.

Data

The data were extracted from patients’ medical records, which include demographics and clinical information. To collect data, ethical approval was obtained from the relevant institutional review boards. The data collection protocols adhered strictly to confidentiality standards, ensuring the protection of sensitive information.

Missing data are a common challenge in longitudinal studies, arising from participant dropout or intermittent data collection issues [19]. Data can be Missing Completely at Random (MCAR), Missing at Random (MAR), or Missing Not at Random (MNAR). The missingness in these data is considered to be MCAR, meaning that it is independent of both observed and unobserved data [19].

The study variables

The outcome variable considered in this study is longitudinal oxygen saturation (SpO₂), which is measured repeatedly at patient follow-ups from the start of diagnosis until mortality, loss to follow-up, or continuing follow-up. The covariates are: gender (boy, girl), residence (rural, urban), age, weight, cardiac disease (congenital, acquired), corrective surgery (yes, no), chamber enlargement (yes, no), pulmonary hypertension (yes, no), cardiomyopathy (yes, no), pneumonia (yes, no), anemia (yes, no), New York Heart Association class (I, II, III, IV), nutritional status (underweight, normal weight), ejection fraction, hemoglobin, birth order, and family size.

The asymmetric generalized error distribution

The asymmetric generalized error distribution (AGED) is a flexible statistical distribution that generalizes the traditional generalized error distribution [20] to allow for asymmetry and for outliers or peakedness in the data [12]. The distribution was developed by Tayu and Ayele [12] and is suggested for modelling data that exhibit asymmetry and peakedness. The probability density function of the AGED is given in [12] and involves the Euler gamma function $\Gamma(\cdot)$. The distribution has four parameters: a location parameter, a scale parameter $\sigma$, a peakedness parameter $\lambda$, and an asymmetry parameter $\alpha$.

Three special cases are nested within the AGED: the generalized error distribution, the skew normal distribution, and the normal distribution. Each is obtained by fixing the shape parameters $\lambda$ and $\alpha$ at particular values [12].
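As a small numerical illustration of how one distribution can be nested in another, the sketch below evaluates the symmetric generalized error density in one common parameterization and checks that a shape value of 2 reproduces a normal density. The parameterization and the function names `ged_pdf` and `normal_pdf` are assumptions made for illustration; the AGED density itself is defined in [12] and is not reproduced here.

```python
import math

def ged_pdf(x, mu=0.0, scale=1.0, shape=2.0):
    """Symmetric generalized error density (assumed parameterization):
    shape / (2 * scale * Gamma(1/shape)) * exp(-(|x - mu| / scale) ** shape)."""
    z = abs(x - mu) / scale
    norm_const = shape / (2.0 * scale * math.gamma(1.0 / shape))
    return norm_const * math.exp(-(z ** shape))

def normal_pdf(x, mu=0.0, sd=1.0):
    """Normal density with mean mu and standard deviation sd."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# With shape = 2 the density above equals a normal density with sd = scale / sqrt(2).
scale = 1.5
grid = [-3.0, -1.0, 0.0, 0.5, 2.0]
max_gap = max(
    abs(ged_pdf(x, 0.0, scale, 2.0) - normal_pdf(x, 0.0, scale / math.sqrt(2.0)))
    for x in grid
)
print(f"largest absolute difference on the grid: {max_gap:.2e}")
```

In this parameterization a shape value of 1 gives the Laplace density, so a single shape parameter already spans lighter and heavier tails than the normal; the AGED adds an asymmetry parameter on top of this.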

Linear mixed effects model with the new distribution AGED

The LMM is popular for modelling normally distributed data [21]. However, the normal distribution is not adequate for data with asymmetry and outliers [22, 23]. Markos and Ayele [11] suggested and demonstrated the use of the generalized error distribution in place of the classical normal distribution in the LMM. We apply this idea to the new asymmetric generalized error distribution (AGED) developed by Tayu and Ayele [12], which is capable of modelling data with various degrees of asymmetry and outliers. The most commonly used LMM was proposed by Laird and Ware [24] for a continuous response. The AGED-LMM proposed in this study extends that model.

Let $y_{ij}$ be the observed oxygen saturation (SpO₂) of patient $i$ at visit time $t_{ij}$, and let $y_i$ collect the observations of patient $i$. The linear mixed model with asymmetric generalized error distribution (AGED-LMM) is defined as follows:

$$y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad (3)$$

where $X_i$ denotes the design matrix consisting of the predictors of the response and a vector of 1's for the constant intercept, combined with the vector of fixed effects $\beta$. For the random component, $Z_i$ denotes the design matrix of predictors combined with the vector of random effects $b_i$, and $\varepsilon_i$ is the vector of random errors. All the random errors and random effects in the linear mixed-effects model (3) are assumed to be pairwise independent, each following the AGED instead of the normal distribution. Some of the predictors can be time varying.

Bayesian estimation

Bayesian estimation is based on the posterior distribution [25]. The posterior distribution $p(\theta \mid y)$ is the product of the likelihood function and the prior distribution, divided by a normalizing constant [25].

Likelihood function

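We assume that the observations of the longitudinal outcome $y_{ij}$ are independent given the random effects $b_i$ and the parameters $\theta$ of the longitudinal model, with each conditional distribution following the AGED. The random effects $b_i$ likewise follow the AGED. The likelihood function can then be written in the form [26]:

$$L(\theta \mid y) = \prod_{i=1}^{n} \int \Big[ \prod_{j=1}^{n_i} f(y_{ij} \mid b_i, \theta) \Big] f(b_i \mid \theta)\, db_i,$$

where $n$ is the number of patients and $n_i$ is the number of visits of patient $i$.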

Prior distribution

The prior distributions of the parameters $\theta$ are as follows: the individual parameters are assumed to be independent and normally distributed with mean zero and a large variance of 1000, that is, $N(0, 1000)$; the two shape parameters of the AGED are assigned a common prior density; and the variance term is assigned a prior of its own.

Posterior distribution

The posterior distribution $p(\theta \mid y)$ is given by

$$p(\theta \mid y) = \frac{L(\theta \mid y)\, \pi(\theta)}{\int L(\theta \mid y)\, \pi(\theta)\, d\theta},$$

where $L(\theta \mid y)$ is the likelihood function for the oxygen saturation response $y$, $\pi(\theta)$ is the prior probability distribution, and the integral in the denominator is the normalizing constant. Markov chain Monte Carlo (MCMC) is applied [25, 26] to simulate from the posterior distribution.
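To make the idea of simulating from a posterior concrete, the sketch below runs a random-walk Metropolis sampler on a deliberately simple model: a normal likelihood with known standard deviation and a $N(0, 1000)$ prior on the mean. This is not the sampler used in the study (Stan is used there), and the error standard deviation, step size, seed, and function names are all assumptions. The six follow-up means from Table 3 are used purely as toy input, not as patient-level data.

```python
import math
import random

# Toy input: the six follow-up means from Table 3, treated as observations
# for illustration only.
y = [92.947, 92.706, 91.524, 92.050, 91.654, 91.901]
SIGMA = 0.5          # assumed known error standard deviation (hypothetical)
PRIOR_VAR = 1000.0   # N(0, 1000) prior on the mean

def log_posterior(mu):
    """Unnormalized log posterior: normal log-likelihood plus normal log-prior."""
    log_lik = sum(-0.5 * ((yi - mu) / SIGMA) ** 2 for yi in y)
    log_prior = -0.5 * mu * mu / PRIOR_VAR
    return log_lik + log_prior

def metropolis(n_iter=6000, burn_in=1000, step=0.3, seed=1):
    """Random-walk Metropolis; returns the draws kept after burn-in."""
    rng = random.Random(seed)
    mu = sum(y) / len(y)
    current = log_posterior(mu)
    draws = []
    for it in range(n_iter):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < candidate - current:
            mu, current = proposal, candidate
        if it >= burn_in:
            draws.append(mu)
    return draws

draws = metropolis()
mcmc_mean = sum(draws) / len(draws)

# Closed-form posterior mean of this conjugate toy model, for comparison.
precision = len(y) / SIGMA ** 2 + 1.0 / PRIOR_VAR
exact_mean = (sum(y) / SIGMA ** 2) / precision
print(f"MCMC mean {mcmc_mean:.3f}, exact mean {exact_mean:.3f}")
```

The normalizing constant never has to be computed, because only ratios of posterior densities enter the acceptance step.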

Implementation and model selection

The models were fitted using Stan [27]. Stan is well suited to fitting complex models with complicated posteriors, which usually involve high correlation among parameters [27]. Implementing complex models in Stan is comparatively easy, whereas computational issues may arise in other software [28].

We used two criteria for model selection. The first is the Leave-One-Out Information Criterion (LOOIC), a Bayesian model evaluation metric that estimates the predictive accuracy of a model using leave-one-out cross-validation [29]. Lower LOOIC values indicate better predictive accuracy. LOOIC is robust against improper priors and is computationally stable [29]. The second criterion is the Widely Applicable Information Criterion (WAIC). Smaller values of WAIC indicate better fit [30].

Widely applicable information criterion

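The widely applicable information criterion (WAIC) is a fully Bayesian estimator that averages over the posterior distribution of $\theta$. For an observation $y_i$, this criterion measures the predictive accuracy of the model based on the log posterior predictive density, averaged over the posterior of the parameter vector [30]. WAIC can be expressed as [30]

$$\mathrm{WAIC} = -2\,(\mathrm{lppd} - p_{\mathrm{WAIC}}),$$

where lppd (log pointwise predictive density) is

$$\mathrm{lppd} = \sum_{i=1}^{N} \log \Big( \frac{1}{S} \sum_{s=1}^{S} p(y_i \mid \theta^{(s)}) \Big),$$

with $N$ the total number of observations and $\theta^{(1)}, \dots, \theta^{(S)}$ the posterior draws.

Lower WAIC values indicate a better model fit with higher predictive accuracy. The penalty term $p_{\mathrm{WAIC}}$ accounts for model complexity, balancing goodness-of-fit against overfitting.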
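The computation behind WAIC can be sketched from a matrix of pointwise log-likelihood values, one row per posterior draw and one column per observation. The sketch below assumes the commonly used variance-based form of the penalty (the sum over observations of the posterior variance of the log-likelihood); the function names and the toy matrix are hypothetical and are not output from the fitted models.

```python
import math

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def sample_variance(values):
    """Unbiased sample variance across posterior draws."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def waic(log_lik):
    """log_lik[s][i] holds log p(y_i | theta^(s)) for draw s and observation i."""
    n_obs = len(log_lik[0])
    columns = [[row[i] for row in log_lik] for i in range(n_obs)]
    lppd = sum(log_mean_exp(col) for col in columns)
    p_waic = sum(sample_variance(col) for col in columns)  # assumed penalty form
    return -2.0 * (lppd - p_waic), lppd, p_waic

# Hypothetical matrix: 3 posterior draws by 2 observations.
toy_log_lik = [[-1.0, -2.0], [-1.2, -1.8], [-0.9, -2.1]]
waic_value, lppd, p_waic = waic(toy_log_lik)
print(f"WAIC={waic_value:.3f}  lppd={lppd:.3f}  p_waic={p_waic:.3f}")
```

When every draw gives the same log-likelihood the penalty is zero and WAIC reduces to −2 times the log-likelihood, which is a quick sanity check on any implementation.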

Leave-one-out information criterion

The Leave-One-Out Information Criterion (LOOIC) is a Bayesian model evaluation metric that estimates the predictive accuracy of a model using leave-one-out cross-validation [29]. It is particularly useful for comparing models and assessing how well they generalize to unseen data [29]. The LOOIC is calculated as [29]

$$\mathrm{LOOIC} = -2\, \widehat{\mathrm{elpd}}_{\mathrm{loo}},$$

where $\widehat{\mathrm{elpd}}_{\mathrm{loo}}$ (the leave-one-out log pointwise predictive density) is

$$\widehat{\mathrm{elpd}}_{\mathrm{loo}} = \sum_{i=1}^{N} \log p(y_i \mid y_{-i}),$$

with $y_{-i}$ denoting the data without observation $i$.

The LOOIC computation is stable, including in cases with influential observations. It allows direct comparison between models, with lower LOOIC values indicating better predictive accuracy and less overfitting [29]. It is often used to compare multiple models fitted to the same data.
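A leave-one-out estimate can be approximated from a single model fit by importance sampling over the posterior draws. The sketch below uses the plain (unsmoothed) importance-sampling estimator, in which each leave-one-out predictive density is the harmonic mean of the pointwise likelihoods. This is a simplification chosen for brevity: it is not the Pareto-smoothed variant implemented in standard software, and the function names and toy matrix are hypothetical.

```python
import math

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def looic_importance_sampling(log_lik):
    """log_lik[s][i] holds log p(y_i | theta^(s)) for draw s and observation i.

    Plain importance sampling: p(y_i | y_-i) is approximated by the harmonic
    mean of p(y_i | theta^(s)) over the posterior draws.
    """
    n_obs = len(log_lik[0])
    elpd_loo = 0.0
    for i in range(n_obs):
        negated = [-row[i] for row in log_lik]
        elpd_loo += -log_mean_exp(negated)
    return -2.0 * elpd_loo, elpd_loo

# Hypothetical matrix: 3 posterior draws by 2 observations.
toy_log_lik = [[-1.0, -2.0], [-1.2, -1.8], [-0.9, -2.1]]
looic, elpd_loo = looic_importance_sampling(toy_log_lik)
print(f"LOOIC={looic:.3f}  elpd_loo={elpd_loo:.3f}")
```

Because a harmonic mean never exceeds the arithmetic mean, this estimate of the leave-one-out density is never larger than the in-sample lppd, which reflects the penalty for predicting held-out points.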

Results and discussion

Descriptive analysis

The study included 323 children diagnosed with cardiac disease (CD). Of these, 182 (56.35%) are girls, and 56.04% are under the age of 5 years. The mean age is 7.73 years, ranging from 1 day to 18 years. Two hundred fifteen (66.56%) of the children reside in urban areas (Table 1).

Table 1.

Demographic characteristics of study participants under follow-up

Covariates	Attributes	Total	Percent
Gender	Boy	141	43.65
Gender	Girl	182	56.35
Age (months)	< 24	52	16.10
	24–59	129	39.94
	60–143	92	28.48
	>=144	50	15.48
Birth order	First	123	38.08
	Second and third	159	49.23
	Fourth or more	41	12.69
Residence	Urban	215	66.56
Residence	Rural	108	33.44
Father occupation	Employed	141	43.65
	Farmer	98	30.34
	Merchant	34	10.53
	Other	50	15.48
Mother occupation	Housewife	182	56.35
	Employed	97	30.03
	Merchant	20	6.19
	Other	24	7.43
Economic status	Below	160	49.54
	Average	122	37.77
	Above	41	12.69
Family size	< 4	242	74.92
Family size	>= 4	81	25.08


Regarding the occupational status of the families, 43.65% of fathers and 30.03% of mothers are employed by the government, and 74.92% of families have fewer than four members (Table 1).

Pneumonia is identified in 17.96% of the children, and anemia is also common in the study area, with a prevalence of 35.29%. Twenty-seven participants (8.36%) have a family history of cardiac disease. Among the study participants, 129 (39.94%) are between the ages of 24 and 59 months, followed by 92 (28.48%) between the ages of 60 and 143 months. Of the study participants, 47 (14.55%) underwent corrective surgery, while the remaining 276 (85.45%) are awaiting surgery (Table 2).

Table 2.

Clinical characteristics of study participants under follow-up

Covariates	Attributes	Total	Percent
Cardiac disease	Congenital	230	71.21
Cardiac disease	Acquired	93	28.79
Chamber enlargements	Yes	167	51.70
Chamber enlargements	No	156	48.30
Pulmonary hypertension	Yes	52	16.10
Pulmonary hypertension	No	271	83.90
Corrective surgery	Yes	47	14.55
Corrective surgery	No	276	85.45
Nutritional status	Under weight	192	59.40
Nutritional status	Healthy weight	131	40.60
ROSS/NYHA	I	110	34.05
	II	125	38.69
	III	62	19.19
	IV	26	8.05
Anemia	Yes	114	35.29
Anemia	No	209	64.71
Pneumonia	Yes	58	17.96
Pneumonia	No	265	82.04
Cardiomyopathy	Yes	78	24.15
Cardiomyopathy	No	245	75.85


Analysis of the longitudinal data

When data are measured repeatedly over time, it is important to detect how individuals evolve through time. Table 3 reports the average oxygen saturation (SpO₂) over the follow-up periods. As shown in Table 3, the baseline mean SpO₂ is 92.947 with a standard deviation of 4.959. By the end of the follow-up period, the mean SpO₂ is 91.901 with a standard deviation of 3.489.

Table 3.

Average estimate of oxygen saturation (SpO₂) over follow-up periods

Follow-up periods	1	2	3	4	5	6
Mean	92.947	92.706	91.524	92.050	91.654	91.901
Variance	24.596	27.208	25.640	20.379	16.721	12.175
Standard deviation	4.959	5.216	5.064	4.514	4.089	3.489
CV	5.335	5.626	5.532	4.903	4.461	3.796


The analysis also reveals the variability in SpO₂ measures over the follow-up periods, as indicated by the coefficient of variation (CV). The CV is 5.335% at the start of follow-up and about 3.796% at the sixth and final follow-up period. Thus, the variability in SpO₂ measures decreases slightly over the follow-up periods. Patients start from differing baseline values and continue to differ across follow-up periods. SpO₂ measures also decline over the study period (Fig. 1 and Table 3).
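The coefficient of variation is the standard deviation expressed as a percentage of the mean. The short sketch below recomputes it from the means and standard deviations reported in Table 3; the variable names are illustrative, and small differences in the third decimal place relative to the table are expected because the inputs are already rounded.

```python
# Means and standard deviations per follow-up period, as reported in Table 3.
means = [92.947, 92.706, 91.524, 92.050, 91.654, 91.901]
sds = [4.959, 5.216, 5.064, 4.514, 4.089, 3.489]

# CV (%) = 100 * standard deviation / mean.
cv = [100.0 * s / m for s, m in zip(sds, means)]

for period, value in enumerate(cv, start=1):
    print(f"follow-up {period}: CV = {value:.3f}%")
```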

Fig. 1. Box plot of longitudinal SpO₂ measures over follow-up (visit) time

To understand the relationship between the longitudinal SpO₂ measures and the follow-up periods, the mean structure is plotted in Fig. 2. The figure shows a slight decrease in the mean SpO₂ of cardiac patients over the follow-up periods.

Before proceeding to further analysis, the distribution of the data is assessed. Various distributions are considered to model the data, including the AGED, the generalized error distribution, the skew normal distribution, and the normal distribution. These distributions are fitted to the data and compared using common goodness-of-fit (GOF) statistics (Table 4): Akaike's Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Hannan–Quinn Information Criterion (HQIC), Kolmogorov–Smirnov statistic (K-S), and Bayesian Information Criterion (BIC) [31, 32].

Table 4.

MLEs and GOF statistics results of the oxygen saturation (SpO₂) data

Parameter	AGED	GED	Normal	Skew Normal
Location	95.04	91.43	91.58	93.07
Scale	1.647	0.530	0.457	0.438
Shape	3.530	1.848	-	-
Shape	3.331	-	-	-
AIC	1433.176	2047.525	2035.355	1911.583
CAIC	1433.204	2047.542	2035.363	1911.592
BIC	1454.26	2063.338	2045.897	1922.126
HQIC	1441.047	2053.428	2039.291	1915.519


After estimating the parameters of the distributions, we examine goodness-of-fit (GOF) statistics to identify which distribution best fits the data. Lower values indicate a better fit to the data [31, 32]. The parameter estimates and GOF statistics of the distributions are presented in Table 4, including skewness and kurtosis coefficients. The GOF statistics of the AGED are the lowest on every criterion, indicating that it fits the data better than the other distributions.
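The information criteria in Table 4 are all functions of the maximized log-likelihood, the number of parameters $k$, and the number of observations $n$. The sketch below recomputes BIC, HQIC, and a small-sample corrected AIC from the reported AIC values. The value $n = 1438$ is an assumption obtained by back-solving from the AIC and BIC columns (it is not stated in the text), and $k$ is taken as the number of parameter estimates reported for each distribution. Under those assumptions the CAIC column is reproduced numerically by the corrected form AIC + 2k(k+1)/(n − k − 1). The function and variable names are illustrative.

```python
import math

N_OBS = 1438  # assumption: back-solved from Table 4, not stated in the text

# (reported AIC, number of parameter estimates shown in Table 4)
table4 = {
    "AGED": (1433.176, 4),
    "GED": (2047.525, 3),
    "Normal": (2035.355, 2),
    "Skew Normal": (1911.583, 2),
}

def information_criteria(aic, k, n):
    """Recover -2 log L from AIC = -2 log L + 2k, then build the other criteria."""
    neg2_loglik = aic - 2 * k
    return {
        "AICc": aic + 2 * k * (k + 1) / (n - k - 1),
        "BIC": neg2_loglik + k * math.log(n),
        "HQIC": neg2_loglik + 2 * k * math.log(math.log(n)),
    }

results = {name: information_criteria(aic, k, N_OBS) for name, (aic, k) in table4.items()}
for name, crit in results.items():
    print(name, {key: round(value, 3) for key, value in crit.items()})
```

All criteria share the same −2 log L term and differ only in how strongly they penalize $k$, which is why they rank the four distributions in the same order here.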

Figure 3 displays the histogram and the estimated density function of the asymmetric generalized error distribution (AGED). The fitted AGED density follows the histogram closely, supporting the AGED as the best-fitting distribution for these data.

Fig. 3. Histogram and fitted AGED density function for the oxygen saturation (SpO₂) data of individual patients

We analyzed SpO₂ measures using different distributions and found the AGED useful for data with varying degrees of asymmetry and peakedness. We therefore used this distribution in the linear mixed-effects model. As shown in Fig. 3, the histogram of SpO₂ clearly indicates non-normality and asymmetry. The residual plot also suggests deviation from normality. These findings prompted us to consider additional models. Accordingly, four statistical models with different error distributions are fitted and compared (Table 5).

Table 5.

Model comparison

Model	Normal	Skew Normal	GED	AGED
WAIC	1120.33	1088.46	1054.24	1040.59
LOOIC	1144.10	1090.30	1059.02	1044.90


The results presented in Table 5 indicate that the AGED model outperforms the normal, skew normal, and GED models, as evidenced by its lower WAIC and LOOIC values [29, 30]. These metrics are particularly advantageous for Bayesian model comparison, as they balance model fit with complexity to prevent overfitting. In conclusion, the AGED model provides a more generalizable and robust framework for analyzing SpO₂ data, which supports its selection as the most appropriate model for this study.
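The size of the advantage is easier to read as differences from the best model. The sketch below takes the WAIC and LOOIC values from Table 5 and reports each model's gap to the lowest value; the variable names are illustrative, and no standard errors of the differences are available in the source, so the gaps are descriptive only.

```python
# (WAIC, LOOIC) per error distribution, as reported in Table 5.
table5 = {
    "Normal": (1120.33, 1144.10),
    "Skew Normal": (1088.46, 1090.30),
    "GED": (1054.24, 1059.02),
    "AGED": (1040.59, 1044.90),
}

best_waic = min(table5, key=lambda m: table5[m][0])
best_looic = min(table5, key=lambda m: table5[m][1])

# Gap of each model to the best (lowest) value on each criterion.
delta = {
    model: (waic - table5[best_waic][0], looic - table5[best_looic][1])
    for model, (waic, looic) in table5.items()
}

for model, (d_waic, d_looic) in delta.items():
    print(f"{model:12s} dWAIC = {d_waic:6.2f}   dLOOIC = {d_looic:6.2f}")
```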

AGED-linear mixed effect model analysis

The results of the Bayesian linear mixed model analysis with the asymmetric generalized error distribution for the SpO₂ measures of cardiac patients are displayed in Table 6. The table includes the estimated posterior means, standard deviations, 95% credible intervals, and estimated parameters of the AGED. The results reveal that pulmonary hypertension, corrective surgery, anemia, hemoglobin levels, nutritional status, cardiomyopathy, and observation time are significantly associated with oxygen saturation (SpO₂), as their credible intervals do not include zero. The estimated parameters of the AGED are also significant, indicating that the distribution fits the data (Table 6).

Table 6.

Parameter estimates with 95% credible intervals of the Bayesian linear mixed models fitted to the oxygen saturation (SpO₂) data

Covariates	Estimate	SD	95% CI	Rhat
Intercept	85.44	2.18	(81.16, 89.75)	1.00
Observation time	0.09	0.02	(0.06, 0.13)	1.00
Observation time²	−1.44	0.33	(−2.13, −0.077)	1.02
Age	−0.19	0.24	(−0.66, 0.27)	1.00
Residence: Urban	0.49	0.16	(−0.43, 1.39)	1.00
Gender: Male	0.06	0.34	(−0.80, 0.93)	1.00
Surgery: Yes	0.45	0.22	(0.11, 0.78)	1.00
Anemia: Yes	−0.67	0.12	(−1.312, −0.028)	1.00
Pulmonary: Yes	−0.44	0.34	(−0.84, −0.05)	1.02
Pneumonia: Yes	−0.50	0.11	(−1.70, 0.69)	1.00
Cardiomyopathy: Yes	−0.32	0.29	(−0.65, −0.09)	1.00
BMI: Under Weight	−1.33	0.18	(−2.29, −0.39)	1.00
Hemoglobin	0.71	0.18	(0.37, 1.05)	1.00
Parameters	Estimate	SD	95% CI	Rhat
SD of random intercept	4.02	0.13	(3.59, 4.47)	1.00
SD of random slope	0.55	0.11	(0.30, 0.75)	1.01
Correlation (intercept, slope)	0.53	0.12	(0.28, 0.74)	1.00
sigma	3.31	0.09	(3.17, 3.45)	1.00
lambda	3.32	0.08	(3.18, 3.46)	1.00
alpha	2.44	0.07	(2.31, 2.58)	1.00


The estimated shape parameters of the asymmetric generalized error distribution are significant: $\hat{\alpha} = 2.44$ (CI: 2.31, 2.58) is associated with the degree of asymmetry, and $\hat{\lambda} = 3.32$ (CI: 3.18, 3.46) is associated with the peakedness of the distribution. This significance is indicated by the 95% credible intervals, which do not contain zero. The estimated standard deviations of the random intercept and slope are 4.02 (CI: 3.59, 4.47) and 0.55 (CI: 0.30, 0.75), respectively. The smaller estimated variance of the random slope relative to the intercept indicates that between-subject variability is greater at baseline than in the rate of change. The intercept and slope are positively correlated, with $\hat{\rho} = 0.53$ (CI: 0.28, 0.74) (Table 6).
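The two standard deviations and the correlation reported for the random effects determine a 2 × 2 covariance matrix. The sketch below assembles it from the posterior means in Table 6; plugging in posterior means is a simplification (the posterior mean of a product is not the product of posterior means), and the variable names are illustrative.

```python
# Posterior means from Table 6.
sd_intercept = 4.02   # SD of the random intercepts
sd_slope = 0.55       # SD of the random slopes
rho = 0.53            # correlation between intercept and slope

var_intercept = sd_intercept ** 2
var_slope = sd_slope ** 2
cov_intercept_slope = rho * sd_intercept * sd_slope

# Plug-in covariance matrix of the random effects (intercept, slope).
random_effects_cov = [
    [var_intercept, cov_intercept_slope],
    [cov_intercept_slope, var_slope],
]

for row in random_effects_cov:
    print([round(value, 4) for value in row])
```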

Based on the results presented in Table 6, corrective surgery, pulmonary hypertension, cardiomyopathy, anemia, nutritional status, hemoglobin levels, and observation time are significantly associated with oxygen saturation (SpO₂) among cardiac patients under follow-up. Pulmonary hypertension, cardiomyopathy, and underweight status are associated with lower SpO₂, by 0.44 (CI: −0.84, −0.05), 0.32 (CI: −0.65, −0.09), and 1.33 (CI: −2.29, −0.39), respectively. Higher SpO₂ indicates a better cardiac condition. Corrective surgery is associated with improved SpO₂, with an effect size of 0.45 (CI: 0.11, 0.78); that is, patients who underwent corrective surgery show improved SpO₂ measures. Thus, pulmonary hypertension, cardiomyopathy, and undernutrition are associated with lower SpO₂, whereas hemoglobin levels and corrective surgery are associated with improved SpO₂ in cardiac patients (Table 6).
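The significance rule used here is that a 95% credible interval excludes zero. The sketch below applies that rule to the covariate intervals in Table 6. The intercept and the squared observation-time term are left out, the latter because its interval is garbled in the source table; the dictionary and function names are illustrative.

```python
# 95% credible intervals (lower, upper) for the covariates, from Table 6.
table6_intervals = {
    "Observation time": (0.06, 0.13),
    "Age": (-0.66, 0.27),
    "Residence: Urban": (-0.43, 1.39),
    "Gender: Male": (-0.80, 0.93),
    "Surgery: Yes": (0.11, 0.78),
    "Anemia: Yes": (-1.312, -0.028),
    "Pulmonary: Yes": (-0.84, -0.05),
    "Pneumonia: Yes": (-1.70, 0.69),
    "Cardiomyopathy: Yes": (-0.65, -0.09),
    "BMI: Under Weight": (-2.29, -0.39),
    "Hemoglobin": (0.37, 1.05),
}

def excludes_zero(lower, upper):
    """True when the whole interval lies on one side of zero."""
    return lower > 0 or upper < 0

flagged = [
    name for name, (lower, upper) in table6_intervals.items()
    if excludes_zero(lower, upper)
]
print(flagged)
```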

Assessment of convergence

Convergence assessment is important in Bayesian data analysis. It helps to determine whether the sampler has adequately converged and whether the estimated parameters are stable and reliable [33]. The convergence of the MCMC was monitored using trace plots and the Gelman-Rubin $\hat{R}$ diagnostic [34].

The $\hat{R}$ values given in Table 6, along with the trace plots shown in Fig. 4, indicate that the MCMC runs attained convergence. The trace plots indicate stable mixing of the chains, with no evidence of divergent transitions. Additionally, the $\hat{R}$ values are below 1.05 [34] for all parameters (the largest is 1.02), which is consistent with convergence of the Markov chain Monte Carlo (MCMC) sampler and supports the stability and reproducibility of the results.
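The Gelman-Rubin diagnostic compares the variance between chains with the variance within chains; values close to 1 mean the chains agree. The sketch below implements the basic (non-split) form of the statistic on two short toy chains. Stan reports a refined split, rank-normalized version, so this is an illustration of the idea rather than the exact quantity in Table 6; the chains and function names are hypothetical.

```python
import math

def variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [sum(chain) / n for chain in chains]
    within = sum(variance(chain) for chain in chains) / len(chains)
    between = n * variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)

# Hypothetical toy chains for a single parameter.
toy_chains = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]
r_hat = gelman_rubin(toy_chains)
print(f"R-hat = {r_hat:.4f}")
```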

Fig. 4. Histograms of posterior distributions and trace plots for the estimated parameters of the AGED: sigma (scale parameter), and lambda and alpha (shape parameters)

Sensitivity Analysis

Sensitivity analysis with respect to the prior distribution evaluates how the choice of prior affects posterior results [35]. It identifies whether conclusions are driven more by the data or by prior assumptions. We used alternative priors, performed prior predictive and posterior predictive checks, and used metrics such as the Kullback-Leibler divergence [36] to measure differences. Visual diagnostics such as density plots were also used to assess the effects of varying the priors. The results were not sensitive to the choice of prior distribution, which supports the robustness of the model and the reliability of the conclusions.

Discussion

In this work, we discussed the implementation of Bayesian linear mixed-effects models using the AGED for the distribution of oxygen saturation (SpO₂) of cardiac patients. The common assumption of normality is relaxed. Instead, we use flexible distributions, including the generalized error distribution, the skew normal distribution, and the AGED. We compared the models using WAIC and LOOIC, for which lower values indicate better model fit [29, 30]. Our results favor the LMM with AGED over the other models on both criteria. These metrics are advantageous for Bayesian model comparison, as they balance model fit with complexity to prevent overfitting [29, 30]. The resulting Bayesian linear mixed model with AGED can handle outliers and asymmetry in the data effectively.

Sensitivity to the prior distribution was evaluated using alternative priors, prior predictive and posterior predictive checks [35], and metrics such as the Kullback-Leibler divergence [36] to measure differences. Visual diagnostics such as density plots were also used to assess the effects of varying the priors. The results were not sensitive to the prior distribution, which supports the robustness of the model and the reliability of the conclusions [35].

The finding from this study is consistent with the study in [37], which compared different distributions for the data and selected one. The asymmetric generalized error distribution is selected in this study and, as indicated in [12], the AGED can provide more insight than the other distributions. This choice underscores the flexibility and robustness of the AGED in handling data with outliers and asymmetry, making it a valuable tool for modeling oxygen saturation (SpO₂) levels in cardiac patients.

The findings show that pulmonary hypertension, cardiomyopathy, and undernutrition lower oxygen saturation (SpO₂) by 0.44, 0.32, and 1.33, respectively, in concordance with the studies in [2] and [38]. Higher SpO₂ is associated with a better cardiac condition, as indicated in [38]. Additionally, anemia significantly lowers SpO₂, as in [39].

Corrective surgery is associated with improved SpO₂ levels, which is supported by the findings in [39]. This suggests that surgical intervention can lead to better oxygenation and overall enhanced cardiac health. Additionally, a higher rate of change in the biomarker is associated with good health conditions in cardiac patients.

Pulmonary hypertension, cardiomyopathy, and undernutrition are significant factors associated with lower SpO₂ levels, corroborating the results presented in [39, 40]. Conversely, hemoglobin levels and corrective surgery are positively associated with SpO₂ measures, as also noted in [39]. This indicates that improving hemoglobin levels and undergoing corrective surgical procedures can enhance oxygen saturation, contributing to better overall health in cardiac patients.

Despite the valuable insights provided, this study has several limitations. First, the data were collected from a single center, which may limit the generalizability of the findings to broader populations or settings. Second, although the Bayesian linear mixed-effects model with AGED accounts for asymmetry and excess kurtosis, the model assumptions may still be sensitive to unmeasured confounding variables or model misspecification. Finally, the exclusion of patients with fewer than three follow-up visits or missing data may have introduced selection bias, potentially affecting the representativeness of the sample.

Conclusions

The study focuses on developing a Bayesian linear mixed-effects model using the asymmetric generalized error distribution (AGED) to analyze longitudinal oxygen saturation (SpO₂) levels among cardiac patients. Comparison with the generalized error, normal, and skew normal distributions shows that the AGED outperforms these alternatives. This indicates that the AGED is a robust and effective choice for analyzing SpO₂ data, especially in the presence of asymmetry and outliers. The estimated shape parameters of the asymmetric generalized error distribution are significant: $\hat{\alpha} = 2.44$ (95% CI: 2.31, 2.58) indicates asymmetry of the distribution, and $\hat{\lambda} = 3.32$ (95% CI: 3.18, 3.46) is associated with the peakedness of the distribution. The analysis shows that corrective surgery, pulmonary hypertension, cardiomyopathy, anemia, nutritional status, hemoglobin, and observation time are significantly associated with the SpO₂ level of cardiac patients under clinical follow-up. Pulmonary hypertension, cardiomyopathy, and undernutrition are found to significantly lower SpO₂ levels. In contrast, higher hemoglobin and corrective surgery are significantly associated with higher SpO₂ levels. Specifically, a low hemoglobin level is associated with a decline in SpO₂, whereas patients undergoing corrective surgery exhibit improved SpO₂. A higher rate of change in biomarkers indicates better health conditions in cardiac patients. The Bayesian linear mixed model with AGED can model data with various degrees of asymmetry and excess kurtosis. Future studies could investigate models with various types of random effects and include more covariates to further enhance the insights.

Acknowledgements

The authors would like to sincerely thank the Cardiac Center-Ethiopia and its staff for providing the data used in this study. We also extend our sincere gratitude to Kotebe University of Education for its support and coordination.

Authors’ contributions

TA and AG contributed to the conceptualization and design of the study, analysed the data, and wrote the initial draft of the manuscript. TA and AG critically revised the paper. All authors participated in the revision, and read and approved the final version for submission.

Funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

