Journal of Health, Population, and Nutrition. 2026 Jan 3;45:46. doi: 10.1186/s41043-025-01213-6

Association between the oxidative balance score and all-cause mortality in patients with cardiovascular disease-cancer comorbidity

Fen Liu 1,2,3,#, Jian Wang 4,#, Si-Ao Wen 4, Si-Ling Peng 4, Yan-Cheng Jiang 1,2,3, Zheng-Yu Liu 1,2,3, Ya-Yu You 1,2,3
PMCID: PMC12866453  PMID: 41484685

Abstract

Purpose

The Oxidative Balance Score (OBS) is a composite measure of systemic oxidative stress. This study aims to evaluate the impact of OBS on all-cause mortality in patients with cardiovascular disease–cancer comorbidity and to use machine learning to identify related factors.

Methods

We analyzed data from the 2007–2018 US National Health and Nutrition Examination Survey (NHANES). Cox regression, Kaplan-Meier analysis, restricted cubic splines (RCS), and subgroup analysis were used to explore the association between OBS and CVD-cancer comorbidity. Five machine learning models were constructed and compared to identify the optimal CVD-cancer comorbidity risk prediction model, and feature importance was assessed.

Results

Compared with participants in the lowest OBS tertile, those in the highest tertile had a lower risk of all-cause mortality (HR = 0.78, 95% CI: 0.64–0.95, p = 0.016). RCS analysis showed no evidence of a nonlinear relationship between OBS and all-cause mortality in patients with CVD-cancer comorbidity. In subgroup analyses, the association remained consistent across all subgroups, with no statistically significant interaction observed (all P for interaction > 0.05). The random forest algorithm was identified as the optimal predictive model through machine learning evaluation. Decision curve analysis (DCA) and calibration curves further supported the internal validity of the model. SHAP analysis revealed that age, smoking intensity, niacin intake, and selenium levels were the most influential predictive factors.

Conclusions

This study demonstrates a significant inverse association between higher OBS and all-cause mortality in patients with cardiovascular disease–cancer comorbidity and provides an interpretable machine learning model to predict this comorbidity.

Supplementary Information

The online version contains supplementary material available at 10.1186/s41043-025-01213-6.

Keywords: Cardiovascular disease, Cancer, Epidemiology, Machine learning, Oxidative balance score

Introduction

The converging trends of population aging, improved cancer survival, and the high prevalence of cardiometabolic risk factors have led to a growing population of individuals living with both CVD and cancer [1]. This comorbidity presents a major clinical challenge and contributes significantly to mortality and healthcare burden worldwide [2, 3].

While traditionally viewed as separate disease entities, CVD and cancer share common risk factors and, more importantly, several fundamental biological pathways [4–6]. Among these, oxidative stress serves as a central mechanism, driving cellular damage, chronic inflammation, and pathological processes in both conditions [7–9]. To quantitatively assess an individual’s overall oxidative stress burden, the OBS was developed as an integrated measure of pro- and antioxidant exposures from diet and lifestyle [10, 11]. A higher OBS, indicating a more favorable antioxidant profile, has been linked to a reduced risk of incident cancer and mortality in various settings [12, 13]. However, despite its established role in individual diseases, the prognostic value of the OBS in patients with established CVD-cancer comorbidity remains unknown. This represents a critical knowledge gap, as the shared oxidative pathway may be particularly relevant in this high-risk, comorbid population.

To address this, we investigated the association between OBS and all-cause mortality in adults with CVD-cancer comorbidity using data from the NHANES. Furthermore, while traditional statistical models can establish association, they are often limited in handling complex, high-dimensional data for risk prediction. We therefore integrated a machine learning (ML) approach to complement our epidemiological analysis. The objectives of this dual analytical framework were twofold: first, to determine the mortality risk associated with OBS using conventional survival models, and second, to leverage ML techniques to build a robust comorbidity predictive model and identify the most influential oxidative stress-related factors, thereby providing deeper insights into the key drivers of risk in this comorbid population.

Materials and methods

Study design and population

Data for the 2007–2018 cycles were downloaded from the NHANES website of the National Center for Health Statistics, Centers for Disease Control and Prevention (https://www.cdc.gov/nchs/nhanes/index.htm), yielding a total of 59,842 participant records after data aggregation, integration, and cleaning. Participants were then excluded for missing data in three steps: (1) 13,791 were excluded for missing demographic or covariate information; (2) 14,686 were excluded for missing OBS data; and (3) 11,941 were excluded for missing data on cardiovascular disease status, cancer diagnosis, or mortality status. Ultimately, 19,424 participants were included in the study. The flowchart is shown in Fig. 1.
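The exclusion counts reported above can be checked with simple arithmetic. The following is a minimal sketch; the counts are copied from the text, while the function and variable names are illustrative and not taken from the authors' analysis code.

```python
# Counts are taken from the paragraph above; all names are illustrative.
initial_records = 59_842
exclusions = [
    ("missing demographic or covariate data", 13_791),
    ("missing OBS data", 14_686),
    ("missing CVD, cancer, or mortality data", 11_941),
]


def apply_exclusions(start, steps):
    """Return (reason, remaining sample size) after each exclusion step."""
    remaining = start
    trail = []
    for reason, n_excluded in steps:
        remaining -= n_excluded
        trail.append((reason, remaining))
    return trail


flow = apply_exclusions(initial_records, exclusions)
final_n = flow[-1][1]

for reason, remaining in flow:
    print(f"after excluding {reason}: {remaining:,}")
```

The final value agrees with the 19,424 participants reported for the analytic sample.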

Fig. 1. Flow chart of the study participants

Assessment of OBS

The OBS was computed by integrating both pro-oxidant and antioxidant components derived from dietary and lifestyle factors. The dietary segment of the OBS comprised 16 nutrients: dietary fiber, carotene, riboflavin, niacin, vitamin B6, total folate, vitamin B12, vitamin C, vitamin E, calcium, magnesium, zinc, copper, selenium, total fat, and iron. Dietary data were collected through two 24-hour recall interviews administered at mobile examination centers, with a 3- to 10-day interval between sessions. The average daily intake of each nutrient was calculated as a weighted average from the two recalls, incorporating the day 2 dietary sample weights (WTDRD2) provided by NHANES to obtain nationally representative estimates. For nutrients with missing data in one of the recalls (e.g., due to non-response), the available value from a single recall was used, provided it met NHANES quality control criteria. Participants with missing data for any dietary component were excluded.

The lifestyle-related OBS components comprised physical activity, BMI, alcohol intake, and smoking status. Following conventional scoring criteria, pro-oxidant factors were defined as total fat, iron, BMI, alcohol consumption, and smoking; all other components were considered antioxidant factors.

Physical activity was assessed in accordance with NHANES guidelines using a metabolic equivalent (MET) score [14]. This score incorporated occupational (both vigorous and moderate intensity) and leisure-time physical activity (vigorous and moderate), as well as walking or bicycling for transportation. The MET score was calculated as: weekly frequency × duration per session × recommended MET value for each activity.
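The MET calculation described above can be sketched as follows. This is an illustration only: the MET values (8.0 for vigorous activity, 4.0 for moderate activity and for walking or bicycling) are our assumption based on the suggested scores in the NHANES physical activity questionnaire documentation, since the text does not list them, and the domain names are hypothetical.

```python
# Assumed MET values per activity domain (not stated in the text above).
SUGGESTED_MET = {
    "vigorous_work": 8.0,
    "moderate_work": 4.0,
    "transport": 4.0,        # walking or bicycling for transportation
    "vigorous_leisure": 8.0,
    "moderate_leisure": 4.0,
}


def weekly_met_minutes(activities):
    """Sum frequency x duration x MET value over all reported domains.

    `activities` maps a domain name to (days per week, minutes per day).
    """
    total = 0.0
    for domain, (days_per_week, minutes_per_day) in activities.items():
        total += days_per_week * minutes_per_day * SUGGESTED_MET[domain]
    return total


# Hypothetical participant: moderate work 5 days x 60 min, active transport
# 3 days x 20 min, vigorous leisure activity 2 days x 30 min.
example = {
    "moderate_work": (5, 60),
    "transport": (3, 20),
    "vigorous_leisure": (2, 30),
}
example_score = weekly_met_minutes(example)
```

For this hypothetical participant the three domains contribute 1,200, 240, and 480 MET-minutes per week, respectively.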

Smoking status was objectively evaluated based on serum cotinine levels, the primary metabolite of nicotine. Alcohol intake was classified into three categories according to standard OBS scoring methods: heavy drinkers (≥ 15 g/day for women; ≥30 g/day for men), non-heavy drinkers (0–15 g/day for women; 0–30 g/day for men), and non-drinkers, assigned scores of 0, 1, and 2, respectively.
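The alcohol scoring rule can be written as a small function. In this sketch, treating an intake of exactly 0 g/day as non-drinking and an intake exactly at the threshold as heavy drinking follows our reading of the stated ranges; the function name and the sex labels are illustrative.

```python
def alcohol_score(grams_per_day, sex):
    """Return the OBS alcohol component: 0 = heavy, 1 = non-heavy, 2 = non-drinker."""
    if sex not in ("female", "male"):
        raise ValueError("sex must be 'female' or 'male'")
    if grams_per_day < 0:
        raise ValueError("intake cannot be negative")

    # Sex-specific heavy-drinking thresholds from the text (g/day).
    heavy_threshold = 15.0 if sex == "female" else 30.0

    if grams_per_day == 0:
        return 2
    if grams_per_day >= heavy_threshold:
        return 0
    return 1
```

For example, an intake of 20 g/day scores 0 for a woman but 1 for a man, because the heavy-drinking threshold differs by sex.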

All components were stratified by sex and grouped into tertiles. Antioxidant factors were scored from 0 to 2 (from the lowest to the highest tertile), whereas pro-oxidant factors were inversely scored from 2 to 0 (from the lowest to the highest tertile) [15]. A detailed summary of the OBS scoring protocol is provided in Supplementary Table S1. The variable names for each OBS component are listed in Supplementary Table S2.
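The tertile-based scoring can be illustrated with a short sketch. The cut-point method (Python's default `statistics.quantiles` interpolation), the handling of values that fall exactly on a cut point, the component names, and the toy records are all assumptions made for illustration; the protocol actually used is the one in Supplementary Table S1. Alcohol is omitted from the pro-oxidant set here because the text scores it by a separate categorical rule.

```python
from statistics import quantiles

# Components scored in reverse order (illustrative names).
PRO_OXIDANTS = {"total_fat", "iron", "bmi", "cotinine"}


def sex_specific_cut_points(records, component):
    """Return {sex: (lower cut, upper cut)} tertile boundaries for a component."""
    cuts = {}
    for sex in sorted({r["sex"] for r in records}):
        values = [r[component] for r in records if r["sex"] == sex]
        cuts[sex] = tuple(quantiles(values, n=3))
    return cuts


def tertile_index(value, cut_points):
    """Return 0, 1, or 2 for the lowest, middle, or highest tertile."""
    lower, upper = cut_points
    if value <= lower:
        return 0
    if value <= upper:
        return 1
    return 2


def component_score(component, value, cut_points):
    """Antioxidants score 0 to 2 by tertile; pro-oxidants score 2 to 0."""
    index = tertile_index(value, cut_points)
    return 2 - index if component in PRO_OXIDANTS else index


# Toy data: (fiber g/d, total fat g/d) for six women and six men.
records = [
    {"sex": "female", "fiber": f, "total_fat": t}
    for f, t in [(8, 50), (12, 60), (15, 70), (18, 80), (22, 90), (30, 110)]
] + [
    {"sex": "male", "fiber": f, "total_fat": t}
    for f, t in [(9, 60), (14, 75), (17, 85), (21, 95), (26, 105), (35, 130)]
]

fiber_cuts = sex_specific_cut_points(records, "fiber")
fat_cuts = sex_specific_cut_points(records, "total_fat")


def partial_obs(record):
    """Sum the two example components; the full OBS sums all components."""
    sex = record["sex"]
    return (component_score("fiber", record["fiber"], fiber_cuts[sex])
            + component_score("total_fat", record["total_fat"], fat_cuts[sex]))
```

In this toy example, a woman with the lowest fiber intake and the lowest fat intake receives 0 points for fiber and 2 points for total fat.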

To evaluate the robustness of the scoring system, a sensitivity analysis was performed by creating an alternative OBS using sex-specific tertiles for all components. The results from both scoring methods were highly consistent, confirming that our primary findings are not sensitive to the specific scoring approach and demonstrate reliable robustness.

Assessment of cardiovascular disease-cancer comorbidity

CVD was defined by meeting any of the following criteria: (1) an average systolic blood pressure ≥ 130 mmHg or diastolic blood pressure ≥ 85 mmHg across four measurements, (2) a self-reported physician diagnosis of congestive heart failure, coronary heart disease, angina, myocardial infarction, or stroke, or (3) a self-reported history of hypertension or current use of antihypertensive medication. Cancer status was confirmed by either a self-reported physician diagnosis, medical record documentation of cancer type, or use of antitumor drugs. Comorbidity was defined as the concurrent presence of both CVD and cancer in a participant [16].
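The case definitions above amount to "any-of" rules combined with a logical AND. A minimal sketch follows; the argument names are hypothetical and do not correspond to NHANES variable names.

```python
def has_cvd(mean_sbp, mean_dbp, cardiac_or_stroke_dx=False,
            hypertension_history=False, on_antihypertensives=False):
    """CVD per the text: elevated mean blood pressure OR any qualifying history."""
    elevated_bp = mean_sbp >= 130 or mean_dbp >= 85
    return bool(elevated_bp or cardiac_or_stroke_dx
                or hypertension_history or on_antihypertensives)


def has_cancer(reported_dx=False, record_documented=False,
               on_antitumor_drugs=False):
    """Cancer per the text: any one of the three sources is sufficient."""
    return bool(reported_dx or record_documented or on_antitumor_drugs)


def has_comorbidity(cvd, cancer):
    """Comorbidity requires both conditions to be present."""
    return cvd and cancer
```

Under these rules, a participant with a mean blood pressure of 132/80 mmHg and a self-reported cancer diagnosis would be classified as comorbid even without a reported cardiac diagnosis.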

Covariates

This study included the following covariates based on established methodologies: age, sex, marital status (married vs. unmarried), race/ethnicity (Mexican American; non-Hispanic White; non-Hispanic Black; other races), education level (< high school diploma vs. ≥ high school), poverty-income ratio (PIR), body mass index (BMI), hyperlipidemia history, and diabetes status. Poverty status was determined from the PIR: poor (PIR < 1.0) and non-poor (PIR ≥ 1.0). Diabetes was defined as meeting any of the following criteria: (1) fasting plasma glucose (FPG) ≥ 7 mmol/L; (2) glycated hemoglobin (HbA1c) ≥ 6.5%; (3) a clinically diagnosed history of diabetes; or (4) current use of glucose-lowering medication (oral hypoglycemic agents and/or insulin therapy). Hyperlipidemia was defined as meeting any of the following criteria: (1) total cholesterol ≥ 200 mg/dL; (2) fasting triglycerides ≥ 150 mg/dL; (3) HDL cholesterol below sex-specific thresholds (females < 50 mg/dL, males < 40 mg/dL); (4) LDL cholesterol ≥ 130 mg/dL; or (5) clinician-diagnosed hyperlipidemia.
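The hyperlipidemia definition includes a sex-specific HDL threshold and must tolerate missing laboratory values. One way to express it is sketched below; treating a missing measurement as "criterion not met" is our assumption, and the argument names are illustrative.

```python
def has_hyperlipidemia(sex, total_chol=None, triglycerides=None,
                       hdl=None, ldl=None, diagnosed=False):
    """Return True if any criterion is met. Lipids in mg/dL; None = missing."""
    if sex not in ("female", "male"):
        raise ValueError("sex must be 'female' or 'male'")

    # Low HDL is defined relative to a sex-specific cutoff.
    hdl_cutoff = 50 if sex == "female" else 40

    criteria = [
        total_chol is not None and total_chol >= 200,
        triglycerides is not None and triglycerides >= 150,
        hdl is not None and hdl < hdl_cutoff,
        ldl is not None and ldl >= 130,
        diagnosed,
    ]
    return any(criteria)
```

With this rule, an HDL cholesterol of 45 mg/dL meets the definition for a woman but not for a man.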

Assessment of mortality

We used the NHANES Public-Use Linked Mortality File, which employs probabilistic matching to National Death Index (NDI) records for death ascertainment. Cause-specific mortality classification in the NDI demonstrates high validity, with minimal misclassification risk based on established validation studies.

Machine learning model development and evaluation

To predict cardiovascular disease-cancer comorbidity, we developed and compared five machine learning algorithms (RPART, random forest, k-nearest neighbors, naïve Bayes, and LightGBM) using a dataset of 19,424 participants with 6% comorbidity prevalence. To address significant class imbalance, we implemented the synthetic minority oversampling technique (SMOTE) during cross-validation and applied class weighting strategies. Model development incorporated ten repetitions of 10-fold cross-validation with hyperparameter optimization via Bayesian/grid search, evaluating performance through comprehensive metrics including AUC-ROC, precision-recall AUC, sensitivity, specificity, F1-score, and Brier score. The optimal model was selected based on both discriminative ability and calibration performance, with model interpretability enhanced through SHAP analysis to quantify individual predictor contributions and ensure clinical relevance. A list of the variable names included in the SHAP plot is provided in Supplementary Table S3.
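The resampling scheme (ten repetitions of 10-fold cross-validation under class imbalance) can be sketched in plain Python. This is not the authors' pipeline: stratification by outcome, the round-robin fold assignment, and the toy labels are assumptions for illustration. In such a design, oversampling such as SMOTE would normally be applied only to the training indices of each split, so that test folds keep the original class distribution.

```python
import random


def stratified_repeated_kfold(labels, n_splits=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx), preserving class proportions."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        # Shuffle within each class, then deal indices to folds in turn.
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            for position, idx in enumerate(shuffled):
                folds[position % n_splits].append(idx)

        for fold_id, test_idx in enumerate(folds):
            test_set = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in test_set]
            yield repeat, fold_id, train_idx, sorted(test_idx)


# Toy labels with 6% positives, loosely mirroring the reported prevalence.
labels = [1] * 12 + [0] * 188
splits = list(stratified_repeated_kfold(labels))
```

Ten repetitions of ten folds produce 100 train/test splits, and every test fold in this toy example contains at least one positive case.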

Statistical analysis

To account for the complex, multistage probability sampling design of the NHANES, multi-cycle merged sample weights were applied to all analyses to ensure nationally representative estimates of health statistics. Continuous variables were summarized as weighted means ± standard deviation (Mean ± SD), and between-group comparisons were performed using weighted analysis of variance (ANOVA). Categorical variables were reported as weighted frequencies and percentages, with intergroup differences assessed using weighted chi-square tests.

A weighted multivariate Cox proportional hazards model was used to assess the association between the OBS tertiles and all-cause mortality in patients with cardiovascular disease-cancer comorbidity. Survival differences across OBS tertiles were evaluated using Kaplan-Meier curves with stratified log-rank tests. We implemented a sequential adjustment approach in three nested models: Model 1 was unadjusted; Model 2 added demographic covariates (age, gender, race, marital status, education level, and poverty-income ratio); and Model 3 further incorporated clinical comorbidities (hypertension, hyperlipidemia, and diabetes). Stratified analyses were conducted to assess potential effect modification by demographic and clinical characteristics, including age, gender, race, education level, marital status, poverty-income ratio, body mass index, hypertension, hyperlipidemia, and diabetes. Formal interaction testing was performed by introducing product terms between OBS and each stratification variable. Sensitivity analyses were conducted to examine the robustness of the primary findings.

All analyses were conducted using R software (version 4.4.0), with a statistical significance threshold set at p < 0.05.

Results

Baseline characteristics of participants

The baseline characteristics of the 19,424 included participants, of whom 1,215 had cardiovascular disease-cancer comorbidity, are shown in Table 1. Most participants were married (63.51%), non-Hispanic White (68.29%), and had completed high school or higher education (87.40%). Participants were stratified into tertiles based on OBS levels. Notably, compared with those in the Q3 group of the OBS, participants in the Q1 group were more likely to have lower income and educational attainment, a higher body mass index, and to be current smokers. They also exhibited higher prevalence rates of cardiovascular disease–cancer comorbidity, diabetes, and hyperlipidemia.

Table 1. Baseline characteristics of all participants

| Characteristic | Overall (N = 19,424) | Q1 (N = 7,562) | Q2 (N = 5,992) | Q3 (N = 5,870) | p-value |
|---|---|---|---|---|---|
| Gender, n (%) | | | | | < 0.001 |
| Male | 9,930 (50.78%) | 2,822 (34.57%) | 3,149 (51.14%) | 3,959 (67.06%) | |
| Female | 9,494 (49.22%) | 4,740 (65.43%) | 2,843 (48.86%) | 1,911 (32.94%) | |
| Age, n (%) | | | | | 0.002 |
| 20–40 | 7,445 (41.35%) | 2,767 (39.68%) | 2,284 (40.25%) | 2,394 (44.07%) | |
| 41–60 | 6,695 (37.18%) | 2,530 (36.87%) | 2,086 (37.99%) | 2,079 (36.76%) | |
| > 60 | 5,284 (21.47%) | 2,265 (23.45%) | 1,622 (21.76%) | 1,397 (19.17%) | |
| Race, n (%) | | | | | < 0.001 |
| Mexican American | 2,793 (8.38%) | 1,003 (8.17%) | 864 (8.56%) | 926 (8.43%) | |
| Non-Hispanic Black | 3,881 (10.16%) | 2,003 (14.99%) | 1,078 (9.11%) | 800 (6.16%) | |
| Non-Hispanic White | 8,485 (68.29%) | 3,019 (64.22%) | 2,696 (68.90%) | 2,770 (71.90%) | |
| Other | 4,265 (13.17%) | 1,537 (12.61%) | 1,354 (13.42%) | 1,374 (13.52%) | |
| Education level, n (%) | | | | | < 0.001 |
| Below high school | 3,894 (12.60%) | 1,886 (16.90%) | 1,099 (11.59%) | 909 (9.11%) | |
| High school or above | 15,530 (87.40%) | 5,676 (83.10%) | 4,893 (88.41%) | 4,961 (90.89%) | |
| PIR, n (%) | | | | | < 0.001 |
| Not poor | 15,919 (87.48%) | 5,865 (82.97%) | 5,016 (89.18%) | 5,038 (90.55%) | |
| Poor | 3,505 (12.52%) | 1,697 (17.03%) | 976 (10.82%) | 832 (9.45%) | |
| Marital status, n (%) | | | | | < 0.001 |
| Married | 11,794 (63.51%) | 4,122 (57.39%) | 3,804 (66.56%) | 3,868 (66.97%) | |
| Unmarried | 7,630 (36.49%) | 3,440 (42.61%) | 2,188 (33.44%) | 2,002 (33.03%) | |
| Hypertension, n (%) | | | | | < 0.001 |
| No | 10,259 (58.62%) | 3,661 (55.10%) | 3,159 (57.29%) | 3,439 (63.46%) | |
| Yes | 9,165 (41.38%) | 3,901 (44.90%) | 2,833 (42.71%) | 2,431 (36.54%) | |
| Hyperlipidemia, n (%) | | | | | < 0.001 |
| No | 6,616 (33.79%) | 2,370 (30.51%) | 1,991 (32.87%) | 2,255 (38.01%) | |
| Yes | 12,808 (66.21%) | 5,192 (69.49%) | 4,001 (67.13%) | 3,615 (61.99%) | |
| Diabetes, n (%) | | | | | < 0.001 |
| No | 16,466 (89.36%) | 6,167 (86.92%) | 5,132 (89.26%) | 5,167 (91.94%) | |
| Yes | 2,958 (10.64%) | 1,395 (13.08%) | 860 (10.74%) | 703 (8.06%) | |
| CC, n (%) | | | | | 0.050 |
| No | 18,209 (93.83%) | 7,054 (93.29%) | 5,635 (93.51%) | 5,520 (94.67%) | |
| Yes | 1,215 (6.17%) | 508 (6.71%) | 357 (6.49%) | 350 (5.33%) | |
| CVD, n (%) | | | | | < 0.001 |
| No | 10,098 (57.87%) | 3,592 (54.36%) | 3,104 (56.43%) | 3,402 (62.79%) | |
| Yes | 9,326 (42.13%) | 3,970 (45.64%) | 2,888 (43.57%) | 2,468 (37.21%) | |
| Cancer, n (%) | | | | | 0.331 |
| No | 17,683 (90.17%) | 6,859 (89.54%) | 5,478 (90.26%) | 5,346 (90.73%) | |
| Yes | 1,741 (9.83%) | 703 (10.46%) | 514 (9.74%) | 524 (9.27%) | |
| BMI (kg/m²) | 28.75 ± 6.58 | 29.87 ± 6.98 | 28.77 ± 6.66 | 27.58 ± 5.85 | < 0.001 |
| Fiber (g/d) | 17.71 ± 9.31 | 10.93 ± 4.54 | 16.96 ± 6.00 | 25.34 ± 9.76 | < 0.001 |
| Calcium (mg/d) | 989.85 ± 507.02 | 654.15 ± 273.62 | 947.97 ± 359.14 | 1,372.55 ± 545.94 | < 0.001 |
| Zinc (mg/d) | 11.72 ± 6.35 | 7.60 ± 3.26 | 11.28 ± 4.52 | 16.34 ± 7.10 | < 0.001 |
| Copper (mg/d) | 1.31 ± 0.76 | 0.84 ± 0.32 | 1.26 ± 0.53 | 1.84 ± 0.92 | < 0.001 |
| Selenium (mcg/d) | 116.47 ± 53.49 | 81.33 ± 28.93 | 114.22 ± 38.21 | 154.57 ± 59.17 | < 0.001 |
| Magnesium (mg/d) | 312.01 ± 135.30 | 201.66 ± 59.57 | 301.90 ± 70.68 | 434.46 ± 134.54 | < 0.001 |
| Vitamin C (mg/d) | 83.35 ± 75.47 | 48.87 ± 47.73 | 79.46 ± 64.60 | 122.28 ± 88.62 | < 0.001 |
| Vitamin E (mg/d) | 8.93 ± 5.80 | 5.36 ± 2.48 | 8.46 ± 3.86 | 13.01 ± 7.00 | < 0.001 |
| Vitamin B12 (mcg/d) | 5.24 ± 4.79 | 3.10 ± 2.20 | 5.07 ± 4.10 | 7.60 ± 6.07 | < 0.001 |
| Vitamin B6 (mg/d) | 2.20 ± 1.44 | 1.35 ± 0.72 | 2.15 ± 1.32 | 3.13 ± 1.53 | < 0.001 |
| Carotene (RE/d) | 2,397.00 ± 3,899.74 | 1,244.90 ± 1,861.41 | 2,343.17 ± 2,985.77 | 3,627.74 ± 5,491.79 | < 0.001 |
| Riboflavin (mg/d) | 2.23 ± 1.33 | 1.44 ± 0.76 | 2.17 ± 1.05 | 3.09 ± 1.50 | < 0.001 |
| Niacin (mg/d) | 26.57 ± 13.43 | 17.75 ± 6.86 | 26.21 ± 11.13 | 35.95 ± 14.20 | < 0.001 |
| Total folate (mcg/d) | 415.51 ± 222.43 | 259.94 ± 97.74 | 393.10 ± 135.39 | 595.62 ± 247.64 | < 0.001 |
| Total fat (g/d) | 82.27 ± 38.09 | 59.70 ± 23.92 | 81.82 ± 29.04 | 105.82 ± 42.81 | < 0.001 |
| Iron (mg/d) | 15.18 ± 7.75 | 10.04 ± 3.92 | 14.48 ± 4.76 | 21.10 ± 8.82 | < 0.001 |
| Cotinine (ng/ml) | 52.77 ± 123.16 | 74.09 ± 140.24 | 50.05 ± 123.44 | 33.39 ± 98.23 | < 0.001 |
| Alcohol (g/d) | 10.01 ± 22.20 | 7.91 ± 18.53 | 11.46 ± 25.97 | 10.83 ± 21.72 | < 0.001 |
| Physical activity (minute/week) | 5,087.68 ± 12,095.32 | 4,665.17 ± 12,471.26 | 4,903.03 ± 12,998.46 | 5,690.61 ± 10,745.40 | < 0.001 |

Continuous variables are presented as mean ± SD; categorical variables are presented as n (%). SD: standard deviation; CC: cardiovascular disease and cancer comorbidity; CVD: cardiovascular disease; PA: physical activity; BMI: body mass index; PIR: poverty-income ratio

Association between OBS and all-cause mortality in the CVD-cancer population

Multivariable Cox regression analyses were performed to evaluate the association between the OBS and all-cause mortality (Fig. 2). When treated as a continuous variable in the fully adjusted model (Model 3), each one-unit increase in the OBS was associated with a 5% reduction in the risk of all-cause mortality (HR = 0.95, 95% CI: 0.91–0.98, p = 0.006). Similarly, when analyzed as tertiles, participants in the highest tertile (Q3) had a significantly lower risk of mortality, by 22%, than those in the lowest tertile (Q1) (HR = 0.78, 95% CI: 0.64–0.95, p = 0.016).
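The percentage interpretations of the hazard ratios reported above follow from (HR − 1) × 100. The sketch below also shows how an approximate standard error and Wald p-value can be recovered from a reported 95% CI; because the published HRs and CIs are rounded, such back-calculated p-values are only approximate. The function names are illustrative, and the multi-unit calculation assumes a log-linear effect of the OBS.

```python
import math


def percent_risk_change(hr):
    """Percent change in hazard relative to the reference (negative = reduction)."""
    return (hr - 1.0) * 100.0


def log_hr_standard_error(ci_lower, ci_upper, z=1.959964):
    """Approximate SE of log(HR) from a 95% CI, which is symmetric on the log scale."""
    return (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z)


def wald_p_value(hr, ci_lower, ci_upper):
    """Two-sided normal-approximation p-value for the null hypothesis HR = 1."""
    z_stat = math.log(hr) / log_hr_standard_error(ci_lower, ci_upper)
    return math.erfc(abs(z_stat) / math.sqrt(2.0))


def hr_for_units(per_unit_hr, units):
    """HR for a multi-unit difference, assuming a log-linear effect."""
    return per_unit_hr ** units


per_unit_change = percent_risk_change(0.95)      # continuous OBS, Model 3
tertile_change = percent_risk_change(0.78)       # Q3 vs. Q1, Model 3
approx_p_continuous = wald_p_value(0.95, 0.91, 0.98)
five_unit_hr = hr_for_units(0.95, 5)
```

Under the log-linear assumption, a five-point higher OBS would correspond to an HR of roughly 0.77, which is of similar magnitude to the Q3 versus Q1 estimate.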

Fig. 2. Association between OBS and CVD-cancer comorbidity. Model 1: unadjusted model. Model 2: adjusted for age, sex, race, PIR, marital status, and education level. Model 3: additionally adjusted for diabetes, hyperlipidemia, and hypertension

Survival patterns of CVD-cancer populations in different tertiles of OBS score

Kaplan-Meier curves demonstrated a significant association between OBS tertiles and clinical outcomes. As depicted in Fig. 3A, individuals in the highest OBS tertile (Q3) experienced superior overall survival compared to those in the lowest tertile (Q1) (p = 0.005). In a complementary fashion, analysis of the composite cardiovascular-cancer endpoint showed a significantly elevated cumulative incidence in the Q1 group relative to the Q3 group (p < 0.005, Fig. 3B).

Fig. 3. (A) Kaplan-Meier curve of OBS level and the incidence of CVD-cancer comorbidity. (B) Cumulative incidence curves of OBS levels and the incidence of CVD-cancer comorbidity

RCS curve regression results

Using restricted cubic splines (Fig. 4), we assessed the dose-response relationship between OBS and all-cause mortality in patients with cardiovascular disease-cancer comorbidity. The analysis found no evidence of a nonlinear relationship (p for nonlinearity = 0.799).

Fig. 4. Restricted cubic spline plot of the association between OBS and CVD-cancer comorbidity. (A) The association between OBS and CVD-cancer comorbidity. (B) The analysis was adjusted for gender, age, race, marital status, education levels, PIR and BMI. (C) The analysis was adjusted for gender, age, marital status, education levels, PIR, diabetes, hyperlipidemia and hypertension

Subgroup analysis

To evaluate the consistency of the association between the OBS and mortality in patients with CVD-cancer comorbidity, we performed subgroup analyses and interaction tests by age, gender, race, marital status, education, income, diabetes, hyperlipidemia, and hypertension. The association between OBS and all-cause mortality was generally consistent across all predefined subgroups, with no statistically significant interaction detected (all p for interaction > 0.05) (Fig. 5).

Fig. 5. Subgroup analysis of the associations between OBS and CVD-cancer comorbidity. The analysis was adjusted for gender, age, race, marital status, education, PIR, diabetes, hyperlipidemia and hypertension

Sensitivity analysis

The significant inverse association between the OBS and all-cause mortality proved robust in sensitivity analyses that excluded early deaths. Further analysis sequentially omitting each OBS component confirmed that the overall association was not dependent on any single element, with most components demonstrating significant individual protective effects (Supplementary Table S4).

Machine learning results

To comprehensively evaluate the performance of the machine learning models, we applied a multidimensional assessment framework (Table 2). The random forest (RF) model exhibited strong performance across key metrics, achieving the lowest classification error rate (0.091), the highest area under the receiver operating characteristic curve (AUC = 0.825) (Fig. 6A), as well as the highest accuracy (0.908) and specificity (0.959). These results indicate excellent discriminative ability for negative cases and high overall predictive reliability. The LightGBM model also performed competitively, attaining the highest F1-score (0.313, tied with RPART) and the largest area under the precision-recall curve (0.241), reflecting its balanced performance in identifying positive cases and handling class imbalance.

Table 2. Evaluation metrics of the models constructed by each method

| Machine learner | Classification error rate | ACC | AUC | F1 score | Sensitivity | Specificity | Brier score | Area under the PR curve |
|---|---|---|---|---|---|---|---|---|
| RPART | 0.169 | 0.831 | 0.819 | 0.313 | 0.618 | 0.845 | 0.117 | 0.194 |
| RF | 0.091 | 0.908 | 0.825 | 0.160 | 0.140 | 0.959 | 0.071 | 0.182 |
| K–KNN | 0.300 | 0.699 | 0.769 | 0.225 | 0.700 | 0.701 | 0.196 | 0.154 |
| NB | 0.332 | 0.668 | 0.741 | 0.217 | 0.739 | 0.663 | 0.250 | 0.162 |
| LightGBM | 0.169 | 0.830 | 0.820 | 0.313 | 0.621 | 0.844 | 0.103 | 0.241 |

AUC: area under the curve; ACC: accuracy; RPART: recursive partitioning and regression trees; RF: random forest; K–KNN: kernel k-nearest neighbors; NB: naïve Bayes; LightGBM: light gradient boosting machine
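The threshold-based metrics in Table 2 are linked through the confusion matrix, so accuracy and F1 can be approximated from sensitivity, specificity, and class counts. The sketch below does this for the RF row using the unweighted counts from Table 1 (1,215 comorbid and 18,209 non-comorbid participants). It is only a consistency check under the assumption that all metrics refer to the same classification threshold; the published values are cross-validated averages, so exact agreement is not expected.

```python
def confusion_counts(sensitivity, specificity, n_pos, n_neg):
    """Expected (possibly fractional) confusion-matrix cells from rates and counts."""
    tp = sensitivity * n_pos
    fn = n_pos - tp
    tn = specificity * n_neg
    fp = n_neg - tn
    return tp, fp, tn, fn


def summary_metrics(sensitivity, specificity, n_pos, n_neg):
    """Return accuracy, precision, and F1 implied by the given rates."""
    tp, fp, tn, fn = confusion_counts(sensitivity, specificity, n_pos, n_neg)
    accuracy = (tp + tn) / (n_pos + n_neg)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) > 0 else 0.0
    return {"accuracy": accuracy, "precision": precision, "f1": f1}


# RF row of Table 2: sensitivity 0.140, specificity 0.959.
rf_metrics = summary_metrics(0.140, 0.959, n_pos=1_215, n_neg=18_209)
```

The implied accuracy and F1 for RF are close to the tabulated 0.908 and 0.160, which shows how high accuracy can coexist with low sensitivity when the outcome is rare.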

Fig. 6. (A) Comparison of receiver operating characteristic curves with five machine learning models in predicting CVD-cancer comorbidity. (B) Calibration plots for machine learning models predicting cardiovascular and cancer comorbidity. (C) Decision curve analysis comparing machine learning models for predicting cardiovascular and cancer comorbidity

Both ensemble methods showed relatively good calibration, as evidenced by low Brier scores (RF: 0.071; LightGBM: 0.103). The calibration curve further confirmed good calibration performance for the RF model (Fig. 6B). Moreover, decision‑curve analysis (DCA) demonstrated that the RF and LightGBM models provided higher net benefits across the entire threshold probability range compared with the “intervene‑all” or “intervene‑none” strategies (Fig. 6C).
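For readers less familiar with these measures, the Brier score and the net benefit used in decision curve analysis can be computed as sketched below. The formulas are the standard definitions (mean squared difference between predicted probability and outcome; true positives minus false positives weighted by the odds of the threshold probability). The toy outcomes and probabilities are invented for illustration and are unrelated to the study data.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening when predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_intervene_all(y_true, threshold):
    """Net benefit of the 'intervene-all' strategy at the same threshold."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy example: 2 events among 10 individuals.
y_true = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.8, 0.1, 0.2, 0.6, 0.3, 0.05, 0.1, 0.4, 0.2, 0.1]

toy_brier = brier_score(y_true, y_prob)
toy_nb_model = net_benefit(y_true, y_prob, threshold=0.5)
toy_nb_all = net_benefit_intervene_all(y_true, threshold=0.5)
```

The "intervene-none" strategy has a net benefit of zero at every threshold, so a model is useful at a given threshold only if its net benefit exceeds both zero and the "intervene-all" value.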

Feature importance and the role of features in the model

Supplementary Figure S1 illustrates the correlation analysis among the variables included in the model. SHAP analysis was used to interpret the model, with Fig. 7 displaying the top 15 features and their contributions to prediction outcomes. Each point in the figure represents an individual sample, and its horizontal position reflects the SHAP value of the corresponding feature for the sample (i.e., the degree of influence). The analysis results show that age is an important contributing factor to the prediction results. Among the lifestyle factors, smoking status is also a significant feature. The model analysis indicates that the intake of dietary niacin and selenium is associated with a reduced individual risk. In addition, we plotted scatter plots of SHAP values against each variable (including antioxidant and pro-oxidant factors) to assess the correlation (Supplementary Figure S2). The results show that age, fat intake, copper level, cotinine concentration, vitamin E level, and vitamin C level are positively correlated with their corresponding SHAP values.

Fig. 7. Contribution of variables to CVD-cancer comorbidity incidence prediction using SHAP values. (A) The heat plot of SHAP values illustrates the relationships between variables and CVD-cancer comorbidity. (B) The bar plot shows each variable’s contribution to CVD-cancer comorbidity, with bar length indicating the contribution extent. (C) SHAP force plot

Discussion

To our knowledge, this is among the first studies to comprehensively evaluate the association between the OBS and survival outcomes among individuals with coexisting CVD and cancer, utilizing a nationally representative sample from six NHANES cycles (2007–2018). Our analysis demonstrated a significant inverse association between a higher OBS and lower all-cause mortality risk in this comorbidity population. This association remained consistent across multiple subgroup analyses.

CVD and cancer are leading causes of death worldwide [17]. Evidence shows that patients with cardiovascular diseases and heart failure (HF) have an increased incidence of cancer [18, 19]. Cancer patients, especially long-term survivors, exhibit higher susceptibility to cardiovascular diseases, which may be related to cardiotoxic treatments, shared risk factors, or common pathological biological mechanisms [20]. Oxidative stress is a key pathogenic link between the two diseases [21]. In cardiovascular diseases, it disrupts redox homeostasis—that is, the balance between reactive oxygen species (ROS) and antioxidant defenses—leading to biomolecular oxidative damage and cellular dysfunction [22]. Similarly, cancer cells also show elevated ROS levels and redox imbalance [22, 23]. An inflammatory diet may promote oxidative stress and systemic inflammation, thereby increasing oxidative stress levels in the body. Therefore, maintaining low systemic inflammation and reducing oxidative stress are crucial for lowering the risk of cardiovascular disease-cancer comorbidity-related mortality [24, 25].

The OBS provides a robust composite measure of systemic oxidative status by integrating multiple pro-oxidant and antioxidant factors. Unlike isolated biomarkers that capture only transient oxidative changes [26], OBS reflects the cumulative and interactive effects of dietary and lifestyle exposures, thereby enabling a more comprehensive assessment of redox balance.

The adoption of IoT (Internet of Things) devices [27], such as wearable sensors and implantable monitors, is transforming the management of cardiovascular disease and cancer by enabling continuous, real-time tracking of vital signs and biomarkers. This capability supports the early detection of clinical deterioration and enhanced monitoring of treatment-related side effects, promoting a shift toward personalized and proactive healthcare. Building upon these technological advances, ML algorithms have become a cornerstone of clinical decision-making, providing critical insights to enhance diagnostic and prognostic accuracy [28–30]. A key advantage of ML models is their capacity to integrate multifaceted input variables, thereby enabling a more comprehensive assessment of risk profiles. Unlike traditional statistical methods, which often identify only a limited set of independent predictors, data-driven ML techniques can uncover complex, synergistic relationships among multiple disease-influencing factors. This capability offers a broader and more in-depth panoramic view of risk composition [31, 32].

Recent studies have demonstrated the utility of ML-based predictive models in identifying nutrient-associated CVD risk profiles and evaluating the impact of lifestyle behaviors on cardiovascular and all-cause mortality [33, 34]. In the present study, we employed RPART, RF, K-NN, NB, and LightGBM algorithms to construct predictive models for CVD-cancer comorbidity. Among these, the RF model outperformed the other candidates on key metrics, including discriminative ability and calibration, achieving an AUC of 0.825. These findings underscore the benefit of incorporating the OBS for predicting CVD-cancer comorbidity, as it provides a more holistic representation of an individual’s health status.

To elucidate the predictive mechanism of the optimal model and identify the most influential risk variables, we performed SHAP analysis to quantify the contribution of each variable to the combined risk of cardiovascular disease and cancer. While the analysis confirmed age as the most significant predictor—with SHAP values increasing markedly with advancing age, indicating a strong positive association with adverse outcomes—this finding was anticipated and aligns with well-established biological mechanisms. These include NLRP3 inflammasome activation [35], oxidative stress induced by mitochondrial dysfunction, immune dysregulation [36], and persistent low-grade inflammation, all of which are promoted by aging and contribute to both cardiovascular and oncologic pathogenesis. Notably, beyond this expected association, the SHAP analysis provided deeper mechanistic insights by revealing specific, non-linear risk transitions across the age continuum and highlighting interactive effects between age and other biomarkers, such as those reflected in the OBS. This underscores the role of oxidative stress as a central driver interlinking aging-related pathophysiology and carcinogenesis, whether as a cause or consequence of mitochondrial impairment.

Niacin, a B-group vitamin, contributes to metabolic regulation and vascular protection. Evidence from experimental models of atherosclerosis suggests that niacin can suppress the expression of inflammatory cytokines, inhibit NF-κB pathway activation, and attenuate apoptosis in vascular smooth muscle cells [37]. Furthermore, epidemiological studies conducted in other populations indicate that higher dietary niacin intake is associated with reduced all-cause and cardiovascular mortality, effects potentially mediated through its antioxidant, anti-inflammatory, and anti-apoptotic properties [38, 39]. This external evidence provides a plausible mechanistic framework for, but does not directly prove, the predictive role of niacin observed in our cohort.

Similarly, the predictive importance of selenium in our model is consistent with its well-characterized biological functions. Selenium exerts its antioxidant effects primarily through its incorporation into selenoproteins, such as glutathione peroxidase [40], which is critical for regulating reactive oxygen and nitrogen species and maintaining redox homeostasis. Epidemiological observations have independently linked selenium deficiency to an elevated risk of neurodegenerative disorders, cardiovascular diseases, and certain cancers [41, 42]. Thus, the association captured by our model aligns with this pre-existing biological and epidemiological knowledge [43], reinforcing the plausibility of selenium’s role while highlighting the need for further targeted research to confirm a causal relationship in the context of comorbidity.

Several important limitations of this study should be acknowledged. First, while the overall NHANES sample is substantial, the number of participants with specific CVD-cancer comorbidity was relatively limited, which may restrict the statistical power for detecting more subtle associations and increase the risk of model overfitting, despite our use of cross-validation techniques. Second, the cross-sectional design of the NHANES database precludes the establishment of temporal sequence, limiting causal inference between the OBS and the comorbidity outcome. The absence of longitudinal data also restricts our ability to observe the long-term dynamics of oxidative balance and its cumulative health impacts. Third, although we adjusted for numerous covariates, residual confounding remains possible due to unmeasured or imperfectly captured factors, such as detailed socioeconomic indicators, environmental exposures, and specific health behaviors. Furthermore, key clinical variables, including specific medication use, treatment regimens, and therapy adherence, were not available and may introduce bias into risk estimations. Fourth, the OBS calculation was constrained by available data and did not encompass all known dietary and lifestyle components related to oxidative stress. Fifth, the primary reliance on self-reported diagnoses for CVD and cancer, without systematic verification against clinical records, may lead to misclassification bias. Finally, our findings are derived from a U.S. multi-ethnic population, and their generalizability to other populations with differing genetic backgrounds, dietary patterns, and healthcare contexts requires external validation.

Conclusion

This study found that a higher OBS was associated with a lower risk of all-cause mortality in patients with cardiovascular disease-cancer comorbidity. The findings underscore the potential importance of an antioxidant-rich diet and lifestyle in this high-risk population. However, it is crucial to recognize that factors such as socioeconomic status, underlying health conditions, or unmeasured variables may have influenced the observed outcomes. Although assessing oxidative balance via OBS shows promise, further research, particularly prospective studies and external validation, is essential to confirm the association and explore its potential clinical applicability.

Supplementary Information

Supplementary Material 1. (10.8KB, docx)
Supplementary Material 4. (14.3KB, docx)

Acknowledgements

The authors sincerely thank the NHANES team for providing valuable survey data. Thanks to all who participated in this study.

Abbreviations

OBS

Oxidative Balance Score

CVD

Cardiovascular disease

NHANES

National Health and Nutrition Examination Survey

RCS

Restricted cubic spline

BMI

Body mass index

RIP

Family income to poverty

SD

Standard deviation

ML

Machine Learning

AUC

Area under the curve

ACC

Accuracy

ROC

Receiver Operating Characteristic

RPART

Recursive partitioning and regression trees

RF

Random forest

K–KNN

Kernel k-nearest neighbors

NB

Naïve Bayes

LightGBM

Light gradient boosting machine

SHAP

SHapley Additive exPlanations

DCA

Decision curve analysis

ROS

Reactive oxygen species

IoT

Internet of Things

Author contributions

FL designed research; JW conducted research; ASW analyzed data; CYJ and LSP wrote the paper. YYY and YZL had primary responsibility for final content. All authors read and approved the final manuscript.

Funding

The study was supported by the General Project of Health and Medical Research in Hunan Province (No. 20254420) and Changsha Natural Science Foundation (No. kq2502274).

Data availability

The National Health and Nutrition Examination Survey data set is publicly available at the National Center for Health Statistics of the Centers for Disease Control and Prevention.

Declarations

Ethics approval and consent to participate

NHANES is conducted by the Centers for Disease Control and Prevention (CDC) and the National Center for Health Statistics (NCHS). The NCHS Research Ethics Review Board reviewed and approved the NHANES study protocol. All participants signed a written informed consent form.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Fen Liu and Jian Wang contributed equally to this work and should be considered co-first authors.

Contributor Information

Zheng-Yu Liu, Email: liuzhengyu@hunnu.edu.cn.

Ya-Yu You, Email: youyayu2333@hunnu.edu.cn.

References

  • 1. de Boer RA, Meijers WC, van der Meer P, van Veldhuisen DJ. Cancer and heart disease: associations and relations. Eur J Heart Fail. 2019;21:1515–25. 10.1002/ejhf.1539.
  • 2. Raisi-Estabragh Z, Manisty CH, Cheng RK, Lopez Fernandez T, Mamas MA. Burden and prognostic impact of cardiovascular disease in patients with cancer. Heart. 2023;109:1819–26. 10.1136/heartjnl-2022-321324.
  • 3. Aboumsallem JP, Moslehi J, de Boer RA. Reverse cardio-oncology: cancer development in patients with cardiovascular disease. J Am Heart Assoc. 2020;9:e013754. 10.1161/jaha.119.013754.
  • 4. Libby P, Kobold S. Inflammation: a common contributor to cancer, aging, and cardiovascular diseases-expanding the concept of cardio-oncology. Cardiovasc Res. 2019;115:824–9. 10.1093/cvr/cvz058.
  • 5. Hasin T, Gerber Y, Weston SA, Jiang R, Killian JM, Manemann SM, et al. Heart failure after myocardial infarction is associated with increased risk of cancer. J Am Coll Cardiol. 2016;68:265–71. 10.1016/j.jacc.2016.04.053.
  • 6. Masoudkabir F, Mohammadifard N, Mani A, Ignaszewski A, Davis MK, Vaseghi G, et al. Shared lifestyle-related risk factors of cardiovascular disease and cancer: evidence for joint prevention. Sci World J. 2023;2023:2404806. 10.1155/2023/2404806.
  • 7. Di Fusco SA, Cianfrocca C, Bisceglia I, Spinelli A, Alonzo A, Mocini E, et al. Potential pathophysiologic mechanisms underlying the inherent risk of cancer in patients with atherosclerotic cardiovascular disease. Int J Cardiol. 2022;363:190–5. 10.1016/j.ijcard.2022.06.048.
  • 8. Barrera G. Oxidative stress and lipid peroxidation products in cancer progression and therapy. ISRN Oncol. 2012;2012:137289. 10.5402/2012/137289.
  • 9. Incalza MA, D’Oria R, Natalicchio A, Perrini S, Laviola L, Giorgino F. Oxidative stress and reactive oxygen species in endothelial dysfunction associated with cardiovascular and metabolic diseases. Vascul Pharmacol. 2018;100:1–19. 10.1016/j.vph.2017.05.005.
  • 10. Hernández-Ruiz Á, García-Villanova B, Guerra-Hernández EJ, Carrión-García CJ, Amiano P, Sánchez MJ, et al. Oxidative balance scores (OBSs) integrating nutrient, food and lifestyle dimensions: development of the NutrientL-OBS and FoodL-OBS. Antioxidants. 2022. 10.3390/antiox11020300.
  • 11. Goodman M, Bostick RM, Dash C, Flanders WD, Mandel JS. Hypothesis: oxidative stress score as a combined measure of pro-oxidant and antioxidant exposures. Ann Epidemiol. 2007;17:394–9. 10.1016/j.annepidem.2007.01.034.
  • 12. Hasani M, Alinia SP, Khazdouz M, Sobhani S, Mardi P, Ejtahed HS, et al. Oxidative balance score and risk of cancer: a systematic review and meta-analysis of observational studies. BMC Cancer. 2023;23:1143. 10.1186/s12885-023-11657-w.
  • 13. Xu Z, Liu D, Zhai Y, Tang Y, Jiang L, Li L, et al. Association between the oxidative balance score and all-cause and cardiovascular mortality in patients with diabetes and prediabetes. Redox Biol. 2024;76:103327. 10.1016/j.redox.2024.103327.
  • 14. Lei X, Xu Z, Chen W. Association of oxidative balance score with sleep quality: NHANES 2007–2014. J Affect Disord. 2023;339:435–42. 10.1016/j.jad.2023.07.040.
  • 15. Zhang W, Peng S-F, Chen L, Chen H-M, Cheng X-E, Tang Y-H. Association between the oxidative balance score and telomere length from the national health and nutrition examination survey 1999–2002. Oxid Med Cell Longev. 2022;2022:1345071. 10.1155/2022/1345071.
  • 16. Qi X, Wang S, Fang C, Jia J, Lin L, Yuan T. Machine learning and SHAP value interpretation for predicting comorbidity of cardiovascular disease and cancer with dietary antioxidants. Redox Biol. 2025;79:103470. 10.1016/j.redox.2024.103470.
  • 17. Wilcox NS, Amit U, Reibel JB, Berlin E, Howell K, Ky B. Cardiovascular disease and cancer: shared risk factors and mechanisms. Nat Rev Cardiol. 2024;21:617–31. 10.1038/s41569-024-01017-x.
  • 18. Ferlay J, Colombet M, Soerjomataram I, Dyba T, Randi G, Bettio M, et al. Cancer incidence and mortality patterns in Europe: estimates for 40 countries and 25 major cancers in 2018. Eur J Cancer. 2018;103:356–87. 10.1016/j.ejca.2018.07.005.
  • 19. Crespo-Leiro MG, Anker SD, Maggioni AP, Coats AJ, Filippatos G, Ruschitzka F, et al. European society of cardiology heart failure long-term registry (ESC-HF-LT): 1-year follow-up outcomes and differences across regions. Eur J Heart Fail. 2016;18:613–25. 10.1002/ejhf.566.
  • 20. Koene RJ, Prizment AE, Blaes A, Konety SH. Shared risk factors in cardiovascular disease and cancer. Circulation. 2016;133:1104–14. 10.1161/circulationaha.115.020406.
  • 21. Bertero E, Canepa M, Maack C, Ameri P. Linking heart failure to cancer: background evidence and research perspectives. Circulation. 2018;138:735–42. 10.1161/circulationaha.118.033603.
  • 22. Valko M, Leibfritz D, Moncol J, Cronin MT, Mazur M, Telser J. Free radicals and antioxidants in normal physiological functions and human disease. Int J Biochem Cell Biol. 2007;39:44–84. 10.1016/j.biocel.2006.07.001.
  • 23. Valko M, Rhodes CJ, Moncol J, Izakovic M, Mazur M. Free radicals, metals and antioxidants in oxidative stress-induced cancer. Chem Biol Interact. 2006;160:1–40. 10.1016/j.cbi.2005.12.009.
  • 24. Saeidnia S, Abdollahi M. Antioxidants: friends or foe in prevention or treatment of cancer: the debate of the century. Toxicol Appl Pharmacol. 2013;271:49–63. 10.1016/j.taap.2013.05.004.
  • 25. Glasauer A, Chandel NS. Targeting antioxidants for cancer therapy. Biochem Pharmacol. 2014;92:90–101. 10.1016/j.bcp.2014.07.017.
  • 26. Vasbinder A, Cheng RK, Heckbert SR, Thompson H, Zaslavksy O, Chlebowski RT, et al. Chronic oxidative stress as a marker of long-term radiation-induced cardiovascular outcomes in breast cancer. J Cardiovasc Transl Res. 2023;16:403–13. 10.1007/s12265-022-10320-2.
  • 27. Mulita F, Verras G-I, Anagnostopoulos C-N, Kotis K. A smarter health through the internet of surgical things. Sens (Basel Switz). 2022;22:4577. 10.3390/s22124577.
  • 28. Zhou H, Tang J, Zheng H. Machine learning for medical applications. Sci World J. 2015;2015:825267. 10.1155/2015/825267.
  • 29. Deo RC. Machine learning in medicine. Circulation. 2015;132:1920–30. 10.1161/circulationaha.115.001593.
  • 30. Patel L, Shukla T, Huang X, Ussery DW, Wang S. Machine learning methods in drug discovery. Molecules. 2020. 10.3390/molecules25225277.
  • 31. Ley C, Martin RK, Pareek A, Groll A, Seil R, Tischer T. Machine learning and conventional statistics: making sense of the differences. Knee Surg Sports Traumatol Arthrosc. 2022;30:753–7. 10.1007/s00167-022-06896-6.
  • 32. Charilaou P, Battat R. Machine learning models and over-fitting considerations. World J Gastroenterol. 2022;28:605–7. 10.3748/wjg.v28.i5.605.
  • 33. Guo X, Ma M, Zhao L, Wu J, Lin Y, Fei F, et al. The association of lifestyle with cardiovascular and all-cause mortality based on machine learning: a prospective study from the NHANES. BMC Public Health. 2025;25:319. 10.1186/s12889-025-21339-w.
  • 34. Morgenstern JD, Rosella LC, Costa AP, Anderson LN. Development of machine learning prediction models to explore nutrients predictive of cardiovascular disease using Canadian linked population-based data. Appl Physiol Nutr Metab. 2022;47:529–46. 10.1139/apnm-2021-0502.
  • 35. Xu Z, Li D, Qu W, Yin Y, Qiao S, Zhu Y, et al. Card9 protects sepsis by regulating Ripk2-mediated activation of NLRP3 inflammasome in macrophages. Cell Death Dis. 2022;13:502. 10.1038/s41419-022-04938-y.
  • 36. Shaito A, Aramouni K, Assaf R, Parenti A, Orekhov A, Yazbi AE, et al. Oxidative stress-induced endothelial dysfunction in cardiovascular diseases. Front Biosci Landmark Ed. 2022;27:105. 10.31083/j.fbl2703105.
  • 37. Ganji SH, Kashyap ML, Kamanna VS. Niacin inhibits fat accumulation, oxidative stress, and inflammatory cytokine IL-8 in cultured hepatocytes: impact on non-alcoholic fatty liver disease. Metab Clin Exp. 2015;64:982–90. 10.1016/j.metabol.2015.05.002.
  • 38. Yang R, Zhu M, Fan S, Zhang J. Niacin intake and mortality (total and cardiovascular disease) in patients with cardiovascular disease: insights from NHANES 2003–2018. Nutr J. 2024;23:123. 10.1186/s12937-024-01027-y.
  • 39. Su G, Sun G, Liu H, Shu L, Zhang J, Guo L, et al. Niacin suppresses progression of atherosclerosis by inhibiting vascular inflammation and apoptosis of vascular smooth muscle cells. Med Sci Monit. 2015;21:4081. 10.12659/msm.895547.
  • 40. Rusetskaya NY, Fedotov IV, Koftina VA, Borodulin VB. [Selenium compounds in redox regulation of inflammation and apoptosis]. Biomed Khim. 2019;65:165. 10.18097/pbmc20196503165.
  • 41. Wen Y, Zhang L, Li S, Wang T, Jiang K, Zhao L, et al. Effect of dietary selenium intake on CVD: a retrospective cohort study based on China health and nutrition survey (CHNS) data. Public Health Nutr. 2024;27:e122. 10.1017/S1368980024000703.
  • 42. Arques S. Serum albumin and cardiovascular disease: state-of-the-art review. Ann Cardiol Angeiol (Paris). 2020;69:192–200. 10.1016/j.ancard.2020.07.012.
  • 43. Kuria A, Tian H, Li M, Wang Y, Aaseth JO, Zang J, et al. Selenium status in the body and cardiovascular disease: a systematic review and meta-analysis. Crit Rev Food Sci Nutr. 2021;61:3616. 10.1080/10408398.2020.1803200.


Articles from Journal of Health, Population, and Nutrition are provided here courtesy of BMC
