Abstract
Purpose
The Oxidative Balance Score (OBS) is a composite measure of systemic oxidative stress. This study aims to evaluate the association of OBS with all-cause mortality in patients with cardiovascular disease (CVD)–cancer comorbidity and to use machine learning to identify related factors.
Methods
We analyzed data from the 2007–2018 US National Health and Nutrition Examination Survey (NHANES). Cox regression, Kaplan-Meier analysis, restricted cubic splines (RCS), and subgroup analyses were used to explore the association between OBS and all-cause mortality in participants with CVD-cancer comorbidity. Five machine learning models were constructed and compared to identify the optimal model for predicting CVD-cancer comorbidity, and feature importance was assessed.
Results
Among participants with CVD-cancer comorbidity, those in the highest OBS tertile had a lower risk of all-cause mortality than those in the lowest tertile (HR = 0.78, 95% CI: 0.64–0.95, p = 0.016). RCS analysis showed no evidence of a nonlinear association between OBS and all-cause mortality. In subgroup analyses, the association was consistent across all subgroups, with no statistically significant interaction (all P for interaction > 0.05). Among the machine learning models evaluated, random forest was identified as the optimal predictive model. Decision curve analysis (DCA) and calibration curves further supported its internal validity. SHAP analysis identified age, smoking intensity, niacin intake, and selenium intake as the most influential predictors.
Conclusions
This study demonstrates a significant inverse association between higher OBS and all-cause mortality in patients with cardiovascular disease–cancer comorbidity and provides an interpretable machine learning model to predict this comorbidity.
Supplementary Information
The online version contains supplementary material available at 10.1186/s41043-025-01213-6.
Keywords: Cardiovascular disease, Cancer, Epidemiology, Machine learning, Oxidative balance score
Introduction
The converging trends of population aging, improved cancer survival, and the high prevalence of cardiometabolic risk factors have led to a growing population of individuals living with both cardiovascular disease (CVD) and cancer [1]. This comorbidity presents a major clinical challenge and contributes significantly to mortality and healthcare burden worldwide [2, 3].
While traditionally viewed as separate disease entities, CVD and cancer share common risk factors and, more importantly, several fundamental biological pathways [4–6]. Among these, oxidative stress serves as a central mechanism, driving cellular damage, chronic inflammation, and pathological processes in both conditions [7–9]. To quantitatively assess an individual’s overall oxidative stress burden, the OBS was developed as an integrated measure of pro- and antioxidant exposures from diet and lifestyle [10, 11]. A higher OBS, indicating a more favorable antioxidant profile, has been linked to a reduced risk of incident cancer and mortality in various settings [12, 13]. However, despite its established role in individual diseases, the prognostic value of the OBS in patients with established CVD-cancer comorbidity remains unknown. This represents a critical knowledge gap, as the shared oxidative pathway may be particularly relevant in this high-risk, comorbid population.
To address this, we investigated the association between OBS and all-cause mortality in adults with CVD-cancer comorbidity using data from the NHANES. Furthermore, while traditional statistical models can establish association, they are often limited in handling complex, high-dimensional data for risk prediction. We therefore integrated a machine learning (ML) approach to complement our epidemiological analysis. The objectives of this dual analytical framework were twofold: first, to determine the mortality risk associated with OBS using conventional survival models, and second, to leverage ML techniques to build a robust comorbidity predictive model and identify the most influential oxidative stress-related factors, thereby providing deeper insights into the key drivers of risk in this comorbid population.
Materials and methods
Study design and population
Data for the 2007–2018 NHANES cycles were downloaded from the website of the National Center for Health Statistics (NCHS) of the Centers for Disease Control and Prevention (https://www.cdc.gov/nchs/nhanes/index.htm), yielding 59,842 participant records after data aggregation, integration, and cleaning. The following exclusions were then applied: (1) 13,791 participants with missing demographic or covariate data; (2) 14,686 participants with missing OBS components; and (3) 11,941 participants with missing data on CVD diagnosis, cancer diagnosis, or mortality status. A total of 19,424 participants were included in the final analysis. The flowchart is shown in Fig. 1.
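The exclusion cascade can be checked arithmetically. The short Python sketch below, written for illustration only, subtracts each exclusion from the initial record count; the dictionary labels paraphrase the text, and the intermediate counts it prints are derived here rather than taken from Fig. 1.

```python
# Participant flow as reported in the text (Fig. 1); labels are paraphrased.
initial_records = 59_842
exclusions = {
    "missing demographic or covariate data": 13_791,
    "missing OBS components": 14_686,
    "missing CVD, cancer, or mortality status": 11_941,
}

remaining = initial_records
for reason, n_excluded in exclusions.items():
    remaining -= n_excluded
    print(f"After excluding {n_excluded:,} ({reason}): {remaining:,}")

final_sample = remaining  # should match the analytic sample of 19,424
```

Running it prints intermediate samples of 46,051 and 31,365 participants before reaching the analytic sample of 19,424.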
Fig. 1.
Flow chart of the study participants
Assessment of OBS
The OBS was computed by integrating both pro-oxidant and antioxidant components derived from dietary and lifestyle factors. The dietary segment of the OBS comprised 16 nutrients: dietary fiber, carotene, riboflavin, niacin, vitamin B6, total folate, vitamin B12, vitamin C, vitamin E, calcium, magnesium, zinc, copper, selenium, total fat, and iron. Dietary data were collected through two 24-hour dietary recall interviews, the first conducted in person at the mobile examination center and the second by telephone 3 to 10 days later. The average daily intake of each nutrient was calculated as a weighted average from the two recalls, incorporating the day 2 dietary sample weights (WTDRD2) provided by NHANES to obtain nationally representative estimates. For nutrients with missing data in one of the recalls (e.g., due to non-response), the value from the single available recall was used, provided it met NHANES quality control criteria. Participants with missing data for any dietary component were excluded.
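The per-participant averaging with a single-recall fallback can be sketched in pandas as follows. This is an illustration under stated assumptions: the column names (`niacin_day1`, `niacin_day2`) are hypothetical, and the sketch covers only the averaging and exclusion steps; how the WTDRD2 weights enter the population estimates is described in the text and not reproduced here.

```python
import pandas as pd

# Hypothetical column names for one nutrient's two 24-hour recalls.
recalls = pd.DataFrame({
    "niacin_day1": [20.0, 30.0, None],
    "niacin_day2": [24.0, None, None],
})

# mean(axis=1) skips missing values by default, so a participant with only
# one usable recall keeps that single value; with no recall, the mean is NaN.
recalls["niacin_mean"] = recalls[["niacin_day1", "niacin_day2"]].mean(axis=1)

# Participants still missing the component are excluded, as in the text.
analytic = recalls.dropna(subset=["niacin_mean"])
```

In practice the same step would be repeated for each of the 16 dietary components before scoring.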
The lifestyle-related OBS components comprised physical activity, BMI, alcohol intake, and smoking status. Following conventional scoring criteria, pro-oxidant factors were defined as total fat, iron, BMI, alcohol consumption, and smoking; all other components were considered antioxidant factors.
Physical activity was assessed in accordance with NHANES guidelines using a metabolic equivalent (MET) score [14]. This score incorporated occupational (both vigorous and moderate intensity) and leisure-time physical activity (vigorous and moderate), as well as walking or bicycling for transportation. The MET score was calculated as: weekly frequency × duration per session × recommended MET value for each activity.
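The MET formula above can be illustrated with a small Python function. The MET weights below (8.0 for vigorous and 4.0 for moderate or transport activity) are values often used with NHANES physical activity data, but the paper does not list them, so they are an assumption to be checked against the guidance cited in [14]; the domain names are hypothetical.

```python
# Assumed MET weights per activity domain (not stated in the paper).
MET_VALUES = {
    "vigorous_work": 8.0,
    "moderate_work": 4.0,
    "transport_walk_bike": 4.0,
    "vigorous_leisure": 8.0,
    "moderate_leisure": 4.0,
}


def met_minutes_per_week(activity):
    """Sum of days/week x minutes/session x MET over the reported domains.

    `activity` maps a domain name to (days_per_week, minutes_per_session);
    domains the participant did not report are simply omitted.
    """
    total = 0.0
    for domain, (days, minutes) in activity.items():
        total += days * minutes * MET_VALUES[domain]
    return total


example = {"moderate_work": (5, 60), "transport_walk_bike": (3, 20)}
score = met_minutes_per_week(example)  # 5*60*4 + 3*20*4 = 1200 + 240
```

Because duration is in minutes, the result is expressed in MET-minutes per week.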
Smoking exposure was objectively assessed using serum cotinine, the primary metabolite of nicotine. Alcohol intake was classified into three categories according to standard OBS scoring methods: heavy drinkers (≥ 15 g/day for women; ≥ 30 g/day for men), non-heavy drinkers (0–15 g/day for women; 0–30 g/day for men), and non-drinkers, assigned scores of 0, 1, and 2, respectively.
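The alcohol categories translate directly into a scoring function. In this sketch the thresholds come from the text, while treating 0 g/day as "non-drinker" is an assumption: in practice, non-drinker status would usually come from questionnaire responses rather than recall grams alone.

```python
def alcohol_obs_score(grams_per_day, sex):
    """Score alcohol intake for the OBS: heavy = 0, non-heavy = 1, none = 2.

    Heavy drinking is >= 15 g/day for women and >= 30 g/day for men.
    Assumption: an intake of 0 g/day is treated as a non-drinker.
    """
    if sex not in ("female", "male"):
        raise ValueError("sex must be 'female' or 'male'")
    if grams_per_day <= 0:
        return 2  # non-drinker
    heavy_threshold = 15.0 if sex == "female" else 30.0
    return 0 if grams_per_day >= heavy_threshold else 1
```

Note that the same intake (for example, 20 g/day) is scored differently for women and men.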
Apart from alcohol intake, which was scored categorically as described above, all components were stratified by sex and grouped into tertiles. Antioxidant factors were scored from 0 to 2 (from the lowest to the highest tertile), whereas pro-oxidant factors were inversely scored from 2 to 0 (from the lowest to the highest tertile) [15]. A detailed summary of the OBS scoring protocol is provided in Supplementary Table S1, and the variable names for each OBS component are listed in Supplementary Table S2.
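The sex-specific tertile scoring can be sketched with pandas `qcut` inside a per-sex `groupby`. This is an illustrative sketch, not the authors' R code: the column names are hypothetical, and alcohol would be scored separately with its categorical rule rather than passed through this function.

```python
import pandas as pd


def tertile_score(df, component, sex_col="sex", pro_oxidant=False):
    """Score one OBS component 0/1/2 using sex-specific tertiles.

    Antioxidants: lowest tertile -> 0, highest -> 2.
    Pro-oxidants are reversed: lowest tertile -> 2, highest -> 0.
    """
    tertile = df.groupby(sex_col)[component].transform(
        lambda s: pd.qcut(s, 3, labels=False, duplicates="drop")
    )
    return 2 - tertile if pro_oxidant else tertile


def compute_obs(df, antioxidants, pro_oxidants, sex_col="sex"):
    """Sum the component scores into an (illustrative) OBS."""
    score = pd.Series(0.0, index=df.index)
    for col in antioxidants:
        score += tertile_score(df, col, sex_col)
    for col in pro_oxidants:
        score += tertile_score(df, col, sex_col, pro_oxidant=True)
    return score


# Toy data with hypothetical column names.
demo = pd.DataFrame({
    "sex": ["male"] * 3 + ["female"] * 3,
    "fiber": [10.0, 20.0, 30.0, 5.0, 15.0, 25.0],
    "total_fat": [110.0, 80.0, 50.0, 40.0, 60.0, 90.0],
})
demo["obs"] = compute_obs(demo, antioxidants=["fiber"], pro_oxidants=["total_fat"])
```

Grouping by sex before cutting means that the same absolute intake can fall into different tertiles for men and women.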
To evaluate the robustness of the scoring system, a sensitivity analysis was performed by creating an alternative OBS using sex-specific tertiles for all components. The results from both scoring methods were highly consistent, indicating that the primary findings are robust to the choice of scoring approach.
Assessment of cardiovascular disease–cancer comorbidity
CVD was defined by meeting any of the following criteria: (1) an average systolic blood pressure ≥ 130 mmHg or diastolic blood pressure ≥ 85 mmHg across four measurements, (2) a self-reported physician diagnosis of congestive heart failure, coronary heart disease, angina, myocardial infarction, or stroke, or (3) a self-reported history of hypertension or current use of antihypertensive medication. Cancer status was ascertained by any of the following: a self-reported physician diagnosis, medical record documentation of cancer type, or use of antitumor drugs. Comorbidity was defined as the concurrent presence of both CVD and cancer in a participant [16].
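The case definitions above are simple "any of" rules, which the following Python sketch encodes. The field names are hypothetical stand-ins for the underlying NHANES variables; only the thresholds and criteria come from the text.

```python
from dataclasses import dataclass, replace


@dataclass
class Participant:
    # Hypothetical field names; thresholds below follow the text.
    mean_sbp: float              # mean systolic BP across readings, mmHg
    mean_dbp: float              # mean diastolic BP across readings, mmHg
    physician_dx_cvd: bool       # CHF, CHD, angina, MI, or stroke
    hypertension_history: bool
    on_antihypertensives: bool
    cancer_self_report: bool
    cancer_record: bool
    on_antitumor_drugs: bool


def has_cvd(p):
    elevated_bp = p.mean_sbp >= 130 or p.mean_dbp >= 85
    return (elevated_bp or p.physician_dx_cvd
            or p.hypertension_history or p.on_antihypertensives)


def has_cancer(p):
    return p.cancer_self_report or p.cancer_record or p.on_antitumor_drugs


def has_comorbidity(p):
    """Comorbidity = CVD and cancer present in the same participant."""
    return has_cvd(p) and has_cancer(p)


example_comorbid = Participant(
    mean_sbp=132.0, mean_dbp=80.0, physician_dx_cvd=False,
    hypertension_history=False, on_antihypertensives=False,
    cancer_self_report=True, cancer_record=False, on_antitumor_drugs=False,
)
# Same participant with normal blood pressure: no CVD, so no comorbidity.
example_no_cvd = replace(example_comorbid, mean_sbp=120.0)
```

Because elevated blood pressure alone satisfies the CVD definition, a participant with any cancer indicator and a systolic reading of 130 mmHg or more is classified as comorbid.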
Covariates
This study included the following covariates based on established methodologies: age, sex, marital status (married vs. unmarried), race/ethnicity (Mexican American; non-Hispanic White; non-Hispanic Black; other races), education level (< high school diploma vs. ≥ high school), poverty-to-income ratio (PIR), body mass index (BMI), hyperlipidemia history, and diabetes status. Poverty status was classified from the PIR as poor (PIR < 1.0) or non-poor (PIR ≥ 1.0). Diabetes was defined as meeting any of the following criteria: (1) fasting plasma glucose (FPG) ≥ 7 mmol/L; (2) glycated hemoglobin (HbA1c) ≥ 6.5%; (3) a clinically diagnosed history of diabetes; or (4) current use of glucose-lowering medication (oral hypoglycemic agents and/or insulin). Hyperlipidemia was defined as meeting any of the following criteria: (1) total cholesterol ≥ 200 mg/dL; (2) fasting triglycerides ≥ 150 mg/dL; (3) HDL cholesterol below sex-specific thresholds (females < 50 mg/dL, males < 40 mg/dL); (4) LDL cholesterol ≥ 130 mg/dL; or (5) clinician-diagnosed hyperlipidemia.
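The diabetes and hyperlipidemia definitions can be expressed as two small functions. In this sketch, argument names are hypothetical, lipid values are assumed to be in mg/dL and glucose in mmol/L as in the text, and a lab value passed as `None` (not measured) is simply ignored.

```python
def has_diabetes(fpg_mmol_l=None, hba1c_pct=None, diagnosed=False,
                 on_glucose_lowering=False):
    """Diabetes if any criterion is met; unavailable lab values are None."""
    return bool(
        (fpg_mmol_l is not None and fpg_mmol_l >= 7.0)
        or (hba1c_pct is not None and hba1c_pct >= 6.5)
        or diagnosed
        or on_glucose_lowering
    )


def has_hyperlipidemia(sex, total_chol=None, triglycerides=None, hdl=None,
                       ldl=None, diagnosed=False):
    """Hyperlipidemia if any criterion is met; lipid values in mg/dL."""
    hdl_cutoff = 50.0 if sex == "female" else 40.0  # sex-specific HDL threshold
    return bool(
        (total_chol is not None and total_chol >= 200)
        or (triglycerides is not None and triglycerides >= 150)
        or (hdl is not None and hdl < hdl_cutoff)
        or (ldl is not None and ldl >= 130)
        or diagnosed
    )
```

For example, an HDL cholesterol of 45 mg/dL meets the hyperlipidemia criterion for a woman but not for a man.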
Assessment of mortality
We used the NHANES Public-Use Linked Mortality File, which employs probabilistic matching to National Death Index (NDI) records for death ascertainment. Cause-specific mortality classification in the NDI demonstrates high validity, with minimal misclassification risk based on established validation studies.
Machine learning model development and evaluation
To predict cardiovascular disease-cancer comorbidity, we developed and compared five machine learning algorithms (RPART, random forest, k-nearest neighbors, naïve Bayes, and LightGBM) using a dataset of 19,424 participants with a comorbidity prevalence of approximately 6%. To address the marked class imbalance, we applied the synthetic minority oversampling technique (SMOTE) within cross-validation and used class weighting. Model development used ten repetitions of 10-fold cross-validation with hyperparameter optimization via Bayesian or grid search, and performance was evaluated using AUC-ROC, precision-recall AUC, sensitivity, specificity, F1 score, and Brier score. The optimal model was selected on the basis of both discrimination and calibration, and SHAP analysis was used to quantify the contribution of individual predictors and support clinical interpretation. The names of the variables included in the SHAP plot are listed in Supplementary Table S3.
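The key design point, oversampling only the training folds so that synthetic cases never leak into evaluation, can be sketched in Python with scikit-learn. This is not the authors' pipeline (the analyses were run in R): it uses synthetic data with about 6% positives, a minimal hand-written SMOTE (the `imbalanced-learn` package provides a full `SMOTE` implementation), a random forest only, and average precision as the summary of the precision-recall curve. Class weighting, the other strategy mentioned above, would correspond to `class_weight="balanced"` in `RandomForestClassifier`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, brier_score_loss, roc_auc_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.neighbors import NearestNeighbors


def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE: interpolate between minority samples and minority neighbours."""
    rng = np.random.default_rng(rng)
    k = min(k, len(X_min) - 1)
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)
    # Column 0 of idx is (normally) the point itself, so draw from columns 1..k.
    neigh = idx[base, rng.integers(1, k + 1, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


def repeated_cv(X, y, n_splits=10, n_repeats=10, seed=0):
    """Repeated stratified k-fold CV; oversampling touches training folds only."""
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    rng = np.random.default_rng(seed)
    scores = {"roc_auc": [], "pr_auc": [], "brier": []}
    for train, test in cv.split(X, y):
        X_tr, y_tr = X[train], y[train]
        n_new = int((y_tr == 0).sum() - (y_tr == 1).sum())  # balance the classes
        X_syn = smote(X_tr[y_tr == 1], n_new, rng=rng)
        X_bal = np.vstack([X_tr, X_syn])
        y_bal = np.concatenate([y_tr, np.ones(n_new, dtype=y_tr.dtype)])
        model = RandomForestClassifier(n_estimators=100, random_state=seed)
        model.fit(X_bal, y_bal)
        prob = model.predict_proba(X[test])[:, 1]  # test fold keeps its true imbalance
        scores["roc_auc"].append(roc_auc_score(y[test], prob))
        scores["pr_auc"].append(average_precision_score(y[test], prob))
        scores["brier"].append(brier_score_loss(y[test], prob))
    return {name: float(np.mean(vals)) for name, vals in scores.items()}


# Synthetic stand-in data with roughly 6% positives (not the NHANES data).
X, y = make_classification(n_samples=600, n_features=8, weights=[0.94], random_state=0)
results = repeated_cv(X, y, n_repeats=2)  # the study used 10 repeats
```

Oversampling to balance shifts predicted probabilities upward relative to the true prevalence, which affects the Brier score and calibration. This is one reason to report calibration alongside discrimination.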
Statistical analysis
To account for the complex, multistage probability sampling design of the NHANES, multi-cycle merged sample weights were applied to all analyses to ensure nationally representative estimates of health statistics. Continuous variables were summarized as weighted means ± standard deviation (Mean ± SD), and between-group comparisons were performed using weighted analysis of variance (ANOVA). Categorical variables were reported as weighted frequencies and percentages, with intergroup differences assessed using weighted chi-square tests.
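Two of the steps above can be sketched in a few lines. Following standard NHANES guidance, a multi-cycle weight is formed by dividing each 2-year weight by the number of cycles combined (six cycles for 2007–2018, for example WTDRD2/6), and a weighted mean and SD follow from those weights. This sketch gives point estimates only: design-based standard errors and tests also require the strata and PSU variables (SDMVSTRA, SDMVPSU), which survey software such as the R `survey` package handles.

```python
import numpy as np


def combine_cycle_weights(two_year_weights, n_cycles):
    """Multi-cycle weight: each 2-year weight divided by the number of cycles."""
    return np.asarray(two_year_weights, dtype=float) / n_cycles


def weighted_mean_sd(x, w):
    """Weighted mean and weighted (population-style) standard deviation."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    mean = np.sum(w * x) / np.sum(w)
    var = np.sum(w * (x - mean) ** 2) / np.sum(w)
    return mean, np.sqrt(var)


w = combine_cycle_weights([6000.0, 12000.0], n_cycles=6)
m, s = weighted_mean_sd([10.0, 20.0], w)
```

Here the second participant carries twice the weight of the first, so the weighted mean (about 16.7) sits closer to 20 than the unweighted mean of 15.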
A weighted multivariable Cox proportional hazards model was used to assess the association between the OBS tertiles and all-cause mortality in patients with cardiovascular disease-cancer comorbidity. Survival differences across OBS tertiles were evaluated using Kaplan-Meier curves with stratified log-rank tests. We implemented a sequential adjustment approach in three nested models: Model 1 was unadjusted; Model 2 added demographic covariates (age, gender, race, marital status, education level, and poverty-income ratio); and Model 3 further incorporated clinical comorbidities (hypertension, hyperlipidemia, and diabetes). Stratified analyses were conducted to assess potential effect modification by demographic and clinical characteristics, including age, gender, race, education level, marital status, poverty-income ratio, body mass index, hypertension, hyperlipidemia, and diabetes. Formal interaction testing was performed by introducing product terms between OBS and each stratification variable. Sensitivity analyses were conducted to examine the robustness of the primary findings.
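The Kaplan-Meier curves rest on a simple product-limit estimator, sketched below in NumPy with optional case weights. This is an illustration only: it omits the log-rank test and confidence bands, and it does not account for the survey design. In R, the `survey` package offers design-based versions (`svykm` for Kaplan-Meier curves and `svycoxph` for Cox models).

```python
import numpy as np


def kaplan_meier(time, event, weights=None):
    """Product-limit survival estimate, optionally with case weights.

    Returns the distinct event times and the survival probability just
    after each of them.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    w = np.ones_like(time) if weights is None else np.asarray(weights, dtype=float)
    event_times = np.unique(time[event])
    surv = []
    s = 1.0
    for t in event_times:
        at_risk = w[time >= t].sum()            # (weighted) number still at risk
        deaths = w[(time == t) & event].sum()   # (weighted) events at time t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)


# Toy data: the participant censored at t = 2 leaves the risk set without an event.
times, surv = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Fitting the estimator separately within each OBS tertile gives one curve per tertile, as in Fig. 3A.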
All analyses were conducted using R software (version 4.4.0), with a statistical significance threshold set at p < 0.05.
Results
Baseline characteristics of participants
The baseline characteristics of the 19,424 included participants, 1,215 of whom had cardiovascular disease-cancer comorbidity, are shown in Table 1. Most participants were married (63.51%) and non-Hispanic White (68.29%), and most had completed high school or higher education (87.40%). Participants were stratified into tertiles by OBS. Compared with those in the highest OBS tertile (Q3), participants in the lowest tertile (Q1) were more likely to be poor and to have less than a high school education, and they had a higher mean body mass index and higher serum cotinine concentrations. They also had a higher prevalence of diabetes and hyperlipidemia, and a marginally higher prevalence of cardiovascular disease–cancer comorbidity (6.71% vs. 5.33%, p = 0.050).
Table 1.
Baseline characteristics of all participants
| Characteristic | Overall (N = 19,424) | Q1 (N = 7,562) | Q2 (N = 5,992) | Q3 (N = 5,870) | p-value |
|---|---|---|---|---|---|
| Gender, n(%) | | | | | < 0.001 |
| Male | 9,930 (50.78%) | 2,822 (34.57%) | 3,149 (51.14%) | 3,959 (67.06%) | |
| Female | 9,494 (49.22%) | 4,740 (65.43%) | 2,843 (48.86%) | 1,911 (32.94%) | |
| Age, n(%) | | | | | 0.002 |
| 20–40 | 7,445 (41.35%) | 2,767 (39.68%) | 2,284 (40.25%) | 2,394 (44.07%) | |
| 41–60 | 6,695 (37.18%) | 2,530 (36.87%) | 2,086 (37.99%) | 2,079 (36.76%) | |
| > 60 | 5,284 (21.47%) | 2,265 (23.45%) | 1,622 (21.76%) | 1,397 (19.17%) | |
| Race, n(%) | | | | | < 0.001 |
| Mexican American | 2,793 (8.38%) | 1,003 (8.17%) | 864 (8.56%) | 926 (8.43%) | |
| Non-Hispanic Black | 3,881 (10.16%) | 2,003 (14.99%) | 1,078 (9.11%) | 800 (6.16%) | |
| Non-Hispanic White | 8,485 (68.29%) | 3,019 (64.22%) | 2,696 (68.90%) | 2,770 (71.90%) | |
| Other | 4,265 (13.17%) | 1,537 (12.61%) | 1,354 (13.42%) | 1,374 (13.52%) | |
| Education level, n(%) | | | | | < 0.001 |
| Below high school | 3,894 (12.60%) | 1,886 (16.90%) | 1,099 (11.59%) | 909 (9.11%) | |
| High School or above | 15,530 (87.40%) | 5,676 (83.10%) | 4,893 (88.41%) | 4,961 (90.89%) | |
| PIR, n(%) | | | | | < 0.001 |
| Not poor | 15,919 (87.48%) | 5,865 (82.97%) | 5,016 (89.18%) | 5,038 (90.55%) | |
| Poor | 3,505 (12.52%) | 1,697 (17.03%) | 976 (10.82%) | 832 (9.45%) | |
| Marital status, n(%) | | | | | < 0.001 |
| Married | 11,794 (63.51%) | 4,122 (57.39%) | 3,804 (66.56%) | 3,868 (66.97%) | |
| Unmarried | 7,630 (36.49%) | 3,440 (42.61%) | 2,188 (33.44%) | 2,002 (33.03%) | |
| Hypertension, n(%) | | | | | < 0.001 |
| No | 10,259 (58.62%) | 3,661 (55.10%) | 3,159 (57.29%) | 3,439 (63.46%) | |
| Yes | 9,165 (41.38%) | 3,901 (44.90%) | 2,833 (42.71%) | 2,431 (36.54%) | |
| Hyperlipidemia, n(%) | | | | | < 0.001 |
| No | 6,616 (33.79%) | 2,370 (30.51%) | 1,991 (32.87%) | 2,255 (38.01%) | |
| Yes | 12,808 (66.21%) | 5,192 (69.49%) | 4,001 (67.13%) | 3,615 (61.99%) | |
| Diabetes, n(%) | | | | | < 0.001 |
| No | 16,466 (89.36%) | 6,167 (86.92%) | 5,132 (89.26%) | 5,167 (91.94%) | |
| Yes | 2,958 (10.64%) | 1,395 (13.08%) | 860 (10.74%) | 703 (8.06%) | |
| CC, n(%) | | | | | 0.050 |
| No | 18,209 (93.83%) | 7,054 (93.29%) | 5,635 (93.51%) | 5,520 (94.67%) | |
| Yes | 1,215 (6.17%) | 508 (6.71%) | 357 (6.49%) | 350 (5.33%) | |
| CVD, n(%) | | | | | < 0.001 |
| No | 10,098 (57.87%) | 3,592 (54.36%) | 3,104 (56.43%) | 3,402 (62.79%) | |
| Yes | 9,326 (42.13%) | 3,970 (45.64%) | 2,888 (43.57%) | 2,468 (37.21%) | |
| Cancer, n(%) | | | | | 0.331 |
| No | 17,683 (90.17%) | 6,859 (89.54%) | 5,478 (90.26%) | 5,346 (90.73%) | |
| Yes | 1,741 (9.83%) | 703 (10.46%) | 514 (9.74%) | 524 (9.27%) | |
| BMI (kg/m²) | 28.75 ± 6.58 | 29.87 ± 6.98 | 28.77 ± 6.66 | 27.58 ± 5.85 | < 0.001 |
| Fiber (g/d) | 17.71 ± 9.31 | 10.93 ± 4.54 | 16.96 ± 6.00 | 25.34 ± 9.76 | < 0.001 |
| Calcium (mg/d) | 989.85 ± 507.02 | 654.15 ± 273.62 | 947.97 ± 359.14 | 1,372.55 ± 545.94 | < 0.001 |
| Zinc (mg/d) | 11.72 ± 6.35 | 7.60 ± 3.26 | 11.28 ± 4.52 | 16.34 ± 7.10 | < 0.001 |
| Copper (mg/d) | 1.31 ± 0.76 | 0.84 ± 0.32 | 1.26 ± 0.53 | 1.84 ± 0.92 | < 0.001 |
| Selenium (mcg/d) | 116.47 ± 53.49 | 81.33 ± 28.93 | 114.22 ± 38.21 | 154.57 ± 59.17 | < 0.001 |
| Magnesium (mg/d) | 312.01 ± 135.30 | 201.66 ± 59.57 | 301.90 ± 70.68 | 434.46 ± 134.54 | < 0.001 |
| Vitamin C (mg/d) | 83.35 ± 75.47 | 48.87 ± 47.73 | 79.46 ± 64.60 | 122.28 ± 88.62 | < 0.001 |
| Vitamin E (mg/d) | 8.93 ± 5.80 | 5.36 ± 2.48 | 8.46 ± 3.86 | 13.01 ± 7.00 | < 0.001 |
| Vitamin B12 (mcg/d) | 5.24 ± 4.79 | 3.10 ± 2.20 | 5.07 ± 4.10 | 7.60 ± 6.07 | < 0.001 |
| Vitamin B6 (mg/d) | 2.20 ± 1.44 | 1.35 ± 0.72 | 2.15 ± 1.32 | 3.13 ± 1.53 | < 0.001 |
| Carotene (RE/d) | 2,397.00 ± 3,899.74 | 1,244.90 ± 1,861.41 | 2,343.17 ± 2,985.77 | 3,627.74 ± 5,491.79 | < 0.001 |
| Riboflavin (mg/d) | 2.23 ± 1.33 | 1.44 ± 0.76 | 2.17 ± 1.05 | 3.09 ± 1.50 | < 0.001 |
| Niacin (mg/d) | 26.57 ± 13.43 | 17.75 ± 6.86 | 26.21 ± 11.13 | 35.95 ± 14.20 | < 0.001 |
| Total folate (mcg/d) | 415.51 ± 222.43 | 259.94 ± 97.74 | 393.10 ± 135.39 | 595.62 ± 247.64 | < 0.001 |
| Total fat (g/d) | 82.27 ± 38.09 | 59.70 ± 23.92 | 81.82 ± 29.04 | 105.82 ± 42.81 | < 0.001 |
| Iron (mg/d) | 15.18 ± 7.75 | 10.04 ± 3.92 | 14.48 ± 4.76 | 21.10 ± 8.82 | < 0.001 |
| Cotinine (ng/mL) | 52.77 ± 123.16 | 74.09 ± 140.24 | 50.05 ± 123.44 | 33.39 ± 98.23 | < 0.001 |
| Alcohol (g/d) | 10.01 ± 22.20 | 7.91 ± 18.53 | 11.46 ± 25.97 | 10.83 ± 21.72 | < 0.001 |
| Physical activity (minute/week) | 5,087.68 ± 12,095.32 | 4,665.17 ± 12,471.26 | 4,903.03 ± 12,998.46 | 5,690.61 ± 10,745.40 | < 0.001 |
Continuous variables are presented as weighted mean ± SD; categorical variables are presented as unweighted n (weighted %). Q1–Q3: OBS tertiles (Q1, lowest; Q3, highest); SD: standard deviation; CC: cardiovascular disease and cancer comorbidity; CVD: cardiovascular disease; BMI: body mass index; PIR: poverty-income ratio
Association between OBS and all-cause mortality in the CVD-cancer comorbidity population
Multivariable Cox regression analyses were performed to evaluate the association between the OBS and all-cause mortality (Fig. 2). When treated as a continuous variable in the fully adjusted model (Model 3), each one-unit increase in the OBS was associated with a 5% reduction in the risk of all-cause mortality (HR = 0.95, 95% CI: 0.91–0.98, p = 0.006). Similarly, when analyzed as tertiles, participants in the highest tertile (Q3) had a significant 22% lower risk of mortality than those in the lowest tertile (Q1) (HR = 0.78, 95% CI: 0.64–0.95, p = 0.016).
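The percentage reductions follow directly from the hazard ratios, and an approximate p-value can be recovered from a reported 95% CI by assuming it was built on the log scale with a normal reference. The Python sketch below shows both; it also shows the HR implied for a k-unit OBS increase, which assumes a log-linear effect (consistent with, but not proven by, the RCS result that follows).

```python
import math


def percent_reduction(hr):
    """Percentage reduction in hazard implied by a hazard ratio below 1."""
    return (1.0 - hr) * 100.0


def p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate two-sided p-value from a 95% CI built on the log-HR scale."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def hr_per_units(hr_per_unit, k):
    """HR for a k-unit increase, assuming a log-linear association."""
    return hr_per_unit ** k


p_tertile = p_from_ci(0.78, 0.64, 0.95)     # roughly 0.014
p_continuous = p_from_ci(0.95, 0.91, 0.98)  # roughly 0.007
hr_5 = hr_per_units(0.95, 5)                # roughly 0.77
```

The back-calculated p-values are close to, but not identical with, the reported ones; small differences are expected because the CI bounds are rounded and design-based tests may use a t reference distribution.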
Fig. 2.
Association between OBS and all-cause mortality in participants with CVD-cancer comorbidity. Model 1: unadjusted. Model 2: adjusted for age, sex, race, PIR, marital status, and education level. Model 3: further adjusted for diabetes, hyperlipidemia, and hypertension
Survival patterns across OBS tertiles in the CVD-cancer comorbidity population
Kaplan-Meier curves demonstrated a significant association between OBS tertiles and clinical outcomes. As depicted in Fig. 3A, individuals in the highest OBS tertile (Q3) experienced superior overall survival compared to those in the lowest tertile (Q1) (p = 0.005). In a complementary fashion, analysis of the composite cardiovascular-cancer endpoint showed a significantly elevated cumulative incidence in the Q1 group relative to the Q3 group (p < 0.005, Fig. 3B).
Fig. 3.
(A) Kaplan-Meier curves of overall survival across OBS tertiles. (B) Cumulative incidence curves across OBS tertiles
RCS curve regression results
Using restricted cubic splines (Fig. 4), we assessed the dose-response relationship between OBS and all-cause mortality in patients with cardiovascular disease-cancer comorbidity. The analysis found no evidence of a nonlinear relationship (p for nonlinearity = 0.799).
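A restricted cubic spline replaces the single OBS term with a linear term plus k − 2 nonlinear terms, and the "p for nonlinearity" is the joint test that all nonlinear coefficients are zero. The NumPy sketch below builds Harrell's basis. The paper does not state its knot placement, so the common default of 4 knots at the 5th, 35th, 65th, and 95th percentiles (for example, `np.percentile(obs, [5, 35, 65, 95])`) is an assumption.

```python
import numpy as np


def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell's parameterization).

    Returns columns [x, s_1, ..., s_{k-2}]. The fitted curve is linear below
    the first knot and above the last knot; the s_j are the nonlinear terms.
    """
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos3(u):
        return np.maximum(u, 0.0) ** 3  # truncated cubic (u)_+^3

    scale = (t[-1] - t[0]) ** 2  # normalization used by Harrell
    cols = [x]
    for j in range(k - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[-2]) * (t[-1] - t[j]) / (t[-1] - t[-2])
                + pos3(x - t[-1]) * (t[-2] - t[j]) / (t[-1] - t[-2]))
        cols.append(term / scale)
    return np.column_stack(cols)


x = np.linspace(0.0, 10.0, 101)
B = rcs_basis(x, knots=[1.0, 4.0, 6.0, 9.0])  # 4 knots -> 1 linear + 2 nonlinear columns
```

In a Cox model with these columns as covariates, a Wald or likelihood-ratio test on the two nonlinear coefficients gives the p for nonlinearity.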
Fig. 4.
Restricted cubic spline plot of the association between OBS and all-cause mortality in participants with CVD-cancer comorbidity. (A) The association between OBS and all-cause mortality. (B) The analysis was adjusted for gender, age, race, marital status, education levels, PIR and BMI. (C) The analysis was adjusted for gender, age, marital status, education levels, PIR, diabetes, hyperlipidemia and hypertension
Subgroup analysis
To evaluate whether the association between the OBS and mortality in patients with CVD-cancer comorbidity was consistent across population subgroups, we performed subgroup analyses and interaction tests by age, gender, race, marital status, education, income, diabetes, hyperlipidemia, and hypertension. The association between OBS and all-cause mortality was generally consistent across all predefined subgroups, with no statistically significant interaction detected (all p for interaction > 0.05) (Fig. 5).
Fig. 5.
Subgroup analysis of the association between OBS and all-cause mortality in participants with CVD-cancer comorbidity. The analysis was adjusted for gender, age, race, marital status, education level, PIR, diabetes, hyperlipidemia, and hypertension
Sensitivity analysis
The significant inverse association between the OBS and all-cause mortality proved robust in sensitivity analyses that excluded early deaths. Further analysis sequentially omitting each OBS component confirmed that the overall association was not dependent on any single element, with most components demonstrating significant individual protective effects (Supplementary Table S4).
Machine learning results
To comprehensively evaluate the performance of the machine learning models, we applied a multidimensional assessment framework (Table 2). The random forest (RF) model achieved the lowest classification error rate (0.091), the highest area under the receiver operating characteristic curve (AUC = 0.825) (Fig. 6A), and the highest accuracy (0.908) and specificity (0.959). These results indicate strong discrimination and reliable identification of negative cases; however, its sensitivity was low (0.140), so its high accuracy largely reflects correct classification of the majority class, which makes up about 94% of participants. The LightGBM model also performed competitively, attaining the highest F1 score (0.313, tied with RPART) and the largest area under the precision-recall curve (0.241), reflecting more balanced identification of positive cases under class imbalance.
Table 2.
Evaluation metrics of the models constructed by each method
| Machine learner | Classification error rate | ACC | AUC | F1 score | Sensitivity | Specificity | Brier score | Area under the PR curve |
|---|---|---|---|---|---|---|---|---|
| RPART | 0.169 | 0.831 | 0.819 | 0.313 | 0.618 | 0.845 | 0.117 | 0.194 |
| RF | 0.091 | 0.908 | 0.825 | 0.160 | 0.140 | 0.959 | 0.071 | 0.182 |
| K–KNN | 0.300 | 0.699 | 0.769 | 0.225 | 0.700 | 0.701 | 0.196 | 0.154 |
| NB | 0.332 | 0.668 | 0.741 | 0.217 | 0.739 | 0.663 | 0.250 | 0.162 |
| LightGBM | 0.169 | 0.830 | 0.820 | 0.313 | 0.621 | 0.844 | 0.103 | 0.241 |
AUC: area under the curve; ACC: accuracy; PR: precision-recall; RPART: recursive partitioning and regression trees; RF: random forest; K–KNN: kernel k-nearest neighbors; NB: naïve Bayes; LightGBM: light gradient boosting machine
Fig. 6.
(A) Comparison of receiver operating characteristic curves of five machine learning models in predicting CVD-cancer comorbidity. (B) Calibration plots for machine learning models predicting cardiovascular and cancer comorbidity. (C) Decision curve analysis comparing machine learning models for predicting cardiovascular and cancer comorbidity
The two ensemble methods had the lowest Brier scores among the five models (RF: 0.071; LightGBM: 0.103). The calibration curve further indicated good calibration for the RF model (Fig. 6B). Moreover, decision curve analysis (DCA) showed that the RF and LightGBM models provided higher net benefit than the “intervene‑all” and “intervene‑none” strategies across the entire threshold probability range (Fig. 6C).
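The quantities behind DCA and the Brier score are short formulas, sketched below in NumPy. At a threshold probability p_t, net benefit is TP/n − (FP/n) × p_t / (1 − p_t); the "intervene-none" strategy has a net benefit of 0, and "intervene-all" has prevalence − (1 − prevalence) × p_t / (1 − p_t). The Brier score is the mean squared difference between predicted probability and outcome. The toy inputs are invented for illustration.

```python
import numpy as np


def net_benefit(y_true, prob, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    prob = np.asarray(prob, dtype=float)
    n = len(y_true)
    treat = prob >= threshold
    tp = np.sum(treat & y_true)
    fp = np.sum(treat & ~y_true)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of the 'intervene-all' strategy ('intervene-none' is 0)."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def brier(y_true, prob):
    """Mean squared difference between predicted probability and outcome."""
    return float(np.mean((np.asarray(prob, float) - np.asarray(y_true, float)) ** 2))


def reference_brier(prevalence):
    """Brier score of a model that predicts the prevalence for everyone."""
    return prevalence * (1 - prevalence)


y_toy = [1, 0, 0, 0]
p_toy = [0.9, 0.2, 0.1, 0.6]
nb_toy = net_benefit(y_toy, p_toy, threshold=0.25)
```

As a reference point when reading the Brier scores in Table 2, a model that assigns every participant the weighted prevalence of 6.17% from Table 1 would have a Brier score of about 0.058.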
Feature importance and the role of features in the model
Supplementary Figure S1 illustrates the correlations among the variables included in the model. SHAP analysis was used to interpret the model; Fig. 7 displays the top 15 features and their contributions to the predictions. Each point represents an individual participant, and its horizontal position reflects the SHAP value of the corresponding feature for that participant (i.e., the magnitude and direction of its influence). Age was the most important contributor to the predictions. Among lifestyle factors, smoking exposure (serum cotinine) was also an important feature. Higher dietary niacin and selenium intake was associated with lower predicted risk. In addition, we plotted SHAP values against each variable, including antioxidant and pro-oxidant factors, to assess their relationships (Supplementary Figure S2). Age, fat intake, copper intake, cotinine concentration, vitamin E intake, and vitamin C intake were positively correlated with their corresponding SHAP values.
Fig. 7.
Contribution of variables to CVD-cancer comorbidity prediction using SHAP values. (A) The SHAP heatmap illustrates the relationships between variables and CVD-cancer comorbidity. (B) The bar plot shows each variable’s contribution to CVD-cancer comorbidity, with bar length indicating the magnitude of the contribution. (C) SHAP force plot
Discussion
To our knowledge, this is among the first studies to comprehensively evaluate the association between the OBS and survival outcomes among individuals with coexisting CVD and cancer, utilizing a nationally representative sample from six NHANES cycles (2007–2018). Our analysis demonstrated a significant inverse association between a higher OBS and lower all-cause mortality risk in this comorbidity population. This association remained consistent across multiple subgroup analyses.
CVD and cancer are leading causes of death worldwide [17]. Evidence shows that patients with cardiovascular diseases and heart failure (HF) have an increased incidence of cancer [18, 19]. Cancer patients, especially long-term survivors, exhibit higher susceptibility to cardiovascular diseases, which may be related to cardiotoxic treatments, shared risk factors, or common pathological biological mechanisms [20]. Oxidative stress is a key pathogenic link between the two diseases [21]. In cardiovascular diseases, it disrupts redox homeostasis (the balance between reactive oxygen species (ROS) and antioxidant defenses), leading to biomolecular oxidative damage and cellular dysfunction [22]. Similarly, cancer cells also show elevated ROS levels and redox imbalance [22, 23]. A pro-inflammatory diet may promote systemic inflammation and thereby increase oxidative stress. Therefore, maintaining low systemic inflammation and reducing oxidative stress may help lower the risk of mortality associated with cardiovascular disease-cancer comorbidity [24, 25].
The OBS provides a robust composite measure of systemic oxidative status by integrating multiple pro-oxidant and antioxidant factors. Unlike isolated biomarkers that capture only transient oxidative changes [26], OBS reflects the cumulative and interactive effects of dietary and lifestyle exposures, thereby enabling a more comprehensive assessment of redox balance.
The adoption of IoT (Internet of Things) devices [27], such as wearable sensors and implantable monitors, is transforming the management of cardiovascular disease and cancer by enabling continuous, real-time tracking of vital signs and biomarkers. This capability supports the early detection of clinical deterioration and enhanced monitoring of treatment-related side effects, promoting a shift toward personalized and proactive healthcare. Building upon these technological advances, ML algorithms have become a cornerstone of clinical decision-making, providing critical insights to enhance diagnostic and prognostic accuracy [28–30]. A key advantage of ML models is their capacity to integrate multifaceted input variables, thereby enabling a more comprehensive assessment of risk profiles. Unlike traditional statistical methods, which often identify only a limited set of independent predictors, data-driven ML techniques can uncover complex, synergistic relationships among multiple disease-influencing factors. This capability offers a broader and more detailed view of how risk factors combine [31, 32].
Recent studies have demonstrated the utility of ML-based predictive models in identifying nutrient-associated CVD risk profiles and evaluating the impact of lifestyle behaviors on cardiovascular and all-cause mortality [33, 34]. In the present study, we employed RPART, RF, K-NN, NB, and LightGBM algorithms to construct predictive models for CVD-cancer comorbidity. Among these, the RF model performed best overall in terms of discrimination and calibration, with the highest AUC (0.825) and the lowest Brier score (0.071). These findings suggest that OBS components may contribute useful information for predicting CVD-cancer comorbidity, as the OBS provides a more holistic representation of an individual’s health status.
To elucidate how the optimal model generates its predictions and to identify the most influential risk variables, we performed SHAP analysis to quantify the contribution of each variable to the combined risk of cardiovascular disease and cancer. The analysis confirmed age as the most influential predictor, with SHAP values increasing markedly with advancing age, indicating a strong positive association with adverse outcomes. This finding was anticipated and aligns with well-established biological mechanisms, including NLRP3 inflammasome activation [35], oxidative stress induced by mitochondrial dysfunction, immune dysregulation [36], and persistent low-grade inflammation, all of which are promoted by aging and contribute to both cardiovascular and oncologic pathogenesis. Beyond this expected association, the SHAP analysis suggested non-linear risk transitions across the age continuum and possible interactions between age and other factors, such as the OBS components. These patterns are consistent with the view of oxidative stress as a central driver interlinking aging-related pathophysiology and carcinogenesis, whether as a cause or a consequence of mitochondrial impairment.
Niacin, a B-group vitamin, contributes to metabolic regulation and vascular protection. Evidence from experimental models of atherosclerosis suggests that niacin can suppress the expression of inflammatory cytokines, inhibit NF-κB pathway activation, and attenuate apoptosis in vascular smooth muscle cells [37]. Furthermore, epidemiological studies conducted in other populations indicate that higher dietary niacin intake is associated with reduced all-cause and cardiovascular mortality, effects potentially mediated through its antioxidant, anti-inflammatory, and anti-apoptotic properties [38, 39]. This external evidence provides a plausible mechanistic framework for, but does not directly prove, the predictive role of niacin observed in our cohort.
Similarly, the predictive importance of selenium in our model is consistent with its well-characterized biological functions. Selenium exerts its antioxidant effects primarily through its incorporation into selenoproteins, such as glutathione peroxidase [40], which is critical for regulating reactive oxygen and nitrogen species and maintaining redox homeostasis. Epidemiological observations have independently linked selenium deficiency to an elevated risk of neurodegenerative disorders, cardiovascular diseases, and certain cancers [41, 42]. Thus, the association captured by our model aligns with this pre-existing biological and epidemiological knowledge [43], reinforcing the plausibility of selenium’s role while highlighting the need for further targeted research to confirm a causal relationship in the context of comorbidity.
Several important limitations of this study should be acknowledged. First, although the overall NHANES sample is large, the number of participants with CVD–cancer comorbidity was relatively small, which may have limited the statistical power to detect subtler associations and increased the risk of model overfitting despite our use of cross-validation. Second, because OBS and comorbidity status were assessed at a single time point, the cross-sectional design of NHANES precludes establishing their temporal sequence and limits causal inference between the OBS and the comorbidity outcome. The absence of repeated exposure measurements also prevents us from observing the long-term dynamics of oxidative balance and its cumulative health effects. Third, although we adjusted for numerous covariates, residual confounding remains possible owing to unmeasured or imperfectly captured factors, such as detailed socioeconomic indicators, environmental exposures, and specific health behaviors. Furthermore, key clinical variables, including specific medication use, treatment regimens, and therapy adherence, were unavailable and may have biased the risk estimates. Fourth, the OBS calculation was constrained by the available data and did not encompass all known dietary and lifestyle components related to oxidative stress. Fifth, reliance on self-reported diagnoses of CVD and cancer, without systematic verification against clinical records, may have introduced misclassification bias. Finally, our findings derive from a multi-ethnic U.S. population; their generalizability to populations with different genetic backgrounds, dietary patterns, and healthcare contexts requires external validation.
Conclusion
This study found an inverse association between the OBS and all-cause mortality in patients with cardiovascular disease–cancer comorbidity. The findings underscore the potential importance of an antioxidant-rich diet and lifestyle in this high-risk population. However, factors such as socioeconomic status, underlying health conditions, or unmeasured variables may have influenced the observed outcomes. Although assessing oxidative balance via the OBS shows promise, further research, particularly prospective studies and external validation, is essential to confirm this association and to explore its potential clinical applicability.
Supplementary Information
Acknowledgements
The authors sincerely thank the NHANES team for providing valuable survey data. Thanks to all who participated in this study.
Abbreviations
- OBS: Oxidative Balance Score
- CVD: Cardiovascular disease
- NHANES: National Health and Nutrition Examination Survey
- RCS: Restricted cubic spline
- BMI: Body mass index
- RIP: Ratio of family income to poverty
- SD: Standard deviation
- ML: Machine learning
- AUC: Area under the curve
- ACC: Accuracy
- ROC: Receiver operating characteristic
- RPART: Recursive partitioning and regression trees
- RF: Random forest
- K–KNN: Kernel k-nearest neighbors
- NB: Naïve Bayes
- LightGBM: Light gradient boosting machine
- SHAP: SHapley Additive exPlanations
- DCA: Decision curve analysis
- ROS: Reactive oxygen species
- IoT: Internet of Things
Author contributions
FL designed research; JW conducted research; ASW analyzed data; CYJ and LSP wrote the paper. YYY and YZL had primary responsibility for final content. All authors read and approved the final manuscript.
Funding
The study was supported by the General Project of Health and Medical Research in Hunan Province (No. 20254420) and Changsha Natural Science Foundation (No. kq2502274).
Data availability
The National Health and Nutrition Examination Survey data set is publicly available at the National Center for Health Statistics of the Centers for Disease Control and Prevention.
Declarations
Ethics approval and consent to participate
NHANES is conducted by the Centers for Disease Control and Prevention (CDC) and the National Center for Health Statistics (NCHS). The NCHS Research Ethics Review Board reviewed and approved the NHANES study protocol. All participants signed a written informed consent form.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Fen Liu and Jian Wang contributed equally to this work and should be considered co-first authors.
Contributor Information
Zheng-Yu Liu, Email: liuzhengyu@hunnu.edu.cn.
Ya-Yu You, Email: youyayu2333@hunnu.edu.cn.
References
- 1. de Boer RA, Meijers WC, van der Meer P, van Veldhuisen DJ. Cancer and heart disease: associations and relations. Eur J Heart Fail. 2019;21:1515–25. 10.1002/ejhf.1539.
- 2. Raisi-Estabragh Z, Manisty CH, Cheng RK, Lopez Fernandez T, Mamas MA. Burden and prognostic impact of cardiovascular disease in patients with cancer. Heart. 2023;109:1819–26. 10.1136/heartjnl-2022-321324.
- 3. Aboumsallem JP, Moslehi J, de Boer RA. Reverse cardio-oncology: cancer development in patients with cardiovascular disease. J Am Heart Assoc. 2020;9:e013754. 10.1161/jaha.119.013754.
- 4. Libby P, Kobold S. Inflammation: a common contributor to cancer, aging, and cardiovascular diseases-expanding the concept of cardio-oncology. Cardiovasc Res. 2019;115:824–9. 10.1093/cvr/cvz058.
- 5. Hasin T, Gerber Y, Weston SA, Jiang R, Killian JM, Manemann SM, et al. Heart failure after myocardial infarction is associated with increased risk of cancer. J Am Coll Cardiol. 2016;68:265–71. 10.1016/j.jacc.2016.04.053.
- 6. Masoudkabir F, Mohammadifard N, Mani A, Ignaszewski A, Davis MK, Vaseghi G, et al. Shared lifestyle-related risk factors of cardiovascular disease and cancer: evidence for joint prevention. Sci World J. 2023;2023:2404806. 10.1155/2023/2404806.
- 7. Di Fusco SA, Cianfrocca C, Bisceglia I, Spinelli A, Alonzo A, Mocini E, et al. Potential pathophysiologic mechanisms underlying the inherent risk of cancer in patients with atherosclerotic cardiovascular disease. Int J Cardiol. 2022;363:190–5. 10.1016/j.ijcard.2022.06.048.
- 8. Barrera G. Oxidative stress and lipid peroxidation products in cancer progression and therapy. ISRN Oncol. 2012;2012:137289. 10.5402/2012/137289.
- 9. Incalza MA, D’Oria R, Natalicchio A, Perrini S, Laviola L, Giorgino F. Oxidative stress and reactive oxygen species in endothelial dysfunction associated with cardiovascular and metabolic diseases. Vascul Pharmacol. 2018;100:1–19. 10.1016/j.vph.2017.05.005.
- 10. Hernández-Ruiz Á, García-Villanova B, Guerra-Hernández EJ, Carrión-García CJ, Amiano P, Sánchez MJ, et al. Oxidative balance scores (OBSs) integrating nutrient, food and lifestyle dimensions: development of the NutrientL-OBS and FoodL-OBS. Antioxidants. 2022. 10.3390/antiox11020300.
- 11. Goodman M, Bostick RM, Dash C, Flanders WD, Mandel JS. Hypothesis: oxidative stress score as a combined measure of pro-oxidant and antioxidant exposures. Ann Epidemiol. 2007;17:394–9. 10.1016/j.annepidem.2007.01.034.
- 12. Hasani M, Alinia SP, Khazdouz M, Sobhani S, Mardi P, Ejtahed HS, et al. Oxidative balance score and risk of cancer: a systematic review and meta-analysis of observational studies. BMC Cancer. 2023;23:1143. 10.1186/s12885-023-11657-w.
- 13. Xu Z, Liu D, Zhai Y, Tang Y, Jiang L, Li L, et al. Association between the oxidative balance score and all-cause and cardiovascular mortality in patients with diabetes and prediabetes. Redox Biol. 2024;76:103327. 10.1016/j.redox.2024.103327.
- 14. Lei X, Xu Z, Chen W. Association of oxidative balance score with sleep quality: NHANES 2007–2014. J Affect Disord. 2023;339:435–42. 10.1016/j.jad.2023.07.040.
- 15. Zhang W, Peng S-F, Chen L, Chen H-M, Cheng X-E, Tang Y-H. Association between the oxidative balance score and telomere length from the national health and nutrition examination survey 1999–2002. Oxid Med Cell Longev. 2022;2022:1345071. 10.1155/2022/1345071.
- 16. Qi X, Wang S, Fang C, Jia J, Lin L, Yuan T. Machine learning and SHAP value interpretation for predicting comorbidity of cardiovascular disease and cancer with dietary antioxidants. Redox Biol. 2025;79:103470. 10.1016/j.redox.2024.103470.
- 17. Wilcox NS, Amit U, Reibel JB, Berlin E, Howell K, Ky B. Cardiovascular disease and cancer: shared risk factors and mechanisms. Nat Rev Cardiol. 2024;21:617–31. 10.1038/s41569-024-01017-x.
- 18. Ferlay J, Colombet M, Soerjomataram I, Dyba T, Randi G, Bettio M, et al. Cancer incidence and mortality patterns in Europe: estimates for 40 countries and 25 major cancers in 2018. Eur J Cancer. 2018;103:356–87. 10.1016/j.ejca.2018.07.005.
- 19. Crespo-Leiro MG, Anker SD, Maggioni AP, Coats AJ, Filippatos G, Ruschitzka F, et al. European society of cardiology heart failure long-term registry (ESC-HF-LT): 1-year follow-up outcomes and differences across regions. Eur J Heart Fail. 2016;18:613–25. 10.1002/ejhf.566.
- 20. Koene RJ, Prizment AE, Blaes A, Konety SH. Shared risk factors in cardiovascular disease and cancer. Circulation. 2016;133:1104–14. 10.1161/circulationaha.115.020406.
- 21. Bertero E, Canepa M, Maack C, Ameri P. Linking heart failure to cancer: background evidence and research perspectives. Circulation. 2018;138:735–42. 10.1161/circulationaha.118.033603.
- 22. Valko M, Leibfritz D, Moncol J, Cronin MT, Mazur M, Telser J. Free radicals and antioxidants in normal physiological functions and human disease. Int J Biochem Cell Biol. 2007;39:44–84. 10.1016/j.biocel.2006.07.001.
- 23. Valko M, Rhodes CJ, Moncol J, Izakovic M, Mazur M. Free radicals, metals and antioxidants in oxidative stress-induced cancer. Chem Biol Interact. 2006;160:1–40. 10.1016/j.cbi.2005.12.009.
- 24. Saeidnia S, Abdollahi M. Antioxidants: friends or foe in prevention or treatment of cancer: the debate of the century. Toxicol Appl Pharmacol. 2013;271:49–63. 10.1016/j.taap.2013.05.004.
- 25. Glasauer A, Chandel NS. Targeting antioxidants for cancer therapy. Biochem Pharmacol. 2014;92:90–101. 10.1016/j.bcp.2014.07.017.
- 26. Vasbinder A, Cheng RK, Heckbert SR, Thompson H, Zaslavksy O, Chlebowski RT, et al. Chronic oxidative stress as a marker of long-term radiation-induced cardiovascular outcomes in breast cancer. J Cardiovasc Transl Res. 2023;16:403–13. 10.1007/s12265-022-10320-2.
- 27. Mulita F, Verras G-I, Anagnostopoulos C-N, Kotis K. A smarter health through the internet of surgical things. Sensors (Basel). 2022;22:4577. 10.3390/s22124577.
- 28. Zhou H, Tang J, Zheng H. Machine learning for medical applications. Sci World J. 2015;2015:825267. 10.1155/2015/825267.
- 29. Deo RC. Machine learning in medicine. Circulation. 2015;132:1920–30. 10.1161/circulationaha.115.001593.
- 30. Patel L, Shukla T, Huang X, Ussery DW, Wang S. Machine learning methods in drug discovery. Molecules. 2020. 10.3390/molecules25225277.
- 31. Ley C, Martin RK, Pareek A, Groll A, Seil R, Tischer T. Machine learning and conventional statistics: making sense of the differences. Knee Surg Sports Traumatol Arthrosc. 2022;30:753–7. 10.1007/s00167-022-06896-6.
- 32. Charilaou P, Battat R. Machine learning models and over-fitting considerations. World J Gastroenterol. 2022;28:605–7. 10.3748/wjg.v28.i5.605.
- 33. Guo X, Ma M, Zhao L, Wu J, Lin Y, Fei F, et al. The association of lifestyle with cardiovascular and all-cause mortality based on machine learning: a prospective study from the NHANES. BMC Public Health. 2025;25:319. 10.1186/s12889-025-21339-w.
- 34. Morgenstern JD, Rosella LC, Costa AP, Anderson LN. Development of machine learning prediction models to explore nutrients predictive of cardiovascular disease using Canadian linked population-based data. Appl Physiol Nutr Metab. 2022;47:529–46. 10.1139/apnm-2021-0502.
- 35. Xu Z, Li D, Qu W, Yin Y, Qiao S, Zhu Y, et al. Card9 protects sepsis by regulating Ripk2-mediated activation of NLRP3 inflammasome in macrophages. Cell Death Dis. 2022;13:502. 10.1038/s41419-022-04938-y.
- 36. Shaito A, Aramouni K, Assaf R, Parenti A, Orekhov A, Yazbi AE, et al. Oxidative stress-induced endothelial dysfunction in cardiovascular diseases. Front Biosci Landmark Ed. 2022;27:105. 10.31083/j.fbl2703105.
- 37. Ganji SH, Kashyap ML, Kamanna VS. Niacin inhibits fat accumulation, oxidative stress, and inflammatory cytokine IL-8 in cultured hepatocytes: impact on non-alcoholic fatty liver disease. Metab Clin Exp. 2015;64:982–90. 10.1016/j.metabol.2015.05.002.
- 38. Yang R, Zhu M, Fan S, Zhang J. Niacin intake and mortality (total and cardiovascular disease) in patients with cardiovascular disease: insights from NHANES 2003–2018. Nutr J. 2024;23:123. 10.1186/s12937-024-01027-y.
- 39. Su G, Sun G, Liu H, Shu L, Zhang J, Guo L, et al. Niacin suppresses progression of atherosclerosis by inhibiting vascular inflammation and apoptosis of vascular smooth muscle cells. Med Sci Monit. 2015;21:4081. 10.12659/msm.895547.
- 40. Rusetskaya NY, Fedotov IV, Koftina VA, Borodulin VB. [selenium compounds in redox regulation of inflammation and apoptosis]. Biomed Khim. 2019;65:165. 10.18097/pbmc20196503165.
- 41. Wen Y, Zhang L, Li S, Wang T, Jiang K, Zhao L, et al. Effect of dietary selenium intake on CVD: a retrospective cohort study based on China health and nutrition survey (CHNS) data. Public Health Nutr. 2024;27:e122. 10.1017/S1368980024000703.
- 42. Arques S. Serum albumin and cardiovascular disease: state-of-the-art review. Ann Cardiol Angeiol (Paris). 2020;69:192–200. 10.1016/j.ancard.2020.07.012.
- 43. Kuria A, Tian H, Li M, Wang Y, Aaseth JO, Zang J, et al. Selenium status in the body and cardiovascular disease: a systematic review and meta-analysis. Crit Rev Food Sci Nutr. 2021;61:3616. 10.1080/10408398.2020.1803200.