Machine Learning-Based Model for Predicting Coronary Heart Disease Using Preβ HDL and Cytokines as Plasma Biomarkers

Seema Singh Saharan; Kate Townsend Creasy; Lauren Birnbaum; Eveline O Stock; Jelena Mustra Rakic; Xiaoli Tian; Arun Prakash; Mary Malloy; John Kane

doi:10.1007/978-3-031-94950-0_13

Author manuscript; available in PMC: 2025 Sep 15.

Published in final edited form as: Proc (Int Conf Comput Sci Comput Intell). 2025 Aug 29;2507:139–153. doi: 10.1007/978-3-031-94950-0_13

PMCID: PMC12433607 NIHMSID: NIHMS2109672 PMID: 40955341

Abstract

Coronary heart disease (CHD) remains the leading cause of global mortality, per the Centers for Disease Control and Prevention. It is therefore important to develop novel and improved methods for CHD prediction, detection, and early intervention. Our study assesses the predictive efficacy of plasma Preβ High-Density Lipoprotein (HDL) and cytokines as biomarkers of CHD, using machine learning (ML) algorithms to enhance risk prediction.

In a case-control study, we explored the potential of 35 plasma cytokines in conjunction with Preβ HDL levels to discriminate “at risk” CHD patients from non-affected control subjects. The dataset contains data on 108 individuals divided into two cohorts: 41 individuals with CHD and 67 individuals in the Control group. The dataset was synthetically augmented to a total of 20,000 samples and analyzed using random forest, coupled with feature engineering and feature importance techniques.

Compared with the Control group, individuals in the CHD group exhibited significantly higher levels of plasma Preβ HDL, with mean values of 13.5 mg/dL apoA1 and 10.2 mg/dL apoA1, respectively (p < 0.05). The second random forest classifier, incorporating Preβ HDL, FGF-Basic, MCP-1, Eotaxin, IL-10, IL-9, and IL-1β, achieved an F1 score, prediction accuracy, and AUROC score of 100%.

The remarkable results derived from the random forest classifiers underscore the need for further exploration into the predictive potential of Preβ HDL and plasma cytokines in the development of CHD, using ML methodologies. Further investigation may lead to the identification of novel drug targets for more effective therapeutic interventions.

Keywords: Atherosclerosis, Cholesterol/Efflux, Apolipoprotein, Inflammation, Lipids, AUROC, random forest, F1 score, Prediction Accuracy

Introduction

Cholesterol efflux capacity is associated with lower levels of prevalent and incident CHD, even after adjustment for HDL-C and apolipoprotein A-1 levels. Preβ High-Density Lipoproteins (HDL) are dense, nascent HDL particles composed of apolipoprotein A-1 molecules complexed with phospholipids, and they are the recipient particles for cholesterol effluxed from the artery wall (1). High levels of Preβ HDL result when efflux is impaired (2). Preβ HDL are precursors of spherical HDL containing cholesteryl esters, which are generated after the efflux of free cholesterol and the acquisition of apolipoproteins A-2, E, and C. The role of Preβ HDL at the commencement of reverse cholesterol transport from cells in peripheral tissues, including artery wall cells and macrophages, to the liver has been recognized (3).

Advanced artificial intelligence and machine learning (ML) algorithms can extract insights and patterns from biomarkers such as Preβ HDL to provide recommendations and risk assessments that inform clinical decisions. In essence, the predictive value of Preβ HDL as an index of functional impairment of cholesterol efflux can be enhanced by applying ML algorithms to improve CHD risk prediction.

Cytokines are small signaling molecules, such as interferons, interleukins, and growth factors, that are secreted by cells to modulate the body’s immune and inflammatory responses (4). Both pro- and anti-inflammatory cytokines are active in mammalian systems and are known to play roles in the development and accelerated progression of coronary heart disease (4). Contemporary research has examined the distinct individual impacts of cytokines and Preβ HDL on CHD risk stratification. In our research, we integrate measurements of Preβ HDL and cytokines to predict the risk of CHD using ML algorithms such as Random Forest. This research highlights the significance and future exploratory potential of Preβ HDL, in addition to the power of ML using circulating cytokine profiles, for the prediction of CHD risk.

Materials and Methods

Subjects and Experimental Design

The dataset comprises measurements of 35 cytokines and Preβ HDL in 108 individuals, with 41 individuals in the CHD group and 67 in the Control group. Clinical and biomarker summaries are provided in Table 1, and details of the 35 cytokines are provided in Table 2.

Table 1. Clinical Demographic Profile Among CHD and Controls Groups.

Plasma samples collected retrospectively from patients with diagnosed coronary heart disease (CHD) and from healthy controls were included in the study. The two groups were matched on male/female ratio and total plasma cholesterol. Inclusion criteria comprised self-reported white/European race/ethnicity and non-smoker status, with medical records referenced to confirm non-use of lipid-lowering medications. Comparison of all Control with all CHD subjects showed that CHD patients were significantly older and had higher BMIs and elevated plasma Preβ HDL concentrations, while the other plasma lipid levels showed no significant differences between the groups. Significance between the Control and CHD groups was determined using a t-test with p < 0.05.

	Control (N=67)		Coronary Heart Disease (N=41)		Significance
Male/Female	M: N = 29 (43%) F: N = 38 (57%)		M: N =19 (46%) F: N = 22 (54%)		No
	Average (S.D.)	Range	Average (S.D.)	Range
Age	35.3 (11.3)	18–58	50.7 (7.9)	29–61	Yes
Body Mass Index (kg/m²)	24.9 (4.3)	18–42	26.9 (5.5)	18.3–47.4	Yes
Total Cholesterol (mg/dL)	184.1 (36.3)	68–285	193.5 (57.8)	102–335	No
Triglycerides (mg/dL)	85.8 (39)	33–276	98.0 (48.1)	33–270	No
LDL-Cholesterol (mg/dL)	102.8 (34.3)	17–189	116.4 (52.9)	21–243	No
HDL-Cholesterol (mg/dL)	64.0 (17.8)	25–101	59.6 (22.3)	29–107	No
hsCRP (mg/L)	2.8 (2.7)	0.01–18.0	2.4 (3)	0.01–13.0	No
Preβ HDL (mg/dL apoA1)	10.2 (4.0)	3.7–20.3	13.5 (5.4)	3.7–25.6	Yes

Table 2. Summaries of 35 Cytokine Profiles Among CHD and Control Groups.

Profile summaries include the mean, standard deviation, and range of values, stratified by CHD and Control group. The final column indicates whether the mean value of a cytokine differs significantly between CHD and Control by t-test at p < 0.05.

Cytokine	Control (N = 67)		Coronary Heart Disease (N = 41)		Significance
	M: N = 29/ F: N = 38		M: N = 19/ F: N = 22
	Average (S.D.)	Range	Average (S.D.)	Range
FGF-Basic	10.9(9.1)	2–72.1	21.7(38.3)	2–247.3	Yes
IL-1β	6.6(2.8)	1.3–10.2	4.7(3.1)	1.5–9.3	Yes
GCSF	44.4(24.6)	5.9–114.1	50.6(41.6)	5.9–234	No
IL-10	21.1(10.1)	0.1–30.1	17.9(22.9)	0.1–105.8	No
IL-13	7.6(3.4)	0.9–17.2	6.6(5.4)	0.9–30.2	No
IL-6	5.9(2.5)	0.2–8.9	5.5(4.7)	0.6–22.1	No
IL-12	58.3(23.4)	25.3–133.7	91.1(86.6)	22.3–493.9	Yes
RANTES	1781.7(799.8)	65.5–3440.5	1754.8(722.1)	66–3383.1	No
Eotaxin	13.9(8.2)	4.7–47.3	21.4(14.5)	4.2–72.5	Yes
IL-17α	4.8(2.9)	1–10.6	4.4(6.4)	1–30.9	No
MIP1-α	29.3(11.1)	7.2–93.8	29(19)	9.1–109.9	No
GMCSF	13.3(6.2)	1.4–17.3	8.6(7.6)	1.4–23.2	Yes
MIP1-β	39.6(48.8)	13.1–420.8	55.9(97.4)	11.7–602.2	No
MCP-1	115.1(75.4)	52.6–473.4	141.8(69.9)	56.6–389.1	No
IL-15	55.8(73.9)	5.9–580.6	63.2(94.7)	5.9–398.5	No
EGF	10(5.7)	0.3–30.3	10.2(12.8)	0.3–69.9	No
IL-5	5.7(4.5)	0.1–35.3	3.7(4.4)	0.1–24.3	Yes
HGF	164.9(179)	27.9–1473	187.2(184.9)	27.9–1135.9	No
VEGF	1.7(0.8)	0.1–3.5	1.4(1.5)	0.1–8.5	No
IL-1α	5.7(3.5)	0.6–26	7.3(15.9)	0.8–101.8	No
IFN-γ	6.1(1.5)	1.7–9.3	5.5(1.6)	1.3–8.8	No
IL-17F	91.6(89.9)	0.2–734.2	103.2(205.5)	0.2–1211	No
IFN-α	15.5(14.4)	2.6–95.5	31.1(65.8)	2.6–398.8	No
IL-9	2.8(1.2)	0.4–5.8	3(5.5)	0.4–35.7	No
IL-1RA	51(91.2)	8.1–609.8	80.1(208.5)	8.2–1327	No
TNF-α	8.6(4.4)	0.1–12.5	6.3(7.8)	0.1–39.9	No
IL-3	8.8(9.1)	1–69.5	7.4(8.5)	1–41.9	No
IL-2	8.4(5.2)	0.1–30.8	5.8(5.6)	0.1–24.5	Yes
IL-7	2.3(4.8)	0.5–34.4	2(3.7)	0.5–21.2	No
IP-10	15.1(13.6)	4.5–82.4	16.5(15.9)	3.3–91	No
IL-2R	72.8(30.1)	1.1–147.3	86.1(108.7)	1.1–695.1	No
IL-22	17.9(22.9)	4.3–160.7	27.9(54.4)	4.3–260.6	No
MIG	30(71.4)	2.3–596.3	36.1(63.3)	2.3–378.1	No
IL-4	49.9(24.2)	2.6–81.3	32.7(30.8)	2.6–110.7	Yes
IL-8	13.3(5.7)	0.2–19.6	11.8(13.2)	0.2–82	No

Biochemical measurements were performed at UCSF (University of California San Francisco) using established assays. Plasma concentrations of total cholesterol (TC) and triglycerides (TG) were measured with a COBAS chemical analyzer (Roche Diagnostics). HDL-C was measured after magnesium and dextran sulfate precipitation of ApoB-containing lipoproteins, and LDL-C was calculated using the Friedewald formula. Biochemical measurements, medical histories, smoking status, and medications were reported in the Genomic Resource database and used to select plasma samples for this study.
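As a quick illustration of the Friedewald calculation mentioned above, the sketch below uses the standard mg/dL form, LDL-C = TC − HDL-C − TG/5. The function name and the 400 mg/dL triglyceride check are illustrative assumptions (the cutoff is the conventional validity limit of the formula, not something stated in this manuscript). The inputs are the group means from Table 1, so the outputs only approximate the reported mean LDL-C values.

```python
def friedewald_ldl(total_chol, hdl_chol, triglycerides):
    """Estimate LDL-C (mg/dL) as TC - HDL-C - TG/5 (Friedewald formula)."""
    if triglycerides >= 400:
        # Conventional validity limit of the formula (assumption, not from the study)
        raise ValueError("Friedewald estimate is unreliable for TG >= 400 mg/dL")
    return total_chol - hdl_chol - triglycerides / 5.0

# Group means taken from Table 1 (TC, HDL-C, TG in mg/dL)
control_ldl = friedewald_ldl(184.1, 64.0, 85.8)
chd_ldl = friedewald_ldl(193.5, 59.6, 98.0)
print(round(control_ldl, 1), round(chd_ldl, 1))
```

Table 1 reports mean LDL-C of 102.8 and 116.4 mg/dL; the sketch is meant only to show the arithmetic, not to reproduce the per-subject calculation.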

Preβ HDL measurements: Plasma Preβ HDL was quantified by gel electrophoresis followed by immunoprecipitation with ApoA1 antibodies, stained with Coomassie blue dye, and visualized by scanning densitometry as previously described (5).

Study Subjects and Biochemical Data.

Plasma samples from 108 self-reported Caucasian subjects with either diagnosed CHD (n=41) or no history of heart disease or inflammatory diagnosis (Control, n=67) were used for the study (Table 1). The proportion of males and females was similar in both study groups (CHD: 53.7% female; Control: 56.7% female). The CHD group was older (mean age 50.7 years) and had a higher BMI (26.9) than the Control group (mean age 35.3 years, BMI 24.9). The groups were matched for plasma lipid levels, with no statistical differences between Control and CHD subjects in plasma TC, TG, LDL-C, or HDL-C. Plasma hsCRP, a marker of cardiovascular risk, also did not differ statistically between the groups. However, plasma Preβ HDL was significantly higher in the CHD group than in Control subjects (mean 13.5 and 10.2 mg/dL apoA1, respectively).
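The Preβ HDL comparison can be sanity-checked from the summary statistics in Table 1. The sketch below assumes Welch's unequal-variance form of the t-test; the manuscript states only that a t-test was used, so this choice and the function name `welch_t` are assumptions made for illustration.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for Welch's t-test computed from summary statistics."""
    v1 = sd1 ** 2 / n1   # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2   # squared standard error of group 2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Preβ HDL (mg/dL apoA1) from Table 1: CHD 13.5 (5.4), n=41; Control 10.2 (4.0), n=67
t_stat, df = welch_t(13.5, 5.4, 41, 10.2, 4.0, 67)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

With roughly 67 degrees of freedom the two-sided 5% critical value is close to 2.0, so a t statistic well above 3 is consistent with the reported significance at p < 0.05.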

Ethics Approval and Consent to Participate

This study was approved by the UCSF Institutional Review Board Committee on Human Research and conducted in accordance with the principles of the Declaration of Helsinki, and all subjects provided written informed consent prior to participation.

Consent for Publication

Not applicable.

Cytokine Array

Cytokine detection and quantification were performed using the Human Immune Monitoring 35-Plex ProcartaPlex Panel (Invitrogen / ThermoFisher). The 35-plex panel allows measurement of human cytokines, chemokines, growth factors, and soluble receptors in a single-well format. Cytokines were quantified simultaneously using xMAP® technology (Luminex® Corporation). Raw cytokine data were analyzed with xPONENT® software, and each data point was extrapolated by a weighted 5-Parameter Logistic (5-PL) algorithm. The results for each analyte are expressed in pg/mL using the standard curve for each protein assayed.
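To make the standard-curve step concrete, the sketch below evaluates and inverts a 5-PL curve in one common parameterization, y = d + (a − d) / (1 + (x/c)^b)^g. The parameter values are hypothetical, and the weighting and fitting performed by the instrument software are not shown; the exact parameterization used by that software is not stated in the manuscript.

```python
def five_pl(x, a, b, c, d, g):
    """Five-parameter logistic response at concentration x."""
    return d + (a - d) / (1.0 + (x / c) ** b) ** g

def inverse_five_pl(y, a, b, c, d, g):
    """Concentration whose 5-PL response equals y (y strictly between a and d)."""
    ratio = (a - d) / (y - d)
    return c * (ratio ** (1.0 / g) - 1.0) ** (1.0 / b)

# Hypothetical fitted parameters for one analyte (not taken from the study)
params = dict(a=20.0, b=1.2, c=150.0, d=25000.0, g=0.8)

signal = five_pl(50.0, **params)               # response for a 50 pg/mL standard
recovered = inverse_five_pl(signal, **params)  # back-calculated concentration
print(round(recovered, 3))
```

In practice the five parameters are fitted to the standards of each analyte, and sample concentrations in pg/mL are read off the fitted curve with the inverse function.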

Random Forest

Random forest is a non-parametric, eager-learning algorithm that makes no assumptions about the underlying data distribution and can handle both quantitative and categorical data. This versatile ML technique can be used for both supervised classification and numerical prediction. Random Forest is similar to bagging (bootstrap aggregation) in that both algorithms construct trees using bootstrap samples. Unlike bagging, at each partition Random Forest selects the best feature from a randomly selected subset of features. This strategy decorrelates the trees in the ensemble, which often yields better performance than bagging and several other ML algorithms. For regression, the prediction is obtained by averaging the numerical target variable across all trees, whereas for classification, a majority vote across all trees produces the final class label of the target variable.
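The three ingredients described above (bootstrap samples, a random feature subset, and majority voting) can be sketched in a few lines. This is a deliberately simplified illustration and not the study's implementation: each "tree" is a one-split stump rather than a depth-limited tree, the feature subset is drawn once per stump rather than at every partition, and the toy data and all function names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_stump(rows, labels, feature_ids):
    """Best one-split tree over the given features, scored by weighted Gini."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                left_label = Counter(left).most_common(1)[0][0]
                right_label = Counter(right).most_common(1)[0][0]
                best = (score, f, threshold, left_label, right_label)
    return best

def fit_forest(rows, labels, n_trees=25, n_sub_features=1, seed=0):
    """Grow stumps on bootstrap samples, each restricted to a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]      # bootstrap sample
        sample_labels = [labels[i] for i in idx]
        if len(set(sample_labels)) < 2:
            continue                                        # skip one-class samples
        features = rng.sample(range(len(rows[0])), n_sub_features)
        stump = best_stump([rows[i] for i in idx], sample_labels, features)
        if stump is not None:
            forest.append(stump)
    return forest

def predict(forest, row):
    """Majority vote over the class labels predicted by the stumps."""
    votes = [left if row[f] <= t else right for _, f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical toy data: (feature_0, feature_1) per subject
rows = [(1.0, 10.0), (2.0, 11.0), (1.5, 12.0), (2.5, 10.5),
        (6.0, 20.0), (7.0, 21.0), (6.5, 22.0), (7.5, 19.5)]
labels = ["Control"] * 4 + ["CHD"] * 4
forest = fit_forest(rows, labels)
print(predict(forest, (0.5, 9.0)), predict(forest, (9.0, 30.0)))
```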

Gini index is the criterion used to split the tree at decision nodes by determining the impurity of the node. Gini index quantifies the probability of misclassifying a randomly chosen element in a node, assuming it is randomly labeled according to the distribution of classes in that node.

Gini_{ImpurityIndex} = 1 - \sum_{i=1}^{n} (p_{i})^{2}

(1)

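Where p_i is the proportion of class i in that node and n is the number of classes. A Gini index of 0 indicates perfect partition purity (all samples in the node belong to one class). The maximum value, 1 - 1/n, is reached when the classes are equally represented; it is 0.5 for two classes and approaches 1 only as the number of classes grows.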
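A short worked example of Equation (1) may help. For two classes the index runs from 0 (a pure node) to 0.5 (an evenly mixed node). The function name is illustrative, and the third node simply reuses the study's class counts (41 CHD, 67 Control) as if they formed a single unsplit node.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

pure = gini_impurity(["CHD"] * 10)                            # single class
balanced = gini_impurity(["CHD"] * 5 + ["Control"] * 5)       # evenly mixed
study_mix = gini_impurity(["CHD"] * 41 + ["Control"] * 67)    # study class counts
print(pure, balanced, round(study_mix, 3))
```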

Random Forest with Variable Importance

Feature importance for classification is calculated from criteria such as the impurity index. Each feature's importance is computed as the mean, reported with its standard deviation, of the accumulated impurity decrease attributed to that feature within each tree. This technique ranked the predictor variables by their ability to differentiate CHD from Control.
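The impurity-based importance described above can be sketched as two steps: the weighted Gini decrease produced by one split, and the mean and standard deviation of a feature's accumulated decrease across trees. The split counts and the per-tree values below are hypothetical, and the function names are illustrative; this mirrors the general mean-decrease-in-impurity idea rather than the exact routine used in the study.

```python
from collections import Counter
from statistics import mean, stdev

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(parent, left, right, n_total):
    """Weighted Gini decrease contributed by splitting `parent` into `left` and `right`."""
    n = len(parent)
    children = (len(left) * gini(left) + len(right) * gini(right)) / n
    return (n / n_total) * (gini(parent) - children)

# A root node of 8 samples split perfectly into two pure children
parent = ["CHD"] * 4 + ["Control"] * 4
decrease = impurity_decrease(parent, ["CHD"] * 4, ["Control"] * 4, n_total=8)

# Hypothetical accumulated decrease for one feature in each of five trees
per_tree = [0.30, 0.25, 0.35, 0.28, 0.32]
importance_mean = mean(per_tree)
importance_sd = stdev(per_tree)
print(decrease, round(importance_mean, 3), round(importance_sd, 3))
```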

Evaluation Measures:

1. F1 score:

F1 score, a harmonic mean evaluation metric, balances the tradeoff between precision and recall, where precision is the percentage of predicted positive cases that are actually positive, and recall is the percentage of actual positive cases that were identified. Mathematically,

Precision = \frac{TP}{TP + FP}

(2)

Recall = \frac{TP}{TP + FN}

(3)

F1\ score = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}

(4)

These formulas use counts from the confusion matrix (TN is used in the prediction accuracy formula):

TP = True Positive, TN = True Negative

FP = False Positive, FN = False Negative

A high F1 score implies that both false negatives and false positives are low. Although the F1 score is an appropriate evaluation measure in this context, it is sensitive to false negatives because imbalanced data affects the recall component. Balancing the data is therefore important for improving the F1 score.
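Equations (2) to (4) can be checked with a small worked example. The confusion-matrix counts below are hypothetical and the function name is illustrative; the zero-denominator guards are a convenience choice, not part of the definitions.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1

# Hypothetical counts: 45 true positives, 5 false positives, 15 false negatives
precision, recall, f1 = precision_recall_f1(tp=45, fp=5, fn=15)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Here the 15 false negatives pull recall down to 0.75, and the F1 score falls between precision and recall, closer to the lower of the two, as expected of a harmonic mean.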

2. AUROC (Area under Receiver Operating Characteristic)

AUROC (6) is the area under the receiver operating characteristic (ROC) curve, which aggregates the performance of a classification model across all possible probability thresholds. It is a performance evaluation measure that reflects the quality of binary classification at different thresholds. Its value ranges from 0 to 1: a value of 1 implies that the classifier separates the two classes perfectly, a value of 0.5 corresponds to chance-level discrimination, and a value of 0 implies that the classification was entirely faulty. The ROC curve plots the true positive rate (TPR), i.e., sensitivity, against the false positive rate (FPR), i.e., 1 - specificity. 95% CIs are also calculated to evaluate, within a reasonable range, how well the model separates positive from negative cases.
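One way to see what the AUROC measures is its rank interpretation: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it that way on hypothetical scores; the function name is illustrative, and this pairwise form is chosen for clarity rather than speed.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities for three CHD (1) and three Control (0) cases
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
area = auroc(labels, scores)
print(round(area, 3))
```

Eight of the nine positive-negative pairs are ordered correctly in this example, so the area is 8/9.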

3. Prediction Accuracy

Prediction accuracy is the proportion of all cases, negative and positive, that were predicted correctly.

Prediction\ Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(5)
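Equation (5) also shows why accuracy is usually read alongside F1 and AUROC when classes are unequal. As a hedged illustration, the sketch below applies the formula to a trivial rule that labels every one of the original 108 subjects (41 CHD, 67 Control) as Control; the rule and the function name are hypothetical and are not part of the study.

```python
def accuracy(tp, tn, fp, fn):
    """Prediction accuracy from confusion-matrix counts."""
    return (tp + tn) / (tp + tn + fp + fn)

# Always predicting "Control": all 67 controls correct, all 41 CHD cases missed
baseline = accuracy(tp=0, tn=67, fp=0, fn=41)
print(round(baseline, 3))
```

Such a rule reaches about 62% accuracy while identifying no CHD case at all, which is one motivation for balancing the classes before training.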

Classifier Experimental Design

This study constructed two classifiers to distinguish CHD from Control. F1 score, AUROC score, and prediction accuracy are used to evaluate the quality of the classification. The data are balanced and synthetically augmented with the objective of improving classification: the majority class is undersampled, the minority class is oversampled, and the merged data are augmented to obtain a total of 20,000 samples. The data are partitioned using a 25%–75% train-test split. To improve the generalizability of the model, overfitting is reduced by implementing stratified cross-validation, which ensures that each fold contains approximately the same percentage of samples of each target class as the complete set.
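The balancing and stratification steps can be sketched with the study's class counts. This is a minimal sketch under stated assumptions: the per-class target of 54, the fold count of 5, and both function names are hypothetical, and the synthetic augmentation to 20,000 samples is not shown because the manuscript does not specify the method.

```python
import random
from collections import defaultdict

def balance_by_resampling(labels, per_class, seed=0):
    """Indices with per_class items per label: undersample large classes, oversample small ones."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    chosen = []
    for idx in by_class.values():
        if len(idx) >= per_class:
            chosen += rng.sample(idx, per_class)                     # without replacement
        else:
            chosen += [rng.choice(idx) for _ in range(per_class)]    # with replacement
    return chosen

def stratified_folds(labels, k, seed=0):
    """Split indices into k folds that preserve the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)   # deal each class out round-robin
    return folds

labels = ["CHD"] * 41 + ["Control"] * 67
balanced_idx = balance_by_resampling(labels, per_class=54)
folds = stratified_folds(labels, k=5)
print(len(balanced_idx), [len(fold) for fold in folds])
```

When oversampling is combined with cross-validation, a common design choice is to resample inside each training fold so that copies of the same subject do not end up in both the training and test data.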

Results

Testing Results

The results are generated using two Random Forest classifiers: the first uses 35 cytokines and Preβ HDL, and the second uses only the seven most impactful predictor variables. The performance evaluation measures (F1 score, AUROC score, and prediction accuracy) are presented in Tables 3 and 4. The classification results in terms of prediction accuracy and AUROC are shown graphically in Figs. 1, 2, 4, and 5, and the variable importance is displayed in Fig. 3.

Table 3. Classifier 1 Prediction Accuracy Using Random Forest with 35 Cytokines and Preβ HDL.

Details of the first Random Forest classifier, which uses 35 cytokines and Preβ HDL as predictor features to distinguish diagnosed CHD from Controls. The evaluation metrics are prediction accuracy, AUROC score, and F1 score.

Algorithm	Classification Details	Predictor Feature Space	AUROC	Prediction Accuracy	F1 Score
Random Forest	Number of estimators = 200 Depth of trees = 5	35 Cytokines and Preβ HDL	100%	98.2%	98%

Table 4. Classifier 2 Prediction Accuracy Using Random Forest with the Six Most Prominent Cytokines and Preβ HDL.

Description of Classifier 2, a Random Forest that uses the most prominent biomarkers as predictor features (Preβ HDL, FGF-Basic, MCP-1, Eotaxin, IL-10, IL-9, IL-1β) to distinguish diagnosed CHD from Controls. Prediction accuracy, AUROC score, and F1 score were used to evaluate this classifier.

Algorithm	Classification Details	Predictor Feature Space	AUROC	Prediction Accuracy	F1 Score
Random Forest	Number of estimators = 200 Depth of trees = 5	Preβ HDL, FGF-Basic, MCP-1, Eotaxin, IL-10, IL-9, IL-1β.	100%	100%	100%

Figure 1. Performance of Random Forest using prediction accuracy as an evaluation measure to differentiate the CHD cohort from the Control cohort. The classifier was implemented using 35 cytokines and Preβ HDL and obtained a prediction accuracy of 98%.

Figure 2. Performance of Random Forest using AUROC score to differentiate the CHD group from the Control group using 35 cytokines and Preβ HDL. A perfect AUROC score of 100% was obtained for this classifier.

Figure 4. Performance of Random Forest in differentiating the CHD cohort from the Control cohort using the six most prominent cytokines and Preβ HDL. A 100% prediction accuracy was obtained using Preβ HDL and the 6 most prominent cytokines: IL-1β, IL-10, FGF Basic, GCSF, IL-8, and GMCSF.

Figure 5. Performance of Random Forest using AUROC score to differentiate the CHD group from the Control group. The predictor feature space was composed of Preβ HDL and the 6 most prominent cytokines: IL-1β, IL-10, FGF Basic, GCSF, IL-8, and GMCSF. A 100% AUROC score was generated using the seven predictor features.

Figure 3. The 13 most impactful predictor features used for implementing Classifier 1 to differentiate the CHD cohort from the Control cohort. The thirteen most important features identified are: IL-1β, IL-10, FGF Basic, GCSF, IL-8, GMCSF, IL-13, IL-7, IL-4, Preβ HDL, TNF-α, IFN-γ, and IL-3.

Classifier 1

The first Random Forest classifier is constructed using 35 cytokine biomarker variables and Preβ HDL to optimize performance. The quantitative evaluation is presented in Table 3, and the prediction accuracy and the AUROC score of 100% are displayed graphically in Figs. 1 and 2, respectively. Additionally, this algorithm achieved a prediction accuracy of 98% and an F1 score of 98%. The 13 most important predictor variables for distinguishing CHD from Control are shown in Fig. 3.

Classifier 2

The second Random Forest classifier is constructed using Preβ HDL, the most impactful biomarker, in conjunction with the next six most prominent of the 35 cytokines that accounted for the performance achieved by the first classifier. The seven most impactful biomarkers displayed in the graph (Fig. 3) include Preβ HDL, FGF-Basic, MCP-1, Eotaxin, IL-10, IL-9, and IL-1β. Table 4 shows that the algorithm achieved an AUROC score of 100%, a prediction accuracy of 100%, and an F1 score of 100%. The prediction accuracy confusion matrix is shown in Fig. 4, and the AUROC score is shown in Fig. 5.

Classifier Comparison

Both classifiers demonstrated exceptionally good performance based on the F1 score and AUROC score. The second classifier, implemented with the seven most impactful predictor variables, outperformed the first classifier, which was constructed using 36 predictor variables. The F1 score, prediction accuracy, and AUROC score of the second classifier are each 100%. The impactful biomarkers used for the second classifier, presented in Table 4, are as follows: Preβ HDL, FGF-Basic, MCP-1, Eotaxin, IL-10, IL-9, and IL-1β.

Discussion

The research addressed in this paper has provided incremental insights into the complex role of cytokines and other molecular biomarkers, such as Preβ HDL, in the development or progression of CHD. Previous studies of the effects of Preβ HDL on the development of CHD and myocardial infarction (MI) have revealed higher levels of Preβ HDL, both percent and absolute, in CHD and MI groups: 21.3% and 25.1% (CHD, MI) and 31.9% and 33.1% (CHD, MI), respectively (3). Similarly, investigation into the impact of Preβ HDL on CHD and MI risk found that increased levels of Preβ HDL were more predictive of MI risk than of the development of CHD, with the net reclassification index of 0.21 and integrated discrimination improvement of 0.01 obtained for MI risk being statistically significant (2). These studies demonstrate the predictive power of Preβ HDL in identifying CHD risk and provide a justifiable basis for a more extensive investigation. Preβ HDL provides insight into vascular health and reverse cholesterol transport, and it measures a pathway not captured by conventional risk factors (2,3,7–10).

The inflammatory hypothesis highlights the pivotal role of inflammation in both the onset and progression of atherosclerotic vascular conditions. A potential key to preventing or treating CHD lies in addressing the inflammatory process through monitoring cytokine levels. A review focusing on the significance of cytokines in the development of vascular disease identified an independent relationship between five pro-inflammatory cytokines (IL-6, IL-18, MMP-9, sCD40L, and TNF-α) and the risk of CHD. Findings from the study indicated a positive correlation of IL-6, IL-18, and TNF-α with CHD risk, reinforcing the inflammatory hypothesis of vascular disease. The identification of IL-6 and TNF-α as potential targets for drug intervention shows promise in addressing CHD. Research investigating the impact of canakinumab, an interleukin-1β inhibitor, on heart failure prevention has provided additional support for theories regarding the use of IL-6, TNF-α, and other cytokines as therapeutic targets. The study showed a dose-dependent reduction in hospitalization for heart failure (HHF), with those receiving the highest dose of canakinumab experiencing the greatest reduction in HHF. These outcomes indicate that IL-1β inhibitors may play a role in heart failure prevention, warranting further investigation (11).

ML has shown significant potential in predicting and detecting disease risk, which facilitates timely intervention and treatment^1–5. The combination of clinical expertise and ML models built using diverse data from multiple sources, including real-time monitoring, generates comprehensive risk profiles that allow for precision-based, personalized treatment plans. Prior research into the relationship of 35 plasma cytokines to CAD risk using the K-Nearest Neighbor (k-NN) and Random Forest ML algorithms has demonstrated the predictive power of ML algorithms. Random Forest proved to be an exemplary predictor of CAD risk, with AUROC scores of 0.99 (17).

Taken together, these results provide support for further investigation into the predictive nature of plasma cytokines in the development of CHD through ML techniques. This research, powered by Random Forest, provides definitive evidence that the plasma cytokines IL-10, IL-1β, and FGF-Basic, along with Preβ HDL, have exceptional predictive prowess with regard to identifying individuals at risk of CHD. Broadly, this study highlights the importance of Preβ HDL as a contributing factor in determining whether an individual is at risk of a CHD condition, since it was identified as one of the most prominent predictor feature variables. The synergistic role of cytokines and Preβ HDL in the etiology of CHD is also empirically demonstrated by the second classifier, which generated an AUROC score and prediction accuracy of 100%.

Both this study and the reviewed research illustrate the importance of using ML to enhance the understanding of the complex interplay between cytokines, a well-established biomarker of risk (Preβ HDL), and other molecular biomarkers to predict CHD. As we enter the era of precision medicine, in which artificial intelligence is fueled by big data and advanced biomarker discoveries, conducting precise analyses becomes imperative. These analyses aid in understanding the interactions and roles of biomarkers in disease causation and are pivotal in the development of new targeted therapies to save lives.

Impedance of the retrieval of cholesterol from the artery wall is an independent contributor to clinical coronary disease. Therefore, the measurement of Preβ HDL and specific cytokines shows potential for more precisely targeted treatments for individuals with vascular disease. Preβ HDL stands as a strong independent risk factor for CHD and should be included in routine risk assessments. Patients experiencing progressive CHD despite normal or controlled atherogenic lipoprotein levels might show evidence of inflammation, such as high IL-1, or impairment of reverse cholesterol transport, evidenced by elevated Preβ HDL levels. The advent of effective cytokine inhibitors opens new pathways for intervention. Recent studies on colchicine, a known therapeutic agent for gout and familial Mediterranean fever, suggest potential therapeutic effects in arterial disease (18). Its well-established anti-inflammatory effects are under investigation in the context of CAD management. The findings of the LoDoCo2 and COLCOT studies strongly advocate for the use of low-dose colchicine in the management of CAD, leading to its FDA approval in June 2023 (18–21).

The impact on CHD can be further investigated in future studies by expanding the dataset, incorporating additional cytokines, and obtaining emerging data from other sources. Existing data indicate the involvement of inflammatory mediators in atherogenesis, implying that a number of cytokines may be involved. The identification of these previously unrecognized risk factors unveils multiple molecular mechanisms related to arterial disease, potentially revealing crucial additional targets for future therapeutic interventions.

These data support the identification of two important new domains involving reverse cholesterol transport and inflammatory mechanisms. Laboratory indicators of these significant factors should be incorporated in the assessment of risk and potential interventions in the future.

Acknowledgments

We wish to extend our deepest thanks to the study participants.

Funding Sources:

This research was supported by the NIH under Ruth L. Kirschstein National Research Service Award 2T32HL007731-26 from the Department of Health and Human Services Public Health Services (KTC). Additional support was provided by the Read Foundation Charitable Trust and the Campini Foundation (JPK).

Abbreviations and Acronyms:

AUROC: Area Under Receiver Operating Characteristic
CHD: Coronary Heart Disease
CAD: Coronary Artery Disease
CANTOS: Canakinumab Anti-inflammatory Thrombosis Outcome Study
COLCOT: Colchicine Cardiovascular Outcomes Trial
FPR: False Positive Rate
HHF: Hospitalization for Heart Failure
IL-1: Interleukin-1
IL-6: Interleukin-6
IL-18: Interleukin-18
k-NN: K-Nearest Neighbor
LoDoCo2: Low-Dose Colchicine 2
MMP-9: Matrix Metalloproteinase-9
ML: Machine Learning
sCD40L: Soluble CD40 ligand
TC: Total Cholesterol
TG: Triglycerides
TPR: True Positive Rate
UCSF: University of California, San Francisco

Footnotes

Declarations of interest: none

Data Availability

The data used is composed of confidential patient health information. The data obtained from the Kane laboratory and Genomic Resource in Cardiovascular and Metabolic Disease at UCSF are protected by HIPAA and cannot be released to the public. The deidentified data are summarized within the manuscript. We are prepared to answer any questions that may arise regarding the data used in our research.

References

1.Kaptoge S, Seshasai SR, Gao P, Freitag DF, Butterworth AS, Borglykke A, Di Angelantonio E, Gudnason V, Rumley A, Lowe GD, et al. Inflammatory cytokines and risk of coronary heart disease: new prospective study and updated meta-analysis. Eur Heart J. 2014;35:578–589. doi: 10.1093/eurheartj/eht367 [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Guey LT, Pullinger CR, Ishida BY, O'Connor PM, Zellner C, Francone OL, Laramie JM, Naya-Vigne JM, Siradze KA, Deedwania P, et al. Relation of increased prebeta-1 high-density lipoprotein levels to risk of coronary heart disease. Am J Cardiol. 2011;108:360–366. doi: 10.1016/j.amjcard.2011.03.054 [DOI] [PubMed] [Google Scholar]
3.Pullinger CR, O'Connor PM, Naya-Vigne JM, Kunitake ST, Movsesyan I, Frost PH, Malloy MJ, Kane JP. Levels of Prebeta-1 High-Density Lipoprotein Are a Strong Independent Positive Risk Factor for Coronary Heart Disease and Myocardial Infarction: A Meta-Analysis. J Am Heart Assoc. 2021;10:e018381. doi: 10.1161/JAHA.120.018381 [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Zhang JM, An J. Cytokines, inflammation, and pain. Int Anesthesiol Clin. 2007;45:27–37. doi: 10.1097/AIA.0b013e318034194e [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Quinn AG, Schwemberger R, Stock EO, Movsesyan I, Axtell A, Chang S, Ishida BY, Malloy MJ, Kane JP, Pullinger CR. Moderate statin treatment reduces prebeta-1 high-density lipoprotein levels in dyslipidemic patients. J Clin Lipidol. 2017;11:908–914. doi: 10.1016/j.jacl.2017.04.118 [DOI] [PubMed] [Google Scholar]
6.Hajian-Tilaki K. Receiver Operating Characteristic (ROC) Curve Analysis for Medical Diagnostic Test Evaluation. Caspian J Intern Med. 2013;4:627–635. [PMC free article] [PubMed] [Google Scholar]
7.A.R. Sharrett CMB, Coady SA, Heiss G, Sorlie PD, Catellier D and Patsch W. Coronary Heart Disease Prediction From Lipoprotein Cholesterol Levels, Triglycerides, Lipoprotein(a), Apolipoproteins A-I and B, and HDL Density Subfractions The Atherosclerosis Risk in Communities (ARIC) Study. Circulation. 2001;104:6. [Google Scholar]
8.Kane JP, Malloy Mary J. Prebeta-1 HDL and coronary heart disease. Current Opinion in Lipidology. 2012:5. doi: 10.1097/MOL.0b013e328353eef1 [DOI] [Google Scholar]
9.T Miida YN, Inano K, Matsuto T, Yamaguchi T, Tsuda T, Okada M. Pre beta 1-high-density lipoprotein increases in coronary artery disease. Clinical Chemistry. 1996;42:4. [Google Scholar]
10.Rindert de Vries FGPAvT, Robin PF. Dullaart Carotid intima media thickness is related positively to plasma pre ß-high density lipoproteins in non-diabetic subjects. Clinica Chimica Acta. 2012;413:5. doi: 10.1016/j.cca.2011.11.001 [DOI] [Google Scholar]
11.Everett BM, Cornel JH, Lainscak M, Anker SD, Abbate A, Thuren T, Libby P, Glynn RJ, Ridker PM. Anti-Inflammatory Therapy With Canakinumab for the Prevention of Hospitalization for Heart Failure. Circulation. 2019;139:1289–1299. doi: 10.1161/CIRCULATIONAHA.118.038010 [DOI] [PubMed] [Google Scholar]
12.Alizadehsani R, Habibi J, Alizadeh Sani Z, Mashayekhi H, Boghrati R, Ghandeharioun A, Khozeimeh F, Alizadeh-Sani F. Diagnosing Coronary Artery Disease via Data Mining Algorithms by Considering Laboratory and Echocardiography Features. Res Cardiovasc Med. 2013;2:133–139. doi: 10.5812/cardiovascmed.10888 [DOI] [PMC free article] [PubMed] [Google Scholar]
13. Nils Hampe JMW, van Velzen Sanne G. M., Leiner Tim, and Išgum Ivana. Machine Learning for Assessment of Coronary Artery Disease in Cardiac CT: A Survey. Frontiers in Cardiovascular Medicine. 2019.
14. Qurat-ul-ain Mastoi TYW, Gopal Raj Ram, and Iqbal Uzair. Automated Diagnosis of Coronary Artery Disease: A Review and Workflow. Cardiology Research and Practice. 2018. doi: 10.1155/2018/2016282
15. Indu Saini DS, Khosla Arun. QRS detection using K-Nearest Neighbor algorithm (KNN) and evaluation on standard ECG databases. Journal of Advanced Research. 2013;4:14. doi: 10.1016/j.jare.2012.05.007
16. Carlos Martin-Isla VMC, Izquierdo Cristian, Raisi-Estabragh Zahra, Baeßler Bettina, Petersen Steffen E., Lekadir Karim. Image-Based Cardiac Diagnosis With Machine Learning: A Review. Frontiers in Cardiovascular Medicine. 2020;7. doi: 10.3389/fcvm.2020.00001
17. Saharan SS, Nagar P, Creasy KT, Stock EO, Feng J, Malloy MJ, Kane JP. Machine learning and statistical approaches for classification of risk of coronary artery disease using plasma cytokines. BioData Min. 2021;14:26. doi: 10.1186/s13040-021-00260-z
18. Deftereos Spyridon G. M, PhD, Beerkens Frans J., MD, Shah Binita, MD, Giannopoulos George, MD, PhD, Vrachatis Dimitrios A., MD, MS, Giotaki Sotiria G., MD, Siasos Gerasimos, MD, MS, Nicolas Johny, Arnott Clare, MBBS, PhD, Patel Sanjay, MBBS, PhD, Parsons Mark, MD, Tardif Jean-Claude, Kovacic Jason C., MD, and Dangas George D., MD, PhD. Colchicine in Cardiovascular Disease: In-Depth Review. Circulation. 2022;145:18.
19. FDA. LODOCO (colchicine) tablets, for oral use. 2023.
20. Nidorf Stefan M. MD, Fiolet Aernoud T.L., M.D., Mosterd Arend, M.D., Eikelboom John W., M.D., Schut Astrid, M.Sc., Opstal Tjerk S.J., M.D., The Salem H.K., M.D., Xiao-Fang Xu, M.D., Ireland Mark A., M.D., Lenderink Timo, M.D., Latchem Donald, M.D., Hoogslag Pieter, M.D. Colchicine in Patients with Chronic Coronary Disease. The New England Journal of Medicine. 2020. doi: 10.1056/NEJMoa2021372
21. Marquis-Gravel J-CTaG. Low-Dose Colchicine for the Management of Coronary Artery Disease*. JACC. 2021;78:3.
