A Cox-Based Risk Prediction Model for Early Detection of Cardiovascular Disease: Identification of Key Risk Factors for the Development of a 10-Year CVD Risk Prediction

Xiaona Jia; Mirza Mansoor Baig; Farhaan Mirza; Hamid GholamHosseini

2019 Apr 9;2019:8392348. doi: 10.1155/2019/8392348


PMCID: PMC6481149 PMID: 31093375

Abstract

Background and Objective. Current cardiovascular disease (CVD) risk models are typically based on traditional laboratory-based predictors. The objective of this research was to identify key risk factors that affect the CVD risk prediction and to develop a 10-year CVD risk prediction model using the identified risk factors. Methods. A Cox proportional hazard regression method was applied to generate the proposed risk model. We used the dataset from Framingham Original Cohort of 5079 men and women aged 30-62 years, who had no overt symptoms of CVD at the baseline; among the selected cohort 3189 had a CVD event. Results. A 10-year CVD risk model based on multiple risk factors (such as age, sex, body mass index (BMI), hypertension, systolic blood pressure (SBP), cigarettes per day, pulse rate, and diabetes) was developed in which heart rate was identified as one of the novel risk factors. The proposed model achieved a good discrimination and calibration ability with C-index (receiver operating characteristic (ROC)) being 0.71 in the validation dataset. We validated the model via statistical and empirical validation. Conclusion. The proposed CVD risk prediction model is based on standard risk factors, which could help reduce the cost and time required for conducting the clinical/laboratory tests. Healthcare providers, clinicians, and patients can use this tool to see the 10-year risk of CVD for an individual. Heart rate was incorporated as a novel predictor, which extends the predictive ability of the past existing risk equations.

1. Introduction

Cardiovascular disease (CVD) describes various conditions that affect the functioning of the heart and the cardiovascular system [1]. Due to its high morbidity, CVD has become the leading cause of mortality around the world [2–4]. In New Zealand, statistics on CVD mortality in 2017 suggest that CVD accounts for 33% of deaths [4].

The majority of cardiovascular-related deaths are premature and preventable, and outcomes can be improved by effective health management that employs diet plans, lifestyle interventions, and drug interventions [5]. A useful approach to preventing CVD is to assess CVD risk regularly and then introduce lifestyle adjustments or clinical treatments accordingly.

In the past decades, a great deal of research has been done on CVD risk estimation, such as the Framingham risk scores from the Framingham Heart Study (FHS) [6, 7], the QRISK equations [8], the Europe SCORE risk equations [9], the ASSIGN scores from the Scottish Heart Health Extended Cohort (SHHEC) [10], the Prospective Cardiovascular Münster (PROCAM) equations [11], and the CUORE Cohort Study formulas [12]. These CVD risk prediction models have proved effective in health and disease management for clinicians and individuals [13–15]. The new PREDICT CVD risk assessment equation, developed for primary health care in New Zealand, has been integrated into electronic health records (EHRs), and a web-based software tool called PREDICT has been developed to help general practices manage CVD risk in primary care [13]. PREDICT has been used to assess the CVD risk of 400,728 patients and is becoming a useful tool for decision support and health management for general practitioners.

However, challenges remain in the development of CVD risk estimation models. Some CVD risk models [16–18] are based on a single risk factor and therefore cannot capture the simultaneous influence of multiple factors. Risk models [6, 8, 19] that use statistical regression methods [20–22] tend to rely on classic risk factors such as age, smoking, diabetes, sex, high blood pressure, and total cholesterol to estimate the risk score. Studies [18, 19, 23–27] that apply data mining or machine learning techniques to CVD risk estimation cannot provide an absolute risk estimate, although some of these models [18, 26] have tried to incorporate novel predictors. This research aims to identify novel risk factors for CVD detection alongside conventional predictors and then to enhance risk estimation by developing a multivariable risk prediction model that targets 5-year and 10-year CVD events.

2. Methods

2.1. Study Population

The study population was selected from the Framingham Original Cohort study dataset [28, 29]. We obtained ethics approval from NHLBI [30] and the Auckland University of Technology Ethics Committee (AUTEC) (Ref: 17/385 Early Detection and Self-Management of Cardiovascular Disease Using Artificial Intelligence-Based Model). The data from this cohort study include a total of 5079 men and women aged 30-74 years who were free of CVD at baseline, of whom 3189 eventually had CVD events. The distribution of CVD events among males and females in the study population is summarized in Table 1.

Table 1.

CVD event distribution in male and female.

Sex	Count	CVD Events	Age Range
Male	2294	1560	30 - 74
Female	2785	1629	30 - 74
Total	5079	3189	30 - 74

2.2. Data Extraction

There are 32 exams in the Framingham Original Cohort study dataset, as shown in Appendix A. Data collected in the first exam (Exam 1) were chosen to develop the CVD prediction model because this exam has the largest number of subjects (5209). Data from 130 subjects were removed for ethics protection reasons, leaving 5079 subjects. Five further exams, Exams 8 to 12 (see Table 7 of Appendix A), were used to validate the fitted model. Data for the candidate risk factors listed in Table 2 were extracted to create the risk model.

Table 7.

Exams in the Framingham Original Cohort study data set.

Exams	Exam Date Range	Age Range	Mean Age	Attendees
Exam 1	1948 - 1953	28 - 74	44	5209
Exam 2	1950 - 1955	31 - 65	46	4792
Exam 3	1952 - 1956	32 - 67	48	4416
Exam 4	1954 - 1958	34 - 69	50	4541
Exam 5	1956 - 1960	37 - 70	52	4421
Exam 6	1958 - 1963	38 - 72	54	4259
Exam 7	1960 - 1964	40 - 74	55	4191
Exam 8	1962 - 1966	42 - 76	57	4030
Exam 9	1964 - 1968	44 - 78	59	3833
Exam 10	1966 - 1970	46 - 80	61	3595
Exam 11	1968 - 1971	49 - 81	62	2955
Exam 12	1971 - 1974	50 - 83	64	3261
Exam 13	1972 - 1976	53 - 85	66	3133
Exam 14	1975 - 1978	55 - 88	68	2871
Exam 15	1977 - 1979	57 - 89	69	2632
Exam 16	1979 - 1982	59 - 91	70	2351
Exam 17	1981 - 1984	61 - 93	72	2179
Exam 18	1983 - 1985	63 - 94	74	1825
Exam 19	1985 - 1988	65 - 96	75	1541
Exam 20	1986 - 1990	67 - 97	77	1401
Exam 21	1988 - 1992	69 - 99	79	1319
Exam 22	1990 - 1994	72 - 101	80	1166
Exam 23	1992 - 1996	73 - 101	81	1026
Exam 24	1995 - 1998	76 - 103	83	831
Exam 25	1997 - 1999	78 - 104	84	703
Exam 26	1999 - 2001	79 - 103	86	558
Exam 27	2002 - 2003	82 - 104	87	414
Exam 28	2004 - 2005	84 - 104	89	303
Exam 29	2006 - 2007	85 - 102	91	218
Exam 30	2008 - 2010	88 - 102	92	141
Exam 31	2010 - 2011	90 - 99	92	91
Exam 32	2012 - 2014	93 - 106	96	40

Table 2.

Description of candidate predictors.

ORDER	PREDICTOR	UNITS / CODES	TYPE
1	AGE	YEARS	CONTINUOUS
2	SEX	0001 MALE; 0002 FEMALE	CATEGORICAL
3	BMI	KG/M2	CONTINUOUS
4	HYPERTENSION	0000 NEGATIVE; 0001 TRANSIENT; 0002 PERMANENT; 0003 TYPE UNKNOWN; 0008 DOUBTFUL	CATEGORICAL
5	HISTORY OF NERVOUS HEART	0000 NO; 0001 YES, DEFINITE	CATEGORICAL
6	HISTORY OF PERICARDITIS	0000 NO; 0001 YES, DEFINITE	CATEGORICAL
7	HISTORY OF OTHER CVD	0000 NO; 0001 YES, DEFINITE	CATEGORICAL
8	PREMATURE BEATS	0000 NO; 0001 YES, DEFINITE; 0002 YES, DOUBTFUL	CATEGORICAL
9	HISTORY OF ATRIOVENTRICULAR BLOCK	0000 NO; 0001 YES, DEFINITE; 0002 YES, DOUBTFUL	CATEGORICAL
10	HISTORY OF RHEUMATIC FEVER	0000 NONE; 0001 YES; 0008 DOUBTFUL	CATEGORICAL
11	HISTORY OF ALLERGY OR ASTHMA	0000 NEGATIVE; 0001 ALLERGY, ALONE; 0002 BRONCHIAL ASTHMA, ALONE; 0003 ALLERGY AND ASTHMA, TOGETHER	CATEGORICAL
12	HISTORY OF THYROID DISEASE	0000 NEGATIVE; 0001 HYPERTHYROID ONLY; 0002 HYPOTHYROID ONLY	CATEGORICAL
13	HISTORY OF SUBACUTE ENDOCARDITIS	0000 NO; 0001 YES	CATEGORICAL
14	BLOOD PRESSURE SYSTOLIC	MM HG	CONTINUOUS
15	BLOOD PRESSURE DIASTOLIC	MM HG	CONTINUOUS
16	CIGARETTES PER DAY	LAPSE, FORM 8/50	CONTINUOUS
17	CIGARS PER DAY	LAPSE, FORM 8/50	CONTINUOUS
18	PIPES PER DAY	LAPSE, FORM 8/50	CONTINUOUS
19	PULSE RATE	PER MINUTE	CONTINUOUS
20	DIABETES	0000 NO; 0001 YES, DEFINITE	CATEGORICAL

2.3. Statistical Analysis

Cox proportional hazards regression analysis [22], a semiparametric statistical method and one of the most accurate methods available, was selected for developing the proposed risk model. This research aims to develop a prediction model that uses multiple parameters to estimate the probability of an individual developing CVD. There are three main statistical approaches in survival analysis: nonparametric, semiparametric, and parametric [31]. The nonparametric approaches can only perform univariate analysis with a single predictor and are therefore not suitable for the study of continuous variables [22, 32]. Both parametric and semiparametric approaches can perform multiple-parameter analysis. They assume a linear relationship between the predictors and the log hazard rate [33]. However, the Cox proportional hazards model has the advantage that only the rank orderings of the failure and censoring times are used to estimate and test the regression coefficients [22]. The Cox model loses little efficiency relative to parametric models even when the assumptions of the parametric models are met. When the assumptions are not met, Cox regression analysis can still be used efficiently in an extended form [34], whereas a parametric model such as the Weibull survival distribution would no longer be valid.

Statistical analyses were performed in RStudio [35]. Missing values for the candidate risk factors listed in Table 2 were imputed using multiple imputation [36]. Continuous and categorical variables were transformed and imputed using algorithms modified from the Maximum Generalized Variance (MGV) method in the SAS PRINQUAL procedure [37], as provided by the R function transcan in the “Hmisc” package [35].

For the candidate predictors listed in Table 2, variable selection was performed in two steps. The first step was conducted in a “Forward Selection” manner [38]; i.e., a univariate Cox analysis was applied to each candidate variable. Predictors with a p value greater than 0.05 were considered insignificant and filtered out. In the second step, all variables selected in the univariate analysis were entered into a multivariate Cox regression analysis to see how the risk factors jointly affect the incidence rate of CVD. Risk factors with a p value less than 0.05 in this analysis were retained in the final model.

In the validation stage, two approaches were undertaken to assess the predictive ability of our fitted model: statistical validation and empirical validation. The statistical validation was performed with respect to both discrimination and calibration. The empirical validation was defined as an empirical comparison with a general CVD risk prediction model (the Framingham office-based risk equation [6]) from a horizontal and a longitudinal perspective. The horizontal comparison was conducted by comparing with the Framingham prognostic model using data collected from multiple samples at the same time point. The longitudinal comparison was conducted by comparing with the Framingham prognostic model using data collected from specific samples at different time points (follow-up at fixed time intervals) and observing the risk trend for an individual over time.

3. Results

3.1. Derivation of a 10-Year Risk Score for CVD

The risk factors included in the risk model are age, sex, body mass index (BMI), hypertension, systolic blood pressure (SBP), cigarettes per day, pulse rate, and diabetes status. Their characteristics are listed in Table 3, which summarizes the minimum (“Min.”), first quartile (“1st Qu.”), median, mean, third quartile (“3rd Qu.”), and maximum (“Max.”) of each risk factor.

Table 3.

Summary statistics for risk factors used in risk model.

Predictors	Variables	Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
AGE	Age	28	37	44	44.15	51	74
SEX	Sex	1	1	2	1.548	2	2
BMI	Bmi	14.12	22.66	25.17	25.61	27.92	56.68
HYPERTENSION	Hyp	0	0	0	0.147	0	1
BLOOD PRESSURE SYSTOLIC	Bps	84	122	136	138.6	150	270
CIGARETTES PER DAY	Cgrpd	0	5	20	16.26	20	60
PULSE RATE	Pr	37	67	75	75.61	83	170
DIABETES	Dia	0	0	0	0.0197	0	1

The regression coefficients, hazard ratios, and their corresponding lower and upper 95% confidence intervals (CI) were estimated, as presented in Table 4. The baseline hazard at the 10-year time point was estimated as well and is shown in Table 5. The 10-year baseline hazard is 0.1023354 at the mean values of all covariates and 0.001863652 with all covariates equal to zero. Correspondingly, the baseline survival probability, exp(−basehaz), is 0.9027267 at the mean values and 0.9981381 with all covariates equal to zero.

Table 4.

Regression coefficients and hazard ratios in risk model.

Predictors	Variables	coef∗	Hazard Ratio	lower .95	upper .95
AGE	log of age	2.083643	8.033686	6.4082	10.0716
SEX	sex	-0.469719	0.625178	0.5787	0.6754
BMI	log of bmi	0.608864	1.838342	1.4368	2.3521
HYPERTENSION	hyp	0.241461	1.273108	1.1342	1.429
BLOOD PRESSURE SYSTOLIC	log of bps	1.682571	5.37937	3.7938	7.6277
CIGARETTES PER DAY	cgrpd	0.009669	1.009716	1.0065	1.013
PULSE RATE	log of pr	-0.30209	0.739271	0.5879	0.9297
DIABETES	dia	1.087501	2.96685	2.3244	3.7869

∗ Estimated regression coefficient.
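
The relationship between the coefficients and the hazard ratios in Table 4 can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes that "log" denotes the natural logarithm and that each hazard ratio is exp(coef). The dictionary keys and function names are illustrative. For a log-transformed predictor, the tabulated hazard ratio corresponds to an e-fold increase in the raw value, so the second helper converts it to a more interpretable relative change.

```python
import math

# Coefficients copied from Table 4 (keys are illustrative names).
COEF = {
    "log_age": 2.083643,
    "sex": -0.469719,
    "log_bmi": 0.608864,
    "hyp": 0.241461,
    "log_bps": 1.682571,
    "cgrpd": 0.009669,
    "log_pr": -0.302090,
    "dia": 1.087501,
}


def hazard_ratio(coef: float) -> float:
    """Hazard ratio for a one-unit increase in the covariate as it enters the model."""
    return math.exp(coef)


def hazard_ratio_for_relative_change(coef: float, ratio: float) -> float:
    """Hazard ratio when the raw value of a log-transformed covariate is multiplied by `ratio`.

    exp(coef * (log(ratio * x) - log(x))) simplifies to ratio ** coef.
    """
    return ratio ** coef


for name, coef in COEF.items():
    print(f"{name:8s} HR = {hazard_ratio(coef):.6f}")

# Effect of a 10% higher systolic blood pressure, all else equal.
print(hazard_ratio_for_relative_change(COEF["log_bps"], 1.10))
```

Under these assumptions, a 10% higher systolic blood pressure corresponds to a hazard ratio of roughly 1.17, which is easier to interpret clinically than the tabulated value of 5.38 for an e-fold increase.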

Table 5.

Baseline hazard and survival at 10 years.

	Covariates at mean value	Covariates equal to zero
Baseline hazard estimate	0.1023354	0.001863652
Baseline survival estimate	0.9027267	0.9981381
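
The two rows of Table 5 are linked by the standard survival identity S₀(t) = exp(−H₀(t)). The sketch below checks this, assuming the "baseline hazard estimate" values are cumulative baseline hazards at 10 years; the function and constant names are illustrative.

```python
import math


def survival_from_cumulative_hazard(cum_hazard: float) -> float:
    """Survival probability implied by a cumulative hazard: S = exp(-H)."""
    return math.exp(-cum_hazard)


# Values copied from Table 5.
H0_AT_MEAN = 0.1023354     # covariates at their mean values
H0_AT_ZERO = 0.001863652   # all covariates equal to zero

print(survival_from_cumulative_hazard(H0_AT_MEAN))
print(survival_from_cumulative_hazard(H0_AT_ZERO))
```

Both printed values should agree with the baseline survival estimates in Table 5 to the reported precision.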

The Cox model has an exponential form (see Equation (1)), where t represents the time at which the event occurs; λ(t) is the hazard function for a subject at time t, determined by a set of k covariates (X₁, X₂,…, X_k); β₁, β₂,…, β_k are the regression coefficients that measure the effect size of the covariates; exp is the exponential function (exp(X) = e^X); and λ₀(t) is the baseline hazard rate, an arbitrary (unknown) function corresponding to the value of the hazard when all X_i equal zero.

\begin{matrix} λ (t) = λ_{0} (t) \exp (β_{1} X_{1} + β_{2} X_{2} + \dots + β_{k} X_{k}) \end{matrix}

(1)

So, the Cox model can be written as a survival function:

\begin{matrix} S (t) = {[S_{0} (t)]}^{\exp (\sum_{i = 1}^{k} β_{i} X_{i})} \end{matrix}

(2)
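
The step from Equation (1) to Equation (2) can be made explicit. The following short derivation assumes the standard relation between survival and cumulative hazard, S(t) = exp(−Λ(t)); the symbol Λ₀(t), which does not appear elsewhere in this paper, is introduced here to denote the cumulative baseline hazard, i.e., the integral of λ₀ from 0 to t.

\begin{matrix} S (t) = \exp (- \int_{0}^{t} λ (u) du) = \exp (- Λ_{0} (t) \exp (\sum_{i = 1}^{k} β_{i} X_{i})) \\ = {[\exp (- Λ_{0} (t))]}^{\exp (\sum_{i = 1}^{k} β_{i} X_{i})} = {[S_{0} (t)]}^{\exp (\sum_{i = 1}^{k} β_{i} X_{i})} \end{matrix}

The covariates therefore act on the survival function only through the exponent applied to the baseline survival S₀(t).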

A general formula for computing risk estimates has the following form:

\begin{matrix} \hat{H (t)} = 1 - {[S_{0} (t)]}^{\exp (\sum_{i = 1}^{k} β_{i} X_{i} - \sum_{i = 1}^{k} β_{i} {\bar{X}}_{i})} \end{matrix}

(3)

where H(t) is the CVD risk estimated for an individual; S₀(t) is the baseline survival rate at follow-up time t, where t = 10 years (see Table 5); β_i is the regression coefficient (see Table 4); X_i is the value of the i-th risk factor (log-transformed for age, BMI, SBP, and pulse rate, as indicated in Table 4); ${\bar{X}}_{i}$ is the corresponding mean; and k denotes the number of risk factors. The CVD risk function can be derived from (3) using the regression coefficients from Table 4 and the baseline survival at mean covariate values from Table 5, which gives the probability of developing any type of CVD for an individual. An example of computing the absolute 10-year risk score is given in Appendix C.

3.2. Nomograms

A nomogram is a two-dimensional diagram that represents a mathematical function involving several predictors [39]. It is a simple graphical illustration for approximately predicting a particular event based on conventional statistical regression methods such as the Cox proportional hazards model for survival analysis [40]. A nomogram for estimating individual 10-year survival and the median survival time in years is depicted in Figure 1.

Figure 1. Nomogram for predicting overall survival in 10 years.

In Figure 1, each predictor has its own scale, and there is a mapping between each scale and the “Points” scale. The scales at the bottom give the corresponding 10-year survival estimate and the median survival time (years). By summing the points that correspond to a patient's specific configuration of covariates, a clinician can manually obtain the predicted value of the event for that patient.

3.3. Validation

The validation of the proposed predictive risk model was performed using traditional statistics. The C-index (also called the receiver operating characteristic (ROC) area) [41] was used to assess the goodness of the risk model based on a bootstrap internal resampling validation. The statistical validation analysis yielded a C-index (area under the receiver operating characteristic curve [AUROC]) of 0.71, indicating moderately good discrimination.
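
To illustrate what the C-index measures, the sketch below computes a simplified Harrell-style concordance index for right-censored data in plain Python. It is not the authors' procedure: it omits the bootstrap resampling, skips pairs with tied follow-up times, and the function name and toy data are illustrative.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's C for right-censored data.

    A pair (i, j) is comparable when subject i had an event and a strictly
    shorter follow-up time than subject j. The pair is concordant when the
    subject who failed earlier has the higher predicted risk; ties in
    predicted risk count as half. Pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up years, event indicator (1 = CVD event), predicted risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(times, events, risks))
```

In the toy data, five pairs are comparable and four of them are ordered correctly by the predicted risk. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering.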

Then, we performed an empirical validation by comparing our risk model with the Framingham Heart Study model on an external dataset, both horizontally and longitudinally over time. In the horizontal validation process, there were 2786 samples in the external dataset, of which 1693 had a CVD event. Risk scores using the FHS model and the proposed risk model were computed separately. The min (lower whisker), 1st quartile (lower hinge), median, 3rd quartile (upper hinge), and max (extreme of the upper whisker) of the estimated risks for all samples are depicted in Figure 2. The box-whisker graph in Figure 2 shows that the risks assessed by our Cox model are higher than the risks calculated by the Framingham model, with a difference of approximately 0.02 across the summary statistics (min, 1st Qu., median, mean, 3rd Qu., max). For example, the median values of the FHS model and the Cox model are 0.1429475 and 0.1661985, respectively. For subjects with a CVD event, the Cox model is much more accurate than the FHS model, whereas for subjects without CVD, the Cox risk model overestimates the risk rate. Overall, the risk scale of the Cox model is consistent with the Framingham model, which indicates that the proposed Cox model is on par with the FHS model.

Figure 2. Horizontal comparison between Cox model and FHS model.

In the longitudinal validation process, we selected four subjects, two male and two female, with or without CVD at the end of the Framingham Study. These four subjects are summarized in Table 6.

Table 6.

Data summary for samples in the longitudinal validation.

Samples	Gender	CVD	Diabetes
Sample 1	Male	✘	✘
Sample 2	Male	✓	✓
Sample 3	Female	✘	✘
Sample 4	Female	✓	✓

For each sample, data at fixed time intervals (approximately two years) were extracted from the longitudinal follow-up. The data from five exams (Exam 8, Exam 9, Exam 10, Exam 11, and Exam 12) were extracted for comparison. Data summaries for sample 1, sample 2, sample 3, and sample 4 are listed in Appendix B. For each sample, the 10-year risk of developing CVD was computed separately for each of the five selected exams using the Cox model and the Framingham model. The trend of risk over the years, with 5% error, is depicted in Figure 3. This figure shows that the risk trends of the two models are consistent and that the risk for a specific sample increases over time; the dotted trend lines in each graph represent the increase in CVD risk over time. Also, the samples with diabetes that developed CVD (both male and female) have a higher risk than the samples that did not develop CVD.

4. Discussion

It is widely accepted that CVD has become one of the most significant public health issues globally [42, 43] and contributes substantially to annual deaths worldwide. Previous studies have noted the importance of identifying associated risk factors and of the early detection and intervention of CVDs [44–48] and have investigated reducing the risk of developing CVD in its early stages. Consequently, CVD risk prediction tools based on a single variable or multiple variables have been devised to yield estimates of CVD risk [6, 8, 9, 14, 49–51].

Motivated by the objective of early detection and risk estimation of CVD, the present study was designed to identify novel CVD risk factors, determine the effect of these factors, and then develop a risk prediction model based on the identified factors. Although risk factors could vary from one specific CVD component to another, there is sufficient evidence that different types of CVD have commonalities of risk factors. We developed and validated a 10-year risk equation for CVD risk using follow-up data rigorously measured by the Framingham Heart Study.

This investigation extends the set of risk factors used by previous general CVD risk formulations by incorporating heart rate to estimate absolute CVD risk. The approach used in this research is based on advanced statistical techniques that reduce bias in the assessment of true CVD risk. The whole process of data analysis strictly follows the guidelines of regression modelling strategies and survival analysis [34, 52].

We use continuous variables (age, BMI, SBP, and pulse rate) to generate the model that performs better than other similar models developed using categorical variables. Compared with simpler approaches that try to make inferences of 5-year and 10-year risk models such as the model based on logistic regression analysis [53] and the CVD risk model using Kaplan-Meier and log-rank test [46], the proposed Cox risk model is more adequate and will avoid severe errors of underestimation or overestimation [22, 34]. Moreover, this model was developed based on a more substantial number of samples and events, suggesting a valid estimation of the real risk.

4.1. Comparison with Other CVD Risk Prediction Tools

The older Framingham general CVD risk function [53] is useful for identifying persons at high risk of CVD, but it was based on a limited number of risk factors (serum cholesterol, SBP, smoking history, electrocardiogram, and glucose intolerance). The newer Framingham laboratory-test-based formula [6] included HDL cholesterol in the risk function. The QRISK study investigators incorporated family history as a novel risk factor beyond those in the Framingham general formulas [8]. Although researchers have published risk scores [6, 8, 53] for predicting general CVD, these functions did not include heart rate in the risk model.

Risk models formulated using machine learning or data mining techniques have incorporated heart rate as a risk factor, but few of these tools can predict absolute CVD risk. For example, one prediction tool [54] focuses on classifying CVD events by employing an ANN and a Bayesian classifier based on heart rate variability. The diagnostic CVD model [27] categorizes CVD risk into different levels, but an absolute risk score cannot be obtained. A supportive tool [19] does generate an estimated risk score, but the user cannot tell what time horizon the score refers to.

Some equations focus only on specific CVD outcomes. The Europe SCORE project equations were developed for fatal cardiovascular events [9]. Other risk estimation tools [7, 14, 30] cover only coronary heart disease, and some risk models target stroke [16, 55]. Compared with these disease-specific models, the present study generated a general CVD risk tool that could predict global CVD risk as well as the risk of developing individual components.

Moreover, compared with laboratory-based algorithms, the present research proposes a more straightforward way to estimate 10-year CVD risk based on non-laboratory risk factors. Individuals can assess their CVD risk during an office visit or by monitoring the risk factors in the model themselves, either manually or with devices such as wearable sensors.

4.2. Implication

The CVD risk prediction model could be implemented in primary care for population analysis and for identifying high-risk individuals. This would be a transformation in the healthcare management of CVD at both the individual and the population level. However, because the number of diabetes events is small, caution must be applied when using this risk model in practice. Even though we used multiple imputation to impute the missing values for diabetes, the original data are imbalanced, so the imputed data for “diabetes” may still be imbalanced. Advanced imputation methods need to be considered in the future to avoid unexpected outcomes caused by the diabetes data imbalance.

Our research aims to provide a CVD prediction model based on key risk factors so that it can be used at the point of care for better-informed decision making. Thus, risk factors that require a clinical test, such as total cholesterol and HDL cholesterol, were not included, even though some of these risk factors have a substantial effect on the development of CVD. We have provided a valid framework for creating a risk model using the Cox regression model; future work should consider risk factors not currently included in our model. Expanding the set of predictors in the risk model is therefore an important issue for future research.

5. Conclusion

The proposed study devised a risk prediction model based on multivariable predictors. A novel risk factor, “heart rate”, was incorporated into this risk equation alongside conventional risk factors. A satisfactory predictive ability, with a C-index (AUROC) of 0.71, was obtained, which supports the accuracy of the estimated risk scores. In contrast to studies focusing on specific diseases, the proposed algorithm can be applied to measure the 10-year risk of general CVD. Health care professionals, public health physicians, practice managers, and individuals can run the proposed model to quantify risk at a population level or during patient consultation and to identify high-risk individuals across an entire practice for further preventive health care.

Appendix

A. Exams in the Framingham Original Cohort Study Dataset

See Table 7.

B. Data Summary for Samples

See Tables 8–11.

Table 8.

Exam data for Sample 1: male without CVD.

Exams	age	bmi	bps	pr	cgrpd	smk
Exam 8	44	26.386894	120	82	40	1
Exam 9	45	26.826676	120	80	0	0
Exam 10	47	27.467643	118	70	20	1
Exam 11	49	28.222249	110	76	44	1
Exam 12	52	28.675012	110	80	50	1

Table 9.

Exam data for Sample 2: male with CVD and diabetes.

Exams	age	bmi	bps	pr	cgrpd	hyp	dia	smk
Exam 8	45	27.74258	132	83	20	0	0	1
Exam 9	47	26.26118	124	80	20	0	0	1
Exam 10	49	27.664352	130	78	20	1	0	1
Exam 11	51	27.121914	130	90	20	1	0	1
Exam 12	53	24.816551	122	82	20	0	1	1

Table 10.

Exam data for Sample 3: female without CVD.

Exams	age	bmi	bps	pr	cgrpd	smk
Exam 8	44	20.776333	110	70	20	1
Exam 9	46	20.265439	120	70	20	1
Exam 10	48	22.312012	118	73	20	1
Exam 11	50	21.797119	114	82	20	1
Exam 12	52	21.797119	130	76	20	1

Table 11.

Exam data for Sample 4: female with CVD and diabetes.

Exams	age	bmi	bps	pr	cgrpd	trt	hyp	dia	smk
Exam 8	46	21.793044	130	65	3	0	1	0	1
Exam 9	48	21.967388	170	75	16	0	1	0	1
Exam 10	50	22.494583	140	60	8	0	1	0	1
Exam 11	53	22.31746	140	63	8	0	1	0	1
Exam 12	54	23.380197	160	58	2	1	1	1	1

C. Computation of Absolute Risk

Here, we take a specific subject to illustrate the process of risk score calculation. The subject is a 44-year-old man without diabetes or hypertension. He has a systolic blood pressure of 120 mm Hg, a pulse rate of 82 per minute, and a BMI of 26.38689413 kg/m², and he is a current smoker who smokes 40 cigarettes per day, as shown in Table 12.

Table 12.

Data summary for the subject 15018644.

PREDICTORS	VALUES	UNITS
AGE	44	YEARS
SEX	1	MALE
BMI	26.38689413	KG/M2
HYPERTENSION	0	NO
TREATMENT OF HYPERTENSION	0	NO
BLOOD PRESSURE SYSTOLIC	120	MM HG
CIGARETTES PER DAY	40	LAPSE
SMOKING	1	YES
PULSE RATE	82	PER MINUTE
DIABETES	0	NO
COX MODEL RISK	12.57%
FHS MODEL RISK	11.86%

The risk estimate based on the Cox model is calculated as follows:

\begin{matrix} \sum_{i = 1}^{k} β_{i} X_{i} = 2.083643 * \log (44) - 0.469719 * 1 \\ + 0.608864 * \log (26.386894) + 0.241461 * 0 \\ + 1.682571 * \log (120) - 0.302090 * \log (82) \\ + 0.009669 * 40 + 1.087501 * 0 \\ = 16.518741 \end{matrix}

(C.1)

\begin{matrix} \sum_{i = 1}^{k} β_{i} {\bar{X}}_{i} = 2.083643 * 3.768 - 0.469719 * 1.548 \\ + 0.608864 * 3.230 + 0.241461 * 0.1469 \\ + 1.682571 * 4.913 - 0.302090 * 4.311 \\ + 0.009669 * 13.96 + 1.087501 * 0.02001 \\ = 16.247045 \end{matrix}

(C.2)

\begin{matrix} \hat{H (t)} = 1 - {[S_{0} (t)]}^{\exp (\sum_{i = 1}^{k} β_{i} X_{i} - \sum_{i = 1}^{k} β_{i} {\bar{X}}_{i})} \\ = 1 - {0.9027267}^{\exp (16.518741 - 16.247045)} \\ = 0.125658 \approx 12.57 \% \end{matrix}

(C.3)
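
The arithmetic in Equations (C.1) to (C.3) can be reproduced with a few lines of Python. This is a minimal sketch rather than the authors' implementation: it assumes natural logarithms, takes the coefficients from Table 4, the covariate means from Equation (C.2), and the baseline survival at mean covariate values from Table 5. All function names, parameter names, and dictionary keys are illustrative.

```python
import math

# Regression coefficients from Table 4.
COEF = {
    "log_age": 2.083643, "sex": -0.469719, "log_bmi": 0.608864, "hyp": 0.241461,
    "log_bps": 1.682571, "log_pr": -0.302090, "cgrpd": 0.009669, "dia": 1.087501,
}

# Covariate means as they appear in Equation (C.2).
MEAN = {
    "log_age": 3.768, "sex": 1.548, "log_bmi": 3.230, "hyp": 0.1469,
    "log_bps": 4.913, "log_pr": 4.311, "cgrpd": 13.96, "dia": 0.02001,
}

# 10-year baseline survival at mean covariate values (Table 5).
S0_10Y = 0.9027267


def linear_predictor(x):
    """Sum of coefficient * covariate value over all risk factors."""
    return sum(COEF[name] * x[name] for name in COEF)


def ten_year_risk(age, sex, bmi, hyp, bps, pr, cgrpd, dia):
    """10-year CVD risk following Equation (3); sex is coded 1 = male, 2 = female."""
    x = {
        "log_age": math.log(age), "sex": sex, "log_bmi": math.log(bmi), "hyp": hyp,
        "log_bps": math.log(bps), "log_pr": math.log(pr), "cgrpd": cgrpd, "dia": dia,
    }
    return 1.0 - S0_10Y ** math.exp(linear_predictor(x) - linear_predictor(MEAN))


# Subject from Table 12.
risk = ten_year_risk(age=44, sex=1, bmi=26.38689413, hyp=0, bps=120, pr=82, cgrpd=40, dia=0)
print(f"{risk:.2%}")
```

Under these assumptions the script should agree with the Cox model risk reported in Table 12 up to rounding. Calling `ten_year_risk` with other covariate values gives the corresponding risk for a different individual.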

Data Availability

The cardiovascular disease (CVD) data used to support the findings of this study were supplied by the Framingham Heart Study-Cohort (FHS-Cohort) under license and so cannot be made freely available. Requests for access to these data should be made to the Open BioLINCC Studies Group through https://biolincc.nhlbi.nih.gov/studies/framcohort/.

Additional Points

The main contribution of the present study is the development of a risk prediction model for early detection of CVD. More specifically, the contribution can be summarized in four major respects: firstly, a novel risk factor, “heart rate”, was identified as significant for the development of CVD; secondly, a CVD risk prediction model aiming at early detection of CVD was developed based on various risk factors; thirdly, an absolute 10-year risk score for CVD can be calculated using this risk model; lastly, multiple forms of the CVD risk estimation, namely a risk equation and a nomogram, were developed.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors' Contributions

All authors contributed equally.

