Author manuscript; available in PMC: 2022 Oct 1.
Published in final edited form as: J Stroke Cerebrovasc Dis. 2021 Jul 28;30(10):106003. doi: 10.1016/j.jstrokecerebrovasdis.2021.106003

A NOVEL AFROCENTRIC STROKE RISK ASSESSMENT SCORE: MODELS FROM THE SIREN STUDY

Onoja Akpa 1,2,*, Fred S Sarfo 3,*, Mayowa Owolabi 4,5,**, Albert Akpalu 6, Kolawole Wahab 7, Reginald Obiako 8, Morenikeji Komolafe 9, Lukman Owolabi 10, Godwin O Osaigbovo 11, Godwin Ogbole 12, Hemant K Tiwari 13, Carolyn Jenkins 14, Adekunle G Fakunle 15, Samuel Olowookere 16, Ezinne O Uvere 15, Joshua Akinyemi 1, Oyedunni Arulogun 15, Josephine Akpalu 3, Moyinoluwalogo M Tito-Ilori 15, Osahon J Asowata 1,2, Philip Ibinaiye 8, Cynthia Akisanya 17, Olalekan I Oyinloye 7, Lambert Appiah 3, Taofik Sunmonu 18, Paul Olowoyo 19, Atinuke M Agunloye 15,20, Abiodun M Adeoye 15,20, Joseph Yaria 20, Daniel T Lackland 14, Donna Arnett 21, Ruth Y Laryea 6, Taiwo O Adigun 15, Akinkunmi P Okekunle 1,2, Benedict Calys-Tagoe 22, Okechukwu S Ogah 20, Mayowa Ogunronbi 16, Olugbo Y Obiabo 23, Suleiman Y Isah 10, Hamisu A Dambatta 10, Raelle Tagge 24, Obande Ogenyi 20, Bimbo Fawale 9, Chimdinma L Melikam 8, Akinola Onasanya 16, Sunday Adeniyi 7, Rufus Akinyemi 16,5, Bruce Ovbiagele 24, SIREN
PMCID: PMC8511059  NIHMSID: NIHMS1730506  PMID: 34332227

Abstract

Background:

Stroke risk can be quantified using risk factors whose effect sizes vary by geography and race. No stroke risk assessment tool exists to estimate aggregate stroke risk for indigenous Africans.

Objective:

To develop Afrocentric risk-scoring models for stroke occurrence.

Methods:

We evaluated 3,533 radiologically confirmed West African stroke cases paired 1:1 with age- and sex-matched stroke-free controls in the SIREN study. The 7,066 subjects were randomly split into training and testing sets at a ratio of 85:15. Conditional logistic regression models were constructed on the training set by including 17 putative factors linked to stroke occurrence. Significant risk factors were assigned constant and standardized statistical weights based on regression coefficients (β) to develop an additive risk scoring system on a scale of 0 to 100%. Using the testing set, Receiver Operating Characteristic (ROC) curves were constructed to obtain a total score to serve as the cut-off for discriminating between cases and controls. We calculated sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) at this cut-off.

Results:

For stroke occurrence, we identified 15 traditional vascular factors. Cohen’s kappa for validity was maximal at a total risk score of 56% using both statistical weighting approaches to risk quantification and in both datasets. The risk score had a predictive accuracy of 76% (95%CI: 74 – 79%), sensitivity of 80.3%, specificity of 63.0%, PPV of 68.5% and NPV of 76.2% in the test dataset. For ischemic strokes, 12 risk factors had predictive accuracy of 78% (95%CI: 74–81%). For hemorrhagic strokes, 7 factors had a predictive accuracy of 79% (95%CI: 73–84%).

Conclusion:

The SIREN models quantify aggregate stroke risk in indigenous West Africans with good accuracy. Prospective studies are needed to validate this instrument for stroke prevention.

Keywords: Stroke, risk assessment score, riskometer, risk factor, Stroke Investigative Research and Education Network (SIREN), Africans

1.0. INTRODUCTION

The occurrence of stroke is driven by a combination of socio-demographic and cardio-metabolic factors whose contributions vary by geographical region and by race. Typically, clustering of risk factors in an individual is directly related to the probability of developing a stroke. There are several cardiovascular and stroke risk prediction models, including the Framingham score (USA),1 the Prospective Cardiovascular Munster (PROCAM) study score (Germany),2 the Global Vascular Risk Score (GVRS) from the Northern Manhattan Study (USA),3 the Reynolds risk score (USA),4 the China Multicenter Collaborative Study of Cardiovascular Epidemiology (MUCA) score (China),5 the ASSIGN score from Scotland,6 CUORE from Italy,7 QRISK from England and Wales,8 SCORE from Europe9 and a score derived from Korean national health examination data.10 All these risk calculators were developed from prospectively collected data, using regression coefficients and relative risks for significant risk factors in Cox proportional hazards models.

Conspicuously missing from the literature is a stroke prediction model for indigenous Africans.11 Stroke in sub-Saharan Africa has several distinctive features, including earlier age of onset, a proclivity towards severe strokes and high short-term mortality.12 The shortage of neurologists and the paucity of acute stroke, rehabilitation and social services mean that a strategy to detect those at risk in the population for primary prevention is an urgent operational research priority. However, the perennial lack of resources to conduct prospective population-based studies in Africa means that stroke prediction models must first be developed from case-control studies and later validated in prospective cohorts. With this in mind, we sought to develop risk-scoring models for quantifying an individual’s aggregate risk of having a stroke and its two primary types.

2.0. METHODS

2.1. Study Design

The Stroke Investigative Research and Educational Network (SIREN) study is a case-control study involving 15 centers in Ghana and Nigeria.13 Ethical approval was obtained from all study sites and informed consent was obtained from all participants. Stroke cases were adults ≥18 years with first clinical stroke, with neuroimaging confirmation using CT or MRI scan within 10 days of symptom onset. Controls were consenting stroke-free adults who were recruited mostly from the communities in the catchment areas of the SIREN hospitals where stroke cases resided. Stroke-free status was ascertained using a locally pre-validated 8-item questionnaire for verifying stroke-free status (QVSFS).14 We matched stroke-free controls by age (±5 years), sex and ethnicity to minimize the potential confounding effect of these variables on the relationship between stroke and its risk factors.

2.2. Selection of vascular risk factors

The dataset on the 7,066 matched case-control subjects was randomly split into a training set (6,040) and a testing set (1,066) at a ratio of 85:15.15 The training set was used to identify putative stroke risk factors and to develop the risk quantification model, while the testing set was used to evaluate the performance of the method.
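The split described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that matched pairs were kept together when splitting (the equal case and control counts in Table 3 suggest this), and the pair count of 3,553 is inferred from the group sizes in Table 3 (3,020 training and 533 test subjects per group). The function name and the seed are hypothetical.

```python
import random


def split_pairs(pair_ids, train_fraction=0.85, seed=0):
    """Randomly split matched-pair identifiers into training and test sets.

    Splitting on pair identifiers (an assumption) keeps each case together
    with its matched control, so both sets stay balanced.
    """
    ids = list(pair_ids)
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# Hypothetical pair identifiers 0..3552
train_ids, test_ids = split_pairs(range(3553))
print(len(train_ids), len(test_ids))
```

With these inputs the sketch returns 3,020 training pairs and 533 test pairs, the group sizes shown in Table 3.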

Seventeen (17) potential risk factors were initially tested for associations with stroke occurrence using the training dataset (Table 1). The procedures used to measure these risk factors, and their definitions in our study, have been described previously and are briefly presented in the Supplementary information.13 These risk factors were derived from an extensive literature review, our clinical understanding of stroke risk, and empirical evidence (significant associations observed in bivariate analyses). Bivariate analyses on matched case-control pairs were conducted on the training dataset following the methods described by Greenberg and Ibrahim.16 A conditional multiple logistic regression analysis was then performed to assess which of the 17 putative factors were independently associated with stroke occurrence in the full model. A final model was then constructed that included the risk factors significant at an α level of 0.05 (Table 1).

Table 1:

Conditional multiple logistic regression analysis for All stroke types using training data

Factors aOR 95%CI p-Value
Variables Reported in the Literature
Baseline age ≥50 y 2.27 (1.30–3.96) 0.004
Income level, <$100 USD 1.61 (1.32–1.98) <0.001
Education (none vs some) 1.38 (1.04–1.82) 0.02
Hypertension 15.08 (10.56–21.53) <0.001
Dyslipidemia 3.33 (2.63–4.21) <0.001
Diabetes mellitus 2.82 (2.25–3.54) <0.001
Cardiac disease 1.74 (1.21–2.52) 0.002
Family history of CVD 1.50 (1.20–1.87) <0.001
Raised waist-hip ratio 1.71 (1.35–2.16) <0.001
Current tobacco use 2.02 (0.98–4.14) 0.05
Current alcohol use 0.97 (0.77–1.22) 0.77
Stress 1.44 (1.12–1.84) 0.003
sprinkling salt on food at table 1.68 (1.13–2.50) 0.01
Low consumption of leafy green vegetable 2.67 (2.10–3.39) <0.001
Regular sugar consumption 1.24 (1.01–1.53) 0.03
Meat consumption 1.75 (1.36–2.26) <0.001
Physical inactivity 1.68 (1.02–2.76) 0.04
Final Model
Baseline age ≥50 y 2.18 (1.26–3.79) 0.005
Income level, <$100 USD 1.59 (1.30–1.95) <0.001
Education (none vs some) 1.37 (1.04–1.82) <0.001
Hypertension 15.47 (10.84–22.07) <0.001
Dyslipidemia 3.37 (2.67–4.26) <0.001
Diabetes mellitus 2.79 (2.22–3.50) <0.001
Cardiac disease 1.70 (1.18–2.45) 0.004
Family history of CVD 1.50 (1.20–1.86) <0.001
Raised waist-hip ratio 1.70 (1.34–2.15) <0.001
Stress 1.45 (1.13–1.85) 0.003
sprinkling salt on food at table 1.71 (1.15–2.54) 0.008
Low consumption of leafy green vegetable 2.66 (2.10–3.38) <0.001
Regular sugar consumption 1.25 (1.01–1.53) 0.03
Meat consumption 1.78 (1.38–2.29) <0.001
Physical inactivity 1.71 (1.04–2.81) 0.03

aOR- Adjusted Odds Ratio; CI- Confidence Interval
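The adjusted odds ratios in Table 1 relate to the regression coefficients used later for weighting through β = ln(aOR). The sketch below illustrates this for hypertension in the final model. It assumes the reported interval is a Wald interval that is symmetric on the log scale, which the paper does not state; the helper name is hypothetical.

```python
import math


def log_odds(aor, ci_low, ci_high, z=1.96):
    """Recover beta and its approximate standard error from an odds ratio.

    Assumes a Wald 95% interval, i.e. ln(aOR) +/- z * SE on the log scale.
    """
    beta = math.log(aor)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


# Hypertension, final model in Table 1: aOR 15.47 (10.84-22.07)
beta, se = log_odds(15.47, 10.84, 22.07)
print(round(beta, 2), round(se, 2))
```

The recovered coefficient of about 2.74 agrees with the β listed for hypertension in Table 2. Small differences for other factors are expected because the published odds ratios are rounded.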

2.3. Development of a scoring system for stroke risk factors

The significant risk factors in the final model were then given statistical weights derived for each factor by a linear transformation of the regression coefficient for that factor in the final conditional logistic regression model. This linear transformation was essential to enhance interpretability and make the scoring system easy to use. We compared two approaches to assigning statistical weights. The first was a constant weighting (βc), obtained by multiplying β by 10 and rounding to a whole number. The second was a standardized weighting, obtained from the standardized coefficient βw = β(Sx/Sy), where βw = standardized beta coefficient, β = estimated coefficient, Sx = standard deviation of the predictive variable and Sy = standard deviation of the response variable.
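A minimal sketch of the two weighting approaches is given below. The β values are those listed for four factors in Table 2. The function names are hypothetical, and rounding to the nearest integer is an assumption that is consistent with the constant weights shown in Table 2.

```python
def constant_weight(beta):
    """Constant weighting: multiply beta by 10 and round to an integer."""
    return round(10 * beta)


def standardized_beta(beta, sd_x, sd_y):
    """Standardized coefficient: beta scaled by SD(predictor) / SD(response)."""
    return beta * sd_x / sd_y


# Regression coefficients for four factors, taken from Table 2
betas = {
    "Hypertension": 2.74,
    "Dyslipidemia": 1.22,
    "Diabetes mellitus": 1.03,
    "Stress": 0.37,
}

weights = {name: constant_weight(b) for name, b in betas.items()}
print(weights)
```

The resulting constant weights (27, 12, 10 and 4) match the corresponding column of Table 2. The standard deviations needed for `standardized_beta` are not reported in the paper, so no numerical example is attempted for that approach.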

Each stroke case and stroke-free control in the training dataset was then scored individually using the risk scoring system developed. To enhance comparability and interpretability, scores were transformed to a scale of 0–100.17 To calculate the class interval for the scores, we applied the Sturges rule.18,19 Thus, for both weighting approaches, the interval obtained was 7.3 ≈ 7. The predictive accuracies, sensitivities, specificities and Cohen’s kappa of the risk scoring system at various cut-offs of the aggregate score were calculated.21
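The class interval can be approximated with Sturges' rule, which sets the number of classes to 1 + log2(n) and divides the score range by that number. The sketch below is an illustration under stated assumptions: the paper does not give the inputs it used, so n = 6,040 (the training set size) and a score range of 100 are assumed here, and the function names are hypothetical.

```python
import math


def sturges_class_count(n):
    """Number of classes suggested by Sturges' rule for n observations."""
    return 1 + math.log2(n)


def class_width(score_range, n):
    """Width of each class when the score range is divided evenly."""
    return score_range / sturges_class_count(n)


# Assumed inputs: 0-100 score scale, 6,040 training subjects
width = class_width(100, 6040)
print(round(width, 1))
```

Under these assumptions the width comes out slightly above 7, which rounds to the same class interval of 7 used in the tables.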

To evaluate the performance of the scoring system, each of the 1,066 stroke cases and stroke-free controls in the test dataset was also scored individually using the risk scoring system (developed from the training data). The intervals obtained from the training set were applied to the scores obtained from the test set. The predictive accuracies, sensitivities, specificities and Cohen’s kappa of the risk scoring system were calculated for the same cut-offs used for the training set. The overall predictive accuracy was calculated as an equivalent of the Wilcoxon statistic, as described by Hanley and McNeil.22 The best cut-off for the total score was obtained by plotting Receiver Operating Characteristic (ROC) curves, and the sensitivity, specificity, positive predictive value and negative predictive value were estimated at the cut-off points and presented for the training and test sets. Similar approaches were used for ischemic strokes and hemorrhagic strokes. All data analyses were performed using the R software (version 3.6.2) and IBM SPSS Statistics version
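The equivalence between the area under the ROC curve and the Wilcoxon statistic can be illustrated with a short sketch: the AUC is the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. The scores below are hypothetical and serve only to show the calculation; this is not the authors' R or SPSS code.

```python
def auc_wilcoxon(case_scores, control_scores):
    """AUC as the Wilcoxon (Mann-Whitney) statistic.

    Counts, over all case-control pairs, how often the case scores higher
    than the control; ties contribute one half.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical total risk scores on the 0-100 scale
cases = [70, 63, 56, 42]
controls = [56, 49, 35, 28]
auc = auc_wilcoxon(cases, controls)
print(auc)
```

The pairwise loop is quadratic in the number of subjects, which is acceptable for a sketch. A rank-based formula gives the same value more efficiently on large datasets.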

3.0. RESULTS

3.1. Risk scoring models for stroke occurrence

Fifteen (15) of the 17 putative factors associated with stroke occurrence were included in our final model using the training dataset (Table 1). These were baseline age ≥50 years, income level <$100, lack of formal education, hypertension, dyslipidemia, diabetes mellitus, cardiac disease, family history of cardiovascular disease, raised waist-hip ratio, stress, sprinkling salt on food at table, low consumption of green leafy vegetables, regular sugar consumption, regular meat consumption and physical inactivity. Table 2 shows the statistical weights assigned to each of the significant risk factors after linear transformation using either constant weighting or standardized weighting.

Table 2:

A scoring system for all stroke types using training data

Factors from the Final Model β Constant weighting (10*β) βw Standardized weighting (10*βw)
Baseline age ≥50 y 0.78 8 0.68 7
Income level, <$100 USD 0.47 5 0.46 5
Education (none vs some) 0.32 3 0.25 3
Hypertension 2.74 27 2.31 23
Dyslipidemia 1.22 12 1.12 11
Diabetes mellitus 1.03 10 0.86 9
Cardiac disease 0.53 5 0.29 3
Family history of CVD 0.40 4 0.38 4
Raised waist-hip ratio 0.53 5 0.45 4
Stress 0.37 4 0.28 3
sprinkling salt on food at table 0.54 5 0.29 3
Low consumption of leafy green vegetable 0.98 10 0.81 8
Regular sugar consumption 0.22 2 0.21 2
Meat consumption 0.58 6 0.43 4
Physical inactivity 0.54 5 0.21 2

β - Regression coefficients; βc – Constant-weighted Regression Coefficient; βw – Standardized-weighted Regression Coefficient

The range of possible total raw scores for an individual by our risk scoring system was 0 to 104 or 0 to 86 using the constant weighting or the standardized weighting approach, respectively (and 0 to 100 for the transformed score). It should be noted that all predictive variables were dichotomized as either present or absent for each subject. Table 3 shows the classification of stroke cases and controls by total risk score category, using a class interval of 7 for both weighting approaches.
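Because every factor is dichotomized, an individual's raw score is simply the sum of the weights of the factors that are present. The sketch below uses the constant weights from Table 2 and a hypothetical individual. The transformation to the 0 to 100 scale is not reproduced because its exact form is not spelled out in the text, and the function name is hypothetical.

```python
# Constant weights (10*beta) as listed in Table 2
CONSTANT_WEIGHTS = {
    "Baseline age >=50 y": 8,
    "Income level, <$100 USD": 5,
    "Education (none vs some)": 3,
    "Hypertension": 27,
    "Dyslipidemia": 12,
    "Diabetes mellitus": 10,
    "Cardiac disease": 5,
    "Family history of CVD": 4,
    "Raised waist-hip ratio": 5,
    "Stress": 4,
    "Sprinkling salt on food at table": 5,
    "Low consumption of leafy green vegetable": 10,
    "Regular sugar consumption": 2,
    "Meat consumption": 6,
    "Physical inactivity": 5,
}


def raw_score(present_factors):
    """Sum the weights of the risk factors present in one individual."""
    return sum(CONSTANT_WEIGHTS[factor] for factor in present_factors)


# Hypothetical individual: aged 50 or older, hypertensive and diabetic
example = ["Baseline age >=50 y", "Hypertension", "Diabetes mellitus"]
print(raw_score(example))
```

For this hypothetical individual the raw score is 8 + 27 + 10 = 45.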

Table 3:

Classification of study subjects by total risk score for all stroke types in Training and Test Datasets

Score Range Training Control (N=3020) Training Cases (N=3020) Test Control (N=533) Test Cases (N=533)
Constant weighting [10*β]
0 – 6 13 7 1 0
7 – 13 107 23 13 1
14 – 20 222 19 19 11
21 – 27 304 21 36 2
28 – 34 315 46 49 11
35 – 41 301 64 54 12
42 – 48 361 144 70 22
49 – 55 414 307 68 43
56 – 62 429 585 92 97
63 – 69 360 882 45 100
70 – 76 134 547 43 70
77 – 83 53 301 38 88
84 – 90 7 65 5 52
91 – 97 0 8 0 21
98 – 100 0 1 0 3
Standardized weighting [10*βw]
Score Range Training Control (N=3020) Training Cases (N=3020) Test Control (N=533) Test Cases (N=533)
0 – 6 13 7 4 0
7 – 13 90 22 10 1
14 – 20 201 17 32 12
21 – 27 332 19 29 1
28 – 34 272 36 49 13
35 – 41 294 63 50 9
42 – 48 333 121 70 25
49 – 55 397 237 92 44
56 – 62 419 506 65 92
63 – 69 361 719 48 92
70 – 76 196 647 49 90
77 – 83 88 459 27 73
84 – 90 22 135 6 52
91 – 97 2 30 2 27
98 – 100 0 2 0 2

βc – Constant-weighted Regression Coefficient; βw – Standardized-weighted Regression Coefficient

The validity characteristics of the risk scoring system at various cut-offs are shown for both the training and test sets in Table 4. The Cohen’s kappa for validity was maximal at a total risk score of 56% for both the constant and the standardized weighting methods. At this cut-off, the scoring system yielded a sensitivity score of 80.9% (specificity =58.2%) for the constant weighting, while for the standardized weighting, sensitivity was 80.3% (specificity =63.0%) for the test dataset. The ROC curve for the scoring system using the two weighting approaches produced the same AUC of 0.76, p<0.001 for the test dataset (Table 4 and Figure 1).

Table 4:

Performance of the risk score for all stroke types using training and test dataset

Cut-off for the risk score Training Dataset: Se% Sp% PPV% NPV% K Test Dataset: Se% Sp% PPV% NPV% K
Constant weighting [10*β]
≥ 7 99.8 0.4 50.0 65.0 0.002 100.0 0.2 50.0 100 0.002
≥ 14 99.0 4.0 50.8 80.0 0.030 99.8 2.6 50.6 93.3 0.024
≥ 21 98.4 11.3 52.6 87.5 0.097 97.7 6.2 51.0 73.3 0.039
≥ 28 97.7 21.4 55.4 90.2 0.191 97.4 12.9 52.8 83.1 0.103
≥ 35 96.2 31.8 58.5 89.2 0.280 95.3 22.1 55.0 82.5 0.174
≥ 42 94.0 41.8 61.8 87.5 0.358 93.1 32.3 57.9 82.3 0.253
≥ 49 89.3 53.7 65.9 83.4 0.430 88.9 45.4 62.0 80.4 0.343
≥ 56 79.1 67.5 70.8 76.3 0.466 80.9 58.2 65.9 75.2 0.390
≥63 59.7 81.7 76.5 67.0 0.414 62.7 75.4 71.8 66.9 0.381
≥ 70 30.5 93.6 82.6 57.4 0.241 43.9 83.9 73.1 59.9 0.278
≥ 77 12.4 98.0 86.2 52.8 0.104 30.8 91.9 79.2 57.0 0.227
≥ 84 2.5 99.8 91.4 50.6 0.022 14.3 99.1 93.8 53.6 0.133
≥ 91 0.3 100 100 50.1 0.003 4.5 100 100 51.2 0.045
≥98 0.0 100 100 50.0 0.000 0.6 100 100 50.1 0.006
AUC (95%CI): 0.80(0.79 –0.81); p<0.001 AUC (95%CI): 0.76(0.73– 0.79); p<0.001
Standardized weighting [10*βw]
Cut-off for the risk score Training Dataset: Se% Sp% PPV% NPV% K Test Dataset: Se% Sp% PPV% NPV% K
≥ 7 99.8 0.4 50.0 65.0 0.002 100 0.8 50.2 100 0.008
≥ 14 99.0 3.4 50.6 78.0 0.025 99.8 2.6 50.6 93.3 0.024
≥ 21 98.5 10.1 52.3 86.9 0.085 97.6 8.6 51.6 78.0 0.062
≥ 28 97.8 21.1 55.3 90.7 0.189 97.4 14.1 53.1 84.3 0.114
≥ 35 96.7 30.1 58.0 90.0 0.267 94.9 23.3 55.3 82.1 0.182
≥ 42 94.6 39.8 61.1 88.0 0.344 93.2 32.6 58.1 82.9 0.259
≥ 49 90.6 50.8 64.8 84.3 0.414 88.6 45.8 62.0 80.0 0.343
≥ 56 82.7 64.0 69.7 78.7 0.467 80.3 63.0 68.5 76.2 0.433
≥63 66.0 77.8 74.9 69.6 0.438 63.0 75.2 71.8 67.1 0.383
≥ 70 42.2 89.8 80.5 60.8 0.320 45.8 84.2 74.4 60.8 0.300
≥ 77 20.7 96.3 84.8 54.8 0.170 28.9 93.4 81.5 56.8 0.223
≥ 84 5.5 99.2 87.4 51.2 0.047 15.2 98.5 91.0 53.7 0.137
≥ 91 1.1 99.9 94.1 50.2 0.010 5.4 99.6 93.5 51.3 0.051
≥98 0.1 100 100 50.0 0.001 0.4 100 100 50.1 0.004
AUC (95%CI): 0.80(0.79 – 0.81); p<0.001 AUC (95%CI): 0.76(0.74–0.79); p<0.001

Se-Sensitivity; Sp-Specificity; PPV-Positive Predictive Value; NPV-Negative Predictive Value; K-Cohen’s Kappa; AUC-Area Under the ROC; CI-Confidence Interval.
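The validity measures in Table 4 can be reproduced from the counts in Table 3. The sketch below does this for the standardized weighting in the training dataset at a cut-off of ≥56: summing the Table 3 rows from 56 upwards gives 2,498 cases and 1,088 controls at or above the cut-off, leaving 522 cases and 1,932 controls below it. The function name is hypothetical.

```python
def validity(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV, NPV and Cohen's kappa from a 2x2 table."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement from the row and column totals
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "se": tp / (tp + fn),
        "sp": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Training set, standardized weighting, cut-off >= 56 (counts summed from Table 3)
result = validity(tp=2498, fn=522, fp=1088, tn=1932)
print({key: round(value, 3) for key, value in result.items()})
```

The output corresponds to the sensitivity of 82.7%, specificity of 64.0%, PPV of 69.7%, NPV of 78.7% and kappa of 0.467 reported for that row. Because cases and controls are equal in number, the chance agreement is exactly 0.5 and kappa reduces to sensitivity + specificity − 1, so the cut-off that maximizes kappa is also the one that maximizes Youden's index.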

Fig. 1.

ROC Curves for training and testing dataset for all strokes using constant and standardised weighting.

3.2. Risk score model for occurrence of ischemic strokes

Among 2,203 ischemic strokes and matching controls, 12 risk factors were included in the final model using the training set (Supplementary Table 1). The statistical weights and classification of stroke cases and controls using the scoring systems are shown in Tables S2 and S3. The validity characteristics of the risk scoring system at various cut-offs are presented for the training and test datasets in Table 5. The Cohen’s kappa was maximal at a total risk score of 56% for both the constant and the standardized weighting methods, with AUCs of 0.78, p<0.001 (Figure 2) for the test dataset.

Table 5:

Performance of the risk score for Ischemic stroke types using training and test dataset

Cut-off for the risk score Training Dataset: Se% Sp% PPV% NPV% K Test Dataset: Se% Sp% PPV% NPV% K
Constant weighting [10*β]
≥ 8 99.7 1.5 50.3 82.4 0.012 100 0.9 50.2 100 0.009
≥ 16 99.2 6.4 51.5 88.9 0.056 98.5 3.6 50.5 70.6 0.021
≥ 24 98.5 17.3 54.4 92.0 0.158 97.6 12.4 52.7 83.7 0.100
≥ 32 97.3 24.9 56.5 90.3 0.223 97.0 18.8 54.4 86.1 0.158
≥ 40 93.3 42.1 61.7 86.3 0.354 93.9 31.8 57.9 84.0 0.258
≥ 48 89.2 54.3 66.1 83.4 0.435 90.6 47.3 63.2 83.4 0.379
≥ 56 75.5 72.3 73.2 74.7 0.478 80.0 62.4 68.0 75.7 0.424
≥ 64 58.0 83.6 78.0 66.6 0.416 59.4 77.9 72.9 65.7 0.373
≥72 31.3 94.9 85.9 58.0 0.262 43.9 88.2 78.8 61.1 0.321
≥ 80 14.4 97.9 87.1 53.3 0.122 23.3 97.0 88.5 55.8 0.203
≥ 88 2.7 99.9 96.2 50.7 0.026 7.0 99.7 95.8 51.7 0.067
≥96 0.3 100 100 50.1 0.003 1.2 100 100 50.3 0.012
AUC (95%CI): 0.81(0.80 – 0.82); p<0.001 AUC (95%CI): 0.78(0.74 – 0.81); p<0.001
Standardized weighting [10*βw]
Cut-off for the risk score Training Dataset: Se% Sp% PPV% NPV% K Test Dataset: Se% Sp% PPV% NPV% K
≥ 8 99.7 1.5 50.3 82.4 0.012 100.0 0.9 50.2 100 0.009
≥ 16 98.9 10.5 52.5 90.8 0.095 98.5 7.6 51.6 83.3 0.061
≥ 24 98.5 17.4 54.4 92.1 0.159 97.6 12.4 52.7 83.7 0.100
≥ 32 97.8 23.8 56.2 91.4 0.215 97.0 19.7 54.7 86.7 0.167
≥ 40 93.4 41.4 61.5 86.3 0.349 93.9 33.9 58.7 84.8 0.279
≥ 48 87.5 58.6 67.9 82.4 0.461 90.0 48.5 63.6 82.9 0.385
≥ 56 77.6 71.9 73.4 76.3 0.495 81.5 63.9 69.3 77.6 0.455
≥ 64 60.7 82.5 77.6 67.7 0.432 58.2 78.8 73.3 65.3 0.370
≥72 38.3 93.2 84.9 60.2 0.314 38.8 89.7 79.0 59.4 0.285
≥ 80 16.1 97.7 87.2 53.8 0.137 22.7 97.3 89.3 55.7 0.200
≥ 88 4.3 99.7 93.0 51.0 0.040 4.8 99.7 94.1 51.2 0.045
≥96 0.3 100 100 50.0 0.003 0.6 100 100 50.2 0.006
AUC (95%CI): 0.81(0.80–0.83); p<0.001 AUC (95%CI): 0.78(0.74 – 0.81); p<0.001

Se-Sensitivity; Sp-Specificity; PPV-Positive Predictive Value; NPV-Negative Predictive Value; K-Cohen’s Kappa; AUC-Area Under the ROC; CI-Confidence Interval.

Fig. 2.

ROC Curves for training and testing dataset for Ischemic strokes using constant and standardised weighting.

3.3. Risk score model for occurrence of hemorrhagic strokes

Among 937 hemorrhagic stroke cases and 937 matching controls, 7 risk factors were included in the final model using the training set (Supplementary Table S4). These were income <$100, hypertension, dyslipidemia, diabetes mellitus, raised waist-hip ratio, sprinkling of salt on food at table and low consumption of green leafy vegetables. The statistical weights and classification of stroke cases and controls using the scoring systems are shown in Tables S5 and S6. The validity characteristics of the risk scoring system at various cut-offs are shown in Table 6. The Cohen’s kappa was maximal at a total risk score of 63% for both the constant and the standardized weighting methods, with AUCs for the test dataset of 0.78 (p<0.001) for the constant weighting and 0.79 (p<0.001) for the standardized weighting (Figure 3).

Table 6:

Performance of the risk score for Hemorrhagic stroke types using training and test dataset

Cut-off for the risk score Training Dataset: Se% Sp% PPV% NPV% K Test Dataset: Se% Sp% PPV% NPV% K
Constant weighting [10*β]
≥ 9 99.5 11.4 52.9 95.8 0.109 97.9 9.9 52.1 82.4 0.078
≥ 18 98.9 28.4 58.0 96.2 0.273 97.2 26.2 56.8 90.2 0.234
≥ 27 98.4 41.1 62.5 96.2 0.395 96.5 39.0 61.3 91.7 0.355
≥ 36 98.2 45.6 64.4 96.3 0.439 96.5 48.9 65.4 93.2 0.454
≥ 45 98.0 46.7 64.8 95.9 0.447 96.5 49.6 65.7 93.3 0.461
≥ 54 95.6 50.3 65.8 92.0 0.459 92.2 52.5 66.0 87.1 0.447
≥ 63 83.5 66.4 71.3 80.1 0.499 87.9 61.7 69.7 83.7 0.496
≥ 72 49.9 84.7 76.5 62.8 0.346 69.5 70.2 70.0 69.7 0.397
≥81 16.8 97.2 85.9 53.9 0.140 47.5 85.1 76.1 61.9 0.326
≥ 90 1.4 100 100 50.3 0.014 8.5 98.6 85.7 51.9 0.071
≥ 99 0.1 100 100 50.0 0.001 2.0 100 100 50.5 0.021
AUC (95%CI): 0.81(0.79 – 0.83); p<0.001 AUC (95%CI): 0.78(0.73 – 0.83);p<0.001
Standardized weighting [10*βw]
Cut-off for the risk score Training Dataset: Se% Sp% PPV% NPV% K Test Dataset: Se% Sp% PPV% NPV% K
≥ 9 99.5 7.6 51.9 93.8 0.071 97.9 10.6 52.3 83.3 0.085
≥ 18 98.9 24.3 56.6 95.6 0.232 97.2 21.3 55.2 88.2 0.184
≥ 27 98.5 36.2 60.7 96.0 0.347 96.5 39.0 61.3 91.7 0.355
≥ 36 98.2 45.6 64.4 96.3 0.439 96.5 46.1 64.2 92.9 0.426
≥ 45 98.0 46.7 64.8 95.9 0.447 96.5 49.6 65.7 93.3 0.461
≥ 54 96.4 49.4 65.6 93.1 0.457 92.2 52.5 66.0 87.1 0.447
≥ 63 87.0 62.5 69.9 82.8 0.495 87.9 61.7 69.7 83.7 0.496
≥ 72 57.0 81.2 75.2 65.4 0.382 69.5 70.9 70.5 69.9 0.404
≥81 28.3 93.4 81.0 56.6 0.217 39.7 87.2 75.7 59.1 0.270
≥ 90 3.4 99.5 87.1 50.7 0.029 8.5 98.6 85.7 51.9 0.071
≥ 99 0.1 100 100 50.0 0.001 2.1 100 100 50.5 0.021
AUC (95%CI): 0.81(0.79–0.83); p<0.001 AUC (95%CI): 0.79(0.73 – 0.84); p<0.001

Se-Sensitivity; Sp-Specificity; PPV-Positive Predictive Value; NPV-Negative Predictive Value; K-Cohen’s Kappa; AUC-Area Under the ROC; CI-Confidence Interval.

Fig. 3.

ROC Curves for training and testing dataset for Hemorrhagic strokes using constant and standardised weighting.

3.4. Risk score models for occurrence of strokes by sex

The aggregate risk score information by sex for 1,938 male (Tables S7 to S10) and 1,615 female (Tables S11 to S14) case-control pairs is shown in the supplementary data section.

4.0. DISCUSSION

This is the first effort aimed at developing a stroke risk scoring instrument for indigenous Africans. Using a case-control methodology among 3,553 pairs of stroke cases and stroke-free controls, we identified 13 potentially modifiable risk factors (income level <$100, education, hypertension, dyslipidemia, diabetes mellitus, cardiac disease, raised waist-hip ratio, stress, sprinkling salt on food at table, low consumption of green leafy vegetables, regular sugar consumption, physical inactivity and regular meat consumption) and two non-modifiable risk factors (advancing age and family history of CVD) associated with occurrence of stroke. Because age matching was imprecise, we included age as a covariate in our models. When weights were assigned to these individual risk factors according to their β coefficients from the conditional logistic regression models, hypertension, dyslipidemia, diabetes and low consumption of green leafy vegetables were the four modifiable risk factors with the highest weights, while regular sugar consumption had the lowest weight under both the constant and the standardized weighting approaches (Table 2). The rationale for ascribing weights to risk factors is similar to the way regression coefficients obtained from Cox proportional hazards regression in prospective studies are used in the development of risk prediction models.2,10

In the present study, when the test dataset was used to assess the performance of our scoring system, the overall predictive accuracy of the risk scoring instrument for the occurrence of stroke (taking into account 15 risk factors), measured as the Area Under the Curve (AUC), was 76% (95% CI: 73 to 79%) or 76% (95% CI: 74 to 79%), depending on the weighting approach used. Similar diagnostic properties were obtained for the ischemic stroke model using 12 risk factors and the hemorrhagic stroke model using 7 risk factors, with an AUC of 78% (95% CI: 74 to 81%) for ischemic stroke (irrespective of weighting approach) and 78% (95% CI: 73 to 83%) or 79% (95% CI: 73 to 84%) for hemorrhagic stroke, depending on the weighting approach used. By far the most popular stroke risk prediction model is the Framingham Stroke Risk Score (FSRS), which incorporates systolic blood pressure, diabetes mellitus, cigarette smoking, prior cardiovascular disease, atrial fibrillation, left ventricular hypertrophy and use of antihypertensive medications.1,23 The AUC of the factors in the FSRS for prediction of stroke has been fair to good: 65% (95% CI: 62 – 67%) in the Atherosclerosis Risk in Communities (ARIC) cohort,24 60% (95% CI: 58 – 63%) in the Cardiovascular Health Study (CHS),25 71% (95% CI: 67 – 75%) in the Framingham Heart Study (FHS)26 and 59% (95% CI: 57 – 61%) in the first cohort of the Rotterdam Study (RS).27 A meta-analysis shows that the AUC for the 7 factors in the FSRS from these four cohorts was 62% (95% CI: 61 – 63%).28 However, it should be emphasized that these AUCs for the FSRS are not directly comparable with ours because of differences in study design and in the methodology used to derive predictive accuracy. For comparison, an Indian study of hemorrhagic stroke cases and controls (n=166) that used a similar analytic approach with 5 factors (hypertension, raised serum total cholesterol, use of anticoagulants and antiplatelet agents, past history of transient ischemic attack and alcohol) had a predictive accuracy of 79%, with a similar 95% confidence interval of 73 to 84%.29

There was, however, an appreciable overlap in cut-off scores in discriminating between cases and controls (Tables 3, 5 and 6) in our study. Thus, in a broader sense, the discriminatory properties based on area-under-curve plots were ‘good’ but not excellent and may require further refinement. For instance, hypertension was analyzed as ‘yes’ or ‘no’ but could be further categorized by severity of blood pressure values to increase the discriminatory quality of this ubiquitous risk factor, which is present among both stroke and stroke-free subjects. Furthermore, dyslipidemia, a composite term defined in this study as elevated total cholesterol, LDL-cholesterol or triglyceride concentration, or decreased HDL-cholesterol, may have to be disaggregated and tested as individual components, as ratios, or using different cut-off values for lipid sub-fractions to further separate cases from controls. Again, dietary factors such as sprinkling of salt on food at table, low consumption of green vegetables, regular sugar consumption and meat consumption may be combined into a dietary risk score that considers tiers of regularity of dietary practices. An advantage of the approach we adopted in developing this risk scoring system is its good predictive accuracy of 76% to 79% and its basis in a broad range of putative risk factors associated with stroke occurrence in Africa. Furthermore, as posited by Herman and colleagues,30 we deployed a form of back-validation by splitting the same dataset into training and test sets to validate our risk scoring system. Admittedly, our approach is unconventional in the development of risk scores; however, there are instances in the literature where such an approach has been used.15,3135 Going forward, it would be ideal to validate this instrument in other datasets available on the continent, such as the Cardiovascular H3Africa Innovation Resource (CHAIR) from the H3Africa networks.36,37

Despite the advantages and the possible improvements suggested, further validation is required using community-dwelling, stroke-free, prospective cohorts. At the moment, there are no such longitudinal studies ongoing in Africa, largely because of the limited funding available to undertake such ambitious research projects as have been done in several high-income countries. The literature shows that some of these studies have accrued data from electronic health records and insurance claims records, which are available in some African countries. Ascertainment bias for some risk factors such as stress, dietary history, smoking and alcohol intake might have been present in the stroke cases because those who were aphasic or unconscious on admission had these risk factors assessed using reliable proxies. We also acknowledge selection bias as a concern in the use of case-control studies for predictive models. These challenges notwithstanding, this is the first attempt at developing an Afrocentric stroke risk scoring scale.

The information provided by these models is intended to be used in the primary prevention drive for stroke and its primary types on the continent. Ultimately, the SIREN Afrocentric risk score can be further enhanced by integrating a polygenic risk score that incorporates aggregate genetic risk from single nucleotide polymorphisms associated with stroke. A study that incorporated 324 SNPs associated with stroke and its risk factors into the Framingham Stroke Risk Score found a small improvement in the prediction of future stroke compared with the classical epidemiological risk factors alone.28 The enhancement in our own population may be greater because of the higher heritability of stroke in people of African ancestry.38

5.0. CONCLUSION

We have developed aggregate risk assessment scores for stroke and its primary types in indigenous Africans. Population-based prospective cohort studies are required to refine and validate this novel tool. This will facilitate risk factor assessment for primary prevention of stroke and indeed other cardiovascular diseases in Africa at the population level.

HIGHLIGHTS.

  • No Afro-centric aggregate stroke risk scoring instrument is available for indigenous Africans

  • Using data from the SIREN study, we developed a novel stroke riskometer for indigenous Africans

  • The instrument included 15 risk factors for stroke occurrence, 12 for ischemic stroke and 7 for hemorrhagic stroke.

  • It had a predictive accuracy of 76%, sensitivity of 80% and specificity of 63%

  • Prospective studies are required to further validate this novel instrument for population-wide deployment to boost stroke prevention in Africa

Acknowledgements

The SIREN (U54HG007479) and SIBS Genomics (R01NS107900) studies are funded by the National Institutes of Health under the H3Africa initiative. Investigators are further supported by NIH grant SIBS Gen Gen R01NS107900-02S1; ARISES R01NS115944-01; and H3Africa CVD Supplement 3U24HG009780-03S5.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

REFERENCES

  • 1. Wolf PA, D’Agostino RB, Belanger AJ, Kannel WB. Probability of stroke: a risk profile from the Framingham study. Stroke. 1991;22:312–318.
  • 2. Assmann G, Cullen P, Schulte H. Simple scoring scheme for calculating the risk of acute coronary events based on the 10-year follow-up of the prospective cardiovascular Munster (PROCAM) study. Circulation. 2002;105(3):310–5.
  • 3. White H, Boden-Albala B, Wang C, Elkind MS, Rundek T, Wright CB, et al. Ischemic stroke subtype incidence among whites, blacks, and Hispanics: the Northern Manhattan Study. Circulation. 2005;111:1327–1331.
  • 4. Ridker PM, Buring JE, Rifai N, Cook NR. Development and validation of improved algorithms for the assessment of global cardiovascular risk in women: the Reynolds risk score. JAMA. 2007;297:611–619.
  • 5. Wu Y, Liu X, Li X, et al. Estimation of 10-year risk of fatal and nonfatal ischemic cardiovascular diseases in Chinese adults. Circulation. 2006;114(21):2217–25.
  • 6. Woodward M, Brindle P, Tunstall-Pedoe H, et al. Adding social deprivation and family history to cardiovascular risk assessment: the ASSIGN score from the Scottish Heart Health Extended Cohort (SHHEC). Heart. 2007;93(7):172–6.
  • 7. Ferrario M, Chiodini P, Chambless LE, et al. Prediction of coronary events in a low incidence population. Assessing accuracy of the CUORE Cohort Study prediction equation. International Journal of Epidemiology. 2005;34(2):413–415.
  • 8. Hippisley-Cox J, Coupland C, Vinogradova Y, et al. Predicting cardiovascular risk in England and Wales: prospective derivation and validation of the QRISK2. BMJ. 2008;336(7659):1475–82.
  • 9. Perk J, De Backer G, Gohlke H, et al. European Guidelines on cardiovascular disease prevention in clinical practice (version 2012). The Fifth Joint Task Force of the European Society of Cardiology and Other Societies on Cardiovascular Disease Prevention in Clinical Practice. European Heart Journal. 2012;33(13):1635–701.
  • 10. Lee JW, Lim HS, Kim DW, Shin SA, Kim J, Yoo B, Cho KH. The development and implementation of stroke risk prediction model in National Health Insurance Service’s personal health record. Computer Methods and Programs in Biomedicine. 2018;153:253–257.
  • 11. Lackland DT, Elkind MS, D’Agostino R, Dhamoon MS, Goff DC, Higashida RT, et al. Inclusion of Stroke in Cardiovascular Risk Prediction Instruments: A Statement for Healthcare Professionals from the American Heart Association/American Stroke Association. Stroke. 2012;43:1998–2027.
  • 12. Owolabi MO, Akarolo-Anthony S, Akinyemi R, Arnett D, Gebregziabher M, et al. The burden of stroke in Africa: a glance at the present and a glimpse into the future. Cardiovasc J Afr. 2015;26(2 Suppl 1):S27–38.
  • 13. Akpalu A, Sarfo FS, Ovbiagele B, Akinyemi R, Gebregziabher M, Obiako R, et al. Phenotyping stroke in sub-Saharan Africa: Stroke Investigative Research and Education Network (SIREN) Phenomics protocol. Neuroepidemiology. 2015;45(2):73–82.
  • 14. Sarfo F, Gebregziabher M, Ovbiagele B, Akinyemi R, Owolabi L, Obiako R, et al. Multilingual validation of the Questionnaire for verifying stroke-free status in West Africa. Stroke. 2016;47(1):167–72.
  • 15. Li H, Luo M, Zheng J, Luo J, Zeng R, Feng N, Du Q, Fang J. An artificial neural network prediction model of congenital heart disease based on risk factors: A hospital-based case-control study. Medicine (Baltimore). 2017;96(6):e6090.
  • 16. Greenberg RS, Ibrahim MA. The case-control study. In: Holland WW, Detels R, Knox G, editors. Textbook of public health. Oxford: Oxford University Press; 1985.
  • 17. WHO (1997). WHOQOL-BREF With Scoring Instructions Updated 01–10-14. https://dokumen.tips/documents/whoqol-bref-with-scoring-instructionsupdated-01-10-14.html.
  • 18. Sturges HA. The choice of a class interval. Journal of the American Statistical Association. 1926;21:65–66.
  • 19. Scott DW. On optimal and data-based histograms. Biometrika. 1979;66:605–610.
  • 20. Scott DW. Sturges’ rule. WIREs Computational Statistics. 2009;1(3). doi: 10.1002/wics.35.
  • 21. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical epidemiology: A basic science for clinical medicine. 2nd ed. London: Little, Brown and Company; 1991.
  • 22. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36.
  • 23. D’Agostino RB, Wolf PA, Belanger AJ, Kannel WB. Stroke risk profile: adjustment for antihypertensive medication. The Framingham Study. Stroke. 1994;25(1):40–3.
  • 24. The ARIC investigators. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. Am J Epidemiol. 1989;129(4):687–702.
  • 25. Fried LP, Borhani NO, Enright P, Furberg CD, Gardin JM, Kronmal RA, et al. The Cardiovascular Health Study: design and rationale. Ann Epidemiol. 1991;1(3):263–76.
  • 26. Feinleib M, Kannel WB, Garrison RJ, McNamara PM, Castelli WP. The Framingham Offspring Study. Design and preliminary data. Prev Med. 1975;4(4):518–25.
  • 27. Hofman A, Breteler MM, van Duijn CM, Janssen HL, Krestin GP, Kuipers EJ, et al. The Rotterdam Study: 2010 objectives and design update. Eur J Epidemiol. 2009;24(9):553–72.
  • 28. Ibrahim-Verbaas CA, Fornage M, Bis JC, Choi SH, Psaty BM, Meigs JB, et al. Predicting stroke through genetic risk functions: The CHARGE risk score project. Stroke. 2014;45(2):403–412.
  • 29. Zodpey SP, Tiwari RR. A risk scoring system for prediction of haemorrhagic stroke. Indian J Public Health. 2005;49(4):218–22.
  • 30. Herman AAB, Irwig LM, Groeneveld HT. Evaluating obstetric risk scores by receiver operating characteristic curves. Am J Epidemiol. 1988;127:831–42.
  • 31. Zodpey SP, Tiwari RR. A risk scoring system for prediction of haemorrhagic stroke. Indian J Public Health. 2005;49(4):218–22.
  • 32. Li XW, Jiang YJ, Wang XQ, Yu JL, Li LQ. A scoring system to predict mortality in infants with esophageal atresia: A case-control study. Medicine (Baltimore). 2017;96(32):e7755.
  • 33. Fava C, Sjögren M, Olsson S, et al. A genetic risk score for hypertension associates with the risk of ischemic stroke in a Swedish case–control study. Eur J Hum Genet. 2015;23:969–974.
  • 34. McGorrian C, Yusuf S, Islam S, Jung H, Rangarajan S, Avezum A, Prabhakaran D, Almahmeed W, Rumboldt Z, Budaj A, Dans AL, Gerstein HC, Teo K, Anand SS; INTERHEART Investigators. Estimating modifiable coronary heart disease risk in multiple regions of the world: the INTERHEART Modifiable Risk Score. Eur Heart J. 2011;32(5):581–9.
  • 35. Colak MC, Colak C, Kocatürk H, Sağiroğlu S, Barutçu I. Predicting coronary artery disease using different artificial neural network models. Anadolu Kardiyol Derg. 2008;8(4):249–54.
  • 36. Akpa OM, Made F, Ojo A, Ovbiagele B, Adu D, Motala AA, et al. as members of the CVD Working Group of the H3Africa Consortium. Regional Patterns and Association Between Obesity and Hypertension in Africa: Evidence From the H3Africa CHAIR Study. Hypertension. 2020;75:00–00. doi: 10.1161/HYPERTENSIONAHA.119.14147
  • 37. Owolabi M, Akpa OM, Made F, Adebamowo SN, Ojo A, Adu D, et al. as members of the CVD Working Group of the H3Africa Consortium. Data Resource Profile: Cardiovascular H3Africa Innovation Resource (CHAIR). International Journal of Epidemiology. 2018;1–9. doi: 10.1093/ije/dyy261
  • 38. Traylor M, Rutten-Jacobs L, Curtis C, et al. Genetics of stroke in a UK African ancestry case-control study: South London Ethnicity and Stroke Study. Neurol Genet. 2017;3:e142.
