Gruenewald et al. 10.1073/pnas.0606215103.
Fig. 3. Survival over the 12-year follow-up period as a function of representation in varying numbers of HR pathways in the male forest from analyses of incomplete data.
Fig. 4. Survival over the 12-year follow-up period as a function of representation in varying numbers of HR pathways in the female forest from analyses of incomplete data.
Table 4. Biomarkers present in HR pathways across trees in the male and female forests (analyses with incomplete data)
Males | Females
Trees: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 1 | 2 | 3 | 4 | 5 | 6
HR Pathway: | 1 | 2 | 3 | 4 | 1 | 2 | 3 | 4 | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 1 | 2 | 1 | 1 | 2 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 1 | 2 | 3
SBP | •
DBP |
HDL chol. | • | •
Total/HDL chol. | •
CRP | • | • | • | • | • | • | • | • | • | • | • | • | •
IL-6 | • | • | • | • | • | • | • | • | • | • | • | • | • | • | • | • | • | •
Fibrinogen | • | • | • | • | • | •
DHEA | • | • | •
Cortisol | • | • | • | • | • | • | • | • | •
NE | • | • | • | • | • | •
EPI | • | • | • | • | •
HbA1c |
Albumin | •
A dot (•) indicates that the biomarker in the row of the dot is present in the HR pathway represented by the column of the dot.
Table 5. Cut points on biomarkers that determine HR zones across trees in male and female forests (analyses with incomplete data)
Biomarker | Males | Females | Conventional |
SBP, mmHg | >164 | >140 | |
DBP, mmHg | <75, >90 | ||
HDL cholesterol, mg/dl | <32.5, 44.5 | <40 | |
Total/HDL cholesterol | >4.7 | >5 | |
CRP, mg/liter | >1.5, 2.3, 2.8, 3.3 | >1.1 | >3.0 |
IL-6, pg/ml | >1.2, 2, 2.5, 2.6, 2.9 | >1.2, 4.6, 2.4, 6.3 | |
Fibrinogen, mg/dl | >252.5, 259.9, 345.5 | >356.5 | |
DHEA, mg/dl | <11.5, 14.5, 30.5 | ||
Cortisol, µg/g creatinine | >9.2, 9.6, 12.3, 24.8 | ||
NE, µg/g creatinine | >23.3, 44.8, 58.2, 64.2 | >43.8 | |
EPI, µg/g creatinine | >1.5, 1.7, 2.8 | ||
HbA1c, % | >7 | ||
Albumin, mg/dl | >4.3 | <3.5 OR >5 |
CRP, C-reactive protein; DHEA, dehydroepiandrosterone; DBP, diastolic blood pressure; EPI, epinephrine; HbA1c, glycosylated hemoglobin; HDL, high-density lipoprotein; HR, high-risk; NE, norepinephrine; RP, recursive partitioning; SBP, systolic blood pressure.
Table 6. Sensitivity and specificity of mortality prediction for HR tree pathways
Tree | Individual trees, training sample: sensitivity, % | Individual trees, training sample: specificity, % | Individual trees, testing sample: sensitivity, % | Individual trees, testing sample: specificity, % | k+ (no. of HR pathways) | Multiple HR pathways: sensitivity, % | Multiple HR pathways: specificity, %
Males:
1 | 33.0 | 94.0 | 24.5 | 79.2 | 1 | 72.2 | 64.1
2 | 37.4 | 95.3 | 16.7 | 78.1 | 2 | 51.4 | 79.2
3 | 34.6 | 91.3 | 34.3 | 83.6 | 3 | 35.2 | 90.2
4 | 32.4 | 95.9 | 15.7 | 80.8 | 4 | 26.1 | 93.9
5 | 33.5 | 95.0 | 24.4 | 85.8 | 5 | 14.4 | 97.6
6 | 24.8 | 96.4 | 13.0 | 84.9 | 6 | 9.9 | 98.8
7 | 32.5 | 89.4 | 24.6 | 86.4 | 7 | 6.3 | 99.6
8 | 24.1 | 97.9 | 15.3 | 90.3 | 8 | 3.2 | 100.0
Females:
1 | 10.5 | 99.6 | 3.2 | 97.3 | 1 | 37.7 | 84.0
2 | 14.0 | 97.7 | 7.5 | 92.0 | 2 | 30.0 | 88.6
3 | 23.1 | 93.6 | 9.6 | 88.6 | 3 | 13.5 | 95.8
4 | 20.9 | 94.3 | 8.2 | 89.8 | 4 | 5.8 | 99.1
5 | 16.9 | 97.7 | 9.1 | 92.7 | 5 | 1.4 | 99.3
6 | 25.4 | 95.3 | 14.3 | 89.6 | 6 | 0.0 | 100.0
Supporting Text
Analyses With Incomplete Data.
Two primary strategies have been used with recursive partitioning (RP) tree construction algorithms when values are missing for one or more of the predictor variables: surrogate splits and the "missings together" approach [see Zhang and Singer (1) for a technical discussion of both methods]. The surrogate splits approach uses information on other predictors for which data values are present to decide on an imputed value for the missing observation. Although this approach is useful in many contexts, there is no biological basis for using observed levels of combinations of biomarkers to reliably predict the level of a different, missing biomarker, and we therefore did not use it in the present article. The missings together algorithm forces all subjects with a missing value on a given variable into the same daughter node at a given split. This methodology, of course, introduces some errors in tree construction relative to what one might see with complete data. However, it is much less dependent on possibly unwarranted biological assumptions, and for this reason we have used it in the present analyses.
RP analyses were repeated with random subsamples of male and female participants that included participants with incomplete data on one or more of the 13 biomarker predictors. A large forest of 51 trees was grown in the four random training subsamples in males, and a forest of 33 trees was produced for females, using the same tree-growing criteria as were used in analyses with complete data. A total of 8 trees in males and 6 trees in females were selected (using the same selection criteria as were used in analyses of complete data) as the set of trees representing the male and female forests.
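The "missings together" rule described above can be illustrated with a minimal sketch. It is not the software used for these analyses: the function names, the use of Gini impurity as the split criterion, and the toy data are all assumptions made for illustration. For one candidate cut point, the sketch sends every subject with a missing value to a single daughter node, tries both placements, and keeps the one with the lower weighted impurity.

```python
from typing import List, Optional, Sequence, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 outcomes (0.0 for an empty node)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def split_missings_together(
    values: Sequence[Optional[float]], died: Sequence[int], cut: float
) -> Tuple[str, float, List[int], List[int]]:
    """Split on `value > cut`, keeping all missing values in one daughter node.

    Hypothetical helper: the impurity criterion is an assumption, not the
    authors' stated tree-growing criterion.
    Returns (side_given_to_missing, weighted_impurity, left_labels, right_labels).
    """
    left_obs = [d for v, d in zip(values, died) if v is not None and v <= cut]
    right_obs = [d for v, d in zip(values, died) if v is not None and v > cut]
    missing = [d for v, d in zip(values, died) if v is None]
    n = len(values)

    best = None
    for side in ("left", "right"):
        # The whole missing group moves together; it is never divided.
        left = left_obs + (missing if side == "left" else [])
        right = right_obs + (missing if side == "right" else [])
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or impurity < best[1]:
            best = (side, impurity, left, right)
    return best


# Toy data (invented): a biomarker with two missing values, and 12-year mortality.
biomarker = [1.0, 1.2, 3.5, 4.0, None, None]
mortality = [0, 0, 1, 1, 1, 1]
side, impurity, left_node, right_node = split_missings_together(biomarker, mortality, cut=2.0)
```

In this toy example both subjects with missing values died, so placing them with the high-value group yields two pure daughter nodes. With real data the missing group can land on either side, which is one source of the tree-construction error noted above.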
The combinations of biomarkers present in high-risk pathways in male and female forests are detailed in Table 4. Similar to analyses in males using complete data subsamples, hormone and inflammatory biomarkers appeared frequently in high-risk (HR) pathways. Systolic blood pressure (SBP), high-density lipoprotein (HDL) cholesterol, and albumin were also present in HR pathways in males. Inflammatory biomarkers appeared more frequently in HR pathways in females as compared with trees produced from analyses using complete data. Blood pressure and glycosylated hemoglobin biomarkers were not present in HR pathways in the female forest produced from analyses with incomplete data; dehydroepiandrosterone (DHEA) was present in analyses using both complete and incomplete data. For those biomarkers that appeared in HR pathways in analyses of both complete and incomplete data, HR biomarker zones derived from the two sets of analyses were generally similar, although specific cut points varied somewhat (see Table 5).
In three of the trees (tree 2, path 4 and tree 4, path 3 in males; tree 6, path 3 in females), a path with a high rate of mortality in the terminal node (qualifying it as an HR pathway) was present that did not contain any HR biomarker conditions. The occurrence of these pathways may signal deficiencies in the use of data with incomplete information in clearly identifying biomarker combinations associated with a high rate of mortality, because such pathways were not present in the forest of trees derived from analyses using complete data.
Compared with analyses using subsamples of participants with complete data on the 13 biomarkers, the sensitivity of mortality prediction was much lower for individual trees in the male and female forests obtained from analyses using incomplete data in the training and testing subsamples (see Table 6). Specificity rates for individual trees were adequate in both training and testing samples (78.1% or higher for every tree; see Table 6). As with analyses using complete data subsamples, the sensitivity of mortality prediction for HR pathways improved when prediction was examined at the level of the forest. The highest sensitivity obtained for representation in an HR pathway in any single tree from the male forest was 37.4% (see Table 6). Sensitivity increased to 72.2% for representation in one or more HR pathways in the forest of eight trees (sensitivity for representation in two or more pathways was 51.4%). The sensitivity of mortality prediction for representation in one or more HR pathways in the female forest was 37.7% (sensitivity for representation in two or more pathways was 30.0%). However, sensitivity rates for membership in one or more HR pathways in the forest were still lower than corresponding sensitivity rates in analyses using complete data.
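The forest-level figures in Table 6 classify a participant as high risk when he or she is represented in k or more HR pathways across the trees. A minimal sketch of that calculation follows. The function name, the input layout (one row of per-tree HR membership flags per participant), and the toy data are assumptions for illustration, not the authors' code or data.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(
    hr_membership: Sequence[Sequence[bool]], died: Sequence[bool], k: int
) -> Tuple[float, float]:
    """Sensitivity and specificity of "represented in k or more HR pathways".

    hr_membership[i][t] is True when participant i falls in an HR pathway of
    tree t (hypothetical layout). A participant reaches one terminal node per
    tree, so the row sum is the number of HR pathways he or she is in.
    """
    tp = fn = tn = fp = 0
    for row, dead in zip(hr_membership, died):
        flagged = sum(row) >= k
        if dead and flagged:
            tp += 1  # death correctly flagged as high risk
        elif dead:
            fn += 1  # death missed
        elif flagged:
            fp += 1  # survivor flagged as high risk
        else:
            tn += 1  # survivor correctly not flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data (invented): 8 participants, a forest of 3 trees.
membership: List[List[bool]] = [
    [False, False, False],  # survivor, 0 HR pathways
    [True, False, False],   # died, 1
    [True, True, False],    # died, 2
    [True, True, True],     # died, 3
    [False, False, False],  # died, 0
    [False, False, False],  # survivor, 0
    [False, True, False],   # survivor, 1
    [False, False, False],  # survivor, 0
]
deaths = [False, True, True, True, True, False, False, False]

sens_1, spec_1 = sensitivity_specificity(membership, deaths, k=1)
sens_2, spec_2 = sensitivity_specificity(membership, deaths, k=2)
```

Raising k trades sensitivity for specificity, which is the pattern seen down the k+ columns of Table 6.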
The survival curves for individuals represented in varying numbers of HR pathways across trees in the male and female forests are depicted in Figs. 3 and 4, respectively. In general, the probability of death within 12 years increased with the number of HR pathways in the forest in which an individual was represented, in both males and females, similar to results seen in analyses using complete data.
1. Zhang H, Singer B (1999) Recursive Partitioning in the Health Sciences (Springer, New York).