2019 Jan 29;9:42. doi: 10.1038/s41398-019-0396-7

Fig. 1. Advanced Recursive Partitioning Analysis (ARPA) for the Paisa sample.


a Classification and Regression Tree (CART) derived for SUD status as the categorical target variable (disjunctive affection status, i.e., use of any of alcohol, nicotine, or other drugs). Only founder individuals were included in the analysis to avoid bias from kinship relatedness. Class 0 (unaffected) is shown in red and class 1 (affected) in blue. The tree derived for the Paisa sample included demographic (age), clinical (conduct disorder, CD), and genetic variables (markers rs5010235 and rs4860437). The T allele of the rs4860437 variant (node 4), in combination with an age cut-off of 45.5 years, generates a highly discriminant split leading to terminal node 3 among ADHD individuals without CD (see root node 1). b Variable importance scores derived by Random Forest and TreeNet analyses were consistent with the variables included in the CART-derived tree. c, d TreeNet analysis maximizing the ROC area and minimizing the classification error, using 200 trees. The areas under the ROC curve (AUC) were 0.954 and 0.87 for the learning and testing samples (blue and red curves and values, respectively), while the misclassification proportions for SUD cases in the cross-validation experiment were 0.124 and 0.177 for the learning and testing data sets, respectively.
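The two summary measures reported in panels c and d (AUC and misclassification proportion) can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not the TreeNet procedure used in the study: it computes AUC in its Mann-Whitney form (the probability that a randomly chosen affected individual receives a higher score than a randomly chosen unaffected one) and the misclassification proportion at a hypothetical 0.5 score threshold, optionally restricted to one observed class such as SUD cases. The function names, the threshold, and the toy data are all hypothetical.

```python
from typing import Optional, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve in its Mann-Whitney form: the probability that a
    randomly chosen affected individual (label 1) scores higher than a randomly
    chosen unaffected one (label 0). Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one individual of each class")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg) pairwise comparison; fine for a sketch
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def misclassification_rate(
    labels: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the study
    only_class: Optional[int] = None,
) -> float:
    """Proportion of individuals whose predicted class (1 if score >= threshold,
    else 0) differs from the observed class. If only_class is given, the
    proportion is computed within that observed class only (e.g. SUD cases)."""
    pairs = [(y, s) for y, s in zip(labels, scores)
             if only_class is None or y == only_class]
    if not pairs:
        raise ValueError("no individuals in the selected class")
    errors = sum(1 for y, s in pairs if int(s >= threshold) != y)
    return errors / len(pairs)


# Toy data (hypothetical): 0 = unaffected, 1 = affected (SUD).
labels = [0, 0, 1, 1, 1, 0]
scores = [0.10, 0.40, 0.35, 0.80, 0.90, 0.45]
print(round(roc_auc(labels, scores), 3))                               # 0.778
print(round(misclassification_rate(labels, scores), 3))                # 0.167
print(round(misclassification_rate(labels, scores, only_class=1), 3))  # 0.333
```

AUC depends only on how the scores rank affected versus unaffected individuals, whereas the misclassification proportion also depends on the chosen threshold, so the two measures in panels c and d need not move together.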