2024 Oct 7;14:23282. doi: 10.1038/s41598-024-72158-9

Fig. 1.

Overview of the adolescent depression prediction framework. (a) The Avon Longitudinal Study of Parents and Children (ALSPAC) is a longitudinal study in the Bristol, UK area that has run for more than two decades, since the early 1990s. It includes questionnaires, hospital records, and lab samples from the child, the mother, and her partner, from gestation through adolescence. From ALSPAC we generate six derivative datasets: five for predicting depression at each target age (12, 13, 16, 17, and 18) and one for predicting a depression diagnosis at any time between ages 12 and 18, all using features from gestation to age 10. These are labeled in the figure as Dep12, Dep13, Dep16, Dep17, Dep18, and Dep12-18. (b) The model selection pipeline selects the best combination of feature selection (FS), missing value imputation (MVI), outlier detection (OD), and binary classification (CLS) methods for each derived dataset. For each combination, we also tune hyperparameters using fivefold cross-validation optimized for F1-score. (c) Once the best model pipeline is selected for each dataset, we run recursive feature elimination (RFE) to reduce the number of features while retaining model performance. TPR stands for true positive rate and FPR stands for false positive rate.
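
The selection logic in panel (b) and the metrics named in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The candidate method names (`fs_a`, `mvi_a`, and so on) and all function names are hypothetical placeholders, since the caption does not list the methods compared. The sketch enumerates every FS/MVI/OD/CLS combination, computes TPR, FPR, and F1 from binary labels, and picks the combination with the highest mean F1 across cross-validation folds.

```python
from itertools import product

# Hypothetical candidates per stage; the caption does not name the actual methods.
STAGES = {
    "FS": ["fs_a", "fs_b"],
    "MVI": ["mvi_a", "mvi_b"],
    "OD": ["od_a", "od_b"],
    "CLS": ["cls_a", "cls_b", "cls_c"],
}


def enumerate_pipelines(stages):
    """Yield one dict per combination of stage choices (Cartesian product)."""
    names = list(stages)
    for combo in product(*(stages[name] for name in names)):
        yield dict(zip(names, combo))


def rates_and_f1(y_true, y_pred):
    """Compute TPR, FPR, and F1 for binary labels (1 = depressed, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Guard against empty denominators by returning 0.0.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {"TPR": tpr, "FPR": fpr, "F1": f1}


def select_best(fold_f1_by_pipeline):
    """Return the pipeline key with the highest mean F1 across folds."""
    return max(
        fold_f1_by_pipeline,
        key=lambda k: sum(fold_f1_by_pipeline[k]) / len(fold_f1_by_pipeline[k]),
    )
```

In practice, each enumerated combination would be fitted and scored on five held-out folds, and the per-fold F1 values passed to `select_best`. The fitting step is omitted here because it depends on the specific methods used.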