2022 Mar 7;130(3):037004. doi: 10.1289/EHP9752

Figure 1.

Figure 1 is a schematic of the three-stage stacked deep ensemble machine learning (DEML) framework. Stage 1: the data, comprising n records, are split into k folds for cross-validation (training and testing). The training data feed m base learners: support vector machine, random forest, extreme gradient boosting, and gradient boosting machine. The m base learners, together with the l original features, produce Z1, an n × m matrix of predictions. Stage 2: the Z1 predictions from Stage 1 feed h meta-learners: random forest, extreme gradient boosting, and generalized linear model. The h meta-learners produce Z2, an n × h matrix of predictions. Stage 3: the Z2 predictions, weighted by the nonnegative least squares algorithm, yield the final optimal weighted prediction.

The framework of the DEML algorithm. Z1 is an n × m matrix combining the PM2.5 predictions of the m base models; l represents the original features; h denotes the number of meta-models; Z2 is an n × h matrix combining the PM2.5 predictions of the h meta-models. Z2 is then used as the input to the NNLS algorithm, which yields the meta-model weights and the final PM2.5 prediction; k is the number of folds for CV, with the same validation rows used for the base and meta-models; n is the number of records in the full dataset; m denotes the number of base models. Note: CV, cross-validation analysis; DEML, the three-stage stacked deep ensemble machine learning method; GBM, gradient boosting machine; GLM, generalized linear model; NNLS, nonnegative least squares algorithm; RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting.
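The three stages described in the caption can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes scikit-learn and SciPy, uses a synthetic regression dataset in place of the PM2.5 data, and substitutes scikit-learn's GradientBoostingRegressor for XGBoost and LinearRegression for the GLM so the sketch stays self-contained.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVR

# Synthetic stand-in for the PM2.5 dataset: n records, l features.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
k = 5  # number of CV folds

# Stage 1: m base learners produce Z1, an n x m matrix of
# out-of-fold predictions (same folds reused for every learner).
base_learners = [
    SVR(),                                                   # SVM
    RandomForestRegressor(n_estimators=50, random_state=0),  # RF
    GradientBoostingRegressor(random_state=0),               # GBM (XGBoost stand-in)
]
Z1 = np.column_stack([cross_val_predict(mdl, X, y, cv=k) for mdl in base_learners])

# Stage 2: h meta-learners trained on Z1 produce Z2, an n x h matrix.
meta_learners = [
    RandomForestRegressor(n_estimators=50, random_state=0),  # RF
    LinearRegression(),                                      # GLM stand-in
]
Z2 = np.column_stack([cross_val_predict(mdl, Z1, y, cv=k) for mdl in meta_learners])

# Stage 3: nonnegative least squares finds weights over the meta-model
# predictions; the weighted combination is the final prediction.
weights, _ = nnls(Z2, y)
final_prediction = Z2 @ weights
```

NNLS (rather than unconstrained least squares) guarantees the meta-model weights are nonnegative, so the final output is an interpretable weighted blend of the meta-model predictions.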