2022 Oct 8;15:17562864221129380. doi: 10.1177/17562864221129380

Figure 2.

The machine learning model process. The VISTA data set first underwent data pre-processing, which included missing-data imputation with a 5-nearest-neighbor model, data cleansing, and normalization. Categorical variables were one-hot encoded to cover all possible categories, and continuous features were Z-score normalized. The VISTA set then underwent class-imbalance processing with the synthetic minority oversampling technique (SMOTE), an oversampling approach that creates synthetic minority-class samples. SMOTE can perform better than simple random oversampling, which merely duplicates existing minority samples, and it is widely used. This process generated the model parameters, and the training data set was used to evaluate the accuracy of the model. Finally, our model was externally validated in an independent Chinese cohort.

sICH, symptomatic intracerebral hemorrhage.
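The pre-processing and SMOTE steps named in the caption can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`knn_impute`, `one_hot`, `z_score`, `smote`) are hypothetical, missing values are assumed to be encoded as `None`, the neighbor distance is assumed to be Euclidean over jointly observed columns, Z scores use the population standard deviation, and the SMOTE shown is a simplified variant (random seed sample, random neighbor among its k nearest, random interpolation point).

```python
import math
import random

# Hypothetical helpers illustrating the caption's steps; not the study's implementation.


def knn_impute(rows, k=5):
    """Replace None entries with the mean of the same column over the k nearest
    rows that have that column observed. Distance is Euclidean over the columns
    observed in both rows (an assumed choice, not taken from the study)."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        # Rank every other row by distance on jointly observed columns.
        ranked = []
        for m, other in enumerate(rows):
            shared = [j for j in range(len(row))
                      if row[j] is not None and other[j] is not None]
            if m != i and shared:
                d = math.sqrt(sum((row[j] - other[j]) ** 2 for j in shared))
                ranked.append((d, m))
        ranked.sort()
        for j in missing:
            # Take the k closest rows that actually have column j observed.
            donors = [rows[m][j] for _, m in ranked if rows[m][j] is not None][:k]
            if donors:
                filled[i][j] = sum(donors) / len(donors)
    return filled


def one_hot(values):
    """Map each categorical value to a 0/1 indicator vector, one slot per category."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values], categories


def z_score(column):
    """Standardize a continuous feature to mean 0 and (population) SD 1.
    A constant column is mapped to all zeros to avoid dividing by zero."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
    return [(x - mean) / sd if sd > 0 else 0.0 for x in column]


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: pick a random minority sample, pick one of its k nearest
    minority neighbors, and place a new point at a random position on the segment
    between them. Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        neighbors = sorted((p for m, p in enumerate(minority) if m != i),
                           key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

Because each synthetic point is interpolated between two real minority samples, it stays inside the region the minority class already occupies instead of duplicating an existing row, which is the advantage over simple random oversampling that the caption refers to. For real use, scikit-learn's `KNNImputer(n_neighbors=5)`, `OneHotEncoder`, and `StandardScaler` and imbalanced-learn's `SMOTE(k_neighbors=5)` implement these steps; which tools the study used is not stated here.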