Table 1. Advantages and disadvantages of supervised and unsupervised learning methods.
| Method | Advantages | Disadvantages |
|---|---|---|
| Supervised learning | | |
| Regression analysis (25, 26, 58–61) | 1. Simple modeling and strong interpretability 2. Does not depend on parameter tuning; the same model and data usually yield a unique result | 1. Sensitive to missing values and outliers 2. Logistic regression handles nonlinear problems poorly 3. Difficult to handle multicollinearity 4. Risk of overfitting 5. Sensitive to imbalanced data |
| Decision tree (28–30, 62–66) | 1. Highly interpretable model 2. Can be described graphically 3. Handles both continuous and discrete data 4. Can handle multi-class data 5. Insensitive to missing values and outliers 6. Does not depend on background knowledge; can be modeled directly | 1. An unpruned tree risks overfitting 2. Sensitive to imbalanced data 3. Performance is generally weaker than ensemble learning and regression analysis |
| SVM (31) | 1. Complete theoretical support; especially suitable for small-sample studies 2. Computational complexity depends on the support vectors, which mitigates the curse of dimensionality to some extent 3. A few support vectors determine the final result, reducing the influence of noisy samples on the model 4. Insensitive to outliers | 1. Difficult to train on large samples 2. Difficult to solve multi-class problems 3. Depends on parameter selection |
| ANN (38, 39, 67) | 1. Strong nonlinear mapping ability 2. Can associate input information, with self-learning and adaptive abilities 3. Strong ability to discriminate training samples 4. Convolutional networks recognize imaging data well | 1. Risk of overfitting 2. Computationally expensive 3. Imaging data analysis is complex |
| Ensemble learning (40–42, 44, 48) | 1. Performance is improved over the individual weak classifiers 2. Insensitive to outliers 3. High performance on large samples 4. Can handle nonlinear problems 5. Random forest is insensitive to imbalanced data 6. Low risk of overfitting | 1. Difficult to interpret (black-box problem) 2. Normalization is required 3. Some models are sensitive to missing values |
| Unsupervised learning | | |
| Association rule (49, 50) | 1. Simple in principle and easy to implement 2. Not restricted by a dependent variable; can discover associations in big data | 1. Outputs many rules, including much useless information |
| Clustering (53, 54, 61) | 1. Relatively simple in principle, easy to implement, and fast to converge 2. Can handle big-data problems 3. Strong interpretability | 1. Sensitive to outliers 2. Sensitive to imbalanced data 3. Often converges to a local optimum |
| Dimensionality reduction (55, 56) | Fast, simple, and effective | Poor interpretability |
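
To make the supervised half of the table concrete, the minimal sketch below, which is not taken from the cited studies, instantiates each family with scikit-learn on synthetic placeholder data; every class choice and parameter value is an illustrative assumption (logistic regression stands in for regression analysis, random forest for ensemble learning).

```python
# Minimal sketch of the supervised methods in Table 1 (assumption:
# scikit-learn; X, y are synthetic placeholder data, not study data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "Regression analysis": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(max_depth=5),  # depth limit acts as pruning, curbing overfitting
    "SVM": SVC(kernel="rbf", C=1.0),                       # result depends on parameter selection
    "ANN": MLPClassifier(hidden_layer_sizes=(50,), max_iter=2000),
    "Ensemble learning": RandomForestClassifier(n_estimators=100),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validated accuracy
    print(f"{name}: {scores.mean():.3f}")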
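
The unsupervised half can be sketched the same way; again the data and parameters are illustrative assumptions. Association rule mining is noted only in a comment because it is not part of scikit-learn (the Apriori implementation in the mlxtend package is one commonly used alternative).

```python
# Minimal sketch of the unsupervised methods in Table 1 (assumption:
# scikit-learn; X is synthetic placeholder data).
# Association rules are omitted here; see e.g. mlxtend's Apriori implementation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))

# Clustering: fast to converge, but prone to local optima and sensitive to
# outliers (see the table), so n_init restarts k-means from several centroid seeds.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction: fast, simple, and effective, but the principal
# components are linear mixtures of the original features, hence the poor
# interpretability noted in the table.
X_reduced = PCA(n_components=2).fit_transform(X)

print(labels[:10])
print(X_reduced.shape)  # (300, 2)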