Front Med (Lausanne). 2023 Feb 3;10:1050255. doi: 10.3389/fmed.2023.1050255

Table 1.

Description of machine learning algorithms.

Each entry below gives the concept, its type, a description, and its advantages and disadvantages.
Supervised learning Supervised learning is a computational method in which the algorithm is given a dataset together with the correct answers (labels); the machine works through the data to learn how to reproduce those answers.
Linear regression (Regression). The simplest regression method, which uses a linear equation (y = m * x + b) to model the dataset.
Advantages:
■ Fast modeling speed
■ Simple calculation; the variables can be interpreted from the coefficients
Disadvantages:
■ It must first be determined whether the relationship between the variables is linear
■ Does not fit non-linear data well
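As an illustrative aside (not part of the original table), the linear form above can be fitted in a few lines with scikit-learn, assuming that package and NumPy are available; the synthetic data are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data roughly following y = 2 * x + 1 with a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.5, size=100)

model = LinearRegression().fit(X, y)
print("slope m:", model.coef_[0], "intercept b:", model.intercept_)
```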
Logistic regression (Classification). Logistic regression is an algorithm that estimates the probability of an event based on one or more inputs and is most commonly used for classification problems.
Advantages:
■ Less time-consuming, with fast classification
■ Intuitive probability scores for the observed samples
■ Not severely affected by multicollinearity, which can be further addressed by combining the model with L2 regularization
■ Low computational cost, easy to understand and implement
Disadvantages:
■ Computational performance degrades when the feature space is large
■ Prone to underfitting and generally not very accurate
■ Does not handle a large number of categorical features well
■ Non-linear features must be transformed before use
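A minimal sketch of the classifier described above (my own example, assuming scikit-learn is installed); penalty="l2" corresponds to the L2 regularization mentioned in the advantages.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# penalty="l2" is the default and matches the L2 regularization noted above
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)
print(clf.predict_proba(X[:5]))  # per-sample class probability scores
```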
Naive Bayes (Classification). A probabilistic classifier: a classification method based on Bayes' theorem and the assumption that features are conditionally independent.
Advantages:
■ Stable classification performance
■ Fast for large-volume training and queries
■ Performs well on small datasets, can handle multiple classification tasks, and is suitable for incremental training
■ Less sensitive to missing data, and the algorithm is relatively simple
Disadvantages:
■ The prior probabilities must be calculated
■ A certain error rate in classification decisions
■ Sensitive to the form in which the input data are expressed
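For illustration only (assuming scikit-learn), the Gaussian variant of this classifier can be fitted and evaluated with cross-validation:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Gaussian naive Bayes: Bayes' theorem with conditionally independent features
print(cross_val_score(GaussianNB(), X, y, cv=5).mean())
```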
Decision trees (Classification). An algorithm for solving classification problems. The decision tree algorithm uses a tree structure and applies layers of inference to reach the final classification.
Advantages:
■ Easy to understand and interpret, and can be analyzed visually
■ Can handle both nominal and numeric data
■ Well suited to samples with missing attributes
■ Able to handle irrelevant features
■ Runs relatively fast on test datasets
■ Provides reliable and effective results for large data sources in a relatively short time
Disadvantages:
■ Prone to overfitting
■ Tends to ignore the interconnections between attributes in a dataset
■ For data with inconsistent sample sizes across categories, different decision criteria lead to different attribute-selection tendencies when the tree splits on attributes
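A small illustrative sketch (assuming scikit-learn): a shallow tree on the Iris data, printed as the layered decision rules this entry describes.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# A shallow tree keeps the layered if/else structure readable and limits overfitting
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(clf))  # textual view of the learned decision rules
```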
Random forest (Classification). The random forest method is an ensemble learning method containing multiple decision trees, used for classification, regression and other tasks.
Advantages:
■ High-dimensional data (many features) can be handled without dimensionality reduction or feature selection
■ The importance of features can be judged
■ Interactions between different features can be judged
■ Not easily overfitted
■ Training is fast, and parallel methods are easy to use
■ Can balance the error on unbalanced datasets
■ Accuracy can be maintained even when a large portion of the features are missing
Disadvantages:
■ Can overfit on some noisy classification or regression problems
■ For attributes with differing numbers of values, attributes with more value divisions have a greater influence on the random forest
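An illustrative scikit-learn sketch (my own example) showing parallel training and the built-in feature-importance ranking mentioned above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)

# n_jobs=-1 trains the trees in parallel; feature_importances_ ranks the features
clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0).fit(X, y)
print(clf.feature_importances_.round(3))
```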
Support vector machines (Classification). Support vector machines are a class of generalized linear classifiers that perform binary classification in a supervised manner; their decision boundary is the maximum-margin hyperplane fitted to the training samples.
Advantages:
■ Can solve high-dimensional problems, i.e., large feature spaces
■ Able to handle interactions of non-linear features
■ No local-minimum problem
■ Strong generalization ability
Disadvantages:
■ Not very efficient when the number of observations is large
■ There is no universal solution for non-linear problems, and it is sometimes difficult to find a suitable kernel function
■ Weak explanatory power for the high-dimensional mapping of kernel functions, especially radial basis functions
■ Conventional algorithms only support binary classification
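A minimal sketch (assuming scikit-learn) of an RBF-kernel SVM; feature scaling is added because kernel methods are scale-sensitive.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=30, random_state=0)

# The RBF kernel captures non-linear feature interactions; C controls the margin trade-off
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale")).fit(X, y)
print(clf.score(X, y))
```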
Gradient boosting decision tree (Classification). Boosted trees use additive models and a forward stagewise algorithm to carry out the optimization of learning; this is also one of the ensemble learning methods.
Advantages:
■ High accuracy can be obtained with relatively little tuning time
■ Flexible handling of various data types, including continuous and discrete values, for a wide range of uses
■ Robust loss functions can be used, giving greater robustness to outliers
Disadvantages:
■ Dependencies between the weak learners make it difficult to train the data in parallel
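An illustrative sketch (assuming scikit-learn); the trees are added one after another, which is why training is sequential, as the disadvantage above notes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=600, n_features=15, random_state=0)

# Trees are fitted one at a time to the errors of the current additive model
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3).fit(X, y)
print(clf.score(X, y))
```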
Adaptive boosting (Classification). Adaptive boosting (AdaBoost) belongs to the boosting category of ensemble methods. It is a binary classification model. As an iterative algorithm, its core idea is to train different classifiers (weak classifiers) on the same training set and then aggregate these weak classifiers into a stronger final classifier (strong classifier).
Advantages:
■ Flexible in the use of various regression and classification models to build weak learners
■ The implementation of AdaBoost in Sklearn is based on a weighted learning perspective, which is simple and easy to understand
■ Controlling the number of iterations prevents overfitting to some extent
Disadvantages:
■ Sensitive to anomalous samples, which may receive higher weights during the iterations and affect the final prediction accuracy
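For illustration (assuming scikit-learn), the Sklearn implementation referred to above can be used as follows; by default it builds the ensemble from shallow decision trees as weak classifiers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Each iteration reweights the training samples and adds one more weak classifier
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.score(X, y))
```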
Extreme gradient boosting (Classification/Regression). Extreme gradient boosting (XGBoost) is also an ensemble algorithm: a boosted tree model that combines many tree models into a very strong classifier. The tree model used is the CART regression tree.
Advantages:
■ Compared with other machine learning libraries, users can easily use XGBoost and obtain satisfactory results
■ Fast and effective in processing large-scale datasets, without requiring large amounts of hardware resources such as memory
■ Compared with deep learning models, similar performance can be achieved without fine-tuning of parameters
■ XGBoost internally implements a boosted tree model that can automatically handle missing values
Disadvantages:
■ Compared with deep learning models, it cannot model spatiotemporal structure or capture high-dimensional data such as images, speech and text well
■ Deep learning is far more accurate than XGBoost when there is a large amount of training data and a suitable deep learning model can be found
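An illustrative sketch, assuming the third-party xgboost package is installed; the NaN entries in the synthetic data demonstrate the automatic handling of missing values claimed above.

```python
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan  # missing values are handled natively

clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1).fit(X, y)
print(clf.score(X, y))
```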
Light gradient boosting machine (Classification/Regression). Light gradient boosting machine (LightGBM) is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms. It can be used for ranking, classification, regression and many other machine learning tasks.
Advantages:
■ Faster training speed and higher efficiency
■ Lower memory footprint
■ Higher accuracy than other boosting algorithms
■ Like XGBoost, it is capable of handling big data, owing to its reduced training time
■ Supports parallel learning
Disadvantages:
■ The computation process may grow deeper decision trees, thus creating overfitting
■ Since LightGBM is a bias-based algorithm, it is more sensitive to noise
■ In finding the optimal solution, it relies on the optimal split variables and does not consider that the optimal solution may be a combination of all the features
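An illustrative sketch, assuming the third-party lightgbm package; num_leaves and max_depth are the usual levers against the deep-tree overfitting noted above.

```python
from sklearn.datasets import make_classification
from lightgbm import LGBMClassifier

X, y = make_classification(n_samples=5000, n_features=30, random_state=0)

# Leaf-wise growth is fast but can build deep trees; num_leaves/max_depth limit that
clf = LGBMClassifier(n_estimators=200, num_leaves=31, max_depth=-1).fit(X, y)
print(clf.score(X, y))
```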
Categorical boosting (Classification/Regression). Categorical boosting (CatBoost) is a GBDT framework with few parameters, support for categorical variables and high accuracy, implemented with oblivious trees as the base learner.
Advantages:
■ Can rival any advanced machine learning algorithm in terms of performance
■ Reduces the need for extensive hyperparameter tuning and lowers the chance of overfitting, which also makes the model more generalizable
■ Can handle categorical and numerical features and supports custom loss functions
Disadvantages:
■ Processing categorical features requires a great deal of memory and time
■ The choice of random seed has a certain influence on the model's predictions
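An illustrative sketch, assuming the third-party catboost and pandas packages; the tiny data frame and its column names are invented for the example, to show the native handling of a categorical column.

```python
import pandas as pd
from catboost import CatBoostClassifier

# Toy frame with one categorical and one numerical feature (illustrative only)
df = pd.DataFrame({
    "city": ["a", "b", "a", "c", "b", "c", "a", "b"],
    "age": [23, 45, 31, 52, 40, 36, 28, 60],
    "label": [0, 1, 0, 1, 1, 0, 0, 1],
})

clf = CatBoostClassifier(iterations=100, verbose=0)
clf.fit(df[["city", "age"]], df["label"], cat_features=["city"])
print(clf.predict(df[["city", "age"]]))
```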
Generalized additive model (Classification). The generalized additive model is an interpretable model that uses a sum of univariate and bivariate shape functions of the predictor variables to explain the response variable.
Advantages:
■ Non-linear functions can be introduced
■ Because the model is “additive”, the hypothesis-testing methods of the linear model can still be used
Disadvantages:
■ Because of the “additive” assumption, important interactions may be missing from the model and can only be compensated for by manually adding interaction terms
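A sketch under the assumption that the third-party pygam package is available; each s(i) term is one smooth shape function, and the terms are summed, hence “additive”. The synthetic data are invented for the example.

```python
import numpy as np
from pygam import LogisticGAM, s

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = (np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.3, 500) > 1).astype(int)

# One smooth shape function per predictor; the fitted terms add up to the logit
gam = LogisticGAM(s(0) + s(1)).fit(X, y)
print(gam.accuracy(X, y))
```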
K-nearest neighbor (Classification/Regression). The core idea of the algorithm is that if most of the k samples nearest to a given sample in the feature space belong to a certain class, then that sample also belongs to that class and shares the characteristics of the samples in that class.
Advantages:
■ The theory is mature, the idea is simple, and it can be used for both classification and regression
■ Can be used for non-linear classification
■ Makes no assumptions about the data, has high accuracy, and is an online technique: new data can be added directly to the dataset without retraining
Disadvantages:
■ Handles unbalanced samples poorly, with relatively large prediction bias
■ Requires a lot of memory
■ Computationally intensive for datasets with large sample sizes
■ Each new classification requires a global pass over the data, which is computationally intensive
■ There is no theory for choosing the value of k, which is usually selected in conjunction with k-fold cross-validation
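An illustrative sketch (assuming scikit-learn) that chooses k by k-fold cross-validation, as the last point suggests:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# 5-fold cross-validation over candidate values of k
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7, 9, 11]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```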
Artificial neural networks (Classification/Regression). Artificial neural networks broadly resemble highly complex networks composed of neurons: individual units connected to each other, each with numerical inputs and outputs that can take the form of real numbers or linear combination functions.
Advantages:
■ High classification accuracy
■ High parallel distributed processing capability
■ Distributed storage and high learning capacity
■ Strong robustness and fault tolerance to noisy data
■ Able to fully approximate complex non-linear relationships, with an associative memory function
Disadvantages:
■ Neural networks require a large number of parameters, such as the network topology and the initial values of weights and thresholds
■ The internal learning process cannot be observed, and the difficulty of interpreting the output can affect the credibility and acceptability of the results
■ Training can take too long and may not even achieve the purpose of learning
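An illustrative sketch (assuming scikit-learn) of a small feed-forward network; the hidden-layer sizes and iteration budget are exactly the kind of parameters the first disadvantage refers to.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Two hidden layers; topology, initial weights and iteration budget are hyperparameters
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0),
).fit(X, y)
print(clf.score(X, y))
```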
Unsupervised learning In unsupervised learning, there is no “right answer” for a given dataset; all data points are treated alike. The task of unsupervised learning is to uncover the underlying structure of the dataset.
Principal component analysis (Classification). A set of potentially correlated variables is transformed by an orthogonal transformation into a set of linearly uncorrelated variables; the transformed variables are called the principal components.
Advantages:
■ Reduces the computational overhead of the algorithm
■ Removes noise
■ Makes the results easy to understand
■ Completely parameter free
Disadvantages:
■ Eigenvalue decomposition has some limitations, e.g., the transformed matrix must be a square matrix
■ In the case of a non-Gaussian distribution, the resulting principal components may not be optimal
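An illustrative sketch (assuming scikit-learn): projecting the four correlated Iris measurements onto two uncorrelated principal components.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Orthogonal transformation onto the two directions of largest variance
pca = PCA(n_components=2)
X_2d = pca.fit_transform(StandardScaler().fit_transform(X))
print(X_2d.shape, pca.explained_variance_ratio_)
```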
K-means clustering (Classification). K-means clustering is a technique for clustering data into a specified number of classes to reveal the intrinsic properties and patterns of the data.
Advantages:
■ Efficient for large datasets
■ The computational complexity is close to linear
Disadvantages:
■ The algorithm is affected by the initial values and by outliers, so the results vary between runs; usually the result is a local rather than the global optimum
■ Cannot handle data whose cluster distributions differ greatly
■ Not truly applicable to discrete data
■ The value of k is chosen manually and therefore lacks objectivity
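An illustrative sketch (assuming scikit-learn); n_init reruns the algorithm from several initializations to soften the sensitivity to starting values, while k itself is still chosen by hand.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=600, centers=3, random_state=0)

# Several random initializations (n_init) reduce the dependence on starting values
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_.round(2), km.inertia_)
```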
Other special learning algorithms Reinforcement learning is closer to the nature of biological learning and therefore promises higher intelligence. It is concerned with how an agent can adopt a set of behaviors in its environment in order to obtain the maximum cumulative reward.
Deep learning is the process of learning the intrinsic laws and levels of representation of sample data; the information obtained from this learning is of great help in interpreting data such as text, images and sound.
Reinforcement learning (Classification). Reinforcement learning is used to describe and address the problem of an agent learning strategies to maximize reward or achieve a specific goal during its interaction with the environment.
Advantages:
■ Able to model sequential decision problems
■ Training does not require labeled data
■ There are good theoretical guarantees, and the main algorithms have corresponding convergence proofs
Disadvantages:
■ Feedback is delayed rather than generated immediately
■ Rewards are delayed
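A toy tabular Q-learning sketch (my own illustration, plain NumPy): an agent on a five-state chain learns to move toward a goal whose reward only arrives at the end, which is exactly the delayed-feedback setting described above.

```python
import numpy as np

n_states, n_actions, goal = 5, 2, 4        # chain of 5 states; actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

for _ in range(2000):                       # episodes of interaction with the environment
    s = 0
    while s != goal:
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s_next == goal else 0.0  # reward only at the goal (delayed feedback)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.round(2))  # the learned values favor moving right in every state
```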
Deep learning (Classification). The concept of deep learning originates from the study of artificial neural networks; a multilayer perceptron with multiple hidden layers is a deep learning structure. Deep learning mainly includes convolutional neural networks, deep neural networks and recurrent neural networks.
Advantages:
■ Strong learning ability
■ Wide coverage and good adaptability
■ Data-driven, with a high ceiling
■ Excellent portability
Disadvantages:
■ In application scenarios that provide only a limited amount of data, deep learning algorithms cannot give an unbiased estimate of the patterns in the data
■ Big data support is needed to achieve good accuracy
■ The complexity of the graph models in deep learning leads to a dramatic increase in the time complexity of the algorithm, so higher parallel programming skills and more and better hardware support are needed to ensure real-time performance
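A minimal sketch of a multilayer perceptron with multiple hidden layers, assuming the third-party PyTorch package is available; real applications would use far larger datasets and CNN/RNN architectures for images or sequences. The synthetic data are invented for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(1000, 20)                  # synthetic features
y = (X[:, 0] + X[:, 1] > 0).long()         # synthetic binary labels

# Multilayer perceptron with two hidden layers
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(50):                        # a short training loop
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

print((model(X).argmax(dim=1) == y).float().mean().item())  # training accuracy
```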