2023 Nov 28;10:58. doi: 10.1186/s40779-023-00490-8

Table 5.

Advantages and disadvantages of the most common algorithms used in LTBI differential diagnostic models

| Algorithm | Advantages | Disadvantages |
|---|---|---|
| Support vector machine (SVM) | Good generalization ability for high-dimensional and nonlinear problems; Can adapt to different data types by selecting different kernel functions; Performs well with a small amount of data | Long training time for large-scale datasets; Challenging to select the appropriate kernel function and parameters for noisy data and nonlinear problems; Does not provide direct probability estimates |
| Decision trees | Easy to understand and interpret; Can handle nonlinear features and large-scale data; Suitable for both classification and regression problems; Minimal data preprocessing is required | Prone to overfitting, especially with deep trees; Performs poorly with continuous and highly correlated features |
| Random forest | High accuracy; Can handle high-dimensional and large-scale datasets; Robust to noise and missing data; Provides feature importance estimation | A more complex model with longer training time; Substantial memory consumption for datasets with large feature spaces; Less effective for highly correlated features |
| Logistic regression | Simple and fast computation; Interpretable parameter weights to understand feature importance; Suitable for binary classification problems | Performs poorly with nonlinear relationships in the data; Prone to underfitting; May not perform well with high-dimensional data or highly correlated features |
| Hierarchical clustering | No need to specify the number of clusters in advance; Provides a hierarchical structure of clusters; Works with numerical and categorical data; Allows for visual analysis through dendrograms | High computational complexity; Difficulty with high-dimensional data; Restrictions on data types; Irreversible clustering results |
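
The hierarchical clustering entries in Table 5 (no preset number of clusters, a dendrogram-like hierarchy, high computational cost, irreversible merges) can be illustrated with a minimal sketch. It assumes one-dimensional toy data and single-linkage distance. The function names `single_linkage` and `cut` and the sample values are hypothetical and do not come from the article. It is a naive illustration, not the method used in any of the reviewed LTBI models.

```python
from itertools import combinations


def single_linkage(points):
    """Naive agglomerative clustering on 1-D points (illustrative only).

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    which is the information a dendrogram would draw.
    """
    clusters = [(i,) for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        # Every round rescans all cluster pairs; this repeated scan is why
        # the naive approach becomes expensive on large datasets.
        best = None
        for a, b in combinations(clusters, 2):
            d = min(abs(points[i] - points[j]) for i in a for j in b)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        # Greedy merge: once a and b are joined they are never split again,
        # so an early poor merge cannot be undone later.
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        history.append((a, b, d))
    return history


def cut(history, n_points, k):
    """Replay the recorded merges until only k clusters remain."""
    clusters = {(i,) for i in range(n_points)}
    for a, b, _ in history:
        if len(clusters) <= k:
            break
        clusters -= {a, b}
        clusters.add(tuple(sorted(a + b)))
    return sorted(clusters)


# Hypothetical toy measurements; indices identify the samples.
points = [1.0, 1.2, 5.0, 5.1, 9.0]
history = single_linkage(points)

# The number of clusters is chosen after the hierarchy is built.
print(cut(history, len(points), 3))  # [(0, 1), (2, 3), (4,)]
print(cut(history, len(points), 2))  # [(0, 1, 2, 3), (4,)]
```

The hierarchy is built once and then cut at different levels, which is why the number of clusters does not have to be fixed in advance. The fixed merge order is also what makes the result irreversible.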