Table 2.
Machine Learning Method | Description | Usage | Strengths | Weaknesses |
---|---|---|---|---|
Linear regression | Estimates the linear relationship between dependent and independent variables. | Regression | Simple model to implement and understand. | Outliers can affect the regression. Assumes independence between attributes. Not a complete description of relationships among variables. |
Logistic regression | Uses a sigmoid function to assign a probability to an event. | Classification | Simple model. Makes no assumptions about distributions. Measures the predictor's coefficient size and its direction of association. Model coefficients can be interpreted as indicators of feature importance. | Less suitable for complex situations. Assumes linearity between the dependent and independent variables. Can only be used to predict discrete outcomes. Cannot solve non-linear problems because it has a linear decision surface. |
Decision trees | A flowchart-like tree structure that splits the training data into subsets based on the values of the attributes until a stopping criterion is met. | Classification and regression | Simple to understand and interpret. Deals with unbalanced data. Variable selection—can identify the most significant variables and the relations between variables. Handles missing values. Non-parametric nature—keeps the model simple and less prone to significant errors. | Prone to overfitting. Sensitive to small variations and alterations in the input data, which can drastically change the structure of the tree. Biased learning—without proper parameter tuning, decision trees can become biased if some classes dominate. |
Random forest | Combines the outputs of many deep, individually overfitted decision trees in order to counter the bias and overfitting of a single decision tree. | Classification and regression | Reduced risk of overfitting. Flexibility—can handle both regression and classification. Can determine feature importance. | Time-consuming to train. Requires more resources. More complex model to interpret. |
Boosting | A strong classifier built from a series of weak classifiers in order to decrease the error; each weak classifier tries to correct the errors of the previous one. This continues until the training dataset is predicted correctly or the maximum number of models is added. Gradient boosting—a boosting technique that builds the final model from the sum of several weak learners trained on the same dataset (numerical or categorical data). XGBoost (v2.0.3)—a regularized version of the gradient boosting technique; it outperforms standard gradient boosting in speed, and the dataset can contain both numerical and categorical variables. | Classification and regression | Improved accuracy—reduced risk of bias. Reduced risk of overfitting—reweights the inputs that are classified wrongly. Better handling of imbalanced data—focuses more on the misclassified data points. Better interpretability—breaks the model's decision process into multiple steps. | Vulnerable to outliers. Difficult to use for real-time applications. Computationally expensive for large datasets. |
K-nearest neighbors | Assigns new, unclassified data to the class of its K nearest neighbors in a field of labeled data points. | Classification | Interpretable results, since it relies on proximity calculations. Simple method. | No learning step. Does not identify the most relevant features when placing new data—influenced by noise. Choosing the right K is difficult. Computationally intensive and time-consuming. |
Neural networks | Model that mimics the complex functions of the human brain—activation of a group of neurons in one layer activates neurons in the next layer, until the output layer gives the model's final prediction. | Classification | Adaptability—the model can adapt to new situations and learn from data. Pattern recognition—excels at audio and image recognition, as well as natural language processing. Parallel processing—can run numerous computations at once, improving computational efficiency. Non-linearity—can use non-linear activation functions to model and comprehend complicated data. | Computational intensity—training demands a lot of computing power. Black-box nature—difficult to understand how decisions were made. Prone to overfitting. Requires large training datasets. |
Support vector machines | Algorithm used for linear or non-linear classification or regression. Finds the maximum-margin separating hyperplane in an N-dimensional space between the different classes of the target feature. | Classification and regression | Performs well with high-dimensional data. Requires less memory and uses it effectively. Performs well when there is a large gap (margin) between classes. | Long training period—not practical for large datasets. Inability to handle overlapping classes and noise. Performs poorly when the number of features per data point is greater than the number of training samples. |
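As a practical complement to Table 2, the following minimal sketch shows how several of the listed classifiers could be trained and compared on a common dataset. It is illustrative only and is not taken from the reviewed studies; the choice of scikit-learn, the synthetic dataset, and all hyperparameters are assumptions made for demonstration.

```python
# Illustrative sketch (assumed setup, not from the reviewed studies):
# fit several classifiers from Table 2 on one synthetic dataset and compare accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic data stands in for a real clinical or tabular dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "K-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "Neural network (MLP)": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
    "Support vector machine": SVC(kernel="rbf"),
}

for name, model in models.items():
    model.fit(X_train, y_train)  # train on the training split
    acc = accuracy_score(y_test, model.predict(X_test))  # evaluate on held-out data
    print(f"{name}: test accuracy = {acc:.3f}")
```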
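The boosting row of Table 2 describes an ensemble in which each new weak learner corrects the errors of the current model. A hedged sketch of that idea, using scikit-learn's gradient boosting classifier and an optional XGBoost variant (both assumed available in the environment, not prescribed by the source), is given below.

```python
# Illustrative boosting sketch (assumed libraries and parameters).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each new tree is fitted to the errors of the current ensemble,
# the "weak learners correcting previous errors" idea from Table 2.
gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, random_state=0)
gb.fit(X_train, y_train)
print("Gradient boosting accuracy:", accuracy_score(y_test, gb.predict(X_test)))

try:
    from xgboost import XGBClassifier  # regularized gradient boosting, optional dependency
    xgb = XGBClassifier(n_estimators=200, learning_rate=0.1, random_state=0)
    xgb.fit(X_train, y_train)
    print("XGBoost accuracy:", accuracy_score(y_test, xgb.predict(X_test)))
except ImportError:
    pass  # xgboost is not required for this sketch
```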