Table 1.
Approach | Supervision | Machine Learning Model | Description |
---|---|---|---|
Deep Learning | Supervised | Convolutional neural network (CNN) | Mostly used for classification and segmentation. Includes a wide range of model architectures such as ResNet, VGG-net, and AlexNet. |
Deep Learning | Supervised | Mask region-based convolutional neural network (Mask R-CNN) | CNN type primarily employed for detecting objects in input images and segmenting each detected instance. |
Deep Learning | Supervised | YOLO | CNN type primarily employed for real-time object detection; some variants also support image segmentation or classification. |
Deep Learning | Supervised | U-net | Type of CNN mainly used for image segmentation. |
Deep Learning | Supervised | Gated recurrent unit (GRU) | Type of recurrent neural network designed for time-dependent data and for capturing long-range dependencies in sequential data. |
Deep Learning | Supervised | Long short-term memory (LSTM) | Type of recurrent neural network designed for time-dependent data and for capturing long-range dependencies in sequential data. |
Deep Learning | Supervised | Vision transformer | Alternative to CNNs rather than a CNN variant. Adopts the transformer architecture commonly used in natural language processing (NLP) and shows high performance on image classification benchmarks. |
Deep Learning | Unsupervised | Convolutional Deep Belief Network (CDBN) | Type of deep generative model constructed by stacking max-pooling Convolutional Restricted Boltzmann Machines (CRBMs). |
Deep Learning | Unsupervised | Autoencoder | Neural network that learns to encode data into a compact, efficient representation; often employed for dimensionality reduction. |
Traditional | Supervised | k-nearest neighbors | Assigns class labels or values to input data based on its k nearest neighbors in the training data, as measured by distance. |
Traditional | Supervised | Binary Tree | Decision-making algorithm that navigates a tree from root to leaf, branching on specific features or attributes at each node. |
Traditional | Supervised | Naïve Bayes | Probabilistic machine learning algorithm that classifies data using Bayes' theorem, assuming conditional independence between every pair of features given the class. |
Traditional | Supervised | Support vector machine (SVM) | Uses the kernel trick to find a linear decision boundary that separates the input data in the transformed feature space. |
Traditional | Supervised | Fuzzy Inference System | Computational model that uses fuzzy logic to perform reasoning on uncertain or imprecise information. |
Traditional | Supervised | Fisher’s linear discriminant analysis | Classifies input data using a linear combination of features that separates the items of each class. |
Traditional | Supervised | Linear Mixed Model | Extension of simple linear models that allows both fixed and random effects; useful for complex data such as grouped or repeated measurements. |
Traditional | Supervised | Logistic/linear regression | Statistical models: logistic regression uses the logistic function to predict the probability of a specific class, whereas linear regression predicts a continuous value. |
Traditional | Supervised and Unsupervised | Random forest | Ensemble learning method comprising multiple decision trees, each trained on a random subset of the data. The final prediction is aggregated from all trees. |
Traditional | Supervised and Unsupervised | Neural network | Conventional machine learning model employed for classification and regression. Compared with existing deep learning methods, it exhibits lower accuracy. |
Traditional | Supervised and Unsupervised | Singular Value Decomposition (SVD) | Decomposes the input feature matrix into the product of three matrices: two orthogonal matrices and a diagonal matrix of singular values. |
Traditional | Unsupervised | Fuzzy C-means | Clustering algorithm based on fuzzy logic that assigns each data point a degree of membership in every cluster rather than a single hard label. |
Traditional | Unsupervised | Gaussian Mixture Model Segmentation | Models pixel values as a mixture of Gaussian distributions and partitions pixels into segments of similar pixels. |
Table 1 outlines the types of machine learning models used in CV algorithm development. As computer vision has progressed over the years, the use of deep supervised models has increased, and newer deep architectures such as the transformers and autoencoders listed above have also been adopted.
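
To make one of the traditional supervised entries in Table 1 concrete, the k-nearest neighbors rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation taken from any study summarized here: the function name `knn_predict`, the two-dimensional feature vectors, and the class labels are all hypothetical, and Euclidean distance is assumed as the distance measure.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors and take a majority vote over their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors, e.g. (mean intensity, edge density).
train_points = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.25),  # labeled "background"
    (0.90, 0.80), (0.80, 0.90), (0.85, 0.75),  # labeled "object"
]
train_labels = ["background"] * 3 + ["object"] * 3

prediction = knn_predict(train_points, train_labels, query=(0.7, 0.8), k=3)
print(prediction)
```

In this toy data the three nearest neighbors of the query all carry the label "object", so the vote is unanimous. With real image features, the choice of k and of the distance measure would normally be tuned on held-out data.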