Abstract
Cone snails are venomous marine gastropods comprising more than 950 species widely distributed across different habitats. Their conical shells are remarkably similar to those of other invertebrates in terms of color, pattern, and size. For these reasons, assigning taxonomic signatures to cone snail shells is a challenging task. In this report, we propose an ensemble learning strategy based on the combination of Random Forest (RF) and XGBoost (XGB) methods. We used 47,600 cone shell images of uniform size (224 x 224 pixels), which were split into an 80:20 train-test ratio. Prior to performing subsequent operations, these images were subjected to pre-processing and transformation. After applying a deep learning approach (Visual Geometry Group with a 16-layer deep model architecture) for feature extraction, model specificity was further assessed by including multiple related and unrelated seashell images. Both classifiers demonstrated comparable recognition ability on random test samples. The evaluation results suggested that RF outperformed XGB due to its high accuracy in recognizing Conus species, with an average precision of 95.78%. The area under the receiver operating characteristic curve was 0.99, indicating the model’s optimal performance. The learning and validation curves also demonstrated a robust fit, with the training score reaching 1 and the validation score gradually increasing to 95% as more data was provided. These values indicate a well-trained model that generalizes effectively to validation data without significant overfitting. The gradual improvement in the validation score curve is crucial for ensuring model reliability and minimizing the risk of overfitting. Our findings are also presented through interactive visualizations. The performance of our proposed model suggests its potential for use with datasets of other mollusks, where optimal results may be achieved for their categorization and taxonomic characterization.
Introduction
Conus Linnaeus is a large genus of gastropods that has been well-preserved in fossil records since its first appearance about 55 million years ago in the Lower Eocene. Cone snails are major predators in tropical reef communities [1, 2]. Their venom contains a diverse array of small peptides (conotoxins) that target neuromuscular receptors and are extensively utilized in drug development [3–5]. Taxonomic classification of the highly similar cone shell patterns is challenging due to variations in size, color, and geographical distribution. In particular, some Conus species exhibit nearly identical morphological characteristics, making identification difficult and requiring researchers to spend more time on differential analysis. To address these challenges, there is a pressing need to develop more sophisticated computational algorithms or models to automate Conus species recognition and streamline taxonomic classification.
In recent years, due to technological advancements, artificial intelligence (AI) and machine learning (ML) models have emerged as ideal solutions for image recognition [6]. ML algorithms are routinely used to perform various tasks, including pulmonary embolism segmentation via computed tomographic (CT) angiography [7], polyp detection through virtual colonoscopy or CT during colon cancer diagnosis [8], breast cancer detection through mammography [9], brain tumor segmentation using magnetic resonance (MR) imaging [10], and the detection of brain cognitive states through functional MR imaging for diagnosing neurological disorders [11, 12]. ML techniques, such as feature selection and classification, have become crucial for the accurate and automatic diagnosis and prognosis of various brain diseases [13, 14]. For instance, Ronneberger et al. utilized a Convolutional Neural Network (CNN) and data augmentation techniques, achieving promising results by training on an image dataset [15]. Ke et al. proposed a method to enhance the spatial distribution of hue, saturation, and brightness in X-ray images (as image descriptors) to identify unhealthy lung tissues using Artificial Neural Network-based heuristic algorithms [16]. Jaiswal et al. employed Mask-Region-based CNN, a deep neural network approach, which utilizes both global and local features for pulmonary image segmentation, combined with image augmentation, dropout, and L2 regularization for pneumonia identification [17]. Wozniak and Połap simulated the X-ray image inspection process to identify infected tissue locations [18].
Hu et al. used gene eigenvalues and MRI imaging, together with a genetic-weighted random forest (RF) model, to identify key genetic and imaging biomarkers for diagnosis and personalized treatment [19]. Jing et al. applied RF to optical sensors for foreign object debris detection, crucial for aerospace safety [20]. Chen et al. optimized chemical exchange saturation transfer MRI by analyzing frequency contributions using a permuted RF model [21]. Wang and Zhou improved soil organic matter estimates by combining multitemporal Sentinel-2A imaging with RF to benefit agricultural practices [22]. Matese et al. highlighted the role of unmanned aerial vehicle-based hyperspectral imaging in advancing crop health monitoring and management [23]. Barrett et al. emphasized the importance of predictive models in early Huntington’s disease intervention [24]. Waldo-Benitez et al. demonstrated ML’s impact on enhancing glioblastoma diagnosis and treatment planning through MRI analysis [25]. Huang et al. showed how stacked models improve wheat quality control using hyperspectral imaging [26]. Feng et al. emphasized the need for accurate plume injection height measurements to improve smoke exposure estimates during Australian wildfires [27]. Grandremy et al. provided insights into zooplankton monitoring through advanced imaging in a 16-year Bay of Biscay study [28]. Nobrega et al. applied deep transfer learning to classify lung nodule malignancy [29]. Philips and Abdulla proposed a method for detecting honey adulteration using hyperspectral imaging and ML, enhancing classification models with a feature-smoothing technique [30]. Tao et al. demonstrated the benefits of combining hyperspectral imaging and ML for municipal solid waste characterization, significantly improving material identification and sorting efficiency by capturing detailed spectral information [31].
ML strategies, together with advancements in AI, have been employed in the early detection of diseases through the accurate interpretation of chest X-rays [32]. Similarly, the use of these innovations is accelerating in other areas. Deep learning-based image recognition, for example, facilitates aircraft target recognition, enabling air defense systems to quickly determine the category of an acquired aircraft image and automatically estimate countermeasures, potentially saving significant reaction time and reducing combat risk [33]. In this study, we propose an automated method for identifying Conus species using a cohesive ML algorithm framework through feature-assisted training on imaging datasets. Additionally, by designing a local database, this study may serve as a basis for cataloging cone snail species, including their sequence information and family-wise distribution.
Materials and methods
Data collection
The image dataset of 119 Conus species was obtained from the ConoServer database [34]. Our proposed methodology is illustrated in the flowchart (Fig 1).
Image preprocessing
Initially, each image file format (JPG, JPEG, or PNG) and size was checked for uniformity. The Pillow library was used to resize the images to a standard size of 224 x 224 pixels. Next, cvtColor was used to convert the images to grayscale, background noise was removed, and contours were detected. A Canny filter was used to compute edge strength, utilizing linear filtering with a Gaussian kernel to smooth out noise [35]. The edges were then overlaid on the original RGB images. All images were processed through these steps and stored in a local folder.
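A minimal sketch of this step, assuming OpenCV and Pillow, is given below; the Canny thresholds (100/200) and the red overlay color are illustrative choices that are not specified in the text.

```python
import cv2
import numpy as np
from PIL import Image

def preprocess_shell_image(in_path, out_path, size=(224, 224),
                           canny_low=100, canny_high=200):
    """Resize a shell image, detect edges, and overlay them on the RGB image."""
    # Resize to the standard 224 x 224 pixels with Pillow
    img = Image.open(in_path).convert("RGB").resize(size)
    rgb = np.array(img)

    # Convert to grayscale and smooth with a Gaussian kernel before edge detection
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)

    # Canny edge detection (thresholds are illustrative)
    edges = cv2.Canny(blurred, canny_low, canny_high)

    # Overlay the detected edges on the original RGB image (drawn in red here)
    overlaid = rgb.copy()
    overlaid[edges > 0] = [255, 0, 0]

    Image.fromarray(overlaid).save(out_path)
    return overlaid
```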
We also applied some pre-processing to each highlighted image. First, the images were converted to grayscale using cv2.COLOR_BGR2GRAY. A Gaussian blur was applied to remove noise, and the images were normalized for enhancement. We used the Canny and Sobel functions [36] with a kernel size of 5 to detect edges in each image. The original images of Conus ammiralis, Conus ebraeus, and Conus anabathrum, along with their binary and Canny edge-detected versions, are shown in Fig 2. These species exhibit specific patterns and shapes (pointed or round). In Conus ammiralis, a few patterns are separated by filled brown areas at varying distances, whereas in Conus ebraeus the patterns are more pronounced, making it easily distinguishable from other species. In contrast, Conus anabathrum shows a line pattern at the pointed end.
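The additional pre-processing can be sketched as follows, again with OpenCV; the Canny thresholds and the min-max normalization range are assumptions, while the 5 x 5 Sobel kernel follows the text.

```python
import cv2

def edge_maps(image_bgr):
    """Compute normalized grayscale, Canny, and Sobel edge maps for one image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)

    # Normalize intensities to the full 0-255 range for enhancement
    norm = cv2.normalize(blurred, None, 0, 255, cv2.NORM_MINMAX)

    # Canny edges (thresholds illustrative) and Sobel gradients with a 5x5 kernel
    canny = cv2.Canny(norm, 100, 200)
    sobel_x = cv2.Sobel(norm, cv2.CV_64F, 1, 0, ksize=5)
    sobel_y = cv2.Sobel(norm, cv2.CV_64F, 0, 1, ksize=5)
    sobel = cv2.magnitude(sobel_x, sobel_y)

    return norm, canny, sobel
```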
Image transformation
Image transformation was performed on each pre-processed image, with 400 transformed images generated per species. We initialized the ImageDataGenerator [37] using various parameters, such as width shift range, height shift range, zoom range, and shear range, all set to 0.2. Subsequently, we set the rotation range to 30 degrees, the horizontal flip to ’True,’ and the fill mode to ’nearest.’ Each transformed image was stored in a unique folder. For each transformation, a random transformation was applied at a size of 224 x 224 pixels. Image transformation was cross-validated before further processing. In total, we obtained 47,600 transformed images (119 species x 400 images). The original Conus andremenezi image and its transformed versions are shown in Fig 3, along with a detailed description of each image, highlighting distinct height, width, and pixel count.
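A sketch of the augmentation step with Keras’ ImageDataGenerator, using the parameter values stated above, is shown below; the output-directory handling and file naming are illustrative.

```python
import os
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array

# Augmentation settings as described above
datagen = ImageDataGenerator(
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
    shear_range=0.2,
    rotation_range=30,
    horizontal_flip=True,
    fill_mode="nearest",
)

def augment_image(img_path, out_dir, n_variants=400):
    """Generate n_variants randomly transformed 224 x 224 copies of one image."""
    os.makedirs(out_dir, exist_ok=True)
    img = img_to_array(load_img(img_path, target_size=(224, 224)))
    batch = np.expand_dims(img, axis=0)  # shape (1, 224, 224, 3)
    flow = datagen.flow(batch, batch_size=1, save_to_dir=out_dir,
                        save_prefix="aug", save_format="jpeg")
    for _ in range(n_variants):
        next(flow)  # each call writes one augmented image to out_dir
```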
Proposed methodology
The next step was to check the image quality, and all images below the standard were removed. Noisy backgrounds were eliminated, and the cvtColor module was used to convert the images to grayscale, followed by the application of a threshold to segment the background and obtain the largest contour. A mask was applied to remove the background. Later, we combined all these images into a list and used a label encoder to encode each cone snail species label as a numerical value.
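The background removal and label-encoding steps described here can be sketched as follows; the use of Otsu thresholding is an assumption, as the exact threshold type is not specified in the text.

```python
import cv2
import numpy as np
from sklearn.preprocessing import LabelEncoder

def remove_background(image_bgr):
    """Keep only the largest contour (the shell) and mask out the background."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Otsu threshold to segment shell from background (illustrative choice)
    _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # OpenCV 4 returns (contours, hierarchy); OpenCV 3 returns three values
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    largest = max(contours, key=cv2.contourArea)
    mask = np.zeros_like(gray)
    cv2.drawContours(mask, [largest], -1, 255, thickness=cv2.FILLED)
    return cv2.bitwise_and(image_bgr, image_bgr, mask=mask)

# Encode species names (one label per image) as integers for the classifiers
encoder = LabelEncoder()
# labels = ["Conus ammiralis", "Conus ebraeus", ...]
# y = encoder.fit_transform(labels)
```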
Color moments and local binary patterning
Subsequently, color moments of different orders were calculated for each channel, revealing color distribution and variation. The local binary pattern (LBP) texture feature was computed for each grayscale image to extract texture information. LBP works by comparing the intensity levels of neighboring and central pixels, forming a binary number [38]. The threshold is obtained by comparing each neighborhood pixel gp with the center pixel gc: the operator yields a binary value of 1 if gp is greater than or equal to gc and 0 otherwise. The final LBP code is represented as a decimal value, and the features extracted by the LBP operator are displayed in a histogram. This operation can be expressed as:
(1) LBP(P, R) = Σ (p = 0 to P−1) s(gp − gc) · 2^p, where s(x) = 1 if x ≥ 0 and s(x) = 0 otherwise.
After the thresholding stage, a histogram was computed over the LBP values. With a neighborhood of P = 24 and R = 3, a 256-bin histogram represents the image features. The mathematical representation of the LBP histogram is denoted by [39]:
(2) H(k) = Σi Σj f(LBP(P, R)(i, j), k), k ∈ [0, K], where f(x, y) = 1 if x = y and 0 otherwise, and K is the maximal LBP value.
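A sketch of the LBP feature computation with scikit-image is shown below; note that the number of histogram bins depends on the LBP variant (the "uniform" mapping used here yields P + 2 bins rather than 256, so the binning is an assumption).

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_image, P=24, R=3):
    """Compute an LBP code image and its normalized histogram (Eqs 1-2)."""
    lbp = local_binary_pattern(gray_image, P, R, method="uniform")
    n_bins = int(lbp.max()) + 1
    hist, _ = np.histogram(lbp.ravel(), bins=n_bins, range=(0, n_bins))
    return hist.astype(float) / hist.sum()  # normalized histogram of LBP codes
```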
Haralick texture feature extraction
Next, we performed feature extraction using the method proposed by Haralick, known as the spatial gray-level dependence method (SGLDM). These features are routinely used for diagnostic purposes, for example in Alzheimer’s disease diagnosis from MR images [40]. To quantify texture through SGLDM, 13 features were calculated in each phase. These features were extracted from the co-occurrence matrix, which represents an estimate of the second-order probability function C(i, j | Δx, Δy). This matrix records the occurrence rate of pixel pairs with gray levels i and j separated by distances Δx and Δy in the x and y directions, respectively [41]. The elements of the matrix were calculated by:
(3) C(i, j | Δx, Δy) = #{(p, q) : I(p) = i, I(q) = j, q = p + (Δx, Δy)} / N, where I is the gray-level image and N is the total number of pixel pairs at offset (Δx, Δy).
The Haralick texture features were computed using the Haralick function, which captures texture information such as contrast, correlation, and entropy in the image. In the next step, we concatenated these three feature sets (color moments, LBP histogram, and Haralick features) into the handcrafted (trained) feature vector.
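A sketch of the Haralick feature computation and the concatenation of the three handcrafted feature sets, assuming the mahotas library, is given below.

```python
import numpy as np
import mahotas

def haralick_features(gray_image):
    """13 Haralick texture features averaged over the four co-occurrence directions."""
    # mahotas expects an integer (e.g., uint8) grayscale image and
    # returns a 4 x 13 matrix (one row per direction)
    return mahotas.features.haralick(gray_image).mean(axis=0)

def handcrafted_feature_vector(color_moments, lbp_hist, gray_image):
    """Concatenate color moments, LBP histogram, and Haralick features."""
    return np.concatenate([color_moments, lbp_hist, haralick_features(gray_image)])
```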
Visual Geometry Group with 16-layer deep model architecture
The Visual Geometry Group with 16-layer deep model architecture (VGG16) [42], a pre-trained deep learning model, was used to extract deep features. It includes 16 weight layers, comprising 13 convolutional layers and 3 fully connected layers. VGG16 employs small 3x3 kernels (filters) in all convolutional layers with a single stride, and max pooling layers follow blocks of convolutional layers. The input to VGG16 is fixed at 224 x 224 three-channel images. In VGG16, the three fully connected layers exhibit different depths: the first two contain 4096 channels each, while the last fully connected layer has 1000 channels, representing the number of class labels in the ImageNet dataset. The output layer is a softmax layer, which provides the class probabilities for the input image [43]. We added the deep features to the feature vector by horizontally stacking the deep and handcrafted (trained) features.
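A sketch of the deep feature extraction with a pre-trained VGG16, assuming Keras/TensorFlow, is shown below; the text does not state which layer’s activations were used, so this sketch takes the globally pooled convolutional output as an illustration.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

# Pre-trained VGG16 without the classification head; pooled convolutional output
vgg = VGG16(weights="imagenet", include_top=False, pooling="avg",
            input_shape=(224, 224, 3))

def deep_features(image_rgb_224):
    """Extract a VGG16 feature vector for one 224 x 224 RGB image."""
    batch = preprocess_input(np.expand_dims(image_rgb_224.astype("float32"), axis=0))
    return vgg.predict(batch, verbose=0).ravel()  # 512-dimensional with pooling="avg"

def combined_features(handcrafted_vec, image_rgb_224):
    """Horizontally stack handcrafted and deep features into one vector."""
    return np.hstack([handcrafted_vec, deep_features(image_rgb_224)])
```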
Random Forest
The RF classifier was used due to its strong prediction capability, stability, and high accuracy compared to a single decision tree. RF is a powerful supervised ensemble learning method, characterized by balanced bias, minimal hyperparameter input, reduced variance, and a lowered risk of overfitting in both classification and regression tasks. These features make RF an invaluable tool for prediction, modeling, and data analysis across various domains. The RF algorithm performs better with larger datasets and accelerates the decision-making process through a higher number of trees [44]. RF is an extension of the Classification and Regression Tree (CART) method, employing bagging (bootstrap aggregation) and voting to determine classification results. It consists of k classification trees, and its basic idea is to combine multiple weak classifiers into one strong classifier. The number of generated bootstrap samples determines the number of trees in the model. After bootstrapping, each tree (bootstrap sample) is grown using the following rules: if there are M input variables, a subset of m predictor variables (m ≤ M) is chosen at random from M at each node, and the best predictor variable among the m candidates is selected by calculating a measure of purity (Gini or entropy). The Gini index Ggini(D) is used to decide the optimal binary cut point for each feature and represents the uncertainty (impurity) of the set D. In a classification problem with N classes, for a given set of samples D, the Gini index is:
(4) Ggini(D) = 1 − Σ (n = 1 to N) (|Cn| / |D|)^2
where Cn is the subset of samples in D that belong to the nth class [45]. If a sample set D is divided into two parts, D1 and D2, according to the value of feature A, the Gini index of D under this split is:
(5) Ggini(D, A) = (|D1| / |D|) Ggini(D1) + (|D2| / |D|) Ggini(D2)
The best split on m is used to separate the nodes. The value of m is kept constant as the forest grows. Each tree is grown to the maximum extent without pruning. The final result of RF is obtained by voting over all classification trees [45]. The best predictor variable provides more decision-making information, and forming and using more trees in the decision-making process yields more robust results [46].
Next, the data were divided into training (80%) and testing or validation (20%) sets, corresponding to approximately 38,080 and 9,520 of the 47,600 images, respectively. As a result, we extracted XTrain, XValid, YTrain, and YValid for further hyperparameter optimization [47]. Enhancing the RF algorithm’s ability to extract high-quality features and optimizing parameter selection can significantly reduce the model’s generalization error and improve its classification accuracy. We used 100 trees (estimators) and a minimum sample split of 2 for splitting the internal nodes.
The model was then trained by fitting XTrain and YTrain and evaluated by predicting on XValid.
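A sketch of the split and RF training with scikit-learn, using the stated 80:20 ratio, 100 trees, and a minimum sample split of 2, is given below; the stratified split and the random seed are assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# X: stacked feature vectors (one row per image), y: encoded species labels
def train_random_forest(X, y, seed=42):
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)  # 80:20 split

    rf = RandomForestClassifier(n_estimators=100, min_samples_split=2,
                                random_state=seed, n_jobs=-1)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_valid)
    print("Validation accuracy:", accuracy_score(y_valid, y_pred))
    return rf, (X_valid, y_valid, y_pred)
```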
XGBoost
The tree-based gradient boosting ensemble model XGBoost (XGB) [48] is composed of multiple classification and regression trees (CART). Each tree learns the residual between the target values and the predictions of the preceding trees; once all decision trees are trained, the final prediction is computed by accumulating the outputs of the individual trees. Every new tree in the XGB training phase is fitted using the previously trained trees as a basis, and once a decision tree has been generated, it is pruned to avoid overfitting. The XGB model trains on the remaining error to minimize the overall error: the output of each tree is used when training the subsequent tree, progressively reducing the prediction error and gradually driving the model’s predicted value closer to reality. The prediction model for XGB can be represented as:
(6) ŷi = Σ (k = 1 to K) fk(xi)
where (xi, yi) are training samples, xi is the feature vector, yi is the sample label, and fk(xi) is the output of the kth decision tree. The corresponding objective function is defined as follows [49]:
(7) Obj = Σ (i = 1 to n) l(yi, ŷi) + Σ (k = 1 to K) Ω(fk)
The objective function Obj is divided into two parts: the loss function, which indicates a specific objective for evaluating the accuracy of the model’s predictions, and the regularization term, which reduces the chance of the model overfitting. The regularization term is defined as follows:
(8) Ω(f) = γT + (1/2) λ Σ (j = 1 to T) wj^2, where T is the number of leaf nodes and wj is the weight of the jth leaf.
Here γ is the leaf node coefficient; its goal is to optimize and regularize the objective function in XGBoost, acting similarly to a pre-pruning operation (i.e., γT penalizes the tree’s complexity: the higher the value, the higher the objective function value, which subsequently suppresses the model’s complexity). The leaf node weights are regulated by the L2 regularization term, and λ, the coefficient of the squared L2 norm, prevents overfitting. If the regularization term is set to 0, the objective function reduces to that of the gradient boosting decision tree (GBDT) [50].
This model lessens the chance of overfitting by including regularization elements in the objective function. It utilizes both the first and second derivatives of the loss to improve the accuracy of the loss approximation and to customize the loss function. We used the ‘Extreme Gradient Boosting’ classifier of the XGB library, specifying the evaluation metric as the multi-class logarithmic (cross-entropy) loss and disabling the built-in label encoder to avoid deprecation issues.
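A sketch of the corresponding XGBoost configuration is shown below; the number of estimators is illustrative, and whether the use_label_encoder flag is needed depends on the installed xgboost version.

```python
from xgboost import XGBClassifier

# Multi-class XGBoost with cross-entropy (multi-class log loss) as evaluation metric
xgb = XGBClassifier(
    objective="multi:softprob",
    eval_metric="mlogloss",      # multi-class logarithmic loss
    use_label_encoder=False,     # rely on externally encoded integer labels (version-dependent flag)
    n_estimators=100,            # illustrative value
)
# xgb.fit(X_train, y_train)
# y_pred_xgb = xgb.predict(X_valid)
```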
Confusion matrix
The performance of the chosen strategy was determined by a confusion matrix, which showed the number of correct and incorrect predictions made by the model as compared to the actual data [51, 52]. The confusion matrix comprises four components: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). The following metrics evaluate the performance of a classification model on a dataset:
Precision = TP / (TP + FP)
Recall (Sensitivity) = TP / (TP + FN)
F1-score = 2 * (Precision * Recall) / (Precision + Recall)
Other analyses, including bar plots and histogram generation, were performed to check the class proportions and prediction results through the classification report of the RF model. The Area Under the Receiver Operating Characteristic Curve (AUC-ROC) [53] is a performance metric for binary classification problems. The AUC-ROC value ranges from 0 to 1, where a higher value indicates better performance, and a curve closer to the top-left corner represents a better model. It was plotted as the true positive rate (sensitivity) against the false positive rate (1 − specificity) at various threshold settings.
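A sketch of this evaluation step with scikit-learn is given below; the one-vs-rest macro averaging for the multi-class AUC-ROC is an assumption.

```python
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

def evaluate_classifier(model, X_valid, y_valid, class_labels):
    """Per-class precision/recall/F1, confusion matrix, and one-vs-rest ROC-AUC."""
    y_pred = model.predict(X_valid)
    print(classification_report(y_valid, y_pred, target_names=class_labels))

    cm = confusion_matrix(y_valid, y_pred)

    # Multi-class AUC-ROC from predicted class probabilities (one-vs-rest, macro-averaged)
    y_score = model.predict_proba(X_valid)
    auc = roc_auc_score(y_valid, y_score, multi_class="ovr", average="macro")
    return cm, auc
```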
Results
Cone snail shell image processing
Conus species exhibit diverse characteristics in terms of shell shape, size, color, and localization. The differentiation characteristics, including mean intensity, intensity standard deviation, number of edge pixels, and mean keypoint size, vary significantly among Conus species (Table 1). In particular, images obtained from different sources need to be processed for color variation, background noise removal, pixel adjustment, and color intensity correction. To accurately process the shell images, we scaled the RGB (red, green, and blue) intensities in each image. The average RGB values were 70.23, 88.12, and 107.98 for R, G, and B, respectively (S1 Fig). These values were distinct for each image, which largely facilitated enhancing model efficiency.
Table 1. Statistical analysis of raw images of Conus species before preprocessing.
Species name | Size (pixels) | Mean intensity (MI) | Intensity standard deviation (ISD) | Number of edge pixels (NEP) | Mean keypoint size (MKS) |
---|---|---|---|---|---|
Conus abbreviatus | 126 x 196 | 114.900551 | 85.24146037 | 2070 | 3.855107131 |
Conus achatinus | 234 x 469 | 90.1004501 | 72.62897095 | 17287 | 3.738486035 |
Conus adamsonii | 166 x 309 | 81.5929543 | 64.52867715 | 10219 | 3.394759074 |
Conus amadis | 137 x 283 | 89.7953625 | 81.18837988 | 7880 | 3.242136133 |
Conus ammiralis | 147 x 266 | 101.342642 | 90.00906917 | 7293 | 3.558139329 |
Conus anabathrum | 113 x 236 | 102.825859 | 91.64173238 | 2811 | 5.372164498 |
Conus andremenezi | 130 x 306 | 77.2646053 | 78.52747464 | 4856 | 4.783157641 |
Conus anemone | 140 x 333 | 102.90532 | 87.34408651 | 6528 | 5.295740278 |
Conus araneosus | 190 x 344 | 95.4301561 | 89.64749384 | 10002 | 4.042105765 |
Conus archon | 173 x 325 | 78.0636372 | 76.09148881 | 6231 | 3.903034503 |
Conus arenatus | 150 x 258 | 127.868966 | 88.99116882 | 5384 | 3.248078797 |
Conus aristophanes | 125 x 209 | 115.918813 | 84.83766222 | 3292 | 3.536915887 |
Conus asiaticus | 160 x 303 | 91.0680693 | 93.87201536 | 5579 | 3.533550901 |
Conus ateralbus | 147 x 251 | 65.5442177 | 66.74975933 | 7071 | 3.789925593 |
Conus aulicus | 127 x 305 | 95.9198916 | 75.00209789 | 6079 | 3.877136884 |
Conus aurisiacus | 172 x 309 | 99.8640777 | 80.66912726 | 5803 | 3.930534717 |
Conus austini | 167 x 318 | 82.4746545 | 77.49197812 | 4362 | 3.392523493 |
Conus australis | 115 x 306 | 94.9256323 | 82.35290923 | 5929 | 3.427619775 |
Conus bandanus | 646 x 1202 | 83.4092946 | 78.0616978 | 44765 | 7.77255379 |
Conus bayani | 114 x 227 | 67.7127676 | 71.5729914 | 2925 | 4.299335957 |
Conus betulinus | 224 x 335 | 101.927159 | 81.59713297 | 7407 | 3.679199442 |
Conus brunneus | 154 x 191 | 66.2547766 | 62.9423698 | 6141 | 3.567691536 |
Conus bullatus | 114 x 219 | 107.642193 | 67.27182937 | 5300 | 3.453630916 |
Conus californicus | 462 x 846 | 80.9327341 | 69.53473463 | 15680 | 5.947179261 |
Conus capitaneus | 169 x 252 | 80.4169954 | 65.4770939 | 7079 | 3.501728312 |
Conus caracteristicus | 163 x 225 | 102.626667 | 82.49999629 | 5106 | 3.704006016 |
Conus catus | 135 x 240 | 90.7333025 | 71.40495938 | 6002 | 3.917673782 |
Conus cervus | 136 x 274 | 100.219381 | 76.16191665 | 6241 | 3.530832996 |
Conus chiangi | 153 x 264 | 85.9632601 | 76.05567255 | 5952 | 3.216992084 |
Conus circumcisus | 116 x 279 | 109.326319 | 72.72633786 | 5200 | 4.345783836 |
Conus consors | 141 x 299 | 86.4395266 | 67.08034587 | 2973 | 6.904867876 |
Conus coronatus | 83 x 133 | 97.6019567 | 81.97972403 | 2303 | 3.625967436 |
Conus dalli | 157 x 267 | 93.0402681 | 77.21764877 | 8964 | 3.322457316 |
Conus delessertii | 161 x 307 | 83.4159063 | 81.82363518 | 5977 | 5.091297852 |
Conus diadema | 194 x 307 | 89.0702173 | 73.76674884 | 7057 | 4.010664793 |
Conus distans | 89 x 160 | 105.306812 | 85.94422289 | 2571 | 4.76245108 |
Conus ebraeus | 209 x 311 | 73.9946615 | 79.35979636 | 5273 | 5.99292686 |
Conus eburneus | 222 x 349 | 92.4428612 | 88.25812038 | 8119 | 5.578738826 |
Conus emaciatus | 251 x 405 | 81.1571197 | 60.17887854 | 4533 | 5.640602514 |
Conus episcopatus | 150 x 320 | 91.8544583 | 80.46877076 | 9376 | 3.610899895 |
Conus ermineus | 185 x 329 | 88.6190421 | 76.6154946 | 6962 | 4.545134057 |
Conus ferrugineus | 210 x 416 | 83.9429831 | 70.89388376 | 7075 | 5.904191236 |
Conus figulinus | 282 x 407 | 80.4968547 | 71.85676559 | 16639 | 3.399929217 |
Conus flavidus | 170 x 295 | 97.7092921 | 74.24410961 | 4067 | 3.978458209 |
Conus floridulus | 667 x 1131 | 87.5026585 | 80.04923969 | 14478 | 7.446896809 |
Conus frigidus | 156 x 265 | 103.462821 | 72.74093208 | 4527 | 3.876202816 |
Conus fulmen | 196 x 357 | 83.4306723 | 72.73174627 | 2611 | 6.047156509 |
Conus gauguini | 89 x 163 | 103.28214 | 76.99108538 | 2067 | 4.176049745 |
Conus generalis | 135 x 287 | 98.3426249 | 85.07910231 | 2625 | 4.553597675 |
Conus geographus | 76 x 178 | 69.1569338 | 59.76345165 | 3261 | 4.091072835 |
Conus gladiator | 168 x 249 | 82.6303548 | 72.62570646 | 5498 | 4.632142848 |
Conus gloriamaris | 119 x 343 | 80.1448661 | 70.72475205 | 10025 | 3.040092381 |
Conus imperialis | 82 x 156 | 73.2795497 | 77.08575888 | 3229 | 3.322255486 |
Conus inscriptus | 161 x 330 | 102.096951 | 88.50804845 | 6676 | 4.308585652 |
Conus judaeus | 186 x 311 | 80.9460291 | 91.12290436 | 5677 | 5.282902826 |
Conus kinoshitai | 133 x 306 | 109.052312 | 87.59792953 | 4673 | 4.002651231 |
Conus kintoki | 171 x 362 | 110.7324 | 82.19116586 | 3015 | 3.805836274 |
Conus leopardus | 120 x 211 | 100.795616 | 79.70195154 | 5007 | 3.217252134 |
Conus limpusi | 166 x 335 | 80.9515015 | 66.41763157 | 2640 | 5.178752613 |
Conus litteratus | 91 x 156 | 128.705128 | 101.7086228 | 2609 | 2.857512904 |
Conus lividus | 137 x 249 | 88.3807639 | 77.07639339 | 2551 | 4.272186609 |
Conus longurionis | 116 x 351 | 79.9143089 | 72.36931846 | 5730 | 4.66598781 |
Conus loroisii | 172 x 273 | 54.6943948 | 46.1643957 | 9412 | 3.114201716 |
Conus lynceus | 174 x 386 | 106.663123 | 82.14609781 | 8848 | 5.149299075 |
Conus magnificus | 116 x 261 | 111.673438 | 84.23736161 | 7044 | 3.127516587 |
Conus magus | 279 x 582 | 102.056319 | 77.61301005 | 21916 | 5.131758487 |
Conus marmoreus | 464 x 987 | 71.0377253 | 75.10760805 | 35820 | 8.258521537 |
Conus memiae | 210 x 350 | 78.1916871 | 85.53722763 | 9499 | 5.492338902 |
Conus miles | 136 x 207 | 74.5656081 | 76.81322467 | 5401 | 3.088816641 |
Conus miliaris | 180 x 296 | 90.4191254 | 74.03838125 | 8054 | 3.592617067 |
Conus milneedwardsi | 69 x 223 | 87.232274 | 80.6165129 | 2860 | 3.355763269 |
Conus monachus | 226 x 424 | 118.874885 | 87.78246128 | 11436 | 4.074395915 |
Conus moncuri | 195 x 342 | 83.8184885 | 75.0833367 | 7482 | 4.94113918 |
Conus monile | 161 x 337 | 87.6115709 | 82.27396067 | 5175 | 4.915662615 |
Conus mus | 84 x 150 | 95.6694444 | 76.5681594 | 3144 | 3.750667921 |
Conus mustelinus | 149 x 272 | 92.0070322 | 77.31240476 | 5208 | 4.039098181 |
Conus natalis | 157 x 318 | 74.0708449 | 67.42437807 | 10205 | 4.775390739 |
Conus nigropunctatus | 127 x 216 | 92.907699 | 73.07321996 | 4902 | 4.230405607 |
Conus nux | 194 x 332 | 77.2799186 | 71.9322076 | 4320 | 7.542459114 |
Conus obscurus | 70 x 160 | 74.8146429 | 52.24022396 | 2667 | 3.338338166 |
Conus omaria | 86 x 194 | 107.138576 | 65.92997092 | 4084 | 3.085670003 |
Conus parius | 182 x 303 | 100.761923 | 82.87203435 | 1756 | 4.551220399 |
Conus pennaceus | 104 x 174 | 78.9077697 | 86.20071699 | 2587 | 4.316248887 |
Conus pergrandis | 136 x 344 | 76.2034456 | 77.06110854 | 5116 | 3.972704224 |
Conus pictus | 182 x 340 | 77.9745637 | 73.67581946 | 7533 | 4.965593014 |
Conus planorbis | 113 x 207 | 75.9901672 | 69.28186747 | 4616 | 3.446087527 |
Conus princeps | 155 x 273 | 110.268746 | 91.49685066 | 5370 | 4.075615161 |
Conus profundineocaledonicus | 155 x 333 | 87.7985857 | 74.28470619 | 1865 | 6.221098957 |
Conus purpurascens | 554 x 932 | 64.4792845 | 59.42204983 | 57556 | 4.533820502 |
Conus quercinus | 160 x 272 | 106.95347 | 78.62236341 | 1603 | 10.07662979 |
Conus radiatus | 114 x 244 | 81.7528401 | 61.54902861 | 2578 | 3.766189418 |
Conus rattus | 185 x 298 | 85.1469617 | 71.41448917 | 7632 | 4.262023336 |
Conus regius | 146 x 261 | 89.9245263 | 76.46770287 | 7479 | 3.76844333 |
Conus regularis | 134 x 285 | 86.8535219 | 78.17037418 | 5597 | 4.388107317 |
Conus rolani | 151 x 300 | 106.889382 | 82.66212069 | 4221 | 4.337546096 |
Conus sanguinolentus | 153 x 262 | 91.726987 | 73.29265179 | 2704 | 5.807804724 |
Conus sponsalis | 304 x 381 | 85.6163835 | 83.54710968 | 7368 | 5.747592142 |
Conus spulicarius | 216 x 346 | 86.9485389 | 74.4378499 | 9807 | 5.313243719 |
Conus spurius | 166 x 270 | 106.758188 | 82.90364202 | 3524 | 5.422410713 |
Conus stercusmuscarum | 113 x 236 | 111.163154 | 77.06938388 | 4015 | 3.110592977 |
Conus striatus | 135 x 306 | 109.730864 | 80.91233496 | 6460 | 4.263692126 |
Conus striolatus | 149 x 268 | 90.6919764 | 74.36637035 | 7842 | 4.19279689 |
Conus sulcatus | 150 x 266 | 87.610802 | 73.93412992 | 7816 | 3.80518956 |
Conus sulturatus | 109 x 175 | 123.898768 | 81.81499109 | 735 | 10.90305368 |
Conus terebra | 102 x 237 | 104.010176 | 80.97356016 | 1960 | 4.921096532 |
Conus tessulatus | 163 x 252 | 86.5140715 | 76.07680221 | 4052 | 5.229228191 |
Conus textile | 114 x 228 | 88.8001693 | 75.83345613 | 6716 | 2.816846265 |
Conus tinianus | 99 x 192 | 104.217119 | 77.25270464 | 2544 | 4.685287444 |
Conus tulipa | 115 x 228 | 105.702021 | 65.36394178 | 6445 | 3.410149088 |
Conus varius | 136 x 266 | 104.889761 | 82.3494734 | 3056 | 6.092001697 |
Conus ventricosus | 158 x 277 | 93.4467395 | 81.42046926 | 9519 | 3.198309433 |
Conus vexillum | 152 x 249 | 96.2798563 | 81.22803912 | 6762 | 4.179350178 |
Conus victoriae | 86 x 183 | 66.5662727 | 67.5775656 | 3900 | 3.02614837 |
Conus villepinii | 76 x 183 | 94.9417601 | 87.01109306 | 2081 | 4.106920018 |
Conus virgo | 164 x 316 | 109.196955 | 84.3636481 | 1928 | 4.158706044 |
Conus vitulinus | 146 x 282 | 93.9529049 | 78.04338245 | 4788 | 4.153326996 |
Conus ximenes | 80 x 140 | 93.75125 | 81.05532005 | 2199 | 2.958279716 |
Conus zeylanicus | 146 x 251 | 125.553376 | 92.91948417 | 6446 | 3.69964845 |
Conus zonatus | 66 x 129 | 94.6779422 | 74.00836486 | 2014 | 3.002186416 |
The dataset of 47,600 images was split into 80% training and 20% testing data, resulting in 38,080 and 9,520 images, respectively. XTrain, XValid, YTrain, and YValid were extracted for hyperparameter optimization [54]. Enhancing the RF algorithm is crucial for extracting high-quality features and optimizing parameter selection, which may significantly reduce the model’s generalization error and improve its classification accuracy. The model was trained and evaluated by fitting XTrain and YTrain and by predicting on XValid.
Model validation
Next, we added more data to check the predictions for each query image as validation data. Among the 119 species, five were wrongly predicted: Conus monile was predicted as Conus kintoki, Conus virgo as Conus monachus, Conus tinianus as Conus catus, Conus vitulinus as Conus regularis, and Conus flavidus as Conus betulinus. All other species were accurately predicted by the trained RF model, achieving a high accuracy rate (S1 Table). For these species, the structural similarity index, which measures similarity between test and reference images by calculating variations in contrast, brightness, and edge information [55], ranged from 0.33 to 0.99.
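A sketch of the structural similarity computation, assuming scikit-image, is shown below; the grayscale conversion and the common resizing to 224 x 224 pixels are illustrative.

```python
import cv2
from skimage.metrics import structural_similarity as ssim

def shell_ssim(test_path, reference_path, size=(224, 224)):
    """Structural similarity between a test image and a reference image."""
    test = cv2.resize(cv2.imread(test_path, cv2.IMREAD_GRAYSCALE), size)
    ref = cv2.resize(cv2.imread(reference_path, cv2.IMREAD_GRAYSCALE), size)
    return ssim(test, ref)  # 1.0 means structurally identical
```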
For further validation of our model, we included images of shells other than cone snails, such as miter shells, olive shells, Cypraea argus, Aulica imperialis, and Eloise Beach shells, along with the Conus species Conus litteratus, Conus asiaticus, and Conus ebraeus (Fig 4). The trained model did not assign any of these unrelated shells to a Conus species, owing to feature differentiation. These shell images yielded feature values of 27,674, 27,413, 27,584, 26,522, and 26,549, respectively, while Conus shells exhibited 27,143 features. Overall, the proposed model is 95% efficient in recognizing cone snail species from shell images.
Model performance assessment
Precision and recall analysis
The RF classification report indicated a higher proportion of TP predictions compared to XGB. Multiple species exhibiting precision scores close to 1 demonstrated accurate prediction by the RF model. For better representation in bar plots, the species were categorized into three groups (Fig 5). Among the 119 cone snail species, group 1 contained 40 species, group 2 comprised 39 species, and group 3 included 40 members.
In group 1, nine members (Conus andremenezi, archon, aurisiacus, austini, bandanus, californicus, delessertii, diadema, and episcopatus) exhibited RF precision scores of 0.98, 0.92, 0.87, 0.95, 0.98, 0.96, 0.89, 0.96, and 0.90, respectively. Group 2 comprised 15 members (ermineus, figulinus, floridulus, frigidus, fulmen, geographus, inscriptus, judaeus, lividus, magus, memiae, miles, miliaris, mustelinus, and nux) with precision scores of 0.95, 0.99, 0.95, 0.93, 0.94, 0.93, 0.95, 0.96, 0.96, 0.92, 0.94, 0.96, 0.96, 0.96, and 0.99, respectively. Ten species in group 3, including obscurus, pergrandis, pictus, planorbis, purpurascens, sponsalis, striolatus, sulcatus, varius, and ventricosus, showed precision scores of 0.97, 0.91, 0.93, 0.95, 0.99, 0.94, 0.99, 0.96, 0.90, and 0.95, respectively, with the RF model. The minimum precision value (0.64) was observed for Conus consors.
Notably, Conus anabathrum, araneosus, kintoki, and sanguinolentus exhibited better precision scores with XGB. Nevertheless, the high proportion of TP predictions among actual positive instances underscored the effectiveness of the RF model. The high recall values (a measure of the model’s completeness) further bolstered the model’s accuracy, with 24 species showing no false negatives; Conus lividus exhibited a recall score of 0.8227. These 24 species were ammiralis, anabathrum, australis, bandanus, californicus, coronatus, dalli, episcopatus, fulmen, gloriamaris, imperialis, litteratus, loroisii, lynceus, marmoreus, miliaris, milneedwardsi, natalis, obscurus, parius, rattus, striolatus, sulcatus, and zeylanicus. Of these, 7 species were members of group 1, 11 were in group 2, and 6 were part of group 3. The recall scores for the XGB model ranged from 0.80 to 0.98 (Fig 5). The harmonic mean of precision and recall, known as the F1 score, ranged from 0.76 to 1 for the RF model; it balances precision and recall, serving as a single metric for evaluating model performance. The number of actual occurrences of each class in the dataset is captured by the support value. We focused on the RF model for further validation and evaluation.
F1 score and support analysis
The F1 score (harmonic mean) ranged from 0.76 to 1 for the RF model, revealing a balanced performance between recall and precision. The class distribution was analyzed by examining the support, which reflects actual class occurrences. The F1 score and support plots demonstrated model performance across the classes, and the model accurately predicted multiple classes with high F1 scores. Conus sanguinolentus fell in the range of 0.82 to 0.83, while other species fell within the ranges of 0.85–0.88, 0.88–0.91, 0.91–0.94, 0.94–0.97, and 0.97–0.99, with counts of 6, 9, 17, 34, and 43 species, respectively. Eight species exhibited maximum scores, including Conus bandanus, californicus, episcopatus, and fulmen from group 1 (Fig 6A), while miliaris, obscurus, striolatus, and sulcatus belonged to group 2 (Fig 6B). Some classes with low F1 scores were also observed, such as Conus consors with a score of 0.76, indicating slightly poorer prediction. Overall, these findings provide evidence that the model performed effectively, with high F1 score values.
To comprehend class distribution, a support analysis was performed. The histogram indicated varying class numbers in terms of their distribution. Conus sulcatus exhibited a score in the range of 37 to 43. One, four, and seventeen species were noticed in the ranges of 55.3–61.4, 61.4–67.5, and 67.5–73.6, respectively. The number of species significantly increased to 72 for the range of 73.6–85.8. Finally, 13 and 11 species were observed with the highest range values of 85.8–91.9 and 91.9–98, respectively (Fig 7). Classes with high support values were well represented in the dataset, whereas those with low values were less common.
Confusion matrix
A confusion matrix revealed the instances where the RF model accurately predicted a positive class, while the FPR captures instances where a positive class is incorrectly predicted for an actual negative. Confusion matrix analysis revealed 24 species with TPR values of 1, indicating accurate predictions. These species included Conus ammiralis, anabathrum, australis, bandanus, californicus, coronatus, dalli, episcopatus, fulmen, gloriamaris, imperialis, litteratus, loroisii, lynceus, marmoreus, miliaris, milneedwardsi, natalis, obscurus, parius, rattus, striolatus, sulcatus, and zeylanicus. The lowest TPR values were 0.8227 and 0.8292 for Conus tessulatus and Conus lividus, respectively. FNR values should be close to zero; the FNR indicates instances where the model incorrectly predicts a negative class for an actual positive, while the TNR denotes the correct prediction of the negative class. The FNR values for all 24 species were zero. In contrast, Conus lividus and Conus tessulatus exhibited the highest FNR values of 0.177 and 0.171, respectively.
A deeper insight into the model’s performance was obtained using a heatmap. Fig 8 represents the macro average, accuracy, and weighted average of recall, precision, and F1 scores obtained from the model. Due to the narrow range (0.955–0.958), color differences were minimal. Darker hues (purple) indicated slightly lower values (0.955) for accuracy, F1 score, recall, precision, and the weighted average of recall, whereas lighter hues indicated slightly higher values. These findings suggest that all metrics and classes contributed to consistent model performance. The highest weighted precision average was 0.958, indicating improved performance.
Model performance evaluation
To evaluate model performance, both training and validation scores were plotted (Fig 9A and 9B). The validation curve showed a high training score across the range of hyperparameters, suggesting that the model fit the training data very well. The validation score curve indicated that the model generalized well to unseen data for these hyperparameter values. Both training and validation scores were high and closely aligned, reflecting a good balance between bias and variance. This indicates that the model is well-performing and appropriately tuned, with strong generalization capabilities (Fig 9A).
In the learning curve, a training score close to 1 (or 100%) revealed that the model learned and fitted the training data effectively. The validation score stabilized at approximately 95%, indicating good generalization performance on new data. The small gap between training and validation scores suggests that the model’s complexity is appropriate for the given data, achieving a favorable balance between variance and bias (Fig 9B). The model is neither significantly overfitted, as it performs well on both the training and validation datasets, nor underfitted, which would be indicated by low training and validation scores; it can therefore be considered a “good fit” model.
Next, we plotted a Precision-Recall (PR) curve, which shows precision against recall for different thresholds. A curve closer to the top-right corner indicates better model performance. The area under the PR curve serves as a single metric to assess overall performance (Fig 9C). Thus, the current model demonstrates favorable precision and recall values, indicating its accurate prediction ability.
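A sketch of how the learning and precision-recall curves can be produced with scikit-learn is given below; the cross-validation folds, training-size grid, and the per-class (one-vs-rest) PR curve are assumptions.

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.metrics import precision_recall_curve, auc

def rf_learning_curve(model, X, y):
    """Training/validation accuracy as a function of training set size (Fig 9B-style)."""
    sizes, train_scores, valid_scores = learning_curve(
        model, X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5), n_jobs=-1)
    return sizes, train_scores.mean(axis=1), valid_scores.mean(axis=1)

def pr_curve_for_class(model, X_valid, y_valid, class_index):
    """One-vs-rest precision-recall curve and its area for a single species."""
    scores = model.predict_proba(X_valid)[:, class_index]
    precision, recall, _ = precision_recall_curve(y_valid == class_index, scores)
    return precision, recall, auc(recall, precision)
```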
Discussion
Identifying Conus species presents significant challenges due to the similarities in shell patterns among various mollusks. The classification of cone snail taxonomic features requires considerable effort because of variations in size, distinct color patterns, and geographical distributions. Here, we propose an automated strategy to identify cone snail species using a cohesive machine learning (ML) algorithm framework based on feature-assisted training of the Conus shell imaging dataset. Our proposed ML model achieved an accuracy of 95% with an 80:20 train-test data ratio, utilizing 38,080 and 9,520 cone snail shell images, respectively.
To ensure clear feature delineation and consistency, we implemented a preprocessing scheme that included grayscale conversion [56], binary image generation [57], image quality enhancement, and Canny edge detection [35, 58]. Edge detection is a crucial preprocessing step that enhances the visibility of key features for accurate identification [59]. This process refines image comparison and improves feature visibility by employing methods used in image recognition. Here, edge detection supports object segmentation and RF-based recognition, thereby strengthening overall performance [60]. Further preprocessing steps included background removal [61], quality checks, image transformation [62, 63], and feature extraction using Haralick features [41], deep features [42], color moments, and local binary patterns [39], which collectively enhanced the training dataset’s quality.
In this study, we utilized a conventional Local Binary Pattern (LBP) approach combined with additional features, significantly improving the recognition rate compared to LBP variants such as LBP Variance (LBPV) and Center Symmetric LBP (CS-LBP). The integration of these additional features addressed the limitations of conventional LBP and its derivatives. Faudzi and Yahya evaluated four LBP derivatives—conventional LBP, LBP Variance (LBPV), Center Symmetric LBP (CS-LBP), and Completed-LBP (CLBP)—under varying environmental conditions [39]. Their findings suggested that LBPV had a higher recognition rate, while CS-LBP excelled under contrast changes, highlighting that applying conventional LBP with additional features can yield better results.
Next, we employed a genetic algorithm for feature selection. Soltanian-Zadeh et al. utilized a comprehensive methodology to extract features from mammographic images using four distinct methods: shape features, Haralick features, wavelet features, and multi-wavelet features [41]. Our approach mirrored this strategy by leveraging a deep learning model (VGG16) for feature extraction, enabling automated learning of complex shell patterns [64]. Deep learning, particularly through convolutional neural networks like VGG16, facilitates hierarchical feature extraction from image objects [65–67]. For cone snail shell images, which exhibit subtle morphological differences [68], deep learning effectively captures fine details such as shell patterns and color gradients. Jaderberg et al. reported that deep learning techniques significantly enhance recognition accuracy for complex image targets [69]. In this study, we integrated Haralick features with additional features derived from the deep learning model, resulting in a robust and informative feature set that improved accuracy.
The model’s efficiency was cross-validated by including data from unrelated species, ensuring that features from other species differed significantly from those of Conus. The species support histogram (to assess the distribution of different species number ranges) demonstrated multiple species with high support values, positively contributing to model efficiency. Additionally, we generated a heatmap to depict the macro average, accuracy, weighted average for recall, precision, and F1 score, revealing the highest weighted precision average of 0.958, indicating improved performance. We observed minimal fluctuations in F1-score values across different species, with a value of 0.76 for Conus consors. The Structural Similarity Index Metric (SSIM) results ranged from 0.33 to 0.99, indicating varying levels of structural similarity among individual images. As reported by Zhou et al., SSIM can effectively assess structural similarity and serves as a reliable evaluation tool for image quality assessment [70]. These findings suggest that our proposed model recognized multiple species as positive instances, making it more reliable and scalable than manual feature extraction, particularly for handling large datasets (Fig 10).
Among various classification models, the RF model demonstrated reliable results [44, 71], validating Conus species recognition. The RF approach incorporates random feature selection and serves as an effective tool for high-dimensional complex datasets, ensuring robust classification results [72, 73]. The effectiveness of the RF approach has been proven in various applications, including pattern recognition and species identification [74]. The novelty of our approach lies in integrating deep learning-based feature extraction with a supervised learning RF model. Deep learning captures nuanced details through the image dataset [75], while supervised learning optimizes classification accuracy [76], creating a robust and automated system capable of efficiently handling species recognition tasks.
A thorough analysis of learning and validation curves can inform model selection and parameter tuning. Goriya et al. focused on applying fine-tuned ResNet and DenseNet models for classifying choroidal neovascularization (CNV) from optical coherence tomography (OCT) images, demonstrating promising results with high accuracy and validation scores [77]. In their study, the DenseNet model achieved a validation accuracy of approximately 0.95, with both training and validation curves exhibiting similar trends. Similarly, in our study, the training accuracy reached 99%, while the validation accuracy gradually increased to 95% (Table 2). These values indicate a well-trained model that generalizes effectively to validation data without significant overfitting. This observation suggests that our model, like DenseNet, effectively captures the underlying patterns of cone snail shell images through accurate classification. The gradual improvement of the validation score curve is crucial for ensuring model reliability and minimizing the risk of overfitting [77]. The training accuracy of our proposed model resembles the learning curve reported for the RF model by Afuwape et al., which exhibited similar performance metrics [78]. Such similarities in learning curves reinforce the robustness of the RF algorithm in handling classification tasks.
Table 2. The statistical report for RF model evaluation.
Statistic | Precision | Recall | F1-Score | Support | TPR | FPR | FNR | TNR |
---|---|---|---|---|---|---|---|---|
mean | 0.9583 | 0.9560 | 0.9560 | 79.6386 | 0.9560 | 0.0439 | 0.00011 | 0.9998 |
std | 0.0572 | 0.0411 | 0.0405 | 8.5645 | 0.0410 | 0.0411 | 0.00015 | 0.00015 |
min | 0.6346 | 0.8228 | 0.7674 | 37 | 0.8227 | 0 | 0 | 0.9991 |
25% | 0.9426 | 0.9289 | 0.9341 | 74 | 0.9289 | 0.0117 | 0 | 0.9998 |
50% | 0.9762 | 0.9714 | 0.9664 | 80 | 0.9714 | 0.0285 | 0.00007 | 0.9999 |
75% | 1 | 0.9882 | 0.9864 | 84.5 | 0.9882 | 0.0710 | 0.00017 | 1 |
max | 1 | 1 | 1 | 98 | 1 | 0.1772 | 0.00088 | 1 |
Conclusion
Overall, machine learning approaches, particularly the Random Forest model, are instrumental in the categorization of cone snail species and in distinguishing them from other marine invertebrates. The proposed RF model, tested on diverse datasets encompassing both cone snail and other mollusk shells, demonstrates its capability in effective pattern matching and decision-based ranking. This model could also be adapted to detect and classify various other mollusk species, showcasing its versatility and potential for broader applications in marine biology.
Supporting information
Acknowledgments
The authors would like to thank members of the Functional Informatics Lab, National Center for Bioinformatics, QAU, Islamabad for their valuable support.
Data Availability
All relevant data are within the paper and its Supporting Information files.
Funding Statement
This work has been supported by Higher Education Commission, Pakistan via grant No. 20-15051/NRPU/R&D/HEC/2021. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Olivera B. M. et al., “Diversity of Conus Neuropeptides,” Science, 1990. doi: 10.1126/science.2165278
- 2. Leviten P. J. and Kohn A. J., “Microhabitat Resource Use, Activity Patterns, and Episodic Catastrophe: Conus on Tropical Intertidal Reef Rock Benches,” Ecol. Monogr., 1980. doi: 10.2307/2937246
- 3. Livett B. G., Gayler K. R., and Khalil Z., “Drugs from the Sea: Conopeptides as Potential Therapeutics,” Curr. Med. Chem., 2012. doi: 10.2174/0929867043364928
- 4. Han T., Teichert R., Olivera B., and Bulaj G., “Conus Venoms—A Rich Source of Peptide-Based Therapeutics,” Curr. Pharm. Des., 2008. doi: 10.2174/138161208785777469
- 5. Carté B. K., “Biomedical potential of marine natural products,” Bioscience, 1996. doi: 10.2307/1312834
- 6. Shi S. et al., “Estimation of Heavy Metal Content in Soil Based on Machine Learning Models,” Land, 2022. doi: 10.3390/land11071037
- 7. Schoepf U. J., Schneider A. C., Das M., Wood S. A., Cheema J. I., and Costello P., “Pulmonary embolism: Computer-aided detection at multidetector row spiral computed tomography,” J. Thorac. Imaging, 2007. doi: 10.1097/RTI.0b013e31815842a9
- 8. Yoshida H. and Näppi J., “CAD in CT colonography without and with oral contrast agents: Progress and challenges,” Comput. Med. Imaging Graph., 2007. doi: 10.1016/j.compmedimag.2007.02.011
- 9. Chan H. P., Lo S. C. B., Sahiner B., Lam K. L., and Helvie M. A., “Computer-aided detection of mammographic microcalcifications: Pattern recognition with an artificial neural network,” Med. Phys., 1995. doi: 10.1118/1.597428
- 10. Bauer S., Wiest R., Nolte L. P., and Reyes M., “A survey of MRI-based medical image analysis for brain tumor studies,” Physics in Medicine and Biology, 2013. doi: 10.1088/0031-9155/58/13/R97
- 11. Davatzikos C., Fan Y., Wu X., Shen D., and Resnick S. M., “Detection of prodromal Alzheimer’s disease via pattern classification of magnetic resonance imaging,” Neurobiol. Aging, 2008. doi: 10.1016/j.neurobiolaging.2006.11.010
- 12. Kim D., Burge J., Lane T., Pearlson G. D., Kiehl K. A., and Calhoun V. D., “Hybrid ICA-Bayesian network approach reveals distinct effective connectivity differences in schizophrenia,” Neuroimage, 2008. doi: 10.1016/j.neuroimage.2008.05.065
- 13. Dimitriadis S. I. and Liparas D., “How random is the random forest? Random forest algorithm on the service of structural imaging biomarkers for Alzheimer’s disease: From Alzheimer’s disease neuroimaging initiative (ADNI) database,” Neural Regeneration Research, 2018. doi: 10.4103/1673-5374.233433
- 14. Van Ginneken B., Schaefer-Prokop C. M., and Prokop M., “Computer-aided diagnosis: How to move from the laboratory to the clinic,” Radiology, 2011. doi: 10.1148/radiol.11091710
- 15. Ronneberger O., Fischer P., and Brox T., “U-net: Convolutional networks for biomedical image segmentation,” in Lecture Notes in Computer Science, 2015. doi: 10.1007/978-3-319-24574-4_28
- 16. Ke Q. et al., “A neuro-heuristic approach for recognition of lung diseases from X-ray images,” Expert Syst. Appl., 2019. doi: 10.1016/j.eswa.2019.01.060
- 17. Jaiswal A. K., Tiwari P., Kumar S., Gupta D., Khanna A., and Rodrigues J. J. P. C., “Identifying pneumonia in chest X-rays: A deep learning approach,” Meas. J. Int. Meas. Confed., 2019. doi: 10.1016/j.measurement.2019.05.076
- 18. Woźniak M. and Połap D., “Bio-inspired methods modeled for respiratory disease detection from medical images,” Swarm Evol. Comput., 2018. doi: 10.1016/j.swevo.2018.01.008
- 19. Hu Z., Wang X., Meng L., Liu W., Wu F., and Meng X., “Detection of Association Features Based on Gene Eigenvalues and MRI Imaging Using Genetic Weighted Random Forest,” Genes (Basel), 2022. doi: 10.3390/genes13122344
- 20. Jing Y., Zheng H., Lin C., Zheng W., Dong K., and Li X., “Foreign Object Debris Detection for Optical Imaging Sensors Based on Random Forest,” Sensors, 2022. doi: 10.3390/s22072463
- 21. Chen Y. et al., “Frequency importance analysis for chemical exchange saturation transfer magnetic resonance imaging using permuted random forest,” NMR Biomed., 2023. doi: 10.1002/nbm.4744
- 22. Wang L. and Zhou Y., “Combining Multitemporal Sentinel-2A Spectral Imaging and Random Forest to Improve the Accuracy of Soil Organic Matter Estimates in the Plough Layer for Cultivated Land,” Agric., 2023. doi: 10.3390/agriculture13010008
- 23. Matese A., Prince Czarnecki J. M., Samiappan S., and Moorhead R., “Are unmanned aerial vehicle-based hyperspectral imaging and machine learning advancing crop science?,” Trends in Plant Science, 2024. doi: 10.1016/j.tplants.2023.09.001
- 24. Barrett M. J. et al., “Optimizing Screening for Intrastriatal Interventions in Huntington’s Disease Using Predictive Models,” Mov. Disord., 2024. doi: 10.1002/mds.29749
- 25. Waldo-Benítez G., Padierna L. C., Cerón P., and Sosa M. A., “Machine Learning in Magnetic Resonance Images of Glioblastoma: A Review,” Curr. Med. Imaging Rev., 2024. doi: 10.2174/0115734056265212231122102029
- 26. Huang Y. et al., “Detection of wheat saccharification power and protein content using stacked models integrated with hyperspectral imaging,” J. Sci. Food Agric., 2024. doi: 10.1002/jsfa.13296
- 27. Feng X., Mickley L. J., Bell M. L., Liu T., Fisher J. A., and Val Martin M., “Improved estimates of smoke exposure during Australia fire seasons: importance of quantifying plume injection heights,” Atmos. Chem. Phys., 2024. doi: 10.5194/acp-24-2985-2024
- 28. Grandremy N. et al., “Metazoan zooplankton in the Bay of Biscay: A 16-year record of individual sizes and abundances obtained using the ZooScan and ZooCAM imaging systems,” Earth Syst. Sci. Data, 2024. doi: 10.5194/essd-16-1265-2024
- 29. da Nóbrega R. V. M., Rebouças Filho P. P., Rodrigues M. B., da Silva S. P. P., Dourado Júnior C. M. J. M., and de Albuquerque V. H. C., “Lung nodule malignancy classification in chest computed tomography images using transfer learning and convolutional neural networks,” Neural Comput. Appl., 2020. doi: 10.1007/s00521-018-3895-1
- 30. Phillips T. and Abdulla W., “A new honey adulteration detection approach using hyperspectral imaging and machine learning,” Eur. Food Res. Technol., 2023. doi: 10.1007/s00217-022-04113-9
- 31. Tao J. et al., “Combination of hyperspectral imaging and machine learning models for fast characterization and classification of municipal solid waste,” Resour. Conserv. Recycl., 2023. doi: 10.1016/j.resconrec.2022.106731
- 32. Zhang J. et al., “Fully Automated Echocardiogram Interpretation in Clinical Practice,” Circulation, 2018. doi: 10.1161/CIRCULATIONAHA.118.034338
- 33. Yang W., Zou T., Dai D., and Sun H., “Polarimetric SAR image classification using multifeatures combination and extremely randomized clustering forests,” EURASIP J. Adv. Signal Process., 2010. doi: 10.1155/2010/465612
- 34. Kaas Q., Yu R., Jin A. H., Dutertre S., and Craik D. J., “ConoServer: Updated content, knowledge, and discovery tools in the conopeptide database,” Nucleic Acids Res., 2012. doi: 10.1093/nar/gkr886
- 35. Sekehravani E. A., Babulak E., and Masoodi M., “Implementing canny edge detection algorithm for noisy image,” Bull. Electr. Eng. Informatics, 2020. doi: 10.11591/eei.v9i4.1837
- 36. Lynn N. D., Sourav A. I., and Santoso A. J., “Implementation of Real-Time Edge Detection Using Canny and Sobel Algorithms,” IOP Conf. Ser. Mater. Sci. Eng., 2021. doi: 10.1088/1757-899x/1096/1/012079
- 37. Park E., Yang J., Yumer E., Ceylan D., and Berg A. C., “Transformation-grounded image generation network for novel 3D view synthesis,” in Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017. doi: 10.1109/CVPR.2017.82
- 38. Ojala T., Pietikäinen M., and Mäenpää T., “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Trans. Pattern Anal. Mach. Intell., 2002. doi: 10.1109/TPAMI.2002.1017623
- 39. Faudzi S. A. A. M. and Yahya N., “Evaluation of LBP-based face recognition techniques,” in 2014 5th International Conference on Intelligent and Advanced Systems (ICIAS 2014), 2014. doi: 10.1109/ICIAS.2014.6869522
- 40. Freeborough P. A. and Fox N. C., “MR image texture analysis applied to the diagnosis and tracking of Alzheimer’s disease,” IEEE Trans. Med. Imaging, 1998. doi: 10.1109/42.712137
- 41. Soltanian-Zadeh H., Rafiee-Rad F., and Siamak Pourabdollah-Nejad D., “Comparison of multiwavelet, wavelet, Haralick, and shape features for microcalcification classification in mammograms,” Pattern Recognit., 2004. doi: 10.1016/j.patcog.2003.03.001
- 42. Chhabra M. and Kumar R., “An Advanced VGG16 Architecture-Based Deep Learning Model to Detect Pneumonia from Medical Images,” in Lecture Notes in Electrical Engineering, 2022. doi: 10.1007/978-981-16-8774-7_37
- 43. Albashish D., Al-Sayyed R., Abdullah A., Ryalat M. H., and Ahmad Almansour N., “Deep CNN Model based on VGG16 for Breast Cancer Classification,” in 2021 International Conference on Information Technology (ICIT 2021), 2021. doi: 10.1109/ICIT52682.2021.9491631
- 44. Quinlan J. R., “Induction of decision trees,” Mach. Learn., 1986. doi: 10.1007/bf00116251
- 45. Vrtkova A., “Predicting clinical status of patients after an acute ischemic stroke using random forests,” in Proceedings of the International Conference on Information and Digital Technologies (IDT 2017), 2017. doi: 10.1109/DT.2017.8024330
- 46. Xi E., “Image Classification and Recognition Based on Deep Learning and Random Forest Algorithm,” Wirel. Commun. Mob. Comput., 2022. doi: 10.1155/2022/2013181
- 47. Subudhi A., Dash M., and Sabut S., “Automated segmentation and classification of brain stroke using expectation-maximization and random forest classifier,” Biocybern. Biomed. Eng., 2020. doi: 10.1016/j.bbe.2019.04.004
- 48. Chen T. and Guestrin C., “XGBoost: A scalable tree boosting system,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016. doi: 10.1145/2939672.2939785
- 49. Dong C. et al., “Non-contact screening system based for COVID-19 on XGBoost and logistic regression,” Comput. Biol. Med., 2022. doi: 10.1016/j.compbiomed.2021.105003
- 50. Cui B., Ye Z., Zhao H., Renqing Z., Meng L., and Yang Y., “Used Car Price Prediction Based on the Iterative Framework of XGBoost+LightGBM,” Electron., 2022. doi: 10.3390/electronics11182932
- 51. Saragih G. S., Rustam Z., Aldila D., Hidayat R., Yunus R. E., and Pandelaki J., “Ischemic Stroke Classification using Random Forests Based on Feature Extraction of Convolutional Neural Networks,” Int. J. Adv. Sci. Eng. Inf. Technol., 2020. doi: 10.18517/ijaseit.10.5.13000
- 52. Düntsch I. and Gediga G., “Confusion Matrices and Rough Set Data Analysis,” in Journal of Physics: Conference Series, 2019. doi: 10.1088/1742-6596/1229/1/012055
- 53.de Hond A. A. H., Steyerberg E. W., and van Calster B., “Interpreting area under the receiver operating characteristic curve,” The Lancet Digital Health. 2022. doi: 10.1016/S2589-7500(22)00188-1 [DOI] [PubMed] [Google Scholar]
- 54.Bischl B. et al. , “Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2023. doi: 10.1002/widm.1484 [DOI] [Google Scholar]
- 55.Bakurov I., Buzzelli M., Schettini R., Castelli M., and Vanneschi L., “Structural similarity index (SSIM) revisited: A data-driven approach,” Expert Syst. Appl., 2022, doi: 10.1016/j.eswa.2021.116087 [DOI] [Google Scholar]
- 56.Güneş A., Kalkan H., and Durmuş E., “Optimizing the color-to-grayscale conversion for image classification,” Signal, Image Video Process., 2016, doi: 10.1007/s11760-015-0828-7 [DOI] [Google Scholar]
- 57.Shapiro L. and Stockman G., “Binary Image Analysis,” Comput. Vis., 2000. [Google Scholar]
- 58.Sundani D., Widiyanto S., Karyanti Y., and Wardani D. T., “Identification of image edge using quantum canny edge detection algorithm,” J. ICT Res. Appl., 2019, doi: 10.5614/itbj.ict.res.appl.2019.13.2.4 [DOI] [Google Scholar]
- 59.Jing J., Liu S., Wang G., Zhang W., and Sun C., “Recent advances on image edge detection: A comprehensive review,” Neurocomputing, 2022, doi: 10.1016/j.neucom.2022.06.083 [DOI] [Google Scholar]
- 60.Azeem A., Sharif M., Raza M., and Murtaza M., “A survey: Face recognition techniques under partial occlusion,” Int. Arab J. Inf. Technol., 2014. [Google Scholar]
- 61.Zhang Y., Liu B., and Liang R., “Two-step phase-shifting algorithms with background removal and no background removal,” Optics and Lasers in Engineering. 2023. doi: 10.1016/j.optlaseng.2022.107327 [DOI] [Google Scholar]
- 62.Chen Y., Zhao Y., Jia W., Cao L., and Liu X., “Adversarial-learning-based image-to-image transformation: A survey,” Neurocomputing, 2020, doi: 10.1016/j.neucom.2020.06.067 [DOI] [Google Scholar]
- 63.Wang C., Xu C., Wang C., and Tao D., “Perceptual Adversarial Networks for Image-to-Image Transformation,” IEEE Trans. Image Process., 2018, doi: 10.1109/TIP.2018.2836316 [DOI] [PubMed] [Google Scholar]
- 64.Govindankutty Menon A. et al. , “A deep-learning automated image recognition method for measuring pore patterns in closely related bolivinids and calibration for quantitative nitrate paleo-reconstructions,” Sci. Rep., 2023, doi: 10.1038/s41598-023-46605-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Bakasa W. and Viriri S., “VGG16 Feature Extractor with Extreme Gradient Boost Classifier for Pancreas Cancer Prediction,” J. Imaging, 2023, doi: 10.3390/jimaging9070138 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Qi C., Zuo Y., Chen Z., and Chen K., “VGG16,” Nongye Jixie Xuebao/Transactions Chinese Soc. Agric. Mach., 2021. [Google Scholar]
- 67.Asriny D. M. and Jayadi R., “Transfer Learning VGG16 for Classification Orange Fruit Images,” J. Syst. Manag. Sci., 2023, doi: 10.33168/JSMS.2023.0112 [DOI] [Google Scholar]
- 68.Gefaell J., Galindo J., and Rolán-Alvarez E., “Shell color polymorphism in marine gastropods,” Evolutionary Applications. 2023. doi: 10.1111/eva.13416 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Jaderberg M., Vedaldi A., and Zisserman A., “Deep features for text spotting,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014. doi: 10.1007/978-3-319-10593-2_34 [DOI] [Google Scholar]
- 70.Wang Z., Bovik A. C., Sheikh H. R., and Simoncelli E. P., “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., 2004, doi: 10.1109/tip.2003.819861 [DOI] [PubMed] [Google Scholar]
- 71.Hussain M. A. et al. , “Classification of healthy and diseased retina using SD-OCT imaging and Random Forest algorithm,” PLoS One, 2018, doi: 10.1371/journal.pone.0198281 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Tan K., Wang H., Chen L., Du Q., Du P., and Pan C., “Estimation of the spatial distribution of heavy metal in agricultural soils using airborne hyperspectral imaging and random forest,” J. Hazard. Mater., 2020, doi: 10.1016/j.jhazmat.2019.120987 [DOI] [PubMed] [Google Scholar]
- 73.Santos Pereira L. F., Barbon S., Valous N. A., and Barbin D. F., “Predicting the ripening of papaya fruit with digital imaging and random forests,” Comput. Electron. Agric., 2018, doi: 10.1016/j.compag.2017.12.029 [DOI] [Google Scholar]
- 74.Peng X. et al. , “Random Forest Based Optimal Feature Selection for Partial Discharge Pattern Recognition in HV Cables,” IEEE Trans. Power Deliv., 2019, doi: 10.1109/TPWRD.2019.2918316 [DOI] [Google Scholar]
- 75.Strack R., “Deep learning in imaging,” Nature Methods. 2019. doi: 10.1038/s41592-018-0267-9 [DOI] [PubMed] [Google Scholar]
- 76.Aljuaid A. and Anwar M., “Survey of Supervised Learning for Medical Image Processing,” SN Comput. Sci., 2022, doi: 10.1007/s42979-022-01166-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Goriya M., Amrutiya Z., Ghadiya A., Vasa J., and Patel B., “Classification of Choroidal Neovascularization (CNV) from Optical Coherence Tomography (OCT) Images Using Efficient Fine-Tuned ResNet and DenseNet Deep Learning Models,” in Lecture Notes in Networks and Systems, 2023. doi: 10.1007/978-981-99-3758-5_42 [DOI] [Google Scholar]
- 78.Afuwape A. A., Xu Y., Anajemba J. H., and Srivastava G., “Performance evaluation of secured network traffic classification using a machine learning approach,” Comput. Stand. Interfaces, 2021, doi: 10.1016/j.csi.2021.103545 [DOI] [Google Scholar]