PLOS One. 2023 Mar 29;18(3):e0283567. doi: 10.1371/journal.pone.0283567

Computational method for aromatase-related proteins using machine learning approach

Muthu Krishnan Selvaraj 1,*, Jasmeet Kaur 2,*
Editor: Avaniyapuram Kannan Murugan 3
PMCID: PMC10057777  PMID: 36989252

Abstract

Human aromatase is a microsomal cytochrome P450 enzyme that catalyzes the aromatization of androgens into estrogens during steroidogenesis. Third-generation aromatase inhibitors (AIs) have proven effective in breast cancer therapy; however, patients acquire resistance to current AIs. There is therefore a need to predict aromatase-related proteins in order to develop efficacious AIs. A machine learning method was established to identify aromatase-related proteins and was evaluated with five-fold cross-validation. In this study, SVM-based models were built using amino acid composition, dipeptide composition, a hybrid of the two, and evolutionary profiles in the form of a position-specific scoring matrix (PSSM), with maximum accuracies of 87.42%, 84.05%, 85.12%, and 92.02%, respectively. Using only the primary sequence, the developed method predicts aromatase-related proteins with high accuracy. Prediction score graphs were generated from the known dataset to check the performance of the method. Based on the approach described above, a webserver for predicting aromatase-related proteins from primary sequence data was developed and implemented at https://bioinfo.imtech.res.in/servers/muthu/aromatase/home.html. We hope that the developed method will be useful for aromatase-related research.

Introduction

Cancer cases continue to rise globally despite advances in clinical therapy [1]. Breast cancer remains the most frequently diagnosed cancer in females, and metastasis remains the leading cause of death from this cancer [1]. Breast cancer incidence is greater in developed countries, while mortality is highest in developing countries [2]. About 30% of breast cancer patients develop recurring metastatic cancer despite recent advances in therapeutic regimens.

The biological actions of estrogen are mediated by the estrogen receptor (ER), and 70% of breast tumors express the ER and/or progesterone receptor (PR). Thus, estrogen deprivation has been considered an important treatment for estrogen-dependent (ER+) breast cancers. In post-menopausal women, estradiol is produced in extragonadal sites; it therefore stops functioning as a circulating hormone and acts locally as a paracrine or intracrine factor [3, 4]. These peripheral sites include the mesenchymal cells of adipose tissue, the osteoblasts and chondrocytes of bone, and numerous sites in the brain, and the estradiol produced there promotes breast cancer [5, 6].

For a long time, tamoxifen has been a reliable therapeutic measure for ER+ breast cancer in both pre- and post-menopausal women. However, over half of advanced ER+ breast cancers are intrinsically resistant to tamoxifen, and about 40% will acquire resistance during treatment. Aromatase inhibitors (AIs) are the next line of therapy for ER+ breast cancer in women and serve as first-line therapy for metastatic breast cancer [7]. AIs block the action of microsomal aromatase cytochrome P450 (P450arom), thus limiting estrogen biosynthesis and tumor progression [8]. Aromatase is the product of the CYP19A1 gene, which encodes a monomeric enzyme composed of a heme group and a single polypeptide chain of 503 amino acids [9, 10]. Aromatase is primarily expressed in the gonads and brain of humans, but also occurs in the placenta and liver of the developing fetus, and in the muscle, adrenal cortex and adipose tissue of adults [9]. In the ovary, aromatase is produced in the granulosa cells, where it converts androgens (male hormones) into estrogens (female hormones); it is essential for the female reproductive cycle, the development of female secondary sexual characteristics and the maintenance of reproductive health [11].

AIs are currently an established treatment regimen for ER+ breast cancer patients, and the FDA has approved first-, second-, and third-generation AIs. The third-generation inhibitors, including letrozole, anastrozole and exemestane, are routine treatment for post-menopausal breast cancer patients [12]. Despite the therapeutic success of the third-generation AIs, acquired resistance develops, leading to tumor relapse [13]. Further, patients with prolonged clinical use of both steroidal and non-steroidal third-generation AIs have experienced side effects such as myalgia, arthralgia, hot flashes and night sweats [14]. Thus, there is an urgent need to develop novel aromatase inhibitors with improved effectiveness and fewer side effects.

Machine learning is emerging as a useful tool in biological science [15]; it can be used to uncover novel aromatase-related proteins and to investigate the structural and functional properties of these enzymes. Many computational methods based on machine learning have been proposed for analyzing clinical data and clinically important proteins or enzymes [16–19]. So far, however, there are no reports of investigating aromatase-related proteins with a support vector machine (SVM). Identifying novel or unknown aromatase-related proteins using SVM is therefore a pressing need.

Support vector machine is a supervised machine learning method commonly used in bioinformatics applications such as predicting protein functions and their evolutionary correlations, analyzing DNA sequences, and classifying microarray data [20–22]. Owing to its powerful prediction ability, it is used not just for protein studies but also in numerous clinical investigations such as gene expression profiling, cancer classification and biomarker discovery [23]. SVM is also employed for the prediction of drug-target interactions, disease-associated genes and drug efficacy [24]. Its use in gene selection and in the classification of microRNA expression data has enabled researchers to analyze large datasets and has helped them understand the relationship between genes and diseases [25].

SVM-based statistical predictors for sequence-based biological systems follow a step-wise procedure: constructing or selecting a dataset to train and test the predictor; encoding the biological sequences in an effective mathematical form; developing a robust algorithm to run the prediction; performing cross-validation to evaluate prediction accuracy; and making the algorithm available on a publicly accessible, user-friendly web server [26]. Predicting protein function from protein structure, post-translational modification (PTM) sites and DNA-binding sites is one of the major tasks in bioinformatics; it can assist in understanding disease mechanisms and/or identifying novel drug targets [23, 27, 28].

Therefore, we made a concerted effort to develop a method for identifying aromatase-related proteins. The method uses amino acid composition (AAC), dipeptide composition (DPC), hybrid and position-specific scoring matrix (PSSM) models, and it will aid in the identification of new or unknown aromatase-related proteins.

Methods

Machine learning based support vector machine (SVM)

The method was constructed using amino acid composition (AAC), dipeptide composition (DPC), PSSM profile and hybrid approaches with a support vector machine (SVM). SVM-based prediction is often used to manage vast amounts of data, and it has been demonstrated to perform well in a number of biological data processing applications such as classification and the identification of protein function and type [29–31]. In this study, we used SVM with five-fold cross-validation to analyze the performance of the classifiers [32–34]. The performance of the generated models was assessed using the original and additional protein datasets. To eliminate outcome bias, all models were run with the same number of negative sequences. Negative sequences were picked at random from the UniProt database to match the size of the aromatase dataset. The performance of the SVM models was tested using known positive and negative sequence data. A blank dataset was also utilized to test the generated models, which successfully recognized the data.

Generation of survival curves

Kaplan-Meier (KM) plotter is a web-based survival analysis tool that evaluates the correlation between the expression of genes (mRNA, miRNA, protein) and survival in more than 30,000 samples from all tumor types. GEO, EGA, and TCGA are the sources of its databases, and the plotter provides meta-analysis-based discovery and validation of survival biomarkers for cancer research [35]. The KM plotter tool (http://kmplot.com/analysis/) was used to determine the prognostic value of aromatase (CYP19A1) mRNA expression in various cancers using Pan-cancer RNA-seq data, by correlating it with overall survival (OS) and relapse-free survival (RFS) [36] for a follow-up threshold of 240 months. For mRNA expression analysis, samples were split into high and low expression groups based on the median expression of aromatase. The median was chosen over the other available cut-offs (lower quartile, lower tertile, upper tertile and upper quartile) because it gives nearly equal sample numbers in both groups and hence less bias. Hazard ratios (HR), 95% confidence intervals and logrank p values for all survival curves were provided by the KM plotter website, and a p value < 0.05 was considered statistically significant.
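The median split described above can be illustrated with a minimal sketch. It is not the KM plotter implementation: the function name `split_by_median`, the sample identifiers and the expression values are hypothetical, and assigning samples exactly at the median to the low group is an assumption.

```python
from statistics import median

def split_by_median(expression):
    """Split sample IDs into high and low groups around the median expression.

    `expression` maps a sample ID to its expression value. Samples equal to
    the median go to the low group (an assumption, not a KM plotter rule).
    """
    cutoff = median(expression.values())
    high = sorted(s for s, v in expression.items() if v > cutoff)
    low = sorted(s for s, v in expression.items() if v <= cutoff)
    return cutoff, high, low

# Hypothetical expression values for six samples (illustration only).
expr = {"s1": 2.0, "s2": 9.5, "s3": 4.1, "s4": 7.3, "s5": 1.2, "s6": 5.8}
cutoff, high, low = split_by_median(expr)
print(cutoff, high, low)
```

With an even number of distinct values, the two groups have the same size, which is the property the median cut-off was chosen for.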

Datasets for SVM

Aromatase data were taken from the UniProt/Swiss-Prot database [37]. The keyword search returned 9836 protein sequences, of which 257 were reviewed. We used only the reviewed sequences, retrieved on 10th May 2021, and removed all sequences annotated or labeled as "fragments," "isoforms," "potentials," "similarity," or "probables" to generate a high-quality dataset; this removal helps to reduce the prediction error. To avoid redundancy and the incorporation of variants, the dataset was then processed with the CD-HIT tool, which deleted sequences that were more than 90% identical to any other sequence in the dataset [38]. The final positive dataset contained 191 of the 257 aromatase sequences; details are provided in the S1 File. The negative dataset contained 191 non-aromatase sequences that were unrelated to aromatase and were picked at random. A UniProt/Swiss-Prot keyword search for "regulatory proteins" was used to select the negative sequence collection. A web server for predicting aromatase-related proteins from primary sequence data was developed and implemented at https://bioinfo.imtech.res.in/servers/muthu/aromatase/home.html.
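The annotation filter and the balanced random selection of negatives can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the helper names `keep_entry` and `sample_negatives`, the simple substring matching, the fixed random seed and the example descriptions are all hypothetical, and the CD-HIT redundancy step is not reproduced here.

```python
import random

# Annotation terms named in the text; matching them as lowercase substrings
# is a simplification made for this sketch.
EXCLUDED_TERMS = ("fragment", "isoform", "potential", "similarity", "probable")

def keep_entry(description):
    """Return True if the entry description carries none of the excluded terms."""
    text = description.lower()
    return not any(term in text for term in EXCLUDED_TERMS)

def sample_negatives(candidates, n_positive, seed=0):
    """Randomly pick as many negative entries as there are positive entries."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return rng.sample(candidates, n_positive)

# Hypothetical descriptions (illustration only).
entries = ["Aromatase", "Aromatase (Fragment)", "Probable cytochrome P450"]
kept = [e for e in entries if keep_entry(e)]
negatives = sample_negatives([f"neg{i}" for i in range(1000)], n_positive=191)
print(kept, len(negatives))
```

Sampling without replacement guarantees that the 191 negatives are distinct, so the positive and negative sets stay the same size.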

Amino acid and dipeptide composition

The amino acid composition of a protein refers to the percentage of each amino acid in the protein [21, 39]. SVM light requires its input data to be encoded as vectors. The proportion of each of the 20 natural amino acids was calculated using the following equation:

\[
\text{Fraction of amino acid}(i) = \frac{\text{Total number of amino acid}(i)}{\text{Total number of amino acids in protein}} \tag{1}
\]

In a similar manner, dipeptide composition was calculated using a vector with a constant length of 400 (20x20) dimensions [40]. To determine the fraction of each dipeptide composition, the following equation was used:

\[
\text{Fraction of dipeptide}(i) = \frac{\text{Total number of dipeptide}(i)}{\text{Total number of all possible dipeptides}} \tag{2}
\]
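Equations (1) and (2) can be illustrated with a short sketch. It is not the GPSR code used in the study: the function names `aac` and `dpc` and the example sequence are hypothetical, and the denominator of Eq (2) is taken here to be the number of overlapping dipeptides in the sequence (length minus one), which is an assumption. Sequences are assumed to contain at least two residues.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs

def aac(seq):
    """Amino acid composition: fraction of each residue, a 20-dimensional vector."""
    seq = seq.upper()
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def dpc(seq):
    """Dipeptide composition: fraction of each overlapping pair, 400 dimensions."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for pair in pairs:
        if pair in counts:  # pairs with non-standard residues are not counted
            counts[pair] += 1
    return [counts[d] / len(pairs) for d in DIPEPTIDES]

example = "MLLA"  # hypothetical toy sequence
print(aac(example)[AMINO_ACIDS.index("L")])    # fraction of leucine
print(dpc(example)[DIPEPTIDES.index("LL")])    # fraction of the LL dipeptide
```

Both vectors have a fixed length regardless of protein length, which is what allows sequences of different sizes to be compared by the same classifier.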

PSSM profile

The GPSR software was used to create the PSSM profile against the nr (non-redundant) blast database. We utilized the seq2pssm imp, pssm n2, pssm comp, and col2svm programmes in the GPSR package for PSI-BLAST searches against the nr database using different iterations with a cut-off e-value of 0.001, as well as to normalize the PSSM profile and produce the SVM light input format (i.e. as a composition vector of 400) [26]. Finally, the SVM models were created with various parameters, optimized, and the best model was employed in the prediction server. For normalization, the following formula was used:

\[
\text{Normalized value} = \frac{\text{Value} - \text{Minimum}}{\text{Maximum} - \text{Minimum}} \tag{3}
\]
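Equation (3) is a min-max scaling, sketched below. The function name `min_max_normalize` and the raw scores are hypothetical, and the scope over which the minimum and maximum are taken (whole matrix, row or column) is not stated in the text, so a single list of values is assumed here.

```python
def min_max_normalize(values):
    """Scale values to the range [0, 1] using (value - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: the formula is undefined, so return zeros
        # (a convention chosen for this sketch).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scores = [-4, -1, 0, 2, 6]  # hypothetical raw PSSM scores
normalized = min_max_normalize(scores)
print(normalized)
```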

Hybrid approach

In order to improve prediction accuracy, a hybrid technique was developed. A hybrid model is defined as the combination of two or more profiles. The hybrid models were developed using 420-dimensional vectors, combining the 20 AAC features and the 400 DPC features. The col_add function in the GPSR 1.0 package was used to merge the AAC and DPC profiles to generate a hybrid profile [41, 42].
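The merging step amounts to concatenating the two feature vectors. The sketch below illustrates the idea only; it does not reproduce the col_add program, and the function name `hybrid_vector` and the placeholder feature values are hypothetical.

```python
def hybrid_vector(aac_vec, dpc_vec):
    """Concatenate a 20-dimensional AAC vector and a 400-dimensional DPC vector."""
    if len(aac_vec) != 20 or len(dpc_vec) != 400:
        raise ValueError("expected 20 AAC features and 400 DPC features")
    return list(aac_vec) + list(dpc_vec)

# Placeholder feature values (illustration only).
aac_features = [0.05] * 20
dpc_features = [0.0025] * 400
hybrid = hybrid_vector(aac_features, dpc_features)
print(len(hybrid))
```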

Evaluation and performance

A five-fold cross-validation approach was used to evaluate performance. We started with an aromatase positive dataset and a non-aromatase negative dataset. The positive and negative datasets were randomly divided into five equal groups. To run SVM, four sets were used for training and the remaining set for testing. This process was performed five times, so that each subset was used for testing exactly once [22, 43]. This was done for all approaches, including amino acid, dipeptide, PSSM, and hybrid. The average of the test scores from all five sets was used to compute the final performance. The performance of the classifiers was assessed using sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC). These measurements were calculated using the following standard formulas:

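\[
\text{Accuracy (ACC)} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}
\]
\[
\text{Sensitivity (SN)} = \frac{TP}{TP + FN} \tag{5}
\]
\[
\text{Specificity (SP)} = \frac{TN}{TN + FP} \tag{6}
\]
\[
\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{7}
\]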
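Equations (4) to (7) and the averaging over folds can be sketched as follows. The function names and the confusion-matrix counts are hypothetical and are not the study's results; returning an MCC of 0 when the denominator is zero is a common convention assumed here.

```python
from math import sqrt

def classification_metrics(tp, tn, fp, fn):
    """Compute ACC, SN, SP and MCC from confusion-matrix counts (Eqs 4-7)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}

def mean_over_folds(fold_counts):
    """Average each metric over the test folds of a cross-validation run."""
    per_fold = [classification_metrics(*counts) for counts in fold_counts]
    return {k: sum(m[k] for m in per_fold) / len(per_fold) for k in per_fold[0]}

# Hypothetical (TP, TN, FP, FN) counts for five test folds.
folds = [(45, 40, 10, 5)] * 5
single = classification_metrics(45, 40, 10, 5)
averaged = mean_over_folds(folds)
print(single)
```

Because MCC uses all four cells of the confusion matrix, it drops when either false positives or false negatives grow, even if accuracy stays high.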

Support vector machine (SVM)

Aromatase prediction was done with the SVM light programme, a very successful machine learning approach. SVM light has been used in a variety of investigations, including plasminogen activator prediction, BacHbpred (bacterial hemoglobin prediction), Oxypred (oxygen-binding protein prediction), and VerHb (vertebrate hemoglobin protein prediction) [21, 26, 39–42]. The SVM can employ a range of kernels, including linear, polynomial, and radial basis function (RBF) kernels [44]. We optimized distinct parameters for each prediction approach in the prediction studies. In the method, aromatase was used as the positive example and non-aromatase as the negative example. In practice, we ran SVM light with positive (+) labels for positive sequences and negative (-) labels for negative sequences.
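The labelled input that SVM light reads has one example per line: a target value followed by `index:value` pairs with feature indices starting at 1 and increasing. The sketch below writes such a line; the function name `to_svmlight_line` and the feature values are hypothetical, and the study itself used the GPSR col2svm program for this step.

```python
def to_svmlight_line(label, features):
    """Format one example as '<target> 1:<v1> 2:<v2> ...' for SVM light.

    `label` is +1 for an aromatase (positive) sequence and -1 for a
    non-aromatase (negative) sequence.
    """
    if label not in (1, -1):
        raise ValueError("label must be +1 or -1")
    target = "+1" if label == 1 else "-1"
    pairs = " ".join(f"{i}:{v:.6g}" for i, v in enumerate(features, start=1))
    return f"{target} {pairs}"

# Hypothetical three-feature examples (real vectors have 20, 400 or 420 values).
positive_line = to_svmlight_line(1, [0.5, 0.0, 0.25])
negative_line = to_svmlight_line(-1, [0.1, 0.2, 0.7])
print(positive_line)
print(negative_line)
```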

Webserver

The aromatase-related protein prediction webserver was developed using HTML and CGI-Perl scripts. The backend runs on an Apache server under the Linux operating system. The prediction webserver can be accessed freely at https://bioinfo.imtech.res.in/servers/muthu/aromatase/home.html. It is a support vector machine (SVM) based classification method for predicting aromatase-related proteins. Users can paste their sequences in FASTA format into the text box on the submit page. The server predicts each input sequence as an aromatase or non-aromatase protein, based on the selected approach: amino acid composition (AAC), dipeptide composition (DPC), PSSM or hybrid (AAC+DPC).
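As an illustration of the input the server expects, the sketch below parses FASTA-formatted text into (header, sequence) pairs. It is written in Python for illustration and is not the server's CGI-Perl code; the function name `parse_fasta` and the example records are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())  # sequence lines may be wrapped
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Hypothetical input with two toy sequences.
fasta_text = """>seq1 example protein
MLLAVL
FFPS
>seq2
mkvw
"""
records = parse_fasta(fasta_text)
print(records)
```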

Results

Effect of aromatase mRNA expression on cancer patient’s survival

KM plotter Pan-cancer RNA-seq was used to analyze the correlation between aromatase (CYP19A1) mRNA expression and survival in the available tumor types (Table 1). Higher aromatase mRNA expression correlated significantly with poorer OS in head-neck squamous cell carcinoma (Fig 1A, Table 1), kidney renal clear cell carcinoma (Fig 1B, Table 1) and kidney renal papillary cell carcinoma (Fig 1C, Table 1) patients. Higher aromatase mRNA expression also correlated significantly with poorer RFS in kidney renal papillary cell carcinoma patients (Fig 1D, Table 1). Further, higher aromatase mRNA expression was associated with significantly poorer OS in liver hepatocellular carcinoma (Fig 1E, Table 1) and stomach adenocarcinoma (Fig 1F, Table 1) patients. No significant correlation between aromatase mRNA expression and survival was seen for other types of tumors (Table 1).

Table 1. Correlation of aromatase mRNA expression with overall (OS) and relapse-free survival (RFS) in various cancer patients.

| Tumor type | Samples with RNAseq data | OS HR | OS 95% CI | OS p-value | RFS HR | RFS 95% CI | RFS p-value |
|---|---|---|---|---|---|---|---|
| Bladder Carcinoma | 405 | 1.05 | 0.78–1.41 | 0.75 | 1.22 | 0.6–2.48 | 0.59 |
| Breast cancer | 1090 | 1.08 | 0.78–1.48 | 0.66 | 1.34 | 0.87–2.06 | 0.19 |
| Cervical squamous cell carcinoma | 304 | 1.33 | 0.83–2.12 | 0.24 | 0.51 | 0.23–1.15 | 0.1 |
| Esophageal Adenocarcinoma | 80 | 1.4 | 0.74–2.66 | 0.3 | 1.09 | 0.15–7.93 | 0.93 |
| Esophageal Squamous Cell Carcinoma | 81 | 1.06 | 0.48–2.33 | 0.89 | 1.7 | 0.65–4.47 | 0.28 |
| Head-neck squamous cell carcinoma | 500 | 1.38 | 1.06–1.8 | 0.017 | 2.15 | 0.99–4.67 | 0.048 |
| Kidney renal clear cell carcinoma | 530 | 1.48 | 1.1–2 | 0.01 | 0.61 | 0.22–1.71 | 0.34 |
| Kidney renal papillary cell carcinoma | 288 | 2.11 | 1.13–3.94 | 0.017 | 2.3 | 1.07–4.96 | 0.028 |
| Liver hepatocellular carcinoma | 371 | 1.77 | 1.25–2.5 | 0.001 | 1.1 | 0.79–1.53 | 0.56 |
| Lung adenocarcinoma | 513 | 1.27 | 0.95–1.7 | 0.11 | 1.3 | 0.86–1.97 | 0.22 |
| Lung squamous cell carcinoma | 501 | 1.3 | 0.99–1.7 | 0.061 | 1.18 | 0.71–1.94 | 0.53 |
| Ovarian cancer | 374 | 0.97 | 0.74–1.25 | 0.79 | 0.97 | 0.69–1.38 | 0.88 |
| Pancreatic ductal adenocarcinoma | 177 | 0.89 | 0.59–1.35 | 0.59 | 1.25 | 0.53–2.93 | 0.61 |
| Pheochromocytoma and Paraganglioma | 178 | 3.27 | 0.58–18.55 | 0.16 | 0.5 | 0.05–4.82 | 0.54 |
| Rectum adenocarcinoma | 165 | 0.93 | 0.43–2 | 0.84 | 4.35 | 0.5–37.68 | 0.15 |
| Sarcoma | 259 | 1.14 | 0.77–1.7 | 0.52 | 1.42 | 0.87–2.32 | 0.16 |
| Stomach adenocarcinoma | 375 | 1.41 | 1.02–1.95 | 0.038 | 1.43 | 0.75–2.74 | 0.28 |
| Testicular Germ Cell Tumor | 134 | 2.07 | 0.19–22.89 | 0.54 | 1.1 | 0.52–2.33 | 0.81 |
| Thymoma | 119 | 1.16 | 0.3–4.41 | 0.83 | Sample number too low for meaningful analysis | | |
| Thyroid carcinoma | 502 | 0.76 | 0.28–2.1 | 0.6 | 0.74 | 0.34–1.63 | 0.45 |
| Uterine corpus endometrial carcinoma | 543 | 1.07 | 0.71–1.61 | 0.75 | 0.9 | 0.54–1.52 | 0.71 |

Fig 1.


Effect of aromatase mRNA expression on OS in head-neck squamous cell carcinoma (A), kidney renal clear cell carcinoma (B) and kidney renal papillary cell carcinoma (C) patients. (D) Effect of aromatase mRNA expression on RFS in kidney renal papillary cell carcinoma. Effect of aromatase mRNA expression on OS in liver hepatocellular carcinoma (E) and stomach adenocarcinoma (F) patients.

Amino acid composition analysis

The amino acid composition of the aromatase proteins was computed, and residue “L” was observed to occur at a much greater frequency (above 10%) (Fig 2A). As shown in Fig 2A, “F”, “P”, “S” and “V” each account for more than 6% of residues, whereas “C” and “W” each account for less than 2%. When the amino acid residue profiles of aromatase and non-aromatase proteins are compared, some residue patterns are similar, but not all (Fig 2B). The developed models can use these differences to distinguish aromatase from negative sequences.

Fig 2.


A) Amino acid distribution chart for aromatase and non-aromatase protein sequences. B) Sequence length profile of aromatase and non-aromatase proteins.

Amino acid composition SVM modules

First, we used support vector machines (SVM) to develop models based on the amino acid composition of aromatase. SVM was trained on a variety of datasets using the SVM light implementation. A 20-dimensional amino acid composition vector was used to train the SVM classifiers. SVM kernels and parameters were adjusted to best discriminate between the positive and negative protein sequence datasets. The maximum accuracy (ACC) of aromatase prediction based on amino acid composition was 87.42%, with 100% sensitivity (SN), 74.84% specificity (SP) and a Matthews correlation coefficient (MCC) of 0.87 (Table 2, Fig 3).

Table 2. The performance of SVM models using AAC, DPC, Hybrid and PSSM profiles on the original datasets.

| Approach | ACC (%) | SN (%) | SP (%) | MCC |
|---|---|---|---|---|
| AAC | 87.42 | 100 | 74.84 | 0.87 |
| DPC | 84.05 | 99.84 | 68.26 | 0.82 |
| Hybrid | 85.12 | 98.68 | 71.55 | 0.83 |
| PSSM | 92.02 | 100 | 84.05 | 0.92 |

Fig 3. The performance of accuracy (A), sensitivity (B), specificity (C) and MCC (D) based on the threshold value in all approaches.


SVM modules using dipeptide composition

In general, SVM algorithms based on dipeptide composition are more effective than approaches based on single amino acid composition. SVM classifiers were therefore also constructed for dipeptide composition, which is represented by a 400-dimensional vector of dipeptide frequencies (20 × 20). While adjusting the kernel parameter γ and the trade-off parameter C, the best prediction performance was found with γ = 3 and C = 375. Using these parameters, we developed models to distinguish aromatase from non-aromatase sequences. The SVM-based model achieved a maximum accuracy of 84.05%, with 99.84% sensitivity, 68.26% specificity and an MCC of 0.82, as shown in Table 2 and Fig 3.
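To give a sense of what the kernel parameter γ controls, the sketch below evaluates a radial basis function kernel, K(x, y) = exp(-γ‖x − y‖²), at γ = 3. That the reported γ belongs to an RBF kernel is an assumption, since the text does not name the kernel used for this model; the function name `rbf_kernel` and the input vectors are hypothetical. The trade-off parameter C is not part of the kernel: it weights training errors against margin width during training.

```python
from math import exp

def rbf_kernel(x, y, gamma=3.0):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * squared_distance)

# Hypothetical two-dimensional feature vectors (real DPC vectors have 400 values).
same = rbf_kernel([0.2, 0.1], [0.2, 0.1])
apart = rbf_kernel([0.0, 0.0], [1.0, 0.0])
print(same, apart)
```

A larger γ makes the similarity fall off faster with distance, so the decision boundary follows the training points more closely.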

Hybrid (AAC + DPC) SVM modules

The aromatase prediction problem was also addressed using a hybrid prediction approach that integrated amino acid composition (AAC) and dipeptide composition (DPC). The hybrid approach yielded 85.12% accuracy, 98.68% sensitivity, 71.55% specificity and an MCC of 0.83 (Table 2, Fig 3). The accuracy of the hybrid model lies between those of the individual models: it is slightly higher than that of DPC (84.05%) but lower than that of AAC (87.42%). Relative to DPC, specificity increases (71.55% versus 68.26%) while sensitivity decreases slightly (98.68% versus 99.84%).

PSSM profile based SVM modules

Aromatase prediction models based on position-specific scoring matrix (PSSM) profiles were also developed to improve performance, and they achieved a maximum accuracy of 92.02% with 100% sensitivity, 84.05% specificity and an MCC of 0.92 (Table 2, Fig 3). In general, all models, including the simple AAC method, performed comparably well as measured by accuracy and MCC.

Prediction scoring graphs analysis

Prediction scoring graphs were also used to assess the performance of the SVM modules. A scoring graph shows the prediction score of each sequence tested and how the scores of sequences in the positive set are separated from those in the negative set by a threshold that can be used to categorize positive and negative predictions. However, not all positive or negative sequences are successfully categorized, which leads to false negative and false positive predictions. This analysis summarizes the prediction results to reflect this element of performance. In AAC, no positive sequences were predicted as negative, whereas one negative sequence was predicted as positive (Fig 4A). In DPC, no positive sequences were predicted as negative and no negative sequences were predicted as positive (Fig 4B). In the hybrid approach, three positive sequences were predicted as negative while one negative sequence was predicted as positive (Fig 4C). In the PSSM approach, one positive sequence was predicted as negative and one negative sequence was predicted as positive (Fig 4D). On the negative dataset, the false positive rate (FPR) was 0.005 for AAC and Hybrid and 0.010 for PSSM.
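The counting behind such a scoring graph can be sketched as follows. The function names, the scores and the threshold of 0 are hypothetical, and treating a score equal to the threshold as a positive prediction is an assumption; only the final arithmetic (one false positive among 191 negatives) uses numbers from the text.

```python
def confusion_at_threshold(pos_scores, neg_scores, threshold=0.0):
    """Count TP, TN, FP and FN when scores >= threshold are predicted positive."""
    fn = sum(1 for s in pos_scores if s < threshold)
    fp = sum(1 for s in neg_scores if s >= threshold)
    tp = len(pos_scores) - fn
    tn = len(neg_scores) - fp
    return tp, tn, fp, fn

def false_positive_rate(fp, tn):
    """FPR = FP / (FP + TN), the fraction of negatives predicted as positive."""
    return fp / (fp + tn)

# Hypothetical SVM scores for three positive and three negative sequences.
tp, tn, fp, fn = confusion_at_threshold([1.2, 0.8, -0.1], [-1.5, -0.7, 0.3])
print(tp, tn, fp, fn)

# One false positive among 191 negative sequences gives an FPR of about 0.005.
print(round(false_positive_rate(1, 190), 3))
```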

Fig 4. Prediction scores graphs: Prediction performance of the developed models on aromatase and non-aromatase proteins.


A) Amino acid composition based approach (AAC), B) Dipeptide composition based approach (DPC), C) Hybrid profile based approach (AAC+DPC) and D) PSSM profile based approach.

BLAST data analysis

According to the results on the BLAST dataset, the developed methods perform well in all approaches in identifying aromatase. We randomly picked five sequences from our dataset (CP19A_HUMAN, CP2F1_HUMAN, CP4Z1_HUMAN, GCM1_HUMAN, and CP2A7_HUMAN), performed BLAST against the non-redundant database and collected 500 sequences (100 for each query sequence). Overall, the proposed method identified on average 97.4% of the BLAST sequences across all approaches. Individually, the AAC, DPC, Hybrid and PSSM models predicted 99.2%, 93.8%, 96.6% and 100% of the sequences as positive, respectively (Table 3). Thus, the PSSM approach identifies all of the BLAST sequences (Fig 5). This result shows that our method outperforms the BLAST search in identifying aromatase-related proteins.

Table 3. The prediction performance of all models on the BLAST-Search data.

| Approach | Total BLAST sequences | Positive Prediction | Negative Prediction | Positive Prediction Percentage |
|---|---|---|---|---|
| AAC | 500 | 496 | 4 | 99.2% |
| DPC | 500 | 469 | 31 | 93.8% |
| Hybrid (AAC+DPC) | 500 | 483 | 17 | 96.6% |
| PSSM | 500 | 500 | 0 | 100% |
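The percentages in Table 3 and the overall figure of 97.4% can be reproduced from the counts in the table. The short sketch below does this arithmetic; the variable names are illustrative, and taking the overall figure as the mean of the four per-approach percentages is an interpretation that matches the reported value.

```python
# Positive predictions out of 500 BLAST sequences, taken from Table 3.
positive_predictions = {"AAC": 496, "DPC": 469, "Hybrid": 483, "PSSM": 500}
total_sequences = 500

percentages = {
    model: 100 * count / total_sequences
    for model, count in positive_predictions.items()
}
mean_percentage = sum(percentages.values()) / len(percentages)

print(percentages)
print(round(mean_percentage, 1))
```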

Fig 5. BLAST-Search Data analysis: Prediction performance of all models on the BLAST-Search data, A) AAC, B) DPC, C) Hybrid and D) PSSM.


Discussion

Computational biology has helped in understanding proteins from a new perspective, as algorithms can predict protein-protein interactions [45, 46] and identify novel drug targets in various pathologies [47, 48]. Algorithms that systematically study cancer and protein databases [49, 50] have enhanced the accuracy of survival predictions for cancer patients [51–54], provided an understanding of drug-induced side effects [55] and allowed the identification of novel biomarkers [56]. To our knowledge, there are no algorithms for the structural and functional characterization of aromatase or its polymorphisms. As aromatase is a critical target in breast cancer patients [57, 58], we established a reliable approach for detecting novel aromatase-related proteins, which will aid in developing novel AIs with improved efficacy.

Aromatase belongs to the cytochrome P450 family, whose members are heme-containing mono-oxygenases and highly flexible enzymes that allow easy substrate access and binding, and product release [59]. Unlike most P450s, which are not highly substrate selective, aromatase is set apart by its androgenic specificity. The structure of aromatase remained unknown for decades, which hindered the explanation of its biochemical mechanism. Several laboratories purified aromatase from human placenta [60, 61] and from recombinant expression systems [62, 63]; however, attempts to crystallize aromatase remained unsuccessful. So far, only one crystal structure of a natural, full-length mammalian P450, human placental aromatase, is known [64]. Finding aromatase-related proteins using in-vivo and in-vitro methods is thus difficult, and low-cost computational methods such as SVM can be a reliable approach to identify novel aromatase-related proteins.

Aromatase is the only vertebrate enzyme that catalyzes the aromatization of androgens into estrogens [64, 65]. It is a monomeric integral membrane protein of the endoplasmic reticulum [66, 67] and consists of a heme group and a polypeptide chain of 503 amino acids. Aromatase has twelve α-helices and ten β-strands [64, 68], and its active site is a distal cavity of the heme-binding pocket, with the heme iron as the reaction center [68]. Aromatase in peripheral adipose tissues leads to estrogen biosynthesis in postmenopausal women, thus inducing breast tumors [69]. A small amount of estrogen can stimulate breast tumor formation, and aromatase protein is seen in epithelial as well as stromal breast cancer cells [70]. AIs are currently used to treat breast cancer patients; however, the resistance to and toxicity of AIs create the need to discover novel AIs [71].

Survival analysis in various types of cancer patients using KM plotter showed that higher aromatase mRNA expression was associated with poorer overall survival (OS) in head-neck squamous cell carcinoma (Fig 1A), kidney renal clear cell carcinoma (Fig 1B), kidney renal papillary cell carcinoma (Fig 1C), liver hepatocellular carcinoma (Fig 1E) and stomach adenocarcinoma (Fig 1F) patients. Human fetal liver, kidney and intestine express significant levels of aromatase [72], but hepatic aromatase expression becomes untraceable in post-natal life [73]. Estrogens have been shown to promote the development and progression not only of breast cancer, but also of endometrial, prostate and colorectal cancer, by increasing mitotic activity [74, 75]. The current survival analysis suggests a key role of aromatase as a tumor promoter, even in extragonadal tissues including head-neck, kidney, liver and stomach [76]. These results indicate the need for a method to identify aromatase-related proteins for various types of endocrine-responsive tumors.

SVM is used in a variety of studies in basic science and medicine, including clinical data analysis, laboratory testing for the detection of disease and clinical trials of medicines [77–79]. In this study, we developed a very reliable method for predicting aromatase-related proteins, based on a variety of protein patterns such as AAC, DPC and Hybrid approaches. The overall prediction accuracy for aromatase-related proteins was 87.42%, 84.05%, 85.12% and 92.02% for AAC, DPC, hybrid and PSSM, respectively. The results of the BLAST search data analysis and the prediction score graph analysis demonstrate that the established method is effective in identifying aromatase-related proteins. We expect that our method will find undiscovered aromatase-related proteins, which will aid researchers in cancer predictive studies and precision medicine. As this is the first webserver to detect aromatase-related proteins, we cannot compare the performance of our method with that of other methods.

Conclusion

So far, there is no web server or algorithm to predict or detect aromatase-related proteins. Thus, we developed a highly accurate method for identifying aromatase-related proteins using SVM with various amino acid approaches (Fig 6). The method was developed using five-fold cross-validation with the amino acid composition (AAC), dipeptide composition (DPC), hybrid (AAC+DPC) and position-specific scoring matrix (PSSM) approaches. We tested known and unknown data with our developed models, and all models detected aromatase-related proteins accurately. In future studies, we would like to study aromatase inhibitors using molecular docking, and we are also interested in using deep learning techniques [80–82]. We believe that this study will help researchers find new or undiscovered aromatase-related proteins.

Fig 6. Flow chart for developing SVM method to predict aromatase-related proteins.


Supporting information

S1 File

(DOC)

Acknowledgments

We are sincerely thankful to the Directors of CSIR-IMTECH and PGIMER (Chandigarh) for their support. A copy of the manuscript has been submitted to PTM, CSIR IMTECH, dated on 25.07.2022.

Data Availability

All relevant data are within the paper and its Supporting information files.

Funding Statement

The authors received no specific funding for this work.

Decision Letter 0

Avaniyapuram Kannan Murugan

31 Oct 2022

PONE-D-22-24198

Computational method for aromatase-related proteins using machine learning approach

PLOS ONE

Dear Dr. Kaur,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Dec 15 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Avaniyapuram Kannan Murugan, M.Phil., Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse. New software must comply with the Open Source Definition.

Additional Editor Comments:

The reviewers comment positively on the manuscript. However, they raise many major critiques. Kindly address them carefully, giving additional input.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

Reviewer #3: Yes

Reviewer #4: No

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes

Reviewer #4: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript sounds interesting, and the authors have developed a method to predict aromatase-related proteins. The manuscript is well structured and the authors have elucidated their method distinctly. However, some corrections could be made to make it more precise and clear:

1- The abstract should be more organized and briefly include the method, result, and conclusion.

2- The last paragraph of the introduction should be moved to the method section.

3- There are some outdated references; I recommend that the authors use more recent ones.

Reviewer #2: Dear authors,

Your manuscript is highly interesting, and it caught my interest as I wrote almost the same journal article after a regress work on SARS CoV-2. However, your manuscript has a huge flaw: the Abstract, Methods, and Conclusion are not in article fashion. Please improve them properly, as you have done wonderful work.

Reviewer #3: Dear Authors,

Thank you for submitting your work to PLOS ONE. Some improvements need to be considered to make your work look better.

Please consider the following comments and suggestions:

1. Abstract:

a. I suggest moving the link to the related section (dataset).

b. Please recheck the link and make sure it is correct and working.

2. Introduction:

Please follow the traditional way of writing this part. For instance, at the end of this part you should mention the structure of your article.

3. Methods:

- Datasets for SVM:

You mentioned removing sequences labeled "fragments", "isoforms", etc. Please elaborate.

- Generation of survival curves:

Please explain more about the Kaplan-Meier tool.

4. Results:

- In table 1:

The p-values vary a lot! For example, the p-value for Esophageal Adenocarcinoma on RFS was 0.93, while the p-value for Head-neck squamous cell carcinoma was 0.048. Please explain the reason.

5. Conclusion:

Based on your findings, do you suggest any future work?

- Please check the equations and formulas because it appears that there's something wrong with the fraction bars!

Reviewer #4: The present study aims to develop a method to recognize aromatase protein using different amino acid composition parameters and evolutionary profile using SVM technique. Also, the authors state that they have developed a webserver for the same.

However, from the introduction given in the manuscript, it is difficult to understand why the authors need to predict the aromatase protein. They state that aromatase is a target for designing inhibitors in humans for treating cancer and other diseases. But the sequence of this protein is already known. So what is the purpose of this study?

Also, if I understand correctly, aromatase is one single protein, and if we start developing methods for single proteins then we can develop many thousands of methods. But what is the utility?

Minor comments

1. In the “Method” section, the number of sequences initially taken from UniProt/Swiss-Prot, as well as the number of sequences removed to get the final dataset of 191 sequences, is not mentioned.

2. Have the authors compared the similarity of the negative set sequences with the positive dataset?

3. The amino acid profiles of aromatase and non-aromatase proteins were found to be similar by the authors (Fig 2B). How, then, has this contributed to model development?

4. The web link given for the server could not be accessed.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Alhassan Ali Ahmed

Reviewer #2: Yes: Shaban Ahmad

Reviewer #3: Yes: Abdullah Almuhaimeed

Reviewer #4: No

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Mar 29;18(3):e0283567. doi: 10.1371/journal.pone.0283567.r002

Author response to Decision Letter 0


10 Jan 2023

Dear Editor,

Thank you for providing us an opportunity to submit a revision of our manuscript. Below is a point-by-point response to all the reviewers. All the changes in the revised manuscript are highlighted in yellow.

Reviewer #1:

The manuscript sounds interesting, and the authors have developed a method to predict aromatase-related proteins. The manuscript is well structured and the authors have elucidated their method distinctly. However, some corrections could be made to make it more precise and clear:

1- The abstract should be more organized and briefly include the method, result, and conclusion.

Yes, we have modified the abstract as per your guidance (highlighted in yellow).

2- The last paragraph of the introduction should be moved to the method section.

Yes, part of the paragraph has now been moved to the Methods section on page 4 as paragraph 1 (highlighted in yellow).

3- There are some outdated references; I recommend that the authors use more recent ones.

Yes, we have now added recent references (highlighted in yellow).

Reviewer #2:

Dear authors,

Your manuscript is highly interesting, and it caught my interest as I wrote almost the same journal article after a regress work on SARS CoV-2. However, your manuscript has a huge flaw: the Abstract, Methods, and Conclusion are not in article fashion. Please improve them properly, as you have done wonderful work.

Yes, we have modified the abstract, methods and conclusion as per your guidance (highlighted in yellow).

Reviewer #3:

Dear Authors,

Thank you for submitting your work to PLOS ONE. Some improvements need to be considered to make your work look better.

Please consider the following comments and suggestions:

1. Abstract:

a. I suggest moving the link to the related section (dataset).

Yes, we have moved the link to the webserver to the dataset section of the Methods on page 5 (highlighted in yellow).

b. Please recheck the link and make sure it is correct and working.

The provided link was not working for security reasons; we have now secured our site under HTTPS, and it can be accessed.

2. Introduction:

Please follow the traditional way of writing this part. For instance, at the end of this part you should mention the structure of your article.

Yes, we have modified the end of Introduction as per your guidance (highlighted in yellow).

3. Methods:

- Datasets for SVM: you mentioned removing sequences labeled "fragments", "isoforms", etc. Please elaborate.

Yes, it is now explained in the Dataset for SVM section on page 5 (highlighted in yellow).

- Generation of survival curves: please explain more about the Kaplan-Meier tool.

Yes, the explanation is now added to the section on page 4 (highlighted in yellow).

4. Results:

- In Table 1: The p-values vary a lot! For example, the p-value for Esophageal Adenocarcinoma on RFS was 0.93, while the p-value for Head-neck squamous cell carcinoma was 0.048. Please explain the reason.

The p-value is calculated by the KM plotter website for each cancer type and hence it is significant for some types of cancers and not significant for others.

5. Conclusion:

Based on your findings, do you suggest any future work?

Yes, we are interested in working on molecular docking of aromatase inhibitors.

- Please check the equations and formulas because it appears that there's something wrong with the fraction bars!

Yes, we have corrected the equations and formulas.

Reviewer #4: The present study aims to develop a method to recognize aromatase protein using different amino acid composition parameters and evolutionary profile using SVM technique. Also, the authors state that they have developed a webserver for the same.

However, from the introduction given in the manuscript, it is difficult to understand why the authors need to predict the aromatase protein.

Many proteins of unknown function are available and need to be identified. Thus, there is a need for computational methods to identify these proteins and their functions. So, we have developed a method to identify aromatase-related proteins.

They state that aromatase is a target for designing inhibitors in humans for treating cancer and other diseases. But the sequence of this protein is already known, so what is the purpose of this study? Also, if I understand correctly, aromatase is a single protein, and if we start developing methods for single proteins then we can develop many thousands of methods. But what is the utility?

That is not the case; we chose aromatase because it is one of the major enzymes in cancer-related studies. So, we developed a computational method which will be useful for cancer researchers. Indeed, predicting protein function is one of the major tasks in bioinformatics, and it can assist with a variety of biological issues, such as understanding disease mechanisms or identifying novel drug targets.

Minor comments

1. In the “Method” section, the number of sequences initially taken from UniProt/Swiss-Prot, as well as the number of sequences removed to get the final dataset of 191 sequences, is not mentioned.

Yes, when we used the keyword “aromatase”, we found 9836 protein sequences which included 257 reviewed sequences. We only used reviewed sequences and the final dataset contained 191 sequences out of 257. This is added on page 5 under “Datasets for SVM” (highlighted in yellow).

2. Have the authors compared the similarity of the negative set sequences with the positive dataset?

We have not compared the similarity of the negative set sequences with the positive dataset. The negative sequences are entirely different from the positive set sequences, which were selected using a keyword in the UniProt database.

3. The amino acid profiles of aromatase and non-aromatase were found to be similar by the authors (Fig 2B). How, then, has this contributed to model development?

Sorry, some of the residue patterns are similar, but not all. These differences allow the developed models to distinguish aromatase from negative sequences.

4. The web link given for the server could not be accessed.

The provided link was not working for security reasons. We have now secured our site under HTTPS, and it can be accessed.
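
As a rough illustration of the composition features discussed in minor comment 3, the following sketch computes amino acid and dipeptide composition from a primary sequence. It is not the authors' code (their server is described as a set of Perl programs); the function names, the percentage scaling, and the handling of non-standard residues are assumptions made for this example only.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return a 20-value dict: percentage of each standard residue.

    Assumption: non-standard characters (e.g. X, B, Z) are ignored.
    """
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    total = len(residues)
    return {aa: 100.0 * residues.count(aa) / total for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Return a 400-value dict: percentage of each adjacent residue pair.

    Assumption: a pair is counted only if both residues are standard.
    """
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for first, second in zip(seq, seq[1:]):
        pair = first + second
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return {pair: 100.0 * n / total for pair, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to show the call pattern.
    example = "MVLEMLNPIHYNITSIVPEAMPAATMPVLLLTGLFLLVWNYEG"
    aac = amino_acid_composition(example)
    dpc = dipeptide_composition(example)
    print(len(aac), len(dpc))
```

Under these assumptions, even when two classes share similar values for many residues, the few components that differ (and the 400 dipeptide components, which also capture local residue order) give a classifier such as an SVM something to separate on.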

Attachment

Submitted filename: Response to reviewers.doc

Decision Letter 1

Avaniyapuram Kannan Murugan

10 Feb 2023

PONE-D-22-24198R1

Computational method for aromatase-related proteins using machine learning approach

PLOS ONE

Dear Dr. Kaur,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by 15 March 2023. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Avaniyapuram Kannan Murugan, M.Phil., Ph.D.

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: Dear Authors,

I see the major revisions that you have made so far. I provided only basic suggestions previously and am now providing more extensive suggestions. Kindly make the changes where possible to improve the manuscript for readers and future researchers.

1. Make a graphical abstract (flow chart) for the complete study, from how you conceptualised it to what you obtained, followed by each method in depth. (https://www.ncbi.nlm.nih.gov/core/lw/2.0/html/tileshop_pmc/tileshop_pmc_inline.html?title=Click%20on%20image%20to%20zoom&p=PMC3&id=6386390_pcbi.1006730.g001.jpg)

2. Kindly make the source code available through GitHub with proper documentation.

3. The server link is not working; make sure your institute keeps it live.

4. You have used different SVM approach-based models with amino acid, dipeptide composition, hybrid and evolutionary profiles to predict. This is good; however, the tool should run these in the background, and in the front end the user should see only the results. If the user wishes, there should be a button to view documentation on how and why the SVM has classified the sequences as aromatase.

5. In Methods: provide the dataset in supplementary files and state the search date so that it does not become irrelevant for future researchers.

6. The Introduction and Discussion are not sufficiently well written and need to be validated by another expert. The authors can read some more articles to understand how to write an introduction as well as a discussion.

I wish the authors all the best for the revision and hope they will complete it as soon as possible.

Reviewer #3: Dear Authors,

Thank you for re-submitting your work to PLOS ONE journal. All my comments and suggestions have been covered successfully.

Thank you

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: 

Reviewer #3: Yes: 

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Mar 29;18(3):e0283567. doi: 10.1371/journal.pone.0283567.r004

Author response to Decision Letter 1


28 Feb 2023

Dear Editor,

Thank you for providing us an opportunity to submit a revision of our manuscript. Below is a point-by-point response to the reviewer. All the changes in the revised manuscript are highlighted in yellow.

Reviewer 2:

1. Make a graphical abstract (flow chart) for the complete study, from how you conceptualized it to what you obtained, followed by each method in depth. (https://www.ncbi.nlm.nih.gov/core/lw/2.0/html/tileshop_pmc/tileshop_pmc_inline.html?title=Click%20on%20image%20to%20zoom&p=PMC3&id=6386390_pcbi.1006730.g001.jpg)

Yes, we have prepared a graphical abstract as per your guidance, which is now Figure No. 6.

2. Kindly make the source code available through GitHub with proper documentation.

It would indeed be good to host our web tool on GitHub, and we will try this in our future studies. At present, our institute (CSIR-IMTECH) provides all the facilities to host our web tool, and it is also part of an open-source portal.

3. The server link is not working; make sure your institute keeps it live.

Yes, our server is live at https://bioinfo.imtech.res.in/servers/muthu/aromatase/submit.html

4. You have used different SVM approach-based models with amino acid, dipeptide composition, hybrid and evolutionary profiles to predict. This is good; however, the tool should run these in the background, and in the front end the user should see only the results. If the user wishes, there should be a button to view documentation on how and why the SVM has classified the sequences as aromatase.

Yes, we agree, but it is not possible to show all the programs running in the background. The tool consists of a set of programs written in Perl. All the programs are kept in the cgi-bin folder; they run only in the background and are not for public viewing. The output results are displayed in a new web page.

From a documentation point of view, it is a good idea for the user to be able to view the complete results. We will try this in our future studies.

5. In Methods: provide the dataset in supplementary files and state the search date so that it does not become irrelevant for future researchers.

Yes, we have provided the complete dataset as Supplementary File 1. The retrieval date is now mentioned in the dataset section (highlighted in yellow).

6. The Introduction and Discussion are not sufficiently well written and need to be validated by another expert. The authors can read some more articles to understand how to write an introduction as well as a discussion.

Yes, we have modified the Introduction and Discussion as per your guidance (highlighted in yellow).

Attachment

Submitted filename: Response to reviewers-25.2.23.doc

Decision Letter 2

Avaniyapuram Kannan Murugan

13 Mar 2023

Computational method for aromatase-related proteins using machine learning approach

PONE-D-22-24198R2

Dear Dr. Kaur,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Avaniyapuram Kannan Murugan, M.Phil., Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: N/A

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: Dear authors,

I went through the revisions and found them suitable. I recommend acceptance; the rest depends on the Editor's final decision.

Reviewer #3: Dear Authors,

Thank you for re-submitting your work to PLOS ONE journal. All comments and suggestions have been covered successfully.

Thank you

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: 

Reviewer #3: Yes: 

Acceptance letter

Avaniyapuram Kannan Murugan

17 Mar 2023

PONE-D-22-24198R2

Computational method for aromatase-related proteins using machine learning approach

Dear Dr. Kaur:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Avaniyapuram Kannan Murugan

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 File

    (DOC)

    Attachment

    Submitted filename: Response to reviewers.doc

    Attachment

    Submitted filename: Response to reviewers-25.2.23.doc

    Data Availability Statement

    All relevant data are within the paper and its Supporting Information files.


    Articles from PLOS ONE are provided here courtesy of PLOS
