Computational and Mathematical Methods in Medicine. 2022 Jan 12;2022:7493834. doi: 10.1155/2022/7493834

Identification of Helicobacter pylori Membrane Proteins Using Sequence-Based Features

Mujiexin Liu 1, Hui Chen 2, Dong Gao 3, Cai-Yi Ma 3, Zhao-Yue Zhang 2,3
PMCID: PMC8769816  PMID: 35069791

Abstract

Helicobacter pylori (H. pylori) is the most common risk factor for gastric cancer worldwide. The membrane proteins of H. pylori are involved in bacterial adherence and play a vital role in drug discovery. Thus, an accurate and cost-effective computational model is needed to predict the uncharacterized membrane proteins of H. pylori. In this study, a reliable benchmark dataset consisting of 114 membrane and 219 nonmembrane proteins was constructed based on UniProt. A support vector machine- (SVM-) based model was developed for discriminating H. pylori membrane proteins from nonmembrane proteins by using sequence information. Cross-validation showed that our method achieved good performance with an accuracy of 91.29%. It is anticipated that the proposed model will be useful for the annotation of H. pylori membrane proteins and the development of new anti-H. pylori agents.

1. Introduction

Helicobacter pylori (H. pylori) is a Gram-negative, spiral-shaped bacterium that infects half of the human population worldwide. H. pylori causes gastric mucosal damage, chronic inflammation, and dysregulation of the gut community, increasing the risk of gastric cancer [1–3]. Attachment to the gastric mucosa is the first step in establishing bacterial colonization [4]. H. pylori membrane proteins such as blood group antigen-binding adhesin (BabA), sialic acid-binding adhesin (SabA), outer inflammatory protein (OipA), and outer membrane protein Q (HopQ) can act as putative virulence factors that mediate host-pathogen interactions, induce the release of inflammatory cytokines, and enhance the virulence of the bacterium [4–6]. Thus, the identification of H. pylori membrane protein receptors contributes to the design of therapeutic drugs and vaccine development [7, 8].

Although H. pylori membrane proteins play a key role in attachment to and entry into host cells, only a few have been described so far. Some efforts have been made to predict membrane proteins [9, 10] in other bacteria such as Mycobacteria [11] and Chlamydiae [12]. However, there are no machine learning-based approaches for the prediction of H. pylori membrane proteins. In this study, we developed a comprehensive in silico approach for discriminating novel H. pylori membrane proteins using amino acid sequence-based criteria. First, the benchmark dataset was constructed based on a reliable source. Second, sequence-based feature encoding methods were used to represent protein sequences. Next, the incremental feature selection (IFS) technique with multiple feature ranking methods was applied to obtain the optimal feature set. Finally, a membrane protein prediction model was established based on the optimal feature set. The workflow is shown in Figure 1.

Figure 1. The workflow diagram for developing the H. pylori membrane protein prediction model.

2. Materials and Methods

2.1. Benchmark Dataset

An objective and strict benchmark dataset is fundamental for constructing a robust prediction model [13–18]. The Universal Protein Resource (UniProt) [19] is a comprehensive resource for proteins and can be freely accessed at https://www.uniprot.org/. A total of 382 H. pylori membrane protein sequences and 1111 nonmembrane protein sequences were obtained from UniProt. Sequences containing nonstandard letters were removed from the dataset. To avoid the influence of sequence similarity [20], CD-HIT [21] with a sequence identity threshold of 0.3 was used to exclude highly similar membrane proteins. Finally, 114 (29.8% of the original) membrane proteins and 219 (19.7% of the original) nonmembrane proteins remained in the benchmark dataset.
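The removal of sequences with nonstandard letters can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the function names and the toy records are hypothetical, and "standard" is assumed to mean the 20 canonical one-letter amino acid codes.

```python
# The 20 canonical amino acids (assumed definition of "standard letters").
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def has_only_standard_residues(sequence: str) -> bool:
    """Return True if the sequence is non-empty and uses only standard letters."""
    return len(sequence) > 0 and set(sequence.upper()) <= STANDARD_AMINO_ACIDS


def filter_sequences(records: dict[str, str]) -> dict[str, str]:
    """Keep only the records whose sequence has no nonstandard letters."""
    return {name: seq for name, seq in records.items()
            if has_only_standard_residues(seq)}


# Toy records with made-up identifiers (not real UniProt entries).
toy_records = {
    "seq_a": "MKKLLFAIPLVVPFYSHS",
    "seq_b": "MKXLLBAIPLU",  # contains X, B, and U, so it is removed
}
kept = filter_sequences(toy_records)
print(sorted(kept))
```

Redundancy reduction with CD-HIT would follow this step as a separate command-line run and is not reproduced here.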

2.2. Feature Encoding

Generally, feature encoding plays a crucial role in constructing machine learning models [22–28]. The feature encoding method determines how much sequence information can be mined. In this work, k-mer amino acid composition [29–31], the gapped k-mer method [32], and pseudo-amino acid composition (PseAAC) [33–39] were used to formulate sequences.

Let the protein S be expressed as follows:

$$S = R_1 R_2 R_3 R_4 R_5 \cdots R_i R_{i+1} \cdots R_L, \tag{1}$$

where $L$ denotes the length of the protein sequence and $R_i$ is the i-th amino acid.

By using k-mer amino acid composition, a primary protein sequence S can be transformed into a vector $V_k$ with $20^k$ elements according to the following formula:

$$V_k = \left[ f_1^{k\text{-mer}}, f_2^{k\text{-mer}}, \ldots, f_i^{k\text{-mer}}, \ldots, f_{20^k}^{k\text{-mer}} \right]^T, \tag{2}$$

where the symbol $T$ denotes the transposition of a vector and $f_i^{k\text{-mer}}$ is the normalized frequency of the i-th k-mer amino acid component occurring in S, which can be calculated by

$$f_i^{k\text{-mer}} = \frac{n_i}{\sum_{i=1}^{20^k} n_i} = \frac{n_i}{L - k + 1}, \tag{3}$$

where $n_i$ is the number of occurrences of the i-th k-mer amino acid component in the sequence S.
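As an illustration of equation (3), the following Python sketch counts overlapping k-mers and divides by the number of windows, L − k + 1. The function name and the toy sequence are hypothetical, and the sketch assumes the input contains only the 20 standard amino acid letters (any other letter raises a KeyError).

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(sequence: str, k: int = 2) -> dict[str, float]:
    """Normalized k-mer frequencies f_i = n_i / (L - k + 1)."""
    windows = len(sequence) - k + 1
    if windows <= 0:
        raise ValueError("sequence is shorter than k")
    # One entry per possible k-mer, so the vector always has 20**k elements.
    counts = {"".join(p): 0 for p in product(AMINO_ACIDS, repeat=k)}
    for start in range(windows):
        counts[sequence[start:start + k]] += 1
    return {kmer: n / windows for kmer, n in counts.items()}


# Toy sequence; its 2-mers are MK, KK, KL, LL, LK, KK.
features = kmer_composition("MKKLLKK", k=2)
print(len(features), features["KK"])
```

Because every possible k-mer gets an entry, most values are zero for short sequences or large k, which is the sparsity issue that motivates the gapped variant.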

As k increases, many k-mers may be absent from a given protein sequence, and its feature vector will contain a large number of zero values. To overcome this sparsity problem, the gapped k-mer (k-mer with g gaps) was used. For example, “GG” with 3 gaps constitutes the pattern “GNNNG,” where N represents any amino acid. By using the gapped k-mer method, a primary protein sequence S can be transformed into a vector $V_g$ with $20^k g$ elements according to the following formula:

$$V_g = \left[ f_1^{gk\text{-mer}}, f_2^{gk\text{-mer}}, \ldots, f_i^{gk\text{-mer}}, \ldots, f_{20^k g}^{gk\text{-mer}} \right]^T, \tag{4}$$

where $f_i^{gk\text{-mer}}$ is the normalized frequency of the i-th k-mer with g gaps occurring in S.
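A gapped 2-mer count for a single gap value g can be sketched as follows. The text does not spell out the normalization, so dividing by the number of valid windows, L − g − 1, is an assumption made here by analogy with equation (3); the function name and toy sequence are likewise hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def gapped_dimer_composition(sequence: str, gap: int) -> dict[str, float]:
    """Frequencies of residue pairs separated by `gap` arbitrary residues."""
    windows = len(sequence) - gap - 1
    if windows <= 0:
        raise ValueError("sequence is too short for this gap")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(windows):
        # Pair the residue at position i with the one gap + 1 positions later.
        counts[sequence[i] + sequence[i + gap + 1]] += 1
    # Assumed normalization: divide by the number of windows.
    return {pair: n / windows for pair, n in counts.items()}


# With gap=3, "GG" matches the pattern G-N-N-N-G.
features = gapped_dimer_composition("GAAAGCCCG", gap=3)
print(features["GG"], features["AC"])
```

In the toy sequence there are five windows, two of which match G-N-N-N-G and three of which match A-N-N-N-C.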

PseAAC can represent a protein sequence in a discrete model without completely losing its sequence-order information. A primary protein sequence S can be transformed into a vector $V_p$ with PseAAC according to the following formula:

$$V_p = \left[ x_1, \ldots, x_{20}, x_{20+1}, \ldots, x_{20+\lambda} \right]^T, \tag{5}$$

$$x_i = \begin{cases} \dfrac{f_i}{\sum_{i=1}^{20} f_i + \omega \sum_{j=1}^{\lambda} \Theta_j}, & 1 \le i \le 20, \\[2ex] \dfrac{\omega \Theta_{i-20}}{\sum_{i=1}^{20} f_i + \omega \sum_{j=1}^{\lambda} \Theta_j}, & 20 + 1 \le i \le 20 + \lambda, \end{cases} \tag{6}$$

where $f_i$ is the normalized frequency of the i-th amino acid, and $\Theta_j$ is the j-th sequence correlation factor, which can be calculated from the product of the six physicochemical property values of amino acids at different positions. $\omega$ is the weight factor for the short-range and long-range effects.
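The assembly step in equation (6) can be sketched independently of how the correlation factors are obtained. The sketch below takes the 20 amino acid frequencies and the λ correlation factors as given inputs; the function name and the numeric values are hypothetical, and computing Θ_j from physicochemical properties is deliberately left out because the property tables are not listed in the text.

```python
def pseaac_vector(freqs: list[float], thetas: list[float],
                  omega: float = 0.5) -> list[float]:
    """Combine 20 frequencies and lambda correlation factors as in equation (6)."""
    if len(freqs) != 20:
        raise ValueError("expected 20 amino acid frequencies")
    # Shared denominator of both branches of equation (6).
    denominator = sum(freqs) + omega * sum(thetas)
    composition_part = [f / denominator for f in freqs]
    correlation_part = [omega * t / denominator for t in thetas]
    return composition_part + correlation_part


# Hypothetical inputs: uniform composition and three made-up correlation factors.
freqs = [0.05] * 20
thetas = [0.2, 0.1, 0.3]
vector = pseaac_vector(freqs, thetas, omega=0.5)
print(len(vector), round(sum(vector), 6))
```

A useful sanity check is that the 20 + λ components always sum to 1, since the numerators add up to the shared denominator.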

2.3. Feature Selection and Modeling

To exclude noise and improve computational efficiency, feature selection is an indispensable step [23, 40–45]. The binomial distribution is an effective feature selection technique that has been successfully applied in many works [46–48]. A high binomial distribution score indicates that the presence of the k-mer in a membrane protein sequence is not accidental. Analysis of variance (ANOVA) uses the ratio of the variance between groups to the variance within groups to analyse the differences among group means [30]. A high ANOVA score means that the feature differs markedly between the membrane protein group and the nonmembrane protein group. In this study, the binomial distribution was used on k-mer features, and ANOVA was used on gapped k-mer and PseAAC features to winnow out the irrelevant features. Then, ANOVA was used again to prune the redundant features from the merged set.
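The ANOVA score described above is the one-way F statistic, that is, the between-group mean square divided by the within-group mean square. A minimal Python sketch follows; the feature names and values are invented for illustration, and the binomial distribution score is not reproduced because its formula is not given in the text.

```python
def anova_f_score(groups: list[list[float]]) -> float:
    """One-way ANOVA F statistic for one feature across sample groups."""
    n_total = sum(len(g) for g in groups)
    n_groups = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (n_groups - 1)
    # Raises ZeroDivisionError if every group is constant (zero within-group variance).
    ms_within = ss_within / (n_total - n_groups)
    return ms_between / ms_within


# Hypothetical feature values: (membrane samples, nonmembrane samples).
toy_features = {
    "feature_1": ([0.04, 0.05, 0.06], [0.01, 0.02, 0.03]),
    "feature_2": ([0.02, 0.03, 0.04], [0.02, 0.03, 0.05]),
}
scores = {name: anova_f_score([pos, neg])
          for name, (pos, neg) in toy_features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Sorting features by this score in descending order gives the ranked list that a subsequent incremental selection step consumes.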

After ranking the features according to their statistical scores, the IFS strategy with a support vector machine (SVM) was adopted to determine the optimal feature set [49–53]. SVM is a classification algorithm that finds the optimal separating hyperplane in a high-dimensional feature space. The IFS strategy added features to the feature set one by one, from the highest-ranked to the lowest-ranked. Each time a new feature set was composed, LIBSVM [54] with 5-fold cross-validation was used to train and test a prediction model. The optimal feature set was defined as the one whose prediction model achieved the maximum accuracy. Finally, an SVM model was constructed based on the optimal feature subset for membrane protein prediction.
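The IFS loop itself is independent of the classifier and can be sketched with a pluggable scoring function. In the sketch below, `evaluate` stands in for the 5-fold cross-validated accuracy of an SVM on a given feature subset; the accuracy values are invented, the feature names are hypothetical, and keeping the smallest subset when accuracies tie is an assumption rather than a rule stated in the text.

```python
from typing import Callable, Sequence


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> tuple[list[str], float]:
    """Grow the feature set in rank order and keep the best-scoring prefix."""
    best_subset: list[str] = []
    best_score = float("-inf")
    for size in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:size])
        score = evaluate(subset)
        # Strict comparison: on ties the smaller subset is kept (assumption).
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Stand-in for cross-validated SVM accuracy; the numbers are made up.
toy_accuracy_by_size = {1: 0.80, 2: 0.88, 3: 0.91, 4: 0.91, 5: 0.89}


def toy_evaluate(subset: Sequence[str]) -> float:
    return toy_accuracy_by_size[len(subset)]


ranked = ["f1", "f2", "f3", "f4", "f5"]
best_subset, best_score = incremental_feature_selection(ranked, toy_evaluate)
print(best_subset, best_score)
```

Plotting the score returned for each subset size against the number of features gives an IFS curve of the kind used to choose the optimal feature set.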

2.4. Performance Evaluation Metrics

To assess the capability of the binary prediction method, six indexes, namely, accuracy (ACC), sensitivity (Sn), specificity (Sp), precision (Pre), Matthews correlation coefficient (MCC), and the area under the receiver operating characteristic curve (AUC) [55–60], were used. The first five are formulated as

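$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{7}$$

$$\mathrm{Sn} = \frac{TP}{TP + FN}, \tag{8}$$

$$\mathrm{Sp} = \frac{TN}{TN + FP}, \tag{9}$$

$$\mathrm{Pre} = \frac{TP}{TP + FP}, \tag{10}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FN)(TP + FP)(TN + FP)(TN + FN)}}, \tag{11}$$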

where TP (true positive) and TN (true negative) represent the numbers of correctly identified membrane proteins and nonmembrane proteins, respectively. FP (false positive) and FN (false negative) denote the number of nonmembrane proteins incorrectly classified as membrane proteins and the number of membrane proteins incorrectly classified as nonmembrane proteins, respectively. Receiver operating characteristic (ROC) analysis was used to measure the performance of the model under varying decision thresholds [61–63]. Because of the small sample size, the results of the 5-fold cross-validation were used to evaluate the model performance.
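Equations (7)–(11) translate directly into code. In the sketch below, the confusion-matrix counts are not reported in the paper; they are back-calculated here, as an assumption, from the class sizes (114 membrane, 219 nonmembrane) and the Sn and Sp values reported in Section 3.2, and they are used only to check that the formulas are mutually consistent with the reported ACC, Pre, and MCC.

```python
from math import sqrt


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """ACC, Sn, Sp, Pre, and MCC from confusion-matrix counts."""
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "Sn": tp / (tp + fn),
        "Sp": tn / (tn + fp),
        "Pre": tp / (tp + fp),
        "MCC": (tp * tn - fp * fn)
        / sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn)),
    }


# Inferred counts (assumption): 94 of 114 membrane and 210 of 219 nonmembrane
# proteins correctly classified.
metrics = binary_metrics(tp=94, tn=210, fp=9, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these inferred counts, the computed values round to the figures reported for the final model.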

3. Results and Discussion

3.1. Feature Optimization

As shown in equations (3), (4), and (5), the description of the protein sequences depends on the parameters k, g, ω, and λ. For k-mer feature encoding, k = 2, 3, and 4 were tried in this study. The model achieved the best accuracy of 90.09% with the top 150 binomial distribution-ranked 2-mer features (Figure 2(a)). For gapped k-mer feature encoding, we set k = 2 and varied g from 1 to 20; at g = 15, the model achieved the best accuracy of 90.39% with the top 89 ANOVA-ranked features (Figure 2(b)). For PseAAC, we set the weight factor ω = 0.5 and varied the parameter λ from 1 to 70 with a step size of 5; the best performance achieved was 88.59% when λ was 20 and the number of features was 10 (Figure 2(c)). To represent the sequence information comprehensively, all the best feature subsets were merged and ranked by ANOVA. IFS was performed again to filter out the redundant features. As shown in Figure 2(d), the model achieved the best accuracy of 91.29% when the top 109 ANOVA-ranked features were used to train the model.

Figure 2. The IFS curves for (a) 2-mer features, (b) gapped 2-mer features, (c) PseAAC features, and (d) merged features.

3.2. Model Construction and Evaluation

Finally, 109 features were used to construct the SVM-based model for the prediction of membrane proteins. The soft-margin SVM penalty coefficient c and the Gaussian kernel width parameter γ were both 0.5.
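For reference, the Gaussian (RBF) kernel as implemented in LIBSVM has the form K(u, v) = exp(−γ‖u − v‖²). The sketch below evaluates it for γ = 0.5; the function name and the toy vectors are hypothetical, and reading the paper's γ as LIBSVM's kernel parameter is an assumption.

```python
from math import exp


def rbf_kernel(u: list[float], v: list[float], gamma: float = 0.5) -> float:
    """Gaussian kernel exp(-gamma * ||u - v||^2) between two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return exp(-gamma * squared_distance)


# Toy two-dimensional vectors (real feature vectors would have 109 elements).
same = rbf_kernel([0.1, 0.2], [0.1, 0.2])
apart = rbf_kernel([1.0, 0.0], [0.0, 0.0])
print(same, apart)
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart, with γ controlling how fast.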

To show the prediction capability of the final model, six evaluation metrics were calculated based on the results of the 5-fold cross-validation. The model achieved an ACC of 91.29%, Sn of 82.46%, Sp of 95.9%, Pre of 91.26%, and MCC of 0.804. The ROC curve is shown in Figure 3. The AUC reaches 0.931, suggesting that the proposed model has an excellent prediction capability for membrane protein classification.

Figure 3. The ROC curves of the 5-fold cross-validation test.

3.3. Amino Acid Composition (AAC) of Optimal Features

The AAC of the model features was used to analyse the preference of membrane proteins for specific amino acids. The optimal feature set contains 83 2-mer features, 16 gapped 2-mer features, and 10 PseAAC features. Focusing on the 2-mer and gapped 2-mer features, we found that leucine (L), glutamic acid (E), aspartic acid (D), phenylalanine (F), valine (V), and histidine (H) together account for more than 50% of the total occurrences (Figure 4(a)). The frequencies of F, L, and V in membrane protein sequences are significantly higher than those in nonmembrane protein sequences (p < 0.001). In contrast, the frequencies of D, E, and H in nonmembrane protein sequences are significantly higher than those in membrane proteins (p < 0.001) (Figure 4(b)).

Figure 4. (a) The heat map of AAC of the model features. (b) The frequency of the six amino acids in the two classes.

4. Conclusions

H. pylori membrane proteins are an important class of molecules that play key roles in host-pathogen interactions. However, predicting H. pylori membrane proteins with machine learning methods is still a new area. Hence, we developed an H. pylori membrane protein predictor based on sequence information. We expect the model to support the discovery of H. pylori membrane proteins and research on H. pylori infection. It may also contribute to the identification of novel vaccine candidate antigens and to drug development [64, 65]. In the future, we will continue to work on H. pylori membrane protein prediction and screen possible vaccine candidates and drug targets. Moreover, we will collect more data to train a deep learning model [66–71] to improve prediction performance.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (62102067).

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

References

1. Li Z., Zhang T., Lei H., et al. Research on gastric cancer's drug-resistant gene regulatory network model. Current Bioinformatics. 2020;15(3):225–234. doi: 10.2174/1574893614666190722102557.
2. Cheng L., Qi C., Zhuang H., Fu T., Zhang X. gutMDisorder: a comprehensive database for dysbiosis of the gut microbiota in disorders and interventions. Nucleic Acids Research. 2020;48(D1):D554–D560. doi: 10.1093/nar/gkz843.
3. Cheng L., Qi C., Yang H., et al. gutMGene: a comprehensive database for target genes of gut microbes and microbial metabolites. Nucleic Acids Research. 2021. doi: 10.1093/nar/gkab786.
4. Matsuo Y., Kido Y., Yamaoka Y. Helicobacter pylori outer membrane protein-related pathogenesis. Toxins (Basel). 2017;9(3):p. 101. doi: 10.3390/toxins9030101.
5. Ansari S., Kabamba E. T., Shrestha P. K., et al. Helicobacter pylori Bab characterization in clinical isolates from Bhutan, Myanmar, Nepal and Bangladesh. PLoS One. 2017;12(11, article e0187225). doi: 10.1371/journal.pone.0187225.
6. Suganuma M., Kurusu M., Okabe S., et al. Helicobacter pylori membrane protein 1: a new carcinogenic factor of Helicobacter pylori. Cancer Research. 2001;61(17):6356–6359.
7. Yamaoka Y., Ojo O., Fujimoto S., et al. Helicobacter pylori outer membrane proteins and gastroduodenal disease. Gut. 2006;55(6):775–781. doi: 10.1136/gut.2005.083014.
8. Yu L., Xia M., An Q. A network embedding framework based on integrating multiplex network for drug combination prediction. Briefings in Bioinformatics. 2021. doi: 10.1093/bib/bbab364.
9. Kabir M., Arif M., Ali F., Ahmad S., Swati Z. N. K., Yu D. J. Prediction of membrane protein types by exploring local discriminative information from evolutionary profiles. Analytical Biochemistry. 2019;564-565:123–132. doi: 10.1016/j.ab.2018.10.027.
10. Zuo Y. C., Su W. X., Zhang S. H., et al. Discrimination of membrane transporter protein types using K-nearest neighbor method derived from the similarity distance of total diversity measure. Molecular BioSystems. 2015;11(3):950–957. doi: 10.1039/C4MB00681J.
11. Ding C., Yuan L. F., Guo S. H., Lin H., Chen W. Identification of mycobacterial membrane proteins and their types using over-represented tripeptide compositions. Journal of Proteomics. 2012;77:321–328. doi: 10.1016/j.jprot.2012.09.006.
12. Heinz E., Tischler P., Rattei T., Myers G., Wagner M., Horn M. Comprehensive in silico prediction and analysis of chlamydial outer membrane proteins reflects evolution and life style of the Chlamydiae. BMC Genomics. 2009;10:p. 634. doi: 10.1186/1471-2164-10-634.
13. Zhang D., Chen H.-D., Zulfiqar H., et al. iBLP: an XGBoost-based predictor for identifying bioluminescent proteins. Computational and Mathematical Methods in Medicine. 2021;2021. doi: 10.1155/2021/6664362.
14. Su W., Liu M. L., Yang Y. H., et al. PPD: a manually curated database for experimentally verified prokaryotic promoters. Journal of Molecular Biology. 2021;433(11, article ???). doi: 10.1016/j.jmb.2021.166860.
15. Cheng L., Hu Y., Sun J., Zhou M., Jiang Q. DincRNA: a comprehensive web-based bioinformatics toolkit for exploring disease associations and ncRNA function. Bioinformatics. 2018;34(11):1953–1956. doi: 10.1093/bioinformatics/bty002.
16. Wei L., He W., Malik A., Su R., Cui L., Manavalan B. Computational prediction and interpretation of cell-specific replication origin sites from multiple eukaryotes by exploiting stacking framework. Briefings in Bioinformatics. 2021;22(4). doi: 10.1093/bib/bbaa275.
17. Hasan M. M., Alam M. A., Shoombuatong W., Deng H. W., Manavalan B., Kurata H. NeuroPred-FRL: an interpretable prediction model for identifying neuropeptide using feature representation learning. Briefings in Bioinformatics. 2021;22(6). doi: 10.1093/bib/bbab167.
18. Charoenkwan P., Nantasenamat C., Hasan M. M., Manavalan B., Shoombuatong W. Bert4bitter: a bidirectional encoder representations from transformers (Bert)-based model for improving the prediction of bitter peptides. Bioinformatics. 2021;37(17):2556–2562. doi: 10.1093/bioinformatics/btab133.
19. UniProt C. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Research. 2021;49(D1):D480–D489. doi: 10.1093/nar/gkaa1100.
20. Zou Q., Lin G., Jiang X., Liu X., Zeng X. Sequence clustering in bioinformatics: an empirical study. Briefings in Bioinformatics. 2020;21(1):1–10. doi: 10.1093/bib/bby090.
21. Fu L., Niu B., Zhu Z., Wu S., Li W. Cd-Hit: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28(23):3150–3152. doi: 10.1093/bioinformatics/bts565.
22. Zulfiqar H., Sun Z. J., Huang Q. L., et al. Deep-4mCW2V: a sequence-based predictor to identify N4-methylcytosine sites in Escherichia coli. Methods. 2021. doi: 10.1016/j.ymeth.2021.07.011.
23. Zhang D., Xu Z. C., Su W., et al. PROBselect: accurate prediction of protein-binding residues from proteins sequences via dynamic predictor selection. Bioinformatics. 2020;36(Supplement_2):i735–i744. doi: 10.1093/bioinformatics/btaa806.
24. Yang H., Luo Y., Ren X., et al. Risk prediction of diabetes: big data mining with fusion of multifarious physical examination indicators. Information Fusion. 2021;75:140–149. doi: 10.1016/j.inffus.2021.02.015.
25. Long J., Yang H., Yang Z., et al. Integrated biomarker profiling of the metabolome associated with impaired fasting glucose and type 2 diabetes mellitus in large-scale Chinese patients. Clinical and Translational Medicine. 2021;11(6, article e432). doi: 10.1002/ctm2.432.
26. Lv H., Dao F. Y., Guan Z. X., Yang H., Li Y. W., Lin H. Landscape of cancer diagnostic biomarkers from specifically expressed genes. Briefings in Bioinformatics. 2020;21(6):2175–2184. doi: 10.1093/bib/bbz131.
27. Yu L., Wang M., Yang Y., et al. Predicting therapeutic drugs for hepatocellular carcinoma based on tissue-specific pathways. PLoS Computational Biology. 2021;17(2, article e1008696). doi: 10.1371/journal.pcbi.1008696.
28. Chen X. G., Shi W. W., Deng L. Prediction of disease comorbidity using HeteSim scores based on multiple heterogeneous networks. Current Gene Therapy. 2019;19(4):232–241. doi: 10.2174/1566523219666190917155959.
29. Liu M. L., Su W., Wang J. S., Yang Y. H., Yang H., Lin H. Predicting preference of transcription factors for methylated DNA using sequence information. Mol Ther Nucleic Acids. 2020;22:1043–1050. doi: 10.1016/j.omtn.2020.07.035.
30. Tang H., Zhao Y. W., Zou P., et al. HBPred: a tool to identify growth hormone-binding proteins. International Journal of Biological Sciences. 2018;14(8):957–964. doi: 10.7150/ijbs.24174.
31. Zuo Y., Li Y., Chen Y., Li G., Yan Z., Yang L. PseKRAAC: a flexible web server for generating pseudo K-tuple reduced amino acids composition. Bioinformatics. 2017;33(1):122–124. doi: 10.1093/bioinformatics/btw564.
32. Tan J. X., Li S. H., Zhang Z. M., et al. Identification of hormone binding proteins based on machine learning methods. Mathematical Biosciences and Engineering. 2019;16(4):2466–2480. doi: 10.3934/mbe.2019123.
33. Zheng L., Liu D., Yang W., Yang L., Zuo Y. Location deviations of DNA functional elements affected SNP mapping in the published databases and references. Briefings in Bioinformatics. 2020;21(4):1293–1301. doi: 10.1093/bib/bbz073.
34. Zheng L., Huang S., Mu N., et al. RAACBook: a web server of reduced amino acid alphabet for sequence-dependent inference by using Chou's five-step rule. Database: The Journal of Biological Databases and Curation. 2019;2019. doi: 10.1093/database/baz131.
35. Cao Y. Y., Yu C. L., Huang S. H., Wang S. Y., Zuo Y. C., Yang L. Characterization and prediction of presynaptic and postsynaptic neurotoxins based on reduced amino acids and biological properties. Current Bioinformatics. 2021;16(3):364–370. doi: 10.2174/1574893615999200707150512.
36. Shen H. B., Chou K. C. PseAAC: a flexible web server for generating various kinds of protein pseudo amino acid composition. Analytical Biochemistry. 2008;373(2):386–388. doi: 10.1016/j.ab.2007.10.012.
37. Naseer S., Hussain W., Khan Y. D., Rasool N. Sequence-based identification of arginine amidation sites in proteins using deep representations of proteins and PseAAC. Current Bioinformatics. 2021;15(8):937–948. doi: 10.2174/1574893615666200129110450.
38. Hasan M. A. M., Ben Islam M. K., Rahman J., Ahmad S. Citrullination site prediction by incorporating sequence coupled effects into PseAAC and resolving data imbalance issue. Current Bioinformatics. 2020;15(3):235–245. doi: 10.2174/1574893614666191202152328.
39. Amanat S., Ashraf A., Hussain W., Rasool N., Khan Y. D. Identification of lysine carboxylation sites in proteins by integrating statistical moments and position relative features via general PseAAC. Current Bioinformatics. 2020;15(5):396–407. doi: 10.2174/1574893614666190723114923.
40. Han X., Kong Q., Liu C., Cheng L., Han J. Subtypedrug: a software package for prioritization of candidate cancer subtype-specific drugs. Bioinformatics. 2021;37(16):2491–2493. doi: 10.1093/bioinformatics/btab011.
41. Sheng Y., Jiang Y., Yang Y., et al. Selecting gene features for unsupervised analysis of single-cell gene expression data. Briefings in Bioinformatics. 2021;22(6). doi: 10.1093/bib/bbab295.
42. Yang W., Zhu X. J., Huang J., Ding H., Lin H. A brief survey of machine learning methods in protein sub-Golgi localization. Current Bioinformatics. 2019;14:234–240. doi: 10.2174/1574893613666181113131415.
43. He S., Guo F., Zou Q., Ding H. MRMD2.0: a Python tool for machine learning with feature ranking and reduction. Current Bioinformatics. 2021;15(10):1213–1221. doi: 10.2174/1574893615999200503030350.
44. Wu X., Yu L. EPSOL: sequence-based protein solubility prediction using multidimensional embedding. Oxford, England: Bioinformatics; 2021.
45. Li J. W., Wang X. Y., Li N., et al. Feasibility of mesenchymal stem cell therapy for Covid-19: a mini review. Current Gene Therapy. 2020;20(4):285–288. doi: 10.2174/1566523220999200820172829.
46. Zhang Z. Y., Yang Y. H., Ding H., Wang D., Chen W., Lin H. Design powerful predictor for mRNA subcellular location prediction in Homo sapiens. Briefings in Bioinformatics. 2021;22(1):526–535. doi: 10.1093/bib/bbz177.
47. Feng C. Q., Zhang Z. Y., Zhu X. J., et al. iTerm-PseKNC: a sequence-based tool for predicting bacterial transcriptional terminators. Bioinformatics. 2019;35(9):1469–1477. doi: 10.1093/bioinformatics/bty827.
48. Wang H., Liang P., Zheng L., Long C., Li H., Zuo Y. Correction to: ncDLRES: a novel method for non-coding RNAs family prediction based on dynamic LSTM and ResNet. Bioinformatics. 2021;22(1). doi: 10.1186/s12859-021-04495-9.
49. Dao F. Y., Lv H., Wang F., et al. Identify origin of replication in Saccharomyces cerevisiae using two-step feature selection technique. Bioinformatics. 2019;35(12):2075–2083. doi: 10.1093/bioinformatics/bty943.
50. Ao C., Yu L., Zou Q. Prediction of bio-sequence modifications and the associations with diseases. Briefings in Functional Genomics. 2021;20(1):1–18. doi: 10.1093/bfgp/elaa023.
51. Basith S., Lee G., Manavalan B. Stallion: a stacking-based ensemble learning framework for prokaryotic lysine acetylation site prediction. Briefings in Bioinformatics. 2021. doi: 10.1093/bib/bbab376.
52. Basith S., Hasan M. M., Lee G., Wei L., Manavalan B. Integrative machine learning framework for the identification of cell-specific enhancers from the human genome. Briefings in Bioinformatics. 2021;22(6). doi: 10.1093/bib/bbab252.
53. Hasan M. M., Schaduangrat N., Basith S., Lee G., Shoombuatong W., Manavalan B. HLPpred-Fuse: improved and robust prediction of hemolytic peptide and its activity by fusing multiple feature representation. Bioinformatics. 2020;36(11):3350–3356. doi: 10.1093/bioinformatics/btaa160.
54. Chang C. C., Lin C. J. LIBSVM. ACM Transactions on Intelligent Systems and Technology. 2011;2(3):1–27. doi: 10.1145/1961189.1961199.
55. Manavalan B., Shin T. H., Lee G. PVP-SVM: sequence-based prediction of phage virion proteins using a support vector machine. Frontiers in Microbiology. 2018;9:p. 476. doi: 10.3389/fmicb.2018.00476.
56. Tang H., Cao R. Z., Wang W., Liu T. S., Wang L. M., He C. M. A two-step discriminated method to identify thermophilic proteins. International Journal of Biomathematics. 2017;10(4):p. 1750050. doi: 10.1142/S1793524517500504.
57. Cheng L., Shi H., Wang Z., et al. IntNetLncSim: an integrative network analysis method to infer human lncRNA functional similarity. Oncotarget. 2016;7(30):47864–47874. doi: 10.18632/oncotarget.10012.
58. Mo F., Luo Y., Fan D. A., et al. Integrated analysis of mRNA-seq and miRNA-seq to identify c-MYC, YAP1 and miR-3960 as major players in the anticancer effects of caffeic acid phenethyl ester in human small cell lung cancer cell line. Current Gene Therapy. 2020;20(1):15–24. doi: 10.2174/1566523220666200523165159.
59. Govindaraj R. G., Subramaniyam S., Manavalan B. Extremely-randomized-tree-based prediction of N(6)-methyladenosine sites in saccharomyces cerevisiae. Current Genomics. 2020;21(1):26–33. doi: 10.2174/1389202921666200219125625.
60. Basith S., Manavalan B., Hwan Shin T., Lee G. Machine intelligence in peptide therapeutics: a next-generation tool for rapid disease screening. Medicinal Research Reviews. 2020;40(4):1276–1314. doi: 10.1002/med.21658.
61. Metz C. E. Basic principles of ROC analysis. Seminars in Nuclear Medicine. 1978;8(4):283–298. doi: 10.1016/S0001-2998(78)80014-2.
62. Lv H., Dao F. Y., Zulfiqar H., Lin H. DeepIPs: comprehensive assessment and computational identification of phosphorylation sites of Sars-Cov-2 infection using a deep learning-based approach. Briefings in Bioinformatics. 2021;22(6). doi: 10.1093/bib/bbab244.
63. An Q., Yu L. A heterogeneous network embedding framework for predicting similarity-based drug-target interactions. Briefings in Bioinformatics. 2021;22(6). doi: 10.1093/bib/bbab275.
64. Liu D., Li G., Zuo Y. Function determinants of Tet proteins: the arrangements of sequence motifs with specific codes. Briefings in Bioinformatics. 2019;20(5):1826–1835. doi: 10.1093/bib/bby053.
65. Xu B. F., Liu D. Y., Wang Z. R., Tian R. X., Zuo Y. C. Multi-substrate selectivity based on key loops and non-homologous domains: new insight into ALKBH family. Cellular and Molecular Life Sciences. 2021;78(1):129–141. doi: 10.1007/s00018-020-03594-9.
66. Wang D., Zhang Z., Jiang Y., et al. DM3Loc: multi-label mRNA subcellular localization prediction and analysis based on multi-head self-attention mechanism. Nucleic Acids Research. 2021;49(8, article e46). doi: 10.1093/nar/gkab016.
67. Dao F. Y., Lv H., Su W., Sun Z. J., Huang Q. L., Lin H. iDHS-Deep: an integrated tool for predicting DNase I hypersensitive sites by deep neural network. Briefings in Bioinformatics. 2021;22(5). doi: 10.1093/bib/bbab047.
68. Zhang Y., Yan J., Chen S., et al. Review of the applications of deep learning in bioinformatics. Current Bioinformatics. 2020;15(8):898–911.
69. Cui F., Zhang Z., Zou Q. Sequence representation approaches for sequence-based protein prediction tasks that use deep learning. Briefings in Functional Genomics. 2021;20(1):61–73. doi: 10.1093/bfgp/elaa030.
70. Peng X., Chen L., Zhou J.-P. Identification of carcinogenic chemicals with network embedding and deep learning methods. Current Bioinformatics. 2021;15(9):1017–1026.
71. Lv Z. B., Ao C. Y., Zou Q. Protein function prediction: from traditional classifier to deep learning. Proteomics. 2019;19(14):p. 2. doi: 10.2174/1574893615999200414084317.



Articles from Computational and Mathematical Methods in Medicine are provided here courtesy of Wiley
