PLOS One. 2024 Oct 4;19(10):e0311301. doi: 10.1371/journal.pone.0311301

Comprehensive in silico analyses of fifty-one uncharacterized proteins from Vibrio cholerae

Sritapa Basu Mallick 1, Sagarika Das 1,¤, Aravind Venkatasubramanian 1, Sourabh Kundu 2, Partha Pratim Datta 1,*
Editor: Rajesh Kumar Pathak3
PMCID: PMC11452002  PMID: 39365770

Abstract

Due to the rise of multidrug-resistant strains of Vibrio cholerae and the recent cholera outbreaks in African and Asian nations, it is imperative to identify novel therapeutic targets and possible vaccine candidates. In this regard, this work primarily aims to identify and characterize new antigenic molecules using comparative RNA sequencing data and label-free proteomics data obtained from a knockdown strain of the essential GTPase cgtA and the wild-type strain of V. cholerae. We identified 51 hitherto uncharacterized proteins from the high-throughput RNA-sequencing and proteomics data. This work involved the assessment of their physicochemical characteristics, subcellular localization, solubility, structures, and functional annotations. In addition, immunoinformatic and reverse vaccinology techniques were used to find new vaccine targets with high antigenicity, low allergenicity, and low toxicity profiles. Among the 51 proteins, 24 were selected based on their immunogenic profiles to identify B/T-cell epitopes. In addition, 20 prospective therapeutic targets were identified using virulence predictions and related investigations. Furthermore, two proteins, UniProt ID- Q9KRD2 and Q9KU58, with molecular weights of 92 kDa and 12 kDa, respectively, were chosen for cloning and expression towards in vitro biochemical characterization based on their range of expression patterns and their high antigenicity, low allergenicity, and low toxicity. In conclusion, we believe that this study will reveal new facets and avenues for drug discovery and put us a step forward toward novel therapeutic interventions against the deadly disease of cholera.

Introduction

V. cholerae is a pathogenic Gram-negative bacterium that is known to be the etiological driver of the deadly diarrheal disease cholera. Approximately 1.3–4 million people are estimated to be affected annually, and around 95,000 people die annually in 51 endemic countries due to this fatal disease [1]. Instances of cholera have been observed globally and etiologically tied to the consumption of contaminated water or seafood, which is often infested with a cocktail of various strains of Vibrio spp., such as V. cholerae, V. vulnificus, and V. parahaemolyticus [2]. This, coupled with the emergence of antibiotic-resistant variants of cholera [3], has driven cholera research worldwide, including in developed countries. Since we dwell in an era of multidrug resistance (MDR), there is the utmost need to explore and study new potential drug targets.

CgtA is a well-studied potential drug target [4] and is an essential ribosome-associated GTPase that is known to play a major role in various cellular functions, such as cell growth and DNA replication [5], chromosome partitioning [6], DNA repair [7], ribosome biogenesis [8–10] and stringent response [11]. However, the mechanism by which CgtA exerts its pleiotropic effect is still unknown. In one of our recent studies [12], high-throughput techniques like RNA-seq and label-free proteomics were used to conduct a comparative study between the cgtA knockdown and wild-type Vibrio cholerae strains, in which we identified several uncharacterized proteins whose expression patterns were significantly altered when cgtA was knocked down in V. cholerae. Understanding these biochemically, structurally and functionally uncharacterized proteins can pave the way towards mechanistic insights into how this essential GTPase, CgtA, exerts its pleiotropic effect. Hence, we carried out a comprehensive in silico analysis of these uncharacterized proteins, which allowed us to predict their physicochemical and immunogenic properties and to hypothesize and design various in vitro and in vivo experiments to characterize these proteins, which will lead us to understand the basis of cholera pathogenesis at a deeper level. We have also identified a number of potential drug targets and vaccine candidates among these 51 uncharacterized proteins, which will facilitate the production of various vaccine constructs and drugs through in vivo and in vitro experiments against the Vibrio cholerae pathogen. Recently, a similar reverse vaccinology study has also been carried out in V. parahaemolyticus [13].

In addition, we have cloned and expressed two uncharacterized proteins, which validated our in silico and bioinformatics studies. Fig 1 summarizes the objectives, strategic pipeline, and approaches adopted in this study. Our study is expected to reveal new facets of cholera pathogenesis and bring us a step closer to novel therapeutic interventions against this disease.

Fig 1. Schematic workflow of the comprehensive in silico study of 51 uncharacterized and hypothetical proteins. The blue-colored boxes indicate the methodology and approaches that have been followed in the study. The green-colored boxes indicate the tools that were used for each corresponding approach.

Materials and methods

Sequence retrieval, and data collection

V. cholerae serotype O1 (strain ATCC 39315/El Tor Inaba N16961) was used in the present study. The complete genome sequence of V. cholerae serotype O1 was first reported in 2000 [14]. RNA-seq data that was used for this study were collected from the European Nucleotide Archive (ENA) at EMBL-EBI with BioProject accession number PRJEB53015 (SRA accession number ERP137772) (https://www.ebi.ac.uk/ena/browser/view/PRJE). Furthermore, the mass spectrometry proteomics data were collected from the ProteomeXchange Consortium with the data set identifier PXD034015.

Quality control and transcriptomics data analysis

The quality of the RNA-seq data generated by Das et al., 2023 [12] was checked using FastQC and MultiQC software. The data were checked for base call quality distribution, percentage of bases above Q20 and Q30, %GC, and sequencing adapter contamination. All the samples passed the quality control (QC) threshold (Q20 > 95%). The raw sequence reads were processed to remove adapter sequences and low-quality bases using fastp. The QC-passed reads were mapped onto the indexed V. cholerae reference genome (O1 biovar El Tor strain N16961) using the Bowtie 2 aligner. On average, 98.55% of the reads were aligned to the reference genome. Gene-level expression values were obtained as read counts using featureCounts software. Differential expression analysis was carried out using the DESeq2 package. Read counts were normalized (variance-stabilized normalized counts) using DESeq2, and differential enrichment analysis was performed. The test sample was compared to the control sample. Genes with an absolute log2 fold change of ≥1 and a p-value of ≤0.05 were considered significant. The p-values and the fold changes are noted in S1 Table.
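The significance criterion stated above (absolute log2 fold change ≥1 and p-value ≤0.05) can be illustrated with a minimal Python sketch. The gene names and values below are hypothetical placeholders rather than results from this study, and the function names are illustrative only; the actual analysis was carried out with DESeq2.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene_id: str
    log2_fold_change: float
    p_value: float


def is_significant(result, lfc_cutoff=1.0, p_cutoff=0.05):
    """Apply the two cutoffs described in the text to one gene."""
    return abs(result.log2_fold_change) >= lfc_cutoff and result.p_value <= p_cutoff


def split_by_direction(results):
    """Return (upregulated, downregulated) gene IDs among significant genes."""
    up = [r.gene_id for r in results if is_significant(r) and r.log2_fold_change > 0]
    down = [r.gene_id for r in results if is_significant(r) and r.log2_fold_change < 0]
    return up, down


# Hypothetical example values (not taken from S1 Table).
example = [
    GeneResult("gene_a", -2.0, 0.01),   # passes both cutoffs, downregulated
    GeneResult("gene_b", 0.4, 0.001),   # fold change too small
    GeneResult("gene_c", 1.3, 0.20),    # p-value too large
    GeneResult("gene_d", 1.6, 0.03),    # passes both cutoffs, upregulated
]
up, down = split_by_direction(example)
print("up:", up, "down:", down)
```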

Descriptive statistics of the differentially regulated proteins identified by label-free proteomics

A label-free proteomics study was performed according to Das et al., 2023 [12]. The proteins identified by label-free proteomics were statistically validated by ANOVA, which was used to test whether the means of more than two groups were significantly different from each other. After the ANOVA, the significantly altered proteins (up- and downregulated) were identified by the p-value. The list of altered proteins was further sorted based on the fold change (abundance ratio). The fold change refers to the ratio of the abundance of a protein under the mutant condition to that under the wild-type condition. A less stringent cutoff was applied to capture more altered proteins. For significantly upregulated proteins, the fold-change cutoff was ≥1.2 (log2(1.2) = 0.26). For significantly downregulated proteins, the fold-change cutoff was ≤0.83 (approximately 1/1.2; log2 ≈ −0.26). On the logarithmic scale, the significantly altered proteins with a log2 fold change between −0.26 and +0.26 were therefore not considered for further analysis.
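As a small worked illustration of the fold-change cutoffs described above, the following Python sketch classifies an abundance ratio (mutant over wild type) as up, down, or unchanged. The function name and the example ratios are hypothetical. Note that 0.83 is a rounded value of 1/1.2, so the two cutoffs are approximately symmetric on the log2 scale.

```python
import math

UP_CUTOFF = 1.2     # fold change at or above this counts as upregulated
DOWN_CUTOFF = 0.83  # fold change at or below this counts as downregulated


def classify_fold_change(abundance_ratio):
    """Classify a mutant/wild-type abundance ratio using the stated cutoffs."""
    if abundance_ratio <= 0:
        raise ValueError("abundance ratio must be positive")
    if abundance_ratio >= UP_CUTOFF:
        return "up"
    if abundance_ratio <= DOWN_CUTOFF:
        return "down"
    return "unchanged"


# The cutoffs expressed on the log2 scale.
log2_up = math.log2(UP_CUTOFF)
log2_symmetric_down = math.log2(1 / UP_CUTOFF)

# Hypothetical ratios for illustration only.
for ratio in (1.5, 1.0, 0.5):
    print(ratio, classify_fold_change(ratio))
```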

Physicochemical properties and functional characterizations

The ProtParam web server by ExPASy (https://web.expasy.org/protparam/) [15] was used to identify the physicochemical characteristics of the uncharacterized proteins, such as molecular weight, theoretical isoelectric point (pI), molar extinction coefficient, instability index, aliphatic index, grand average hydropathy (GRAVY), and total number of positively charged (Arg+Lys) and negatively charged (Asp+Glu) residues, based on their amino acid sequences (S2 Table and Fig 2A) and amino acid composition profile (%) (S3 Table). The molar extinction coefficient is a measure of the amount of light that a protein absorbs at a specific wavelength. A high molar extinction coefficient indicates a high content of cysteine, tryptophan, and tyrosine in the candidate protein. The instability index provides an estimate of the stability of a protein in a test tube. A protein whose instability index is greater than 40 is predicted to be unstable in solution. The aliphatic index of a protein is defined as the relative volume occupied by aliphatic side chains (alanine, valine, leucine, and isoleucine). It may be considered a positive factor for the thermostability of globular proteins: a high aliphatic index indicates that a protein is thermostable over a wide temperature range. The GRAVY score of a protein is calculated as the sum of the hydropathy values of all of its amino acids divided by the number of residues in the query sequence. A low GRAVY value indicates that a protein is more likely to be globular or hydrophilic than membranous. A comprehensive analysis of these physicochemical properties provides insight into the possible biological functions of these uncharacterized proteins. In addition, predicted properties such as the instability index and solubility will allow us to design effective strategies to express and purify these proteins for downstream biochemical and functional characterization.
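The GRAVY and aliphatic index definitions given above can be sketched in a few lines of Python. This is an illustrative re-implementation, not the ProtParam code: it assumes the Kyte-Doolittle hydropathy scale for GRAVY and the commonly cited aliphatic index formula, X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), with X in mole percent. The peptide used is a hypothetical toy sequence.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Sum of hydropathy values divided by the number of residues."""
    sequence = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def aliphatic_index(sequence):
    """Relative volume of aliphatic side chains (Ala, Val, Ile, Leu)."""
    sequence = sequence.upper()
    n = len(sequence)

    def mole_percent(aa):
        return 100.0 * sequence.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical toy peptide, not one of the 51 proteins.
toy_peptide = "MKTLLVAGDE"
print("GRAVY:", round(gravy(toy_peptide), 3))
print("Aliphatic index:", round(aliphatic_index(toy_peptide), 2))
```

A negative GRAVY value from such a calculation would point toward a hydrophilic protein, in line with the interpretation given in the text.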

Fig 2.

The alteration in the expression pattern of transcripts and proteins in CgtA-depleted condition: A. mRNA abundance of VC_A0032 (UniProt ID- Q9KND3) shows significant downregulation in the expression of transcript in cgtA knockdown strain of Vibrio cholerae. B. Protein expression of Q9KND3 was significantly reduced when cgtA was knocked down from the genome of V. cholerae C. The mRNA abundance of VC_A0010 (UniProt ID- Q9KNF4) depicts significant downregulation in CgtA depleted condition in V. cholerae. D. Protein expression levels of Q9KNF4 were downregulated in cgtA knockdown strain of V. cholerae E. The expression of mRNA from VC_0689 (UniProt ID- Q9KLQ3) was significantly reduced in cgtA knockdown strain of V. cholerae. F. The expression of protein, Q9KLQ3 was observed to be significantly reduced in CgtA-depleted condition in V. cholerae. G. mRNA production from VC_2326 (UniProt ID- Q9KPP0) was significantly downregulated in cgtA knockdown strain of V. cholerae H. Protein expression of Q9KPP0 was significantly reduced in cgtA knockdown strain of V. cholerae I. mRNA production from the gene VC_1607 (UniProt ID-Q9KRM9) was significantly reduced in cgtA knockdown strain of V. cholerae and J. Protein expression of Q9KRM9 was observed to be reduced in CgtA-depleted condition of V. cholerae K. mRNA production of VC_A0185 (UniProt ID- Q9KKL8) and L. protein expression of Q9KKL8 were significantly downregulated in cgtA knockdown strain of V. cholerae M. mRNA expression from VC_A0026 (Q9KND9) and N. protein expression of Q9KND9 were downregulated in CgtA depleted condition in V. cholerae O. mRNA production from VC_1150 (UniProt ID-Q9KSV6) and P. protein expression of Q9KSV6 were downregulated in CgtA depleted condition in V. cholerae Q. mRNA expression from VC_A0004 (UniProt ID- Q9KNG0) and R. protein expression of Q9KNG0 were downregulated in cgtA knockdown strain of V. cholerae.

Prediction of subcellular localization

The subcellular localization of each uncharacterized protein was predicted by PSORTb version 3.0.2 (http://www.psort.org/psortb/). PSORTb is a highly precise web-based tool for predicting the subcellular localization of bacterial proteins; it accepts one or more Gram-positive or Gram-negative bacterial sequences or archaeal sequences in FASTA format [16]. A score is assigned to every localization site, which reflects the confidence level of the prediction. A score higher than the cutoff value of 7.5 indicates strong confidence in the predicted localization. Further, the results were confirmed by PSLpred (https://webs.iiitd.edu.in/raghava/pslpred/submit.html), which predicts the subcellular localization of uncharacterized proteins using a hybrid approach that integrates PSI-BLAST and three SVM modules based on physicochemical properties, residue composition, and dipeptide composition [17]. The overall accuracy of the prediction is expressed as a percentage (S4 Table).

Identification of protein motifs, domains, and families

The uncharacterized proteins were classified into families and their domains were predicted using the InterPro server (https://www.ebi.ac.uk/interpro/) [18]. InterPro utilizes prediction models derived from multiple databases within the InterPro consortium to classify proteins according to their distinctive characteristics. The results generated by the InterPro database were validated using SMART (http://smart.embl-heidelberg.de/) [19], a tool that extensively annotates protein domains based on functional class, phyletic distribution, tertiary structure, and functionally important residues. We further validated the results using PROSITE (https://prosite.expasy.org/) [20], a large database of protein domains, families, and functional sites that identifies conserved sequences and functional sites within candidate proteins, which is crucial for understanding their structure and function (S5 Table).

Assessment of hydrophobic and hydrophilic regions

The solubility of the uncharacterized proteins was assessed using SOSUI (https://harrier.nagahama-i-bio.ac.jp/sosui/) [21], DeepTMHMM (https://dtu.biolib.com/DeepTMHMM) [22], and HMMTOP (http://www.enzim.hu/hmmtop/) [23]. The SOSUI system can differentiate membrane proteins from soluble proteins and predict transmembrane helices with 99% and 97% accuracy, respectively. DeepTMHMM, a deep learning system based on a protein language model, detects and predicts alpha-helical and beta-barrel transmembrane protein topologies. We also employed HMMTOP to validate our results; it uses a hidden Markov model to predict transmembrane helices and topology based on amino acid composition. Additionally, ExPASy’s ProtScale (https://web.expasy.org/protscale/) was utilized to calculate Kyte-Doolittle hydropathy plots [24]. The plots revealed the hydrophobic and hydrophilic regions of the 51 uncharacterized and hypothetical proteins (S1 Fig). Values above 0 (zero) in the graph indicate hydrophobic regions and values below 0 (zero) indicate hydrophilic regions within a protein. The results are summarized in S6 Table.
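A Kyte-Doolittle hydropathy profile of the kind described above can be approximated with a simple sliding-window average. The sketch below is illustrative and is not the ProtScale implementation; the window size of 9 and the unweighted average are assumptions, and the sequence is a hypothetical toy example.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return (1-based centre position, mean hydropathy) for each window."""
    sequence = sequence.upper()
    if window % 2 == 0:
        raise ValueError("window size must be odd so it has a centre residue")
    if window > len(sequence):
        raise ValueError("window is longer than the sequence")
    half = window // 2
    profile = []
    for centre in range(half, len(sequence) - half):
        segment = sequence[centre - half:centre + half + 1]
        mean_score = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        profile.append((centre + 1, mean_score))
    return profile


# Hypothetical toy sequence: a hydrophobic stretch flanked by charged residues.
toy_sequence = "DEKRDLLIVALLIVAGDEKRD"
for position, score in hydropathy_profile(toy_sequence):
    label = "hydrophobic" if score > 0 else "hydrophilic"
    print(position, round(score, 2), label)
```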

Determination of putative biological processes and molecular functions of candidate proteins

The potential molecular functions, biological processes, and cellular compartments associated with the 51 uncharacterized proteins were analyzed using the PFP (Protein Function Prediction) tool (http://kiharalab.org/web/pfp.php) [25] (S7 Table). The protein function prediction tool is an automated method that uses the extended similarity group (ESG) algorithm to forecast the potential biological and molecular activities of proteins whose cellular roles are unknown. Further, “Argot2” (Annotation Retrieval of Gene Ontology Terms) (http://www.medcomp.medicina.unipd.it/Argot2/) [26] was used to validate the results of PFP, which generates output in the form of Gene Ontology (GO) annotations (S8 Table).

Protein‒protein interaction prediction

Using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING version 11.5) (http://string-db.org/), the protein interactions were predicted. This database contains predictions for protein interactions with 14094 species and 67.6 million proteins. The genetic background, curated databases, high-throughput studies, conserved expression, and prior knowledge were the sources of the connections, which comprised both functional and physical relationships [27]. Results were shown with protein scores greater than 0.444 (S9 Table and S2 Fig).

Prediction of secondary structures and disordered regions

PSIPRED Protein Analysis Workbench (http://bioinf.cs.ucl.ac.uk/psipred/) is a tool used for predicting the secondary structure of uncharacterized proteins [28]. It includes the ability to connect to GenTHREADER for protein fold identification and MEMSAT-2 for transmembrane topology prediction. The secondary structures of 51 uncharacterized proteins are shown in S5 Fig. Further, DISOPRED, available through the same workbench at http://bioinf.cs.ucl.ac.uk/psipred/, was utilized to identify the disordered regions of each protein. The service receives a single protein amino acid sequence in FASTA format as input and returns the probability of each residue in the sequence being disordered. The probability threshold for a residue to be considered part of a disordered region is 0.5 (S4 Fig).

Three-dimensional protein structure construction and validation

3-dimensional structures of the proteins were constructed with the aid of AlphaFold2-MMseqs2 [29]. The quality of each model was evaluated using pLDDT and PAE graphs. The 3-dimensional model structures of candidate proteins were validated using ProSA (https://prosa.services.came.sbg.ac.at/prosa.php) [30] and PROCHECK (https://www.ebi.ac.uk/thornton-srv/software/PROCHECK/) [31] (S5 Fig). The PROCHECK tool evaluates the overall geometry of a model by analyzing the geometry of each residue. It provides insight into the stereochemical quality of the predicted model, indicating whether the residues fall within the most favored regions, additionally allowed regions, generously allowed regions, or disallowed regions. Overall, a 3D structural model is considered to be of good quality if at least 90% of the residues are present in the most favored regions.

Bacterial strains, plasmids, and culture conditions

V. cholerae and Escherichia coli strains were routinely grown aerobically in Luria–Bertani (LB) medium (10 g/L NaCl, 10 g/L tryptone and 5 g/L yeast extract, pH = 7.0) at 37°C. The agar medium contained 1.5% (wt/vol) agar, except for the motility studies, where the agar concentration was 0.3% and 0.5% (wt/vol). V. cholerae was grown for the genomic DNA isolation required for PCR amplification. The pET28a (+) vector was used for cloning the genes encoding the uncharacterized proteins. The Escherichia coli DH5-alpha strain was used for amplifying and screening clones, whereas Escherichia coli BL21 (pLysS) cells were used for the induction and expression of proteins.

Cloning, expression, and purification of uncharacterized proteins

Two full-length genes encoding uncharacterized proteins were cloned into the expression vector pET28a (+). The two proteins (a large 92 kDa protein and a small 12 kDa protein) were selected based on their potential as vaccine candidates, as predicted by the in silico studies. The forward and reverse primers for amplifying the gene encoding the protein with UniProt ID Q9KRD2 were TAAGCAGGATCCATGAGTGTGAATGTATCAACCGT and TAAGCACTCGAGTCAACTCGCTAAATAAGCGAGCA, respectively. The forward and reverse primers for amplifying the gene encoding the protein with UniProt ID Q9KU58 were TAAGCAGGATCCGTGTCTTCTGACTTTTCCCT and TAAGCACTCGAGTTACGTCGGTATTCGCG, respectively. Restriction sites for BamHI and XhoI were added to the primers to generate the overhangs necessary for cloning. Each gene was subsequently cloned into the pET28a (+) vector in such a way that an N-terminal His tag was added. Protein production was carried out in LB medium via IPTG induction (1 mM final concentration) at 37°C and 160 rpm. E. coli cells harboring the cloned genes were subsequently harvested, resuspended in lysis buffer (20 mM Tris/HCl pH 8, 200 mM NaCl, and 20 mM imidazole) supplemented with 1 mM EDTA-free protease inhibitor cocktail, and lysed using a sonicator (pulse on for 5 seconds, pulse off for 20 seconds, amplitude 60%). The lysed cells were centrifuged at 11000 × g at 4°C. The inclusion body pellet was washed with washing buffer (1X PBS, 1% Triton-X-100, 1 M urea) and was dissolved and stored in 1X PBS and 2 M urea at -20°C overnight. The solubilized inclusion bodies were then purified using a Ni-NTA column. The expression of the proteins was further validated by western blotting using an anti-His tag antibody.
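The primer design described above can be checked with a short Python sketch that locates the BamHI (GGATCC) and XhoI (CTCGAG) recognition sites in each primer and inspects the gene-specific portion that follows. The primer sequences are those given in the text; the dictionary keys and function names are illustrative, and treating the bases before the restriction site as a 5' clamp is an assumption made here.

```python
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence
XHOI_SITE = "CTCGAG"   # XhoI recognition sequence

# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "Q9KRD2": ("TAAGCAGGATCCATGAGTGTGAATGTATCAACCGT",
               "TAAGCACTCGAGTCAACTCGCTAAATAAGCGAGCA"),
    "Q9KU58": ("TAAGCAGGATCCGTGTCTTCTGACTTTTCCCT",
               "TAAGCACTCGAGTTACGTCGGTATTCGCG"),
}

STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def split_primer(primer, site):
    """Split a primer into (5' clamp, restriction site, gene-specific part)."""
    index = primer.find(site)
    if index < 0:
        raise ValueError(f"site {site} not found in primer {primer}")
    return primer[:index], site, primer[index + len(site):]


for protein_id, (forward, reverse) in PRIMERS.items():
    _, _, fwd_gene_part = split_primer(forward, BAMHI_SITE)
    _, _, rev_gene_part = split_primer(reverse, XHOI_SITE)
    # The forward gene-specific part should begin with the start codon, and
    # the reverse part, read on the coding strand, should end with a stop codon.
    start_codon = fwd_gene_part[:3]
    last_codon = reverse_complement(rev_gene_part)[-3:]
    print(protein_id, "start codon:", start_codon,
          "| final codon:", last_codon,
          "| is stop:", last_codon in STOP_CODONS)
```

Running this kind of check before ordering primers is a quick way to confirm that both restriction sites are present and that the amplified insert spans the full open reading frame.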

Antigenicity, allergenicity, toxicity, and virulence of proteins

The antigenicity properties of 51 uncharacterized proteins were assessed using VaxiJen (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html) [32], with a cutoff score of 0.4. Moreover, the allergenicity of each uncharacterized protein was evaluated using the AllerTOP v.2.0 tool (https://www.ddg-pharmfac.net/AllerTOP/) [33]. Additionally, the toxicity of the candidate proteins was determined using the ToxinPred 2 tool (https://webs.iiitd.edu.in/raghava/toxinpred2/batch.html) [34], which employs a hybrid machine learning technique with a cutoff threshold of 0.6 (S10 Table). The virulence of a protein was predicted using the VirulentPred (http://bioinfo.icgeb.res.in/virulent/) [35] tool, which utilizes the SVM method to predict protein virulence (S11 Table).
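The selection logic implied by these cutoffs can be sketched as a simple filter. The Python example below is a hypothetical illustration: the record structure, field names, protein labels, and scores are invented for demonstration, and it assumes that a VaxiJen score at or above 0.4 denotes a probable antigen and that a ToxinPred 2 score below 0.6 denotes a non-toxin.

```python
from dataclasses import dataclass


@dataclass
class ImmunoPrediction:
    protein_id: str
    vaxijen_score: float     # antigenicity score
    is_allergen: bool        # allergen / non-allergen call
    toxinpred_score: float   # toxicity score


def is_vaccine_candidate(prediction, antigen_cutoff=0.4, toxin_cutoff=0.6):
    """Keep proteins predicted to be antigenic, non-allergenic and non-toxic."""
    return (prediction.vaxijen_score >= antigen_cutoff
            and not prediction.is_allergen
            and prediction.toxinpred_score < toxin_cutoff)


# Hypothetical predictions (placeholder IDs and scores, not from S10 Table).
predictions = [
    ImmunoPrediction("protein_1", 0.62, False, 0.21),
    ImmunoPrediction("protein_2", 0.35, False, 0.10),  # below antigenicity cutoff
    ImmunoPrediction("protein_3", 0.71, True, 0.15),   # predicted allergen
    ImmunoPrediction("protein_4", 0.55, False, 0.80),  # predicted toxin
]
candidates = [p.protein_id for p in predictions if is_vaccine_candidate(p)]
print(candidates)
```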

B-cell epitopes prediction

Linear B-cell epitope prediction was performed using BepiPred-3.0 (http://tools.iedb.org/bcell/) [36]. It utilizes a cutoff value of > 0.5 to predict linear B-cell epitopes based on protein language models (LMs). Subsequently, BcePred (https://webs.iiitd.edu.in/raghava/bcepred/bcepred_submission.html) [37], which predicts epitopes based on physicochemical properties like hydrophilicity, polarity and surface properties, was also employed to predict the linear B-cell epitopes (S12 Table and S6 Fig). The prediction of discontinuous B-cell epitopes for the identified uncharacterized proteins was performed using DiscoTope (http://tools.iedb.org/discotope/help/) [38]. The prediction was based on the 3D structures of proteins in PDB format (S13 Table and S7 Fig).
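To illustrate how a per-residue score cutoff translates into linear epitope segments, the following Python sketch groups consecutive residues whose scores exceed 0.5 into contiguous stretches. It is a generic post-processing illustration, not the BepiPred algorithm; the scores and the optional minimum-length parameter are hypothetical.

```python
def epitope_segments(scores, threshold=0.5, min_length=1):
    """Return 1-based (start, end) ranges of consecutive residues above threshold."""
    segments = []
    start = None
    for index, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = index          # a new stretch begins here
        elif start is not None:
            if index - start >= min_length:
                segments.append((start + 1, index))
            start = None
    # Close a stretch that runs to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        segments.append((start + 1, len(scores)))
    return segments


# Hypothetical per-residue scores for a 10-residue peptide.
example_scores = [0.1, 0.6, 0.7, 0.8, 0.4, 0.3, 0.9, 0.2, 0.55, 0.65]
print(epitope_segments(example_scores))
print(epitope_segments(example_scores, min_length=2))
```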

Determination of T-cell epitopes

The NetCTL server (http://www.cbs.dtu.dk/services/NetCTL/) was used to predict the cytotoxic T-lymphocyte (CTL) epitopes of the query proteins [33]. NetCTL prediction depends on MHC-I binding affinity, C-terminal cleavage, and the efficiency of the transporter associated with antigen processing (TAP). The combined threshold value for MHC-I prediction is 0.75. The retrieved information is tabulated in S14 Table. The IEDB MHC-II server (http://tools.iedb.org/mhcii/) was used for identifying helper T-lymphocyte (HTL) epitopes, as shown in S15 Table. Human/HLA-DR was chosen as the species/locus with 7 alleles of human leukocyte antigen (HLA), and 15-mer epitopes were retrieved.
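As a minimal illustration of what retrieving 15-mer epitopes involves, the sketch below enumerates every overlapping 15-residue window of a protein sequence. It assumes that candidate HTL peptides are all overlapping 15-mers; it does not reproduce any scoring performed by the IEDB server, and the sequence shown is a hypothetical toy example.

```python
def overlapping_peptides(sequence, length=15):
    """Return (1-based start position, peptide) for every window of given length."""
    if length <= 0:
        raise ValueError("peptide length must be positive")
    return [(start + 1, sequence[start:start + length])
            for start in range(len(sequence) - length + 1)]


# Hypothetical 20-residue toy sequence.
toy_sequence = "MKTLLVAGDEKRSTNQYWFH"
for start, peptide in overlapping_peptides(toy_sequence):
    print(start, peptide)
```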

Results and discussion

Transcriptomics and proteomics data reveal uncharacterized proteins in ΔcgtA V. cholerae

Transcriptomic analysis via RNA-seq and a label-free proteomics study revealed numerous transcripts and proteins whose expression was altered in V. cholerae under CgtA-depleted conditions. CgtA is an essential ribosome-associated GTPase that has multiple functions. On knocking down cgtA in the cholera-causing bacterium Vibrio cholerae, 51 proteins were identified whose roles are yet to be deciphered. Among the 51 uncharacterized proteins, fifty were downregulated at the mRNA transcript level, 9 were downregulated at both the mRNA and protein levels, and the protein Q9KU58 was upregulated in V. cholerae N16961 ΔcgtA: kanR/cgtA-pBAD18Cm. The fold changes and p-values are noted in S1 Table. Table 1 shows the comparison of the proteomic and transcriptomic data for the 9 proteins (Q9KND3, Q9KNF4, Q9KLQ3, Q9KPP0, Q9KRM9, Q9KKL8, Q9KND9, Q9KSV6, and Q9KNG0) whose expression was altered at both the mRNA and protein levels when cgtA was knocked down. Fig 2A–2R show the genes and their products that were downregulated in V. cholerae N16961 ΔcgtA: kanR/cgtA-pBAD18Cm with respect to the wild-type strain.

Table 1. Comparison between label-free proteomics and transcriptomics analyses.

Protein ID | Gene ID | Protein expression: log2 fold change | Protein expression: p value | mRNA abundance: log2 fold change | mRNA abundance: p value
Q9KND3 | VC_A0032 | -2.1631343 | 0.002054179 | -2.03968639 | 0.015611256
Q9KNF4 | VC_A0010 | 1.877225002 | 0.041021192 | -1.89614697 | 0.01858139
Q9KLQ3 | VC_A0689 | -1.943823 | 0.00041941 | -1.78636172 | 0.020959642
Q9KPP0 | VC_2326 | -1.78433722 | 0.008289489 | -1.40274321 | 0.0406664
Q9KRM9 | VC_1607 | 1.352318913 | 0.002389324 | -1.94337726 | 0.039401561
Q9KKL8 | VC_A0185 | -0.783152317 | 0.366870853 | -0.654626335 | 0.044541827
Q9KND9 | VC_A0026 | -0.979426671 | 0.336336033 | -2.662460158 | 0.029802832
Q9KSV6 | VC_1150 | -1.134660702 | 0.114161074 | -2.662277492 | 0.030027089
Q9KNG0 | VC_A0004 | 0.292646614 | 0.599053076 | -2.15772107 | 0.014994607

Out of the 51 proteins, 9 showed altered expression in both the RNA-seq and label-free proteomic data. The log2 fold change and p-value indicating the alteration in expression of the transcript and the protein are recorded for each protein.

Physicochemical properties of the uncharacterized proteins

The physicochemical properties of the fifty-one uncharacterized and hypothetical proteins, including the size of the protein, molecular weight, theoretical isoelectric point (pI), molar extinction coefficient, instability index, aliphatic index, grand average hydropathy (GRAVY), and the total number of positively charged (Arg+Lys) and negatively charged (Asp+Glu) residues, are listed in S2 Table. The proteins are arranged in descending order of length. The amino acid composition was found to include all 20 standard amino acids, each present at varying percentages (S3 Table). The top 10 amino acids, in order of abundance, were D, T, I, Q, K, V, S, E, A, and L (Fig 3A). The uncharacterized proteins exhibited a molecular weight range of 5.19 kDa to 91.831 kDa and a pI range of 4.19 to 11.40 (S2 Table and Fig 3B and 3C). Their lengths ranged from 45 to 818 residues. The aliphatic index of the proteins ranged from 10.62 to 135.36, as indicated in S2 Table. In addition, the proteins exhibited negative GRAVY scores ranging from -0.967 to -0.0268 (S2 Table and Fig 3D). The proteins exhibited an instability index ranging from 17.6 to 101.35, with stable proteins having a score of less than 40 on the instability index. Among the 51 proteins, only 29 were predicted to be stable in their natural state, as indicated in S2 Table. Furthermore, a total of 10 proteins (UniProt ID- Q9KKX0, Q9KVJ9, Q9KP29, Q9KPA3, Q9KT53, Q9KL56, Q9KRE6, Q9K2J6, Q9KVW5, and Q9KPZ1) were predicted to be insoluble in their natural state. A total of 15 positively charged proteins (rich in arginine and lysine) and 24 negatively charged proteins (rich in aspartic acid and glutamic acid) were found among the 51 uncharacterized proteins (S2 Table).

Fig 3.

Physicochemical properties of 51 uncharacterized proteins: A. Heatmap showing the amino acid composition of each protein. The scale on the right is expressed in terms of percentage. B. The bar graph illustrates the pI of all the uncharacterized proteins. Majority of the proteins were seen to be acidic C. The heatmap shows the pI of all the uncharacterized proteins, where the scale represents the pH. D. The GRAVY index indicates the solubility of all the fifty-one candidate proteins. A negative GRAVY value indicates that a protein is hydrophilic in nature. Majority of the uncharacterized proteins were predicted to be soluble.

Predicted subcellular localization of the uncharacterized proteins

The subcellular localization of the 51 uncharacterized proteins was predicted using two tools: PSLpred and PSORTb. The results indicate that the proteins are abundant in the cytoplasmic, cytoplasmic membrane, and outer membrane compartments, as determined by the PSORTb score (7.88–10) (S4 Table and Fig 4A). The PSLpred analysis indicates that the proteins are present in the cytoplasmic, outer-membrane, periplasmic, inner-membrane, and extracellular compartments, with an expected accuracy ranging from 53.1% to 98.10% (S4 Table and Fig 4B). Additionally, the PFP and Argot2 tools indicated that the uncharacterized proteins are found throughout the cell, mainly in the membrane and cytoplasm (Fig 4C and 4D).

Fig 4.

Prediction of subcellular localization of 51 uncharacterized proteins: A. A pie chart illustrating the subcellular location of uncharacterized and hypothetical proteins predicted by Psortb. B. A pie chart depicting the subcellular location of uncharacterized and hypothetical proteins predicted by PSLPred. C. Predicted subcellular location of uncharacterized and hypothetical proteins predicted by PFP (Protein Function Prediction) and D. Argot2. Majority of the proteins were found to be associated with cytoplasm and membrane.

Solubility and transmembrane helix identification of uncharacterized proteins

The solubility, protein type, and transmembrane helix regions of the 51 uncharacterized proteins were analyzed using three prediction tools: SOSUI, DeepTMHMM, and HMMTOP. The SOSUI analysis revealed that 11 proteins possess transmembrane helices and are classified as membrane proteins, while 40 proteins are classified as globular and soluble in nature (S6 Table). In addition, the hydropathy plots generated by ExPASy’s ProtScale tool display the hydrophobicity or hydrophilicity profile of each protein as a function of residue position and amino acid composition, on a scale of -10 to +10 (S1 Fig).

Molecular function and biological process of the uncharacterized proteins

The molecular function of each uncharacterized protein was predicted using the PFP (Fig 5A) and Argot2 tools (Fig 5B). The results indicated that the majority of the proteins were associated with ATP binding, DNA binding, RNA binding, translation elongation factor activity, and motor activity. This was determined based on the PFP score range of 1.22 to 14120.19 (S7 Table) and the Argot2 score range of 5.158 to 11459.4 (S8 Table). According to the biological process analysis (Argot2), a large number of proteins are linked to the regulation of transcription, DNA integration, cell morphogenesis, chemotaxis, and metabolism. This finding is based on the Argot2 score range of 6.26808 to 38638.5 (Fig 5C and 5D). A few uncharacterized proteins were also predicted to play a crucial role in motility, aerotaxis, and chemotaxis. A comparison was carried out to examine the swimming and swarming motility of the CgtA-depleted strain and the wild-type strain of V. cholerae. The findings indicate that in the cgtA knockdown strain, the swimming motility was significantly reduced compared to the wild-type strain of V. cholerae (Fig 5E). The diameter of the swimming motility zone was determined and statistically verified using a two-tailed t-test at a significance level of p<0.05 (Fig 5F). Similarly, the swarming motility of the cgtA knockdown strain was found to be considerably lower than that of the wild-type strain of V. cholerae (Fig 5G and 5H). However, the mechanism linking CgtA to motility is yet to be determined. We anticipate that further metabolomic studies and functional characterization of the uncharacterized proteins will help elucidate the mechanism of bacterial motility and pathogenesis. In summary, understanding the role of these proteins will enable us to look at the bigger picture of cholera pathogenesis, which affects millions of people globally.

Fig 5.

Prediction of functions of 51 uncharacterized proteins. A. The green-colored bar chart illustrates the probable molecular functions associated with the 51 uncharacterized and hypothetical proteins predicted by the PFP (Protein Function Prediction) tool. The majority of the proteins were found to be involved in essential cellular functions like transcription, translation, phosphorylation, transport, chemotaxis and motility. B. The yellow-colored bar chart depicts the molecular functions of the uncharacterized and hypothetical proteins predicted by Argot2. According to the Argot2 prediction, the majority of the proteins were involved in catalytic activity and DNA binding. C. The pink-colored bar graph illustrates the probable biological functions of the uncharacterized and hypothetical proteins predicted by PFP. The majority of the proteins were found to be involved in cellular morphogenesis, translation, ribosome biogenesis and amino acid metabolism. D. The blue-colored bar graph indicates the probable biological functions of the uncharacterized and hypothetical proteins predicted by Argot2, where the majority of the proteins were found to be involved in transcription, translation, signal transduction, phosphorylation and transport. E. Analysis and comparison of swimming motility between the CgtA-depleted strain and the wild-type strain of V. cholerae. In the cgtA knockdown strain, swimming motility was significantly lower than in the wild-type strain of V. cholerae. F. The diameter of the motility zone was measured and statistically validated using a two-tailed t test. G. Analysis and comparison of swarming motility between the CgtA-depleted strain and the wild-type strain of V. cholerae. Similar to swimming motility, swarming motility was also significantly lower than in the wild-type strain of V. cholerae. H. The diameter of the motility zone was measured and statistically validated using a two-tailed t test.

Identification of protein-protein interaction network

The protein-protein interaction (PPI) network for the 51 uncharacterized proteins was visualized in STRING version 11.5, which predicts interacting protein partners, using a confidence score cutoff of 0.7. The interacting partners of each of the uncharacterized proteins are tabulated in S9 Table. The highest confidence scores, 0.996 and 0.994, were observed for the interactions of the SpoVR family protein Q9KQX3 with a hypothetical protein, Q9KQX4 (VC_1873), and an uncharacterized protein, Q9KQX5 (VC_1872), respectively. Another high-confidence interaction, with a score of 0.993, was seen between the EAL domain-containing Q9KRD2 and Q9KPJ7 (gene name VC_2370), a sensory box-containing diguanylate cyclase. These findings will help reinforce our understanding of the probable functions of these uncharacterized proteins and decipher the molecular pathways involved in cellular functions.
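The confidence cutoff can be sketched as a simple filter over scored protein pairs. This is an illustration only and does not use the STRING API; the first three pairs are the ones reported above, while the fourth pair and its score are hypothetical.

```python
# (protein A, protein B, combined confidence score on a 0-1 scale)
edges = [
    ("Q9KQX3", "Q9KQX4", 0.996),  # reported in the text
    ("Q9KQX3", "Q9KQX5", 0.994),  # reported in the text
    ("Q9KRD2", "Q9KPJ7", 0.993),  # reported in the text
    ("Q9KRD2", "HYPO_X", 0.450),  # hypothetical low-confidence pair
]


def high_confidence(edges, cutoff=0.7):
    """Keep pairs scoring above the cutoff, strongest first."""
    kept = [e for e in edges if e[2] > cutoff]
    return sorted(kept, key=lambda e: e[2], reverse=True)


def partners(edges, protein, cutoff=0.7):
    """Interaction partners of one protein among the retained pairs."""
    out = []
    for a, b, _ in high_confidence(edges, cutoff):
        if a == protein:
            out.append(b)
        elif b == protein:
            out.append(a)
    return out
```

Calling `partners(edges, "Q9KQX3")` returns the two partners named in the text, while the hypothetical low-scoring pair is dropped.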

Three-dimensional protein structure construction and validation

The 3-dimensional structures of the 51 uncharacterized proteins were predicted by AlphaFold-mmseqs2 (S5 Fig). Only the best-ranked structure is shown for each protein, assessed by its predicted local distance difference test (pLDDT) score and predicted aligned error (PAE) plot. A pLDDT value above 90 indicates high model confidence and an accurate structure. The 40.5 kDa protein with UniProt ID Q9KMS2 had the highest model confidence, with a pLDDT score of 97.9. Further, the structures were validated using the PROCHECK Ramachandran plot and ProSA.
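AlphaFold-style PDB files store the per-residue pLDDT in the B-factor field, so a mean pLDDT such as the one quoted above can be recovered from a model file. The sketch below assumes a standard fixed-column PDB file and averages over C-alpha atoms; the function names are hypothetical, and the authors' own procedure is not described in the text.

```python
def ca_plddt(pdb_text):
    """Per-residue pLDDT read from the B-factor field of CA atoms."""
    scores = []
    for line in pdb_text.splitlines():
        # Atom name is in columns 13-16, B-factor in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def confidence_band(plddt):
    """Confidence bands commonly used for AlphaFold models."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def summarize(pdb_text):
    """Mean pLDDT over CA atoms and the band that mean falls into."""
    scores = ca_plddt(pdb_text)
    mean = sum(scores) / len(scores)
    return mean, confidence_band(mean)
```

A model with a mean pLDDT of 97.9 would fall in the "very high" band under this scheme.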

Cloning, expression and purification of uncharacterized proteins

To validate the results derived from the in silico analysis, we selected two proteins (the 92 kDa Q9KRD2 and the 12 kDa Q9KU58, as predicted by ProtParam) for cloning and purification, since they approximately span the size range, up to 92 kDa, of the uncharacterized proteins used in this study. In addition, RNA-seq data showed that the expression of the large protein, UniProt ID Q9KRD2, was downregulated (Fig 6A), and the label-free proteomic data revealed that the small protein, UniProt ID Q9KU58, was significantly upregulated when cgtA was knocked down in Vibrio cholerae (Fig 6B). We isolated genomic DNA from V. cholerae (Fig 6C) and amplified the two genes of interest by quantitative polymerase chain reaction (Fig 6D and 6E). We digested and cloned the amplified products (product size Q9KRD2: 2457 bp, and Q9KU58: 315 bp) into the pET28a(+) vector and transformed them into chemically competent Escherichia coli DH5α. The clones (Fig 6F) were screened and transformed into chemically competent Escherichia coli BL21 (pLysS) cells. The cells were then cultivated in Luria–Bertani liquid medium up to an O.D. of 0.7, induced with 1 mM IPTG, and incubated at 37°C for 5 hours. The induced cells were sonicated (60% amplitude, pulse on = 5 seconds, pulse off = 15 seconds) for 15 minutes, and the cell lysates were centrifuged at 11000 × g for 10 minutes at 4°C. The supernatant was kept aside and the inclusion body pellet was solubilized in PBS buffer containing 2 M urea. The GRAVY indices of Q9KRD2 and Q9KU58 were predicted to be -0.172 and -0.319, respectively, which suggests that they are soluble. SOSUI and DeepTMHMM also predicted the two proteins to be soluble. However, when expressed in the heterologous E. coli BL21 (pLysS) system, both proteins showed solubility issues.
We used a mild concentration (2 M) of the polar reagent urea to solubilize the inclusion bodies. The solubilized inclusion bodies were then purified using a nickel-NTA column. Finally, the expression of the purified proteins Q9KRD2 (Fig 6I) and Q9KU58 (Fig 6G) was checked by SDS‒PAGE. Purified Q9KRD2 showed a clear band near the 100 kDa protein marker, and purified Q9KU58 showed a distinct band near the 15 kDa marker, as expected from the in silico analysis. Further, the expression of the two purified proteins was validated by western blot analysis (Fig 6H and 6J).
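As a rough consistency check between the amplified product sizes and the expected band positions, protein mass can be approximated from the length of the coding sequence. The sketch below uses the rule-of-thumb average of about 110 Da per residue and assumes that each product ends with a stop codon; both are assumptions, and ProtParam computes the mass from the actual sequence instead.

```python
AVG_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass (assumption)


def expected_kda(orf_bp, includes_stop=True):
    """Approximate protein mass in kDa from an ORF length in base pairs."""
    codons = orf_bp // 3
    # Drop the stop codon, which does not encode a residue.
    residues = codons - 1 if includes_stop else codons
    return residues * AVG_RESIDUE_DA / 1000.0


# Product sizes as stated in the text.
for name, bp in [("Q9KRD2", 2457), ("Q9KU58", 315)]:
    print(name, round(expected_kda(bp), 1), "kDa")
```

This gives roughly 90 kDa and 11 kDa, close to the 92 kDa and 12 kDa predicted by ProtParam. Any vector-encoded tag would add slightly to the observed size.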

Fig 6.

Cloning, expression and purification of two uncharacterized proteins. A. Fold change in mRNA expression shows that the EAL domain-containing Q9KRD2 is downregulated in CgtA-depleted cells. B. The abundance ratio shows that the expression of the protein Q9KU58 is upregulated in CgtA-depleted conditions. C. Genomic DNA isolated from wild-type V. cholerae. Lane 1 = marker, Lane 2 and 3 = isolated genomic DNA from V. cholerae, Lane 3 = nuclease-free water (control). D. PCR amplification of Q9KRD2. Lane 1 = marker, Lane 2 = PCR mix without genomic DNA (control), Lane 3 = amplified product (length = 2547 bp). E. PCR amplification of Q9KU58. Lane 1 = marker, Lane 2 = PCR mix without genomic DNA (control), Lane 3 = digested pET28a(+), Lane 4 = amplified product (315 bp). F. Luria–Bertani agar plates showing Escherichia coli DH5α colonies containing the chimeric plasmid carrying the gene for Q9KRD2 (top) and Q9KU58 (bottom). G. Expression of the 12 kDa protein Q9KU58 in lane 7 (purified soluble fraction). H. Western blot verifying the expression of Q9KU58. Lane 1 = uninduced, Lane 2 = induced. I. Expression of the 92 kDa protein Q9KRD2 in lane 5 (purified solubilized inclusion body fraction). J. Western blot analysis verifying the expression of Q9KRD2. Lane 1 = uninduced, Lane 2 = induced solubilized fraction, Lane 3 = induced solubilized inclusion body fraction. K. 3-dimensional structure of the protein with UniProt ID Q9KRD2, predicted by AlphaFold-mmseqs2. L. Ramachandran plot for the 3-dimensional structure of the protein with UniProt ID Q9KRD2. M. Plot statistics of the Ramachandran plot for the protein with UniProt ID Q9KRD2. N. Plot depicting the overall model quality of the protein with UniProt ID Q9KRD2. The Z-score, which indicates the overall model quality, is -13.31. O. Plot depicting the local model quality of the protein with UniProt ID Q9KRD2. P. 3-dimensional structure of the protein with UniProt ID Q9KU58, predicted by AlphaFold-mmseqs2. Q. Ramachandran plot for the 3-dimensional structure of the protein with UniProt ID Q9KU58. R. Plot statistics of the Ramachandran plot for the protein with UniProt ID Q9KU58. S. Plot depicting the overall model quality of the protein with UniProt ID Q9KU58. The Z-score, which indicates the overall model quality, is -2.66. T. Plot depicting the local model quality of the protein with UniProt ID Q9KU58.

The 3-dimensional model structures of Q9KRD2 and Q9KU58 are shown in Fig 6K and 6P, respectively. Ramachandran plot validation suggested that 91.70% of the residues of Q9KRD2 were in the most favored regions (Fig 6L and 6M), compared with 79.30% for Q9KU58 (Fig 6Q and 6R). Additionally, the ProSA z-scores of Q9KRD2 and Q9KU58 were -13.31 (Fig 6N) and -2.66 (Fig 6S), respectively. ProSA was also used to assess the local model quality of Q9KRD2 (Fig 6O) and Q9KU58 (Fig 6T).

Determination of the antigenicity, allergenicity, toxicity, and virulence of proteins

Understanding the pathogenesis of a disease is vital for designing drugs to combat deadly diseases such as cholera. To obtain a larger and clearer picture of cholera pathogenesis, it is essential to characterize and understand the roles of proteins associated with the main switch, i.e., CgtA, an essential ribosome-associated GTPase that plays multifarious cellular roles required for the survival of bacterial cells. The solubility, antigenicity, allergenicity, toxicity, and virulence of the proteins were assessed, as shown in Fig 7. We identified potential vaccine candidates and drug targets from the pool of uncharacterized proteins by a comprehensive reverse vaccinology study, as shown in S10–S15 Tables. The antigenicity profiles were determined using VaxiJen, which performs alignment-independent prediction of protective antigens. The VaxiJen results were analyzed based on the overall prediction score, where scores greater than 0.4 indicate antigenic proteins and scores less than 0.4 indicate non-antigenic proteins. Of the 51 uncharacterized proteins, 31 were predicted to be antigens (with overall prediction scores ranging from 0.4098 to 0.5438), while 20 were predicted to be non-antigenic (with scores ranging from 0.2393 to 0.3949), meaning they are not expected to elicit an immune response (S10 Table and Fig 7A and 7D). The AllerTop results indicated that 43 of the 51 proteins were classified as non-allergens, while the remaining 8 were identified as probable allergens (S10 Table and Fig 7B). In addition, the toxicity profiles of the candidate proteins were determined using the hybrid score obtained from the ToxinPred 2 program. The ToxinPred analysis indicated that 49 proteins were non-toxic, while only 2 proteins were toxic (Fig 7C).
The hybrid score for the non-toxic proteins varied from -0.32 to 0.49, while for the toxic proteins it ranged from 0.68 to 0.79 (S10 Table). On screening the proteins based on antigenicity, allergenicity, toxicity, B-cell (S12 and S13 Tables) and T-cell epitope prediction (S14 and S15 Tables), only 20 proteins were found to be potential vaccine candidates, as shown in Table 2; these can be further validated by experimental studies. We also found several potential drug targets, based on their predicted virulence, as listed in S11 Table. The virulence of the proteins was assessed using the VirulentPred program, which provides predictions from five separate approaches: amino acid composition, dipeptide composition, PSI-BLAST-generated PSSM profiles, a cascade of SVMs and PSI-BLAST, and higher-order dipeptide composition. With the amino acid composition-based method, 34 proteins were classified as virulent and 17 as non-virulent. The scores for the non-virulent proteins ranged from -1.873 to -0.053, while the scores for the virulent proteins varied from 0.0028 to 1.6919. With the dipeptide composition approach, 22 of the 51 proteins were non-virulent and 29 were virulent. The non-virulent proteins had scores ranging from -0.518 to -0.087, while the virulent proteins had scores ranging from 0.1159 to 0.4899. The PSI-BLAST-generated position-specific scoring matrix (PSSM) profiles classified 21 proteins as non-virulent, with scores ranging from -1.39 to -0.016, and 30 proteins as virulent, with scores ranging from 0.0093 to 1.1672. The cascade of support vector machines (SVMs) and PSI-BLAST yielded virulent scores ranging from 0.0182 to 1.1188 and non-virulent scores ranging from -1.1672 to -0.288.
With this approach, 17 proteins were classified as non-virulent and 34 as virulent. The higher-order dipeptide composition analysis indicated that 18 proteins were non-virulent, with scores ranging from -1.107 to -0.066, while 33 proteins were virulent, with scores ranging from 0.076 to 1.0795 (S11 Table and Fig 7F). We further screened the proteins based on their average virulence score and plotted them as illustrated in Fig 7E. On screening, we found around 20 proteins that are potential drug targets, as shown in Table 2 and Fig 7G.
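The two ways of calling a protein virulent, by average score and by unanimous agreement of the five approaches, can be sketched as follows. A score above 0 is treated as virulent, matching the threshold given for Fig 7. The method keys and the example scores are hypothetical and are not taken from S11 Table or from VirulentPred's own output format.

```python
from statistics import mean

# Short labels for the five approaches (hypothetical key names).
METHODS = ["aac", "dipeptide", "pssm", "cascade", "higher_order"]


def summarize_virulence(scores, threshold=0.0):
    """scores: {protein: {method: score}} -> per-protein summary."""
    summary = {}
    for protein, by_method in scores.items():
        calls = [by_method[m] > threshold for m in METHODS]
        avg = mean(by_method[m] for m in METHODS)
        summary[protein] = {
            "average": avg,
            "virulent_by_average": avg > threshold,
            "unanimous": all(calls),
            "votes": sum(calls),
        }
    return summary


# Hypothetical scores (NOT taken from S11 Table).
example = {
    "PROT_A": {"aac": 0.9, "dipeptide": 0.3, "pssm": 0.5,
               "cascade": 0.7, "higher_order": 0.6},
    "PROT_B": {"aac": 1.2, "dipeptide": -0.2, "pssm": 0.4,
               "cascade": 0.3, "higher_order": 0.1},
}
result = summarize_virulence(example)
```

In this example PROT_B is virulent by its average score but not unanimously, which shows why the two lists of drug targets can differ.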

Fig 7.

Reverse vaccinology and immunoinformatic analysis of 51 uncharacterized proteins. A. Pie chart illustrating the fraction of proteins that are antigenic (number of antigenic proteins = 31) and non-antigenic (number of non-antigenic proteins = 20) out of the 51 uncharacterized proteins. B. Pie chart depicting the relative fraction of proteins that are allergenic (number of allergenic proteins = 8) and non-allergenic (number of non-allergenic proteins = 43). C. Pie chart illustrating the fraction of uncharacterized proteins that are toxic (number of toxic proteins = 2) and non-toxic (number of non-toxic proteins = 49) to human cells. D. Bar graph showing the individual proteins that have antigenic properties (cutoff = 0.4) and can elicit an immune response in host cells. The X-axis depicts the 51 uncharacterized proteins and the Y-axis depicts the antigenic score generated by VaxiJen. E. Bar graph illustrating the proteins that are virulent and potential drug targets (threshold = 0), based on the prediction by VirulentPred. The X-axis depicts the 51 uncharacterized proteins and the Y-axis depicts the average hybrid score generated by VirulentPred. F. Heatmap showing the virulence of individual proteins predicted by the 5 approaches of VirulentPred (amino acid composition, dipeptide composition, PSI-BLAST-generated PSSM profiles, cascade of SVMs and PSI-BLAST, and higher-order dipeptide composition). The scale on the right side depicts the score generated by VirulentPred. G. Venn diagram showing the numbers of uncharacterized proteins predicted to be potential vaccine candidates and potential drug targets. Around 20 proteins were found to be potential drug targets and 20 proteins were predicted to be potential vaccine candidates. There were 6 proteins predicted to be both potential vaccine candidates and drug targets.

Table 2. Summary of the proteins that are potential vaccine candidates and drug targets.

Columns (left to right): (1) Potential vaccine candidates predicted by VaxiJen; (2) Potential vaccine candidates predicted by B-cell epitope determination; (3) Potential vaccine candidates predicted by T-cell epitope determination; (4) Potential vaccine candidates unanimously predicted by VaxiJen, B-cell and T-cell epitope determination; (5) Potential drug targets predicted by VirulentPred (based on average score); (6) Potential drug targets that are unanimously virulent in the algorithms of VirulentPred.
Q9KRD2 Q9KRD2 Q9KRD2 Q9KRD2 Q9KVG3 Q9KVG3
Q9KVG3 Q9KVG3 Q9KVG3 Q9KVG3 Q9KLK5 Q9KSQ9
Q9KT38 Q9KT38 Q9KT38 Q9KT38 Q9KU75 Q9KSV6
Q9KKL8 Q9KKL8 Q9KKL8 Q9KKL8 Q9KND1 Q9KND3
Q9KLK5 Q9KLK5 Q9KLK5 Q9KLK5 Q9KTC9 Q9KMX1
Q9KU75 Q9KU75 Q9KU75 Q9KU75 Q9KSQ9 Q9KPD6
Q9KND9 Q9KND9 Q9KND9 Q9KND9 Q9KKX0 Q9KNF4
Q9KVJ9 Q9KVJ9 Q9KVJ9 Q9KVJ9 Q9KSV6 Q9KL56
Q9KVJ9 Q9KSV6 Q9KSV6 Q9KSV6 Q9KND3 Q9KLQ3
Q9KSV6 Q9KND3 Q9KND3 Q9KND3 Q9KP29 Q9KKS6
Q9KND3 Q9KPA3 Q9KPA3 Q9KPA3 Q9KMX1 Q9KU58
Q9KPA3 Q9KT53 Q9KT53 Q9KT53 Q9KPD6 B1B1N2
Q9KT53 Q9KRE6 Q9KRE6 Q9KRE6 Q9KNF4 Q9K2J6
Q9KRE6 Q9KKS6 Q9KKS6 Q9KKS6 Q9KL56 Q9KS64
Q9KKS6 Q9KN87 Q9KN87 Q9KN87 Q9KLX2 Q9KN40
Q9KN87 Q9KU58 Q9KU58 Q9KU58 Q9KLQ3 Q9KL81
Q9KU58 Q9KPP0 Q9KPP0 Q9KPP0 Q9KKS6 Q9KL73
Q9KPP0 B1B1N2 B1B1N2 B1B1N2 Q9KU58 Q9KNG0
B1B1N2 Q9KL81 Q9KL73 Q9KL73 Q9KPP0 Q9KSJ4
Q9KL81 Q9KL73 Q9KPZ1 Q9KPZ1 B1B1N2 Q9KPZ1
Q9KL73 Q9KNG0 Q9K2J6
Q9KNG0 Q9KPZ1 Q9KS64
Q9KPZ1 Q9KNI6 Q9KN40
Q9K9I6 Q9KVT0 Q9KVW5
Q9KVT0 Q9KL81
Q9KPA0
Q9KL73
Q9KNG0
Q9KSJ4
Q9KPZ1
Q9KNI6
Q9KVT0
Q9KST0

Based on antigenicity, allergenicity, toxicity, and B-cell and T-cell epitope prediction, 20 proteins (potential vaccine candidates) are predicted to elicit an immunogenic response in humans. In addition, we predicted 20 potential drug targets that are unanimously predicted to be virulent by the 5 approaches of VirulentPred (amino acid composition, dipeptide composition, PSI-BLAST-generated PSSM profiles, cascade of SVMs and PSI-BLAST, and higher-order dipeptide composition).
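The first stage of this screen, keeping proteins that are antigenic, non-allergenic and non-toxic, can be sketched as a simple filter. The record layout, the field names and the example values are hypothetical; the subsequent B-cell and T-cell epitope filters are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    uniprot_id: str
    vaxijen_score: float  # VaxiJen overall prediction score
    allergen: bool        # True if called a probable allergen
    toxic: bool           # True if called toxic


def shortlist(preds, antigen_cutoff=0.4):
    """IDs of proteins that are antigenic, non-allergenic and non-toxic."""
    return [
        p.uniprot_id
        for p in preds
        if p.vaxijen_score > antigen_cutoff and not p.allergen and not p.toxic
    ]


# Hypothetical records (NOT values from S10 Table).
preds = [
    Prediction("PROT_A", 0.52, False, False),
    Prediction("PROT_B", 0.31, False, False),  # below antigenicity cutoff
    Prediction("PROT_C", 0.47, True, False),   # probable allergen
    Prediction("PROT_D", 0.45, False, True),   # predicted toxic
]
```

Only PROT_A passes all three filters in this example; in the study, proteins passing this stage were then examined for B-cell and T-cell epitopes.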

Linear and discontinuous B-cell epitope prediction

Among the 51 uncharacterized proteins, 24 were investigated to identify linear and discontinuous B-cell epitopes. These proteins were selected for their high antigenicity, non-allergenic properties, and low toxicity, which make them potential candidates for vaccine development. Each of these proteins possesses linear as well as discontinuous epitopes. BcePred, a server for predicting linear B-cell epitopes, was used to predict epitopes for the 24 proteins. Proteins containing 100 or more linear B-cell epitopes include Q9KRD2 (491), Q9KVG3 (432), Q9KT38 (294), Q9KLK5 (148), Q9KKL8 (222), Q9KSV6 (123), Q9KND3 (106), and Q9KND9 (100), all of which have an epitope length of three amino acid residues and a score greater than 0.5 (S12 Table and S6 Fig). Discontinuous B-cell epitopes were predicted using DiscoTope, with a threshold DiscoTope propensity score of -3.7. The proteins with high numbers of discontinuous epitopes were Q9KRD2 (45), Q9KVG3 (186), Q9KT38 (57), Q9KKL8 (125), Q9KLK5 (70), Q9KND9 (59), Q9KU58 (75), B1B1N2 (55) and Q9KL81 (48) (S13 Table and S7 Fig).

T-cell epitopes identification

The NetCTL server was used to predict the T-cell epitopes of the proteins. Cytotoxic T-lymphocyte (CTL) epitopes for the candidate proteins were predicted using NetCTL 1.2 based on MHC class-I binding capacity, TAP transport efficiency, and proteasomal C-terminal cleavage [39]. The scores of the three predictions were added together, and a threshold of 0.75 on the merged score was used for CTL epitope identification (S14 Table). Helper T-lymphocyte (HTL) epitopes were identified using the IEDB MHC-II server with Human/HLA-DR as the species/locus and 7 human leukocyte antigen (HLA) alleles (S15 Table). High MHC II affinity was found when the percentile rank was matched to the SwissProt database. Based on MHC-I and MHC-II binding of the candidate proteins, 20 proteins from the pool of uncharacterized proteins were identified as having T-cell epitopes (Table 2).
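The merging of the three NetCTL predictions and the 0.75 threshold can be illustrated with a short sketch. The NetCTL server exposes relative weights for the cleavage and TAP terms, so they are left as parameters here; the default of 1.0 simply mirrors "added together" and is an assumption that should be checked against the server settings actually used. The function names and peptide scores are hypothetical.

```python
def combined_score(mhc, cleavage, tap, w_cleavage=1.0, w_tap=1.0):
    """Merge the three per-peptide predictions into one score."""
    return mhc + w_cleavage * cleavage + w_tap * tap


def ctl_epitopes(peptides, threshold=0.75, **weights):
    """peptides: iterable of (sequence, mhc, cleavage, tap) tuples.

    Returns (sequence, score) pairs above the threshold, best first.
    """
    hits = []
    for seq, mhc, cleavage, tap in peptides:
        score = combined_score(mhc, cleavage, tap, **weights)
        if score > threshold:
            hits.append((seq, round(score, 4)))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical 9-mer peptides and scores (NOT values from S14 Table).
peptides = [
    ("AAAAAAAAA", 0.5, 0.2, 0.1),
    ("CCCCCCCCC", 0.3, 0.2, 0.1),
]
hits = ctl_epitopes(peptides)
```

Lowering the weights on the cleavage and TAP terms makes the MHC class-I binding score dominate, so the same peptide can fall below the threshold under a different weighting.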

Conclusion

Understanding the role of uncharacterized and hypothetical proteins in bacteria is essential for bridging the gaps in our knowledge of gene functions, interactions and molecular mechanisms leading to bacterial pathogenesis. In this study, we analyzed 51 uncharacterized and hypothetical proteins of V. cholerae whose expression is altered in CgtA-depleted conditions, as shown by transcriptomic and proteomic studies. We determined the physicochemical properties of the proteins, such as molecular weight, theoretical isoelectric point (pI), extinction coefficient, instability index, aliphatic index, grand average of hydropathicity (GRAVY), and total number of negatively and positively charged residues. The molecular weight of the proteins ranged from 5 kDa to 92 kDa, and the theoretical pI ranged from 4.19 to 11.40, with the majority of the proteins being acidic. The theoretical pI is the pH at which a molecule carries no net electrical charge and is useful for understanding protein charge stability. We also predicted the solubilities, subcellular localizations and probable functions of the proteins and identified their domains and families using various bioinformatics tools and databases. We observed that several of these uncharacterized proteins are predicted to be involved in essential cellular processes associated with cholera pathogenesis, like transcription, translation, phosphorelay signal transduction, motility and chemotaxis. These predictions are valuable for hypothesis generation and for designing experiments for further validation. For functional protein association networks, STRING was used to predict interactions between our uncharacterized candidate proteins and other partners. Characterizing protein–protein interactions is vital to reinforce our understanding of protein function and the biology of the cell.
Additionally, we employed reverse vaccinology and immunoinformatic approaches to identify potential vaccine candidates and potential drug targets that will pave the way towards novel drug discovery and vaccine design against the cholera pathogen. We also constructed 2D and 3D structural models with PSIPRED and AlphaFold2-mmseqs2, respectively, which were further validated with ProSA and PROCHECK. The in silico studies were further validated experimentally by cloning and expressing two crucial proteins (potential vaccine candidates) whose expression was altered in CgtA-depleted conditions. The protein Q9KRD2 is a 92 kDa EAL and sensory PAS domain-containing putative phosphodiesterase that is downregulated when cgtA is knocked down in V. cholerae. In contrast, proteomic studies showed that the 12 kDa protein Q9KU58 is upregulated when cgtA is knocked down in V. cholerae. Our findings, based on in-depth quantitative computational analysis and experimental work, will help us to understand the biology of cholera pathogenesis as a whole and to identify potential therapeutic leads at the molecular level.

Supporting information

S1 Table. RNA-seq data depicting the expression of genes that were downregulated in cgtA knockdown strain of Vibrio cholerae: The p-value and fold change of each protein are recorded and tabulated indicating the alteration in the expression of each protein when cgtA is knocked down from V. cholerae genome.

(DOCX)

pone.0311301.s001.docx (23.4KB, docx)
S2 Table. Physicochemical properties of the 51 uncharacterized proteins: Physicochemical properties like molecular weight, isoelectric point, aliphatic index(thermostability), GRAVY (solubility), molar extinction coefficient, instability index (Stability in solution) and number of positive and negative residues were assessed.

(DOCX)

pone.0311301.s002.docx (31.4KB, docx)
S3 Table. Amino acid composition (%) of the uncharacterized and hypothetical proteins.

(DOCX)

pone.0311301.s003.docx (37.2KB, docx)
S4 Table. Subcellular localization of the uncharacterized proteins: The subcellular location of the 51 uncharacterized proteins was assessed using PSORTb and PSLPred.

(DOCX)

pone.0311301.s004.docx (21.6KB, docx)
S5 Table. Identification of domains of 51 uncharacterized proteins using InterPro, SMART and PROSITE.

(DOCX)

pone.0311301.s005.docx (23.5KB, docx)
S6 Table. Prediction of soluble and transmembrane protein and determination of transmembrane region present within the uncharacterized proteins.

(DOCX)

pone.0311301.s006.docx (41.4KB, docx)
S7 Table. Prediction of Protein Function using PFP (Protein Function Prediction) Server: The molecular function, biological function and cellular location were predicted for each of the proteins using the Protein Function Prediction (PFP) server, as indicated by the PFP score.

(DOCX)

pone.0311301.s007.docx (26.6KB, docx)
S8 Table. Prediction of Protein Function using Argot2 Server: The molecular function, biological function and cellular location were predicted for each of the proteins using the Argot2 server, as indicated by the Argot2 score.

(DOCX)

pone.0311301.s008.docx (27.1KB, docx)
S9 Table. Protein-protein interaction: Identification of interacting protein partners of candidate uncharacterized protein by STRING.

(DOCX)

pone.0311301.s009.docx (66.2KB, docx)
S10 Table. Antigenicity, allergenicity and toxicity of candidate uncharacterized proteins.

(DOCX)

pone.0311301.s010.docx (23.4KB, docx)
S11 Table. Prediction of virulence of the candidate protein by VirulentPred.

(DOCX)

pone.0311301.s011.docx (28.9KB, docx)
S12 Table. Identification of linear B-cell epitopes present within the candidate proteins.

(DOCX)

pone.0311301.s012.docx (218.5KB, docx)
S13 Table. Identification of discontinuous B-cell epitopes present within the candidate uncharacterized proteins.

(DOCX)

pone.0311301.s013.docx (124.9KB, docx)
S14 Table. Prediction of MHC-I binding of candidate proteins by NetCTL.

(DOCX)

pone.0311301.s014.docx (41.9KB, docx)
S15 Table. Prediction of binding affinity between MHC-II and candidate uncharacterized protein depicted by scores generated from IEDB MHC II.

(DOCX)

pone.0311301.s015.docx (39.4KB, docx)
S1 Fig. Hydropathy plot (Kyte/Doolittle plot): Hydropathy plots illustrating the hydrophilic and hydrophobic region within each of the uncharacterized proteins.

(DOCX)

pone.0311301.s016.docx (6.8MB, docx)
S2 Fig. Computational prediction of protein-protein interaction (PPI) network by STRING.

(DOCX)

pone.0311301.s017.docx (5.7MB, docx)
S3 Fig. Identification of secondary structures present within each of the uncharacterized protein by PSIPRED.

(DOCX)

pone.0311301.s018.docx (2.5MB, docx)
S4 Fig. Prediction of disordered regions present within candidate uncharacterized proteins by DISOPRED.

(DOCX)

pone.0311301.s019.docx (1.5MB, docx)
S5 Fig. Construction of 3-dimensional structures of uncharacterized proteins by AlphaFold-mmseqs2 and validation of the structures.

(DOCX)

pone.0311301.s020.docx (14.6MB, docx)
S6 Fig. Graphical representation of linear B-cell epitope prediction for 24 uncharacterized proteins.

(DOCX)

pone.0311301.s021.docx (655.6KB, docx)
S7 Fig. Graphical representation of discontinuous B-cell epitope present within 24 uncharacterized proteins.

(DOCX)

pone.0311301.s022.docx (806.5KB, docx)

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

MoE STARS (Award number- STARS/APR2019/BS/581/FS) was used. We also thank IISER, Kolkata for partially funding and supporting PPD lab. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Ali M., Nelson A.R., Lopez A.L., Sack D.A., 2015. Updated Global Burden of Cholera in Endemic Countries. PLoS Negl. Trop. Dis. 9, e0003832.
2. Bonnin-Jusserand M., Copin S., Le Bris C., Brauge T., Gay M., Brisabois A., et al. 2019. Vibrio species involved in seafood-borne outbreaks (Vibrio cholerae, V. parahaemolyticus and V. vulnificus): Review of microbiological versus recent molecular detection methods in seafood products. Crit. Rev. Food Sci. Nutr. 59, 597–610.
3. Verma J, Bag S, Saha B, Kumar P, Ghosh TS, Dayal M, et al. Genomic plasticity associated with antimicrobial resistance in Vibrio cholerae. Proc Natl Acad Sci U S A. 2019. Mar 26;116(13):6226–6231.
4. Zielke R. A., Wierzbicki I. H., Baarda B. I., & Sikora A. E., 2015. The Neisseria gonorrheae Obg protein is an essential ribosome-associated GTPase and a potential drug target. BMC microbiology, 15(1), 1–14.
5. Sleominska M., Konopa G., & Wegrzyn G., 2002. Impaired chromosome partitioning and synchronization of DNA replication initiation in an insertional mutant of Vibrio harveyi The cgtA gene encodes a common GTP-binding protein. Biochemical Journal, 362(3), 579–584.
6. Foti J. J., Persky N. S., Ferullo D. J., & Lovett S. T., 2007. Chromosome segregation control by Escherichia coli ObgE GTPase. Molecular microbiology, 65(2), 569–581.
7. Zielke R., Sikora A., Dutkiewicz R., Wegrzyn G., & Czyz A., 2003. Involvement of the cgtA gene functions in stimulating DNA repair in Escherichia coli and Vibrio harveyi. Microbiology, 149(7), 1763–1770.
8. Polkinghorne A., and Vaughan L., 2011. Chlamydia abortus YhbZ, a truncated Obg family GTPase, associates with the Escherichia coli large ribosomal subunit. Microb Pathog 50, 200–206.
9. Sasindran S.J., Saikolappan S., Scofield V.L., and Dhandayuthapani S., 2011. Biochemical and physiological characterization of the GTP-binding protein Obg of Mycobacterium tuberculosis. BMC Microbiol 11, 43.
10. Sikora A.E., Zielke R., Datta K., and Maddock J.R., 2006. b. The Vibrio harveyi GTPase CgtA is essential and associated with the 50S ribosomal subunit. J Bacteriol 188, 1205–1210.
11. Raskin D.M., Judson N., Mekalanos J.J., 2007. Regulation of the stringent response is the essential function of the conserved bacterial G protein CgtA in Vibrio cholerae. Proc. Natl. Acad. Sci. 104, 4636–4641.
12. Das S., Chatterjee A., & Datta P. P., 2023. Knockdown Experiment Reveals an Essential GTPase CgtA’s Involvement in Growth, Viability, Motility, Morphology, and Persister Phenotypes in Vibrio cholerae. Microbiology Spectrum, accepted on: 13 February 2023.
13. Wang W., Liu J., Guo S., Liu L., Yuan Q., Guo L., et al. 2021. Identification of Vibrio parahaemolyticus and Vibrio Spp. specific outer membrane proteins by reverse vaccinology and surface proteome. Frontiers in Microbiology, 11, 625315.
14. Heidelberg J.F., Eisen J.A., Nelson W.C., Clayton R.A., Gwinn M.L., Dodson R.J., et al. 2000. DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 406, 477–483.
15. Gasteiger E., Gattiker A., Hoogland C., Ivanyi I., Appel R.D., Bairoch A., 2003. ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res. 31, 3784–3788. doi: 10.1093/nar/gkg563
16. Yu N.Y., Wagner J.R., Laird M.R., Melli G., Rey S., Lo R., et al., 2010. PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics 26, 1608–1615. doi: 10.1093/bioinformatics/btq249
17. Bhasin M., Garg A., Raghava G.P.S., 2005. PSLpred: prediction of subcellular localization of bacterial proteins. Bioinformatics 21, 2522–2524. doi: 10.1093/bioinformatics/bti309
18. Paysan-Lafosse T., Blum M., Chuguransky S., Grego T., Pinto B.L., Salazar G.A., et al., 2023. InterPro in 2022. Nucleic Acids Res. 51, D418–D427. doi: 10.1093/nar/gkac993
19. Letunic I., Khedkar S., Bork P., 2021. SMART: recent updates, new developments and status in 2020. Nucleic Acids Res. 49, D458–D460. doi: 10.1093/nar/gkaa937
20. Sigrist C.J.A., de Castro E., Cerutti L., Cuche B.A., Hulo N., Bridge A., et al., 2013. New and continuing developments at PROSITE. Nucleic Acids Res. 41, D344–D347. doi: 10.1093/nar/gks1067
21. Hirokawa T., Boon-Chieng S., Mitaku S., 1998. SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics 14, 378–379. doi: 10.1093/bioinformatics/14.4.378
22. Hallgren J., Tsirigos K.D., Pedersen M.D., Armenteros J.J.A., Marcatili P., Nielsen H., et al., 2022. DeepTMHMM predicts alpha and beta transmembrane proteins using deep neural networks.
23. Tusnády G.E., Simon I., 2001. The HMMTOP transmembrane topology prediction server. Bioinformatics 17, 849–850. doi: 10.1093/bioinformatics/17.9.849
24. Kyte J., Doolittle R.F., 1982. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132. doi: 10.1016/0022-2836(82)90515-0
25. Hawkins T., Chitale M., Luban S., Kihara D., 2009. PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data. Proteins Struct. Funct. Bioinforma. 74, 566–582. doi: 10.1002/prot.22172
26. Falda M., Toppo S., Pescarolo A., Lavezzo E., Di Camillo B., Facchinetti A., et al., 2012. Argot2: a large-scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms. BMC Bioinformatics 13, S14. doi: 10.1186/1471-2105-13-S4-S14
27. Szklarczyk D., Kirsch R., Koutrouli M., Nastou K., Mehryary F., Hachilif R., et al., 2023. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 51, D638–D646. doi: 10.1093/nar/gkac1000
  • 28.Buchan D.W.A., Jones D.T., 2019. The PSIPRED Protein Analysis Workbench: 20 years on. Nucleic Acids Res. 47, W402–W407. doi: 10.1093/nar/gkz297 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., et al., 2021. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589. doi: 10.1038/s41586-021-03819-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Wiederstein M, Sippl MJ. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res. 2007. Jul;35(Web Server issue): W407–10. doi: 10.1093/nar/gkm290 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Laskowski R A, MacArthur M W, Moss D S, Thornton J M (1993). PROCHECK ‐ a program to check the stereochemical quality of protein structures. J. App. Cryst., 26, 283–291. [Google Scholar]
  • 32.Doytchinova I.A., Flower D.R., 2007. VaxiJen: a server for prediction of protective antigens, tumor antigens and subunit vaccines. BMC Bioinformatics 8, 4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Dimitrov I., Bangov I., Flower D.R., Doytchinova I., 2014. AllerTOP v.2—a server for in silico prediction of allergens. J. Mol. Model. 20, 2278. [DOI] [PubMed] [Google Scholar]
  • 34.Sharma N., Naorem L.D., Jain S., Raghava G.P.S., 2022. ToxinPred2: an improved method for predicting toxicity of proteins. Brief. Bioinform. 23, bbac174. doi: 10.1093/bib/bbac174 [DOI] [PubMed] [Google Scholar]
  • 35.Sharma A., Garg A., Ramana J., Gupta D., 2023. VirulentPred 2.0: an improved method for prediction of virulent proteins in bacterial pathogens. Protein Sci. n/a, e4808. doi: 10.1002/pro.4808 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Clifford J.N., Høie M.H., Deleuran S., Peters B., Nielsen M., Marcatili P., 2022. BepiPred-3.0: Improved B-cell epitope prediction using protein language models. Protein Sci. 31, e4497. doi: 10.1002/pro.4497 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Saha.S and Raghava G.P.S. BcePred:Prediction of Continuous B-Cell Epitopes in Antigenic Sequences Using Physico-chemical Properties. In G.Nicosia, V.Cutello, P.J. Bentley and J.Timis (Eds.) ICARIS 2004, LNCS 3239, 197–204, Springer,2004.
  • 38.Høie M. H., Gade F. S., Johansen J. M., Würtzen C., Winther O., Nielsen M., et al. (2024). DiscoTope-3.0: improved B-cell epitope prediction using inverse folding latent representations. Frontiers in immunology, 15, 1322712. doi: 10.3389/fimmu.2024.1322712 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Larsen M.V., Lundegaard C., Lamberth K., Buus S., Lund O., Nielsen M., 2007. Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction. BMC Bioinformatics 8, 424. doi: 10.1186/1471-2105-8-424 [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

Rajesh Kumar Pathak

16 Aug 2024

PONE-D-24-29986

Comprehensive in silico analyses of fifty-one uncharacterized proteins from Vibrio cholerae

PLOS ONE

Dear Dr. DATTA,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Sep 30 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Rajesh Kumar Pathak, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ

3. Please include your tables as part of your main manuscript and remove the individual files. Please note that supplementary tables should remain (or be uploaded) as separate "supporting information" files.

4. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. 

5. PLOS ONE now requires that authors provide the original uncropped and unadjusted images underlying all blot or gel results reported in a submission’s figures or Supporting Information files. This policy and the journal’s other requirements for blot/gel reporting and figure preparation are described in detail at https://journals.plos.org/plosone/s/figures#loc-blot-and-gel-reporting-requirements and https://journals.plos.org/plosone/s/figures#loc-preparing-figures-from-image-files. When you submit your revised manuscript, please ensure that your figures adhere fully to these guidelines and provide the original underlying images for all blot or gel data reported in your submission. See the following link for instructions on providing the original image data: https://journals.plos.org/plosone/s/figures#loc-original-images-for-blots-and-gels.   

In your cover letter, please note whether your blot/gel image data are in Supporting Information or posted at a public data repository, provide the repository URL if relevant, and provide specific details as to which raw blot/gel images, if any, are not available. Email us at plosone@plos.org if you have any questions.

6. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

The manuscript has been reviewed and is found to be interesting. However, the reviewers have raised some queries that need to be addressed in the revised manuscript.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have tried to annotate a list of uncharacterized proteins from Vibrio cholerae and analyzed their potency for vaccine development. I find the work is very much useful for the scientific community.

Reviewer #2: In the manuscript “Comprehensive in silico analyses of fifty-one uncharacterized proteins from Vibrio cholerae", submitted by Mallick et al., the authors present an intriguing study on the characterization of 51 uncharacterized and hypothetical proteins in Vibrio cholerae. The focus on understanding these proteins' physicochemical properties, functional associations, and potential roles is well-justified and contributes to the broader understanding of cholera biology. However, a few areas could be further developed to enhance the clarity and impact of the paper.

• The introduction could benefit from a more detailed explanation of the significance of these particular proteins. Providing a stronger rationale for their selection and discussing how this study advances our understanding of V. cholerae would help set the stage for your findings.

• Your methodological approach is comprehensive, especially in describing the physicochemical properties of the proteins. However, it would be useful to elaborate on why these specific properties are important for studying hypothetical proteins. Additionally, more details on the tools and databases used for solubility predictions, subcellular localization, and domain identification would enhance reproducibility and transparency.

• The use of STRING for predicting protein-protein interactions is a strong aspect of the study, but the manuscript could be improved by providing more insight into these predicted interactions. Discussing how these interactions contribute to understanding the proteins' potential functions would add depth to your analysis.

• The experimental validation of two proteins is a valuable part of the study, but it would be beneficial to expand on the results of these experiments. Discuss how the experimental findings align with or differ from your in-silico predictions, as this correlation could be a significant strength of the paper.

• In terms of structure, the manuscript could be improved with smoother transitions between sections, particularly when moving from computational analysis to experimental validation. This would help create a more cohesive narrative. Additionally, the conclusion could be expanded to emphasize the broader significance of your findings and suggest possible directions for future research.

• The manuscript would benefit from grammatical revisions for clarity, and some technical terms should be defined more clearly for readers who may not be specialists in the field. Including figures or tables to summarize key findings, such as the physicochemical properties or interaction networks, would also enhance the readability and impact of the paper.

• Overall, the study presents valuable findings that contribute to the understanding of V. cholerae proteins but addressing these suggestions could strengthen the manuscript and make it more accessible and impactful.

Reviewer #3: 1. Please explain why CgtA was chosen over other available drug targets.

2. Explain the significance of the 20 predicted drug targets.

3. Explain the relationship between the 20 predicted candidate vaccines and the 20 predicted drug targets.

4. Why were only 2 uncharacterized proteins selected to validate the 20 predicted candidate vaccines? The study should include more proteins for validation.

5. The abstract should be rewritten, and the manuscript should be checked to improve the English.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Rajabrata Bhuyan

Reviewer #2: No

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Oct 4;19(10):e0311301. doi: 10.1371/journal.pone.0311301.r002

Author response to Decision Letter 0


3 Sep 2024

Response to reviewers

1. Reviewer #1: The authors have tried to annotate a list of uncharacterized proteins from Vibrio cholerae and analysed their potency for vaccine development. I find the work is very much useful for the scientific community.

Author’s response: Thank you so much for your kind comment.

2. Reviewer #2: In the manuscript “Comprehensive in-silico analyses of fifty-one uncharacterized proteins from Vibrio cholerae", submitted by Mallick et al., the authors present an intriguing study on the characterization of 51 uncharacterized and hypothetical proteins in Vibrio cholerae. The focus on understanding these proteins' physicochemical properties, functional associations, and potential roles is well-justified and contributes to the broader understanding of cholera biology. However, a few areas could be further developed to enhance the clarity and impact of the paper.

(A) The introduction could benefit from a more detailed explanation of the significance of these particular proteins. Providing a stronger rationale for their selection and discussing how this study advances our understanding of V. cholerae would help set the stage for your findings.

Author’s response: Lines 58-67 (pages 2-3) have been modified. The following text (highlighted in yellow) was added to the manuscript:

“Understanding these biochemically, structurally and functionally uncharacterized proteins can pave the way towards mechanistic insights into how this essential GTPase, CgtA, exerts its pleiotropic effect. Hence, we carried out a comprehensive in silico analysis of these uncharacterized proteins, which allowed us to predict their physicochemical and immunogenic properties and to hypothesize and design various in vitro and in vivo experiments to characterize them, which will lead us to understand the basis of cholera pathogenesis at a deeper level. We have also identified a number of potential drug targets and vaccine candidates among the 51 uncharacterized proteins that will facilitate the production of various vaccine constructs and drugs against the Vibrio cholerae pathogen through in vivo and in vitro experiments.”

(B) Your methodological approach is comprehensive, especially in describing the physicochemical properties of the proteins. However, it would be useful to elaborate on why these specific properties are important for studying hypothetical proteins. Additionally, more details on the tools and databases used for solubility predictions, subcellular localization, and domain identification would enhance reproducibility and transparency.

Author’s response: Lines 115-136 (pages 4-5) have been modified to include more information, as suggested.

“The ProtParam web server by ExPASy (https://web.expasy.org/protparam/) was used to compute, from the amino acid sequences, the physicochemical characteristics of the uncharacterized proteins: molecular weight, theoretical isoelectric point (pI), amino acid composition profile (%), molar extinction coefficient, instability index, aliphatic index, grand average of hydropathy (GRAVY), and the total number of positively charged (Arg+Lys) and negatively charged (Asp+Glu) residues (Supplementary Table 2, Supplementary Table 3, Fig. 2A). The molar extinction coefficient measures the amount of light that a protein absorbs at a specific wavelength. A high molar extinction coefficient indicates a high content of cysteine, tryptophan, and tyrosine in the candidate protein. The instability index provides an estimate of the stability of a protein in a test tube. A protein whose instability index is greater than 40 is predicted to be unstable in solution. The aliphatic index of a protein is defined as the relative volume occupied by aliphatic side chains (alanine, valine, leucine, and isoleucine). It may be regarded as a positive factor for the thermostability of globular proteins: a high aliphatic index indicates that a protein is thermostable over a wide temperature range. The GRAVY score of a protein is calculated as the sum of the hydropathy values of all of its amino acids divided by the number of residues in the query sequence. A low GRAVY value indicates that a protein is more likely to be globular or hydrophilic than membranous. A comprehensive analysis of these physicochemical properties will give us insight into the possible biological functions of these uncharacterized proteins.
In addition, predicted properties such as the instability index and solubility will allow us to design effective strategies to express and purify these proteins for downstream biochemical and functional characterization.”
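The GRAVY and aliphatic index definitions quoted above can be sketched in a few lines of Python. This is a minimal illustration, not ProtParam's own code: the hydropathy values are the standard Kyte and Doolittle scale (reference 24), the aliphatic index uses the commonly cited weights of 2.9 for valine and 3.9 for isoleucine plus leucine, and the function names are hypothetical. The instability index is omitted because it depends on a dipeptide weight table that is not reproduced in the text.

```python
# Kyte-Doolittle hydropathy scale (one value per standard amino acid).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Sum of residue hydropathy values divided by sequence length."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Relative volume of aliphatic side chains (Ala, Val, Ile, Leu).

    Uses mole percentages and the commonly cited weights 2.9 (Val)
    and 3.9 (Ile + Leu); these weights are an assumption of this sketch.
    """
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def charged_residue_counts(seq: str) -> tuple[int, int]:
    """Return (Arg + Lys, Asp + Glu) counts."""
    seq = seq.upper()
    positive = seq.count("R") + seq.count("K")
    negative = seq.count("D") + seq.count("E")
    return positive, negative
```

A negative `gravy` value, such as those reported for Q9KRD2 and Q9KU58 later in this response, points towards a hydrophilic protein; the sketch accepts only the 20 standard residues and raises `KeyError` on anything else.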

Lines 143-145 (page 5) were added to describe the PSORTb tool, which is used to predict the subcellular localization of a protein.

“A score is assigned to every localization site, reflecting the confidence level of the prediction. A score higher than the cut-off value of 7.5 indicates strong confidence in the predicted localization.”
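The cut-off rule described above can be illustrated with a short sketch. It assumes the per-site scores have already been obtained from PSORTb; the function name, the site labels, and the example scores are hypothetical, and treating a score of exactly 7.5 as "not confident" simply follows the wording "higher than" in the quoted text.

```python
CUTOFF = 7.5  # cut-off value quoted in the manuscript text


def call_localization(site_scores: dict[str, float], cutoff: float = CUTOFF) -> str:
    """Return the best-scoring site if its score exceeds the cut-off, else 'Unknown'."""
    site, score = max(site_scores.items(), key=lambda item: item[1])
    return site if score > cutoff else "Unknown"


# Hypothetical score sets, for illustration only (not taken from the study).
confident = {"Cytoplasmic": 9.97, "CytoplasmicMembrane": 0.01, "Periplasmic": 0.02}
ambiguous = {"Cytoplasmic": 2.5, "CytoplasmicMembrane": 2.5, "Periplasmic": 5.0}

print(call_localization(confident))
print(call_localization(ambiguous))
```

Proteins that fall into the "Unknown" bin are the natural candidates for a second predictor, which is how the response goes on to describe the use of PSLpred.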

We have also added a minor detail about the PSLpred tool in lines 147-149 (page 5). The sentence was modified as follows:

“Further, the results were confirmed by PSLpred (https://webs.iiitd.edu.in/raghava/pslpred/submit.html), which predicts the subcellular localization of uncharacterized proteins using a hybrid approach that integrates PSI-BLAST with three SVMs based on physicochemical properties, residue composition, and dipeptide composition.”
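To make "residue composition" and "dipeptide composition" concrete, the sketch below computes both feature vectors for a sequence. It illustrates only the feature definitions, under the assumption that they are plain frequency vectors over the 20 standard amino acids; it is not PSLpred's implementation, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> list[float]:
    """20-dimensional vector: fraction of each residue in the sequence."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """400-dimensional vector: fraction of each overlapping residue pair."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    return [counts["".join(pair)] / len(pairs)
            for pair in product(AMINO_ACIDS, repeat=2)]
```

Vectors of this kind are what a classifier such as an SVM would take as input; the dipeptide vector additionally captures local residue order, which the single-residue composition discards.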

Lines 157-162 (pages 5-6) were modified to describe the tools used for domain identification. The text now reads:

“The results generated by the InterPro database were validated using SMART (http://smart.embl-heidelberg.de/), a tool that extensively annotates protein domains based on functional class, phyletic distribution, tertiary structure and functionally important residues. We further validated the results using PROSITE (https://prosite.expasy.org/), a large database of protein domains, families, and functional sites that identifies conserved sequences and functional sites within candidate proteins, which are crucial for understanding their structure and function.”

(C) The use of STRING for predicting protein-protein interactions is a strong aspect of the study, but the manuscript could be improved by providing more insight into these predicted interactions. Discussing how these interactions contribute to understanding the proteins' potential functions would add depth to your analysis.

Author’s response: Lines 386-397 (page 13) were added to explain the protein-protein interaction (PPI) network derived from STRING version 11.5. The following text was added to the manuscript:

“Identification of protein-protein interaction network

The protein-protein interaction (PPI) networks for the 51 uncharacterized proteins were visualized in STRING version 11.5, which predicts interacting protein partners; a confidence score above 0.7 was used as the cut-off. The interacting partners of each uncharacterized protein are tabulated in Supplementary material 3, Table S15. The highest confidence scores, 0.996 and 0.994, were observed for the interactions of the SpoVR family protein Q9KQX3 with the hypothetical protein Q9KQX4 (VC_1873) and the uncharacterized protein Q9KQX5 (VC_1872), respectively. Another high-confidence interaction, with a score of 0.993, was seen between the EAL domain-containing protein Q9KRD2 and Q9KPJ7 (gene name VC_2370), a sensory box-containing diguanylate cyclase. These findings will reinforce our understanding of the probable functions of these uncharacterized proteins and help decipher the molecular pathways involved in cellular functions.”
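The confidence filter described above amounts to a threshold on the combined score of each predicted pair. The sketch below assumes the interactions have already been exported from STRING as (protein, partner, score) records; the three high-scoring pairs are the ones quoted in the text, while the low-scoring pair and all function and class names are hypothetical.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    protein: str
    partner: str
    score: float  # STRING combined confidence score, between 0 and 1


def high_confidence(interactions: list[Interaction],
                    threshold: float = 0.7) -> list[Interaction]:
    """Keep pairs scoring above the threshold, highest score first."""
    kept = [i for i in interactions if i.score > threshold]
    return sorted(kept, key=lambda i: i.score, reverse=True)


interactions = [
    Interaction("Q9KRD2", "Q9KPJ7", 0.993),   # quoted in the text
    Interaction("Q9KQX3", "Q9KQX5", 0.994),   # quoted in the text
    Interaction("Q9KQX3", "Q9KQX4", 0.996),   # quoted in the text
    Interaction("EXAMPLE_A", "EXAMPLE_B", 0.45),  # hypothetical low-confidence pair
]

for hit in high_confidence(interactions):
    print(f"{hit.protein} - {hit.partner}: {hit.score:.3f}")
```

Sorting by score puts the pairs most worth following up experimentally at the top, which mirrors how the response singles out the Q9KQX3 and Q9KRD2 interactions.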

(D) The experimental validation of two proteins is a valuable part of the study, but it would be beneficial to expand on the results of these experiments. Discuss how the experimental findings align with or differ from your in-silico predictions, as this correlation could be a significant strength of the paper.

Author’s response: Lines 428-443 (page 14) were added, in which we compare the predicted solubility and molecular weight of the selected proteins with the experimental results. The following text was added (as highlighted in the manuscript):

“The GRAVY indices of Q9KRD2 and Q9KU58 were predicted to be -0.172 and -0.319, respectively, which indicates that they are soluble in solution. SOSUI and DeepTMHMM likewise predicted both proteins to be soluble. However, when expressed in the heterologous system of E. coli BL21 (pLysS), both proteins showed solubility issues. We used a mild concentration (2 M) of urea, a polar reagent, to solubilize the inclusion bodies. The solubilized inclusion bodies were then purified using a nickel-NTA column. Finally, the purified proteins Q9KRD2 (Figure 6G) and Q9KU58 (Figure 6I) were checked by SDS‒PAGE. The purified Q9KRD2 showed a clear band near the 100 kDa protein marker, and the purified Q9KU58 showed a distinct band near the 15 kDa marker, as expected from the in silico analysis. Further, the expression of the two purified proteins was validated by western blot analysis (Figure 6H and 6J).

The 3-dimensional model structures of Q9KRD2 and Q9KU58 are shown in Figure 6K and 6P. Ramachandran plot validation showed that 91.70% of the residues of Q9KRD2 lie in the most favoured regions (Figure 6L and 6M), compared with 79.30% for Q9KU58 (Figure 6Q and 6R). Additionally, the ProSA z-scores of Q9KRD2 and Q9KU58 were -13.31 (Figure 6N) and -2.66 (Figure 6S), respectively. ProSA was also used to assess the local model quality of Q9KRD2 (Figure 6O) and Q9KU58 (Figure 6T).”

Experimental validation of other aspects, such as domain characterization, subcellular localization, protein-protein interactions and structure, is beyond the scope of this manuscript. However, we are currently exploring these aspects experimentally and would like to present them in a separate manuscript.

(E) In terms of structure, the manuscript could be improved with smoother transitions between sections, particularly when moving from computational analysis to experimental validation. This would help create a more cohesive narrative. Additionally, the conclusion could be expanded to emphasize the broader significance of your findings and suggest possible directions for future research.

Author’s response: The “Result and discussion” part describing the structure of the 51 uncharacterized proteins and the experimental validation of two proteins, Q9KRD2 and Q9KU58, by cloning and expression has been updated, as shown by the highlighted regions on pages 13 and 14. We have also modified the conclusion, as shown by the highlighted region on pages 17-18 of the manuscript file. The conclusion now reads as follows.

“Understanding the role of uncharacterized and hypothetical proteins in bacteria is essential for bridging the gaps in our knowledge of gene functions, interactions and molecular mechanisms leading to bacterial pathogenesis. In this study, we analyzed 51 uncharacterized and hypothetical proteins of V. cholerae whose expression is altered in CgtA-depleted conditions, as shown by transcriptomic and proteomic studies. We determined the physicochemical properties of these proteins, such as molecular weight, theoretical isoelectric point, extinction coefficient, instability index, aliphatic index, grand average of hydropathicity (GRAVY), and total number of negatively and positively charged residues. The molecular weight of the proteins ranged from 5 kDa to 92 kDa, and the theoretical pI ranged from 4.19 to 11.40, with the majority of the proteins being acidic. The theoretical pI is defined as the pH at which a molecule carries no net electrical charge and is useful for understanding protein charge stability. We also computed the solubilities, subcellular localizations and probable functions of the proteins and identified their domains and families using various bioinformatics tools and databases. We observed that several of these uncharacterized proteins are involved in essential cellular processes associated with cholera pathogenesis, such as transcription, translation, phosphorelay signal transduction, motility and chemotaxis. These predictions are crucial for generating hypotheses and designing experiments for further validation. For functional protein association networks, STRING was used to predict interactions between our uncharacterized candidate proteins and other partners. Characterizing protein–protein interactions is vital to reinforce our understanding of protein function and the biology of the cell.
Additionally, we employed reverse vaccinology and immunoinformatic approaches to identify potential vaccine candidates and drug targets that will pave the way towards novel drug discovery and vaccine design against the cholera pathogen. We also constructed 2D and 3D structural models with PSIPRED and Alphafold2-mmseqs2, respectively, which were further validated with ProSA and PROCHECK. The in silico studies were further validated experimentally by cloning and expressing two crucial proteins (potential vaccine candidates) whose expression was altered in CgtA-depleted conditions. The protein Q9KRD2 is a 92 kDa EAL and sensory PAS domain-containing putative phosphodiesterase that is downregulated when cgtA is knocked down in the V. cholerae genome. In contrast, proteomic studies have shown that the 12 kDa protein Q9KU58 is upregulated when cgtA is knocked down in the V. cholerae genome. Our findings, based on in-depth quantitative computational analysis and experimental work, will help us to understand the biology of cholera pathogenesis as a whole and to identify potential therapeutic leads at the molecular level.”

(F) The manuscript would benefit from grammatical revisions for clarity, and some technical terms should be defined more clearly for readers who may not be specialists in the field. Including figures or tables to summarize key findings, such as the physicochemical properties or interaction networks, would also enhance the readability and impact of the paper.

Author’s response: The entire manuscript was checked with the English-language editing tools Grammarly and Curie to improve clarity for readers. Furthermore, the Results and Discussion sections, especially the reverse vaccinology and immunoinformatics parts, have been explained in greater detail for better clarity. The figure and table captions have also been rewritten in more detail.

(G) Overall, the study presents valuable findings that contribute to the understanding of V. cholerae proteins, but addressing these suggestions could strengthen the manuscript and make it more accessible and impactful.

Author’s response: Thank you very much for your valuable input and suggestions. We have done our best to respond to your insightful queries.

3. Reviewer #3:

(A) An explanation is needed for choosing CgtA over other available drug targets.

Author’s response:

Although CgtA is a multifunctional essential GTPase in Vibrio cholerae, not much is known about its mechanisms of action. Hence, our laboratory has been working on it to better understand its function and to find new or alternative drug targets based on CgtA research in V. cholerae.

(B) Explain the significance of the 20 predicted drug targets.

Author’s response: Virulent proteins play a major role in the pathogenesis of infectious diseases and can be targeted for drug design and therapeutic interventions. Twenty potential drug targets were predicted from a pool of 51 uncharacterized proteins based on the assessment generated by VirulentPred, which exploits five prediction approaches:

(a) Amino acid composition,

(b) Dipeptide composition,

(c) PSI-BLAST generated PSSM profiles,

(d) A cascade of SVMs and PSI-BLAST, and

(e) Higher order dipeptide composition.
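Approaches (a) and (b) in the list above are built on simple sequence-derived feature vectors. As an illustration of what those features are, the following minimal Python sketch computes a 20-value amino acid composition and a 400-value dipeptide composition, both as percentages. It is a sketch under stated assumptions, not VirulentPred's implementation: the function names are hypothetical, the server's exact encoding and its SVM models are not reproduced, and residues outside the 20 standard codes are simply not counted.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> dict[str, float]:
    """Percentage of each standard amino acid in the sequence (20 features)."""
    seq = seq.upper()
    n = len(seq)
    return {aa: 100.0 * seq.count(aa) / n for aa in AMINO_ACIDS}


def dipeptide_composition(seq: str) -> dict[str, float]:
    """Percentage of each overlapping residue pair (400 features).

    Pairs are read with a sliding window of width 2, so a sequence of
    length n contributes n - 1 dipeptides. Pairs containing a
    non-standard residue are skipped but still count towards the total.
    """
    seq = seq.upper()
    total = len(seq) - 1
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return {pair: 100.0 * c / total for pair, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical sequence, NOT one of the proteins in the study.
    example = "MKTAYIAKQRQISFVKSHFSRQ"
    aac = amino_acid_composition(example)
    dpc = dipeptide_composition(example)
    print(len(aac), "amino acid features;", len(dpc), "dipeptide features")
    print("K content (%):", round(aac["K"], 2))
```

Fixed-length vectors of this kind allow proteins of different lengths to be compared by the same classifier, which is why composition-based features are a common input for sequence-level predictors.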

The threshold value for each approach was taken as 0 (zero). A positive score indicates that the protein is virulent, whereas a negative score indicates that the protein is non-virulent. Understanding the functi

Attachment

Submitted filename: Reviewers comments.docx

pone.0311301.s023.docx (29.4KB, docx)

Decision Letter 1

Rajesh Kumar Pathak

17 Sep 2024

Comprehensive in silico analyses of fifty-one uncharacterized proteins from Vibrio cholerae

PONE-D-24-29986R1

Dear Dr. DATTA,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Rajesh Kumar Pathak, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

The authors have addressed the comments and revised the manuscript, making it acceptable now.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: All comments have been addressed

Reviewer #4: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: Partly

Reviewer #4: (No Response)

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: Yes

Reviewer #4: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #3: Yes

Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: Yes

Reviewer #4: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: The authors have made significant corrections to improve their manuscript, and I am partially satisfied with their explanation.

Reviewer #4: I am grateful to the authors for carefully considering all the comments. I hope this article will have a positive effect on the world of science.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

Reviewer #4: No

**********

Acceptance letter

Rajesh Kumar Pathak

25 Sep 2024

PONE-D-24-29986R1

PLOS ONE

Dear Dr. DATTA,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Rajesh Kumar Pathak

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. RNA-seq data depicting the expression of genes that were downregulated in the cgtA knockdown strain of Vibrio cholerae: The p-value and fold change of each protein are recorded and tabulated, indicating the alteration in the expression of each protein when cgtA is knocked down in V. cholerae.

    (DOCX)

    pone.0311301.s001.docx (23.4KB, docx)
    S2 Table. Physicochemical properties of the 51 uncharacterized proteins: Physicochemical properties such as molecular weight, isoelectric point, aliphatic index (thermostability), GRAVY (solubility), molar extinction coefficient, instability index (stability in solution), and number of positive and negative residues were assessed.

    (DOCX)

    pone.0311301.s002.docx (31.4KB, docx)
    S3 Table. Amino acid composition (%) of the uncharacterized and hypothetical proteins.

    (DOCX)

    pone.0311301.s003.docx (37.2KB, docx)
    S4 Table. Subcellular localization of the uncharacterized proteins: The subcellular locations of the 51 uncharacterized proteins were assessed using PSORTb and PSLPred.

    (DOCX)

    pone.0311301.s004.docx (21.6KB, docx)
    S5 Table. Identification of domains of 51 uncharacterized proteins using InterPro, SMART and PROSITE.

    (DOCX)

    pone.0311301.s005.docx (23.5KB, docx)
    S6 Table. Prediction of soluble and transmembrane proteins and determination of the transmembrane regions present within the uncharacterized proteins.

    (DOCX)

    pone.0311301.s006.docx (41.4KB, docx)
    S7 Table. Prediction of protein function using the PFP (Protein Function Prediction) server: The molecular function, biological function, and cellular location were predicted for each of the proteins using the PFP server, as depicted by the PFP score.

    (DOCX)

    pone.0311301.s007.docx (26.6KB, docx)
    S8 Table. Prediction of protein function using the Argot2 server: The molecular function, biological function, and cellular location were predicted for each of the proteins using the Argot2 server, as depicted by the Argot2 score.

    (DOCX)

    pone.0311301.s008.docx (27.1KB, docx)
    S9 Table. Protein-protein interaction: Identification of interacting protein partners of the candidate uncharacterized proteins by STRING.

    (DOCX)

    pone.0311301.s009.docx (66.2KB, docx)
    S10 Table. Antigenicity, allergenicity and toxicity of candidate uncharacterized proteins.

    (DOCX)

    pone.0311301.s010.docx (23.4KB, docx)
    S11 Table. Prediction of virulence of the candidate protein by VirulentPred.

    (DOCX)

    pone.0311301.s011.docx (28.9KB, docx)
    S12 Table. Identification of linear B-cell epitopes present within the candidate proteins.

    (DOCX)

    pone.0311301.s012.docx (218.5KB, docx)
    S13 Table. Identification of discontinuous B-cell epitopes present within the candidate uncharacterized proteins.

    (DOCX)

    pone.0311301.s013.docx (124.9KB, docx)
    S14 Table. Prediction of MHC-I binding of candidate proteins by NetCTL.

    (DOCX)

    pone.0311301.s014.docx (41.9KB, docx)
    S15 Table. Prediction of binding affinity between MHC-II and the candidate uncharacterized proteins, depicted by scores generated from IEDB MHC II.

    (DOCX)

    pone.0311301.s015.docx (39.4KB, docx)
    S1 Fig. Hydropathy plot (Kyte/Doolittle plot): Hydropathy plots illustrating the hydrophilic and hydrophobic regions within each of the uncharacterized proteins.

    (DOCX)

    pone.0311301.s016.docx (6.8MB, docx)
    S2 Fig. Computational prediction of protein-protein interaction (PPI) network by STRING.

    (DOCX)

    pone.0311301.s017.docx (5.7MB, docx)
    S3 Fig. Identification of secondary structures present within each of the uncharacterized proteins by PSIPRED.

    (DOCX)

    pone.0311301.s018.docx (2.5MB, docx)
    S4 Fig. Prediction of disordered regions present within candidate uncharacterized proteins by DISOPRED.

    (DOCX)

    pone.0311301.s019.docx (1.5MB, docx)
    S5 Fig. Construction of 3-dimensional structures of uncharacterized proteins by AlphaFold2-mmseqs2 and validation of the structures.

    (DOCX)

    pone.0311301.s020.docx (14.6MB, docx)
    S6 Fig. Graphical representation of linear B-cell epitope prediction for 24 uncharacterized proteins.

    (DOCX)

    pone.0311301.s021.docx (655.6KB, docx)
    S7 Fig. Graphical representation of discontinuous B-cell epitopes present within 24 uncharacterized proteins.

    (DOCX)

    pone.0311301.s022.docx (806.5KB, docx)
    Attachment

    Submitted filename: Reviewers comments.docx

    pone.0311301.s023.docx (29.4KB, docx)

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting Information files.


    Articles from PLOS ONE are provided here courtesy of PLOS
