PLOS One. 2022 Oct 3;17(10):e0275148. doi: 10.1371/journal.pone.0275148

Virtual 2D map of cyanobacterial proteomes

Tapan Kumar Mohanta 1,*, Yugal Kishore Mohanta 2, Satya Kumar Avula 1, Amilia Nongbet 3, Ahmed Al-Harrasi 1,*
Editor: Arabinda Ghosh 4
PMCID: PMC9529120  PMID: 36190972

Abstract

Cyanobacteria are prokaryotic Gram-negative organisms prevalent in nearly all habitats. Despite extensive study of their genome sequences, a detailed proteomic study of Cyanobacteria has not been conducted. We therefore conducted a proteome-wide analysis of Cyanobacteria and found that Calothrix desertica has the largest proteome (680331.825 kDa) and Candidatus synechococcus spongiarum the smallest (42726.77 kDa) in the cyanobacterial kingdom. On average, a cyanobacterial proteome encodes 312.018 amino acids per protein and has a molecular weight of 182173.1324 kDa. The isoelectric point (pI) of cyanobacterial proteins ranges from 2.13 to 13.32. The cyanobacterial proteome encodes a greater number of acidic-pI proteins, and the average pI is 6.437. Proteins with higher pI are likely to contain repetitive amino acids. A virtual 2D map of the cyanobacterial proteome showed a bimodal distribution of molecular weight and pI. Several proteins within the cyanobacterial proteome were found to encode the amino acid selenocysteine (Sec), while pyrrolysine was not detected. The study can enable the generation of a high-resolution cell map to monitor proteomic dynamics. Analyzing the amino acid composition of the cyanobacterial proteome computationally can also provide a better understanding of codon usage bias.

Introduction

Cyanobacteria are a diverse group of Gram-negative, oxygenic, photosynthetic organisms that originated approximately 2–3 billion years ago [1–4]. Cyanobacteria can inhabit almost any environment, including arid and semi-arid environments and the Arctic and Antarctic [5–8]. They can also tolerate extremely adverse conditions, such as high salinity, low pH, and high and low light irradiance [9–12], and they can fix atmospheric nitrogen using nitrogenase enzymes [13,14]. Genomic technologies have provided the capacity to unlock the complex interactions between organisms and entire biological systems, and genomic and proteomic data are increasingly used in research. Proteomics has received great interest because it provides information that is not obtainable from genomic and transcriptomic evaluations alone: while the genome of an organism is constant, the proteome differs from cell to cell and changes continually during development and in response to external stimuli [15–17]. The proteome provides more information than the genome, including insight into cellular function and the potential links between gene expression and translation [18–20]. Proteomic technologies enable the exploration of the structure and function of a protein and serve as a connecting link between transcriptomics and metabolomics [21–23]. Therefore, characterizing the proteins of an organism, and proteins at a kingdom level, is of enormous importance. A comparative proteomic study can provide fundamental information on the role of proteins in complex biological processes, including growth, development, stress response, and molecular signaling. Acquiring this information requires specific methodologies, including high-throughput technologies, that make it possible to identify and isolate a particular protein.
Two-dimensional (2D) gel electrophoresis is one of the most prevalent techniques used to separate proteins based on molecular weight and isoelectric point (pI) in the pH range of 3–11. Because no IPG (immobilized pH gradient) strip exists below pH 3 or above pH 11, 2D gel electrophoresis is not possible outside this range, and a lot of information is lost in whole-proteome studies. It was therefore essential to determine the detailed pI range of the proteome and to provide information about the proteins below pH 3 and above pH 11. Although proteins can undergo post-translational modifications that change their charge, the original charge of a protein (without any modification) is still of enormous importance. Since several post-translational modifications are reversible, proteins can regain their original charge once the modification has been removed. Knowing the native molecular weight and isoelectric point of a protein can also help to understand its possible function and sub-cellular localization.

The chloroplasts of algae, plants, and protists are descended from an internalized cyanobacterium and have retained many cyanobacterial genes with conserved photosynthetic activity. Therefore, knowledge of the proteomic data of the cyanobacterial kingdom will help us to understand the evolution and biochemical processes of their cyanobacterial ancestors. Several studies in cyanobacteria have reported the proteome profiles of isolated cellular fractions [24–27], including the plasma membrane [26], thylakoid membrane [28], outer membrane [29], and soluble fractions [30]. However, there were numerous inconsistencies in the localization of proteins to the sub-cellular fractions [26,31–33]. It was essential to understand the proteomic details of the cyanobacterial kingdom under a single umbrella. Therefore, in the present study, we determined the molecular weight and isoelectric point of cyanobacterial proteins from the annotated (ORF) protein sequences of cyanobacterial proteomes and constructed a virtual 2D proteome map using an in-silico approach. The study also provides important information on the amino acid composition of proteins in cyanobacterial species that can be used to determine codon usage in these organisms.

Results

Cyanobacterial proteome ranged from 42726.766 to 680331.825 kDa in size

The molecular weight of the cyanobacterial proteome ranged from 42726.766 to 680331.825 kDa. Candidatus synechococcus spongiarum (order Melainabacteria) encoded the smallest proteome (42726.766 kDa), while Calothrix desertica (order Nostocales) encoded the largest (680331.825 kDa) (S1 Table). Other species with small proteomes included Prochlorococcus marinus (46474.77806 kDa), Richelia sp. (48930.82514 kDa), and Trichodesmium thiebautii (50144.32502 kDa) (S1 Table). In contrast, species with large proteomes included Mastigocoleus testarum (603575.5146 kDa), Nostoc punctiforme, Oscillatoria acuminata (457197.2525 kDa), and several others (S1 Table). The average molecular weight of the cyanobacterial proteome was 182173.1324 kDa. Calothrix desertica also encoded the highest number of protein sequences (19335), while Candidatus synechococcus spongiarum encoded the lowest (1337). Overall, cyanobacterial species encoded an average of 5260.704 protein sequences (Fig 1). Anabaena cylindrica encoded 17878 protein sequences, and the molecular weight of its proteome was 207570.656 kDa. These data indicate that the number of protein sequences in a proteome is not directly proportional to the molecular weight of the proteome. The proteomes of cyanobacterial species from the order Spirulinales are comparatively heavier than those of other groups (S1 Table). In increasing order, the average protein molecular weight of the different cyanobacterial groups was unclassified (30.519 kDa), Synechococcales (33.063 kDa), Chroococcidiopsidales (34.311 kDa), Chroococcales (34.487 kDa), Gloeobacterales (34.72 kDa), Pleurocapsales (35.571 kDa), Nostocales (35.788 kDa), Melainabacteria (36.659 kDa), Oscillatoriales (36.701 kDa), and Spirulinales (37.151 kDa) (S1 File).
The average molecular weight of proteins from Chroococcidiopsidales, Chroococcales, and Gloeobacterales falls around 34 kDa, whereas that of proteins from the other groups differs slightly. To further corroborate this premise, a correlation and regression analysis was conducted on the number of protein sequences and the molecular weight of the proteome. The results indicated that the number of protein sequences of the cyanobacterial proteome is only slightly proportional to the molecular weight of the proteome (kDa) (S1 Fig). Although the correlation was positive (r = 0.918, y = 13570 + 32.07 x), it was < 1 (S1 Fig). The D’Agostino and Pearson omnibus normality test indicated that the cyanobacterial proteome did not come from a normally distributed population (it did not pass the normality test at α = 0.05, p ≤ 0.0001). A one-sample t-test revealed that the molecular weights of the cyanobacterial proteomes were significantly different (t = 31.24, degrees of freedom df = 229, p = 0.0001, α = 0.05).
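The correlation and regression step described above can be illustrated with a short Python sketch. This is not the authors' workflow (their tool is not named in this paragraph); the function name `pearson_and_fit` and the toy data are hypothetical and only show how a Pearson r and a least-squares line of the form y = a + b x are obtained from paired values.

```python
import math

def pearson_and_fit(x, y):
    """Return (r, intercept, slope) for paired samples x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)      # Pearson correlation coefficient
    slope = sxy / sxx                   # least-squares slope
    intercept = mean_y - slope * mean_x # line passes through the means
    return r, intercept, slope

# Hypothetical toy data, NOT values taken from S1 Table:
# number of protein sequences vs. proteome molecular weight (kDa).
protein_counts = [1500, 3000, 5000, 8000, 12000]
proteome_kda = [50000.0, 110000.0, 170000.0, 280000.0, 400000.0]
r, intercept, slope = pearson_and_fit(protein_counts, proteome_kda)
print(f"r = {r:.3f}, y = {intercept:.1f} + {slope:.2f} x")
```

As a consistency check on the reported line, substituting the reported mean of 5260.704 sequences into y = 13570 + 32.07 x gives about 182281 kDa, close to the reported mean proteome weight of 182173.1324 kDa, which is what one expects from a least-squares fit.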

Fig 1.

Box and whisker plot of (a) average pI of cyanobacterial proteins, (b) average pI of basic proteins, (c) average pI of acidic proteins, and (d) number of protein sequences of the cyanobacterial proteomes used in this study. The graphs were prepared using Microsoft Excel version 2016.

Cyanobacterial proteins ranged in size from 1.4007 to 1200.393 kDa

The molecular weight of individual cyanobacterial proteins ranged from 1.4007 to 1200.393 kDa (S1 File). Oscillatoria nigro-viridis was found to encode the smallest cyanobacterial protein, MAVISVTAATNLIP (1.40 kDa, accession WP_071884041.1), while Anabaena cylindrica was found to encode the largest cyanobacterial protein, with a molecular weight of 1200.393 kDa (accession WP_015217688.1). A tandem-95 repeat protein was found in the largest protein in the cyanobacterial proteome. The average molecular weight of cyanobacterial proteins was 34.647 kDa. At least 80 of the 903149 analyzed protein sequences (0.008%) had a molecular weight > 500 kDa, and at least 29671 (3.28%) had a predicted molecular weight of 100–500 kDa (S1 File). Approximately 16.298% of the proteins had a predicted molecular weight in the range of 50–100 kDa, 69.15% had a molecular weight of 10–50 kDa, and 11.26% (at least 101712 protein sequences) had a predicted molecular weight of ≤ 10 kDa (S1 File). High-molecular-weight cyanobacterial proteins included a PKD domain-containing protein (1139.1315 kDa, accession WP_096661492.1), a non-ribosomal peptide synthetase (914.085 kDa, accession WP_100898072.1), and an RHS family protein (833.607 kDa, accession WP_096661480.1), as well as several others. A few of the low-molecular-weight cyanobacterial proteins were MGLLCGIWLRRKN (1.559 kDa, accession PZV24433.1), MGVSSLASRLVNCNI (1.563 kDa, accession CEJ46831.1), MTGHSLTIDGGYTVQ (1.579 kDa, accession WP_094672089.1), MSITEIIDDFPELT (1.623 kDa, accession WP_094673095.1), and MRNPVGSTHITASKDG (1.670 kDa, accession WP_081980801.1).
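For readers who want to see how a predicted molecular weight follows from a sequence, here is a minimal Python sketch. It assumes the common convention of summing average residue masses and adding one water molecule; the tool the authors used is not stated in this section, so the function name `molecular_weight_kda` and the mass table are illustrative assumptions rather than their method.

```python
# Average residue masses in Da (amino acid minus one water); standard reference values.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added for the free N- and C-termini

def molecular_weight_kda(sequence):
    """Predicted average molecular weight of an unmodified peptide, in kDa."""
    total_da = sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_MASS
    return total_da / 1000.0

# Smallest protein quoted in the text (accession WP_071884041.1).
print(round(molecular_weight_kda("MAVISVTAATNLIP"), 4))
```

Under these assumptions the smallest peptide quoted above comes out at about 1.40 kDa, in line with the value reported in the text. Sequences containing non-standard letters would raise a `KeyError` in this sketch and would need separate handling.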

The range of Cyanobacterial proteins is 180.617 to 480.131 amino acids per protein

The average number of amino acids per protein sequence in cyanobacterial species ranged from 180.617 to 480.131. Longer proteins possessed a greater number of conserved functional domains and predicted biological functions, while shorter proteins possessed fewer. The greatest average length of protein sequences was found in Microcoleus sp. PCC7113 (480.131), while the lowest average length was found in Trichodesmium erythraeum (181.617). Collectively, the average number of amino acids per cyanobacterial protein was 312.018.

Cyanobacterial proteome encodes a greater number of acidic pI proteins

The pI of cyanobacterial proteins ranged from 2.13 to 13.32. The BEN50 protein (accession number PNW56779.1) from Halothece sp. (accession number WP_036263155.1) was found to have the lowest predicted pI of 2.13, while a protein sequence (accession number WP_036263155.1) from Mastigocoleus testarum had the highest pI (13.32) (S1 File). The overall average pI of cyanobacterial proteins was 6.437 (median 6.419) (Fig 1). Approximately 33.57% of protein sequences from among the 903149 analyzed protein sequences were found to have a pI in the basic range. In comparison, 66.30% of protein sequences had a pI in the acidic range. Only 0.12% of protein sequences were found to have a neutral pI (S1 File). The overall average of basic pI proteins was 8.629 (Fig 1), while the overall average of acidic pI proteins was 5.296 (Fig 1). Interestingly, a few of the high pI proteins were found to contain repetitive amino acids. The highest pI encoding protein (accession: WP_036263155.1, Mastigocoleus testarum) possessed six R-R-R repeats with 50% of Arg amino acids. Similarly, a protein (accession: WP_084739217.1 from Chroococcidiopsis thermalis) with a pI of 13.1 was found to encode six G-T-R-G repeat sequences. The 50S ribosomal protein L34 (Cyanobium sp. ARS6) with a pI 13.01 contained R-R-R, R-R-V, and R-R-K repeats. In contrast, a protein (accession number WP_052324745.1) from Hassallia byssoidea was found to encode eleven M-K-L-R-V repeat sequences. We analyzed the amino acid composition of all the protein sequences with a pI ≥ 13 and found that they do not possess Asp, Cys, Glu, His, Phe, Trp, or Tyr. A protein (accession number WP_047157505.1) in Trichodesmium erythraeum had a predicted pI of 2.181 and contained at least 15% Asp and 6.4% Glu amino acids, which are both negatively charged. The amino acid composition of protein sequences with a pI ≤ 2.30 did not contain Cys, His, or Lys. 
To better understand the group-specific pI distribution of cyanobacteria, we grouped the cyanobacterial species by order and analysed their pI. In increasing order, the average pI of the groups was Spirulinales (6.135), Oscillatoriales (6.263), Chroococcales (6.342), Synechococcales (6.344), Pleurocapsales (6.397), Nostocales (6.494), Unclassified (6.553), Chroococcidiopsidales (6.578), Gloeobacterales (6.734), and Melainabacteria (7.073) (S1 File).
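A predicted pI is the pH at which the net charge of a sequence is zero. The Python sketch below finds that pH by bisection over the Henderson-Hasselbalch charge model. The pKa values are one commonly tabulated set and are an assumption here; the section does not state which pKa scale or program the authors used, so absolute values from this sketch may differ from the reported ones. The names `net_charge` and `isoelectric_point` are hypothetical.

```python
# Assumed side-chain and terminal pKa values (one commonly tabulated set).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6

def net_charge(sequence, ph):
    """Net charge of an unmodified peptide at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))    # free N-terminus (+)
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))   # free C-terminus (-)
    for aa in sequence.upper():
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge

def isoelectric_point(sequence, tol=1e-4):
    """Bisection on pH in [0, 14]; net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid   # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0

# Illustrative inputs only: an Arg-rich and an Asp-rich toy peptide.
print(isoelectric_point("RRRRRR"), isoelectric_point("DDDDDD"))
```

The toy peptides mirror the pattern described in the text: Arg-rich sequences land at the basic extreme and Asp-rich sequences at the acidic extreme of the pI range.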

A correlation analysis was conducted between the GC% content and the pI of the cyanobacterial proteome. GC% and pI were found to be only slightly correlated (r = 0.2171) (Fig 2). A comparative evolutionary study of the pI of cyanobacterial proteomes revealed that the Roseofilum reptotaenium proteome, which exhibited the lowest pI (5.893), evolved approximately 2180 million years ago (S1 Table). Similarly, the proteome of Candidatus gaganbacteria, which exhibited the highest pI (7.388), evolved 1426–2635 million years ago. All the cyanobacterial proteomes with a pI of more than 7.10 evolved 1426–2635 million years ago (S1 Table), whereas a few of the cyanobacterial proteomes with a pI of 5.9–5.97 were found to have evolved approximately 452–1157 million years ago (S1 Table).

Fig 2. Correlation analysis of GC% with pI of cyanobacterial proteome.

GC% shows a positive correlation (r = 0.2172) with pI. However, the correlation was weak. The plot was generated using the mathportal server https://www.mathportal.org/calculators/statistics-calculator/correlation-and-regression-calculator.php.

The Molecular weight and pI of Cyanobacterial proteins exhibit a bimodal distribution

The molecular weight and isoelectric point of cyanobacterial proteins vary greatly among the different proteomes. Notably, a bimodal distribution is evident in the virtual 2D map of cyanobacterial proteomes (Fig 3). The overall average pI of cyanobacterial proteomes was 6.437, while the overall average molecular weight was 34.7417 kDa. The variance of the pI was 0.068, and the variance of the molecular weight was 7.992. Variances lower than the mean reveal the binomial distribution of pI and molecular weight. Correlation analysis indicated that the molecular weight and isoelectric point of cyanobacterial proteins are negatively correlated (r = -0.197) (S2 Fig). The amino acid sequence length of cyanobacterial proteins and pI were also negatively correlated (r = -0.240) (S3 Fig).

Fig 3.

(a) Virtual 2D map of the cyanobacterial proteome. The X-axis represents the isoelectric point and the Y-axis represents the molecular weight (kDa). (b) Frequency of the isoelectric point of cyanobacterial proteins. (c) Frequency of the molecular weight of cyanobacterial proteins. The scatter plot was generated using the scatterplot online software https://scatterplot.online/.

In the normal distribution analysis of pI, the probabilities P(X > 13.32), P(X < 13.32), P(X > 2.13), and P(X < 2.13) were 0, 1, 0.992, and 0.0078, respectively. The probability P(X > 7) was 0.424 and P(X < 7) was 0.575. This indicates that the probability of finding a protein with a pI > 13.32 in cyanobacteria is zero, while the probability of finding a protein with a pI < 2.13 is 0.0078. Similarly, in the normal distribution analysis of molecular weight, the probabilities P(X > 1200.393), P(X < 1200.393), P(X > 1.400), and P(X < 1.400) were 0, 1, 0.872, and 0.127, respectively. This indicates that the probability of finding a cyanobacterial protein with a molecular weight > 1200.393 kDa is zero, while the probability of finding a protein with a molecular weight > 1.400 kDa is 0.872 and the probability of finding one with a molecular weight < 1.400 kDa is 0.127. At least 177 species of cyanobacteria were found to encode neutral pI proteins. A gamma distribution fitted to the isoelectric points of the cyanobacterial proteome showed that the empirical probability closely matches the theoretical probability at the 95% confidence interval (S4 Fig).
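Tail probabilities of this kind can be computed from a fitted normal distribution, as in the minimal Python sketch below. The mean of 6.437 is the average pI reported in the text; the standard deviation is a hypothetical placeholder, because the value used for the authors' analysis is not given in this section, so the printed numbers are illustrative and not a reproduction of the reported probabilities.

```python
from statistics import NormalDist

def tail_probabilities(mean, sd, x):
    """Return (P(X < x), P(X > x)) for a normal distribution N(mean, sd^2)."""
    below = NormalDist(mu=mean, sigma=sd).cdf(x)
    return below, 1.0 - below

MEAN_PI = 6.437       # average pI reported in the text
ASSUMED_SD = 1.8      # hypothetical value, NOT taken from the paper

for threshold in (2.13, 7.0, 13.32):
    below, above = tail_probabilities(MEAN_PI, ASSUMED_SD, threshold)
    print(f"P(X < {threshold}) = {below:.4f}, P(X > {threshold}) = {above:.4f}")
```

Because a continuous normal model never assigns exactly zero probability to a finite interval, values reported as 0 or 1 are best read as rounded results.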

Highest and lowest represented amino acids in the Cyanobacterial proteome

Proteome-wide analysis of the cyanobacterial proteome revealed that Leu (11.104%) was the most abundant and Cys (1.014%) the least abundant amino acid (Table 1). Other highly abundant amino acids included Ala (8.324%), Gly (6.837%), Ile (6.619%), and Val (6.536%). Low-abundance amino acids, in addition to Cys, included Trp (1.436%), His (1.87%), Met (1.88%), and Tyr (2.983%) (Table 1, Fig 4). Approximately 51.462% of the cyanobacterial proteome consisted of nonpolar amino acids and 48.525% of polar amino acids. The highest and lowest abundant amino acids in different cyanobacterial species were also calculated. Prochlorococcus marinus contained the highest percentage of Asn (6.467%), Phe (4.966%), and Ser (7.722%), while Candidatus gastranaerophilales contained the lowest percentage of Arg (3.349%), Gly (5.926%), Leu (8.777%), and Pro (3.274%) (Table 1). Aphanocapsa feldmannii possessed the highest percentage of Cys (1.382%) and Arg (8.220%), while Aphanothece minutissima contained the highest percentage of Gly (9.232%) and Pro (6.490%). Gloeomargarita lithophora contained the highest percentage of Gln (6.325%) and Trp (1.845%), and Candidatus gastranaerophilales encoded the highest percentage of Tyr (3.963%) and Lys (9.122%) (Table 1). The lowest percentage of His (1.501%) and Ala (5.418%) was found in Prochlorococcus marinus. Tolypothrix bouteillei had the lowest percentage of Trp (0.005%) and Val (1.445%), while Vulcanococcus limneticus had the lowest percentage of Ile (3.725%) and Phe (2.923%) (Table 1). A comparison of the species with the highest and lowest abundances revealed that Vulcanococcus limneticus, Prochlorococcus marinus, Euhalothece sp. KZN001, Gloeomargarita lithophora, Candidatus gastranaerophilales, and Candidatus synechococcus each hold both a highest and a lowest abundance for some amino acid in their proteome (Table 1). A principal component analysis (PCA) revealed that Arg, Pro, Asp, Gln, Thr, Glu, Ser, Val, and Gly clustered together. In contrast, Trp, His, Met, and Cys, which are comparatively low-abundance amino acids in the cyanobacterial proteome, clustered in a separate group (Fig 5). The high-abundance amino acids Leu, Ala, and Ile were located separately from, and independent of, the other two clusters in the PCA plot. A correlation plot of amino acid composition revealed strong correlations between some amino acids (Fig 6A). Phe-Tyr (0.996), Leu-Thr (0.992), Gly-Pro (0.993), Leu-Gly (0.994), and Asn-Lys (0.992) were among the strongly correlated amino acid pairs. Amino acid pairs with weaker correlations included Met-Trp (0.849), Ala-Lys (0.836), Ala-Asn (0.862), Arg-Ile (0.892), Arg-Lys (0.839), and Lys-Trp (0.876) (Fig 6). The network plot also shows strong correlations for Trp-Val (0.99) and Arg-Ala (0.989) (Fig 6B).
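The percentage composition discussed above amounts to counting residues across all sequences of a proteome and normalising by the total. A minimal Python sketch follows; the function name `composition_percent` is hypothetical, the input uses two short peptides quoted earlier in the text purely as toy data, and non-standard letters are simply ignored, which is an assumption rather than the authors' stated handling.

```python
from collections import Counter

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"

def composition_percent(sequences):
    """Percentage of each standard amino acid across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        # Count only the 20 standard residues; other letters are skipped.
        counts.update(ch for ch in seq.upper() if ch in STANDARD_AA)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in STANDARD_AA}
    return {aa: 100.0 * counts[aa] / total for aa in STANDARD_AA}

# Toy input: two short peptides quoted in the text, not a full proteome.
peptides = ["MGLLCGIWLRRKN", "MGVSSLASRLVNCNI"]
composition = composition_percent(peptides)
most_abundant = max(composition, key=composition.get)
least_abundant = min(composition, key=composition.get)
print(most_abundant, round(composition[most_abundant], 2), least_abundant)
```

Applied per species, the same function would yield the per-species maxima and minima of the kind listed in Table 1.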

Table 1. Average amino acid composition of cyanobacterial proteomes.

Cyanobacterial species with highest and lowest abundance of amino acids are depicted here.

Amino acid | Average (%) | Highest (%) | Species with highest percentage | Lowest (%) | Species with lowest percentage
Ala | 8.324 | 12.471 | Vulcanococcus limneticus | 5.418 | Prochlorococcus marinus
Arg | 5.328 | 8.220 | Aphanocapsa feldmannii | 3.349 | Candidatus gastranaerophilales
Asn | 4.219 | 6.467 | Prochlorococcus marinus | 1.995 | Cyanobium gracile
Asp | 5.005 | 6.209 | Leptolyngbya valderiana | 4.286 | Synechococcus sp. 65AY640
Cys | 1.014 | 1.382 | Aphanocapsa feldmannii | 0.890 | Euhalothece sp. KZN001
Gln | 5.265 | 6.325 | Gloeomargarita lithophora | 2.910 | Candidate division WOR-1
Glu | 6.263 | 7.434 | Euhalothece sp. KZN001 | 5.250 | Candidatus synechococcus
Gly | 6.837 | 9.232 | Aphanothece minutissima | 5.926 | Candidatus gastranaerophilales
His | 1.870 | 2.635 | Candidatus synechococcus | 1.501 | Prochlorococcus marinus
Ile | 6.619 | 9.232 | Candidatus margulisbacteria | 3.725 | Vulcanococcus limneticus
Leu | 11.104 | 13.225 | Vulcanococcus limneticus | 8.777 | Candidatus gastranaerophilales
Lys | 4.665 | 9.122 | Candidatus gastranaerophilales | 1.800 | Synechococcus sp. BO8801
Met | 1.880 | 4.494 | Tolypothrix bouteillei | 1.508 | Gloeobacter kilaueensis
Phe | 3.895 | 4.966 | Prochlorococcus marinus | 2.923 | Vulcanococcus limneticus
Pro | 4.831 | 6.490 | Aphanothece minutissima | 3.274 | Candidatus gastranaerophilales
Ser | 6.330 | 7.722 | Prochlorococcus marinus | 4.893 | Gloeomargarita lithophora
Thr | 5.583 | 6.899 | Tolypothrix bouteillei | 4.219 | Synechococcales bacterium
Trp | 1.436 | 1.845 | Gloeomargarita lithophora | 0.005 | Tolypothrix bouteillei
Tyr | 2.983 | 3.963 | Candidatus gastranaerophilales | 1.759 | Synechococcus sp. BO8801
Val | 6.536 | 7.543 | Gloeobacter violaceus | 1.445 | Tolypothrix bouteillei

Fig 4. Average amino acid composition of cyanobacterial proteomes (%).

Leu is the most abundant and Cys the least abundant amino acid in the cyanobacterial proteome.

Fig 5. Principal component analysis (PCA) of amino acid composition of cyanobacterial proteome.

The plot was generated using Unscrambler software version 7.

Fig 6.

(a) Correlation plot of amino acid composition and (b) network plot of amino acid composition of cyanobacterial proteins. Thick blue lines indicate a strong positive correlation. The plots were generated using the mathportal server https://www.mathportal.org/calculators/statistics-calculator/correlation-and-regression-calculator.php.

Notably, the proteome-wide analysis of amino acid composition also revealed the presence of selenocysteine (Sec) in the cyanobacterial proteome. Candidatus melainabacteria (accession PWT95605.1) was found to encode one Sec amino acid in its proteome. No other species, however, contained proteomes that encoded Sec. Several proteins were annotated with the term "seleno", such as tRNA 2-selenouridine synthase, selenocysteine lyase, selenophosphate synthase, and selenocysteine-specific translation elongation factor, but none of these proteins were found to encode any Sec amino acids. A correlation study of cyanobacterial GC% content and amino acid composition was also conducted. The correlation between the GC% content of the cyanobacterial genome and amino acid usage was weak for every amino acid (Table 2). The highest correlation was found for Met (0.0966), whereas the lowest was found for Trp (-0.008) (Table 2). Met (anticodon CAT) and Trp (anticodon CCA) are each decoded by a single isoacceptor, and this might be the reason why Met and Trp have the highest and lowest correlation coefficients with GC%. The other amino acids are encoded by more than one isoacceptor [34–37].
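Detecting Sec or pyrrolysine in annotated sequences can be done by scanning for their one-letter codes, U (selenocysteine) and O (pyrrolysine). The Python sketch below shows one way to do this; the function name `find_rare_residues` and the accession labels in the example are hypothetical, and the sketch assumes sequences are already loaded into a dictionary rather than describing the authors' pipeline.

```python
def find_rare_residues(proteins):
    """Map accession -> {"U": count, "O": count} for sequences containing
    selenocysteine (U) or pyrrolysine (O).

    `proteins` is a dict of accession -> amino acid sequence.
    """
    hits = {}
    for accession, sequence in proteins.items():
        seq = sequence.upper()
        counts = {"U": seq.count("U"), "O": seq.count("O")}
        if counts["U"] or counts["O"]:
            hits[accession] = counts
    return hits

# Hypothetical toy records; accessions and sequences are invented for illustration.
example = {
    "toy_protein_1": "MKTAYIAKQR",
    "toy_protein_2": "MSGUCLAVTR",   # contains one U (Sec)
}
print(find_rare_residues(example))
```

A scan like this only reports residues that are explicitly annotated in the sequence, which is consistent with the observation above that proteins annotated with "seleno" need not contain Sec themselves.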

Table 2. Correlation analysis of GC% content and amino acid usage bias in cyanobacterial proteome.

Amino acid | Regression equation (Y) | Correlation coefficient (r)
Ala | 114917 + 442 x | 0.0518
Cys | 14174 + 49.43 x | 0.0484
Asp | 74508 + 160.9 x | 0.0308
Glu | 94837 + 173.2 x | 0.0259
Phe | 58337 + 126.4 x | 0.0296
Gly | 96186 + 318.4 x | 0.0466
His | 26600 + 83.05 x | 0.044
Ile | 104749 + 91.86 x | 0.0123
Lys | 72252 + 103.6 x | 0.0181
Leu | 164555 + 347.1 x | 0.0309
Met | 21818 + 195.8 x | 0.0966
Asn | 66215 + 86.4 x | 0.0166
Pro | 67356 + 245.3 x | 0.0507
Gln | 80610 + 140.5 x | 0.0246
Arg | 72412 + 304.9 x | 0.0563
Ser | 97274 + 153.9 x | 0.0219
Thr | 83054 + 202 x | 0.0329
Val | 102743 + 82.97 x | 0.0121
Trp | 23895 − 13.01 x | -0.008
Tyr | 44761 + 98.37 x | 0.0297

Discussion

The completion of the sequencing of several cyanobacterial genomes has provided an excellent opportunity to analyse and better understand their genomic and proteomic details [38–40]. Although genome sequencing has provided important information on the genes and genome composition of the cyanobacteria, especially with regard to potential biotechnological applications [38], very little information is available about their proteomes. Although the cyanobacterial kingdom has not received much attention with regard to proteomic study, it has still had a profound impact on biotechnology through the production of single-cell protein [41]. It has therefore become an excellent platform for understanding proteomic details and for the global identification of expressed proteins in cyanobacterial cells, as well as for gaining insight into the dynamic response of cyanobacterial cells to environmental challenges and into the regulation, compartmentalization, structure, and biological function of expressed proteins [42,43]. However, assessing and characterizing the complete proteome of the entire cyanobacterial kingdom is a challenging goal, and researchers have worked diligently to achieve it. The core proteome of the cyanobacterial kingdom consists of a few major protein families that vary dynamically among and between species [38]. In this regard, we conducted a proteome-wide analysis in the present study by downloading and analysing the annotated protein sequences of all of the available cyanobacterial proteins, covering 229 cyanobacterial species (S2 File). In its entirety, our study collected and analyzed 903149 cyanobacterial protein sequences and constructed a virtual 2D map of the cyanobacterial proteome based on the molecular weight and isoelectric point of each of the collected protein sequences (S1 File).
Among other results [44–47], the current analysis revealed the bimodal distribution of the molecular weight and isoelectric point of cyanobacterial proteins. The presence of a higher percentage of polar amino acids at the surface of a protein and non-polar amino acids at its core results in an increased isoelectric point, conferring greater thermostability [48,49]. A lower isoelectric point and minimum negative energy play an important role in the stabilization of bonds in an acidic environment [50,51]. We also documented that cyanobacterial proteins contain an average of 312.018 amino acids per protein sequence. The average molecular weight of proteins from the order Spirulinales (37.151 kDa) was found to be the highest, whereas that of proteins from the unclassified cyanobacterial group (30.519 kDa) was found to be the lowest. Chroococcidiopsidales (34.311 kDa), Chroococcales (34.487 kDa), and Gloeobacterales (34.72 kDa), however, were found to encode proteins of around 34 kDa on average. Cyanobacteria from the orders Chroococcidiopsidales and Chroococcales are coccoid in shape, and this similar cellular structure might be the reason why they encode proteomes of approximately similar molecular weight.

A previous study reported that higher plant proteins contain an average of 423.34 amino acids per protein [44], which is considerably higher than the average found in cyanobacterial proteins. It was also previously reported that plants encode 40469.83 protein sequences per species, while fungi encode 10345.83 protein sequences per species [44,46]. Cyanobacteria, however, were found to encode only 5260.704 protein sequences per species. Although the virtual 2D maps of the cyanobacterial and fungal [46] proteomes exhibit a bimodal distribution for pI and molecular weight, the virtual 2D map of the plant proteome exhibits a trimodal distribution [44]. The virtual 2D map of virus proteomes showed host-specific modalities, molecular weights, and isoelectric points [47]. As in cyanobacteria, the pI of most proteins encoded in plant and fungal proteomes lies in the acidic range [44,46]. Although cyanobacteria are found in diverse habitats [52,53], the pI of their proteins lies primarily in the acidic range. A few species of Cyanobacteria, however, were found to have an average proteome pI in the basic range. These species include Candidatus gaganbacteria (7.388), Aphanothece minutissima (7.028), Candidate division WOR-1 (7.287), Candidatus synechococcus spongiarum (7.174), Candidatus termititenax (7.107), Cyanobacterium PCC 7702 (7.033), Cyanobium gracile (7.039), Prochlorococcus marinus (7.129), Synechococcus sp. BO8801 (7.028), and Vulcanococcus limneticus (7.033) (S1 Table). Vulcanococcus limneticus was isolated from a volcanic lake in central Italy. It was found to contain the highest percentage of Ala (12.471%) and Leu (13.225%) among the studied cyanobacterial species, and the lowest percentage of Phe (2.923%) (Table 1). V. limneticus might need a high proportion of Ala and Leu to withstand higher temperatures, which could explain why it contained the highest percentage of these amino acids. Similarly, Phe might be the amino acid least required to withstand higher temperatures. Prochlorococcus marinus is a picoplankton that shows unusual pigmentation due to the presence of chlorophyll a2 (a derivative of chlorophyll a) and b2. It contained the highest percentage of Phe (4.966%) and Ser (7.722%). P. marinus does not contain Chl a as a major photosynthetic pigment but contains α-carotene [54]. P. marinus also contained the lowest percentage of Ala (5.418%) and His (1.501%). The composition of the highest and lowest abundance amino acids might be related to the lack of Chl a and the use of Chl a2 for photosynthesis. A few of the cyanobacterial species exhibiting a proteome with a low average pI were Roseofilum reptotaenium (5.893), Aphanocapsa montana (5.974), Euhalothece sp. KZN001 (5.902), Halomicronema excentricum (5.900), Halothece sp. PCC7418 (5.986), Oscillatoria acuminata (5.975), Phormidium lacuna (5.926), and Phormidium sp. SL48-SHIP (5.905) (Table 1). These data indicate that cyanobacterial proteomes exhibit a wide range of average pI. When we compared the amino acid composition with that of plant proteomes, we found that Cys (1.014%) was the least abundant amino acid in the cyanobacterial kingdom, whereas Trp (1.28%) was the least abundant amino acid in the plant kingdom [44]. Leu, however, was the most abundant amino acid in both the cyanobacterial and plant kingdoms.

Our analysis also revealed several highly repetitive proteins in some cyanobacterial proteomes, all of which had a pI < 3. The accession numbers of some of these repetitive proteins with a pI < 3 were WP_079680752.1, WP_045868631.1, WP_041565552.1, WP_041234656.1, WP_040484803.1, WP_039747676.1, WP_036265112.1, WP_035758608.1, WP_027842305.1, WP_017716411.1, WP_015226612.1, WP_015178523.1, WP_015156103.1, WP_006634710.1, and WP_006194633.1. Evolutionary analysis of cyanobacterial pI revealed that the species with the lowest average pI, Roseofilum reptotaenium, evolved 2180 million years ago. In contrast, the cyanobacterial proteomes with pI > 7 were found to have evolved 1426–2635 million years ago, suggesting the dominance of basic pI proteins in the early stage of life. Similarly, most of the cyanobacterial species with a pI of 5.9 to 5.97 evolved 452–1157 million years ago. This suggests that the dominance of the acidic pI proteome in the cyanobacterial lineage is a recent event (S1 Table). It also indicates that the evolution of the cyanobacterial proteome tends towards an acidic pI, which might be associated with ocean acidification events [55–57]. Group-specific pI analysis revealed that Spirulinales (6.135) encoded the lowest pI proteins (acidic pI range), whereas Melainabacteria (7.073) encoded the highest pI proteins (basic pI range) (S1 File). The cyanobacterial groups that encoded pI in the range of 6.2 to 6.4 were Oscillatoriales (6.263), Chroococcales (6.342), Synechococcales (6.344), and Pleurocapsales (6.397). It is important to note that Spirulinales (37.151 kDa) and Oscillatoriales (36.701 kDa) encoded high molecular weight proteins while also containing low pI proteins. This suggests that, in Spirulinales and Oscillatoriales, an increase in protein molecular weight is associated with a decrease in the isoelectric point of the protein (S1 File). 
Melainabacteria are not able to perform photosynthesis [58] and hence obtain energy by fermentation [59]. They lack an electron transport chain and use Fe-hydrogenase to produce hydrogen gas [58]. However, Melainabacteria have the potential to fix nitrogen [58].

A comparison of the molecular weight of the cyanobacterial and fungal proteomes revealed that the average molecular weight of fungal proteins is 50.90 kDa, while the average molecular weight of cyanobacterial proteins is 34.647 kDa. Cyanobacteria are prokaryotic organisms with comparatively little non-coding genomic DNA, yet they still encode smaller-sized proteins. The biological function of a protein is partially determined by its three-dimensional tertiary structure, which is directly affected by the primary structure of the polypeptide chain of amino acids [60,61]. Longer peptides have greater possibilities for accommodating multiple structural and functional domains. A comparative study of protein size from a limited number of taxa revealed considerable differences in protein size [61]. More than 90% of the proteins of the E. coli K-12 strain lie in the low isoelectric point range and have a molecular weight of 4–100 kDa, which is in accordance with the cyanobacterial proteome [62]. Some of the 2-DE isoform spots of the same gene had different pI values, suggesting a role for post-translational modification of the protein [62]. However, most of the predicted and observed molecular weights and isoelectric points of the E. coli K-12 strain showed reasonable correlation [62], which supports the relevance of the predicted values reported in our study.

Eukaryotes were reported to possess proteins that, on average, are longer than bacterial proteins, which in turn have a longer average length than archaeal proteins [61,63]. Larger proteins in prokaryotic organisms have been reported to reflect the evolution toward larger protein size [61,63,64]. However, the average protein size in Chlamydomonas reinhardtii, Volvox carteri, and other lower eukaryotic organisms is larger than the average protein size of higher plant species [44,65,66]. The evolution of eukaryotic proteins has been reported to have occurred via the fusion of single-function proteins into multi-domain and multi-functional proteins [63]. The fusion of domains has increased the average size of proteins. It can potentially lead to a reduction in the number of individual proteins in the proteome of a species and in its corresponding number of individual genes. Although this is true for prokaryotic and eukaryotic lineages, it is not completely true for plant lineages, as unicellular photosynthetic organisms contain larger average-size proteins. Eudicot plants, however, are considered evolutionarily older than monocot plants, and the average protein size of monocot species is slightly larger than that of dicot species [44]. The average protein size has been reported to be 431.07 amino acids in monocot plants and 424.30 amino acids in dicot plants [44].

According to the starter-set hypothesis, proteins are assumed to have originated from a small set of starter sequences called functional domains, having a length of 4, 15, or 50 amino acids, which later expanded through gene duplication and other genetic modifications [67–70]. This hypothesis also states that gene or exon duplication or fusion has existed since the beginning of the evolution of larger protein sequences [64]. The random-origin hypothesis, however, states that proteins emerged from a large number of random heteropeptides [64,71–75]. The random-origin hypothesis also states that the existence of larger proteins has occurred by chance [74,75]. The presence of the largest protein (Tandem-95 repeat protein, accession WP_015217688.1) in the cyanobacterium Anabaena cylindrica, with a molecular weight of 1200.393 kDa, seems to fit the random-origin hypothesis, as this protein is not found in all cyanobacterial species.

A gamma-type distribution is frequently used to explain protein length [64,74]. It assumes that protein sequences may have been exponentially distributed in random lengths, that protein folding and stability are length-dependent, and that the potential for an increased number of biochemical activities increases with protein length. In the gamma distribution analysis of the amino acid sequence length of cyanobacterial proteins, the empirical values closely matched the theoretical values (95% confidence interval), suggesting a close fit of the Gamma distribution to amino acid sequence length/protein size in cyanobacteria (S5 Fig). Correlation analysis of the number of protein sequences vs. the size of the proteome revealed a significant positive correlation (r = 0.918). Strong selective forces have been reported to play an essential role in the increase in the number of proteins of a given size above the average frequency [64,76]. The stability of a protein is highly dependent upon its length, and thus longer proteins possess a selective advantage and have the potential to drive the evolution of a proteome [77,78]. The genetic composition can theoretically be used to assess average protein size and distribution. A stop codon can occur stochastically after a start codon, so larger protein-coding sequences are likely to be less frequent than smaller protein-coding sequences. Consistent with this, only 3.28% of proteins are encoded with a molecular weight > 100 kDa. However, the role of pI in the frequency of the distribution of protein sizes is largely unknown. It seems improbable that the genome encodes proteins of varied sizes without consideration of charge and sub-cellular localization. Rather, it is more plausible that proteins also undergo strong selective pressure based on their charge (pI) and size.
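As an illustration of how a Gamma distribution can be matched to protein lengths, the following minimal sketch estimates the shape and scale parameters by the method of moments. This is a simplified stand-in and not necessarily the estimation procedure used by JASP; the function name `gamma_moments_fit` and the toy lengths are hypothetical.

```python
from statistics import fmean, pvariance


def gamma_moments_fit(lengths):
    """Estimate Gamma(shape, scale) from sequence lengths by the method of moments.

    For a Gamma distribution, mean = shape * scale and variance = shape * scale**2,
    so shape = mean**2 / variance and scale = variance / mean.
    """
    mean = fmean(lengths)
    var = pvariance(lengths, mu=mean)
    if var == 0:
        raise ValueError("lengths must not all be identical")
    return mean * mean / var, var / mean


# Toy protein lengths in amino acids (hypothetical values, not from the study).
toy_lengths = [100, 200, 300, 400, 500]
shape, scale = gamma_moments_fit(toy_lengths)
print(f"shape = {shape:.3f}, scale = {scale:.3f}, implied mean = {shape * scale:.1f}")
```

The fitted parameters reproduce the sample mean by construction; judging the quality of the fit still requires comparing empirical and theoretical quantiles, as in S5 Fig.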

Conclusion and future perspectives

Analysis of the proteomes of 229 Cyanobacterial species revealed a bimodal distribution of the molecular weight and isoelectric point of the Cyanobacterial proteome. The map deduced from the molecular weight and isoelectric point of the Cyanobacterial proteins constitutes a virtual 2D map of the Cyanobacterial proteome. These theoretical proteome profiles of the cyanobacteria can be experimentally validated to understand the post-translational modification of the proteins and its role in changing the isoelectric point of the protein. Post-translational modification can change the molecular weight and isoelectric point of the protein. However, reversible post-translational modification may produce only a minimal difference in the molecular weight and isoelectric point. The study can be applied to understand the biochemical characteristics of any particular protein and its subsequent localization and function. The amino acid composition study can be very useful in understanding the codon usage bias in the Cyanobacterial genome in the future. The codon usage bias will enable us to understand molecular evolution through codon optimization. The localization of proteins within cyanobacterial cells is also poorly understood. It is imperative to produce a sub-cellular proteome map to understand their biochemical and physiological processes.

Materials and methods

Sequence downloads and calculation of molecular weight and pI

From the National Center for Biotechnology Information (NCBI), we downloaded all the protein sequences from 229 cyanobacterial species (S1 Table, S1 and S2 Files) and analyzed the proteomic details of these organisms. All the protein sequences belonged to the annotated ORF (open reading frame)/CDS (coding DNA sequences) of the respective gene. The Cyanobacterial species were from 10 different orders, namely Chroococcales, Chroococcidiopsidales, Gloeobacterales, Melainabacteria, Nostocales, Oscillatoriales, Pleurocapsales, Spirulinales, Synechococcales, and Unclassified (S1 Table). All species available until January 2021 were considered for the analysis. The repetitive protein sequences of the individual species were identified using the proteome file of the respective species. The molecular weight and isoelectric point of the protein sequences were calculated using a Linux-based isoelectric point (pI) calculator [79]. The command used for the calculation of molecular weight and isoelectric point was ipc <fasta_file> <pKa set> <output_file> <plot_file>. The pKa sets of the ipc can be found in the software itself. The calculated pI and molecular weight of each cyanobacterial protein were then processed using Microsoft Excel 2016. The molecular weight and isoelectric point of the proteins were used to construct the virtual 2D map of the proteome of Cyanobacteria using scatterplot online (https://scatterplot.online/).
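The calculation performed for each protein can be sketched as follows. This is a minimal illustration and not the IPC program itself: the molecular weight is summed from average residue masses, and the pI is located by bisection on the net charge given by the Henderson-Hasselbalch equation. The pKa values below are an illustrative EMBOSS-style set and are an assumption; the pKa set chosen by the authors is not stated. The function names and the example sequence are hypothetical, and only the 20 standard amino acids are handled.

```python
from collections import Counter

# Average residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water per chain accounts for the free termini

# Illustrative pKa set (assumption; not necessarily the set used in the study).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq):
    """Average molecular weight in Da; raises KeyError on non-standard residues."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq, ph):
    """Net charge of the chain at a given pH (Henderson-Hasselbalch)."""
    counts = Counter(seq)
    counts["Nterm"] = 1
    counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq, tol=1e-4):
    """Bisection on pH in [0, 14]; net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# One point of a virtual 2D map: (pI, molecular weight in kDa).
example = "MKTAYIAKQR"  # hypothetical sequence
print(round(isoelectric_point(example), 2), round(molecular_weight(example) / 1000, 3))
```

Applying the two functions to every sequence in a proteome yields the (pI, molecular weight) pairs that are plotted as the virtual 2D map. Different pKa sets shift the predicted pI, so values from this sketch should not be expected to match IPC output exactly.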

We used a Linux-based program to calculate the amino acid composition of the cyanobacterial proteomes as well as the amino acid count in each protein sequence. We then merged all the proteome files containing the amino acid information to determine the overall amino acid composition of the Cyanobacterial proteome.
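The composition calculation can be sketched in a few lines. The program used in the study is not named, so the following is only an assumed equivalent: it parses FASTA records, counts residues, and reports each amino acid as a percentage of all residues. The helper names and the two toy records are hypothetical.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def composition_percent(sequences):
    """Percentage of each residue across all given sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}


# Two toy records (hypothetical); a real run would iterate over a proteome file.
toy_fasta = """>protein_1
MKLA
>protein_2
LLAG
""".splitlines()

records = list(read_fasta(toy_fasta))
composition = composition_percent(seq for _, seq in records)
print(composition)
```

Passing the sequences of a single species gives a per-proteome composition, while passing the sequences of all species together gives the merged, kingdom-level composition.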

Statistical analysis

Several methods were used to analyse the derived molecular weight, isoelectric point, amino acid composition, and amino acid sequence length of the cyanobacterial proteins. In correlation analyses, the number of protein sequences in a proteome was compared to the average molecular weight of proteins, the weight of proteomes was compared to protein identity, and the amino acid sequence length was compared to protein identity. Based on amino acid composition, correlation and network plots were constructed, and the Gamma distribution of amino acid sequence length was calculated using JASP 0.14.1.0 software. JASP is a standalone software package for statistical analysis. The user submits the sample data in CSV file format and, once the data are uploaded, chooses the statistical option to run the analysis. A principal component analysis (PCA) of amino acid composition was performed with Unscrambler 11, using leverage correction. The statistical analysis was conducted at a 95% confidence level (p < 0.05). A normal distribution can describe the probability of occurrence of proteins with extreme values, such as a pI of 2.13 or 13.32 and a molecular weight of 1.4 kDa or 1200.393 kDa. Therefore, we calculated the normal probability distribution of molecular weight and isoelectric point using the Math Portal calculator (https://www.mathportal.org/calculators.php).
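Two of the calculations described above, the Pearson correlation and the normal tail probability of an extreme value, can be reproduced with the Python standard library. The sketch below is an assumed equivalent of the JASP and Math Portal steps, not a record of them. The function names are hypothetical, and the standard deviation used in the example is a placeholder because the study's value is not given in this section; the mean pI of 6.437 and the maximum pI of 13.32 are taken from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def normal_upper_tail(x, mean, sd):
    """P(X >= x) for X ~ Normal(mean, sd), via the complementary error function."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


# Toy data (hypothetical): protein counts vs. proteome weights for three species.
print(pearson_r([3000, 5000, 8000], [100000, 180000, 270000]))

# Probability of a pI at least as high as 13.32 under a normal model.
# The standard deviation of 1.5 is a placeholder assumption.
print(normal_upper_tail(13.32, mean=6.437, sd=1.5))
```

Because the pI distribution reported in the paper is bimodal, a single normal model is only a rough approximation, and the tail probability should be read as indicative rather than exact.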

Evolutionary time scale

The evolutionary time scale of the cyanobacterial species was estimated using the TimeTree: The Timescale of Life server (http://www.timetree.org/) [80]. The server requires the taxon name to be specified in order to retrieve the evolutionary timeline and divergence time of the species.

Supporting information

S1 Fig. Correlation regression analysis of number of protein sequences with molecular weight of proteomes (kDa).

(PPTX)

S2 Fig. Molecular weight of proteomes and pI of cyanobacterial protein.

(PPTX)

S3 Fig. Correlation plot of amino acid sequence length vs isoelectric point.

(PPTX)

S4 Fig. Gamma distribution of isoelectric point of cyanobacterial proteome.

(PPTX)

S5 Fig. Gamma distribution of amino acid sequence length in cyanobacterial proteome.

(PPTX)

S1 Table. Table depicting the list of the cyanobacterial species with the number of protein sequences used in this study.

(DOCX)

S1 File. Accession number, protein name, molecular weight and isoelectric point of the cyanobacterial proteins.

(XLSX)

S2 File. Details about the species name, GenBank accession number, GC% content and pI of cyanobacterial species.

(XLSX)

Data Availability

All relevant data are within the paper and its Supporting Information files.

Funding Statement

This study was supported by the Research Council, Oman in the form of a research grant (BFP/RGP/EBR/21/005) awarded to TKM. No additional external funding was received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Sánchez-Baracaldo P, Cardona T. On the origin of oxygenic photosynthesis and Cyanobacteria. New Phytol. 2020;225: 1440–1446. doi: 10.1111/nph.16249
  • 2. Lau N-S, Matsui M, Abdullah AA-A. Cyanobacteria: Photoautotrophic Microbial Factories for the Sustainable Synthesis of Industrial Products. Zhao ZK, editor. Biomed Res Int. 2015;2015: 754934. doi: 10.1155/2015/754934
  • 3. Schirrmeister BE, Antonelli A, Bagheri HC. The origin of multicellularity in cyanobacteria. BMC Evol Biol. 2011;11: 45. doi: 10.1186/1471-2148-11-45
  • 4. Rasmussen B, Fletcher IR, Brocks JJ, Kilburn MR. Reassessing the first appearance of eukaryotes and cyanobacteria. Nature. 2008;455: 1101–1104. doi: 10.1038/nature07381
  • 5. Sánchez-Baracaldo P, Hayes PK, Blank CE. Morphological and habitat evolution in the Cyanobacteria using a compartmentalization approach. Geobiology. 2005;3: 145–165. doi: 10.1111/j.1472-4669.2005.00050.x
  • 6. Quesada A, Vincent WF. Cyanobacteria in the Cryosphere: Snow, Ice and Extreme Cold. In: Whitton BA, editor. Ecology of Cyanobacteria II: Their Diversity in Space and Time. Dordrecht: Springer Netherlands; 2012. pp. 387–399. doi: 10.1007/978-94-007-3855-3_14
  • 7. Isichei AO. The role of algae and cyanobacteria in arid lands. A review. Arid Soil Res Rehabil. 1990;4: 1–17. doi: 10.1080/15324989009381227
  • 8. Cano-Díaz C, Mateo P, Muñoz-Martín MÁ, Maestre FT. Diversity of biocrust-forming cyanobacteria in a semiarid gypsiferous site from Central Spain. J Arid Environ. 2018;151: 83–89. doi: 10.1016/j.jaridenv.2017.11.008
  • 9. Joset F, Jeanjean R, Hagemann M. Dynamics of the response of cyanobacteria to salt stress: Deciphering the molecular events. Physiol Plant. 1996;96: 738–744. doi: 10.1111/j.1399-3054.1996.tb00251.x
  • 10. Laloknam S, Tanaka K, Buaboocha T, Waditee R, Incharoensakdi A, Hibino T, et al. Halotolerant Cyanobacterium Aphanothece halophytica Contains a Betaine Transporter Active at Alkaline pH and High Salinity. Appl Environ Microbiol. 2006;72: 6018–6026. doi: 10.1128/AEM.00733-06
  • 11. Steinberg CEW, Schäfer H, Beisker W. Do Acid-tolerant Cyanobacteria Exist? Acta Hydrochim Hydrobiol. 1998;26: 13–19.
  • 12. Qian F, Dixon DR, Newcombe G, Ho L, Dreyfus J, Scales PJ. The effect of pH on the release of metabolites by cyanobacteria in conventional water treatment processes. Harmful Algae. 2014;39: 253–258. doi: 10.1016/j.hal.2014.08.006
  • 13. Fay P. Oxygen relations of nitrogen fixation in cyanobacteria. Microbiol Rev. 1992;56: 340–373. doi: 10.1128/mr.56.2.340-373.1992
  • 14. Latysheva N, Junker VL, Palmer WJ, Codd GA, Barker D. The evolution of nitrogen fixation in cyanobacteria. Bioinformatics. 2012;28: 603–606. doi: 10.1093/bioinformatics/bts008
  • 15. Jones AME, Thomas V, Truman B, Lilley K, Mansfield J, Grant M. Specific changes in the Arabidopsis proteome in response to bacterial challenge: differentiating basal and R-gene mediated resistance. Phytochemistry. 2004;65: 1805–1816. doi: 10.1016/j.phytochem.2004.04.005
  • 16. Daseke MJ, Valerio FM, Kalusche WJ, Ma Y, DeLeon-Pennell KY, Lindsey ML. Neutrophil proteome shifts over the myocardial infarction time continuum. Basic Res Cardiol. 2019;114: 37. doi: 10.1007/s00395-019-0746-x
  • 17. Foth BJ, Zhang N, Chaal BK, Sze SK, Preiser PR, Bozdech Z. Quantitative Time-course Profiling of Parasite and Host Cell Proteins in the Human Malaria Parasite Plasmodium falciparum. Mol Cell Proteomics. 2011;10: M110.006411. doi: 10.1074/mcp.M110.006411
  • 18. Di Cagno R, De Angelis M, Calasso M, Gobbetti M. Proteomics of the bacterial cross-talk by quorum sensing. J Proteomics. 2011;74: 19–34. doi: 10.1016/j.jprot.2010.09.003
  • 19. Luo J, Tang S, Peng X, Yan X, Zeng X, Li J, et al. Elucidation of Cross-Talk and Specificity of Early Response Mechanisms to Salt and PEG-Simulated Drought Stresses in Brassica napus Using Comparative Proteomic Analysis. PLoS One. 2015;10: e0138974. doi: 10.1371/journal.pone.0138974
  • 20. Tezel G. A proteomics view of the molecular mechanisms and biomarkers of glaucomatous neurodegeneration. Prog Retin Eye Res. 2013;35: 18–43. doi: 10.1016/j.preteyeres.2013.01.004
  • 21. Zhou M, Robinson C V. When proteomics meets structural biology. Trends Biochem Sci. 2010;35: 522–529. doi: 10.1016/j.tibs.2010.04.007
  • 22. Chalmel F, Rolland AD. Linking transcriptomics and proteomics in spermatogenesis. Reproduction. 2015;150: R149–R157. doi: 10.1530/REP-15-0073
  • 23. Perco P, Mühlberger I, Mayer G, Oberbauer R, Lukas A, Mayer B. Linking transcriptomic and proteomic data on the level of protein interaction networks. Electrophoresis. 2010;31: 1780–1789. doi: 10.1002/elps.200900775
  • 24. Wegener KM, Singh AK, Jacobs JM, Elvitigala T, Welsh EA, Keren N, et al. Global proteomics reveal an atypical strategy for carbon/nitrogen assimilation by a cyanobacterium under diverse environmental perturbations. Mol Cell Proteomics. 2010;9: 2678–2689. doi: 10.1074/mcp.M110.000109
  • 25. Herranen M, Battchikova N, Zhang P, Graf A, Sirpiö S, Paakkarinen V, et al. Towards functional proteomics of membrane protein complexes in Synechocystis sp. PCC 6803. Plant Physiol. 2004;134: 470–481. doi: 10.1104/pp.103.032326
  • 26. Huang F, Parmryd I, Nilsson F, Persson AL, Pakrasi HB, Andersson B, et al. Proteomics of Synechocystis sp. Strain PCC 6803: Identification of Plasma Membrane Proteins. Mol Cell Proteomics. 2002;1: 956–966. doi: 10.1074/mcp.m200043-mcp200
  • 27. Plohnke N, Seidel T, Kahmann U, Rögner M, Schneider D, Rexroth S. The proteome and lipidome of Synechocystis sp. PCC 6803 cells grown under light-activated heterotrophic conditions. Mol Cell Proteomics. 2015;14: 572–584. doi: 10.1074/mcp.M114.042382
  • 28. Wang Y, Sun J, Chitnis PR. Proteomic study of the peripheral proteins from thylakoid membranes of the cyanobacterium Synechocystis sp. PCC 6803. Electrophoresis. 2000;21: 1746–1754.
  • 29. Huang F, Hedman E, Funk C, Kieselbach T, Schröder WP, Norling B. Isolation of Outer Membrane of Synechocystis sp. PCC 6803 and Its Proteomic Characterization. Mol Cell Proteomics. 2004;3: 586–595. doi: 10.1074/mcp.M300137-MCP200
  • 30. Simon WJ, Hall JJ, Suzuki I, Murata N, Slabas AR. Proteomic study of the soluble proteins from the unicellular cyanobacterium Synechocystis sp. PCC6803 using automated matrix-assisted laser desorption/ionization-time of flight peptide mass fingerprinting. Proteomics. 2002;2: 1735–1742.
  • 31. Pisareva T, Kwon J, Oh J, Kim S, Ge C, Wieslander Å, et al. Model for Membrane Organization and Protein Sorting in the Cyanobacterium Synechocystis sp. PCC 6803 Inferred from Proteomics and Multivariate Sequence Analyses. J Proteome Res. 2011;10: 3617–3631. doi: 10.1021/pr200268r
  • 32. Srivastava R, Pisareva T, Norling B. Proteomic studies of the thylakoid membrane of Synechocystis sp. PCC 6803. Proteomics. 2005;5: 4905–4916. doi: 10.1002/pmic.200500111
  • 33. Baers LL, Breckels LM, Mills LA, Gatto L, Deery MJ, Stevens TJ, et al. Proteome Mapping of a Cyanobacterium Reveals Distinct Compartment Organization and Cell-Dispersed Metabolism. Plant Physiol. 2019;181: 1721–1738. doi: 10.1104/pp.19.00897
  • 34. Mohanta TK, Khan AL, Hashem A, Abd-Allah EF, Yadav D, Al-Harrasi A. Genomic and evolutionary aspects of chloroplast tRNA in monocot plants. BMC Plant Biol. 2019;19. doi: 10.1186/s12870-018-1625-6
  • 35. Mohanta TK, Mishra AK, Hashem A, Qari SH, Abd_Allah EF, Khan AL, et al. Genome-wide analysis revealed novel molecular features and evolution of Anti-codons in cyanobacterial tRNAs. Saudi J Biol Sci. 2020;27. doi: 10.1016/j.sjbs.2019.12.019
  • 36. Mohanta T, Syed A, Ameen F, Bae H. Novel Genomic and Evolutionary Perspective of Cyanobacterial tRNAs. Front Genet. 2017;8: 200. doi: 10.3389/fgene.2017.00200
  • 37. Mohanta TK, Mishra AK, Hashem A, Abd_Allah EF, Khan AL, Al-Harrasi A. Construction of anti-codon table of the plant kingdom and evolution of tRNA selenocysteine (tRNASec). BMC Genomics. 2020;21: 804. doi: 10.1186/s12864-020-07216-3
  • 38. Mohanta TK, Pudake RN, Bae H. Genome-wide identification of major protein families of cyanobacteria and genomic insight into the circadian rhythm. Eur J Phycol. 2017;52. doi: 10.1080/09670262.2016.1251619
  • 39. Moya A, Oliver JL, Verdú M, Delaye L, Arnau V, Bernaola-Galván P, et al. Driven progressive evolution of genome sequence complexity in Cyanobacteria. Sci Rep. 2020;10: 19073. doi: 10.1038/s41598-020-76014-4
  • 40. Wang J, Huang X, Ge H, Wang Y, Chen W, Zheng L, et al. The quantitative proteome atlas of a model cyanobacterium. J Genet Genomics. 2022;49: 96–108. doi: 10.1016/j.jgg.2021.09.007
  • 41. Najafpour GD. CHAPTER 14 - Single-Cell Protein. In: Najafpour GD, editor. Biochemical Engineering and Biotechnology. Amsterdam: Elsevier; 2007. pp. 332–341. doi: 10.1016/B978-044452845-2/50014-8
  • 42. Babele PK, Kumar J, Chaturvedi V. Proteomic De-Regulation in Cyanobacteria in Response to Abiotic Stresses. Front Microbiol. 2019;10: 1315. doi: 10.3389/fmicb.2019.01315
  • 43. Ow SY, Wright PC. Current trends in high throughput proteomics in cyanobacteria. FEBS Lett. 2009;583: 1744–1752. doi: 10.1016/j.febslet.2009.03.062
  • 44. Mohanta TK, Khan AL, Hashem A, Abd_Allah EF, Al-Harrasi A. The Molecular Mass and Isoelectric Point of Plant Proteomes. BMC Genomics. 2019;20: 631. doi: 10.1186/s12864-019-5983-8
  • 45. Mohanta TK, Kamran MS, Omar M, Anwar W, Choi GS. PlantMWpIDB: a database for the molecular weight and isoelectric points of the plant proteomes. Sci Rep. 2022;12: 1–7. doi: 10.1038/s41598-022-11077-z
  • 46. Mohanta TK, Mishra AK, Khan A, Hashem A, Abd_Allah EF, Al-Harrasi A. Virtual 2-D map of the fungal proteome. Sci Rep. 2021;11: 6676. doi: 10.1038/s41598-021-86201-6
  • 47. Mohanta TK, Mishra AK, Mohanta YK, Al-Harrasi A. Virtual 2D mapping of the viral proteome reveals host-specific modality distribution of molecular weight and isoelectric point. Sci Rep. 2021;11: 21291. doi: 10.1038/s41598-021-00797-3
  • 48. Vogt G, Woell S, Argos P. Protein thermal stability, hydrogen bonds, and ion pairs. J Mol Biol. 1997;269: 631–643. doi: 10.1006/jmbi.1997.1042
  • 49. Kawashima T, Amano N, Koike H, Makino S, Higuchi S, Kawashima-Ohya Y, et al. Archaeal adaptation to higher temperatures revealed by genomic sequence of Thermoplasma volcanium. Proc Natl Acad Sci. 2000;97: 14257–14262. doi: 10.1073/pnas.97.26.14257
  • 50. Lanming C, Kim B, Marie S, Peter R, Qunxin S, Elfar T, et al. The Genome of Sulfolobus acidocaldarius, a Model Organism of the Crenarchaeota. J Bacteriol. 2005;187: 4992–4999. doi: 10.1128/JB.187.14.4992-4999.2005
  • 51. Panja AS, Maiti S, Bandyopadhyay B. Protein stability governed by its structural plasticity is inferred by physicochemical factors and salt bridges. Sci Rep. 2020;10: 1822. doi: 10.1038/s41598-020-58825-7
  • 52. Gaysina LA, Saraf A, Singh P. Cyanobacteria in Diverse Habitats. In: Mishra AK, Tiwari DN, Rai AN, editors. Cyanobacteria. Academic Press; 2019. pp. 1–28. doi: 10.1016/B978-0-12-814667-5.00001-5
  • 53. Chaurasia A. Cyanobacterial biodiversity and associated ecosystem services: introduction to the special issue. Biodivers Conserv. 2015;24: 707–710. doi: 10.1007/s10531-015-0908-6
  • 54. Ralf G, Repeta DJ. The pigments of Prochlorococcus marinus: The presence of divinylchlorophyll a and b in a marine procaryote. Limnol Oceanogr. 1992;37: 425–433. doi: 10.4319/lo.1992.37.2.0425
  • 55. Hönisch B, Ridgwell A, Schmidt DN, Thomas E, Gibbs SJ, Sluijs A, et al. The Geological Record of Ocean Acidification. Science. 2012;335: 1058–1063. doi: 10.1126/science.1208277
  • 56. Doney SC, Fabry VJ, Feely RA, Kleypas JA. Ocean Acidification: The Other CO2 Problem. Ann Rev Mar Sci. 2009;1: 169–192. doi: 10.1146/annurev.marine.010908.163834
  • 57. Riebesell U, Gattuso J-P. Lessons learned from ocean acidification research. Nat Clim Chang. 2015;5: 12–14. doi: 10.1038/nclimate2456
  • 58. Di Rienzi SC, Sharon I, Wrighton KC, Koren O, Hug LA, Thomas BC, et al. The human gut and groundwater harbor non-photosynthetic bacteria belonging to a new candidate phylum sibling to Cyanobacteria. Kolter R, editor. Elife. 2013;2: e01102. doi: 10.7554/eLife.01102
  • 59. Wrighton KC, Castelle CJ, Wilkins MJ, Hug LA, Sharon I, Thomas BC, et al. Metabolic interdependencies between phylogenetically novel fermenters and respiratory organisms in an unconfined aquifer. ISME J. 2014;8: 1452–1463. doi: 10.1038/ismej.2013.249
  • 60. Chothia C, Finkelstein A V. The classification and origins of protein folding patterns. Annu Rev Biochem. 1990;59: 1007–1035. doi: 10.1146/annurev.bi.59.070190.005043
  • 61. Zhang J. Protein-length distributions for the three domains of life. Trends Genet. 2000;16: 107–109. doi: 10.1016/s0168-9525(99)01922-8
  • 62. Link AJ, Robison K, Church GM. Comparing the predicted and observed properties of proteins encoded in the genome of Escherichia coli K-12. Electrophoresis. 1997;18: 1259–1313. doi: 10.1002/elps.1150180807
  • 63. Brocchieri L, Karlin S. Protein length in eukaryotic and prokaryotic proteomes. Nucleic Acids Res. 2005;33: 3390–3400. doi: 10.1093/nar/gki615
  • 64. Tiessen A, Pérez-Rodríguez P, Delaye-Arredondo LJ. Mathematical modeling and comparison of protein size distribution in different plant, animal, fungal and microbial species reveals a negative correlation between protein size and protein number, thus providing insight into the evolution of proteomes. BMC Res Notes. 2012;5: 85. doi: 10.1186/1756-0500-5-85
  • 65. Mohanta TK, Arora PK, Mohanta N, Parida P, Bae H. Identification of new members of the MAPK gene family in plants shows diverse conserved domains and novel activation loop variants. BMC Genomics. 2015;16. doi: 10.1186/s12864-015-1244-7
  • 66. Mohanta TK, Mohanta N, Mohanta YK, Bae H. Genome-Wide Identification of Calcium Dependent Protein Kinase Gene Family in Plant Lineage Shows Presence of Novel D-x-D and D-E-L Motifs in EF-Hand Domain. Front Plant Sci. 2015;6: 1146. doi: 10.3389/fpls.2015.01146
  • 67. Eck R V, Dayhoff MO. Evolution of the Structure of Ferredoxin Based on Living Relics of Primitive Amino Acid Sequences. Science. 1966;152: 363–366. doi: 10.1126/science.152.3720.363
  • 68. McLachlan AD. Repeating sequences and gene duplication in proteins. J Mol Biol. 1972;64: 417–437. doi: 10.1016/0022-2836(72)90508-6
  • 69. Darnell JE. Implications of RNA-RNA splicing in evolution of eukaryotic cells. Science. 1978;202: 1257–1260. doi: 10.1126/science.364651
  • 70. Dorit RL, Gilbert W. The limited universe of exons. Curr Opin Genet Dev. 1991;1: 464–469. doi: 10.1016/s0959-437x(05)80193-5
  • 71. White SH, Jacobs RE. Statistical distribution of hydrophobic residues along the length of protein chains. Implications for protein folding and evolution. Biophys J. 1990;57: 911–921. doi: 10.1016/S0006-3495(90)82611-4
  • 72. Lau KF, Dill KA. Theory for protein mutability and biogenesis. Proc Natl Acad Sci. 1990;87: 638–642. doi: 10.1073/pnas.87.2.638
  • 73. Shakhnovich EI, Gutin AM. Implications of thermodynamics of protein folding for evolution of primary sequences. Nature. 1990;346: 773–775. doi: 10.1038/346773a0
  • 74. White SH. The evolution of proteins from random amino acid sequences: II. Evidence from the statistical distributions of the lengths of modern protein sequences. J Mol Evol. 1994;38: 383–394. doi: 10.1007/BF00163155
  • 75. White SH, Jacobs RE. The evolution of proteins from random amino acid sequences. I. Evidence from the lengthwise distribution of amino acids in modern protein sequences. J Mol Evol. 1993;36: 79–95. doi: 10.1007/BF02407307
  • 76. Zhang Y, Hubner IA, Arakaki AK, Shakhnovich E, Skolnick J. On the origin and highly likely completeness of single-domain protein structures. Proc Natl Acad Sci U S A. 2006;103: 2605–2610. doi: 10.1073/pnas.0509379103
  • 77. Dill KA. Theory for the folding and stability of globular proteins. Biochemistry. 1985;24: 1501–1509. doi: 10.1021/bi00327a032
  • 78. White SH. Amino acid preferences of small proteins: Implications for protein stability and evolution. J Mol Biol. 1992;227: 991–995. doi: 10.1016/0022-2836(92)90515-L
  • 79. Kozlowski LP. IPC–Isoelectric Point Calculator. Biol Direct. 2016;11: 55. doi: 10.1186/s13062-016-0159-9
  • 80. Kumar S, Stecher G, Suleski M, Hedges SB. TimeTree: A Resource for Timelines, Timetrees, and Divergence Times. Mol Biol Evol. 2017;34: 1812–1819. doi: 10.1093/molbev/msx116

Decision Letter 0

Arabinda Ghosh

19 Aug 2022

PONE-D-22-12929

Decoding the Virtual 2D Map of Cyanobacterial Proteomes

PLOS ONE

Dear Dr. Mohanta,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Oct 03 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Arabinda Ghosh

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript submitted by Mohanta et al. describes the virtual 2D map of the cyanobacterial proteomes. The authors analyzed the cyanobacterial proteomes of 229 species and reported the basic proteomics. The study contains each fundamental detail of the cyanobacterial proteome. It is promising research and is of high value for publication in PLOS ONE. However, the article needs appropriate revision as per the comments provided below before acceptance for publication.

Overall comments (the explanations should also be reflected in the manuscript file to convey the importance of the study to the audience):

1. Why have the authors included 229 species in their study? Do the species share some commonalities, or were they chosen at random?

2. How can the results for the cyanobacterial proteome be corroborated with the bacterial or plant proteome? Please explain the similarities and differences, if any.

3. Is there any correlation between the molecular weight and the isoelectric point of cyanobacterial proteomes?

4. Why is Cys the least abundant and Leu the most abundant amino acid in the cyanobacterial proteomes? Although it is difficult to establish the reason, do the authors have any information or hypothesis about this?

5. What is the possible reason for the presence of higher average basic pI proteins (8.62) in cyanobacteria?

Abstract:

I. Rewrite the sentence “…The genome sequence data of the Cyanobacteria has been extensively studied, but a detailed proteomics study of the Cyanobacteria is lacking…” for better meaning and simplicity.

II. Rewrite the sentence “…The Cyanobacterial proteome encoded a greater number of acidic-pI proteins, and the average pI of the Cyanobacterial proteome was found 6.437…”.

III. Use either the full name or the abbreviation consistently in “molecular weight and pI”.

IV. Please rewrite “…Using this computational analysis…”.

Introduction:

i. Combine into a single sentence: “Therefore, proteomics provides more information about an organism compared to its genome. Proteomics provides an understanding of the cellular function and serves as a link between gene expression and translational products.”

ii. Rewrite the sentence “However, we lose lot of information about the whole proteome study as it is not possible to conduct 2D gel electrophoresis below pH 3 and above pH 11 due to lack of IPG (immobilized pH gradient) stripe below pH 3 and above pH 11.”

iii. “…proteomic details…” can be written as “…proteomics data…”.

iv. Please rewrite lines 81-86, as both consecutive sentences start with “Therefore”.

The introduction section is nicely presented, but moderate English editing would help a diverse audience understand this informative work.

Results

I. The results are well described in line with the aims and objectives of the manuscript; however, the figures would be improved by increasing their resolution.

II. The authors should also mention the image source and the software used to make the images.

III. Units of the parameters should be cross-checked.

Discussions

I. The authors have provided sufficient discussion of their work. If any further literature has been published in the meantime, please add it to the discussion to give a complete framework for this work.

II. The introduction section is nicely presented, but moderate English editing would help a diverse audience understand this informative work.

Conclusion

The authors should add a paragraph on the future prospects of this type of study and its application in real-world science. This work is indeed a time-consuming analysis based on data available in databases, but it needs a very clear future direction that can encourage subsequent researchers.

Materials and Methods

I. In this section, the authors should add a table providing details of the databases (online/offline) and other internet sources that are essential for carrying out this type of study. It would help early-career researchers handle such big data sets.

II. Authors should add abbreviations.

Moderate English language revision is a must prior to considering this manuscript for publication.

Reviewer #2: - This study was designed similarly to the authors' previous publication on ‘Virtual 2-D map of the fungal proteome’. This study used the sequences of 229 cyanobacterial species to analyse the molecular mass, average number of amino acids and isoelectric point of the cyanobacterial proteome.

- This study focussed only on numbers and statistical data, and it lacks functional analysis.

- Following suggestions may improve the quality of manuscript.

- The whole structure of the manuscript needs to be reorganised.

- The headings in the results section read like long sentences rather than section headings. They should be descriptive and as concise as possible.

- Table 1: It can be provided as supplementary information. The information in this table can be sorted to identify the species in ascending or descending order of number of sequences, molecular weight, or pI. The functional analysis should be discussed in the discussion section. For example, the authors should extract the information on species with industrial applications and its correlation with the above parameters.

- Table 2: The significance of the highest and lowest percentages of amino acids and their correlation with different species should be discussed in the text.

- The manuscript should also focus the discussion on the evolutionary time scale. The evolution of species and orders may be represented in cladograms. Divergence and functional variations should also be discussed in the manuscript.

- The majority of the conclusion section is almost hypothetical. The paper has not concluded anything relevant to functional and structural analysis.

- Figure 6B was not explained anywhere in the text. The figure legend should contain information on the colour patterns.

- The Figure 5 legend should be precise. Remove the analysis part from the figure legend.

Reviewer #3: Decoding the Virtual 2D Map of Cyanobacterial Proteomes

The virtual 2D map of the cyanobacteria proteomes is reported in the manuscript. The authors reported the fundamental proteomics after analyzing the 229 species of cyanobacteria proteomes. Every essential aspect of the cyanobacteria proteome is covered in the study. The research is exciting and valuable enough to be published in PLoS One. Before being accepted for publication, the piece must, however, be appropriately revised in accordance with the remarks offered below.

Comments:

1. The title of the manuscript should be modified.

2. If possible graphical abstract should be provided.

3. Line no 412-414: Protein sequences of 229 cyanobacteria species were downloaded from NCBI and proteomics details of these organisms were analyzed.

4. The authors should state a strong hypothesis for this study, given that it uses huge data sets.

5. Why have the authors considered 229 species? It would be better if the authors gave a justification.

6. Why have the authors chosen molecular weight and isoelectric point as the parameters of analysis in this study?

7. Line no 418 and 419: For analysis, recent data should be taken into account if available.

8. Line no 419 and 420: Only one proteome file was considered for the analysis of those species contained repetitive protein sequence data. Is there any specific reason for that? If yes please mention it if possible.

9. Line no 420 and 421: The data analysis starting date was already mentioned, there is no need for repetition.

10. Line 427 and 428: Please provide the script if possible.

11. Line 437-439 and Line 446-447: Please explain how JASP Unscrambler 11 and the TimeTree: The Timescale of Life server work.

12. In Fig 6 (b): Is there any particular explanation regarding different color coding between the interactions? If yes, please mention it.

13. All figures need a better resolution.

14. The author should also mention the image source and the software used to make the images.

15. The author must add a paragraph on future prospects

Recommendation: Major Revision

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Vinod Kumar Yata

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Oct 3;17(10):e0275148. doi: 10.1371/journal.pone.0275148.r002

Author response to Decision Letter 0


23 Aug 2022

Dear editor,

Greetings

Please find the rebuttal letter for the point-by-point response.

Sincerely

Dr. Tapan

Attachment

Submitted filename: Rebuttal_Response to reviewer comments.docx

Decision Letter 1

Arabinda Ghosh

8 Sep 2022

PONE-D-22-12929R1

Virtual 2D Map of Cyanobacterial Proteomes

PLOS ONE

Dear Dr. Mohanta,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Oct 23 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

We look forward to receiving your revised manuscript.

Kind regards,

Arabinda Ghosh

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments (if provided):

As per the reviewers' suggestions, a few minor comments need to be addressed before any decision on the manuscript.

Reviewer 1.

The authors have revised the manuscript very well, and it is now acceptable for publication. However, I have a few minor comments that should be considered in future.

1. Does the correlation of pI and molecular weight have any impact on the structural aspects of the protein?

2. Can molecular weight and pI play a role in the structure of the protein?

Reviewer 2.

I agree with the authors' response to my queries, and recommend this manuscript for publication.

Reviewer 3.

The authors have addressed all the comments. However, a few grammatical errors should be addressed. I recommend this article for publication.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have revised the manuscript very well, and it is now acceptable for publication. However, I have a few minor comments that should be considered in future.

1. Does the correlation of pI and molecular weight have any impact on the structural aspects of the protein?

2. Can molecular weight and pI play a role in the structure of the protein?

Reviewer #2: (No Response)

Reviewer #3: The authors have addressed all the comments. However, a few grammatical errors should be addressed.

I recommend this article for publication.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Vinod Kumar Yata

Reviewer #3: No

**********

Attachment

Submitted filename: BDM_ Recommendation.docx

PLoS One. 2022 Oct 3;17(10):e0275148. doi: 10.1371/journal.pone.0275148.r004

Author response to Decision Letter 1


10 Sep 2022

Dear Editor,

Greetings

Please find attached the letter containing the response to the reviewer comments.

Regards

Dr. Tapan

Attachment

Submitted filename: Rebuttal_Response to reviewer comments.docx

Decision Letter 2

Arabinda Ghosh

12 Sep 2022

Virtual 2D Map of Cyanobacterial Proteomes

PONE-D-22-12929R2

Dear Dr. Mohanta,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Arabinda Ghosh

Academic Editor

PLOS ONE

Acceptance letter

Arabinda Ghosh

23 Sep 2022

PONE-D-22-12929R2

Virtual 2D Map of Cyanobacterial Proteomes

Dear Dr. Mohanta:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Arabinda Ghosh

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Correlation regression analysis of number of protein sequences with molecular weight of proteomes (kDa).

    (PPTX)

    S2 Fig. Molecular weight of proteomes and pI of cyanobacterial protein.

    (PPTX)

    S3 Fig. Correlation plot of amino acid sequence length vs isoelectric point.

    (PPTX)

    S4 Fig. Gamma distribution of isoelectric point of cyanobacterial proteome.

    (PPTX)

    S5 Fig. Gamma distribution of amino acid sequence length in cyanobacterial proteome.

    (PPTX)

    S1 Table. Table depicting the list of the cyanobacterial species with the number of protein sequences used in this study.

    (DOCX)

    S1 File. Accession number, protein name, molecular weight and isoelectric point of the cyanobacterial proteins.

    (XLSX)

    S2 File. Details of the species name, GenBank accession number, GC% content and pI of cyanobacterial species.

    (XLSX)

    Data Availability Statement

    All relevant data are within the paper and its Supporting Information files.


    Articles from PLoS ONE are provided here courtesy of PLOS
