Nucleic Acids Research. 2021 Oct 6;50(D1):D758–D764. doi: 10.1093/nar/gkab891

CyanoOmicsDB: an integrated omics database for functional genomic analysis of cyanobacteria

Peng Zhou 1,2, Li Wang 2,2, Hai Liu 3, Chunyan Li 4, Zhimin Li 5,6, Jinxiang Wang 7, Xiaoming Tan 8
PMCID: PMC8728175  PMID: 34614159

Abstract

With their photosynthetic ability and established genetic modification systems, cyanobacteria are essential for fundamental and biotechnological research. To date, hundreds of cyanobacterial genomes have been sequenced, and transcriptomic analysis has been frequently applied in the functional genomics of cyanobacteria. However, the massive omics data have not been extensively mined and integrated. Here, we describe CyanoOmicsDB (http://www.cyanoomics.cn/), a database aiming to provide comprehensive functional information for each cyanobacterial gene. CyanoOmicsDB consists of 8 335 261 entries of cyanobacterial genes from 928 genomes. It provides multiple gene identifiers, visualized genomic location, and DNA sequences for each gene entry. For protein-encoding genes, CyanoOmicsDB can provide predicted gene function, amino acid sequences, homologs, protein-domain super-families, and accession numbers for various public protein function databases. CyanoOmicsDB integrates both transcriptional and translational profiles of Synechocystis sp. PCC 6803 under various environmental culture conditions and genetic backgrounds. Moreover, CyanoOmicsDB includes 23 689 gene transcriptional start sites, 94 644 identified peptides, and 16 778 post-translation modification sites obtained from transcriptomes or proteomes of several model cyanobacteria. Compared with other existing cyanobacterial databases, CyanoOmicsDB comprises more datasets and more comprehensive functional information. CyanoOmicsDB will provide researchers in this field with a convenient way to retrieve functional information on cyanobacterial genes.

INTRODUCTION

Cyanobacteria are the only prokaryotes that can perform oxygen-evolving photosynthesis (1). Cyanobacteria have been emerging as popular model organisms for fundamental and biotechnological research because they are amenable to genetic engineering and possess a relatively fast growth rate and good tolerance to environmental stresses (2–4).

In 1997, Synechocystis sp. PCC 6803 became the first cyanobacterium whose genome was sequenced entirely (5). Since then, the number of sequenced cyanobacterial genomes has increased rapidly, especially once high-throughput next-generation sequencing (NGS) became a reliable and routine technique. Although hundreds of cyanobacterial genomes have been sequenced within the last two decades, only a limited number of cyanobacterial genes have been functionally characterized, and only in a few model cyanobacteria.

Microarray or RNA-sequencing (RNA-seq) based transcriptomic analysis, a powerful tool for linking genes with their functions, is also commonly used to identify differentially expressed genes in cyanobacteria under different environmental conditions. In addition, transcriptional start sites have been systematically identified in multiple cyanobacterial species using primary transcriptome analysis (6–9). These transcriptional data are useful in demonstrating the biological functions of genes.

A comprehensive online database including genomic, transcriptomic, and reference information of cyanobacteria is essential for researchers in this field to perform in silico analysis of gene functions before conducting laboratory experiments. CyanoBase (http://genome.microbedb.jp/cyanobase) was first established as a genome database for Synechocystis sp. PCC 6803 in 1998 and has been updated several times in the last two decades (10–13). Currently, CyanoBase comprises 86 complete and 290 draft genomes and has become the most popular cyanobacterial database in this field. However, CyanoBase does not contain any transcriptomic data. CyanoEXpress (http://cyanoexpress.sysbiolab.eu/) currently comprises visualized expression data of 3 078 genes from Synechocystis sp. PCC 6803 in response to 178 environmental and genetic perturbations (14). CyanOmics (https://lag.ihb.ac.cn/index.html) comprises a few omics datasets for Synechococcus sp. PCC 7002 (15). In addition, CyanoClust (http://cyanoclust.c.u-tokyo.ac.jp/) contains protein homology information for cyanobacteria and plastids (16), and CyanoLyase (http://cyanolyase.genouest.org/) contains the sequences and motifs of phycobilin lyases and related proteins from cyanobacteria, red algae, and cryptophytes (17).

Herein, we describe CyanoOmicsDB, an integrated web database containing genomic data of 928 cyanobacterial strains, 56 independent transcriptomic datasets, 3 primary transcriptomic datasets, and 15 proteomic datasets, which is currently the most comprehensive omics database for cyanobacteria.

MATERIALS AND METHODS

Data retrieval

To comprehensively investigate the genomic sequences of cyanobacteria, we downloaded all available cyanobacterial genomic sequences and their annotations from the NCBI assembly database (https://www.ncbi.nlm.nih.gov/assembly/) using the NCBI-datasets tool. The amino acid sequences of each protein-coding gene were used as queries to search against the Integrative Protein Signature Database (InterPro) using InterProScan 5.45-80.0 (18). Gene Expression Omnibus (GEO) Series Matrix files containing gene expression profiles were downloaded from the GEO database (19) using the GEOquery package (20), whereas the raw RNA-seq reads were downloaded from the Sequence Read Archive (SRA) database (21) using the SRA-toolkit (https://github.com/ncbi/sra-tools) (Figure 1).

Figure 1.


Overview of the process of creating CyanoOmicsDB. Raw data were recovered from public databases and then assessed to establish associations between cyanobacterial genes and their nucleotide sequences, amino acid sequences, annotations, accession numbers to various databases (gene2Acc), differential expression profiles (gene2FoldChange), and references (gene2Reference). Except for the gene-sequence association, all other associations were formatted and integrated as four collections in the back-end MongoDB database. Amino acid sequences were used to create a BLAST database (Blastdb), and the nucleotide sequences and their annotations were used in configuring JBrowse tracks on the CyanoOmicsDB server. Vue.js was used as the front-end application framework of CyanoOmicsDB. Vue components (indicated in upper case) were created to retrieve and display detailed information on the retrieved gene or genome from the back-end.

Genomic data aggregation and processing

The metainformation of these genomes was extracted using the NCBI data format tool and formatted as a tab-separated values (TSV) file. The basic information for each gene, including locus_tag, gene symbol, old_locus_tag, genomic location, protein id, and encoding product, was extracted from the general feature format (GFF) files. The Enzyme Commission (EC) (22), Gene Ontology (GO) (23,24), Protein Families Database (Pfam) (25), MetaCyc (26), and Kyoto Encyclopedia of Genes and Genomes (KEGG) (27,28) identifiers for each gene were extracted from the resulting InterProScan outputs. The data for each gene were aggregated as another TSV file (Figure 1 and Supplementary Dataset S1).
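As an illustration of the identifier extraction step, the following minimal Python sketch collects Pfam, GO, KEGG, MetaCyc, and EC identifiers per protein from an InterProScan tab-separated output. It is not the pipeline used for CyanoOmicsDB. It assumes the standard InterProScan TSV column order (GO terms in column 14, pathway annotations in column 15) and assumes that EC numbers can be read from the part after "+" in KEGG pathway entries; the function name and the returned dictionary layout are hypothetical.

```python
from collections import defaultdict


def parse_interproscan_tsv(text):
    """Collect database identifiers per protein from InterProScan TSV text.

    Assumed column layout (0-based): 0 protein accession, 3 analysis name,
    4 signature accession, 13 GO terms, 14 pathway annotations.
    """
    annotations = defaultdict(
        lambda: {"Pfam": set(), "GO": set(), "KEGG": set(), "MetaCyc": set(), "EC": set()}
    )
    for line in text.splitlines():
        row = line.rstrip("\n").split("\t")
        if len(row) < 5:
            continue  # skip blank or malformed lines
        protein, analysis, signature = row[0], row[3], row[4]
        entry = annotations[protein]
        if analysis == "Pfam":
            entry["Pfam"].add(signature)
        # GO terms are pipe-separated; newer releases may append "(source)".
        go_field = row[13] if len(row) > 13 else "-"
        for term in go_field.split("|"):
            term = term.split("(")[0].strip()
            if term.startswith("GO:"):
                entry["GO"].add(term)
        # Pathways look like "KEGG: 00230+3.6.1.3|MetaCyc: PWY-7219".
        pathway_field = row[14] if len(row) > 14 else "-"
        for item in pathway_field.split("|"):
            database, _, identifier = item.partition(":")
            database, identifier = database.strip(), identifier.strip()
            if database in ("KEGG", "MetaCyc") and identifier:
                pathway, _, ec_number = identifier.partition("+")
                entry[database].add(pathway)
                if database == "KEGG" and ec_number:
                    entry["EC"].add(ec_number)  # assumption: EC taken from KEGG entries
    return annotations
```

The per-protein sets can then be joined with the locus_tag-level records from the GFF files before being written to the aggregated TSV file.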

Local blast for searching homologs

All amino acid sequences were retrieved from the raw fasta files (FAA) and combined into a single fasta file. Each amino acid sequence was named by its locus_tag. The local Basic Local Alignment Search Tool (BLAST) database was built from the resulting formatted fasta file using the makeblastdb command (29). To search for homologs of a gene, this database was queried with the BlastP command, using the amino acid sequence of the gene as the query and ‘-qcov_hsp_perc 70’ and ‘-evalue 1e-5’ as parameters. The locus_tag of each hit was used to recover the protein id, product, and species name from the metadata of all cyanobacterial genes. The results were combined with the original BlastP output and shown as a new table in the HOMOLOGS module.
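The homolog search described above can be sketched in Python as follows. This is a minimal illustration that assumes the BLAST+ executables makeblastdb and blastp are on the PATH; the file names, the function names, and the choice of tabular output columns are assumptions, and only the two search parameters (-qcov_hsp_perc 70 and -evalue 1e-5) come from the text.

```python
import subprocess

# Columns requested from BLAST+ tabular output (an illustrative choice).
FIELDS = ["qseqid", "sseqid", "pident", "qcovhsp", "evalue"]


def makeblastdb_command(fasta_path, db_name):
    """Command line that builds a protein BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "prot", "-out", db_name]


def blastp_command(query_path, db_name):
    """BlastP command line with the two parameters stated in the text."""
    return [
        "blastp", "-query", query_path, "-db", db_name,
        "-qcov_hsp_perc", "70", "-evalue", "1e-5",
        "-outfmt", "6 " + " ".join(FIELDS),
    ]


def parse_hits(tabular_text):
    """Turn tab-separated BlastP output into a list of dictionaries."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip():
            continue
        record = dict(zip(FIELDS, line.split("\t")))
        for key in ("pident", "qcovhsp", "evalue"):
            record[key] = float(record[key])
        hits.append(record)
    return hits


def find_homologs(query_path, db_name):
    """Run BlastP and return parsed hits; requires BLAST+ to be installed."""
    result = subprocess.run(
        blastp_command(query_path, db_name),
        capture_output=True, text=True, check=True,
    )
    return parse_hits(result.stdout)
```

Because every sequence in the database is named by its locus_tag, the sseqid of each hit can be used directly as the key for looking up the protein id, product, and species name.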

Retrieval of nucleotide and amino acid sequences on request

The nucleotide sequences were obtained from the fasta files (FNA) containing genomic sequences according to chromosome accession number and the start and end positions of the recovered gene. The amino acid sequences were recovered from the above-formatted amino acid fasta file according to the locus_tag of the recovered gene.
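Coordinate-based retrieval of this kind can be illustrated with a short Python sketch. It assumes 1-based, inclusive start and end positions, as used in GFF files, and it reverse-complements genes on the minus strand. The strand handling and the function names are assumptions, since the text only mentions the accession number and the start and end positions.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping the first header word to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


# Translation table used to complement DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def gene_sequence(genome, accession, start, end, strand="+"):
    """Return the nucleotide sequence for 1-based inclusive coordinates."""
    sequence = genome[accession][start - 1:end]
    if strand == "-":
        # Reverse complement for genes encoded on the minus strand.
        sequence = sequence.translate(COMPLEMENT)[::-1]
    return sequence
```

Changing the start and end arguments corresponds to the user-adjustable positions offered in the SEQUENCE module.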

Visualization of genomic data using JBrowse

JBrowse (30) was installed and set up according to its official documentation. The genomic sequences and annotations were formatted as reference sequences and feature tracks, respectively, using the perl scripts provided by JBrowse (Figure 1).

Transcriptomic data aggregation and processing

Raw transcriptomic data of Synechocystis sp. PCC 6803 were obtained from either the GEO or the SRA database. Culture conditions and experimental groups were extracted from GPL files in the GEO database or from the description information in the SRA database. GEO microarray data series were downloaded using the GEOquery package (20) and analyzed using the Limma package (31) in R. For RNA-seq data, raw reads were downloaded from the SRA database and aligned to the reference genomes (GCA_000009725.1 and GCF_000009725.1) of Synechocystis sp. PCC 6803 using Bowtie2 (32). Raw read counts of genes were computed using the HTSeq-count program (33). The summarized count matrix was analyzed using the DESeq2 package (34), and the resulting log2 fold changes and adjusted p-values of the different comparisons were aggregated as a TSV file (Supplementary Dataset S2).
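The final aggregation step can be sketched in Python as below. The differential expression analysis itself is done in R with Limma or DESeq2; this sketch only merges per-comparison result tables into one long-format TSV. The input layout (a "gene" column next to the DESeq2 columns log2FoldChange and padj), the output column names, and the decision to skip rows with a missing adjusted p-value are all assumptions made for illustration.

```python
import csv
import io

OUTPUT_COLUMNS = ["locus_tag", "comparison", "log2FoldChange", "padj"]


def aggregate_comparisons(tables):
    """Merge per-comparison result rows into one long-format list.

    `tables` maps a comparison name to an iterable of dict rows with the
    keys "gene", "log2FoldChange", and "padj" (hypothetical input layout).
    """
    rows = []
    for comparison, records in tables.items():
        for record in records:
            if record["padj"] in ("", "NA"):
                continue  # assumption: drop genes without an adjusted p-value
            rows.append({
                "locus_tag": record["gene"],
                "comparison": comparison,
                "log2FoldChange": float(record["log2FoldChange"]),
                "padj": float(record["padj"]),
            })
    return rows


def write_tsv(rows, handle):
    """Write the aggregated rows as tab-separated values."""
    writer = csv.DictWriter(
        handle, fieldnames=OUTPUT_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


def aggregate_to_text(tables):
    """Convenience wrapper returning the TSV as a string."""
    buffer = io.StringIO()
    write_tsv(aggregate_comparisons(tables), buffer)
    return buffer.getvalue()
```

A long format with one row per gene and comparison makes it straightforward to list every comparison in which a given gene changes.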

Proteomic data aggregation and processing

The identified peptides and the post-translational modifications (PTMs) of cyanobacterial proteins were obtained from the reported publications on cyanobacterial proteomics (35–39) and further integrated into the gene information collection. Furthermore, genomic positions of nucleotide sequences coding for these peptides were confirmed by mapping their arrangements to the corresponding reference genomes and generating JBrowse tracks. Differential expression profiles of cyanobacterial proteins were also obtained from reported publications (40–47) and integrated into the differential expression collection.

Reference data aggregation and processing

For the recent publication page, reference information comprising the title, authors, journal, digital object identifier (DOI), and abstract was retrieved from PubMed using a Python script based on the Entrez package, with ‘cyanobacteria’ or ‘cyanobacterium’ as keywords. The 200 most recent publications were integrated into a TSV file. For the reference page of each gene, the information was recovered using the same script with keywords containing both the species name and the locus_tag or gene symbol.
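A PubMed search along these lines might look like the following Python sketch, which uses Biopython's Bio.Entrez module. It is not the script used for CyanoOmicsDB. The exact query syntax, the function names, and the use of sort="pub_date" to obtain the most recent records are assumptions; NCBI also asks callers to supply a contact e-mail address.

```python
def build_query(species=None, identifiers=()):
    """Build a PubMed search term.

    Without arguments the general keywords from the text are used;
    otherwise the species name is combined with the gene identifiers.
    The quoting and Boolean layout are illustrative assumptions.
    """
    if species is None:
        return "cyanobacteria OR cyanobacterium"
    joined = " OR ".join(f'"{identifier}"' for identifier in identifiers)
    return f'"{species}" AND ({joined})'


def fetch_pubmed_ids(term, email, retmax=200):
    """Return up to `retmax` PubMed IDs for a search term.

    Requires Biopython and network access, so the import is kept local
    and the query builder above can be used without either.
    """
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.esearch(db="pubmed", term=term, retmax=retmax, sort="pub_date")
    try:
        result = Entrez.read(handle)
    finally:
        handle.close()
    return list(result["IdList"])
```

The returned PubMed IDs can then be passed to an Entrez fetch call to obtain titles, authors, journals, DOIs, and abstracts.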

Web app implementation

The associations between cyanobacterial genes and their functional information were formatted and integrated as four collections in the back-end MongoDB database (Figure 1). The Quasar framework was used to build concise user interfaces for CyanoOmicsDB. In addition, custom search engines and sortable, filterable, and paginated data tables were created to display information from each dataset or search result.

RESULTS

Database content

At the time of writing (7 April 2021), CyanoOmicsDB contained 8 335 261 entries of unique cyanobacterial genes from 186 complete and 742 draft cyanobacterial genomes (Supplementary Table S1). We developed a pipeline to download cyanobacterial genome datasets from the NCBI assembly database and extract, format, and import information for each gene into the CyanoOmicsDB. For each gene, CyanoOmicsDB provides basic information, gene annotations and six functional analysis modules containing JBrowse (for genome visualization), sequence, homologs, families/domains, differential gene expression, and references (Figure 1).

Basic information

For most genomes, CyanoOmicsDB provides both GenBank (GCA) (48) and RefSeq (GCF) (49) genome assemblies separately. Locus_tags, gene symbols and old_locus_tags were extracted from all annotated GFF files of GenBank or RefSeq assemblies and set as retrievable gene identifiers in our database. Further, the lengths, genomic location, and gene-coding type are also provided in the basic information module.
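Extraction of these identifiers from a GFF file can be sketched in Python as follows. The sketch reads only rows whose feature type is "gene" and relies on the attribute keys locus_tag, gene, old_locus_tag, and gene_biotype as they appear in NCBI GFF3 annotations. The function name and the output field names are hypothetical, and protein ids and products, which are stored on CDS rows, are not covered here.

```python
from urllib.parse import unquote


def parse_gff_genes(text):
    """Extract basic gene information from GFF3 text (gene rows only)."""
    genes = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        columns = line.split("\t")
        if len(columns) != 9 or columns[2] != "gene":
            continue
        # Column 9 holds semicolon-separated key=value pairs (URL-encoded).
        attributes = {}
        for pair in columns[8].split(";"):
            key, separator, value = pair.partition("=")
            if separator:
                attributes[key] = unquote(value)
        start, end = int(columns[3]), int(columns[4])
        genes.append({
            "locus_tag": attributes.get("locus_tag"),
            "gene_symbol": attributes.get("gene"),
            "old_locus_tag": attributes.get("old_locus_tag"),
            "seqid": columns[0],
            "start": start,
            "end": end,
            "strand": columns[6],
            "length": end - start + 1,  # GFF coordinates are 1-based and inclusive
            "gene_biotype": attributes.get("gene_biotype"),
        })
    return genes
```

Records without a gene symbol or old_locus_tag simply carry None in those fields, so all three identifier types can be indexed without special cases.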

Gene annotations

Based on the annotated GFF files and the InterProScan output results, we extracted and collected the accession numbers for each gene from multiple widely used annotation databases, including EC, GO, Pfam, MetaCyc and KEGG. These accession numbers depicted on the CyanoOmicsDB are linked to the corresponding databases for more detailed information. Additionally, transmembrane domains of each protein-coding gene were identified using TMHMM 2.0 (50) and used as a criterion to determine whether the protein is a membrane protein or a soluble protein. The InterProScan output result for each protein-coding gene is indicated in the module tab named ‘FAMILIES/DOMAINS’ in which the conserved domains and homologous super-families are depicted graphically.
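The membrane versus soluble classification can be illustrated with a small Python sketch that reads the one-line-per-protein short output of TMHMM, in which the number of predicted helices is reported as PredHel=N. The text does not state the threshold that was applied, so the cut-off of at least one predicted helix, like the function names, is an assumption.

```python
def parse_tmhmm_short(text):
    """Map each protein id to its predicted number of transmembrane helices.

    Expects lines such as:
    prot1  len=300  ExpAA=45.2  First60=0.01  PredHel=2  Topology=i10-32o45-67i
    """
    helices = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        fields = dict(part.split("=", 1) for part in parts[1:] if "=" in part)
        helices[parts[0]] = int(fields.get("PredHel", 0))
    return helices


def classify_protein(predicted_helices, min_helices=1):
    """Label a protein as membrane or soluble (the threshold is an assumption)."""
    return "membrane" if predicted_helices >= min_helices else "soluble"
```

Raising min_helices gives a stricter definition of a membrane protein, for example to exclude proteins whose only predicted helix may be a signal peptide.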

Identified peptides and post-translational modifications

To provide useful information for expressed proteins, we obtained identified peptides and their PTMs from published proteome data of Synechocystis sp. PCC 6803 (35,36,38), Synechococcus sp. PCC 7002 (37,39), or Synechococcus sp. WH 8102 (51) (Supplementary Table S2). In total, 94 644 unique peptides and 16 778 PTM sites from 6 967 cyanobacterial proteins (Supplementary Dataset S3) were integrated into CyanoOmicsDB. For each gene, peptides identified in the same publication are shown on separate lines in CyanoOmicsDB. Each PTM is shown as a word combining its amino acid position, the modified amino acid residue, and the abbreviated modification type. Full names of the modification types are shown in mouseover text. By clicking on peptides or PTMs, users will be directed to the source publications in which these peptides or PTMs were experimentally identified.

Genome visualization using JBrowse

In the JBROWSE module, the retrieved gene's genomic region is displayed on the corresponding reference genome track, with the retrieved gene highlighted with a yellow background. Users can freely view the adjacent genome areas in the same genome track by dragging or zooming in or out. The features of genes encoded on different strands are shown in different colors. By left-clicking on the features, users can access detailed web pages of neighboring genes. The 23 689 transcriptional start sites or transcriptional units of Synechocystis sp. PCC 6803, Nostoc sp. PCC 7120, and Synechococcus elongatus UTEX 2973, which were systematically identified by primary transcriptome analysis (7,9,52), are displayed as independent tracks named ‘TSS’ or ‘Transcript_unit’. Also, 94 644 peptides and 16 778 PTM sites identified from the proteomes of Synechocystis sp. PCC 6803, Synechococcus sp. PCC 7002, and Synechococcus sp. WH 8102 are shown as independent tracks named ‘Peptides’ and ‘PTMs’, respectively.

Sequence retrieval and analysis

By default, CyanoOmicsDB shows both the nucleotide and amino acid sequences of the retrieved genes in the SEQUENCE module tab. Users can freely set the start and end positions of a gene to update the nucleotide sequence. CyanoOmicsDB also provides links to further sequence analysis in the SEQUENCE module, including KEGG, InterProScan, local and online BlastP, and online BlastN and BlastX. Notably, CyanoOmicsDB conducts local BlastP against a local database containing only amino acid sequences of cyanobacterial proteins, with the default parameters described in the Materials and Methods section. Therefore, users who want to set custom parameters, including the search database and expect threshold, should choose the online Blast interfaces.

Gene homologs

A local BlastP search is performed automatically with the default parameters when the HOMOLOGS module is loaded, to show homologous genes of the retrieved gene. The BlastP results are output in tab-separated format and parsed into a table containing the locus_tag, protein id, encoded product, and species name of each hit. Additionally, the percentage of identical residues between target and query sequences (identity), the query coverage, and the E-value are included in the same table. By clicking on the locus_tag of each hit, users will be directed to the detailed webpage of the hit.

References

The academic literature was recovered from PubMed using both gene identifiers and species names as keywords. Only literature containing the species name in its title or abstract and the gene identifiers in the main text was linked to a specific gene entry. Basic information, including the titles, authors, journals, PMIDs and DOIs of the related literature collected for each gene, is shown in the REFERENCES module tab, if any is available. The text passages containing the keywords can be extracted and displayed after the basic literature information. The detailed abstract and full text can be accessed by clicking on either the title or the DOI.

Gene expression

To investigate gene function, gene expression profiles under different environmental conditions or genetic backgrounds are informative. Raw transcriptomic data of Synechocystis sp. PCC 6803 deposited in 40 GEO datasets and 16 SRA transcriptome studies were collected and reanalyzed using the Limma and DESeq2 packages, respectively. The differential expressions of proteins were directly collected from publications on cyanobacterial proteomes. In total, CyanoOmicsDB contains 203 pairwise transcriptome comparisons and 25 proteome comparisons among different culturing conditions and genetic backgrounds (Supplementary Table S3). Transcriptional or translational changes of each gene in various comparisons, the corresponding conditions, and GEO/SRP/PMID accession numbers were combined and displayed in the GENE EXPRESSION module. By clicking on these accession numbers, users will be directed to the GEO Accession viewer, the SRA Run Selector, or the published literature for more detailed descriptions of the comparisons.

CyanoIdMapping tool

For conversion of gene identifiers from different databases, CyanoOmicsDB provides the online CyanoIdMapping tool (http://www.cyanoomics.cn/lz/id-mapping). Users can convert RefSeq gene identifiers of cyanobacteria to the corresponding GenBank identifiers or vice versa.

Search engines

To quickly and accurately retrieve data, CyanoOmicsDB provides multiple ways for users to search data. First, CyanoOmicsDB provides a search bar on the top panel of each webpage. Using this bar, users can search any fields in either species or gene datasheets. Second, CyanoOmicsDB provides filters in any listing pages, using which users can conveniently narrow down the search results.

DISCUSSION

Comparison of CyanoOmicsDB with other similar databases

So far, several cyanobacterial databases with different contents have been reported, for example, CyanoBase, CyanoEXpress, CyanOmics, CyanoClust and CyanoLyase. A detailed comparison of CyanoOmicsDB with these databases is shown in Supplementary Table S4. Compared with CyanoLyase and CyanoClust, which focus on a limited number of genes, CyanoOmicsDB collects all genes for each included genome. Compared with CyanOmics, which collects several transcriptome and proteome datasets for Synechococcus sp. PCC 7002, CyanoOmicsDB integrates more omics datasets for sequenced cyanobacterial strains. Compared with CyanoEXpress, which collects the most transcriptomic datasets of Synechocystis sp. PCC 6803 so far but only provides gene transcription information, CyanoOmicsDB integrates genomic, transcriptomic, and proteomic data together and provides comprehensive functional information for each gene of Synechocystis sp. PCC 6803, including gene annotation, homologs, gene expression, and references.

Undoubtedly, CyanoBase represents one of the most comprehensive databases for cyanobacteria until now and has been widely used in academia in the past two decades. Compared with CyanoBase, CyanoOmicsDB includes more cyanobacterial gene entries and provides more diverse interfaces to other gene function databases for each gene. Besides genomic data, both transcriptomic and proteomic data deposited in public databases were mined and incorporated into CyanoOmicsDB, which will give valuable clues for inferring gene functions.

Different gene identifiers from the GenBank and RefSeq genome assemblies

The GenBank and RefSeq genome assembly records for all cyanobacteria, except seven strains that were sequenced recently, are included in CyanoOmicsDB (Supplementary Table S1). Normally, RefSeq assemblies have the same nucleotide sequences as the corresponding GenBank assemblies. Because of the reannotation by NCBI, gene annotation in the RefSeq genesets is well maintained and not always the same as that in the GenBank genesets. In particular, gene identifiers in the RefSeq genesets are entirely different from those in the GenBank genesets. As a result, different gene identifiers are used to represent the same cyanobacterial gene in the academic literature and databases, which is confusing. For example, CyanoBase and CyanoEXpress use gene identifiers from GenBank assemblies, but CyanOmics uses those from RefSeq assemblies. Thus, associations between different gene identifiers and the id-mapping tool will be helpful for researchers in this field.

A gene can be retrieved in CyanoOmicsDB using locus_tag, gene symbol or old_locus_tag, although locus_tags are used as the primary gene identifiers in CyanoOmicsDB. Further, locus_tags in the GenBank genesets are linked to old_locus_tags in the RefSeq genesets for genes that are annotated by both genome assemblies. Alternatively, gene identifiers can be mutually converted in batches using the CyanoIdMapping tool.
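The link between the two identifier systems can be illustrated with a minimal Python sketch that builds lookup tables in both directions from RefSeq gene records carrying an old_locus_tag. It is not the implementation of the CyanoIdMapping tool. The record layout, the function names, and the handling of several comma-separated old locus tags are assumptions, and the identifiers in the usage example are made up.

```python
def build_id_maps(refseq_genes):
    """Build RefSeq-to-GenBank and GenBank-to-RefSeq locus_tag lookups.

    `refseq_genes` is an iterable of dicts with "locus_tag" and an optional
    "old_locus_tag" (hypothetical record layout). Several old tags may be
    listed for one gene, separated by commas.
    """
    refseq_to_genbank = {}
    genbank_to_refseq = {}
    for gene in refseq_genes:
        old_tags = gene.get("old_locus_tag")
        if not old_tags:
            continue  # gene annotated in only one of the two assemblies
        tags = [tag.strip() for tag in old_tags.split(",") if tag.strip()]
        refseq_to_genbank[gene["locus_tag"]] = tags
        for tag in tags:
            genbank_to_refseq[tag] = gene["locus_tag"]
    return refseq_to_genbank, genbank_to_refseq


def convert_batch(identifiers, mapping):
    """Convert a list of identifiers; unmapped identifiers yield None."""
    return [mapping.get(identifier) for identifier in identifiers]


# Usage with made-up identifiers:
example_genes = [
    {"locus_tag": "X_RS00005", "old_locus_tag": "x0001"},
    {"locus_tag": "X_RS00010", "old_locus_tag": None},
]
to_genbank, to_refseq = build_id_maps(example_genes)
converted = convert_batch(["x0001", "x9999"], to_refseq)
```

Returning None for unmapped identifiers keeps the output aligned with the input list, which is convenient for batch conversion.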

Data mining of cyanobacterial transcriptomic data

Gene expression profiles can provide valuable information on the biological functions of genes. Massive amounts of transcriptomic data have accumulated in the public databases, and this amount is still increasing. However, these data were obtained from different experiments and platforms. Taking the microarray data analyzed in this work as an example, these data were generated on several different microarray types, which differ in probe number and chip design. Furthermore, both the culture condition and the genetic background of each sample are difficult to extract directly from the metadata and need to be checked manually. Therefore, processing these public transcriptomic data is time-consuming. To date, CyanoOmicsDB integrates 56 transcriptomic datasets of Synechocystis sp. PCC 6803, which encompass almost all the transcriptomic data of this species. Data mining of other cyanobacterial transcriptomic data is still ongoing, and the results will be incorporated into CyanoOmicsDB in the future.

In conclusion, CyanoOmicsDB provides a convenient and alternative means to retrieve and analyze gene functions of cyanobacteria and will be helpful to the research community.

DATA AVAILABILITY

All raw genomic data can be found in the NCBI assembly database using the accession numbers listed in Supplementary Table S1. All raw reads and microarray data series, whose accession numbers are listed in Supplementary Table S3, can be downloaded from the SRA and GEO databases, respectively. Processed datasets (Supplementary Datasets S1–S3) supporting both this article and CyanoOmicsDB are available in the Figshare repository (https://figshare.com/s/c3729a3e623c6f003fc0). CyanoOmicsDB is freely available at http://www.cyanoomics.cn/.

All genomic and transcriptomic data were from the public NCBI assembly, GEO, and SRA databases. Accession numbers are listed in Supplementary Table S1 and S3.


ACKNOWLEDGEMENTS

We thank Dr Tao Zhu, Dr He Zhang, and Ms. Huili Sun for their valuable suggestions. We are also grateful to Dr Zhuo Chen for his help in collecting proteomic data.

Contributor Information

Peng Zhou, State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, 430062, China.

Li Wang, State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, 430062, China.

Hai Liu, State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, 430062, China.

Chunyan Li, State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, 430062, China.

Zhimin Li, State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, 430062, China; College of Bioscience and Bioengineering, Jiangxi Agricultural University, Nanchang, 330045, China.

Jinxiang Wang, State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, 430062, China.

Xiaoming Tan, State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, 430062, China.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

National Natural Science Foundation of China [31871303 to X.T.]; Open Funding Project of the State Key Laboratory of Biocatalysis and Enzyme Engineering [SKLBEE2019015 to Z.L.]; Key Laboratory of Biofuel, Chinese Academy of Sciences [CASKLB201802 to X.T.]; State Key Laboratory of Freshwater Ecology and Biotechnology [2018FB09 to X.T.]. Funding for open access charge: Open Funding Project of the State Key Laboratory of Biocatalysis and Enzyme Engineering [SKLBEE2019015].

Conflict of interest statement. None declared.

REFERENCES

1. Kirsch F., Klahn S., Hagemann M. Salt-regulated accumulation of the compatible solutes sucrose and glucosylglycerol in cyanobacteria and its biotechnological potential. Front. Microbiol. 2019; 10:2139.
2. Hitchcock A., Hunter C.N., Canniffe D.P. Progress and challenges in engineering cyanobacteria as chassis for light-driven biotechnology. Microb. Biotechnol. 2020; 13:363–367.
3. Hagemann M., Hess W.R. Systems and synthetic biology for the biotechnological application of cyanobacteria. Curr. Opin. Biotechnol. 2017; 49:94–99.
4. Savakis P., Hellingwerf K.J. Engineering cyanobacteria for direct biofuel production from CO2. Curr. Opin. Biotechnol. 2015; 33:8–14.
5. Kaneko T., Tabata S. Complete genome structure of the unicellular cyanobacterium Synechocystis sp. PCC6803. Plant Cell Physiol. 1997; 38:1171–1176.
6. Mitschke J., Georg J., Scholz I., Sharma C.M., Dienst D., Bantscheff J., Voss B., Steglich C., Wilde A., Vogel J. et al. An experimentally anchored map of transcriptional start sites in the model cyanobacterium Synechocystis sp. PCC6803. Proc. Natl. Acad. Sci. U.S.A. 2011; 108:2124–2129.
7. Kopf M., Klahn S., Scholz I., Matthiessen J.K., Hess W.R., Voss B. Comparative analysis of the primary transcriptome of Synechocystis sp. PCC 6803. DNA Res. 2014; 21:527–539.
8. Pfreundt U., Kopf M., Belkin N., Berman-Frank I., Hess W.R. The primary transcriptome of the marine diazotroph Trichodesmium erythraeum IMS101. Sci. Rep. 2014; 4:6187.
9. Tan X., Hou S., Song K., Georg J., Klähn S., Lu X., Hess W.R. The primary transcriptome of the fast-growing cyanobacterium Synechococcus elongatus UTEX 2973. Biotechnol. Biofuels. 2018; 11:218.
10. Nakamura Y., Kaneko T., Hirosawa M., Miyajima N., Tabata S. CyanoBase, a www database containing the complete nucleotide sequence of the genome of Synechocystis sp. strain PCC6803. Nucleic Acids Res. 1998; 26:63–67.
11. Nakamura Y., Kaneko T., Tabata S. CyanoBase, the genome database for Synechocystis sp. strain PCC6803: status for the year 2000. Nucleic Acids Res. 2000; 28:72.
12. Nakao M., Okamoto S., Kohara M., Fujishiro T., Fujisawa T., Sato S., Tabata S., Kaneko T., Nakamura Y. CyanoBase: the cyanobacteria genome database update 2010. Nucleic Acids Res. 2010; 38:D379–D381.
13. Fujisawa T., Narikawa R., Maeda S.I., Watanabe S., Kanesaki Y., Kobayashi K., Nomata J., Hanaoka M., Watanabe M., Ehira S. et al. CyanoBase: a large-scale update on its 20th anniversary. Nucleic Acids Res. 2017; 45:D551–D554.
14. Hernandez-Prieto M.A., Futschik M.E. CyanoEXpress: a web database for exploration and visualisation of the integrated transcriptome of cyanobacterium Synechocystis sp. PCC6803. Bioinformation. 2012; 8:634–638.
15. Yang Y., Feng J., Li T., Ge F., Zhao J. CyanOmics: an integrated database of omics for the model cyanobacterium Synechococcus sp. PCC 7002. Database. 2015; 2015:bau127.
16. Sasaki N.V., Sato N. CyanoClust: comparative genome resources of cyanobacteria and plastids. Database (Oxford). 2010; 2010:bap025.
17. Bretaudeau A., Coste F., Humily F., Garczarek L., Le Corguille G., Six C., Ratin M., Collin O., Schluchter W.M., Partensky F. CyanoLyase: a database of phycobilin lyase sequences, motifs and functions. Nucleic Acids Res. 2013; 41:D396–D401.
18. Jones P., Binns D., Chang H.Y., Fraser M., Li W., McAnulla C., McWilliam H., Maslen J., Mitchell A., Nuka G. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014; 30:1236–1240.
19. Clough E., Barrett T. The gene expression omnibus database. Methods Mol. Biol. 2016; 1418:93–110.
20. Davis S., Meltzer P.S. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics. 2007; 23:1846–1847.
21. Kodama Y., Shumway M., Leinonen R., International Nucleotide Sequence Database Collaboration. The sequence read archive: explosive growth of sequencing data. Nucleic Acids Res. 2012; 40:D54–D56.
22. Bairoch A. The ENZYME database in 2000. Nucleic Acids Res. 2000; 28:304–305.
23. Gene Ontology Consortium. The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Res. 2021; 49:D325–D334.
24. Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000; 25:25–29.
25. Mistry J., Chuguransky S., Williams L., Qureshi M., Salazar G.A., Sonnhammer E.L.L., Tosatto S.C.E., Paladin L., Raj S., Richardson L.J. et al. Pfam: The protein families database in 2021. Nucleic Acids Res. 2021; 49:D412–D419.
26. Caspi R., Billington R., Keseler I.M., Kothari A., Krummenacker M., Midford P.E., Ong W.K., Paley S., Subhraveti P., Karp P.D. The MetaCyc database of metabolic pathways and enzymes - a 2019 update. Nucleic Acids Res. 2020; 48:D445–D453.
27. Kanehisa M., Furumichi M., Sato Y., Ishiguro-Watanabe M., Tanabe M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 2021; 49:D545–D551.
28. Kanehisa M., Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000; 28:27–30.
29. Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J., Bealer K., Madden T.L. BLAST+: architecture and applications. BMC Bioinformatics. 2009; 10:421.
30. Buels R., Yao E., Diesh C.M., Hayes R.D., Munoz-Torres M., Helt G., Goodstein D.M., Elsik C.G., Lewis S.E., Stein L. et al. JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biol. 2016; 17:66.
31. Ritchie M.E., Phipson B., Wu D., Hu Y., Law C.W., Shi W., Smyth G.K. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015; 43:e47.
32. Langmead B., Salzberg S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012; 9:357–359.
33. Anders S., Pyl P.T., Huber W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015; 31:166–169.
  • 34. Love M.I., Huber W., Anders S.. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014; 15:550. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Spät P., Klotz A., Rexroth S., Maček B., Forchhammer K.. Chlorosis as a developmental program in cyanobacteria: the proteomic fundament for survival and awakening. Mol. Cell. Proteomics. 2018; 17:1650–1669. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Spat P., Macek B., Forchhammer K.. Phosphoproteome of the cyanobacterium Synechocystis sp. PCC 6803 and its dynamics during nitrogen starvation. Front. Microbiol. 2015; 6:248. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Yang M.K., Qiao Z.X., Zhang W.Y., Xiong Q., Zhang J., Li T., Ge F., Zhao J.D.. Global phosphoproteomic analysis reveals diverse functions of serine/threonine/tyrosine phosphorylation in the model cyanobacterium Synechococcus sp. strain PCC 7002. J. Proteome Res. 2013; 12:1909–1923. [DOI] [PubMed] [Google Scholar]
  • 38. Ma Y., Yang M., Lin X., Liu X., Huang H., Ge F.. Malonylome analysis reveals the involvement of lysine malonylation in metabolism and photosynthesis in cyanobacteria. J. Proteome Res. 2017; 16:2030–2043. [DOI] [PubMed] [Google Scholar]
  • 39. Chen Z., Zhang G., Yang M., Li T., Ge F., Zhao J.. Lysine acetylome analysis reveals photosystem II manganese-stabilizing protein acetylation is involved in negative regulation of oxygen evolution in model cyanobacterium Synechococcus sp. PCC 7002. Mol. Cell. Proteomics. 2017; 16:1297–1311. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Borirak O., de Koning L.J., van der Woude A.D., Hoefsloot H.C., Dekker H.L., Roseboom W., de Koster C.G., Hellingwerf K.J.. Quantitative proteomics analysis of an ethanol- and a lactate-producing mutant strain of Synechocystis sp. PCC6803. Biotechnol. Biofuels. 2015; 8:111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Xiong Q., Feng J., Li S.T., Zhang G.Y., Qiao Z.X., Chen Z., Wu Y., Lin Y., Li T., Ge F.et al.. Integrated transcriptomic and proteomic analysis of the global response of Synechococcus to high light stress. Mol. Cell. Proteomics. 2015; 14:1038–1053. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Wegener K.M., Singh A.K., Jacobs J.M., Elvitigala T., Welsh E.A., Keren N., Gritsenko M.A., Ghosh B.K., Camp D.G. 2nd, Smith R.D.et al.. Global proteomics reveal an atypical strategy for carbon/nitrogen assimilation by a cyanobacterium under diverse environmental perturbations. Mol. Cell. Proteomics. 2010; 9:2678–2689. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Huang S., Chen L., Te R., Qiao J., Wang J., Zhang W.. Complementary iTRAQ proteomics and RNA-seq transcriptomics reveal multiple levels of regulation in response to nitrogen starvation in Synechocystis sp. PCC 6803. Mol. Biosyst. 2013; 9:2565–2574. [DOI] [PubMed] [Google Scholar]
  • 44. Qiao J., Huang S., Te R., Wang J., Chen L., Zhang W.. Integrated proteomic and transcriptomic analysis reveals novel genes and regulatory mechanisms involved in salt stress responses in Synechocystis sp. PCC 6803. Appl. Microbiol. Biotechnol. 2013; 97:8253–8264. [DOI] [PubMed] [Google Scholar]
  • 45. Liu J., Chen L., Wang J., Qiao J., Zhang W.. Proteomic analysis reveals resistance mechanism against biofuel hexane in Synechocystis sp. PCC 6803. Biotechnol. Biofuels. 2012; 5:68. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Tian X., Chen L., Wang J., Qiao J., Zhang W.. Quantitative proteomics reveals dynamic responses of Synechocystis sp. PCC 6803 to next-generation biofuel butanol. J. Proteomics. 2013; 78:326–345. [DOI] [PubMed] [Google Scholar]
  • 47. Qiao J., Wang J., Chen L., Tian X., Huang S., Ren X., Zhang W.. Quantitative iTRAQ LC-MS/MS proteomics reveals metabolic responses to biofuel ethanol in cyanobacterial Synechocystis sp. PCC 6803. J. Proteome Res. 2012; 11:5286–5300. [DOI] [PubMed] [Google Scholar]
  • 48. Sayers E.W., Cavanaugh M., Clark K., Ostell J., Pruitt K.D., Karsch-Mizrachi I.. GenBank. Nucleic Acids Res. 2019; 48:D84–D86. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. O’Leary N.A., Wright M.W., Brister J.R., Ciufo S., Haddad D., McVeigh R., Rajput B., Robbertse B., Smith-White B., Ako-Adjei D.et al.. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 2016; 44:D733–D745. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Sonnhammer E.L., von Heijne G., Krogh A.. A hidden Markov model for predicting transmembrane helices in protein sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 1998; 6:175–182. [PubMed] [Google Scholar]
  • 51. Li Y.Y., Chen X.H., Xue C., Zhang H., Sun G., Xie Z.X., Lin L., Wang D.Z.. Proteomic response to rising temperature in the marine cyanobacterium Synechococcus grown in different nitrogen sources. Front. Microbiol. 2019; 10:1976. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Mitschke J., Vioque A., Haas F., Hess W.R., Muro-Pastor A.M.. Dynamics of transcriptional start site selection during nitrogen stress-induced cell differentiation in Anabaena sp. PCC7120. Proc. Natl. Acad. Sci. U.S.A. 2011; 108:20130–20135. [DOI] [PMC free article] [PubMed] [Google Scholar]

Data Availability Statement

All raw genomic data can be found in the NCBI Assembly database using the accession numbers listed in Supplementary Table S1. All raw reads and microarray data series, whose accession numbers are listed in Supplementary Table S3, can be downloaded from the SRA and GEO databases, respectively. Processed datasets (Supplemental Datasets 1–3) supporting both this article and CyanoOmicsDB are available in the Figshare repository (https://figshare.com/s/c3729a3e623c6f003fc0). CyanoOmicsDB is freely available at http://www.cyanoomics.cn/.

All genomic and transcriptomic data were obtained from the public NCBI Assembly, GEO, and SRA databases. Accession numbers are listed in Supplementary Tables S1 and S3.


Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
