SILVA provides databases of aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains. |
16S rRNA gene sequences for prokaryotes and 18S rRNA gene sequences for eukaryotes |
https://www.arb-silva.de/ |
[121] |
Ribosomal Database Project (RDP) provides aligned and annotated rRNA gene sequence data. |
16S rRNA sequences |
http://rdp.cme.msu.edu/ |
[122] |
Greengenes is a dedicated full-length 16S rRNA gene database that provides users with a curated taxonomy based on de novo tree inference. |
Taxonomy based on the 16S rRNA gene |
https://greengenes.secondgenome.com/ |
[123] |
Genome Taxonomy Database (GTDB) is an initiative to establish a standardized microbial taxonomy based on genome phylogeny. The genomes used to construct the phylogeny are obtained from RefSeq and GenBank. |
a comprehensive and phylogenomic-based taxonomy for bacterial and archaeal taxa |
https://gtdb.ecogenomic.org/ |
[52, 53] |
Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. |
protein sequence and annotation database |
https://www.uniprot.org/ |
[124] |
NIH National Center for Biotechnology Information (NCBI) GenBank is an annotated collection of all publicly available DNA sequences. Complete bimonthly release updates are available. Data are exchanged daily with the DNA DataBank of Japan and the European Nucleotide Archive. |
genomic sequence and annotation |
https://www.ncbi.nlm.nih.gov/genbank/ |
[125] |
The NIH/NCBI Reference Sequence (RefSeq) collection provides a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins. |
genomic, transcriptomic, and proteomic sequence and annotation |
https://www.ncbi.nlm.nih.gov/refseq/ |
[126] |
The University of California Santa Cruz (UCSC) Genome Browser is a tool for exploring genome sequences and annotation. GenBank updates for mRNA, RefSeq, and EST data occur on a semi-quarterly basis. |
genome sequence and annotation database |
http://genome.ucsc.edu/ |
[127] |
The NIH National Human Genome Research Institute Encyclopedia of DNA Elements (ENCODE) Consortium project uses reference genomes from NCBI or UCSC. |
DNA methylation and immunoprecipitation (IP) of proteins that interact with DNA and RNA, such as modified histones, transcription factors, chromatin regulators, and RNA-binding proteins. Genome sequence and annotation database. |
https://www.encodeproject.org/ |
[128] |
Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Updates are released every 2–3 months. |
genome sequence and annotation, gene models, transcriptional data, genetic variation and comparative analysis |
http://ensembl.org/ |
[129] |
The Cancer Genome Atlas (TCGA) is a landmark cancer genomics program that molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types. It is a joint effort between the National Cancer Institute and the National Human Genome Research Institute. |
Individual patient tumor samples: DNA, RNA, protein, and epigenetic changes |
https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga |
[130] |
Cancer Cell Line Encyclopedia (CCLE) is a collaboration between the Broad Institute and the Novartis Institutes for Biomedical Research and its Genomics Institute of the Novartis Research Foundation to conduct a detailed genetic and pharmacologic characterization of a large panel of human cancer models. CCLE contains genomics data and visualization for over 1400 cell lines. |
Copy Number, mRNA expression (Affy), RPPA, RRBS, and mRNA expression (RNAseq) |
https://portals.broadinstitute.org/ccle |
[131] |
Therapeutically Applicable Research to Generate Effective Treatments (TARGET) is a community resource project. TARGET is organized into a collaborative network of disease-specific project teams with the goal of identifying molecular changes that drive childhood cancers. |
clinical information, gene expression, miRNA expression, copy number, sequencing data for cancers |
https://ocg.cancer.gov/programs/target |
Initiative phs000218 |
Omics Discovery Index (OmicsDI) is an open-source platform that enables access, discovery and dissemination of omics data sets. |
genomics, transcriptomics, proteomics, metabolomics |
https://www.omicsdi.org/ |
[132] |
Multi-Omics Profiling Expression Database (MOPED) is a repository for multi-omics data of human and model organisms. |
transcriptomics and proteomics data and visualization |
https://omictools.com/moped-tool |
[133] |
The ProteomeXchange (PX) Consortium consists of PRIDE, PeptideAtlas, PASSEL, MassIVE and jPOST, and is devoted to mass spectrometry (MS)-based proteomics data. |
proteomics data sets |
http://www.proteomexchange.org/ |
[134, 135] |