Author manuscript; available in PMC: 2020 Oct 9.
Published in final edited form as: J Pineal Res. 2020 Jul 8;69(3):e12673. doi: 10.1111/jpi.12673

Resource: A Multi-species Multi-timepoint Transcriptome Database and Webpage for the Pineal Gland and Retina

Eric Chang 1,a,b, Cong Fu 2,3,4,5,a, Steven L Coon 2,6,a, Shahar Alon 7,c, Marjan Bozinoski 8, Matthew Breymaier 9, Diego M Bustos 2,d, Samuel J Clokie 2,e, Yoav Gothilf 7, Caroline Esnault 1,b, P Michael Iuvone 10, Christopher E Mason 8, Margaret J Ochocinska 2,f, Adi Tovin 7,g, Charles Wang 11, Pinxian Xu 12, Jinhan Zhu 13,14, Ryan Dale 1,a,b, David C Klein 2,15,h
PMCID: PMC7513311  NIHMSID: NIHMS1616514  PMID: 32533862

Abstract

The website and database https://snengs.nichd.nih.gov provides RNA sequencing data from multi-species analysis of the pineal glands from zebrafish (Danio rerio), chicken (White Leghorn), rat (Sprague Dawley), mouse (129sv), rhesus macaque (Macaca mulatta) and human (Homo sapiens); in most cases retinal data are also included along with results of the analysis of a mixture of RNA from tissues. Studies cover day and night conditions; in addition, a time series over multiple hours, a developmental time series and pharmacological experiments on rats are included. The data have been uniformly re-processed using the latest methods and assemblies to allow for comparisons between experiments and to reduce processing differences. The website presents search functionality, graphical representations, Excel tables, and track hubs of all data for detailed visualization in the UCSC Genome Browser. As more data are collected from investigators and improved genomes become available in the future, the website will be updated. This database is in the public domain and elements can be reproduced by citing the URL or this report. This effort makes the results of 21st century transcriptome profiling widely available in a user-friendly format that is expected to broadly influence pineal research.

Introduction

The pineal transcriptome has been studied for over 30 years, starting with Northern blot detection of single transcripts encoding proteins involved in melatonin synthesis, including those encoding Tph1 and Asmt (Hiomt) 1–4. Since then, pineal transcriptomics has spanned the development of transcriptomic assays including cDNA-based hybridization technology, qRT-PCR and RNA-Seq 5–15.

High-throughput sequencing offers many advantages for assaying the transcriptome, but the field of bioinformatics moves quickly with tools, algorithms, and genome assemblies changing from year to year. As a result, data from earlier studies cannot be meaningfully compared with data from later studies that used different methods without completely re-processing all data uniformly using updated methods, assemblies, and annotations. This has been recognized for example in the recount2 project 16, which has re-processed tens of thousands of human RNA-Seq samples from public repositories uniformly.

Furthermore, for detailed study of particular loci it is critical to visualize expression alongside genomic data from other studies. Genome browsers such as the UCSC Genome Browser 17 allow just this. In particular, this browser supports track hubs that allow for the configuration, coloration and organization of collections of many tracks using a web interface 18. This allows researchers to generate highly customized views tailored to their research interest, viewing pineal gland data from this study directly alongside a wealth of publicly available data prepared and made available by the UCSC team.

Cross-species transcriptome reports appear in the literature, focused on two-species comparisons, e.g., mouse versus human 19 and zebrafish versus human 20. Here, we introduce a website that aggregates multiple RNA-Seq studies of the pineal gland spanning years of transcriptomics research on six species across three vertebrate classes, processed uniformly and presented in a user-friendly site allowing inspection of individual genes as well as UCSC Genome Browser track hubs of each experiment. The site can be found at https://snengs.nichd.nih.gov. The results of this effort facilitate the comparative and evolutionary analysis of the pineal gland and retina, reflecting an interest in the evolutionary history that links these tissues as derivatives of a common ancestral photodetector 21,22. Most of the tens of thousands of transcripts profiled are otherwise absent from the pineal and retinal literature and in many cases have not been well studied in any tissue. Accordingly, the web page opens new avenues of research.

This report alerts investigators to the availability of this resource, which will be of special value where user-friendly compiled pineal and retinal RNA-Seq data are otherwise not available in any format. The graphs and other information extracted from the web page are in the public domain. The web page and its underlying infrastructure are designed to be easily updated as data from new experiments become available or as reanalysis of existing datasets using improved software and updated genomes is completed.

Methods

Animal

Samples were collected to identify differential day/night ratios; in the case of the rat, expression was also studied as a function of development, denervation and adrenergic–cyclic AMP stimulation (Table 1) 9,12,23–25. In many cases retinal tissue was profiled in parallel. Mixed tissue RNA samples were used in conjunction with the pineal gland and retina to estimate the enrichment of a transcript.

Table 1.

Experiments on the database.

Study No. | Animals | Experiment name | Tissue | Lighting | Sampling times | Notes | Reps | Refs
101 | Chicken, White Leghorn | Pineal gland and retina; time series; constant darkness | PG, R | D:D | CT 0, 4, 8, 12, 16, 20 | N/A | 3 | N/A
102 | Human | Pineal gland; day and night | PG | L:D 12:12 | ZT 6, 18 | N/A | 2, 4 | N/A
103 | Mouse, 129sv | Pineal gland, retina and mixed tissue; day and night; Eya2 KO | PG, R, MT | L:D 12:12 | ZT 6, 18 | Eya2 KO | 1 | N/A
104 | Rat, Sprague Dawley | Pineal gland; day and night; and mixed tissue, day; polyA | PG, MT | L:D 14:10 | ZT 7, 19 | N/A | 1 | 23
105 | Rat, Sprague Dawley | Pineal gland development; day and night | PG | L:D 14:10 | ZT 7, 19 | Ages: E21, P5, P20, P40 | 1 | 23, 24
106 | Rat, Sprague Dawley | Pineal gland (RP), retina (RR) and mixed tissue (RX); 24-hr time series | PG, R, MT | L:D 14:10 | ZT 1, 7, 13, 15, 19, 23 | N/A | 1 | 23
107 | Rat, Sprague Dawley | Pineal gland; day and night; and mixed tissue, day; Ribominus | PG, MT | L:D 14:10 | ZT 7, 19 | N/A | 1 | 23
108 | Rat, Sprague Dawley | Pineal gland; superior cervical decentralization (DCN) or ganglionectomy (SCGX); day and night | PG | L:D 14:10 | ZT 7, 19 | DCN, SCGX, Sham, Control | 3 | 8
109 | Rat, Sprague Dawley | Pineal gland in vitro; norepinephrine (NE) or dibutyryl cyclic AMP (DBcAMP) | PG | N/A | N/A | Cultured glands; NE, DBcAMP, Control | 3 | 8
110 | Rat, Sprague Dawley | Pineal gland marker genes; day and night | PG | L:D 14:10 | ZT 7, 19 | N/A | 1 | 23
111 | Rhesus macaque | Pineal gland, retina and mixed tissue; time series | PG, R, MT | L:D 12:12 | ZT 6, 12, 18, 24 | Dawn, Day, Dusk, Night | 3 | 12
112 | Zebrafish | Pineal gland; time series; constant darkness | PG | D:D | CT 2, 6, 10, 14, 18, 22 | N/A | 2 | 7
113 | Zebrafish | Eye, pineal gland and mixed tissue; day and night | Eye, PG, MT | L:D 12:12 | ZT 6, 18 | clocka KO | 1 | N/A

Thirteen experiments encompassing six species are on the website; additional experiments are to be added as data become available. Experimental details are available on the website (https://snengs.nichd.nih.gov/experiments) and in the listed references. Abbreviations: CT, circadian time; D:D, constant darkness; L:D, light:dark; MT, mixed tissue; N/A, not available; PG, pineal gland; R, retina; Reps, number of replicates; Refs, references; RP, rat pineal gland; RR, rat retina; RX, rat mixed tissue; ZT, Zeitgeber time.

Sequencing

Illumina sequencing was used in all cases. Specific experimental details are available on each experiment’s page on the website (see https://snengs.nichd.nih.gov/experiments), which recapitulates the methods of the respective original manuscript. All data across all species were re-analyzed using the assemblies and annotations described (https://snengs.nichd.nih.gov/methods). The quality of the FASTQ files was assessed with FastQC v0.11.8 and MultiQC v1.6, and all samples demonstrated high-quality sequencing. Adapters were removed and light quality trimming was performed with cutadapt v1.18 using additional arguments -q 20 --minimum-length 25.

These trimmed reads were provided to Salmon v0.12.0 26 for transcript quantification using an index built for the transcriptomes described at https://snengs.nichd.nih.gov/methods, and run using the additional arguments --gcBias --seqBias --libType A. For each gene, the per-transcript values reported by Salmon v0.12.0 were summed to provide a gene-level expression estimate in units of transcripts per million (TPM). These are the values reported in the tables and plots of each gene page.
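The gene-level summation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the website: the function name `gene_level_tpm`, the transcript-to-gene dictionary and the example identifiers and values are all hypothetical.

```python
from collections import defaultdict


def gene_level_tpm(transcript_tpm, tx2gene):
    """Sum per-transcript TPM values into one TPM value per gene.

    transcript_tpm: dict mapping transcript ID -> TPM (e.g. parsed from
                    a Salmon quantification table).
    tx2gene:        dict mapping transcript ID -> gene ID (from the
                    annotation); assumed to be available to the caller.
    """
    totals = defaultdict(float)
    for transcript_id, tpm in transcript_tpm.items():
        gene_id = tx2gene.get(transcript_id)
        if gene_id is None:
            # Assumption of this sketch: transcripts without a gene
            # assignment are skipped rather than raising an error.
            continue
        totals[gene_id] += tpm
    return dict(totals)


# Hypothetical identifiers and values, for illustration only.
example_tpm = {"tx1": 12.5, "tx2": 7.5, "tx3": 3.0}
example_map = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
print(gene_level_tpm(example_tpm, example_map))
```

Because TPM values are already normalized within a sample, summing the transcripts of a gene yields a value that is still expressed in TPM.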

Genomic visualization

For genomic visualization in the UCSC genome browser, trimmed reads were aligned using HISAT2 v2.1.0 to the respective genome indicated below. From these aligned reads, normalized bigWig files were created using the deepTools v3.1.3 bamCoverage tool using additional options --minMappingQuality 20 --smoothLength 10 --normalizeUsing BPM --binSize 1 such that multi-mappers were ignored. For stranded libraries, the tool was run twice: once with --filterRNAstrand forward and once with --filterRNAstrand reverse to get separate tracks for each strand. The resulting bigWig files were combined into a UCSC track hub using the trackhub Python package.
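The two-pass treatment of stranded libraries can be illustrated with a small helper that assembles the bamCoverage command lines. This is a sketch under stated assumptions: the function name, file names and output naming scheme are hypothetical, and only the options quoted above (plus the standard `--bam` and `--outFileName` arguments of bamCoverage) are used.

```python
import shlex


def bamcoverage_commands(bam_path, out_prefix, stranded):
    """Build bamCoverage command lines for one aligned sample.

    Returns a list of argument lists: one command for an unstranded
    library, two (forward and reverse) for a stranded library.
    The output file naming is an assumption of this sketch.
    """
    base = [
        "bamCoverage",
        "--bam", bam_path,
        "--minMappingQuality", "20",
        "--smoothLength", "10",
        "--normalizeUsing", "BPM",
        "--binSize", "1",
    ]
    if not stranded:
        return [base + ["--outFileName", f"{out_prefix}.bw"]]
    # Stranded libraries: run once per strand to obtain separate tracks.
    return [
        base + ["--filterRNAstrand", strand,
                "--outFileName", f"{out_prefix}.{strand}.bw"]
        for strand in ("forward", "reverse")
    ]


# Hypothetical file names, for illustration only.
for command in bamcoverage_commands("sample1.bam", "sample1", stranded=True):
    print(shlex.join(command))
```

Each returned list could be passed to `subprocess.run` on a system where deepTools is installed.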

UCSC (typically used for its extensive visualization capabilities) and Ensembl (typically used for its comprehensive annotations) are not consistent in their chromosome nomenclature. To facilitate linking from gene-level transcription estimates on this website to genomic signal at UCSC, we converted chromosome names from Ensembl to the UCSC equivalents by matching the md5sums of each chromosome; see the GitHub repository (https://github.com/NICHD-BSPC/chrom-name-mappings) for details and code.
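The idea of matching chromosomes by checksum can be sketched as follows. This is an illustrative example only and is not taken from the repository cited above: the function names and toy sequences are hypothetical, and upper-casing the sequence before hashing (to ignore soft-masking differences) is an assumption of this sketch.

```python
import hashlib


def sequence_md5(sequence):
    """Return the md5 hex digest of a sequence.

    Assumption of this sketch: sequences are upper-cased first so that
    soft-masked (lower-case) bases do not change the checksum.
    """
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


def map_chrom_names(ensembl_seqs, ucsc_seqs):
    """Map Ensembl chromosome names to UCSC names with identical sequence.

    Both arguments are dicts of chromosome name -> sequence string.
    Ensembl chromosomes with no checksum match are left out of the result.
    """
    ucsc_by_md5 = {sequence_md5(seq): name for name, seq in ucsc_seqs.items()}
    mapping = {}
    for name, seq in ensembl_seqs.items():
        match = ucsc_by_md5.get(sequence_md5(seq))
        if match is not None:
            mapping[name] = match
    return mapping


# Toy sequences, for illustration only.
ensembl = {"1": "ACGTACGT", "MT": "GGCCGGCC"}
ucsc = {"chr1": "acgtACGT", "chrM": "GGCCGGCC", "chrUn_1": "TTTT"}
print(map_chrom_names(ensembl, ucsc))
```

Matching on sequence content rather than on name patterns avoids having to guess at naming rules for unplaced scaffolds and alternate contigs.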

Genome and transcriptome assemblies

For each species, the genomic assembly indicated (https://snengs.nichd.nih.gov/methods) was used for visualization in the UCSC Genome Browser, while the transcriptome was used to calculate TPM expression estimates to display in plots on individual gene pages.

Implementation details

The website is written in the Python programming language using the ‘Flask’ framework. Configuration of the website is driven by a YAML format file that points to Salmon v0.12.0 output along with details like methods descriptions, UCSC track hub colors, bar plot colors, and any other experiment-specific configuration. This greatly streamlines the process of adding new studies and new species. Data were processed using lcdb-wf (https://github.com/lcdb/lcdb-wf), which is itself driven by YAML configuration in a species-agnostic manner, allowing for uniform processing across all studies.

Results

The Home Page (https://snengs.nichd.nih.gov) introduces the user to the main sections of the database (Figure 1A). Selecting the Search section displays the Search subpage (Figure 1B). Entering a gene symbol (e.g., Aanat) in the query box opens the Results subpage, which contains a listing of species and experiments (Figure 2). The Search function will accept alternative symbols; however, when difficulty is encountered obtaining a result, the user is encouraged to refer to gene databases for assistance. This page contains information on the samples, including species, a brief description of the experiment and a Link to the gene page.

Figure 1. Home page and Search subpage.

1A. The Home page (https://snengs.nichd.nih.gov/home) opens subpages which are organized to search for genes specifically and to retrieve information relevant to the entire dataset. In addition, the Methods page contains useful information about the experiments and the Help page has useful videos on use of the UCSC Genome Browser. 1B. The Search page (https://snengs.nichd.nih.gov/search). Entering an official gene symbol or an Ensembl ID in the Search box retrieves data from all species. For aliases please refer to the Ensembl or NCBI Gene databases.

Figure 2. Results subpage.

Querying a gene symbol generates a listing of the experiments and species in which the gene was found (https://snengs.nichd.nih.gov/search). Depending on the size of the gene family, multiple gene symbols may be displayed; in this case, the data should be used cautiously. From this page, highlighted links (View) direct the user to the Gene subpage, which lists results of a single species and experiment.

Clicking on that Link (Figure 2) opens a Gene subpage (Figure 3) with links to the Ensembl data (gene id) and the UCSC Genome Browser for the gene (Open UCSC track for this gene), in addition to presenting experimental results in a bar graph. These results are normalized count data (in TPM). In cases where multiple experiments for a species exist, all experiments are displayed and can be viewed by scrolling vertically.

Figure 3. Gene subpage.

An example of the gene page (https://snengs.nichd.nih.gov/species/Rat/gene/ENSRNOG00000011182#Time_Series-anchor) that displays information about a specific gene including relevant experimental details, the UCSC track hub (Open UCSC track hub for this gene) and a help page for use of the UCSC Genome Browser. General information about each gene is available by clicking on the gene id, e.g. ENSRNOG00000011182. Experimental results are displayed below in a bar graph. In addition, accessing the results of a single experiment will open other experiments dependent on availability.

Selecting Experiments from the Home page displays the experiments with four links (Figure 4). The first is the Search subpage, described above. The second (Download) retrieves the data in an Excel file. The third retrieves the Details (Figure 5) of the experiment, including sample preparation and data analysis; scrolling horizontally is necessary to open the table. The last is a Link to the UCSC Genome Browser, which documents the location of reads mapping for each gene.

Figure 4. Experiments subpage.

The Experiments subpage (https://snengs.nichd.nih.gov/experiments) is accessed from the Home page. It is an index for all experiments, leading to several resources. The highlighted link retrieves the Search subpage, described above. The Data link (Download) returns the data for an entire experiment in an Excel file, which also contains the average expression values and variance. Selecting Details opens the page with experimental details (see Figure 5). The highlighted link to the UCSC Track Hub (Link) gives access to the mapped data on the UCSC Genome Browser.

Figure 5. Details subpages.

The Details subpages are accessed from the Experiments subpage (see Figure 4; https://snengs.nichd.nih.gov/experiments) by clicking on Details for a specific experiment. This yields information on sample preparation, RNA preparation, and data processing, as well as the location of archived data. The search box is used to interrogate the table with identifiers (e.g. SRX3229487) or fragments of identifiers (e.g. _04h). The Samples section in this figure is truncated for presentation purposes.

The Methods and Help pages are not presented as figures. The Methods page contains general information on the Bioinformatics methods and identifies the genomic assemblies used; the Help subpage has links to tutorials on the use of the UCSC browser and contact information for further assistance.

As an example of the utility of comparing data across multiple species in a uniform format, we searched for differences in the day/night levels of transcripts among species. As shown in Table 2, the large night/day rhythms in the transcript abundance of several genes in the rat are not seen in the rhesus monkey or to a similar degree in other species (https://snengs.nichd.nih.gov/search). This emphasizes the importance of posttranslational modifications that occur 27,28. It also is a caution against making generalities based on studies of one species.

Table 2.

Comparative analysis of rhythmic transcript levels in the vertebrate pineal gland.

Night/Day Day/Night
Species >30-fold 3- to 30-fold >30-fold 3- to 30-fold
Chicken Gos Spcs1, Gnb3, Lbh, Lypla1, Prdm8, Aanat, Tph1, Am89a, Ckmt1a, Chga, Ddc, Ndrg Rbp4, Rcan2, SSx2ip, Chgb, Calb2, R3hdml, Atoh8, Efr3a
Human DUSP1, HKDC1
Mouse Gh, Prl Aanat, Odc1, Mat2a, Kif5c, Nap1l5, Tbc1d15, Crem, Tbc1d1, Tjap1, Ndufa3, Syt4, Mitf, Rmdn3, Extl3, Amd1, Ywhaz, Ccnl1, Slc3a2, Impa1, Azin1, Prosc, Iqcb1, Crx, Rab3gap1, Srxn1, Manf, Ppa2, Gja1, Psme2, Arf, Cbx7, Tph1, mt-Ts2, Fgf12, Mpp6, Gnai2, Necap1, Tpm4, Atp2a2, Hdhd3, Rnf13, Ip6k1, Dnajb6, Sik3, Ergic1, Tmem229b, Clptm1, Hsph1*, Auh, mt-Tw, mt-Ti Igkj4, Igkj1, Tpt1-ps3, Ighj4 Enpp2, Ttr, Chmp1a, Unc119, Ccnd2, Acp2, Atp6v0a2, Tef, Igf2. Ermard, Lamb2, Fabp7, Twf1, Ewsr1, Etf1, Fxyd1, Arih2, Zfand6, Wbscr22, Ndrg1, Tbc1d17, Cox17, Fam166a, Atox1, Rpgrip1, Ackr1, mt-Tl1, Dpysl3, Cisd3, Prpf19, Sag, Tpm3, Ift46, Apod, Taz
Rat Aanat, Atp7b, Slc15a1, Dclk3 Irs2, Crem, Sik1, Ptch1, Cd24, Zrsr1, Rcan1, Kctd3, Bsx, Mat2a, Etnk1, Camk1g, Mbnl2, Gxylt1, Gem, Nptx1, Pcdh1, Eml5, Galnt16, Pde4b, Reep2, Syt4, Tjp2, Snap25, Hbb, Hba-a2, Dnm2, Fkbp5, Man2a1, Fry, Dclk1, Mcam, Arhgap24, Hba-a2, Hspa5, Slc17a6, Farp2, Rhob, Cry2, Lamb1, Hsph1, Ncald, Abca1, Mapk6, Ankrd52, Snrk, Slc7a6, Shroom3, Sik2, Ttc8, Nacad, Qsox1, Xpot, Zhx1, Wipf3, Abcf1, Frmpd1 Matr3 Gucy1a1, Frmd4b, Eef1a2, Scrn1, Hook1, Ttr, Pdc, Cfl2
Rhesus PENK, CCN2, RP1, FAM167a, TGFBR3, ATP2A3, OPN1SW, VASH1, PDC, GNGT1, LMOD1
Zebrafish Nr1d1 Sik1, Dbpb, Dusp1, Dtx4, Rdh8b, Gjd2b Gpr137bb, Aanat2, CR391986.1, Dclk2a, Guca1a, Tph1a, Gchi1, Ptn, Myh9a, Lpde6ga, Id2b, Cxcl14, Gabarapb Nfil3-5 Bhlhe40, Rbp3, Cry1aa, Rorcb, Pde6ha, Rorca, Camk1gb, Rbp4, Rp4l, Kera, Ry1bb, Per2, Irbpl, Cyp27c1, Nfil3-6, Pfkfb4b, Sagb, Ahcy, Sdha, Eno1a, Add45ga, Tmtops2a, lrp1a, Hbba1, Aldocb, Tmem237b, Gpr146, Aldoa, Jag2b, Aclya, Cry1ba, Ybx1, Rcvrn3, Acadm. Stra6, Hbba1

The day and night levels of transcripts were compared by calculating night/day and day/night ratios of normalized values (+ 0.1 TPM). Non-coding RNAs were eliminated. Only the top 1000 genes with official symbols were further grouped by ratios into greater than 30-fold and 3- to 30-fold differences. Genes are listed according to strength of rhythm. Human pineal data are included, noting that times of death and of tissue removal were not tightly controlled; accordingly, the indication of rhythmicity might be impacted. The data were downloaded from the Experiments page. In addition to the single datasets for chicken, human, mouse and rhesus, the zebrafish “eye, pineal gland & mixed tissue” and rat “pineal marker genes” datasets were used.
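The ratio calculation and grouping described in this legend can be sketched as follows. This is a minimal illustration rather than the analysis code used for Table 2: the function names and example values are hypothetical, and the treatment of values falling exactly on the 3-fold or 30-fold boundaries is an assumption.

```python
PSEUDOCOUNT = 0.1  # TPM added to every value before taking ratios


def fold_ratios(day_tpm, night_tpm, pseudocount=PSEUDOCOUNT):
    """Return (night/day, day/night) ratios of pseudocount-adjusted TPM."""
    night_over_day = (night_tpm + pseudocount) / (day_tpm + pseudocount)
    return night_over_day, 1.0 / night_over_day


def rhythm_bin(day_tpm, night_tpm):
    """Assign a gene to one of the four Table 2 columns, or None.

    Assumption of this sketch: a ratio of exactly 30 falls in the
    3- to 30-fold bin and a ratio of exactly 3 is included in it.
    """
    night_over_day, day_over_night = fold_ratios(day_tpm, night_tpm)
    if night_over_day > 30:
        return "night/day >30-fold"
    if night_over_day >= 3:
        return "night/day 3- to 30-fold"
    if day_over_night > 30:
        return "day/night >30-fold"
    if day_over_night >= 3:
        return "day/night 3- to 30-fold"
    return None


# Hypothetical TPM values, for illustration only.
print(rhythm_bin(day_tpm=1.0, night_tpm=100.0))
print(rhythm_bin(day_tpm=50.0, night_tpm=10.0))
print(rhythm_bin(day_tpm=10.0, night_tpm=12.0))
```

The pseudocount keeps ratios finite when a transcript is undetected at one time point, and damps the apparent fold change of transcripts expressed near the detection limit.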

The data also focus on the similarity of the genomic profiles of pineal glands from the species studied (Table 3). Selective expression of each gene was calculated as the ratio of expression of a specific gene in the pineal gland to that in a mixture of RNA from a group of tissues. As expected, three genes responsible for melatonin synthesis (Tph1, Aanat and Asmt) were selectively expressed in the glands studied. Another group of genes selectively expressed in the pineal gland includes those established as markers of the retina. The high expression of these genes only in the pineal gland and retina is known 21,22. However, the specific functions of these retina-related genes and other selectively expressed genes in the pineal gland have not received significant attention and deserve further analysis.

Table 3.

Highly conserved selectively expressed pineal transcripts in four vertebrate species.

Genes selectively expressed among top 1000 genes
Three species Adra1a, Adrb1, Ankrd33, Casz1, Drd4, Gngt*, Grk*, Grm*, Guca1a, Impg1*, Kif*, Opn*, Pax3, Pcdh*, Pla2g*, Ppef2, Prph2, Rdh*, Rp1*, Rps*, Rxrg, Slc16*, Slc6a*, Trim*.
Four species Aanat*, Aipl1, Asmt, Bsx, Cabp*, Cacna1*, Cacna2d*, Celf3, Chrna3*, Chrnb4, Cngb3*, Col*, Cplx*, Crb*, Crx, Gch1, Impg2, Isl2, Kcn*, Lhx4, Lrit*, Myo*, Neurod*, Otx2, Pde6*, Ptprn, Rbp3, Slc24*, Slc38a*, Tmem*, Tph1, Ush2a.

Genes were ranked according to selective expression in the pineal glands from zebrafish, mouse, rat and rhesus. Expression was normalized (+ 0.1 TPM) and selective expression was calculated relative to expression in a mixture of tissues. The top 1000 selectively expressed genes were identified in each species and those present in three or four of the four species are listed. The data sources are given in the legend to Table 2.

*

Asterisk, more than one homolog exists in some species; for example, Aanat* represents Aanat in mouse, rat and rhesus in addition to Aanat1 and Aanat2 in zebrafish.
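The selection of genes shared by three or four species can be sketched as a simple membership count. This is an illustration under stated assumptions, not the analysis code behind Table 3: the function name and toy gene sets are hypothetical, and the sketch assumes that gene symbols have already been harmonized across species (for example, by collapsing homologs such as Aanat1 and Aanat2 to one symbol), which the real analysis had to handle separately.

```python
from collections import Counter


def conserved_genes(top_genes_by_species, min_species):
    """Count in how many species each gene appears in the top list.

    top_genes_by_species: dict of species name -> iterable of gene symbols
                          (e.g. the top 1000 selectively expressed genes).
    Returns a dict of gene -> number of species, restricted to genes
    found in at least `min_species` species.
    """
    counts = Counter()
    for genes in top_genes_by_species.values():
        counts.update(set(genes))  # set(): count each gene once per species
    return {gene: n for gene, n in counts.items() if n >= min_species}


# Toy gene sets with placeholder symbols, for illustration only.
top_lists = {
    "zebrafish": {"geneA", "geneB"},
    "mouse": {"geneA", "geneB", "geneC"},
    "rat": {"geneA", "geneB"},
    "rhesus": {"geneA", "geneC"},
}
shared = conserved_genes(top_lists, min_species=3)
print(sorted(shared.items()))
```

Genes with a count of four correspond to the "Four species" row and those with a count of three to the "Three species" row.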

An analysis of the conserved, highly expressed and tissue-specific transcripts in the pineal gland, retina and in both tissues (Table 4) was done by identifying the highly tissue-specific transcripts. They were then binned according to their expression ratio (pineal gland: retina). The results reveal relatively small sets of pineal-specific and retina-specific genes, and a larger group of genes expressed in both tissues. Noting that these genes are selectively expressed only in these two tissues and not in others, it is highly likely that these genes represent evolutionarily conserved elements that can be considered to be related to the common origin of both tissues. In some cases, their roles have been identified, but in many cases, a functional role has not been established.

Table 4.

Transcripts enriched in pineal gland, retina and both the pineal gland and retina.

Enriched transcripts
Group Four of four species Three of four species
Pineal gland Aanat*, Asmt, Chrnb4, Gch1, Gnat2, Gnb3*, Guca1a, Lhx4, Pde6c, Sall1*, Tph1*, Pax3, Alx4, Bsx, Chrnb3, Gngt2*, Lrrc38, Ptpn20
Pineal gland and retina Arr3*, Cabp4, Cacna1f*, Cacna2d4, Cnga1*, Cngb3, Cplx4*, Crb2*, Crx, Drd4, Fam161a, Gabrr1, Gabrr3*, Gngt1, Grk1*, Guca1b, Impg1*, Impg2*, Kcnb*, Lrit1*, Mpp4, Msi1, Myo*, Nyx, Opn1sw, Otx2*, Pdc*, Pde6g*, Rbp3, Rlbp1, Rom1*, Rp1l1, Slc24a1, Stx3, Tulp*, Unc119*, Ush2a Adrb1, Cabp5* Crabp*, Crb1, Crocc, Egflam, Fabp*, Fam169a, Gnb5, Gng1, Grik1*, Gucy2d, Hcn*, Igsf9, Impdh1, Kcn*, Kcna*, Kcnj14, Lrit2, Lrit3, Mak, Mgarp, Neurod4, Ntng2, Nxnl1, Pcdh15, Pla2g*, Plch2, Ppef2, Prph2*, Prss3*, Rax*, Reep6, Rorb, Rrp1b, Samd11, Slc16a*, Slc17a*, Slc24*, Slc38a*, Slc39*, Slc4*, Slc6a6, Tmem2l5, Tmem237*, Trpm1*
Retina Abca4*, Ankrd33*, Ccdc*, Cdhr1, Chrna3a, Col*, Cryaa, Fscn2, Gucy2f, Irs1*, Isl1, Kcnv2*, Nr2e3, Nrl, Pde6a, Pde6b, Pde6h, Rdh8, Rho, Rpl, Rrh, Sag*, Sh2d*, Six3*, Slc1a7, Tfap2*, Vsx1, Vsx2 Cryba*, Crybb2, Crygm*, Gabrr2, Glb1l2, Gnat1, Grm6, Isl2, Lgsn, Lim2, Mab21l1, Opn1mw*, Opn4*, Pax6*, Prdm13, Prph*, Rcvrn*, Rgr, Rtbdn*, Samd7, Vax2

Enrichment in the pineal gland and retina relative to other tissues was assessed by determining the ratio of the normalized (+0.1 TPM) abundance of a transcript in each tissue relative to that in the mixed RNA sample (+0.1 TPM) to yield a relative expression value (rEx). Mixed RNA samples were made by mixing equal amounts of RNA from 6 to 20 tissues. The rEx values of the top 300 enriched transcripts from the pineal gland and the top 300 from the retina were compared (pineal gland rEx/retina rEx) and transcripts that were > 10-fold were binned as pineal gland and those < 10 fold as retina; maximum levels were approximately 1000 for the pineal gland and 1/1000 for the retina. The remaining transcripts comprise the pineal gland and retina group. Zebrafish, mouse, rat and rhesus are included. The rat data are from the 24-hr time series experiment, the zebrafish data from the experiment with mixed tissue, and the rhesus and mouse data are from single experiments; the latter are from 129 mice. Data from all time points have been averaged and normalized (TPM + 0.1). The table indicates whether a listed transcript is detected in all four or in only three of the species evaluated.

*

Asterisk, more than one homolog exists in some species; for example, Aanat* represents Aanat in mouse, rat and rhesus in addition to Aanat1 and Aanat2 in zebrafish.
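The relative expression (rEx) calculation and the three-way grouping used for Table 4 can be sketched as follows. This is an illustration, not the authors' code: the function names and example values are hypothetical. The legend gives the retina cutoff as "< 10 fold"; the sketch assumes this means a pineal/retina rEx ratio below 1/10, an interpretation suggested by the stated retina extreme of 1/1000 but not confirmed by the source.

```python
PSEUDOCOUNT = 0.1  # TPM added to tissue and mixed-tissue values


def relative_expression(tissue_tpm, mixed_tpm, pseudocount=PSEUDOCOUNT):
    """rEx: abundance in a tissue relative to the mixed RNA sample."""
    return (tissue_tpm + pseudocount) / (mixed_tpm + pseudocount)


def enrichment_group(pineal_tpm, retina_tpm, mixed_tpm, fold=10.0):
    """Bin a transcript by the ratio pineal gland rEx / retina rEx.

    Assumption of this sketch: ratios above `fold` are called pineal
    gland, ratios below 1/`fold` are called retina, and everything in
    between is assigned to the shared group.
    """
    pineal_rex = relative_expression(pineal_tpm, mixed_tpm)
    retina_rex = relative_expression(retina_tpm, mixed_tpm)
    ratio = pineal_rex / retina_rex
    if ratio > fold:
        return "pineal gland"
    if ratio < 1.0 / fold:
        return "retina"
    return "pineal gland and retina"


# Hypothetical TPM values, for illustration only.
print(enrichment_group(pineal_tpm=500.0, retina_tpm=1.0, mixed_tpm=0.5))
print(enrichment_group(pineal_tpm=1.0, retina_tpm=500.0, mixed_tpm=0.5))
print(enrichment_group(pineal_tpm=200.0, retina_tpm=150.0, mixed_tpm=0.5))
```

Because both rEx values share the same mixed-tissue denominator, their ratio reduces to the pseudocount-adjusted pineal/retina TPM ratio; the mixed-tissue sample matters for selecting the enriched transcripts in the first place, not for the final binning.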

Discussion

This database will serve as a foundation for future molecular biological research on the pineal gland and retina, making the data available to any scientist with a computer and an internet connection. The uniform processing of raw data makes the comparison of results more meaningful and takes advantage of advances in tools, algorithms, assemblies, and annotations since original publication. Whereas the human and mouse genomes are the most highly annotated, and the chicken and zebrafish less so, the maturity of all annotations allows for in-depth analysis of nearly all genes. A potential problem is that symbols used for identification of a gene in one species may not be used in other species or may be used for different genes. Hence, in cases where identification is questionable, confirmation may require analysis of sequence homology.

Utility and accuracy of RNA-Seq data

In judging the utility and accuracy of the RNA-Seq data, it should be noted that there is good agreement with data from other methods for the analysis of pineal gland and retina material, including microarray, Northern blot and qRT-PCR as regards day/night differences 5–13. Accordingly, the RNA-Seq data can be viewed as highly useful and reliable.

The method also has advantages over other methods; perhaps the most important is that it sequences all transcripts, including those without a history in any literature. This opens new avenues for study for the pineal gland and retina. One of the most fertile areas is the identification of noncoding RNAs, both micro RNAs and long noncoding RNAs 9,23. Some of these are known to have daily rhythms in both tissues. Noteworthy is the discovery of a unique micro RNA-183-96-182 cluster in the pineal gland and retina 9,29, which represents the major component of pineal miRNAs. Accordingly, it can be considered to be an additional marker of the common ancestral photodetector which gave rise to the pineal gland and retina. Although the function of this cluster remains unknown in the pineal gland, it has been reported to play a role in phototransduction and development in the eye 30,31.

Study of pineal miRs also led to the discovery of very high levels of pY RNA1-s2 in the retina, relative to other tissues 25, including the pineal gland. Moreover, it was found that pY RNA1-s2 selectively binds the nuclear matrix protein Matrin 3 and, to a lesser degree, heterogeneous nuclear ribonucleoprotein U-like protein. The distribution of pY RNA1-s2 in all retinae and retinal cell lines suggests a role in vision. Neither of these discoveries could have been made using methods other than RNA-Seq.

Likewise, the finding of robust daily rhythms in the abundance of several long non-coding RNAs 23 in the pineal gland under neural control, and the discovery of expression of lncSN134 in both the retina and pineal gland, were dependent on the use of RNA-Seq. The long noncoding RNAs range in size significantly and, like pineal miRs, remain largely unstudied and unknown.

Whereas RNA-Seq is a powerful technique, the results must be viewed with healthy skepticism, especially with transcripts that are weakly expressed and when evaluating small night/day differences in transcript abundance. Confirmation by an independent method should be considered. In addition, in the case of weakly expressed transcripts, the mapping of reads on the UCSC browser should be evaluated to confirm that the read assignment pattern is consistent with the intron/exon features of the transcript.

Experimental design

A problem that is considered in any study designed to measure day/night differences is the number of time points per day. Often this is limited by factors including the housing of animals and the number of animals per point necessary to obtain sound data. RNA-Seq introduces another factor, the cost of analysis. Accordingly, the design of the studies included in the database (Table 1) is also a reflection of the cost of sequencing and bioinformatics. The studies included sampling that ranged from two to six time points per day. When sampling is done at only two time points, noon and midnight, the potential for overlooking a dawn/dusk rhythm exists. Accordingly, it is best not to limit experiments to two-time-point studies, and to use four or more time points to detect daily rhythms. However, in the case of study of daily rhythms in the pineal gland, a two-time-point study will capture most large changes. Moreover, this approach is highly instructive, in that it provides valuable data on the levels of tens of thousands of transcripts. Accordingly, one can see merit in such studies.

The number of replicates to use is another important issue. RNA-Seq data are typically highly reproducible for most transcripts when normalized. This reflects a feature of the method, in that there is redundancy in the detection of a transcript, as a result of fragmentation and amplification. In the final analysis, each calculated transcript level is not simply a single measurement, but reflects multiple detection events, depending on the size of the transcript and abundance. Accordingly, in N=1 situations, it is possible to obtain an indication of statistical variance of all transcripts, and use this to determine whether, for example, a day/night difference is statistically significant.

Transcriptomics versus proteomics

Whereas RNA-Seq does provide a highly useful tool for the discovery and characterization of transcripts, it is not a substitute for proteomics. Studies of an mRNA and its encoded protein often agree as regards the presence and dynamic changes of both. However, this is clearly not the case in all situations.

An excellent example is Aanat. In the rat, Aanat mRNA, protein and activity increase at night, reflecting phosphorylation of the protein at two sites. When lights are turned on in the middle of the night, a rapid decrease in enzyme activity occurs, with little change in mRNA levels. The changes in enzyme activity are due to dephosphorylation of the protein, which is rapidly destroyed by proteasomal proteolysis, as reviewed 32. Another example of mRNA levels and protein levels not exhibiting similar dynamics is found in studies of the rhesus pineal gland. There is little daily change in mRNA encoding Aanat, although the changes in enzyme activity are robust 33. These observations are evidence that it is necessary to determine whether changes in an mRNA are associated with changes in an encoded protein to determine the relationship. Unfortunately, the science of proteomics has not advanced to the all-inclusive nature of mRNA analysis, in part because it is difficult to uniformly detect the possible posttranslational modifications.

Use of the database will allow investigators to initiate efforts to identify transcripts that are highly expressed in the pineal gland relative to the retina and/or other tissues, transcripts that are highly expressed in the pineal gland of one species but not another, transcripts that exhibit marked night/day differences, transcripts that are under neural/adrenergic cyclic AMP control, and transcripts that exhibit changes in expression during development. In doing so, the web page should promote and enhance future studies of pineal cell biology.

Referencing the web page

The data on the web page are in the public domain and the use of the figures and data does not require authorization of the authors. The web page should be referenced by citing this publication.

Acknowledgements

The authors wish to express their appreciation for discussion and testing services provided by Apratim Mitra and Sydney Hertafeld of the Bioinformatics and Scientific Programming Core, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). The authors also acknowledge the contributions of the following: David “Dr. J.” Jacobowitz, Collaborative Health Initiative Research Program, Uniformed Services University of the Health Sciences; Stephen W. Hartley and James C. Mullikin, National Human Genome Research Institute, National Institutes of Health; Leming Shi, United States Food and Drug Administration, National Center for Toxicological Research and the School of Basic Medical Sciences, Anhui Medical University, Hefei; and Artem Zykovich, NICHD. This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov).

Footnotes

Conflicts of interest: No conflicts of interest related to this manuscript exist.

References

  • (1). Darmon MC; Guibert B; Leviel V; Ehret M; Maitre M; Mallet J Sequence of two mRNAs encoding active rat tryptophan hydroxylase. Journal of neurochemistry 1988, 51, 312–316.
  • (2). Grenett HE; Ledley FD; Reed LL; Woo SL Full-length cDNA for rabbit tryptophan hydroxylase: functional domains and evolution of aromatic amino acid hydroxylases. Proc Natl Acad Sci U S A 1987, 84, 5530–5534.
  • (3). Ishida I; Obinata M; Deguchi T Molecular cloning and nucleotide sequence of cDNA encoding hydroxyindole O-methyltransferase of bovine pineal glands. J Biol Chem 1987, 262, 2895–2899.
  • (4). Dumas S; Darmon MC; Delort J; Mallet J Differential control of tryptophan hydroxylase expression in raphe and in pineal gland: evidence for a role of translation efficiency. J Neurosci Res 1989, 24, 537–547.
  • (5). Zilberman-Peled B; Bransburg-Zabary S; Klein DC; Gothilf Y Molecular evolution of multiple arylalkylamine N-acetyltransferase (AANAT) in fish. Mar Drugs 2011, 9, 906–921.
  • (6). Appelbaum L; Toyama R; Dawid IB; Klein DC; Baler R; Gothilf Y Zebrafish serotonin-N-acetyltransferase-2 gene regulation: pineal-restrictive downstream module contains a functional E-box and three photoreceptor conserved elements. Mol Endocrinol 2004, 18, 1210–1221.
  • (7). Ben-Moshe Livne Z; Alon S; Vallone D; Bayleyen Y; Tovin A; Shainer I; Nisembaum LG; Aviram I; Smadja-Storz S; Fuentes M; Falcon J; Eisenberg E; Klein DC; Burgess HA; Foulkes NS; Gothilf Y Genetically Blocking the Zebrafish Pineal Clock Affects Circadian Behavior. PLoS Genet 2016, 12, e1006445.
  • (8). Hartley SW; Coon SL; Savastano LE; Mullikin JC; Program NCS; Fu C; Klein DC Neurotranscriptomics: The Effects of Neonatal Stimulus Deprivation on the Rat Pineal Transcriptome. PLoS One 2015, 10, e0137548.
  • (9). Clokie SJ; Lau P; Kim HH; Coon SL; Klein DC MicroRNAs in the pineal gland: miR-483 regulates melatonin synthesis by targeting arylalkylamine N-acetyltransferase. J Biol Chem 2012, 287, 25312–25324.
  • (10). Rovsing L; Clokie S; Bustos DM; Rohde K; Coon SL; Litman T; Rath MF; Moller M; Klein DC Crx broadly modulates the pineal transcriptome. Journal of neurochemistry 2011, 119, 262–274.
  • (11). Bustos DM; Bailey MJ; Sugden D; Carter DA; Rath MF; Moller M; Coon SL; Weller JL; Klein DC Global daily dynamics of the pineal transcriptome. Cell Tissue Res 2011, 344, 1–11.
  • (12). Backlund PS; Urbanski HF; Doll MA; Hein DW; Bozinoski M; Mason CE; Coon SL; Klein DC Daily Rhythm in Plasma N-acetyltryptamine. J Biol Rhythms 2017, 32, 195–211.
  • (13). Tovin A; Alon S; Ben-Moshe Z; Mracek P; Vatine G; Foulkes NS; Jacob-Hirsch J; Rechavi G; Toyama R; Coon SL; Klein DC; Eisenberg E; Gothilf Y Systematic identification of rhythmic genes reveals camk1gb as a new element in the circadian clockwork. PLoS Genet 2012, 8, e1003116.
  • (14). Bailey MJ; Beremand PD; Hammer R; Bell-Pedersen D; Thomas TL; Cassone VM Transcriptional profiling of the chick pineal gland, a photoreceptive circadian oscillator and pacemaker. Mol Endocrinol 2003, 17, 2084–2095.
  • (15). Karaganis SP; Kumar V; Beremand PD; Bailey MJ; Thomas TL; Cassone VM Circadian genomics of the chick pineal gland in vitro. BMC Genomics 2008, 9, 206.
  • (16). Collado-Torres L; Nellore A; Kammers K; Ellis SE; Taub MA; Hansen KD; Jaffe AE; Langmead B; Leek JT Reproducible RNA-seq analysis using recount2. Nat Biotechnol 2017, 35, 319–321.
  • (17). Kent WJ; Sugnet CW; Furey TS; Roskin KM; Pringle TH; Zahler AM; Haussler D The human genome browser at UCSC. Genome Res 2002, 12, 996–1006.
  • (18). Raney BJ; Dreszer TR; Barber GP; Clawson H; Fujita PA; Wang T; Nguyen N; Paten B; Zweig AS; Karolchik D; Kent WJ Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser. Bioinformatics 2014, 30, 1003–1005.
  • (19). Roberts GP; Larraufie P; Richards P; Kay RG; Galvin SG; Miedzybrodzka EL; Leiter A; Li HJ; Glass LL; Ma MKL; Lam B; Yeo GSH; Scharfmann R; Chiarugi D; Hardwick RH; Reimann F; Gribble FM Comparison of Human and Murine Enteroendocrine Cells by Transcriptomic and Peptidomic Profiling. Diabetes 2019, 68, 1062–1072.
  • (20). Kansler ER; Verma A; Langdon EM; Simon-Vermot T; Yin A; Lee W; Attiyeh M; Elemento O; White RM Melanoma genome evolution across species. BMC Genomics 2017, 18, 136.
  • (21). Klein DC Evolution of the vertebrate pineal gland: the AANAT hypothesis. Chronobiol Int 2006, 23, 5–20.
  • (22). Klein DC The 2004 Aschoff/Pittendrigh lecture: Theory of the origin of the pineal gland--a tale of conflict and resolution. J Biol Rhythms 2004, 19, 264–279.
  • (23). Coon SL; Munson PJ; Cherukuri PF; Sugden D; Rath MF; Moller M; Clokie SJ; Fu C; Olanich ME; Rangel Z; Werner T; Mullikin JC; Klein DC Circadian changes in long noncoding RNAs in the pineal gland. Proc Natl Acad Sci U S A 2012, 109, 13319–13324.
  • (24). Yamazaki F; Moller M; Fu C; Clokie SJ; Zykovich A; Coon SL; Klein DC; Rath MF The Lhx9 homeobox gene controls pineal gland development and prevents postnatal hydrocephalus. Brain structure & function 2014, 220, 1497–1509.
  • (25). Yamazaki F; Kim HH; Lau P; Hwang CK; Iuvone PM; Klein D; Clokie SJ pY RNA1-s2: a highly retina-enriched small RNA that selectively binds to Matrin 3 (Matr3). PLoS One 2014, 9, e88217.
  • (26). Patro R; Duggal G; Love MI; Irizarry RA; Kingsford C Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 2017, 14, 417–419.
  • (27). Ganguly S; Gastel JA; Weller JL; Schwartz C; Jaffe H; Namboodiri MA; Coon SL; Hickman AB; Rollag M; Obsil T; Beauverger P; Ferry G; Boutin JA; Klein DC Role of a pineal cAMP-operated arylalkylamine N-acetyltransferase/14-3-3-binding switch in melatonin synthesis. Proc Natl Acad Sci U S A 2001, 98, 8083–8088.
  • (28). Ganguly S; Weller JL; Ho A; Chemineau P; Malpaux B; Klein DC Melatonin synthesis: 14-3-3-dependent activation and inhibition of arylalkylamine N-acetyltransferase mediated by phosphoserine-205. Proc Natl Acad Sci U S A 2005, 102, 1222–1227.
  • (29). Xu S; Witmer PD; Lumayag S; Kovacs B; Valle D MicroRNA (miRNA) transcriptome of mouse retina and identification of a sensory organ-specific miRNA cluster. J Biol Chem 2007, 282, 25053–25066.
  • (30). Li H; Gong Y; Qian H; Chen T; Liu Z; Jiang Z; Wei S Brain-derived neurotrophic factor is a novel target gene of the has-miR-183/96/182 cluster in retinal pigment epithelial cells following visible light exposure. Mol Med Rep 2015, 12, 2793–2799.
  • (31). Xiang L; Chen XJ; Wu KC; Zhang CJ; Zhou GH; Lv JN; Sun LF; Cheng FF; Cai XB; Jin ZB miR-183/96 plays a pivotal regulatory role in mouse photoreceptor maturation and maintenance. Proc Natl Acad Sci U S A 2017, 114, 6376–6381.
  • (32). Klein DC Arylalkylamine N-acetyltransferase: “the Timezyme”. J Biol Chem 2007, 282, 4233–4237.
  • (33). Coon SL; Del Olmo E; Young WS 3rd; Klein DC Melatonin synthesis enzymes in Macaca mulatta: focus on arylalkylamine N-acetyltransferase (EC 2.3.1.87). J Clin Endocrinol Metab 2002, 87, 4699–4706.
