Abstract
The website and database https://snengs.nichd.nih.gov provides RNA sequencing data from multi‐species analysis of the pineal glands from zebrafish (Danio rerio), chicken (White Leghorn), rat (Rattus norvegicus), mouse (Mus musculus), rhesus macaque (Macaca mulatta), and human (Homo sapiens); in most cases, retinal data are also included along with results of the analysis of a mixture of RNA from tissues. Studies cover day and night conditions; in addition, a time series over multiple hours, a developmental time series and pharmacological experiments on rats are included. The data have been uniformly re‐processed using the latest methods and assemblies to allow for comparisons between experiments and to reduce processing differences. The website presents search functionality, graphical representations, Excel tables, and track hubs of all data for detailed visualization in the UCSC Genome Browser. As more data are collected from investigators and improved genomes become available in the future, the website will be updated. This database is in the public domain and elements can be reproduced by citing the URL and this report. This effort makes the results of 21st century transcriptome profiling widely available in a user‐friendly format that is expected to broadly influence pineal research.
Keywords: biological rhythms, neurotranscriptomics, pineal, retina, RNA‐Seq, transcriptome, webpage
1. INTRODUCTION
The pineal transcriptome has been studied for over 30 years, starting with Northern blot detection of single transcripts encoding proteins involved in melatonin synthesis, including those encoding Tph1 and Asmt (Hiomt). 1 , 2 , 3 , 4 Since then, pineal transcriptomics has spanned the development of transcriptomic assays including cDNA‐based hybridization technology, qRT‐PCR, and RNA‐Seq. 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15
High‐throughput sequencing offers many advantages for assaying the transcriptome, but the field of bioinformatics moves quickly with tools, algorithms, and genome assemblies changing from year to year. As a result, data from earlier studies cannot be meaningfully compared with data from later studies that used different methods without completely re‐processing all data uniformly using updated methods, assemblies, and annotations. This has been recognized for example in the recount2 project, 16 which has re‐processed tens of thousands of human RNA‐Seq samples from public repositories uniformly.
Furthermore, for detailed study of particular loci it is critical to visualize expression alongside genomic data from other studies. Genome browsers such as the UCSC Genome Browser 17 allow just this. In particular, this browser supports track hubs that allow for the configuration, coloration, and organization of collections of many tracks using a web interface. 18 This allows researchers to generate highly customized views tailored to their research interest, viewing pineal gland data from this study directly alongside a wealth of publicly available data prepared and made available by the UCSC team.
Cross‐species transcriptome reports appear in the literature, focused on two‐species comparisons, for example, mouse versus human 19 and zebrafish versus human. 20 Here, we introduce a website that aggregates multiple RNA‐Seq studies of the pineal gland spanning years of transcriptomics research on six species across three vertebrate classes, processed uniformly and presented in a user‐friendly site allowing inspection of individual genes as well as UCSC Genome Browser track hubs of each experiment. The site can be found at https://snengs.nichd.nih.gov. The results of this effort facilitate the comparative and evolutionary analysis of the pineal gland and retina, reflecting an interest in the evolutionary history that links these tissues as derivatives of a common ancestral photodetector. 21 , 22 Most of the tens of thousands of transcripts profiled are otherwise absent from the pineal and retinal literature and in many cases have not been well studied in any tissue. Accordingly, the web page opens new avenues of research.
This report alerts investigators to the availability of this resource, which will be of special value where user‐friendly compiled pineal and retinal RNA‐Seq data are otherwise not available in any format. The graphs and other information extracted from the web page are in the public domain. The web page and its underlying infrastructure are designed to be easily updated as data from new experiments become available or as reanalysis of existing datasets using improved software and updated genomes is completed.
2. METHODS
2.1. Animals
Samples were collected to identify differential day/night ratios; and, in the case of the rat, expression was studied as a function of development, denervation, and adrenergic–cyclic AMP stimulation (Table 1). 9 , 12 , 23 , 24 , 25 In many cases, retinal tissue was profiled in parallel. Mixed tissue RNA samples were used in conjunction with the pineal gland and retina to estimate the enrichment of a transcript.
Table 1.
Study No. | Animals | Experiment name | Tissue | Lighting | Sampling times | Notes | Reps | Refs |
---|---|---|---|---|---|---|---|---|
101 | Chicken, White Leghorn | Pineal gland and retina; time series; constant darkness | PG, R | D:D | CT 0, 4, 8, 12, 16, 20 | N/A | 3 | N/A |
102 | Human | Pineal gland; day and night | PG | L:D 12:12 | ZT 6, 18 | N/A | 2, 4 | N/A |
103 | Mouse, 129sv | Pineal gland, retina and mixed tissue; day and night; Eya2 KO | PG, R, MT | L:D 12:12 | ZT 6, 18 | Eya2 KO | 1 | N/A |
104 | Rat, Sprague Dawley | Pineal gland; day and night; and mixed tissue, day; polyA | PG, MT | L:D 14:10 | ZT 7, 19 | N/A | 1 | 23 |
105 | Rat, Sprague Dawley | Pineal gland development; day and night | PG | L:D 14:10 | ZT 7, 19 | Ages: E21, P5, P20, P40 | 1 | 23, 24 |
106 | Rat, Sprague Dawley | Pineal gland (RP), retina (RR) and mixed tissue (RX); 24‐hr time series | PG, R, MT | L:D 14:10 | ZT 1, 7, 13, 15, 19, 23 | N/A | 1 | 23 |
107 | Rat, Sprague Dawley | Pineal gland; day and night; and mixed tissue, day; Ribominus | PG, MT | L:D 14:10 | ZT 7, 19 | N/A | 1 | 23 |
108 | Rat, Sprague Dawley | Pineal gland; superior cervical decentralization (DCN) or ganglionectomy (SCGX); day and night | PG | L:D 14:10 | ZT 7, 19 | DCN, SCGX, Sham, Control | 3 | 8 |
109 | Rat, Sprague Dawley | Pineal gland in vitro; norepinephrine (NE) or dibutyryl cyclic AMP (DBcAMP) | PG | N/A | N/A | Cultured glands; NE, DBcAMP, Control | 3 | 8 |
110 | Rat, Sprague Dawley | Pineal gland marker genes; day and night | PG | L:D 14:10 | ZT 7, 19 | N/A | 1 | 23 |
111 | Rhesus macaque | Pineal gland, retina and mixed tissue; time series | PG, R, MT | L:D 12:12 | ZT 6, 12, 18, 24 | Dawn, Day, Dusk, Night | 3 | 12 |
112 | Zebrafish | Pineal gland; time series; constant darkness | PG | D:D | CT 2, 6, 10, 14, 18, 22 | N/A | 2 | 7 |
113 | Zebrafish | Eye, pineal gland and mixed tissue; day and night | Eye, PG, MT | L:D 12:12 | ZT 6, 18 | clocka KO | 1 | N/A |
Thirteen experiments encompassing six species are on the website; additional experiments are to be added as data become available. Experimental details are available on the website (https://snengs.nichd.nih.gov/experiments) and the listed references.
Abbreviations: CT, circadian time; D:D, constant darkness; L:D, light:dark; MT, mixed tissue; N/A, not available; PG, pineal gland; R, retina; Refs, references; Reps, number of replicates; RP, rat pineal gland; RR, rat retina; RX, rat mixed tissue; ZT, Zeitgeber time.
2.2. Sequencing
Illumina sequencing was used in all cases. Specific experimental details are available on each experiment's page on the website (see https://snengs.nichd.nih.gov/experiments), which recapitulates the experimental methods of the respective original manuscript. All data across all species were re‐analyzed in the assemblies and annotations described (https://snengs.nichd.nih.gov/methods). Quality of FASTQ files was analyzed with FastQC v0.11.8 and MultiQC v1.6 and all samples demonstrated high‐quality sequencing. Adapters were removed and light quality trimming was performed with cutadapt v1.18 using the additional arguments ‐q (quality cutoff) and ‐‐minimum‐length 25.
These trimmed reads were provided to Salmon v0.12.0 26 for transcript quantification using an index built for transcriptomes as described (https://snengs.nichd.nih.gov/methods), and run using the additional arguments ‐‐gcBias ‐‐seqBias ‐‐libType A. For each gene, the per‐transcript values reported by Salmon v0.12.0 were summed to provide a gene‐level expression estimate in units of transcripts per million reads (TPM). These are the values reported in the tables and plots of each gene page.
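The gene‐level summation described above can be sketched in Python as follows. This is a minimal illustration rather than the pipeline's actual code: the function name `sum_tpm_by_gene`, the transcript‐to‐gene mapping, and the example identifiers and values are all assumptions. The input is meant to mirror the `Name` and `TPM` columns of a Salmon `quant.sf` file.

```python
from collections import defaultdict


def sum_tpm_by_gene(transcript_tpm, tx2gene):
    """Sum per-transcript TPM values into gene-level TPM estimates.

    transcript_tpm: dict mapping transcript ID -> TPM
    tx2gene: dict mapping transcript ID -> gene ID (hypothetical mapping)
    """
    gene_tpm = defaultdict(float)
    for tx_id, tpm in transcript_tpm.items():
        gene_tpm[tx2gene[tx_id]] += tpm
    return dict(gene_tpm)


# Made-up identifiers and values, for illustration only.
transcript_tpm = {"tx1": 12.5, "tx2": 7.5, "tx3": 3.0}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
gene_tpm = sum_tpm_by_gene(transcript_tpm, tx2gene)
```

Because TPM values are already normalized within a sample, summing across the transcripts of a gene keeps the result in TPM units.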
2.3. Genomic visualization
For genomic visualization in the UCSC Genome Browser, trimmed reads were aligned using HISAT2 v2.1.0 to the respective genome indicated below. From these aligned reads, normalized bigWig files were created using the deepTools v3.1.3 bamCoverage tool using additional options ‐‐minMappingQuality 20 ‐‐smoothLength 10 ‐‐normalizeUsing BPM ‐‐binSize 1 such that multi‐mappers were ignored. For stranded libraries, the tool was run twice: once with ‐‐filterRNAstrand forward and once with ‐‐filterRNAstrand reverse to get separate tracks for each strand. The resulting bigWig files were combined into a UCSC track hub using the trackhub Python package.
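As a sketch of how the bamCoverage options listed above might be assembled, the following Python helper builds the argument list for one BAM file and, for stranded libraries, one command per strand. The helper name `bamcoverage_command` and the file names are hypothetical; only the option names and values are taken from the text.

```python
def bamcoverage_command(bam, bigwig, strand=None):
    """Build a bamCoverage argument list using the options given in the text.

    strand: None for unstranded libraries, or "forward" / "reverse".
    """
    cmd = [
        "bamCoverage",
        "--bam", bam,
        "--outFileName", bigwig,
        "--minMappingQuality", "20",
        "--smoothLength", "10",
        "--normalizeUsing", "BPM",
        "--binSize", "1",
    ]
    if strand is not None:
        cmd += ["--filterRNAstrand", strand]
    return cmd


# Hypothetical file names; a stranded library gets one track per strand.
commands = [
    bamcoverage_command("sample.bam", f"sample.{strand}.bigwig", strand)
    for strand in ("forward", "reverse")
]
```

Where deepTools is installed, each list could be passed to `subprocess.run(cmd, check=True)`.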
UCSC (typically used for extensive visualization capabilities) and Ensembl (typically used for its comprehensive annotations) are not consistent in their chromosome nomenclature. To facilitate linking from gene‐level transcription estimates on this website to genomic signal at UCSC, we converted chromosome names from Ensembl to the UCSC equivalents by matching the md5sums of each chromosome; see GitHub repository (https://github.com/NICHD‐BSPC/chrom‐name‐mappings) for details and code.
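The checksum‐matching idea can be illustrated with a short Python sketch. It is not the code in the repository cited above: the function names and toy sequences are assumptions, and uppercasing the sequence before hashing (so that soft‐masking differences do not prevent a match) is a design choice made here for illustration.

```python
import hashlib


def md5_of_sequence(seq):
    """Return the MD5 hex digest of a sequence, ignoring letter case."""
    return hashlib.md5(seq.upper().encode("ascii")).hexdigest()


def map_chrom_names(ensembl_seqs, ucsc_seqs):
    """Map Ensembl chromosome names to UCSC names via matching checksums."""
    ucsc_by_md5 = {md5_of_sequence(s): name for name, s in ucsc_seqs.items()}
    mapping = {}
    for name, seq in ensembl_seqs.items():
        digest = md5_of_sequence(seq)
        if digest in ucsc_by_md5:
            mapping[name] = ucsc_by_md5[digest]
    return mapping


# Toy sequences standing in for whole chromosomes.
ensembl = {"1": "ACGTACGT", "MT": "GGCCTTAA", "KZ1": "TTTT"}
ucsc = {"chr1": "acgtacgt", "chrM": "GGCCTTAA"}
mapping = map_chrom_names(ensembl, ucsc)
```

Sequences with no checksum match (here the hypothetical scaffold `KZ1`) are simply left out of the mapping, so they can be reported or handled separately.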
2.4. Genome and transcriptome assemblies
For each species, the genomic assembly indicated (https://snengs.nichd.nih.gov/methods) was used for visualization in the UCSC Genome Browser, while the transcriptome was used to calculate TPM expression estimates to display in plots on individual gene pages.
2.5. Implementation details
The website is written in the Python programming language using the ‘Flask’ framework. Configuration of the website is driven by a YAML format file that points to Salmon v0.12.0 output along with details like methods descriptions, UCSC track hub colors, bar plot colors, and any other experiment‐specific configuration. This greatly streamlines the process of adding new studies and new species. Data were processed using lcdb‐wf (https://github.com/lcdb/lcdb‐wf), which is itself driven by YAML configuration in a species‐agnostic manner, allowing for uniform processing across all studies.
3. RESULTS
The Home Page (https://snengs.nichd.nih.gov) introduces the user to the main sections of the database (Figure 1A). Selecting the Search section displays the Search subpage (Figure 1B). Entering a gene symbol (eg, Aanat) in the query box opens the Results subpage, which contains a listing of species and experiments (Figure 2). The Search function will accept alternative symbols; however, when difficulty is encountered obtaining a result, the user is encouraged to refer to gene databases for assistance. This page contains information on the samples, including species, a brief description of the experiment and a Link to the gene page.
Clicking on that Link (Figure 2) opens a Gene subpage (Figure 3) with links to the Ensembl data (gene id) and the UCSC Genome Browser for the gene (Open UCSC track hub for this gene), in addition to presenting experimental results in a bar graph. These results are normalized count data (in TPM). In cases where multiple experiments for a species exist, all experiments are displayed and can be viewed by scrolling vertically.
Selecting Experiments from the Home page displays the experiments with four links (Figure 4). The first is the Search subpage, described above. The second (Download) retrieves the data in an Excel file. The third retrieves the Details (Figure 5) of the experiment, including sample preparation and data analysis; scrolling horizontally is necessary to open the table. The last is a Link to the UCSC Genome Browser, which documents the location of reads mapping for each gene.
The Methods and Help pages are not presented as figures. The Methods page contains general information on the Bioinformatics methods and identifies the genomic assemblies used; the Help subpage has links to tutorials on the use of the UCSC browser and contact information for further assistance.
As an example of the utility of comparing data across multiple species in a uniform format, we searched for differences in the day/night levels of transcripts among species. As shown in Table 2, the large night/day rhythms in the transcript abundance of several genes in the rat are not seen in the rhesus monkey or to a similar degree in other species (https://snengs.nichd.nih.gov/search). This emphasizes the importance of post‐translational modifications that occur. 27 , 28 It also is a caution against making generalities based on studies of one species.
Table 2.
Species | Night/Day, >30‐fold | Night/Day, 3‐ to 30‐fold | Day/Night, >30‐fold | Day/Night, 3‐ to 30‐fold |
---|---|---|---|---|
Chicken | Gos | Spcs1, Gnb3, Lbh, Lypla1, Prdm8, Aanat, Tph1, Am89a, Ckmt1a, Chga, Ddc, Ndrg | Rbp4, Rcan2, SSx2ip, Chgb, Calb2, R3hdml, Atoh8, Efr3a | |
Human | DUSP1, HKDC1 | | | |
Mouse | Gh, Prl | Aanat, Odc1, Mat2a, Kif5c, Nap1l5, Tbc1d15, Crem, Tbc1d1, Tjap1, Ndufa3, Syt4, Mitf, Rmdn3, Extl3, Amd1, Ywhaz, Ccnl1, Slc3a2, Impa1, Azin1, Prosc, Iqcb1, Crx, Rab3gap1, Srxn1, Manf, Ppa2, Gja1, Psme2, Arf, Cbx7, Tph1, mt‐Ts2, Fgf12, Mpp6, Gnai2, Necap1, Tpm4, Atp2a2, Hdhd3, Rnf13, Ip6k1, Dnajb6, Sik3, Ergic1, Tmem229b, Clptm1, Hsph1, Auh, mt‐Tw, mt‐Ti | Igkj4, Igkj1, Tpt1‐ps3, Ighj4 | Enpp2, Ttr, Chmp1a, Unc119, Ccnd2, Acp2, Atp6v0a2, Tef, Igf2, Ermard, Lamb2, Fabp7, Twf1, Ewsr1, Etf1, Fxyd1, Arih2, Zfand6, Wbscr22, Ndrg1, Tbc1d17, Cox17, Fam166a, Atox1, Rpgrip1, Ackr1, mt‐Tl1, Dpysl3, Cisd3, Prpf19, Sag, Tpm3, Ift46, Apod, Taz |
Rat | Aanat, Atp7b, Slc15a1, Dclk3 | Irs2, Crem, Sik1, Ptch1, Cd24, Zrsr1, Rcan1, Kctd3, Bsx, Mat2a, Etnk1, Camk1g, Mbnl2, Gxylt1, Gem, Nptx1, Pcdh1, Eml5, Galnt16, Pde4b, Reep2, Syt4, Tjp2, Snap25, Hbb, Hba‐a2, Dnm2, Fkbp5, Man2a1, Fry, Dclk1, Mcam, Arhgap24, Hspa5, Slc17a6, Farp2, Rhob, Cry2, Lamb1, Hsph1, Ncald, Abca1, Mapk6, Ankrd52, Snrk, Slc7a6, Shroom3, Sik2, Ttc8, Nacad, Qsox1, Xpot, Zhx1, Wipf3, Abcf1, Frmpd1 | Matr3 | Gucy1a1, Frmd4b, Eef1a2, Scrn1, Hook1, Ttr, Pdc, Cfl2 |
Rhesus | PENK, CCN2, RP1, FAM167a, TGFBR3, ATP2A3 | OPN1SW, VASH1, PDC, GNGT1, LMOD1 | | |
Zebrafish | Nr1d1 | Sik1, Dbpb, Dusp1, Dtx4, Rdh8b, Gjd2b, Gpr137bb, Aanat2, CR391986.1, Dclk2a, Guca1a, Tph1a, Gchi1, Ptn, Myh9a, Lpde6ga, Id2b, Cxcl14, Gabarapb | Nfil3‐5 | Bhlhe40, Rbp3, Cry1aa, Rorcb, Pde6ha, Rorca, Camk1gb, Rbp4, Rp4l, Kera, Ry1bb, Per2, Irbpl, Cyp27c1, Nfil3‐6, Pfkfb4b, Sagb, Ahcy, Sdha, Eno1a, Add45ga, Tmtops2a, lrp1a, Hbba1, Aldocb, Tmem237b, Gpr146, Aldoa, Jag2b, Aclya, Cry1ba, Ybx1, Rcvrn3, Acadm, Stra6, Hbba1 |
The day and night levels of transcripts were compared by calculating night/day and day/night ratios of normalized values (TPM + 0.1). Noncoding RNAs were eliminated. Only the top 1000 genes with official symbols were further grouped by ratios into greater than 30‐fold and 3‐ to 30‐fold differences. Genes are listed according to strength of rhythm. Human pineal data are included, noting that times of death and of tissue removal were not tightly controlled; accordingly, the indication of rhythmicity might be impacted. The data were downloaded from the Experiments page. In addition to the single datasets for chicken, human, mouse, and rhesus, the zebrafish “eye, pineal gland & mixed tissue” and rat “pineal marker genes” datasets were used.
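The ratio calculation and binning described in this legend can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the TPM values are made up, and the sketch assigns each gene to whichever of the two ratios (night/day or day/night) is larger.

```python
PSEUDOCOUNT = 0.1  # added to TPM before taking ratios, as in the legend


def fold_ratio(numerator_tpm, denominator_tpm):
    """Ratio of two TPM values after adding the pseudocount to each."""
    return (numerator_tpm + PSEUDOCOUNT) / (denominator_tpm + PSEUDOCOUNT)


def classify_rhythm(day_tpm, night_tpm):
    """Return (direction, bin) for a gene, or None if below 3-fold."""
    night_day = fold_ratio(night_tpm, day_tpm)
    day_night = fold_ratio(day_tpm, night_tpm)
    direction, ratio = max(
        (("night/day", night_day), ("day/night", day_night)),
        key=lambda item: item[1],
    )
    if ratio > 30:
        return direction, ">30-fold"
    if ratio >= 3:
        return direction, "3- to 30-fold"
    return None


# Made-up (day, night) TPM values, for illustration only.
examples = {
    "geneA": (1.9, 99.9),
    "geneB": (49.9, 9.9),
    "geneC": (10.0, 12.0),
}
calls = {g: classify_rhythm(day, night) for g, (day, night) in examples.items()}
```

The pseudocount keeps the ratio finite when a transcript is undetected at one time point, at the cost of compressing ratios for weakly expressed genes.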
The data also focus on the similarity of the genomic profiles of pineal glands from the species studied (Table 3). Selective expression of each gene was calculated as the ratio of expression of a specific gene in the pineal gland to that in a mixture of RNA from a group of tissues. As expected, three genes responsible for melatonin synthesis (Tph1, Aanat and Asmt) were selectively expressed in the glands studied. Another group of genes selectively expressed in the pineal gland includes those established as markers of the retina. The high expression of these genes only in the pineal gland and retina is known. 21 , 22 However, the specific functions of these retina‐related genes and other selectively expressed genes in the pineal gland have not received significant attention and deserve further analysis.
Table 3.
Found in | Genes selectively expressed among top 1000 genes |
---|---|
Three species | Adra1a, Adrb1, Ankrd33, Casz1, Drd4, Gngt*, Grk*, Grm*, Guca1a, Impg1*, Kif*, Opn*, Pax3, Pcdh*, Pla2g*, Ppef2, Prph2, Rdh*, Rp1*, Rps*, Rxrg, Slc16*, Slc6a*, Trim* |
Four species | Aanat*, Aipl1, Asmt, Bsx, Cabp*, Cacna1*, Cacna2d*, Celf3, Chrna3*, Chrnb4, Cngb3*, Col*, Cplx*, Crb*, Crx, Gch1, Impg2, Isl2, Kcn*, Lhx4, Lrit*, Myo*, Neurod*, Otx2, Pde6*, Ptprn, Rbp3, Slc24*, Slc38a*, Tmem*, Tph1, Ush2a |
Genes were ranked according to selective expression in the pineal glands from zebrafish, mouse, rat, and rhesus. Expression was normalized (TPM + 0.1) and selective expression was calculated relative to expression in a mixture of tissue. The top 1000 selectively expressed genes were identified and those present in three or four out of four of the species are listed above. The data sources are given in the legend to Table 2. Asterisk (*), more than one homolog exists in some species; for example, Aanat* represents Aanat in mouse, rat, and rhesus in addition to Aanat1 and Aanat2 in zebrafish.
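One way to picture the procedure in this legend is the Python sketch below, which ranks genes by their pineal‐to‐mixed‐tissue ratio within a species and then counts in how many species each gene reaches the top set. The function names, toy values, and the small `n_top` are assumptions for illustration, and the sketch presumes that gene symbols have already been harmonized across species, which the legend's asterisk note indicates is not always straightforward.

```python
from collections import Counter


def top_selective(pineal_tpm, mixed_tpm, n_top, pseudocount=0.1):
    """Rank genes by pineal/mixed-tissue ratio; return the top n_top symbols."""
    ratios = {
        gene: (tpm + pseudocount) / (mixed_tpm.get(gene, 0.0) + pseudocount)
        for gene, tpm in pineal_tpm.items()
    }
    ranked = sorted(ratios, key=ratios.get, reverse=True)
    return set(ranked[:n_top])


def shared_across_species(top_sets, min_species):
    """Return genes found in the top set of at least min_species species."""
    counts = Counter(gene for genes in top_sets.values() for gene in genes)
    return {gene for gene, n in counts.items() if n >= min_species}


# Made-up TPM values for one species.
pineal = {"Aanat": 49.9, "Tph1": 19.9, "Actb": 99.9}
mixed = {"Aanat": 0.0, "Tph1": 0.9, "Actb": 99.9}
top2 = top_selective(pineal, mixed, n_top=2)

# Made-up top sets for the other three species.
top_sets = {
    "zebrafish": {"Aanat", "Tph1"},
    "mouse": top2,
    "rat": {"Aanat", "Tph1", "Bsx"},
    "rhesus": {"Aanat", "Bsx"},
}
in_all_four = shared_across_species(top_sets, 4)
in_three_or_more = shared_across_species(top_sets, 3)
```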
An analysis of the conserved highly expressed and tissue specific transcripts in the pineal gland, retina and in both tissues (Table 4) was done by identifying the highly tissue specific transcripts. They were then binned according to their expression ratio (pineal gland: retina). The results reveal relatively small sets of pineal‐specific and retina‐specific genes, and a larger group of genes expressed in both tissues. Noting that these genes are selectively expressed only in these two tissues and not in others, it is highly likely that these genes represent evolutionarily conserved elements that can be considered to be related to the common origin of both tissues. In some cases, their roles have been identified, but in many cases, a functional role has not been established.
Table 4.
Group | Enriched transcripts, four of four species | Enriched transcripts, three of four species |
---|---|---|
Pineal gland | Aanat*, Asmt, Chrnb4, Gch1, Gnat2, Gnb3*, Guca1a, Lhx4, Pde6c, Sall1*, Tph1*, Pax3 | Alx4, Bsx, Chrnb3, Gngt2*, Lrrc38, Ptpn20 |
Pineal gland and retina | Arr3*, Cabp4, Cacna1f*, Cacna2d4, Cnga1*, Cngb3, Cplx4*, Crb2*, Crx, Drd4, Fam161a, Gabrr1, Gabrr3*, Gngt1, Grk1*, Guca1b, Impg1*, Impg2*, Kcnb*, Lrit1*, Mpp4, Msi1, Myo*, Nyx, Opn1sw, Otx2*, Pdc*, Pde6g*, Rbp3, Rlbp1, Rom1*, Rp1l1, Slc24a1, Stx3, Tulp*, Unc119*, Ush2a | Adrb1, Cabp5*, Crabp*, Crb1, Crocc, Egflam, Fabp*, Fam169a, Gnb5, Gng1, Grik1*, Gucy2d, Hcn*, Igsf9, Impdh1, Kcn*, Kcna*, Kcnj14, Lrit2, Lrit3, Mak, Mgarp, Neurod4, Ntng2, Nxnl1, Pcdh15, Pla2g*, Plch2, Ppef2, Prph2*, Prss3*, Rax*, Reep6, Rorb, Rrp1b, Samd11, Slc16a*, Slc17a*, Slc24*, Slc38a*, Slc39*, Slc4*, Slc6a6, Tmem215, Tmem237*, Trpm1* |
Retina | Abca4*, Ankrd33*, Ccdc*, Cdhr1, Chrna3a, Col*, Cryaa, Fscn2, Gucy2f, Irs1*, Isl1, Kcnv2*, Nr2e3, Nrl, Pde6a, Pde6b, Pde6h, Rdh8, Rho, Rpl, Rrh, Sag*, Sh2d*, Six3*, Slc1a7, Tfap2*, Vsx1, Vsx2 | Cryba*, Crybb2, Crygm*, Gabrr2, Glb1l2, Gnat1, Grm6, Isl2, Lgsn, Lim2, Mab21l1, Opn1mw*, Opn4*, Pax6*, Prdm13, Prph*, Rcvrn*, Rgr, Rtbdn*, Samd7, Vax2 |
Enrichments in the pineal gland and retina relative to other tissues were assessed by determining the ratio of the normalized abundance of a transcript in each tissue (TPM + 0.1) to that in the mixed RNA sample (TPM + 0.1), yielding a relative expression value (rEx). Mixed RNA samples were made by mixing equal amounts of RNA from 6 to 20 tissues. The rEx values of the top 300 enriched transcripts from the pineal gland and the top 300 from the retina were compared (pineal gland rEx/retina rEx); transcripts with ratios > 10 were binned as pineal gland and those with ratios < 1/10 as retina. The most extreme ratios were approximately 1000 for the pineal gland and 1/1000 for the retina. The remaining transcripts comprise the pineal gland and retina group. Zebrafish, mouse, rat, and rhesus are included. The rat data are from the 24‐hr time series experiment, the zebrafish data from the experiment with mixed tissue, and the rhesus and mouse data are from single experiments; the latter is from 129sv mice. Data from all time points have been averaged and normalized (TPM + 0.1). The results above indicate whether a listed transcript is detected in all four or in only three species evaluated. Asterisk (*), more than one homolog exists in some species; for example, Aanat* represents Aanat in mouse, rat, and rhesus in addition to Aanat1 and Aanat2 in zebrafish.
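The rEx calculation and three‐way binning in this legend can be sketched in Python as follows. The function names and TPM values are assumptions for illustration. The lower cutoff is taken here as the reciprocal of the upper one (a ratio below 1/10 counts as retina), which is an interpretation of the legend rather than something it states explicitly.

```python
PSEUDOCOUNT = 0.1


def rex(tissue_tpm, mixed_tpm):
    """Relative expression: (tissue TPM + 0.1) / (mixed-tissue TPM + 0.1)."""
    return (tissue_tpm + PSEUDOCOUNT) / (mixed_tpm + PSEUDOCOUNT)


def assign_group(pineal_tpm, retina_tpm, mixed_tpm, fold=10.0):
    """Bin a transcript by its pineal rEx / retina rEx ratio."""
    ratio = rex(pineal_tpm, mixed_tpm) / rex(retina_tpm, mixed_tpm)
    if ratio > fold:
        return "pineal gland"
    if ratio < 1.0 / fold:  # assumed reciprocal cutoff
        return "retina"
    return "pineal gland and retina"


# Made-up (pineal, retina, mixed) TPM values, for illustration only.
groups = {
    "geneA": assign_group(99.9, 0.9, 0.9),
    "geneB": assign_group(0.9, 99.9, 0.9),
    "geneC": assign_group(49.9, 24.9, 0.9),
}
```

Because both rEx values share the same mixed‐tissue denominator, the pineal/retina ratio reduces to a direct comparison of the two tissues; the mixed‐tissue sample matters mainly for choosing which transcripts enter the top 300 lists.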
4. DISCUSSION
This database will serve as a foundation for future molecular biological research on the pineal gland and retina, making available the data to scientists with a computer and an internet connection. The uniform processing of raw data makes the comparison of results more meaningful and takes advantage of advances in tools, algorithms, assemblies, and annotations since original publication. Whereas the human and mouse genomes are the most highly annotated, and the chicken and zebrafish less so, the maturity of all annotations allows for in‐depth analysis of nearly all genes. A potential problem is that symbols used for identification of a gene in one species may not be used in other species or may be used for different genes. Hence, in cases where identification is questionable, confirmation may require analysis of sequence homology.
4.1. Utility and accuracy of RNA‐Seq data
In judging the utility and accuracy of the RNA‐Seq data, it should be noted that there is good agreement with data from other methods for the analysis of pineal gland and retina material, including microarray, Northern blot and qRT‐PCR as regards day/night differences. 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 Accordingly, the RNA‐Seq data can be viewed as highly useful and reliable.
The method also has advantages over other methods; perhaps the most important is that it sequences all transcripts, including those without a history in any literature. This opens new avenues for study for the pineal gland and retina. One of the most fertile areas is the identification of noncoding RNAs, both micro RNAs and long noncoding RNAs. 9 , 23 Some of these are known to have daily rhythms in both tissues. Noteworthy is the discovery of a unique micro RNA‐183‐96‐182 cluster in the pineal gland and retina, 9 , 29 which represents the major component of pineal miRNAs. Accordingly, it can be considered to be an additional marker of the common ancestral photodetector which gave rise to the pineal gland and retina. Although the function of this cluster remains unknown in the pineal gland, it has been reported to play a role in phototransduction and development in the eye. 30 , 31
Study of pineal miRNAs also led to the discovery of very high levels of pY RNA1‐s2 in the retina, relative to other tissues, 25 including the pineal gland. Moreover, it was found that pY RNA1‐s2 selectively binds the nuclear matrix protein Matrin 3 and to a lesser degree to heterogeneous nuclear ribonucleoprotein U‐like protein. The distribution of pY RNA1‐s2 in all retinae and retinal cell lines suggests a role in vision. Both these discoveries could not have been made using methods other than RNA‐Seq.
Likewise, the finding of robust daily rhythms in the abundance of several long noncoding RNAs 23 in the pineal gland under neural control and the discovery of expression of lncSN134 in both the retina and pineal gland were dependent on the use of RNA‐Seq. The long noncoding RNAs vary considerably in size and, like pineal miRNAs, remain largely unstudied and unknown.
Whereas RNA‐Seq is a powerful technique, the results must be viewed with healthy skepticism, especially with transcripts that are weakly expressed and when evaluating small night/day differences in transcript abundance. Confirmation by an independent method should be considered. In addition, in the case of weakly expressed transcripts, the mapping of reads on the UCSC browser should be evaluated to confirm that the read assignment pattern is consistent with the intron/exon features of the transcript.
4.2. Experimental design
A problem that is considered in any study designed to measure day/night differences is the number of time points per day. Often this is limited by factors including the housing of animals and the number of animals per point necessary to obtain sound data. RNA‐Seq introduces another factor, the cost of analysis. Accordingly, the design of the studies included in the database (Table 1) is also a reflection of the cost of sequencing and bioinformatics. The studies included sampling that ranged from two to six time points per day. When sampling is done at only two time points, noon and midnight, the potential for overlooking a dawn/dusk rhythm exists. Accordingly, it is best not to limit experiments to two time point studies and to use four or more to detect daily rhythms. However, in the case of study of daily rhythms in the pineal gland, a two time point study will capture most large changes. Moreover, this approach is highly instructive, in that it provides valuable data on the levels of tens of thousands of transcripts. Accordingly, one can see merit in such studies.
The number of replicates to use is another important issue. RNA‐Seq data are typically highly reproducible for most transcripts when normalized. This reflects a feature of the method, in that there is redundancy in the detection of a transcript, as a result of fragmentation and amplification. In the final analysis, each calculated transcript level is not simply a single measurement, but reflects multiple detection events, depending on the size of the transcript and abundance. Accordingly, in N = 1 situations, it is possible to obtain an indication of statistical variance of all transcripts, and use this to determine whether, for example, a day/night difference is statistically significant.
4.3. Transcriptomics versus proteomics
Whereas RNA‐Seq does provide a highly useful tool for the discovery and characterization of transcripts, it is not a substitute for proteomics. The study of an mRNA and its encoded protein often are in agreement as regards the presence and dynamic changes in both. However, this is clearly not the case in all situations.
An excellent example is Aanat. In the rat, Aanat mRNA, protein, and activity increase at night, reflecting phosphorylation of the protein at two sites. When lights are turned on in the middle of the night, a rapid decrease in enzyme activity occurs, with little change in mRNA levels. The changes in enzyme activity are due to dephosphorylation of the protein, which is rapidly destroyed by proteasomal proteolysis, as reviewed. 32 Another example of mRNA levels and protein levels not exhibiting similar dynamics is found in studies of the rhesus pineal gland. There is little daily change in mRNA encoding Aanat, although the changes in enzyme activity are robust. 33 These observations are evidence that it is necessary to determine whether changes in an mRNA are associated with changes in an encoded protein to determine the relationship. Unfortunately, the science of proteomics has not advanced to the all‐inclusive nature of mRNA analysis, in part because it is difficult to uniformly detect the possible post‐translational modifications.
Use of the database will allow investigators to initiate efforts to identify transcripts that are highly expressed in the pineal gland relative to the retina and/or other tissues, transcripts that are highly expressed in the pineal gland of one species but not another, transcripts that exhibit marked night/day differences, transcripts that are under neural/adrenergic cyclic AMP control, and transcripts that exhibit changes in expression during development. In doing so, the web page should promote and enhance future studies of pineal cell biology.
4.4. Referencing the web page
The data on the web page are in the public domain and the use of the figures and data does not require authorization of the authors. The web page should be referenced by citing this publication.
CONFLICTS OF INTEREST
No conflicts of interest related to this manuscript exist.
ACKNOWLEDGEMENTS
The authors wish to express their appreciation for discussion and testing services provided by Apratim Mitra and Sydney Hertafeld of the Bioinformatics and Scientific Programming Core, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). The authors also want to acknowledge the contributions of the following: David “Dr J.” Jacobowitz, Collaborative Health Initiative Research Program, Uniformed Services University of the Health Sciences; Stephen W. Hartley and James C. Mullikin, National Human Genome Research Institute, National Institutes of Health; Leming Shi, United States Food and Drug Administration, National Center for Toxicological Research and the School of Basic Medical Sciences, Anhui Medical University, Hefei; and Artem Zykovich, NICHD. This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov).
Chang E, Fu C, Coon SL, et al. Resource: A multi‐species multi‐timepoint transcriptome database and webpage for the pineal gland and retina. J Pineal Res. 2020;69:e12673. https://doi.org/10.1111/jpi.12673
In memory of David Jacobowitz (1931‐2018), a gentleman and distinguished scholar with unbridled enthusiasm.
Eric Chang, Cong Fu and Steven L. Coon contributed equally to this manuscript.
REFERENCES
- 1. Darmon MC, Guibert B, Leviel V, Ehret M, Maitre M, Mallet J. Sequence of two mRNAs encoding active rat tryptophan hydroxylase. J Neurochem. 1988;51:312‐316. [DOI] [PubMed] [Google Scholar]
- 2. Grenett HE, Ledley FD, Reed LL, Woo SL. Full‐length cDNA for rabbit tryptophan hydroxylase: functional domains and evolution of aromatic amino acid hydroxylases. Proc Natl Acad Sci USA. 1987;84:5530‐5534. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Ishida I, Obinata M, Deguchi T. Molecular cloning and nucleotide sequence of cDNA encoding hydroxyindole O‐methyltransferase of bovine pineal glands. J Biol Chem. 1987;262:2895‐2899. [PubMed] [Google Scholar]
- 4. Dumas S, Darmon MC, Delort J, Mallet J. Differential control of tryptophan hydroxylase expression in raphe and in pineal gland: evidence for a role of translation efficiency. J Neurosci Res. 1989;24:537‐547. [DOI] [PubMed] [Google Scholar]
- 5. Zilberman‐Peled B, Bransburg‐Zabary S, Klein DC, Gothilf Y. Molecular evolution of multiple arylalkylamine N‐acetyltransferase (AANAT) in fish. Mar Drugs. 2011;9:906‐921. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Appelbaum L, Toyama R, Dawid IB, Klein DC, Baler R, Gothilf Y. Zebrafish serotonin‐N‐acetyltransferase‐2 gene regulation: pineal‐restrictive downstream module contains a functional E‐box and three photoreceptor conserved elements. Mol Endocrinol. 2004;18:1210‐1221. [DOI] [PubMed] [Google Scholar]
- 7. Ben‐Moshe Livne Z, Alon S, Vallone D, et al. Genetically blocking the Zebrafish pineal clock affects circadian behavior. PLoS Genet. 2016;12:e1006445. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Hartley SW, Coon SL, Savastano LE, et al. Neurotranscriptomics: The Effects of Neonatal Stimulus Deprivation on the Rat Pineal Transcriptome. PLoS One. 2015;10:e0137548. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Clokie SJ, Lau P, Kim HH, Coon SL, Klein DC. MicroRNAs in the pineal gland: miR‐483 regulates melatonin synthesis by targeting arylalkylamine N‐acetyltransferase. J Biol Chem. 2012;287:25312‐25324.
- 10. Rovsing L, Clokie S, Bustos DM, et al. Crx broadly modulates the pineal transcriptome. J Neurochem. 2011;119:262‐274.
- 11. Bustos DM, Bailey MJ, Sugden D, et al. Global daily dynamics of the pineal transcriptome. Cell Tissue Res. 2011;344:1‐11.
- 12. Backlund PS, Urbanski HF, Doll MA, et al. Daily rhythm in plasma N‐acetyltryptamine. J Biol Rhythms. 2017;32:195‐211.
- 13. Tovin A, Alon S, Ben‐Moshe Z, et al. Systematic identification of rhythmic genes reveals camk1gb as a new element in the circadian clockwork. PLoS Genet. 2012;8:e1003116.
- 14. Bailey MJ, Beremand PD, Hammer R, Bell‐Pedersen D, Thomas TL, Cassone VM. Transcriptional profiling of the chick pineal gland, a photoreceptive circadian oscillator and pacemaker. Mol Endocrinol. 2003;17:2084‐2095.
- 15. Karaganis SP, Kumar V, Beremand PD, Bailey MJ, Thomas TL, Cassone VM. Circadian genomics of the chick pineal gland in vitro. BMC Genom. 2008;9:206.
- 16. Collado‐Torres L, Nellore A, Kammers K, et al. Reproducible RNA‐seq analysis using recount2. Nat Biotechnol. 2017;35:319‐321.
- 17. Kent WJ, Sugnet CW, Furey TS, et al. The human genome browser at UCSC. Genome Res. 2002;12:996‐1006.
- 18. Raney BJ, Dreszer TR, Barber GP, et al. Track data hubs enable visualization of user‐defined genome‐wide annotations on the UCSC Genome Browser. Bioinformatics. 2014;30:1003‐1005.
- 19. Roberts GP, Larraufie P, Richards P, et al. Comparison of human and murine enteroendocrine cells by transcriptomic and peptidomic profiling. Diabetes. 2019;68:1062‐1072.
- 20. Kansler ER, Verma A, Langdon EM, et al. Melanoma genome evolution across species. BMC Genom. 2017;18:136.
- 21. Klein DC. Evolution of the vertebrate pineal gland: the AANAT hypothesis. Chronobiol Int. 2006;23:5‐20.
- 22. Klein DC. The 2004 Aschoff/Pittendrigh lecture: Theory of the origin of the pineal gland–a tale of conflict and resolution. J Biol Rhythms. 2004;19:264‐279.
- 23. Coon SL, Munson PJ, Cherukuri PF, et al. Circadian changes in long noncoding RNAs in the pineal gland. Proc Natl Acad Sci USA. 2012;109:13319‐13324.
- 24. Yamazaki F, Moller M, Fu C, et al. The Lhx9 homeobox gene controls pineal gland development and prevents postnatal hydrocephalus. Brain Struct Funct. 2014;220:1497‐1509.
- 25. Yamazaki F, Kim HH, Lau P, et al. pY RNA1‐s2: a highly retina‐enriched small RNA that selectively binds to Matrin 3 (Matr3). PLoS One. 2014;9:e88217.
- 26. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias‐aware quantification of transcript expression. Nat Methods. 2017;14:417‐419.
- 27. Ganguly S, Gastel JA, Weller JL, et al. Role of a pineal cAMP‐operated arylalkylamine N‐acetyltransferase/14‐3‐3‐binding switch in melatonin synthesis. Proc Natl Acad Sci USA. 2001;98:8083‐8088.
- 28. Ganguly S, Weller JL, Ho A, Chemineau P, Malpaux B, Klein DC. Melatonin synthesis: 14‐3‐3‐dependent activation and inhibition of arylalkylamine N‐acetyltransferase mediated by phosphoserine‐205. Proc Natl Acad Sci USA. 2005;102:1222‐1227.
- 29. Xu S, Witmer PD, Lumayag S, Kovacs B, Valle D. MicroRNA (miRNA) transcriptome of mouse retina and identification of a sensory organ‐specific miRNA cluster. J Biol Chem. 2007;282:25053‐25066.
- 30. Li H, Gong Y, Qian H, et al. Brain‐derived neurotrophic factor is a novel target gene of the hsa‐miR‐183/96/182 cluster in retinal pigment epithelial cells following visible light exposure. Mol Med Rep. 2015;12:2793‐2799.
- 31. Xiang L, Chen XJ, Wu KC, et al. miR‐183/96 plays a pivotal regulatory role in mouse photoreceptor maturation and maintenance. Proc Natl Acad Sci USA. 2017;114:6376‐6381.
- 32. Klein DC. Arylalkylamine N‐acetyltransferase: "the Timezyme". J Biol Chem. 2007;282:4233‐4237.
- 33. Coon SL, Del Olmo E, Young WS 3rd, Klein DC. Melatonin synthesis enzymes in Macaca mulatta: focus on arylalkylamine N‐acetyltransferase (EC 2.3.1.87). J Clin Endocrinol Metab. 2002;87:4699‐4706.