PLOS Computational Biology. 2020 Mar 17;16(3):e1007666. doi: 10.1371/journal.pcbi.1007666

Methods detecting rhythmic gene expression are biologically relevant only for strong signal

David Laloum 1,2, Marc Robinson-Rechavi 1,2,*
Editor: Attila Csikász-Nagy 3
PMCID: PMC7100990  PMID: 32182235

Abstract

The nycthemeral transcriptome embodies all genes displaying a rhythmic variation of their mRNAs periodically every 24 hours, including but not restricted to circadian genes. In this study, we show that the nycthemeral rhythmicity at the gene expression level is biologically functional and that this functionality is more conserved between orthologous genes than between random genes. We used this conservation of the rhythmic expression to assess the ability of seven methods (ARSER, Lomb Scargle, RAIN, JTK, empirical-JTK, GeneCycle, and meta2d) to detect rhythmic signal in gene expression. We have contrasted them to a naive method, not based on rhythmic parameters. By taking into account the tissue-specificity of rhythmic gene expression and different species comparisons, we show that no method is strongly favored. The results show that these methods designed for rhythm detection, in addition to having quite similar performances, are consistent only among genes with a strong rhythm signal. Rhythmic genes defined with a standard p-value threshold of 0.01 for instance, could include genes whose rhythmicity is biologically irrelevant. Although these results were dependent on the datasets used and the evolutionary distance between the species compared, we call for caution about the results of studies reporting or using large sets of rhythmic genes. Furthermore, given the analysis of the behaviors of the methods on real and randomized data, we recommend using primarily ARS, empJTK, or GeneCycle, which verify expectations of a classical distribution of p-values. Experimental design should also take into account the circumstances under which the methods seem more efficient, such as giving priority to biological replicates over the number of time-points, or to the number of time-points over the quality of the technique (microarray vs RNAseq). GeneCycle, and to a lesser extent empirical-JTK, might be the most robust method when applied to weakly informative datasets. 
Finally, our analyses suggest that rhythmic genes are mainly highly expressed genes.

Author summary

To be active, genes have to be transcribed to RNA. For some genes, the transcription rate follows a circadian rhythm with a periodicity of approximately 24 hours; we call these genes “rhythmic”. In this study, we compared methods designed to detect rhythmic genes in gene expression data. The data are measures of the number of RNA molecules for each gene, given at several time-points, usually spaced 2 to 4 hours, over one or several periods of 24 hours. There are many such methods, but it is not known which ones work best to detect genes whose rhythmic expression is biologically functional. We compared these methods using a reference group of evolutionarily conserved rhythmic genes. We compared data from baboon, mouse, rat, zebrafish, fly, and mosquitoes. Surprisingly, no method was particularly effective. Furthermore, we found that only very strong rhythmic signals were relevant with each method. More precisely, when we use a usual cut-off to define rhythmic genes, the group of genes considered as rhythmic contains many genes whose rhythmicity cannot be confirmed to be biologically relevant. We also show that rhythmic genes mainly contain highly expressed genes. Finally, based on our results, we provide recommendations on which methods to use and how, and suggestions for future experimental designs.


This is a PLOS Computational Biology Benchmarking paper.

Introduction

The nycthemeral transcriptome is characterized by the set of genes that display a rhythmic change in their mRNA levels with a periodicity of 24 hours. These include, but are not limited to, circadian genes whose rhythm is endogenous and entrainable. In baboon, 82% of protein-coding genes have been reported to be rhythmic in at least one tissue [1]. The nycthemeral rhythmicity of these transcripts can be driven by the internal oscillator clock or by other circadian inputs such as food-intake, the light-dark cycle, sleep-wake behavior, or social activities. Moreover, the nycthemeral transcriptome is tissue-specific [2, 3]. Given the importance of biological rhythms in understanding biology and medicine, many algorithms have been proposed to detect such rhythms. Some were developed specifically for biological data, while others were adapted from other fields where periodicity is important, such as Lomb Scargle (LS). Most methods, called time-domain methods, are based on non-parametric models that search for reference patterns, classically sinusoidal, while some are frequency-domain methods based on spectral analysis [4]. Some of them have been designed to detect more diverse waveforms, including asymmetric patterns, such as RAIN [5] or empirical_JTK (empJTK) [6]. For instance, RAIN outperformed the original JTK_CYCLE algorithm for simulated data consisting of sinusoidal and ramp waveforms [6]. Thus, methods differ in the conception of their algorithm and in how they take into account features of the dataset such as curve shapes, period, noise level, presence of missing data, phase shifts, sampling rates [7], asymmetry of the waveform, or the number of cycles (total period length of the experiment). Each method has in principle different strengths and weaknesses for some features of the dataset. In Arabidopsis, HAYSTACK identified 45% more cycling transcripts than COSOPT, mainly due to the inclusion of a ‘spike’ pattern in its model [8]. Deckard et al. [7] studied how four methods (LS, JTK_CYCLE, de Lichtenberg, and persistent homology) performed across a variety of organisms and periodic processes. Based on synthetic data, they investigated the algorithms’ ability to distinguish periodic from non-periodic profiles, to recover period, phase and amplitude, and they evaluated their performance for different signal shapes, noise levels, and sampling rates. They proposed a decision tree to recommend one of these four algorithms based on these features of datasets [7].

The performance of algorithms to identify such periodic signals has so far been assessed on synthetic (i.e., simulated) data, or on benchmark sets of known cycling genes. Hughes et al. [9] recently published guidelines for the analysis of biological rhythms and proposed a web-based application (CircaInSilico) for generating synthetic genome biology data to benchmark statistical algorithms for studying biological rhythms. While such benchmarks are useful to explore the behavior of methods in a set of cases, the applicability of results to real data is limited. For example, simulations need to impose an a priori fluctuation pattern, typically cosine. The fluctuation of transcript abundance of core clock genes does seem to follow a cosine shape [10], but sometimes follows non-sinusoidal periodic patterns in mouse liver (e.g., Nr1d1 or Arntl) [11] (based on the data from [12]). The fluctuations of the nycthemeral transcriptome are entrained by a complex network involving external cues [13–16], as simplified in Fig 1a, which might yield non-sinusoidal periodic patterns among rhythmic genes even if circadian genes were sinusoidal. But the biological relevance of these waveforms is still not clear. This raises two issues: benchmarks based on simulations are biased towards methods that detect the same types of patterns as those simulated; and when an algorithm detects more rhythmic genes, these could be additional true positives or additional false positives. Releasing pattern constraints increases the number of genes detected as rhythmic, but is not necessarily informative on the capacity of the algorithm to detect genes whose rhythmicity is biologically relevant.

Fig 1. The nycthemeral transcriptome is the group of genes whose mRNAs have periodic variations with a 24h period, called rhythmic genes. To detect these rhythmic genes, we applied seven methods to time-series datasets, which produced different density distributions of p-values.

a) Simplified diagram of the entrainment of nycthemeral gene expression. Environmental cues include the light-dark cycle, food-intake, sleep-wake behavior, social activities, or any other 24h periodic event. b) Density distribution of p-values obtained before (raw) and after the default correction (software) for the seven methods applied to mouse liver data (microarray) sub-categorized in: i. randomized data which represents the null hypothesis; ii. randomized data restricted to the first and fourth quartiles of the median gene expression level, to check for the impact of expression level under the null; iii. the full original dataset; iv. the first and fourth quartiles of the median gene expression level of the original data; and v. a subset of known cycling genes (99 genes from KEGG “circadian entrainment” among which we expect a large proportion of rhythmic mRNA accumulation). The default p-values of ARS, GeneCycle, and LS are uncorrected. Mouse image credit to Anthony Caravaggi (license CC BY-NC-SA 3.0).

Using real data and randomization tests, we compared seven methods. JTK_CYCLE (JTK) [17], LS [18, 19], ARSER (ARS) [4], and meta2d (Fisher integration of LS, ARS, and JTK) are frequently used and are all included within the MetaCycle R package [20]. We also included empirical_JTK (empJTK) [6] and RAIN [5], which have been recently developed to deal with non-cosine patterns and with asymmetric waveforms. empJTK and RAIN aim to improve the original JTK algorithm, which assumed that any underlying rhythms have symmetric waveforms (more precisely, only the waveform coded into the JTK algorithm will be detected, which is the sine curve by default) [5]. Finally, robust.spectrum [21] extends a robust rank-based spectral estimator to the detection of periodic signals. It is integrated in the GeneCycle R package [22] and called GeneCycle in this paper. We excluded de Lichtenberg [23], Persistent Homology [24], COSOPT [25], Fisher’s G test [26], MAPES, Capon, and other algorithms for reasons such as i. difficult accessibility of the software, which limits their use by researchers, ii. their higher sensitivity to certain features of the data such as the sampling density, the number of replicates and/or periods, noise level, and waveform, iii. their weaker efficiency on simulated data or known cycling genes, or iv. their previously reported poorer detection of non-sinusoidal periodic patterns [4, 6, 7, 27–29]. We first analysed the behavior of these seven methods applied to a variety of real datasets in animals, and within each dataset, we compared results between representative gene subsets such as highly and lowly expressed genes, known cycling genes, and randomized data. Contrary to real data, randomized data is not expected to show any signal of rhythmicity, which we used to test proper statistical behavior under the null hypothesis.
Secondly, as function tends to be conserved between orthologs [30], true rhythmic genes are expected to be enriched in orthologs that are themselves rhythmic in other species. Indeed, evolutionary conservation provides a valuable filter through which to highlight functional biological networks, notably for clock-controlled functions [31]. The biological relevance of rhythmic genes is expected to be higher for rhythmic orthologs. An unknown proportion of the genes reported as rhythmic but not conserved will be true positives, whose rhythmicity evolved recently in one species or was lost in the other. This would only be a problem if a method somehow favoured these non-conserved genes while reporting true positives; we do not see any reason to expect such a behavior. On the other hand, errors in the prediction of rhythmicity by each method are not expected to be conserved between orthologs. Rather than benchmarking rhythm detection methods based on a profile, we used the biological relevance of genes detected as rhythmic. Notably, we considered that, among orthologs, those which conserved their rhythmic expression can form a suitable reference group of rhythmic genes. Thus, the best methods are expected to report rhythmic genes with a high proportion of rhythmic orthologs. We used this approach to compare the algorithms based on their ability to capture biologically relevant evolutionary conservation signal within nycthemeral genes, and compared them to a Naive method.

Results

We used gene expression time-series datasets that come from circadian experiments and kept the data from healthy, wild-type individuals for seven species (S1 Table), allowing comparisons among vertebrates and among insects. We benchmarked methods on animal data since organ homology allowed us to compare datasets for which we expect conservation of functional patterns (tissue-specific rhythms). For readability, we present vertebrate results in the main figures and insect results in supplementary results (S6 File). Apart from the rat and Anopheles datasets, data with several biological replicates were obtained already normalized over replicates (one value per time-point).

We define a rhythmic gene as a gene which displays a nycthemeral change in its mRNA abundance, i.e. occurring over 24 hours and repeated every 24 hours. All these rhythmic genes represent the nycthemeral transcriptome. Different organs have been reported to have transcriptomes which are more or less rhythmic [2]. The rhythmic expression of these genes can be entrained directly by the internal clock or indirectly by external inputs, such as the light-dark cycle or food-intake [13–16] (Fig 1a). We consider the entirety of these rhythms to be a biologically relevant signal to detect. That is why we preferred data from light-dark and ad-libitum experimental conditions whenever possible (S1 Table), as providing a better representation of wild conditions.

Some methods are distinguished by their higher sensitivity to alternative patterns such as peak, box, or asymmetric profiles. A visual inspection of the KEGG “Circadian entrainment” gene set (see Methods) provides indeed informal confirmation that such patterns can be observed among known cycling genes, such as Npas2, Nr1d1, or Bhlhe41 (S1 File).

Analysis of statistical behaviors of methods applied to real data

p-values distribution analysis

First, a good method should produce a uniform distribution of p-values when there is no structure in the data, in contrast to the distribution obtained from empirical data, which is expected to be skewed towards low p-values because of the presence of rhythmic genes. We investigated the properties of the different methods applied to randomized vs real data. We also investigated to what extent the density distribution of p-values of each method was affected by gene expression levels. Indeed, higher expression provides more power for detecting rhythmic patterns (highly expressed genes have more chance to shape rhythmic patterns because the variations of expression levels are relative to the general expression level), but this should not be the main driver of results. That is, a method to detect rhythmicity should not essentially be reporting high expression levels. Even if true rhythmic genes were enriched in highly expressed genes, we expect a good method to report both high and low p-values at each expression level.

Fig 1b shows the density distribution of raw p-values obtained for the seven methods applied to mouse liver data (microarray) sub-categorized in: i. randomized data which represents the null hypothesis; ii. randomized data restricted to the first and fourth quartiles of the median gene expression level, to check for the impact of expression level under the null; iii. the full original dataset; iv. the first and fourth quartiles of the median gene expression level of the original data; and v. a subset of known cycling genes (8 to 99 genes according to species, see Methods). Results from the other datasets are provided in S2 and S3 Files. Surprisingly, only ARS and GeneCycle displayed close to the expected uniform raw p-value distribution for randomized data (Fig 1b). The default adjustment of empJTK (minimum of the p-value calculated from an empirical null distribution, and of Bonferroni) recovered the expected uniform distribution, suggesting that this correction allows recovering proper p-values (Fig 1b). For calls, we used the “p-values” output by each software, which we call “default p-values”. In some software, these values result from an internal p-value adjustment, so we also analysed “raw p-values” (uncorrected, see Methods and Table 1 for JTK). For ARS, GeneCycle, and LS, the default p-values are uncorrected. Under the null hypothesis, LS has an abnormal peak near p-value = 1 (Fig 1b), implying an issue with its definition of the null hypothesis, or maybe a one-sided test when a two-sided test would be appropriate. The three other algorithms (RAIN, JTK, and meta2d) seem to have issues with false positives, displaying large proportions of low p-values even for randomized data. This issue was also recently reported by Hutchison and Dinner [32], who in addition showed that a combined method, such as meta2d which integrates results from ARS, JTK, and LS, under-performs the individual methods for low p-values [32].
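As an illustration of the uniformity check described above, the following minimal Python sketch (not the code used in this study) measures how far a set of p-values departs from the Uniform(0, 1) distribution expected under the null hypothesis. It uses the one-sample Kolmogorov-Smirnov distance together with the fraction of p-values below a threshold. The function names `ks_distance_from_uniform` and `fraction_below` are hypothetical, and the choice of these two diagnostics is an assumption made for illustration.

```python
def ks_distance_from_uniform(pvals):
    """One-sample Kolmogorov-Smirnov distance D between the empirical
    distribution of the p-values and Uniform(0, 1)."""
    xs = sorted(pvals)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # The empirical CDF jumps from i/n to (i+1)/n at x,
        # while the Uniform(0, 1) CDF at x is simply x.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


def fraction_below(pvals, alpha):
    """Fraction of p-values at or below alpha (about alpha under a proper null)."""
    return sum(p <= alpha for p in pvals) / len(pvals)


# Toy data: an evenly spread set of p-values versus one skewed towards zero.
n = 100
uniform_like = [(i + 0.5) / n for i in range(n)]
skewed = [p ** 3 for p in uniform_like]

print(ks_distance_from_uniform(uniform_like))  # small distance
print(ks_distance_from_uniform(skewed))        # large distance
print(fraction_below(skewed, 0.01))            # far more than 1% of calls
```

On randomized data, a well-behaved method would be expected to give a small distance and a fraction of calls close to the chosen threshold; an excess of small p-values, as in the skewed toy example, is the signature of the false-positive issue discussed above.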

Table 1. Raw, default, and BH.Q in JTK algorithm.
JTK | Description | R
raw p-value | No correction | -
default p-value | Bonferroni correction of raw p-values | p.adjust(raw.pvals, method="bonf")
BH.Q (this paper) | Benjamini-Hochberg correction of raw p-values | p.adjust(raw.pvals, method="BH")
BH.Q (software) | Benjamini-Hochberg correction of default p-values | p.adjust(default.pvals, method="BH")
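To make the distinction between the rows of Table 1 concrete, the following Python sketch re-implements the two standard corrections (Bonferroni and Benjamini-Hochberg) that the R calls in the table refer to. It is an illustrative re-implementation under the usual textbook definitions, not the JTK source code; the function names and the toy p-values are hypothetical.

```python
def bonferroni(pvals):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    n = len(pvals)
    return [min(1.0, p * n) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    n = len(pvals)
    # Visit p-values from largest to smallest so that a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]            # toy raw p-values
default = bonferroni(raw)                  # "default p-value" row
bhq_paper = benjamini_hochberg(raw)        # "BH.Q (this paper)" row
bhq_software = benjamini_hochberg(default) # "BH.Q (software)" row
print(default, bhq_paper, bhq_software)
```

In this toy example, applying Benjamini-Hochberg on top of Bonferroni-adjusted values gives larger values than applying it to the raw p-values, which is why the two BH.Q rows of Table 1 are not interchangeable.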

Before analysing the impact of expression levels, we checked that the data follow a typical bimodal density distribution of gene expression (S1 File) and that using the median of time-points for gene expression gives similar results to using the minimum or the mean value (S1 File). Unsurprisingly, higher expression levels imply a higher power to detect rhythmic patterns (S1 File). The p-values distributions imply that most methods detect almost all highly expressed genes, and almost no lowly expressed genes, as “rhythmic” (Fig 1b). The normalization of gene expression values (Z-score transformation) did not change the p-values distributions within highly expressed genes, and particularly did not recover rhythmicity within lowly expressed genes (S1 File). This was not due to sampling biases of microarray data since results are consistent with RNAseq data (S1 File). Thus, the differences obtained between highly and lowly expressed genes either reflect true biology or a lower signal to noise ratio in lowly expressed genes. We think that (i) a method which is able to detect at least some lowly expressed genes as rhythmic is preferable, and (ii) a method should not detect almost all highly expressed genes as rhythmic. Overall, ARS, empJTK, and GeneCycle had the best behavior, producing a uniform distribution under the null hypothesis, and a skew towards low p-values for all empirical data.
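The expression-level stratification and the Z-score transformation mentioned above can be sketched as follows. This is a minimal illustration only: the helper names are hypothetical, and the use of the population standard deviation and of the default quartile definition of Python's `statistics.quantiles` are assumptions, since the exact conventions are not specified in this section.

```python
from statistics import mean, median, pstdev, quantiles


def z_score(profile):
    """Z-score transform of one gene's time-series (population SD assumed)."""
    mu = mean(profile)
    sd = pstdev(profile)
    if sd == 0:
        return [0.0 for _ in profile]  # flat profile: no variation to scale
    return [(x - mu) / sd for x in profile]


def expression_quartile_groups(expression_by_gene):
    """Split genes into first and fourth quartiles of median expression."""
    medians = {g: median(v) for g, v in expression_by_gene.items()}
    q1, _, q3 = quantiles(list(medians.values()), n=4)
    low = {g for g, m in medians.items() if m <= q1}
    high = {g for g, m in medians.items() if m >= q3}
    return low, high


# Toy expression matrix: eight genes with constant levels 1..8.
toy = {f"g{i}": [float(i)] * 3 for i in range(1, 9)}
low, high = expression_quartile_groups(toy)
print(sorted(low), sorted(high))
print(z_score([1.0, 2.0, 3.0]))
```

Running a rhythm detection method separately on the `low` and `high` groups, before and after `z_score`, would reproduce the kind of comparison described in this paragraph.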

Much more rhythmic signal is detected among genes with high amplitude (S1 File). This does not necessarily imply that the rhythmicity of the low amplitude genes isn’t biologically relevant. From data of the same mouse experiment [2], we observed differences of p-value density distributions between microarray and RNAseq, with the skew towards low p-values less marked for RNAseq data (S1 File). This can be due to the more precise temporal resolution of the microarray time-series dataset, or to differences in the detection of gene expression by RNAseq vs microarrays (Fig 2a). When we restricted the microarray time-series to the same time-points as in the RNAseq series, we obtained a p-value distribution very similar to that of the RNAseq data (Fig 2b). The same time-series restriction applied to known cycling genes produced comparable results (S1 File). This supports a major role of the temporal resolution for method results, relative to a minor role for the difference between RNAseq and microarrays. That is why for the next steps, we only considered the microarray dataset for the mouse.
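The restriction of a densely sampled time-series to the time-points of a sparser one can be sketched as below. The sampling schedule in the example (every 2 hours over 48 hours reduced to every 6 hours) is an assumption chosen only to match the 24 versus 8 time-points reported in Fig 2; the function name is hypothetical.

```python
def restrict_to_time_points(times, values, keep_times):
    """Keep only the measurements taken at the time-points in keep_times."""
    keep = set(keep_times)
    kept = [(t, v) for t, v in zip(times, values) if t in keep]
    return [t for t, _ in kept], [v for _, v in kept]


# Toy series: 24 time-points (every 2h over 48h), reduced to 8 (every 6h).
dense_times = list(range(0, 48, 2))
dense_values = [float(t % 24) for t in dense_times]  # placeholder expression
sparse_times, sparse_values = restrict_to_time_points(
    dense_times, dense_values, range(0, 48, 6)
)
print(len(dense_times), len(sparse_times))
```

Each rhythm detection method can then be re-run on the restricted series, so that differences in p-value distributions reflect the temporal resolution and not the profiling technique.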

Fig 2. Fewer time points per cycle lead to a weaker detection of rhythmic patterns even if the transcriptome profiling quality is better.

a) Bhlhe41, Npas2, and Per1 expression over time from data of the same mouse experiment [2] using two transcriptome profiling techniques: microarray vs RNAseq. The number of time-points with data is 24 for microarray and 8 for RNAseq. b) The restriction of microarray time-series to the same time-points as in the RNAseq series produces similar p-value distributions to those obtained with RNAseq. This supports a major role of the temporal resolution for method results, relative to a minor role for the difference between RNAseq and microarrays. Mouse image credit to Anthony Caravaggi (license CC BY-NC-SA 3.0).

This observation can be generalized to diverse datasets. We see that each method loses efficiency when the number of 24h cycles decreases, or when the number of time-points sampled decreases (Fig 3a). We show results of this comparison only for ARS, GeneCycle, and empJTK because they were the only methods with correct behavior in their p-value distributions (Fig 1b). For the same number of time-points, performance seems better with two cycles than with only one cycle, as shown by comparing zebrafish and baboon data, which both have twelve time-points (Fig 3a). But this observation could be confounded by the comparison of different species or of samples of different quality. ARS performed better with a smaller total number of time-points over two cycles than with more total time-points over a single cycle (mouse RNAseq vs baboon in Fig 3a), indicating that ARS is very dependent on the repetitive nature of profiles. The reduction of the number of time-points of the mouse microarray dataset shows similar effects on the rhythm detection by ARS, GeneCycle, and empJTK (Fig 3b). Of note, GeneCycle showed almost no difference between having a few time-points over two cycles and having more time-points over a single cycle (black arrow Fig 3b).

Fig 3. Datasets with one replicate per time-point over a unique cycle of 24 hours do not provide enough information to detect rhythmicity.

Methods lose in statistical power for detecting rhythmic patterns in gene expression when the number of 24h cycles decreases, or when the number of time-points sampled decreases. a) Default p-value distributions obtained for ARS, GeneCycle, and empJTK applied to different datasets and sub-categorized in: i. randomized data which represents the null hypothesis; ii. randomized data restricted to the first and fourth quartiles of the median gene expression level, to check for the impact of expression level under the null; iii. the full original dataset; iv. the first and fourth quartiles of the median gene expression level of the original data; and v. a subset of known cycling genes (8 to 99 genes according to species, see Methods). For each dataset, the number of time-points with data and the temporal resolution is illustrated around a 24h clock. For the same number of time-points, performance seems better with two cycles than only one cycle (zebrafish vs baboon). b) The reduction of the number of time-points of the mouse liver microarray dataset shows increasingly weak rhythm detection by ARS, GeneCycle, and empJTK, shown by a flattening of the p-value distribution on the full dataset (red arrow). GeneCycle showed no difference between a few time-points over two cycles or more time-points over a single cycle (black arrow). c) Scatter-plots of p-values obtained before and after down-sampling (every 2h over 48h vs. every 2h over 24h) for the full dataset. Each point is a gene. R is the Pearson correlation; p-value < 2.2e–16 in all cases. After down-sampling, the rhythmic signal is retrieved for the same genes. Images credit: Anthony Caravaggi (mouse), Ian Quigley (zebrafish), wikipedia GNU GPL Muhammad Mahdi Karim (baboon), and Public Domain for other images (from PhyloPic).

Overlap between methods

Among genes called rhythmic, we analysed the number of those called in common by the different methods. For p-value thresholds of 0.05 or 0.01, we found a large proportion of genes called rhythmic by only one or few methods (Fig 4a and S1 File which shows the Jaccard index heatmap for mouse liver). Nevertheless, the overlap between all methods was the largest category for the mouse liver data (Fig 4a). Using a very low false positive tolerance with FDR thresholds of 0.5%, all methods except LS overlap largely (S1 File). If we ignore p-value thresholds and consider the first 6000 genes detected rhythmic for each method, the overlap becomes stronger (Fig 4d). We obtained similar results from the most informative dataset (S1 File). Indeed, the rat lung dataset has 36 time-points spread over three 24h cycles (Fig 3a). Thus, the same genes seem to be called rhythmic by all methods but the threshold of significance appears inconsistent. Some methods are expected to produce different p-values because their underlying assumptions are different, i.e. other than sinusoidal for RAIN and empJTK. But the bulk of the methods are designed to find sinusoidal patterns and thus should ideally produce similar p-values, or at least similar ordering of results. Thus, our observations suggest an issue with the significance of p-value thresholds for the methods. While in principle effect size is often more relevant than p-value, these methods are all used in practice to produce p-values, define a threshold, and provide a list of “rhythmic genes”, thus consistency of these p-values is important. With a smaller number of top rhythmic genes, the overlap between methods was weaker (Fig 4c and 4d). Thus the methods agree on a large number of rhythmic genes, but not necessarily on the order of significance among them. Finally, for baboon liver data there was less overlap of methods (Fig 4b; S1 File), which might be due to the low information in that data (Fig 3a).
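The two ways of comparing methods discussed above (calls at a fixed p-value threshold versus the top N genes of each method) can be illustrated with a short Python sketch. The gene names, p-values, and function names are hypothetical toy values, chosen so that two methods rank genes identically while disagreeing at a fixed threshold.

```python
def called_rhythmic(pvals_by_gene, threshold):
    """Genes with a p-value at or below the threshold."""
    return {g for g, p in pvals_by_gene.items() if p <= threshold}


def top_n(pvals_by_gene, n):
    """The n genes with the smallest p-values (ties broken by gene name)."""
    ranked = sorted(pvals_by_gene, key=lambda g: (pvals_by_gene[g], g))
    return set(ranked[:n])


def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy p-values from two hypothetical methods on the same four genes.
method_a = {"g1": 0.001, "g2": 0.02, "g3": 0.2, "g4": 0.8}
method_b = {"g1": 0.0001, "g2": 0.004, "g3": 0.03, "g4": 0.9}

print(jaccard(called_rhythmic(method_a, 0.01), called_rhythmic(method_b, 0.01)))
print(jaccard(top_n(method_a, 2), top_n(method_b, 2)))
```

Here the threshold-based overlap is only partial while the top-2 overlap is complete, which mirrors the situation where methods agree on which genes are most rhythmic but not on what a given p-value means.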

Fig 4. Methods detect the same first top rhythmic genes, but with inconsistencies in the meaning of their p-values.

Upset diagrams show the number of rhythmic genes called in common by the methods. Each intersection is exclusive, i.e. one gene can appear in only one intersection. (a,b) Upset diagram for mouse liver dataset (microarray) (a) and baboon liver dataset (b) for the p-value thresholds of 0.05 (black) or 0.01 (grey) for calling genes rhythmic. The Venn diagram (a) illustrates the upset diagram with, for instance, 2343 genes called rhythmic by all methods. (c,d) Upset diagram for mouse liver dataset (microarray) for the first 1000 (c) or 6000 (d) genes detected rhythmic for each method. With a smaller number of top rhythmic genes, the overlap between methods is weaker. Images credit: Anthony Caravaggi (mouse) and wikipedia GNU GPL Muhammad Mahdi Karim (baboon).

Use of evolutionary conservation as a benchmark

Signal of evolutionary conservation

We expect biologically relevant rhythmic activity of genes to be more conserved between species than putative false positives from detection methods. For each condition (species and tissue), we defined the group of genes whose orthologs are called rhythmic in the homologous tissue of another species (Fig 5a). For example, starting with all mouse genes, we only kept mouse-zebrafish one-to-one orthologs. Considering the liver, these orthologs were separated into two groups: genes for which the ortholog is detected as rhythmic in zebrafish liver, called rhythmic orthologs; and the remaining one-to-one orthologs (Fig 5b). Mouse-zebrafish orthologs that are detected as rhythmic in zebrafish liver were significantly more enriched in small p-values in mouse liver, for all methods (Kolmogorov-Smirnov test p-values < 0.001 with Kolmogorov’s D statistic around 10-15% of maximum deviation, Fig 5d). Similar results were obtained using different methods and/or a different threshold to call orthologs as rhythmic in zebrafish liver (S1 File). This result obtained for distant species (S1 File) shows that the conservation of rhythmicity at the transcriptomic level is informative. Similar results were obtained in other species comparisons (S1 File), with a stronger signal for evolutionarily close species such as mouse and rat (with Kolmogorov’s D statistic around 10-15% of maximum deviation, S1 File), although we found no consistent correlations of the orthologs’ p-values between the rat and the mouse (S1 File). Of note, the comparison of species under different conditions (light-dark versus dark-dark) is a limitation in itself since the overlap of the rhythmic transcriptome between these two conditions has been shown to be low [33–35] (although this interpretation remains limited by the thresholds used).
However, we found a good correlation of p-values obtained between these two conditions in the head of Anopheles gambiae (R=0.605, S1 File) suggesting that this limitation does not hide most of the conserved signal. Thus, for the same homologous organ, rhythmic orthologs have a stronger statistical signal of rhythmicity than non-rhythmic orthologs. We are going to use this evolutionary conservation of the rhythmicity of gene expression in order to compare the performance of methods. We expect that a method which detects more genes with biologically relevant rhythmicity should also detect more conservation of rhythmicity. This is both justified in principle, because evolutionary conservation implies relevance to the functioning of the organism, and in practice, since orthologs of rhythmic genes have smaller p-values (Fig 5d).
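The comparison between rhythmic and non-rhythmic orthologs relies on the Kolmogorov-Smirnov D statistic, i.e. the maximum deviation between two empirical cumulative distributions. The sketch below computes this statistic for two sets of p-values. It is an illustration only (it returns D without the associated test p-value), and the function name and toy values are hypothetical.

```python
from bisect import bisect_right


def ks_two_sample(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest absolute
    difference between the two empirical cumulative distributions."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The maximum deviation is always reached at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Toy p-values in species 1, split by whether the ortholog is called
# rhythmic in species 2.
rhythmic_orthologs = [0.01, 0.02, 0.03, 0.5]
other_orthologs = [0.2, 0.4, 0.6, 0.8]
print(ks_two_sample(rhythmic_orthologs, other_orthologs))
```

A D value of 0.10 to 0.15, as reported above, means that the two cumulative distributions differ by at most 10 to 15 percentage points at any p-value.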

Fig 5. Signal of evolutionary conservation of rhythmic gene expression.

Orthologous genes detected as rhythmic in the same organ of two species have a stronger statistical signal of rhythmicity than those not detected as rhythmic in at least one species. a) Mouse and zebrafish share orthologous genes, some of which are rhythmic in the homologous tissues. b) Method used for ortholog benchmarking, as in panel d: From all mouse genes, only mouse-zebrafish one-to-one orthologs are kept. Considering the liver, these orthologs are separated into two groups: genes for which the ortholog is detected as rhythmic in zebrafish liver, called rhythmic orthologs; and the remaining one-to-one orthologs. c) Chart providing the legends to inform about the method and the threshold used to call genes rhythmic for each condition (species and tissue). d) p-values density distribution of rhythmic orthologs vs non-rhythmic orthologs obtained for the seven methods applied to mouse liver data. Mouse-zebrafish orthologs, that are detected rhythmic in zebrafish liver, are significantly more enriched in small p-values in mouse liver, for all methods (Kolmogorov-Smirnov test p-values < 0.001). Images credit: Anthony Caravaggi (mouse), Ian Quigley (zebrafish).

Only strong rhythmic signals of gene expression are relevant

In this last part, we compared the performances of methods to detect the rhythmic orthologs. For a given dataset, the best method is expected to report rhythmic genes with the highest proportion of rhythmic orthologs. It should be noted that this does not imply that we expect all rhythmic behavior to be conserved between orthologs, but rather that true rhythmic genes should have more rhythmic orthologs than false-positive predictions. For a given p-value threshold, each method detects a certain number of rhythmic genes (genes with p-value under the threshold). At each threshold we calculated the proportion of orthologs rhythmic in species2 among one-to-one species1-species2 orthologs, as defined in Fig 6a. This proportion allows us to assess how well each method detects the conservation of rhythmicity and can be calculated for each p-value threshold. The benchmark set is composed of orthologs detected as rhythmic in the second species, called rhythmic orthologs. To define this set, we chose a rhythmicity detection method among ARS, empJTK, and GeneCycle, in agreement with results of previous sections, and a p-value threshold of 0.01.
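The benchmark proportion described in this paragraph can be sketched in Python as follows. This is a minimal illustration under one stated assumption: every rank in the p-value ordering is treated as its own threshold, with ties broken by gene name. The function name, gene names, and values are hypothetical.

```python
def conservation_curve(pvals_species1, rhythmic_in_species2):
    """For each number of genes called rhythmic in species 1, return the
    proportion of them whose ortholog belongs to the benchmark set.

    pvals_species1: dict gene -> p-value, one-to-one orthologs only.
    rhythmic_in_species2: set of genes whose ortholog is called rhythmic
    in species 2 (the benchmark set).
    """
    ranked = sorted(pvals_species1, key=lambda g: (pvals_species1[g], g))
    curve = []
    hits = 0
    for n_called, gene in enumerate(ranked, start=1):
        if gene in rhythmic_in_species2:
            hits += 1
        curve.append((n_called, hits / n_called))
    return curve


# Toy example with four one-to-one orthologs.
pvals = {"a": 0.001, "b": 0.005, "c": 0.02, "d": 0.3}
benchmark = {"a", "b", "d"}
for n_called, proportion in conservation_curve(pvals, benchmark):
    print(n_called, round(proportion, 3))
```

Plotting the second element against the first for each method gives a curve of the kind used to compare methods: a better method keeps a high proportion of rhythmic orthologs as more genes are called rhythmic.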

Fig 6. Only strong rhythmic signals of gene expression are relevant.


Methods designed for rhythm detection in gene expression show an advantage only for the genes with a strong rhythmic signal, i.e. related to very small p-values. For a fixed number of top genes called rhythmic, all the methods, despite their design differences, retrieve approximately the same proportion of biologically functional rhythmic genes and the same genes themselves. a) Method to obtain figure b: For a given p-value threshold, each method detects a certain number of rhythmic genes (genes with p-value ≤ threshold). At each threshold, we calculate the proportion of orthologs rhythmic in species2 (A) among one-to-one species1-species2 orthologs (B). The benchmark set is composed of one-to-one orthologs detected rhythmic in the second species (using method ARS, GeneCycle, or empJTK), called rhythmic orthologs. b) Variation of the proportion rhythmic orthologs/all orthologs in mouse as a function of the number of mouse orthologs detected rhythmic, for each method applied to the mouse lung dataset. The benchmark gene set is composed of mouse-rat orthologs, detected rhythmic in rat lung by the GeneCycle method with default p-value ≤ 0.01. The black line is the Naive method which orders genes according to their median expression levels (median of time-points), from highest expressed to lowest expressed gene, then, for each gene, calculates the proportion of rhythmic orthologs among those with higher expression. The proportion of the benchmark set among one-to-one orthologs is higher for highly expressed genes (4th quartile) than for lowly expressed genes (1st quartile) (∼60% vs ∼20% respectively). Diamonds correspond to a p-value threshold of 0.01. c) Upset diagram showing the number of rhythmic orthologs (figure a) called in common by the methods among the first 1000 mouse-rat orthologs that are called rhythmic in mouse lung. Images credit: Anthony Caravaggi (mouse) and Public Domain for other images (from PhyloPic).

A risk is that orthologs have conservation of gene expression levels and that there is a bias towards calling highly expressed genes “rhythmic”. To control for this in the benchmarking, we added a “Naive” method based only on expression levels. This Naive method simply orders genes (orthologs here) according to their median expression levels (median of time-points), from highest expressed to lowest expressed gene, then, for each gene, we calculated the proportion of rhythmic orthologs among those with higher expression. We also present results for subsets obtained from the division in four quartiles of expression levels. Fig 6b shows the variation of the proportion defined above as a function of the number of orthologs detected rhythmic, obtained for each method applied to the mouse lung dataset. The benchmark gene set was defined by mouse-rat orthologs, detected rhythmic in rat lung by the GeneCycle method (default p-value ≤ 0.01). Genes are given by order of their detection by the methods. The genes with small p-values, i.e. with a strong signal of rhythmicity, had a high proportion of rhythmic orthologs. Importantly, for all methods, this proportion was higher than that obtained from the Naive method (Fig 6b). Results are consistent in almost all species comparisons, with exceptions for cerebral tissues (S4 File). However, the thresholds of 0.01 are to the right of the intersection between the curves of rhythm detection methods and the Naive, except for LS. This means that, for an apparently reasonable threshold (p-value ≤ 0.01), ranking genes by expression level performed “better” than all methods designed specially for rhythm detection. We made the same observation using an FDR-based threshold (FDR≤ 0.01 or FDR≤ 0.1, S1 File). These results imply that even with a stringent p-value or FDR threshold, such as 0.01, the rhythmic nature of some of the genes considered rhythmic is not relevant. 
These rhythm detection methods were relevant only for genes with a very strong signal of rhythmicity, where they performed better than the Naive method. Finally, for the top 1000 mouse-rat orthologs detected as rhythmic in mouse lung, all the methods reported a similar proportion of rhythmic orthologs, around 62%, mainly highly expressed genes (fourth quartile of gene expression) (Fig 6b). Moreover, most of these orthologs were detected by all methods (Fig 6c). Thus, for genes with a strong signal of rhythmicity, all methods performed similarly in detecting the tissue-specific conservation of gene expression rhythmicity. Similar results were obtained for other species comparisons (S5 File).
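The benchmark curve described above can be sketched as follows. This is a minimal illustration rather than the authors' script: the function names and the toy gene identifiers are hypothetical, and we assume that, at each rank, the proportion is taken among the orthologs called rhythmic so far in species 1.

```python
def rhythmic_ortholog_curve(pvalues, benchmark):
    """Proportion of benchmark (rhythmic-ortholog) genes among the top-n genes.

    pvalues:   dict gene -> p-value in species 1 (one-to-one orthologs only).
    benchmark: set of genes whose ortholog is called rhythmic in species 2.
    Returns a list of (n_called, proportion), genes added by increasing p-value.
    """
    ranked = sorted(pvalues, key=pvalues.get)
    curve = []
    hits = 0
    for n, gene in enumerate(ranked, start=1):
        if gene in benchmark:
            hits += 1
        curve.append((n, hits / n))
    return curve


def proportion_at_threshold(pvalues, benchmark, threshold=0.01):
    """Number of genes called at a p-value threshold and their benchmark proportion."""
    called = [g for g, p in pvalues.items() if p <= threshold]
    if not called:
        return 0, None
    hits = sum(1 for g in called if g in benchmark)
    return len(called), hits / len(called)


# Toy data (hypothetical genes and p-values).
pvals = {"g1": 0.0001, "g2": 0.004, "g3": 0.02, "g4": 0.3, "g5": 0.8}
bench = {"g1", "g2", "g4"}
curve = rhythmic_ortholog_curve(pvals, bench)
at_001 = proportion_at_threshold(pvals, bench, 0.01)
```

Plotting the second element of each pair against the first gives one curve per method; the point returned by `proportion_at_threshold` corresponds to the diamond marking the 0.01 threshold in Fig 6b.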

Discussion

The methods designed for rhythm detection in gene expression perform similarly, and are informative only for strong rhythmic signals. In this study, we show that orthologous genes detected as rhythmic in the same organ of two species have a stronger statistical signal of rhythmicity than those detected as not-rhythmic in at least one species. These results support our hypothesis that the nycthemeral rhythmicity at the gene expression level is biologically functional, and that this functionality is more conserved between orthologous genes than between random genes. We define the nycthemeral transcriptome as all genes displaying a rhythmic expression repeated every 24 hours. In order to assess the performance of seven methods to detect these rhythms, we used this concept of conservation of the rhythmicity between species for benchmarking. We employed genes whose orthologs had a rhythmic expression called in the homologous organ as a proxy for a true positive set, as done in some previous benchmarks. For instance, Rosikiewicz et al. [36] assessed the quality of microarray quality control methods based on evolutionary conservation of expression profiles, and Kryuchkova et al. [37] benchmarked tissue-specificity methods in the same way. This approach based on real data, also used by Boyle et al. [3] to solve the issue of weak overlap between the same tissues from the same species from different experiments, avoids relying on simulations, which tend to favor methods using the same model, e.g. the same patterns, and has the advantage of not being based on specific assumptions other than general evolutionary conservation of function. By taking into account the tissue-specificity of rhythmic gene expression and different species comparisons, we show that no method is strongly favoured. For instance, one would have expected that the added features of RAIN and empJTK, which allow them to detect more diverse patterns than a classical sinusoid, would have favored them.
But this flexibility did not provide them any advantage in the benchmark. Furthermore, the comparison of the methods with a ‘Naive’ one, uninformed about rhythmicity, shows an advantage for informed methods only for the genes with a strong rhythmic signal. Thus, only genes with a strong rhythmic signal, i.e. the top genes called rhythmic, can be considered as relevant. Even if the threshold of “relevance” of these genes is dependent on the evolutionary distance of the species compared, these results call for caution about the results of previous studies reporting or based on large sets of rhythmic genes. For the same number of genes called rhythmic, all the methods, despite their design differences, retrieved approximately the same proportion of biologically functional rhythmic genes (Fig 6b) and the same genes themselves (Fig 6c).

The issue of significance

For the same p-value threshold, the number of genes called rhythmic differs from one method to another, with a large proportion of these genes detected as rhythmic by only one or a few methods. But if we consider the top genes called rhythmic by each method, without taking into account any p-value threshold, the overlap of rhythmic genes between the methods becomes strong (Fig 4c and 4d). This highlights an issue with the meaning of the p-value and the associated thresholds used. This is directly related to the issue of p-value correction, which needs to be improved in this field. When a smaller number of top rhythmic genes is used, the overlap between methods becomes weaker (Fig 4c and 4d). Thus, the order in which genes are called rhythmic differs from one method to another. Finally, since methods performed better than a Naive method only for genes with a strong rhythmic signal, we cannot draw conclusions about the relevance of the other genes called rhythmic, even when they have very low nominal p-values.
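The comparison of top-ranked genes between methods can be illustrated with a short sketch. The method names, genes, and p-values below are hypothetical, and the shared fraction of the top n genes is only one possible overlap measure; it is not necessarily the one used for Fig 4.

```python
from itertools import combinations


def top_genes(pvalues, n):
    """Return the n genes with the smallest p-values for one method."""
    return set(sorted(pvalues, key=pvalues.get)[:n])


def pairwise_top_overlap(results, n):
    """results: dict method -> {gene: p-value}.

    Returns {(method_1, method_2): fraction of the top-n genes shared}.
    """
    tops = {m: top_genes(p, n) for m, p in results.items()}
    return {
        (a, b): len(tops[a] & tops[b]) / n
        for a, b in combinations(sorted(tops), 2)
    }


# Toy results for two hypothetical methods that rank the same genes differently.
results = {
    "method_A": {"g1": 0.001, "g2": 0.002, "g3": 0.5, "g4": 0.9},
    "method_B": {"g1": 0.04, "g2": 0.03, "g3": 0.02, "g4": 0.8},
}
overlap_top2 = pairwise_top_overlap(results, 2)
overlap_top3 = pairwise_top_overlap(results, 3)
```

In this toy case the two methods share half of their top 2 genes but all of their top 3, which mirrors the observation that overlap weakens when fewer top genes are considered.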

ARS, empJTK, and GeneCycle produce consistent p-values

ARS, empJTK, and GeneCycle were the methods that showed the best behavior on real and randomized data (single species tests). They were the only methods displaying both a uniform distribution of their p-values under the null hypothesis, and a left-skewed distribution when applied to real data. For empJTK, these expected results were obtained with its default correction. However, these three methods are conceptually completely different, which indicates that no single conceptual framework dominates rhythmic gene detection. ARS combines time-domain and frequency-domain analyses. GeneCycle, by which we mean the robust.spectrum function of the GeneCycle R package, is based on a robust spectral estimator which is incorporated into the hypothesis testing framework using a so-called g-statistic together with correction for multiple testing. Finally, empJTK improves on the original JTK by including additional reference waveforms in its rhythm detection. The other methods all presented major issues. LS has a right-skewed distribution of its initial uncorrected p-values, suggesting an invalid null hypothesis. JTK, RAIN, and meta2d also had issues with their null hypothesis, displaying left-skewed distributions of their uncorrected p-values. Their default adjustment was excessive, favoring high p-values after correction. Hutchison and Dinner [32] observed this on simulated data, and proposed that it was due to non-independence of measurements from the same time series.
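One way to quantify how far a set of null p-values departs from the uniform distribution is the one-sample Kolmogorov-Smirnov statistic against Uniform(0, 1). The sketch below is only an illustration under that assumption: the text does not state that uniformity was assessed this way, and the function name is ours.

```python
def ks_statistic_uniform(pvalues):
    """One-sample Kolmogorov-Smirnov statistic against Uniform(0, 1).

    Returns the largest gap between the empirical CDF of the p-values and
    the uniform CDF F(x) = x. Values near 0 indicate a near-uniform sample.
    """
    xs = sorted(pvalues)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # Compare F(x) = x with the empirical CDF just after and just before x.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


# Evenly spread p-values, as expected for randomized data.
uniform_like = [(i - 0.5) / 10 for i in range(1, 11)]
# p-values piled up near zero, as expected for strongly rhythmic genes.
enriched_small = [0.001] * 10

d_uniform = ks_statistic_uniform(uniform_like)
d_enriched = ks_statistic_uniform(enriched_small)
```

A small statistic on randomized data together with a large one on real data corresponds to the behavior described for ARS, empJTK, and GeneCycle.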

Biological insight into gene rhythmicity in animal tissues

Our results support the hypothesis that rhythmic genes are largely enriched in highly expressed genes (Table 2). Experimental noise that would mask the rhythmic signal of lowly expressed genes could also explain this result in part, especially considering that the datasets with good sampling used microarray technology. BooteJTK compares the noise to the amplitude of a time series, in addition to evaluating the rank order of the values, and thus might provide a more relevant rhythm detection by improving the variance estimation from biological replicates [38]. The observation of known cycling genes in different organs seems to indicate different profiles of rhythmicity possible for the same gene. For instance, Npas2 displays a cosine shape in kidney and lung, and a peak/box shape in liver and muscle (Fig 2a). This observation suggests that methods might perform differently depending on the organ studied. This is also one of the reasons why all our analyses were made for homologous organs.

Table 2. t-test comparing the expression levels between rhythmic (p-value ≤ 0.005) and non-rhythmic genes (randomly chosen same number of genes among those with p-value > 0.01), in mouse liver dataset (microarray).

Method     Group         n     Mean expression  t     df      p
ARS        rhythmic      4019  1151.2           25.4  7021.4  <2.2e-16
ARS        non-rhythmic  4019  383.6
meta2d     rhythmic      4520  1113.2           24.8  8050.1  <2.2e-16
meta2d     non-rhythmic  4520  398.5
empJTK     rhythmic      3373  1113.9           20.1  6260.8  <2.2e-16
empJTK     non-rhythmic  3373  442.5
RAIN       rhythmic      5044  1066.3           25.4  8935.3  <2.2e-16
RAIN       non-rhythmic  5044  384.2
JTK        rhythmic      2646  1214.4           19.6  4742.9  <2.2e-16
JTK        non-rhythmic  2646  454.3
LS         rhythmic      736   1500.3           12.6  1351.2  <2.2e-16
LS         non-rhythmic  736   526.5
GeneCycle  rhythmic      4145  1082.8           22.0  7622.8  <2.2e-16
GeneCycle  non-rhythmic  4145  425.6
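The non-integer degrees of freedom in Table 2 are what one would expect from Welch's unequal-variance t-test, which is the default of R's t.test; that this variant was used is our inference, not something stated in the text. A minimal sketch of the statistic and of the Welch-Satterthwaite degrees of freedom, on made-up values, is:

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: sequences of expression values for the two groups.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy values only; Table 2 reports group means and sizes, not variances,
# so the table cannot be recomputed from the published summary alone.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(group_a, group_b)
```

The degrees of freedom fall between the smaller group size minus one and the pooled value, which is why they are generally not whole numbers.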

In mouse-baboon comparisons, there were no significant differences of p-value density between rhythmic and non-rhythmic orthologs in cerebral tissues: brain stem, cerebellum, and supra-chiasmatic nucleus, except for the hypothalamus (S4 File). This could be explained by the fact that there are only low amplitudes of expression of clock genes and few rhythmic genes in almost all brain regions. This is assumed to be due to an inefficient synchronization of individual cellular oscillators in brain cells to avoid noise into the synchronizator element [39]. In addition, it could also be an essential aspect for intrinsic brain processes which could require a constant expression of most genes.

The importance of having an informative dataset

Because of the cost and complexity of circadian experiments, time-series datasets of gene expression in animals are rare, especially in the same experimental conditions. Algorithms must be able to deal with little data, but importantly experiments should take into account the algorithms’ sensitivity. All algorithms appeared to produce relatively poor p-value distributions when applied to the available Drosophila or baboon datasets, and, for the baboon dataset, were almost always less efficient than the Naive method (S1 File). This baboon dataset is probably not very informative, which raises questions about the biological conclusions from the associated study [1]. With only one replicate per time-point, over only one cycle of 24 hours, the algorithms are unable to detect repetitive patterns. Variations over a single 24-hour cycle appear to be insufficient to detect rhythmic signal when there is no evidence of repetition over several cycles. Moreover, each data point comes from a different outbred individual. The variations of gene expression between two time-points can be due to individual variations or to real oscillation within the population. It is possible that sinusoidal patterns with a continuous trend over successive time-points could be detected without replicates, although power will be lacking, but patterns such as the peak pattern will be extremely sensitive to inter-individual variation. Fig 3 generally suggests that datasets with one replicate per time-point over a single cycle of 24 hours do not provide enough information to correctly detect rhythmicity. It seems that ARS in particular is very sensitive to the repetitive nature of profiles. Of note, for time-series with low sampling frequency, a recent improvement of empJTK, called BooteJTK, makes it possible to detect rhythms robustly relative to sampling frequency [38]. Thus, if only one 24h cycle is feasible, several biological replicates must be favored.
Our results support the conclusions of Hutchison et al. [6], who indicate that for a fixed number of samples, better sensitivity and specificity are achieved with higher numbers of replicates than with higher sampling density. We propose that future experiments should produce data with two biological replicates per time-point as a strict minimum. We also suggest considering biological replicates as new cycles with one replicate, as proposed in recent guidelines [9]. GeneCycle, and to a lesser extent empJTK, were the most robust methods when applied to weakly informative datasets. Thus, the performance of the algorithms is dependent on the techniques and experimental designs used for the experiments. The optimization of experimental plans (see section Recommendations) could improve the methods’ performance for the detection of rhythmically expressed genes. Moreover, we recommend producing data over at least two cycles to be sure of the repetitive nature of profiles, and to avoid a potential random influence of the shared environment, which might be considered rhythmic since it affects all replicates. Finally, contrary to the mouse experiment, the rat experiment was done under zeitgeber conditions, which have most likely resulted in more genes being expressed rhythmically, and so, in proportion, more periodic patterns. This might explain the higher density of small p-values obtained for the rat dataset (Fig 3a). Comparison between these two datasets is not expected to have removed the signal, since we found a good correlation of p-values obtained between two conditions, light-dark versus dark-dark, in data produced from the same experiment (S1 File).

Limitations and improvement of methods

ARS and GeneCycle need complete chronological data and cannot deal with biological replicates. Except for LS, RAIN, and empJTK, all other methods studied here assume equally spaced time-points. Furthermore, ARS needs an integer sampling interval with regular time-series datasets and cannot deal with missing values, or with several replicates per time-point. In this study, ARS appeared to be efficient only for the dataset with at least two cycles of data. Indeed, it produced aberrant p-value distributions when applied to datasets restricted to one cycle of 24 hours. But, for these datasets, all algorithms behaved poorly. The improvement of JTK by empJTK produced much better results than the original JTK algorithm. It is possible that the improvement of RAIN suggested by Hutchison and Dinner [32], which produces a uniform p-value distribution under the null, might similarly improve the results of RAIN. We believe that LS could be a very interesting method if its null hypothesis could be clarified so that it provides p-values with proper behavior. LS has advantages that other algorithms don’t. For instance, it can deal with irregular intervals and missing data, and has been shown to stay efficient on small sample sizes [27], which constitute one of the big issues of circadian transcriptomic data. On the other hand, relative to JTK, ARS, or MICOP methods, LS has also been shown to be highly sensitive to the increase of sampling intervals and to noise for proteomic data [40].

A good method must, at least, display a uniform distribution under the null hypothesis, and a classic skewed distribution when applied to full dataset or even more to known cycling genes. It should also be able to detect efficiently rhythmic orthologs, which represent an important part of the functionally relevant nycthemeral rhythmicity. In this study, we did not assess the amplitudes, phases, and precise period provided by the algorithms. We only analysed the performance of methods for nycthemeral or circadian rhythms in gene expression data, and cannot conclude directly for ultradian or seasonal rhythms, and for other types of datasets which are not gene expression data.

Recommendations

Experimental design

  1. Always use at least 2 biological replicates per time-point.

  2. One full period sampled is the minimum required. Two periods are to be preferred.

  3. Favor the number of time-points (finer temporal resolution) over transcriptome profiling quality (e.g., microarray vs RNAseq).

  4. Favor regular sampling because only few algorithms can deal with irregular interval time-series.

  5. For a fixed number of samples, favor higher numbers of replicates over higher sampling density (see also [6]).

Recommendations about the choice of rhythm detection method, the arrangement of the time-series dataset, and the interpretation of results based on these seven methods studied

  1. Only genes with a strong rhythmic signal should be considered as relevant. By “strong” we mean the top genes called rhythmic, knowing that the threshold of p-value ≤ 0.01 is already not stringent enough for some methods.

  2. Take into account that detected rhythmic genes are strongly enriched in highly expressed genes.

  3. LS could be a good candidate to improve.

  4. Favor ARS, GeneCycle, or empJTK with default parameters.

  5. Consider biological replicates as new cycles with one replicate.

  6. Check by eye for rhythms of known circadian genes.

  7. Never duplicate and concatenate data before running algorithms [9].

  8. Never consider technical replicates as biological replicates [9].
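The recommendation to treat biological replicates as new cycles can be sketched as a simple rearrangement of the time-series. This is our reading of the guideline, assuming that the sampled time-points cover exactly one 24-hour cycle; the function name and the toy values are hypothetical.

```python
def replicates_as_cycles(timepoints, replicates, period=24):
    """Lay out biological replicates as successive cycles of one series.

    timepoints: sampling times in hours within a single cycle.
    replicates: list of lists; replicates[k][j] is the expression value of
                replicate k at timepoints[j].
    Replicate k is shifted by k * period hours. Each measurement appears
    exactly once, so no data are duplicated or concatenated with themselves.
    """
    times, values = [], []
    for k, rep in enumerate(replicates):
        if len(rep) != len(timepoints):
            raise ValueError("each replicate needs one value per time-point")
        for t, v in zip(timepoints, rep):
            times.append(t + k * period)
            values.append(v)
    return times, values


# Two biological replicates sampled every 6 hours over one day (toy values).
times, values = replicates_as_cycles(
    [0, 6, 12, 18],
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
)
```

The resulting series has one value per time-point over two cycles, a layout that methods unable to handle several replicates per time-point can accept.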

Methods

Pre-processing

For each time-series dataset, only protein coding genes were kept. For microarrays, we removed ProbIDs which were assigned to several GeneIDs. ProbIDs or genes which contained one or several missing values were removed, allowing comparison between all methods, even those which cannot deal with missing values. Genes with no expression (= 0) at all time-points were also removed. For each species dataset, we only kept conditions comparable to those of the other species of reference. Tissues separated into sub-tissues, such as adrenal gland into adrenal cortex and adrenal medulla in the baboon experiment, were removed.

For each condition (species and tissue), several datasets have been built: i. the full original dataset; ii. the first and fourth quartiles of the median gene expression level of the original data; iii. randomized data (time-points redistributed randomly); iv. randomized data restricted to the first and fourth quartiles of the median gene expression level; and v. a subset of known cycling genes when such data was available (8 to 99 genes according to species).
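Two of the derived datasets listed above, the randomized data and the expression-quartile subsets, can be sketched as follows. Two points are assumptions on our part: that time-points are shuffled independently for each gene, and that quartile boundaries are computed on the per-gene median expression with Python's default quantile method. All names are hypothetical.

```python
import random
from statistics import median, quantiles


def randomize_timepoints(series, rng):
    """Return a copy of one gene's time-series with time-points shuffled."""
    shuffled = list(series)
    rng.shuffle(shuffled)
    return shuffled


def expression_quartile_subsets(expression):
    """Split genes into first (lowest) and fourth (highest) expression quartiles.

    expression: dict gene -> list of expression values over time-points.
    """
    medians = {g: median(v) for g, v in expression.items()}
    q1, _, q3 = quantiles(list(medians.values()), n=4)
    first = {g for g, m in medians.items() if m <= q1}
    fourth = {g for g, m in medians.items() if m >= q3}
    return first, fourth


# Toy dataset: eight genes with median expression 1 to 8.
expression = {f"g{k}": [k - 0.5, k, k + 0.5] for k in range(1, 9)}
rng = random.Random(0)  # fixed seed so the example is reproducible
randomized = {g: randomize_timepoints(v, rng) for g, v in expression.items()}
first_quartile, fourth_quartile = expression_quartile_subsets(expression)
```

Shuffling keeps each gene's expression values but destroys their temporal order, which is what makes the randomized data a stand-in for the null hypothesis.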

Normalization by Z-score

The normalization of gene expression values by Z-score transforms the pre-processed data such that, for gene i with original expression value gene.ij at time-point j, we have:

gene.ij.normalized = gene.ij - x_i

with xi=miZij, where m_i is the mean expression of gene i over time-points j, m_i = mean_j(gene.ij), and Z_i = (m_i - m) / sd, m and sd being the mean and the standard deviation of the original full dataset.

Orthology relationships

For each species comparison, orthologs relationships have been downloaded from OMA [41]. For simplicity, we only considered one-to-one orthologs. In species comparisons, we only kept orthologous genes that had available data in both species.
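The filtering described above can be sketched as follows, assuming the orthology relationships are available as a list of (species 1 gene, species 2 gene) pairs. The input layout and the function name are hypothetical; OMA also annotates relationship types directly, which would make the counting step unnecessary.

```python
from collections import Counter


def one_to_one_orthologs(pairs, genes_species1, genes_species2):
    """Keep one-to-one ortholog pairs with expression data in both species.

    pairs:          list of (gene_in_species1, gene_in_species2) tuples.
    genes_species1: set of species 1 genes present in the expression dataset.
    genes_species2: set of species 2 genes present in the expression dataset.
    """
    count1 = Counter(a for a, _ in pairs)
    count2 = Counter(b for _, b in pairs)
    return [
        (a, b)
        for a, b in pairs
        if count1[a] == 1 and count2[b] == 1
        and a in genes_species1 and b in genes_species2
    ]


# Toy relationships: m2 has two orthologs, so both of its pairs are dropped;
# m4 has no expression data in species 1, so its pair is dropped as well.
pairs = [("m1", "z1"), ("m2", "z2"), ("m2", "z3"), ("m3", "z4"), ("m4", "z5")]
kept = one_to_one_orthologs(pairs, {"m1", "m2", "m3"}, {"z1", "z2", "z4", "z5"})
```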

Algorithms and packages

The MetaCycle R package was run with parameters minper = 20h and maxper = 28h. This package incorporates three algorithms to detect rhythmic signals from time-series datasets: ARSER (ARS), JTK_CYCLE (JTK), and Lomb-Scargle (LS). It also provides meta2d, which integrates analysis results from multiple methods based on an implementation strategy (see “Introduction to implementation steps of MetaCycle” in the MetaCycle documentation for more details). ARS does not deal with several replicates per time-point. To avoid introducing biases, we kept only one replicate when running ARS on datasets provided with several replicates per time-point.

The rain R package was run with parameters period = 24h, period.delta = 4h (width of period interval), and method = ‘independent’. In order to obtain unadjusted p-values as output, we modified the source code of the rain and MetaCycle R packages.

Empirical-JTK (empJTK) was executed by running bash commands with parameters: cosine waveform, period of 24 hours, phases searched every 2 hours from 0 to 22 hours, and asymmetries searched every 2 hours from 2 to 22 hours (GitHub alanlhutchison/empirical-JTK_CYCLE-with-asymmetry). It is important to run empJTK with python version 2.7.11. The raw p-value corresponds to the P output (p-value corresponding to Tau, uncorrected for multiple hypothesis testing), and the default p-value corresponds to the empP output (min(p-value calculated from empirical null distribution, Bonferroni)).

The GeneCycle R package [22] was downloaded from CRAN. We used the robust.spectrum function developed by [21], which computes a robust rank-based estimate of the periodogram/correlogram.

Plots have been created using ggplot2 R package (version 3.1.0); Upset diagrams using UpSetR R package (version 1.3.3) [42]; and Venn diagram using venn R package (version 1.7).

Statistical analysis of rhythmic gene expression

All the rhythm detection methods (See Materials) were applied to each pre-processed dataset, producing a list of p-values as output. Then, for each gene having several results (ProbIDs or transcripts), we combined p-values by Brown’s method using the EmpiricalBrownsMethod R package. Thus, for each dataset, we obtained a unique p-value per gene. Whenever the per-gene normalization was not necessary (unique data for all genes), we obtained the original p-value for each gene. FDR is the false discovery rate adjustment of default p-values using p.adjust R function.
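The text does not say which adjustment method was passed to p.adjust. Assuming the Benjamini-Hochberg procedure, which is the usual choice for a false discovery rate, the adjustment can be sketched as below; the combination of per-probe p-values by Brown's method is not reproduced here.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each p-value is multiplied by m / rank (rank 1 = smallest), then a running
    minimum from the largest rank downwards keeps the result monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy per-gene p-values (hypothetical).
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
```

Genes would then be called rhythmic when their adjusted value is at or below the chosen FDR threshold, for example 0.01 or 0.1 as in S1 File.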

Naive method

The Naive method is only based on expression levels of genes and is not informed about rhythm detection. It simply orders genes according to their median expression levels (median of time-points), from highest expressed to lowest expressed gene. Then, for each gene i, we calculate the proportion of rhythmic orthologs among those with higher expression, i.e. among the genes from the highest expressed one to the gene i.
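The Naive method as described above can be sketched in a few lines. The function name and the toy data are hypothetical; the benchmark set is the set of genes whose ortholog is called rhythmic in the other species.

```python
from statistics import median


def naive_curve(expression, benchmark):
    """Proportion of rhythmic orthologs among genes ranked by expression level.

    expression: dict gene -> list of expression values over time-points.
    benchmark:  set of genes whose ortholog is called rhythmic in species 2.
    Genes are ordered from highest to lowest median expression; for each
    gene i, the proportion is computed from the top gene down to gene i.
    """
    ranked = sorted(expression, key=lambda g: median(expression[g]), reverse=True)
    curve = []
    hits = 0
    for n, gene in enumerate(ranked, start=1):
        if gene in benchmark:
            hits += 1
        curve.append((gene, n, hits / n))
    return curve


# Toy data: medians are b = 100, d = 50, a = 11, c = 1.
expression = {
    "a": [10, 12, 11],
    "b": [100, 90, 110],
    "c": [1, 2, 1],
    "d": [50, 55, 45],
}
naive = naive_curve(expression, benchmark={"a", "b"})
```

No rhythm statistic enters this ranking, so any method whose curve falls below the Naive one at a given number of genes is doing worse than sorting by expression level.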

Availability of data and scripts

The data and scripts for reproducing plots and analysis are available at https://github.com/laloumdav/rhythm_detection_benchmark.

Materials

Ethics statement

We had ethical concerns about using the olive baboon data, since obtaining these data required the sacrifice of twelve baboons. We would like to point out that such data would have been impossible to obtain in Switzerland, where primate research is prohibited. We still support Switzerland's ethical considerations in matters of animal research, and think that scientific knowledge cannot justify an irresponsible use of life on earth. We are aware that our results would have been less robust without these data, and that these considerations about primates could also be generalized to other living organisms.

Datasets

Mus musculus (13 tissues)

Raw microarray and RNA-seq data, from [2], was downloaded from GEO accession (GSE54652). Microarray gc-rma normalized data was sent by Katharina Hayer from CircaDB database [43]. Expression values were already normalized between biological replicates to average out both biological variance between individual animals and technical variance between individual dissections. RNA-seq data was already normalized using DESeq2. Data was obtained for adrenal gland, aorta, brain stem, brown adipose, cerebellum, heart, hypothalamus, kidney, liver, lung, muscle, SCN (only microarray), and white adipose. Probesets on the Affymetrix MoGene-1.0-ST-V1 array were cross-referenced to best-matching gene symbols by using Ensembl BioMart software.

Papio anubis [olive baboon] (11 tissues used)

RNA-seq data from [1] was downloaded already normalized by using DESeq2. Read counts per gene were calculated using FeatureCounts. We kept data for aorta, brain stem, cerebellum, heart, hypothalamus, kidney, liver, lung, muscle, SCN, and white adipose tissues. Data were already provided with Ensembl gene symbols.

Rattus norvegicus (lung)

Raw microarray data from [44] was downloaded from GEO accession (GSE25612). Over 3 days, 54 samples were extracted in light-dark condition with a temporal resolution closer for some time-points (See paper for more details). Contrary to the study, we still considered the 3 successive days samples as successive days measurements. ARS, JTK and RAIN methods don’t operate with irregular time-series. We normalized time-series by calculating the mean value of irregular time-points to obtain regular time-series. rma normalization was performed using the rma R-package. Probesets on the Affymetrix 230-2-probe array were cross-referenced to best-matching gene symbols by using Ensembl BioMart software.

Danio rerio (liver)

Raw microarray data from [3] was downloaded from GEO accession (GSE87659). Data was already rma-normalized, averaged gene-level signal intensity, and already cross-referenced to best-matching transcript symbols.

Anopheles gambiae (head and body)

Raw microarray data from [33] was downloaded from GEO accession (GSE22585). Non-blood fed female mosquito heads and bodies were collected under light dark and constant dark conditions. We only used data collected in LD condition, except for the comparison of both conditions (LD versus DD). We normalized data using the rma R package and cross-referenced to best-matching gene symbols by using VectorBase software.

Aedes aegypti (head)

Raw microarray data from Leming et al. was downloaded from GEO accession (GSE60496). Non-blood fed female mosquito heads were collected under light dark and constant dark conditions. We only used data collected in LD condition. Data from the NimbleGen Aedes aegypti 12plex array, already rma normalized, were provided with VectorBase geneIDs.

Drosophila melanogaster (head and body)

RNA-seq data from [45] was downloaded from GEO accession (GSE64108). The authors measured RNA concentrations in the head and body of 3-, 5-, and 7-week-old adult flies in ad libitum feeding or 12-hour time-restricted feeding conditions. We only used data from the ad libitum feeding condition of 5-week-old adult flies, which had the best temporal resolution.

Cross-referenced gene IDs and known cycling genes

GeneID, protein coding status, ProbSetID, transcriptsID were downloaded from Ensembl [46] or VectorBase [47] using BioMart.

Known cycling genes were obtained from the KEGG [48] or FlyBase [49] database:

  • KEGG circadian entrainment entry pathway for the mouse (mmu04713) and the rat (rno04713). This is the pathway by which light activates SCN neurons and the resulting signaling cascade that leads to a phase resetting of the circadian rhythm generated in these neurons. Most of these genes are not involved in generating the rhythm itself and as such cannot be called ‘clock genes’.

  • KEGG circadian rhythm entry pathway for the baboon (human hsa04710), and Anopheles (aga04711)

  • FlyBase circadian rhythm entry pathway for Drosophila (GO:0007623).

Supporting information

S1 Table. Gene expression time-series datasets.

Gene expression time-series datasets that come from circadian experiments. We kept data from healthy, wild-type individuals for these seven species, allowing comparisons among vertebrates and among insects. We preferred data from light-dark (LD) and ad-libitum experimental conditions whenever possible as providing a better representation of wild conditions. LD for regular alternation of light and darkness each 24h; and DD for continuous darkness usually after an entrainment to a 12h:12h light:dark.

(XLSX)

S1 File. Supplementary results.

(PDF)

S2 File. Density distribution of raw and default p-values obtained for the seven rhythm detection methods applied to vertebrate datasets.

Density distribution of p-values obtained before (raw) and after the default correction (software) for the seven methods applied to each vertebrate dataset, sub-categorized in: i. randomized data which represents the null hypothesis; ii. randomized data restricted to the first and fourth quartiles of the median gene expression level, to check for the impact of expression level under the null; iii. the full original dataset; iv. the first and fourth quartiles of the median gene expression level of the original data; and v. a subset of known cycling genes when such data was available (8 to 99 genes according to species). The default p-values of ARS, GeneCycle, and LS are uncorrected.

(PDF)

S3 File. Following S2 File.

(PDF)

S4 File. Signal of evolutionary conservation of rhythmic gene expression in vertebrates.

p-values density distribution of rhythmic orthologs vs non-rhythmic orthologs obtained for the seven methods applied to different vertebrate datasets. Orthologous genes detected as rhythmic in the same organ of two species have a stronger statistical signal of rhythmicity than those detected as not-rhythmic in at least one species. From all species_1 genes, only species_1-species_2 one-to-one orthologs are kept. Considering homologous tissues, these orthologs are separated into two groups: genes for which the ortholog is detected as rhythmic in this tissue of species_2, called rhythmic orthologs; and the remaining one-to-one orthologs.

(PDF)

S5 File. Variation of the proportion A/B as a function of the number of orthologs detected rhythmic, obtained for each method applied to different vertebrate datasets.

The benchmark gene set is composed of species_1-species_2 orthologs, detected rhythmic in the homologous tissue of species_2 by the ARS, GeneCycle, or empJTK method with default p-value ≤ 0.01 or 0.05. See Fig 6 for definitions of sets A and B. The black line is the Naive method which orders genes according to their median expression levels (median of time-points), from highest expressed to lowest expressed gene, then, for each gene, calculates the proportion of rhythmic orthologs among those with higher expression.
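The Naive ordering described in this caption can be sketched as below. This is only an illustration under assumptions: the gene names and expression values are invented, and each gene is counted together with the genes expressed above it (an assumption made here so that the first proportion is defined).

```python
from statistics import median


def naive_curve(expression, rhythmic_orthologs):
    """Rank genes by median expression (highest first) and return, for each
    rank, the proportion of rhythmic orthologs among genes ranked so far."""
    ranked = sorted(expression, key=lambda g: median(expression[g]), reverse=True)
    curve, hits = [], 0
    for k, gene in enumerate(ranked, start=1):
        hits += gene in rhythmic_orthologs  # True counts as 1
        curve.append((gene, hits / k))
    return curve


# Hypothetical time series (one value per time-point) for four genes.
expression = {
    "g1": [90, 110, 100, 95],
    "g2": [10, 12, 11, 9],
    "g3": [50, 55, 45, 52],
    "g4": [1, 2, 1, 3],
}
rhythmic_orthologs = {"g1", "g4"}  # hypothetical benchmark set

curve = naive_curve(expression, rhythmic_orthologs)
```

No rhythmicity statistic enters this ranking, which is what makes it usable as a baseline against the seven methods.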

(PDF)

S6 File. Results in insects.

(PDF)

Acknowledgments

We thank Paul Franken for useful discussions, as well as all members of the Robinson-Rechavi lab.

Data Availability

Data are publicly available from the NCBI GEO database, as specified in the Materials/Datasets section.

Funding Statement

Funding was received from Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (173048, to MR-R). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Mure LS, Le HD, Benegiamo G, Chang MW, Rios L, Jillani N, et al. Diurnal transcriptome atlas of a primate across major neural and peripheral tissues. Science. 2018;359(6381). doi: 10.1126/science.aao0318
  • 2. Zhang R, Lahens NF, Ballance HI, Hughes ME, Hogenesch JB. A circadian gene expression atlas in mammals: Implications for biology and medicine. Proceedings of the National Academy of Sciences. 2014;111(45):16219–16224. doi: 10.1073/pnas.1408886111
  • 3. Boyle G, Richter K, Priest HD, Traver D, Mockler TC, Chang JT, et al. Comparative Analysis of Vertebrate Diurnal/Circadian Transcriptomes. PLOS ONE. 2017;12(1):1–18. doi: 10.1371/journal.pone.0169923
  • 4. Yang R, Su Z. Analyzing circadian expression data by harmonic regression based on autoregressive spectral estimation. Bioinformatics. 2010;26(12):i168–i174. doi: 10.1093/bioinformatics/btq189
  • 5. Thaben PF, Westermark PO. Detecting Rhythms in Time Series with RAIN. Journal of Biological Rhythms. 2014;29(6):391–400. doi: 10.1177/0748730414553029
  • 6. Hutchison AL, Maienschein-Cline M, Chiang AH, Tabei SMA, Gudjonson H, Bahroos N, et al. Improved Statistical Methods Enable Greater Sensitivity in Rhythm Detection for Genome-Wide Data. PLOS Computational Biology. 2015;11(3):1–29. doi: 10.1371/journal.pcbi.1004094
  • 7. Deckard A, Anafi RC, Hogenesch J, Haase S, Harer J. Design and Analysis of Large-Scale Biological Rhythm Studies: A Comparison of Algorithms for Detecting Periodic Signals in Biological Data. Bioinformatics (Oxford, England). 2013;29. doi: 10.1093/bioinformatics/btt541
  • 8. Michael TP, Mockler TC, Breton G, McEntee C, Byer A, Trout JD, et al. Network Discovery Pipeline Elucidates Conserved Time-of-Day–Specific cis-Regulatory Modules. PLOS Genetics. 2008;4(2):1–17. doi: 10.1371/journal.pgen.0040014
  • 9. Hughes ME, Abruzzi KC, Allada R, Anafi R, Arpat AB, Asher G, et al. Guidelines for Genome-Scale Analysis of Biological Rhythms. Journal of Biological Rhythms. 2017;32(5):380–393. doi: 10.1177/0748730417728663
  • 10. Korenčič A, Bordyugov G, Košir R, Rozman D, Goličnik M, Herzel H. The Interplay of cis-Regulatory Elements Rules Circadian Rhythms in Mouse Liver. PLOS ONE. 2012;7(11):1–13.
  • 11. Chudova D, Ihler A, Lin K, Andersen B, Smyth P. Bayesian detection of non-sinusoidal periodic patterns in circadian expression data. Bioinformatics (Oxford, England). 2009;25:3114–20. doi: 10.1093/bioinformatics/btp547
  • 12. Miller BH, McDearmon EL, Panda S, Hayes KR, Zhang J, Andrews JL, et al. Circadian and CLOCK-controlled regulation of the mouse transcriptome and cell proliferation. Proceedings of the National Academy of Sciences. 2007;104(9):3342–3347. doi: 10.1073/pnas.0611724104
  • 13. Yoo SH, Yamazaki S, Lowrey PL, Shimomura K, Ko CH, Buhr ED, et al. PERIOD2::LUCIFERASE real-time reporting of circadian dynamics reveals persistent circadian oscillations in mouse peripheral tissues. Proceedings of the National Academy of Sciences. 2004;101(15):5339–5346. doi: 10.1073/pnas.0308709101
  • 14. Boothroyd CE, Wijnen H, Naef F, Saez L, Young MW. Integration of Light and Temperature in the Regulation of Circadian Gene Expression in Drosophila. PLOS Genetics. 2007;3:1–16. doi: 10.1371/journal.pgen.0030054
  • 15. Nagoshi E, Saini C, Bauer C, Laroche T, Naef F, Schibler U. Circadian Gene Expression in Individual Fibroblasts: Cell-Autonomous and Self-Sustained Oscillators Pass Time to Daughter Cells. Cell. 2004;119(5):693–705. doi: 10.1016/j.cell.2004.11.015
  • 16. Gerber A, Saini C, Curie T, Emmenegger Y, Rando G, Gosselin P, et al. The systemic control of circadian gene expression. Diabetes, Obesity and Metabolism. 2015;17(S1):23–32. doi: 10.1111/dom.12512
  • 17. Hughes ME, Hogenesch JB, Kornacker K. JTK_CYCLE: An Efficient Nonparametric Algorithm for Detecting Rhythmic Components in Genome-Scale Data Sets. Journal of Biological Rhythms. 2010;25(5):372–380. doi: 10.1177/0748730410379711
  • 18. Lomb NR. Least-squares frequency analysis of unequally spaced data. Astrophysics and Space Science. 1976;39(2):447–462. doi: 10.1007/BF00648343
  • 19. Scargle J. Studies in astronomical time series analysis. II—Statistical aspects of spectral analysis of unevenly spaced data. The Astrophysical Journal. 1983;263.
  • 20. Wu G, Hogenesch JB, Anafi RC, Hughes ME, Kornacker K. MetaCycle: an integrated R package to evaluate periodicity in large scale data. Bioinformatics. 2016;32(21):3351–3353. doi: 10.1093/bioinformatics/btw405
  • 21. Ahdesmäki M, Lähdesmäki H, Pearson R, Huttunen H, Yli-Harja O. Robust detection of periodic time series measured from biological systems. BMC Bioinformatics. 2005;6(1):117. doi: 10.1186/1471-2105-6-117
  • 22. Ahdesmaki M, Fokianos K, Strimmer K. GeneCycle: Identification of Periodically Expressed Genes; 2012. Available from: https://CRAN.R-project.org/package=GeneCycle.
  • 23. de Lichtenberg U, Wernersson R, Jensen TS, Nielsen HB, Fausbøll A, Schmidt P, et al. New weakly expressed cell cycle-regulated genes in yeast. Yeast. 2005;22(15):1191–1201. doi: 10.1002/yea.1302
  • 24. Cohen-Steiner D, Edelsbrunner H, Harer J, Mileyko Y. Lipschitz Functions Have Lp-Stable Persistence. Foundations of Computational Mathematics. 2010;10(2):127–139. doi: 10.1007/s10208-010-9060-6
  • 25. Straume M. DNA Microarray Time Series Analysis: Automated Statistical Assessment of Circadian Rhythms in Gene Expression Patterning. In: Numerical Computer Methods, Part D. vol. 383 of Methods in Enzymology. Academic Press; 2004. p. 149–166. Available from: http://www.sciencedirect.com/science/article/pii/S0076687904830076.
  • 26. Fokianos K, Strimmer K, Wichert S. Identifying periodically expressed transcripts in microarray time series data. Bioinformatics. 2004;20(1):5–20. doi: 10.1093/bioinformatics/btg364
  • 27. Zhao W, Agyepong K, Serpedin E, Dougherty ER. Detecting Periodic Genes from Irregularly Sampled Gene Expressions: A Comparison Study. EURASIP Journal on Bioinformatics and Systems Biology. 2008;2008(1):769293.
  • 28. Dequéant ML, Ahnert S, Edelsbrunner H, Fink TMA, Glynn EF, Hattem G, et al. Comparison of Pattern Detection Methods in Microarray Time Series of the Segmentation Clock. PLOS ONE. 2008;3(8):1–9.
  • 29. Wu G, Zhu J, Yu J, Zhou L, Huang JZ, Zhang Z. Evaluation of Five Methods for Genome-Wide Circadian Gene Identification. Journal of Biological Rhythms. 2014;29(4):231–242. doi: 10.1177/0748730414537788
  • 30. Gabaldón T, Koonin EV. Functional and evolutionary implications of gene orthology. Nature Reviews Genetics. 2013;14:360. doi: 10.1038/nrg3456
  • 31. Gerhart-Hines Z, Lazar MA. Circadian Metabolism in the Light of Evolution. Endocrine Reviews. 2015;36(3):289–304. doi: 10.1210/er.2015-1007
  • 32. Hutchison AL, Dinner AR. Correcting for Dependent P-values in Rhythm Detection. bioRxiv. 2017.
  • 33. Rund SSC, Hou TY, Ward SM, Collins FH, Duffield GE. Genome-wide profiling of diel and circadian gene expression in the malaria vector Anopheles gambiae. Proceedings of the National Academy of Sciences. 2011;108(32):E421–E430. doi: 10.1073/pnas.1100584108
  • 34. Leming MT, Rund SS, Behura SK, Duffield GE, O’Tousa JE. A database of circadian and diel rhythmic gene expression in the yellow fever mosquito Aedes aegypti. BMC Genomics. 2014;15(1):1128. doi: 10.1186/1471-2164-15-1128
  • 35. Wijnen H, Naef F, Boothroyd C, Claridge-Chang A, Young MW. Control of Daily Transcript Oscillations in Drosophila by Light and the Circadian Clock. PLOS Genetics. 2006;2:1–18. doi: 10.1371/journal.pgen.0020039
  • 36. Rosikiewicz M, Robinson-Rechavi M. IQRray, a new method for Affymetrix microarray quality control, and the homologous organ conservation score, a new benchmark method for quality control metrics. Bioinformatics. 2014;30(10):1392–1399. doi: 10.1093/bioinformatics/btu027
  • 37. Kryuchkova-Mostacci N, Robinson-Rechavi M. A benchmark of gene expression tissue-specificity metrics. Briefings in Bioinformatics. 2016;18(2):205–214.
  • 38. Hutchison AL, Allada R, Dinner AR. Bootstrapping and Empirical Bayes Methods Improve Rhythm Detection in Sparsely Sampled Data. Journal of Biological Rhythms. 2018;33(4):339–349. doi: 10.1177/0748730418789536
  • 39. Schibler U. The daily rhythms of genes, cells and organs. EMBO reports. 2005;6(S1):S9–S13. doi: 10.1038/sj.embor.7400424
  • 40. Iuchi H, Sugimoto M, Tomita M. MICOP: Maximal information coefficient-based oscillation prediction to detect biological rhythms in proteomics data. BMC Bioinformatics. 2018;19(1):249. doi: 10.1186/s12859-018-2257-4
  • 41. Altenhoff AM, Gonnet GH, Train CM, Dylus D, Glover NM, de Farias TM, et al. The OMA orthology database in 2018: retrieving evolutionary relationships among all domains of life through richer web and programmatic interfaces. Nucleic Acids Research. 2017;46(D1):D477–D485. doi: 10.1093/nar/gkx1019
  • 42. Lex A, Gehlenborg N, Strobelt H, Vuillemot R, Pfister H. UpSet: Visualization of Intersecting Sets. IEEE Transactions on Visualization and Computer Graphics. 2014;20(12):1983–1992. doi: 10.1109/TVCG.2014.2346248
  • 43. Pizarro A, Hayer K, Lahens NF, Hogenesch J. CircaDB: A database of mammalian circadian gene expression profiles. Nucleic Acids Research. 2012;41. doi: 10.1093/nar/gks1161
  • 44. Sukumaran S, Jusko WJ, DuBois DC, Almon RR. Light-dark oscillations in the lung transcriptome: implications for lung homeostasis, repair, metabolism, disease, and drug action. Journal of Applied Physiology. 2011;110(6):1732–1747. doi: 10.1152/japplphysiol.00079.2011
  • 45. Christopher B, Gill S, Melkani G, Panda S. type; 2015. Available from: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE64108. GSE64108.
  • 46. Zerbino DR, Achuthan P, Akanni W, Amode MR, Barrell D, Bhai J, et al. Ensembl 2018. Nucleic Acids Research. 2017;46(D1):D754–D761. doi: 10.1093/nar/gkx1098
  • 47. Giraldo-Calderón GI, Emrich SJ, MacCallum RM, Maslen G, Dialynas E, Topalis P, et al. VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases. Nucleic Acids Research. 2014;43(D1):D707–D713. doi: 10.1093/nar/gku1117
  • 48. Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research. 1999;27(1):29–34. doi: 10.1093/nar/27.1.29
  • 49. the FlyBase Consortium, Thurmond J, Goodman JL, Kaufman TC, Strelets VB, Calvi BR, et al. FlyBase 2.0: the next generation. Nucleic Acids Research. 2018;47(D1):D759–D765. doi: 10.1093/nar/gky1003
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007666.r001

Decision Letter 0

Bjoern Peters, Attila Csikász-Nagy

7 Oct 2019

Dear Dr Robinson-Rechavi,

Thank you very much for submitting your manuscript 'Methods detecting rhythmic gene expression are only reliable for strong signal' for review by PLOS Computational Biology. Your manuscript has been fully evaluated by the PLOS Computational Biology editorial team and in this case also by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the manuscript as it currently stands. While your manuscript cannot be accepted in its present form, we are willing to consider a revised version in which the issues raised by the reviewers have been adequately addressed. We cannot, of course, promise publication at that time.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

Your revisions should address the specific points made by each reviewer. Please return the revised version within the next 60 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. Revised manuscripts received beyond 60 days may require evaluation and peer review similar to that applied to newly submitted manuscripts.

In addition, when you are ready to resubmit, please be prepared to provide the following:

(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.

(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible to show clearly where changes have been made to their manuscript e.g. by highlighting text.

(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.

Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:

- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).

- Supporting Information uploaded as separate files, titled Dataset, Figure, Table, Text, Protocol, Audio, or Video.

- Funding information in the 'Financial Disclosure' box in the online system.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see here

We are sorry that we cannot be more positive about your manuscript at this stage, but if you have any concerns or questions, please do not hesitate to contact us.

Sincerely,

Attila Csikász-Nagy

Associate Editor

PLOS Computational Biology

Bjoern Peters

Benchmarking Editor

PLOS Computational Biology

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Comments for PCOMPBIOL-D-19-01419

————————

This manuscript works to compare different rhythm detection methods using identification of orthologous rhythmic genes between species as a measurement of method accuracy. It identifies methods that perform better at this, and finds an association with expression level. It also reinforces the previous findings that several of the methods do not have well-behaved p-values.

This thorough and extensive manuscript will be a beneficial addition to the circadian rhythm detection literature but would be significantly improved by addressing the major and minor issues which are within the scope of the paper, detailed below:

The manuscript can be primarily improved along these points

1) Clarification of the definition of strong signal

If I am not mistaken, the way you are defining “strong rhythmic signal” is just as genes with high expression. Why are you doing it this way and not using amplitudes? I can imagine a time series with high expression but small amplitude (1000 -> 1002 -> 1000 -> 998 -> 1000), and there are several papers with suggestions on how to define amplitude (Max - min, SD, Sqrt(2)*(Max-min), etc.). I think this ends up making your argument a little misleading. If you mean “high expression genes” then say that instead of “strong rhythmic signal”.

Building on this, it is not clear to me why we should be concerned that rhythm detection methods are preferential to high-amplitude time series (that is, time series with a high signal (amplitude)-to-noise ratio). If anything, this seems like it would be preferable, and in fact methods like Bootstrap eJTK from Hutchison et al 2018 try to explicitly incorporate this.
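To make the distinction between expression level and amplitude concrete, the following sketch computes a few amplitude summaries for the example series given in this point. It is only an illustration: peak-to-trough and SD are two of the definitions listed here, while the "relative" amplitude (peak-to-trough divided by the mean) is an additional assumption introduced for this example.

```python
from statistics import mean, pstdev


def amplitude_summary(series):
    """Return simple level and amplitude summaries for one time series."""
    peak_to_trough = max(series) - min(series)
    level = mean(series)
    return {
        "mean": level,
        "peak_to_trough": peak_to_trough,
        "sd": pstdev(series),  # population standard deviation
        "relative": peak_to_trough / level,  # hypothetical normalized amplitude
    }


# The highly expressed, low-amplitude example series from the text above.
summary = amplitude_summary([1000, 1002, 1000, 998, 1000])
```

For this series the mean expression is 1000 while the peak-to-trough amplitude is only 4, so a ranking based on expression level and one based on amplitude would treat the gene very differently.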

2) Clarification of the definition of reliable

Though it is in your title, you do not actually define what you mean by “reliable”. Past definitions have used “able to distinguish signal from noise” as a proxy, or “able to identify the same gene when downsampled from the same biological experiment” (Hutchison et al 2018), but here it seems you use “find genes rhythmic in one organism to have rhythmic orthologues in a different organism”. It is not clear that this gives more information about the differences or limitations of the methods or biological insight into differences between the organisms, tissues, or experimental conditions being compared. I think you need to make a stronger argument why we should expect the particular genes that you are comparing across organisms should be rhythmic, and the failure to identify them as such is a limitation of the methods as opposed to an interesting biological finding. As mentioned above, comparing across several organisms at once might be a way to strengthen this argument, or looking at “known rhythmic genes” such as the KEGG annotated ones (though see below for concerns about using these genes with the expectation that their expression profiles will be rhythmic).

3) Comparison across species only and not experimental design

While I think your use of orthologous genes in homologous tissues is a useful one, I think it is less useful when you confound the results comparing experiments done in LD vs. DD. While there should be an overlap for LD vs DD, the literature has already shown sufficient difference that making the comparison you have chosen with mouse DD vs rat LD weakens your results and your argument. I think your use of orthologous genes would be stronger if you used homologous organs in two different species where both organisms are tested under LD or DD conditions.

If you are going to compare across experimental designs and species simultaneously, then it might be interesting to look across more than two species at a time, as using that overlap among several species should mitigate some of the confounding from looking across experimental designs and be a stronger criterion on which to compare different methods.

4) Use and interpretation of p-values

I think something else to note here is that the expected proportion of overlap for these methods run on the same data is already between 60-70% (Hutchison et al. 2018), even within the same tissue, lab, and experimental collection. In your Figure 7, though, you show quite high levels of overlap. This discrepancy may be explained because you are not multiple-hypothesis correcting your rhythmicity results. This brings up a concern of its own. In lines 448-450 you mention that 0.01 might not be stringent enough of a cutoff. Why aren’t you multiple-hypothesis correcting in that case? Do your results change when you multiple-hypothesis correct?
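The effect of a multiple-hypothesis correction on a fixed 0.01 cutoff can be illustrated with a short sketch. The Benjamini-Hochberg procedure is used here as one common choice, which is an assumption since no specific correction is named above, and the p-values are invented for the example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
pvals = [0.001, 0.008, 0.012, 0.04, 0.5]
adjusted = benjamini_hochberg(pvals)

called_raw = [p <= 0.01 for p in pvals]    # uncorrected calls
called_bh = [q <= 0.01 for q in adjusted]  # calls after correction
```

In this toy case two genes pass the uncorrected 0.01 threshold but only one remains after correction, which shows how overlaps between methods could shift once a correction is applied.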

Building on this, in your section at line 315 “The issue of significance” and “ARS, empJTK, and GeneCycle produce consistent p-values”, you correctly distinguish between the methods that generate correct p-values and those that don’t, but I think it might be worth discussing further the meaning and point of p-values as heuristics for rhythmicity. At a small scale, the overlap of the top 100 genes might vary widely, but be important because it will determine what individual genes to next conduct an experiment on, be it a knock out, or a new chronotherapy target. At that level, differences between the methods become important, and whether the gene has a 5% chance of being a false positive or a 0.5% chance becomes very important. At a large scale, it might determine what biological processes to experimentally intervene upon, and the false positive rate becomes important there as well. I think merely ranking the order of genes misses this aspect of the rhythm detection methods.

5) Alternative conclusions of your results

The title of your paper is “Methods detecting rhythmic gene expression are only reliable for strong signal”. I do not know if this statement can be distinguished from other statements that could be drawn from the same data, such as

- For genes with low expression, it is difficult to tell if their rhythmicity is conserved across species

- Rhythmicity is not conserved among genes with low expression.

- If orthologous genes have discordant expression levels, it is less likely that they will both be identified as rhythmic

- Methods detecting rhythmic gene expression agree more often for highly expressed genes

————————

Minor points:

Figure 1a:

I do not understand the placement of “Positive elements” as it seems to me it is next to an arrow that is a repressor

Figure 1b:

i) I don’t know what the goal of reporting the raw p-values is for JTK, meta2d, RAIN, and empJTK. It is clear from the design of the methods that they will all under-estimate p-values before correction, which you show, but I do not know what value to your argument or the discussion is added by having the raw p-values included in what is already a very busy (but very informative) figure.

ii) I think it would be worth including a subset image on the far right with a high right peak, to demonstrate the case where false negatives predominate and signal is discarded. It is arguably as important for a method to have a low false negative rate as a low false positive rate, and there are several examples in the figure (meta2d, JTK, RAIN) that show large right-sided peaks.

iii) Do we expect the RNA expression of all the ‘KEGG circadian entrainment’ genes to have rhythmic RNA expression? There are several cases where the protein levels are rhythmic and the RNA is not, cases where rhythmic protein modifications drive circadian involvement in genes that do not have RNA expression rhythmicity, and cases where protein levels are constant but the protein binds with a rhythmically expressed protein. All these cases would be indications for inclusion in ‘KEGG circadian

Line 119:

You argue here that our analyses should not be biased towards high expressing time series, but is there i) a biological argument that these are more relevant or ii) an analytical argument that distinguishing strong signals that rise above the noise is an important thing to look for (the idea behind BooteJTK Hutchison et al 2018)?

Figure 2a

i) The figure might be legible if instead of p-values you plotted -log10(p-values). If you’re trying to plot a uniform distribution it makes sense to deal with p-values in their original state, but for these scatterplots I think it would help delineate signal from noise.

Figure 3

The caption should read “Fewer time points per cycle” instead of “Less time points”

Figure 5c and 5d

After some time looking at the figures, I still don’t understand the “upset diagrams”. If they show the intersection of genes in the top 6000 across methods, then as you reduce the number of methods I would expect higher numbers in the intersection (more genes would be rhythmic in both empJTK and ARS than in empJTK, ARS, and LS, since some genes rhythmic in empJTK and ARS won’t be rhythmic in LS, and are therefore excluded from that intersection). That I have put in a good effort to understand these plots and still cannot suggests that the explanation should be more explicit, or the data should be replotted.

Figure 6d

Is this distribution comparison the best way of comparing across tissues? I would expect a scatterplot comparing p-values for each orthologous gene would better support your argument for overlap or divergence of methods.

294-295: Give X et al before citation in the same way you do in 297

412: This statement is untrue. empJTK accepts as inputs any time points as the header.

326: Are you saying we shouldn’t pay attention to rhythmic genes with low expression? Are there any biologically plausible reasons that genes that are not rhythmic across species would have high expression vs. low expression levels?

Reviewer #2: Review – Methods detecting rhythmic gene expression are only reliable for strong signal.

In this manuscript the authors focus on the importance of identifying correctly the nycthemeral transcriptome or the rhythmic expression patterns of certain genes, which has been linked with relevant biological processes. The authors enumerate several methods commonly used to detect rhythmic genes and aim at benchmarking these different approaches (7 different methods) by evaluating their performance in real datasets compared to randomized ones and also by including homology context (evolutionary conservation). The study objective is twofold: define an approach to benchmark future methods based on biological data rather than simulated data, and provide recommendations on the design of time-series experiments as well as the choice of detection method.

The manuscript is nicely written, clear and detailed. The stated problem is relevant and clearly demonstrated in the statistical evaluation of the results (p-values distributions analysis) from the selected methods (JTK, LS, ARS, meta2d, empJTK, RAIN and GeneCycle), which indeed show different ability to detect true rhythmic genes.

A concern about the manuscript is that when the authors explain the benchmarking methods, it is not clear whether the methods are benchmarked for their ability to detect rhythmicity or their capacity to predict functionally relevant genes. The methods are built to perform the former and that should be the main criterion to evaluate them. The benchmarking approaches used are correct; however, the concern raised by the authors about identifying non-functionally relevant genes with rhythmic expression patterns is a general concern and not specific for these methods. Thus, this should be made clearer in the text.

In more detail, there are several concepts that need clarification or further discussion:

1) Conservation of rhythmicity and biological relevance.

In several parts of the text, especially in the abstract, the authors seem to use conserved rhythmicity in orthologs as biological/functional relevance and unconserved rhythmicity as biological irrelevance.

For instance the authors write in the abstract: ‘In this study, we show that the nycthemeral rhythmicity at the gene expression level is biologically functional and that this functionality is more conserved between orthologous genes than between random genes’, ‘Rhythmic genes defined with a standard p-value threshold of 0.01 for instance could include genes whose rhythmicity is biologically irrelevant’.

The authors show indeed that rhythmicity is enriched among orthologs, which could indicate that this expression pattern is conserved for functional reasons. This has also been shown in other contexts (i.e. posttranslational modifications – functional sites) and can in principle be used as a way to prioritize these rhythmic genes. However, unconserved rhythmic expression patterns can also be functionally/biologically relevant. The authors mention this in the text: ‘It should be noted that this does not imply that we expect all rhythmic behavior to be conserved between orthologs’. However, this is somehow not reflected and clear in other parts of the manuscript. Even though the way the authors use homology to benchmark the methods is appropriate because a higher number of rhythmic orthologs is expected among true rhythmic genes, there is as well an unknown percentage of true positives among the unconserved rhythmic genes. This should be made clearer in the text. Also, it would be very beneficial if the authors could provide some examples of conserved rhythmic genes with known relevant function (i.e. overlap with known cycling genes) and/or examples of clear false positives among the unconserved ones.

2) Use of p-values. The authors compare the overlap of rhythmic genes detected by the different methods and observe that “the same genes seem to be called rhythmic by all methods but the threshold of significance appear(s) inconsistent. These results suggest an issue with the significance of p-value thresholds for the methods… Thus the methods agree on a large number of rhythmic genes, but not necessarily on the order of significance among them”. The studied methods follow different approaches to detect rhythmicity, would the authors expect equal p-value ranking by all methods?

Also, it is not clear whether the ‘issue with the significance of p-values thresholds’ is specifically an issue attributed to the studied methods or if this is more a common misuse of p-values (lower p-value, stronger evidence). Clarifying these points would help the reader, constrain the significance of p-values and at the same time benefit the argument of incorporating biological context.

3) Known Cycling Genes. The authors show the distribution of p-values for known cycling genes for each of the methods; however, it might also be useful to see the number of those genes predicted as rhythmic by each method.

4) Access to the code. It would be very beneficial to have access to the code to reproduce the results or replicate them in other datasets or other methods. Could the authors make the code available in a repository (i.e. GitHub)?

Minor changes

Page 3. Line 85. I would suggest changing the term ‘normal’ to refer to healthy/wild type individuals.

Page 4. Line 126. There is no description of the subset of known cycling genes in the Methods section. The authors should consider including a table with these genes in the Additional material.

Figure 1a. The panel aims at explaining the concept of nycthemeral gene expression but even though it is meant to be a simplification, it is very convoluted and may benefit from a simplified flow and a less synthetic explanation in the figure caption.

Rhythmic genes are enriched in highly expressed genes. The number of detected rhythmic genes is enriched among highly expressed genes in all the benchmarked methods. This is not surprising and the authors clearly state this and provide some possible reasons. However, the statement would benefit from a statistical test showing the enrichment, similarly to how the enrichment of orthologs was demonstrated (page 8).

Figure 2a. Transforming the y-axis of the plot into the -log(default p-value) may help visualize better the enrichment of rhythmic genes among the highly expressed genes compared to the lowly expressed ones.

Figure 3, 4. Use of ‘every’ instead of ‘each’ to refer to all the individual time points registered.

Figure 5a. The use of the Venn diagram is redundant with the Upset plot and could be removed. In general in all panels, the colour scheme seems unnecessary and it just makes the diagrams more difficult to read. The authors should consider that the number of overlaps at all levels (methods) might not be as informative as having for instance the number of common predicted genes by all, by half of the methods or by more than two. This would simplify the visualization and convey a similar message. If the message is how similar the results from different methods are, the authors could consider using a Jaccard index heatmap for instance.
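
To illustrate the Jaccard index suggestion above, here is a minimal sketch of a pairwise similarity matrix between the gene sets called rhythmic by each method. It is not taken from the manuscript or the review: the method names and gene identifiers are hypothetical placeholders, and the resulting matrix could be passed to any heatmap tool.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_matrix(calls):
    """Pairwise Jaccard index for a dict mapping method name -> set of genes."""
    names = sorted(calls)
    return {(m, n): jaccard(calls[m], calls[n]) for m in names for n in names}


# Hypothetical rhythmic calls per method (illustrative identifiers only).
calls = {
    "method_A": {"g1", "g2", "g3", "g4"},
    "method_B": {"g2", "g3", "g4", "g5"},
    "method_C": {"g1", "g2"},
}

matrix = jaccard_matrix(calls)
for m in sorted(calls):
    # One row of the similarity matrix per method.
    print(m, [round(matrix[(m, n)], 2) for n in sorted(calls)])
```

A single number per pair of methods summarizes agreement without enumerating every intersection, which is the simplification the comment asks for.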

Methods. Missing values. There is no description of how missing values were handled (if any).

Reviewer #3: I enjoyed reading the very well-written manuscript by Laloum and Robinson-Rechavi. The authors have performed a benchmarking study of existing methods for detecting diurnal rhythms in gene expression. That might sound run-of-the-mill, but the analysis is especially interesting since it highlights p value distributions and evolutionary conservation. In all, a worthwhile study, definitely mature enough to be reported to the community.

I have no major criticisms, only more minor points:

Noting that the number of time points is crucial for detection power is almost trivial. RNA-Seq may be more sensitive, but I cannot imagine anyone expecting that it should outweigh cutting the number of time points by a factor 3 (Zhang et al.). I suggest shortening the report of those results.

The problem with non-uniform null pvalue distributions in original JTK could be corrected (empJTK). The same correction could be made to RAIN to produce "empRAIN", as suggested by Hutchison and Dinner in the cited bioRxiv report, with as they write "little additional effort". If this is true, authors might carry this out and include the results. Note (also the Editor please) that I'm not requiring this extra work. But the authors should in any case emphasize this in the "Limitations and improvement" section in the discussion, to stimulate such an effort.

Something that should be discussed is amplitude. Amplitude is closely related to function: High amplitudes are to most researchers in the field suggestive of (perhaps) more significant function. For most methods, high amplitudes should (for constant noise level) lead to lower pvalues. In fact, since pvalues are themselves random variables with very broad distributions (broader the lower the amplitude, typically), amplitude may be a strong filtering co-variate with higher potential than orthology to isolate "good" rhythmic transcripts. This has of course been noted earlier, see e.g., the "guidelines" JBR paper from 2017 or the original RAIN paper from 2014.

The broad distribution of pvalues and realizing that they are random variables themselves makes the results in Figure 5 quite expected: Venn diagrams by necessity then give a limited overlap for fixed cutoffs. This is something that was thoroughly discussed in the CMLS review by Lück and Westermark in 2016, which could be cited in relation to the Venn diagrams.

Related, regarding the analysis in Figures 6 and 7. It might be interesting to discuss whether rhythmic orthologs tend to have higher amplitudes, leading to a tendency for lower p values, and thus to detection by more methods.

The reporting of the results in Figures 6 and 7 is a bit of a mess to follow. The reader must be helped better to see the essential point. It is no trivial analysis, but the main text might be shortened to focus on the essentials (mainly Figure 7b). Explaining the analysis thoroughly could be done only in the methods section.

Page 3 close to top: To be fair, original JTK could detect an arbitrary waveform with a slight code modification, as stated in the original JTK paper. The limitation is rather that only this one waveform will be detected (a sine curve by default).

On page 7 3rd row from bottom: "efficieny", is "statistical power" meant?

Reviewer #4: There have been several reviews on this subject.

This manuscript adds to the existing literature

It includes more recent algorithms and, admirably, attempts to move beyond synthetic data (with its inherent biases) in the evaluation.

On these fronts the authors should be commended.

I do, however, have several significant concerns (Some more significant than others ;)

Some of the conclusions are overstated

There seem to be a few statistical issues/inconsistencies

One of recommendations seems wholly unsupported (and I would argue wrong)

Still I suspect that authors can address all my concerns

(1) In ALL P value histograms you show p values less than 0 (the distribution goes to the left of the green 0, to about -0.2) and greater than 1 (the distribution goes to ~1.2). I am guessing/hoping this is a mislabeling? Did you try to do some weird kernel smoothing of these distributions that didn’t reflect their constrained domains?

(2) The significance thresholds used throughout this manuscript are all very low and not reflective of current best practices (see for example the review by Hughes et al. that the authors cite). In the setting of testing ~20,000 genes, a p value threshold of 0.01 is likely to be loaded with false positives. The analysis needs to be re-done with a reasonable threshold (q<=.1?), as including multiple test corrections is both important and standard.
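
For context on the point above: if 20,000 genes were all non-rhythmic, a threshold of p ≤ 0.01 would be expected to pass about 20,000 × 0.01 = 200 of them by chance. The sketch below shows one standard multiple-testing correction, the Benjamini-Hochberg procedure, as a way to obtain q-values. It is an illustration only; the choice of Benjamini-Hochberg and the example p-values are assumptions, not something specified in the review or the manuscript.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical p-values for four genes (illustrative numbers only).
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
rhythmic = [i for i, q in enumerate(example_q) if q <= 0.1]
print(example_q, rhythmic)
```

Genes would then be called rhythmic on the q-value threshold (for instance q ≤ 0.1, as the reviewer suggests) rather than on the raw p-value.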

(3) The initial analysis showing that the p-value distributions resulting from the application of some of these methods on randomly generated data are nonuniform seems important. (A) But in the end your argument that this is important is significantly undercut by your eventual use of the p-values as a simple ranking method. If we are just using the numbers for ranking purposes (which on the face of it I don’t think I would advocate) then the fact that the p values don’t follow a uniform distribution wouldn’t be a deal breaker. You should either explain this and consider ALL the methods throughout the review, stick to a reasonable p or q value threshold, or provide some reasoning for excluding the methods based on distributional properties that you ultimately neglect. (B) A more theoretical point: the formal argument is that p values should be uniformly distributed under the “null hypothesis”. I am not an expert in all these methods. What are their null hypotheses with regard to the temporal pattern? No variation? Gaussian white noise? Autoregressive noise? What is the statistical distribution of your random data? Does it match the null hypotheses for these different methods?
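
Point (B) can be illustrated with a toy simulation. The sketch below is not any of the benchmarked methods: it uses a deliberately simple single-frequency harmonic test whose p-value is exact only under an assumed null of unit-variance Gaussian white noise, and it measures the departure from uniformity with a Kolmogorov-Smirnov distance. The sampling scheme (24 points, 12 samples per period), the AR(1) coefficient of 0.7, and all function names are hypothetical choices made for the example.

```python
import math
import random


def harmonic_pvalue(x, samples_per_period):
    """P-value of a one-frequency harmonic test, assuming N(0, 1) white noise.

    For evenly spaced samples covering whole periods, the cosine and sine
    projections a and b are independent N(0, n/2) under that null, so
    2 * (a^2 + b^2) / n is chi-square with 2 degrees of freedom and the
    p-value is exp(-(a^2 + b^2) / n).
    """
    n = len(x)
    w = 2.0 * math.pi / samples_per_period
    a = sum(v * math.cos(w * t) for t, v in enumerate(x))
    b = sum(v * math.sin(w * t) for t, v in enumerate(x))
    return math.exp(-(a * a + b * b) / n)


def ks_uniform(pvalues):
    """Kolmogorov-Smirnov distance between the p-values and Uniform(0, 1)."""
    p = sorted(pvalues)
    n = len(p)
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(p))


def white_noise(n, rng):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def ar1_noise(n, phi, rng):
    """AR(1) noise scaled to unit marginal variance."""
    x = [rng.gauss(0.0, 1.0)]
    scale = math.sqrt(1.0 - phi * phi)
    for _ in range(n - 1):
        x.append(phi * x[-1] + scale * rng.gauss(0.0, 1.0))
    return x


rng = random.Random(1)
N_SERIES, N_POINTS, PER = 2000, 24, 12  # e.g. 2 h sampling over 48 h
p_white = [harmonic_pvalue(white_noise(N_POINTS, rng), PER) for _ in range(N_SERIES)]
p_ar1 = [harmonic_pvalue(ar1_noise(N_POINTS, 0.7, rng), PER) for _ in range(N_SERIES)]
d_white, d_ar1 = ks_uniform(p_white), ks_uniform(p_ar1)
print(f"KS distance, white noise: {d_white:.3f}; AR(1) noise: {d_ar1:.3f}")
```

When the random data match the test's null (white noise), the p-values stay close to uniform; when the random data are autocorrelated, the same test yields an excess of small p-values even though no rhythm was simulated. A non-uniform distribution on randomized data can therefore reflect a mismatch between the randomization and a method's null hypothesis rather than a defect of the method.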

(4) You spend a few pages separating data into high/low expression genes, and then mean/variance normalizing gene expression (within a gene) to test if that would influence the identification of cycling genes. But a quick review shows me that several of these methods (e.g. JTK, eJTK, GeneCycle) are rank based (they purely use the rank of the sample’s expression of Gene X compared to all other samples). By construction these methods should be completely unaffected by your normalization process (which is what you find). Thus these numerical experiments unnecessarily lengthen the manuscript. It seems like you could quickly explain this and then note that the bias toward high expression is not an artifact of (at least those) methods. It either reflects true biology or a lower signal to noise ratio in lowly expressed genes.
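
The rank-based argument above can be checked directly: within-gene mean/variance normalization is an increasing affine transformation, so it leaves the ranks of the samples unchanged. The following is a minimal sketch with made-up expression values; it does not reproduce any of the benchmarked methods, and the helper names are hypothetical.

```python
import statistics


def ranks(values):
    """Rank of each value (1 = smallest). Assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def zscore(values):
    """Within-gene normalization: subtract the mean, divide by the std dev."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical time series for one lowly expressed gene, and the same
# temporal profile shifted and scaled to mimic a highly expressed gene.
low = [5.1, 7.3, 6.0, 9.8, 8.2, 5.5]
high = [1000.0 + 50.0 * v for v in low]

print(ranks(low), ranks(zscore(low)), ranks(high), ranks(zscore(high)))
```

Any statistic computed only from these ranks is identical before and after normalization, and identical for the two expression levels, provided the temporal profile itself is the same.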

(5) Figure 4B. Analysis of reducing the number of data points and comparison of fewer data points over 2 cycles vs more over 1 cycle. You only show the distributions of p values here. This seems insufficient. You could theoretically be identifying completely different genes (with the reduced data sets) as compared to the full data set, i.e. (assuming the full data set is a better representation) the significant genes in the reduced data set could all be false positives. Similarly, when you say GeneCycle is not influenced by this comparison, you don’t really know that. You have only shown that it picks out the same number of genes, not the same genes.

(6) You repeatedly find that the baboon data set does not seem to have much overlap with the mouse (or other data sets). In reviewing this study I find that it has very low read depth, complicating any quantitative analysis. You might want to raise this possibility? (up to you)

(7) Final Recommendations: “Consider biological replicates as new cycles with one replicate”. Perhaps I misunderstand your point, but I think this is simply wrong. I can’t find that you supported this (one way or another) in your analysis. More to the point, there is a big, real-world difference between euthanizing 3 mice every few hours for a day and doing 1 mouse for three days. In the first case a single/random unknown influence of the shared environment (e.g. there was a loud noise at 12 pm stopping the mice from eating) would influence all three mice euthanized at that time. Concatenating their data as if they happened in 3 consecutive days would make this appear like a rhythmic behavior (a daily peak near noon). Having 3 days of real data would minimize this risk (unless the noise repeats every day at the same time). There are almost always unknown perturbations; this suggestion is a recipe to make them all appear rhythmic.
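
The concern in point (7) can be made concrete with a noise-free toy example. The sketch below assumes a hypothetical design with one sample every 4 hours and a one-off perturbation that raises expression at a single clock time; all numbers are invented, and the 24-hour harmonic amplitude is used only as a simple proxy for apparent rhythmicity.

```python
import math


def harmonic_amplitude(x, samples_per_period):
    """Amplitude of the sinusoid at the given period, by direct projection."""
    n = len(x)
    w = 2.0 * math.pi / samples_per_period
    a = sum(v * math.cos(w * t) for t, v in enumerate(x))
    b = sum(v * math.sin(w * t) for t, v in enumerate(x))
    return 2.0 * math.hypot(a, b) / n


SAMPLES_PER_DAY = 6  # hypothetical design: one sample every 4 h

# A flat (non-rhythmic) gene, with a one-off bump at the fourth time point
# caused by a shared environmental disturbance on the sampling day.
flat_day = [0.0] * SAMPLES_PER_DAY
disturbed_day = [0.0, 0.0, 0.0, 2.0, 0.0, 0.0]

# Three replicates sampled on the same day all carry the bump, so
# concatenating them as three "days" repeats it at the same clock time.
concatenated_replicates = disturbed_day * 3

# One animal line followed for three real days sees the disturbance once.
three_real_days = disturbed_day + flat_day + flat_day

amp_concat = harmonic_amplitude(concatenated_replicates, SAMPLES_PER_DAY)
amp_real = harmonic_amplitude(three_real_days, SAMPLES_PER_DAY)
print(f"concatenated replicates: {amp_concat:.3f}; three real days: {amp_real:.3f}")
```

In this toy case the concatenated series shows three times the 24-hour amplitude of the genuinely longitudinal series, although neither contains a true rhythm.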

(8) Throughout the text you refer to conserved cycling as biologically important (and non-conserved cycling as unimportant). I understand that you are trying to use evolutionary conservation. But this is a giant oversimplification. The cycling of a gene, unique to a particular animal, might be very important to that animal. Conserved cycling of a transcript that has a very long half-life protein product might be completely physiologically unimportant. This language needs to be cleared up throughout. At best you could say something like “presumed evolutionary importance”, but even this is too strong. I leave it to the authors to find some less loaded terminology.

But the terminology should be refined

(9) For figures 6 and 7 I am a bit confused. In these comparisons, was the analysis in both species done using the same method (i.e. e-JTK defines cycling in the rat liver, and then e-JTK is used to assess cycling in mouse liver)? This seems like the more fair test for each method. But in reading the text it seemed like you used GeneCycle to find (for example) the rhythmic genes in the “benchmark” rat set, and then used all the other methods in the lung and looked for overlap. If that is what was done, it should be redone to be fair (but I doubt it will change the results much). If the “fair” test is the one that was already done, the text should be clarified.

(10) The analysis of conservation – while important – is also overstated.

You postulate (A) “Biologically important cycling should be conserved across species” and then jump to (B) “The statistical test that identifies a property most likely to be conserved is the best test”. (B) does not follow from (A). For example, a test completely unrelated to cycling could pick out a feature that is more likely to be conserved. This does not make it a good cycling test. Indeed this is (I think) what you find when you say the “Naïve test” is better. High expression is actually a more phylogenetically conserved feature than cycling per se. More to the point, this test is assessing a desirable side feature of a cycling evaluation algorithm, but not its main function. As such this is an important but secondary issue. Again the discussion needs to be reframed and toned down (limitations better acknowledged).

(11) No mention is made of the precision of the different methods in describing the particulars of a cycling behavior (e.g. the amplitude or period or some other feature, depending on how the waveform is described). This is often of great import, particularly as more recent experiments are trying to assess if there is a change in these parameters with a stimulus. These factors should be considered. It looks like some of the earlier reviews did this, but as one of the points of this review is to look at newer methods, it should be redone. This might make a particular difference with regard to questions of using more replicates (or adding time points) in a smaller number of cycles, or measuring more cycles (e.g. 2 animals every hour for 2 days vs 2 animals every 4 hours for 4 days).

(12) You try to squeeze a lot on the figures. Perhaps I am just too old, but when printed out I find them to be unreadable.

Text and plots need to be bigger. Perhaps plot a bit less on each figure (if that’s what you need to do).

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Alan L Hutchison, MD PhD

Reviewer #2: No

Reviewer #3: No

Reviewer #4: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007666.r003

Decision Letter 1

Bjoern Peters, Attila Csikász-Nagy

9 Jan 2020

Dear Dr Robinson-Rechavi,

Thank you very much for submitting your manuscript, 'Methods detecting rhythmic gene expression are biologically relevant only for strong signal', to PLOS Computational Biology. As with all papers submitted to the journal, yours was fully evaluated by the PLOS Computational Biology editorial team, and in this case, by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some aspects of the manuscript that should be improved.

We would therefore like to ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer and we encourage you to respond to particular issues raised. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

In addition, when you are ready to resubmit, please be prepared to provide the following:

(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.

(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible to show clearly where changes have been made to their manuscript e.g. by highlighting text.

(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.

Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:

- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).

- Supporting Information uploaded as separate files, titled 'Dataset', 'Figure', 'Table', 'Text', 'Protocol', 'Audio', or 'Video'.

- Funding information in the 'Financial Disclosure' box in the online system.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org.

If you have any questions or concerns while you make these revisions, please let us know.

Sincerely,

Attila Csikász-Nagy

Associate Editor

PLOS Computational Biology

Bjoern Peters

Benchmarking Editor

PLOS Computational Biology

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #2: The authors have responded all the comments and edited the manuscript with the appropriate changes.

Reviewer #3: I am still endorsing this paper and am generally happy with the revisions. A careful re-reading revealed the following minor points.

Density estimates for p value distributions go beyond the support for the p values, i.e. have mass at < 0 and > 1. Density estimates can easily be restricted to 0 <= p <= 1, which should be done.
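
One common way to restrict a kernel density estimate to 0 <= p <= 1 is boundary reflection, sketched below in plain Python. This is an illustration of the general technique only; the review does not prescribe a method, and the bandwidth, grid, and example p-values are arbitrary assumptions.

```python
import math


def reflected_kde(samples, grid, bandwidth):
    """Gaussian KDE on [0, 1] using reflection at both boundaries.

    Each sample x also contributes mirrored copies at -x and 2 - x, so the
    mass that would leak below 0 or above 1 is folded back into [0, 1].
    The density is only meaningful for grid points inside [0, 1].
    """
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for g in grid:
        total = 0.0
        for x in samples:
            for center in (x, -x, 2.0 - x):
                u = (g - center) / bandwidth
                total += math.exp(-0.5 * u * u)
        density.append(norm * total)
    return density


# Hypothetical p-values, several of them close to the boundaries.
pvals = [0.001, 0.01, 0.02, 0.2, 0.5, 0.9, 0.99]
grid = [i / 200 for i in range(201)]  # evaluation points restricted to [0, 1]
dens = reflected_kde(pvals, grid, bandwidth=0.05)

# Trapezoidal check that the estimate integrates to about 1 over [0, 1].
step = grid[1] - grid[0]
area = sum((dens[i] + dens[i + 1]) * step / 2.0 for i in range(len(grid) - 1))
print(f"integral over [0, 1]: {area:.4f}")
```

A plain histogram with bin edges fixed at 0 and 1 would avoid the problem as well, at the cost of a less smooth display.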

Re: Figure S12b. The main text correctly mentions the lack of consistent p value correlation, but the naive reader may be misled by the astronomically low p values. This may be related to the KS tests, e.g. Figure 5d: the KS test is sensitive to the most minute differences in distributions; even very, very similar distributions with minuscule differences are often detected by this test. Thus, some kind of deviance measure to assess similarity of the distributions would be beneficial here. Use of Pearson correlation coefficients: especially for Figure S12, it seems important to use e.g. a permutation test to assess significance, since the log p values cannot be assumed to follow a normal distribution. Was this done?
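
A permutation test for a Pearson correlation, as raised in the comment above, can be sketched as follows. The code is an assumption-laden illustration rather than the authors' procedure: the number of permutations, the two-sided statistic, the seed, and the example data are all arbitrary choices.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation.

    The y values are shuffled to break any association with x; the p-value
    is the fraction of shuffles with |r| at least as large as observed,
    with the usual +1 correction so it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical -log10 p-values from two methods for the same 20 genes.
method_1 = [float(i) for i in range(20)]
method_2 = [2.0 * v + 1.0 for v in method_1]  # perfectly correlated toy data
p_perm = permutation_pvalue(method_1, method_2)
print(f"r = {pearson(method_1, method_2):.3f}, permutation p = {p_perm:.4f}")
```

Because the null distribution is built from the data themselves, no normality assumption on the log p values is needed, and the smallest attainable p-value is bounded by the number of permutations.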

New text "Of note, the comparison of species under different conditions (light-dark versus dark-dark) is a limitation in itself since the overlap of the rhythmic transcriptome between these two conditions has been shown to be low":

Where has it been shown to be low? In fact I even discourage the use of the term "overlap" which is arbitrary and can be misleading; overlap is usually defined by arbitrary cutoffs. It introduces a black/white thinking into a highly greyscale reality. In any case, mouse liver LD/DD were compared in the DODR paper and the correspondence did not seem "low".

The caption of Figure S5 mentions 7 methods, only 5 are shown.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #2: Yes

Reviewer #3: None

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007666.r005

Decision Letter 2

Bjoern Peters, Attila Csikász-Nagy

18 Jan 2020

Dear Prof. Robinson-Rechavi,

We are pleased to inform you that your manuscript 'Methods detecting rhythmic gene expression are biologically relevant only for strong signal' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch within two working days with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Attila Csikász-Nagy

Associate Editor

PLOS Computational Biology

Bjoern Peters

Benchmarking Editor

PLOS Computational Biology

***********************************************************

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007666.r006

Acceptance letter

Bjoern Peters, Attila Csikász-Nagy

5 Mar 2020

PCOMPBIOL-D-19-01419R2

Methods detecting rhythmic gene expression are biologically relevant only for strong signal

Dear Dr Robinson-Rechavi,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Sarah Hammond

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Gene expression time-series datasets.

    Gene expression time-series datasets that come from circadian experiments. We kept data from healthy, wild-type individuals for these seven species, allowing comparisons among vertebrates and among insects. We preferred data from light-dark (LD) and ad-libitum experimental conditions whenever possible, as these provide a better representation of wild conditions. LD denotes regular alternation of light and darkness every 24h; DD denotes continuous darkness, usually after entrainment to a 12h:12h light:dark cycle.

    (XLSX)

    S1 File. Supplementary results.

    (PDF)

    S2 File. Density distribution of raw and default p-values obtained for the seven rhythm detection methods applied to vertebrate datasets.

    Density distribution of p-values obtained before (raw) and after the default correction (software) for the seven methods applied to each vertebrate dataset, sub-categorized in: i. randomized data which represents the null hypothesis; ii. randomized data restricted to the first and fourth quartiles of the median gene expression level, to check for the impact of expression level under the null; iii. the full original dataset; iv. the first and fourth quartiles of the median gene expression level of the original data; and v. a subset of known cycling genes when such data was available (8 to 99 genes according to species). The default p-values of ARS, GeneCycle, and LS are uncorrected.

    (PDF)

    S3 File. Following S2 File.

    (PDF)

    S4 File. Signal of evolutionary conservation of rhythmic gene expression in vertebrates.

    p-values density distribution of rhythmic orthologs vs non-rhythmic orthologs obtained for the seven methods applied to different vertebrate datasets. Orthologous genes detected as rhythmic in the same organ of two species have a stronger statistical signal of rhythmicity than those detected as not-rhythmic in at least one species. From all species_1 genes, only species_1-species_2 one-to-one orthologs are kept. Considering homologous tissues, these orthologs are separated into two groups: genes for which the ortholog is detected as rhythmic in this tissue of species_2, called rhythmic orthologs; and the remaining one-to-one orthologs.

    (PDF)

    S5 File. Variation of the proportion A/B as a function of the number of orthologs detected rhythmic, obtained for each method applied to different vertebrate datasets.

    The benchmark gene set is composed of species_1-species_2 orthologs, detected rhythmic in the homologous tissue of species_2 by the ARS, GeneCycle, or empJTK method with default p-value ≤ 0.01 or 0.05. See Fig 6 for definitions of sets A and B. The black line is the Naive method which orders genes according to their median expression levels (median of time-points), from highest expressed to lowest expressed gene, then, for each gene, calculates the proportion of rhythmic orthologs among those with higher expression.

    (PDF)

    S6 File. Results in insects.

    (PDF)

    Attachment

    Submitted filename: Review.docx

    Attachment

    Submitted filename: Review_2nd.docx

    Data Availability Statement

    Data are publicly available from the NCBI GEO database, as specified in the Materials/Datasets section.


    Articles from PLoS Computational Biology are provided here courtesy of PLOS
