Abstract
Down syndrome (DS), trisomy of human chromosome 21 (Hsa21), results in a broad range of phenotypes. A recent study reported that DS cells show genome-wide transcriptional changes in which up- or down-regulated genes are clustered in gene expression dysregulation domains (GEDDs). GEDDs were also reported in fibroblasts derived from a DS mouse model duplicated for some Hsa21-orthologous genes, indicating cross-species conservation of this phenomenon. Here we investigate GEDDs using the Dp1Tyb mouse model of DS, which is duplicated for the entire Hsa21-orthologous region of mouse chromosome 16. Our statistical analysis shows that GEDDs are present both in DS cells and in Dp1Tyb mouse fibroblasts and hippocampus. However, we find that GEDDs do not depend on the DS genotype but occur whenever gene expression changes. We conclude that GEDDs are not a specific feature of DS but instead result from the clustering of co-regulated genes, a function of mammalian genome organisation.
Subject terms: Gene expression, Gene regulation, Chromatin structure, Transcription, Experimental models of disease
Gene expression dysregulation domains (GEDDs) have been reported in Down syndrome (DS) cells, where changes in gene expression are clustered. Here the authors find that, while GEDDs are present in DS cells and in the Dp1Tyb mouse model of DS, GEDDs do not depend on the DS genotype and occur whenever gene expression changes, suggesting they result from the clustering of co-regulated genes as a function of mammalian genome organisation.
Introduction
Down syndrome (DS), also known as trisomy 21, is a leading cause of cognitive deficits, occurring in 1 in 700 births. DS results in a broad range of phenotypes, including cognitive impairment, congenital heart defects, craniofacial abnormalities and early-onset dementia1. The predominant view is that these phenotypes result from an increased dosage of one or more of the genes on Hsa21; currently, Hsa21 is estimated to contain 234 protein-coding genes2. The increased dosage of these genes is predicted to lead to increased transcript and protein levels (for the coding genes), which in turn would affect cellular and organismal physiology resulting in the observed pathologies. One DS pathological mechanism may be through the action of proteins such as transcription factors or chromatin modifiers that alter expression of other non-Hsa21 genes. A notable recent study proposed that trisomy 21 results in genome-wide transcriptional changes in which upregulated or downregulated genes are clustered in regions termed gene expression dysregulation domains (GEDDs), with genes whose expression changes in the same direction (up or down) being clustered3. The study postulated that GEDDs were the result of genome-wide chromatin changes in DS, potentially caused by overexpression of a chromatin modifier on Hsa21. The study also reported that GEDDs were present in a mouse model of DS named Ts65Dn that is trisomic for 132 protein-coding genes on mouse chromosome 16 (Mmu16) that are orthologous to Hsa21. However, the published study was limited in scope because GEDDs were identified using four replicate RNA samples derived from three independent fibroblast cultures isolated from a single pair of monozygotic twins discordant for trisomy 21, and the analysis of mouse Ts65Dn fibroblasts was carried out using just one replicate3.
Evidently, such genome-wide changes in gene expression could make a significant contribution to DS phenotypes and thus merit further investigation. We examined the phenomenon of GEDDs using a recently created mouse model of DS termed Dp1Tyb, which has three copies of the entire 23 Mb region of Mmu16 that is orthologous to Hsa21 containing 148 protein-coding genes, including all the 132 genes duplicated in Ts65Dn4. Dp1Tyb mice show a number of DS-like phenotypes, including congenital heart defects4, locomotor defects5, learning and memory deficits and craniofacial abnormalities (Eva Lana-Elola, Sheona Watson-Scales, Elizabeth M.C. Fisher and Victor L.J. Tybulewicz, 2019 unpublished).
Here we develop a robust statistical method to evaluate whether changes in gene expression are more clustered than expected by chance, a necessary feature of GEDDs. Using this approach, we are indeed able to detect GEDDs in both human DS fibroblasts and in Dp1Tyb mouse fibroblasts and hippocampus. However, we show that the presence of GEDDs does not depend on genotype (for example, DS in humans or the Dp1Tyb duplication in mouse) but is seen whenever gene expression changes. Indeed, we detect GEDDs in gene expression data sets with no relation to DS or mouse models of DS. Furthermore, we show that the boundaries of GEDDs correlate with boundaries of topologically associating domains (TADs), units of higher-order chromatin structure that are enriched in co-regulated genes6,7. We conclude that GEDDs are not a specific feature of DS but instead result from the organisation of mammalian genomes whereby genes located close to each other are more likely to be co-regulated.
Results
Differential gene expression in Dp1Tyb mouse embryonic fibroblasts (MEFs)
The report of GEDDs in gene expression data from human DS fibroblasts and DS induced pluripotent stem cells (IPSCs) and from mouse Ts65Dn fibroblasts (a model of DS)3 presented an opportunity to use mouse genetics to map and identify the dosage-sensitive gene(s) causing this phenomenon. We decided to make use of the recently described Dp1Tyb mouse model of DS, which carries a duplication of a 23 Mb region of Mmu16 that is orthologous to Hsa21 and contains 148 protein-coding genes4. This mouse strain has been backcrossed for well over 10 generations such that the genetic background of all animals is C57BL/6J with <0.1% genetic variability between mice. This duplicated region in Dp1Tyb includes all 132 Mmu16 genes duplicated in Ts65Dn but does not have increased dosage of any of the 43 coding genes non-orthologous to Hsa21 that are duplicated in Ts65Dn8. Thus, if GEDDs are caused by increased dosage of Hsa21 genes or their orthologues on Mmu16 in the mouse, they should also be seen in cells from Dp1Tyb mice. The Dp1Tyb strain offers a further advantage in that we have also generated a series of strains with shorter nested duplications on Mmu16 (Dp2Tyb to Dp9Tyb), allowing mapping of the causative gene(s) and their eventual identification4.
Since GEDDs had been reported in gene expression data from human DS fibroblasts and Ts65Dn MEFs, we chose to analyse Dp1Tyb MEFs. We grew cultures of MEFs from four wild-type (WT) and five Dp1Tyb littermate embryos, isolated RNA from them and carried out RNA sequencing (RNAseq) in order to quantitate gene expression. Differential gene expression analysis showed 66 significantly differentially expressed genes (adjusted p value [padj] < 0.05) with 39 of these located within the duplicated region in Dp1Tyb, all of which were upregulated (Fig. 1, Supplementary Data 1). The mean fold change of these duplicated genes in Dp1Tyb vs WT MEFs was 1.47, close to the expected value of 1.5.
GEDDs in Dp1Tyb MEFs
The existence of GEDDs was previously proposed by analysing the fold change in gene expression of DS fibroblasts or IPSCs vs euploid cells and looking for clustering of upregulated or downregulated genes across the genome3. In that study, the visualisation of these domains of clustered gene expression changes was simplified using a Loess smoothed curve. However, the study did not determine whether the clustering of gene expression changes seen in human DS cells was statistically significant. We employed a similar approach to the gene expression data from Dp1Tyb MEFs to look for GEDDs. As expected, we could see a clear increase of about 1.5-fold in gene expression in Dp1Tyb MEFs across the duplicated region of Mmu16 (Fig. 2). Across the rest of the genome, however, the fold changes were small (Fig. 2, Supplementary Fig. 1). Notably, the changes in gene expression were much smaller than those previously reported for the comparison between DS and euploid human fibroblasts that had led to the concept of GEDDs (Supplementary Fig. 2a)3. These larger fold changes in the human data are most likely the consequence of much greater variation in gene expression seen in the RNAseq data (Supplementary Fig. 2b). Nonetheless, in the analysis of the RNAseq data from Dp1Tyb MEFs, the Loess smoothed curves showed regions of apparent clustered upregulation or downregulation. However, the extent of upregulation or downregulation was small, and from this analysis, it was not possible to determine whether such clusters were statistically significant.
To address this issue, we devised two statistical tests to determine whether upregulated or downregulated genes were clustered more than would be expected by chance, a necessary feature of GEDDs. First, for each chromosome we counted the number of flips, defined as a change in the direction of the fold change from one gene to the adjacent gene on the chromosome (Fig. 3a). This number of flips was compared to the distribution of numbers of flips determined from 100,000 permutations of the same chromosome with randomised orders of the genes. Clustering of upregulated or downregulated genes would result in a significant decrease in the number of flips compared to the randomised (permuted) distribution. Second, we used a spatial correlation measure9. To this end, we calculated an energy function E for each chromosome by multiplying the fold change of each gene with each of its two neighbours and then adding these products together for all genes on the chromosome (Fig. 3a). This measure takes into account the magnitude as well as the direction of the change of each gene and would be increased if there were more clustering of upregulated or downregulated genes. Again, this measure was compared to the distribution of E of 100,000 permuted versions of the same chromosome. The presence of GEDDs would result in a larger value of E than expected by chance.
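The two tests described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the input is a list of signed log2 fold changes ordered by position along one chromosome, it ignores genes with a fold change of exactly zero when counting flips, and the function names and the default of 1,000 permutations are hypothetical choices made to keep the example fast.

```python
import random
from statistics import mean, pstdev


def count_flips(log_fc):
    """Count changes in the sign of the fold change between adjacent genes."""
    # Assumption: genes with a log fold change of exactly zero are skipped.
    signs = [1 if x > 0 else -1 for x in log_fc if x != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def energy(log_fc):
    """Sum, over all genes, of the gene's fold change times each neighbour's."""
    # Every adjacent pair is visited once from each side, hence the factor 2.
    return 2 * sum(a * b for a, b in zip(log_fc, log_fc[1:]))


def permutation_z(log_fc, statistic, n_perm=1000, seed=0):
    """Z-score of the observed statistic against gene-order permutations."""
    rng = random.Random(seed)
    observed = statistic(log_fc)
    shuffled = list(log_fc)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomise gene order, keeping the same values
        null.append(statistic(shuffled))
    sd = pstdev(null)
    return (observed - mean(null)) / sd if sd > 0 else 0.0


# Toy chromosome: 20 upregulated genes followed by 20 downregulated genes.
clustered = [0.5] * 20 + [-0.5] * 20
z_flips = permutation_z(clustered, count_flips)
z_energy = permutation_z(clustered, energy)
```

For a strongly clustered toy chromosome such as this one, `z_flips` falls well below -2 and `z_energy` well above 2, matching the >2 SD criterion used in the text.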
We undertook both tests (numbers of flips, energy function) on the human DS fibroblast gene expression data in Letourneau et al., which had reported the existence of GEDDs3. On most chromosomes, we were indeed able to detect significantly decreased numbers of flips and increased energy compared to the numbers expected by chance (>2 standard deviations (SDs) from mean of randomised distribution), validating these two approaches, and confirming the presence of GEDDs statistically (Fig. 3b, Supplementary Fig. 3, Supplementary Fig. 4, Table 1). Next, we examined the RNAseq data from Dp1Tyb and WT MEFs. Here again we found significantly decreased numbers of flips and increased energy on most chromosomes, implying that GEDDs were also present in Dp1Tyb MEFs (Fig. 3c, Supplementary Fig. 5, Supplementary Fig. 6, Table 1).
Table 1.
Sample | Significant chromosomes (flips) | Significant chromosomes (energy) |
---|---|---|
Human DS fibroblasts | 20 | 22 |
Dp1Tyb MEFs | 7 | 16 |
Dp1Tyb hippocampus | 8 | 17 |
Table shows the numbers of chromosomes that had significantly (>2 SD) reduced numbers of flips or increased energy in comparisons of changes in gene expression in human DS vs euploid fibroblasts or in Dp1Tyb vs WT MEFs or hippocampus
DS, Down syndrome; GEDD, gene expression dysregulation domain; MEF, mouse embryonic fibroblast
GEDDs in Dp1Tyb hippocampus
The analysis by Letourneau et al. showed that the distribution of GEDDs in human DS fibroblasts correlated with that in mouse Ts65Dn fibroblasts3. Thus we examined whether the same was true for gene expression changes detected in Dp1Tyb fibroblasts. However, excluding genes on Hsa21 and their orthologues on Mmu16, we were able to detect only a very small correlation in the expression changes between orthologous genes in human DS and mouse Dp1Tyb fibroblasts (Fig. 4a). In view of this and the very small changes in gene expression we saw between Dp1Tyb and WT MEFs (only 27 differentially expressed genes outside the duplicated region), we wondered whether another cell type might show larger differences in gene expression, making the correlation between human and mouse gene expression changes easier to detect.
We decided to evaluate the gene expression changes in hippocampus from five WT and five Dp1Tyb mice, since this region of the brain plays an important role in learning and memory and its function is altered in several mouse models of DS10–17, including Dp1Tyb (Elizabeth M.C. Fisher and Victor L.J. Tybulewicz, 2019 unpublished). Compared to the MEF expression data, we saw more changes in gene expression in the hippocampus. There were 515 significantly differentially expressed genes with 76 of these located within the duplicated region in Dp1Tyb, all of which were upregulated, and a further 439 differentially expressed genes outside the duplication (Fig. 1, Supplementary Data 2). The mean fold change of the duplicated genes in Dp1Tyb vs WT hippocampus was 1.41, once again close to the expected value of 1.5. Plots of fold change of genes along chromosomes also showed clear upregulation of genes across the duplicated region with only small changes across the rest of the genome (Fig. 5, Supplementary Fig. 7). Once again, the fold changes in gene expression were much smaller than those previously reported for the comparison between DS and euploid human fibroblasts3, most likely due to much higher variation in gene expression in the human data (Supplementary Fig. 2a, b). Moreover, analysis of clustering of gene expression changes showed many chromosomes with significantly decreased numbers of flips and increased energy confirming that GEDDs were also detectable in the Dp1Tyb hippocampus (Supplementary Fig. 8, Supplementary Fig. 9, Table 1).
The study reporting GEDDs in human DS fibroblasts showed that these domains were also seen in DS IPSCs with substantial correlation in the location and magnitude of GEDDs between these two different cell types3. Thus we examined the correlation in gene expression changes between the Dp1Tyb mouse hippocampus and Dp1Tyb MEFs. Excluding the duplicated genes on Mmu16, we were able to detect only very weak correlation (Fig. 4a). This lack of correlation was also seen in an overlay of the Loess curves of the RNAseq data from Dp1Tyb MEFs and hippocampus (Supplementary Fig. 10). Furthermore, there was also very low correlation in the gene expression changes between Dp1Tyb hippocampus and human DS fibroblasts (Fig. 4a). Taken together, these data show that GEDDs can be detected in Dp1Tyb MEFs and hippocampus using sensitive statistical tests, but as shown by the very poor correlation in expression changes, the location of these is not conserved between human DS and mouse models or indeed between different tissues in the Dp1Tyb mouse strain.
GEDDs are not caused by a decreased dynamic range of gene expression
Letourneau et al. reported that, in the comparison of DS and euploid fibroblasts, DS cells had elevated expression of genes expressed at a low level and decreased expression of more highly expressed genes3. Since genes tend to be clustered according to level of expression, the authors suggest that GEDDs may arise because of a smaller dynamic range of gene expression in DS cells compared to euploid cells, leading to clustered increases in gene expression of lowly expressed genes and clustered decreases of highly expressed genes. In view of this, we examined the gene expression changes in Dp1Tyb MEFs and hippocampus compared to WT controls as a function of gene expression level. Following the approach used by Letourneau et al.3, we divided genes into low, medium and high levels of expression and evaluated the distribution of fold changes in gene expression between Dp1Tyb and WT MEFs or hippocampus in comparison to the control of comparing WT to WT expression. We saw no evidence for increased or decreased fold changes in Dp1Tyb cells in the lowly or highly expressed genes, respectively (Fig. 4b, c). Thus there is no change in the dynamic range of gene expression in Dp1Tyb MEFs or hippocampus, and this cannot explain the GEDDs in these cell types.
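The binning step can be illustrated with a short Python sketch. It assumes that genes are split into three equal-sized groups (tertiles) by mean expression and that each group is summarised by its median log2 fold change. The text does not state the thresholds or the summary statistic, so both are assumptions, and the function name is hypothetical.

```python
from statistics import median


def fold_change_by_expression_bin(mean_expr, log_fc):
    """Median log fold change in low, medium and high expression tertiles."""
    genes = sorted(mean_expr, key=mean_expr.get)  # lowest expression first
    n = len(genes)
    bins = {
        "low": genes[: n // 3],
        "medium": genes[n // 3 : 2 * n // 3],
        "high": genes[2 * n // 3 :],
    }
    return {name: median(log_fc[g] for g in members) for name, members in bins.items()}


# Toy data showing the pattern a reduced dynamic range would produce:
# lowly expressed genes go up, highly expressed genes go down.
mean_expr = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
log_fc = {"a": 0.25, "b": 0.75, "c": 0.0, "d": 0.0, "e": -0.25, "f": -0.75}
summary = fold_change_by_expression_bin(mean_expr, log_fc)
```

A reduced dynamic range would appear as a positive median in the low bin and a negative median in the high bin, as in this toy input. The Dp1Tyb data showed no such shift relative to the WT vs WT control.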
GEDDs are not due to increased mRNA levels in Dp1Tyb cells
A recent study by Mowery et al. demonstrated that interleukin-7-cultured pro-B cells from Ts1Rhr mice had increased levels of mRNA compared to WT controls18. This mouse strain has a duplication of a 33-gene Hsa21-orthologous region on Mmu16, which is entirely included within the duplication in Dp1Tyb mice4,19. The study demonstrated that this increase in mRNA is caused by an additional copy of the Hmgn1 gene, which codes for HMGN1, a nucleosome-binding protein that modulates chromatin compaction. Furthermore, the increase in mRNA level was not even, with a larger increase in lower expressed genes and a smaller increase in the more highly expressed genes. The authors argue that, in a standard RNAseq analysis that assumes no change in overall RNA levels and is normalised to median read counts (relative normalisation), this uneven increase in RNA levels would lead to an apparent increase in the expression of lower expressed genes and decrease in the expression of higher expressed genes. This in turn would result in the appearance of GEDDs because of the tendency for genes to be clustered according to the level of expression. Since the Hmgn1 gene is duplicated in Dp1Tyb mice and significantly increased in expression in both Dp1Tyb MEFs and hippocampus (Supplementary Data 1, 2), it is possible that the GEDDs we observed in tissues from these mice might also be a consequence of increased mRNA levels and the use of relative normalisation. To address this possibility, we carried out another RNAseq experiment on Dp1Tyb and WT MEFs, but this time added non-mammalian synthetic ERCC (External RNA Controls Consortium) RNA controls at a fixed amount per cell to each sample, similar to the strategy employed by Mowery et al.18. To determine whether the overall amount of mRNA was increased in Dp1Tyb MEFs, we analysed the data by normalising to the spiked-in ERCC controls (absolute normalisation) and compared this to relative normalisation of the same data.
We found that the mean fold change of gene expression between Dp1Tyb and WT MEFs was ~1.8% higher using absolute normalisation compared to relative normalisation, indicating a small increase in overall mRNA level in Dp1Tyb MEFs (Supplementary Fig. 11a). This is much less than the ~10% increase seen by Mowery et al. in Ts1Rhr pro-B cells18 and may be partly accounted for by the increase in the transcriptome in Dp1Tyb cells compared to WT cells (~0.7%). Importantly, we could see no skewed increase in expression in favour of lower expressed genes (Supplementary Fig. 11b), thus an increase in mRNA level cannot explain the GEDDs detected in these cells.
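The difference between the two normalisation schemes can be shown with a toy Python example. This is a deliberately simplified sketch: each sample is scaled by the median count of a reference set (endogenous genes for relative normalisation, spike-ins for absolute normalisation), which is not the median-of-ratios procedure that DESeq2 applies. All feature names and counts are invented for illustration.

```python
from statistics import median


def normalise(counts, reference_ids):
    """Scale one sample by the median count of the chosen reference features."""
    factor = median(counts[f] for f in reference_ids)
    return {feature: c / factor for feature, c in counts.items()}


def mean_fold_change(test, control, gene_ids):
    """Mean per-gene ratio of normalised expression, test over control."""
    return sum(test[g] / control[g] for g in gene_ids) / len(gene_ids)


GENES = ["g1", "g2", "g3"]    # hypothetical endogenous genes
SPIKES = ["ercc1", "ercc2"]   # hypothetical spike-ins, fixed amount per cell

# Toy counts: the test cells carry 10% more mRNA per cell, and the test
# library happens to have been sequenced twice as deeply as the control.
control = {"g1": 100, "g2": 200, "g3": 400, "ercc1": 50, "ercc2": 100}
test = {"g1": 220, "g2": 440, "g3": 880, "ercc1": 100, "ercc2": 200}

relative = mean_fold_change(normalise(test, GENES), normalise(control, GENES), GENES)
absolute = mean_fold_change(normalise(test, SPIKES), normalise(control, SPIKES), GENES)
```

Here `relative` is 1.0, so the global increase is invisible, while `absolute` is approximately 1.1. The ~1.8% difference reported above is the analogous gap between the two schemes in the Dp1Tyb MEF data.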
GEDDs are not caused by DS genotype
In view of the very low correlation between the human and mouse gene expression changes and the very small magnitude of these changes in both Dp1Tyb MEFs and hippocampus, we wondered whether the expression changes contributing to the detection of GEDDs were due to experimental variation and not to the human or mouse DS genotype. To address this, we repeated the analysis of the expression data from both the human DS fibroblasts3 and the mouse Dp1Tyb MEFs and hippocampus but mixed samples so as to eliminate the effect of genotype. The human DS fibroblast data consisted of four DS samples and four euploid samples. Thus we compared two DS and two euploid samples against two other DS and two other euploid samples, thereby eliminating the effect of genotype in the comparison. With four DS and four euploid samples, there are 18 possible combinations in which such no-genotype-difference comparisons could be carried out (6 × 6 = 36 ways of choosing the first group, halved because the two groups are interchangeable), and we calculated fold changes in gene expression for all of these. Similar switching was carried out with the Dp1Tyb MEF and hippocampus data.
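The count of 18 can be checked by enumerating the splits directly. The sketch below assumes exactly four samples per genotype, so that the second group is simply the complement of the first. Sample labels and the function name are hypothetical.

```python
from itertools import combinations


def balanced_splits(ds, euploid, k=2):
    """Unordered splits into two groups, each holding k DS and k euploid samples.

    Assumes len(ds) == len(euploid) == 2 * k, so the complement of a
    balanced group is itself balanced.
    """
    everything = frozenset(ds) | frozenset(euploid)
    splits = set()
    for ds_a in combinations(ds, k):
        for eu_a in combinations(euploid, k):
            group_a = frozenset(ds_a) | frozenset(eu_a)
            group_b = everything - group_a
            # A frozenset of the two groups makes (A, B) and (B, A) identical.
            splits.add(frozenset((group_a, group_b)))
    return splits


ds_samples = ["DS1", "DS2", "DS3", "DS4"]
euploid_samples = ["EU1", "EU2", "EU3", "EU4"]
splits = balanced_splits(ds_samples, euploid_samples)
n_splits = len(splits)
```

With four samples per genotype, `n_splits` is 18. Each split defines one genotype-balanced comparison for which fold changes, flips and energy can then be computed.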
Plotting the fold changes superimposed on Loess smoothing curves from the genotype-switched analysis of the human DS fibroblasts suggested the presence of GEDDs, for example on Hsa21 (Fig. 6a, b). Analysis of GEDDs using the flips and energy measures showed that, for many chromosomes in most of the switched no-genotype-difference analyses, there were significantly reduced numbers of flips and increased energy, indicating GEDDs (Fig. 6c, d). Similarly, eliminating the genotype contribution in the mouse Dp1Tyb MEF and hippocampus data by switching genotypes still showed that in most of the switched combinations most chromosomes had detectable GEDDs as measured by flips or energy (Fig. 7a–h, Supplementary Fig. 12 and 13). Thus we conclude that clustering of upregulated or downregulated genes (i.e. GEDDs) is detectable even when there is no difference in genotype between the samples being compared and, in this case, is most likely due to small variations in gene expression from one fibroblast culture to another or from one mouse to another. Furthermore, this implies that GEDDs may not be a specific feature of DS.
GEDDs caused by genetic changes unrelated to DS
If GEDDs are not specifically caused by the DS genotype but are detected when expression changes are due to other causes, it may be that they are a general phenomenon occurring whenever the expression of genes changes. To address this, we turned to a completely unrelated RNAseq data set that one of us (H.A.) had previously worked on, consisting of gene expression in follicular and marginal zone B cells taken from mice with a null mutation of the Zfp36l1 gene or from WT control mice20. We carried out three differential gene expression analyses, comparing WT and ZFP36L1-deficient follicular B cells, WT and ZFP36L1-deficient marginal zone B cells and WT follicular with WT marginal zone B cells. All three of these comparisons identified substantial numbers of differentially expressed genes20. Plots of fold changes in gene expression with Loess smoothing curves indicated the possible locations of clustered gene expression changes (Supplementary Fig. 14, and data not shown). Furthermore, analysis of clustering of gene expression changes once again showed many chromosomes with significantly decreased numbers of flips and increased energy confirming that GEDDs were also detectable in each of these three comparisons (Table 2). Thus we conclude that GEDDs are caused by perturbations in gene expression that are unrelated to DS and hence are not a specific feature of the syndrome.
Table 2.
Sample | Significant chromosomes (flips) | Significant chromosomes (energy) |
---|---|---|
ZFP36L1-deficient vs WT follicular B cells | 8 | 13 |
ZFP36L1-deficient vs WT marginal zone B cells | 5 | 8 |
Follicular vs marginal zone WT B cells | 4 | 10 |
Table shows the numbers of chromosomes that had significantly (>2 SD) reduced numbers of flips or increased energy in comparisons of changes in gene expression in ZFP36L1-deficient vs WT follicular or marginal zone B cells or in follicular vs marginal zone WT B cells
GEDD, gene expression dysregulation domain; WT, wild type
GEDD boundaries align with TAD boundaries
Finally, we considered the relationship of the GEDDs that we had detected in Dp1Tyb MEFs and hippocampus with previously described genomic domains that relate to gene expression. TADs are regions of the genome showing increased intra-domain interactions compared to inter-domain interactions6,7. TADs appear to be loops of chromatin held together at their ends, one consequence of which is that genes within TADs are more likely to be co-regulated21–23. The boundaries of TADs are often marked by binding of the CTCF protein, which may be involved in the formation of TADs6. These boundaries also often correspond to the boundaries of replication domains, regions of either early or late replication24. Finally, some regions of the genome have been found to associate with the nuclear lamina. These lamina-associated domains (LADs) are typically regions of gene repression and correspond partially to TADs, though generally LADs are larger than TADs25. We compared the location of these elements relative to GEDDs and found that TAD boundaries and CTCF binding sites were enriched at GEDD boundaries from both Dp1Tyb MEFs and hippocampus (Fig. 8a, c). In contrast, there was no obvious enrichment of replication domain or LAD boundaries. Inverting the comparison, we found that GEDD boundaries and CTCF-binding sites were enriched at TAD boundaries, as were replication domain and LAD boundaries, although to a lower extent (Fig. 8b, d). Thus we conclude that the co-regulation of gene expression in GEDDs is most likely a consequence of their partial correspondence to TADs, genomic structures that tend to contain co-regulated genes.
Discussion
The recent report that DS cells show clustered upregulation or downregulation of gene transcription compared to euploid cells, a phenomenon termed GEDDs, gave rise to the hypothesis that at least some DS phenotypes may be caused by chromatin changes leading to genome-wide dysregulation of gene expression3. This is an important hypothesis to address in the context of trying to understand the pathological mechanisms that underpin DS phenotypes, since it suggests that some phenotypes may be the result of widespread changes of gene expression across the genome, rather than direct effects of increased expression of one or a few Hsa21 genes. The observation that GEDDs were also reported in the Ts65Dn mouse model of DS presented us with an opportunity to use the Dp1Tyb mouse strain and, potentially, the associated mapping panel of mouse duplication strains (Dp2Tyb to Dp9Tyb)4 to locate and identify the dosage-sensitive gene(s) that give rise to GEDDs, and hence, ultimately, to test the importance of GEDDs in causing DS phenotypes.
We developed two statistical tests to confidently detect clustering of gene expression changes occurring in the same direction and using these were able to show GEDDs in the gene expression data from human DS fibroblasts reported by Letourneau et al.3. We were also able to detect GEDDs in our own gene expression data from Dp1Tyb MEFs and hippocampus. However, the location of the expression changes was not conserved between the human and mouse data. We noted that the magnitude of the gene expression changes was larger in the human DS fibroblast data compared to the Dp1Tyb mouse fibroblast and hippocampus data. This difference may arise from the greater number of biological replicates used in our mouse studies compared to the original human DS fibroblast data3. This is supported by the larger variation in gene expression, i.e. more noise, in the human fibroblast data compared to our Dp1Tyb mouse fibroblast and hippocampus data.
Letourneau et al. had previously suggested that GEDDs may be caused by a smaller dynamic range of gene expression in DS cells with increased expression of lower expressed genes and decreased expression of higher expressed genes3. We saw no such effect in Dp1Tyb MEFs or hippocampus. More recently, Mowery et al. showed that an additional copy of the Hmgn1 gene results in an increase in total RNA in mouse pro-B cells, with a larger increase in lower than in higher expressed genes18. They suggested that, because of this, the appearance of GEDDs in DS cells was an illusion caused by the use of relative normalisation of RNAseq data to the median gene expression in a sample. To test this idea, we carried out RNAseq on Dp1Tyb and WT MEFs with control spike-in RNAs, which allows an absolute normalisation of the data to the spike-ins, thereby determining absolute RNA levels. We found a small increase in RNA levels in Dp1Tyb MEFs, but this increase was not skewed towards the lower expressed genes and thus cannot explain the GEDDs we observe. This result agrees with our observation, discussed above, that using a relative normalisation there was no increased expression of lower expressed genes and decreased expression of higher expressed genes. We note that, despite an additional copy of the Hmgn1 gene in Dp1Tyb MEFs and ~1.5-fold increased expression of the gene, the increase in RNA level was much smaller than that seen in Ts1Rhr pro-B cells, suggesting that the effects of Hmgn1 overexpression may be cell-type specific.
Importantly, using two different approaches, we were able to show that the presence of GEDDs was not dependent on the DS genotype. First, we mixed samples and carried out differential gene expression analysis between sets of samples that no longer differed by DS genotype; this analysis still detected GEDDs. Second, we were able to detect GEDDs in an unrelated RNAseq data set comparing gene expression in B cells from WT and ZFP36L1-deficient mice. Thus we conclude that GEDDs are not a specific feature of DS but rather this spatial coordination of expression in the genome is seen whenever gene expression changes, even if this is caused by small stochastic variations in gene expression.
A recent publication by Do et al. questioned the validity of GEDDs in DS, based on an inability to reproduce the gene expression changes in human IPSCs and Ts65Dn MEFs reported by Letourneau et al.26. This report points out that gene expression data on the same DS IPSCs had been previously published by the same group27, but analysis of this latter data showed very little correlation in gene expression changes with the data in Letourneau et al.26. Furthermore, Do et al. also carried out RNAseq on Ts65Dn and control MEFs and showed that it did not correlate with the gene expression changes reported in Ts65Dn MEFs by Letourneau et al.26. Taken together, Do et al. suggest that GEDDs are not a reproducible feature of human DS IPSCs or of fibroblasts from the Ts65Dn mouse model of DS. These differences in gene expression changes may be a result of experimental variation; we note that the Letourneau et al. study used only three biological replicates for the RNAseq data from DS fibroblasts and only one replicate from DS IPSCs and Ts65Dn mouse fibroblasts3.
Our data show that, irrespective of whether the reported gene expression changes are caused by the DS genotype or by experimental variation, whenever gene expression changes between two different conditions, the upregulated or downregulated expression changes are clustered to a greater extent than would be expected by chance. This clustering of gene expression changes is likely to be caused by the non-random arrangement of genes in the mammalian genome. It has been recognised for some time that co-expressed genes are more likely to be located near each other in the genome28. Co-expressed mammalian genes are clustered at different scales, both at the level of neighbouring genes and in domains spanning many megabases29,30. In particular, TADs are genomic domains that are formed by chromatin loops and are often bounded by CTCF-binding sites6,7. As a consequence of increased intra-domain interactions in TADs, genes within them are more likely to be co-regulated21–23. We found that the boundaries of TADs and CTCF-binding sites were enriched at the boundaries of GEDDs, suggesting that GEDDs at least partially correspond to TADs. This observation explains why we detect a statistically significant increase in clustering of gene expression changes.
In summary, our results show that GEDDs are not a specific feature of DS, but are detected whenever gene expression changes, and are most likely a direct consequence of clustering of co-expressed genes in the mammalian genome.
Methods
Mice
C57BL/6J.129P2-Dp(16Lipi-Zbtb21)1TybEmcf/Nimr (Dp1Tyb) mice4 were bred at the MRC National Institute for Medical Research (now part of the Francis Crick Institute). All mice were backcrossed to C57BL/6JNimr for at least ten generations. MEFs were derived from five Dp1Tyb and four WT littermate E14.5 embryos. For the hippocampal RNAseq experiments, the whole hippocampus was dissected from 5 Dp1Tyb and 5 WT littermate male mice aged 18.5–19 weeks. All animal work was approved by the Ethical Review panel of the Francis Crick Institute and was carried out under Project Licences granted by the UK Home Office.
Cell culture
For the generation of MEFs, E14.5 embryos were decapitated, eviscerated, minced and then treated with trypsin. Embryos were then triturated to obtain a single-cell suspension. Cells were cultured in Dulbecco’s modified Eagle’s medium (Gibco), 10% foetal bovine serum, 1× penicillin/streptomycin (Gibco) and 50 μM 2-mercaptoethanol (Gibco). MEFs were cultured for two passages and then prepared for RNA extraction by trypsinisation to remove them from the culture plates.
RNA preparation, library preparation and sequencing
Total RNA was purified from MEFs using Trizol (Life Technology) following the manufacturer’s instructions. Where RNA spike-ins were used, 4 μl of a 1:100 dilution of ERCC RNA spike-in mix 1 (ThermoFisher) was added to 200,000 MEFs re-suspended in Trizol prior to RNA extraction. The MEF/spike-in/Trizol mixture was then extracted following the manufacturer’s instructions. Hippocampus was homogenised in Qiazol using a TissueRuptor II with disposable probes (Qiagen). RNA was extracted from the homogenised samples using the miRNeasy Kit (Qiagen) followed by treatment with Turbo DNAse (Ambion). RNA concentrations were measured using the Qubit 3 (Life Technologies) or the NanoDrop ND-1000 and RNA quality was assessed using the Bioanalyzer or the Tapestation 2200 (Agilent). Samples with an RNA integrity number >8.5 were taken forward for sequencing. RNAseq libraries were prepared with the TruSeq Stranded mRNA Sample Prep Kit (Illumina). Libraries were sequenced with an Illumina HiSeq 2500 using a 100 base paired-end protocol (MEFs) or with a HiSeq 4000 using a 75 base paired-end protocol (hippocampus) or 100 base paired-end protocol (MEFs with spike-ins).
Analysis of RNAseq data
The quality of the sequencing data was assessed using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and illustrated using the MultiQC tool31 (Supplementary Fig. 15a–c). The adapter sequences were trimmed using TrimGalore! and the reads were mapped to the genome assembly GRCm38 using TopHat (version 2.0.12)32. Multi-aligning reads were discarded. Reads mapping to genes were counted using htseq-count33. The R/Bioconductor package DESeq2 was used for analysis of differentially expressed genes34. Genes were considered as significantly differentially expressed between mutant and WT conditions when padj < 0.05. Data with the ERCC spike-ins were processed as described above with the following differences: reads were mapped to the genome assembly amended with sequences of the ERCC synthetic spike-in RNAs (ThermoFisher) and counts were normalised using the spike-ins before analysing differentially expressed genes with DESeq2. For each RNAseq sample, the total number of reads, as well as the number and percentage of unique and aligned pairs of sequences, are shown in Supplementary Fig. 15d.
For the no-genotype-difference analysis, four samples of each genotype were used and their genotypes were mixed so that in each case we compared gene expression in two WT and two mutant samples against the gene expression of two other WT and two other mutant samples. With 4 WT and 4 mutant samples, there are 18 possible ways to make such comparisons where there is no difference in genotype between the groups of samples being compared; we calculated the expression changes in all 18 such combinations. Expressed genes were filtered using a cutoff value, which was determined by calculating the mean RPKM value for each gene, plotting the log2 transformed values as a density plot and then fitting a normal distribution to estimate the mean and SD, similar to a previously published approach35. Genes with a log2[RPKM] value greater than mean − (3 × SD) were considered as expressed. R/Bioconductor was used to calculate all statistical tests (Loess smoothing function and Pearson correlation) and to visualise the data (fold change, correlation and density plots).
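The count of 18 comparisons can be checked by direct enumeration. The following Python sketch is illustrative only and is not taken from the authors' scripts; the sample labels and the function name `no_genotype_splits` are hypothetical. It assumes that swapping the two groups gives the same comparison, so each split is counted once.

```python
from itertools import combinations


def no_genotype_splits(wt, mut, per_group=2):
    """Enumerate unordered splits of the samples into two groups that each
    contain `per_group` WT and `per_group` mutant samples.

    Assumes len(wt) == len(mut) == 2 * per_group, so that the samples left
    over after choosing the first group form a second group of the same
    composition.
    """
    all_samples = frozenset(wt) | frozenset(mut)
    splits = set()
    for wt_a in combinations(wt, per_group):
        for mut_a in combinations(mut, per_group):
            group_a = frozenset(wt_a + mut_a)
            group_b = all_samples - group_a
            # A frozenset of the two groups ignores which group is "first".
            splits.add(frozenset((group_a, group_b)))
    return splits


# Hypothetical sample labels, for illustration only.
wt_samples = ("WT1", "WT2", "WT3", "WT4")
mut_samples = ("Dp1", "Dp2", "Dp3", "Dp4")
splits = no_genotype_splits(wt_samples, mut_samples)
print(len(splits))  # C(4,2) * C(4,2) / 2 = 18
```

There are 6 × 6 = 36 ordered ways of choosing the first group, and halving this for the symmetry between the two groups gives the 18 comparisons stated above.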
Statistical analysis of GEDDs
Statistical significance of the existence of GEDDs in chromosomes was determined by using flip number and energy metrics for false discovery rate (FDR) calculations. A flip metric for each chromosome was calculated by applying the following function over all genes in a chromosome:

$$F = \sum_{i=1}^{N-1} \frac{1 - \operatorname{sgn}(h_i)\,\operatorname{sgn}(h_{i+1})}{2},$$

where $F$ is the total number of flips, $N$ is the number of genes, and $h_i$ is the gene expression fold-change of the $i$-th gene along the chromosome between two conditions taken from RNAseq experiments (on a log2 scale, so that its sign gives the direction of change). A significantly low number of flips in the chromosome indicates the likely presence of GEDDs. The energy metric is defined as a sum of every gene interacting with its neighbouring genes:

$$E = \sum_{i=1}^{N-1} h_i\, h_{i+1},$$

where $E$ is the total energy. A significantly high energy indicates the likely presence of GEDDs on a chromosome. An FDR calculation was performed by permuting the order of all genes in a given chromosome 100,000 times and then calculating flip and energy metrics for each permutation. This procedure creates normal distributions of these two metrics for each chromosome. Flip and energy metric values of the original gene order were placed within these distributions and were considered to be significant if they lay >2 SDs from the mean (FDR < 4.55%, the probability of the event occurring randomly). Hsa21 was excluded from this analysis of human DS fibroblast data and genes in the duplicated region of Mmu16 were excluded from analysis of Dp1Tyb MEFs and hippocampus.
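The permutation procedure can be illustrated with a short Python sketch. This is not the authors' code (their scripts are on GitHub, see Code availability); it assumes that a flip is a change of sign between adjacent log2 fold changes and that the energy is the sum of products of neighbouring fold changes. The function names, the toy data and the reduced default number of permutations are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def flip_count(h):
    """Number of adjacent gene pairs whose fold changes have opposite signs.
    Pairs that include a zero fold change are not counted as flips."""
    return sum(1 for a, b in zip(h, h[1:]) if a * b < 0)


def energy(h):
    """Sum of products of neighbouring fold changes (assumed form)."""
    return sum(a * b for a, b in zip(h, h[1:]))


def permutation_z(h, metric, n_perm=1000, seed=0):
    """Distance, in SDs of the permutation distribution, between the metric
    for the observed gene order and the mean over shuffled gene orders."""
    rng = random.Random(seed)
    shuffled = list(h)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(metric(shuffled))
    sd = statistics.stdev(null)
    if sd == 0:
        raise ValueError("metric does not vary under permutation")
    return (metric(h) - statistics.fmean(null)) / sd


# Toy chromosome: ten upregulated genes followed by ten downregulated genes.
clustered = [1.0] * 10 + [-1.0] * 10
z_flips = permutation_z(clustered, flip_count)
z_energy = permutation_z(clustered, energy)

# Two-sided tail probability of a normal distribution beyond 2 SDs.
two_sided_tail = math.erfc(2 / math.sqrt(2))
print(flip_count(clustered), energy(clustered), round(two_sided_tail, 4))
```

For this strongly clustered toy example the flip count lies well below, and the energy well above, the corresponding permutation means, which is the pattern the text associates with GEDDs. The value of `two_sided_tail` is approximately 0.0455, matching the 4.55% threshold quoted for 2 SDs.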
Alignment of GEDDs with other genomic elements
A GEDD was defined as a set of two or more expressed, adjacent and similarly regulated (either positive or negative log2[fold change] values) genes. The boundaries of GEDDs were defined as being half way between two genes with opposite directions of fold change since we have no way of defining the location of these boundaries more precisely. The boundaries of GEDDs from MEFs were compared to the boundaries of TADs (Gene Expression Omnibus (GEO): GSE104367)36, LADs (GEO: GSE17051)37 and replication domains (GEO: GSM450292)38, all from MEFs and to CTCF-binding sites from MEFs (GEO: GSE104427)36. The boundaries of GEDDs from mouse hippocampus were compared to the boundaries of TADs from the mouse cortex (GEO: GSE35156)39, to LADs (GEO: GSE17051)37 and replication domains (GEO: GSM450284)40 from mouse neural precursor cells and to CTCF-binding sites from the mouse hippocampus (GEO: GSE84174)41. For each of these chromatin features, their relative distance to a GEDD or TAD boundary was calculated within a window of ±2.5 Mb around the boundary. The boundary orientation was considered for these two domains. A chromatin feature identified downstream of the start of the GEDD or TAD domain or upstream of the end of the domain was given a positive distance, whereas a feature upstream of the start of the domain or downstream of the end of the domain was given a negative distance. The Gaussian kernel density estimates were computed for the relative distances using the density function with default parameters in the stats R package.
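The GEDD definition and the signed relative distances described above can be sketched as follows. This is an illustrative Python example rather than the authors' implementation: it assumes each expressed gene is represented by a single coordinate (here a hypothetical midpoint in base pairs) and a non-zero log2 fold change, and the function names `call_gedds` and `signed_distance` are invented for this sketch.

```python
from itertools import groupby


def call_gedds(genes):
    """Group position-sorted (position_bp, log2fc) pairs from one chromosome
    into GEDDs: runs of two or more adjacent genes changing in the same
    direction. Each boundary is placed halfway between the flanking genes;
    a boundary at a chromosome end is left undefined (None)."""
    gedds = []
    index = 0
    for is_up, run in groupby(genes, key=lambda g: g[1] > 0):
        n = len(list(run))
        first, last = index, index + n - 1
        index += n
        if n < 2:
            continue  # a single gene does not form a GEDD
        left = (genes[first - 1][0] + genes[first][0]) / 2 if first > 0 else None
        right = (genes[last][0] + genes[last + 1][0]) / 2 if last + 1 < len(genes) else None
        gedds.append({"direction": "up" if is_up else "down",
                      "n_genes": n, "left": left, "right": right})
    return gedds


def signed_distance(feature_pos, boundary_pos, is_domain_start, window=2_500_000):
    """Distance from a domain boundary to a chromatin feature, positive when
    the feature lies inside the domain and negative when it lies outside.
    Features outside the +/- window are dropped (None)."""
    offset = feature_pos - boundary_pos
    if not is_domain_start:
        offset = -offset
    return offset if abs(offset) <= window else None


# Toy data: (position in bp, log2 fold change), sorted by position.
toy_genes = [(100, 0.5), (200, 0.7), (300, -0.2), (400, -0.4), (500, -0.1), (600, 0.3)]
toy_gedds = call_gedds(toy_genes)
```

In the toy data the first two genes form an upregulated GEDD and the next three a downregulated GEDD, sharing a boundary at 250 bp, while the final gene stands alone and is not called. The signed distances for all features could then be passed to a Gaussian kernel density estimator, as the text describes for the R `density` function.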
Numbers of genes
Numbers of coding genes were determined using Biomart in Ensembl (mouse genome assembly GRCm38.p5) filtering for protein-coding as gene type.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
We thank Nick Luscombe for helpful discussions, and the Advanced Sequencing Facility and the Biological Research Facility of the Francis Crick Institute for sequencing and animal husbandry, respectively. We thank Rasim Barutcu for TAD boundary data. V.L.J.T. and E.M.C.F. were supported by the Wellcome Trust (grants 080174, 098327 and 098328) and V.L.J.T. was supported by the UK Medical Research Council (Programme U117527252) and by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001194), the UK Medical Research Council (FC001194) and the Wellcome Trust (FC001194). J.B. was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC100051), the UK Medical Research Council (FC100051) and the Wellcome Trust (FC100051). E.P. was supported by the BBSRC (FBAFG 509872).
Author contributions
N.A., N.D., J.T. and F.W. carried out the experiments. H.A. and E.P. analysed the data. E.L.-E. and S.W.-S. generated the mouse strain used in this work. H.A., N.A. and V.L.J.T. wrote the paper. J.B., K.P., E.M.C.F. and V.L.J.T. supervised the study.
Data availability
All relevant data supporting the key findings of this study are available within the article and its Supplementary Information files or from the corresponding author upon reasonable request. All RNAseq data have been deposited in the Gene Expression Omnibus, accession number GSE109295. A reporting summary for this article is available as a Supplementary Information file.
Code availability
Scripts for calculating flips and energy are freely available on GitHub (https://github.com/evahelena/GEDDs_paper).
Competing interests
The authors declare no competing interests.
Footnotes
Journal peer review information: Nature Communications thanks Benjamin Pope and other anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Helena Ahlfors, Nneka Anyanwu.
Contributor Information
Elizabeth M. C. Fisher, Email: elizabeth.fisher@ucl.ac.uk
Victor L. J. Tybulewicz, Email: Victor.T@crick.ac.uk
Supplementary information
Supplementary Information accompanies this paper at 10.1038/s41467-019-10129-9.
References
- 1. Antonarakis SE, Lyle R, Dermitzakis ET, Reymond A, Deutsch S. Chromosome 21 and down syndrome: from genomics to pathophysiology. Nat. Rev. Genet. 2004;5:725–738. doi: 10.1038/nrg1448.
- 2. Antonarakis SE. Down syndrome and the complexity of genome dosage imbalance. Nat. Rev. Genet. 2017;18:147–163. doi: 10.1038/nrg.2016.154.
- 3. Letourneau A, et al. Domains of genome-wide gene expression dysregulation in Down’s syndrome. Nature. 2014;508:345–350. doi: 10.1038/nature13200.
- 4. Lana-Elola E, et al. Genetic dissection of Down syndrome-associated congenital heart defects using a new mouse mapping panel. Elife. 2016;5. doi: 10.7554/eLife.11614.
- 5. Watson-Scales S, et al. Analysis of motor dysfunction in Down Syndrome reveals motor neuron degeneration. PLoS Genet. 2018;14:e1007383. doi: 10.1371/journal.pgen.1007383.
- 6. Dixon JR, Gorkin DU, Ren B. Chromatin domains: the unit of chromosome organization. Mol. Cell. 2016;62:668–680. doi: 10.1016/j.molcel.2016.05.018.
- 7. Rowley MJ, Corces VG. Organizational principles of 3D genome architecture. Nat. Rev. Genet. 2018;19:789–800. doi: 10.1038/s41576-018-0060-8.
- 8. Duchon A, et al. Identification of the translocation breakpoints in the Ts65Dn and Ts1Cje mouse lines: relevance for modeling down syndrome. Mamm. Genome. 2011;22:674–684. doi: 10.1007/s00335-011-9356-0.
- 9. Yeomans JM. Statistical Mechanics of Phase Transitions. Oxford: Oxford Science Publications; 1992.
- 10. O’Doherty A, et al. An aneuploid mouse strain carrying human chromosome 21 with Down syndrome phenotypes. Science. 2005;309:2033–2037. doi: 10.1126/science.1114535.
- 11. Morice E, et al. Preservation of long-term memory and synaptic plasticity despite short-term impairments in the Tc1 mouse model of Down syndrome. Learn Mem. 2008;15:492–500. doi: 10.1101/lm.969608.
- 12. Witton J, et al. Hippocampal circuit dysfunction in the Tc1 mouse model of Down syndrome. Nat. Neurosci. 2015;18:1291–1298. doi: 10.1038/nn.4072.
- 13. Yu T, et al. Effects of individual segmental trisomies of human chromosome 21 syntenic regions on hippocampal long-term potentiation and cognitive behaviors in mice. Brain Res. 2010;1366:162–171. doi: 10.1016/j.brainres.2010.09.107.
- 14. Zhang L, et al. Human chromosome 21 orthologous region on mouse chromosome 17 is a major determinant of Down syndrome-related developmental cognitive deficits. Hum. Mol. Genet. 2014;23:578–589. doi: 10.1093/hmg/ddt446.
- 15. Belichenko PV, et al. Down syndrome cognitive phenotypes modeled in mice trisomic for all HSA 21 homologues. PLoS ONE. 2015;10:e0134861. doi: 10.1371/journal.pone.0134861.
- 16. Jiang X, et al. Genetic dissection of the Down syndrome critical region. Hum. Mol. Genet. 2015;24:6540–6551. doi: 10.1093/hmg/ddv364.
- 17. Dierssen M. Down syndrome: the brain in trisomic mode. Nat. Rev. Neurosci. 2012;13:844–858. doi: 10.1038/nrn3314.
- 18. Mowery CT, et al. Trisomy of a Down syndrome critical region globally amplifies transcription via HMGN1 overexpression. Cell Rep. 2018;25:1898–1911.e5. doi: 10.1016/j.celrep.2018.10.061.
- 19. Olson LE, Richtsmeier JT, Leszl J, Reeves RH. A chromosome 21 critical region does not cause specific Down syndrome phenotypes. Science. 2004;306:687–690. doi: 10.1126/science.1098992.
- 20. Newman R, et al. Maintenance of the marginal-zone B cell compartment specifically requires the RNA-binding protein ZFP36L1. Nat. Immunol. 2017;18:683–693. doi: 10.1038/ni.3724.
- 21. Nora EP, et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature. 2012;485:381–385. doi: 10.1038/nature11049.
- 22. Shen Y, et al. A map of the cis-regulatory sequences in the mouse genome. Nature. 2012;488:116–120. doi: 10.1038/nature11243.
- 23. Flavahan WA, et al. Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature. 2016;529:110–114. doi: 10.1038/nature16490.
- 24. Pope BD, et al. Topologically associating domains are stable units of replication-timing regulation. Nature. 2014;515:402–405. doi: 10.1038/nature13986.
- 25. van Steensel B, Belmont AS. Lamina-associated domains: links with chromosome architecture, heterochromatin, and gene repression. Cell. 2017;169:780–791. doi: 10.1016/j.cell.2017.04.022.
- 26. Do LH, Mobley WC, Singhal N. Questioned validity of gene expression dysregulated domains in Down’s syndrome. F1000Res. 2015;4:269. doi: 10.12688/f1000research.6735.1.
- 27. Hibaoui Y, et al. Modelling and rescuing neurodevelopmental defect of Down syndrome using induced pluripotent stem cells from monozygotic twins discordant for trisomy 21. EMBO Mol. Med. 2014;6:259–277. doi: 10.1002/emmm.201302848.
- 28. Hurst LD, Pal C, Lercher MJ. The evolutionary dynamics of eukaryotic gene order. Nat. Rev. Genet. 2004;5:299–310. doi: 10.1038/nrg1319.
- 29. Woo YH, Walker M, Churchill GA. Coordinated expression domains in mammalian genomes. PLoS ONE. 2010;5:e12158. doi: 10.1371/journal.pone.0012158.
- 30. Szczepinska T, Pawlowski K. Genomic positions of co-expressed genes: echoes of chromosome organisation in gene expression data. BMC Res. Notes. 2013;6:229. doi: 10.1186/1756-0500-6-229.
- 31. Ewels P, Magnusson M, Lundin S, Kaller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32:3047–3048. doi: 10.1093/bioinformatics/btw354.
- 32. Kim D, et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14:R36. doi: 10.1186/gb-2013-14-4-r36.
- 33. Anders S, Pyl PT, Huber W. HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–169. doi: 10.1093/bioinformatics/btu638.
- 34. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8.
- 35. Hart T, Komori HK, LaMere S, Podshivalova K, Salomon DR. Finding the active genes in deep RNA-seq gene expression studies. BMC Genomics. 2013;14:778. doi: 10.1186/1471-2164-14-778.
- 36. Barutcu AR, Maass PG, Lewandowski JP, Weiner CL, Rinn JL. A TAD boundary is preserved upon deletion of the CTCF-rich Firre locus. Nat. Commun. 2018;9:1444. doi: 10.1038/s41467-018-03614-0.
- 37. Peric-Hupkes D, et al. Molecular maps of the reorganization of genome-nuclear lamina interactions during differentiation. Mol. Cell. 2010;38:603–613. doi: 10.1016/j.molcel.2010.03.016.
- 38. Hiratani I, et al. Genome-wide dynamics of replication timing revealed by in vitro models of mouse embryogenesis. Genome Res. 2010;20:155–169. doi: 10.1101/gr.099796.109.
- 39. Dixon JR, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485:376–380. doi: 10.1038/nature11082.
- 40. Hiratani I, et al. Global reorganization of replication domains during embryonic stem cell differentiation. PLoS Biol. 2008;6:e245. doi: 10.1371/journal.pbio.0060245.
- 41. Sams DS, et al. Neuronal CTCF is necessary for basal and experience-dependent gene regulation, memory formation, and genomic structure of BDNF and Arc. Cell Rep. 2016;17:2418–2430. doi: 10.1016/j.celrep.2016.11.004.