Abstract
Down syndrome is associated with genome-wide perturbation of gene expression, which may be mediated by epigenetic changes. We perform an epigenome-wide association study on neonatal bloodspots comparing 196 newborns with Down syndrome and 439 newborns without Down syndrome, adjusting for cell-type heterogeneity, which identifies 652 epigenome-wide significant CpGs (P < 7.67 × 10−8) and 1,052 differentially methylated regions. Differential methylation at promoter/enhancer regions correlates with gene expression changes in Down syndrome versus non-Down syndrome fetal liver hematopoietic stem/progenitor cells (P < 0.0001). The top two differentially methylated regions overlap RUNX1 and FLI1, both important regulators of megakaryopoiesis and hematopoietic development, with significant hypermethylation at promoter regions of these two genes. Excluding Down syndrome newborns harboring preleukemic GATA1 mutations (N = 30), identified by targeted sequencing, has minimal impact on the epigenome-wide association study results. Down syndrome has profound, genome-wide effects on DNA methylation in hematopoietic cells in early life, which may contribute to the high frequency of hematological problems, including leukemia, in children with Down syndrome.
Subject terms: Methylation analysis, Haematopoiesis, DNA methylation, Epidemiology
Down syndrome has a high co-morbidity with immune and hematopoietic disorders. Here, the authors perform an epigenome-wide association study in newborns with and without Down syndrome to find differential methylation across the genome, including in hematopoietic regulators RUNX1 and FLI1.
Introduction
Down syndrome (DS), caused by constitutive trisomy of chromosome 21 (T21), is one of the most common genetic disorders1, and is associated with a spectrum of adverse phenotypes2. DS is characterized by defects in immune system development and in hematopoiesis, with DS fetuses having perturbed megakaryocyte/red cell and B-lymphoid development3 and DS children having a higher frequency of lymphopenia4 and infections5. Furthermore, children with DS have a 20–30-fold increased risk of acute lymphoblastic leukemia (ALL) and a 500-fold increased risk of acute megakaryoblastic leukemia (AMKL), while displaying a decreased risk of common adult-onset solid tumors6,7. Approximately 10% of DS newborns present with transient abnormal myelopoiesis (TAM), a preleukemic disorder associated with increased peripheral blood blast cells and pathognomonic somatic mutations in the X-linked erythro-megakaryocytic transcription factor (TF) gene GATA18. A further 15–20% have acquired GATA1 mutations without clinical features, so-called “Silent TAM8.” TAM and Silent TAM resolve spontaneously in most cases, but up to 20% acquire additional oncogenic mutations and develop frank AMKL9,10.
DS-related phenotypes vary greatly in presentation and penetrance2,11, and understanding the biological basis of that variation may highlight novel therapeutic approaches, and shed light on the etiology of these conditions in non-DS individuals11. Altered expression of genes, both on Hsa21 and genome-wide, is widely accepted to play a key role in the manifestation of DS-related phenotypes, many of which originate prenatally3. Several studies, including in monozygotic twins discordant for T21, provide strong evidence for the effect of T21 on the human transcriptome12,13, with substantial interindividual variability in expression patterns14.
Studying baseline epigenetic effects of T21 at birth is a powerful approach to pinpoint broad epigenetic landscapes and/or individual genes that underlie DS-related phenotypes. Nevertheless, comprehensive analysis of genome-wide DNA methylation changes in DS is lacking; previous studies comprised very few (N < 30) individuals, did not explore interethnic differences, and did not account for the potential impact of somatic GATA1 mutation-harboring clones15–19.
Here, we investigate T21-associated changes in DNA methylation among 196 DS and 439 non-DS newborn blood samples, and consider the potential confounding effects of somatic GATA1 mutations in DS newborns assessed by targeted sequencing. Our epigenome-wide association study (EWAS) of DS identifies 652 significant CpGs and 1052 differentially methylated regions (DMRs) associated with DS, including significant hypermethylation at promoter regions of RUNX1 and FLI1, both critical regulators of hematopoiesis. Further, we find that differential methylation at regulatory regions in newborns with DS correlates with gene expression patterns in DS fetal liver (FL) hematopoietic stem and progenitor cells (HSPC). This is the first multiethnic study of its kind and the largest epigenome-wide analysis of DS patients to date, revealing insights into the etiology of DS-related phenotypes.
Results
High-quality genome-wide DNA methylation data were obtained for 196 DS and 439 non-DS newborns using Illumina Infinium MethylationEPIC Beadchip genome-wide arrays. Our analyses included 651,772 CpGs on autosomes, with an average of 99.9% of CpGs having a detection P value < 0.01. Genome-wide copy-number analysis confirmed T21 in all DS newborns (Supplementary Fig. 1) and euploidy in all but one non-DS individual, who was excluded from subsequent analyses. Study characteristics of the 635 newborns (N = 357 Latinos, 178 non-Latino whites, 55 Asians, 34 non-Latino blacks, and 11 other) are presented in Table 1. DS newborns had a slightly lower mean gestational age at birth (P = 0.04) and birth weight (P = 0.001) than non-DS newborns, and a higher frequency of DS newborns were preterm (P = 0.0004) and/or small-for-gestational age (P < 0.0001) than non-DS newborns (Table 1). Age at sampling was higher in DS newborns, the majority being sampled on day 3 of life compared to day 2 for non-DS neonates (P < 0.0001).
Table 1.
| Non-DS (N = 439), N (%) | DS (N = 196), N (%) | P value |
---|---|---|---|
Sex | |||
Female | 182 (41.5%) | 106 (54.1%) | 0.0034a |
Male | 257 (58.5%) | 90 (45.9%) | |
Race/ethnicity | |||
Asian | 38 (8.7%) | 17 (8.7%) | 0.00044a |
Latino | 253 (57.6%) | 104 (53.1%) | |
Non-Latino white | 124 (28.2%) | 54 (27.6%) | |
Non-Latino black | 13 (3.0%) | 21 (10.7%) | |
Other | 11 (2.5%) | 0 (0%) | |
Blood collection age (days) | |||
Mean (SD) | 1.33 (±0.70) | 2.49 (±2.04) | <0.0001b |
Median (range) | 1.13 (0–5.25) | 1.75 (0–15.3) | |
Missing | 3 (0.7%) | 5 (2.6%) | |
Gestational age (weeks) | |||
Mean (SD) | 39.2 (±2.0) | 38.2 (±2.2) | 0.041b |
Median (range) | 39.4 (26.4–44.7) | 38.3 (26.4–44.7) | |
Preterm (<37 weeks) | 44 (10.6%) | 39 (22.0%) | 0.0004a |
Missing | 23 (5.2%) | 19 (9.7%) | |
Birth weight (kg) | |||
Mean (SD) | 3.39 (±0.55) | 3.01 (±0.73) | 0.001b |
Median (range) | 3.41 (1.04–5.05) | 3.01 (0.96–8.65) | |
Small-for-gestational agec | 24 (6.1%) | 33 (19.3%) | <0.0001a |
Missing | 0 (0%) | 6 (3.1%) | |
Birth year | |||
Median (range) | 2004 (2000–2008) | 1998 (1996–1999) | <0.0001d |
Missing | 3 (0.7%) | 2 (1.0%) |
kg kilogram, SD standard deviation, DS Down syndrome.
aP values calculated by two-sided Fisher’s exact test.
bP values calculated by linear regression with the birth-related variable as the dependent variable, DS status as the independent variable, and adjusting for the remaining birth-related variables, sex, plate, and race/ethnicity.
cSmall-for-gestational age calculated according to the sex- and gestational age-based intrauterine growth curves previously developed using US data85. Note we were not able to calculate this for newborns born >42 weeks due to limitations of the reference data.
dP value calculated by the two-sided Wilcoxon rank-sum test.
DNA methylation-based clustering separates DS and non-DS newborns
Visualization of principal components analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) plots generated from genome-wide DNA methylation data, excluding CpG probes on sex chromosomes and Hsa21 and CpGs overlapping single-nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) > 0.05, revealed clear separation of DS and non-DS newborns (Fig. 1). Intriguingly, a subset of 34 DS newborns departed from the DS cluster in the t-SNE plot, and the first PC (explaining 33.2% of overall variance) also stratified these DS newborns from the remainder. Unsupervised hierarchical clustering of the top 2000 most variable CpGs genome-wide (excluding chromosomes 21, X, and Y) resulted in similar grouping of subjects, with the first branch split separating DS from non-DS newborns, and the second split separating the same subset of 34 DS newborns among DS (Fig. 2a). Differences in blood cell proportions inferred from genome-wide DNA methylation data were seen between the three groups, as described in detail below. Two DS newborns clustered with non-DS newborns (Figs. 1 and 2) and visual inspection of copy-number plots revealed that both were likely mosaic for T21 (Supplementary Fig. 2).
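The probe-selection step that precedes the hierarchical clustering described above can be illustrated with a minimal sketch. It assumes that "most variable" means highest variance of beta values across samples; the function name, the input layout, and the toy probes are hypothetical and are not taken from the study.

```python
from statistics import pvariance

# Chromosomes excluded before clustering, as described in the text.
EXCLUDED = {"21", "X", "Y"}


def top_variable_cpgs(probes, k):
    """Return the k probe IDs with the highest variance across samples.

    probes maps probe ID -> (chromosome, list of beta values per sample).
    Probes on excluded chromosomes are dropped before ranking.
    """
    eligible = {
        pid: pvariance(betas)
        for pid, (chrom, betas) in probes.items()
        if chrom not in EXCLUDED
    }
    return sorted(eligible, key=eligible.get, reverse=True)[:k]


# Hypothetical toy data: four probes measured in four samples.
toy = {
    "cg_a": ("1", [0.10, 0.12, 0.80, 0.85]),
    "cg_b": ("2", [0.50, 0.51, 0.49, 0.50]),
    "cg_c": ("21", [0.05, 0.95, 0.05, 0.95]),  # dropped: on Hsa21
    "cg_d": ("7", [0.30, 0.40, 0.60, 0.70]),
}
selected = top_variable_cpgs(toy, k=2)
print(selected)
```

In the study the same idea would be applied with k = 2000 to the full beta-value matrix before clustering.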
Targeted sequencing confirms high frequency of acquired GATA1 variants in DS neonates
We detected 34 somatic GATA1 mutations in 30 out of 184 (16.3%) DS newborns assessed by targeted sequencing (Supplementary Data 1). The 34 mutations displayed a wide range of variant allele frequencies (VAF: 0.96–96.1%). The mean VAF of predicted functional mutations (26.3%) was significantly higher than that of nonfunctional (noncoding/synonymous) somatic GATA1 variants (VAF = 1.7%, P = 0.0034) (Supplementary Fig. 3). There was no significant association between the presence/absence or VAF of GATA1 mutations and sex, race/ethnicity, birth weight, gestational age, or age at blood collection (Supplementary Table 1). In the hierarchical clustering, DS newborns with GATA1 mutations with higher VAFs clustered together (Fig. 2b).
Deconvolution of blood cell proportions
Next, we used reference-based cell-type deconvolution to address whether differences in DNA methylation between DS and non-DS neonates might reflect, or be confounded by, differences in the peripheral blood cellular composition. This confirmed several of the previously reported differences in neonatal blood cell proportions in DS compared with non-DS fetuses and newborns8,20 (Fig. 2a and Supplementary Fig. 4a), with similar patterns in Latino and non-Latino white newborns (Supplementary Fig. 5), including higher proportions of erythroblasts (nucleated red blood cells (nRBCs)) (P = 4.45 × 10−65) and lower proportions of B lymphocytes (P = 3.48 × 10−28) and T lymphocytes (CD4+ T lymphocytes) (P = 2.26 × 10−53) (Supplementary Fig. 4a and Supplementary Table 2)8,20. As the most dramatic difference was a subpopulation of DS newborns (N = 34, 17.3%) with a high proportion of erythroblasts (>25%), which clustered separately in Figs. 1 and 2, we next considered whether this identified the neonates with GATA1 mutations. However, although we found a higher frequency of GATA1 mutations in the newborns with increased erythroblasts compared to those with normal erythroblasts (12/33, 36.4% versus 18/151, 11.9%, P = 0.0015), the separate clustering of these cases is not primarily due to their GATA1 mutation status (Supplementary Figs. 4b, 6 and Supplementary Table 2). Interestingly, deconvolution analysis suggested lower proportions of monocytes (P = 3.75 × 10−23) and granulocytes (P = 8.10 × 10−32) in DS neonates. As previous studies show increased monocytes and granulocytes in DS newborns, this likely reflects the limitations of deconvolution analysis where atypical cells are present, such as blast cells and dysplastic cells, which are common in DS neonates8, and lack a suitable reference “methylome” library. Taken together, the differences in peripheral blood cell composition support the use of robust adjustment for cell-type heterogeneity in our EWAS of DS. 
Rather than adjusting for the blood cell proportions estimated from our reference-based deconvolution described above, we opted to include components calculated using the reference-free, sparse PCA algorithm ReFACTor (see “Methods”) as covariates in our EWAS models.
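The comparison of GATA1 mutation frequency between newborns with high and normal erythroblast proportions (12/33 versus 18/151) is a 2 × 2 table. The sketch below shows how a two-sided Fisher's exact test on those counts could be computed. The text does not name the test used for this comparison, so Fisher's exact test is an assumption here, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Counts from the text: rows are high vs normal erythroblasts,
# columns are GATA1 mutation present vs absent.
p_gata1 = fisher_exact_two_sided(12, 33 - 12, 18, 151 - 18)
print(f"two-sided P = {p_gata1:.4f}")
```

The printed value can be compared with the reported P = 0.0015.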
Epigenome-wide significant CpGs associated with DS
To investigate the biological significance of the epigenome-wide changes in DS neonatal blood cells, we next assessed differential methylation of CpGs on autosomal chromosomes, including Hsa21, adjusted for variation in cell-type proportions, sex, and ancestry-informative PCs. A total of 652 DS-associated CpGs were detected following Bonferroni correction (P < 7.67 × 10−8, Fig. 3a and Supplementary Data 2), with 319/652 (48.9%) CpGs hypermethylated and 333/652 (51.1%) hypomethylated in DS compared with non-DS newborns. The pattern of DNA methylation was distinctly different for CpGs on Hsa21; the majority of significantly differentially methylated CpGs (64/79, 81.0%) were hypomethylated, whereas on other chromosomes, the proportions of hypomethylated CpGs were on average much lower (median: 43.5%, range: 16.0–80.0%). In addition, when considering all CpGs included in the EWAS, a significantly higher proportion of probes on Hsa21 were hypomethylated (4425/7351, 60.2%) compared with probes on all other autosomes combined (356,222/644,421, 55.3%, P < 0.0001, Chi-squared test), which was particularly the case in shores and shelves but not in CpG islands themselves (Supplementary Table 3).
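Two of the figures in this paragraph can be checked with simple arithmetic. The sketch below assumes a family-wise error rate of 0.05 for the Bonferroni correction (the text gives only the resulting threshold) and a Pearson chi-squared test without continuity correction for the Hsa21 comparison (the text says only "Chi-squared test"). Variable and function names are hypothetical.

```python
from math import erfc, sqrt

N_CPGS = 651_772   # autosomal CpGs tested, from the text
ALPHA = 0.05       # assumed family-wise error rate

bonferroni_threshold = ALPHA / N_CPGS


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (1 df, no continuity correction)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-squared with 1 df: erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


# Hypomethylated / total probes, from the text.
hsa21_hypo, hsa21_total = 4425, 7351
other_hypo, other_total = 356_222, 644_421

stat, p_value = chi2_2x2(
    hsa21_hypo, hsa21_total - hsa21_hypo,
    other_hypo, other_total - other_hypo,
)
print(f"Bonferroni threshold: {bonferroni_threshold:.2e}")
print(f"chi-squared = {stat:.1f}, P = {p_value:.2e}")
```

The two probe totals sum to 651,772, which matches the number of autosomal CpGs used for the Bonferroni denominator.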
Remarkably, 13 of 15 hypermethylated and epigenome-wide significant Hsa21 CpGs overlapped RUNX1, at the proximal P2 promoter (Supplementary Fig. 7). RUNX1 was also identified when we considered the epigenome-wide significant CpGs that overlapped known genes (N = 357); of the top 20 significant CpGs, overlapping 11 unique genes, the three with the largest methylation differences (beta coefficients >0.38) were all located in RUNX1 (Table 2 and Supplementary Data 2). The other ten genes included the megakaryocytic gene FLI1, SH3D21, and KIAA0087 (each with two CpGs), and DST, VSIG2, KLF16, OLFML1, SETD3, CELF3, and NOL10 (one CpG each). Of the four intergenic CpGs, two overlap a putative enhancer of HES1 (Table 2)21. We repeated linear regression analyses for the top significant CpGs in RUNX1 (cg12477880) and FLI1 (cg17239923), genes that are known regulators of hematopoiesis, adjusting for deconvoluted blood cell proportions instead of ReFACTor components (see “Methods”), and the associations with DS remained highly significant (both P < 2.0 × 10−16).
Table 2.
Chr | Position (hg19) | Gene | Probe | Beta coefficienta | P valuea | CpG island overlap |
---|---|---|---|---|---|---|
7 | 26577897 | KIAA0087 (lncRNA) | cg07741821 | −0.289 | 1.26 x 10−39 | |
1 | 36786777 | SH3D21 | cg02993069 | 0.198 | 1.14 x 10−25 | Yes |
21 | 36259241 | RUNX1 | cg12477880 | 0.383 | 2.32 x 10−25 | Yes |
6 | 56607099 | DST | cg08882472 | 0.157 | 6.14 x 10−25 | |
11 | 124621829 | VSIG2 | cg24942416 | −0.175 | 7.33 x 10−24 | |
10 | 85363826 | Intergenic | cg07841633 | −0.329 | 4.21 x 10−23 | |
21 | 36259383 | RUNX1 | cg00994804 | 0.388 | 9.36 x 10−22 | Yes |
3 | 193988737 | Intergenic (HES1 enhancer) | cg11218872 | −0.121 | 1.05 x 10−21 | |
7 | 26578098 | KIAA0087 (lncRNA) | cg02451831 | −0.159 | 2.67 x 10−21 | |
19 | 1851882 | KLF16 | cg13382072 | 0.180 | 4.07 x 10−21 | Yes |
8 | 37575051 | Intergenic | cg24020235 | 0.188 | 7.30 x 10−21 | |
11 | 128556611 | FLI1 | cg17239923 | 0.232 | 1.21 x 10−20 | |
11 | 7519636 | OLFML1 | cg19030331 | 0.165 | 6.05 x 10−20 | |
21 | 36258497 | RUNX1 | cg03142697 | 0.396 | 2.15 x 10−19 | |
14 | 99880641 | SETD3 | cg24999883 | −0.067 | 2.85 x 10−19 | |
1 | 36786615 | SH3D21 | cg12679760 | 0.183 | 9.10 x 10−19 | Yes |
3 | 193988507 | Intergenic (HES1 enhancer) | cg23719650 | −0.107 | 1.11 x 10−18 | |
2 | 10830636 | NOL10 | cg11972401 | 0.058 | 2.10 x 10−18 | |
1 | 151672762 | CELF3 | cg23565347 | −0.092 | 2.15 x 10−18 | |
11 | 128556341 | FLI1 | cg19765472 | 0.207 | 2.42 x 10−18 |
aP values (not adjusted for multiple comparisons) and beta coefficients calculated in the multiethnic EWAS of Down syndrome, using linear regression adjusting for sex, plate, the first ten ReFACTor principal components (PCs), and the first ten EPISTRUCTURE PCs.
Neither removal of DS newborns with GATA1 mutations (N = 30) nor those with high erythroblasts (N = 34) affected EWAS results substantially, and ethnicity-stratified analyses showed similar results, with 622/652 (95.4%) epigenome-wide significant CpGs showing the same direction of effect in both Latinos and non-Latino whites (Supplementary Data 2). In addition, in a subset of newborns with available birth weight and gestational age information (176 DS and 416 non-DS), we repeated the EWAS adjusting for these birth variables, but again, the results were not substantially altered (Supplementary Data 2). In a sex-stratified EWAS, no chromosome X CpGs were epigenome-wide significant in females, and 2 significant CpGs in males did not replicate in females (Supplementary Data 2).
We also determined the overlap of DS-associated CpGs with genomic locations and functional elements. Hypomethylated CpGs were significantly underrepresented at gene promoters and CpG islands (Supplementary Fig. 8 and Supplementary Data 3). Hypermethylated CpGs were significantly enriched at binding sites for NFE2, MAFF, MAFK, and BACH1, TFs that form components of a key erythro-megakaryocyte regulatory network (Supplementary Data 3). Hypermethylated CpGs were also significantly enriched at DNase I hypersensitive sites (DHS) (P = 1.67 × 10−16), at H3K4me1- and H3K4me3-binding sites in hematopoietic stem cells (HSCs, both P < 5.73 × 10−10), and at enhancer loci in HSCs (P < 3.27 × 10−6), whereas hypomethylated CpGs were significantly enriched at H3K36me3 sites (P = 9.92 × 10−11) (Supplementary Data 3), altogether indicating repression of gene expression.
Pathway analysis of genes overlapped by epigenome-wide significant CpGs revealed significant enrichment for 62 Gene Ontology (GO) terms, the majority related to hematopoiesis and immune function, and 12 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, with hematopoietic cell lineage displaying the strongest enrichment in addition to several immune-related pathways (Supplementary Data 4).
We next assessed whether previously reported DS-associated CpGs were replicated. In two previous studies, from Bacalini et al. and Henneman et al.16,17, there were 111 DS-associated CpGs with concordant directions of effect. Of these, 97 were present on the EPIC array and passed quality control (QC) filtering, and 74/97 were associated with DS at P < 0.05 and all with the same direction of effect (Supplementary Data 5).
Finally, an EWAS of GATA1 mutations (presence/absence) in DS newborns revealed 13 epigenome-wide significant CpGs (Supplementary Data 6), 12 of which were hypomethylated in GATA1 mutation-positive DS newborns, although all had beta coefficients <0.10. No sex chromosome CpGs were associated with GATA1 mutations in females or males.
DMRs associated with DS
We identified 1052 DMRs associated with DS across the genome (Fig. 3b and Supplementary Data 7), following adjustment for variation in cell-type proportions, sex, and ancestry-informative PCs. DMRs were identified on all chromosomes, with a particularly high proportion (11.2%) on Hsa21 (Supplementary Fig. 9). The 1052 DMRs overlapped 943 unique genes, and 291/1052 (27.7%) overlapped promoter regions. The top 20 most significant DMRs (Table 3) overlapped 17 genes, with the top two again including the key hematopoietic TF genes RUNX1 (P = 2.30 × 10−84) and FLI1 (P = 1.65 × 10−78). The FLI1 DMR overlapped the promoter of transcript variant 4, which is largely expressed in cord blood megakaryocytes relative to other blood cell types, a pattern not found for other FLI1 transcript variants in gene expression data in BLUEPRINT (Supplementary Fig. 10)22. The top 20 DMRs remained significant following removal of DS newborns with high erythroblasts (N = 34) or removal of GATA1 mutation-positive individuals (N = 30), and in ethnicity-stratified analyses with the exception of CCDC17, ANAPC2, and MIR1224, which were not detected in non-Latino whites.
Table 3.
Chr | Start (hg19) | End (hg19) | Length (bp) | Gene | Mean beta differencea | P valueb | Number of CpGsb | DMR location | Enhancer overlapc | GWAS Catalog SNP overlap (DMR +/−50 Kb) |
---|---|---|---|---|---|---|---|---|---|---|
21 | 36258497 | 36259694 | 1198 | RUNX1 | 0.273 | 2.30 x 10−84 | 11 | Promoter | K562, CD34, astrocytes, hippocampus | Eosinophil counts; Eosinophil % of granulocytes; Eosinophil % of white cells; male-pattern baldness; mean corpuscular hemoglobin; red blood cell count; red cell distribution width; sum eosinophil basophil counts |
11 | 128553855 | 128557589 | 3735 | FLI1 | 0.132 | 1.65 x 10−78 | 19 | Promoter | CD34 | Digit length ratio (right hand); Eosinophil counts; height; myopia (pathological); platelet count; plateletcrit; white blood cell count (basophil) |
22 | 51016501 | 51017723 | 1223 | CPT1B | 0.210 | 2.37 x 10−54 | 15 | Promoter | | Blood metabolite levels; blood protein levels; chronic lymphocytic leukemia; mean corpuscular hemoglobin; mean corpuscular volume; multiple sclerosis; red blood cell count; red cell distribution width; reticulocyte count |
7 | 26577897 | 26578098 | 202 | KIAA0087 (lncRNA) | −0.148 | 5.81 x 10−49 | 2 | Gene body | | Heart rate response to recovery post exercise; mean corpuscular hemoglobin; red cell distribution width |
1 | 36786285 | 36788627 | 2343 | SH3D21 | 0.084 | 2.73 x 10−47 | 10 | Enhancer | K562, CD34, astrocytes | Height; lung function (FEV1/FVC) |
2 | 98377791 | 98378782 | 992 | TMEM131 | −0.128 | 1.40 x 10−43 | 5 | Enhancer | | Autoimmune traits; diastolic blood pressure; educational attainment (years of education); highest math class taken; hypothyroidism; medication use (thyroid preparations) |
3 | 193988507 | 193988737 | 231 | Intergenic (HES1 enhancer) | −0.050 | 8.29 x 10−41 | 3 | Enhancer | Hippocampus, brain inferior temporal lobe | None |
19 | 1851750 | 1851995 | 246 | KLF16 | 0.163 | 1.42 x 10−39 | 3 | Enhancer | CD34, CD19 B cells, hippocampus | Body mass index; cardiovascular disease; cognitive performance; educational attainment (years of education); highest math class taken; lung function (FEV1/FVC); mean corpuscular hemoglobin; mean corpuscular volume; menarche (age at onset); red blood cell count; red cell distribution width |
11 | 124621829 | 124622348 | 520 | VSIG2 | −0.095 | 1.16 x 10−37 | 5 | Promoter | CD19 B cells, brain inferior temporal lobe | Autism spectrum disorder or schizophrenia; blood protein levels; cognitive ability, years of educational attainment or schizophrenia (pleiotropy); schizophrenia; smoking initiation |
1 | 46088336 | 46090107 | 1772 | CCDC17 | −0.084 | 5.20 x 10−34 | 11 | Promoter | | Blood metabolite levels; estimated glomerular filtration rate; height; hemoglobin concentration |
2 | 47382427 | 47382903 | 477 | STPG4 | −0.056 | 7.76 x 10−34 | 7 | Promoter | | Height |
15 | 40571997 | 40572794 | 798 | ANKRD63 | −0.175 | 3.06 x 10−32 | 5 | Intergenic | | Height; schizophrenia; type 2 diabetes |
15 | 52944233 | 52944386 | 154 | FAM214A | 0.117 | 5.82 x 10−30 | 4 | Promoter | K562, CD34, astrocytes, hippocampus, brain inferior temporal lobe, brain mid frontal lobe | None |
3 | 183959000 | 183959853 | 854 | MIR1224 | 0.099 | 7.64 x 10−30 | 9 | Promoter | | Body mass index; highest math class taken; mean corpuscular hemoglobin; menarche (age at onset) |
16 | 15787920 | 15787957 | 38 | NDE1 | −0.105 | 1.49 x 10−29 | 3 | Gene body | | Cognitive ability, years of educational attainment or schizophrenia (pleiotropy) |
2 | 54086854 | 54087552 | 699 | ASB3 | −0.115 | 1.98 x 10−29 | 13 | Promoter | | Anorexia nervosa; heel bone mineral density |
9 | 140066754 | 140068071 | 1318 | ANAPC2 | 0.012 | 3.58 x 10−27 | 6 | Intergenic | | Estimated glomerular filtration rate; height; male-pattern baldness; mean corpuscular hemoglobin; mean corpuscular volume; red blood cell count; reticulocyte count |
17 | 58499679 | 58500186 | 508 | C17orf64 | 0.059 | 1.50 x 10−26 | 8 | Promoter | | None |
5 | 78985434 | 78986160 | 727 | CMYA5 | −0.135 | 8.33 x 10−26 | 10 | Promoter | | Height |
15 | 75641081 | 75641171 | 91 | NEIL1 | 0.015 | 4.19 x 10−25 | 3 | Enhancer | CD19 B cells, brain inferior temporal lobe | Estimated glomerular filtration rate; serum uric acid levels |
4 | 81117647 | 81119473 | 1827 | PRDM8 | 0.174 | 7.93 x 10−14 | 13 | Promoter | Hippocampus, brain inferior temporal lobe | Atrial fibrillation; blood pressure; blood pressure × alcohol consumption interaction; diastolic blood pressure (cigarette smoking interaction); estimated glomerular filtration rate; hypertension; male-pattern baldness |
aMean beta difference between DS and non-DS subjects for DMRs calculated by DMRcate.
bP values (Šidák-corrected) and number of CpGs calculated using the more stringent comb-p method, with the DS EWAS P values at each CpG as input.
cLimited to enhancers identified in blood or brain tissues76.
Of six DMRs with at least ten CpG probes and with absolute mean Δβ-value >0.10 (Table 3), two overlapped regulatory regions in RUNX1 (Fig. 4a) and FLI1 (Fig. 5a), and the remaining four DMRs overlapped promoter regions of genes involved in brain development (CPT1B, CMYA5, and PRDM8) and the immune system (ASB3) (Supplementary Fig. 11 and Table 3). Assessment of genome-wide association study (GWAS) Catalog SNPs revealed that 8 of the top 20 DMRs are near SNPs associated with hematological traits, and 7 DMRs are near SNPs associated with brain-related traits (Table 3 and Supplementary Data 8).
We next assessed overlap between our DS-associated DMRs with those reported in the previous largest EWAS of DS16. Of the 66 DMRs previously identified with a large β-value difference (>0.15), we also detected 37 as DMRs, all of which had concordant Δβ-value directions (Supplementary Data 5).
In addition, we identified 59 DMRs associated with GATA1 mutations in DS neonates (Supplementary Data 6); the most significant region encompassed the noncoding RNA VTRNA2-1 (P = 1.71 × 10−20), with reduced DNA methylation (mean Δβ-value = −0.115) in GATA1 mutation-positive DS newborns versus wild-type DS newborns. DNA methylation in DS newborns at the top two DS-associated DMRs, in RUNX1 and FLI1, was not driven by the presence of GATA1 mutations; in fact, at both DMRs, which were significantly hypermethylated in DS newborns, the mean methylation levels were slightly lower in GATA1 mutation-positive than in wild-type DS newborns, albeit still considerably higher than in non-DS newborns (Supplementary Figs. 12 and 13).
Gene expression changes in DS versus non-DS FL CD34+ cells
FL is the main site of hematopoiesis until birth and neonatal blood is likely derived from FL HSPC. To ascertain whether differences in genome-wide DNA methylation found in neonatal T21 blood cells correlate with differences in gene expression, we analyzed RNA-sequencing data from DS (N = 3) and non-DS (N = 3) FL HSPC. We found 587 significantly differentially expressed genes between DS and non-DS FL CD34+ cells (FDR < 0.1), of which 294 genes were upregulated and 293 downregulated in DS (Fig. 6a and Supplementary Data 9). DS-associated DMRs identified at promoter or enhancer regions in neonatal blood (N = 729, Supplementary Data 7) overlapped 491 genes that demonstrated any change in expression in DS FL cells compared with non-DS cells; hypermethylation at these DMRs correlated with decreased gene expression in DS FL CD34+ cells, whereas hypomethylation correlated with increased gene expression (P < 0.0001, two-tailed Fisher’s exact test) (Fig. 6b and Supplementary Data 7), a relationship that remained when limiting to the significantly differentially expressed genes (P = 0.0002) or after excluding Hsa21 genes (P < 0.0001). Conversely, no relationship between hyper-/hypomethylation and gene expression was found for the 323 DMRs outside of promoters/enhancers (Supplementary Fig. 14).
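The methylation-versus-expression comparison above rests on a 2 × 2 table of directions (hyper-/hypomethylated by up-/downregulated), which is then tested with Fisher's exact test. A minimal sketch of how such a table could be tallied is shown below. The function name and the toy DMR-gene pairs are hypothetical and are not values from the study.

```python
from collections import Counter


def direction_table(pairs):
    """Count DMR-gene pairs by methylation and expression direction.

    pairs is an iterable of (mean_beta_difference, log2_fold_change).
    Pairs with a zero in either value carry no direction and are skipped.
    """
    counts = Counter()
    for delta_beta, log2fc in pairs:
        if delta_beta == 0 or log2fc == 0:
            continue
        meth = "hyper" if delta_beta > 0 else "hypo"
        expr = "up" if log2fc > 0 else "down"
        counts[(meth, expr)] += 1
    return counts


# Hypothetical toy pairs: (mean beta difference, log2 fold change).
toy_pairs = [(0.27, -0.4), (0.13, -1.2), (-0.15, 0.8), (-0.08, 0.3), (0.05, 0.2)]
table = direction_table(toy_pairs)
for key in [("hyper", "down"), ("hyper", "up"), ("hypo", "down"), ("hypo", "up")]:
    print(key, table[key])
```

An inverse relationship between methylation and expression would show up as an excess of counts in the ("hyper", "down") and ("hypo", "up") cells.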
We next took a closer look at DMRs and gene expression on Hsa21. While an extra copy of Hsa21 would predict a 1.5-fold increase in expression of the genes on this chromosome, this is often not the case, suggesting that epigenetic mechanisms regulate gene expression in the context of aneuploidy. Here, as for non-Hsa21 genes, we found that changes in Hsa21 gene expression in DS fetal hematopoietic cells negatively correlated with DNA methylation status at promoter/enhancer regions in DS neonatal hematopoietic cells (Fig. 6c and Supplementary Fig. 14). Several Hsa21 genes that play a key role in hematopoiesis, such as RUNX1, ERG, DYRK1A, and ETS2, showed less than the expected 1.5-fold change when compared to normal FL, and this was accompanied by significant hypermethylation at promoters/enhancers of these genes in neonatal blood (Supplementary Data 7 and Fig. 6c).
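The 1.5-fold expectation follows from gene dosage alone: three copies instead of two gives a ratio of 3/2, or about 0.585 on the log2 scale commonly used for RNA-sequencing fold changes. The sketch below flags genes whose fold change falls short of that expectation. The gene labels and fold changes are hypothetical placeholders, not data from the study.

```python
from math import log2

# Three copies instead of two: expected fold change under pure dosage.
EXPECTED_FC = 3 / 2
EXPECTED_LOG2FC = log2(EXPECTED_FC)


def below_dosage_expectation(fold_change, expected=EXPECTED_FC):
    """True if the observed DS/non-DS fold change is below the dosage expectation."""
    return fold_change < expected


# Hypothetical fold changes (DS relative to non-DS), NOT values from the study.
toy_fold_changes = {"GENE_A": 1.10, "GENE_B": 1.50, "GENE_C": 1.85}
dampened = [g for g, fc in toy_fold_changes.items() if below_dosage_expectation(fc)]

print(f"expected log2 fold change: {EXPECTED_LOG2FC:.3f}")
print("below expectation:", dampened)
```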
Finally, we analyzed the expression of RUNX1 and FLI1, the top 2 genes with DMRs and both essential regulators of megakaryopoiesis, via single-cell qRT-PCR on index-sorted DS and non-DS FL myeloid progenitors with megakaryocyte–erythroid potential (Lin−CD34+CD38+CD45RA−). This showed markedly lower expression of FLI1 in DS FL myeloid progenitors (P < 0.0001), consistent with hypermethylation of the FLI1 promoter in DS neonatal blood (Fig. 5b), while RUNX1 expression was increased in DS myeloid progenitors (P < 0.0001) compared to normal FL counterparts (Fig. 4b), perhaps reflecting the significant DMR hypermethylation found at the RUNX1 P2 promoter but not the P1 promoter in DS newborns (Supplementary Fig. 7).
Discussion
We report the results from the largest, and first multiethnic, EWAS of DS in blood cell samples at birth, confirming several known loci and identifying many novel regions, including at FLI1, that were significantly differentially methylated in newborns with DS compared with newborns without DS. The 652 epigenome-wide significant CpGs and 1052 DMRs demonstrate the profound epigenome-wide consequences of T21, which likely contribute toward phenotypic variation in DS. The majority of DS-associated DNA methylation changes were found on euploid (non-21) chromosomes, as previously reported15–17, supporting that T21 results in genome-wide perturbations in gene regulation12,23. Indeed, our results from RNA sequencing of fetal DS and non-DS HSPCs support the early-life, genome-wide perturbation of gene expression in hematopoietic cells that broadly correlates with differential DNA methylation patterns in DS.
The effects of T21 on the function of RUNX1, a crucial regulator of hematopoiesis particularly in early development24, appear complex. Although DS was largely associated with hypomethylation on Hsa21, we found significant hypermethylation at RUNX1, as reported previously in DS and in DS-ALL16–18,25. We note that RUNX1 hypermethylation in DS was specific to the proximal P2 promoter26, which is thought to be the dominant regulator of RUNX1 expression during embryonic development, driving formation of the hemogenic endothelium and early hematopoiesis27,28. Dosage of RUNX1 during these early stages is tightly controlled29,30, suggesting that RUNX1 downregulation via P2 promoter hypermethylation may be required for viable embryo development in DS. The distal P1 promoter becomes active once cells commit to the hematopoietic lineage and is the predominant promoter in definitive hematopoiesis27,28, consistent with the pattern of RUNX1 expression we observed in DS FL myeloid progenitors. Promoter switching from P2 to P1 involves changes in DNA methylation at the P1 promoter but not at P2, which was found to be unmethylated across cell types31, suggesting that P2 hypermethylation is unique to DS.
The most significant DMR outside of Hsa21 overlapped FLI1, another important regulator of megakaryopoiesis32,33, specifically at the promoter of transcript variant 4 that is mainly expressed in megakaryocytes. FLI1 protein is a critical binding partner of both RUNX1 and GATA1 during terminal megakaryocyte maturation34,35 and all three proteins cooperate in transcriptional control of megakaryocyte differentiation36. Similar to RUNX1, germline loss of FLI1 has been associated with thrombocytopenia33, defects in megakaryopoiesis37, and familial platelet disorders38. We report that FLI1 expression is significantly reduced in DS FL myeloid progenitor cells. Our results support that T21 leads to epigenetic dysregulation of both RUNX1 and FLI1, which may contribute toward abnormal megakaryocyte development in DS FL cells3, and to the development of TAM and the concomitant risk of AMKL in DS infants. The etiology and timing of these epigenetic changes remain to be determined. Along with RUNX1, FLI1 is also a critical regulator of embryonic hematopoiesis24,39; thus, compensatory epigenetic downregulation of RUNX1 and FLI1 may be required for viable embryogenic development in DS, but potentially also results in increased risk of hematological malignancies.
Our results confirm previous studies that pinpointed RUNX1 as one of the most differentially methylated genes in blood in individuals with DS16,17. It is interesting that DS-associated hypermethylation at RUNX1 has also been reported in DS brain tissue18,40, supporting the early fetal origins of these epigenetic changes and potential pleiotropic effects on DS phenotypes. Indeed, RUNX1 has been shown to play a role in proliferation and differentiation of select neural progenitor cells, including in hippocampal precursor cells41,42. The overlap of DS-associated DMRs with GWAS loci for both cognitive-related and hematological traits, such as at KLF16, further supports the possibility that epigenetic dysregulation may underlie both hematologic defects and cognitive development in DS. Remarkably, two of the most significant DMRs, overlapping promoters of CPT1B and CMYA5, were recently associated with hippocampal volume in non-DS individuals; for both DMRs, the direction of DNA methylation changes in DS newborns was associated with smaller hippocampal volume43. Additional DS-associated DMRs overlapped NDE1, PRDM8, and the enhancer locus for HES1, genes that all play an important role in neurogenesis44–46.
Cell-type deconvolution revealed that DS newborns had relatively high proportions of erythroblasts, possibly indicative of intrauterine or perinatal hypoxia47, pulmonary hypertension48, or TAM49. Although no DS newborns in this study developed childhood leukemia50, targeted GATA1 sequencing identified that ~14% harbored a likely functional somatic GATA1 mutation, consistent with the observation that the majority of DS newborns with TAM and Silent TAM will not develop AMKL8. We found significant association between GATA1 mutations and higher erythroblast proportions; however, almost two-thirds of DS newborns with high erythroblast proportions did not harbor GATA1 mutations, suggesting a greater role for pre- and perinatal hypoxic conditions in contributing to this phenotype.
Nonfunctional GATA1 variants tended to have much lower VAF than functional ones, supporting the view that GATA1-truncating mutations confer a growth advantage to fetal hematopoietic cells and are clonally selected during development of TAM. Moreover, the true frequency at which somatic GATA1 mutations arise in utero may be higher than we could detect given the current limits of detection and the sampling of blood only at birth. The etiology of GATA1 mutations in DS remains unknown, but is potentially related to T21-associated upregulation of GATA1 (ref. 3); increased transcription is a known cause of DNA mutagenesis51. Intriguingly, human adaptation to hypoxic conditions includes upregulation of GATA1 to drive erythropoiesis52. Thus, hypoxic intrauterine conditions in developing DS fetuses may contribute to the generation of GATA1 mutations or at least to expansion of mutant GATA1 clones53. Our EWAS of GATA1 mutations in DS revealed a DMR overlapping VTRNA2-1, a metastable epiallele at which DNA methylation levels were previously associated with the periconceptional environment54, suggesting a potential environmental role in the development of GATA1 mutations.
An important strength of our study was the use of newborn dried bloodspots (DBS), which increased our power to detect differentially methylated loci associated with DS, because the epigenetic influences of environmental exposures, age-related changes, and drift would be much reduced compared with studies in older individuals. Our study does have some limitations. First, although DBS biospecimens were all obtained from newborns in California, we did not match DS and non-DS newborns by demographic variables such as sex, race/ethnicity, or birth year. This should not have biased our findings, however, as our EWAS was adjusted for sex and principal components, and similar results were found in Latinos and non-Latino whites. Second, analytical tools such as ReFACTor and cell-type deconvolution were developed in euploid individuals, although we did confirm some known differences in blood cell proportions (using conventional cell enumeration methods) between DS and non-DS individuals. Reference-free adjustment for cell-type composition was performed in our EWAS, given the highly significant differences in estimated blood cell proportions and to maximize our power to detect epigenetic changes associated with trisomy 21; however, we cannot rule out that some of the DNA methylation changes associated with DS may reflect differences in peripheral blood cell composition between DS and non-DS newborns, and future studies should explore the epigenetic effects of DS in sorted blood cells. Finally, our study was limited to newborn whole-blood samples, and would not detect tissue-specific DNA methylation differences outside of blood that may underlie DS-related phenotypes. Studies have, however, demonstrated similarities in the epigenetic effects of T21 across tissues40, and in the use of blood DNA methylation as a biomarker for traits in other tissues, such as brain-related phenotypes43.
Our results demonstrate the profound genome-wide effects of T21 on DNA methylation, with important implications for the defects in hematopoiesis, cognition, immune function, and other developmental processes that arise in individuals with DS. Determining the etiologies of these epigenetic changes will be essential to understand and potentially ameliorate DS phenotypes. Epigenetic changes in DS may occur due to triplication of specific genes on Hsa21, such as HMGN1 (ref. 23) or DNMT3L (ref. 55), the effects of additional genomic material on three-dimensional chromatin organization, or some compensatory mechanism triggered early in DS fetal development. One might predict compensatory hypermethylation of triplicated genes; thus, it is also important to understand why Hsa21 is largely hypomethylated in DS, and how this hypomethylation is distributed across the three copies of Hsa21. Finally, case–control studies within DS populations are required to investigate the association between epigenetic variation across tissues and the variable penetrance and expressivity of DS-related phenotypes.
Methods
Study subjects
This study was approved by Institutional Review Boards at the California Health and Human Services Agency, University of Southern California, and University of California Berkeley, and by Hammersmith and Queen Charlotte’s Hospital Research Ethics Committee (ref 04/Q0406/145). The deidentified newborn DBS from the California Biobank Program for this project (SIS request numbers 572 and 600) were obtained with a waiver of consent from the Committee for the Protection of Human Subjects of the State of California. FL samples were obtained with written consent.
DBS were obtained from 198 DS newborns, without a leukemia diagnosis by 15 years of age, from the California Biobank Program via linkage between the California Department of Public Health Genetic Disease Screening Program and California Cancer Registry50. We also obtained newborn DBS from 442 non-DS (cancer-free) children from the California Biobank Program56. Demographic and birth-related data for subjects that passed QC are summarized in Table 1. The majority of individuals were reported as Latino (N = 357) or non-Latino white (N = 178), and the remainder as African American (N = 34), Asian/Pacific Islander (N = 55), or other (N = 11).
A separate sample set was used for gene expression studies. Second-trimester FL samples were collected during elective surgical termination of pregnancy and processed immediately. Donated fetal tissue was also provided by the Human Developmental Biology Resource (www.hdbr.org) regulated by the UK Human Tissue Authority (www.hta.gov.uk).
Genome-wide DNA methylation arrays
DNA was extracted from one-third portions of each newborn DBS using the Qiagen DNA Investigator blood card protocol, and bisulfite conversion was performed using Zymo EZ DNA Methylation kits. Bisulfite-converted DNA samples from DS and non-DS newborns were block-randomized (ensuring equivalent distribution of sex and race/ethnicity on all plates) and run on Illumina Infinium MethylationEPIC Beadchip genome-wide DNA methylation arrays.
DNA methylation array data processing, visualization, and annotation
For QC assessment of DNA methylation array data, we imported raw IDAT files into R and used the “minfi” package to calculate mean detection P values using the “detectionP” function. Further data QC and normalization were performed using the R package “SeSAMe,” with background correction using “noob” and using P value with out-of-band (OOB) array hybridization for removal of poor-performing probes, accounting for deleted and hyperpolymorphic regions (R version: 3.6.0)57,58. The R package “conumee” was used to generate copy-number variation (CNV) plots for all subjects to check T21 status59, with twenty randomly selected non-DS newborns used to construct a CNV reference, and subjects were removed if the reported DS status did not match T21 status based on visual inspection (one “control” appeared to have T21 and was excluded). Subjects with missingness >5% (2 DS and 2 non-DS) were removed, resulting in a final study number of 196 DS and 439 non-DS newborns. CpG probes with missingness >5% were subsequently removed (N = 137,060), and the remaining missing values imputed using “impute.knn” function from the “impute” package.
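The two-stage missingness filter described above (subjects first, then CpG probes, each at a 5% threshold) can be sketched as follows. This is a minimal illustration in plain Python rather than the R pipeline used in the study; the function names, the use of `None` to mark a missing value, and the toy data layout are assumptions made for the example.

```python
def missing_fraction(values):
    """Fraction of entries that are None (treated here as missing)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def filter_by_missingness(betas, max_missing=0.05):
    """Apply the subject filter, then the probe filter.

    betas: dict mapping sample ID -> list of beta values (None = missing),
           with probes in the same order for every sample (hypothetical layout).
    Returns (kept_sample_ids, kept_probe_indices).
    """
    # Stage 1: drop samples whose missingness exceeds the threshold.
    kept_samples = [s for s, row in betas.items()
                    if missing_fraction(row) <= max_missing]
    # Stage 2: among retained samples, drop probes exceeding the threshold.
    n_probes = len(next(iter(betas.values())))
    kept_probes = [
        j for j in range(n_probes)
        if missing_fraction([betas[s][j] for s in kept_samples]) <= max_missing
    ]
    return kept_samples, kept_probes
```

Because the probe filter is computed only on retained subjects, the order of the two stages matters; the sketch follows the order given in the text. Imputation of the remaining missing values is not shown.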
We removed probes located on chromosomes X and Y as well as CpGs located at SNP sites with a minor allelic frequency >5%, resulting in a final CpG probe set N = 651,772. Using data from this final set of probes, and also excluding probes on Hsa21, we calculated PCs in R using the “prcomp” command to create PCA plots. Additional dimensional reduction plots to visualize clustering within samples were generated using the t-SNE algorithm, with the “Rtsne” package60. Betas were converted to M values for construction of heatmaps using the “ComplexHeatmap” package61 for the 2000 CpGs with the greatest mean absolute deviation across chromosomes, excluding 21, X, and Y, and annotated for DS status, sex, and deconvoluted blood cell proportions.
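For readers unfamiliar with the beta-to-M conversion used for the heatmaps, the standard logit transform is M = log2(β / (1 − β)). The sketch below shows this transform together with a ranking of CpGs by mean absolute deviation. It is illustrative only: the clamping constant `eps`, the function names, and the choice to take deviations around the mean are assumptions, not details reported in the text.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta value to an M value.

    Betas are clamped to [eps, 1 - eps] (an assumption of this sketch)
    so that values of exactly 0 or 1 do not produce infinities.
    """
    b = min(max(beta, eps), 1 - eps)
    return math.log2(b / (1 - b))


def mean_abs_deviation(values):
    """Mean absolute deviation around the mean."""
    mu = sum(values) / len(values)
    return sum(abs(v - mu) for v in values) / len(values)


def top_variable_cpgs(m_values_by_cpg, n):
    """Return the n CpG IDs with the greatest mean absolute deviation."""
    return sorted(m_values_by_cpg,
                  key=lambda cpg: mean_abs_deviation(m_values_by_cpg[cpg]),
                  reverse=True)[:n]
```

In the study, the equivalent selection retained the 2000 most variable CpGs outside chromosomes 21, X, and Y.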
All CpGs in the EWAS results were annotated using the annotation database from the “IlluminaHumanMethylationEPICanno.ilm10b4.hg19” package in R62. DMRs of interest were visualized using the “coMET” package63.
Assessment and adjustment of cell-type heterogeneity
Reference-based deconvolution of blood cell proportions in DS and non-DS newborns was performed using the Identifying Optimal Libraries algorithm64,65. We used the “estimateCellCounts2” function in the R package “FlowSorted.Blood.EPIC” and DNA methylation data from cord blood cell reference samples in the R package “FlowSorted.CordBloodCombined.450k,” to estimate proportions of monocytes, granulocytes, natural killer cells, B lymphocytes, T lymphocytes (both CD4+ and CD8+), and nRBC/erythroblasts in newborns66,67. We performed separate linear regression tests with each blood cell-type proportion as the dependent variable and DS status as the independent variable, adjusting for plate, sex, the first ten EPISTRUCTURE PCs to account for genetic ancestry (see below), gestational age, age at DBS collection, and birth weight. We also tested the association of each blood cell-type proportion with birth-related and demographic variables. Within DS newborns, we performed additional regression models to test the association between GATA1 mutations, treated either as a binary variable (presence/absence) or a linear variable (i.e., VAF), and blood cell proportions and birth variables, adjusting for plate, sex, and EPISTRUCTURE PCs as above.
To account for cell-type heterogeneity in the EWAS, we obtained Reference-Free Adjustment for Cell-Type composition (ReFACTor) PCs using the GLINT tool (v1.0.4)68,69, with the assumed number of cell types in the data (k) set to 7 to align with the reference-based approach described above, and with adjustment for plate and sex.
GATA1 mutation sequencing in DS newborns
We performed targeted sequencing of GATA1 in 184 DS newborns with sufficient remaining DNA isolated from DBS, using methods modified from previously described protocols8,70. In brief, targeted amplification of GATA1 from isolated genomic DNA was performed in tandem, with the addition of sample barcodes and sequencing adapters. Six primer pairs generating 150–210-bp amplicons covering the entirety of exon 2 and the first 115 bases of exon 3 and including three exon/intron boundaries were individually amplified in quadruplicate following Fluidigm’s Access Array™ IFC 4-Primer Amplicon Tagging Workflow (Fluidigm: PN 68000161). Primer sequences are included in Supplementary Table 4. After amplification, 2 μl for each sample were pooled and PCR products purified using AMPureXP beads (Beckman Coulter). Quality and size distribution were determined using a Tapestation system (Agilent Technologies). Library concentration was determined by the Qubit dsDNA HS Assay kit (Thermo Fisher). Sequencing was performed on an Illumina MiSeq as 150-bp paired-end reads. Mapping and variant analysis were performed using an in-house pipeline generating VarScan somatic data71, as previously described8,70. VAFs were manually verified and compared to in-run controls using the Integrative Genomics Viewer for visualization72. The limit of detection of mutant GATA1 sequence was 0.3–2% depending on read quality and depth of sequencing.
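As a simple illustration of how a variant allele fraction (VAF) relates to the limit of detection quoted above, the sketch below computes VAF as variant-supporting reads divided by total reads at a position and compares it with a sample-specific limit. The function names are hypothetical, and the in-house pipeline and manual verification steps used in the study are not represented.

```python
def variant_allele_fraction(alt_reads, total_reads):
    """VAF = reads supporting the variant / all reads covering the position."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def above_detection_limit(vaf, limit_of_detection):
    """True if the VAF reaches the limit of detection for this sample.

    The limit is passed in explicitly because the text reports a range
    (0.3-2%) that depends on read quality and depth.
    """
    return vaf >= limit_of_detection
```

For example, 30 variant reads out of 10,000 give a VAF of 0.3%, which would be callable only in a sample at the most sensitive end of the reported range.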
Epigenome-wide association analyses
GLINT was used to obtain EPISTRUCTURE PCs, adjusting for the first ten ReFACTor PCs, to account for genetic ancestry73. CpG probes on chromosomes X and Y, and at SNP sites with MAF > 5% were removed, resulting in a final probe set N = 651,772. A multiethnic EWAS of DS was performed using linear regression in R, with each CpG β-value as the dependent variable and DS status the independent variable, adjusting for sex, plate, the first ten ReFACTor PCs to adjust for cell-type proportions, and first ten EPISTRUCTURE PCs to adjust for genetic ancestry.
For sensitivity analysis to account for potential confounding of uncorrected population stratification in the overall multiethnic EWAS, we repeated the EWAS separately in the two largest self-reported race/ethnicity groups, in Latinos (DS N = 104, non-DS N = 256) and non-Latino whites (DS N = 54, non-DS N = 124). In these race/ethnicity-stratified EWAS models, we only adjusted for the first six EPISTRUCTURE PCs to reach acceptable levels of epigenomic inflation (λ = 1.46 for Latinos and 1.09 for non-Latino whites). In addition, we repeated the multiethnic EWAS following: (1) removal of the separate cluster of DS individuals (N = 34) observed in our visualization plots and with high nRBC proportions in the cell-type deconvolution analyses (Figs. 1 and 2), or (2) removal of DS newborns with any somatic GATA1 mutations identified by sequencing (N = 30).
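The inflation factor λ quoted above is not defined in the text. A common definition, assumed here, is the median of the observed 1-degree-of-freedom χ² statistics divided by the median expected under the null (about 0.455). The sketch below recovers each χ² statistic from a two-sided P value using only the Python standard library; the function name is illustrative.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Median-based inflation factor from two-sided P values in (0, 1].

    Each P value is mapped back to a 1-df chi-square statistic via
    chi2 = z**2, where z is the standard-normal quantile at p / 2.
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.75) ** 2  # median of chi-square(1), ~0.455
    return median(chi2) / expected_median
```

Uniformly distributed P values give λ close to 1; values above 1 indicate test statistics that are larger than expected under the null across the array.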
To explore epigenetic changes associated with GATA1 mutations, we performed a separate EWAS within DS newborns with GATA1 mutation status as a binary dependent variable. Bonferroni correction was applied to correct for multiple testing (P < 7.67 × 10−8, based on 651,772 CpGs).
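The Bonferroni threshold above can be checked with a one-line calculation. The family-wise α of 0.05 is an assumption of this sketch; it is not stated explicitly in the text, but it reproduces the reported threshold for 651,772 CpGs.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# 0.05 / 651,772 tests, formatted to three significant figures
print(f"{bonferroni_threshold(0.05, 651_772):.2e}")  # prints 7.67e-08
```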
Gene pathway enrichment analyses were performed for genes overlapped by epigenome-wide significant CpGs using the “methylglm” function in the R package “methylGSA,” with assessment of GO and KEGG pathways74. We used a gene-list minimum size of 10 and maximum size of 500, and only considered pathways with an FDR-corrected P value < 0.05 as significant.
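The pathway filtering rules above (gene-set size between 10 and 500, FDR-corrected P < 0.05) can be sketched as below. The text does not name the FDR procedure; this example assumes Benjamini-Hochberg adjustment and assumes that the size filter is applied before correction. It does not reproduce the methylGSA model itself, and all names are illustrative.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_pathways(pathways, min_size=10, max_size=500, fdr=0.05):
    """pathways: list of (name, n_genes, p_value) tuples (hypothetical layout)."""
    eligible = [p for p in pathways if min_size <= p[1] <= max_size]
    q_values = bh_adjust([p[2] for p in eligible])
    return [name for (name, _, _), q in zip(eligible, q_values) if q < fdr]
```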
We also performed enrichment analysis to assess the significant overlap between epigenome-wide significant CpGs and genomic locations (i.e., promoters, exons, introns, 1–5 Kb, 3′-UTRs, 5′-UTRs, intergenic, intron/exon boundaries, CpG islands, shelves, shores, and open sea), as well as functional features, including TF-binding sites, histone modification markers, DHS, and predicted enhancer regions. The numbers of significant and nonsignificant CpGs overlapping each feature were compared using Fisher’s exact test. For TF-binding sites, we included all available TFs (N = 161) in the ENCODE ChIP-seq database for the K562 cell line (wgEncodeRegTfbsClusteredV3.bed). Histone modification data were downloaded for primary HSCs (cell line E035) from the Roadmap Epigenomics Mapping Consortium database75. DHS sites were downloaded from the ENCODE project (wgEncodeRegDnaseClusteredV3.bed file). Finally, we assessed overlap with previously identified enhancer regions for three HSC cell lines (BI_CD34_Primary_RO01536, BI_CD34_Primary_RO01480, and BI_CD34_Primary_RO01549), CD19+ B cells (CD19_primary), GM12878 lymphoblastoid cells, K562 cells, and four brain cell lines (astrocytes, frontal lobe cells, temporal lobe cells, and hippocampus cells)76. All enrichment analyses were performed separately for hyper- and hypomethylated CpGs, and Bonferroni correction for multiple testing was applied as warranted for each analysis based on the number of comparisons.
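Each feature-overlap comparison above reduces to a 2 × 2 table: significant versus nonsignificant CpGs, by overlapping versus not overlapping the feature. A minimal sketch of a two-sided Fisher's exact test on such a table is given below, using exact integer binomial coefficients from the standard library. The two-sided rule (summing all tables no more probable than the observed one) is a common convention assumed here, and the function name and argument order are illustrative. The approach is practical only for small counts; array-scale tables would call for a statistics library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]].

    a: significant CpGs overlapping the feature
    b: significant CpGs not overlapping
    c: nonsignificant CpGs overlapping
    d: nonsignificant CpGs not overlapping
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```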
DMRs associated with DS were identified using two different methods, DMRcate and comb-p77,78. The P values obtained in each EWAS were used for comb-p. DMRcate was run with adjustment for cell-type heterogeneity using the first ten ReFACTor PCs, as well as for sex, plate, and the first ten EPISTRUCTURE PCs (except in race/ethnicity-stratified analyses below). We retained DMRs that spanned a minimum of two CpGs, had a maximum distance of 1000 bp between methylation peaks, had an FDR-corrected P value < 0.01 in DMRcate, had a Šidák-corrected P value < 0.01 in comb-p, and that displayed any overlap between the coordinates of regions called by DMRcate and comb-p.
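The retention rule above, which keeps only regions that pass the per-method thresholds and are called by both methods with overlapping coordinates, can be sketched as an interval-overlap filter. The `Region` record, the use of closed intervals, and the function names are assumptions of this example. The 1000-bp maximum distance between methylation peaks is a region-calling parameter and is not modeled here.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    n_cpgs: int
    p_adj: float  # FDR-corrected (DMRcate) or Sidak-corrected (comb-p) P value


def overlaps(a, b):
    """True if two regions share a chromosome and at least one base (closed intervals)."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def consensus_dmrs(dmrcate_regions, combp_regions, min_cpgs=2, max_p=0.01):
    """Return (DMRcate region, comb-p region) pairs that pass both filters and overlap."""
    d_ok = [r for r in dmrcate_regions if r.n_cpgs >= min_cpgs and r.p_adj < max_p]
    c_ok = [r for r in combp_regions if r.n_cpgs >= min_cpgs and r.p_adj < max_p]
    return [(d, c) for d in d_ok for c in c_ok if overlaps(d, c)]
```

The nested comparison is quadratic in the number of regions, which is adequate for a few thousand calls; a sorted sweep or interval tree would be the usual choice for larger inputs.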
In the analyses of DMRs in DS, we generated DMR calls for (i) overall DS versus non-DS subjects, (ii) Latino and (iii) non-Latino white-stratified analyses (in both of which the first six EPISTRUCTURE PCs were used to adjust for genetic ancestry, as for the EWAS models above), (iv) following removal of the 34 DS newborns with high nRBC proportions, (v) following removal of the 30 DS newborns with GATA1 mutations, and (vi) for GATA1 mutation status as described for the EWAS of GATA1 mutations. To assess the potential functions of genes overlapped by the most significant DMRs in the overall analysis of DS, we investigated the presence of SNPs, and their corresponding trait associations, in the NHGRI-EBI Catalog of published GWAS and with reported P values < 5.0 × 10−8 in regions spanning +50 and −50 Kb of each DMR locus79.
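The GWAS Catalog lookup described above amounts to a window query around each DMR. The sketch below assumes that the ±50-Kb window is measured from the DMR boundaries and that catalog entries have been loaded into simple dictionaries; the field names and SNP labels are hypothetical.

```python
def gwas_snps_near_dmr(dmr, snps, flank=50_000, max_p=5.0e-8):
    """Return catalog SNPs within `flank` bp of a DMR and below the P threshold.

    dmr:  (chrom, start, end) tuple
    snps: iterable of dicts with 'chrom', 'pos', and 'p' keys (hypothetical layout)
    """
    chrom, start, end = dmr
    lo, hi = max(0, start - flank), end + flank
    return [s for s in snps
            if s["chrom"] == chrom and lo <= s["pos"] <= hi and s["p"] < max_p]
```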
Gene expression analysis in DS and non-DS FL CD34+ cells by bulk RNA sequencing
GATA1 mutation analysis, CD34+ separation, and immunohistochemistry were performed as previously described3,80. Fluorescence in situ hybridization was used to confirm the presence (N = 3) or absence (N = 3) of T21 in FL. Bulk RNA sequencing of DS and non-DS FL cells was performed using the SMART-Seq2 protocol81. In brief, 100 purified HSC or progenitor cells (HSPCs) from gestation-matched DS (N = 3) and non-DS (N = 3) 2nd-trimester FL samples were sorted directly into lysis buffer containing 0.4% Triton X-100 (Sigma-Aldrich), RNase inhibitor (Clontech), 2.5-mM dNTPs (Thermo Fisher), and 2.5-μM oligo-dT30VN primer (Biomers.net). cDNA was generated using SuperScript II (Invitrogen), preamplified using KAPA HiFi HotStart ReadyMix (KAPA Biosystems) using 18 cycles of amplification. After PCR amplification, the cDNA libraries were purified with AMPure XP beads (Beckman Coulter) according to the manufacturer’s instructions. Post-purification libraries were resuspended in EB buffer (Qiagen). The quality of cDNA traces was assessed by using a High Sensitivity DNA Kit in a Bioanalyzer instrument (Agilent Technologies). Library preparation was performed using the Nextera XT DNA Library Preparation Kit (Illumina) according to the manufacturer’s instructions. Indexed cDNA libraries were multiplexed and sequenced using Illumina HiSeq2500 to generate 150-bp paired-end reads, yielding >30 million reads per sample.
Following sequencing, QC analysis was conducted using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). Reads were mapped to the human genome assembly hg19 using STAR software82. Quality and adapter trimming were performed using TrimGalore (https://github.com/FelixKrueger/TrimGalore). The featureCounts function from the Subread package in R was used to quantify gene expression levels using standard parameters. We excluded from our analyses any genes with <100 combined reads across all six samples. Differential gene expression between groups was assessed using the DESeq2 package83.
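The low-count gene filter above can be expressed in a few lines. This sketch assumes a simple mapping from gene name to per-sample read counts; the names are illustrative, and it is not the R code used in the study.

```python
def filter_low_count_genes(counts, min_total=100):
    """Keep genes whose reads summed across all samples reach min_total.

    counts: dict mapping gene name -> list of read counts, one per sample.
    Genes with fewer than min_total combined reads are excluded.
    """
    return {gene: c for gene, c in counts.items() if sum(c) >= min_total}
```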
Single-cell gene expression analysis in DS and non-DS FL CD34+ cells
Single-cell analysis of RUNX1 and FLI1 gene expression was performed by reverse transcription (RT)-qPCR using the Biomark HD microfluidics system (Fluidigm). Flow cytometry was performed as previously described84. Samples were FACS-sorted using BD Fusion instruments, and data were analyzed using FlowJo software. Single cells from FL common myeloid progenitors (CMP:Lin−CD34+CD38+CD45RA−CD123+) and megakaryocyte–erythroid progenitors (MEP:Lin−CD34+CD38+CD45RA−CD123−) from three DS and three non-DS subjects were index-sorted into a 96-well plate containing 5 μL of preamplification mix, which contained One-Step RT-PCR System with Platinum Taq kit (Invitrogen), SUPERASE-In RNase inhibitor (Ambion), low EDTA TE buffer (Invitrogen), and 0.2X Taqman assay mastermix. Plates were sealed and briefly centrifuged, and cDNA synthesis and sequence-specific preamplification were performed as a three-step PCR: step 1, reverse transcription at 50 °C for 15 min; step 2, inactivation of the reverse transcriptase and activation of Taq at 95 °C for 2 min; and step 3, specific target amplification at 95 °C for 15 s and then 60 °C for 4 min, repeated for 20 cycles. Preamplified products were diluted by adding 20 μL of low EDTA TE buffer, and samples were then analyzed using Universal PCR Master Mix (Applied Biosystems) and individual Taqman gene expression assays (RUNX1 Assay ID: Hs01021970_m1, FLI1 Assay ID: Hs00956711_m1 [Life Technologies]), on the Biomark System (Fluidigm) using the 96.96 Dynamic Arrays as per the manufacturer’s protocol. Sorted cells were simultaneously analyzed for relative expression levels of RUNX1 and FLI1. Gene expression was normalized to the average expression of three housekeeping genes: B2M (Assay ID: Hs00984230_m1), GAPDH (Assay ID: Hs02758991_g1), and ACTB (Assay ID: Hs01060665_g1)84.
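One common way to normalize qPCR data to the average of several housekeeping genes is the ΔCt approach, sketched below. The text does not specify the exact formula, so the ΔCt form, the assumption of perfect amplification efficiency (a factor of 2 per cycle), and the function names are all assumptions of this example.

```python
def delta_ct(ct_target, ct_housekeeping):
    """Target Ct minus the mean Ct of the housekeeping genes."""
    return ct_target - sum(ct_housekeeping) / len(ct_housekeeping)


def relative_expression(ct_target, ct_housekeeping):
    """Expression relative to the housekeeping average, as 2 ** (-delta Ct)."""
    return 2 ** (-delta_ct(ct_target, ct_housekeeping))
```

For a single cell with a target Ct of 24 and housekeeping Ct values of 20, 21, and 22 (for example B2M, GAPDH, and ACTB), ΔCt is 3 and the relative expression is 2⁻³ = 0.125.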
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
This work was supported by an Alex’s Lemonade Stand Foundation “A” Award (A.J.d.S.), a National Institutes of Health (NIH) National Cancer Institute (NCI) Grant R01CA175737 (J.L.W. and X.M.), and a NIH NCI Administrative Supplement grant 3R01CA175737-05S1 (J.L.W., X.M., and A.J.d.S.). A.R. is supported by a Blood Cancer UK Clinician Scientist Fellowship (17001), Lady Tata Memorial International Fellowship, and Wellcome Clinical Research Career Development Fellowship (216632/Z/19/Z). P.V. and I.R. are supported by Blood Cancer UK Specialist Programme Grant 13001 and by the NIHR Oxford Biomedical Centre Research Fund. The content of this manuscript is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. A subset of biospecimens and/or data used in this study were obtained from the California Biobank Program at the California Department of Public Health (CDPH), SIS request numbers 572 and 600, in accordance with Section 6555(b), 17 CCR. The CDPH is not responsible for the results or conclusions drawn by the authors of this publication. We thank Robin Cooley and Steve Graham (Genetic Disease Screening Program, CDPH) for their assistance and expertise in the procurement and management of DBS specimens. We thank Hong Quach and Diana Quach at the UC Berkeley QB3 Genetic Epidemiology and Genomics Laboratory for their support in preparing and processing samples for genome-wide DNA methylation arrays. Human fetal material was provided by the Joint MRC/Wellcome Trust Grant 099175/Z/12/Z Human Developmental Biology Resource (http://hdbr.org). We would like to acknowledge the WIMM Flow Cytometry Facility, the High-Throughput Genomics Group at the Wellcome Trust Centre for Human Genetics; and MRC WIMM Single Cell Facility.
Author contributions
A.J.d.S., J.L.W., X.M., A.R. and I.R. designed the study. I.S.M., T.J., K.M.W., A.R., I.R., J.L.W. and A.J.d.S. prepared the paper. N.E., A.R., H.M.H., S.S.M., J.A. and K.G. performed experiments. I.S.M., S.L., T.J., N.E., H.M.H., P.P., J.M.S., R.R., K.D.S., K.M.W., A.R., I.R. and A.J.d.S. analyzed data. J.L.W., X.M., P.V. and I.R. provided study samples. All authors edited and approved the paper.
Data availability
ENCODE TF-binding site dataset wgEncodeRegTfbsClusteredV3.bed.gz was downloaded from the UCSC Genome Browser (https://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeRegTfbsClustered/). Histone modification data were downloaded for primary HSCs (cell line E035, CD34 primary cells) from the Roadmap Epigenomics Mapping Consortium database (https://egg2.wustl.edu/roadmap/data/byFileType/peaks/). ENCODE DNase I hypersensitive site data are available at http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeRegDnaseClustered/. NHGRI-EBI GWAS Catalog data are available at https://www.ebi.ac.uk/gwas/docs/file-downloads. FLI1 transcript variant data were downloaded from the BLUEPRINT Consortium Blood Atlas (https://blueprint.haem.cam.ac.uk/mRNA/). This study used biospecimens from the California Biobank Program. Any uploading of genomic data (including genome-wide DNA methylation data) and/or sharing of these biospecimens or individual data derived from these biospecimens has been determined to violate the statutory scheme of the California Health and Safety Code Sections 124980(j), 124991(b), (g), (h), and 103850 (a) and (d), which protect the confidential nature of biospecimens and individual data derived from biospecimens. The individual-level data derived from these biospecimens and that support the findings of this study are available from the corresponding author upon request, and with permission from the California Biobank Program. RNA-seq data from DS and non-DS FL CD34+ cells have been deposited at the Gene Expression Omnibus (GEO) with accession code: GSE160637. Source data are provided with this paper.
Competing interests
The authors declare no competing interests.
Footnotes
Peer review information Nature Communications thanks Jeffrey Craig, and the other, anonymous, reviewer for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Shaobo Li, Thomas Jackson, Natalina Elliot.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-021-21064-z.
References
- 1. Parker SE, et al. Updated national birth prevalence estimates for selected birth defects in the United States, 2004–2006. Birth Defects Res. A. Clin. Mol. Teratol. 2010;88:1008–1016. doi: 10.1002/bdra.20735.
- 2. Korenberg JR, et al. Down syndrome phenotypes: the consequences of chromosomal imbalance. Proc. Natl. Acad. Sci. U. S. A. 1994;91:4997–5001. doi: 10.1073/pnas.91.11.4997.
- 3. Roy A, et al. Perturbation of fetal liver hematopoietic stem and progenitor cell development by trisomy 21. Proc. Natl. Acad. Sci. U. S. A. 2012;109:17579–17584. doi: 10.1073/pnas.1211405109.
- 4. de Hingh YC, et al. Intrinsic abnormalities of lymphocyte counts in children with Down syndrome. J. Pediatr. 2005;147:744–747. doi: 10.1016/j.jpeds.2005.07.022.
- 5. Ram G, Chinen J. Infections and immunodeficiency in Down syndrome. Clin. Exp. Immunol. 2011;164:9–16. doi: 10.1111/j.1365-2249.2011.04335.x.
- 6. Hasle H, Clemmensen IH, Mikkelsen M. Risks of leukaemia and solid tumours in individuals with Down’s syndrome. Lancet. 2000;355:165–169. doi: 10.1016/S0140-6736(99)05264-2.
- 7. Hasle H, Friedman JM, Olsen JH, Rasmussen SA. Low risk of solid tumors in persons with Down syndrome. Genet. Med. 2016;18:1151–1157. doi: 10.1038/gim.2016.23.
- 8. Roberts I, et al. GATA1-mutant clones are frequent and often unsuspected in babies with Down syndrome: identification of a population at risk of leukemia. Blood. 2013;122:3908–3917. doi: 10.1182/blood-2013-07-515148.
- 9. Bhatnagar N, Nizery L, Tunstall O, Vyas P, Roberts I. Transient abnormal myelopoiesis and AML in Down Syndrome: an update. Curr. Hematol. Malig. Rep. 2016;11:333–341. doi: 10.1007/s11899-016-0338-x.
- 10. Klusmann JH, et al. Treatment and prognostic impact of transient leukemia in neonates with Down syndrome. Blood. 2008;111:2991–2998. doi: 10.1182/blood-2007-10-118810.
- 11. Antonarakis SE. Down syndrome and the complexity of genome dosage imbalance. Nat. Rev. Genet. 2017;18:147–163. doi: 10.1038/nrg.2016.154.
- 12. Letourneau A, et al. Domains of genome-wide gene expression dysregulation in Down’s syndrome. Nature. 2014;508:345–350. doi: 10.1038/nature13200.
- 13. Liu B, Filippi S, Roy A, Roberts I. Stem and progenitor cell dysfunction in human trisomies. EMBO Rep. 2015;16:44–62. doi: 10.15252/embr.201439583.
- 14. Prandini P, et al. Natural gene-expression variation in Down syndrome modulates the outcome of gene-dosage imbalance. Am. J. Hum. Genet. 2007;81:252–263. doi: 10.1086/519248.
- 15. Kerkel K, et al. Altered DNA methylation in leukocytes with trisomy 21. PLoS Genet. 2010;6:e1001212. doi: 10.1371/journal.pgen.1001212.
- 16. Bacalini MG, et al. Identification of a DNA methylation signature in blood cells from persons with Down Syndrome. Aging. 2015;7:82–96. doi: 10.18632/aging.100715.
- 17. Henneman P, et al. Widespread domain-like perturbations of DNA methylation in whole blood of Down syndrome neonates. PLoS ONE. 2018;13:e0194938. doi: 10.1371/journal.pone.0194938.
- 18. Mendioroz M, et al. Trans effects of chromosome aneuploidies on DNA methylation patterns in human Down syndrome and mouse models. Genome Biol. 2015;16:263.
- 19. Sailani MR, et al. DNA-methylation patterns in trisomy 21 using cells from monozygotic twins. PLoS ONE. 2015;10:e0135555. doi: 10.1371/journal.pone.0135555.
- 20. Thilaganathan B, Tsakonas D, Nicolaides K. Abnormal fetal immunological development in Down’s syndrome. Br. J. Obstet. Gynaecol. 1993;100:60–62. doi: 10.1111/j.1471-0528.1993.tb12952.x.
- 21. Jung I, et al. A compendium of promoter-centered long-range chromatin interactions in the human genome. Nat. Genet. 2019;51:1442–1449. doi: 10.1038/s41588-019-0494-8.
- 22. Grassi L, et al. Cell type specific novel lncRNAs and circRNAs in the BLUEPRINT haematopoietic transcriptomes atlas. Haematologica. 2020. doi: 10.3324/haematol.2019.238147.
- 23. Lane AA, et al. Triplication of a 21q22 region contributes to B cell transformation through HMGN1 overexpression and loss of histone H3 Lys27 trimethylation. Nat. Genet. 2014;46:618–623. doi: 10.1038/ng.2949.
- 24. Chen MJ, Yokomizo T, Zeigler BM, Dzierzak E, Speck NA. Runx1 is required for the endothelial to haematopoietic cell transition but not thereafter. Nature. 2009;457:887–891. doi: 10.1038/nature07619.
- 25. Kubota Y, et al. Integrated genetic and epigenetic analysis revealed heterogeneity of acute lymphoblastic leukemia in Down syndrome. Cancer Sci. 2019;110:3358–3367. doi: 10.1111/cas.14160.
- 26. Miyoshi H, et al. Alternative splicing and genomic structure of the AML1 gene involved in acute myeloid leukemia. Nucleic Acids Res. 1995;23:2762–2769. doi: 10.1093/nar/23.14.2762.
- 27. Bee T, et al. Nonredundant roles for Runx1 alternative promoters reflect their activity at discrete stages of developmental hematopoiesis. Blood. 2010;115:3042–3050. doi: 10.1182/blood-2009-08-238626.
- 28. Sroczynska P, Lancrin C, Kouskoff V, Lacaud G. The differential activities of Runx1 promoters define milestones during embryonic hematopoiesis. Blood. 2009;114:5279–5289. doi: 10.1182/blood-2009-05-222307.
- 29. Lie-A-Ling M, et al. Regulation of RUNX1 dosage is crucial for efficient blood formation from hemogenic endothelium. Development. 2018;145. doi: 10.1242/dev.149419.
- 30. Cai Z, et al. Haploinsufficiency of AML1 affects the temporal and spatial generation of hematopoietic stem cells in the mouse embryo. Immunity. 2000;13:423–431. doi: 10.1016/S1074-7613(00)00042-X.
- 31. Webber BR, et al. DNA methylation of Runx1 regulatory regions correlates with transition from primitive to definitive hematopoietic potential in vitro and in vivo. Blood. 2013;122:2978–2986. doi: 10.1182/blood-2013-03-489369.
- 32. Bastian LS, Kwiatkowski BA, Breininger J, Danner S, Roth G. Regulation of the megakaryocytic glycoprotein IX promoter by the oncogenic Ets transcription factor Fli-1. Blood. 1999;93:2637–2644. doi: 10.1182/blood.V93.8.2637.
- 33.Hart A, et al. Fli-1 is required for murine vascular and megakaryocytic development and is hemizygously deleted in patients with thrombocytopenia. Immunity. 2000;13:167–177. doi: 10.1016/S1074-7613(00)00017-0. [DOI] [PubMed] [Google Scholar]
- 34.Eisbacher M, et al. Protein–protein interaction between Fli-1 and GATA-1 mediates synergistic expression of megakaryocyte-specific genes through cooperative DNA binding. Mol. Cell. Biol. 2003;23:3427–3441. doi: 10.1128/MCB.23.10.3427-3441.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Huang H, et al. Differentiation-dependent interactions between RUNX-1 and FLI-1 during megakaryocyte development. Mol. Cell. Biol. 2009;29:4103–4115. doi: 10.1128/MCB.00090-09. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Pimkin M, et al. Divergent functions of hematopoietic transcription factors in lineage priming and differentiation during erythro-megakaryopoiesis. Genome Res. 2014;24:1932–1944. doi: 10.1101/gr.164178.113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Raslova H, et al. FLI1 monoallelic expression combined with its hemizygous loss underlies Paris-Trousseau/Jacobsen thrombopenia. J. Clin. Invest. 2004;114:77–84. doi: 10.1172/JCI21197. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Stockley J, et al. Enrichment of FLI1 and RUNX1 mutations in families with excessive bleeding and platelet dense granule secretion defects. Blood. 2013;122:4090–4093. doi: 10.1182/blood-2013-06-506873. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Bergiers, I. et al. Single-cell transcriptomics reveals a new dynamical function of transcription factors during embryonic hematopoiesis. Elife 7, 10.7554/eLife.29312 (2018). [DOI] [PMC free article] [PubMed]
- 40.Laufer BI, Hwang H, Vogel Ciernia A, Mordaunt CE, LaSalle JM. Whole genome bisulfite sequencing of Down syndrome brain reveals regional DNA hypermethylation and novel disorder insights. Epigenetics. 2019;14:672–684. doi: 10.1080/15592294.2019.1609867. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Fukui H, Runker A, Fabel K, Buchholz F, Kempermann G. Transcription factor Runx1 is pro-neurogenic in adult hippocampal precursor cells. PLoS ONE. 2018;13:e0190789. doi: 10.1371/journal.pone.0190789. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Theriault FM, et al. Role for Runx1 in the proliferation and neuronal differentiation of selected progenitor cells in the mammalian nervous system. J. Neurosci. 2005;25:2050–2061. doi: 10.1523/JNEUROSCI.5108-04.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Jia, T. et al. Epigenome-wide meta-analysis of blood DNA methylation and its association with subcortical volumes: findings from the ENIGMA Epigenetics Working Group. Mol. Psychiatry, 10.1038/s41380-019-0605-z (2019). [DOI] [PMC free article] [PubMed]
- 44.Bakircioglu M, et al. The essential role of centrosomal NDE1 in human cerebral cortex neurogenesis. Am. J. Hum. Genet. 2011;88:523–535. doi: 10.1016/j.ajhg.2011.03.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Ishibashi M, et al. Targeted disruption of mammalian hairy and Enhancer of split homolog-1 (HES-1) leads to up-regulation of neural helix-loop-helix factors, premature neurogenesis, and severe neural tube defects. Genes Dev. 1995;9:3136–3148. doi: 10.1101/gad.9.24.3136. [DOI] [PubMed] [Google Scholar]
- 46.Inoue M, et al. Prdm8 regulates the morphological transition at multipolar phase during neocortical development. PLoS ONE. 2014;9:e86356. doi: 10.1371/journal.pone.0086356. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Webb D, Roberts I, Vyas P. Haematology of Down syndrome. Arch. Dis. Child. Fetal Neonatal Ed. 2007;92:F503–F507. doi: 10.1136/adc.2006.104638. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Nitzan I, et al. Elevated nucleated red blood cells in neonates with Down syndrome and pulmonary hypertension. J. Pediatr. 2019;213:232–234. doi: 10.1016/j.jpeds.2019.05.068. [DOI] [PubMed] [Google Scholar]
- 49.Bozner P. Transient myeloproliferative disorder with erythroid differentiation in Down syndrome. Arch. Pathol. Lab. Med. 2002;126:474–477. doi: 10.5858/2002-126-0474-TMDWED. [DOI] [PubMed] [Google Scholar]
- 50.Brown AL, et al. Inherited genetic susceptibility of acute lymphoblastic leukemia in Down syndrome. Blood. 2019;134:1227–1237. doi: 10.1182/blood.2018890764. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Kim N, Jinks-Robertson S. Transcription as a source of genome instability. Nat. Rev. Genet. 2012;13:204–214. doi: 10.1038/nrg3152. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Azad P, et al. Senp1 drives hypoxia-induced polycythemia via GATA1 and Bcl-xL in subjects with Monge’s disease. J. Exp. Med. 2016;213:2729–2744. doi: 10.1084/jem.20151920. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Debieve F, Moiset A, Thomas K, Pampfer S, Hubinont C. Vascular endothelial growth factor and placenta growth factor concentrations in Down’s syndrome and control pregnancies. Mol. Hum. Reprod. 2001;7:765–770. doi: 10.1093/molehr/7.8.765. [DOI] [PubMed] [Google Scholar]
- 54.Silver, M. J. et al. Independent genomewide screens identify the tumor suppressor VTRNA2-1 as a human epiallele responsive to periconceptional environment. Genome Biol.16, 118 (2015). [DOI] [PMC free article] [PubMed]
- 55.Lu J, et al. Global hypermethylation in fetal cortex of Down syndrome due to DNMT3L overexpression. Hum. Mol. Genet. 2016;25:1714–1727. doi: 10.1093/hmg/ddw043. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Nielsen AB, et al. Increased neonatal level of arginase 2 in cases of childhood acute lymphoblastic leukemia implicates immunosuppression in the etiology. Haematologica. 2019;104:e514–e516. doi: 10.3324/haematol.2019.216465. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Zhou W, Triche TJ, Jr, Laird PW, Shen H. SeSAMe: reducing artifactual detection of DNA methylation by Infinium BeadChips in genomic deletions. Nucleic Acids Res. 2018;46:e123. doi: 10.1093/nar/gky691. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Triche TJ, Jr, Weisenberger DJ, Van Den Berg D, Laird PW, Siegmund KD. Low-level processing of Illumina Infinium DNA Methylation BeadArrays. Nucleic Acids Res. 2013;41:e90. doi: 10.1093/nar/gkt090. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Mah, C. K., Mesirov, J. P. & Chavez, L. An accessible GenePattern notebook for the copy number variation analysis of Illumina Infinium DNA methylation arrays. F1000Res 7, 10.12688/f1000research.16338.1 (2018). [DOI] [PMC free article] [PubMed]
- 60.van der Maaten LJP, Hinton GE. Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 2008;9:2579–2605. [Google Scholar]
- 61.Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 2016;32:2847–2849. doi: 10.1093/bioinformatics/btw313. [DOI] [PubMed] [Google Scholar]
- 62.Hansen, K. D. IlluminaHumanMethylationEPICanno.ilm10b2.hg19: annotation for Illumina’s EPIC methylation arrays. R package version 0.6.0, https://bitbucket.com/kasperdanielhansen/Illumina_EPIC (2016).
- 63.Martin, T. C., Yet, I., Tsai, P. C. & Bell, J. T. coMET: visualisation of regional epigenome-wide association scan results and DNA co-methylation patterns. BMC Bioinform.16, 131 (2015). [DOI] [PMC free article] [PubMed]
- 64.Gervin, K. et al. Systematic evaluation and validation of reference and library selection methods for deconvolution of cord blood DNA methylation data. Clin. Epigenetics11, 125 (2019). [DOI] [PMC free article] [PubMed]
- 65.Koestler, D. C. et al. Improving cell mixture deconvolution by identifying optimal DNA methylation libraries (IDOL). BMC Bioinform.17, 120 (2016). [DOI] [PMC free article] [PubMed]
- 66.Salas, L. A. &. Koestler, D. C. Illumina EPIC data on immunomagnetic sorted peripheral adult blood cells. R package version 1.5.3, https://github.com/immunomethylomics/FlowSorted.Blood.EPIC (2019).
- 67.Bakulski KM, et al. DNA methylation of cord blood cell types: applications for mixed cell birth studies. Epigenetics. 2016;11:354–362. doi: 10.1080/15592294.2016.1161875. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Rahmani E, et al. Sparse PCA corrects for cell type heterogeneity in epigenome-wide association studies. Nat. Methods. 2016;13:443–445. doi: 10.1038/nmeth.3809. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Rahmani E, et al. GLINT: a user-friendly toolset for the analysis of high-throughput DNA-methylation array data. Bioinformatics. 2017;33:1870–1872. doi: 10.1093/bioinformatics/btx059. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Labuhn M, et al. Mechanisms of progression of myeloid preleukemia to transformed myeloid leukemia in children with Down syndrome. Cancer Cell. 2019;36:123–138.e10. doi: 10.1016/j.ccell.2019.06.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Koboldt DC, et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics. 2009;25:2283–2285. doi: 10.1093/bioinformatics/btp373. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Robinson JT, Thorvaldsdottir H, Wenger AM, Zehir A, Mesirov JP. Variant review with the integrative genomics viewer. Cancer Res. 2017;77:e31–e34. doi: 10.1158/0008-5472.CAN-17-0337. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Rahmani, E. et al. Genome-wide methylation data mirror ancestry information. Epigenetics Chromatin10, 1 (2017). [DOI] [PMC free article] [PubMed]
- 74.Ren X, Kuan PF. methylGSA: a Bioconductor package and Shiny app for DNA methylation data length bias adjustment in gene set testing. Bioinformatics. 2019;35:1958–1959. doi: 10.1093/bioinformatics/bty892. [DOI] [PubMed] [Google Scholar]
- 75.Bernstein BE, et al. The NIH Roadmap Epigenomics Mapping Consortium. Nat. Biotechnol. 2010;28:1045–1048. doi: 10.1038/nbt1010-1045. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Hnisz D, et al. Super-enhancers in the control of cell identity and disease. Cell. 2013;155:934–947. doi: 10.1016/j.cell.2013.09.053. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Peters, T. J. et al. De novo identification of differentially methylated regions in the human genome. Epigenetics Chromatin8, 6 (2015). [DOI] [PMC free article] [PubMed]
- 78.Pedersen BS, Schwartz DA, Yang IV, Kechris KJ. Comb-p: software for combining, analyzing, grouping and correcting spatially correlated P-values. Bioinformatics. 2012;28:2986–2988. doi: 10.1093/bioinformatics/bts545. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.Buniello A, et al. The NHGRI-EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019;47:D1005–D1012. doi: 10.1093/nar/gky1120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Tunstall-Pedoe O, et al. Abnormalities in the myeloid progenitor compartment in Down syndrome fetal liver precede acquisition of GATA1 mutations. Blood. 2008;112:4507–4511. doi: 10.1182/blood-2008-04-152967. [DOI] [PubMed] [Google Scholar]
- 81.Picelli S, et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 2014;9:171–181. doi: 10.1038/nprot.2014.006. [DOI] [PubMed] [Google Scholar]
- 82.Dobin A, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol.15, 550 (2014). [DOI] [PMC free article] [PubMed]
- 84.O’Byrne S, et al. Discovery of a CD10-negative B-progenitor in human fetal life identifies unique ontogeny-related developmental programs. Blood. 2019;134:1059–1071. doi: 10.1182/blood.2019001289. [DOI] [PubMed] [Google Scholar]
- 85.Olsen IE, Groveman SA, Lawson ML, Clark RH, Zemel BS. New intrauterine growth curves based on United States data. Pediatrics. 2010;125:e214–e224. doi: 10.1542/peds.2009-0913. [DOI] [PubMed] [Google Scholar]
Data Availability Statement
ENCODE TF-binding site dataset wgEncodeRegTfbsClusteredV3.bed.gz was downloaded from the UCSC Genome Browser (https://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeRegTfbsClustered/). Histone modification data were downloaded for primary HSCs (cell line E035, CD34 primary cells) from the Roadmap Epigenomics Mapping Consortium database (https://egg2.wustl.edu/roadmap/data/byFileType/peaks/). ENCODE DNase I hypersensitive site data are available at http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeRegDnaseClustered/. NHGRI-EBI GWAS Catalog data are available at https://www.ebi.ac.uk/gwas/docs/file-downloads. FLI1 transcript variant data were downloaded from the BLUEPRINT Consortium Blood Atlas (https://blueprint.haem.cam.ac.uk/mRNA/). This study used biospecimens from the California Biobank Program. Any uploading of genomic data (including genome-wide DNA methylation data) and/or sharing of these biospecimens or individual data derived from these biospecimens has been determined to violate the statutory scheme of the California Health and Safety Code Sections 124980(j), 124991(b), (g), (h), and 103850 (a) and (d), which protect the confidential nature of biospecimens and individual data derived from biospecimens. The individual-level data derived from these biospecimens and that support the findings of this study are available from the corresponding author upon request, and with permission from the California Biobank Program. RNA-seq data from DS and non-DS FL CD34+ cells have been deposited at the Gene Expression Omnibus (GEO) with accession code: GSE160637. Source data are provided with this paper.