Abstract
Chromatin is folded into successive layers to organize linear DNA. Genes within the same topologically associating domains (TADs) demonstrate similar expression and histone-modification profiles, and boundaries separating different domains have important roles in reinforcing the stability of these features. Indeed, domain disruptions in human cancers can lead to misregulation of gene expression. However, the frequency of domain disruptions in human cancers remains unclear. Here, as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), which aggregated whole-genome sequencing data from 2,658 cancers across 38 tumor types, we analyzed 288,457 somatic structural variations (SVs) to understand the distributions and effects of SVs across TADs. We found that SVs can lead to the fusion of discrete TADs and that complex rearrangements markedly change chromatin folding maps in cancer genomes. Notably, only 14% of the boundary deletions resulted in a more than twofold change in the expression of nearby genes.
Subject terms: Data mining, Cancer, Gene expression
A pan-cancer genomic analysis reports the effects of structural variations on chromatin domains (TADs). Most TAD disruptions do not result in appreciable changes in expression of nearby genes.
Main
The genome is hierarchically organized inside the nucleus1. Chromosomes are organized into chromosome territories2. Inside chromosome territories, certain regions of the chromatin are attached to the nuclear periphery and form repressive nuclear lamina-associated domains (LADs)3. Recent chromosome conformation studies have revealed that mammalian chromosomes are structured into largely tissue-invariant TADs in which the DNA interactions are more frequent within a given domain than with regions in other domains4,5. TADs are considered to represent functional domains because a given TAD encompasses the regulatory elements for the genes inside the same domain6,7. Therefore, the integrity of the domain structures is important for the proper regulation of genes8–12. The disruption of domain boundaries can result in ectopic interactions between neighboring domains and affect the regulation of nearby genes5,9. Regulatory landscapes are an important part of human malignancies, and studies have shown that the ‘hijacking’ of enhancers can lead to overexpression of oncogenes (for example, the growth factor independent 1 family oncogenes GFI1 and GFI1B in medulloblastoma13) or to activation of the proto-oncogene MECOM through an inversion between TADs in acute myeloid leukemia cells, which facilitates tumor formation14. Several other studies have reported the deregulation of chromatin folding structures in different cancer types11,15,16. Hence, genomic rearrangements can have a significant role in the reshuffling of TAD structures that results in altered gene regulation. Despite these recent examples of SVs that result in altered local enhancer–promoter landscapes, the frequency of such regulatory architecture rearrangements in cancer genomes remains unclear. Similarly, it is unknown whether loci other than those currently reported in the literature are affected by potential changes in regulatory structure.
To address these questions, we comprehensively characterized the effects of different SVs on TADs and gene-expression patterns observed in various tumor types to expand understanding of the link between chromatin folding and genomic rearrangements in cancer genomes.
Results
TAD boundaries are affected by different types of somatic SV in cancer genomes
Previous reports have indicated that TADs are a largely cell-type-invariant feature of genome organization4,17. In this pan-cancer analysis, we sought to generate a common set of boundaries observed in different cell types. We used high-resolution chromosome conformation (Hi-C) datasets from five human cell lines that represent three distinct embryonic germ layers (GM12878 and HUVEC, mesoderm; IMR90, endoderm; HMEC and NHEK, ectoderm)17 to identify TAD boundaries in different cell types (Extended Data Fig. 1a). We called TAD boundaries from 25-kb-binned Hi-C data for each cell type with an insulation score18 approach. For each bin, this method calculates a score (the TAD signal) from the average interactions with nearby loci within a 2-Mb genomic window. Boundaries are determined as regions with local insulation minima along the diagonal of the Hi-C matrix18. This yielded between 3,926 and 4,690 boundaries per cell type. We next investigated whether our TAD boundary calls were consistent with previously reported boundaries and showed the attributes of TAD boundaries. To test this, we compared our calls with available boundary regions for IMR90 cells that were identified using a directionality-based approach (with a bin size of 40 kb)4. Our IMR90 boundary calls overlapped extensively (>84%) with published boundaries (Extended Data Fig. 1b). This showed that the current boundary regions were comparable with previously mapped boundaries even though they were identified at a different Hi-C resolution and using a different detection algorithm. Furthermore, we observed known TAD boundary signatures4 around our boundary calls for each cell type (Extended Data Fig. 1c). Across all cell types, we identified a common set of 2,477 boundaries (Supplementary Table 1, Extended Data Fig. 1d). There was a significant (P < 10−6) overlap (a 50-kb distance was allowed) between TAD boundaries among all profiled cell types.
The median distance between the common boundaries was approximately 750 kb, consistent with the reported median TAD size in human cells4,19 (Extended Data Fig. 1e). The resulting 2,477 common regions were used for the rest of the analyses (referred to as boundaries hereafter).
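The insulation-style boundary call described above can be illustrated with a minimal sketch. It is not the implementation of the cited method18: the function names, the toy contact matrix, the 2-bin window and the strict local-minimum rule are all assumptions made for illustration.

```python
def insulation_scores(matrix, window):
    """Mean contact frequency in the square that links the `window` bins
    upstream of position i to the `window` bins starting at i."""
    n = len(matrix)
    scores = [None] * n  # None where the square does not fit in the matrix
    for i in range(window, n - window + 1):
        total = 0.0
        for a in range(i - window, i):
            for b in range(i, i + window):
                total += matrix[a][b]
        scores[i] = total / (window * window)
    return scores


def local_minima(scores):
    """Indices whose score is strictly lower than both defined neighbours."""
    minima = []
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if left is None or mid is None or right is None:
            continue
        if mid < left and mid < right:
            minima.append(i)
    return minima


# Toy map (hypothetical): two domains, bins 0-4 and bins 5-9.
# Contacts are strong (10) within a domain and weak (1) between domains.
toy = [[10.0 if (i < 5) == (j < 5) else 1.0 for j in range(10)]
       for i in range(10)]

scores = insulation_scores(toy, window=2)
boundaries = local_minima(scores)
print(boundaries)
```

In this toy matrix the only position where the upstream and downstream windows fall entirely in different domains is bin 5, so that is where the score reaches its minimum.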
Next, to test whether the overall chromatin architecture is similar in cancer and non-cancer cells, we intersected these boundaries with the TAD boundaries found in cancer cell lines. We observed a high overlap with boundaries from a leukemia cell line K562 (ref. 17) and a breast cancer cell line MCF7 (ref. 20) (85% and 83.4%, respectively; Extended Data Fig. 1f,g). These analyses revealed that a significant (P < 10−7) percentage of boundaries was conserved between normal and malignant cells. We next examined the enrichment of CCCTC-binding factor (CTCF)-binding and DNase I hypersensitivity sites, as well as active transcription start sites and heterochromatic regions around boundaries from various cell types that have previously been profiled by the Encyclopedia of DNA Elements (ENCODE) consortium19 and the Roadmap Epigenome project21. We observed that CTCF-binding sites and active promoter marks were enriched, whereas the heterochromatin state was depleted at the boundaries. In addition, TAD signal levels were the lowest at the boundaries compared with flanking sites (Fig. 1a), consistent with the role of TAD boundaries in the reduction of the contacts between adjacent domains. Overall, these common 2,477 boundaries exhibited the genomic features of TAD boundaries across different human cell types.
To understand the effects of SVs on TAD boundaries in human cancers, we used 288,457 high-confidence somatic SVs as part of the ICGC PCAWG project. The PCAWG Consortium aggregated whole-genome sequencing (WGS) data from 2,658 cancers across 38 tumor types generated by the ICGC and TCGA projects. These sequencing data were re-analyzed with standardized, high-accuracy pipelines to align to the human genome (reference build hs37d5) and identify germline variants and somatically acquired mutations, as described in the lead paper of the PCAWG Consortium22. We used SV breakpoint orientations as a measurement to classify deletions, inversions, duplications or complex rearrangements as described previously23. Complex rearrangements included chromothripsis24 and other alterations, which covered SV break-ends with concomitant deletions, inversions or duplications. SVs were further categorized into two subgroups based on the length of the events—SVs that were longer than 2 Mb in genomic length (long-range SVs) and shorter than 2 Mb in genomic length (short-range SVs). The majority of deletions, inversions and duplications could be categorized as short-range; however, complex events tended to be longer in length (Extended Data Fig. 2a). In this study, we focused on short-range SVs because long-range SVs could affect multiple boundaries due to the genomic length of the event. We identified SVs that affected the TAD boundaries (boundary affecting (BA)) as the ones that spanned the whole length of a boundary (around 75 kb). As a result, 5.0%, 8.5%, 12.8% and 19.9% of all deletions, inversions, duplications and complex events were called BA events, respectively (Fig. 1b). Compared with the expected number of boundary disruptions based on randomly shuffled boundaries, these ratios are strongly enriched in BA-duplications (P < 10−4, 1.43-fold enrichment). 
In contrast, we observed a depletion (0.87-fold enrichment, P = 0.052) in BA-deletions, whereas BA-inversions and BA-complex events occurred at expected levels (P > 0.05) compared with the shuffled TAD boundaries (Fig. 1c). Overall, these results suggest that deletions tended to occur within the same TAD, whereas duplications tended to span regions across different TADs.
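The boundary-affecting definition and the comparison against shuffled boundaries can be sketched as follows. This is a simplified single-chromosome illustration with hypothetical coordinates; the helper names, the uniform re-placement of boundaries and the number of shuffles are assumptions, not the exact procedure used in the study.

```python
import random

SHORT_RANGE_MAX = 2_000_000  # SVs shorter than 2 Mb are treated as short-range


def is_short_range(start, end):
    return end - start < SHORT_RANGE_MAX


def spans_boundary(sv, boundary):
    """True when the SV covers the whole boundary interval."""
    return sv[0] <= boundary[0] and sv[1] >= boundary[1]


def count_ba(svs, boundaries):
    """Number of short-range SVs that span at least one boundary."""
    return sum(
        1 for sv in svs
        if is_short_range(*sv) and any(spans_boundary(sv, b) for b in boundaries)
    )


def shuffle_boundaries(boundaries, chrom_length, rng):
    """Re-place every boundary at random, preserving its length and the count."""
    shuffled = []
    for start, end in boundaries:
        length = end - start
        new_start = rng.randrange(chrom_length - length)
        shuffled.append((new_start, new_start + length))
    return shuffled


def fold_enrichment(svs, boundaries, chrom_length, n_shuffles=1000, seed=0):
    """Observed BA count divided by the mean count over shuffled boundaries."""
    rng = random.Random(seed)
    observed = count_ba(svs, boundaries)
    null = [count_ba(svs, shuffle_boundaries(boundaries, chrom_length, rng))
            for _ in range(n_shuffles)]
    expected = sum(null) / n_shuffles
    fold = observed / expected if expected > 0 else float("inf")
    return observed, expected, fold


# Toy single-chromosome example (hypothetical coordinates).
boundaries = [(1_000_000, 1_075_000), (5_000_000, 5_075_000)]
svs = [(900_000, 1_200_000),    # spans the first boundary
       (4_900_000, 5_100_000),  # spans the second boundary
       (1_010_000, 1_500_000),  # starts inside the first boundary: not BA
       (500_000, 6_000_000)]    # spans both but is long-range: excluded
observed, expected, fold = fold_enrichment(svs, boundaries, 10_000_000)
```

A fold value above 1 corresponds to enrichment relative to shuffled boundaries and a value below 1 to depletion, in the sense used for the 1.43-fold and 0.87-fold figures above.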
In cancer cells, boundaries are affected to various degrees by structural alterations, which suggests that mechanistic differences could underlie the different SV types. The lengths of the BA-SVs were uniformly distributed (Extended Data Fig. 2b). Most of the BA-SVs targeted a single boundary; 74% of BA-deletions, 65% of BA-inversions, 71% of BA-duplications and 64% of BA-complex events affected a single boundary per variant (Fig. 1d). The number of affected boundaries did not markedly change with the minimum length of the SVs (Fig. 1d, Extended Data Fig. 2c). The majority (98.4%) of the boundaries were affected in cancer genomes, although a few boundaries were located in the low-mappability regions of the genome. Interestingly, TAD boundaries are significantly less likely (P < 0.02) to be affected by known deletion and duplication polymorphisms derived from genomes of healthy human populations25–27 (Extended Data Fig. 2d). Germline alterations tend to be shorter than the somatic alterations observed in tumors, owing to negative selection against large SVs in the germline28. Therefore, we selected germline and somatic deletions with a genomic length between 75 kb and 250 kb that occurred in all cancer samples (Fig. 1e). This filtering ensured that the selected somatic (median, 137 kb) or germline (median, 113 kb) deletions were long enough to disrupt TAD boundaries. We observed that germline deletions that affected TAD boundaries were rare (less than 1%; 6 affected out of a total of 924 deletions) compared with somatic deletions (4.1%), even though the two sets covered similar genomic length ranges, and less than 1% of the total boundaries were affected by germline events. This suggests that germline variations in TAD boundaries may not be as well tolerated as similar somatic alterations.
Chromatin folding disruptions are specific to histological subtypes
We next focused on the distributions of BA-SVs across 38 different histological cancer subtypes22. The number of BA-SVs generally followed the total number of SVs in a given cancer type. Our analysis revealed that, among all cancer types, leiomyosarcoma and uterus adenocarcinoma had higher numbers, with on average 25 and 22 BA-SVs per sample, respectively, compared with a median of around 7 BA-SVs per sample across all cancer samples (Fig. 2a, b). Ovarian, esophageal and breast adenocarcinomas also contained high numbers of BA-SVs, with on average 20, 19 and 18 BA-SVs per sample, respectively. On the other hand, hematopoietic cancers (myeloid-MDS or myeloid-AML) had the lowest BA-SV rates. Only glioblastoma samples (CNS-GBM) showed lower-than-expected BA-SVs (P < 10−3) across all cancer types. The median SV length of a given cancer type was not strongly correlated with the observed distributions (r2 = 0.03–0.45) (Extended Data Fig. 3a). The observed differences in BA-SV rates are likely driven by the differences in the burden and mechanisms of SVs across histological types. For instance, leiomyosarcoma and esophageal adenocarcinoma had a higher complex SV burden and, as a result, the observed BA events were also mostly complex rearrangements (Fig. 2b), whereas ovary and stomach adenocarcinoma samples contained BA-duplications due to an overall higher duplication rate (Fig. 2b). Similarly, the total number of SVs in an individual tumor affects the number of BA-SVs observed in that sample (Fig. 2b, Extended Data Fig. 3b). Long-range BA-SVs had similar distributions across histological types. Again, leiomyosarcoma and breast adenocarcinoma contained a higher number of BA-SVs compared with other cancer types, whereas leukemia samples had no such BA-SVs (Extended Data Fig. 4a). Taken together, our findings show that the impact of BA-SVs varies substantially across tumor types and that these events reflect the overall SV burden and type.
Recurrently affected boundaries in specific cancer types
Next, we sought to identify the affected boundaries near known driver genes in the COSMIC cancer gene census29. We noted that many of the boundaries of cancer driver genes were altered in specific histological subtypes (Fig. 3a, Supplementary Table 2). Of those recurrently affected boundaries, two adjacent boundaries between KIAA1549 and BRAF were prone to BA-duplications specifically in samples of pilocytic astrocytoma (Fig. 3b). This region has previously been implicated in pilocytic astrocytoma, producing an oncogenic fusion between the aforementioned genes30. In addition, boundaries near the MDM2 locus were most affected in leiomyosarcoma (Fig. 3b), likely due to neochromosome formations that included the MDM2 and CDK4 genes31. We also observed a higher mutational load specifically on chromosome 12 in leiomyosarcoma samples (Fig. 3b). Another recurrent BA-SV event was the high number of BA-deletions around RBFOX1 in colorectal adenocarcinoma samples (Extended Data Fig. 4b). We surveyed the BA-SV distributions on individual chromosomes and observed a positive correlation with the number of boundaries (r2 = 0.68–0.92) and gene density (r2 = 0.7–0.85) on a given chromosome (Extended Data Fig. 4c,d). Notably, distributions of BA-SVs per chromosome were generally specific to the histology subtype; for example, chromosome 17 was affected predominantly by BA-complex events in breast and esophageal adenocarcinoma samples (Extended Data Fig. 5a,b). These findings emphasize the cancer specificity of BA-SVs: the active mechanisms that determine the overall SV burden and type in different tumor types yield potential changes in TAD structures, especially around cancer driver genes. We next examined SVs that occurred within TADs, which potentially resulted in the disruption of CTCF–CTCF chromatin loops32. We identified a number of chromatin loops that were potentially disrupted in various cancer types (Supplementary Table 3).
For instance, a CTCF site near FOXC1 overlaps with recurrent deletions in esophageal, gastric and colon adenocarcinomas (Fig. 3c). Other potentially altered loops include a CTCF site near BCL6 in hepatocellular carcinoma and breast adenocarcinoma, and CLCN4 in colorectal adenocarcinomas (Extended Data Fig. 6a,b). Therefore, chromatin folding perturbations in cancer genomes can occur at various scales, including TADs and CTCF–CTCF chromatin loops, and recurrently altered boundaries are generally cancer-type specific.
Most domain disruptions do not result in marked gene-expression changes
To ascribe potential functional effects of BA-SVs on chromatin domains, we annotated the TADs by profiling the context of aggregate chromatin states within each TAD. We used a probabilistic approach that calculated the occurrence of chromatin states in cell types recorded in the Roadmap Epigenome data. Coverage of 15 chromatin state enrichments in each domain was calculated and normalized to the length of the domain. The obtained matrix was grouped using the k-means clustering approach and five distinct groups of TADs were identified similar to a previous classification of chromatin domains17,19,33. These groups comprised heterochromatin (61), low/quiescent (705), repressed (481), low-active (764) and active (365) domains (Fig. 4a, Supplementary Table 4). In addition, we used constitutive LADs34 identified in three different human cell types to profile the outcomes of the SVs that occurred between LADs and inter-LADs. We evaluated the annotation results by profiling the distributions of domain sizes. Repressed domains were larger in size and covered the majority of the genome compared with active domains, in agreement with previous TAD annotations19,35 (Extended Data Fig. 7a,b). The median expression of genes within each domain was calculated for 2,921 cancer-free samples from 45 different tissues (GTEx consortium)36 as well as for samples from 998 patients with cancer from ICGC expression datasets. Analysis of expression levels confirmed that genes within repressed domains or LADs had significantly lower expression patterns than genes within active domains or inter-LADs (P < 2.2 × 10−16) (Fig. 4b, Extended Data Fig. 7c). Furthermore, distributions of replication timing for various cell types and open/closed chromatin compartment calls from TCGA data37 corroborated the data of the annotated domains (Extended Data Fig. 7d,e). 
Utilizing our domain annotations, we checked the distributions of flanking domains for BA-deletion, BA-inversion, BA-duplication or BA-complex events. The majority of the BA-SVs affected the same flanking domain types, such as boundaries that separated low and low domains or low-active and low-active domains (Extended Data Fig. 8a). However, BA-SVs between different domain types occurred significantly more frequently than the expected rate, which suggests that BA-SVs have a potential role in gene-expression changes (Extended Data Fig. 8a). Therefore, we compared expression values of the genes that reside on each side of the SVs.
We initially focused on BA-deletions between repressed and active domains, as previous studies showed that fused repressed–active domains could lead to an upregulation of nearby genes38,39. Indeed, genes located on the repressed side of deletions were significantly upregulated (P < 0.001, Supplementary Table 5) in samples with deletions compared with the rest of the samples in the same histological subtype (Fig. 4c), whereas the same effect was not observed for BA-deletions between repressed–repressed or active–active domains (Extended Data Fig. 8b). For example, a BA-deletion in a malignant lymphoma sample was associated with a 37-fold increase in the expression level of WNT4 compared with the rest of the samples from patients with lymphoma (Fig. 4d). Similarly, a BA-deletion in the genome of a patient with breast adenocarcinoma correlated with 26-fold overexpression of SLC22A2 compared with the rest of the patients with breast cancer (Fig. 4e). However, this correlation of gene expression with BA-deletions between active and repressed domains was not universal. The fold change in expression of SLC2A10 was 1.10 in a uterus adenocarcinoma sample with a BA-deletion compared with the rest of the uterus tumor samples (Fig. 4f). Therefore, not every BA-deletion correlated with a marked change in gene expression; in fact, only 25% of BA-deletions between repressed and active domains coincided with twofold changes in gene expression (Supplementary Table 5). To include a larger number of events, we next extended our analysis to all BA-deletions that occurred between different domain types. We classified domains as ‘more’ or ‘less’ transcriptionally active based on the annotations of domains (the ordering of domain types is described in Fig. 4a). This analysis resulted in a non-significant (P > 0.05) difference between genes that were located on more or less transcriptionally active domains after BA-deletions (Fig. 4g), and 14% of all BA-deletions coincided with a twofold change (Supplementary Table 5). We observed a similar non-significant difference for BA-duplications and BA-complex events (Extended Data Fig. 8c, Supplementary Tables 6,7).
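The comparison of a sample carrying a BA-deletion with the remaining samples of the same histological subtype can be sketched as below. The text does not specify how the reference level or the fold change was computed, so the use of the median of the other samples, the pseudocount and the symmetric twofold threshold are assumptions made for illustration, and the numbers are invented.

```python
from statistics import median


def fold_change(sample_expr, other_exprs, pseudocount=1.0):
    """Expression in the SV-carrying sample relative to the median of the
    other samples in the same subtype (pseudocount avoids division by zero)."""
    reference = median(other_exprs)
    return (sample_expr + pseudocount) / (reference + pseudocount)


def fraction_twofold(fold_changes, threshold=2.0):
    """Share of events whose fold change is at least `threshold` up or down."""
    hits = [fc for fc in fold_changes
            if fc >= threshold or fc <= 1.0 / threshold]
    return len(hits) / len(fold_changes)


# Hypothetical values: one gene in one affected sample versus three others.
fc = fold_change(39.0, [1.0, 0.0, 2.0])
share = fraction_twofold([fc, 1.1, 0.4, 1.5])
print(fc, share)
```

Summarizing events this way gives a single fraction per SV class that can be compared with the 25% and 14% figures reported above.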
Next, we compared the events between LADs and inter-LADs to assess whether alterations in the lamin organization could contribute to gene expression in tumor samples. We observed that deletions occurred significantly more often within LADs and duplications within inter-LADs, whereas SVs were less likely to occur between LADs and inter-LADs (Extended Data Fig. 8d). We noticed certain correlations between gene expression and events between LADs and inter-LADs; for example, a complex rearrangement in a melanoma sample coincided with a sevenfold upregulation of TRIM42 (which resides in a LAD) compared with the rest of the patients with melanoma (Fig. 4h). Overall, however, we did not observe a significant change for deletion, duplication and complex events between LADs and inter-LADs (Extended Data Fig. 8e, Supplementary Tables 8–10). These observations suggest that gene regulation in cancer genomes is multifactorial: although disruptions in chromatin folding domains may contribute to expression levels in certain cases, such disruptions do not always coincide with expression changes.
Cell-type-specific alterations in chromatin folding patterns by different SV types
Next, to evaluate whether BA-SVs indeed altered chromatin folding patterns, we generated high-resolution Hi-C data for four cancer cell lines (SW480 and SNU-C1 for colorectal adenocarcinoma, HCC1954 for breast adenocarcinoma and OE33 for esophageal adenocarcinoma), which were previously profiled by WGS. For the majority of the BA-SVs detected by the WGS data (>90%), we were able to observe a change in the folding pattern in Hi-C contact maps of the respective cell line (Extended Data Fig. 9a). Break-ends of BA-SVs exhibited a strong contact frequency (14.6-fold) in cancer cells compared with non-cancerous cells (Extended Data Fig. 9b). The shortest BA-event with a detectable change in our Hi-C maps was a 460-kb long duplication in SW480 cells (Extended Data Fig. 9c). By contrast, we observed several discrepancies between SVs detected in WGS data and Hi-C maps. These SV break-ends tended to be located in repetitive regions of the genome or overlapped with inter-chromosomal translocations (Extended Data Fig. 9a,c). Our results demonstrate that BA-SVs detected using WGS data generally result in altered chromatin folding patterns in cancer cells.
We subsequently studied how BA-deletions, BA-inversions, BA-duplications and BA-complex rearrangements change the contact maps and noticed distinct interaction patterns in chromatin contact maps for different BA-SVs (Fig. 5a, Extended Data Fig. 9d–f). This observation of specific changes in Hi-C maps due to different SV types is consistent with findings from a recent study40. Furthermore, it has also been suggested that SVs could lead to TAD fusions40 (also referred to as neo-TADs3,4); we therefore analyzed whether the BA-SVs observed in our cancer cell lines exhibited similar neo-TAD formation. We grouped bins on the basis of their location relative to the SV breakpoints and the nearest TAD boundary. If bins were between the SV breakpoints and the nearest TAD boundary, we classified these interactions as intra-TAD/SV, and if bins were not constrained by the nearest boundary, we classified these interactions as inter-TAD/SV (Fig. 5b). Our analysis revealed that intra-TAD/SV interactions were stronger than the inter-TAD/SV interactions when controlling for genomic distance effects, which suggests that the SVs can lead to cross-boundary interactions and potentially the formation of new chromatin folding domains based on the location of existing nearby TAD boundaries (Fig. 5b). For instance, an inversion in OE33 cells that encompassed ERBB2 formed a neo-TAD on chromosome 17 (Fig. 5c), and a duplication in HCC1954 cells on chromosome 4 (Fig. 5c) and a duplication near KRAS in SW480 cells (Extended Data Fig. 9g) each resulted in a TAD-like configuration between two previously separate TADs (Fig. 5c). These new TAD-like patterns could only be observed in cell lines that had the SV, suggesting that these folding patterns were the result of a specific alteration (Extended Data Fig. 10a).
In all of these events, we observed that new interactions spanned the nearest boundary and formed ‘triangular shapes’ that were consistent with the TAD patterns observed in non-rearranged genomes. Therefore, BA-SVs have the potential to form new TAD structures in cancer cells that could reconfigure cis-regulatory interactions.
Complex rearrangements markedly change chromatin folding maps in the cancer genomes
We noticed that complex rearrangements in which deletion, inversion or duplication break-ends overlapped resulted in marked changes in Hi-C maps. SNU-C1 cells contain a complex rearrangement (chromothripsis) across the entire chromosome 15, which was reported by WGS and spectral karyotyping41. This chromosome has 239 rearrangements in the SNU-C1 cells, and we observed marked changes only in SNU-C1 Hi-C maps, in which the differences in folding patterns overlapped with the identified SV break-ends (Fig. 6a, Extended Data Fig. 10b). Similarly, we noticed a chromothripsis-like event that covered chromosome 21 of HCC1954 cells in WGS data, and the Hi-C map of chromosome 21 in HCC1954 cells showed correspondingly considerable changes (Fig. 6b). In addition to the complex rearrangements that covered whole chromosomes, we noticed regional complex rearrangements that had abnormal chromatin folding patterns. For example, the MYC locus in SW480 cells contains 135 rearrangements in a 4-Mb genomic window (Fig. 6c), whereas a larger complex event was observed in HCC1954 cells around a similar locus, which also involved two other cancer driver genes, TERT and APC, on chromosome 5 (Fig. 6d). We could detect the changes in biological Hi-C replicates, suggesting that these BA-SV effects are reproducible (Extended Data Fig. 10c). Given that complex rearrangements are the most frequent genomic alterations observed in the cancer genomes (Fig. 1b), studying the causes and consequences of these events using chromatin conformation-based datasets will be critical for understanding their contribution to the formation of cancer.
Discussion
We explored the distributions of somatic SVs in a variety of tumor types and their potential roles in the disruption of chromatin folding and gene regulation. We found that certain boundaries are affected in a cancer-specific manner, which was likely due to the distribution of cancer-specific driver genes. Additionally, we observed that the disruptions differed between SV types; deletions tended to occur within TADs and LADs, whereas duplications tended to span TADs and generally occurred within inter-LAD regions. These results suggest that mechanistic differences may underlie the generation of different types of SV. For example, genome organization may influence partner selection during genomic rearrangements, as suggested by the varying degrees to which different SV types are distributed across the genome. Disruption of folding domains could result in aberrant interactions between flanking domains and potentially contribute to the re-shaping of gene expression around the affected regions. Notably, we did not observe a strong association between the disruption of TADs and global changes in gene expression, and only 14% of overall cases resulted in upregulation of more than twofold, which is consistent with the findings of recent studies42,43. This pattern may be reminiscent of that seen for mutations, in which a subset of chromatin-scale events may be more likely to have functional effects (drivers) against a backdrop of considerable passenger events. Although we compared expression patterns of tumors in this study, cancer genomes may have other alterations that could affect the observed gene expression patterns, including copy-number alterations and dysregulation of transcription factors, chromatin regulators or cis-regulatory elements44.
Therefore, the availability of histology-specific matched control samples coupled with WGS and chromatin organization datasets will augment our understanding of the functions of SV in genome folding and transcriptional dysregulation in cancers and contribute to our ability to discern signal from noise in appropriate contexts.
Methods
Hi-C data analysis
Chromatin conformation assay (Hi-C) data for the GM12878, HUVEC, IMR90, HMEC, NHEK and K562 cell lines were downloaded from GEO (GSE63525). Intra-chromosomal 25-kb-resolution raw observed, MAPQGE30-filtered values were normalized by dividing each value by the product of the Knight and Ruiz normalization scores of the two contacting loci. We calculated the TAD signal by moving a window along the diagonal of the Hi-C matrix: for each bin, we summed its interactions with flanking regions up to 2 Mb away and took the log2 ratio of the value observed for that bin to the mean of the interaction values within the given 2-Mb window. To identify TAD boundaries, we used an approach that is based on insulation score calculation18, and called TAD boundaries for each chromosome of each cell line with the following parameters: ‘-is 1000000 -ids 200000 -im mean -bmoe 1 -nt 0.1 --v’.
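The normalization step described above can be sketched as follows. The function name and the toy values are hypothetical, and returning NaN for bins with a missing or zero normalization score is an assumption about how such bins would be handled.

```python
import math


def kr_normalize(raw, kr):
    """Divide each raw contact by the product of the Knight and Ruiz
    normalization scores of its two loci.

    `raw` is a square list-of-lists of observed counts and `kr` holds one
    score per bin. Bins with a missing (NaN) or zero score yield NaN.
    """
    n = len(raw)
    normalized = [[math.nan] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = kr[i] * kr[j]
            if denom and not math.isnan(denom):
                normalized[i][j] = raw[i][j] / denom
    return normalized


# Toy 2-bin example with invented counts and scores.
raw = [[8.0, 4.0],
       [4.0, 2.0]]
kr = [2.0, 1.0]
normalized = kr_normalize(raw, kr)
print(normalized)
```

In this toy case the bias vector fully explains the differences between entries, so every normalized contact comes out equal.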
To calculate the significance of overlap between different TAD boundary calls, we converted the boundary regions into binary bins across the genome, for example to compare previously published IMR90 TAD boundaries4 with our IMR90 boundary calls. We performed a logical AND operation, in which a region is counted as an overlapping boundary between two datasets only if the bins for the same genomic location are 1 in both datasets. We used bootstrapping to determine the distribution of random overlap numbers between two calls, and calculated P values based on the observed number and the distribution obtained from the shuffled boundaries. Shuffled boundaries were generated by randomly assigning boundaries while keeping the number of boundaries per chromosome constant. The shuffled boundaries were also converted to binary strings and the same logical AND operation was applied. Shuffling was performed 10,000 times for a given boundary set. This procedure was used throughout the rest of our study to generate shuffled boundaries. Next, we computed the cumulative distribution of expected overlaps and calculated z-scores based on the observed number and the distribution obtained from bootstrapping. A two-tailed unpaired Student’s t-test was used to calculate P values.
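The binary-bin overlap and shuffling procedure can be sketched for a single chromosome as below. The helper names and toy boundary positions are hypothetical; only the z-score step is shown, and the population standard deviation is used as an assumption.

```python
import random
from statistics import mean, pstdev


def to_bins(boundary_bins, n_bins):
    """Binary vector with 1 at every bin that holds a boundary."""
    marked = set(boundary_bins)
    return [1 if i in marked else 0 for i in range(n_bins)]


def overlap(a, b):
    """Logical AND: number of bins that are 1 in both vectors."""
    return sum(x & y for x, y in zip(a, b))


def shuffled_overlaps(query_bins, reference_bins, n_bins, n_shuffles, seed=0):
    """Overlaps obtained after randomly re-placing the query boundaries
    while keeping their number on the chromosome constant."""
    rng = random.Random(seed)
    reference = to_bins(reference_bins, n_bins)
    k = len(set(query_bins))
    return [
        overlap(to_bins(rng.sample(range(n_bins), k), n_bins), reference)
        for _ in range(n_shuffles)
    ]


def z_score(observed, null):
    """Distance of the observed overlap from the shuffled mean, in SD units."""
    sd = pstdev(null)
    return (observed - mean(null)) / sd if sd else float("inf")


# Toy chromosome of 100 bins with two identical boundary sets.
calls_a = list(range(0, 100, 10))
calls_b = list(range(0, 100, 10))
observed = overlap(to_bins(calls_a, 100), to_bins(calls_b, 100))
null = shuffled_overlaps(calls_a, calls_b, 100, n_shuffles=200)
z = z_score(observed, null)
```

For genome-wide use, the same shuffle would be repeated chromosome by chromosome so that the number of boundaries per chromosome stays constant, as described above.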
Common TAD boundaries were identified for boundaries of all five cell-types (GM12878, HUVEC, IMR90, HMEC and NHEK) that occurred within two Hi-C bins or 50 kb in genomic range. The same bootstrapping method (described above) was applied to calculate the significance of the overlap between common boundaries with TAD boundaries from the cancer cell lines K562 and MCF7.
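A minimal sketch of the common-boundary selection follows. Anchoring on the first cell type and representing each boundary by a single coordinate on one chromosome are simplifying assumptions, and the positions are invented.

```python
def common_boundaries(calls_by_cell_type, tolerance=50_000):
    """Boundaries of the first cell type that have a call within
    `tolerance` bp (two 25-kb bins) in every other cell type."""
    names = list(calls_by_cell_type)
    anchor, others = names[0], names[1:]
    shared = []
    for pos in calls_by_cell_type[anchor]:
        if all(
            any(abs(pos - other) <= tolerance
                for other in calls_by_cell_type[name])
            for name in others
        ):
            shared.append(pos)
    return shared


# Hypothetical boundary positions on one chromosome for three cell types.
calls = {
    "GM12878": [1_000_000, 3_000_000, 6_000_000],
    "IMR90":   [1_025_000, 3_100_000, 6_000_000],
    "HUVEC":   [975_000, 3_000_000, 6_050_000],
}
shared = common_boundaries(calls)
print(shared)
```

The boundary at 3 Mb is dropped because the nearest IMR90 call lies 100 kb away, outside the 50-kb tolerance.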
To cluster individual TADs (defined as genomic regions between two adjacent common boundaries) based on epigenetic modifications, we used a comprehensive epigenome-profiling dataset from various human cell types. To this end, we used an entropy-based approach (epilogos) to calculate the occurrence of each chromatin state enrichment for a given genomic region across all cell types profiled by the Roadmap Epigenome Consortia (http://compbio.mit.edu/epilogos/). We calculated the fraction of each TAD covered by each chromatin state (the genomic space covered by the state divided by the length of the TAD) and generated a normalized matrix in which columns are TADs and rows are chromatin states, which have been extensively studied by the Roadmap Epigenome Consortia21. We applied hierarchical clustering to rows to identify similar chromatin states and k-means clustering to columns to group TADs that contain similar epigenetic modifications. We performed k-means clustering with k = 2–8 clusters and chose k = 5 because previous chromatin studies17,19 have used 5 distinct epigenetically modified chromosomal domains and k = 5 gave the most visually discernible domains. To determine how our TAD clustering correlates with gene expression in cancer-free and cancerous tissues, we downloaded normalized gene expression values for 2,663 different cancer-free samples from the GTEx Portal36 (v.1.6) and used normalized gene-expression values for ICGC cancer samples. We plotted the median expression of the genes located in each domain type in GTEx and ICGC samples. Differences between the expression of heterochromatin or repressed domains and that of active domains were tested with a one-tailed Mann–Whitney U-test. We also calculated the total number of genomic regions covered by each domain type. Finally, open and closed chromatin compartments (at a 100-kb resolution) in cancer samples were identified using DNA methylation levels as described previously37.
We determined the percentage of our domain calls covered with open and closed chromatin calls from available cancer types.
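The chromatin-state coverage matrix used for TAD clustering can be sketched in plain Python as follows. The sketch assumes non-overlapping chromatin-state segments given as (chromosome, start, end, state) tuples. The function names and the input layout are illustrative assumptions, and the clustering itself is left to any standard hierarchical or k-means implementation.

```python
def state_fractions(tad, segments, states):
    """Fraction of a TAD (chrom, start, end) covered by each chromatin state.

    segments: iterable of (chrom, start, end, state), assumed non-overlapping.
    """
    chrom, t_start, t_end = tad
    length = t_end - t_start
    covered = {state: 0 for state in states}
    for s_chrom, s_start, s_end, state in segments:
        if s_chrom != chrom or state not in covered:
            continue
        # Length of the intersection between the segment and the TAD.
        overlap = min(t_end, s_end) - max(t_start, s_start)
        if overlap > 0:
            covered[state] += overlap
    return [covered[state] / length for state in states]


def coverage_matrix(tads, segments, states):
    """Rows are chromatin states and columns are TADs, as in the text."""
    columns = [state_fractions(tad, segments, states) for tad in tads]
    return [list(row) for row in zip(*columns)]
```

Each column of the returned matrix is the feature vector of one TAD, so the columns are what a k-means routine would group.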
We used HiCPlotter45 to plot Hi-C data with different features, TAD boundaries or gene-expression fold changes after deletion between repressed and active domains.
ENCODE and Roadmap data
ENCODE replication timing data were downloaded from the UCSC Genome Browser ENCODE portal for the following cell types: BJ, GM06990, GM12801, GM12812, GM12813, GM12878, HeLa-S3, HepG2, HUVEC, IMR-90, K-562, MCF-7, NHEK and SK-N-SH. Wavelet-smoothed replication timing values were binned into 25-kb windows across the genome to discretize the data. The values in each bin were averaged across all cell types, and these averages were used as the average replication timing throughout the study.
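The binning and cross-cell-type averaging can be sketched as below. The input is assumed to be a list of (chromosome, position, value) records per cell type; this layout and both function names are assumptions made for illustration.

```python
from collections import defaultdict

BIN_SIZE = 25_000  # 25-kb windows, as stated in the text


def bin_signal(records, bin_size=BIN_SIZE):
    """Average a signal into fixed windows. records: (chrom, position, value)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for chrom, pos, value in records:
        key = (chrom, pos // bin_size)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def average_across_cell_types(binned_by_cell):
    """Mean per bin over the cell types that have a value for that bin."""
    sums, counts = defaultdict(float), defaultdict(int)
    for binned in binned_by_cell.values():
        for key, value in binned.items():
            sums[key] += value
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Bins that are missing in some cell types are averaged over the cell types that do report them, which is one possible convention among several.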
We downloaded CTCF binding sites and DNase I hypersensitivity for five cell types (GM12878, HUVEC, IMR90, HMEC, NHEK) from the UCSC Genome Browser ENCODE portal. In addition, H3K9me3 and input DNA ChIP–seq alignment files (.bam) for each cell type were also downloaded. We randomly selected the same number of alignment reads for H3K9me3 and input DNA from .bam files and calculated log2-transformed enrichment levels of H3K9me3 over input DNA.
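A minimal sketch of the read-matched enrichment calculation is given below. It assumes that reads are reduced to (chromosome, position) pairs, that enrichment is computed per fixed window, and that a pseudocount guards against empty windows. The window size, pseudocount and seed are hypothetical parameters that the text does not specify.

```python
import math
import random
from collections import Counter


def log2_enrichment(chip_reads, input_reads, bin_size=25_000, pseudocount=1.0, seed=0):
    """Log2 ratio of ChIP over input after sampling equal numbers of reads.

    chip_reads / input_reads: lists of (chrom, position) for aligned reads.
    bin_size, pseudocount and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = min(len(chip_reads), len(input_reads))
    chip = Counter((c, p // bin_size) for c, p in rng.sample(chip_reads, n))
    ctrl = Counter((c, p // bin_size) for c, p in rng.sample(input_reads, n))
    return {
        key: math.log2((chip[key] + pseudocount) / (ctrl[key] + pseudocount))
        for key in set(chip) | set(ctrl)
    }
```

Sampling both libraries to the size of the smaller one removes the sequencing-depth difference before the ratio is taken.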
We downloaded all available CTCF peak-calling results and DNase I hypersensitivity regions from the UCSC Genome Browser ENCODE portal from 80 and 115 different cell lines, respectively (Supplementary Table 11). Occurrences of CTCF-binding and DNase I hypersensitivity sites per 25-kb window across the genome were calculated for all downloaded cell types and used to calculate TAD boundary and shuffled boundary enrichments.
Structural alterations
Somatic and germline variant calls, mutational signatures, subclonal reconstructions, transcript abundance, splice calls and other core data generated by the ICGC/TCGA PCAWG Consortium are described by the lead paper22 of the PCAWG Consortium and available for download at https://dcc.icgc.org/releases/PCAWG. Additional information on accessing the data, including raw read files, can be found at https://docs.icgc.org/pcawg/data/. In accordance with the data access policies of the ICGC and TCGA projects, most molecular, clinical and specimen data are in an open tier that does not require access approval. To access potentially identifying information, such as germline alleles and underlying sequencing data, researchers will need to apply to the TCGA Data Access Committee (DAC) via dbGaP (https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login) for access to the TCGA portion of the dataset, and to the ICGC Data Access Compliance Office (DACO; http://icgc.org/daco) for the ICGC portion. In addition, to access somatic SNVs derived from TCGA donors, researchers will need to obtain dbGaP authorization.
We obtained the consensus SV calls and annotations of each variation (deletions, inversions, duplications and complex rearrangements), which can be found at Synapse (https://www.synapse.org/) with accession number syn7596712. The SV classification algorithm is comprehensively defined in another study23. The code for the classification algorithm is available on GitHub (https://github.com/cancerit/ClusterSV/). In brief, this algorithm clusters individual SV junctions into SV events that may involve multiple junctions. Single-junction events were interpreted as the ‘basic’ SV types (deletion, tandem duplication, translocation and inversion). However, in many cases events involving multiple SV junctions were detected. SV events that involved many SV junctions could not be classified into any simple SV type and were therefore classified as complex. We specifically focused on the events that occurred within a chromosome in this study; we therefore did not use the translocation event calls between different chromosomes. To understand the effects of SVs, we first grouped the deletions, inversions or duplications on the basis of the length of the SVs.
Short-range SVs were identified as events with a length of less than 2 Mb and we mainly focused on these events in this study. BA-SVs were identified as SVs that spanned the whole length of a TAD boundary; the rest of the SVs were classified as ‘within TAD’ in Fig. 1b. To determine the distribution of random BA-SV events, we used the same bootstrapping method mentioned above: we generated random boundary events 10,000 times and calculated random BA-SV event distributions. The z-scores and P values were calculated on the basis of the observed number and the distribution obtained from bootstrapping. In this study, we analyzed each event separately for deletion, duplication and inversion calls, although these events might occur concurrently in a given sample.
Long-range SVs were identified as events with a length of more than 2 Mb, and we mention the results obtained with long-range SVs in the main text, as appropriate.
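The length split and the boundary test described above can be sketched in Python. SVs and boundaries are assumed to be (chromosome, start, end) tuples, and an SV is labeled a BA-SV when it fully contains at least one boundary interval. The function names are hypothetical.

```python
def is_boundary_affecting(sv, boundaries):
    """True if the SV spans the whole length of at least one TAD boundary.

    sv: (chrom, start, end); boundaries: list of (chrom, start, end).
    """
    chrom, start, end = sv
    return any(
        b_chrom == chrom and start <= b_start and b_end <= end
        for b_chrom, b_start, b_end in boundaries
    )


def classify_short_range(svs, boundaries, max_length=2_000_000):
    """Label short-range SVs (< 2 Mb) as 'BA-SV' or 'within TAD'.

    Longer events are skipped here because the text treats them separately.
    """
    labels = {}
    for sv in svs:
        _, start, end = sv
        if end - start < max_length:
            labels[sv] = "BA-SV" if is_boundary_affecting(sv, boundaries) else "within TAD"
    return labels
```

An SV with a break-end inside a boundary does not count as a BA-SV under this reading, because it does not span the whole boundary.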
To understand the germline BA-SV occurrences, we downloaded structural alteration calls from three different studies: deletion events (total of 8,941) from WGS data of the 1000 Genomes project26; deletions (total of 7,511) and duplications (total of 7,501) from WGS data from 236 individuals representing 125 human populations27; and deletion (total of 11,530) and duplication (total of 1,170) events from a comprehensive review of 23 different studies including 2,647 different individuals25. We noticed that the number of BA-SVs among germline deletions and duplications was low and that these events occurred less often than expected by chance, as estimated using a bootstrapping method.
We next profiled short-range SVs and BA-SVs for each of the cancer studies in our ICGC dataset. To calculate the average number of SVs or BA-SVs per sample for each of the cancer studies, we divided the sum of all observed short-range SVs or BA-SVs in a given cancer type by the total number of samples in that cancer study. Observed SVs and BA-SVs across cancer studies were plotted as stacked bar charts representing deletions, inversions and duplications.
To identify the recurrently affected boundaries in each cancer study, we generated a matrix in which each column represented a sample in the cancer study and rows represented the TAD boundaries. A binary score was assigned to each row (a TAD boundary) that indicated whether that boundary was affected by BA-SV(s) in a given sample. Boundaries that were affected in more than 10% of the samples in a cancer study are reported as recurrently affected boundaries in Supplementary Table 2. The median length of SVs per cancer type was calculated for all observed short-range SVs in each cancer type and plotted with the standard deviation of lengths. Constitutive insulated neighborhoods were obtained from Supplementary Table 8 of a previous study15, and SVs that affected only one anchor (CTCF-binding site) of an insulated neighborhood were considered to be loop-disrupting SVs.
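The binary boundary-by-sample matrix and the 10% recurrence threshold can be sketched as follows. The input is assumed to be a mapping from each sample to the set of boundary identifiers hit by a BA-SV; the identifiers and the function name are illustrative.

```python
def recurrent_boundaries(affected_by_sample, boundaries, min_fraction=0.10):
    """Return boundaries affected in more than min_fraction of samples.

    affected_by_sample: {sample_id: set of boundary ids hit by a BA-SV}.
    """
    samples = sorted(affected_by_sample)
    # Binary matrix: one row per boundary, one column per sample.
    matrix = {
        boundary: [int(boundary in affected_by_sample[s]) for s in samples]
        for boundary in boundaries
    }
    return [b for b, row in matrix.items() if sum(row) / len(samples) > min_fraction]
```

The comparison is strict, so a boundary hit in exactly 10% of the samples is not reported, which matches the 'more than 10%' wording.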
We determined the flanking domain annotations of BA-SVs by identifying the type of the nearest domain for the break-ends of each BA-SV. This analysis resulted in a half-matrix that contained the observed frequencies of pair-wise flanking domain types. We plotted the observed values for BA-SV deletions, inversions, duplications or complex rearrangements separately. To understand the genomic distribution of domain neighborhoods, we counted the flanking domains of each TAD boundary.
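The half-matrix of flanking domain pairs amounts to counting unordered pairs. A minimal sketch, assuming that each BA-SV has already been annotated with the domain types at its two break-ends, is shown below; the function name and the example domain labels are hypothetical.

```python
from collections import Counter


def flanking_pair_counts(flanks):
    """Count unordered pairs of flanking domain types.

    flanks: iterable of (domain_type_at_one_end, domain_type_at_other_end).
    Sorting each pair folds (a, b) and (b, a) into one cell of the half-matrix.
    """
    return Counter(tuple(sorted(pair)) for pair in flanks)
```

For example, an SV flanked by ('repressed', 'active') and another flanked by ('active', 'repressed') both contribute to the same ('active', 'repressed') entry.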
To profile SVs between nuclear LADs and inter-LADs, we obtained HMM state calls from three different human cell types for constitutive LADs and constitutive inter-LADs34 from GSE22428. As a filter, we used LAD calls from an independent study3. Genomic coordinates were converted to the hg19 assembly with the UCSC liftover tool. To calculate the significance of the observed overlaps between different SV types and constitutive LADs and constitutive inter-LADs, we used the same bootstrapping method, in which the break-ends of each SV type were randomly shuffled on the same chromosome 10,000 times and z-scores were calculated between observed and expected values.
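A shuffle-based z-score of this kind can be sketched for a single chromosome as below. The sketch assumes that each SV keeps its length when it is moved to a random position and that overlap is counted per SV. These choices, the function names and the fixed seed are assumptions rather than details from the study.

```python
import math
import random
import statistics


def count_overlapping(svs, regions):
    """Number of SVs (start, end) that overlap any region (start, end)."""
    return sum(
        any(s < r_end and r_start < e for r_start, r_end in regions) for s, e in svs
    )


def shuffle_zscore(svs, regions, chrom_length, n_iter=10_000, seed=0):
    """z-score of the observed overlap count against randomly placed SVs."""
    rng = random.Random(seed)
    observed = count_overlapping(svs, regions)
    null = []
    for _ in range(n_iter):
        shuffled = []
        for s, e in svs:
            length = e - s
            new_start = rng.randint(0, chrom_length - length)
            shuffled.append((new_start, new_start + length))
        null.append(count_overlapping(shuffled, regions))
    mean, sd = statistics.mean(null), statistics.pstdev(null)
    if sd == 0:
        # Degenerate null distribution: report 0 or a signed infinity.
        return 0.0 if observed == mean else math.copysign(math.inf, observed - mean)
    return (observed - mean) / sd
```

A positive z-score indicates more overlap with the regions than expected from random placement, and a negative one indicates depletion.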
We identified the nearest genes to the break-ends of BA-SVs as the nearest RefSeq genes that did not overlap with the break-ends. The RefSeq gene table was downloaded from the UCSC Genome Browser in May 2016. For each BA-SV, we called genes located upstream of the 5′ end of the SV ‘upstream genes’ and genes located downstream of the 3′ end of the SV ‘downstream genes’. Fold changes in expression for each of the upstream and downstream genes were calculated by dividing the observed normalized RPKM value in the particular sample with the BA-SV by the average normalized RPKM value in the remaining samples of the same cancer study. We filtered out genes with low expression values (<0.1 FPKM), as fold changes for those genes would appear high even for small fluctuations. Copy-number variations could be another confounding factor for observed gene-expression fold changes. Therefore, we obtained consensus copy-number calls for the ICGC cohort based on consensus SV results. We removed cases in which the copy number was more than four for either the upstream or the downstream gene. In addition, we removed genes that were more than 1 Mb from the break-ends. Expression differences between genes that flanked different BA-SV break-ends were tested using one-tailed Mann–Whitney U-tests.
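The fold-change calculation and its filters can be sketched as follows. The sketch assumes that the low-expression filter is applied to the average of the remaining samples, which is one possible reading of the text. The function names and argument layout are hypothetical.

```python
def expression_fold_change(gene_values, sample_id, min_expression=0.1):
    """Expression in one sample divided by the mean of all other samples.

    gene_values: {sample_id: normalized expression} for one gene in one study.
    Returns None when the background is missing or below min_expression
    (assumption: the filter is applied to the background average).
    """
    others = [v for s, v in gene_values.items() if s != sample_id]
    if not others:
        return None
    background = sum(others) / len(others)
    if background < min_expression:
        return None
    return gene_values[sample_id] / background


def passes_filters(copy_number, distance_to_breakend, max_copy=4, max_distance=1_000_000):
    """Copy-number and distance filters with the thresholds stated in the text."""
    return copy_number <= max_copy and distance_to_breakend <= max_distance
```

Under these assumptions a gene with values of 8, 2 and 2 across three samples has a fold change of 4 in the first sample.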
We used pyvcf (https://pyvcf.readthedocs.org) to load .vcf files and pybedtools46 to perform genomic-interval analyses.
Cancer cell lines
The colon cancer cell lines (SW480, SNU-C1) and breast adenocarcinoma cancer cell line (HCC1954) were obtained from the American Type Culture Collection and the esophageal adenocarcinoma (OE33) cell line was obtained from Sigma-Aldrich. Stocks were stored in liquid nitrogen. These cell lines were authenticated by comparing SV results from previous WGS datasets from the same cancer lines.
WGS data analysis of cancer cell lines
We obtained the WGS datasets of the SW480, SNU-C1 and OE33 cell lines from previous publications41,47,48. To identify consensus SVs for the SW480 and OE33 cell lines, we ran the DELLY49, Lumpy50 and BRASS51 algorithms. SV break-ends reported by two different callers were included in this analysis. For the SNU-C1 cell line, SV calls were obtained from Supplementary Table 2 of a previous publication41 and genomic coordinates were converted to the hg19 assembly using the UCSC liftover tool. HCC1954 whole-genome data were previously analyzed by the ICGC Structural Variation subgroup and we used the consensus structural alterations for this cell line.
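Requiring support from two callers can be sketched as below. The text does not state how break-ends from different callers were matched, so the 200-bp tolerance, the data layout and the function name are purely hypothetical.

```python
def supported_by_two_callers(calls_by_caller, tolerance=200):
    """Keep SV calls whose break-ends are reported by at least two callers.

    calls_by_caller: {caller: list of (chrom, start, end)}.
    tolerance (bp) is a hypothetical matching window, not a value from the text.
    """
    def same_event(a, b):
        return (
            a[0] == b[0]
            and abs(a[1] - b[1]) <= tolerance
            and abs(a[2] - b[2]) <= tolerance
        )

    consensus = []
    for caller, calls in calls_by_caller.items():
        for call in calls:
            supported = any(
                same_event(call, other)
                for other_caller, other_calls in calls_by_caller.items()
                if other_caller != caller
                for other in other_calls
            )
            already_kept = any(same_event(call, kept) for kept in consensus)
            if supported and not already_kept:
                consensus.append(call)
    return consensus
```

The first caller's coordinates are retained for each consensus event, which keeps the output deterministic for a given input order.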
Cancer cell line Hi-C assay and analysis
Hi-C was performed using the in situ Hi-C protocol as previously described17 using 2–5 million cells per experiment that were digested with the MboI restriction enzyme and analyzed in duplicate. Hi-C libraries were sequenced on a NextSeq 500 or a HiSeq 4000. Reads were aligned to the hg19 reference genome using BWA-MEM52 and PCR duplicates were removed using Picard. Hi-C interaction matrices were generated using in-house pipelines, and matrices were normalized using the iterative correction method53. ATAC-seq data for the OE33 cell line were obtained from a previous study54 and H3K27ac ChIP–seq datasets for the HCC1954 and SW480 cell lines were obtained from Hon et al.55 and Rahnamoun et al.56, respectively.
To investigate the potential function of SVs in TAD fusions, we classified the interactions on the basis of the nearest TAD boundary. For each SV, the average interaction frequency was calculated within a 2-Mb region of the SV. This average frequency ratio was used to ‘scale’ the interactions to account for ploidy. This was done by taking the average interaction frequency over that region and dividing it by the genome-wide average (controlling for the distance between loci) over a window of identical size. Certain WGS-defined SVs do not appear to have a signal in the Hi-C data, possibly due to false-positive SV calls, and we excluded regions for which the scaling factor was less than 0.1 to remove potential false-positive calls. In addition, we truncated the default 2-Mb window if there was another SV to avoid biases introduced by complex variants.
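The scaling factor described above can be sketched as a distance-matched ratio. Contacts in the SV window are assumed to be stored per bin pair and the genome-wide average is assumed to be available per bin distance. Both layouts and the function names are illustrative assumptions.

```python
def scaling_factor(window_contacts, expected_by_distance):
    """Average contact frequency in the window over the distance-matched average.

    window_contacts: {(bin_i, bin_j): normalized contact frequency}.
    expected_by_distance: {abs(bin_i - bin_j): genome-wide average frequency}.
    """
    n = len(window_contacts)
    observed = sum(window_contacts.values()) / n
    expected = sum(expected_by_distance[abs(i - j)] for i, j in window_contacts) / n
    return observed / expected


def keep_sv(window_contacts, expected_by_distance, min_scale=0.1):
    """Exclude SVs whose scaling factor falls below the 0.1 cut-off in the text."""
    return scaling_factor(window_contacts, expected_by_distance) >= min_scale
```

Matching the expected value to the same set of bin distances is what controls for the distance between loci in this sketch.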
Reporting Summary
Further information on research design is available in the Life Sciences Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-019-0564-y.
Acknowledgements
We thank the patients and their families for contributing to this study, S. Dent, Z. Coban Akdemir, E. Z. Keung, T. Gutschner, D. Spring, J. Korbel and J. Stuart for reading the manuscript, F. Scott, S. Amin, S. Seth, F. Barthel, T. Mang, X. Song and J. Zhang for discussions, all ICGC subgroup participants for generating readily accessible mutation calls and uniformly analyzed gene-expression datasets. This work was supported by a Cancer Prevention Research Institute of Texas award (R1205), the Welch Foundation’s Robert A. Welch Distinguished Chair Award (G-0040 to P.A.F.) and the Emerson Collective Cancer Research Fund (to K.C.A.). J.R.D. is supported by an NIH Director’s Early Independence Award (DP5OD023071). We acknowledge the contributions of the many clinical networks across ICGC and TCGA who provided samples and data to the PCAWG Consortium, and the contributions of the Technical Working Group and the Germline Working Group of the PCAWG Consortium for collation, realignment and harmonized variant calling of the cancer genomes used in this study. We thank the patients and their families for their participation in the individual ICGC and TCGA projects.
Author contributions
K.C.A. and P.A.F. designed the study. K.C.A. and J.R.D. performed the computational analysis. V.T.L., S.C. and J.R.D. performed the Hi-C experiments on SW480, SNU-C1, HCC1954 and OE33 cell lines. All authors discussed the results and commented on the manuscript. R.B. and P.J.C. were working group or project co-leaders.
Data availability
Aligned sequencing data, as well as somatic and germline variant calls from PCAWG tumors, including SNVs, indels, copy number alterations and SVs, are available for download at https://dcc.icgc.org/releases/PCAWG. Additional information on accessing the data, including raw read files, can be found at https://docs.icgc.org/pcawg/data/. In accordance with the data-access policies of the ICGC and TCGA projects, most molecular, clinical and specimen data are in an open tier that does not require access approval. To access potentially identifying information, such as germline alleles and underlying sequencing data, researchers will need to apply to the TCGA Data Access Committee (DAC) via dbGaP (https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login) for access to the TCGA portion of the dataset, and to the ICGC Data Access Compliance Office (DACO; http://icgc.org/daco) for the ICGC portion. In addition, to access somatic SNVs derived from TCGA donors, researchers will also need to obtain dbGaP authorization.
We obtained the consensus SV calls and annotations of each variation (deletions, inversions, duplications and complex rearrangements), which can be found at Synapse (https://www.synapse.org/) with accession number syn7596712.
Hi-C data have been deposited at GEO under accession code GSE116694.
Code availability
The core computational pipelines used by the PCAWG Consortium for alignment, quality control and variant calling are available to the public at https://dockstore.org/search?search=pcawg under a GNU General Public License v.3.0, which allows for reuse and distribution.
Competing interests
R.B. owns equity in Ampressa Therapeutics, is the chair of the scientific advisory board of and consultant for OrigiMed, has received research funding from Bayer and Ono Pharma, and receives patent royalties from LabCorp. All other authors have no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A list of authors and their affiliations appears at the end of the paper.
A list of authors and their affiliations appears online.
Change history
3/21/2023
A Correction to this paper has been published: https://doi.org/10.1038/s41588-023-01318-w
Contributor Information
P. Andrew Futreal, Email: afutreal@mdanderson.org.
PCAWG Structural Variation Working Group:
Kadir C. Akdemir, Eva G. Alvarez, Adrian Baez-Ortega, Paul C. Boutros, David D. L. Bowtell, Benedikt Brors, Kathleen H. Burns, Peter J. Campbell, Kin Chan, Ken Chen, Isidro Cortés-Ciriano, Ana Dueso-Barroso, Andrew J. Dunford, Paul A. Edwards, Xavier Estivill, Dariush Etemadmoghadam, Lars Feuerbach, J. Lynn Fink, Milana Frenkel-Morgenstern, Dale W. Garsed, Mark Gerstein, Dmitry A. Gordenin, David Haan, James E. Haber, Julian M. Hess, Barbara Hutter, Marcin Imielinski, David T. W. Jones, Young Seok Ju, Marat D. Kazanov, Leszek J. Klimczak, Youngil Koh, Jan O. Korbel, Kiran Kumar, Eunjung Alice Lee, Jake June-Koo Lee, Yilong Li, Andy G. Lynch, Geoff Macintyre, Florian Markowetz, Iñigo Martincorena, Alexander Martinez-Fundichely, Matthew Meyerson, Satoru Miyano, Hidewaki Nakagawa, Fabio C. P. Navarro, Stephan Ossowski, Peter J. Park, John V. Pearson, Montserrat Puiggròs, Karsten Rippe, Nicola D. Roberts, Steven A. Roberts, Bernardo Rodriguez-Martin, Steven E. Schumacher, Ralph Scully, Mark Shackleton, Nikos Sidiropoulos, Lina Sieverling, Chip Stewart, David Torrents, Jose M. C. Tubio, Izar Villasante, Nicola Waddell, Jeremiah A. Wala, Joachim Weischenfeldt, Lixing Yang, Xiaotong Yao, Sung-Soo Yoon, Jorge Zamora, and Cheng-Zhong Zhang
PCAWG Consortium:
Lauri A. Aaltonen, Federico Abascal, Adam Abeshouse, Hiroyuki Aburatani, David J. Adams, Nishant Agrawal, Keun Soo Ahn, Sung-Min Ahn, Hiroshi Aikata, Rehan Akbani, Kadir C. Akdemir, Hikmat Al-Ahmadie, Sultan T. Al-Sedairy, Fatima Al-Shahrour, Malik Alawi, Monique Albert, Kenneth Aldape, Ludmil B. Alexandrov, Adrian Ally, Kathryn Alsop, Eva G. Alvarez, Fernanda Amary, Samirkumar B. Amin, Brice Aminou, Ole Ammerpohl, Matthew J. Anderson, Yeng Ang, Davide Antonello, Pavana Anur, Samuel Aparicio, Elizabeth L. Appelbaum, Yasuhito Arai, Axel Aretz, Koji Arihiro, Shun-ichi Ariizumi, Joshua Armenia, Laurent Arnould, Sylvia Asa, Yassen Assenov, Gurnit Atwal, Sietse Aukema, J. Todd Auman, Miriam R. R. Aure, Philip Awadalla, Marta Aymerich, Gary D. Bader, Adrian Baez-Ortega, Matthew H. Bailey, Peter J. Bailey, Miruna Balasundaram, Saianand Balu, Pratiti Bandopadhayay, Rosamonde E. Banks, Stefano Barbi, Andrew P. Barbour, Jonathan Barenboim, Jill Barnholtz-Sloan, Hugh Barr, Elisabet Barrera, John Bartlett, Javier Bartolome, Claudio Bassi, Oliver F. Bathe, Daniel Baumhoer, Prashant Bavi, Stephen B. Baylin, Wojciech Bazant, Duncan Beardsmore, Timothy A. Beck, Sam Behjati, Andreas Behren, Beifang Niu, Cindy Bell, Sergi Beltran, Christopher Benz, Andrew Berchuck, Anke K. Bergmann, Erik N. Bergstrom, Benjamin P. Berman, Daniel M. Berney, Stephan H. Bernhart, Rameen Beroukhim, Mario Berrios, Samantha Bersani, Johanna Bertl, Miguel Betancourt, Vinayak Bhandari, Shriram G. Bhosle, Andrew V. Biankin, Matthias Bieg, Darell Bigner, Hans Binder, Ewan Birney, Michael Birrer, Nidhan K. Biswas, Bodil Bjerkehagen, Tom Bodenheimer, Lori Boice, Giada Bonizzato, Johann S. De Bono, Arnoud Boot, Moiz S. Bootwalla, Ake Borg, Arndt Borkhardt, Keith A. Boroevich, Ivan Borozan, Christoph Borst, Marcus Bosenberg, Mattia Bosio, Jacqueline Boultwood, Guillaume Bourque, Paul C. Boutros, G. Steven Bova, David T. Bowen, Reanne Bowlby, David D. L. 
Bowtell, Sandrine Boyault, Rich Boyce, Jeffrey Boyd, Alvis Brazma, Paul Brennan, Daniel S. Brewer, Arie B. Brinkman, Robert G. Bristow, Russell R. Broaddus, Jane E. Brock, Malcolm Brock, Annegien Broeks, Angela N. Brooks, Denise Brooks, Benedikt Brors, Søren Brunak, Timothy J. C. Bruxner, Alicia L. Bruzos, Alex Buchanan, Ivo Buchhalter, Christiane Buchholz, Susan Bullman, Hazel Burke, Birgit Burkhardt, Kathleen H. Burns, John Busanovich, Carlos D. Bustamante, Adam P. Butler, Atul J. Butte, Niall J. Byrne, Anne-Lise Børresen-Dale, Samantha J. Caesar-Johnson, Andy Cafferkey, Declan Cahill, Claudia Calabrese, Carlos Caldas, Fabien Calvo, Niedzica Camacho, Peter J. Campbell, Elias Campo, Cinzia Cantù, Shaolong Cao, Thomas E. Carey, Joana Carlevaro-Fita, Rebecca Carlsen, Ivana Cataldo, Mario Cazzola, Jonathan Cebon, Robert Cerfolio, Dianne E. Chadwick, Dimple Chakravarty, Don Chalmers, Calvin Wing Yiu Chan, Kin Chan, Michelle Chan-Seng-Yue, Vishal S. Chandan, David K. Chang, Stephen J. Chanock, Lorraine A. Chantrill, Aurélien Chateigner, Nilanjan Chatterjee, Kazuaki Chayama, Hsiao-Wei Chen, Jieming Chen, Ken Chen, Yiwen Chen, Zhaohong Chen, Andrew D. Cherniack, Jeremy Chien, Yoke-Eng Chiew, Suet-Feung Chin, Juok Cho, Sunghoon Cho, Jung Kyoon Choi, Wan Choi, Christine Chomienne, Zechen Chong, Su Pin Choo, Angela Chou, Angelika N. Christ, Elizabeth L. Christie, Eric Chuah, Carrie Cibulskis, Kristian Cibulskis, Sara Cingarlini, Peter Clapham, Alexander Claviez, Sean Cleary, Nicole Cloonan, Marek Cmero, Colin C. Collins, Ashton A. Connor, Susanna L. Cooke, Colin S. Cooper, Leslie Cope, Vincenzo Corbo, Matthew G. Cordes, Stephen M. Cordner, Isidro Cortés-Ciriano, Kyle Covington, Prue A. Cowin, Brian Craft, David Craft, Chad J. Creighton, Yupeng Cun, Erin Curley, Ioana Cutcutache, Karolina Czajka, Bogdan Czerniak, Rebecca A. Dagg, Ludmila Danilova, Maria Vittoria Davi, Natalie R. Davidson, Helen Davies, Ian J. Davis, Brandi N. Davis-Dusenbery, Kevin J. Dawson, Francisco M. 
De La Vega, Ricardo De Paoli-Iseppi, Timothy Defreitas, Angelo P. Dei Tos, Olivier Delaneau, John A. Demchok, Jonas Demeulemeester, German M. Demidov, Deniz Demircioğlu, Nening M. Dennis, Robert E. Denroche, Stefan C. Dentro, Nikita Desai, Vikram Deshpande, Amit G. Deshwar, Christine Desmedt, Jordi Deu-Pons, Noreen Dhalla, Neesha C. Dhani, Priyanka Dhingra, Rajiv Dhir, Anthony DiBiase, Klev Diamanti, Li Ding, Shuai Ding, Huy Q. Dinh, Luc Dirix, HarshaVardhan Doddapaneni, Nilgun Donmez, Michelle T. Dow, Ronny Drapkin, Oliver Drechsel, Ruben M. Drews, Serge Serge, Tim Dudderidge, Ana Dueso-Barroso, Andrew J. Dunford, Michael Dunn, Lewis Jonathan Dursi, Fraser R. Duthie, Ken Dutton-Regester, Jenna Eagles, Douglas F. Easton, Stuart Edmonds, Paul A. Edwards, Sandra E. Edwards, Rosalind A. Eeles, Anna Ehinger, Juergen Eils, Roland Eils, Adel El-Naggar, Matthew Eldridge, Kyle Ellrott, Serap Erkek, Georgia Escaramis, Shadrielle M. G. Espiritu, Xavier Estivill, Dariush Etemadmoghadam, Jorunn E. Eyfjord, Bishoy M. Faltas, Daiming Fan, Yu Fan, William C. Faquin, Claudiu Farcas, Matteo Fassan, Aquila Fatima, Francesco Favero, Nodirjon Fayzullaev, Ina Felau, Sian Fereday, Martin L. Ferguson, Vincent Ferretti, Lars Feuerbach, Matthew A. Field, J. Lynn Fink, Gaetano Finocchiaro, Cyril Fisher, Matthew W. Fittall, Anna Fitzgerald, Rebecca C. Fitzgerald, Adrienne M. Flanagan, Neil E. Fleshner, Paul Flicek, John A. Foekens, Kwun M. Fong, Nuno A. Fonseca, Christopher S. Foster, Natalie S. Fox, Michael Fraser, Scott Frazer, Milana Frenkel-Morgenstern, William Friedman, Joan Frigola, Catrina C. Fronick, Akihiro Fujimoto, Masashi Fujita, Masashi Fukayama, Lucinda A. Fulton, Robert S. Fulton, Mayuko Furuta, P. Andrew Futreal, Anja Füllgrabe, Stacey B. Gabriel, Steven Gallinger, Carlo Gambacorti-Passerini, Jianjiong Gao, Shengjie Gao, Levi Garraway, Øystein Garred, Erik Garrison, Dale W. Garsed, Nils Gehlenborg, Josep L. L. Gelpi, Joshy George, Daniela S. 
Gerhard, Clarissa Gerhauser, Jeffrey E. Gershenwald, Mark Gerstein, Moritz Gerstung, Gad Getz, Mohammed Ghori, Ronald Ghossein, Nasra H. Giama, Richard A. Gibbs, Bob Gibson, Anthony J. Gill, Pelvender Gill, Dilip D. Giri, Dominik Glodzik, Vincent J. Gnanapragasam, Maria Elisabeth Goebler, Mary J. Goldman, Carmen Gomez, Santiago Gonzalez, Abel Gonzalez-Perez, Dmitry A. Gordenin, James Gossage, Kunihito Gotoh, Ramaswamy Govindan, Dorthe Grabau, Janet S. Graham, Robert C. Grant, Anthony R. Green, Eric Green, Liliana Greger, Nicola Grehan, Sonia Grimaldi, Sean M. Grimmond, Robert L. Grossman, Adam Grundhoff, Gunes Gundem, Qianyun Guo, Manaswi Gupta, Shailja Gupta, Ivo G. Gut, Marta Gut, Jonathan Göke, Gavin Ha, Andrea Haake, David Haan, Siegfried Haas, Kerstin Haase, James E. Haber, Nina Habermann, Faraz Hach, Syed Haider, Natsuko Hama, Freddie C. Hamdy, Anne Hamilton, Mark P. Hamilton, Leng Han, George B. Hanna, Martin Hansmann, Nicholas J. Haradhvala, Olivier Harismendy, Ivon Harliwong, Arif O. Harmanci, Eoghan Harrington, Takanori Hasegawa, David Haussler, Steve Hawkins, Shinya Hayami, Shuto Hayashi, D. Neil Hayes, Stephen J. Hayes, Nicholas K. Hayward, Steven Hazell, Yao He, Allison P. Heath, Simon C. Heath, David Hedley, Apurva M. Hegde, David I. Heiman, Michael C. Heinold, Zachary Heins, Lawrence E. Heisler, Eva Hellstrom-Lindberg, Mohamed Helmy, Seong Gu Heo, Austin J. Hepperla, José María Heredia-Genestar, Carl Herrmann, Peter Hersey, Julian M. Hess, Holmfridur Hilmarsdottir, Jonathan Hinton, Satoshi Hirano, Nobuyoshi Hiraoka, Katherine A. Hoadley, Asger Hobolth, Ermin Hodzic, Jessica I. Hoell, Steve Hoffmann, Oliver Hofmann, Andrea Holbrook, Aliaksei Z. Holik, Michael A. Hollingsworth, Oliver Holmes, Robert A. Holt, Chen Hong, Eun Pyo Hong, Jongwhi H. Hong, Gerrit K. Hooijer, Henrik Hornshøj, Fumie Hosoda, Yong Hou, Volker Hovestadt, William Howat, Alan P. Hoyle, Ralph H. 
Hruban, Jianhong Hu, Taobo Hu, Xing Hua, Kuan-lin Huang, Mei Huang, Mi Ni Huang, Vincent Huang, Yi Huang, Wolfgang Huber, Thomas J. Hudson, Michael Hummel, Jillian A. Hung, David Huntsman, Ted R. Hupp, Jason Huse, Matthew R. Huska, Barbara Hutter, Carolyn M. Hutter, Daniel Hübschmann, Christine A. Iacobuzio-Donahue, Charles David Imbusch, Marcin Imielinski, Seiya Imoto, William B. Isaacs, Keren Isaev, Shumpei Ishikawa, Murat Iskar, S. M. Ashiqul Islam, Michael Ittmann, Sinisa Ivkovic, Jose M. G. Izarzugaza, Jocelyne Jacquemier, Valerie Jakrot, Nigel B. Jamieson, Gun Ho Jang, Se Jin Jang, Joy C. Jayaseelan, Reyka Jayasinghe, Stuart R. Jefferys, Karine Jegalian, Jennifer L. Jennings, Seung-Hyup Jeon, Lara Jerman, Yuan Ji, Wei Jiao, Peter A. Johansson, Amber L. Johns, Jeremy Johns, Rory Johnson, Todd A. Johnson, Clemency Jolly, Yann Joly, Jon G. Jonasson, Corbin D. Jones, David R. Jones, David T. W. Jones, Nic Jones, Steven J. M. Jones, Jos Jonkers, Young Seok Ju, Hartmut Juhl, Jongsun Jung, Malene Juul, Randi Istrup Juul, Sissel Juul, Natalie Jäger, Rolf Kabbe, Andre Kahles, Abdullah Kahraman, Vera B. Kaiser, Hojabr Kakavand, Sangeetha Kalimuthu, Christof von Kalle, Koo Jeong Kang, Katalin Karaszi, Beth Karlan, Rosa Karlić, Dennis Karsch, Katayoon Kasaian, Karin S. Kassahn, Hitoshi Katai, Mamoru Kato, Hiroto Katoh, Yoshiiku Kawakami, Jonathan D. Kay, Stephen H. Kazakoff, Marat D. Kazanov, Maria Keays, Electron Kebebew, Richard F. Kefford, Manolis Kellis, James G. Kench, Catherine J. Kennedy, Jules N. A. Kerssemakers, David Khoo, Vincent Khoo, Narong Khuntikeo, Ekta Khurana, Helena Kilpinen, Hark Kyun Kim, Hyung-Lae Kim, Hyung-Yong Kim, Hyunghwan Kim, Jaegil Kim, Jihoon Kim, Jong K. Kim, Youngwook Kim, Tari A. King, Wolfram Klapper, Kortine Kleinheinz, Leszek J. Klimczak, Stian Knappskog, Michael Kneba, Bartha M. Knoppers, Youngil Koh, Jan Komorowski, Daisuke Komura, Mitsuhiro Komura, Gu Kong, Marcel Kool, Jan O. 
Korbel, Viktoriya Korchina, Andrey Korshunov, Michael Koscher, Roelof Koster, Zsofia Kote-Jarai, Antonios Koures, Milena Kovacevic, Barbara Kremeyer, Helene Kretzmer, Markus Kreuz, Savitri Krishnamurthy, Dieter Kube, Kiran Kumar, Pardeep Kumar, Sushant Kumar, Yogesh Kumar, Ritika Kundra, Kirsten Kübler, Ralf Küppers, Jesper Lagergren, Phillip H. Lai, Peter W. Laird, Sunil R. Lakhani, Christopher M. Lalansingh, Emilie Lalonde, Fabien C. Lamaze, Adam Lambert, Eric Lander, Pablo Landgraf, Luca Landoni, Anita Langerød, Andrés Lanzós, Denis Larsimont, Erik Larsson, Mark Lathrop, Loretta M. S. Lau, Chris Lawerenz, Rita T. Lawlor, Michael S. Lawrence, Alexander J. Lazar, Ana Mijalkovic Lazic, Xuan Le, Darlene Lee, Donghoon Lee, Eunjung Alice Lee, Hee Jin Lee, Jake June-Koo Lee, Jeong-Yeon Lee, Juhee Lee, Ming Ta Michael Lee, Henry Lee-Six, Kjong-Van Lehmann, Hans Lehrach, Dido Lenze, Conrad R. Leonard, Daniel A. Leongamornlert, Ignaty Leshchiner, Louis Letourneau, Ivica Letunic, Douglas A. Levine, Lora Lewis, Tim Ley, Chang Li, Constance H. Li, Haiyan Irene Li, Jun Li, Lin Li, Shantao Li, Siliang Li, Xiaobo Li, Xiaotong Li, Xinyue Li, Yilong Li, Han Liang, Sheng-Ben Liang, Peter Lichter, Pei Lin, Ziao Lin, W. M. Linehan, Ole Christian Lingjærde, Dongbing Liu, Eric Minwei Liu, Fei-Fei Fei Liu, Fenglin Liu, Jia Liu, Xingmin Liu, Julie Livingstone, Dimitri Livitz, Naomi Livni, Lucas Lochovsky, Markus Loeffler, Georgina V. Long, Armando Lopez-Guillermo, Shaoke Lou, David N. Louis, Laurence B. Lovat, Yiling Lu, Yong-Jie Lu, Youyong Lu, Claudio Luchini, Ilinca Lungu, Xuemei Luo, Hayley J. Luxton, Andy G. Lynch, Lisa Lype, Cristina López, Carlos López-Otín, Eric Z. Ma, Yussanne Ma, Gaetan MacGrogan, Shona MacRae, Geoff Macintyre, Tobias Madsen, Kazuhiro Maejima, Andrea Mafficini, Dennis T. Maglinte, Arindam Maitra, Partha P. Majumder, Luca Malcovati, Salem Malikic, Giuseppe Malleo, Graham J. Mann, Luisa Mantovani-Löffler, Kathleen Marchal, Giovanni Marchegiani, Elaine R. 
Mardis, Adam A. Margolin, Maximillian G. Marin, Florian Markowetz, Julia Markowski, Jeffrey Marks, Tomas Marques-Bonet, Marco A. Marra, Luke Marsden, John W. M. Martens, Sancha Martin, Jose I. Martin-Subero, Iñigo Martincorena, Alexander Martinez-Fundichely, Yosef E. Maruvka, R. Jay Mashl, Charlie E. Massie, Thomas J. Matthew, Lucy Matthews, Erik Mayer, Simon Mayes, Michael Mayo, Faridah Mbabaali, Karen McCune, Ultan McDermott, Patrick D. McGillivray, Michael D. McLellan, John D. McPherson, John R. McPherson, Treasa A. McPherson, Samuel R. Meier, Alice Meng, Shaowu Meng, Andrew Menzies, Neil D. Merrett, Sue Merson, Matthew Meyerson, William Meyerson, Piotr A. Mieczkowski, George L. Mihaiescu, Sanja Mijalkovic, Tom Mikkelsen, Michele Milella, Linda Mileshkin, Christopher A. Miller, David K. Miller, Jessica K. Miller, Gordon B. Mills, Ana Milovanovic, Sarah Minner, Marco Miotto, Gisela Mir Arnau, Lisa Mirabello, Chris Mitchell, Thomas J. Mitchell, Satoru Miyano, Naoki Miyoshi, Shinichi Mizuno, Fruzsina Molnár-Gábor, Malcolm J. Moore, Richard A. Moore, Sandro Morganella, Quaid D. Morris, Carl Morrison, Lisle E. Mose, Catherine D. Moser, Ferran Muiños, Loris Mularoni, Andrew J. Mungall, Karen Mungall, Elizabeth A. Musgrove, Ville Mustonen, David Mutch, Francesc Muyas, Donna M. Muzny, Alfonso Muñoz, Jerome Myers, Ola Myklebost, Peter Möller, Genta Nagae, Adnan M. Nagrial, Hardeep K. Nahal-Bose, Hitoshi Nakagama, Hidewaki Nakagawa, Hiromi Nakamura, Toru Nakamura, Kaoru Nakano, Tannistha Nandi, Jyoti Nangalia, Mia Nastic, Arcadi Navarro, Fabio C. P. Navarro, David E. Neal, Gerd Nettekoven, Felicity Newell, Steven J. Newhouse, Yulia Newton, Alvin Wei Tian Ng, Anthony Ng, Jonathan Nicholson, David Nicol, Yongzhan Nie, G. Petur Nielsen, Morten Muhlig Nielsen, Serena Nik-Zainal, Michael S. Noble, Katia Nones, Paul A. Northcott, Faiyaz Notta, Brian D. O’Connor, Peter O’Donnell, Maria O’Donovan, Sarah O’Meara, Brian Patrick O’Neill, J. 
Robert O’Neill, David Ocana, Angelica Ochoa, Layla Oesper, Christopher Ogden, Hideki Ohdan, Kazuhiro Ohi, Lucila Ohno-Machado, Karin A. Oien, Akinyemi I. Ojesina, Hidenori Ojima, Takuji Okusaka, Larsson Omberg, Choon Kiat Ong, Stephan Ossowski, German Ott, B. F. Francis Ouellette, Christine P’ng, Marta Paczkowska, Salvatore Paiella, Chawalit Pairojkul, Marina Pajic, Qiang Pan-Hammarström, Elli Papaemmanuil, Irene Papatheodorou, Nagarajan Paramasivam, Ji Wan Park, Joong-Won Park, Keunchil Park, Kiejung Park, Peter J. Park, Joel S. Parker, Simon L. Parsons, Harvey Pass, Danielle Pasternack, Alessandro Pastore, Ann-Marie Patch, Iris Pauporté, Antonio Pea, John V. Pearson, Chandra Sekhar Pedamallu, Jakob Skou Pedersen, Paolo Pederzoli, Martin Peifer, Nathan A. Pennell, Charles M. Perou, Marc D. Perry, Gloria M. Petersen, Myron Peto, Nicholas Petrelli, Robert Petryszak, Stefan M. Pfister, Mark Phillips, Oriol Pich, Hilda A. Pickett, Todd D. Pihl, Nischalan Pillay, Sarah Pinder, Mark Pinese, Andreia V. Pinho, Esa Pitkänen, Xavier Pivot, Elena Piñeiro-Yáñez, Laura Planko, Christoph Plass, Paz Polak, Tirso Pons, Irinel Popescu, Olga Potapova, Aparna Prasad, Shaun R. Preston, Manuel Prinz, Antonia L. Pritchard, Stephenie D. Prokopec, Elena Provenzano, Xose S. Puente, Sonia Puig, Montserrat Puiggròs, Sergio Pulido-Tamayo, Gulietta M. Pupo, Colin A. Purdie, Michael C. Quinn, Raquel Rabionet, Janet S. Rader, Bernhard Radlwimmer, Petar Radovic, Benjamin Raeder, Keiran M. Raine, Manasa Ramakrishna, Kamna Ramakrishnan, Suresh Ramalingam, Benjamin J. Raphael, W. Kimryn Rathmell, Tobias Rausch, Guido Reifenberger, Jüri Reimand, Jorge Reis-Filho, Victor Reuter, Iker Reyes-Salazar, Matthew A. Reyna, Sheila M. Reynolds, Esther Rheinbay, Yasser Riazalhosseini, Andrea L. Richardson, Julia Richter, Matthew Ringel, Markus Ringnér, Yasushi Rino, Karsten Rippe, Jeffrey Roach, Lewis R. Roberts, Nicola D. Roberts, Steven A. Roberts, A. Gordon Robertson, Alan J. 
Robertson, Javier Bartolomé Rodriguez, Bernardo Rodriguez-Martin, F. Germán Rodríguez-González, Michael H. A. Roehrl, Marius Rohde, Hirofumi Rokutan, Gilles Romieu, Ilse Rooman, Tom Roques, Daniel Rosebrock, Mara Rosenberg, Philip C. Rosenstiel, Andreas Rosenwald, Edward W. Rowe, Romina Royo, Steven G. Rozen, Yulia Rubanova, Mark A. Rubin, Carlota Rubio-Perez, Vasilisa A. Rudneva, Borislav C. Rusev, Andrea Ruzzenente, Gunnar Rätsch, Radhakrishnan Sabarinathan, Veronica Y. Sabelnykova, Sara Sadeghi, S. Cenk Sahinalp, Natalie Saini, Mihoko Saito-Adachi, Gordon Saksena, Adriana Salcedo, Roberto Salgado, Leonidas Salichos, Richard Sallari, Charles Saller, Roberto Salvia, Michelle Sam, Jaswinder S. Samra, Francisco Sanchez-Vega, Chris Sander, Grant Sanders, Rajiv Sarin, Iman Sarrafi, Aya Sasaki-Oku, Torill Sauer, Guido Sauter, Robyn P. M. Saw, Maria Scardoni, Christopher J. Scarlett, Aldo Scarpa, Ghislaine Scelo, Dirk Schadendorf, Jacqueline E. Schein, Markus B. Schilhabel, Matthias Schlesner, Thorsten Schlomm, Heather K. Schmidt, Sarah-Jane Schramm, Stefan Schreiber, Nikolaus Schultz, Steven E. Schumacher, Roland F. Schwarz, Richard A. Scolyer, David Scott, Ralph Scully, Raja Seethala, Ayellet V. Segre, Iris Selander, Colin A. Semple, Yasin Senbabaoglu, Subhajit Sengupta, Elisabetta Sereni, Stefano Serra, Dennis C. Sgroi, Mark Shackleton, Nimish C. Shah, Sagedeh Shahabi, Catherine A. Shang, Ping Shang, Ofer Shapira, Troy Shelton, Ciyue Shen, Hui Shen, Rebecca Shepherd, Ruian Shi, Yan Shi, Yu-Jia Shiah, Tatsuhiro Shibata, Juliann Shih, Eigo Shimizu, Kiyo Shimizu, Seung Jun Shin, Yuichi Shiraishi, Tal Shmaya, Ilya Shmulevich, Solomon I. Shorser, Charles Short, Raunak Shrestha, Suyash S. Shringarpure, Craig Shriver, Shimin Shuai, Nikos Sidiropoulos, Reiner Siebert, Anieta M. Sieuwerts, Lina Sieverling, Sabina Signoretti, Katarzyna O. Sikora, Michele Simbolo, Ronald Simon, Janae V. Simons, Jared T. Simpson, Peter T. 
Simpson, Samuel Singer, Nasa Sinnott-Armstrong, Payal Sipahimalani, Tara J. Skelly, Marcel Smid, Jaclyn Smith, Karen Smith-McCune, Nicholas D. Socci, Heidi J. Sofia, Matthew G. Soloway, Lei Song, Anil K. Sood, Sharmila Sothi, Christos Sotiriou, Cameron M. Soulette, Paul N. Span, Paul T. Spellman, Nicola Sperandio, Andrew J. Spillane, Oliver Spiro, Jonathan Spring, Johan Staaf, Peter F. Stadler, Peter Staib, Stefan G. Stark, Lucy Stebbings, Ólafur Andri Stefánsson, Oliver Stegle, Lincoln D. Stein, Alasdair Stenhouse, Chip Stewart, Stephan Stilgenbauer, Miranda D. Stobbe, Michael R. Stratton, Jonathan R. Stretch, Adam J. Struck, Joshua M. Stuart, Henk G. Stunnenberg, Hong Su, Xiaoping Su, Ren X. Sun, Stephanie Sungalee, Hana Susak, Akihiro Suzuki, Fred Sweep, Monika Szczepanowski, Holger Sültmann, Takashi Yugawa, Angela Tam, David Tamborero, Benita Kiat Tee Tan, Donghui Tan, Patrick Tan, Hiroko Tanaka, Hirokazu Taniguchi, Tomas J. Tanskanen, Maxime Tarabichi, Roy Tarnuzzer, Patrick Tarpey, Morgan L. Taschuk, Kenji Tatsuno, Simon Tavaré, Darrin F. Taylor, Amaro Taylor-Weiner, Jon W. Teague, Bin Tean Teh, Varsha Tembe, Javier Temes, Kevin Thai, Sarah P. Thayer, Nina Thiessen, Gilles Thomas, Sarah Thomas, Alan Thompson, Alastair M. Thompson, John F. F. Thompson, R. Houston Thompson, Heather Thorne, Leigh B. Thorne, Adrian Thorogood, Grace Tiao, Nebojsa Tijanic, Lee E. Timms, Roberto Tirabosco, Marta Tojo, Stefania Tommasi, Christopher W. Toon, Umut H. Toprak, David Torrents, Giampaolo Tortora, Jörg Tost, Yasushi Totoki, David Townend, Nadia Traficante, Isabelle Treilleux, Jean-Rémi Trotta, Lorenz H. P. Trümper, Ming Tsao, Tatsuhiko Tsunoda, Jose M. C. Tubio, Olga Tucker, Richard Turkington, Daniel J. Turner, Andrew Tutt, Masaki Ueno, Naoto T. Ueno, Christopher Umbricht, Husen M. Umer, Timothy J. Underwood, Lara Urban, Tomoko Urushidate, Tetsuo Ushiku, Liis Uusküla-Reimand, Alfonso Valencia, David J. Van Den Berg, Steven Van Laere, Peter Van Loo, Erwin G. 
Van Meir, Gert G. Van den Eynden, Theodorus Van der Kwast, Naveen Vasudev, Miguel Vazquez, Ravikiran Vedururu, Umadevi Veluvolu, Shankar Vembu, Lieven P. C. Verbeke, Peter Vermeulen, Clare Verrill, Alain Viari, David Vicente, Caterina Vicentini, K. VijayRaghavan, Juris Viksna, Ricardo E. Vilain, Izar Villasante, Anne Vincent-Salomon, Tapio Visakorpi, Douglas Voet, Paresh Vyas, Ignacio Vázquez-García, Nick M. Waddell, Nicola Waddell, Claes Wadelius, Lina Wadi, Rabea Wagener, Jeremiah A. Wala, Jian Wang, Jiayin Wang, Linghua Wang, Qi Wang, Wenyi Wang, Yumeng Wang, Zhining Wang, Paul M. Waring, Hans-Jörg Warnatz, Jonathan Warrell, Anne Y. Warren, Sebastian M. Waszak, David C. Wedge, Dieter Weichenhan, Paul Weinberger, John N. Weinstein, Joachim Weischenfeldt, Daniel J. Weisenberger, Ian Welch, Michael C. Wendl, Johannes Werner, Justin P. Whalley, David A. Wheeler, Hayley C. Whitaker, Dennis Wigle, Matthew D. Wilkerson, Ashley Williams, James S. Wilmott, Gavin W. Wilson, Julie M. Wilson, Richard K. Wilson, Boris Winterhoff, Jeffrey A. Wintersinger, Maciej Wiznerowicz, Stephan Wolf, Bernice H. Wong, Tina Wong, Winghing Wong, Youngchoon Woo, Scott Wood, Bradly G. Wouters, Adam J. Wright, Derek W. Wright, Mark H. Wright, Chin-Lee Wu, Dai-Ying Wu, Guanming Wu, Jianmin Wu, Kui Wu, Yang Wu, Zhenggang Wu, Liu Xi, Tian Xia, Qian Xiang, Xiao Xiao, Rui Xing, Heng Xiong, Qinying Xu, Yanxun Xu, Hong Xue, Shinichi Yachida, Sergei Yakneen, Rui Yamaguchi, Takafumi N. Yamaguchi, Masakazu Yamamoto, Shogo Yamamoto, Hiroki Yamaue, Fan Yang, Huanming Yang, Jean Y. Yang, Liming Yang, Lixing Yang, Shanlin Yang, Tsun-Po Yang, Yang Yang, Xiaotong Yao, Marie-Laure Yaspo, Lucy Yates, Christina Yau, Chen Ye, Kai Ye, Venkata D. Yellapantula, Christopher J. Yoon, Sung-Soo Yoon, Fouad Yousif, Jun Yu, Kaixian Yu, Willie Yu, Yingyan Yu, Ke Yuan, Yuan Yuan, Denis Yuen, Christina K. Yung, Olga Zaikova, Jorge Zamora, Marc Zapatka, Jean C. 
Zenklusen, Thorsten Zenz, Nikolajs Zeps, Cheng-Zhong Zhang, Fan Zhang, Hailei Zhang, Hongwei Zhang, Hongxin Zhang, Jiashan Zhang, Jing Zhang, Junjun Zhang, Xiuqing Zhang, Xuanping Zhang, Yan Zhang, Zemin Zhang, Zhongming Zhao, Liangtao Zheng, Xiuqing Zheng, Wanding Zhou, Yong Zhou, Bin Zhu, Hongtu Zhu, Jingchun Zhu, Shida Zhu, Lihua Zou, Xueqing Zou, Anna deFazio, Nicholas van As, Carolien H. M. van Deurzen, Marc J. van de Vijver, L. van’t Veer, and Christian von Mering
Extended data is available for this paper at https://doi.org/10.1038/s41588-019-0564-y.
Supplementary information is available for this paper at https://doi.org/10.1038/s41588-019-0564-y.
References
1. Dekker J, Heard E. Structural and functional diversity of topologically associating domains. FEBS Lett. 2015;589:2877–2884. doi: 10.1016/j.febslet.2015.08.044.
2. Bonev B, Cavalli G. Organization and function of the 3D genome. Nat. Rev. Genet. 2016;17:661–678. doi: 10.1038/nrg.2016.112.
3. Guelen L, et al. Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature. 2008;453:948–951. doi: 10.1038/nature06947.
4. Dixon JR, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485:376–380. doi: 10.1038/nature11082.
5. Nora EP, et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature. 2012;485:381–385. doi: 10.1038/nature11049.
6. de Laat W, Duboule D. Topology of mammalian developmental enhancers and their regulatory landscapes. Nature. 2013;502:499–506. doi: 10.1038/nature12753.
7. Vietri Rudan M, et al. Comparative Hi-C reveals that CTCF underlies evolution of chromosomal domain architecture. Cell Rep. 2015;10:1297–1309. doi: 10.1016/j.celrep.2015.02.004.
8. Ibn-Salem J, et al. Deletions of chromosomal regulatory boundaries are associated with congenital disease. Genome Biol. 2014;15:423. doi: 10.1186/s13059-014-0423-1.
9. Lupiáñez DG, et al. Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell. 2015;161:1012–1025. doi: 10.1016/j.cell.2015.04.004.
10. Franke M, et al. Formation of new chromatin domains determines pathogenicity of genomic duplications. Nature. 2016;538:265–269. doi: 10.1038/nature19800.
11. Weischenfeldt J, et al. Pan-cancer analysis of somatic copy-number alterations implicates IRS4 and IGF2 in enhancer hijacking. Nat. Genet. 2017;49:65–74. doi: 10.1038/ng.3722.
12. Beroukhim R, Zhang X, Meyerson M. Copy number alterations unmasked as enhancer hijackers. Nat. Genet. 2017;49:5–6. doi: 10.1038/ng.3754.
13. Northcott PA, et al. Enhancer hijacking activates GFI1 family oncogenes in medulloblastoma. Nature. 2014;511:428–434. doi: 10.1038/nature13379.
14. Gröschel S, et al. A single oncogenic enhancer rearrangement causes concomitant EVI1 and GATA2 deregulation in leukemia. Cell. 2014;157:369–381. doi: 10.1016/j.cell.2014.02.019.
15. Hnisz D, et al. Activation of proto-oncogenes by disruption of chromosome neighborhoods. Science. 2016;351:1454–1458.
16. Flavahan WA, et al. Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature. 2016;529:110–114. doi: 10.1038/nature16490.
17. Rao SSP, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1665–1680. doi: 10.1016/j.cell.2014.11.021.
18. Crane E, et al. Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature. 2015;523:240–244. doi: 10.1038/nature14450.
19. Ho JWK, et al. Comparative analysis of metazoan chromatin organization. Nature. 2014;512:449–452. doi: 10.1038/nature13415.
20. Barutcu AR, et al. Chromatin interaction analysis reveals changes in small chromosome and telomere clustering between epithelial and breast cancer cells. Genome Biol. 2015;16:214. doi: 10.1186/s13059-015-0768-0.
21. Roadmap Epigenomics Consortium, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330. doi: 10.1038/nature14248.
22. The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature. 2020. doi: 10.1038/s41586-020-1969-6.
23. Li Y, et al. Patterns of somatic structural variation in human cancer genomes. Nature. 2020. doi: 10.1038/s41586-019-1913-9.
24. Korbel JO, Campbell PJ. Criteria for inference of chromothripsis in cancer genomes. Cell. 2013;152:1226–1236. doi: 10.1016/j.cell.2013.02.023.
25. Zarrei M, MacDonald JR, Merico D, Scherer SW. A copy number variation map of the human genome. Nat. Rev. Genet. 2015;16:172–183. doi: 10.1038/nrg3871.
26. Abyzov A, et al. Analysis of deletion breakpoints from 1,092 humans reveals details of mutation mechanisms. Nat. Commun. 2015;6:7256. doi: 10.1038/ncomms8256.
27. Sudmant PH, et al. Global diversity, population stratification, and selection of human copy-number variation. Science. 2015;349:aab3761. doi: 10.1126/science.aab3761.
28. Sudmant PH, et al. An integrated map of structural variation in 2,504 human genomes. Nature. 2015;526:75–81. doi: 10.1038/nature15394.
29. Futreal PA, et al. A census of human cancer genes. Nat. Rev. Cancer. 2004;4:177–183. doi: 10.1038/nrc1299.
30. Jones DTW, et al. Tandem duplication producing a novel oncogenic BRAF fusion gene defines the majority of pilocytic astrocytomas. Cancer Res. 2008;68:8673–8677. doi: 10.1158/0008-5472.CAN-08-2097.
31. Garsed DW, et al. The architecture and evolution of cancer neochromosomes. Cancer Cell. 2014;26:653–667. doi: 10.1016/j.ccell.2014.09.010.
32. Hnisz D, Day DS, Young RA. Insulated neighborhoods: structural and functional units of mammalian gene control. Cell. 2016;167:1188–1200. doi: 10.1016/j.cell.2016.10.024.
33. Libbrecht MW, et al. Joint annotation of chromatin state and chromatin conformation reveals relationships among domain types and identifies domains of cell-type-specific expression. Genome Res. 2015;25:544–557. doi: 10.1101/gr.184341.114.
34. Meuleman W, Peric-Hupkes D, Kind J. Constitutive nuclear lamina–genome interactions are highly conserved and associated with A/T-rich sequence. Genome Res. 2013;23:270–280. doi: 10.1101/gr.141028.112.
35. Sexton T, Yaffe E. Chromosome folding: driver or passenger of epigenetic state? Cold Spring Harb. Perspect. Biol. 2015;7:a018721. doi: 10.1101/cshperspect.a018721.
36. GTEx Consortium. The genotype-tissue expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015;348:648–660. doi: 10.1126/science.1262110.
37. Fortin J-P, Hansen KD. Reconstructing A/B compartments as revealed by Hi-C using long-range correlations in epigenetic data. Genome Biol. 2015;16:180. doi: 10.1186/s13059-015-0741-y.
38. Dowen JM, et al. Control of cell identity genes occurs in insulated neighborhoods in mammalian chromosomes. Cell. 2014;159:374–387. doi: 10.1016/j.cell.2014.09.030.
39. Narendra V, et al. CTCF establishes discrete functional chromatin domains at the Hox clusters during differentiation. Science. 2015;347:1017–1021. doi: 10.1126/science.1262088.
40. Dixon JR, et al. Integrative detection and analysis of structural variation in cancer genomes. Nat. Genet. 2018;50:1388–1398. doi: 10.1038/s41588-018-0195-8.
41. Stephens PJ, et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell. 2011;144:27–40. doi: 10.1016/j.cell.2010.11.055.
42. Ghavi-Helm Y, et al. Highly rearranged chromosomes reveal uncoupling between genome topology and gene expression. Nat. Genet. 2019;51:1272–1282. doi: 10.1038/s41588-019-0462-3.
43. Despang A, et al. Functional dissection of the Sox9–Kcnj2 locus identifies nonessential and instructive roles of TAD architecture. Nat. Genet. 2019;51:1263–1271. doi: 10.1038/s41588-019-0466-z.
44. Lee TI, Young RA. Transcriptional regulation and its misregulation in disease. Cell. 2013;152:1237–1251. doi: 10.1016/j.cell.2013.02.014.
45. Akdemir KC, Chin L. HiCPlotter integrates genomic data with interaction matrices. Genome Biol. 2015;16:198. doi: 10.1186/s13059-015-0767-1.
46. Dale RK, Pedersen BS, Quinlan AR. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics. 2011;27:3423–3424. doi: 10.1093/bioinformatics/btr539.
47. Contino G, et al. Whole-genome sequencing of nine esophageal adenocarcinoma cell lines. F1000Res. 2016;5:1336. doi: 10.12688/f1000research.7033.1.
48. Zong C, Lu S, Chapman AR, Xie XS. Genome-wide detection of single-nucleotide and copy-number variations of a single human cell. Science. 2012;338:1622–1626. doi: 10.1126/science.1229164.
49. Rausch T, et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics. 2012;28:i333–i339. doi: 10.1093/bioinformatics/bts378.
50. Layer RM, Chiang C, Quinlan AR, Hall IM. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 2014;15:R84. doi: 10.1186/gb-2014-15-6-r84.
51. Papaemmanuil E, et al. RAG-mediated recombination is the predominant driver of oncogenic rearrangement in ETV6-RUNX1 acute lymphoblastic leukemia. Nat. Genet. 2014;46:116–125. doi: 10.1038/ng.2874.
52. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
53. Imakaev M, et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods. 2012;9:999–1003. doi: 10.1038/nmeth.2148.
54. Britton E, et al. Open chromatin profiling identifies AP1 as a transcriptional regulator in oesophageal adenocarcinoma. PLoS Genet. 2017;13:e1006879. doi: 10.1371/journal.pgen.1006879.
55. Hon GC, et al. Global DNA hypomethylation coupled to repressive chromatin domain formation and gene silencing in breast cancer. Genome Res. 2012;22:246–258. doi: 10.1101/gr.125872.111.
56. Rahnamoun H, et al. Mutant p53 shapes the enhancer landscape of cancer cells in response to chronic immune signaling. Nat. Commun. 2017;8:754. doi: 10.1038/s41467-017-01117-y.
Data Availability Statement
Aligned sequencing data, as well as somatic and germline variant calls from PCAWG tumors, including SNVs, indels, copy number alterations and SVs, are available for download at https://dcc.icgc.org/releases/PCAWG. Additional information on accessing the data, including raw read files, can be found at https://docs.icgc.org/pcawg/data/. In accordance with the data-access policies of the ICGC and TCGA projects, most molecular, clinical and specimen data are in an open tier that does not require access approval. To access potentially identifying information, such as germline alleles and underlying sequencing data, researchers will need to apply to the TCGA Data Access Committee (DAC) via dbGaP (https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login) for access to the TCGA portion of the dataset, and to the ICGC Data Access Compliance Office (DACO; http://icgc.org/daco) for the ICGC portion. In addition, to access somatic SNVs derived from TCGA donors, researchers will also need to obtain dbGaP authorization.
The consensus SV calls and the annotations for each variant type (deletions, inversions, duplications and complex rearrangements) that we used are available at Synapse (https://www.synapse.org/) under accession number syn7596712.
Hi-C data have been deposited at GEO under accession code GSE116694.
The core computational pipelines used by the PCAWG Consortium for alignment, quality control and variant calling are available to the public at https://dockstore.org/search?search=pcawg under a GNU General Public License v.3.0, which allows for reuse and distribution.