Author manuscript; available in PMC: 2012 Oct 3.
Published in final edited form as: Nat Genet. 2012 May 6;44(6):651–658. doi: 10.1038/ng.2270

Detectable clonal mosaicism and its relationship to aging and cancer

Kevin B Jacobs 1,2, Meredith Yeager 1,2, Weiyin Zhou 1,2, Sholom Wacholder 1, Zhaoming Wang 1,2, Benjamin Rodriguez-Santiago 3,5, Amy Hutchinson 1,2, Xiang Deng 1,2, Chenwei Liu 1,2, Marie-Josephe Horner 1, Michael Cullen 1,2, Caroline G Epstein 1, Laurie Burdett 1,2, Michael C Dean 6, Nilanjan Chatterjee 1, Joshua Sampson 1, Charles C Chung 1, Joseph Kovaks 1, Susan M Gapstur 7, Victoria L Stevens 7, Lauren T Teras 7, Mia M Gaudet 7, Demetrius Albanes 1, Stephanie J Weinstein 1, Jarmo Virtamo 8, Philip R Taylor 1, Neal D Freedman 1, Christian C Abnet 1, Alisa M Goldstein 1, Nan Hu 1, Kai Yu 1, Jian-Min Yuan 9,10, Linda Liao 1, Ti Ding 11, You-Lin Qiao 12, Yu-Tang Gao 13, Woon-Puay Koh 14, Yong-Bing Xiang 13, Ze-Zhong Tang 11, Jin-Hu Fan 12, Melinda C Aldrich 15,16, Christopher Amos 17, William J Blot 16,18, Cathryn H Bock 19,20, Elizabeth M Gillanders 21, Curtis C Harris 22, Christopher A Haiman 23, Brian E Henderson 23, Laurence N Kolonel 24, Loic Le Marchand 24, Lorna H McNeill 25,26, Benjamin A Rybicki 27, Ann G Schwartz 19,20, Lisa B Signorello 16,18,28, Margaret R Spitz 29, John K Wiencke 30, Margaret Wrensch 30, Xifeng Wu 17, Krista A Zanetti 21,22, Regina G Ziegler 1, Jonine D Figueroa 1, Montserrat Garcia-Closas 1,31, Nuria Malats 32, Gaelle Marenne 32, Ludmila Prokunina-Olsson 1, Dalsu Baris 1, Molly Schwenn 33, Alison Johnson 34, Maria Teresa Landi 1, Lynn Goldin 1, Dario Consonni 35,36, Pier Alberto Bertazzi 35,36, Melissa Rotunno 1, Preetha Rajaraman 1, Ulrika Andersson 37, Laura E Beane Freeman 1, Christine D Berg 38, Julie E Buring 39,40, Mary A Butler 41, Tania Carreon 41, Maria Feychting 42, Anders Ahlbom 42, J Michael Gaziano 40,43,44, Graham G Giles 45,46, Goran Hallmans 47, Susan E Hankinson 48, Patricia Hartge 1, Roger Henriksson 37,49, Peter D Inskip 1, Christoffer Johansen 50, Annelie Landgren 1, Roberta McKean-Cowdin 23, Dominique S Michaud 51,52, Beatrice S Melin 37, Ulrike Peters 53,54, Avima M Ruder 41, Howard D Sesso 40, Gianluca Severi 
45,46, Xiao-Ou Shu 28, Kala Visvanathan 55, Emily White 53,54, Alicja Wolk 56, Anne Zeleniuch-Jacquotte 57, Wei Zheng 28, Debra T Silverman 1, Manolis Kogevinas 58,61, Juan R Gonzalez 59,61, Olaya Villa 3,5, Donghui Li 62, Eric J Duell 63, Harvey A Risch 64, Sara H Olson 65, Charles Kooperberg 53, Brian M Wolpin 48,66, Li Jiao 67, Manal Hassan 62, William Wheeler 68, Alan A Arslan 69,71, H Bas Bueno-de-Mesquita 72,73, Charles S Fuchs 48,66, Steven Gallinger 74, Myron D Gross 75, Elizabeth A Holly 76, Alison P Klein 77, Andrea LaCroix 53, Margaret T Mandelson 53,78, Gloria Petersen 79, Marie-Christine Boutron-Ruault 80, Paige M Bracci 76, Federico Canzian 81, Kenneth Chang 82, Michelle Cotterchio 83, Edward L Giovannucci 48,84,85, Michael Goggins 76,86,87, Judith A Hoffman Bolton 55, Mazda Jenab 88, Kay-Tee Khaw 89, Vittorio Krogh 90, Robert C Kurtz 91, Robert R McWilliams 92, Julie B Mendelsohn 1, Kari G Rabe 79, Elio Riboli 51, Anne Tjønneland 50, Geoffrey S Tobias 1, Dimitrios Trichopoulos 84,93, Joanne W Elena 38, Herbert Yu 24, Laufey Amundadottir 1, Rachael Z Stolzenberg-Solomon 1, Peter Kraft 48,84, Fredrick Schumacher 23, Daniel Stram 23, Sharon A Savage 1, Lisa Mirabello 1, Irene L Andrulis 74,94, Jay S Wunder 74,94, Ana Patiño García 95, Luis Sierrasesúmaga 95, Donald A Barkauskas 23, Richard G Gorlick 96,97, Mark Purdue 1, Wong-Ho Chow 1, Lee E Moore 1, Kendra L Schwartz 98, Faith G Davis 99, Ann W Hsing 1, Sonja I Berndt 1, Amanda Black 1, Nicolas Wentzensen 1, Louise A Brinton 1, Jolanta Lissowska 100, Beata Peplonska 101, Katherine A McGlynn 1, Michael B Cook 1, Barry I Graubard 1, Christian P Kratz 1,102, Mark H Greene 1, Ralph L Erickson 103, David J Hunter 48,84, Gilles Thomas 103, Robert N Hoover 1, Francisco X Real 3,105, Joseph F Fraumeni Jr 1, Neil E Caporaso 1, Margaret Tucker 1, Nathaniel Rothman 1, Luis A Pérez-Jurado 3,4,, Stephen J Chanock 1,‡,*
PMCID: PMC3372921  NIHMSID: NIHMS369783  PMID: 22561519

Abstract

In an analysis of 31,717 cancer cases and 26,136 cancer-free controls drawn from 13 genome-wide association studies (GWAS), we observed large chromosomal abnormalities in a subset of clones from DNA obtained from blood or buccal samples. Mosaic chromosomal abnormalities, either aneuploidy or copy-neutral loss of heterozygosity, of size >2 Mb were observed in autosomes of 517 individuals (0.89%) with abnormal cell proportions between 7% and 95%. In cancer-free individuals, the frequency increased with age, from 0.23% in those under 50 to 1.91% in those between 75 and 79 (p=4.8×10−8). Mosaic abnormalities were more frequent in individuals with solid tumors (0.97% versus 0.74% in cancer-free individuals, OR=1.25, p=0.016), with a stronger association for cases who had DNA collected prior to diagnosis or treatment (OR=1.45, p=0.0005). Detectable clonal mosaicism was common in individuals for whom DNA was collected at least one year prior to diagnosis of leukemia compared to cancer-free individuals (OR=35.4, p=3.8×10−11). These findings underscore the importance of the role and time-dependent nature of somatic events in the etiology of cancer and other late-onset diseases.


Classically, genetic mosaicism is defined as the co-existence of cells with two or more distinct karyotypes within an individual that results from a post-zygotic event during development and can occur in both somatic and germline cells1,2. Errors in chromosomal duplication and subsequent transmission to daughter cells may lead to aneuploidy, the gain or loss of chromosomes or segments of chromosomes, and reciprocal gain and loss events manifesting in copy-neutral loss of heterozygosity (cnloh) or acquired uniparental disomy. Somatic mosaicism has been established as a cause of miscarriage, birth defects, developmental delay, and cancer3-9. Because mosaicism can be benign or may manifest with diverse clinical phenotypes, there are no accurate estimates of its frequency in the general population3,6. On rare occasions the propensity to develop chromosomal abnormalities is inherited and leads to multiple phenotypic abnormalities including cancer predisposition as reported in families with mutations in BUB1B and CEP57 10,11. Recently, two groups have identified somatic mosaic mutations in IDH1 and IDH2 in tumors of individuals with Ollier disease and Maffucci syndrome12,13 while another group has characterized somatic mosaicism of a HRAS mutation in an individual with urothelial cancer and epidermal nevus14. Recent work in a population of twins has suggested that the detection of somatic structural variants in blood increases with aging and may be related to reduction in blood cell clonality15. In this report, we define mosaic chromosomal abnormalities broadly: the presence of both normal karyotypes and those with large structural genomic events resulting in alteration of copy number or loss of heterozygosity in distinct and detectable subpopulations of cells regardless of the clonal or developmental origin of the subpopulations.

Recently, we reported on 1,991 individuals from the Spanish Bladder Cancer/EPICURO population-based case-control study in which we had performed a GWAS of adult-onset bladder cancer using DNA obtained from blood or buccal samples16. The SNP array data generated for the GWAS was subsequently used to detect clonal mosaic abnormalities in the autosomes of 1.7% of study subjects, suggesting a higher frequency in adults than previously suspected. Even though somatic mosaicism has been implicated in several cancers, this study did not reveal a significant difference in frequency between cases and controls. A computational algorithm was used to detect 42 large mosaic events involving two or more distinct clones in DNA extracted from blood or buccal samples and we experimentally validated the findings using multiplex ligation-dependent probe amplification (MLPA) and microsatellite analysis (as well as fluorescent in situ hybridization in a subset), establishing the robustness of the software detection method. A similar proportion of cells carrying each event was found in 5 of 6 events (in four individuals with bladder cancer in whom three had one event and one individual with three separate events) in which it was possible to examine more than one tissue (whole blood and bladder mucosa), suggesting an early embryonic origin of the somatic mutation leading to the observed mosaic chromosomal abnormalities16.

Results

In this report, we extend our analysis of clonal mosaic abnormalities in the autosomes to 57,853 individuals (including those previously published16). We tested 31,717 cancer cases and 26,136 cancer-free controls for evidence of mosaic abnormalities using genome-wide SNP array data generated as part of 13 distinct cancer GWAS drawn from 48 epidemiological case-control and case-cohort studies (Supplementary Table 1). DNA samples were extracted from blood or buccal samples using a variety of collection and extraction techniques and genotyped using one or more Infinium Human SNP arrays from Illumina Inc. (including versions of Hap300, Hap240, Hap550, Hap610, Hap660, Hap1, Omni Express, and Omni1). Genotype clusters were empirically estimated in 45 batches to optimize accuracy while minimizing potential batch effects (Online Methods).

Detection of clonal mosaic events was based on assessment of allelic imbalance and copy number changes. We used the B-allele frequency (BAF) measurement, derived from the ratio of probe values relative to the locations of the estimated genotype-specific clusters, for initial segmentation using the Mosaic Alteration Detection (MAD) algorithm implemented in GADA-R with modifications17,18. The BAF and log2 relative probe intensity ratio (LRR), which provides data on copy number, were used to classify each event as copy-altering (gain or loss) or neutral (reciprocal gain and loss resulting in loss of heterozygosity, LOH) and to assign the proportions of abnormal (p) and normal (1-p) cells. Mosaic proportions were required to deviate from levels expected from constitutional (non-mosaic) changes in order to exclude homozygous chromosomal segments inherited identical by descent and non-mosaic instances of trisomy, monosomy and uniparental disomy. A minimum event size threshold was set to detect only clonal mosaic events greater than 2 Mb to minimize the false discovery of constitutional copy number variants. Copy-neutral LOH and copy-loss events could be detected for mosaic proportions between 7% and 95% (Figure 1) with sensitivity that was affected by the signal-to-noise ratio characteristic of each microarray assay and sample quality. There was reduced sensitivity to distinguish between copy-neutral LOH and copy-loss events for mosaic proportions less than 15% across the autosomes. The magnitude of BAF differences for single-copy gain events was 1/3 of the magnitude of copy-neutral LOH or copy-loss events, reducing the sensitivity for calling copy-gain events. As a result, single copy gain events could only be reliably detected for mosaic proportions between 22% and 88%, with ambiguity in distinguishing copy-gain from copy-neutral LOH for mosaic proportions of less than 20%.
Since DNA was obtained for the purpose of performing a GWAS, it was not possible to further explore the developmental and clonal characteristics of mosaic events detected in these individuals (e.g. by studying DNA from fractionated blood and other tissue types, determining cell composition of buccal samples, or effect of DNA collection and extraction methods on detection and accuracy of the estimation of mosaic proportions). We report only autosomal chromosomal abnormalities, as the analysis of the sex chromosomes presents distinct technical and interpretative challenges.
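
The way BAF and LRR jointly separate event types can be illustrated with a minimal sketch. It assumes an idealized mixture of normal diploid cells and a single abnormal clone, with no array noise or intensity compression; it is not the MAD/GADA-R implementation, and the function names `expected_signal` and `proportion_from_baf` are hypothetical. Under this assumption, the BAF deviation at heterozygous SNPs for a full single-copy gain is one third of that for a full copy-neutral LOH or loss, which agrees with the reduced sensitivity for gains noted above.

```python
import math

def expected_signal(event, p):
    """Return (BAF deviation from 0.5 at heterozygous SNPs, LRR).

    Idealized model (an assumption, not the authors' code): a fraction p of
    cells carries the event and the remaining 1 - p cells are normal diploid.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if event == "cnloh":    # one allele replaces the other; total copies stay at 2
        b_copies, total = 1.0 + p, 2.0
    elif event == "loss":   # one allele lost in abnormal cells; total is 2 - p
        b_copies, total = 1.0, 2.0 - p
    elif event == "gain":   # one allele duplicated in abnormal cells; total is 2 + p
        b_copies, total = 1.0 + p, 2.0 + p
    else:
        raise ValueError("event must be 'cnloh', 'loss' or 'gain'")
    return b_copies / total - 0.5, math.log2(total / 2.0)

def proportion_from_baf(event, deviation):
    """Invert the model: abnormal cell proportion p from an absolute BAF deviation."""
    d = abs(deviation)
    if event == "cnloh":
        return 2.0 * d
    if event == "loss":
        return 4.0 * d / (1.0 + 2.0 * d)
    if event == "gain":
        return 4.0 * d / (1.0 - 2.0 * d)
    raise ValueError("event must be 'cnloh', 'loss' or 'gain'")

if __name__ == "__main__":
    for event in ("cnloh", "loss", "gain"):
        dev, lrr = expected_signal(event, 0.30)
        print(event, round(dev, 4), round(lrr, 4))
```

In this model the LRR is zero for copy-neutral LOH, negative for loss and positive for gain, so the sign of the LRR resolves the event type once a BAF split has been detected. At low p both signals shrink toward zero, which is consistent with the ambiguity described for small mosaic proportions.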

Figure 1. Characteristics of detectable clonal mosaic events.

Detectable clonal mosaic events plotted by proportion of abnormal cells (p) and Log R Ratio (LRR) for 681 events in 517 individuals.

We observed 681 mosaic segments of size greater than 2 Mb on 641 autosomal chromosomes in 517 individuals for an overall frequency of individuals with mosaicism of 0.89% (Tables 1 and 2). The most frequent type of event observed was copy-neutral LOH (48.2%), while copy-gains and copy-losses were observed for 15.1% and 34.8% of mosaic events, respectively (Table 1). A small proportion (1.9%) of mosaic chromosomes were complex, harboring more than one type of event. 18.7% of mosaic chromosomal events spanned the entire chromosome, including 62 complete trisomies, predominantly in chromosomes 8, 12 and 15. 47.9% of mosaic chromosomal events began at a telomere and extended across some portion of the chromosomal arm (Table 1 and Figure 2). The majority of telomeric events were mosaic copy-neutral LOH (85.7%), most frequently on 9p (Table 3). The remaining mosaic chromosomal events were interstitial (31.5%) spanning neither telomere nor centromere, while an additional small proportion (1.8%) spanned the centromere or had more complex structure (e.g. distinct events involving both telomeres, but not the whole chromosome). The majority of interstitial events were mosaic copy-loss (91.6%), which was most frequently observed within specific regions of chromosomes 13q and 20q (Figure 2). We observed 69 individuals (46 cancer cases and 23 cancer-free individuals) with clonal mosaic events on multiple chromosomes. The distribution of the number of clonal mosaic chromosomal events per individual is shown in Supplementary Table 3. Among cancer-free individuals, the greatest number observed was 5 mosaic chromosomal events, whereas six individuals with cancer had greater than 5 events, including two individuals with gastric cancer who each had 20. A list of mosaic events with phenotype data is available as Supplementary Data.

Table 1.

Count and frequency of mosaic chromosomal events by event type and location

| Event location | Gain (count) | Loss (count) | cnloh (count) | Mixed (count) | Total (count) | Gain (%) | Loss (%) | cnloh (%) | Mixed (%) | Total (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| chromosome | 62 | 11 | 42 | 5 | 120 | 9.7 | 1.7 | 6.6 | 0.8 | 18.7 |
| telomeric P | 11 | 13 | 114 | 1 | 139 | 1.7 | 2.0 | 17.8 | 0.2 | 21.7 |
| telomeric Q | 9 | 10 | 149 | 0 | 168 | 1.4 | 1.6 | 23.2 | 0.0 | 26.2 |
| interstitial | 14 | 185 | 2 | 1 | 202 | 2.2 | 28.9 | 0.3 | 0.2 | 31.5 |
| span centromere | 1 | 1 | 2 | 0 | 4 | 0.2 | 0.2 | 0.3 | 0.0 | 0.6 |
| complex | 0 | 3 | 0 | 5 | 8 | 0.0 | 0.5 | 0.0 | 0.8 | 1.2 |
| Total | 97 | 223 | 309 | 12 | 641 | 15.1 | 34.8 | 48.2 | 1.9 | |
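
The summary percentages quoted in the text can be recomputed from the counts in Table 1. The following is a small sketch for that purpose only; the dictionary layout and the helper name `pct` are illustrative choices, not part of the original analysis.

```python
# Mosaic chromosome counts copied from Table 1 (rows: location, columns: event type).
COUNTS = {
    "chromosome":      {"gain": 62, "loss": 11,  "cnloh": 42,  "mixed": 5},
    "telomeric P":     {"gain": 11, "loss": 13,  "cnloh": 114, "mixed": 1},
    "telomeric Q":     {"gain": 9,  "loss": 10,  "cnloh": 149, "mixed": 0},
    "interstitial":    {"gain": 14, "loss": 185, "cnloh": 2,   "mixed": 1},
    "span centromere": {"gain": 1,  "loss": 1,   "cnloh": 2,   "mixed": 0},
    "complex":         {"gain": 0,  "loss": 3,   "cnloh": 0,   "mixed": 5},
}

def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as reported in the table."""
    return round(100.0 * numerator / denominator, 1)

# Total number of mosaic chromosomes across all locations and types.
total = sum(sum(row.values()) for row in COUNTS.values())

# Pool the two telomeric rows to obtain the telomeric share and its cnloh fraction.
telomeric_total = sum(COUNTS["telomeric P"].values()) + sum(COUNTS["telomeric Q"].values())
telomeric_cnloh = COUNTS["telomeric P"]["cnloh"] + COUNTS["telomeric Q"]["cnloh"]

interstitial_total = sum(COUNTS["interstitial"].values())

telomeric_share = pct(telomeric_total, total)
cnloh_among_telomeric = pct(telomeric_cnloh, telomeric_total)
loss_among_interstitial = pct(COUNTS["interstitial"]["loss"], interstitial_total)

if __name__ == "__main__":
    print(total, telomeric_share, cnloh_among_telomeric, loss_among_interstitial)
```

Pooling the telomeric P and Q rows gives 307 of 641 mosaic chromosomes, of which 263 are copy-neutral LOH, and 185 of the 202 interstitial events are losses.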

Table 2. Count and frequency of individuals with detectable clonal mosaic events for cancer-free individuals and by first diagnosed cancer site.

Non-hematological cancers are listed by first cancer site and exclude anyone diagnosed with a hematological cancer, shown separately.

| Site of first cancer | Mosaic: likely untreated | Mosaic: possibly treated | Mosaic: total | Non-mosaic: likely untreated | Non-mosaic: possibly treated | Non-mosaic: total | Mosaic frequency (%): likely untreated | Mosaic frequency (%): possibly treated | Mosaic frequency (%): overall |
|---|---|---|---|---|---|---|---|---|---|
| overall * | | | 498 | | | 57,201 | | | 0.86 |
| cancer-free | | | 194 | | | 25,942 | | | 0.74 |
| First non-hematologic cancer | 185 | 119 | 304 | 13,865 | 17,394 | 31,259 | 1.32 | 0.68 | 0.96 |
| bladder | 37 | 6 | 43 | 2,240 | 973 | 3,213 | 1.62 | 0.61 | 1.32 |
| breast | 4 | 8 | 12 | 1,060 | 1,753 | 2,813 | 0.38 | 0.45 | 0.42 |
| endometrium | 3 | 6 | 9 | 247 | 624 | 871 | 1.20 | 0.95 | 1.02 |
| esophagus | 1 | 6 | 7 | 53 | 1,855 | 1,908 | 1.85 | 0.32 | 0.37 |
| glioma | 7 | 2 | 9 | 1,279 | 441 | 1,720 | 0.54 | 0.45 | 0.52 |
| kidney | 21 | 3 | 24 | 1,241 | 325 | 1,566 | 1.66 | 0.91 | 1.51 |
| lung | 73 | 26 | 99 | 4,647 | 2,605 | 7,252 | 1.55 | 0.99 | 1.35 |
| osteosarcoma | 0 | 3 | 3 | 0 | 760 | 760 | | 0.39 | 0.39 |
| ovary | 1 | 3 | 4 | 260 | 283 | 543 | 0.38 | 1.05 | 0.73 |
| pancreas | 2 | 29 | 31 | 379 | 3,513 | 3,892 | 0.52 | 0.82 | 0.79 |
| prostate | 32 | 11 | 43 | 2,116 | 1,410 | 3,526 | 1.49 | 0.77 | 1.20 |
| stomach | 2 | 13 | 15 | 99 | 2,194 | 2,293 | 1.98 | 0.59 | 0.65 |
| testis | 2 | 0 | 2 | 144 | 503 | 647 | 1.37 | 0.00 | 0.31 |
| other sites | 0 | 3 | 3 | 100 | 155 | 255 | 0.00 | 1.90 | 1.16 |
| Any hematologic cancer | 8 | 11 | 19 | 34 | 62 | 96 | 19.05 | 15.07 | 16.52 |
| leukemia | 8 | 9 | 17 | 34 | 11 | 45 | 19.05 | 45.00 | 27.42 |
| leukemia, lymphoid | 4 | 5 | 9 | 14 | 5 | 19 | 22.22 | 50.00 | 32.14 |
| leukemia, myeloid | 4 | 3 | 7 | 16 | 5 | 21 | 20.00 | 37.50 | 25.00 |
| leukemia, other/nos | 0 | 1 | 1 | 4 | 1 | 5 | 0.00 | 50.00 | 16.67 |
| lymphoma | 0 | 2 | 2 | 0 | 42 | 42 | | 4.55 | 4.55 |
| multiple myeloma | 0 | 0 | 0 | 0 | 9 | 9 | | 0.00 | 0.00 |

* Overall total of cancer-free individuals and those with non-hematologic cancers.
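
The frequency columns of Table 2 follow directly from the mosaic and non-mosaic counts, and the same counts give a rough, unadjusted view of the contrast between likely untreated and possibly treated cases. The sketch below is illustrative only: the crude odds ratios it produces are not adjusted for age, sex, ancestry, DNA source or smoking, so they are not expected to match the estimates from the logistic models reported in the text. The Woolf log-scale interval and the function names are assumptions introduced here.

```python
import math

def mosaic_frequency(mosaic, non_mosaic):
    """Percentage of individuals with a detectable clonal mosaic event."""
    return 100.0 * mosaic / (mosaic + non_mosaic)

def crude_odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) 95% interval.

    a, b: mosaic and non-mosaic counts in the group of interest
    c, d: mosaic and non-mosaic counts in the reference group
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# (mosaic, non-mosaic) counts copied from Table 2.
CANCER_FREE = (194, 25942)
LIKELY_UNTREATED = (185, 13865)   # first non-hematologic cancer
POSSIBLY_TREATED = (119, 17394)   # first non-hematologic cancer

if __name__ == "__main__":
    for label, group in [("likely untreated", LIKELY_UNTREATED),
                         ("possibly treated", POSSIBLY_TREATED)]:
        freq = mosaic_frequency(*group)
        or_, lo, hi = crude_odds_ratio(*group, *CANCER_FREE)
        print(f"{label}: {freq:.2f}%  crude OR {or_:.2f} ({lo:.2f}-{hi:.2f})")
```

The crude comparison shows the same direction as the adjusted analysis: the odds of mosaicism are elevated relative to cancer-free individuals in the likely untreated group and close to or below unity in the possibly treated group.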

Figure 2. Circular genomic plot of detectable clonal mosaic events.

Genomic location of detectable clonal mosaic events. Outer rings are the autosomes 1 to 22. Yellow region denotes events of copy neutral loss of heterozygosity. Blue region denotes copy gain events. Red region denotes copy loss events. Panel A includes events in cancer free controls. Panel B includes events in cancer cases.

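Table 3.

Distribution and frequency of recurrent detectable clonal mosaic events

| | 13q del (count) | 20q del (count) | 9p cnloh (count) | 14q cnloh (count) | Other (count) | Total (count) | 13q del (%) | 20q del (%) | 9p cnloh (%) | 14q cnloh (%) | Other (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Overall | 33 | 77 | 56 | 35 | 480 | 681 | 5 | 11 | 8 | 5 | 70 |
| Cancer free | 10 | 30 | 28 | 7 | 150 | 225 | 4 | 13 | 12 | 3 | 67 |
| Cancer DX | 23 | 47 | 28 | 28 | 330 | 456 | 5 | 10 | 6 | 6 | 72 |
| First non-hematologic cancer | | | | | | | | | | | |
| bladder | 2 | 4 | 2 | 7 | 35 | 50 | 4 | 8 | 4 | 14 | 70 |
| breast | 1 | 1 | 0 | 1 | 13 | 16 | 6 | 6 | | 6 | 81 |
| endometrium | 0 | 1 | 3 | 1 | 6 | 11 | | 9 | 27 | 9 | 55 |
| esophagus | 0 | 1 | 1 | 1 | 6 | 9 | | 11 | 11 | 11 | 67 |
| glioma | 0 | 2 | 2 | 0 | 6 | 10 | | 20 | 20 | | 60 |
| kidney | 1 | 3 | 0 | 4 | 20 | 28 | 4 | 11 | | 14 | 71 |
| lung | 9 | 14 | 10 | 7 | 90 | 130 | 7 | 11 | 8 | 5 | 69 |
| osteosarcoma | 0 | 0 | 1 | 0 | 6 | 7 | | | 14 | | 86 |
| ovary | 0 | 2 | 0 | 0 | 2 | 4 | | 50 | | | 50 |
| pancreas | 1 | 4 | 1 | 1 | 29 | 36 | 3 | 11 | 3 | 3 | 81 |
| prostate | 4 | 10 | 4 | 1 | 33 | 52 | 8 | 19 | 8 | 2 | 63 |
| stomach | 0 | 1 | 4 | 3 | 55 | 63 | | 2 | 6 | 5 | 87 |
| testis | 0 | 0 | 0 | 0 | 4 | 4 | | | | | 100 |
| other sites | 0 | 0 | 0 | 1 | 3 | 4 | | | | 25 | 75 |
| Any hematologic cancer | | | | | | | | | | | |
| leukemia | 5 | 3 | 0 | 1 | 20 | 29 | 17 | 10 | | 3 | 69 |
| lymphoma | 0 | 1 | 0 | 0 | 2 | 3 | | 33 | | | 67 |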

The strongest predictor of mosaic autosomal abnormalities was age at DNA collection. We examined the effect of aging on the frequency of mosaicism across all studies, which were predominantly individuals over the age of 50. The frequency of cancer-free individuals with detectable clonal mosaic events increased with age, from 0.23% for those under 50 to 1.91% (p=4.8×10−8) for those between the ages of 75 and 79, and with slightly higher frequencies for individuals with cancer (Figure 3). In the early onset cancers (under age 40), which constituted less than 5% of analyzed cases (e.g., testicular cancer and osteogenic sarcoma), we did not observe an increase in mosaic abnormalities. Further studies are needed to investigate the relationship between mosaic abnormalities and cancer in children and young adults, particularly because of the strong association between mosaicism and many developmental disorders. There was no apparent relationship between age at DNA collection and the number, size of mosaic events, or the proportion of abnormal cells (Supplementary Figures 1 and 2).

Figure 3. Frequency of detectable clonal mosaic events by age and cancer status.

Analysis excluded 1,000 individuals with unknown age at DNA collection. 95% confidence intervals are shown.

We regressed the presence of detectable clonal mosaicism in 26,136 cancer-free individuals on age at DNA collection (in 5 year intervals), sex (male versus female), DNA source (buccal cells versus blood), smoking (ever versus never) and admixture coefficients for African and East Asian ancestry in a logistic model to determine the additional factors that influenced frequency of detectable clonal mosaicism. The source of DNA was known for 87% of individuals, of whom 19% were derived from buccal cells and the remainder from blood. DNA source was not significantly associated with mosaicism (OR=0.83, 0.55-1.26 95% confidence interval (CI), p=0.39). By admixture analysis, 75% of subjects were determined to be of European ancestry, 9% of African ancestry and 16% of East Asian ancestry. Although power was limited, we observed that cancer-free individuals with African admixture were at a lower risk of being mosaic (OR=0.43, 0.20-0.92 95% CI, p=0.03), whereas no significant association was observed for those with East Asian admixture (OR=0.60, 0.32-1.15 95% CI, p=0.12). We did not observe an association between smoking (ever/never) and frequency of mosaic abnormalities (OR=1.04, 0.75-1.44 95% CI, p=0.81).
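
The odds ratios, confidence intervals and p-values reported for the logistic model are linked through the coefficient and its standard error on the log-odds scale. As a hedged illustration, the sketch below recovers an approximate two-sided Wald p-value from a reported odds ratio and 95% interval. It assumes the intervals are Wald intervals that are symmetric on the log scale, which the text does not state, and the function name `wald_p_from_or_ci` is hypothetical. Because the published values are rounded, the recovered p-values agree only approximately.

```python
import math

def wald_p_from_or_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Approximate log-odds coefficient, standard error and two-sided Wald p-value.

    Assumes a 95% Wald interval: exp(beta - z_crit * se) to exp(beta + z_crit * se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided tail probability of a standard normal variate.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value

# Odds ratios and intervals as reported in the paragraph above.
REPORTED = {
    "DNA source (buccal vs blood)": (0.83, 0.55, 1.26),
    "African admixture":            (0.43, 0.20, 0.92),
    "East Asian admixture":         (0.60, 0.32, 1.15),
    "smoking (ever vs never)":      (1.04, 0.75, 1.44),
}

if __name__ == "__main__":
    for label, (or_, lo, hi) in REPORTED.items():
        beta, se, p = wald_p_from_or_ci(or_, lo, hi)
        print(f"{label}: beta={beta:.3f} se={se:.3f} p~{p:.2f}")
```

A check of this kind is useful when reading summary tables, since an interval that excludes 1 should correspond to a p-value below 0.05 under the same assumptions.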

In 26,136 cancer-free controls and 23,093 cancer cases drawn from non-sex specific and non-hematological cancer sites (i.e. excluding 8,470 individuals with leukemia, lymphoma, multiple myeloma and cancers of the breast, endometrium, ovary, testis, and prostate), we observed a higher frequency of males with mosaic abnormalities than females. In cancer-free individuals, we observed mosaic events in 0.56% of females and 0.87% males (OR=1.35, 0.98-1.88 95% CI, p=0.07); for individuals with cancer we observed mosaic events in 0.79% of females and 1.21% of males (OR=1.48, 1.08-2.03 95% CI, p=0.015); and overall, 0.65% of females and 1.04% of males (OR=1.42, 1.14-1.80 95% CI, p=0.002) in logistic models adjusted for cancer diagnosis (if applicable), age at DNA collection, ancestry, DNA source and smoking. These differences could be due to a true sex-specific effect akin to sex-differential mutation and recombination rates19; however the complex and heterogeneous nature of the inclusion of individual studies and the differences in their entry and selection criteria could result in spurious associations. Although this observation was consistent across cancer types, it should be confirmed in additional studies better designed to address this question.

To determine the relationship between detectable mosaic autosomal abnormalities and non-hematological cancers, we regressed the presence of detectable clonal mosaicism on cancer diagnosis, age, sex, DNA source, smoking and ancestry in a logistic model. We observed a modest increase in cancer risk for mosaic individuals (OR=1.27, 1.05-1.52 95% CI, p=0.012) (Table 2 and Supplementary Table 2). Notable associations were observed in stratified analyses of lung (OR=1.56, 1.18-2.08 95% CI, p=0.002) and kidney (OR=1.98, 1.27-3.06 95% CI, p=0.002) cancers, both tobacco-associated malignancies. However, no cancer site-specific associations were observed for bladder, esophagus, stomach and pancreas cancers, which are also typically associated with tobacco use. There was no significant association in non-hematological cancer cases overall between smoking (ever/never) and frequency of mosaicism (OR=1.19, 0.92-1.54 95% CI, p=0.19) or when stratified by cancer site (results not shown).

In an analysis of the subset of 14,050 individuals with cancer for whom it was possible to determine that DNA was likely obtained before or at the time of diagnosis and prior to treatment with radiation or chemotherapy for a primary tumor (designated as “likely untreated”), we observed a stronger association between mosaic abnormalities and non-hematological cancer diagnosis (OR=1.45, 1.18-1.80 95% CI, p=0.0005). The associations for lung and kidney also increased in significance (Table 3). It is notable that the evidence for association with non-hematological cancer diminished in individuals who were potentially treated (OR=1.03, 0.81-1.30 95% CI, p=0.80). We had approached this analysis with the hypothesis that there could be an increased frequency in detectable clonal mosaicism in non-hematological cancers induced by chemotherapy or radiotherapy but were surprised to observe the frequency was reduced to virtually the same as in the cancer-free population. Although this attenuated effect could have many explanations (e.g., related to the diagnosis and treatment of a solid tumor leading to a decrease in populations of cells with mosaic alteration), we had a limited capacity to model and control for treatment-effects since many of the studies did not provide any treatment information or only provided incomplete, retrospective ascertainment of the specifics. Although many of the participating studies were prospectively ascertained cohorts, DNA collection often occurred after cancer diagnosis. Additional studies are needed in prospectively ascertained cohorts and longitudinal studies in which multiple DNA samples were collected prior to and after diagnosis in order to explore treatment and disease effects.

For the 43 individuals with hematological cancers for whom DNA was obtained at least a year prior to diagnosis, the frequency of detectable clonal mosaicism was 20% for myeloid leukemia and 22% for lymphocytic leukemia (predominantly chronic lymphocytic leukemia, Table 2) compared to 0.74% in 26,136 cancer-free controls (overall OR=35.4, 14.7-76.6 95% CI, Fisher exact p=3.8×10−11). Of the 8 mosaic individuals with leukemia for whom DNA samples were collected at least a year prior to diagnosis, 4 were diagnosed with chronic lymphocytic leukemia (CLL), of whom 2 had a mosaic deletion in a region of chromosome 13q14 previously described to be deleted in CLL20. DNA was obtained more than 5 years prior to diagnosis for 6 mosaic individuals, with the longest interval being 14 years, suggesting that detectable clonal mosaicism could be a marker of hematological cancer or its precursors, i.e., monoclonal B cell lymphocytosis (MBL) for CLL and myelodysplastic syndrome for acute myelogenous leukemia. Recent work shows that the majority of MBL have mono- or biallelic 13q14 abnormalities21. However, further studies will be needed, preferably with serial pre- and post-diagnosis sampling, to investigate the predictive nature of detectable clonal mosaicism, especially involving regions of chromosomes 13 and 20, with respect to leukemia risk20.

We further explored the 4 most recurrent altered regions (>20 events each), which also harbor well known cancer genes (as noted in the COSMIC22 and Mitelman databases: http://cgap.nci.nih.gov/Chromosomes/Mitelman); these were on chromosomes 9p (cnloh), 13q (del), 14 (cnloh) and 20q (del) (Table 3). Notably, the most recurrent mosaic events were observed in cancer-free individuals as well as across multiple solid tumors. We observed a comparable frequency in non-hematologic cancer cases and cancer-free controls for three of the regions, whereas the chromosome 14 cnloh abnormalities were more frequent in non-hematological cancer cases (OR=3.32, 1.42-9.00 95% CI, Fisher’s exact p=0.003), particularly in individuals with bladder or kidney cancer. Copy-neutral LOH in this region of chromosome 14 has been associated with increased susceptibility to sporadic cancers and harbors imprinted genes, such as the tumor suppressing non-coding RNA, Maternally expressed gene 3 (MEG3)8,23. The recurrent segmental deletion of 13q14 was observed in 5 leukemia cases, but also in 18 individuals with solid tumors (9 with lung cancer and 4 with prostate cancer), and in 10 cancer-free individuals. This region includes the tumor suppressor gene DLEU7 (Deleted In Leukemia 7) and related genes, DLEU1 and DLEU2, the latter harboring two microRNAs within one of its introns (miR-15a and miR-16-1)24-26. The retinoblastoma gene, RB1, was also included within a subset harboring a mosaic deletion of 13q14. It cannot be ruled out that these individuals have either undiagnosed CLL or MBL. The 20q- was seen in two individuals with myeloid leukemia, as has been described previously27, but also in cancer-free individuals and individuals with solid tumors.

The accuracy of our software methods to detect clonal mosaic abnormalities was previously addressed and we were able to validate 100% of 42 events in 34 individuals from the Spanish Bladder Cancer Study using confirmatory cytogenetic assays16 (Supplementary Figure 3). We have also performed a comparison of mosaic events in samples from the EAGLE and PLCO lung cancer studies which were independently analyzed as part of the Gene, Environment Association studies consortium (GENEVA) report on mosaic events28. A total of 83 mosaic events in individuals from the EAGLE and PLCO lung cancer studies were detected in common, 20 additional events of size less than 2 Mb and 8 events greater than 2 Mb were detected by GENEVA and not by our study, while we detected 20 additional events (size > 2 Mb) that were not detected by GENEVA. Although additional cytogenetic or molecular validation was not performed, neither method detected notable false-positive events based on manual review of the data. The concordance rate is 75% if considering events > 2 Mb (the cut-off for this analysis) or 63% if considering all events, both of which are considerably better than the 25-50% concordance rates observed across CNV detection methods29-31. Our method is more conservative in the size of events detected, while the GENEVA method is more conservative with respect to sample quality, but provides calls for smaller events when assay quality is sufficient. Better approaches are needed to characterize smaller size events accurately as either mosaic or constitutional and to estimate their frequency. Further improvements to data normalization, segmentation and event classification methods will also likely reduce false-negative rates.
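
The concordance rates quoted above can be reproduced from the event counts if concordance is taken as the number of events detected by both methods divided by the number detected by either method. That definition is an assumption inferred from the reported figures rather than a formula stated in the text, and the function name `concordance` is illustrative.

```python
def concordance(shared, only_first, only_second):
    """Events found by both methods as a fraction of events found by either."""
    return shared / (shared + only_first + only_second)

SHARED = 83                 # events detected by both analyses
GENEVA_ONLY_SMALL = 20      # GENEVA only, size < 2 Mb
GENEVA_ONLY_LARGE = 8       # GENEVA only, size > 2 Mb
THIS_STUDY_ONLY = 20        # this study only, size > 2 Mb

# Restricting to events above the 2 Mb cut-off used in this analysis.
large_events = concordance(SHARED, GENEVA_ONLY_LARGE, THIS_STUDY_ONLY)

# Counting every event reported by either method.
all_events = concordance(SHARED, GENEVA_ONLY_SMALL + GENEVA_ONLY_LARGE, THIS_STUDY_ONLY)

if __name__ == "__main__":
    print(f"> 2 Mb: {large_events:.1%}")
    print(f"all events: {all_events:.1%}")
```

With these counts the denominators are 111 events above 2 Mb and 131 events overall, which round to the 75% and 63% given in the text.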

Discussion

Our study has important implications for the design and analysis of molecular epidemiology studies in cancer as well as the somatic characterization of cancer genomes, like The Cancer Genome Atlas32 and International Cancer Genome Consortium33. Investigators will need to carefully analyze samples used as exemplars of germline DNA for somatic alterations, such as detectable clonal mosaicism. Otherwise, comparisons between “germline” and tumor DNA may result in implausible somatic changes (e.g. large gains of heterozygosity) and it may be impossible to determine whether somatic events pre-date changes secondary to driver mutations. Because the detection of mosaic events with next-generation sequencing technologies is neither routine nor well understood, for the near future it may be prudent to continue to use SNP microarrays for such analyses. Due to the increased frequency of detectable clonal mosaicism with age, this will be particularly important for the analysis of epithelial cancers, which characteristically occur in the older population. For future large-scale GWAS in prospective studies, it may be wise to consider analyzing the earliest, pre-diagnosis DNA samples and to consider time from collection to diagnosis in the analysis of longitudinally collected biospecimens.

We have extended our initial observation that detectable clonal mosaicism of the autosomes is present in the population with surprising frequency and particularly in the aging genome. A recent study of detectable clonal mosaicism in twins reported an increase in frequency with age and suggested that this increase could lead to a less diverse blood cell population and immune system15. These emerging data raise a number of critical issues regarding the mechanisms underlying the possible shift in the repertoire of clones with large structural abnormalities. Cells with abnormal karyotypes could have an early developmental origin, in which a somatic event in a single stem cell progenitor during embryogenesis becomes apparent when cellular diversity decreases with age and cell populations become increasingly oligoclonal. Higher rates of detectable clonal mosaicism in older cancer-free individuals could also be due to increased rates of somatic mutation or diminished capacity for genomic maintenance, such as with telomere attrition34 leading to proliferation of somatically altered cell populations. A survival bottleneck of cellular progenitors could also lead to observable mosaic alterations that were previously below the threshold of detection but subsequently expanded due to positive selection. Further work is required to begin to unravel the underlying mechanisms that result in mosaic abnormalities, particularly as they relate to how and when altered clones are created, tissue-specificity, and the timing and expansion of distinct populations of cells with age. Finally, these findings underscore the importance of considering the role and time-dependent nature of somatic events in the etiology of cancer as well as other late-onset diseases.

Acknowledgments

This research was supported by the Intramural Research Program and by contract number HHSN261200800001E of the USA National Institutes of Health, National Cancer Institute. Support for each contributing study is listed in the Supplementary Acknowledgement Section. We thank Cathy Laurie and Bruce Weir for constructive discussion and a comparison of methodology and results for the GENEVA study. The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the National Cancer Institute, the National Institute for Occupational Safety and Health, or the Maryland Cancer Registry.

Footnotes

Author Contributions

K.B.J., M.Y., WY.Z., Z.W., X.D., C.L., S.W., N.E.C., M.T., N.R., and S.J.C. designed the study.

K.B.J., M.Y., L.P.-J., WY.Z., Z.W., S.W., N.E.C., N.R., M.G.C., M.C.D., D.A., B.I.G., R.N.H., F.X.R., and S.J.C. interpreted the primary results.

K.B.J., M.Y., L.P.-J., B.R.-S., and J.R.G. developed the study methods.

K.B.J., M.Y., L.P.-J., WY.Z., Z.W., X.D., C.L., M.G.C., C.G.E., M.C.D., N.C., J.S., and C.C.C. analyzed the data.

K.B.J., M.Y., WY.Z., Z.W., X.D., C.L., A.H., L.B., and J.K. were responsible for production and analysis of the genotype data.

K.B.J., M.Y., and S.W. performed statistical analysis.

K.B.J., M.Y., S.W., M.-J.H. and S.J.C. drafted the manuscript.

M.T., R.N.H., S.J.C. and J.F.F. provided vital programmatic and institutional support.

J.R.G., N.E.C., M.T., N.R., S.J.C., S.M.G., V.L.S., L.T.T., M.M.G., D.A., S.J.W., J.V., P.R.T., N.D.F., C.C.A., A.M.G., N.H., K.Y., J-M.Y., L.L., T.D., Y-L.Q., Y-T.G., W-P.K., Y-B.X., Z-Z.T., J-H.F., M.C.A., C.A., W.J.B., C.H.B., E.M.G., C.C.H., C.A.H., B.E.H., L.N.K., L.L.M., L.H.M., B.A.R., A.G.S., L.B.S., M.R.S., J.K.W., M.W., X.W., K.A.Z., R.G.Z., J.D.F., M.G-C., N.M., G.M., L.P-O., D.B., M.S., A.J., M.T.L., L.G., D.C., P.A.B., M.R., P.R., U.A., L.E.B.F., C.D.B., J.E.B., M.A.B., T.C., M.F., A.A., J.M.G., G.G.G., G.H., S.E.H., P.H., R.H., P.D.I., C.J., A.L., R.M-C., D.S.M., B.S.M., U.P., A.M.R., H.D.S., G.S., X-O.S., K.V., E.W., A.W., A.Z-J., W.Z., D.T.S., M.K., O.V., D.L., E.J.D., H.A.R., S.H.O., C.K., B.M.W., L.J., M.H., W.W., A.A.A., H.B.B-d-M., C.S.F., S.G., M.D.G., E.A.H., A.P.K., A.LC., M.T.M., G.P., M-C.B-R., P.M.B., F.C., K.C., M.C., E.L.G., M.G., J.A.H.B., M.J., K-T.K., V.K., R.C.K., R.R.M., J.B.M., K.G.R., E.R., A.T., G.S.T., D.T., J.W.E., H.Y., L.A., R.Z.S-S., P.K., F.S., D.S., S.A.S., L.M., I.L.A., J.S.W., A.P.G., L.S., D.A.B., R.G.G., M.P., WH.C., L.E.M., K.L.S., F.G.D., A.W.H., S.I.B., A.B., N.W., L.A.B., J.L., B.P., K.A.M., M.B.C., B.I.G., C.P.K., M.H.G., R.L.E., D.J.H., G.T., R.N.H., F.X.R., and J.F.F. contributed data or samples.

All authors contributed critical feedback, review, and approval of the manuscript.

Disclosures: B.R.-S. and O.V. are currently employees of qGenomics, and L.A.P.-J. is a member of its scientific advisory board.

References

  • 1. Youssoufian H, Pyeritz RE. Mechanisms and consequences of somatic mosaicism in humans. Nat Rev Genet. 2002;3:748–58. doi: 10.1038/nrg906.
  • 2. Notini AJ, Craig JM, White SJ. Copy number variation and mosaicism. Cytogenet Genome Res. 2008;123:270–7. doi: 10.1159/000184717.
  • 3. Hsu LY, et al. Proposed guidelines for diagnosis of chromosome mosaicism in amniocytes based on data derived from chromosome mosaicism and pseudomosaicism studies. Prenat Diagn. 1992;12:555–73. doi: 10.1002/pd.1970120702.
  • 4. Menten B, et al. Emerging patterns of cryptic chromosomal imbalance in patients with idiopathic mental retardation and multiple congenital anomalies: a new series of 140 patients and review of published reports. J Med Genet. 2006;43:625–33. doi: 10.1136/jmg.2005.039453.
  • 5. Lu XY, et al. Genomic imbalances in neonates with birth defects: high detection rates by using chromosomal microarray analysis. Pediatrics. 2008;122:1310–8. doi: 10.1542/peds.2008-0297.
  • 6. Conlin LK, et al. Mechanisms of mosaicism, chimerism and uniparental disomy identified by single nucleotide polymorphism array analysis. Hum Mol Genet. 2010;19:1263–75. doi: 10.1093/hmg/ddq003.
  • 7. Heim S, Mitelman F. Nonrandom chromosome abnormalities in cancer - an overview. In: Mitelman F, Heim S, editors. Cancer Cytogenetics. John Wiley & Sons, Inc.; Hoboken, NJ: 2009.
  • 8. Tuna M, Knuutila S, Mills GB. Uniparental disomy in cancer. Trends Mol Med. 2009;15:120–8. doi: 10.1016/j.molmed.2009.01.005.
  • 9. Solomon DA, et al. Mutational inactivation of STAG2 causes aneuploidy in human cancer. Science. 2011;333:1039–43. doi: 10.1126/science.1203619.
  • 10. Rio Frio T, et al. Homozygous BUB1B mutation and susceptibility to gastrointestinal neoplasia. N Engl J Med. 2010;363:2628–37. doi: 10.1056/NEJMoa1006565.
  • 11. Snape K, et al. Mutations in CEP57 cause mosaic variegated aneuploidy syndrome. Nat Genet. 2011;43:527–9. doi: 10.1038/ng.822.
  • 12. Amary MF, et al. Ollier disease and Maffucci syndrome are caused by somatic mosaic mutations of IDH1 and IDH2. Nat Genet. 2011. doi: 10.1038/ng.994.
  • 13. Pansuriya TC, et al. Somatic mosaic IDH1 and IDH2 mutations are associated with enchondroma and spindle cell hemangioma in Ollier disease and Maffucci syndrome. Nat Genet. 2011. doi: 10.1038/ng.1004.
  • 14. Hafner C, T.A., Real FX. HRAS mutation mosaicism causing urothelial cancer and epidermal nevus. N Engl J Med. 2011;365:1940–2. doi: 10.1056/NEJMc1109381.
  • 15. Forsberg LA, et al. Age-related somatic structural changes in the nuclear genome of human blood cells. Am J Hum Genet. 2012;90:217–28. doi: 10.1016/j.ajhg.2011.12.009.
  • 16. Rodriguez-Santiago B, et al. Mosaic uniparental disomies and aneuploidies as large structural variants of the human genome. Am J Hum Genet. 2010;87:129–38. doi: 10.1016/j.ajhg.2010.06.002.
  • 17. Gonzalez JR, et al. A fast and accurate method to detect allelic genomic imbalances underlying mosaic rearrangements using SNP array data. BMC Bioinformatics. 2011;12:166. doi: 10.1186/1471-2105-12-166.
  • 18. Pique-Regi R, Caceres A, Gonzalez JR. R-Gada: a fast and flexible pipeline for copy number analysis in association studies. BMC Bioinformatics. 2010;11:380. doi: 10.1186/1471-2105-11-380.
  • 19. Hedrick PW. Sex: differences in mutation, recombination, selection, gene flow, and genetic drift. Evolution. 2007;61:2750–71. doi: 10.1111/j.1558-5646.2007.00250.x.
  • 20. Dohner H, et al. Genomic aberrations and survival in chronic lymphocytic leukemia. N Engl J Med. 2000;343:1910–6. doi: 10.1056/NEJM200012283432602.
  • 21. Lanasa MC, et al. Immunophenotypic and gene expression analysis of monoclonal B-cell lymphocytosis shows biologic characteristics associated with good prognosis CLL. Leukemia. 2011;25:1459–66. doi: 10.1038/leu.2011.117.
  • 22. Forbes SA, et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 2011;39:D945–50. doi: 10.1093/nar/gkq929.
  • 23. Benetatos L, Vartholomatos G, Hatzimichael E. MEG3 imprinted gene contribution in tumorigenesis. Int J Cancer. 2011;129:773–9. doi: 10.1002/ijc.26052.
  • 24. Lee S, et al. Forerunner genes contiguous to RB1 contribute to the development of in situ neoplasia. Proc Natl Acad Sci U S A. 2007;104:13732–7. doi: 10.1073/pnas.0701771104.
  • 25. Migliazza A, et al. Nucleotide sequence, transcription map, and mutation analysis of the 13q14 chromosomal region deleted in B-cell chronic lymphocytic leukemia. Blood. 2001;97:2098–104. doi: 10.1182/blood.v97.7.2098.
  • 26. Pekarsky Y, Zanesi N, Croce CM. Molecular basis of CLL. Semin Cancer Biol. 2010;20:370–6. doi: 10.1016/j.semcancer.2010.09.003.
  • 27. Gurvich N, et al. L3MBTL1 polycomb protein, a candidate tumor suppressor in del(20q12) myeloid disorders, is essential for genome stability. Proc Natl Acad Sci U S A. 2010;107:22552–7. doi: 10.1073/pnas.1017092108.
  • 28. Laurie CC, Laurie CA, Kenneth R, Doheny KF. Somatic mosaicism for large chromosomal anomalies from birth to old age and its relationship to cancer. submitted to Nature Genetics. 2011.
  • 29. Pinto D, et al. Comprehensive assessment of array-based platforms and calling algorithms for detection of copy number variants. Nat Biotechnol. 2011;29:512–20. doi: 10.1038/nbt.1852.
  • 30. Dellinger AE, et al. Comparative analyses of seven algorithms for copy number variant identification from single nucleotide polymorphism arrays. Nucleic Acids Res. 2010;38:e105. doi: 10.1093/nar/gkq040.
  • 31. Marenne G, et al. Assessment of copy number variation using the Illumina Infinium 1M SNP-array: a comparison of methodological approaches in the Spanish Bladder Cancer/EPICURO study. Hum Mutat. 2011;32:240–8. doi: 10.1002/humu.21398.
  • 32. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455:1061–8. doi: 10.1038/nature07385.
  • 33. Hudson TJ, et al. International network of cancer genome projects. Nature. 2010;464:993–8. doi: 10.1038/nature08987.
  • 34. Sahin E, Depinho RA. Linking functional decline of telomeres, mitochondria and stem cells during ageing. Nature. 2010;464:520–8. doi: 10.1038/nature08982.
  • 35. Petersen GM, et al. A genome-wide association study identifies pancreatic cancer susceptibility loci on chromosomes 13q22.1, 1q32.1 and 5p15.33. Nat Genet. 2010;42:224–228. doi: 10.1038/ng.522.
  • 36. Staaf J, et al. Normalization of Illumina Infinium whole-genome SNP data improves copy number estimates and allelic intensity ratios. BMC Bioinformatics. 2008;9:409. doi: 10.1186/1471-2105-9-409.
  • 37. Diskin SJ, et al. Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms. Nucleic Acids Res. 2008;36:e126. doi: 10.1093/nar/gkn556.
  • 38. Peiffer DA, et al. High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Res. 2006;16:1136–1148. doi: 10.1101/gr.5402306.
  • 39. Itsara A, et al. Population analysis of large copy number variants and hotspots of human genetic disease. Am J Hum Genet. 2009;84:148–161. doi: 10.1016/j.ajhg.2008.12.014.
  • 40. Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control. 1974;19:716–723.
  • 41. Krzywinski MI, et al. Circos: An information aesthetic for comparative genomics. Genome Res. 2009. doi: 10.1101/gr.092759.109.
  • 42. Agresti A, Coull BA. Approximate is better than "exact" for interval estimation of binomial proportions. Am Stat. 1998;52:119–126.
  • 43. Frazer KA, et al. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–861. doi: 10.1038/nature06258.
