Copy number variants (CNVs) are widely distributed throughout the human genome, where they contribute to genetic variation and phenotypic diversity. De novo CNVs are also a major cause of numerous genetic and developmental disorders. However, unlike many other types of mutations, little is known about the genetic and environmental risk factors for new and deleterious CNVs. DNA replication errors have been implicated in the generation of a major class of CNVs, the nonrecurrent CNVs. We have found that agents that perturb normal replication and create conditions of replication stress, including hydroxyurea and aphidicolin, are potent inducers of nonrecurrent CNVs in cultured human cells. These findings have broad implications for identifying CNV risk factors and for hydroxyurea-related therapies in humans.
In recent years, copy number variants (CNVs), defined as deletions or duplications of 50 bp to over a megabase, have been found to be widely distributed throughout the human genome [1-7]. The discovery of CNVs is tied to the advent of new genomic technologies that have enabled high-resolution analysis, including oligonucleotide microarrays and next generation sequencing approaches. With over 25,000 polymorphic CNVs, including nearly 1000 large CNVs greater than 50 kb now described in normal individuals [8], it is clear that human genetic variation is profoundly influenced by large-scale structural changes. It is also clear that many CNVs have deleterious consequences. Spontaneous or de novo CNVs are an important and frequent cause of genetic and developmental disorders, including severe intellectual disability, autism, schizophrenia, heart defects and many others [9-13], and they arise frequently in cancer cells. The frequency at which they arise suggests a high de novo mutation rate.
Despite their importance, there is limited understanding of the mechanisms by which many CNVs arise, and little knowledge of the risk factors involved. As with all mutation classes, the risk for new and deleterious CNVs will certainly be increased by exposure to precipitating environmental mutagens as well as by inherited genetic predisposition. A key to predicting and identifying these factors is a clear understanding of the underlying mechanisms by which CNVs are formed. At least two distinct pathways are involved in the formation of most disease-associated CNVs: unequal meiotic recombination and replication errors. We have found that agents that perturb replication induce a high frequency of CNVs in normal human cells that resemble non-recurrent CNVs in humans in all aspects [14-16]. These agents include the polymerase inhibitor aphidicolin and the ribonucleotide reductase inhibitor hydroxyurea, which is commonly used in the treatment of sickle cell disease and other disorders. These data provide experimental support for replication error models for the origins of CNVs and further suggest that many agents or conditions that lead to replication stress have the potential to induce deleterious CNVs.
Classes of CNVs
As with all mutation types, the risk for new and deleterious CNVs will undoubtedly be increased by inherited genetic predisposition and by exposure to precipitating environmental mutagens. Understanding the mechanisms involved in their formation is key to defining genetic and environmental risk factors for new and deleterious CNV mutations. However, we know little about the molecular mechanisms involved in the formation of this important class of CNVs. Most human CNV research to date has focused on cataloguing their occurrence and association with various disease states [17-20], with few experimental studies aimed at defining molecular mechanisms of formation. Mechanisms giving rise to CNVs have therefore largely been inferred from the observed CNV breakpoint junction sequences of normal and disease-associated CNVs and from the genetic architecture in the vicinity of breakpoints. In addition to the large class of smaller CNVs created by retrotransposition events or VNTR rearrangements, this approach has revealed two major categories of both polymorphic and de novo, pathogenic CNVs with distinctly different structures and cellular origins, frequently termed “recurrent” and “non-recurrent” CNVs, respectively.
Recurrent CNVs
Approximately 20-40% of normal polymorphic CNVs and many de novo, disease-related CNVs show recurrent breakpoints in low-copy repeats or segmental duplications [3,5,6,8,17,19]. As one would expect, hotspots for these CNVs exist in regions containing large segmental duplications. These variants include a growing number of recurrent CNVs associated with distinct clinical phenotypes, such as those identified on chromosomes 16p11.2 and 17q11.2 in individuals with neurological disorders, including severe intellectual disability, autism and schizophrenia [21-24].
Non-recurrent CNVs
These CNVs have unique breakpoints that are not dependent on segmental duplications. In most cases, they are characterized by microhomologies with a smaller number having blunt ends or short insertions at the breakpoint junctions. The majority of normal CNVs and a large percentage of pathogenic CNVs fall into this class [19]. Many of these CNVs are unique, though overlapping CNVs with widely variable breakpoints can be clustered in regions. These CNV-prone regions are often ascertained by disease association, such as the MBD5 gene region, which harbors deletions in humans with autism and other neurological abnormalities [25]. Most non-recurrent CNVs are simple deletions or tandem duplications, but some are more complex and are interrupted by normal sequences or inversions and can contain both deleted and duplicated segments within the same interval. Some non-recurrent CNVs are highly complex, with dozens of events clustered in a single genomic region [26], similar to a phenomenon termed chromothripsis (for “chromosome shattering”), recently described in cancer cell genomes [27]. It is likely that the observed incidence of these complex events is currently underestimated because of the difficulty in obtaining accurate sequence data at the breakpoints of such events.
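The three junction types described above (microhomology, blunt end, short insertion) can be illustrated with a minimal sketch for a simple deletion. It assumes the reference sequence and the 0-based, half-open deletion coordinates are already known. The function names `microhomology_length` and `classify_junction` are hypothetical and are not taken from the cited studies. Microhomology is counted as the number of bases by which the breakpoint could be shifted without changing the resulting sequence.

```python
def microhomology_length(ref: str, start: int, end: int) -> int:
    """Count bases of identical sequence shared by the two breakpoints of a
    deletion that removes ref[start:end] (0-based, half-open coordinates).

    Assumption: comparison is case-sensitive and ignores ambiguity codes.
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("need 0 <= start < end <= len(ref)")
    span = end - start

    # Bases at the start of the deleted segment that match the sequence
    # immediately after the deletion (breakpoint could shift to the right).
    fwd = 0
    while fwd < span and end + fwd < len(ref) and ref[start + fwd] == ref[end + fwd]:
        fwd += 1

    # Bases just before the deletion that match the end of the deleted
    # segment (breakpoint could shift to the left). Total is capped at span.
    back = 0
    while (back < span - fwd and start - back - 1 >= 0
           and ref[start - back - 1] == ref[end - back - 1]):
        back += 1

    return fwd + back


def classify_junction(ref: str, start: int, end: int, inserted: str = "") -> str:
    """Label a deletion junction as 'insertion', 'microhomology' or 'blunt'."""
    if inserted:
        # Non-templated bases found between the two joined ends.
        return "insertion"
    return "microhomology" if microhomology_length(ref, start, end) > 0 else "blunt"
```

For example, deleting positions 4 to 11 of the toy sequence `TTGAACGCCCCACGTT` leaves the trinucleotide `ACG` shared between the two breakpoints, so the junction would be labeled as a microhomology. Real breakpoint analysis would also need to handle duplications, inversions and complex events, which this sketch does not attempt.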
Mechanisms of CNV formation
Recurrent CNVs with common breakpoints are deletions and reciprocal duplications that are thought to arise by meiotic unequal or non-allelic homologous recombination (NAHR), typically mediated by misalignment of large flanking segmental duplications or repeated sequences. These CNVs therefore arise in the same manner as was first elegantly described two decades ago for human microdeletion/microduplication syndromes such as Charcot-Marie-Tooth and Prader-Willi syndromes [28], and recurrent CNV disorders can therefore be considered an extension of this class of syndromes.
Less is known about the origins and molecular mechanisms leading to non-recurrent CNVs. Interestingly, data from breakpoint sequences and experimental systems all point to a mitotic, rather than meiotic, cell origin. Errors in DNA replication, rather than meiotic homologous recombination, are predicted to give rise to the observed breakpoint junctions (Figure 1). In addition, inhibitors of replication that create conditions of “replication stress” can induce similar CNVs experimentally [14-16], as discussed in detail below. The simplest model that is consistent with both breakpoint sequence data and induction by replication stress is one in which stalled replication forks are restored aberrantly. The molecular mechanisms involved thus undoubtedly include pathways such as cell cycle checkpoints, which restore normal replication and prevent CNV formation, as well as DNA replication and repair factors that create the CNV lesion. A number of pathways have been suggested for the latter. These possible mechanisms include simple rejoining of two or more double strand breaks by nonhomologous end-joining (NHEJ) or the related microhomology-mediated end-joining (MMEJ) pathway, and mechanisms involving template switching events [14,15,18,29-31]. Most notable of the latter are the models of “Fork Stalling and Template Switching” (FoSTeS) of Lee et al. [30] and a modification termed microhomology-mediated break-induced replication (MMBIR) proposed by Hastings et al. [29]. These models are based on template switching mechanisms proposed for stress-induced amplifications in E. coli [32] and at sites of stalled replication in yeast [33-35] and BIR events principally described in yeast [36]. In the FoSTeS model, replication forks encountering low-copy repeats or areas that are difficult to replicate are prone to stalling, leading to a switch to another active fork to bypass the DNA lesion or to resume replication.
The MMBIR model invokes template switching repair of single-sided DSBs formed at collapsed replication forks into regions of microhomology rather than the longer homology typically observed at BIR-mediated events [36]. These models provide an appealing explanation of the molecular mechanisms involved in non-recurrent CNV formation. However, most of the data to support these models come from observations of breakpoint junction sequences of CNVs found in normal genomes and arising in patients. There is a clear need for rigorous experimental testing of the possible models explaining CNV formation.
Figure 1.
Possible mechanisms involved in the prevention and formation of nonrecurrent CNVs. Exposure of cells to conditions that cause replication stress will result in stalled replication, which in turn might lead to fork collapse and result in a single-ended double strand break (Top). Either of these structures will activate a number of DNA damage checkpoint and repair pathways, which should faithfully restore replication at the site of the stalled or collapsed fork (Left). These checkpoint and repair pathways serve to protect the integrity of the genome, preventing the formation of CNVs and other structural variants. However, stalled replication or collapsed replication forks could also lead to restart or repair via alternate pathways. A stalled fork may be inaccurately restarted at a distant site, using a template switching or MMBIR pathway, giving rise to a CNV. Long range end-joining of two distant DNA breaks could also lead to deletions of large amounts of intervening sequence, resulting in a CNV (Right). It is expected that mutations that inhibit a cell’s ability to properly respond to a stalled or collapsed fork will result in an increased CNV frequency.
Replication stress induces CNVs
Our laboratory has published direct experimental evidence that aberrant replication can induce a high frequency of CNVs in cultured mammalian somatic cells. We found that inhibiting replication with the DNA polymerase inhibitor aphidicolin (APH) induces a high frequency of de novo CNVs that mimic non-recurrent human CNVs in size, distribution and breakpoint structures [14,16]. This finding arose from studies of aberrations induced by replication stress at chromosome fragile sites. Treatment of cells with low doses of APH is highly effective in inducing expression of common fragile sites. We found that treatment of a human chromosome 3 somatic cell hybrid cell line with doses of APH used to induce fragile sites gave rise to a high frequency of CNV-like deletions of tens to hundreds of kb in the FRA3B fragile site that mimic those frequently found in tumor cells [16]. The breakpoints of these APH-induced CNVs all showed microhomologies, blunt ends or short insertions, which were concurrently being found in human non-recurrent CNVs. When this approach was applied to normal human fibroblasts, we found that APH induced CNVs across the human genome that resembled human non-recurrent CNVs in size, structure and breakpoint sequences. These CNVs were distributed throughout the genome with most (81%) found in regions containing genes [14].
These results strongly suggested that replication fork stalling is mechanistically responsible for these CNVs and, furthermore, that any agent that leads to replication stress could be a risk factor for their induction. To begin to test this, we performed a series of experiments using the mechanistically distinct and clinically relevant replication inhibitor hydroxyurea (HU). HU leads to replication stress via inhibition of ribonucleotide reductase and perturbation of nucleotide pools [37], resulting in stalled replication and DNA double strand breaks. In addition to its well-studied properties as a replication inhibitor, HU is an important drug, especially for treatment of sickle-cell disease. Chronic HU treatment leads to increased expression of fetal hemoglobin, possibly through direct stimulation of cellular nitric oxide and cGMP signaling in erythroid progenitors [37-41], which results in reduced erythrocyte sickling and amelioration of disease severity [42]. As a result, HU-treated patients have fewer vaso-occlusive events and require fewer transfusions and hospitalizations [42]. HU is therefore an effective drug for long-term management of sickle cell disease, and it is also effective for a number of other disorders, including certain cancers, myeloproliferative disorders, thalassemias and HIV infection. Thus, in addition to allowing a test of our replication stress hypothesis, HU is an important drug for many thousands of individuals.
These experiments showed that HU, at doses equivalent to the peak serum levels achieved in sickle cell patients, is also a potent inducer of CNVs in cultured normal human fibroblasts [15]. The sizes, structures and breakpoint junction sequences of HU-induced CNVs were consistent with APH-induced CNVs [14,16,43] and the non-recurrent class of normal and pathogenic CNVs [5,8,17-19,30,44-46]. It is notable that the sizes of CNVs induced by APH and HU are the same as those that arise spontaneously during culture, indicating that exogenous replication stress is not inducing a new type of event, but rather increasing the incidence of events that occur at a measurable frequency during normal cellular growth. While replication stress induced CNVs throughout the genome in these experiments, there were also hotspots where distinct overlapping CNVs were found. These include hotspots at 3q13.31 near the LSAMP gene, a deletion hotspot in cancers and cancer cell lines [47-49], at 16q23.3 in the WWOX locus, and at 7q11.2 spanning AUTS2, a gene deleted in some cases of autism and other neurological disorders [50-52]. These hotspots coincide with the location of chromosomal fragile sites [15,53,54], strongly suggesting a mechanistic link between the events leading to fragile site expression and these CNVs. Notably, the most significant CNV hotspot we observed in human fibroblasts, at 3q13.31, corresponds to a fragile site that was recently shown by Le Tallec et al. [55] to be highly expressed in fibroblasts, but not lymphoblasts, in a manner that correlated with reduced levels of replication origin firing in this region and cell type.
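The notion of a hotspot used above, a region in which several distinct CNVs overlap, can be sketched as a simple interval-merging pass. This is an illustrative assumption about how such regions might be flagged, not the procedure used in the cited experiments. The function name `find_hotspots` and the `min_events` threshold are hypothetical, and CNVs are assumed to be given as (chromosome, start, end) tuples with half-open coordinates.

```python
from collections import defaultdict


def find_hotspots(cnvs, min_events=3):
    """Group overlapping CNV intervals and report groups with enough events.

    cnvs: iterable of (chrom, start, end) tuples, half-open coordinates.
    Returns a list of (chrom, region_start, region_end, n_events).

    Note: intervals are chained, so two CNVs in the same group need not
    overlap each other directly, only through intermediate CNVs.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in cnvs:
        by_chrom[chrom].append((start, end))

    hotspots = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        count = 1
        for start, end in intervals[1:]:
            if start < cur_end:
                # Overlaps the current group: extend it and count the event.
                cur_end = max(cur_end, end)
                count += 1
            else:
                # Gap found: close the current group and start a new one.
                if count >= min_events:
                    hotspots.append((chrom, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
        if count >= min_events:
            hotspots.append((chrom, cur_start, cur_end, count))
    return hotspots
```

A threshold on raw counts is a deliberate simplification. A real analysis would likely need to account for region size, probe or read coverage, and the number of independent clones examined before calling a region a hotspot.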
HU is now the second agent experimentally shown to induce CNVs. Although HU and APH impair DNA replication via different mechanisms, both agents induce CNVs with similar frequencies and size distributions that are identical to many normal and pathogenic CNVs. These results strongly support a common mechanism mediated by replication stress for the formation of the non-recurrent class of CNVs found in vivo and those induced experimentally. They also suggest that any agent or condition that leads to replication stress has the potential to induce deleterious, de novo CNVs. They also have direct implications for HU therapy. HU is FDA-approved for the treatment of sickle cell disease and has clear benefits for the treatment of sickle cell patients, as well as those with some types of cancer, myeloproliferative disorders, thalassemias and HIV infection [56]. While HU is well-tolerated and has low toxicity in patients, reproductive studies are limited and the long-term effects of HU on the genomes of subsequent generations have not been evaluated. The observation that HU induces CNVs in cultured cells, at concentrations equivalent to the peak serum levels achieved in sickle cell patients [57,58] and does so in one or two cell divisions, strongly suggests that further studies are necessary. In particular, the intergenerational, germline effects of HU and other replication inhibitors should be determined to directly test the replication stress hypothesis for CNV formation in vivo and to further assess the potential risk for submicroscopic genomic structural changes in the genomes of HU-treated patients and their future generations.
Risk Factors for CNV formation
In contrast to other types of mutations that have been studied for decades, our current knowledge of the mechanisms involved in CNV formation allows us only to begin to identify potential genetic and environmental risk factors (Table 1). For recurrent CNVs formed by NAHR during meiosis I, the greatest risk factor thus far identified is variation in genomic architecture, including the orientation and size of segmental duplications. Such structural polymorphisms impact the likelihood that NAHR will create a de novo CNV in these regions [59,60]. There does not appear to be a parental origin bias for recurrent CNVs [61], and the importance of variation in genes involved in meiotic recombination is unknown. In addition, nothing is known about environmental factors that could influence meiotic NAHR and thereby lead to CNVs.
For non-recurrent CNVs, the mitotic cell origin hypothesis has important implications for the genetic and environmental factors involved in their formation. For example, males complete ongoing mitotic divisions leading to mature germ cells throughout adulthood while females do so during fetal development. We thus predicted a male sex bias in risk for de novo, non-recurrent CNVs, coupled with a possible age effect [14,15]. The recent studies of Hehir-Kwa et al. [62] and Sibbons et al. [61] support this prediction. These two groups determined the parent of origin of rare CNVs associated with intellectual disability and found that the majority of non-recurrent CNVs, or CNVs not mediated by segmental duplications, originated on the paternal allele. In addition, agents that perturb replication may be a factor in producing CNVs in the maternal grandchildren of females exposed during pregnancy. The mitotic origin hypothesis also predicts that CNVs will arise frequently in post-zygotic somatic cells, leading to somatic mosaicism within or between tissues. Indeed, substantial evidence exists for somatic mosaicism of pathogenic CNVs, such as in the NF1 and DMD genes [44,63,64], and for apparently benign CNVs in identical twins [65], and different tissues within individuals [66,67].
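A predicted paternal bias of this kind can be checked against parent-of-origin counts with a one-sided exact binomial test. The sketch below assumes a null hypothesis of equal paternal and maternal origin. The function name `paternal_bias_p_value` is hypothetical, and this is not necessarily the statistical approach used in the cited studies.

```python
from math import comb


def paternal_bias_p_value(n_paternal: int, n_total: int) -> float:
    """One-sided exact binomial p-value for an excess of paternal-origin CNVs.

    Null hypothesis (an assumption of this sketch): each de novo CNV is
    equally likely to arise on the paternal or the maternal allele.
    Returns P(X >= n_paternal) for X ~ Binomial(n_total, 0.5).
    """
    if not 0 <= n_paternal <= n_total:
        raise ValueError("need 0 <= n_paternal <= n_total")
    # Sum the upper tail of the binomial distribution with p = 0.5.
    tail = sum(comb(n_total, k) for k in range(n_paternal, n_total + 1))
    return tail / 2 ** n_total
```

The function takes only the number of CNVs assigned to the paternal allele and the total number with a resolved parent of origin. No counts from the cited studies are reproduced here.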
Because the precise molecular mechanisms involved in producing non-recurrent CNVs are not well understood, it is more difficult to predict genetic risk factors other than parental origin. If NHEJ is mechanistically involved, the genetic factors are well known and variation in those genes could be tested. However, the genetic factors involved in MMEJ and template switching in mammalian cells have not been identified. Thus, tests for the influence of variation in these genes are not currently possible. We can speculate that variation in genes in DNA damage checkpoint pathways that respond to replication stress to prevent CNV formation can potentially influence non-recurrent CNV risk. Based on our current knowledge, we can also predict from our findings in a cell culture model that agents or conditions leading to replication stress are good candidates for risk factors for inducing non-recurrent CNVs in both the germline and somatic cells in vivo. Nevertheless, aside from a handful of well-characterized laboratory reagents, we currently have a very poor understanding of what these agents are and of the scope of the risk for de novo CNVs resulting from environmental agents. Unlike the situation for direct DNA damaging agents, comprehensive genomic studies defining agents that cause replication stress and their effects are lacking. These results demonstrate the importance of identifying and studying the effects of such agents on our genomes, both in experimental models and directly in human populations, in order to better understand the risks of replication stress for de novo CNVs.
Conclusions and Perspectives
The development and implementation of high resolution, genome-wide analyses over the past decade have revealed that the human genome contains much more structural variation than previously realized. Major progress has been made showing that CNVs are important factors in human genomic variation and in genetic disease. From these studies, we have learned that there are distinct classes of CNVs, defined by the molecular mechanisms responsible for their formation, which in turn dictate the risk factors involved for new mutation. Recurrent CNVs have been shown to be the result of NAHR, a well-understood mechanism resulting in rearrangements during meiosis. Surprisingly, the large class of non-recurrent CNVs appears to have a mitotic cell origin, associated with replication stress. While several models have been proposed to explain this class of CNV, there is little experimental evidence to explain how they are formed. Current challenges include experimentally defining these mechanisms and beginning to identify the risk factors for all classes of CNVs.
