Ultradeep sequencing of genomes permits the detection of very low-level genomic instability in non-neoplastic tissues of patients with the most common form of inherited colorectal cancer.
Abstract
Early detection of cancer-associated genomic instability is crucial, particularly in tumour types in which this instability is the essential underlying mechanism of tumourigenesis. Currently used methods require the presence of already established neoplastic cells because they detect only clonal mutations. In principle, parallel sequencing of single DNA molecules could reveal the early phases of tumour initiation by detecting low-frequency mutations, provided that the depth of coverage is adequate and the experimental error is effectively controlled. We applied ultradeep sequencing to estimate the genomic instability of individuals with hereditary non-polyposis colorectal cancer (HNPCC). To overcome the experimental error, we used an ultraconserved region (UCR) of the human genome as an internal control. By comparing the mutability outside and inside the UCR, we observed a tendency of the ultraconserved element to accumulate significantly fewer mutations than the flanking segments in both neoplastic and nonneoplastic HNPCC samples. No difference between the two regions was detectable in cells from healthy donors, indicating that all three HNPCC samples have mutation rates higher than that of the healthy genome. To our knowledge, this is the first direct evidence of an intrinsic genomic instability in individuals with heterozygous mutations in mismatch repair genes, and it constitutes the proof of principle for the development of a more sensitive molecular assay of genomic instability.
Author Summary
In hereditary non-polyposis colorectal cancer (HNPCC), a germline mutation in one allele of a gene responsible for repairing DNA damage predisposes the carrier to cancer, because subsequent somatic inactivation of the remaining wild-type allele leads to genomic instability that favours tumourigenesis. Nonneoplastic tissues of HNPCC individuals are believed to repair DNA normally: because they are heterozygous, they are thought to be genomically stable. However, the methods used to date cannot detect very low levels of genomic instability. Here, we present a more sensitive procedure based on resequencing a genomic region from HNPCC patients with next-generation sequencing technology. With this approach, we show that genomic instability is in fact detectable in nonneoplastic tissues of HNPCC patients compared with healthy donors. This constitutional instability may predispose them to acquiring the second, somatic mutation event needed for cancer development.
Introduction
Genomic instability is a common trait of cancer cells and plays a pivotal role in promoting carcinogenesis in several hereditary tumours. One of the best-known examples is the Lynch syndrome, an autosomal dominant condition associated with heterozygous mutations in mismatch repair (MMR) genes [1]. During their lifetime, individuals affected by the Lynch syndrome undergo somatic inactivation of the second allele, which impairs the MMR machinery and triggers the onset of the “mutator phenotype” [2]. The tumourigenic process starts when mutations hit oncogenes and/or tumour suppressors, often in actively renewing tissues such as endometrium, ovary, and colon. In the latter case, the genetic condition is known as hereditary non-polyposis colorectal cancer (HNPCC), which represents the most common form of inherited colorectal cancer [3]. A hallmark of MMR deficiency is microsatellite instability (MSI), i.e., the accumulation of insertions and deletions (indels) at repeated regions of the genome. Since more than 90% of HNPCC tumours show MSI [4],[5], MSI has become a common diagnostic marker of MMR deficiency. Recently, large-scale mutational screenings provided the first estimates of the mutation frequency (the number of mutations per genome unit) in coding and noncoding sequences of cancer genomes [6]–[9]. These studies measured a higher proportion of base substitutions in MMR-deficient than in MMR-proficient cancers [6]. Both MSI testing and large-scale mutational screenings reveal only mutations occurring in most cancer cells, namely in an expanded clonal population, and miss low-frequency substitutions. The resulting picture is a “static snapshot” of the cancer genome in which only the tip of the iceberg (i.e., clonal mutations) is captured. Detecting low-frequency mutations in addition to clonal mutations is essential for clarifying controversial aspects of cancer genetics. 
For example, the high sensitivity needed to find nonclonal mutations helps to trace the appearance of the mutator phenotype, thus clarifying the role of genomic instability during the early stages of carcinogenesis. So far, technical limitations have prevented the detection of low-frequency mutations, since traditional sequencing procedures cannot reach the required level of sensitivity. In past years, several approaches have been explored to overcome this problem, often based on complex experimental settings [10],[11]. In principle, next-generation sequencing technologies could offer a valid solution, as they rely on the amplification and sequencing of distinct DNA molecules. Because the sensitivity of these methods increases with coverage, rare mutations should become detectable by ultradeep resequencing of a given DNA region. The obvious drawback concerns specificity: at deep coverage, low-frequency substitutions are an indistinguishable mixture of technical errors and true mutations, which makes it hard to separate signal from noise. One possible solution is to use internal controls, i.e., genomic elements that do not accumulate true mutations, so that all substitutions observed in these regions are bona fide errors. Ultraconserved regions (UCRs) of the human genome are a possible repository of such immutable segments. UCRs are genomic elements longer than 200 base pairs (bp) that are 100% identical between human, mouse, and rat and significantly depleted in SNPs [12] and copy number variants [13] within the human population. Although mice lacking UCRs are in general viable and fertile [14], these regions are under purifying selection [15] that is even stronger than that acting on nonsynonymous sites [16]. UCRs therefore seem to have ideal features to serve as a control for the experimental errors of DNA amplification and sequencing. 
The working hypothesis is that, by comparing the mutability of UCRs with that of genomically unstable regions, the higher mutation rate of the latter should eventually become detectable. This model works only under two assumptions. The first is that UCRs are conserved not only in the germline but also in somatic cells. Recently, an altered expression of some UCRs has been reported in leukaemia and carcinomas [17], and two out of six SNPs present in UCRs show a significant association with familial breast cancer risk [18]. Both studies suggest that UCRs may also play a role in adult cells and might therefore be under somatic selection. The second assumption is that the cancer mutation rate is higher than, or at least comparable to, the experimental error rate, because only in this case can the difference in mutability be appreciated. This seems a plausible assumption, given the current estimates for the cancer-associated mutator phenotype [10],[11].
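The second assumption can be made concrete with a deliberately simplified model, which is our illustration rather than the authors' analysis: assume that the core and the flanks share the same per-base experimental error rate e, that the ultraconserved core carries no true mutations, and that the flanks carry true mutations at rate m. The expected flank-to-core mutability ratio is then (e + m)/e, which stays close to 1 when m is much smaller than e and rises clearly above 1 once m approaches e. The rates used below are hypothetical.

```python
def expected_mutability_ratio(error_rate: float, mutation_rate: float) -> float:
    """Expected flank/core mutability ratio under a toy model.

    Assumptions (illustrative only): both regions share the same per-base
    experimental error rate; the ultraconserved core carries no true
    mutations; the flanks carry true mutations at `mutation_rate`.
    """
    if error_rate <= 0:
        raise ValueError("error_rate must be positive")
    flank = error_rate + mutation_rate  # errors plus true mutations
    core = error_rate                   # errors only
    return flank / core


if __name__ == "__main__":
    e = 1e-4  # hypothetical per-base error rate
    for m in (1e-7, 1e-5, 1e-4):  # hypothetical mutation rates
        print(f"m/e = {m / e:g}: ratio = {expected_mutability_ratio(e, m):.3f}")
```

Under this model a healthy genome with m far below e yields a ratio indistinguishable from 1, which is the situation the authors expect for the healthy donors.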
As a proof of principle of this analytical approach, we resequenced more than 45,000 distinct DNA molecules of an ∼1,500-bp genomic segment centred on a carefully selected UCR. The segment was amplified from three different tissues of patients affected by HNPCC: neoplastic colon mucosa, nonneoplastic colon mucosa, and peripheral blood. As a negative control, we used the peripheral blood of nine healthy donors. To amplify and sequence each sample, we used emulsion PCR followed by pyrosequencing [19]. To date, this method offers the best compromise between sufficiently long reads and a low rate of miscalled bases [20]. The depth of coverage that we reached allowed us to detect genomic instability in neoplastic as well as in nonneoplastic HNPCC samples, offering, to our knowledge, the first evidence of constitutional genomic instability in these individuals.
Results
UCR Selection, Amplification, and Sequencing
Starting from 481 UCRs [12], we restricted the analysis to the 307 regions detectable in seven fully sequenced vertebrates (human, mouse, rat, cow, chicken, frog, and fugu). We extended all UCRs in both directions to include nonconserved sequences. The resulting extended UCRs (eUCRs) consisted of the ultraconserved core and nonconserved flanking regions. All 307 eUCRs were screened for genomic and functional properties that would favour the detection of a difference in mutability between the ultraconserved core and the flanking segments (Table S1). The best candidate was eUCR41, a 1,493-bp-long region centred on a 217-bp-long ultraconserved core (Figure 1A). This extended region carries two SNPs that are frequent in the European population, has no coding activity, and is located in a gene desert. Although the role of UCR41 is unknown, it has been reported to drive gene expression in the mouse embryo [21] and might be transcribed in adult cells [17]. We verified that homopolymers in eUCR41 are shorter than 10 bp and account for only a small portion of the entire region (∼8.2%). In addition, the base composition is similar inside and outside the ultraconserved core (Figure 1B).
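The sequence checks described above (homopolymer length and content, base composition inside versus outside the core) can be sketched as follows. This is a minimal sketch: the function names are hypothetical, the minimum run length that counts as a homopolymer is not stated in the text and is left as a parameter (`min_len`), and the core coordinates must be supplied by the user.

```python
from collections import Counter
from itertools import groupby


def homopolymer_runs(seq: str, min_len: int = 2) -> list[tuple[int, int]]:
    """Return (start, length) for every run of a single base at least min_len long."""
    runs, pos = [], 0
    for _, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, length))
        pos += length
    return runs


def longest_run(seq: str) -> int:
    """Length of the longest homopolymer in seq (0 for an empty sequence)."""
    return max((length for _, length in homopolymer_runs(seq, 1)), default=0)


def homopolymer_fraction(seq: str, min_len: int) -> float:
    """Fraction of bases lying inside homopolymers of length >= min_len."""
    covered = sum(length for _, length in homopolymer_runs(seq, min_len))
    return covered / len(seq)


def base_composition(seq: str) -> dict[str, float]:
    """Relative frequency of A, C, G and T."""
    counts = Counter(seq.upper())
    return {base: counts[base] / len(seq) for base in "ACGT"}


def core_vs_flanks(seq: str, core_start: int, core_end: int):
    """Base composition of the core [core_start, core_end) and of both flanks pooled."""
    core = seq[core_start:core_end]
    flanks = seq[:core_start] + seq[core_end:]
    return base_composition(core), base_composition(flanks)
```

Applied to the eUCR41 sequence, `longest_run` would check the "shorter than 10 bp" criterion and `homopolymer_fraction` the overall homopolymer content, for whatever run-length definition is adopted.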
We extracted DNA from the neoplastic colon mucosa, nonneoplastic colon mucosa, and peripheral blood of nine HNPCC patients with known germline mutations in either the MLH1 or the MSH2 gene. All tumour samples (six adenocarcinomas and three adenomas) were verified to display a high degree of MSI (Table S2). As a negative control, we used the peripheral blood of nine healthy donors. To amplify eUCR41, we divided the region into 11 overlapping segments (Figure 1A) and reduced PCR errors by using the highest-fidelity DNA polymerase available to date [22]. To cover the region uniformly and minimize the contribution of single individuals, we pooled in equimolar ratios all amplicons obtained from the same tissue type of the different individuals, yielding four distinct samples: cancer colon (CC), nonneoplastic colon (NC), peripheral blood leukocytes (PBL), and healthy peripheral blood leukocytes (H-PBL). Each sample was sequenced in both directions using a fully dedicated run of ultradeep pyrosequencing [19]. This yielded more than 83 million sequenced bases per sample, corresponding to an average coverage of more than 45,000 reads per base pair (Figure S1, Table 1). After aligning all reads to the reference sequence, we measured the substitution frequency at each position, defined as the percentage of reads bearing a nucleotide different from the reference. We distinguished between high-frequency (>0.1%) and low-frequency (<0.1%) substitutions (Table 1), according to the estimated detection power of the method [23],[24].
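The per-position substitution frequency and the high/low split can be sketched as below, assuming that the alignment has already been reduced to per-position counts of A, C, G and T (indels excluded, as in the analysis). The `PositionCounts` container is hypothetical, and a position at exactly 0.1% is assigned to the low-frequency class here, a boundary case the text leaves open.

```python
from dataclasses import dataclass

HIGH_THRESHOLD_PCT = 0.1  # detection limit quoted in the text (0.1%)


@dataclass
class PositionCounts:
    ref: str                 # reference base at this position
    counts: dict[str, int]   # reads supporting A, C, G, T (indels excluded)


def substitution_frequency(pos: PositionCounts) -> float:
    """Percentage of reads carrying a base different from the reference."""
    depth = sum(pos.counts.get(base, 0) for base in "ACGT")
    if depth == 0:
        return 0.0
    mismatches = depth - pos.counts.get(pos.ref, 0)
    return 100.0 * mismatches / depth


def classify(freq_pct: float) -> str:
    """Label a position as 'invariant', 'high' (>0.1%) or 'low' (<=0.1%)."""
    if freq_pct == 0.0:
        return "invariant"
    return "high" if freq_pct > HIGH_THRESHOLD_PCT else "low"
```

At the reported depth of ∼45,000 reads, a single mismatching read already corresponds to a frequency of about 0.002%, which is why almost every position ends up in the low-frequency class.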
Table 1. Results of the ultradeep sequencing screening.
Sample | Total Reads | Total Bases | Average Read Length (bp) | Aligned Reads | High-Frequency Positions (>0.1%): SNPs | High-Frequency Positions (>0.1%): Clonal Mutations | High-Frequency Positions (>0.1%): Errors | Low-Frequency Positions (<0.1%) |
CC | 460,584 | 89,958,949 | 195.3 | 99.8% | 2 | 2 | 20 | 1,221 |
NC | 429,940 | 83,376,393 | 193.9 | 98.9% | 2 | 0 | 18 | 1,215 |
PBL | 496,358 | 96,210,962 | 193.8 | 99.8% | 2 | 0 | 35 | 1,151 |
H-PBL | 459,691 | 88,625,322 | 192.8 | 99.4% | 2 | 0 | 38 | 1,157 |
For each sample, the total numbers of sequence reads and sequenced bases are shown, together with the average read length and the percentage of reads aligned to the reference sequence. The latter corresponds to the fraction of reads that passed the quality filter of 454 sequencing. Also reported are the numbers of eUCR41 positions with high-frequency (>0.1%) and low-frequency (<0.1%) substitutions. The threshold of 0.1% represents the detection power of 454 sequencing.
Analysis of High-Frequency Substitutions
After manual inspection, we discarded all but four high-frequency substitutions (Table 1). Errors were mostly generated by incorrect indels near homopolymer stretches, often at the end of reads, where sequencing performance decreases (Table S3). Indels caused misalignments between the reads and the reference sequence, which resulted in false substitutions (Figure S2).
Of the four high-frequency substitutions that passed manual inspection, two are known SNPs detectable in all four samples, and two are G∶C to A∶T clonal somatic transitions present only in sample CC (Figure 2A). We genotyped eUCR41 in all analyzed individuals (Table S4) and confirmed that the minor allele frequency (MAF) of the two SNPs obtained with 454 sequencing was comparable with that inferred from Sanger sequencing (Table 2). This confirms that amplicons from the nine individuals were pooled in equimolar ratios in all four samples and that all of them contributed uniformly to the results. Sanger sequencing also showed that the two somatic mutations are present in the heterozygous state in two different patients (patients 5 and 6, Table S4). From the substitution frequency obtained by pyrosequencing (Table 2), and given that each of the nine patients contributes one-ninth of the pooled reads, we inferred that mutations 871 and 1,095 occur in 37.0% and 23.4% of the corresponding patient's PCR products, respectively. Because both are heterozygous, these mutations are present in about 74% and 47% of the diploid cancer genomes of patients 6 and 5, respectively. They therefore reflect the expansion of the dominant neoplastic clones. Further experimental validation is needed to assess whether these two clonal mutations are drivers or passengers. The fact that both correspond to the wild-type nucleotide in mouse (A∶T) suggests that they might be tolerated and hence hitchhike during clonal expansion.
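The arithmetic behind the 74% and 47% estimates can be written out as follows. It assumes, as the text does, equimolar pooling of nine individuals (so each contributes one-ninth of the reads) and heterozygous mutations; the small difference from the quoted 37.0% comes from the rounding of the 4.1% MAF in Table 2.

```python
def carrier_allele_fraction(pooled_maf: float, n_individuals: int) -> float:
    """Allele fraction in the single carrier, assuming each individual
    contributes 1/n_individuals of the pooled reads (equimolar pooling)."""
    return pooled_maf * n_individuals


def cell_fraction(allele_fraction: float, heterozygous: bool = True) -> float:
    """Fraction of diploid cells carrying the mutation: a heterozygous mutation
    sits on one of two alleles, so its allele fraction is doubled."""
    return allele_fraction * (2 if heterozygous else 1)


# Pooled MAFs of the two clonal mutations in sample CC (Table 2).
for name, maf in (("MUT 871", 0.041), ("MUT 1095", 0.026)):
    allele = carrier_allele_fraction(maf, 9)
    print(f"{name}: {allele:.1%} of the carrier's PCR products, "
          f"~{cell_fraction(allele):.0%} of cells")
```

The estimate treats the sample as a mixture of mutant and nonmutant cells from a single carrier; it does not distinguish tumour subclones from admixed normal cells.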
Table 2. MAF of the high-frequency mutations in eUCR41.
Mutation | HNPCC Patients: Sample CC (454) | HNPCC Patients: Sample NC (454) | HNPCC Patients: Sample PBL (454) | HNPCC Patients: Sanger | Healthy Donors: Sample H-PBL (454) | Healthy Donors: Sanger |
SNP 286 (A/G) | 13.7% | 10.6% | 12.0% | 11.1% | 4.5% | 5.5% |
SNP 1204 (A/G) | 40.0% | 42.0% | 38.0% | 38.9% | 32.6% | 33.3% |
MUT 871 (G/A) | 4.1% | — | — | — | — | — |
MUT 1095 (G/A) | 2.6% | — | — | — | — | — |
For both SNPs and somatic mutations (MUT), the MAF in all samples is reported, as derived from 454 and Sanger sequencing. In the case of 454, MAF was calculated as the percentage of reads bearing the minor allele in each sample. In the Sanger screening, it corresponds to the fraction of minor alleles detected in the nine patients and in the nine healthy donors. Sanger genotyping confirmed that the two clonal mutations in sample CC are heterozygous mutations present in two different patients. Combining this information with the frequency in the 454 screening, it is possible to infer that these mutations are present in about 74% and 47% of the cells of the two patients, respectively.
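The two MAF estimates in Table 2 are computed on different scales: the 454 value is a read fraction, whereas the Sanger value counts minor alleles among the 18 alleles of nine diploid individuals, so it can only take multiples of 1/18 (for example 2/18 ≈ 11.1% and 6/18 ≈ 33.3%). A minimal sketch; the per-individual genotypes below are hypothetical placeholders, not the data of Table S4.

```python
def sanger_maf(minor_allele_counts: list[int]) -> float:
    """MAF from per-individual minor-allele counts (0, 1 or 2 in a diploid)."""
    if any(c not in (0, 1, 2) for c in minor_allele_counts):
        raise ValueError("diploid genotypes must carry 0, 1 or 2 minor alleles")
    return sum(minor_allele_counts) / (2 * len(minor_allele_counts))


def read_maf(minor_reads: int, total_reads: int) -> float:
    """MAF in a pooled 454 sample: fraction of reads carrying the minor allele."""
    return minor_reads / total_reads


# Hypothetical genotypes of nine individuals with two heterozygous carriers.
genotypes = [1, 1, 0, 0, 0, 0, 0, 0, 0]
print(f"Sanger MAF: {sanger_maf(genotypes):.1%}")
```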
Because indels at homopolymers are a major source of sequencing errors on the 454 platform ([20] and Figure S2), we did not consider this type of modification in our analysis. Nevertheless, although indels are frequent in all four samples, the only two 9-bp-long polyA tracts of eUCR41 are significantly more unstable in the HNPCC samples than in the healthy control (Table S5).
Instability of HNPCC Neoplastic and Nonneoplastic Genome
Low-frequency substitutions (<0.1%) likely consist of an indistinguishable mixture of nonclonal true mutations and errors introduced during DNA amplification and pyrosequencing. As for high-frequency substitutions, we excluded indels from the analysis to reduce the impact of 454 sequencing errors. The pattern of these substitutions differs from, and their frequency is lower than, the recently estimated contribution of PCR errors [25] (Table S6). This is likely because we used the polymerase with the lowest error rate among thermostable polymerases with 3′-5′ proofreading activity [22],[26],[27]. We used all low-frequency substitutions to measure the mutability of eUCR41, defined as the substitution frequency over the entire region (see Materials and Methods). To verify whether UCR41 is conserved also in cancer cells, we scanned the mutability along eUCR41 using sliding windows of the same length as UCR41. Whereas nonconserved segments of eUCR41 always show mutability higher than average, mutability decreases with increasing sequence conservation and reaches its minimum at the ultraconserved core (Figure 2B). To assess the significance of this inverse correlation between mutability and sequence conservation, we compared the distribution of substitution frequency within the ultraconserved core with that of the flanking regions. The two distributions differ significantly in neoplastic and nonneoplastic HNPCC samples, but not in healthy donors (Table 3). To exclude a possible bias due to differences in length and (although minimal; Figure 1B) in base composition between UCR41 and its flanking segments, we measured the mutability ratio (μ) between the flanking regions and UCR41 in all four samples. Each observed value was then compared with the expected distribution of mutability ratios obtained from 1,000,000 random permutations. 
This comparison showed that base substitutions occur significantly more frequently in the flanking regions than in the ultraconserved core in all HNPCC samples but not in healthy donors (Figure 3).
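The permutation test for the mutability ratio can be sketched as below. The exact permutation scheme is described in Materials and Methods, which is not part of this section, so two choices here are assumptions: the mutability of a set of positions is taken as the mean substitution frequency, and the null distribution is built by randomly reassigning the core label to the same number of positions, which preserves the length difference between core and flanks. The default of 10,000 permutations (rather than 1,000,000) only keeps the example fast.

```python
import numpy as np


def mutability(freqs: np.ndarray) -> float:
    """Mutability of a set of positions, taken here as the mean substitution frequency."""
    return float(freqs.mean())


def mutability_ratio(freqs: np.ndarray, in_core: np.ndarray) -> float:
    """mu = mutability(flanks) / mutability(core)."""
    return mutability(freqs[~in_core]) / mutability(freqs[in_core])


def permutation_test(freqs, in_core, n_perm=10_000, seed=0):
    """One-sided permutation p-value for mu being larger than expected by chance.

    The core label is shuffled across positions, keeping the core size fixed.
    """
    rng = np.random.default_rng(seed)
    freqs = np.asarray(freqs, dtype=float)
    in_core = np.asarray(in_core, dtype=bool)
    observed = mutability_ratio(freqs, in_core)
    null = np.empty(n_perm)
    for i in range(n_perm):
        null[i] = mutability_ratio(freqs, rng.permutation(in_core))
    # Add-one correction so that p is never exactly zero.
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p_value)
```

Because each sample is compared only with its own permutations, run-specific error levels cancel out of the test, which is the point of using the ultraconserved core as an internal reference.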
Table 3. Substitution frequency and mutability outside and inside UCR41.
Sample | All Data: Low-Frequency Positions Outside | All Data: Low-Frequency Positions Inside | All Data: Median Substitution Frequency Outside (×10⁻³ %) | All Data: Median Substitution Frequency Inside (×10⁻³ %) | All Data: p-Value | After Removing Potential Errors: Low-Frequency Positions Outside | After Removing Potential Errors: Low-Frequency Positions Inside | After Removing Potential Errors: Median Substitution Frequency Outside (×10⁻³ %) | After Removing Potential Errors: Median Substitution Frequency Inside (×10⁻³ %) | After Removing Potential Errors: p-Value |
CC | 1,044 | 177 | 7.98 | 5.85 | 3×10⁻⁷ (a); 2×10⁻⁷ (b) | 729 | 115 | 9.11 | 7.23 | 7×10⁻⁴ (a); 4×10⁻⁴ (b) |
NC | 1,038 | 177 | 8.38 | 7.14 | 6×10⁻⁵ (a); 3×10⁻⁵ (b) | 704 | 131 | 9.21 | 7.24 | 6×10⁻⁵ (a); 3×10⁻⁵ (b) |
PBL | 979 | 172 | 7.82 | 5.28 | 8×10⁻⁷ (a); 4×10⁻⁷ (b) | 655 | 114 | 8.73 | 6.95 | 5×10⁻⁵ (a); 3×10⁻⁵ (b) |
H-PBL | 985 | 172 | 10.35 | 8.14 | 0.09 (a); 0.05 (b) | 672 | 111 | 10.85 | 10.34 | 0.99 (a); 0.50 (b) |
For each sample, the number of positions with low substitution frequency and the median substitution frequency outside and inside UCR41 are reported, considering all data and after removing potential errors. At such a low substitution frequency, it is not possible to directly compare substitution frequencies between different samples because of the high contribution of run-specific errors. When the distributions of substitution frequency outside and inside UCR41 are compared in each sample, it becomes clear that they differ significantly in all three HNPCC samples, but not in H-PBL.
(a) Two-tailed Wilcoxon test.
(b) One-tailed Wilcoxon test (α = 0.05).
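The inside/outside comparison in Table 3 relies on Wilcoxon tests on two independent groups of positions. Assuming the unpaired rank-sum form of the test, it can be sketched with SciPy's Mann-Whitney U implementation, which is equivalent to the Wilcoxon rank-sum test. The one-tailed alternative tests whether substitution frequencies outside UCR41 tend to be larger than inside; the toy inputs in the check are illustrative only.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def compare_inside_outside(outside, inside):
    """Wilcoxon rank-sum (Mann-Whitney U) comparison of substitution frequencies.

    Returns the median of each group plus two-tailed and one-tailed
    (outside > inside) p-values.
    """
    outside = np.asarray(outside, dtype=float)
    inside = np.asarray(inside, dtype=float)
    p_two = mannwhitneyu(outside, inside, alternative="two-sided").pvalue
    p_one = mannwhitneyu(outside, inside, alternative="greater").pvalue
    return {
        "median_outside": float(np.median(outside)),
        "median_inside": float(np.median(inside)),
        "p_two_tailed": float(p_two),
        "p_one_tailed": float(p_one),
    }
```

A rank-based test suits these data because low-frequency substitution counts are small integers with many ties and a skewed distribution.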
Control for Possible Amplification and Sequencing Errors
Because we rely on low-frequency substitutions to estimate genomic instability, it is essential to control for possible sources of noise that could invalidate our results. We therefore reanalysed the data after filtering out typical errors of the 454 platform. First, we removed all homopolymer stretches (n>3) together with two flanking bases on each side, since these are known to accumulate pyrosequencing artefacts [25]. Second, we removed all reads containing at least one uncalled base, since such reads are prone to errors [28]. Finally, we discarded all substitutions occurring in only one read, which carry most random errors [24]. After removing all potential errors, the difference in substitution frequency (Table 3) and in mutability (Figure 3) between the regions outside and inside UCR41 remains significant in all HNPCC samples and nonsignificant in H-PBL. The same holds true when the three filters are applied separately (Table S7).
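The three filters can be sketched as follows, assuming substitution calls are available as (read id, position, alternative base) tuples together with the set of reads containing an uncalled base. These data structures are hypothetical, the order of the filters is our choice (singletons are counted after the first two filters), and a full pipeline would also recompute per-position depth after dropping reads.

```python
from collections import Counter
from itertools import groupby


def homopolymer_mask(ref: str, min_run: int = 4, pad: int = 2) -> set[int]:
    """0-based positions in homopolymers longer than 3 bp, plus `pad` bases on each side."""
    masked, pos = set(), 0
    for _, group in groupby(ref.upper()):
        n = sum(1 for _ in group)
        if n >= min_run:
            masked.update(range(max(0, pos - pad), min(len(ref), pos + n + pad)))
        pos += n
    return masked


def filter_substitutions(calls, reads_with_n, ref):
    """Apply the three filters to substitution calls.

    calls: iterable of (read_id, position, alt_base) tuples (hypothetical format).
    reads_with_n: set of read ids containing at least one uncalled base.
    """
    masked = homopolymer_mask(ref)
    # Filters 1 and 2: homopolymer neighbourhoods and reads with uncalled bases.
    kept = [c for c in calls if c[1] not in masked and c[0] not in reads_with_n]
    # Filter 3: drop substitutions supported by a single read.
    support = Counter((pos, alt) for _, pos, alt in kept)
    return [c for c in kept if support[(c[1], c[2])] > 1]
```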
Although we used the highest-fidelity polymerase, we further checked whether PCR errors could affect our results. We estimated that ∼12%–15% of low-frequency substitutions could be errors introduced by the DNA polymerase. After randomly removing a comparable fraction of substitutions in all four samples, we again observed higher mutability outside than inside UCR41 in the HNPCC samples and no difference in H-PBL (Table S8). This test excludes that PCR errors significantly affected the observed difference in mutability between the UCR core and its flanking regions.
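One way to implement the random removal of ∼12%–15% of substitutions is binomial thinning of the per-position substitution counts, dropping each substitution event independently with the chosen probability. This is an assumed implementation, not necessarily the authors'; after thinning, the mutability ratio would be recomputed exactly as for the unfiltered data.

```python
import numpy as np


def thin_substitutions(sub_counts, fraction_removed, seed=0):
    """Drop each substitution event independently with probability fraction_removed.

    sub_counts: per-position numbers of reads carrying a low-frequency substitution.
    Returns the thinned counts (same shape, never larger than the input).
    """
    if not 0.0 <= fraction_removed <= 1.0:
        raise ValueError("fraction_removed must be between 0 and 1")
    rng = np.random.default_rng(seed)
    counts = np.asarray(sub_counts, dtype=np.int64)
    return rng.binomial(counts, 1.0 - fraction_removed)


def thinned_frequencies(sub_counts, depth, fraction_removed, seed=0):
    """Substitution frequencies (%) after thinning, with unchanged per-position depth."""
    kept = thin_substitutions(sub_counts, fraction_removed, seed)
    return 100.0 * kept / np.asarray(depth, dtype=float)
```

Repeating the thinning with several seeds would show whether the inside/outside difference is robust to which particular substitutions are removed.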
Direct Comparison of HNPCC and Healthy Samples
Because of the run-specific errors of the 454 platform [29], substitution frequencies of different samples cannot be compared directly. Instead, we compared the mutability ratios, which use the ultraconserved element to normalize sample-specific errors. In particular, we compared the observed difference in mutability ratio between each HNPCC sample and H-PBL with the corresponding expected distribution, which was again computed from 1,000,000 random permutations. In all three comparisons, the difference in mutability ratio was significantly higher than expected, using both raw and filtered data (Figure 4). This result provides further evidence that both neoplastic and nonneoplastic tissues from HNPCC patients accumulate more mutations than tissue from healthy individuals.
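A comparison of mutability ratios between two samples can be sketched as a permutation test. Since the exact scheme is not given in this section, we assume that the core label is shuffled independently within each sample, so that each run keeps its own error level, and that mutability is the mean substitution frequency of a set of positions. The function names are hypothetical.

```python
import numpy as np


def ratio(freqs: np.ndarray, in_core: np.ndarray) -> float:
    """Mutability ratio: mean flank frequency over mean core frequency."""
    return float(freqs[~in_core].mean() / freqs[in_core].mean())


def ratio_difference_test(freqs_a, core_a, freqs_b, core_b, n_perm=10_000, seed=0):
    """One-sided permutation test for mu(A) - mu(B) being larger than expected.

    Under the null, the core label is shuffled independently within each
    sample, so each sample keeps its own run-specific error level.
    """
    rng = np.random.default_rng(seed)
    fa, ca = np.asarray(freqs_a, float), np.asarray(core_a, bool)
    fb, cb = np.asarray(freqs_b, float), np.asarray(core_b, bool)
    observed = ratio(fa, ca) - ratio(fb, cb)
    null = np.array([
        ratio(fa, rng.permutation(ca)) - ratio(fb, rng.permutation(cb))
        for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return float(observed), float(p_value)
```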
Altogether, our data support our initial assumption that UCR41 remains ultraconserved also in somatic cells and can therefore be used to normalize experimental errors. At deep coverage, the mutation rate of the HNPCC genome allows detection of an increased occurrence of mutations in the flanking segments compared with the ultraconserved core. No increase is detectable in sample H-PBL, although UCR41 is very likely conserved there as well. In this case, the mutation rate of the healthy human genome is so low that sequencing errors outnumber true mutations across the entire region. The different behaviour of HNPCC and healthy samples becomes more evident when the contribution of random errors decreases. When we removed positions with substitutions at increasing values of frequency, the mutability ratio increased in all HNPCC samples, but not in H-PBL, where it remained around 1 (Figure 5). This result also rules out that the mutability ratio of the healthy sample is due to a chance, nonhomogeneous distribution of low-frequency substitutions between the ultraconserved core and the flanking segments.
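The filtering behind this analysis can be sketched as a sweep over frequency thresholds. We read "removing positions with substitutions at increasing values of frequency" as discarding, at each step, the positions whose substitution frequency is at or below the threshold (those most likely to be random errors); this reading and the mean-based mutability are assumptions.

```python
import numpy as np


def ratio_after_threshold(freqs, in_core, thresholds):
    """Mutability ratio after discarding positions at or below each frequency threshold.

    Returns NaN for a threshold that empties either the core or the flanks.
    """
    freqs = np.asarray(freqs, dtype=float)
    in_core = np.asarray(in_core, dtype=bool)
    ratios = []
    for t in thresholds:
        keep = freqs > t
        core, flank = freqs[keep & in_core], freqs[keep & ~in_core]
        if core.size == 0 or flank.size == 0:
            ratios.append(float("nan"))  # ratio undefined once a region is empty
        else:
            ratios.append(float(flank.mean() / core.mean()))
    return ratios
```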
Sensitivity and Specificity in Detecting Rare Substitutions
To experimentally assess the error rate associated with pyrosequencing, we performed a controlled dilution experiment in which an amplicon carrying a single mutation (G, corresponding to the SNP at position 1,204; Figure 1A, Table S4) was diluted with the corresponding wild-type amplicon (A). At each of the four controlled dilutions (1∶1,000; 1∶2,000; 1∶5,000; and 1∶10,000), wild-type and mutant amplicons were first quantified separately to control for experimental inaccuracy and then pooled. The four samples were sequenced in four distinct lanes. Although the expected coverage was 70,000 reads/lane, we obtained about twice as many reads for each sample, which indicates an optimal experimental setting (Table 4). Plotting the observed frequency of the mutated allele against the corresponding dilution revealed a strict linear correlation (R²>0.99), even for the most extreme dilution (Figure S3 and Table 4). This result demonstrates the high sensitivity of our procedure in detecting very rare mutations. The dilution experiment also allows an estimate of specificity, defined as the fraction of correct positions over all sequenced positions. In the sequenced region, specificity starts to decrease for substitution frequencies lower than 0.05% (Table 4). Since specificity depends on sequence composition and complexity, the lower bound of specificity is likely to differ for longer and more complex regions. This supports the need for an internal normalization of the experimental error when substitutions at very low frequency are considered. Interestingly, the few positions with substitution frequency between 0.05% and 0.1% (fewer than 18 in all four samples) show an overall frequency higher in sample CC than in sample H-PBL, even without using UCR41 as an internal control (p-value = 8×10⁻³, Wilcoxon test). This again confirms that the signal improves when random errors are removed (Figure 5).
Table 4. Sensitivity and specificity in detection of rare mutations.
Dilution | Total Reads | Mutated Reads (G) | Observed Frequency (%G) | Expected Frequency (%G) | Positions with Errors | Specificity |
1∶1,000 | 151,118 | 110 | 0.073 | 0.1 | 0 | 1.00 |
1∶2,000 | 148,990 | 56 | 0.038 | 0.05 | 0 | 1.00 |
1∶5,000 | 144,307 | 30 | 0.021 | 0.02 | 6 | 0.96 |
1∶10,000 | 161,921 | 19 | 0.012 | 0.01 | 24 | 0.85 |
For each dilution, the total number of sequenced reads, the number of reads bearing the mutated allele (G), and the observed and expected substitution frequencies are reported. We considered as errors all positions showing a substitution frequency equal to or higher than the corresponding frequency of the mutated allele. This allowed us to measure specificity, defined as the number of true negatives (156 − errors) over all variable positions (156).
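The sensitivity and specificity figures of Table 4 can be recomputed from the counts in the table. This sketch uses only values reported there: it derives the observed %G, fits a straight line of observed against expected frequency with NumPy, and applies the specificity definition given in the caption. The slope and intercept are printed for inspection only; the text reports R² alone.

```python
import numpy as np

# Values taken from Table 4.
expected_pct = np.array([0.1, 0.05, 0.02, 0.01])   # expected %G per dilution
mutated_reads = np.array([110, 56, 30, 19])        # reads carrying G
total_reads = np.array([151_118, 148_990, 144_307, 161_921])
error_positions = np.array([0, 0, 6, 24])
N_VARIABLE_POSITIONS = 156

observed_pct = 100.0 * mutated_reads / total_reads

# Least-squares line and coefficient of determination (R^2 = r^2 for a simple fit).
slope, intercept = np.polyfit(expected_pct, observed_pct, 1)
r_squared = np.corrcoef(expected_pct, observed_pct)[0, 1] ** 2

# Specificity = true negatives / all variable positions.
specificity = (N_VARIABLE_POSITIONS - error_positions) / N_VARIABLE_POSITIONS

print("observed %G:", np.round(observed_pct, 3))
print(f"slope={slope:.3f} intercept={intercept:.4f} R^2={r_squared:.4f}")
print("specificity:", np.round(specificity, 2))
```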
Discussion
We exploited the frozen status of UCR41 to increase the sensitivity and specificity of ultradeep sequencing and hence quantify cancer-associated genomic instability. The results offer several insights into cancer genetics. We provide the first indication that an ultraconserved element does not accumulate mutations in somatic cells, even under conditions of genomic instability. This result suggests that genomic instability is not uniform across the cancer genome and that certain genomic portions are entirely preserved from modifications even in advanced tumoural stages such as carcinoma. It remains to be verified whether all UCRs are under the same somatic conservation and what the reasons for it are. In the case of UCR41, the extreme conservation could be a sign of strong purifying selection. UCR41 seems to be involved in a variety of functions. It drives the expression of a reporter gene in mouse embryos [21] and is transcribed into noncoding RNAs in adult tissues [17]. In addition, UCR41 is located upstream of PROX1, a gene that acts as a tumour suppressor in breast and pancreatic cancers [30],[31], hepatocellular carcinomas [32], and lymphomas [33]. Recently, PROX1 has been shown to promote tumour growth and malignant progression in colorectal cancers [34]. Finally, the region between UCR41 and PROX1 can undergo genomic rearrangements that have been associated with heart defects [35]. Altogether, these observations may indeed indicate that UCR41 is under functional constraints in both germline and somatic cells, although the alternative hypothesis of UCR41 as a cold spot for mutations, as proposed for other UCRs [14], cannot be completely ruled out. Whatever the biological reason for the somatic conservation of UCR41, we showed that it can be used as an internal control for sequencing errors, thus increasing the sensitivity in the detection of genomic instability.
This increased sensitivity led to the observation that the genome of nonneoplastic HNPCC cells has a constitutional mutation rate higher than that of MMR-proficient genomes and is therefore deficient in repairing DNA (Figure 4). Despite sporadic reports of low-frequency MSI [36],[37], HNPCC nonneoplastic cells are commonly assumed to repair DNA normally [38],[39]. This assumption was based on measures of genomic instability that require the presence of clonal mutations. These assays can detect instability in tumour samples, but not in pretumoural stages in which cells do not have a clonal origin. The difference between the two approaches is illustrated by the fact that several thousand distinct clones would be needed to reproduce the data reported here with clone-based sequencing, with the concrete possibility of cloning PCR errors. The constitutional instability of MMR+/− genomes implies that they start accumulating low-frequency substitutions before cancer transformation. This constitutional instability could predispose MMR+/− individuals to the inactivation of the second allele, which is a mandatory step to initiate carcinogenesis [38],[39]. Known mechanisms of somatic inactivation of the wild-type MMR allele include loss of heterozygosity (LOH), promoter hypermethylation, and somatic mutations in the gene sequence. The relative contribution of these three main mechanisms is controversial. In general, LOH seems the most common, with a frequency ranging from 33% to 86% of cases [40]–[46]. Although more rarely, somatic inactivating mutations have also been reported [4],[42],[43],[46]–[50]. In addition, there are a number of cases in which none of the known inactivating mechanisms can explain MMR deficiency [46],[47]. A constitutional mutation rate higher than that of the healthy genome could help explain those cases, because deleterious mutations could directly hit the gene sequence, as well as other regions important, for example, for the regulation of gene expression. 
Our findings highlight the importance of an early diagnosis of genomic instability for selecting the best clinical approach to monitor, prevent, and possibly slow down the progression to cancer. A molecular test revealing cancer predisposition could also restrict invasive surveillance examinations, such as colonoscopy and/or extracolonic screening of endometrium and ovary, to positive carriers only. To date, predisposition testing in families with the Lynch syndrome consists of genetic screening of the MMR genes to identify germline mutations [51],[52]. Our strategy provides the proof of principle for an alternative test for diagnosing cancer predisposition without any a priori knowledge of the mutated genes. Although promising, several aspects of our procedure need further investigation. It remains to be confirmed whether MMR+/− genomes of healthy carriers (i.e., gene carriers who have not yet developed cancer) are unstable as well. So far, we have analyzed only nonneoplastic cells of HNPCC patients, which constitute reliable, but indirect, evidence that this could indeed be the case. In addition, although the MAF inferred with Sanger sequencing was comparable with that obtained with 454 sequencing (Table 2), we cannot exclude that the mutation rate varies also between individuals, and not only between HNPCC carriers and healthy donors. We therefore need to measure the genomic instability of single individuals to check for possible interindividual variability, DNA quality, and other technical factors, as well as to confirm the suitability of our approach as a genetic marker.
Materials and Methods
Ethics Statement
All individuals involved in this study agreed to and signed the informed consent form for the use of their biological samples for research purposes, approved by the local ethics committee in accordance with current Italian regulations.
UCR Selection
The genomic coordinates of 481 UCRs were derived from the hg18 release of the human genome (March 2006). The conservation between each human UCR and the corresponding orthologous element in mouse (February 2006), rat (November 2004), dog (May 2005), cow (March 2005), chicken (February 2004), and fugu (August 2002) was derived from the multiZ alignments [53]. Only the 307 UCRs detectable in all seven species were retained for further analysis. These UCRs were extended on both sides until sequence conservation dropped to 50%, measured as the percentage of nucleotides in a 25-bp sliding window that are conserved in at least four of the seven species. To include nonconserved segments as well, regions were further extended by 500 bp on both sides. The selection of extended UCR41 (eUCR41) as the best candidate for ultradeep sequencing was performed as reported in Table S1. The entire sequence of eUCR41 was divided into 11 overlapping segments (amplicons), each about 200 bp long. For each amplicon, a pair of forward and reverse primers was designed with a GC content of 40%–60% and a melting temperature of 58–60°C. The UCSC in silico PCR tool was used to check that the selected primers had no spurious additional matches in the human genome. All primers were fused with ad hoc 5′ overhangs to allow emulsion PCR and sequencing.
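The extension procedure can be sketched as below. It assumes a per-position boolean track marking bases conserved in at least four of the seven species (derived beforehand from the multiZ alignment, which is not parsed here), scores each position by the share of conserved bases in a centred 25-bp window, grows the core outward while that share stays at or above 50%, and finally adds 500 bp on each side. The function names and the handling of partial windows at sequence edges are our own choices.

```python
import numpy as np


def conserved_track(species_matching, min_species=4):
    """Mark positions where at least min_species of the seven species share the human base."""
    return np.asarray(species_matching) >= min_species


def window_scores(conserved, window=25):
    """Share of conserved bases in a window centred on each position."""
    c = np.asarray(conserved, dtype=float)
    kernel = np.ones(window)
    hits = np.convolve(c, kernel, mode="same")
    sizes = np.convolve(np.ones_like(c), kernel, mode="same")  # smaller at the edges
    return hits / sizes


def extend_ucr(conserved, core_start, core_end, min_score=0.5, window=25, pad=500):
    """Grow the half-open core [core_start, core_end) while window conservation
    stays >= min_score, then add `pad` bases of flanking sequence on each side."""
    scores = window_scores(conserved, window)
    start, end = core_start, core_end
    while start > 0 and scores[start - 1] >= min_score:
        start -= 1
    while end < len(scores) and scores[end] >= min_score:
        end += 1
    return max(0, start - pad), min(len(scores), end + pad)
```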
Sample Preparation and Sequencing
Nine HNPCC carriers were selected from the Registry of Hereditary Colorectal Cancer at the Istituto Nazionale Tumori (Milan, Italy). Heterozygous MLH1 and MSH2 mutations were detected in genomic DNA purified from peripheral blood leukocytes [54]. Nine healthy controls older than 50 years (four males and five females) were selected among blood donors of Italian ancestry with no personal history of cancer. Tumours (six adenocarcinomas and three adenomas) and normal colonic mucosa were surgically removed and cryopreserved. Hematoxylin-eosin staining showed that tumour areas were not heavily contaminated with normal cells and did not present necrosis, and that normal colonic mucosa was free of tumour infiltration. Tumour and matched normal DNAs were amplified by PCR using fluorescent primers, followed by gel electrophoresis on a 3130 DNA Sequencer (Applied Biosystems), and fragments were analyzed using GeneScan and Genotyper software [55]. All tumour samples used for the analysis showed an altered electrophoretic pattern in tumour compared with normal DNA for at least two microsatellites of the National Cancer Institute–recommended panel [56]. Genomic DNA was extracted from frozen tumours and normal mucosa using the QIAamp DNA Mini Kit and from PBL using the QIAamp DNA Blood Mini Kit (Qiagen) according to the manufacturer's instructions. Genomic DNA was amplified by PCR using the high-fidelity Pwo SuperYield DNA Polymerase (Roche). The PCR products were individually checked on agarose gel and purified using the AGENCOURT AMPure kit (Beckman Coulter) according to the manufacturer's protocol. All 99 amplicons from each tissue type (CC, NC, PBL, and H-PBL) were quantified using a NanoDrop ND-1000 UV-Vis Spectrophotometer and pooled in equimolar ratio to obtain four samples (CC, NC, PBL, and H-PBL). Four independent pyrosequencing runs were performed at 454 Life Sciences, each on a 70×75-mm PicoTiterPlate using the GS FLX Sequencer.
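Equimolar pooling requires converting each amplicon's mass concentration into a molar concentration, because amplicons of different length carry different numbers of molecules per nanogram. The Python sketch below shows that arithmetic under the common assumption of about 660 g/mol per base pair of double-stranded DNA (some protocols use 650 g/mol); the helper names and the target amount per amplicon are hypothetical and not taken from the text.

```python
# Average mass of one base pair of double-stranded DNA. 660 g/mol is a common
# approximation; this constant is an assumption, not a value from the study.
BP_MASS_G_PER_MOL = 660.0


def pmol_per_ul(conc_ng_per_ul, length_bp):
    """Convert a dsDNA concentration from ng/uL to pmol/uL for a fragment of length_bp."""
    # (ng/uL * 1e-9 g/ng) / (bp * g/mol per bp) * 1e12 pmol/mol
    return conc_ng_per_ul * 1000.0 / (length_bp * BP_MASS_G_PER_MOL)


def equimolar_volumes(amplicons, pmol_each):
    """Volume (uL) of each amplicon to pool so that each contributes pmol_each.

    amplicons: dict mapping a hypothetical amplicon name to
    (concentration in ng/uL, length in bp).
    """
    return {name: pmol_each / pmol_per_ul(conc, length)
            for name, (conc, length) in amplicons.items()}


volumes = equimolar_volumes({"amp1": (13.2, 200), "amp9": (20.0, 157)}, pmol_each=0.5)
```

For example, a 200-bp amplicon at 13.2 ng/µL corresponds to 0.1 pmol/µL, so 5 µL supplies 0.5 pmol.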
Emulsion PCR and sequencing were performed as previously described [19]. Each sequence read was base called [19], filtered by quality metrics, and aligned to the human reference sequence as previously described [23]. Sanger sequencing was performed to characterize the genotype of each individual in each tissue and to identify the carriers of the two mutations in cancer. Amplicons were generated using the Pwo SuperYield DNA polymerase (Roche) and sequenced in both directions on a 3130xl sequencer (Data Collection 3.0; Applied Biosystems) using dRhodamine chemistry under standard conditions.
Measures of Substitution Frequency, Mutability, and Mutability Ratio
For each position of eUCR41, the number of reads bearing a nucleotide different from the reference sequence was counted. The substitution frequency at position j was defined as:

$$f_j = \frac{n}{t}$$
where n is the number of reads differing from the reference and t is the total number of reads at position j. Positions with high substitution frequency (>0.1%) in all four samples were manually checked to reject possible false positives. In the analysis of positions with low substitution frequency (<0.1%), only base substitutions, and no indels, were considered, to reduce the probability of pyrosequencing artefacts associated with insertions and deletions. Substitution frequencies outside and inside UCR41 were compared using the Wilcoxon test.
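The per-position substitution frequency and the inside/outside comparison can be sketched in Python as follows. The sketch assumes per-position base counts have already been tallied from the alignments with indels excluded (the input layout is hypothetical), and it implements the two-sided Wilcoxon rank-sum test with midranks and the tie-corrected normal approximation. The authors worked in R, where `wilcox.test` would normally be used; this stdlib version only illustrates the computation.

```python
import math


def substitution_frequencies(reference, base_counts):
    """Per-position substitution frequency f_j = n / t.

    reference: reference sequence (str); base_counts: one dict per position mapping
    'A', 'C', 'G', 'T' to read counts (hypothetical layout, indels already excluded).
    """
    freqs = []
    for ref_base, counts in zip(reference, base_counts):
        total = sum(counts.values())                  # t: reads covering the position
        mismatches = total - counts.get(ref_base, 0)  # n: reads with a non-reference base
        freqs.append(mismatches / total if total else 0.0)
    return freqs


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Ties receive midranks and the variance is tie-corrected.
    Returns (U statistic for x, approximate p-value).
    """
    pooled = sorted(list(x) + list(y))
    midrank, tie_term, i = {}, 0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u = sum(midrank[v] for v in x) - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # every value tied: no evidence of a difference
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))
```

Because most positions carry zero substitutions, ties are extensive in real data, which is why the tie correction is included.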
The mutability of eUCR41, as well as of specific regions (i.e., the ultraconserved core, the flanking segments, and 217-bp sliding windows), was defined as:

$$M = \frac{1}{L}\sum_{i=j}^{j+L-1} f_i$$
where j is the starting position and L is the length of the region. The mutability ratio (μ) was calculated as the ratio between mutability outside and inside UCR41:

$$\mu = \frac{M_{\text{out}}}{M_{\text{in}}}$$
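A minimal Python sketch of these quantities follows, assuming that mutability is the mean substitution frequency over the positions of a region (start position j, length L) and that the flanking segments are pooled into a single "outside" region. The function names are hypothetical.

```python
import math


def mutability(freqs, start, length):
    """Mean substitution frequency over positions start .. start + length - 1."""
    return sum(freqs[start:start + length]) / length


def mutability_ratio(freqs, core_start, core_end):
    """mu = mutability of the pooled flanks / mutability of the core [core_start, core_end)."""
    flank = freqs[:core_start] + freqs[core_end:]
    m_in = mutability(freqs, core_start, core_end - core_start)
    m_out = mutability(flank, 0, len(flank))
    return m_out / m_in if m_in > 0 else math.inf  # an error-free core gives an unbounded ratio


def sliding_mutability(freqs, window=217):
    """Mutability of every window of `window` bp, indexed by its starting position."""
    return [mutability(freqs, j, window) for j in range(len(freqs) - window + 1)]


# Toy profile: flanks twice as mutable as a 5-bp core
freqs = [0.002] * 10 + [0.001] * 5 + [0.002] * 10
mu = mutability_ratio(freqs, 10, 15)
```

A ratio above 1 means the flanks accumulate more substitutions per position than the ultraconserved core.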
To account for the putative effects of length and base composition on the mutability of UCR41 and the flanking segments, a permutation test was performed in which all positions with low-frequency substitutions were randomly reassigned in each sample, keeping the same length and base composition of the two regions. Permutations were repeated 1,000,000 times, and the ratio between the expected mutability outside and inside UCR41 was calculated at each round. The probability (p) of observing the experimental ratio by chance was calculated as the fraction of expected ratios equal to or higher than the observed value.
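One way to implement a composition-preserving permutation is to shuffle substitution frequencies only among positions that share the same reference base: positions stay fixed, so each region keeps its length and base composition, while substitutions move at random. The Python sketch below follows that interpretation, which is an assumption about how "reassigned" was realized; high-frequency positions (e.g., SNPs) are assumed to have been removed beforehand, and the function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def _mutability_ratio(freqs, core_start, core_end):
    inside = freqs[core_start:core_end]
    outside = freqs[:core_start] + freqs[core_end:]
    m_in = sum(inside) / len(inside)
    return (sum(outside) / len(outside)) / m_in if m_in > 0 else math.inf


def composition_preserving_permutation_test(reference, freqs, core_start, core_end,
                                            n_perm=10_000, seed=0):
    """Return (observed ratio, fraction of permuted ratios >= observed).

    reference: reference sequence; freqs: per-position low-frequency substitution
    frequencies (list). Values are shuffled only among positions with the same base.
    """
    rng = random.Random(seed)
    observed = _mutability_ratio(freqs, core_start, core_end)
    by_base = defaultdict(list)
    for pos, base in enumerate(reference):
        by_base[base].append(pos)
    permuted = list(freqs)
    hits = 0
    for _ in range(n_perm):
        for positions in by_base.values():
            values = [freqs[p] for p in positions]
            rng.shuffle(values)  # reassign substitutions among same-base positions
            for p, v in zip(positions, values):
                permuted[p] = v
        if _mutability_ratio(permuted, core_start, core_end) >= observed:
            hits += 1
    return observed, hits / n_perm
```

The default of 10,000 permutations keeps the pure-Python sketch fast; the study used 1,000,000.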
The three null distributions used to test the difference in mutability ratio between cases (samples CC, NC, and PBL) and control (sample H-PBL) were also computed using a permutation test. For each comparison, all sequence positions were randomly reassigned 1,000,000 times, again maintaining the length and base composition of UCR41 and the flanking regions. At each permutation, the difference in mutability ratio was derived, and each expected distribution was compared with the corresponding observed difference.
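The Python sketch below shows one possible reading of this case-versus-control test. It assumes that "randomly reassigned" means exchanging the two samples' values at each position with probability 0.5, which leaves every position, and hence the length and base composition of each region, unchanged. This interpretation and the function names are assumptions, not the authors' procedure.

```python
import math
import random


def _mu(freqs, core_start, core_end):
    inside = freqs[core_start:core_end]
    outside = freqs[:core_start] + freqs[core_end:]
    m_in = sum(inside) / len(inside)
    return (sum(outside) / len(outside)) / m_in if m_in > 0 else math.inf


def ratio_difference_test(case, control, core_start, core_end, n_perm=10_000, seed=0):
    """Observed mu(case) - mu(control), the permutation p-value, and the null distribution.

    case, control: per-position substitution frequencies aligned on the same coordinates.
    """
    rng = random.Random(seed)
    observed = _mu(case, core_start, core_end) - _mu(control, core_start, core_end)
    null = []
    for _ in range(n_perm):
        a, b = [], []
        for x, y in zip(case, control):
            if rng.random() < 0.5:  # exchange the two samples' values at this position
                x, y = y, x
            a.append(x)
            b.append(y)
        null.append(_mu(a, core_start, core_end) - _mu(b, core_start, core_end))
    p = sum(d >= observed for d in null) / n_perm
    return observed, p, null
```

If both cores were free of substitutions, both ratios would be infinite and the difference undefined, so such samples would need separate handling.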
Estimation of PCR Errors
The number of possible errors introduced by the DNA polymerase during the polymerase chain reaction (PCR errors) was first estimated and then removed from the experimental data. PCR errors were quantified using two different approaches. The first was based on the binomial probability distribution, in which the number of PCR errors X is a random variable that follows a binomial distribution:

$$P(X = k) = \binom{L}{k}\, p^{k} (1 - p)^{L - k}$$
where L is the length of the region and p is the probability of accumulating an error at a given position after d duplications, given a rate of r errors introduced per base pair at each duplication:

$$p = 1 - (1 - r)^{d}$$
From this model, the total number of PCR errors expected in a region of length L is:

$$E[X] = Lp$$
The total number N of PCR errors present in n single-stranded DNA sequences is therefore:

$$N = nLp$$
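The binomial estimate can be evaluated with a short Python sketch. It assumes the per-position error probability after d duplications is p = 1 − (1 − r)^d, which is approximately rd for small r, and that the expected total over n sequences is N = nLp. The function names are hypothetical, and the parameter values in the usage line are those quoted in the text for sample CC.

```python
import math


def per_position_error_prob(rate, cycles):
    """p = 1 - (1 - r)^d, evaluated stably for very small r."""
    return -math.expm1(cycles * math.log1p(-rate))


def error_count_pmf(k, length, p):
    """Binomial probability of exactly k erroneous positions in a region of `length` bp."""
    return math.comb(length, k) * p ** k * (1 - p) ** (length - k)


def expected_pcr_errors(length, rate, cycles, n_strands):
    """N = n * L * p: expected PCR errors summed over n single-stranded sequences."""
    return n_strands * length * per_position_error_prob(rate, cycles)


# Parameters quoted in the text for sample CC: L, r, d, n
n_cc = expected_pcr_errors(length=1431, rate=6.5e-7, cycles=40, n_strands=49194)
```

Using `expm1` and `log1p` avoids the loss of precision that direct evaluation of 1 − (1 − r)^d suffers when r is of the order of 10⁻⁷.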
In our analysis, parameters r, d, L, and n were all derived from the experimental data. The applied error rate was r = 6.5×10⁻⁷ errors/base pair/duplication [22],[57]. The number of duplications was set equal to the number of PCR cycles, d = 40. The length L of the region was calculated as the number of positions either unchanged or bearing low-frequency substitutions in each sample (1,431; 1,435; 1,418; and 1,415 in CC, NC, PBL, and H-PBL, respectively). The number n of single-stranded DNA sequences was set to the number of reads in each sample (49,194; 45,383; 53,212; and 49,005 in CC, NC, PBL, and H-PBL, respectively). In the second approach, the PCR amplification cycles were simulated in silico using a model similar to that used for the mutation rate. Starting from one double-stranded DNA molecule of length L, errors were randomly introduced at rate r at each position of the strand in each of the d PCR cycles. Once introduced, errors were retained in all daughter strands. At the end of the amplification, the number of PCR errors present in the n single-stranded DNA sequences was derived. The procedure was repeated 1,000 times to generate a distribution of N values. The numbers of PCR errors estimated by the two approaches are identical and are reported in Table S8.
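Simulating all 2^40 molecules directly is infeasible, so the Python sketch below simulates only the genealogy of the n sampled strands. It assumes an idealized doubling in which the strand with index i after cycle k derives from strand i // 2 after cycle k − 1, and that every strand acquires new errors at rate r per base in every cycle. An error is counted once for each sampled descendant that carries it, so shared early errors are propagated as described. This is an illustrative model under stated assumptions, not the authors' simulation, and the function names are hypothetical.

```python
import math
import random


def errors_on_new_strand(length, rate, rng):
    """Count positions (out of `length`) hit by an error at per-base rate `rate`.

    Gaps between errors are drawn from a geometric distribution, which is
    equivalent to a Binomial(length, rate) draw but cheap when rate is tiny.
    """
    if rate <= 0:
        return 0
    log_q = math.log1p(-rate)
    count, pos = 0, -1
    while True:
        u = 1.0 - rng.random()               # u in (0, 1], so log(u) is finite
        pos += 1 + int(math.log(u) / log_q)  # skip error-free positions
        if pos >= length:
            return count
        count += 1


def simulate_pcr_errors(length, rate, cycles, n_reads, rng):
    """Total errors carried by n_reads strands sampled after `cycles` doublings."""
    # Weight = number of sampled final strands descending from each node
    weights = dict.fromkeys(rng.sample(range(2 ** cycles), n_reads), 1)
    total = 0
    for _ in range(cycles):  # walk back from the last cycle to the first
        parents = {}
        for node, w in weights.items():
            # Errors made when this strand was synthesized appear in all w descendants
            total += errors_on_new_strand(length, rate, rng) * w
            parents[node // 2] = parents.get(node // 2, 0) + w
        weights = parents
    return total


rng = random.Random(0)
draws = [simulate_pcr_errors(1431, 6.5e-7, 40, 49194, rng) for _ in range(5)]
```

Each replicate touches roughly n × d lineages, so 1,000 replicates at the study's scale would be slow in pure Python. Shared ancestry leaves the mean at about n·d·L·r but widens the spread of N.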
To assess the possible effect of PCR errors on the difference in mutability originally detected between the UCR core and the flanking regions, we randomly removed from each sample a number of low-frequency substitutions equal to its estimated number of PCR errors. The procedure was repeated 1,000 times, and the distribution of observed mutability ratios between the flanking regions and the UCR core was derived. Applying the same permutation used for the real samples, the distribution of expected ratios was also derived. The results of both simulations are reported in Table S8, together with the p-values of the comparison between observed and expected distributions.
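The removal step can be sketched in Python by expanding each sample's low-frequency substitutions into individual observations, discarding a random subset equal to the estimated number of PCR errors, and recomputing the mutability ratio. The sketch assumes per-position mismatch counts and coverages as input (a hypothetical layout) and takes mutability as the mean substitution frequency per region. The function names are hypothetical.

```python
import math
import random


def _mutability_ratio(freqs, core_start, core_end):
    inside = freqs[core_start:core_end]
    outside = freqs[:core_start] + freqs[core_end:]
    m_in = sum(inside) / len(inside)
    return (sum(outside) / len(outside)) / m_in if m_in > 0 else math.inf


def ratios_after_error_removal(mismatches, coverage, core_start, core_end,
                               n_remove, n_rep=1000, seed=0):
    """Mutability ratios after randomly discarding n_remove substitutions per replicate.

    mismatches[j]: reads with a low-frequency substitution at position j;
    coverage[j]: total reads at position j.
    """
    rng = random.Random(seed)
    # One entry per observed substitution, labelled by its position
    events = [pos for pos, m in enumerate(mismatches) for _ in range(m)]
    ratios = []
    for _ in range(n_rep):
        kept = list(mismatches)
        for pos in rng.sample(events, n_remove):  # treat these as PCR errors
            kept[pos] -= 1
        freqs = [m / c if c else 0.0 for m, c in zip(kept, coverage)]
        ratios.append(_mutability_ratio(freqs, core_start, core_end))
    return ratios
```

The resulting ratios could then be compared with the expected ratios from the permutation test.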
All statistical analyses were performed using the R statistical environment and ad hoc Perl scripts.
Serial Dilution
Dilution experiments were performed using the 157-bp-long segment of eUCR41 corresponding to amplicon 9, which bears a SNP at position 1,204 (SNP A/G, Figure 1A). This segment was amplified from the blood of two healthy donors with homozygous AA and GG genotypes, respectively (Samples 13 and 14, Table S4). After amplification, the regions were purified as described above and pooled in different relative amounts. Four final dilutions were obtained with decreasing G∶A ratios (1∶1,000; 1∶2,000; 1∶5,000; and 1∶10,000). To correct for possible experimental inaccuracies during DNA quantification and pipetting, the two alleles were quantified at each step of the serial dilutions using the Victor PicoGreen fluorometer (PerkinElmer Life Sciences). The obtained values were used to calibrate the subsequent dilution. The DNA samples corresponding to the four dilutions were sequenced in four distinct lanes, using a four-lane gasket for the 70×75-mm PicoTiterPlate device, on the GS FLX Sequencer at BMR Genomics. Specificity was measured as TN/(TN+FP). The number of true negatives (TN) was calculated as the number of correctly sequenced positions, i.e., positions with no errors at a frequency equal to or higher than the frequency of the diluted allele; false positives (FP) were the remaining positions, with errors at or above that frequency.
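The specificity calculation can be written as a short Python sketch. It assumes that the input lists the non-reference frequency at every sequenced position other than the SNP, and that the threshold is the expected frequency of the diluted allele, e.g. 1/1,001 for a 1∶1,000 mixture. The function names are hypothetical.

```python
def diluted_allele_frequency(minor_parts, major_parts):
    """Expected minor-allele frequency in a minor:major mixture (e.g. 1:1,000)."""
    return minor_parts / (minor_parts + major_parts)


def specificity(error_freqs, diluted_freq):
    """TN / (TN + FP) over positions other than the SNP.

    A position is a false positive when its error frequency is at least the
    diluted allele frequency, and a true negative otherwise.
    """
    fp = sum(1 for f in error_freqs if f >= diluted_freq)
    tn = len(error_freqs) - fp
    return tn / (tn + fp)


threshold = diluted_allele_frequency(1, 1000)  # G:A = 1:1,000
spec = specificity([0.0, 0.0005, 0.002, 0.0], threshold)
```

In this toy input, only the position with an error frequency of 0.002 reaches the threshold of about 0.1%, giving a specificity of 0.75.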
Supporting Information
Acknowledgments
We thank Pier Giuseppe Pelicci, Fabrizio Capuani, David Cairns, and the members of the Ciccarelli lab for useful discussions, Fernando Ravagnani, Andrea Lampis and Stefano Signoroni for providing biological samples, and Gioacchino Natoli and Andrea Musacchio for useful comments on the manuscript.
Abbreviations
- CC: cancer colon mucosa
- eUCR: extended ultraconserved region
- H-PBL: healthy peripheral blood leukocytes
- HNPCC: hereditary non-polyposis colorectal cancer
- MAF: minor allele frequency
- MMR: mismatch repair
- MSI: microsatellite instability
- NC: nonneoplastic colon mucosa
- PBL: peripheral blood leukocytes
- UCR: ultraconserved region
Footnotes
The authors have declared that no competing interests exist.
This work was supported by the Start-Up grant of the Italian Association for Cancer Research (AIRC), by the Cariplo Foundation, and by Italian Ministry of Health to FDC. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Peltomaki P, de la Chapelle A. Mutations predisposing to hereditary nonpolyposis colorectal cancer. Adv Cancer Res. 1997;71:93–119. doi: 10.1016/s0065-230x(08)60097-4.
- 2. Loeb L. A. Mutator phenotype may be required for multistage carcinogenesis. Cancer Res. 1991;51:3075–3079.
- 3. Lynch H. T, de la Chapelle A. Hereditary colorectal cancer. N Engl J Med. 2003;348:919–932. doi: 10.1056/NEJMra012242.
- 4. Aaltonen L. A, Peltomaki P, Mecklin J. P, Jarvinen H, Jass J. R, et al. Replication errors in benign and malignant tumors from hereditary nonpolyposis colorectal cancer patients. Cancer Res. 1994;54:1645–1648.
- 5. Soreide K. Molecular testing for microsatellite instability and DNA mismatch repair defects in hereditary and sporadic colorectal cancers–ready for prime time? Tumour Biol. 2007;28:290–300. doi: 10.1159/000110427.
- 6. Greenman C, Stephens P, Smith R, Dalgliesh G. L, Hunter C, et al. Patterns of somatic mutation in human cancer genomes. Nature. 2007;446:153–158. doi: 10.1038/nature05610.
- 7. Wood L. D, Parsons D. W, Jones S, Lin J, Sjoblom T, et al. The genomic landscapes of human breast and colorectal cancers. Science. 2007;318:1108–1113. doi: 10.1126/science.1145720.
- 8. Jones S, Zhang X, Parsons D. W, Lin J. C-H, Leary R. J, et al. Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science. 2008;321:1801–1806. doi: 10.1126/science.1164368.
- 9. Parsons D. W, Jones S, Zhang X, Lin J. C-H, Leary R. J, et al. An integrated genomic analysis of human glioblastoma multiforme. Science. 2008;321:1807–1812. doi: 10.1126/science.1164382.
- 10. Bielas J. H, Loeb L. A. Quantification of random genomic mutations. Nat Methods. 2005;2:285–290. doi: 10.1038/nmeth751.
- 11. Li M, Diehl F, Dressman D, Vogelstein B, Kinzler K. W. BEAMing up for detection and quantification of rare sequence variants. Nat Methods. 2006;3:95–97. doi: 10.1038/nmeth850.
- 12. Bejerano G, Pheasant M, Makunin I, Stephen S, Kent W. J, et al. Ultraconserved elements in the human genome. Science. 2004;304:1321–1325. doi: 10.1126/science.1098119.
- 13. Derti A, Roth F. P, Church G. M, Wu C. T. Mammalian ultraconserved elements are strongly depleted among segmental duplications and copy number variants. Nat Genet. 2006;38:1216–1220. doi: 10.1038/ng1888.
- 14. Ahituv N, Zhu Y, Visel A, Holt A, Afzal V, et al. Deletion of ultraconserved elements yields viable mice. PLoS Biol. 2007;5:e234. doi: 10.1371/journal.pbio.0050234.
- 15. Drake J. A, Bird C, Nemesh J, Thomas D. J, Newton-Cheh C, et al. Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat Genet. 2006;38:223–227. doi: 10.1038/ng1710.
- 16. Katzman S, Kern A. D, Bejerano G, Fewell G, Fulton L, et al. Human genome ultraconserved elements are ultraselected. Science. 2007;317:915. doi: 10.1126/science.1142430.
- 17. Calin G. A, Liu C-G, Ferracin M, Hyslop T, Spizzo R, et al. Ultraconserved regions encoding ncRNAs are altered in human leukemias and carcinomas. Cancer Cell. 2007;12:215–229. doi: 10.1016/j.ccr.2007.07.027.
- 18. Yang R, Frank B, Hemminki K, Bartram C. R, Wappenschmidt B, et al. SNPs in ultraconserved elements and familial breast cancer risk. Carcinogenesis. 2008;29:351–355. doi: 10.1093/carcin/bgm290.
- 19. Margulies M, Egholm M, Altman W. E, Attiya S, Bader J. S, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376–380. doi: 10.1038/nature03959.
- 20. Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol. 2008;26:1135–1145. doi: 10.1038/nbt1486.
- 21. Pennacchio L. A, Ahituv N, Moses A. M, Prabhakar S, Nobrega M. A, et al. In vivo enhancer analysis of human conserved non-coding sequences. Nature. 2006;444:499–502. doi: 10.1038/nature05295.
- 22. Andre P, Kim A, Khrapko K, Thilly W. G. Fidelity and mutational spectrum of Pfu DNA polymerase on a human mitochondrial DNA sequence. Genome Res. 1997;7:843–852. doi: 10.1101/gr.7.8.843.
- 23. Thomas R. K, Nickerson E, Simons J. F, Janne P. A, Tengs T, et al. Sensitive mutation detection in heterogeneous cancer specimens by massively parallel picoliter reactor sequencing. Nat Med. 2006;12:852–855. doi: 10.1038/nm1437.
- 24. Wang C, Mitsuya Y, Gharizadeh B, Ronaghi M, Shafer R. W. Characterization of mutation spectra with ultra-deep pyrosequencing: application to HIV-1 drug resistance. Genome Res. 2007;17:1195–1201. doi: 10.1101/gr.6468307.
- 25. Campbell P. J, Pleasance E. D, Stephens P. J, Dicks E, Rance R, et al. Subclonal phylogenetic structures in cancer revealed by ultra-deep sequencing. Proc Natl Acad Sci U S A. 2008;105:13081–13086. doi: 10.1073/pnas.0801523105.
- 26. Flaman J. M, Frebourg T, Moreau V, Charbonnier F, Martin C, et al. A rapid PCR fidelity assay. Nucleic Acids Res. 1994;22:3259–3260. doi: 10.1093/nar/22.15.3259.
- 27. Cline J, Braman J. C, Hogrefe H. H. PCR fidelity of pfu DNA polymerase and other thermostable DNA polymerases. Nucleic Acids Res. 1996;24:3546–3551. doi: 10.1093/nar/24.18.3546.
- 28. Huse S. M, Huber J. A, Morrison H. G, Sogin M. L, Welch D. M. Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol. 2007;8:R143. doi: 10.1186/gb-2007-8-7-r143.
- 29. Druley T. E, Vallania F. L, Wegner D. J, Varley K. E, Knowles O. L, et al. Quantification of rare allelic variants from pooled genomic DNA. Nat Methods. 2009;6:263–265. doi: 10.1038/nmeth.1307.
- 30. Schneider M, Buchler P, Giese N, Giese T, Wilting J, et al. Role of lymphangiogenesis and lymphangiogenic factors during pancreatic cancer progression and lymphatic spread. Int J Oncol. 2006;28:883–890.
- 31. Versmold B, Felsberg J, Mikeska T, Ehrentraut D, Kohler J, et al. Epigenetic silencing of the candidate tumor suppressor gene PROX1 in sporadic breast cancer. Int J Cancer. 2007;121:547–554. doi: 10.1002/ijc.22705.
- 32. Shimoda M, Takahashi M, Yoshimoto T, Kono T, Ikai I, et al. A homeobox protein, prox1, is involved in the differentiation, proliferation, and prognosis in hepatocellular carcinoma. Clin Cancer Res. 2006;12:6005–6011. doi: 10.1158/1078-0432.CCR-06-0712.
- 33. Nagai H, Li Y, Hatano S, Toshihito O, Yuge M, et al. Mutations and aberrant DNA methylation of the PROX1 gene in hematologic malignancies. Genes Chromosomes Cancer. 2003;38:13–21. doi: 10.1002/gcc.10248.
- 34. Petrova T. V, Nykanen A, Norrmen C, Ivanov K. I, Andersson L. C, et al. Transcription factor PROX1 induces colon cancer progression by promoting the transition from benign to highly dysplastic phenotype. Cancer Cell. 2008;13:407–419. doi: 10.1016/j.ccr.2008.02.020.
- 35. Gill H. K, Parsons S. R, Spalluto C, Davies A. F, Knorz V. J, et al. Separation of the PROX1 gene from upstream conserved elements in a complex inversion/translocation patient with hypoplastic left heart. Eur J Hum Genet. 2009;17:1423–1431. doi: 10.1038/ejhg.2009.91.
- 36. Parsons R, Li G. M, Longley M, Modrich P, Liu B, et al. Mismatch repair deficiency in phenotypically normal human cells. Science. 1995;268:738–740. doi: 10.1126/science.7632227.
- 37. Alazzouzi H, Domingo E, Gonzalez S, Blanco I, Armengol M, et al. Low levels of microsatellite instability characterize MLH1 and MSH2 HNPCC carriers before tumor diagnosis. Hum Mol Genet. 2005;14:235–239. doi: 10.1093/hmg/ddi021.
- 38. Parsons R, Li G. M, Longley M. J, Fang W. H, Papadopoulos N, et al. Hypermutability and mismatch repair deficiency in RER+ tumor cells. Cell. 1993;75:1227–1236. doi: 10.1016/0092-8674(93)90331-j.
- 39. de la Chapelle A. Genetic predisposition to colorectal cancer. Nat Rev Cancer. 2004;4:769–780. doi: 10.1038/nrc1453.
- 40. Hemminki A, Peltomaki P, Mecklin J. P, Jarvinen H, Salovaara R, et al. Loss of the wild type MLH1 gene is a feature of hereditary nonpolyposis colorectal cancer. Nat Genet. 1994;8:405–410. doi: 10.1038/ng1294-405.
- 41. Kuismanen S. A, Holmberg M. T, Salovaara R, de la Chapelle A, Peltomaki P. Genetic and epigenetic modification of MLH1 accounts for a major share of microsatellite-unstable colorectal cancers. Am J Pathol. 2000;156:1773–1779. doi: 10.1016/S0002-9440(10)65048-1.
- 42. Lu S. L, Akiyama Y, Nagasaki H, Nomizu T, Ikeda E, et al. Loss or somatic mutations of hMSH2 occur in hereditary nonpolyposis colorectal cancers with hMSH2 germline mutations. Jpn J Cancer Res. 1996;87:279–287. doi: 10.1111/j.1349-7006.1996.tb00218.x.
- 43. Potocnik U, Glavac D, Golouh R, Ravnik-Glavac M. Causes of microsatellite instability in colorectal tumors: implications for hereditary non-polyposis colorectal cancer screening. Cancer Genet Cytogenet. 2001;126:85–96. doi: 10.1016/s0165-4608(00)00399-x.
- 44. Tannergard P, Liu T, Weger A, Nordenskjold M, Lindblom A. Tumorigenesis in colorectal tumors from patients with hereditary non-polyposis colorectal cancer. Hum Genet. 1997;101:51–55. doi: 10.1007/s004390050585.
- 45. Tuupanen S, Karhu A, Jarvinen H, Mecklin J. P, Launonen V, et al. No evidence for dual role of loss of heterozygosity in hereditary non-polyposis colorectal cancer. Oncogene. 2007;26:2513–2517. doi: 10.1038/sj.onc.1210038.
- 46. Yuen S. T, Chan T. L, Ho J. W, Chan A. S, Chung L. P, et al. Germline, somatic and epigenetic events underlying mismatch repair deficiency in colorectal and HNPCC-related cancers. Oncogene. 2002;21:7585–7592. doi: 10.1038/sj.onc.1205968.
- 47. Cunningham J. M, Kim C. Y, Christensen E. R, Tester D. J, Parc Y, et al. The frequency of hereditary defective mismatch repair in a prospective series of unselected colorectal carcinomas. Am J Hum Genet. 2001;69:780–790. doi: 10.1086/323658.
- 48. Konishi M, Kikuchi-Yanoshita R, Tanaka K, Muraoka M, Onda A, et al. Molecular nature of colon tumors in hereditary nonpolyposis colon cancer, familial polyposis, and sporadic colon cancer. Gastroenterology. 1996;111:307–317. doi: 10.1053/gast.1996.v111.pm8690195.
- 49. Leach F. S, Nicolaides N. C, Papadopoulos N, Liu B, Jen J, et al. Mutations of a mutS homolog in hereditary nonpolyposis colorectal cancer. Cell. 1993;75:1215–1225. doi: 10.1016/0092-8674(93)90330-s.
- 50. Papadopoulos N, Nicolaides N. C, Wei Y. F, Ruben S. M, Carter K. C, et al. Mutation of a mutL homolog in hereditary colon cancer. Science. 1994;263:1625–1629. doi: 10.1126/science.8128251.
- 51. Vasen H. F, Moslein G, Alonso A, Bernstein I, Bertario L, et al. Guidelines for the clinical management of Lynch syndrome (hereditary non-polyposis cancer). J Med Genet. 2007;44:353–362. doi: 10.1136/jmg.2007.048991.
- 52. Lynch P. M. New issues in genetic counseling of hereditary colon cancer. Clin Cancer Res. 2007;13:6857s–6861s. doi: 10.1158/1078-0432.CCR-07-1124.
- 53. Blanchette M, Kent W. J, Riemer C, Elnitski L, Smit A. F, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004;14:708–715. doi: 10.1101/gr.1933104.
- 54. Blasi M. F, Ventura I, Aquilina G, Degan P, Bertario L, et al. A human cell-based assay to evaluate the effects of alterations in the MLH1 mismatch repair gene. Cancer Res. 2006;66:9036–9044. doi: 10.1158/0008-5472.CAN-06-1896.
- 55. Canzian F, Salovaara R, Hemminki A, Kristo P, Chadwick R. B, et al. Semiautomated assessment of loss of heterozygosity and replication error in tumors. Cancer Res. 1996;56:3331–3337.
- 56. Boland C. R, Thibodeau S. N, Hamilton S. R, Sidransky D, Eshleman J. R, et al. A National Cancer Institute Workshop on Microsatellite Instability for cancer detection and familial predisposition: development of international criteria for the determination of microsatellite instability in colorectal cancer. Cancer Res. 1998;58:5248–5257.
- 57. Dabrowski S, Kur J. Cloning and expression in Escherichia coli of the recombinant his-tagged DNA polymerases from Pyrococcus furiosus and Pyrococcus woesei. Protein Expr Purif. 1998;14:131–138. doi: 10.1006/prep.1998.0945.
- 58. Visel A, Prabhakar S, Akiyama J. A, Shoukry M, Lewis K. D, et al. Ultraconservation identifies a small subset of extremely constrained developmental enhancers. Nat Genet. 2008;40:158–160. doi: 10.1038/ng.2007.55.