Abstract
Mutations in two large multi-exon genes, PKD1 and PKD2, cause autosomal dominant polycystic kidney disease (ADPKD). The duplication of PKD1 exons 1–32 as six pseudogenes on chromosome 16, the high level of allelic heterogeneity, and the cost of Sanger sequencing complicate mutation analysis, which can aid diagnostics of ADPKD. We developed and validated a strategy to analyze both the PKD1 and PKD2 genes using next-generation sequencing by pooling long-range PCR amplicons and multiplexing bar-coded libraries. We used this approach to characterize a cohort of 230 patients with ADPKD. This process detected definitely and likely pathogenic variants in 115 (63%) of 183 patients with typical ADPKD. In addition, we identified atypical mutations, a gene conversion, and one missed mutation resulting from allele dropout, and we characterized the pattern of deep intronic variation for both genes. In summary, this strategy involving next-generation sequencing is a model for future genetic characterization of large ADPKD populations.
Autosomal dominant polycystic kidney disease (ADPKD) is one of the most common inherited cystic kidney diseases, with an incidence of 1 in 400 to 1 in 1000.1,2 ADPKD is characterized by the development and progressive enlargement of cysts in the kidneys and other organs, eventually leading to ESRD. ADPKD is caused by mutations at two genes, PKD1 (16p13.3) and PKD2 (4q21). PKD1 mutations account for approximately 85% and PKD2 mutations for approximately 15% of the cases in clinically well characterized cohorts.3 PKD1 patients reach ESRD approximately 20 years earlier than PKD2 patients (approximately 54 versus 74 years).4 PKD1 and PKD2 encode polycystin 1 and 2 (PC-1 and PC-2), respectively. PC-1 is a large, transmembrane protein that interacts with PC-2, a transient receptor potential channel that regulates intracellular calcium.5 Both proteins localize to the kidney primary cilium,5 and may act as a flow-dependent mechanosensor regulating the differentiation and proliferation of tubular epithelial cells.5
Within ADPKD populations, a high level of allelic heterogeneity is observed, with a total of 436 pathogenic PKD1 and 115 pathogenic PKD2 mutations reported to date, the majority of which are private to a single pedigree (ADPKD Database [PKDB], http://pkdb.mayo.edu).
Gene conversions (GCs) are unusual mutational events that cause the transfer of sequence variants from segmental duplications into the master gene, and have been proven to be disease associated.6 GCs have been previously described in ADPKD7,8 but their exact genomic origin and extent have not been characterized.
ADPKD is typically diagnosed by imaging such as ultrasonography, computed tomography, or magnetic resonance imaging,9,10 with age-related criteria established for ultrasonography.9,11 However, a diagnosis determined by imaging may be uncertain, particularly in young individuals (aged <30 years).11 In such cases, molecular diagnostics is useful for establishing a definite diagnosis.3 Molecular testing also plays a role in the evaluation of potential living related kidney donors with equivocal imaging data, in individuals with a negative family history, and in cases of early onset ADPKD.12 Furthermore, mutation characterization of clinical trial cohorts3 provides genetic stratification for the evaluation of such trials.13
The 5′ two-thirds of the PKD1 gene (exons 1–32) is duplicated six times on chromosome 16 within six pseudogenes (PKD1P1-P6).14,15 The PKD1P1-P6 pseudogenes share 97.7% sequence identity with the genuine PKD1, although they carry some large deletions relative to it.15,16 Rare sequence divergences have been used to develop PKD1 locus-specific amplicons to analyze the duplicated portion of the gene for mutations.17 The PKD1 genomic complexity and the high allelic heterogeneity of both PKD1 and PKD2 make molecular diagnostics challenging.3
High-throughput next-generation DNA sequencing (NGS) technologies have recently been developed; their common feature is massively parallel sequencing of DNA strands after random fragmentation to produce millions of reads, which are subsequently re-aligned for sequence variant calling.18–20 Using NGS for limited genomic regions has become feasible through multiplexing with bar codes, unique 6-bp tags that allow the individual identification of samples analyzed within the same lane.21 Bar coding and multiplexing of PCR and long-range PCR (LR-PCR) amplicons from groups of patients have been effectively used to characterize genomic regions up to approximately 150 kb.22–30 Exon enrichment or capture protocols have also been developed for the analysis of specific genomic intervals or the whole exome.31,32 However, these are not effective in duplicated genomic regions (e.g., the PKD1 gene) because they would lead to concurrent capture of the six pseudogenes.
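The bar-coding idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the bar code sequences, the library names, and the `demultiplex` function are hypothetical, and the sketch assumes that the 6-bp tag is the first six bases of each read and must match exactly (on some platforms the tag is instead read in a separate index read).

```python
from collections import defaultdict

# Hypothetical 6-bp bar codes; the actual tag sequences are not given in the text.
BARCODES = {
    "ACGTAC": "library_1",
    "TGCATG": "library_2",
    "GATCGA": "library_3",
}
TAG_LEN = 6


def demultiplex(reads, barcodes=BARCODES, tag_len=TAG_LEN):
    """Group reads by their leading bar code and strip the tag.

    Reads whose tag matches no known bar code exactly go to 'unassigned'.
    """
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        bins[barcodes.get(tag, "unassigned")].append(insert)
    return dict(bins)


# Toy reads from one lane (illustrative sequences only).
reads = [
    "ACGTACGGCTGAGGTT",
    "TGCATGCCTGAAGTCA",
    "ACGTACTTGACCGGAA",
    "NNNNNNACGTTGCAAC",
]
lane = demultiplex(reads)
for library, inserts in sorted(lane.items()):
    print(library, len(inserts))
```

Each bin can then be aligned and mined separately, which is what allows several libraries to share one lane.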
In this study, we utilized pooling and multiplexing of samples to validate NGS for the mutation analysis of the ADPKD genes in a cohort of 230 ADPKD patients. These results show the feasibility of high-throughput NGS for the genetic characterization of large ADPKD cohorts. Furthermore, the utilization of fewer PCR primers and the possibility of characterizing the entire genomic structure of the PKD1 and PKD2 genes will help in detecting and characterizing atypical mutations (deep intronic variants, GCs, and ones missed due to allele dropout).
Results
Development of LR-PCR Amplicons for the PKD1 and PKD2 Genes and Proof of Principle Experiment for Pooling and Multiplexing Samples for NGS
Because of the duplication of the PKD1 gene (which already requires LR-PCR amplicons for locus-specific amplification17) and the limited genomic size of both genes combined (118 kb), we extended the number of LR-PCR amplicons to cover all of the coding regions of both genes tested (76.2 kb; PKD1-eight amplicons; PKD2-six amplicons) (Figures 1A and 2 and Supplemental Table 1).
A proof of principle experiment (Figure 3A) was performed employing one Illumina flow cell using 16 previously Sanger-characterized ADPKD patients (carrying 281 known sequence variants) and four novel cases. We pooled two to eight samples in a single bar-coded library (lanes 1–4) to test the maximum number of samples that can be pooled in a single bar-coded library while still detecting all of the positive controls. A second part of this experiment analyzed multiplexing of two to four bar-coded libraries of four pooled samples each (up to 16 samples, lanes 5–7), to evaluate the relationship between read depth and detection of positive controls after multiplexing (Figure 3A). One single bar-coded library of four unknown samples was run to mimic the planned mutation discovery workflow (lane 8).
The 281 Sanger-verified control variants allowed a detailed analysis of read depth (number of reads per known variant), coverage (percentage of the regions of interest adequately covered), sensitivity (proportion of true positives), and precision rate in the exonic regions (proportion of correctly identified mutations) for each of the bar-coded libraries (Table 1). Very high read depth was obtained for all control variants, and although read depth varied by up to approximately 180-fold within the same library (pool of four samples, 397× to 72,497×), all regions of interest were adequately covered (Figure 2). Single nucleotide variants were efficiently detected, but two PKD1 deletions (38 and 15 bp in length, respectively) were missed because of the short, 51-bp reads used, lowering the overall sensitivity. For the pooling test, a loss of sensitivity was observed when pooling eight samples (four false negatives and 12 false positives), whereas the pools of four and six samples performed similarly well (Table 1). Multiplexing at this level did not affect sensitivity or precision rate significantly.
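A simple calculation helps explain why larger pools are harder to mine. Under the simplifying assumptions of equimolar pooling and unbiased amplification (assumptions made here for illustration, not measurements from this study), a heterozygous variant carried by one of N pooled diploid samples is expected in 1/(2N) of the reads. The function names below are hypothetical.

```python
def expected_mutant_fraction(samples_per_pool, carriers=1, ploidy=2):
    """Expected fraction of reads carrying a heterozygous variant in a pool.

    Assumes equimolar pooling and unbiased amplification: each carrier
    contributes one mutant allele out of ploidy * samples_per_pool alleles.
    """
    return carriers / (ploidy * samples_per_pool)


def expected_mutant_reads(read_depth, samples_per_pool, carriers=1):
    """Expected number of mutant reads at a site with the given read depth."""
    return read_depth * expected_mutant_fraction(samples_per_pool, carriers)


# Pool sizes tested in the proof of principle experiment.
for n in (2, 4, 6, 8):
    print(f"pool of {n}: {expected_mutant_fraction(n):.1%} mutant reads expected")

# Lowest read depth reported for the pool of four in Table 1 (397x).
print(round(expected_mutant_reads(397, 4)))
```

Because bar codes keep libraries separate, multiplexing several libraries in one lane lowers the read depth per library but leaves the expected mutant fraction within each library unchanged.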
Table 1.
Strategy | Control Variants^a | Average Read Depth^b | SD (±) | Highest Read Depth^b | Lowest Read Depth^b | Sensitivity (%)^c | Precision Rate in the Exonic Regions (%)^d |
---|---|---|---|---|---|---|---|
Pool of 2 (lane 1) | 33 | 14,168 | 9672 | 41,065 | 639 | 97^e | 100 |
Pool of 4 (lane 2) | 48 | 19,888 | 11,998 | 72,497 | 397 | 100 | 87 |
Pool of 6 (lane 3) | 40 | 17,702 | 10,501 | 56,149 | 1229 | 95^e | 91 |
Pool of 8 (lane 4) | 52 | 27,932 | 18,537 | 92,367 | 2945 | 92 | 76 |
Multiplex of 2×4 (red, lane 5) | 42 | 8209 | 5340 | 24,863 | 315 | 98^e | 100 |
Multiplex of 3×4 (red, lane 6) | 42 | 6862 | 4319 | 21,452 | 226 | 98^e | 93 |
Multiplex of 4×4 (red, lane 7) | 42 | 2162 | 1325 | 7869 | 129 | 98^e | 90 |
Multiplex of 2×4 (green, lane 5) | 40 | 18,943 | 12,178 | 60,278 | 484 | 97.5^e | 96 |
Multiplex of 3×4 (green, lane 6) | 40 | 13,986 | 8956 | 43,897 | 535 | 97.5^e | 92 |
Multiplex of 4×4 (green, lane 7) | 40 | 9743 | 6104 | 31,335 | 240 | 97.5^e | 83 |
Multiplex of 3×4 (yellow, lane 6) | 36 | 6224 | 3175 | 15,934 | 651 | 97^e | 86 |
Multiplex of 4×4 (yellow, lane 7) | 36 | 4214 | 2228 | 11,256 | 372 | 97^e | 96 |
Multiplex of 4×4 (purple, lane 7) | 46 | 4414 | 2900 | 15,575 | 520 | 100 | 100 |
TP, true positive; TN, true negative; FP, false positive; FN, false negative.
^a Indicates the number of Sanger-verified control variants (unique and common polymorphisms) available for the corresponding pool.
^b Read depth is reported as the number of paired-end re-aligned reads per nucleotide site for the control variants (as the average, the highest, and the lowest).
^c Sensitivity was calculated as the number of true positive mutations divided by the number of true positive plus false negative mutations, TP/(TP+FN).
^d Precision rate was limited to the exonic regions (in which complete Sanger data were available) and was calculated as the number of true positive mutations divided by the number of true positive plus false positive mutations, TP/(TP+FP). The same mining protocol was used to calculate both the sensitivity and the precision rate.
^e The two missed variants were two deletions of 38 and 15 bp, which were not detected (one or both, depending on the pool) with the 51-bp reads used in this proof of principle experiment.
Taken together, this proof of principle experiment suggested the following: a conservative approach of pooling four samples per bar-coded library was feasible, at least 12 such libraries could be run per lane with an expected read depth of approximately 100×, and longer reads were required for the detection of indels longer than 15 bp.
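For readers who want to reproduce the Table 1 metrics, the two definitions given in the table footnotes can be written as a short Python sketch. The function names are ours, and the precision example uses made-up counts, because the exonic true positive and false positive counts behind each table row are not listed in the text.

```python
def sensitivity(tp, fn):
    """TP / (TP + FN), as defined in the Table 1 footnotes."""
    return tp / (tp + fn)


def precision(tp, fp):
    """TP / (TP + FP), as defined in the Table 1 footnotes."""
    return tp / (tp + fp)


# Pool of eight (lane 4): 52 control variants, four of them missed.
controls, missed = 52, 4
print(f"sensitivity: {sensitivity(controls - missed, missed):.0%}")

# Hypothetical exonic counts, for illustration only (not taken from the study).
print(f"precision: {precision(tp=19, fp=6):.0%}")
```

The same two functions apply unchanged to any pool or multiplexed library once its true positive, false negative, and false positive counts are known.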
Mutation Analyses of a Large ADPKD Cohort by Bar-Coded and Multiplexed NGS
On the basis of the proof of principle experiment, we utilized the previously developed amplicons to characterize a cohort of 264 ADPKD samples (230 novel and 34 internal controls) as 66 libraries (Figures 3B and 4) using 101-bp reads. The 230 novel samples spanned the entire phenotypic spectrum of ADPKD and included severe and mild PKD cases, to mimic a “real-world” diagnostic setting. Patient samples were amplified separately and amplicons pooled equimolarly for libraries 1–66, whereas the DNA was pooled before amplification for libraries 67–74 (Figure 3B).
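Equimolar pooling of amplicons of different lengths means adding mass in proportion to length. The sketch below shows one common way to compute the volumes, assuming an average mass of about 660 g/mol per base pair of double-stranded DNA; the amplicon names, lengths, concentrations, and target amount are hypothetical and are not taken from Supplemental Table 1.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def ng_for_fmol(length_bp, fmol):
    """Mass (ng) of a double-stranded amplicon that contains `fmol` femtomoles."""
    return fmol * length_bp * AVG_BP_MASS * 1e-6


def pooling_volumes(amplicons, fmol_each):
    """Volume (µL) of each amplicon needed to pool equal molar amounts.

    `amplicons` maps a name to (length in bp, concentration in ng/µL).
    """
    return {
        name: ng_for_fmol(length_bp, fmol_each) / conc
        for name, (length_bp, conc) in amplicons.items()
    }


# Hypothetical amplicon lengths and concentrations, for illustration only.
amplicons = {
    "amplicon_A": (2000, 40.0),
    "amplicon_B": (5000, 40.0),
    "amplicon_C": (10000, 80.0),
}
volumes = pooling_volumes(amplicons, fmol_each=50)
for name, volume in volumes.items():
    print(f"{name}: {volume:.2f} µL")
```

At equal concentration, a longer amplicon needs proportionally more volume to contribute the same number of molecules, which keeps short amplicons from dominating the read depth.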
Data mining identified 2445 variants in the 230 novel cases. After quality filtering, 779 high-confidence variants were retained and 1666 low-confidence variants were removed (Concise Methods and Figure 4). The 779 high-confidence variants were further filtered based on the likelihood of disease association (Figure 4). We individually verified the 176 possible pathogenic variants and 58 likely neutral variants by re-amplifying each of the four samples originally pooled in the same library and Sanger sequencing them (Figure 4, Table 2, and Supplemental Tables 2 and 3). Of the 234 Sanger-verified variants, 213 (91%) were true positives and 21 (9%) were false positives. Manual inspection of the 21 false positive variants revealed that they were due to misalignment of reads during data mining, low quality coverage or, for PKD1, residual contamination from the pseudogenes (Table 3).
Table 2.
Pedigree Clinical Features | Ped ID^a | Pt ID | Gene | Exon | Codon | cDNA Change | Protein Change | PKDB^b | Classification^c |
---|---|---|---|---|---|---|---|---|---|
Pedigrees with clinical diagnosis compatible with standard clinical and imaging criteria in which DP, HLP, or LP genotypes were found (n=115) | DP (PKD1=49; PKD2=17) | ||||||||
M190 | R1361 | PKD1 | 29 | 3298 | c.9894G>A | p.Trp3298X | Novel | DP | |
M201 | R1340 | PKD1 | 46 | 4202 | c.12604_12631delGGCCGGCTGGGGACAAGGTGTGAGCCTG | p.Gly4202fs146X | Novel | DP | |
M254 | R1425 | PKD1 | 15 | 1672 | c.5014_5015delAG | p.Arg1672fs97X | 23× | DP | |
M327 | R1632 | PKD1 | 44 | 4011 | c.12031C>T | p.Gln4011X | 4× | DP | |
M374 | R1557 | PKD1 | 5 | 266 | c.796C>T | p.Gln266X | 1× | DP | |
M375 | R1582 | PKD1 | 40 | 3793 | c.11379delG | p.Gly3793fs31X | 6× | DP | |
M388 | R1749 | PKD1 | 38 | 3697 | c.11090_11091insA | p.His3695fs25X | Novel | DP | |
M447 | R1918 | PKD1 | 15 | 2142 | c.6424C>T | p.Gln2142X | Novel | DP | |
M455 | R1941 | PKD1 | 15 | 1672 | c.5014_5015delAG | p.Arg1672fs97X | 23× | DP | |
M538 | R1804 | PKD1 | 36 | 3603 | c.10808G>A | p.Trp3603X | 1× | DP | |
PKD1 | 23 | 2765 | c.8293C>T | p.Arg2765Cys | 9× | Hyp | |||
M541 | R1450 | PKD1 | 3 | 121 | c.360+2T>C | p.Ile120fs | Novel | DP | |
M573 | R1554 | PKD1 | 14 | 1072 | c.3215_3216insA | p.Asn1072fs28X | Novel | DP | |
M575 | R1572 | PKD1 | IVS17 | 2403 | c.7209+4_+7delAGTG | p.Val2356_Gly2403del | 1× | DP | |
M576 | R1588 | PKD1 | 42 | 3852 | c.11554delC | p.Leu3852fs82X | Novel | DP | |
M578 | R1604 | PKD1 | 22 | 2683 | c.8049_8050insGCCGCTCGTGC | p.Cys2683fs4X | Novel | DP | |
M580 | R1672 | PKD1 | 15 | 1672 | c.5014_5015delAG | p.Arg1672fs97X | 23× | DP | |
M582 | R1717 | PKD1 | 14 | 1091 | c.3272_3289delTCATGCACACCTACGCTG | p.Val1091_Ala1096delVal-Met-His-Thr-Tyr-Ala | 1× | DP | |
M586 | R1802 | PKD1 | 35 | 3530 | c.10588C>T | p.Gln3530X | 2× | DP | |
M588 | R1835 | PKD1 | 15 | 1482 | c.4444C>T | p.Gln1482X | Novel | DP | |
PKD1 | 27 | 3183 | c.9548G>A | p.Arg3183Gln | Novel | I | |||
M591 | R1866 | PKD1 | 21 | 2643 | c.7927delC | p.Arg2643fs10X | Novel | DP | |
M600 | R1926 | PKD1 | 20 | 2602 | c.7804C>T | p.Gln2602X | Novel | DP | |
PKD1 | 23 | 2765 | c.8293C>T | p.Arg2765Cys | 9× | Hyp | |||
M601 | R1939 | PKD1 | 11 | 832 | c.2494_2495insC | p.Arg832fs39X | 1× | DP | |
P19 | OX162 | PKD1 | 11 | 874 | c.2619_2620insC | p.Cys874fs30X | Novel | DP | |
PKD1 | 11 | 873 | c.2618T>C | p.Val873Ala | Novel | LN | |||
P21 | OX170 | PKD1 | 13 | 1020 | c.3058C>T | p.Gln1020X | 1× | DP | |
P45 | OX284 | PKD1 | IVS28 | 3238 | c.9712+1G>T | p.Ser3238fs | Novel | DP | |
P94 | OX873 | PKD1 | 23 | 2810 | c.8428G>T | p.Glu2810X | 3× | DP | |
P96 | OX940 | PKD1 | 21 | 2637 | c.7909C>T | p.Gln2637X | Novel | DP | |
P104 | OX1009 | PKD1 | 36 | 3582 | c.10745_10746insC | p.Pro3582fs44X | 1× | DP | |
P209 | OX20 | PKD1 | 15 | 1672 | c.5014_5015delAG | p.Arg1672fs97X | 23× | DP | |
P218 | OX1403 | PKD1 | 40 | 3779 | c.11337_11338ins AGGAGGCTTCAGCACCAGC | p.Ala3779fs41X | Novel | DP | |
P219 | OX1394 | PKD1 | 11 | 886 | c.2657_2658insACCTTCGTGCCCGGCTGCCC | p.Trp886fs18X | Novel | DP | |
P222 | OX1002 | PKD1 | 15 | 1457 | c.4369_4370delTC | p.Ser1457fs64X | 1× | DP | |
P223 | OX1051 | PKD1 | 15 | 1960 | c.5878C>T | p.Gln1960X | 1× | DP | |
P226 | OX1418 | PKD1 | 21 | 2650 | c.7948_7949delCT | p.Leu2650fs9X | 1× | DP | |
P231 | OX969 | PKD1 | 26 | 3082 | c.9240_9241delAT | p.Ala3082fs95X | 2× | DP | |
P238 | OX27 | PKD1 | 18 | 2430 | c.7288C>T | p.Gln2430X | 8× | DP | |
PKD1 | 18 | 2434 | c.7300C>T | p.Arg2434Trp | Novel | LP | |||
PKD1 | 10 | 696 | c.2086G>A | p.Ala696Thr | Novel | LN | |||
P245 | OX1948 | PKD1 | 10 | 696 | c.2085_2086insC | p.Ala696fs17X | 8× | DP | |
P246 | OX991 | PKD1 | 15 | 1633 | c.4897_4898insT | p.Glu1633fs23X | 1× | DP | |
P272 | OX19 | PKD1 | 3 | 117 | c.348_352delTTTAA | p.Asn116fs1X | 1× | DP | |
PKD1 | 23 | 2765 | c.8293C>T | p.Arg2765Cys | 9× | Hyp | |||
P286 | OX1123 | PKD1 | 32 | 3406 | c.10216_IVS32+20delAAGAGGTGGGTTCCCTAGAGAAACC | p.Lys3406fs | Novel | DP | |
P287 | OX1555 | PKD1 | 43 | 3962 | c.11885delA | p.Ala3962fs21X | Novel | DP | |
P299 | OX1650 | PKD1 | 15 | 1117 | c.3349C>T | p.Gln1117X | 2× | DP | |
P369 | OX1768 | PKD1 | 24 | 2948 | c.8843C>A | p.Ser2948X | Novel | DP | |
P437 | OX2130 | PKD1 | 36 | 3586 | c.10756_10757delGT | p.Val3586fs39X | Novel | DP | |
P998 | OX1239 | PKD1 | 15 | 1621 | c.4861C>T | p.Gln1621X | Novel | DP | |
P1001 | OX1505 | PKD1 | 4 | 162 | c.485delC | p.Ala162fs127X | Novel | DP | |
P1008 | OX1951 | PKD1 | 15 | 1672 | c.5014_5015delAG | p.Arg1672fs97X | 23× | DP | |
P1009 | OX1955 | PKD1 | 15 | 2154 | c.6461_6462insCCTGCCGGGAGCCGGAGGTGGACGTGGT | p.Val2154fs29X | Novel | DP | |
P1012 | OX1246 | PKD1 | 18 | 2444 | c.7330_7331insGAT | p.Tyr2444X | Novel | DP | |
M365 | R1551 | PKD2 | 14 | 845 | c.2533C>T | p.Arg845X | 2× | DP | |
M509 | R2020 | PKD2 | IVS5 | 439 | c.1319+1G>A | p.Arg439fs | 9× | DP | |
PKD1 | 18 | 2442 | c.7324G>C | p.Glu2442Gln | Novel | I | |||
M512 | R1838 | PKD2 | 5 | 417 | c.1249C>T | p.Arg417X | 9× | DP | |
M567 | R1401 | PKD2 | 1 | 68 | c.203delC | p.Pro68fs48X | Novel | DP | |
M571 | R1487 | PKD2 | IVS4 | 365 | c.1094+3_+6delAAGT | p.Ala365fs | 7× | DP | |
M577 | R1601 | PKD2 | 14 | 872 | c.2614C>T | p.Arg872X | 22× | DP | |
M583 | R1734 | PKD2 | 4 | 292 | c.876C>G | p.Tyr292X | Novel | DP | |
PKD2 | 4 | 314 | c.940C>G | p.Leu314Val | Novel | LN | |||
M596 | R1915 | PKD2 | 11 | 728 | c.2182_2183delAG | p.Ser728fs10X | Novel | DP | |
P203 | OX1358 | PKD2 | 1 | 180 | c.538_539insC | p.Leu180fs32X | 3× | DP | |
P237 | OX986 | PKD2 | 14 | 872 | c.2614C>T | p.Arg872X | 22× | DP | |
P288 | OX1558 | PKD2 | 11 | 742 | c.2224C>T | p.Arg742X | 7× | DP | |
P300 | OX1614 | PKD2 | 3 | 270 | c.810delT | p.Phe270fs46X | Novel | DP | |
P378 | OX1838 | PKD2 | 4 | 306 | c.916C>T | p.Arg306X | 19× | DP | |
PKD1 | 6 | 404 | c.1211C>G | p.Pro404Arg | 1× | I | |||
PKD1 | 15 | 2001 | c.6001C>T | p.Arg2001Trp | Novel | I | |||
P527 | OX1432 | PKD2 | IVS4 | 365 | c.1094+3_+6delAAGT | p.Ala365fs | 7× | DP | |
P999 | OX1244 | PKD2 | IVS5 | 439 | c.1319+1G>T | p.Arg439fs | 9× | DP | |
P1004 | OX1774 | PKD2 | 4 | 320 | c.958C>T | p.Arg320X | 9× | DP | |
PKD1 | 10 | 641 | c.1922T>C | p.Met641Thr | Novel | LN | |||
P1006 | OX1844 | PKD2 | 13 | 836 | c.2508C>G | p.Tyr836X | Novel | DP | |
HLP (PKD1= 32; PKD2= 3) | |||||||||
M80 | R167 | PKD1 | 27 | 3168 | c.9504C>G | p.Phe3168Leu | 3× | HLP | |
M243 | R1364 | PKD1 | 15 | 1503 | c.4507G>A | p.Gly1503Arg | 2× | HLP | |
M351 | R1662, R1663 | PKD1 | 33 | 3415 | c.10243G>A | p.Glu3415Lys | Novel | HLP | |
M378 | R1712 | PKD1 | 5 | 381 | c.1141G>A | p.Gly381Ser | 5× | HLP | |
M386 | R1738 | PKD1 | 46 | 4150 | c.12448C>T | p.Arg4150Cys | 4× | HLP | |
M412 | R1380 | PKD1 | 26 | 3081 | c.9241T>C | p.Cys3081Arg | Novel | HLP | |
M448 | R1909, R1910 | PKD1 | 17 | 2373 | c.7118G>A | p.Cys2373Tyr | 1× | HLP | |
M461 | R1962 | PKD1 | 4 | 125 | c.373A>G | p.Asn125Asp | Novel | HLP | |
M543 | R2004 | PKD1 | 28 | 3233 | c.9698A>T | p.Glu3233Val | Novel | HLP | |
PKD1 | 28 | 3232 | c.9694A>G | p.Lys3232Glu | Novel | LN | |||
M581 | R1716 | PKD1 | 29 | 3263 | c.9787T>C | p.Trp3263Arg | Novel | HLP | |
M597 | R1917 | PKD1 | 5 | 381 | c.1141G>A | p.Gly381Ser | 5× | HLP | |
M598 | R1920 | PKD1 | 6 | 420 | c.1259A>C | p.Tyr420Ser | Novel | HLP | |
PKD1 | 15 | 2082 | c.6244G>A | p.Ala2082Thr | Novel | LN | |||
M602 | R1944 | PKD1 | 26 | 3130 | c.9388_9393delCGGGGC | p.Arg3130_Gly3131del | Novel | HLP | |
P20 | OX165 | PKD1 | 11 | 845 | c.2534T>C | p.Leu845Ser | 6× | HLP | |
P51 | OX341 | PKD1 | 15 | 1699 | c.5096C>A | p.Ala1699Asp | Novel | HLP | |
P87 | OX860 | PKD1 | 11 | 727 | c.2180T>C | p.Leu727Pro | 7× | HLP | |
P93 | OX937 | PKD1 | 40 | 3782 | c.11345_11346insTTACGA | p.Asp3782_Val3783insTyrAsp | Novel | HLP | |
M589 | R1841 | PKD1 | 5 | 210 | c.629G>A | p.Cys210Tyr | Novel | HLP | |
P213 | OX1396 | PKD1 | 11 | 727 | c.2180T>C | p.Leu727Pro | 7× | HLP | |
P214 | OX1038 | PKD1 | 39 | 3753 | c.11258G>T | p.Arg3753Leu | Novel | HLP | |
P230 | OX824 | PKD1 | 39 | 3751 | c.11252A>G | p.Gln3751Arg | 1× | HLP | |
P281 | OX841 | PKD1 | 7 | 508 | c.1522T>C | p.Cys508Arg | 4× | HLP | |
P305 | OX1631 | PKD1 | 39 | 3751 | c.11252A>G | p.Gln3751Arg | 1× | HLP | |
P309 | OX1947 | PKD1 | 15 | 1999 | c.5995G>A | p.Gly1999Ser | 2× | HLP | |
P312 | OX871 | PKD1 | 46 | 4150 | c.12448C>T | p.Arg4150Cys | 4× | HLP | |
P416 | OX2062 | PKD1 | 11 | 796 | c.2387G>A | p.Tyr796Cys | Novel | HLP | |
P435 | OX2098 | PKD1 | 3 | 101 | c.303_305delCAA | p.Asn101del | 4× | HLP | |
PKD1 | 15 | 1332 | c.3994G>A | p.Asp1332Asn | Novel | LN | |||
P526 | OX1704 | PKD1 | 15 | 1328 | c.3982T>C | p.Trp1328Arg | Novel | HLP | |
P576 | OX2027 | PKD1 | 15 | 1375 | c.4124G>A | p.Cys1375Tyr | Novel | HLP | |
PKD1 | 23 | 2872 | c.8615T>G | p.Ile2872Ser | Novel | HLP | |||
PKD1 | 15 | 1292 | c.3876C>A | p.Phe1292Leu | Novel | I | |||
P601 | OX2404 | PKD1 | 19 | 2530 | c.7589G>A | p.Gly2530Asp | Novel | HLP | |
PKD1 | 15 | 1254 | c.3761C>T | p.Ser1254Leu | Novel | LN | |||
P1002 | OX1509 | PKD1 | 23 | 2771 | c.8311G>A | p.Glu2771Lys | 18× | HLP | |
P1010 | OX2057 | PKD1 | 11 | 727 | c.2180T>C | p.Leu727Pro | 7× | HLP | |
M471 | R2034 | PKD2 | 6 | 448 | c.1343C>A | p.Thr448Lys | Novel | HLP | |
PKD1 | 7 | 471 | c.1412C>T | p.Ser471Leu | Novel | LP | |||
M604 | R1986 | PKD2 | 14 | 886 | c.2657A>G | p.Asp886Gly | 1× | HLP | |
PKD1 | 23 | 2779 | c.8335G>A | p.Glu2779Lys | 2× | I | |||
PKD1 | 23 | 2822 | c.8464G>A | p.Val2822Met | 1× | I | |||
M605 | R2061 | PKD2 | 4 | 322 | c.965G>A | p.Arg322Gln | 3× | HLP | |
PKD1 | 43 | 3906 | c.11717G>T | p.Cys3906Phe | Novel | LN | |||
LP (PKD1=14; PKD2=0) | |||||||||
M127 | R1498 | PKD1 | 5 | 271 | c.812C>A | p.Ala271Asp | 2× | LP | |
M152 | R1432 | PKD1 | 27 | 3187 | c.9561C>A | p.Asp3187Glu | Novel | LP | |
M469 | R2041 | PKD1 | 27 | 3178 | c.9533G>T | p.Ser3178Ile | Novel | LP | |
M540 | R1558 | PKD1 | 18 | 2423 | c.7268C>T | p.Ser2423Phe | 2× | LP | |
M566 | R1040 | PKD1 | 23 | 2816 | c.8447T>A | p.Leu2816Gln | Novel | LP | |
M572 | R1519 | PKD1 | 18 | 2467 | c.7400C>T | p.Pro2467Leu | Novel | LP | |
M574 | R1563 | PKD1 | 6 | 460 | c.1379T>C | p.Val460Ala | Novel | LP | |
M587 | R1832 | PKD1 | 18 | 2423 | c.7268C>T | p.Ser2423Phe | 2× | LP | |
M592 | R1881 | PKD1 | 15 | 1914 | c.5741G>C | p.Gly1914Ala | Novel | LP | |
M595 | R1899 | PKD1 | 18 | 2434 | c.7301G>A | p.Arg2434Gln | Novel | LP | |
PKD1 | 23 | 2865 | c.8593C>T | p.Arg2865Trp | Novel | I | |||
M599 | R1924 | PKD1 | 20 | 2612 | c.7835C>G | p.Ser2612Trp | Novel | LP | |
P308 | OX1656 | PKD1 | 15 | 1278 | c.3834C>A | p.Ser1278Arg | Novel | LP | |
PKD1 | 18 | 2417 | c.7250T>A | p.Leu2417Gln | Novel | LP | |||
P529 | OX832 | PKD1 | 15 | 1278 | c.3834C>A | p.Ser1278Arg | Novel | LP | |
P1003 | OX1701 | PKD1 | 15 | 1610 | c.4828_4830delATC | p.Ile1610del | 9× | LP | |
Pedigrees with clinical diagnosis compatible with standard clinical and imaging criteria in which I, Hyp, or novel LN genotypes were found (n=7) | M237 | R1413 | PKD1 | 5 | 276 | c.827C>T | p.Thr276Ile | Novel | I |
PKD1 | 23 | 2765 | c.8293C>T | p.Arg2765Cys | 9× | Hyp | |||
M267 | R181 | PKD1 | 23 | 2742 | c.8224G>A | p.Glu2742Lys | 2× | Hyp | |
PKD1 | 23 | 2765 | c.8293C>T | p.Arg2765Cys | 9× | Hyp | |||
M281 | R1344 | PKD1 | 37 | 3654 | c.10960C>G | p.Leu3654Val | 2× | I | |
M283 | R1443 | PKD1 | 46 | 4288 | c.12862A>G | p.Ser4288Gly | 3× | I | |
M590 | R1852 | PKD1 | IVS17 | 2497 | c.7210-10C>A | N/A | 1× | I | |
PKD1 | 7 | 532 | c.1594C>G | p.Leu532Val | Novel | LN | |||
P161 | OX1211 | PKD1 | 10 | 658 | c.1972G>A | p.Ala658Thr | Novel | LN | |
M568 | R1440 | PKD2 | 6 | 482 | c.1445T>G | p.Phe482Cys | 3× | Hyp | |
Pedigrees with clinical diagnosis noncompatible with standard clinical and imaging criteria in which I or novel LN genotypes were found (n=4) | M414 | R1842 | PKD1 | 11 | 927 | c.2780C>T | p.Thr927Met | Novel | I |
M593 | R1886 | PKD1 | 16 | 2327 | c.6969C>T | p.Arg2327Trp | 3× | I | |
M594 | R1890 | PKD1 | 29 | 3240 | c.9718G>A | p.Ala3240Thr | Novel | LN | |
P57 | OX405 | PKD1 | 15 | 1878 | c.5633C>T | p.Thr1878Met | Novel | LN | |
Pedigrees from a group of 28 previous Sanger sequencing mutation negative that were included for re-analysis (and considered as novel) in this experiment, in which unique deep intronic variants predicted to affect splicing were detected | 100006 | 112396 | PKD1 | IVS1 | 73 | c.216-1198T>G | N/A | Novel | LN |
244111 | 244111 | PKD2 | IVS4 | 365 | c.1094+507G>A | N/A | Novel | LN |
DP, definitely pathogenic; HLP, highly likely pathogenic; LP, likely pathogenic; I, indeterminate; Hyp, likely hypomorphic; LN, likely neutral.
^a Pedigrees are ordered in descending order, for PKD1 and PKD2, respectively; pedigrees with multiple variants are classified based on the most pathogenic, which is shown on top.
^b Indicates previous description in the PKD Mutation Database (http://pkdb.mayo.edu). The “×” indicates number of pedigrees.
^c Variants were classified as DP, HLP, LP, I, Hyp, and LN (see Figure 4).
Table 3.
Gene | Exon/IVS | Codon | Sequence Variant (Amino Acid Change) | Times Detected | Comment |
---|---|---|---|---|---|
PKD1 | 1 | 65 | c.194T>C (p.Ile65Thr) | 1 | Just before homopolymer (TCCCC), likely misalignment |
PKD1 | 5 | 283 | c.856_862delTCTGGCC | 1 | (TCTGGCC) repeated twice, likely misalignment |
PKD1 | 13 | 1023 | c.3068A>G (p.Gln1023Arg) | 4 | PKD1P1-P2 contamination |
PKD1 | 15 | 2236 | c.6706T>C (p.Phe2236Leu) | 1 | Low quality coverage |
PKD1 | 15 | 2238 | c.6713A>C (p.Asp2238Ala) | 1 | Homopolymer of 4 G nucleotides followed by (AC) repeated twice, likely misalignment |
PKD1 | 23 | 2872 | c.8614delA | 2 | Palindrome separated by dinucleotide (GCCATCACCG), likely misalignment |
PKD1 | 25 | 3023 | c.9067A>G (p.Met3023Val) | 1 | Just before a GT motif, likely misalignment |
PKD1 | 33 | 3454 | c.10360delT | 1 | (TC) dinucleotide repeated twice, likely misalignment |
PKD1 | 33 | 3456 | c.10368_10369insCTC | 1 | Within a palindrome (GCCAGCC) after a TC motif, likely misalignment |
PKD1 | 40 | 3780 | c.11335_11336insGCGATT | 1 | Same sequence in wild-type, likely misalignment |
PKD1 | 41 | 3845 | c.11537+2T>GT | 1 | Low quality coverage |
PKD1 | 41 | 3864 | c.11591A>C (p.His3864Pro) | 1 | Within palindrome (GCACG), likely misalignment |
PKD2 | 11 | 720 | c.2159delA | 4 | Homopolymer of 8 A nucleotides, likely misalignment |
PKD2 | 14 | 843 | c.2527delG | 1 | Homopolymer of 2 G nucleotides, likely misalignment |
Bold indicates the site where the corresponding change occurs for single nucleotide or single deletion changes.
For the group of nonpathogenic variants that were not checked by Sanger sequencing (Supplemental Tables 2 and 3), data from the PKDB and/or the NCBI Database of Single Nucleotide Polymorphisms (dbSNP), a high NextGENe score/read depth, or detection in at least two different libraries (high-confidence variants) provided evidence that they were real variants.
The 155 possible pathogenic variants were classified either as definitely pathogenic or as variants of unknown clinical significance (VUCS). VUCS were further classified as highly likely pathogenic, likely pathogenic, indeterminate, likely hypomorphic, or likely neutral (see Concise Methods, Figure 4, and Table 2). Interestingly, these 155 true positive variants included deletions and insertions of up to approximately one-third of the total read length used in this experiment (Figure 5, A and B and Table 2).
Because of the pronounced phenotypic heterogeneity of the study cohort, we focused on the subset of 183 ADPKD probands with a diagnosis compatible with standard clinical and imaging criteria to fairly evaluate the overall detection rate of this experiment.9,11 In this subset of samples, a detection rate of 115 of 183 (63%) was achieved (66 probands with definitely pathogenic variants, 35 with highly likely pathogenic variants, and 14 with likely pathogenic variants) (Figure 4 and Table 2). In the remaining 68 pedigrees, only indeterminate, likely hypomorphic, or novel likely neutral variants (in 7 pedigrees) (Table 2), or synonymous variants and known polymorphisms (in 61 pedigrees, not shown) were found.
To further evaluate the sensitivity, specificity, and accuracy of this experiment, we utilized 34 samples that were concurrently characterized by Sanger sequencing during the timeframe of this experiment. In addition, we resequenced 14 of the 68 unresolved pedigrees by Sanger sequencing, for a total of 48 samples for which both Sanger sequencing and NGS data were available (Tables 4 and 5). This comparison with the available Sanger data suggested 78% sensitivity, 100% specificity, and 83% accuracy in this experiment (Table 5). Manual inspection of the NGS alignment for the missing variants in the nine false negative pedigrees revealed that they were filtered out because of lack of coverage or overly stringent threshold criteria used during data mining (Figure 5, C and D, respectively, and Table 4).
Table 4.
Variant Description | Ped ID | Pt ID | Gene | Exon/IVS | Codon | cDNA change | Protein | Comment |
---|---|---|---|---|---|---|---|---|
Variants detected by Sanger sequencing but not by NGS (n=10) | M368 | R1700 | PKD1 | 1 | 37 | c.108_109insC | p.Cys37fs76X | Lack of coverage |
M615 | R1953 | PKD1 | 11 | 845 | c.2534T>C | p.Leu845Ser | Mutant percentage below threshold | |
M499 | R2001 | PKD1 | 12 | 960 | c.2879G>A | p.Gly960Asp | Mutant percentage below threshold | |
M152 | R1432 | PKD1 | 15 | 1362 | c.4084C>T | p.Ser1362Pro | Mutant percentage below threshold | |
M118 | R95 | PKD1 | 15 | 2212 | c.6635G>A | p.Ser2212Asn | Mutant percentage below threshold | |
2215 | c.6644G>A | p.Arg2215Gln | Mutant percentage below threshold | |||||
M307 | R1573 | PKD1 | IVS20 | 2621 | c.7864-2A>G | p.2621fs | Mutant percentage below threshold | |
M307 | R1581 | PKD1 | IVS20 | 2621 | c.7864-2A>G | p.2621fs | Mutant percentage below threshold | |
P387 | OX2242 | PKD1 | 25 | 3016 | c.9047A>G | p.Gln3016Arg | Mutant percentage below threshold | |
P229 | OX1056 | PKD1 | IVS31 | 3390 | c.10170+25_+45delCTGGGGGTCCTGGGCTGGG | p.Gln3390fs | NextGENe score below threshold | |
Variants detected by NGS but missed during the original Sanger sequencing analysis (n=2) | M453 | R1930 | PKD1 | 15 | 2250 | c.6749C>T | p.Thr2250Met | Operator-caused error |
M412 | R1380 | PKD1 | 26 | 3081 | c.9241T>C | p.Cys3081Arg | Allele dropout |
Table 5.
NGS versus Sanger Sequencing (as Gold Standard) | Sanger + | Sanger − |
---|---|---|
NGS + | True positive (TP): 28 | Sanger false negative (FN): 1 |
NGS − | NGS false negative (FN): 8 | True negative (TN): 11 |
Sanger sensitivity, 97% (36/37); specificity, 100% (11/11); accuracy, 98% (47/48). NGS sensitivity, 78% (29/37); specificity, 100% (11/11); accuracy, 83% (40/48). For both Sanger sequencing and NGS, sensitivity was calculated as TP/(TP+FN), specificity as TN/(TN+FP), and accuracy as (TP+TN)/(TP+FP+TN+FN).
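The Table 5 figures can be recomputed from the four cell counts by applying the three formulas in the footnote. The sketch below is an illustration with hypothetical names; it assumes that each method is scored against the union of confirmed variants (37 positives) and that neither method produced a false positive in this comparison, consistent with the 100% specificity reported.

```python
def metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy as defined in the Table 5 footnote."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Cell counts from Table 5.
both_detected, ngs_only, sanger_only, neither = 28, 1, 8, 11

# Assumption: no false positives for either method in this comparison.
sanger = metrics(tp=both_detected + sanger_only, fp=0, tn=neither, fn=ngs_only)
ngs = metrics(tp=both_detected + ngs_only, fp=0, tn=neither, fn=sanger_only)

for name, m in (("Sanger", sanger), ("NGS", ngs)):
    print(name, {k: f"{v:.0%}" for k, v in m.items()})
```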
In two cases, NGS detected changes that the original Sanger analysis had missed in mutation-negative samples: a novel, highly likely pathogenic change (PKD1/p.Cys3081Arg) in R1380 (Figure 6) and a likely neutral change (PKD1/p.Thr2250Met) in R1930 (Table 4). Manual inspection of the original Sanger chromatograms revealed that p.Cys3081Arg was missed because of unequal amplification (allele dropout), with almost complete loss of the mutant allele (Figure 6), whereas p.Thr2250Met was missed because of an operator error, the variant having been lost at follow-up. Manual inspection of the binding sites of the original Sanger sequencing primers for p.Cys3081Arg revealed no polymorphisms that could explain primer instability, a finding also supported by repeated Sanger sequencing analysis (Figure 6).
Comparison of Pooling of Amplicons versus Pooling of DNA Strategy
Libraries 67–74 were matched to libraries 59–66 in patient content, but genomic DNA was pooled before PCR amplification (Figure 3B). The two strategies were compared for sensitivity and precision rate by using all of the 48 novel Sanger-verified variants found after analysis of both datasets. Pooling PCR fragments detected 46 of 48 variants with one false positive, whereas pooling DNA samples detected 39 of 48 variants with 25 false positives. Hence, compared with pooling of PCR-amplified fragments, pooling of DNA before amplification led to a substantial loss of sensitivity (81% versus 96%) and a higher number of false positives (25 versus 1).
Identification of Atypical Variants by NGS: Deep Intronic Variants and a PKD1 Gene Conversion
By amplifying most introns for both genes, NGS allowed a detailed analysis of the pattern of intronic variation in a sizable cohort for the first time, particularly for the duplicated portion of PKD1, in which 463 high-confidence intronic variants outside the canonical splice sites were identified (Supplemental Table 3). Although 460 variants were common intronic polymorphisms, in silico splicing analysis of three that were unique predicted them as possibly affecting splicing, including patients R1852-PKD1/c.7210-10C>A (Table 2), 244111 PKD2/c.1094+507G>A, and 100006 PKD1/c.216-1198T>G (Table 2). PKD1/c.7210-10C>A was the only possibly pathogenic variant detected in patient R1852, and it is described in the PKDB as a variant of indeterminate clinical significance predicted to weaken the polypyrimidine tract (http://pkdb.mayo.edu). Both PKD2/c.1094+507G>A and PKD1/c.216-1198T>G were identified in patients who were mutation negative in a previous Sanger sequencing analysis (from a group of 28 that were included here for re-analysis by NGS as mutation-negative samples). They are novel variants and are predicted to cause pseudo-exon activation by creating a new acceptor or new donor site, respectively.
Whereas RNA was not available for R1852, RNA from lymphoblastoid cell lines was analyzed for 244111 and 100006; however, no apparent splicing abnormalities were revealed in these cells.
The deep intronic sequencing obtained by NGS allowed, for the first time, the fine characterization of a PKD1 GC event involving exons 28–32,33 in which 47 variants (12 exonic and 35 intronic) matching one of the PKD1P1-P6 pseudogenes were identified. Careful comparison with available genomic sequence data showed that this GC likely derives from a conversion event with the PKD1P6 pseudogene (Figure 7).
Discussion
NGS has critically accelerated the discovery process in human genetics through targeted resequencing,34 whole-exome analysis,35–40 and whole-genome analysis.41,42 To maximize the number of samples that can be run per sequencing lane, targeted resequencing is preferable and more cost-effective when small to moderately sized genomic regions need to be resequenced in large populations. To aid this analysis, methods for genomic partitioning and bar-coding strategies have been developed.32 Here, we developed an original strategy that uses LR-PCR in association with sample pooling and library bar coding and we applied it to a duplicated genomic region (PKD1 exons 1–32), for which conventional genomic partitioning approaches are not suitable due to the presence of 97.7% identical duplicons.15
Pooling and bar coding were utilized as a strategy to achieve cost-efficiency so that NGS could be applied to analyze large ADPKD populations. Although still significant, the cost of NGS is constantly decreasing due to novel strategies for library preparation, decreased hardware cost, and greater sequence output per run.43 As a more cost-effective workflow is optimized through use of an automated workstation, decreased library preparation costs, and introduction of more bar codes to allow up to 96 samples to be analyzed per lane,43,44 the overall NGS cost will further decrease and make NGS highly competitive compared with conventional Sanger sequencing. The much higher number of reads generated by the recently introduced Illumina HiSeq (approximately 320 million per lane, 8 times higher than the Illumina GA2X used in these experiments) and the availability of 96 individual bar codes44 will soon make it possible to sequence up to 1536 individually bar-coded samples for both the PKD1 and PKD2 genes concurrently (16 lanes on two flow cells per run), making NGS the method of choice for large population studies. In addition, the introduction of more scalable instruments with faster turnaround times and simpler workflows, such as the Illumina MiSeq, will likely make NGS an attractive method for clinical and diagnostic applications as well, particularly when only specific genes or specific sets of genes need to be analyzed, as for ADPKD.
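The throughput projection follows directly from the figures quoted above, as the brief calculation below illustrates. The per-sample read estimate is an added assumption (reads divided evenly among bar codes, which real libraries only approximate), and the constant names are illustrative.

```python
BARCODES_PER_LANE = 96          # individual bar codes available per lane
LANES_PER_RUN = 16              # 16 lanes on two flow cells per run
READS_PER_LANE = 320_000_000    # approximate HiSeq output per lane

# Samples that can be individually bar coded in a single run.
samples_per_run = BARCODES_PER_LANE * LANES_PER_RUN

# Assumption: reads are shared evenly among the bar codes of a lane.
reads_per_sample = READS_PER_LANE / BARCODES_PER_LANE

print(f"samples per run: {samples_per_run}")
print(f"approximate reads per sample: {reads_per_sample:,.0f}")
```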
Pooling of DNA before amplification would further simplify the upstream work by greatly reducing the number of amplicons to generate, check, normalize, and pool.45 However, in our experiments, this did not seem ideal and led to loss of sensitivity and a higher number of false positives. This may be due to variability of DNA quality and errors in DNA concentration measurements due to innate DNA viscosity, as previously suggested.23 Whole-genome amplification before DNA pooling may be an alternative strategy to overcome this issue. Furthermore, because of the PKD1 gene duplication, a residual contamination derived from the pseudogenes was detected when mining the data at low stringency (<3% mutant level). Because of the need to filter out the contaminant reads, data mining had to be performed with a 3% mutant level as the minimum threshold, thus posing an additional challenge to pooling samples in the same library.
Detection of small indels has been an issue thus far in NGS due to the short Illumina reads. However, utilizing longer 101-bp reads and the elongation NextGENe mining protocol allowed detection of small to medium size indels up to 30% of the read length. This improvement closes the gap with Sanger sequencing, and it is particularly important in ADPKD, in which this mutation type accounts for approximately one-third of all mutations (http://pkdb.mayo.edu). Detection of larger genomic indels by NGS is possible by paired-end mapping46 and based on variations of the depth of coverage in deleted genomic regions.47,48 However, this approach was not feasible in these experiments due to the pooling of different samples within the same library and, for PKD1, the genomic complexity requiring locus-specific amplification.
In the subset of 183 ADPKD samples with a typical diagnosis according to standard clinical and imaging criteria,9,11 a maximal detection rate of 63% was achieved, including definitely pathogenic, highly likely pathogenic, and likely pathogenic variants (115 pedigrees) (Figure 4 and Table 2). The remaining 68 unresolved pedigrees (37%) may reflect the phenotypic heterogeneity of the study cohort (purposely mimicking a diagnostic cohort), missed mutations, and possible additional genetic heterogeneity. Considering the 19% difference in sensitivity in this experiment compared with Sanger sequencing (Table 5) (corresponding to approximately 13 missed pathogenic mutations in the 68 unresolved pedigrees), a putative detection rate of approximately 70% would be achievable, which is comparable with data recently obtained in a similar diagnostic setting by Sanger sequencing.49 Sensitivity, specificity, and accuracy will improve by individually bar coding each sample and by further development of the data mining strategies.50–52
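The projected detection rate can be traced step by step from the numbers in this paragraph. The sketch below simply restates that arithmetic; the variable names are illustrative.

```python
typical_pedigrees = 183
resolved_pedigrees = 115
unresolved_pedigrees = typical_pedigrees - resolved_pedigrees   # 68

sanger_sensitivity = 0.97
ngs_sensitivity = 0.78
sensitivity_gap = sanger_sensitivity - ngs_sensitivity          # about 0.19

# Mutations presumed missed by NGS among the unresolved pedigrees.
estimated_missed = round(sensitivity_gap * unresolved_pedigrees)

observed_rate = resolved_pedigrees / typical_pedigrees
projected_rate = (resolved_pedigrees + estimated_missed) / typical_pedigrees

print(f"observed detection rate: {observed_rate:.0%}")
print(f"estimated missed mutations: {estimated_missed}")
print(f"projected detection rate: {projected_rate:.0%}")
```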
The application of NGS in these experiments has allowed the discovery and characterization of missed and atypical variants (allele dropout, gene conversion, and deep intronic variants). Allele dropout is a well known cause of missed mutations53 due to unequal amplification of heterozygote alleles. By utilizing larger amplicons, a previously undetected mutation (p.Cys3081Arg) was identified in an otherwise mutation-negative sample. A posteriori verification of the original Sanger screening chromatograms revealed an extreme underrepresentation of the mutant allele. The apparent lack of polymorphisms in the binding sites of the Sanger primers54 suggests stochastic variability in allele amplification efficiency as the likely cause. This may be due to unexpected secondary structures of the DNA template, and may be an underestimated cause of missed mutations in ADPKD due to the GC richness of the PKD1 gene.
By deep sequencing the PKD1 introns in the case of a GC involving exons 28–32,33 an extended haplotype of exonic and intronic variants was generated, which suggested that the GC event likely occurred between PKD1 and the PKD1P6 duplicon. The application of this strategy to suspected GC events proved that they are genuine GC events, determined their genomic origin, and precisely defined the extent of the GC event, thereby emphasizing their disease association.
The inclusion of most intronic regions for both genes allowed us to explore the pattern of intronic variation and generate a catalog of 460 high-confidence intronic variants. Notably, this dataset provides an important filter for common intronic variants, which will be useful to pinpoint rare, potential splice-changing deep intronic variants.55–57 Because of the genomic duplication, these data are not available for the entire duplicated portion of PKD1 from the 1000 Genomes Project and other sequencing projects.
Lymphoblast-derived cDNA analysis of two private deep intronic variants (PKD1/c.216-1198T>G and PKD2/c.1094+507G>A) predicted to cause intron exonization in otherwise mutation-negative samples revealed no apparent splicing abnormality, although we cannot definitely exclude such an event in kidney-derived epithelial cells due to tissue-specific splicing regulation.58,59
Allele dropout and unrecognized intronic variants affecting splicing may account for part of the approximately 10% mutation-negative samples in clinically well characterized cohorts,3,60 and are well suited to be detected and characterized by NGS rather than conventional Sanger sequencing.
In conclusion, we designed a strategy to analyze by NGS the PKD1 gene, a locus at which conventional genomic partitioning approaches are not possible; this strategy may serve as an example for NGS analysis of other disease-associated segmental duplications in the human genome.6,61–63 The very high throughput of NGS makes it ideal for large-scale projects and will make genotyping of large ADPKD cohorts feasible (e.g., in association with large ADPKD clinical trials or ADPKD population studies). NGS deep sequencing of the entire genomic structure of both the PKD1 and PKD2 genes holds the potential to discover atypical mutations like the ones we describe here, and may help to clarify the genetic basis of the approximately 10% of ADPKD pedigrees in which no mutation is detectable through conventional Sanger sequencing.
Concise Methods
Amplification and Pooling of LR-PCR Amplicons
LR-PCR amplicons were amplified as previously described3,64 using the primers described in Supplemental Table 1. Amplicons were checked and normalized to a control of known concentration by gel densitometry, using a Kodak Gel Logic 200 Instrument. The amplicon standards were previously amplified products with no smearing or extra bands, whose concentration had been measured using a fluorescence-based method (Quant-It, Invitrogen Inc). For the purpose of equimolar amplicon assembly, variations in concentration of up to threefold were tolerated. Thirty picomoles of each amplicon were assembled for each sample, and equimolar amounts of assembled samples (2–8) were pooled to generate 2–3 µg of total material for library preparation (proof of principle and libraries 1–66 of the second experiment).
For libraries 67–74, genomic DNA of each individual sample was diluted to 10 ng/µl, gently agitated overnight, pooled in equal amounts, mixed again overnight, and then PCR amplified.
Preparation of Illumina Bar-Coded, Paired-End Libraries, Clusters Formation, and Reads Generation on the Illumina GA2X Sequencer
Input pooled DNA was fragmented by sonication using the Covaris E210 ultrasonicator (Covaris Inc), and Illumina bar-coded, paired-end libraries were prepared using the NEBNext kit (New England Biolabs). Bar-coded libraries were subjected to cluster formation using the Illumina cluster station (Illumina Inc), and clusters were loaded at an average density of 250,000 clusters per lane.
Low-level bar coding (<8 indexes per lane) was performed following the bar-code combinations suggested by Illumina. Clusters were sequenced with 51- or 101-bp-long paired-end reads.
Data Mining, Sequence Variant Call, and Confirmation of Mutations
Reads were exported as FASTQ files and deconvoluted by bar code, and each bar code was imported separately into the NextGENe software version 1.99 (SoftGenetics Inc) for data mining. Mining was performed using the NextGENe elongation algorithm. This polishes the reads (reducing the machine error rate well below 1%); elongates them by approximately 30% of the original length by read-to-read comparison, facilitating variant and indel calls; and preserves the paired-end information, reducing the false-positive rate. NextGENe software assigns a confidence score to each variant call on a scale from 0 to 30, which was used for further variant filtering.50
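Deconvolution by bar code amounts to grouping reads by their index sequence before each group is mined separately. The sketch below is a generic illustration and not the procedure used in this study: it assumes four-line FASTQ records whose header ends in `#INDEX/1` or `#INDEX/2` (an older Illumina header convention), and the read names and index sequences are made up.

```python
from collections import defaultdict
from itertools import islice

def demultiplex(fastq_lines):
    """Group four-line FASTQ records by the index found in the header.

    Assumes headers of the form '@name#INDEX/1'; records without that
    pattern would need separate handling.
    """
    by_barcode = defaultdict(list)
    lines = iter(fastq_lines)
    while True:
        record = list(islice(lines, 4))
        if len(record) < 4:
            break
        header = record[0]
        barcode = header.rsplit("#", 1)[1].split("/", 1)[0]
        by_barcode[barcode].append(record)
    return by_barcode

# Hypothetical reads carrying two different bar codes.
example = [
    "@read1#ATCACG/1", "ACGTACGT", "+", "IIIIIIII",
    "@read2#CGATGT/1", "TTGACCAA", "+", "IIIIIIII",
    "@read3#ATCACG/1", "GGCATTAC", "+", "IIIIIIII",
]
groups = demultiplex(example)
read_counts = {barcode: len(records) for barcode, records in groups.items()}
print(read_counts)
```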
Each bar code was mined at 4% mutant allele percentage–100× read depth (high stringency), 4% mutant allele percentage–100× read depth on trimmed, 76-bp-long reads (medium stringency), and 3% mutant allele percentage–100× read depth (low stringency). These three mining protocols were typically linked within NextGENe for each bar code, and several bar codes were linked in series, for automatic execution. Reports were exported as spreadsheets and merged to generate a consensus report. Variants were called if present in at least two of the three reports (high-confidence variants). Low-confidence variants were defined as variants present in only one report, which was always the low-stringency one as defined above.
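The consensus rule can be expressed compactly: a variant is high confidence when it appears in at least two of the three stringency reports and low confidence otherwise. The following sketch illustrates that rule only; it does not reproduce the NextGENe export format, and the variant labels are placeholders.

```python
from collections import Counter

def consensus_calls(reports, min_reports=2):
    """Split variants into high- and low-confidence sets.

    `reports` maps a mining protocol name to the set of variants it called.
    """
    counts = Counter()
    for variants in reports.values():
        counts.update(set(variants))
    high = {v for v, n in counts.items() if n >= min_reports}
    low = {v for v, n in counts.items() if n < min_reports}
    return high, low

# Placeholder variant labels for one bar code, one set per mining protocol.
reports = {
    "high stringency": {"variant_A", "variant_B"},
    "medium stringency": {"variant_A", "variant_C"},
    "low stringency": {"variant_A", "variant_B", "variant_C", "variant_D"},
}

high_confidence, low_confidence = consensus_calls(reports)
print("high confidence:", sorted(high_confidence))
print("low confidence:", sorted(low_confidence))
```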
All novel and likely pathogenic exonic nonsynonymous/typical splicing variants and a select group of other variants were Sanger verified by re-amplifying the four original samples contained in each library.
Exonic synonymous and intronic variants in Supplemental Table 2 that were not tested by Sanger sequencing were validated by filtering at 1000× minimum read depth and progressively higher stringency (using NextGENe scores of 27, 28, 29, and 30). To ensure high confidence in these datasets, variants were included only if present in at least two different libraries with a minimum NextGENe score of 27, unless they were previously described (PKDB or dbSNP) or Sanger verified (high-confidence variants). Similarly, high read depth and high NextGENe score (approximately 8000× and 27 minimum score) were used to validate all the intronic variants involved in the PKD1-P6 gene conversion.
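One way to read this inclusion rule is sketched below. It is an interpretation rather than the actual pipeline: calls are first filtered on read depth and score, and a variant is then kept if it passes in at least two different libraries or is already known (previously described or Sanger verified). The `Call` record, the function name, and the example values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Call:
    variant: str   # placeholder variant label
    library: int   # library in which the call was made
    depth: int     # read depth at the variant position
    score: int     # NextGENe confidence score (0 to 30)

def high_confidence_variants(calls, known=frozenset(),
                             min_depth=1000, min_score=27, min_libraries=2):
    """Keep variants passing depth and score in enough libraries, or known ones."""
    libraries = {}
    for call in calls:
        if call.depth >= min_depth and call.score >= min_score:
            libraries.setdefault(call.variant, set()).add(call.library)
    return {
        variant for variant, libs in libraries.items()
        if len(libs) >= min_libraries or variant in known
    }

calls = [
    Call("variant_A", library=1, depth=2500, score=29),
    Call("variant_A", library=7, depth=1800, score=27),
    Call("variant_B", library=3, depth=4000, score=30),  # single library, not known
    Call("variant_C", library=4, depth=1200, score=28),  # single library, but known
    Call("variant_D", library=5, depth=600, score=30),   # fails the depth filter
    Call("variant_D", library=6, depth=900, score=30),
]
kept = high_confidence_variants(calls, known={"variant_C"})
print(sorted(kept))
```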
Scoring of VUCS
VUCS were classified based on the scores of the ADPKD Mutation Database (http://www.pkdb.mayo.edu) for previously described variants, and as previously described for novel variants.17 Hypomorphic alleles were classified as previously described.65
Analyses of Splicing Variants
Private intronic variants were analyzed for splicing potential using the following tools: BDGP (http://www.fruitfly.org/seq_tools/splice.html),66 Spliceport (http://spliceport.cs.umd.edu/),67 GeneSplicer (http://www.cbcb.umd.edu/software/GeneSplicer/gene_spl.shtml),68 NetGene2 (http://www.cbs.dtu.dk/services/NetGene2/),69 and HSF (http://www.umd.be/HSF/).70 cDNA analysis was performed as previously described.17
Disclosures
None.
Acknowledgments
We thank all of the patients and families who agreed to participate in this study; Mr. Bernard and Mrs. Edith Waterman for their generous support; Dr. Franklyn G. Prendergast (Mayo Center for Individualized Medicine), Dr. Eric D. Wieben (Mayo Advanced Genomics Technology Center), and Dr. Fernando Fervenza (Mayo Clinic) for their support; and Mrs. Linda Pelleymounter (Mayo Clinic) for her collaboration. We thank Dr. Jared Grantham and Dr. Arlene Chapman of the Consortium for Radiologic Imaging Studies of Polycystic Kidney Disease for kindly supplying samples 100006 and 244111, in which the two deep intronic variants described in this manuscript were found.
This work was supported by Grant R21DK083669, the Robert W. Fulk Career Development Award, a generous gift of the Bernard and Edith Waterman Charitable Foundation through the Mayo Center for Individualized Medicine, Grant R01DK58816, and by the Mayo Translational PKD Center (P30DK090728).
This work was partially presented as an abstract at the Annual Meeting of the American Society of Nephrology, November 16–21, 2010, Denver, Colorado, as well as at the Polycystic Kidney Disease: From Bench to Bedside meeting of the Federation of American Societies for Experimental Biology, June 26 to July 1, 2011, Saxtons River, Vermont.
Footnotes
Published online ahead of print. Publication date available at www.jasn.org.
This article contains supplemental material online at http://jasn.asnjournals.org/lookup/suppl/doi:10.1681/ASN.2011101032/-/DCSupplemental.
References
- 1.Dalgaard OZ: Bilateral polycystic disease of the kidneys; a follow-up of two hundred and eighty-four patients and their families. Acta Med Scand Suppl 328: 1–255, 1957 [PubMed] [Google Scholar]
- 2.Iglesias CG, Torres VE, Offord KP, Holley KE, Beard CM, Kurland LT: Epidemiology of adult polycystic kidney disease, Olmsted County, Minnesota: 1935-1980. Am J Kidney Dis 2: 630–639, 1983 [DOI] [PubMed] [Google Scholar]
- 3.Rossetti S, Consugar MB, Chapman AB, Torres VE, Guay-Woodford LM, Grantham JJ, Bennett WM, Meyers CM, Walker DL, Bae K, Zhang QJ, Thompson PA, Miller JP, Harris PC, CRISP Consortium : Comprehensive molecular diagnostics in autosomal dominant polycystic kidney disease. J Am Soc Nephrol 18: 2143–2160, 2007 [DOI] [PubMed] [Google Scholar]
- 4.Ravine D, Walker RG, Gibson RN, Forrest SM, Richards RI, Friend K, Sheffield LJ, Kincaid-Smith P, Danks DM: Phenotype and genotype heterogeneity in autosomal dominant polycystic kidney disease. Lancet 340: 1330–1333, 1992 [DOI] [PubMed] [Google Scholar]
- 5.Harris PC, Torres VE: Polycystic kidney disease. Annu Rev Med 60: 321–337, 2009 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Bischof JM, Chiang AP, Scheetz TE, Stone EM, Casavant TL, Sheffield VC, Braun TA: Genome-wide identification of pseudogenes capable of disease-causing gene conversion. Hum Mutat 27: 545–552, 2006 [DOI] [PubMed] [Google Scholar]
- 7.Watnick TJ, Piontek KB, Cordal TM, Weber H, Gandolph MA, Qian F, Lens XM, Neumann HP, Germino GG: An unusual pattern of mutation in the duplicated portion of PKD1 is revealed by use of a novel strategy for mutation detection. Hum Mol Genet 6: 1473–1481, 1997 [DOI] [PubMed] [Google Scholar]
- 8.Watnick TJ, Gandolph MA, Weber H, Neumann HPH, Germino GG: Gene conversion is a likely cause of mutation in PKD1. Hum Mol Genet 7: 1239–1243, 1998 [DOI] [PubMed] [Google Scholar]
- 9.Ravine D, Gibson RN, Walker RG, Sheffield LJ, Kincaid-Smith P, Danks DM: Evaluation of ultrasonographic diagnostic criteria for autosomal dominant polycystic kidney disease 1. Lancet 343: 824–827, 1994 [DOI] [PubMed] [Google Scholar]
- 10.Nascimento AB, Mitchell DG, Zhang XM, Kamishima T, Parker L, Holland GA: Rapid MR imaging detection of renal cysts: Age-based standards. Radiology 221: 628–632, 2001 [DOI] [PubMed] [Google Scholar]
- 11.Pei Y, Obaji J, Dupuis A, Paterson AD, Magistroni R, Dicks E, Parfrey P, Cramer B, Coto E, Torra R, San Millan JL, Gibson R, Breuning M, Peters D, Ravine D: Unified criteria for ultrasonographic diagnosis of ADPKD. J Am Soc Nephrol 20: 205–212, 2009 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Harris PC, Rossetti S: Molecular diagnostics for autosomal dominant polycystic kidney disease. Nat Rev Nephrol 6: 197–206, 2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Hogan MC, Masyuk TV, Page LJ, Kubly VJ, Bergstralh EJ, Li X, Kim B, King BF, Glockner J, Holmes DR, 3rd, Rossetti S, Harris PC, LaRusso NF, Torres VE: Randomized clinical trial of long-acting somatostatin for autosomal dominant polycystic kidney and liver disease. J Am Soc Nephrol 21: 1052–1061, 2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Loftus BJ, Kim UJ, Sneddon VP, Kalush F, Brandon R, Fuhrmann J, Mason T, Crosby ML, Barnstead M, Cronin L, Deslattes Mays A, Cao Y, Xu RX, Kang HL, Mitchell S, Eichler EE, Harris PC, Venter JC, Adams MD: Genome duplications and other features in 12 Mb of DNA sequence from human chromosome 16p and 16q. Genomics 60: 295–308, 1999 [DOI] [PubMed] [Google Scholar]
- 15.Symmons O, Váradi A, Arányi T: How segmental duplications shape our genome: Recent evolution of ABCC6 and PKD1 Mendelian disease genes. Mol Biol Evol 25: 2601–2613, 2008 [DOI] [PubMed] [Google Scholar]
- 16.Bogdanova N, Markoff A, Gerke V, McCluskey M, Horst J, Dworniczak B: Homologues to the first gene for autosomal dominant polycystic kidney disease are pseudogenes. Genomics 74: 333–341, 2001 [DOI] [PubMed] [Google Scholar]
- 17.Rossetti S, Strmecki L, Gamble V, Burton S, Sneddon V, Peral B, Roy S, Bakkaloglu A, Komel R, Winearls CG, Harris PC: Mutation analysis of the entire PKD1 gene: Genetic and diagnostic implications. Am J Hum Genet 68: 46–63, 2001 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Tucker T, Marra M, Friedman JM: Massively parallel sequencing: The next big thing in genetic medicine. Am J Hum Genet 85: 142–154, 2009 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Mardis ER: Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet 9: 387–402, 2008 [DOI] [PubMed] [Google Scholar]
- 20.Metzker ML: Sequencing technologies - the next generation. Nat Rev Genet 11: 31–46, 2010 [DOI] [PubMed] [Google Scholar]
- 21.Craig DW, Pearson JV, Szelinger S, Sekar A, Redman M, Corneveaux JJ, Pawlowski TL, Laub T, Nunn G, Stephan DA, Homer N, Huentelman MJ: Identification of genetic variants using bar-coded multiplexed sequencing. Nat Methods 5: 887–893, 2008 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Margraf RL, Durtschi JD, Dames S, Pattison DC, Stephens JE, Mao R, Voelkerding KV: Multi-sample pooling and illumina genome analyzer sequencing methods to determine gene sequence variation for database development. J Biomol Tech 21: 126–140, 2010 [PMC free article] [PubMed] [Google Scholar]
- 23.Out AA, van Minderhout IJ, Goeman JJ, Ariyurek Y, Ossowski S, Schneeberger K, Weigel D, van Galen M, Taschner PE, Tops CM, Breuning MH, van Ommen GJ, den Dunnen JT, Devilee P, Hes FJ: Deep sequencing to reveal new variants in pooled DNA samples. Hum Mutat 30: 1703–1712, 2009 [DOI] [PubMed] [Google Scholar]
- 24.Melum E, May S, Schilhabel MB, Thomsen I, Karlsen TH, Rosenstiel P, Schreiber S, Franke A: SNP discovery performance of two second-generation sequencing platforms in the NOD2 gene region. Hum Mutat 31: 875–885, 2010 [DOI] [PubMed] [Google Scholar]
- 25.Smith AM, Heisler LE, St Onge RP, Farias-Hesson E, Wallace IM, Bodeau J, Harris AN, Perry KM, Giaever G, Pourmand N, Nislow C: Highly-multiplexed barcode sequencing: An efficient method for parallel analysis of pooled samples. Nucleic Acids Res 38: e142, 2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Nijman IJ, Mokry M, van Boxtel R, Toonen P, de Bruijn E, Cuppen E: Mutation discovery by targeted genomic enrichment of multiplexed barcoded samples. Nat Methods 7: 913–915, 2010 [DOI] [PubMed] [Google Scholar]
- 27.Kenny EM, Cormican P, Gilks WP, Gates AS, O’Dushlaine CT, Pinto C, Corvin AP, Gill M, Morris DW: Multiplex target enrichment using DNA indexing for ultra-high throughput SNP detection. DNA Res 18: 31–38, 2011 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Calvo SE, Tucker EJ, Compton AG, Kirby DM, Crawford G, Burtt NP, Rivas M, Guiducci C, Bruno DL, Goldberger OA, Redman MC, Wiltshire E, Wilson CJ, Altshuler D, Gabriel SB, Daly MJ, Thorburn DR, Mootha VK: High-throughput, pooled sequencing identifies mutations in NUBPL and FOXRED1 in human complex I deficiency. Nat Genet 42: 851–858, 2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Yeager M, Xiao N, Hayes RB, Bouffard P, Desany B, Burdett L, Orr N, Matthews C, Qi L, Crenshaw A, Markovic Z, Fredrikson KM, Jacobs KB, Amundadottir L, Jarvie TP, Hunter DJ, Hoover R, Thomas G, Harkins TT, Chanock SJ: Comprehensive resequence analysis of a 136 kb region of human chromosome 8q24 associated with prostate and colon cancers. Hum Genet 124: 161–170, 2008 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Goossens D, Moens LN, Nelis E, Lenaerts AS, Glassee W, Kalbe A, Frey B, Kopal G, De Jonghe P, De Rijk P, Del-Favero J: Simultaneous mutation and copy number variation (CNV) detection by multiplex PCR-based GS-FLX sequencing. Hum Mutat 30: 472–476, 2009 [DOI] [PubMed] [Google Scholar]
- 31.Mamanova L, Coffey AJ, Scott CE, Kozarewa I, Turner EH, Kumar A, Howard E, Shendure J, Turner DJ: Target-enrichment strategies for next-generation sequencing. Nat Methods 7: 111–118, 2010 [DOI] [PubMed] [Google Scholar]
- 32.Turner EH, Ng SB, Nickerson DA, Shendure J: Methods for genomic partitioning. Annu Rev Genomics Hum Genet 10: 263–284, 2009 [DOI] [PubMed] [Google Scholar]
- 33.Rossetti S, Chauveau D, Kubly V, Slezak JM, Saggar-Malik AK, Pei Y, Ong AC, Stewart F, Watson ML, Bergstralh EJ, Winearls CG, Torres VE, Harris PC: Association of mutation position in polycystic kidney disease 1 (PKD1) gene and development of a vascular phenotype. Lancet 361: 2196–2201, 2003 [DOI] [PubMed] [Google Scholar]
- 34.Otto EA, Hurd TW, Airik R, Chaki M, Zhou W, Stoetzel C, Patil SB, Levy S, Ghosh AK, Murga-Zamalloa CA, van Reeuwijk J, Letteboer SJ, Sang L, Giles RH, Liu Q, Coene KL, Estrada-Cuzcano A, Collin RW, McLaughlin HM, Held S, Kasanuki JM, Ramaswami G, Conte J, Lopez I, Washburn J, Macdonald J, Hu J, Yamashita Y, Maher ER, Guay-Woodford LM, Neumann HP, Obermüller N, Koenekoop RK, Bergmann C, Bei X, Lewis RA, Katsanis N, Lopes V, Williams DS, Lyons RH, Dang CV, Brito DA, Dias MB, Zhang X, Cavalcoli JD, Nürnberg G, Nürnberg P, Pierce EA, Jackson PK, Antignac C, Saunier S, Roepman R, Dollfus H, Khanna H, Hildebrandt F: Candidate exome capture identifies mutation of SDCCAG8 as the cause of a retinal-renal ciliopathy. Nat Genet 42: 840–850, 2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Biesecker LG: Exome sequencing makes medical genomics a reality. Nat Genet 42: 13–14, 2010 [DOI] [PubMed] [Google Scholar]
- 36.Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, Dent KM, Huff CD, Shannon PT, Jabs EW, Nickerson DA, Shendure J, Bamshad MJ: Exome sequencing identifies the cause of a mendelian disorder. Nat Genet 42: 30–35, 2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Choi M, Scholl UI, Ji W, Liu T, Tikhonova IR, Zumbo P, Nayir A, Bakkaloğlu A, Ozen S, Sanjad S, Nelson-Williams C, Farhi A, Mane S, Lifton RP: Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc Natl Acad Sci USA 106: 19096–19101, 2009 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Haack TB, Danhauser K, Haberberger B, Hoser J, Strecker V, Boehm D, Uziel G, Lamantea E, Invernizzi F, Poulton J, Rolinski B, Iuso A, Biskup S, Schmidt T, Mewes HW, Wittig I, Meitinger T, Zeviani M, Prokisch H: Exome sequencing identifies ACAD9 mutations as a cause of complex I deficiency. Nat Genet 42: 1131–1134, 2010 [DOI] [PubMed] [Google Scholar]
- 39.Ng SB, Bigham AW, Buckingham KJ, Hannibal MC, McMillin MJ, Gildersleeve HI, Beck AE, Tabor HK, Cooper GM, Mefford HC, Lee C, Turner EH, Smith JD, Rieder MJ, Yoshiura K, Matsumoto N, Ohta T, Niikawa N, Nickerson DA, Bamshad MJ, Shendure J: Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nat Genet 42: 790–793, 2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Teer JK, Mullikin JC: Exome sequencing: The sweet spot before whole genomes. Hum Mol Genet 19[R2]: R145–R151, 2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Lupski JR, Reid JG, Gonzaga-Jauregui C, Rio Deiros D, Chen DC, Nazareth L, Bainbridge M, Dinh H, Jing C, Wheeler DA, McGuire AL, Zhang F, Stankiewicz P, Halperin JJ, Yang C, Gehman C, Guo D, Irikat RK, Tom W, Fantin NJ, Muzny DM, Gibbs RA: Whole-genome sequencing in a patient with Charcot-Marie-Tooth neuropathy. N Engl J Med 362: 1181–1191, 2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Roach JC, Glusman G, Smit AF, Huff CD, Hubley R, Shannon PT, Rowen L, Pant KP, Goodman N, Bamshad M, Shendure J, Drmanac R, Jorde LB, Hood L, Galas DJ: Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328: 636–639, 2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Garber K: Fixing the front end. Nat Biotechnol 26: 1101–1104, 2008 [DOI] [PubMed] [Google Scholar]
- 44.Kozarewa I, Turner DJ: 96-plex molecular barcoding for the Illumina Genome Analyzer. Methods Mol Biol 733: 279–298, 2011 [DOI] [PubMed] [Google Scholar]
- 45.Otto EA, Ramaswami G, Janssen S, Chaki M, Allen SJ, Zhou W, Airik R, Hurd TW, Ghosh AK, Wolf MT, Hoppe B, Neuhaus TJ, Bockenhauer D, Milford DV, Soliman NA, Antignac C, Saunier S, Johnson CA, Hildebrandt F, GPN Study Group : Mutation analysis of 18 nephronophthisis associated ciliopathy disease genes using a DNA pooling and next generation sequencing strategy. J Med Genet 48: 105–116, 2011 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Korbel JO, Urban AE, Affourtit JP, Godwin B, Grubert F, Simons JF, Kim PM, Palejev D, Carriero NJ, Du L, Taillon BE, Chen Z, Tanzer A, Saunders AC, Chi J, Yang F, Carter NP, Hurles ME, Weissman SM, Harkins TT, Gerstein MB, Egholm M, Snyder M: Paired-end mapping reveals extensive structural variation in the human genome. Science 318: 420–426, 2007 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Medvedev P, Stanciu M, Brudno M: Computational methods for discovering structural variation with next-generation sequencing. Nat Methods 6[Suppl]: S13–S20, 2009 [DOI] [PubMed] [Google Scholar]
- 48.Yoon S, Xuan Z, Makarov V, Ye K, Sebat J: Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Res 19: 1586–1592, 2009 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Hoefele J, Mayer K, Scholz M, Klein HG: Novel PKD1 and PKD2 mutations in autosomal dominant polycystic kidney disease (ADPKD). Nephrol Dial Transplant 26: 2181–2188, 2011 [DOI] [PubMed] [Google Scholar]
- 50.Pelleymounter LL, Moon I, Johnson JA, Laederach A, Halvorsen M, Eckloff B, Abo R, Rossetti S: A novel application of pattern recognition for accurate SNP and indel discovery from high-throughput data: Targeted resequencing of the glucocorticoid receptor co-chaperone FKBP5 in a Caucasian population. Mol Genet Metab 104: 457–469, 2011 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Bansal V: A statistical method for the detection of variants from next-generation resequencing of DNA pools. Bioinformatics 26: i318–i324, 2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Bansal V, Harismendy O, Tewhey R, Murray SS, Schork NJ, Topol EJ, Frazer KA: Accurate detection and genotyping of SNPs utilizing population sequencing data. Genome Res 20: 537–545, 2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Laios E, Glynou K: Allelic drop-out in the LDLR gene affects mutation detection in familial hypercholesterolemia. Clin Biochem 41: 38–40, 2008 [DOI] [PubMed] [Google Scholar]
- 54.Ward KJ, Ellard S, Yajnik CS, Frayling TM, Hattersley AT, Venigalla PN, Chandak GR: Allelic drop-out may occur with a primer binding site polymorphism for the commonly used RFLP assay for the -1131T>C polymorphism of the Apolipoprotein AV gene. Lipids Health Dis 5: 11, 2006 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55. King K, Flinter FA, Nihalani V, Green PM: Unusual deep intronic mutations in the COL4A5 gene cause X linked Alport syndrome. Hum Genet 111: 548–554, 2002
- 56. Clendenning M, Buchanan DD, Walsh MD, Nagler B, Rosty C, Thompson B, Spurdle AB, Hopper JL, Jenkins MA, Young JP: Mutation deep within an intron of MSH2 causes Lynch syndrome. Fam Cancer 10: 297–301, 2011
- 57. Lo YF, Nozu K, Iijima K, Morishita T, Huang CC, Yang SS, Sytwu HK, Fang YW, Tseng MH, Lin SH: Recurrent deep intronic mutations in the SLC12A3 gene responsible for Gitelman’s syndrome. Clin J Am Soc Nephrol 6: 630–639, 2011
- 58. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB: Alternative isoform regulation in human tissue transcriptomes. Nature 456: 470–476, 2008
- 59. Venables JP: Downstream intronic splicing enhancers. FEBS Lett 581: 4127–4131, 2007
- 60. Rossetti S, Harris PC: Genotype-phenotype correlations in autosomal dominant and autosomal recessive polycystic kidney disease. J Am Soc Nephrol 18: 1374–1380, 2007
- 61. Pulkkinen L, Nakano A, Ringpfeil F, Uitto J: Identification of ABCC6 pseudogenes on human chromosome 16p: Implications for mutation detection in pseudoxanthoma elasticum. Hum Genet 109: 356–365, 2001
- 62. Lee HH, Niu DM, Lin RW, Chan P, Lin CY: Structural analysis of the chimeric CYP21P/CYP21 gene in steroid 21-hydroxylase deficiency. J Hum Genet 47: 517–522, 2002
- 63. Khurana E, Lam HY, Cheng C, Carriero N, Cayting P, Gerstein MB: Segmental duplications in the human genome reveal details of pseudogene formation. Nucleic Acids Res 38: 6997–7007, 2010
- 64. Rossetti S, Chauveau D, Walker D, Saggar-Malik A, Winearls CG, Torres VE, Harris PC: A complete mutation screen of the ADPKD genes by DHPLC. Kidney Int 61: 1588–1599, 2002
- 65. Rossetti S, Kubly VJ, Consugar MB, Hopp K, Roy S, Horsley SW, Chauveau D, Rees L, Barratt TM, van’t Hoff WG, Niaudet P, Torres VE, Harris PC: Incompletely penetrant PKD1 alleles suggest a role for gene dosage in cyst initiation in polycystic kidney disease. Kidney Int 75: 848–855, 2009
- 66. Reese MG, Eeckman FH, Kulp D, Haussler D: Improved splice site detection in Genie. J Comput Biol 4: 311–323, 1997
- 67. Dogan RI, Getoor L, Wilbur WJ, Mount SM: SplicePort–an interactive splice-site analysis tool. Nucleic Acids Res 35: W285–W291, 2007
- 68. Pertea M, Lin X, Salzberg SL: GeneSplicer: A new computational method for splice site prediction. Nucleic Acids Res 29: 1185–1190, 2001
- 69. Brunak S, Engelbrecht J, Knudsen S: Prediction of human mRNA donor and acceptor sites from the DNA sequence. J Mol Biol 220: 49–65, 1991
- 70. Desmet FO, Hamroun D, Lalande M, Collod-Béroud G, Claustres M, Béroud C: Human Splicing Finder: An online bioinformatics tool to predict splicing signals. Nucleic Acids Res 37: e67, 2009