Abstract
Prion diseases are fatal neurodegenerative conditions that affect humans and animals. Rapid and accurate sequencing of the prion gene PRNP is paramount for human prion disease diagnosis and for animal surveillance programmes. Current methods for PRNP genotyping involve sequencing of small fragments within the protein-coding region. The contribution of variants in the non-coding regions of PRNP, including large structural changes, is poorly understood. Here, we used long-range PCR and Nanopore sequencing to sequence the full length of PRNP, including its regulatory region, in 25 samples from blood and brain of individuals with inherited or sporadic prion diseases. Nanopore sequencing detected the same variants as identified by Sanger sequencing, including repeat expansions/deletions. Nanopore sequencing identified additional single-nucleotide variants in the non-coding regions of PRNP, but no novel structural variants were discovered. Finally, we explored somatic mosaicism of PRNP’s octapeptide repeat region, which is a hypothetical cause of sporadic prion disease. While we found changes consistent with somatic mutations, we demonstrate that they may have been generated by the PCR. Our study illustrates the accuracy of Nanopore sequencing for rapid, field-based prion disease diagnosis and highlights the need for single-molecule sequencing methods for the detection of somatic mutations.
Subject terms: Medical genetics, Sequencing, Diseases of the nervous system, Genetics of the nervous system
Introduction
Prion diseases are a group of fatal mammalian neurodegenerative disorders, the most common of which in humans is sporadic Creutzfeldt–Jakob disease (CJD). Prion diseases can be acquired or transmitted between humans and animals, necessitating active surveillance programmes1. The infectious agent of prion disease is composed of assemblies of misfolded forms of cellular prion protein (PrP). Mutations in the gene that encodes PrP (PRNP) are associated with inherited prion diseases, which comprise a heterogeneous range of clinical phenotypes that mimic common dementias2.
Human PRNP is composed of two exons flanking a single large intron for a total of 15 kb located on chromosome 20 (Fig. 1a)3,4. The 762-bp open reading frame is contained entirely within the second exon. Sanger sequencing of the protein-coding region of PRNP is routinely performed as part of the clinical investigation of patients suspected to have any form of prion disease2. Accordingly, the coding region has been extensively studied in prion disease cases with over 70 variants reported in patients, most of which are non-synonymous single-nucleotide variants (SNV)2,5. An important SNV in PRNP is at codon 129 (rs1799990), which encodes either methionine (M) or valine (V). Homozygosity at codon 129 is associated with an elevated risk of developing sporadic CJD and shorter survival relative to heterozygotes6, and the variant is a strong determinant of clinical phenotype7. In sporadic CJD cases homozygous at codon 129, concentration and stability of the distinct prion protein isoforms represent other determinants of disease duration8. Other non-synonymous variants protect against acquired or sporadic prion diseases such as G127V and E219K9–11.
The N-terminal region of PrP contains a repetitive sequence of amino acids termed the octapeptide repeat region (OPR). The normal OPR consists of a nonapeptide followed by four octapeptide repeats (Fig. 1b)12. Repeat length is variable in the human population, ranging from deletions (OPRD, for octapeptide repeat deletion) of two repeats to insertions (OPRI, for octapeptide repeat insertion) of an extra 12 repeats. Insertion of four or more supplementary repeats is recognised as a definite cause of inherited prion disease2. Instability of the OPR, which might manifest as somatic insertion and/or deletion of repeats in brain tissue, is a hypothetical mechanism of sporadic CJD13–15.
The expression level of Prnp is a strong determinant of incubation time in mouse models of prion disease16,17. In cattle, a 23-bp deletion in the promoter and a 12-bp deletion in the intron of PRNP increase an animal’s risk of developing bovine spongiform encephalopathy upon consumption of infected feedstuffs, potentially by modifying expression levels18–20. In humans, whether variants in regulatory regions of PRNP can increase risk of sporadic CJD by altering expression remains unclear. First, there has been no assessment of the non-coding regions of PRNP in CJD cases for structural variants (SVs) that might alter regulatory sites, hence the need for the present study. Second, in contrast to previous reports21,22 our genome-wide association study (GWAS) of sporadic CJD did not find evidence of an independent signal in the intron or regulatory regions of PRNP23. Regulatory variants may have been missed in our study. For example, an eQTL upstream of PRNP (rs17327121) is associated with lower expression of PRNP in the cerebellum24 but was not genotyped or imputed in the GWAS. Third, differences in risk of prion disease by sex, age, and codon 129 genotype do not appear to be explained by different levels of PRNP expression25. Fourth, duplication of the PRNP locus would be predicted to increase risk of CJD by causing overexpression, but this has only been reported once in a control individual26. Overall, more routine sequencing of the non-coding regions of PRNP in CJD patients may provide insights into regulatory variants and risk of disease.
Here we set up a long-read sequencing protocol for full-length human PRNP, including its regulatory region. We employed this protocol to sequence PRNP in 21 individuals, 19 of whom had inherited or sporadic CJD. We used Oxford Nanopore MinION, a portable sequencer which offers a more complete view of genomic variation than short-read technologies as it generates long reads spanning repetitive regions and potential SVs27,28. Our aim was to establish a protocol that could potentially support (1) the rapid field sequencing of PRNP, (2) longer read lengths that are more likely to span an SV and can be more confidently aligned or assembled, and (3) high sequencing coverage to detect hypothetical somatic mutations in the OPR.
Results
Genotyping PRNP in prion disease cases using Nanopore sequencing
PRNP genotyping in prion disease cases is routinely performed using both Sanger sequencing and gel electrophoresis of the protein-coding region29. First, we tested if Nanopore sequencing performs as well as Sanger for the amplicon that is routinely used for clinical genotyping. Genomic DNA was extracted from CJD patient blood and the PRNP protein-coding region was amplified and sequenced using both Nanopore and Sanger. We computed a consensus sequence of the Nanopore reads and aligned it to the sequence obtained by Sanger sequencing. Both sequences were 100% identical. The sample was genotyped as codon 129 heterozygous (M129V) by both Sanger and Nanopore sequencing (Supplementary Fig. S1). This pilot experiment encouraged us to pursue Nanopore sequencing as an alternative to Sanger sequencing for PRNP genotyping.
Next, we developed a protocol for amplification and sequencing of the entire PRNP genomic region. We mapped the regulatory region based on genome annotations of epigenetic marks H3K4me1, H3K4me3, H3K27ac, and transcription factor binding sites. These pointed towards an important regulatory region starting around 1.1 kb upstream of exon 1 and extending into the intron, consistent with a prior study30. Although Nanopore allows sequencing of fragments longer than 1 Mb31, we faced technical difficulties amplifying the 16.7 kb region as a single PCR product. This was likely due to a GC-rich (> 70%) region around exon 1. We therefore opted to sequence two overlapping amplicons (Fig. 1a). The smaller amplicon (2988 bp) started upstream of the regulatory region and ended 1 kb into the intron. The larger amplicon (14,025 bp) included the remainder of the intron and exon 2. The two amplicons overlapped by 316 bp.
Using this protocol, we sequenced the PRNP genomic region in 25 samples from 21 individuals. First, we sequenced one healthy control and seven patients with inherited prion disease—of which six carried an insertion and/or a deletion in the OPR (2 OPRD, 1 OPRD, 1 OPRI, 2 OPRI, 5 OPRI/1 OPRD, and 6 OPRI). Second, we sequenced both blood and brain samples from four inherited prion disease patients with OPR insertions. Third, we sequenced one healthy control and eight patients with sporadic CJD. Of the 21 individuals, nine carried an SNV in the protein-coding region detected by Sanger sequencing: one E200K, one P102L, and seven M129V (Table 1). Using Nanopore, we obtained a range of coverage of 3165–8160 × for the gene body (14,025-bp amplicon) and 133–8966 × for the regulatory region (2988-bp amplicon).
Table 1.
Ind. | Sample | Tissue | Prion disease | Sanger sequencing: SVs | Sanger sequencing: SNVs | Nanopore MinION: SVs | Nanopore MinION: SNVs (protein-coding) | Nanopore MinION: non-coding SNVs |
---|---|---|---|---|---|---|---|---|
1 | 52331 | Blood | Control | – | Not tested | – | M129V | 14 |
2 | 1906 | Blood | Inherited | 2 OPRD | – | 2 OPRD | – | 3 |
3 | 55826 | Blood | Inherited | 1 OPRD | – | 1 OPRD | – | 9 |
4 | 58398 | Blood | Inherited | – | E200K | – | E200K | 1 |
5 | 58778 | Blood | Inherited | 1 OPRI | P102L | 1 OPRI | P102L | 4 |
6 | 57749 | Blood | Inherited | 2 OPRI | M129V | 2 OPRI | M129V | 17 |
7 | 46345 | Blood | Inherited | 5 OPRI/1 OPRD | – | 5 OPRI/1 OPRD | – | 6 |
8 | 58648 | Blood | Inherited | 6 OPRI | – | 6 OPRI | – | 4 |
9 | 43706 | Brain | Inherited | 4 OPRI | – | 4 OPRI | – | 2 |
9 | 42656 | Blood | Inherited | 4 OPRI | – | 4 OPRI | – | 2 |
10 | 57265 | Brain | Inherited | 4 OPRI | M129V | 4 OPRI | M129V | 15 |
10 | 55492 | Blood | Inherited | 4 OPRI | M129V | 4 OPRI | M129V | 15 |
11 | 59060 | Brain | Inherited | 6 OPRI | – | 6 OPRI | – | – |
11 | 53747 | Blood | Inherited | 6 OPRI | – | 6 OPRI | – | – |
12 | 53689 | Brain | Inherited | 8 OPRI | M129V | 8 OPRI | M129V | 15 |
12 | 47875 | Blood | Inherited | 8 OPRI | M129V | 8 OPRI | M129V | 16 |
13 | 56635 | Blood | Control | – | M129V | – | M129V | 16 |
14 | 54493 | Blood | Sporadic | – | M129V | – | M129V | 16 |
15 | 54890 | Blood | Sporadic | – | – | – | – | – |
16 | 54917 | Blood | Sporadic | – | – | – | – | 5 |
17 | 54960 | Blood | Sporadic | – | M129V | – | M129V | 15 |
18 | 54968 | Blood | Sporadic | – | – | – | – | 1 |
19 | 55048 | Blood | Sporadic | – | – | – | – | 1 |
20 | 55050 | Blood | Sporadic | – | – | – | – | – |
21 | 55052 | Blood | Sporadic | – | M129V | – | M129V | 14 |
First, we called SNVs for all samples. After filtering calls based on allele frequency and strand bias, we detected a total of 46 different SNVs, most of which were present in more than one individual, for a total of 203 calls (Fig. 2a, Supplementary Fig. S2). All 46 SNVs were found in human genetics databases, and their respective frequency in the general population was in accordance with the number of individuals carrying each SNV in our panel (Fig. 2b). The SNV calling algorithm successfully called the E200K, P102L, and M129V variants previously identified by Sanger sequencing (Table 1). No additional SNVs were detected in the protein-coding sequence, indicating a low frequency of false positive calls. We observed that samples carrying the 129V haplotype tended to carry more SNVs in total, in line with known linkage disequilibrium between M129V and adjacent SNVs. We did not find an excess of SNVs in sporadic CJD cases (Supplementary Fig. S2).
Second, we called SVs for all samples. The variant calling algorithm correctly identified all 15 OPR mutations present in our panel including the precise size of the inserted/deleted sequence ± 2 bp (Fig. 2c, d, Table 1). We did not detect novel SVs in PRNP’s regulatory region or in the sporadic CJD samples. These results support the use of Nanopore sequencing for PRNP genotyping in patient samples.
Exploring somatic mutation of the octapeptide repeat region using Nanopore sequencing
Somatic mosaicism has been implicated in many diseases, from cancer to neurodegeneration32–35. In prion diseases, somatic mutation of PRNP’s OPR has been hypothesised as a possible cause of sporadic CJD14,15. We took advantage of the high sequencing coverage obtained with Nanopore sequencing to search for rare somatic insertions or deletions in PRNP’s OPR.
We first trimmed all reads we collected to keep only PRNP’s OPR, generating 208,554 OPR reads. Then, we labelled each read with the most likely OPR length based on the total insertion and/or deletion compared to the reference. We designed a consensus sequence for one OPR repeat and built a set of template sequences for a range of possible OPRs. Each read was then aligned to the OPR template sequence matching its OPR label, returning the number of mismatches. We defined a somatic mutation call as any read whose OPR label was not reference or the genotype of its sample and was at least 94% identical to its OPR template sequence (Supplementary Fig. S3). For example, any OPR read with a 24-bp insertion was labelled 1 OPRI, aligned to the 1 OPRI template, and was selected as a somatic mutation call if it aligned to its template sequence and did not come from a sample with a heterozygous 1 OPRI mutation.
Chimeric reads, which are single reads composed of sequences from two or more amplicons, can occur with Nanopore sequencing36. As we sequenced samples with different OPR genotypes on the same flow cell, we wanted to exclude any somatic mutation call which could be a chimeric read from another sample on the flow cell. To this end, we discarded the following reads: those not barcoded at both ends; those containing an adapter or barcode in their middle; those longer than their amplicon37. We identified a total of 129 somatic mutation calls, at a frequency per sample of 0–0.28% of reads (Fig. 3a, Supplementary Fig. S5). Reads from the forward and reverse strand were uniformly represented in the somatic mutation calls, excluding the possibility of strand bias. The number of somatic mutation calls from each sample did not correlate with its sequencing coverage, which likely excludes sequencing errors occurring at the pore (Supplementary Fig. S4)38. Based on the above checks, we concluded that the 129 somatic mutation calls were not sequencing artefacts.
Of the 129 somatic mutation calls, 103 (80%) were from inherited CJD samples (Supplementary Fig. S3). This represented a significant excess of calls from this group compared to sporadic CJD or control samples, even after accounting for the higher number of inherited CJD samples in our panel. Most inherited CJD patients in our panel carried a monoallelic OPR mutation. In these samples, the somatic mutation calls could represent the mutation of the reference allele or the mutation of the allele already carrying an OPR mutation. For example, a 5 OPRI somatic mutation call in a 4 OPRI heterozygous carrier could represent a 5-repeat insertion in the reference allele, or a 1-repeat insertion in the 4 OPRI allele. We used haplotype phasing to discriminate between these two possibilities (Supplementary Fig. S4). Of the 64 somatic mutation calls from samples with an OPR mutation, 47 (73%) were assigned to the OPR-mutated allele (Supplementary Fig. S3). Somatic mutation calls ranged from the deletion of 7 OPR repeats to the insertion of 5 additional repeats, but the majority (55/94, 59% of haplotype-assigned calls) represented the deletion of one OPR repeat (Fig. 3b).
Were specific samples more likely to produce somatic mutation calls? Samples from individuals with longer OPRs had more somatic mutation calls, with each additional repeat in the OPR increasing somatic mutation calls by 0.017% (Fig. 3c). In the four individuals who had both blood and brain DNA sequenced, the frequency of somatic mutation calls was consistently, but not significantly, lower in brain than in blood (0.07 ± 0.07% fewer calls in brain vs blood of the same individual, p = 0.068; Fig. 3d).
PCR amplification introduces errors in the octapeptide repeat region that resemble somatic mutations
Li et al.14 have elegantly demonstrated that PRNP’s OPR is prone to contraction or expansion when replicated by PCR or in E. coli cells. Therefore, we tested if the somatic mutation calls could be errors generated by the Taq polymerase. We amplified the protein-coding region (1015-bp amplicon, Fig. 1a) from a known amount of control sample: 1 ng of genomic DNA (#56635), approximately 307 haploid genomes60. In this sample, we previously found that 0.0325% of reads were somatic mutation calls. Assuming 0.0325% of genomes in the original genomic DNA carried a somatic mutation of the OPR, it was almost certain that we drew either no (0.9 probability) or only one mutated genome (0.09 probability, cumulatively 0.99 probability). Accordingly, if amplification did not introduce errors in the OPR, only reference OPR reads or a few mutated OPR reads with a unique mutation (e.g. all 2 OPRI) should be detectable in the sequencing reads. However, 0.21% of reads were somatic mutation calls, ranging from 4 OPRD to 4 OPRI (Fig. 3e). The distribution of OPR lengths resembled the distribution of the original 129 somatic mutation calls (Fig. 3e vs b) with 1 OPRD being the most frequent call (319/761 calls, 42%). Therefore, we cannot confirm that the 129 calls were genuine somatic mutations. Instead, the calls may have been errors introduced by the PCR. Single-molecule sequencing technologies are likely required to confirm the presence of somatic mutations in PRNP’s OPR in human samples.
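The sampling argument above can be checked with a short calculation. The sketch below assumes that each of the approximately 307 haploid genomes independently carries a somatic OPR mutation with probability 0.0325%, i.e. a binomial model; this model and the function name are assumptions made here for illustration, not a method stated in the text.

```python
from math import comb


def prob_k_mutated(n_genomes: int, freq: float, k: int) -> float:
    """Binomial probability of drawing exactly k mutated genomes."""
    return comb(n_genomes, k) * freq**k * (1 - freq) ** (n_genomes - k)


N_GENOMES = 307          # haploid genomes in 1 ng of genomic DNA (from the text)
FREQ = 0.0325 / 100      # 0.0325% of reads were somatic mutation calls

p_none = prob_k_mutated(N_GENOMES, FREQ, 0)
p_one = prob_k_mutated(N_GENOMES, FREQ, 1)

print(f"P(no mutated genome)  = {p_none:.3f}")
print(f"P(one mutated genome) = {p_one:.3f}")
print(f"P(none or one)        = {p_none + p_one:.3f}")
```

Under this model the two probabilities come out close to the 0.9 and 0.09 quoted above, so drawing two or more mutated genomes from the template would be a rare event.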
Discussion
Sequencing of PRNP is of crucial importance in informing patients, veterinarians, and clinicians about prion disease diagnosis and management. Here we established a novel long-read sequencing strategy to sequence the full length of PRNP including its regulatory region (16.7 kb). First, we showed that Nanopore sequencing accurately detects known single-nucleotide and structural variants in PRNP, including the insertion and deletion of repeats in the OPR, which are challenging to sequence accurately with Sanger or short-read technologies. Second, we did not discover novel SVs in non-coding regions of PRNP or in patients with sporadic CJD. Third, we detected rare changes in the OPR consistent with somatic mutations, which have been speculated to be the cause of sporadic CJD. However, we demonstrated that these somatic mutation calls may in fact represent errors introduced during the PCR.
The present strategy has some limitations: it requires amplification of PRNP in two fragments, which renders the protocol relatively laborious and lab-based. Removing the amplification steps could help streamline the process. To do so, targeted sequencing using CRISPR-Cas939 or the ‘Read Until’ mode of Oxford Nanopore devices could be employed40. This however will require modifications of the present protocol. By removing the amplification steps, these strategies could enable PRNP sequencing in the field. This could benefit rapid decision making during therapeutic studies and support programmes which survey the zoonotic potential of animal prion diseases, such as during the recent episodes of Chronic Wasting Disease (CWD) in Norway and North America1,41,42. This may first require sequencing of PRNP in the affected species as not all have annotated reference genomes. Direct sequencing of genomic DNA would also allow the analysis of epigenetic marks, which may serve as a predictor of disease progression in sporadic CJD39.
Somatic mutations in PRNP have long been suspected to be a cause of sporadic CJD. The hypothesis posits that the first prion in a patient with sporadic CJD originated in a cell or group of cells which acquired a somatic mutation in PRNP. A strong candidate for this original mutation is the somatic insertion of additional repeats in the OPR, as OPRIs can cause inherited CJD and somatic instability is commonly seen for repetitive sequences43. A possible mechanism for the insertion or deletion of repeats in the OPR is replication slippage14,44. During replication, the DNA polymerase may dissociate from the DNA, causing either the template or daughter strand to ‘slip’, i.e. re-anneal incorrectly to an earlier repeat (Supplementary Fig. S6). In the case of template strand slippage, the polymerase then skips one or more repeats when DNA replication resumes, producing an OPRD. In the case of daughter strand slippage, the DNA polymerase replicates the same one or more repeats again, producing an OPRI. In our panel, most somatic mutation calls in inherited CJD samples were assigned to the OPR-mutated allele and OPR length positively correlated with frequency of somatic mutation calls. Both observations support this model: as more repeats are present in the DNA molecule undergoing replication, there are more ways the two DNA strands can mispair if the polymerase dissociates, leading to OPR mutations.
However, our results suggest that the OPR mutations identified here might have been generated by the Taq polymerase. The strong bias towards 1 OPRD in our calls also supports this possibility. Indeed, deletions of repeats are more frequent in bacteria, while eukaryotic repetitive sequences show no bias or a bias towards insertions45. Replication slippage is dependent on cell division, which leads to two predictions relevant for future research on sporadic CJD. First, the frequency of somatic mutations of the OPR should be higher in cells undergoing divisions than in post-mitotic cells. In four individuals, we found that the frequency of somatic mutation calls was consistently higher in blood DNA than in brain DNA, which suggests that some calls could have been genuine somatic mutations. Second, the original OPR mutation which causes sporadic CJD may not arise in post-mitotic neurons, but rather during development or in glia. Of note, mechanisms independent of cell division may also be possible, as has been suggested for trinucleotide repeat disorders such as Huntington disease46,47. Future research should initially aim at discovering genuine somatic mutations of PRNP’s OPR in genomic DNA from donors, for example using single-molecule sequencing technologies48.
Methods
Patient recruitment and sample collection
Healthy control donors were recruited from spouses or relatives of patients by the National Prion Clinic (London, UK), and the UCL Dementia Research Centre (London, UK). All experimental protocols were approved by the London Queen Square Research Ethics Committee (reference 05/Q0505/113). Samples were obtained with written informed consent from all controls, patients, or a patient's consultee in accordance with applicable UK legislation and Codes of Practice. All methods were carried out in accordance with relevant guidelines and regulations.
DNA extraction
Nucleon BACC3 kit was used for DNA extraction from frozen blood samples according to the manufacturer’s protocol. Final DNA samples were stored at 4 °C until needed. DNA extraction from CJD brain samples was carried out in a Biosafety Level 3 facility as previously published49.
Sanger sequencing
Sanger sequencing of the protein-coding sequence of PRNP was performed following published protocols for genetic diagnosis29,50. In this work, the PCR amplicons were not cloned. Instead, the exact sequences of inserts were inferred from Sanger sequencing electropherograms by subtracting the reference sequence from heterozygous calls. The amplicon subjected to sequencing was the 1015-bp amplicon covering the protein-coding region (Fig. 1a). The shorter amplicon (348 bp in the reference allele) was used to assess OPR deletions/insertions in affected individuals by gel electrophoresis.
The Sanger sequencing trace included in Supplementary Fig. S1 was exported from Benchling (benchling.com).
Nanopore sequencing of PRNP’s protein-coding region
PCR was performed on genomic DNA from blood of a healthy individual to produce the 1015-bp amplicon covering the protein-coding sequence (see above). Forward primer was 5′-CTATGCACTCATTCATTATG-3′, reverse primer was 5′-GTTTTCCAGTGCCCATCAGTG-3′. The PCR well contained 12.5 μL 2 × MegaMix Royal (Microzone), 10.5 μL H2O, 1 μL primers (12.5 μM each), 1 μL (600 ng) genomic DNA.
Initial denaturation was at 95 °C for 5 min, followed by 35 cycles of: 95 °C for 30 s (denaturation); 58 °C for 40 s (annealing); 72 °C for 1 min (extension). The PCR product was purified and concentrated using Zymo DNA Clean & Concentrator-5 kit and eluted in 50 µL TE buffer. The PCR product’s concentration was quantified with Qubit (dsDNA Broad Range assay) and its length was verified using TapeStation 2200 (D1000 tape). Library preparation was performed according to Oxford Nanopore Technologies’ 1D amplicon by ligation protocol (version ADE_9003_v108_revU_18Oct2016). In brief, the PCR products were end-prepped/dA-tailed for adapter attachment and sequencing adapters were attached by ligation. The final products were cleaned using size-selection beads. All reagents were provided by the SQK-LSK108 kit, except: NEBNext Ultra II End Repair/dA-Tailing buffer and enzyme mix (#E7546), NEB Blunt TA/Ligase Master Mix (#M0367), and Agencourt AMPure XP beads (#A63880). The PRNP coding region amplicons were diluted after end-repair/dA-tailing to bring only 0.2 pmol of DNA fragments to the ligation reaction. 23 ng of final library were loaded onto the MinION flow cell. Sequencing was performed for 53 min and followed live on the MinKNOW software.
FAST5 files were basecalled using guppy v4.5.3 (Oxford Nanopore Technologies). We then computed a consensus sequence of the Nanopore reads using canu (v2.1.1)51, and “polished” it using nanopolish (v0.13.2)52. The resulting consensus sequence was aligned to the sequence obtained by Sanger sequencing using BLASTn (megablast) (NCBI). The full alignment is included in Supplementary Fig. S1.
To call variants, the reads obtained after basecalling were aligned to the human reference genome (hg38) using minimap2 (v2.18-r1015)53. The resulting SAM file was converted to BAM, then sorted and indexed using samtools (v1.9)54. Nanopolish v0.13.2 was used for variant calling. Alignments were visualised with the Integrative Genomics Viewer (IGV) v2.8.6.
Nanopore sequencing of full-length PRNP and its regulatory region
Amplification prior to Nanopore sequencing was performed in two successive PCRs. The first amplified the genomic region of interest from genomic DNA; the second attached unique barcodes to allow multiplexing of several samples on the same sequencing flow cell. In order to use the barcoding primers provided by Oxford Nanopore Technologies in the SQK-PBK004 barcoding kit, we designed primers that included a sequence complementary to the barcoding primers (see “Primer sequences” section). We used the NEB LongAmp Taq DNA polymerase, which performed better than other long-range polymerases we tried.
The first PCR amplified PRNP in two amplicons (Fig. 1a, Nanopore amplicons). For the regulatory region amplicon (chr20:4,685,060–4,688,047), primers were primer 3 (forward) and primer 4 (reverse); annealing was 59 °C—25″; elongation time was 2′40″, for 30 cycles. For the gene body amplicon (chr20:4,687,732–4,701,756), primers were primer 2 (forward) and primer 1 (reverse); annealing was 57 °C—30″; elongation time was 13′00″, for 30 cycles. Amount of genomic DNA used as template was 119–443 ng. The reactions and other program settings were as recommended by the manufacturer’s protocol.
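The amplicon sizes, their overlap, and the total sequenced region quoted in the Results follow directly from these coordinates. The short sketch below recomputes them, assuming the coordinates are 1-based and inclusive; the helper names are illustrative.

```python
# Amplicon coordinates on chr20 as given in the text (assumed 1-based, inclusive).
REGULATORY = (4_685_060, 4_688_047)
GENE_BODY = (4_687_732, 4_701_756)


def length(interval):
    """Number of bases covered by an inclusive interval."""
    start, end = interval
    return end - start + 1


def overlap(a, b):
    """Number of bases shared by two inclusive intervals (0 if disjoint)."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return max(0, end - start + 1)


def total_span(a, b):
    """Length of the region covered from the leftmost to the rightmost base."""
    return max(a[1], b[1]) - min(a[0], b[0]) + 1


print("regulatory amplicon:", length(REGULATORY), "bp")
print("gene body amplicon: ", length(GENE_BODY), "bp")
print("overlap:            ", overlap(REGULATORY, GENE_BODY), "bp")
print("total region:       ", total_span(REGULATORY, GENE_BODY), "bp")
```

The results agree with the 2988-bp and 14,025-bp amplicons, the 316-bp overlap, and the 16.7 kb region reported in the text.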
The barcoding PCR and library preparation were carried out according to ONT’s barcoding kit (SQK-PBK004, version PBK_9073_v1_revB_23May2018) and ligation sequencing kit (SQK-LSK109) protocols. Up to 12 samples were multiplexed on each flow cell and sequencing was performed for 48 hr.
Basecalling and alignment
The raw, multi-line FAST5 files were processed using ONT Guppy v4.2.2 suite. We used guppy_basecaller for basecalling and guppy_barcoder for demultiplexing, i.e. assigning reads to each sample. The reads in fastq format were then aligned to the human reference genome (hg38) using guppy_aligner. Samtools v1.7.0 was used to sort and index the resulting alignment files (BAM format). IGV v.2.8.6 was used to visualise the alignments.
Single-nucleotide variant calling
We used nanopolish v0.13.3 to call single-nucleotide variants (SNVs) in the PRNP genomic window. Default nanopolish filtering criteria were used: only SNVs in regions sequenced at a minimum coverage of 20× and with a minimum allele frequency of 0.2 were called. We filtered the calls further using strand bias as a criterion39. MinION sequences either the forward or the reverse strand at random. Hence, for a true positive variant call, the reads supporting the variant are predicted to be ~ 50% forward and ~ 50% reverse. There is evidence of strand bias if this proportion is strongly imbalanced, i.e. when the forward and the reverse reads do not uniformly call the same nucleotide. We used the StrandOddsRatio (SOR) metric computed by nanopolish55, which increases with evidence of strand bias. To filter SNV calls in non-coding regions of PRNP, we reasoned that the SNVs in the protein-coding region identified by both Sanger and Nanopore sequencing could be used as a set of true positives. For those, the maximum SOR was 0.77 (N = 12). Therefore, we filtered out any SNV call above SOR = 1.0 as showing evidence of strand bias and hence a possible false positive. This decreased the number of SNV calls for all samples (N = 25) from 218 to 203. SNV calls before and after strand-bias filtering are included (Supplementary Information 1).
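The strand-bias filter amounts to a simple threshold on a per-call SOR value. The sketch below illustrates it on made-up calls; the record layout, positions, and SOR values are hypothetical and serve only to show the cutoff being applied, not how SOR itself is computed.

```python
SOR_CUTOFF = 1.0  # cutoff from the text; true-positive calls had SOR <= 0.77


def filter_by_strand_bias(calls, cutoff=SOR_CUTOFF):
    """Keep SNV calls whose SOR does not exceed the cutoff."""
    return [call for call in calls if call["sor"] <= cutoff]


# Hypothetical SNV calls (positions and SOR values are invented for illustration).
calls = [
    {"pos": 1000, "ref": "A", "alt": "G", "sor": 0.42},
    {"pos": 2000, "ref": "C", "alt": "T", "sor": 0.77},
    {"pos": 3000, "ref": "G", "alt": "A", "sor": 2.31},  # strongly strand-biased
]

kept = filter_by_strand_bias(calls)
print(f"{len(kept)} of {len(calls)} calls kept")
for call in kept:
    print(call["pos"], call["ref"], ">", call["alt"], "SOR =", call["sor"])
```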
Haplotype phasing
To assign the gene body reads to haplotypes (haplotagging), we generated a new VCF file for each sample containing its filtered SNV calls (see “Single-nucleotide variant calling” section). Five samples (#58648, #54890, #55050, #59060, #53747) had no heterozygous SNV call and therefore their reads could not be haplotagged. We first used whatshap phase to phase the SNV calls in relation to each other56. A single heterozygous SNV call is sufficient for haplotagging but could not be phased by whatshap phase. Therefore, the VCF files of samples with a single SNV call (#54917, #54968, #55048, #43706, #42656) were manually edited to match the formatting of a phased variant call. We then used whatshap haplotag to haplotag the reads.
Structural variant calling
Prior to SV calling, alignments were filtered further to keep only high-quality reads. For the regulatory region amplicon, any read shorter than 1 kb, longer than 3.5 kb, or with more than 15% of its length soft-clipped was discarded. For the gene body amplicon, any read shorter than 10 kb, longer than 15 kb or with more than 5% of its length soft-clipped was discarded. The goal of the maximum length criteria is to exclude potentially chimeric reads. Secondary alignments were also discarded.
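The read filters above can be expressed as a small predicate on each alignment. The sketch below is a minimal illustration that works from the CIGAR string alone; it assumes that read length is measured as the query length including soft-clipped bases, which the text does not specify, and the function and dictionary names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Thresholds from the text, per amplicon.
FILTERS = {
    "regulatory": {"min_len": 1_000, "max_len": 3_500, "max_softclip": 0.15},
    "gene_body": {"min_len": 10_000, "max_len": 15_000, "max_softclip": 0.05},
}


def read_length_and_softclip(cigar):
    """Return (query length, soft-clipped bases) from a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # M, I, S, = and X consume query bases; D, N, H and P do not.
    read_len = sum(n for n, op in ops if op in "MIS=X")
    soft = sum(n for n, op in ops if op == "S")
    return read_len, soft


def keep_read(cigar, amplicon, is_secondary=False):
    """Apply the length, soft-clip and secondary-alignment filters."""
    if is_secondary:
        return False
    limits = FILTERS[amplicon]
    read_len, soft = read_length_and_softclip(cigar)
    if read_len < limits["min_len"] or read_len > limits["max_len"]:
        return False
    return soft / read_len <= limits["max_softclip"]


print(keep_read("100S2800M50S", "regulatory"))  # about 5% soft-clipped
print(keep_read("500S2400M", "regulatory"))     # about 17% soft-clipped
```

In practice the same checks could be run on records read from a BAM file; parsing the CIGAR directly keeps the example free of external dependencies.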
We used sniffles v1.0.12 to call structural variants (SVs) larger than 20 bp and supported by at least 100 reads57. Calls were filtered to keep SV calls within the PRNP genomic window and with an allele frequency above 0.1.
Lengths of OPR reads
This refers to Fig. 2c, d. The filtered aligned reads (see above) from the gene body were trimmed to keep only the OPR (chr20:4,699,371–4,699,493) using samtools ampliconclip54. Of the reads we obtained, any read shorter than 21 bp or longer than 702 bp was discarded. 21 bp was chosen as the minimum length as it would correspond to a 4 OPRD, minus 6 bp to account for small artefactual deletions. 702 bp was chosen as the maximum length as it would correspond to a 24 OPRI, plus 3 bp to account for small artefactual insertions. We then counted the number of reads of each length to produce the histograms in Fig. 2c, d.
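The 21-bp and 702-bp bounds can be derived from the OPR coordinates and the 24-bp repeat size. The sketch below reproduces that arithmetic and the per-length read count; it assumes 1-based inclusive coordinates, and the read lengths in the example are invented.

```python
from collections import Counter

OPR_START, OPR_END = 4_699_371, 4_699_493   # chr20 coordinates from the text
REPEAT_BP = 24                               # one octapeptide repeat
REF_OPR_BP = OPR_END - OPR_START + 1         # reference OPR length
DEL_TOLERANCE_BP = 6                         # small artefactual deletions
INS_TOLERANCE_BP = 3                         # small artefactual insertions


def opr_length(repeat_change):
    """Expected OPR length after gaining (+) or losing (-) whole repeats."""
    return REF_OPR_BP + REPEAT_BP * repeat_change


MIN_LEN = opr_length(-4) - DEL_TOLERANCE_BP   # 4 OPRD minus tolerance
MAX_LEN = opr_length(24) + INS_TOLERANCE_BP   # 24 OPRI plus tolerance


def length_histogram(read_lengths):
    """Count reads of each length, discarding those outside the bounds."""
    kept = (n for n in read_lengths if MIN_LEN <= n <= MAX_LEN)
    return dict(Counter(kept))


print(REF_OPR_BP, MIN_LEN, MAX_LEN)
print(length_histogram([123, 123, 99, 10, 800]))  # invented read lengths
```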
Search for somatic mutations of the OPR
The haplotagged OPR reads were imported in R from SAM files, for a total of 208,554 OPR reads. We parsed the CIGAR of each read to calculate its total insertion and deletion. For example, 11500H43M1I24M1D29M1I7M3D7M2D7M2306H returned insertion 2 bp (1I + 1I) and deletion 6 bp (1D + 3D + 2D). The hard clips (11500H, 2306H) were created by samtools ampliconclip. Each read was then assigned to a most likely OPR genotype based on its total insertion/deletion. For each OPR genotype, the interval of possible lengths was defined as the target insertion/deletion length minus 6 bp to allow for small artefactual deletions up to the target insertion/deletion length plus 3 bp to allow for small artefactual insertions. For example, the target deletion length for 1 OPRD is − 24 bp. Therefore, any read with a total deletion of − 30 bp up to − 21 bp was labelled as a potential 1 OPRD read. As the target insertion/deletion for the reference OPR is 0 bp, the interval was set from deletions of − 6 bp up to insertions of + 3 bp. Any read with a total insertion/deletion not included in one of the intervals (e.g. a 10-bp insertion) was labelled as unassigned.
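The labelling step can be sketched as follows. The original analysis was done in R; this Python version is only an illustration. It assumes that the "total insertion/deletion" used for labelling is the net change (inserted minus deleted bases) and that candidate genotypes range from 4 OPRD to 24 OPRI, as in the length filter; both are assumptions, and the function names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REPEAT_BP = 24          # one octapeptide repeat
DEL_TOLERANCE_BP = 6    # allowance for small artefactual deletions
INS_TOLERANCE_BP = 3    # allowance for small artefactual insertions


def total_indels(cigar):
    """Return (inserted bases, deleted bases) summed over a CIGAR string."""
    ops = CIGAR_OP.findall(cigar)
    inserted = sum(int(n) for n, op in ops if op == "I")
    deleted = sum(int(n) for n, op in ops if op == "D")
    return inserted, deleted


def opr_label(cigar, max_deleted_repeats=4, max_inserted_repeats=24):
    """Assign the most likely OPR genotype from the net insertion/deletion."""
    inserted, deleted = total_indels(cigar)
    net = inserted - deleted  # assumption: labels are based on the net change
    for k in range(-max_deleted_repeats, max_inserted_repeats + 1):
        target = k * REPEAT_BP
        if target - DEL_TOLERANCE_BP <= net <= target + INS_TOLERANCE_BP:
            if k == 0:
                return "reference"
            return f"{abs(k)} OPR{'I' if k > 0 else 'D'}"
    return "unassigned"


example = "11500H43M1I24M1D29M1I7M3D7M2D7M2306H"  # CIGAR from the text
print(total_indels(example))
print(opr_label(example))
print(opr_label("50M24I73M"))   # hypothetical read with a 24-bp insertion
print(opr_label("50M10I73M"))   # hypothetical read with a 10-bp insertion
```

Because the intervals are 10 bp wide and spaced 24 bp apart, they never overlap, so each read receives at most one label.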
At this stage, we generated Supplementary Fig. S4 by plotting for each sample the proportion of reads labelled as reference or as the expected OPR mutation assigned to each haplotype.
In parallel, we built OPR template sequences using the OPR consensus sequence. Each read, unless labelled as unassigned, was then aligned one-to-one to the OPR template matching its OPR label (Supplementary Fig. S5), returning the number of mismatches. Alignments were performed with ClustalW58, implemented in the msa R package59. The mismatch threshold used to filter somatic mutation calls was calculated as the mean + standard deviation of the number of mismatches of all ‘expected’ reads, i.e. reference reads and reads matching the OPR genotype of their sample. Somatic mutation calls were then defined as any read not labelled as reference, not labelled as the OPR genotype of its sample, and having fewer mismatches with its template than 5.8% of its length (Supplementary Fig. S3). For the compound heterozygous sample #46345, reads labelled as reference and below the mismatch threshold were also defined as somatic mutation calls.
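The thresholding step can be illustrated with a short Python sketch. It assumes that mismatches are expressed as a fraction of read length and uses the sample standard deviation (as R's `sd()` does); both are assumptions, as are the function names and the example numbers. The special handling of the compound heterozygous sample is not modelled.

```python
from statistics import mean, stdev


def mismatch_threshold(expected_fractions):
    """Mean + sample SD of mismatch fractions across 'expected' reads."""
    return mean(expected_fractions) + stdev(expected_fractions)


def is_somatic_call(label, sample_genotype, mismatch_fraction, threshold):
    """True if the read carries an unexpected OPR label and aligns well.

    Reads that are unassigned, reference, or match the genotype of their
    sample are never calls; other reads must fall below the threshold.
    """
    if label in ("unassigned", "reference", sample_genotype):
        return False
    return mismatch_fraction < threshold


# Hypothetical mismatch fractions of expected reads, for illustration only.
threshold = mismatch_threshold([0.02, 0.04, 0.03])
print(round(threshold, 4))
print(is_somatic_call("1 OPRD", "4 OPRI", 0.03, threshold))
```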
Control PCR to test for PCR-introduced errors
1 ng of genomic DNA from sample #56635, approximately 307 haploid genomes60, was PCR amplified to produce the 1015-bp amplicon covering the protein-coding sequence (see Sanger sequencing). The product was cleaned, eluted in 25 µL of dH2O, and its concentration was quantified at 10.36 ng/µL by Qubit (dsDNA Broad Range assay). 1 µL of this product was then barcoded and sequenced as described previously (see “Nanopore sequencing of full-length PRNP and its regulatory region” section).
The fast5 files were basecalled to produce reads, which were aligned to the human reference genome as described previously (see Basecalling and alignment). We then filtered the reads by discarding any read shorter than 900 bp, longer than 1300 bp, or with more than 20% of its length soft-clipped. We trimmed the reads to keep only the OPR (see “Lengths of OPR reads” section), generating 366,243 OPR reads. Possible somatic mutations were then called as described previously (see “Search for somatic mutations of the OPR” section).
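The read filter described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the read length is taken as the length of the stored sequence (which includes soft-clipped bases but not hard-clipped ones), and the function name `keep_read` is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def keep_read(read_length, cigar, min_len=900, max_len=1300,
              max_softclip=0.20):
    """Apply the length and soft-clipping filters quoted in the text.

    A read is discarded if it is shorter than min_len, longer than
    max_len, or has more than max_softclip of its length soft-clipped.
    """
    if read_length < min_len or read_length > max_len:
        return False
    soft_clipped = sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                       if op == "S")
    return soft_clipped / read_length <= max_softclip


# Hypothetical reads, for illustration only.
print(keep_read(1015, "10S1000M5S"))
print(keep_read(1000, "300S700M"))
```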
The original frequency of somatic mutation calls in sample #56635 was 0.0325% (9 of 26,764 OPR reads were somatic mutation calls). Let us assume that these somatic mutation calls were genuine, and that any genome taken at random from sample #56635 had a probability $p = 0.000325$ of carrying an OPR mutation. Conversely, the probability that any genome taken at random from sample #56635 was OPR reference was $1 - p = 0.999675$.
The probability of having drawn exactly 0 mutated genomes of the 307 can be written as the probability that all 307 genomes were reference, which is $(1 - p)^{307}$, i.e. there was a 90.4% probability of drawing 0 mutated genomes in the 307.
The probability that exactly $k$ genomes out of $n$ genomes drawn (here, $n = 307$) were mutated can be calculated as
$$P(X = k) = \binom{n}{k}\, p^{k}\, (1 - p)^{n - k} \tag{1}$$
where
$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!} \tag{2}$$
From Eqs. (1) and (2), the probability of having drawn exactly 1 mutated genome in the 307 was $P(X = 1) = 307\, p\, (1 - p)^{306}$, i.e. there was a 9.02% probability of drawing exactly 1 mutated genome in the 307.
The probability of drawing 0 or 1 mutated genome in the 307 was therefore $P(X = 0) + P(X = 1)$, i.e. the probability of drawing more than one mutated genome was 0.58%.
Primer sequences
The first 22 nucleotides of each primer are the tail complementary to the ONT barcoding primers (kit PBK-004); the remaining 20 nucleotides bind the genomic site listed.
Primer 1:
Sequence: ACTTGCCTGTCGCTCTATCTTCGTGGGTGACTGGGAAGTGAG.
Direction: Reverse.
Binding site: chr20:4,701,737–4,701,756.
Primer 2:
Sequence: TTTCTGTTGGTGCTGATATTGCTGGCCGCGTTATTTTCTTGT.
Direction: Forward.
Binding site: chr20:4,687,732–4,687,751.
Primer 3:
Sequence: TTTCTGTTGGTGCTGATATTGCACATAAACATGGCCCAGGCA.
Direction: Forward.
Binding site: chr20:4,685,060–4,685,079.
Primer 4:
Sequence: ACTTGCCTGTCGCTCTATCTTCCTCTGCCTGCTTTTCCCCTT.
Direction: Reverse.
Binding site: chr20:4,688,028–4,688,047.
All genomic positions are given for the human reference genome GRCh38 (hg38).
Acknowledgements
We are grateful to patients and volunteers who have made this study possible by donating samples to research; to Paulo Amaral, Alejandro Gener (Baylor College of Medicine), Prof. Christos Proukakis (UCL) and Mike Clark (University of Melbourne) for helpful discussions; and to Joanna Field and Sarah Mazdon for help with metadata acquisition. The research was funded by the Medical Research Council (UK). Patient samples were provided with the assistance of funding from the UK’s National Institute for Health Research (NIHR) Biomedical Research Centre at University College London Hospitals NHS Foundation Trust. S.M. and J.C. are NIHR Senior Investigators.
Author contributions
S.M. had full access to all data in the study and takes responsibility for integrity of the data and accuracy of the data analysis. Concept and design: E.A.V. and S.M.; acquisition, analysis or interpretation of data: F.K., A.D., T.C., L.D., S.M. and E.A.V.; drafting of manuscript: F.K., A.D., E.A.V.; obtaining funding: J.C. and S.M.; study supervision: S.M. and E.A.V. All authors read and approved the final manuscript.
Funding
This study was funded by the UK Medical Research Council and the National Institute for Health Research (NIHR) Biomedical Research Centre at UCL Hospitals NHS Foundation Trust. S.M. and J.C. are NIHR Senior Investigators. F.K. was supported and funded by the Leonard Wolfson PhD Programme in Neurodegeneration. A.D. was supported by the Onassis Foundation (Scholarship ID: F ZQ 022-1/2020-2021).
Data availability
Code is made available at GitHub repositories https://github.com/athanadd/prnp-nanopore-seq and https://github.com/francoiskroll/prnp_nanopore. Sequencing data are available as a Zenodo archive at https://doi.org/10.5281/zenodo.6427185.
Competing interests
Prof. Collinge is a director and shareholder of D-Gen Limited (London), an academic spinout company working in the field of prion disease diagnosis, decontamination and therapeutics. François Kroll owns stocks in Oxford Nanopore Technologies and Pacific Biosciences. All other authors declare no competing interests.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: François Kroll and Athanasios Dimitriadis.
Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1038/s41598-022-12130-7.
References
- 1. Watson N, Brandel JP, Green A, Hermann P, Ladogana A, Lindsay T, et al. The importance of ongoing international surveillance for Creutzfeldt–Jakob disease. Nat. Rev. Neurol. 2021;17(6):362–379. doi: 10.1038/s41582-021-00488-7.
- 2. Mead S, Lloyd S, Collinge J. Genetic factors in mammalian prion diseases. Annu. Rev. Genet. 2019;53:117–147. doi: 10.1146/annurev-genet-120213-092352.
- 3. Oesch B, Westaway D, Walchli M, McKinley MP, Kent SB, Aebersold R, et al. A cellular gene encodes scrapie PrP 27–30 protein. Cell. 1985;40(4):735–746. doi: 10.1016/0092-8674(85)90333-2.
- 4. Yates AD, Achuthan P, Akanni W, Allen J, Allen J, Alvarez-Jarreta J, et al. Ensembl 2020. Nucleic Acids Res. 2020;48(D1):D682–D688. doi: 10.1093/nar/gkz966.
- 5. Minikel EV, Vallabh SM, Orseth MC, Brandel JP, Haik S, Laplanche JL, et al. Age at onset in genetic prion disease and the design of preventive clinical trials. Neurology. 2019;93(2):e125–e134. doi: 10.1212/WNL.0000000000007745.
- 6. Palmer MS, Dryden AJ, Hughes JT, Collinge J. Homozygous prion protein genotype predisposes to sporadic Creutzfeldt–Jakob disease. Nature. 1991;352(6333):340–342. doi: 10.1038/352340a0.
- 7. Pocchiari M, Puopolo M, Croes EA, Budka H, Gelpi E, Collins S, et al. Predictors of survival in sporadic Creutzfeldt–Jakob disease and other human transmissible spongiform encephalopathies. Brain. 2004;127(Pt 10):2348–2359. doi: 10.1093/brain/awh249.
- 8. Kim C, Haldiman T, Cohen Y, Chen W, Blevins J, Sy MS, et al. Protease-sensitive conformers in broad spectrum of distinct PrPSc structures in sporadic Creutzfeldt–Jakob disease are indicator of progression rate. PLoS Pathog. 2011;7(9):e1002242. doi: 10.1371/journal.ppat.1002242.
- 9. Zheng Z, Zhang M, Wang Y, Ma R, Guo C, Feng L, et al. Structural basis for the complete resistance of the human prion protein mutant G127V to prion disease. Sci. Rep. 2018;8(1):13211. doi: 10.1038/s41598-018-31394-6.
- 10. Asante EA, Smidak M, Grimshaw A, Houghton R, Tomlinson A, Jeelani A, et al. A naturally occurring variant of the human prion protein completely prevents prion disease. Nature. 2015;522(7557):478–481. doi: 10.1038/nature14510.
- 11. Hizume M, Kobayashi A, Teruya K, Ohashi H, Ironside JW, Mohri S, et al. Human prion protein (PrP) 219K is converted to PrPSc but shows heterozygous inhibition in variant Creutzfeldt–Jakob disease infection. J. Biol. Chem. 2009;284(6):3603–3609. doi: 10.1074/jbc.M809254200.
- 12. Goldfarb LG, Brown P, McCombie WR, Goldgaber D, Swergold GD, Wills PR, et al. Transmissible familial Creutzfeldt–Jakob disease associated with five, seven, and eight extra octapeptide coding repeats in the PRNP gene. Proc. Natl. Acad. Sci. USA. 1991;88(23):10926–10930. doi: 10.1073/pnas.88.23.10926.
- 13. Wadsworth JD, Joiner S, Linehan JM, Cooper S, Powell C, Mallinson G, et al. Phenotypic heterogeneity in inherited prion disease (P102L) is associated with differential propagation of protease-resistant wild-type and mutant prion protein. Brain. 2006;129(Pt 6):1557–1569. doi: 10.1093/brain/awl076.
- 14. Li B, Qing L, Yan J, Kong Q. Instability of the octarepeat region of the human prion protein gene. PLoS ONE. 2011;6(10):e26635. doi: 10.1371/journal.pone.0026635.
- 15. Mead S, Webb TE, Campbell TA, Beck J, Linehan JM, Rutherfoord S, et al. Inherited prion disease with 5-OPRI: Phenotype modification by repeat length and codon 129. Neurology. 2007;69(8):730–738. doi: 10.1212/01.wnl.0000267642.41594.9d.
- 16. Bueler H, Aguzzi A, Sailer A, Greiner RA, Autenried P, Aguet M, et al. Mice devoid of PrP are resistant to scrapie. Cell. 1993;73(7):1339–1347. doi: 10.1016/0092-8674(93)90360-3.
- 17. Westaway D, Mirenda CA, Foster D, Zebarjadian Y, Scott M, Torchia M, et al. Paradoxical shortening of scrapie incubation times by expression of prion protein transgenes derived from long incubation period mice. Neuron. 1991;7(1):59–68. doi: 10.1016/0896-6273(91)90074-A.
- 18. Juling K, Schwarzenbacher H, Williams JL, Fries R. A major genetic component of BSE susceptibility. BMC Biol. 2006;4:33. doi: 10.1186/1741-7007-4-33.
- 19. Hills D, Comincini S, Schlaepfer J, Dolf G, Ferretti L, Williams JL. Complete genomic sequence of the bovine prion gene (PRNP) and polymorphism in its promoter region. Anim. Genet. 2001;32(4):231–232. doi: 10.1046/j.1365-2052.2001.0769a.x.
- 20. Sander P, Hamann H, Drogemuller C, Kashkevich K, Schiebel K, Leeb T. Bovine prion protein gene (PRNP) promoter polymorphisms modulate PRNP expression and may be responsible for differences in bovine spongiform encephalopathy susceptibility. J. Biol. Chem. 2005;280(45):37408–37414. doi: 10.1074/jbc.M506361200.
- 21. Vollmert C, Windl O, Xiang W, Rosenberger A, Zerr I, Wichmann H-E, Bickeboller H, Illig T, Kretzschmar HA. Significant association of a M129V independent polymorphism in the 5' UTR of the PRNP gene with sporadic Creutzfeldt–Jakob disease in a large German case-control study. J. Med. Genet. 2006;43(10):e53. doi: 10.1136/jmg.2006.040931.
- 22. Sanchez-Juan P, Bishop MT, Croes EA, Knight RSG, Will RG, van Duijn CM, Manson JC. A polymorphism in the regulatory region of PRNP is associated with increased risk of sporadic Creutzfeldt–Jakob disease. BMC Medical Genetics. 2011;12(1). doi: 10.1186/1471-2350-12-73.
- 23. Jones E, Hummerich H, Vire E, Uphill J, Dimitriadis A, Speedy H, et al. Identification of novel risk loci and causal insights for sporadic Creutzfeldt–Jakob disease: A genome-wide association study. Lancet Neurol. 2020;19(10):840–848. doi: 10.1016/S1474-4422(20)30273-8.
- 24. The GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369(6509):1318–1330. doi: 10.1126/science.aaz1776.
- 25. Mortberg MA, Zhao HT, Reidenbach AG, Gentile JE, Kuhn E, O'Moore J, et al. Regional variability and genotypic and pharmacodynamic effects on PrP concentration in the CNS. JCI Insight. 2022;7(6):e156532. doi: 10.1172/jci.insight.156532.
- 26. Lukic A, Uphill J, Brown CA, Beck J, Poulter M, Campbell T, et al. Rare structural genetic variation in human prion diseases. Neurobiol. Aging. 2015;36(5):2004e1-8. doi: 10.1016/j.neurobiolaging.2015.01.011.
- 27. De Coster W, De Rijk P, De Roeck A, De Pooter T, D'Hert S, Strazisar M, et al. Structural variants identified by Oxford Nanopore PromethION sequencing of the human genome. Genome Res. 2019;29(7):1178–1187. doi: 10.1101/gr.244939.118.
- 28. Beyter D, Ingimundardottir H, Oddsson A, Eggertsson HP, Bjornsson E, Jonsson H, et al. Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits. Nat. Genet. 2021;53(6):779–786. doi: 10.1038/s41588-021-00865-4.
- 29. Beck JA, Poulter M, Campbell TA, Adamson G, Uphill JB, Guerreiro R, et al. PRNP allelic series from 19 years of prion protein gene sequencing at the MRC Prion Unit. Hum. Mutat. 2010;31(7):E1551–E1563. doi: 10.1002/humu.21281.
- 30. Mahal SP, Asante EA, Antoniou M, Collinge J. Isolation and functional characterisation of the promoter region of the human prion protein gene. Gene. 2001;268(1–2):105–114. doi: 10.1016/S0378-1119(01)00424-3.
- 31. Payne A, Holmes N, Rakyan V, Loose M. BulkVis: A graphical viewer for Oxford nanopore bulk FAST5 files. Bioinformatics. 2019;35(13):2193–2198. doi: 10.1093/bioinformatics/bty841.
- 32. Beck JA, Poulter M, Campbell TA, Uphill JB, Adamson G, Geddes JF, et al. Somatic and germline mosaicism in sporadic early-onset Alzheimer's disease. Hum. Mol. Genet. 2004;13(12):1219–1224. doi: 10.1093/hmg/ddh134.
- 33. Kim YC, Won SY, Jeong BH. Identification of prion disease-related somatic mutations in the prion protein gene (PRNP) in cancer patients. Cells. 2020;9(6):1480. doi: 10.3390/cells9061480.
- 34. Leija-Salazar M, Piette C, Proukakis C. Review: somatic mutations in neurodegeneration. Neuropathol. Appl. Neurobiol. 2018;44(3):267–285. doi: 10.1111/nan.12465.
- 35. Park JS, Lee J, Jung ES, Kim MH, Kim IB, Son H, et al. Brain somatic mutations observed in Alzheimer's disease associated with aging and dysregulation of tau phosphorylation. Nat. Commun. 2019;10(1):3090. doi: 10.1038/s41467-019-11000-7.
- 36. White R, Pellefigues C, Ronchese F, et al. Investigation of chimeric reads using the MinION. F1000Research. 2017;6:631. doi: 10.12688/f1000research.11547.1.
- 37. Xu Y, Lewandowski K, Lumley S, Pullan S, Vipond R, Carroll M, et al. Detection of viral pathogens with multiplex nanopore MinION sequencing: Be careful with cross-talk. Front. Microbiol. 2018;9:2225. doi: 10.3389/fmicb.2018.02225.
- 38. Delahaye C, Nicolas J. Sequencing DNA with nanopores: Troubles and biases. PLoS ONE. 2021;16(10):e0257521. doi: 10.1371/journal.pone.0257521.
- 39. Gilpatrick T, Lee I, Graham JE, Raimondeau E, Bowen R, Heron A, et al. Targeted nanopore sequencing with Cas9-guided adapter ligation. Nat. Biotechnol. 2020;38(4):433–438. doi: 10.1038/s41587-020-0407-5.
- 40. Stevanovski I, et al. Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing. medRxiv. 2021.
- 41. Guere ME, Vage J, Tharaldsen H, Benestad SL, Vikoren T, Madslien K, et al. Chronic wasting disease associated with prion protein gene (PRNP) variation in Norwegian wild reindeer (Rangifer tarandus). Prion. 2020;14(1):1–10. doi: 10.1080/19336896.2019.1702446.
- 42. Nonno R, Di Bari MA, Pirisinu L, D'Agostino C, Vanni I, Chiappini B, et al. Studies in bank voles reveal strain differences between chronic wasting disease prions from Norway and North America. Proc. Natl. Acad. Sci. USA. 2020;117(49):31417–31426. doi: 10.1073/pnas.2013237117.
- 43. Lopez Castel A, Cleary JD, Pearson CE. Repeat instability as the basis for human diseases and as a potential target for therapy. Nat. Rev. Mol. Cell Biol. 2010;11(3):165–170. doi: 10.1038/nrm2854.
- 44. Mad-Cow.org. Origin of extra prion repeat units. Available online (1999).
- 45. Metzgar D, Liu L, Hansen C, Dybvig K, Wills C. Domain-level differences in microsatellite distribution and content result from different relative rates of insertion and deletion mutations. Genome Res. 2002;12(3):408–413. doi: 10.1101/gr.198602.
- 46. Gomes-Pereira M, Fortune MT, Ingram L, McAbney JP, Monckton DG. Pms2 is a genetic enhancer of trinucleotide CAG.CTG repeat somatic mosaicism: Implications for the mechanism of triplet repeat expansion. Hum. Mol. Genet. 2004;13(16):1815–1825. doi: 10.1093/hmg/ddh186.
- 47. Monckton DG. The contribution of somatic expansion of the CAG repeat to symptomatic development in Huntington's disease: A historical perspective. J. Huntington’s Dis. 2021;10(1):7–33. doi: 10.3233/JHD-200429.
- 48. Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, et al. Real-time DNA sequencing from single polymerase molecules. Science. 2009;323(5910):133–138. doi: 10.1126/science.1162986.
- 49. Dabin LC, Guntoro F, Campbell T, Belicard T, Smith AR, Smith RG, et al. Altered DNA methylation profiles in blood from patients with sporadic Creutzfeldt–Jakob disease. Acta Neuropathol. 2020;140(6):863–879. doi: 10.1007/s00401-020-02224-9.
- 50. Beck JA, Mead S, Campbell TA, Dickinson A, Wientjens DP, Croes EA, et al. Two-octapeptide repeat deletion of prion protein associated with rapidly progressive dementia. Neurology. 2001;57(2):354–356. doi: 10.1212/WNL.57.2.354.
- 51. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: Scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27(5):722–736. doi: 10.1101/gr.215087.116.
- 52. Simpson J. Nanopolish. https://github.com/jts/nanopolish (2020).
- 53. Li H. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–3100. doi: 10.1093/bioinformatics/bty191.
- 54. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–2079. doi: 10.1093/bioinformatics/btp352.
- 55. GATKTeam. StrandOddsRatio (2019).
- 56. Patterson M, Marschall T, Pisanti N, van Iersel L, Stougie L, Klau GW, et al. WhatsHap: Weighted haplotype assembly for future-generation sequencing reads. J. Comput. Biol. 2015;22(6):498–509. doi: 10.1089/cmb.2014.0157.
- 57. Sedlazeck FJ, Rescheneder P, Smolka M, Fang H, Nattestad M, von Haeseler A, et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods. 2018;15(6):461–468. doi: 10.1038/s41592-018-0001-7.
- 58. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22(22):4673–4680. doi: 10.1093/nar/22.22.4673.
- 59. Bodenhofer U, Bonatesta E, Horejš-Kainrath C, Hochreiter S. msa: An R package for multiple sequence alignment. Bioinformatics. 2015;31(24):3997–3999.
- 60. Piovesan A, Pelleri MC, Antonaros F, Strippoli P, Caracausi M, Vitale L. On the length, weight and GC content of the human genome. BMC Res. Notes. 2019;12(1):106. doi: 10.1186/s13104-019-4137-z.