Abstract
Germline pathogenic TP53 variants predispose individuals to a high lifetime risk of developing multiple cancers and are the hallmark feature of Li-Fraumeni syndrome (LFS). Our group has previously shown that LFS patients harbor shorter plasma cell-free DNA (cfDNA) fragments, independent of cancer status. To understand the functional underpinnings of cfDNA fragmentation in LFS, we conducted a fragmentomic analysis of 199 cfDNA samples from 82 TP53 mutation carriers and 30 healthy TP53-wildtype controls. We find that LFS individuals exhibit an increased prevalence of A/T nucleotides at fragment ends, dysregulated nucleosome positioning at p53 binding sites, and locus-specific changes in chromatin accessibility at development-associated transcription factor binding sites and at cancer-associated open chromatin regions. Machine learning classification robustly differentiated TP53 mutant from wildtype cfDNA samples (AUC-ROC = 0.710–1.000), and intra-patient longitudinal analysis of the circulating tumor DNA (ctDNA) fragmentation signal enabled early cancer detection. These results suggest that cfDNA fragmentation may be a useful diagnostic tool in LFS patients and provide an important baseline for early cancer detection.
Subject terms: Cancer genomics, Oncogenes, Tumour biomarkers, Cancer screening
Here, Wong et al. investigate the cell-free DNA landscape of individuals with Li-Fraumeni syndrome (LFS), a cancer predisposition syndrome, and find an altered composition compared with non-LFS individuals that can be used to detect and track cancer development.
Introduction
Somatic mutations in the tumor suppressor gene TP53 are the most common genetic alterations across all human cancers1, and germline TP53 variants are the hallmark feature of Li-Fraumeni syndrome (LFS; OMIM #151623), a highly penetrant hereditary cancer syndrome (HCS)2. Individuals with germline TP53 mutations (TP53m-carriers) have an estimated lifetime risk of ~75% in males and ~100% in females of developing cancer, often multiple cancers3,4, most commonly soft tissue sarcoma, osteosarcoma, brain cancers, breast cancer, and adrenocortical carcinoma5,6.
Cell-free DNA (cfDNA) consists of DNA fragments released into bodily fluids, such as blood plasma, mainly by apoptotic cells such as lymphocytes7. Sequencing of blood cfDNA has become an attractive approach for early cancer detection and monitoring in individuals with HCS, such as LFS8,9. One emerging “omic”, fragmentomics, is the study of cfDNA fragmentation and is based on the finding that fragment preservation in circulation is related to nucleosome protection and is highly correlated with the chromatin landscape of the cell-of-origin10. This is evidenced by a characteristic peak at 167 bp, corresponding to the length of DNA protected by a single nucleosome10. Studies have shown that TP53m-carriers, even prior to cancer onset, exhibit unique transcriptomic11 and peripheral blood DNA methylation12 signatures compared to TP53-wildtype individuals. Our own work has shown shorter cfDNA fragments in cancer-free TP53m-carriers compared to TP53-wildtype individuals13. How these fundamental biological differences in TP53m-carriers affect cfDNA fragmentation is currently unknown, and elucidating these effects is essential for the adoption of liquid biopsy in LFS.
We comprehensively characterize the cfDNA fragmentomic landscape of a cohort of 199 blood plasma samples collected from 82 TP53m-carriers and 30 TP53-wildtype individuals (Fig. 1A, B). Using shallow whole genome sequencing (sWGS; ~1X coverage), we find that several cfDNA fragmentation features are significantly altered in TP53m-carriers, independent of cancer status and cancer history, often recapitulating those previously observed in patients with sporadic cancer. Integrating fragmentomic features robustly discriminated cfDNA from TP53m-carriers and TP53-wildtype individuals, and by using patient-specific baselines, we were able to detect ctDNA kinetics prior to a clinical cancer diagnosis.
Fig. 1. Study and analysis design.
Overview of the patient and sample cohort profiled (A) and the cfDNA fragmentation features and analysis metrics used (B) in this study. Figure 1/panel A, created with BioRender.com, released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license.
Results
Patient cohort
TP53m-carrier plasma samples were grouped based on their clinical cancer history at the time of blood collection: individuals who had never had cancer (LFS-Healthy [LFS-H], samples = 58), cancer-free individuals with a cancer history (LFS-Past Cancer [LFS-PC], samples = 73), and individuals with an active cancer diagnosis (LFS-Active Cancer [LFS-AC], samples = 38; Fig. 1A, Supplementary Fig. 1A and Supplementary Data 1). Plasma samples were collected during routine clinical follow-up/screening, which included conventional surveillance assays used to determine the individual’s clinical cancer status (imaging/bloodwork)13. Plasma samples were also collected from 30 healthy TP53-wildtype controls (TP53-wildtype). Samples collected from LFS-H and LFS-PC were classified as cancer negative (n = 131), and samples collected from LFS-AC were classified as cancer positive (n = 38). Serial samples (n = 122) were available for 36 TP53m-carriers, including individuals who developed multiple cancers (n = 2), transitioned from cancer negative to cancer positive (forward phenoconverters, n = 5) or vice versa (reverse phenoconverters, n = 9), or were cancer negative at all serial timepoints (n = 16; Supplementary Fig. 1B).
A global reduction of cfDNA fragment size
We have previously described an increased proportion of short (<150 bp) mono-nucleosome cfDNA fragments in TP53m-carriers, independent of cancer status (ANOVA p-value < 0.001; Fig. 2A/B), cancer history (Student’s t-test p-value = 0.085; Supplementary Fig. 2A–D), and patient age at the time of sample collection (Pearson’s R2 = 0.01; Supplementary Fig. 2E)13. Technical variables were also accounted for, with no differences detected between institutions (Student’s t-test p-value = 0.637; SickKids = SK, Princess Margaret = PM), collection tubes (SK = EDTA, PM = Streck; Supplementary Fig. 2F), sequencing runs (ANOVA p-value = 0.0601; Supplementary Fig. 2G), or flow cell cluster density (Pearson’s R2 = 0.004–0.049; Supplementary Fig. 2H). This observation persisted in comparisons with external cohorts of healthy controls (Supplementary Fig. 2I)14–16. Using an alternate sequencing method (Ultima Genomics) in a subset of samples (n = 58), we observed a similar distribution of fragment lengths in TP53-wildtype (n = 18) and TP53m-carriers (cancer negative n = 26, cancer positive n = 14), with the canonical peak at 167 bp and 10 bp periodicity (Supplementary Fig. 2J, K). Comparing the frequency of fragments in 5 bp bins, we observed high correlation between Illumina and Ultima sequencing, suggesting good concordance between the two technologies (R2 = 0.973–0.988; Supplementary Fig. 2L)17. Within our Illumina data, we also observed an increase in longer fragments (250–500 bp), representing di-nucleosome fragments, in TP53m-carriers compared to TP53-wildtype, independent of cancer status and age (Supplementary Fig. 2M–Q).
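The size-compartment proportions compared above (and in Fig. 2B) can be sketched as follows. This is a minimal illustration that assumes fragment lengths have already been extracted for each sample (for example, from the insert size of properly paired reads) and that every fragment in the sample is the denominator; the compartment bounds come from the Fig. 2B caption, while the function and variable names are hypothetical.

```python
# Size compartments from the Fig. 2B caption (inclusive bounds, in bp).
COMPARTMENTS = {
    "short": (10, 150),
    "mono_nucleosome": (151, 180),
    "di_nucleosome": (250, 500),
}


def compartment_proportions(lengths, compartments=COMPARTMENTS):
    """Fraction of all fragments falling in each size compartment.

    Assumption: the denominator is every fragment in the sample, so the
    compartments (which leave gaps, e.g. 181-249 bp) need not sum to 1.
    """
    total = len(lengths)
    if total == 0:
        raise ValueError("no fragment lengths supplied")
    return {
        name: sum(1 for n in lengths if lo <= n <= hi) / total
        for name, (lo, hi) in compartments.items()
    }


# Toy example with eight fragments (not study data).
toy_lengths = [100, 120, 160, 170, 200, 300, 400, 600]
print(compartment_proportions(toy_lengths))
# -> {'short': 0.25, 'mono_nucleosome': 0.25, 'di_nucleosome': 0.25}
```

Per-sample proportions of this kind are the values on which group-level comparisons (ANOVA, t-tests) would then be run.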
Fig. 2. Increased proportion of short cfDNA fragmentation.
A Median fragment size distributions of healthy TP53-wildtype, LFS-H, LFS-PC, and LFS-AC. TP53m-carriers exhibit increased frequency at fragments <150 bp. B Tukey boxplots showing the proportion of fragments within 3 size compartments: 10–150 bp = short, 151–180 bp = mono-nucleosomes, 250–500 bp = di-nucleosomes. TP53m-carriers, independent of cancer status, exhibit increased proportions within the 10–150 bp compartment. C Tukey boxplots showing the proportion of short cfDNA fragments normalized against the global proportion of short cfDNA fragments and proportion of total fragments mapped to each repeat element (normalized contribution). TP53m-carriers exhibit decreased contribution of short cfDNA fragments at these select 10 repeat elements. p-values were calculated using two-sided Student’s t-test compared to TP53-wildtype. D Tukey boxplots showing the proportion of short cfDNA fragments across each germline mutation group by functional TP53 mutation class. Numbers represent the number of samples with the number of patients in brackets. p-values were calculated using Mann–Whitney–Wilcoxon test compared to the median proportion of short cfDNA fragments from all cancer negative TP53m-carriers (red line). * = p-value < 0.05, ** = p-value < 0.01, *** = p-value < 0.001. Exact p-values are provided in Supplementary Data 2. Source data are provided as a Source Data file.
To investigate whether this increase in shorter mono-nucleosome fragments was related to the increased contribution of short cfDNA fragments at specific genomic repeat regions observed in healthy individuals18,19, we compared cfDNA fragment lengths at 46 different families of genomic repeat elements (Supplementary Fig. 3). We observed an increased proportion of short cfDNA fragments across all repeat families in TP53m-carriers, comparable to the increase observed genome-wide (Student’s t-test p-values: <0.001–0.062; Supplementary Fig. 4A). To determine the absolute contribution of short cfDNA fragments, we normalized against the global proportion of short cfDNA fragments and the total proportion of fragments mapped to each repeat family (Supplementary Fig. 4B). Compared to TP53-wildtype, we observed a decreased contribution of short cfDNA fragments from long terminal repeats (LTR), (GAATG)n and beta satellite repeats (BSRs), LINE-L2 retrotransposons, and short interspersed nuclear elements (SINEs; Student’s t-test p-values: <0.001–0.046) in TP53m-carriers (Fig. 2C). This decrease was due to a reduction in the total number of fragments mapped to these repeat families (Supplementary Fig. 4C). Given that p53 functions as a repressor of transposable elements (repeat regions)20, the depletion of total fragments at these regions may reflect derepression, leading to increased open chromatin and cfDNA degradation at these regions in TP53m-carriers10,21.
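The text states only that the short-fragment contribution at each repeat family was normalized against the global short-fragment proportion and the proportion of fragments mapped to that family. The sketch below assumes one plausible product form, written out in the docstring; it is a hypothetical formula for illustration, not necessarily the normalization used in this study. Its second factor shows how fewer mapped fragments lower a family's contribution even when its short fraction is unchanged.

```python
def short_fraction(lengths, lo=10, hi=150):
    """Fraction of fragments in the short (10-150 bp) compartment."""
    if not lengths:
        raise ValueError("no fragment lengths supplied")
    return sum(1 for n in lengths if lo <= n <= hi) / len(lengths)


def normalized_contribution(repeat_lengths, all_lengths):
    """Hypothetical normalized short-fragment contribution of one repeat family.

    (short fraction at the repeat / genome-wide short fraction)
        * (fragments mapped to the repeat / all fragments)
    """
    relative_short = short_fraction(repeat_lengths) / short_fraction(all_lengths)
    mapped_share = len(repeat_lengths) / len(all_lengths)
    return relative_short * mapped_share


# Toy sample: 100 fragments genome-wide, 40% of them short.
all_lengths = [100] * 40 + [170] * 60
print(normalized_contribution([100] * 4 + [170] * 6, all_lengths))  # 10 mapped fragments
print(normalized_contribution([100, 100, 170, 170, 170], all_lengths))  # 5 mapped fragments
```

Both toy repeat families have the same 40% short fraction as the genome overall, so the contribution differs only through the number of fragments mapped, mirroring the mechanism described for LTR, SINE, and satellite repeats.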
Genomic instability associated with telomere attrition has been linked to earlier tumor onset in LFS22,23, and short cfDNA fragments are associated with ctDNA24. We therefore estimated telomere content using TelSeq25 and TelomereHunter26, which were highly concordant (Pearson’s R2 = 0.97; Supplementary Fig. 5A). No significant difference in the contribution of telomere repeat sequences or contexts to the cfDNA was found between TP53-wildtype and TP53m-carriers, independent of cancer status (Supplementary Fig. 5B–F). As expected, we did observe increased telomere content in pediatric compared to adult TP53m-carriers (Supplementary Fig. 5G). The proportion of short cfDNA fragments was also not correlated with telomere content (Pearson’s R2 = 0.002; Supplementary Fig. 5H).
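TelSeq and TelomereHunter each apply their own read selection and normalization, and TelomereHunter additionally reports telomere variant repeat contexts. The sketch below is a deliberately simplified proxy, not either tool's algorithm: it counts reads carrying many copies of the canonical TTAGGG hexamer (or its reverse complement) and scales to reads per million. The repeat threshold of 7 and the function names are illustrative assumptions.

```python
TELOMERE_REPEATS = ("TTAGGG", "CCCTAA")  # forward and reverse-complement hexamers


def is_telomeric(read_seq, min_repeats=7):
    """True if the read carries >= min_repeats copies of either telomeric hexamer.

    str.count counts non-overlapping occurrences, which is adequate for
    tandem hexamer repeats. The threshold is an illustrative assumption.
    """
    seq = read_seq.upper()
    return any(seq.count(motif) >= min_repeats for motif in TELOMERE_REPEATS)


def telomeric_reads_per_million(reads, min_repeats=7):
    """Crude telomere-content proxy: telomere-rich reads per million reads."""
    if not reads:
        raise ValueError("no reads supplied")
    hits = sum(1 for r in reads if is_telomeric(r, min_repeats))
    return hits / len(reads) * 1_000_000


toy_reads = ["TTAGGG" * 8 + "ACGT", "ACGT" * 20, "CCCTAA" * 7, "A" * 50]
print(telomeric_reads_per_million(toy_reads))
```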
Lastly, to assess whether there are genotype–fragment length correlations within TP53m-carriers based on differential functional impact27, we grouped TP53 mutations into three categories: 1) loss of function (LOF; nonsense, frameshift, deletions/duplications; n = 49), 2) splice site (n = 41), and 3) missense (n = 78). Missense mutations were further grouped based on their functional outcomes: Missense 1 (n = 3), less penetrant gene-wide; Missense 2 (n = 23), non-dominant negative DNA binding domain; Missense 3 (n = 46), dominant negative DNA binding domain including hotspots; Missense 4 (n = 0), C-terminal; and Missense 5 (n = 6), tetramerization and transactivation domain27,28. There were no differences in the proportion of short cfDNA fragments between TP53 mutation classes (ANOVA p-value = 0.064; Supplementary Fig. 6A) or between different mutations within TP53 mutation classes (Kruskal-Wallis p-values: Missense 2 = 0.099, Missense 3 = 0.090, Splice Site = 0.188, LOF = 0.290; Fig. 2D). However, we did observe differences across mutations, independent of TP53 mutation class (Kruskal-Wallis p-value = 0.029; Fig. 2D), and across different LFS families (Kruskal-Wallis p-value = 0.021; Supplementary Fig. 6B). Interestingly, no differences were observed across LFS families that harbored the same germline TP53 mutation (Kruskal-Wallis p-values: p.R213Q = 0.464, p.T125 = 0.168; Supplementary Fig. 6B). This suggests that the functional impact of the specific germline TP53 mutation may contribute to short cfDNA fragmentation, but that these observations do not extend to broad TP53 mutation classes. Other factors, such as germline genetic background, epigenetic state, or lifestyle, may also contribute to the variability observed between patients, as in TP53-wildtype individuals29.
Increased intra-nucleosome fragmentation
To refine our understanding of where short cfDNA fragments originate in TP53m-carriers, we performed fine-mapping of fragment starts and ends (fragment-ends) relative to the nucleosome positions of TP53-wildtype individuals (nucleosome footprint). In TP53-wildtype, using a previously defined genome-wide set of ~13 million peripheral blood cell-derived nucleosomes10, we observed an M-shaped distribution of fragment-ends, with peaks ±83 bp from the center of the nucleosome, corresponding to a nucleosome-spanning cfDNA fragment (~167 bp)30. In contrast, TP53m-carriers, independent of cancer status, exhibited decreased fragment-end frequency at the peaks and increased frequency within the nucleosome-spanning trough (Kolmogorov–Smirnov p-values: <0.001–0.010; Fig. 3A), a phenomenon that has previously been associated with cfDNA from cancer-positive patients30. We observed a similar pattern of smaller magnitude when comparing TP53-wildtype samples with higher and lower proportions of short cfDNA fragments (cohort split by median; Supplementary Fig. 7A). When considering only nucleosome-spanning fragments (167 bp), the difference between cancer negative TP53m-carriers and TP53-wildtype was diminished (Kolmogorov-Smirnov p-values: 0.272 and 0.479); however, the increased frequency within the trough in LFS-AC samples remained (Kolmogorov-Smirnov p-value = 0.002), likely due to the contribution of nucleosome-spanning fragments from cancer cells (Supplementary Fig. 7B/C). The differences in nucleosome footprint observed within TP53m-carriers were similar between LFS-H and LFS-PC (Kolmogorov-Smirnov p-values: 0.18–0.77) but dissimilar between LFS-AC and LFS-H/PC (Kolmogorov-Smirnov p-value < 0.001), suggesting that these differences are innate to TP53m-carriers, unaffected by previous cancer or treatment, but affected by the presence of an active cancer (Supplementary Fig. 7D/E).
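A minimal sketch of the nucleosome footprint calculation: for each fragment end, find the closest reference nucleosome center and tabulate the signed offset. It assumes a sorted list of nucleosome-center coordinates on a single chromosome (a toy stand-in for the ~13 million blood-derived positions) and inclusive fragment coordinates; the ±150 bp window and all names are illustrative.

```python
import bisect
from collections import Counter


def nearest_center_offset(position, centers):
    """Signed distance from `position` to the nearest nucleosome center.

    `centers` must be sorted and non-empty; ties go to the downstream center.
    """
    i = bisect.bisect_left(centers, position)
    candidates = [centers[j] for j in (i, i - 1) if 0 <= j < len(centers)]
    nearest = min(candidates, key=lambda c: abs(position - c))
    return position - nearest


def end_footprint(fragments, centers, window=150):
    """Frequency of fragment-end offsets within +/- `window` bp of the nearest center.

    Fragments are (start, end) with both coordinates inclusive, so a 167 bp
    fragment centered on a nucleosome has ends at offsets -83 and +83.
    """
    counts = Counter()
    for start, end in fragments:
        for pos in (start, end):
            offset = nearest_center_offset(pos, centers)
            if -window <= offset <= window:
                counts[offset] += 1
    total = sum(counts.values())
    return {d: c / total for d, c in sorted(counts.items())} if total else {}


toy_centers = [1000, 3000]
print(end_footprint([(917, 1083), (1200, 1400)], toy_centers))
```

In a real sample, the M-shape appears as frequency maxima near ±83; more offsets between those maxima correspond to the trough filling reported for TP53m-carriers, and group distributions could then be compared with a two-sided Kolmogorov–Smirnov test.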
Fig. 3. Altered nucleosome positioning and nucleosome placement.
A Frequency distribution (top), z-scores (middle), and absolute difference (bottom) of the distance of fragment-ends to the closest nucleosome peak in TP53m-carriers. The healthy TP53-wildtype median is displayed as the black curve in each of the top row panels. Increased fragment-ends are observed within the nucleosome trough in all TP53m-carriers. p-values were calculated using a two-sided Kolmogorov-Smirnov test. B Left: Line plot of the median log2(Observed/Expected) frequencies of the AA/AT/TA/TT and CC/CG/GC/GG dinucleotide contexts from nucleosome-spanning (167 bp) fragments in TP53m-carriers. The dinucleotide frequencies 50 bp up and downstream of the fragments are also displayed. The median healthy control is displayed as black lines in each of the left panels. Right: Z-scores of dinucleotide frequencies compared to healthy TP53-wildtype controls. TP53m-carriers are separated by cancer status. Increased A/T and decreased C/G dinucleotides are observed in TP53m-carriers. Source data are provided as a Source Data file.
Splitting fragments into those within 1 kb (proximal) and those greater than 1 kb (distal) from a healthy control nucleosome peak, we observed an increased proportion of short cfDNA fragments in TP53m-carriers in both compartments (ANOVA p-values < 0.001; Supplementary Fig. 7F/G). Together, these data suggest that the shorter cfDNA fragment length observed in TP53m-carriers is global and may be derived from differential nucleosome placement compared to TP53-wildtype.
Increased frequency of A/T nucleotides in nucleosome associated fragments
To investigate whether the placement of nucleosomes in TP53m-carriers is altered versus TP53-wildtype, we calculated the frequency of A/T and G/C dinucleotides across nucleosome-spanning (167 bp) cfDNA fragments (Fig. 3B); the 50 bp flanking each fragment were also included for normalization. Compared to TP53-wildtype, TP53m-carriers showed a similar periodicity in the frequency of A/T and G/C dinucleotides, such as an increased preference for C/G dinucleotides at the dyad (center of the nucleosome)31 (Pearson’s R2: 0.98–0.99; Supplementary Fig. 8A). However, we also observed a global increase in A/T and decrease in G/C dinucleotides within the nucleosome-spanning region compared to the nucleosome-flanking region in TP53m-carriers (Kolmogorov-Smirnov p-values < 0.001; Supplementary Fig. 8B/C). Nucleosome stability depends heavily on the underlying nucleotide context: in vitro, nucleosomes form preferentially around favored sequences, whereas in vivo they can be forced into position by chromatin remodelers31. Thus, an increase in A/T dinucleotides in TP53m-carriers suggests less robust nucleosome stability and may be a result of an altered epigenetic state12,31.
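The positional dinucleotide analysis can be sketched as follows. Each input window is assumed to be 267 bp in a common orientation (50 bp upstream flank, the 167 bp fragment, 50 bp downstream flank). As an assumed normalization, "expected" is the mean class frequency over every position in the window, so the flanks contribute to the baseline; the study's exact normalization may differ, and the names are hypothetical.

```python
import math

AT_DINUCS = {"AA", "AT", "TA", "TT"}
GC_DINUCS = {"CC", "CG", "GC", "GG"}


def positional_log2_oe(windows, dinucs):
    """log2(observed/expected) frequency of a dinucleotide class at each position.

    observed[i]: fraction of windows whose dinucleotide starting at i is in `dinucs`.
    expected: mean of observed[] across the whole window (fragment plus flanks),
    an assumed normalization.
    """
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    observed = [
        sum(1 for w in windows if w[i:i + 2].upper() in dinucs) / len(windows)
        for i in range(length - 1)
    ]
    expected = sum(observed) / len(observed)
    if expected == 0:
        raise ValueError("dinucleotide class never observed")
    return [math.log2(o / expected) if o > 0 else float("-inf") for o in observed]


# Toy 6 bp "windows"; real windows would be 267 bp long.
toy_windows = ["AATTGG", "AATTCC"]
print(positional_log2_oe(toy_windows, AT_DINUCS))
print(positional_log2_oe(toy_windows, GC_DINUCS))
```

Averaging such per-position curves within each group, and comparing the fragment interval against the flanks, gives the kind of A/T versus G/C contrast summarized in Fig. 3B.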
Increased fragmentation at A/T regions
cfDNA is generated, in part, by DNases that have preferred nucleotide motifs19. We asked whether the increased frequency of A/T nucleotides within nucleosome-spanning fragments would also lead to differences in nucleotide composition at fragment ends. We first investigated the 5’ tetranucleotide motif at fragment ends (end-motif). Hierarchical clustering of all 256 possible 4 bp end-motifs resulted in distinct clusters of TP53m-carriers and TP53-wildtype (Fig. 4A). Cancer negative TP53m-carriers were characterized by an increase in A/T-rich motifs and a decrease in G/C-rich motifs relative to TP53-wildtype (Fig. 4B). Using normalized Shannon entropy scores, we observed decreased end-motif diversity in TP53m-carriers (ANOVA p-value = 0.011; Fig. 4C). Next, we split end-motifs into those that occur at frequencies above and below the expected random frequency (1/256 = 0.0039) in TP53-wildtype. Compared to TP53-wildtype, TP53m-carriers showed an increased frequency in 63.0% (63/100) of motifs that occur above the expected frequency and a decreased frequency in 67.3% (105/156) of motifs that occur below it (Chi-squared p-value < 0.001; Supplementary Fig. 9A). This was further accompanied by an increase in 82.5% (33/40) of A/T-rich end-motifs (≥3 A/T nucleotides) that occur above the expected frequency (Chi-squared p-value = 0.009) and a decrease in 84.4% (49/58) of G/C-rich end-motifs (≥3 G/C nucleotides) that occur below it (Chi-squared p-value < 0.001; Supplementary Fig. 9B). We observed strong correlation of end-motif frequencies across cancer statuses (Pearson’s R2 = 0.82–0.95), suggesting that the differences in motif frequency compared to TP53-wildtype are innate to TP53m-carriers (Supplementary Fig. 9C).
Lastly, comparing LFS-H to LFS-AC (fold change over TP53-wildtype controls), we observed a decrease in A/T motifs and an increase in G/C motifs in LFS-AC samples, which may reflect the previously described differential expression and activity of nucleases in cancer cells (Supplementary Fig. 9D)32.
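A sketch of the end-motif and diversity calculations described above: tabulate the first four bases at each fragment 5’ end, convert the counts to frequencies over all 256 tetranucleotides, summarize diversity with Shannon entropy normalized by log(256) (so a uniform motif distribution scores 1), and split motifs at the random expectation of 1/256. It assumes the end sequences have already been extracted (in practice, typically from the reference genome at each mapped 5’ end, reverse-complemented for minus-strand ends); the helper names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def end_motif_frequencies(end_seqs, k=4):
    """Frequency of every possible k-mer (4**k of them) at fragment 5' ends.

    Each input string is read 5'->3' from one fragment end; motifs containing
    N or any other non-ACGT character are skipped.
    """
    counts = Counter()
    for seq in end_seqs:
        motif = seq[:k].upper()
        if len(motif) == k and set(motif) <= set("ACGT"):
            counts[motif] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid end motifs")
    all_motifs = ("".join(m) for m in product("ACGT", repeat=k))
    return {m: counts[m] / total for m in all_motifs}


def normalized_shannon_entropy(freqs):
    """Motif diversity: -sum(p * ln p) / ln(number of motifs), in [0, 1]."""
    h = -sum(p * math.log(p) for p in freqs.values() if p > 0)
    return h / math.log(len(freqs))


def split_by_expected(freqs):
    """Partition motifs into those above vs. at/below the random expectation 1/len(freqs)."""
    expected = 1 / len(freqs)
    above = {m for m, p in freqs.items() if p > expected}
    return above, set(freqs) - above


skewed = end_motif_frequencies(["AAAA"] * 3 + ["CCGG", "NNAT"])
print(skewed["AAAA"], skewed["CCGG"], normalized_shannon_entropy(skewed))
```

A sample dominated by a few motifs, like the toy one above, scores far below 1; the decreased entropy reported for TP53m-carriers corresponds to a more concentrated motif distribution.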
Fig. 4. A/T nucleotide bias at cfDNA fragmentation breakpoints.
A Heatmap displaying unsupervised clustering of the frequency of each of the 256 fragment end-motifs. Patient type, cancer history, cancer status, and germline mutation type are displayed above, and the GC content of the fragment end-motif is displayed on the right. Distinct clustering between TP53-wildtype and TP53m-carriers is observed. B Dot plot showing the fold change in frequency observed between TP53-wildtype and TP53m-carriers for each of the 256 tetranucleotide end motifs. Gray bars represent the standard deviation, and the GC content of the motif is displayed below. TP53m-carriers display an increased frequency of A/T rich end motifs. C Tukey boxplots of Shannon entropy scores calculated using fragment end-motif frequencies. TP53m-carriers exhibit decreased diversity scores compared to TP53-wildtype samples. D Dot plot showing the median difference between the frequency of each nucleotide up and downstream from fragment cut sites. Error bars represent standard deviation, and dot size represents p-value. TP53m-carriers display increased frequency of A/T and decreased frequency of C/G nucleotides surrounding fragment cut sites. p-values calculated using two-sided Student’s t-test. * = p-value < 0.05, ** = p-value < 0.01, *** = p-value < 0.001. Exact p-values are provided in Supplementary Data 2. Source data are provided as a Source Data file.
Based on the finding that cfDNA fragmentation occurs at specific nucleotide contexts up and downstream of the breakpoint, we expanded our analysis to ±10 nucleotides from the 5’ cfDNA cut site (breakpoint). While we observed similar nucleotide preferences at each position relative to the breakpoint (Supplementary Fig. 9E), TP53m-carriers exhibited a 1.98–2.36x higher standard deviation and an increased frequency of A/T and decreased frequency of C/G nucleotides surrounding the cleavage site (Chi-squared p-value < 0.001; Fig. 4D), consistent with our nucleosome footprint, dinucleotide frequency, and end-motif findings.
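The breakpoint analysis can be sketched as a position-by-base frequency matrix around each 5’ cut site. For simplicity, the sketch assumes plus-strand cut sites given as 0-based indices of the first fragment base in a reference string; handling minus-strand ends (by reverse complementing) and the group-level statistics are omitted, and the function names are hypothetical.

```python
from collections import Counter


def breakpoint_profile(genome, cut_sites, flank=10):
    """Per-offset base frequencies around plus-strand 5' cut sites.

    Offsets -flank..-1 are upstream of the breakpoint; 0..flank-1 are the
    first bases of the fragment. Cut sites too close to either end of the
    sequence are skipped.
    """
    counts = {offset: Counter() for offset in range(-flank, flank)}
    used = 0
    for cut in cut_sites:
        if cut - flank < 0 or cut + flank > len(genome):
            continue
        for offset in counts:
            counts[offset][genome[cut + offset].upper()] += 1
        used += 1
    if used == 0:
        raise ValueError("no usable cut sites")
    return {off: {b: c[b] / used for b in "ACGT"} for off, c in counts.items()}


def profile_difference(case, control):
    """Case minus control frequency for each (offset, base)."""
    return {off: {b: case[off][b] - control[off][b] for b in "ACGT"} for off in case}


toy_genome = "A" * 10 + "C" * 10 + "G" * 10
case = breakpoint_profile(toy_genome, [10, 20, 25])  # 25 is too close to the end
control = breakpoint_profile(toy_genome, [10])
print(profile_difference(case, control)[-1])
```

Summing the A and T entries (and the C and G entries) of the difference at each offset would give the A/T versus C/G shifts plotted in Fig. 4D.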
A previous study has described an increase in A/T end-motifs in patients with germline pathogenic mutations in DNASE1L333. However, we did not observe significant differences in DNASE1L3-associated end-motifs between TP53m-carriers and TP53-wildtype (Supplementary Fig. 10A), and we found low correlation between TP53m-carriers and individuals with heterozygous (DNASE1L3+/-; Pearson’s R2 = 0.019) and homozygous (DNASE1L3-/-; Pearson’s R2 = 0.006) germline variants (Supplementary Fig. 10B–D)33. Similarly, breakpoint analysis of DNASE1L3-deficient individuals also showed low correlation to TP53m-carriers (Pearson’s R2: 0.001–0.410), suggesting that the differences we observe in TP53m-carriers are not due to dysregulation of DNASE1L3 (Supplementary Fig. 10E).
Differences in fragmentation driven by functional germline mutation class
Chromatin is organized in three dimensions within the nucleus as a mechanism for regulating gene expression and cell identity. This 3-dimensional chromatin architecture influences the representation of cfDNA fragments (size and location) in circulation because nuclease accessibility differs across the genome10,14. To investigate whether the increased release of short cfDNA fragments in TP53m-carriers depends on genomic context, we generated genome-wide fragmentation profiles by calculating the ratio of short (90–150 bp) to long (151–220 bp) cfDNA fragments across 5 Mb bins (n = 512)14. We observed a ~1.45x increased standard deviation (range 1.37–1.66) in the fragmentation profiles of TP53m-carriers, independent of cancer status and functional TP53 mutation class, compared to TP53-wildtype (Fig. 5A and Supplementary Fig. 11A). Hierarchical clustering of samples did not result in any distinct clusters (Fig. 5B).
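A simplified version of the short/long fragment ratio profile is sketched below. Fragments are assigned to fixed 5 Mb bins by their midpoint (an assumption), counted as short (90–150 bp) or long (151–220 bp), and the per-bin ratio is reported. Published implementations of this kind of profile typically also apply GC-bias correction and bin filtering, which are omitted here. The variability helper is one possible reading of the standard-deviation comparison (spread of ratios across bins); all names are hypothetical.

```python
import statistics
from collections import defaultdict

BIN_SIZE = 5_000_000


def fragment_ratio_profile(fragments, bin_size=BIN_SIZE):
    """Short (90-150 bp) / long (151-220 bp) fragment ratio per genomic bin.

    `fragments` yields (chrom, midpoint, length); bins without any long
    fragments are dropped to avoid division by zero.
    """
    short, long_ = defaultdict(int), defaultdict(int)
    for chrom, midpoint, length in fragments:
        key = (chrom, midpoint // bin_size)
        if 90 <= length <= 150:
            short[key] += 1
        elif 151 <= length <= 220:
            long_[key] += 1
    return {key: short[key] / n_long for key, n_long in long_.items()}


def variability_ratio(case_profile, control_profile):
    """Ratio of across-bin standard deviations (case / control) over shared bins."""
    shared = sorted(set(case_profile) & set(control_profile))
    case_sd = statistics.stdev([case_profile[b] for b in shared])
    control_sd = statistics.stdev([control_profile[b] for b in shared])
    return case_sd / control_sd


toy_fragments = [
    ("chr1", 100, 100), ("chr1", 200, 120), ("chr1", 300, 170), ("chr1", 400, 250),
    ("chr1", 6_000_000, 100), ("chr1", 6_000_100, 200), ("chr1", 6_000_200, 210),
]
profile = fragment_ratio_profile(toy_fragments)
print(profile)
```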
Fig. 5. Differential genome-wide fragmentation.
A Genome-wide fragment ratio profiles (90–150 bp/151–220 bp) of healthy TP53-wildtype controls and TP53m-carriers separated by cancer status. TP53m-carriers show ~1.4x increased variability compared to TP53-wildtype controls. B Heatmap showing unsupervised clustering of genome-wide fragment ratios. Samples were separated by cancer status. Patient type, cancer history, cancer status, and germline mutation type are shown on the right. Pearson’s correlation scores of TP53m-carriers compared to the healthy TP53-wildtype median are also shown on the right. The healthy TP53-wildtype median fragmentation profile, differentially fragmented regions (>3 standard deviations away from the healthy TP53-wildtype median), percent of the LFS cohort greater than 3 standard deviations away from the healthy TP53-wildtype median, and the LFS cohort median are displayed on top. C Heatmap showing the Pearson’s correlation scores comparing the median fragment ratio profile between each functional TP53 mutation class, the LFS cohort, and healthy TP53-wildtype controls. Each TP53 mutation class shows a low correlation to the healthy control, moderate correlation to other TP53 mutation classes, and highest correlation to the LFS cohort median. D Bar charts showing the proportion of bins with significantly increased or decreased fragment ratios (>3 standard deviations away from healthy TP53-wildtype controls) in each functional TP53 mutation class compared to the healthy TP53-wildtype control median. Source data are provided as a Source Data file.
To determine whether fragmentation profiles differ between functional classes, we compared the median fragmentation profiles across TP53 mutation classes (as described above; cancer negative samples only). All TP53 mutation classes showed low correlation to the TP53-wildtype median (Pearson’s r: 0.22–0.36), moderate correlation between classes (Pearson’s r: 0.40–0.58), and the strongest correlation to the LFS cohort median (Pearson’s r: 0.65–0.83; Fig. 5C). The lower correlation between TP53 mutation classes, relative to their correlation with the LFS cohort median, suggests that within TP53m-carriers, fragmentation profiles may be partially driven by the functional germline mutation class. To assess whether specific genomic regions are enriched or depleted for short cfDNA fragments, we performed differential analysis between cancer negative TP53m-carriers and TP53-wildtype at each 5 Mb bin (n = 512). We observed increased fragmentation in 39/512 bins (median fold change = 1.20) and decreased fragmentation in 39/512 bins (median fold change = 0.87; 78/512 total; padj < 0.05, two-sided Student’s t-test; Fig. 5D). Within chromosomes, we observed an imbalance of increased fragmentation (more short cfDNA fragments) on chromosomes 16 (5/14 bins; Chi-squared p-value < 0.001) and 20 (4/11 bins; Chi-squared p-value = 0.001), and of decreased fragmentation (fewer short cfDNA fragments) on chromosome 19 (7/10 bins; Chi-squared p-value < 0.001). Decreased chromosome 19 fragmentation was consistent across TP53 mutation classes, suggesting that there may be general dysregulation of the chromatin architecture of chromosome 19 in TP53m-carriers. Interestingly, individuals with Missense 2 mutations showed less aberrant genome-wide fragmentation (Chi-squared p-value < 0.001), which may be related to the absence of pediatric cancers diagnosed in Missense 2 patients, consistent with their lower penetrance and later age of onset34.
These observations were driven by the ratio of fragment lengths and not by differences in sequencing coverage across each bin (Supplementary Fig. 11B–D).
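The per-bin differential analysis above reports adjusted p-values without naming the correction; the sketch below assumes the Benjamini–Hochberg procedure, a common choice. Per-bin p-values (for example, from two-sided Student’s t-tests computed with a statistics package) and per-bin fold changes are taken as inputs, and the bin labels are illustrative.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_bins(bins, fold_changes, pvalues, alpha=0.05):
    """Split bins into significantly increased / decreased short-fragment ratios."""
    padj = benjamini_hochberg(pvalues)
    increased = [b for b, fc, q in zip(bins, fold_changes, padj) if q < alpha and fc > 1]
    decreased = [b for b, fc, q in zip(bins, fold_changes, padj) if q < alpha and fc < 1]
    return increased, decreased


up, down = classify_bins(
    ["chr16:0", "chr16:1", "chr19:0", "chr19:1"],
    [1.2, 1.1, 0.87, 0.9],
    [0.001, 0.2, 0.002, 0.04],
)
print(up, down)
```

In the toy input, chr19:1 has a raw p-value of 0.04 but is not called after adjustment (0.04 × 4/3 ≈ 0.053). Tallying the returned bins per chromosome gives the counts that a per-chromosome Chi-squared imbalance test would use.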
Altered fragmentation profiles have previously been shown to be driven, in part, by cancer-associated copy number alterations in patients with active cancer14. To determine whether the fragmentation profiles observed in our TP53m-carriers could be associated with cancer-associated copy number signatures, we compared TCGA GISTIC copy number profiles from LFS-associated cancers to the fragmentation profiles of TP53m-carriers35. Cancer positive samples with high ctDNA levels (ichorCNA tumor fraction [TF] > 0.10) showed moderate correlation (Pearson’s r: 0.20–0.48) to cancer type-matched copy number signatures (Supplementary Fig. 11E). In contrast, fragmentation profiles of cancer negative TP53m-carriers showed low correlation (Pearson’s r: −0.08–0.10; Supplementary Fig. 11F), suggesting that the fragmentation profiles observed in TP53m-carriers are not derived from or associated with chromosomal instability or somatic copy number alterations. As cfDNA fragmentation is known to be driven by the chromatin architecture of the cell-of-origin, our observations suggest that TP53m-carriers exhibit differences in cfDNA fragmentation that may be driven by epigenetic and transcriptomic differences compared to TP53-wildtype individuals11,12.
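A sketch of the comparison with cancer copy-number signatures: correlate a sample's per-bin fragment-ratio profile with a per-bin copy-number signature over the bins they share. It assumes the GISTIC scores have already been summarized into the same 5 Mb bins as the fragmentation profile, a hypothetical preprocessing step not described here; names and values are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def profile_vs_copy_number(frag_profile, cn_profile, min_bins=3):
    """Correlate a fragment-ratio profile with a copy-number signature on shared bins."""
    shared = sorted(set(frag_profile) & set(cn_profile))
    if len(shared) < min_bins:
        raise ValueError("too few shared bins")
    return pearson_r([frag_profile[b] for b in shared], [cn_profile[b] for b in shared])


toy_frag = {("chr1", 0): 1.0, ("chr1", 1): 2.0, ("chr1", 2): 3.0, ("chr2", 0): 5.0}
toy_cn = {("chr1", 0): -0.1, ("chr1", 1): 0.0, ("chr1", 2): 0.1}
print(profile_vs_copy_number(toy_frag, toy_cn))
```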
Unstable nucleosome positioning at p53 associated sites
Given the altered fragmentation profiles observed in TP53m-carriers, we asked whether dysregulation occurred at specific functional areas of the genome. Previous studies have identified differential methylation at p53 binding sites in TP53m-carriers, suggesting localized chromatin reorganization12. Using Griffin, we inferred the nucleosome positioning surrounding p53 binding sites using p53 ChIP-seq data from the Gene Transcription Regulation Database (GTRD)36,37. The coverage at the p53 binding site (midpoint coverage) is indicative of binding activity (decreased central coverage = increased accessibility/binding), the amplitude is indicative of the robustness of nucleosome placement surrounding the binding site within a sample, and the variability (standard deviation) is indicative of the robustness of nucleosome positioning throughout the cohort38. In TP53m-carriers, independent of cancer status, we observed increased accessibility and more robustly placed nucleosomes surrounding p53 binding sites compared to TP53-wildtype (ANOVA p-values: midpoint < 0.001, amplitude = 0.003; variability = 1.84x greater SD; Fig. 6A, Supplementary Fig. 12A). Similar observations were also found at the transcription start sites (TSS) of p53 target genes (n = 384; ANOVA p-values: midpoint = 0.284, amplitude < 0.001; variability = 1.79x greater SD; Fig. 6B, Supplementary Fig. 12B)37. Nucleosome profiles were not significantly different across TP53 mutation classes (Kruskal-Wallis p-values: p53 binding sites midpoint = 0.576, amplitude = 0.578; TP53 target genes midpoint = 0.435, amplitude = 0.547; Supplementary Fig. 12C–F).
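The two site-level summaries described above can be approximated as follows. Griffin's actual implementation includes GC-bias correction and its own definitions of the central window and amplitude; this sketch only mirrors the described logic on a precomputed coverage matrix. It averages per-position coverage across sites in a ±1000 bp window, scales the window mean to 1, takes the mean coverage within an assumed ±30 bp of the center as the midpoint value, and takes the amplitude of the Fourier component whose period is closest to an assumed ~190 bp nucleosome spacing.

```python
import cmath
import math


def aggregate_profile(site_coverages):
    """Mean coverage per position across sites, scaled so the window mean is 1."""
    n_pos = len(site_coverages[0])
    mean = [sum(cov[i] for cov in site_coverages) / len(site_coverages) for i in range(n_pos)]
    scale = sum(mean) / n_pos
    return [v / scale for v in mean]


def central_coverage(profile, half_width=30):
    """Mean normalized coverage within +/- half_width bp of the window center."""
    center = len(profile) // 2
    segment = profile[center - half_width:center + half_width + 1]
    return sum(segment) / len(segment)


def nucleosome_amplitude(profile, period_bp=190):
    """Amplitude of the Fourier component whose period is closest to `period_bp`."""
    n = len(profile)
    k = round(n / period_bp)  # DFT index with period n / k closest to period_bp
    component = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(profile))
    return 2 * abs(component) / n  # 2|X_k|/n recovers a sinusoid's amplitude


# A +/-1000 bp window with a coverage dip at the site (increased accessibility).
dip = [0.5 if 970 <= i <= 1030 else 1.0 for i in range(2001)]
wave = [1 + 0.2 * math.cos(2 * math.pi * i / 190) for i in range(1900)]
print(central_coverage(dip), nucleosome_amplitude(wave))
```

A lower central value corresponds to the "decreased midpoint coverage" (increased accessibility) reading used in the text, while a larger amplitude indicates more regularly phased nucleosomes flanking the sites.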
Fig. 6. Altered chromatin accessibility at p53 associated sites.
A Nucleosome positioning tracks (±1000 bp) for CTCF (top) and p53 (bottom) binding sites split by TP53-wildtype and TP53m-carrier cancer status. Increased variability and decreased central coverage are observed in TP53m-carriers at p53 binding sites. B Nucleosome positioning tracks (±1000 bp) for the transcription start sites of housekeeping (top) and p53 target (bottom) genes split by TP53-wildtype and TP53m-carrier cancer status. Increased variability and decreased central coverage are observed in TP53m-carriers at the transcription start sites of p53 target genes. Figures show median +/− 1 standard deviation. Source data are provided as a Source Data file.
While nucleosome placement and positioning were consistent at evolutionarily conserved CCCTC-binding factor (CTCF) sites39 and the TSS of 3,804 previously reported housekeeping genes40, we did observe increased accessibility at these sites in TP53m-carriers (Supplementary Fig. 12A, B). To determine if this increase in accessibility may be due to increased degradation of inherently shorter cfDNA fragments, we calculated the lengths of fragments overlapping hematopoietic-associated open (n = 251,586)41 and closed (n = 90,000)42 chromatin sites (Supplementary Fig. 12G). At open-associated chromatin sites, we observed a steep drop-off of fragments less than 90 bp in TP53-wildtype which was not observed in TP53m-carriers (Supplementary Fig. 12H). TP53m-carriers also showed a decreased fraction of total reads spanning both open and closed chromatin sites (Supplementary Fig. 12I). Read coverage of cfDNA within a genomic region is dependent on the activity and accessibility of nucleases at that specific site15. Thus, the combination of decreased coverage and a higher prevalence of shorter cfDNA fragments suggests that the increased accessibility observed in TP53m-carriers may be due to increased fragmentation and degradation.
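The site-level length analysis can be sketched as follows: keep fragments that overlap any site interval, then report the fraction of those fragments shorter than 90 bp together with their count (from which the fraction of total reads spanning sites can be derived). It assumes sorted, non-overlapping, half-open site intervals on one chromosome; the names are hypothetical.

```python
import bisect


def fragments_at_sites(fragments, sites):
    """Lengths of fragments overlapping any site.

    Fragments and sites are half-open (start, end) intervals on one chromosome;
    sites must not overlap each other, so their ends are sorted along with starts.
    """
    sites = sorted(sites)
    starts = [s for s, _ in sites]
    ends = [e for _, e in sites]
    lengths = []
    for f_start, f_end in fragments:
        # Last site starting before the fragment ends; it has the largest end of the candidates.
        i = bisect.bisect_left(starts, f_end) - 1
        if i >= 0 and ends[i] > f_start:
            lengths.append(f_end - f_start)
    return lengths


def short_fraction_at_sites(fragments, sites, cutoff=90):
    """(fraction of overlapping fragments shorter than `cutoff`, number overlapping)."""
    lengths = fragments_at_sites(fragments, sites)
    if not lengths:
        return 0.0, 0
    return sum(1 for n in lengths if n < cutoff) / len(lengths), len(lengths)


toy_sites = [(500, 600), (100, 200)]
toy_frags = [(50, 130), (150, 300), (300, 400), (590, 650), (600, 700)]
print(short_fraction_at_sites(toy_frags, toy_sites))
```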
Increased chromatin accessibility at developmental and cancer-associated loci
Given the genome-wide dysregulation of cfDNA fragmentation observed in TP53m-carriers, we asked whether the dysregulated nucleosome positioning observed at p53-associated sites extends to other functional regions. To start, we inferred the nucleosome positioning surrounding the top 10,000 binding sites (meta peaks) for 335 transcription factors (TFs) available in the Gene Transcription Regulation Database (GTRD)36,43. Differential midpoint and amplitude analysis identified 18 (all decreased) and 8 (all increased) TFs, respectively (adjusted p < 0.01, change > 1 SD [standard deviation]; Fig. 7A and Supplementary Fig. 13A). Consistent with the analysis above, p53 was among the TFs with a decreased midpoint (increased accessibility). TFs with increased chromatin accessibility in TP53m-carriers were associated with embryonic organ development (GATA3, GATA4, GATA6, PRDM1, ASCL2, TCF7, and NKX3.1), consistent with previous observations in LFS-derived osteoblasts compared to TP53-wildtype11. To further validate these observations, we compared the transcriptional profiles of TP53-inactive and TP53-intact cancers across 26 cohorts in The Cancer Genome Atlas (TCGA)44, and of LFS-derived and sporadic osteosarcomas and glioblastomas45. In both datasets, we observed an enrichment of developmental and transcription factor gene sets in TP53-inactive and LFS-derived tumors compared to TP53-wildtype and non-LFS-derived tumors (Supplementary Fig. 13B, C). Comparing LFS-H and LFS-AC samples, we identified 8 and 4 TFs with decreased and increased midpoint, respectively (ΔSD > 0.5; Fig. 7B). TFs with increased chromatin accessibility (decreased midpoint) included oncogenic factors (ETV2, HES1) and regulators of development (POU5F1, GRHL2).
GRHL2, an important factor in epithelial differentiation and carcinogenesis, also showed increased accessibility in patients with epithelial cancer types (prostate, breast), supporting an increased contribution of cfDNA derived from epithelial-like (carcinoma) cells (Supplementary Fig. 13D)46.
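The TF calling rule described above (adjusted p < 0.01 and a change larger than 1 SD) can be sketched as follows. The text only states that p-values were "corrected for multiple comparisons", so the Benjamini-Hochberg procedure used here is an assumption, as is the input format: each hypothetical TF entry carries a raw p-value and a change already expressed in SD units (carrier minus control).

```python
from typing import Dict, List, Tuple


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_differential(tfs: Dict[str, Tuple[float, float]], alpha: float = 0.01,
                      min_change_sd: float = 1.0) -> Dict[str, str]:
    """tfs maps TF name -> (raw p-value, change in SD units).

    Returns the direction ("increased"/"decreased") for TFs passing both cuts.
    """
    names = list(tfs)
    padj = bh_adjust([tfs[name][0] for name in names])
    calls = {}
    for name, q in zip(names, padj):
        change = tfs[name][1]
        if q < alpha and abs(change) > min_change_sd:
            calls[name] = "decreased" if change < 0 else "increased"
    return calls
```

For midpoint coverage, a "decreased" call corresponds to increased accessibility, matching the interpretation used in the text.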
Fig. 7. Altered chromatin accessibility at development and cancer-associated sites.
A Volcano plots of transcription factors with differential nucleosome positioning midpoints (left) and amplitudes (right) comparing cancer negative TP53m-carriers to healthy TP53-wildtype controls. p-values were calculated using Student’s two-sided t-test and corrected for multiple comparisons. B Dotplot showing changes in transcription factors with differential midpoint coverage between cancer negative and cancer positive TP53m-carriers. C Nucleosome positioning tracks showing chromatin accessibility at prostate cancer (top) and bladder cancer (bottom) associated open chromatin sites. Samples from patients with active matched cancers are also displayed in each respective plot (blue and yellow). The respective cohort medians are displayed in black +/− 1 standard deviation. D Midpoint coverage and amplitude from nucleosome positioning tracks at cancer-associated open chromatin sites from an array of LFS-associated cancer cohorts obtained from the TCGA. TP53m-carriers exhibit decreased midpoint coverage across cancer types. p-values calculated using two-sided Student’s t-test. * = p-value < 0.05, ** = p-value < 0.01, *** = p-value < 0.001. Exact p-values are provided in Supplementary Data 2. Source data are provided as a Source Data file.
Because cancer often hijacks cellular developmental programs, leading to distinct epigenetic signatures, we explored nucleosome positioning at cancer-associated sites using ATAC-seq peaks from 23 TCGA cancer types47. To confirm that these cancer type-specific sites could be detected in cfDNA, we first investigated the nucleosome profiles of TP53m-carriers with an active diagnosis of the matched cancer type. TP53m-carriers with high-grade cancers had significantly increased accessibility (Z-score p-values < 0.001) compared to cancer-negative TP53m-carriers and TP53-wildtype (Fig. 7C and Supplementary Fig. 14A). Interestingly, we observed increased accessibility across all TCGA cancer types in cancer-free TP53m-carriers compared to TP53-wildtype, with further increases in LFS-AC (Fig. 7D and Supplementary Fig. 14B). This was not observed at DNase hypersensitivity sites associated with normal tissues (Supplementary Fig. 14C), suggesting that TP53m-carriers may exhibit altered chromatin architecture at sites involved in development and oncogenesis.
Classification of TP53 status using fragmentomic features
Based on the differences in cfDNA fragmentation explored above, we investigated whether an integrated analysis could differentiate between 1) TP53m-carriers and TP53-wildtype and 2) cancer-negative and cancer-positive TP53m-carriers. First, we tested an array of classifiers on each fragmentation metric (fragment length, end-motif, breakpoint, nucleosome footprint, dinucleotide frequency, fragmentation ratios, nucleosome positioning [TFBS], nucleosome positioning [TCGA/DHS]) to determine the best-performing algorithm (Supplementary Fig. 15A, B).
In the TP53-wildtype versus TP53m-carrier comparison, we used only cancer-negative samples to exclude cancer-associated bias from the model. Using the optimal algorithm for each metric (Supplementary Table 1), we then trained a classifier by performing 100 iterations of down-sampling, each followed by nested 10-fold cross-validation (90% training, 10% validation; Supplementary Fig. 15C). An additional 20% of the cohort, split proportionally between groups, was held back as a test cohort. In the TP53-wildtype versus TP53m-carrier analysis, we achieved AUC-ROC values of 0.701–1.000 (95% CI = 0.685–1.000) in the validation and 0.726–1.000 (95% CI = 0.715–1.000) in the test cohorts (Fig. 8A and Supplementary Fig. 15D). To further investigate the AUC-ROC of 1.000 achieved using end-motifs, and to rule out sequencing bias, we trained a model using only samples from sequencing runs that contained a mixture of TP53-wildtype (n = 22) and cancer-negative TP53m-carriers (n = 22). This model also achieved 100% sensitivity and specificity (AUC = 1.000; Supplementary Fig. 15E). An integrated model (excluding fragment end-motifs) achieved an AUC of 0.935 (95% CI = 0.929–0.941) in the validation and 0.983 (95% CI = 0.981–0.985) in the test cohorts. Performance was comparable between validation and test cohorts across all fragmentomic features (AUC difference = −0.031 to 0.223; Supplementary Fig. 15F).
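To make the AUC values above concrete: the empirical AUC-ROC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative sample, with ties counted as one half (the Mann-Whitney formulation). An AUC of 1.000 therefore means every cancer-free TP53m-carrier scored above every TP53-wildtype control. The pairwise sketch below is illustrative only (the function name is hypothetical; the study used pROC) and is quadratic in sample size, which is fine for cohorts of this size.

```python
from typing import Sequence


def auc_mann_whitney(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Empirical AUC-ROC: P(positive score > negative score), ties count 1/2."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```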
Fig. 8. Fragmentomic classification of TP53 and cancer status.
A ROC curves for each fragmentomic feature. Models were trained using TP53-wildtype controls and cancer-free TP53m-carriers to distinguish LFS from non-LFS. AUC values with 95% confidence intervals are displayed and color-matched to their respective curve. B Longitudinal integrated fragmentation scores of patients that transitioned clinically from cancer negative to cancer positive or vice versa (phenoconverter). Patient type, cancer status, cancer history, cancer stage, and cancer type are displayed. C Confusion matrix comparing rates of cancer detection using cfDNA fragmentation. Clinical diagnosis was used as the ground truth. Source data are provided as a Source Data file.
Further analysis of the prediction scores generated by the integrated model (LFS fragmentation score [LFS-FS]) showed no differences between TP53m-carriers by cancer history or germline TP53 class (Supplementary Fig. 16A, B). Although we observed higher LFS-FS in pediatric than in adult TP53m-carriers (p < 0.001; Supplementary Fig. 16C), LFS-FS was not correlated with age in either TP53-wildtype (R2 = 0.06) or TP53m-carriers (R2 = 0.12; Supplementary Fig. 16D). To assess technical variation, we compared LFS-FS across sequencing runs (Supplementary Fig. 16E) and observed no correlation with cluster density (R2 = 0.006; Supplementary Fig. 16F). The high performance of our TP53m-carrier versus TP53-wildtype classifier suggests that fragmentomic analysis of cfDNA may be a useful method for diagnosing phenotypic LFS.
Integrated longitudinal analysis for early cancer detection
Next, comparing cancer-negative and cancer-positive TP53m-carriers, our classifiers achieved AUC-ROC values of 0.549–0.722 (95% CI = 0.540–0.738) in the validation and 0.529–0.641 (95% CI = 0.523–0.651) in the test cohorts (Supplementary Fig. 17A–D). The poor performance of our cancer-status classifier may be due to the inherent variability of fragmentomic features and the cancer-like phenotypes observed in TP53m-carriers. In addition, because TP53m-carriers are at high risk of cancer, some clinically cancer-free individuals may harbor occult malignancies whose fragmentomic features resemble those of cancer, thus confounding our models.
To determine whether longitudinal analysis, rather than absolute classification, could reveal kinetics associated with cancer onset, we calculated a cancer fragmentation delta (CFD) using the probability scores from our integrated classifier normalized to each individual’s baseline. To capture the direction of change, we also calculated a cancer fragmentation slope (CFS) between timepoints. In a longitudinal cohort of TP53-wildtype individuals (n = 9), the mean CFD and CFS (excluding baselines) were 0.049 (sd = 0.225) and 0.034 (sd = 0.274), respectively (Supplementary Fig. 18A). Cancer detection limits were set at the 99th percentile of CFD and CFS in TP53-wildtype.
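The text does not give explicit formulas for CFD and CFS, so the following is a minimal sketch under stated assumptions: CFD is taken as a timepoint's integrated-classifier probability minus the individual's first (baseline) probability, CFS as the change since the previous timepoint divided by elapsed time (time units are a free choice), and a timepoint is flagged when either metric exceeds its 99th-percentile limit derived from TP53-wildtype controls. All function names are hypothetical.

```python
from statistics import quantiles
from typing import List, Sequence, Tuple


def cfd_cfs(times: Sequence[float], scores: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Per-timepoint CFD and CFS for one individual, excluding the baseline.

    Assumed definitions: CFD = score - baseline score;
    CFS = (score - previous score) / (time - previous time).
    """
    if len(times) != len(scores) or len(scores) < 2:
        raise ValueError("need matched times/scores with at least two timepoints")
    baseline = scores[0]
    cfd = [s - baseline for s in scores[1:]]
    cfs = [(scores[k] - scores[k - 1]) / (times[k] - times[k - 1])
           for k in range(1, len(scores))]
    return cfd, cfs


def p99(values: Sequence[float]) -> float:
    """99th percentile, linearly interpolated between order statistics."""
    return quantiles(values, n=100, method="inclusive")[98]


def flag_timepoints(cfd: Sequence[float], cfs: Sequence[float],
                    cfd_limit: float, cfs_limit: float) -> List[bool]:
    """Assumed rule: flag a timepoint if either metric exceeds its control limit."""
    return [d > cfd_limit or s > cfs_limit for d, s in zip(cfd, cfs)]
```

In use, `p99` would be applied to the pooled non-baseline CFD (and, separately, CFS) values of the TP53-wildtype longitudinal cohort, and the resulting limits passed to `flag_timepoints` for each TP53m-carrier.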
Next, we assessed CFD and CFS in phenoconverter TP53m-carriers (Fig. 8B). Within forward phenoconverters (cancer negative to cancer positive, n = 5), we observed a 6.43x increase in CFD (mean = 0.364, sd = 0.544, Student’s t-test p-value = 0.032) and a 6.24x increase in CFS (mean = 0.246, sd = 0.512, Student’s t-test p-value = 0.101) compared to TP53-wildtype. Using CFD and CFS, we detected cancer before clinical diagnosis in 2/5 patients (LFS15, LFS59) and concurrently with clinical diagnosis in 1/5 (LFS34). In the remaining two patients, CFD was stable at the two timepoints preceding clinical diagnosis, followed by a decrease after treatment in LFS51 and an increase at the time of clinical diagnosis of metastatic disease in LFS78. In both cases, CFD and CFS were likely not sensitive enough because a cancer signal was already present at baseline. This is further supported by the decrease in CFD to below baseline following treatment in LFS51, and by the high tumor fraction, detected by ichorCNA, preceding clinical diagnosis in LFS78. In reverse phenoconverters (cancer positive to cancer negative), we observed a 4.22x decrease in CFD (mean = −0.158, sd = 0.302, Student’s t-test p-value = 0.024) and a 4.38x decrease in CFS (mean = −0.115, sd = 0.315, Student’s t-test p-value = 0.145) compared to TP53-wildtype.
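The fold changes quoted in this paragraph are consistent with expressing each phenoconverter mean relative to the TP53-wildtype mean, that is (mean − mean_WT) / |mean_WT|. This reading is inferred from the reported numbers rather than stated by the authors; the short check below reproduces all four values from the means given in the text.

```python
# TP53-wildtype longitudinal means reported in the text.
WT_CFD, WT_CFS = 0.049, 0.034


def relative_change(value: float, reference: float) -> float:
    """(value - reference) / |reference|: positive = increase, negative = decrease."""
    return (value - reference) / abs(reference)


# Forward phenoconverters: increases relative to TP53-wildtype.
forward_cfd = relative_change(0.364, WT_CFD)
forward_cfs = relative_change(0.246, WT_CFS)
# Reverse phenoconverters: decreases, reported as positive magnitudes.
reverse_cfd = -relative_change(-0.158, WT_CFD)
reverse_cfs = -relative_change(-0.115, WT_CFS)

print(f"{forward_cfd:.2f}x, {forward_cfs:.2f}x, {reverse_cfd:.2f}x, {reverse_cfs:.2f}x")
```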
For TP53m-carriers who developed multiple cancers (n = 2), we constructed temporal timelines (Supplementary Fig. 18B). In LFS5, using CFD and CFS, we detected a cancer signal 1 year before an osteosarcoma diagnosis, followed by a decrease to baseline after treatment with cisplatin. Additionally, although it did not exceed the detection threshold, we observed an increase in CFD 6 months before a malignant melanoma diagnosis, followed by a decrease after resection.
Next, applying CFD and CFS to cancer-negative TP53m-carriers (n = 16), we identified five individuals above the cancer detection limits (LFS16, LFS52, LFS45, LFS55, LFS71; Supplementary Fig. 18C). On review of clinical follow-up data (mean follow-up = 32.9 months, range: 6.2–46.9, median = 39.7), 3/5 of these individuals (LFS45, LFS55, LFS71) received a cancer diagnosis either between cancer-negative timepoints or after the last timepoint, indicating two false-positive individuals. In addition, we identified two false-negative individuals (TP53m-carriers diagnosed with cancer but not detected using CFD/CFS; LFS28, LFS68). Within cancer-negative TP53m-carriers who remained cancer-free (n = 11), CFD (mean = −0.051, sd = 0.317) and CFS (mean = −0.053, sd = 0.405) were comparable to TP53-wildtype (Student’s t-test p-values: CFD = 0.257, CFS = 0.490).
Overall, within our cohort of TP53m-carriers with longitudinal sampling, the negative and positive predictive values (NPV/PPV) were 73.91% and 80.00%, respectively (Supplementary Fig. 18D). These increased to 83.33% (NPV) and 80.95% (PPV) when the two forward phenoconverters with evidence of cancer fragmentation at baseline were taken into account (Fig. 8C). These findings suggest that an integrated fragmentation approach using patient-specific baselines may aid the early detection of cancer in TP53m-carriers and may be extendable to other HCS. Future studies with larger cohorts and more frequent longitudinal sampling may improve performance by establishing more robust patient-specific baselines.
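For reference, PPV and NPV follow directly from confusion-matrix counts: PPV = TP / (TP + FP) and NPV = TN / (TN + FN). The sketch below shows the calculation; the counts in the accompanying check are hypothetical and are not the study's confusion matrix.

```python
from typing import Tuple


def predictive_values(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (PPV, NPV) from confusion-matrix counts; NaN when undefined."""
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv
```

Note that, unlike sensitivity and specificity, both values depend on how common cancer is in the monitored group, so they are specific to this high-risk cohort.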
Discussion
Using a comprehensive, integrated fragmentomic analysis, we identify innate differences in the cell-free DNA fragmentation of TP53m-carriers compared to TP53-wildtype (Fig. 9). Notably, fragment end-motifs alone achieved 100% sensitivity and specificity in distinguishing healthy TP53-wildtype individuals from cancer-free TP53m-carriers, suggesting that end-motif profiling may be useful for diagnosing or triaging patients with LFS phenotypes and family history but no known germline TP53 mutation, or those with a variant of uncertain significance. Such a tool could guide and complement clinical surveillance of these patients48.
Fig. 9. Summary of study findings.
Classification of samples from TP53m-carriers and TP53-wildtype using cfDNA fragmentation features. Combined with longitudinal sampling, these features can also support personalized cancer monitoring. Portions of Fig. 9 created with BioRender.com, released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license.
Fragmentation profiles revealed that functional germline TP53 mutation classes partially drove differences among TP53m-carriers. These differences may reflect chromatin organization that depends on the functional class of the germline mutation, and may mirror the differential penetrance and cancer types associated with different germline mutations49–51. With a large enough cohort, mutation-specific analyses could be performed for greater granularity. Compared to TP53-wildtype, we observed a consistent, cohort-wide decrease in short cfDNA fragments across chromosome 19, which has the highest gene density and the greatest number of genes involved in single-gene disorders, and is evolutionarily important to mammalian and primate development52,53. Further investigation may therefore provide biological insight into germline mutant TP53-driven tumor etiology in LFS.
Studies have shown that mutant p53 can alter chromatin accessibility in cancer-specific contexts and co-opt chromatin remodeling pathways, such as through the SWI/SNF complex54–56. However, little is known about the effects of germline TP53 mutations on normal development and cellular homeostasis. By inferring nucleosome occupancy from cfDNA, we find that nucleosomes at p53 binding sites and at the transcription start sites of p53 target genes are less robustly positioned in TP53m-carriers. This may reflect dysregulation of the chromatin architecture at functional p53 sites and, potentially, of p53 function. Similar observations have been made in LFS patient-derived iPSCs upon differentiation11. Globally, we observed an increased prevalence of A/T nucleotides across nucleosome-spanning reads and breakpoints, suggesting less robust nucleosome placement that may be related to global chromatin restructuring and relaxation57. These observations were less prominent at highly conserved loci such as CTCF binding sites and the TSS of housekeeping genes, suggesting that the nucleosome instability may be driven by altered p53-chromatin interactions58,59.
Growing evidence suggests that TP53m-carriers are at greater risk of cancer not only because of the defective tumor suppressor activity of mutant p53, but also because they may be primed for cancer initiation60. Studies have shown that loss of p53 function can create cancer-promoting niches through chronic inflammation61, altered metabolism62,63, angiogenesis64, and immune modulation65. We found increased nucleosome accessibility at sites associated with embryonic and developmental transcription factors. More strikingly, we also observed increased accessibility at cancer-associated open chromatin sites in cancer-free TP53m-carriers, which may indicate a chromatin state primed for the development of cancer. The presence of a cancer-associated signature in the plasma of cancer-free TP53m-carriers may also hinder cancer detection, as observed in this study. Advanced uni-modal fragmentomic features such as fragment end-motifs66, fragment ratios67, nucleosome footprints30, and nucleosome positioning68 have been remarkably sensitive in detecting sporadic cancers, including low-grade cancers. However, cancer-negative TP53m-carriers exhibit fragmentation features associated with sporadic cancer at baseline, which confounds sensitive cancer detection. Using an integrated, multi-modal approach coupled with patient-specific baselines, we demonstrate the ability to detect cancer-associated fragmentation prior to clinical diagnosis through conventional screening modalities (Fig. 9).
As liquid biopsy technologies develop rapidly and are adopted into the clinic, patients with HCS such as LFS are among those who stand to benefit most. However, although HCS account for at least 10% of all cancer diagnoses, each individual syndrome is rare. Consequently, most liquid biopsy studies have focused on common sporadic cancers (breast, prostate, lung), leaving a dearth of information about the efficacy and landscape of liquid biopsy in HCS. Few studies have investigated whether innate differences in cell-free biology exist between HCS patients and non-carrier controls. Our study suggests that such differences do exist, at least in LFS, and that for liquid biopsy to be successfully adopted into the clinic, these differences need to be fully characterized. HCS-specific or patient-specific baselines should be established, especially for rapidly developing analyses such as cfDNA fragmentomics and methylation. Compared to the general population, patient-specific baselines are more feasible in HCS because these patients already undergo routine surveillance, at minimum annually. The need for HCS-specific baselines will further complicate the clinical implementation of advanced liquid biopsy analyses; these efforts would therefore benefit greatly from fragmentomic data sharing to increase cohort sizes and improve methodology, particularly for the rarer cancers encountered in HCS.
Methods
Study design and patient cohort
This study was approved by the UHN institutional review board (REB# 19-6239). All patients received routine clinical care from board-certified clinicians according to the standard of care. All samples were collected with informed consent for research, provided directly by patients >19 years of age or by a legal guardian for patients <19 years of age.
Blood processing and extraction
Venous blood samples were collected in EDTA or Streck tubes (Streck, La Vista, Nebraska). EDTA tubes were processed within 2 h. Whole blood samples were centrifuged at 4°C (1900 g, 10 min). Peripheral blood mononuclear cells (PBMCs) were separated from plasma and stored at −80°C. Isolated plasma was centrifuged a second time at 4°C (16,000 g, 10 min) to remove residual cells and debris. Purified plasma was stored at −80°C until cfDNA extraction. cfDNA was extracted from plasma using the QIAGEN QIAamp Circulating Nucleic Acid Kit (Qiagen, Hilden, Germany). The full protocol can be found at https://charmconsortium.ca/protocols-database/.
DNA library construction and sequencing
Libraries were prepared using the KAPA Hyper Prep Kit (Kapa Biosystems, Wilmington, MA) with xGen Duplex Seq Adapter-Tech Access (Integrated DNA Technologies [IDT], Coralville, IA) adapters and indexes. Libraries were indexed, pooled, and sequenced on the NextSeq 500 using 150-bp paired-end reads (2×150 bp; Illumina, San Diego, CA) to a target depth of ~1X coverage. UMI, adapter, and index extraction and alignment to the human genome reference GRCh38 were performed using Burrows-Wheeler Aligner (v0.7.12)69, and reads were deduplicated using Samtools version 1.9, according to the following workflow: https://github.com/oicr-gsi/bwa. All sequencing and alignment were performed in a CAP/ACD-accredited, CLIA-certified, ISO 15189-compliant laboratory for clinical reporting and research (https://genomics.oicr.on.ca).
For a subset of plasma samples from TP53-wildtype and TP53m-carriers, adapter sequences were removed from libraries previously prepared for Illumina sequencing, and the libraries were then re-prepared using xGen Dual Index UMI Adapters and IDT Dual Index UDI indexes. Libraries were sequenced using single-end reads (Ultima, Newark, CA) to a target depth of 40X (30X minimum). Alignment was performed as described above, and adapter and UMI trimming was performed using Samtools (v1.17) to retain only complete inserts.
Telomere analysis
Telomere content was calculated using TelSeq (v0.0.1)25 and TelomereHunter (v1.1.0)26 with default settings.
Fragment size analysis
Global fragment size distributions were calculated using Picard CollectInsertSizeMetrics (v4.0.1.2; https://github.com/broadinstitute/picard). Only fragment sizes between 10 and 600 bp were kept for downstream analysis. RepeatMasker annotations for hg38 were downloaded from the UCSC Genome Browser (https://genome.ucsc.edu) and separated into the 46 unique repeat elements used for downstream analysis. For each repeat element, reads overlapping its genomic regions were extracted and fragment size distributions were calculated as above.
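The size filtering and normalization step can be sketched as below, assuming insert sizes have already been read from the BAM file (the signed template length, hence the absolute value) and that the 10–600 bp bounds are inclusive. This is an illustration with a hypothetical function name, not a reimplementation of Picard.

```python
from collections import Counter
from typing import Dict, Iterable


def size_distribution(insert_sizes: Iterable[int], min_size: int = 10,
                      max_size: int = 600) -> Dict[int, float]:
    """Proportion of fragments at each size within [min_size, max_size] bp.

    Insert sizes may be signed (as in the BAM TLEN field), so absolute values
    are used; fragments outside the range are discarded before normalizing.
    """
    kept = Counter(abs(s) for s in insert_sizes if min_size <= abs(s) <= max_size)
    total = sum(kept.values())
    return {size: n / total for size, n in sorted(kept.items())} if total else {}
```

Running the same function on the subset of reads overlapping each repeat element would give the per-element distributions described above.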
Fragment end-motif and breakpoint analysis
BAM files were deduplicated using Samtools (v1.9) and converted into BEDPE format using Bedtools (v2.27.1). Fragment start and end coordinates were extracted from the BEDPE files, and the corresponding reference sequences were retrieved from the reference genome (GRCh38) using Bedtools (v2.27.1), either as the tetranucleotide at each fragment end (end-motif) or as a window extending 15 bp up- and downstream of each end (breakpoint). Extracting sequences from the reference reduced the potential for bias due to missed adapter sequences and restricted the analysis to fully mapped fragments. Reverse complement sequences were generated for 3’ fragment ends, and frequencies were calculated for each of the 256 tetranucleotide motifs70. Plasma whole-genome sequencing data from DNASE1L3+/- and DNASE1L3-/- patients were obtained from Chan et al.33. For breakpoint analysis, the frequency of each nucleotide at each position was normalized by the genome-wide expected frequency of that nucleotide.
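The end-motif and breakpoint calculations can be sketched on a plain reference string. This minimal version assumes an uppercase single-chromosome sequence, 0-based half-open forward-strand fragment coordinates, and (for breakpoints) a uniform expected base frequency unless genome-wide frequencies are supplied; the function names are hypothetical and it stands in for the Bedtools-based workflow rather than reproducing it.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")
MOTIFS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 4-mers


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def end_motif_frequencies(ref: str, fragments: Sequence[Tuple[int, int]]) -> Dict[str, float]:
    """Frequency of each 4-mer read 5'->3' from both ends of every fragment."""
    counts: Counter = Counter()
    for start, end in fragments:
        counts[ref[start:start + 4]] += 1       # 5' end on the forward strand
        counts[revcomp(ref[end - 4:end])] += 1  # 3' end, reverse-complemented
    total = sum(counts[m] for m in MOTIFS)      # motifs containing N are ignored
    return {m: counts[m] / total for m in MOTIFS} if total else {}


def breakpoint_profile(ref: str, fragments: Sequence[Tuple[int, int]], flank: int = 15,
                       expected: Optional[Dict[str, float]] = None) -> List[Dict[str, float]]:
    """Observed/expected base frequency at each of 2*flank positions around fragment ends.

    Windows are oriented so the first `flank` positions lie outside the
    fragment and the last `flank` positions lie inside it.
    """
    expected = expected or {b: 0.25 for b in "ACGT"}
    counts = [Counter() for _ in range(2 * flank)]
    for start, end in fragments:
        windows = []
        if start >= flank and start + flank <= len(ref):
            windows.append(ref[start - flank:start + flank])
        if end >= flank and end + flank <= len(ref):
            windows.append(revcomp(ref[end - flank:end + flank]))
        for w in windows:
            for i, base in enumerate(w):
                counts[i][base] += 1
    profile = []
    for c in counts:
        n = sum(c[b] for b in "ACGT")
        profile.append({b: (c[b] / n) / expected[b] if n else 0.0 for b in "ACGT"})
    return profile
```

Reverse-complementing the 3' end makes both ends read in the same 5' to 3' direction, so motifs from the two strands can be pooled into one 256-motif table.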
Nucleosome footprint analysis
Healthy control nucleosome peaks were obtained from Snyder et al.10. Deduplicated BAM files were converted into BED files with fragment starts and ends using Bedtools (v2.27.1). Fragment starts and ends were then mapped to the closest nucleosome peak and the distance calculated. Only fragments that mapped within 1000 bp of a peak were kept for downstream analysis30.
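The closest-peak mapping can be sketched with a binary search over sorted peak positions. This assumes single-chromosome inputs, a signed distance defined as end position minus peak position, and an inclusive 1000 bp cutoff; none of these details are specified in the text, and the function name is hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def end_to_peak_distances(ends: Sequence[int], peaks: Sequence[int],
                          max_dist: int = 1000) -> List[int]:
    """Signed distance (end - peak) from each fragment end to its closest peak.

    `peaks` must be sorted positions on the same chromosome as `ends`.
    Ends farther than `max_dist` bp from every peak are dropped.
    """
    if not peaks:
        return []
    distances = []
    for pos in ends:
        i = bisect_left(peaks, pos)
        # The closest peak is either just before or just after the insertion point.
        candidates = [peaks[j] for j in (i - 1, i) if 0 <= j < len(peaks)]
        nearest = min(candidates, key=lambda p: abs(pos - p))
        d = pos - nearest
        if abs(d) <= max_dist:
            distances.append(d)
    return distances
```

A histogram of these distances gives the footprint: ends from well-positioned nucleosomes cluster at characteristic offsets from the peak.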
Dinucleotide analysis
Dinucleotide frequencies were calculated using 167 bp fragments as described in Snyder et al.10. Briefly, BAM files were deduplicated and pruned to 167 bp fragments only using Samtools (v1.9) and converted to BEDPE format using Bedtools (v2.27.1). Fragment start and end coordinates were extracted from the BEDPE files, and reference FASTA sequences were extracted for 167 bp fragments with +/− 50 bp flanks using Bedtools (v2.27.1). The frequency of each dinucleotide was calculated at each position along the region (each position paired with its 5’ preceding nucleotide), and frequencies were normalized against the expected frequency of each dinucleotide based on its prevalence across the genome10.
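The positional dinucleotide calculation can be sketched as follows, assuming the 267 bp windows (167 bp fragment plus 50 bp on each side) have already been extracted as uppercase strings of equal length and that genome-wide expected dinucleotide frequencies are supplied by the caller. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def positional_dinucleotides(windows: Sequence[str],
                             expected: Dict[str, float]) -> List[Dict[str, float]]:
    """Observed/expected dinucleotide frequency at each window position.

    Position k pairs base k with its 5' neighbour (base k-1), so windows of
    length L yield L-1 positions. All windows must have the same length.
    """
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    profile = []
    for k in range(1, length):
        counts = Counter(w[k - 1:k + 1] for w in windows)
        total = sum(counts[d] for d in expected)  # ignores dinucleotides with N
        profile.append({d: (counts[d] / total) / expected[d] if total else 0.0
                        for d in expected})
    return profile
```

Plotting, for example, the summed A/T dinucleotide enrichment across positions reveals the roughly 10 bp periodicity associated with DNA wrapping around the nucleosome.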
Fragment ratio profile analysis
Genome-wide fragmentation profiles were generated using a version of DELFI14 adapted for GRCh38. Briefly, filtered and blacklisted regions were compiled using GRCh38 coordinates. The ratio of short (90–150 bp) to long (151–220 bp) DNA fragments was calculated in 100 kb bins across the genome. Read counts were corrected for GC content using loess (span = 3/4), separately for short and long fragments. GC-corrected read counts were aggregated into 5 Mb bins, and the average short/long ratio was scaled to mean 0 and unit standard deviation. Coverage profiles were generated similarly using the total number of fragments spanning each bin. GISTIC copy number data were downloaded from http://firebrowse.org (accessed July 6th, 2022).
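The core of the short/long ratio profile can be sketched as below. For brevity this illustration bins fragments by midpoint directly at a single bin size, skips the blacklist filtering and loess GC correction described above, and drops bins without long fragments (where the ratio is undefined). The function name is hypothetical.

```python
from collections import Counter
from statistics import mean, stdev
from typing import Dict, Iterable, Tuple


def short_long_profile(fragments: Iterable[Tuple[int, int]], bin_size: int = 5_000_000,
                       short_range: Tuple[int, int] = (90, 150),
                       long_range: Tuple[int, int] = (151, 220)) -> Dict[int, float]:
    """Per-bin short/long fragment ratio, scaled to mean 0 and unit SD.

    `fragments` are 0-based half-open (start, end) coordinates on one
    chromosome; each fragment is assigned to the bin containing its midpoint.
    No GC correction is applied in this sketch.
    """
    n_short: Counter = Counter()
    n_long: Counter = Counter()
    for start, end in fragments:
        length = end - start
        b = ((start + end) // 2) // bin_size
        if short_range[0] <= length <= short_range[1]:
            n_short[b] += 1
        elif long_range[0] <= length <= long_range[1]:
            n_long[b] += 1
    bins = sorted(b for b in n_long if n_long[b] > 0)
    ratios = [n_short[b] / n_long[b] for b in bins]
    if len(ratios) < 2:
        return {b: 0.0 for b in bins}
    mu, sd = mean(ratios), stdev(ratios)
    return {b: (r - mu) / sd if sd else 0.0 for b, r in zip(bins, ratios)}
```

Scaling each sample's profile to mean 0 and unit SD makes profiles comparable across samples sequenced to different depths.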
Nucleosome accessibility analysis
Nucleosome positioning was calculated using the Griffin tool (v0.1.0) (https://github.com/adoebley/Griffin) as described in Doebley et al., using 35–220 bp fragments to account for the shorter cfDNA fragment length of TP53m-carriers. Tissue-specific DNase hypersensitivity sites42 were obtained from https://zenodo.org/record/3838751/files/DHS_Index_and_Vocabulary_hg38_WM20190703.txt.gz. Transcription factor binding sites were obtained from the GTRD37. p53 target genes were obtained from Fischer et al.71. Housekeeping genes40 were obtained from https://www.tau.ac.il/~elieis/HKG/. Hematopoietic open chromatin sites were obtained from Satpathy et al.41 and represent a consensus of open chromatin regions shared across all hematopoietic cell types. Hematopoietic closed chromatin sites were compiled by removing hematopoietic open chromatin sites from tissue-specific open chromatin sites (neural, musculoskeletal, endothelial/vascular, cardiac, renal, stromal) and retaining only the 10,000 most strongly enriched sites for each tissue type.
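The idea behind a site-aggregated nucleosome positioning track can be illustrated with a much simplified sketch: fragment midpoints are counted at each offset within +/−1000 bp of every site, the profile is normalized so the window mean is 1, and "midpoint coverage" is the mean of the profile near the site centre. Griffin itself applies GC correction, strand handling, and its own normalization and feature definitions, so this is an assumption-laden stand-in rather than Griffin's algorithm; the ±30 bp central window and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def site_profile(midpoints: Sequence[int], sites: Sequence[int],
                 flank: int = 1000) -> List[float]:
    """Aggregate fragment-midpoint counts at offsets -flank..+flank around sites,
    normalized so the mean value over the whole window is 1."""
    counts = [0] * (2 * flank + 1)
    mids = sorted(midpoints)
    for site in sites:
        lo = bisect_left(mids, site - flank)
        hi = bisect_right(mids, site + flank)
        for m in mids[lo:hi]:
            counts[m - site + flank] += 1
    total = sum(counts)
    n = len(counts)
    return [c * n / total for c in counts] if total else [0.0] * n


def central_coverage(profile: Sequence[float], flank: int = 1000,
                     half_width: int = 30) -> float:
    """Mean normalized coverage within +/- half_width bp of the site centre."""
    window = profile[flank - half_width: flank + half_width + 1]
    return sum(window) / len(window)
```

Lower central coverage means fewer nucleosome-protected fragments at the site, which is how the text interprets a decreased midpoint as increased accessibility.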
Gene set enrichment analysis
For gene expression analysis, we used RNA-sequencing expression datasets and the top 500 differentially expressed genes for 28 tumor types comparing TP53-wildtype and TP53-mutant tumors from The Cancer Genome Atlas (TCGA)44. We also used a small cohort of RNA-sequencing data from sporadic and LFS-associated pediatric osteosarcoma and glioblastoma tumors45. Differentially expressed genes were identified using the limma package in R. Gene set enrichment analysis was performed using the Broad Institute’s Gene Set Enrichment Analysis (GSEA) software, comparing TP53-wildtype with TP53-mutant tumors and LFS-associated with sporadic tumors. Gene expression was evaluated against predefined gene sets from the Molecular Signatures Database (MSigDB) to calculate an enrichment score (ES) for each gene set, which was then normalized for variation in gene set size (NES). Default parameters were used (gene sets containing more than 15 and fewer than 500 genes). Immunologic human gene sets (C7) were excluded.
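For readers unfamiliar with the ES reported by GSEA, the sketch below implements the weighted running-sum statistic described by Subramanian et al. (2005): walking down a ranked gene list, the sum increases at gene-set members in proportion to their ranking metric and decreases by a constant for non-members, and the ES is the maximum deviation from zero. It is a didactic illustration with a hypothetical function name; the permutation-based normalization that produces the NES is omitted.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str], scores: Sequence[float],
                     gene_set: Set[str], p: float = 1.0) -> float:
    """Weighted running-sum enrichment score (signed maximum deviation from zero).

    `ranked_genes` are ordered from most to least up-regulated and `scores`
    holds the matching ranking metric; `p` is the weighting exponent.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking without covering it")
    hit_norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_norm == 0:
        raise ValueError("gene-set members must have non-zero ranking scores")
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += (abs(s) ** p / hit_norm) if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

A positive ES indicates the gene set is concentrated at the top of the ranking (for example, up-regulated in TP53-mutant or LFS-associated tumors), a negative ES at the bottom.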
Machine learning classification
Classifiers were trained using the R package caret (v6.0-92)72, and ROC analysis was performed using the R package pROC (v1.18.0)73. Several machine learning algorithms (logistic regression [glm], k-nearest neighbor [knn], support vector machine [svmRadial], random forest [rf], and gradient boosted machine [gbm]) were tested on each fragmentomic metric to determine the optimal algorithm. For each comparison (healthy TP53-wildtype controls versus LFS cancer-free; LFS cancer-free versus LFS active cancer), data were first downsampled 10 times to produce 10 balanced datasets. Each algorithm was then evaluated on the downsampled datasets by 10-fold cross-validation (10 iterations of 90% training, 10% test). Performance was measured using the mean kappa statistic and its 95% confidence interval for each algorithm across the 10 balanced datasets, and the optimal algorithm was chosen. Using the optimal algorithm for each metric (Supplementary Table 1), data were again downsampled to create a balanced dataset and then split into 10 folds (90% training, 10% test). Each fold underwent nested 10-fold cross-validation. Downsampling was repeated 100 times, and prediction results were aggregated across folds and downsampling iterations. Twenty percent of the initial cohort, split proportionally between groups, was held back as a test dataset. ROC, sensitivity, and specificity were calculated from the aggregated outputs. This was performed for both LFS (cancer-free) versus healthy TP53-wildtype controls and LFS cancer-free versus LFS active cancer. The integrated classifier (ensemble method) was trained on the prediction outputs (probabilities) of each individual metric using the same method.
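The resampling scheme can be sketched independently of any particular model. The hypothetical functions below hold back a stratified 20% test set, then, for each repeat, downsample the remaining samples to balanced classes and yield the training and validation indices for each of k folds. This is a language-neutral illustration in Python rather than the caret implementation used in the study, and it omits the inner (nested) cross-validation loop used for tuning within each fold.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_holdout(labels: Sequence[int], frac: float = 0.2,
                       seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hold back `frac` of each class as a test set; return (train_idx, test_idx)."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        k = round(len(idx) * frac)
        test += idx[:k]
        train += idx[k:]
    return sorted(train), sorted(test)


def downsample(idx: Sequence[int], labels: Sequence[int], rng: random.Random) -> List[int]:
    """Randomly subsample every class to the size of the smallest class."""
    by_class: Dict[int, List[int]] = {}
    for i in idx:
        by_class.setdefault(labels[i], []).append(i)
    n = min(len(members) for members in by_class.values())
    balanced: List[int] = []
    for members in by_class.values():
        balanced += rng.sample(members, n)
    return sorted(balanced)


def repeated_downsampled_cv(train_idx: Sequence[int], labels: Sequence[int],
                            n_repeats: int = 100, k: int = 10,
                            seed: int = 0) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (repeat, fit_idx, validation_idx) for each fold of each repeat."""
    rng = random.Random(seed)
    for rep in range(n_repeats):
        balanced = downsample(train_idx, labels, rng)
        rng.shuffle(balanced)
        folds = [sorted(balanced[f::k]) for f in range(k)]
        for f, val in enumerate(folds):
            fit = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
            yield rep, fit, val
```

Repeating the downsampling many times lets every sample from the larger group contribute to training across iterations, while each individual model still sees balanced classes.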
Healthy TP53-wildtype control cohort and statistical analyses
Healthy blood donors were recruited and consented with institutional approval (REB#: 19-6239) and underwent shallow WGS (n = 30). All healthy TP53-wildtype control data were aligned to GRCh38 as described above and processed according to GATK best practices. Healthy TP53-wildtype control plasma was analyzed as described above. All downstream analyses were performed using R (v4.0.0), and heatmap figures were generated using the package ComplexHeatmap (v2.13.1)74. Exact p-values for Supplementary Figures can be found in Supplementary Data 3.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
This work would not have been possible without the patients and their generous participation in this study. This work was supported by grants from the Terry Fox Research Institute (grant number #1081) with funds from the Terry Fox Foundation (DM), Canadian Institutes of Health Research (DM; CIHR-159453), TD Ready Challenge (RHK and TJP), and the McLaughlin Centre at the University of Toronto (RHK and TJP). This study was a collaboration with the CHARM consortium (https://charmconsortium.ca) and was performed under the auspices of the LIBERATE study (NCT 03702309), an institutional liquid biopsy program at the University Health Network supported by the BMO Financial Group Chair in Precision Cancer Genomics (Chair held by Dr. Lillian Siu). This work was supported in part by the Canadian Institutes of Health Research Foundation Scheme Grant #143234 (DM) and the Garron Family Cancer Centre through funds from SickKids Foundation (DM). Additional funding support for this project was provided by the Shar Foundation, FDC Foundation, Soccer for Hope, Princess Margaret Cancer Foundation and the Ontario Institute for Cancer Research (OICR). DW is supported by a Princess Margaret Cancer Centre Fellowship, a Princess Margaret Cancer Digital Intelligence SPARK Award, a Canadian Institutes of Health Research Fellowship, and a Children’s Tumor Foundation Young Investigator Award. EE is supported by a Canadian Cancer Society Research Training Award. RHK is supported by The Bhalwani Family Charitable Foundation, Goldie R. Feldman, Karen Green and George Fischer Genomics and Genetics Fund, Lindy Green Family Foundation, the Devine/Sucharda Charitable Foundation, Hal Jackman Foundation, Nicol Family Foundation, Belbeck-Fukakusa Family Foundation, Kamin Foundation, and the Princess Margaret Cancer Foundation.
TJP holds the Canada Research Chair in Translational Genomics and is supported by a Senior Investigator Award from the Ontario Institute for Cancer Research and the Gattuso-Slaight Personalized Cancer Medicine Fund. DM holds the CIBC Children’s Foundation Chair in Child Health Research. We thank the staff of the OICR Genomics Program (https://genomics.oicr.on.ca) for their expertise in generating the sequencing data used in this study. OICR is supported by funding provided by the Government of Ontario. We thank all of the staff involved at Ultima Genomics for their support in sequencing and analysis. The results published here are in part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga. We thank members of the Pugh, Kim, and Malkin Labs for their extensive peer editing and feedback to improve the manuscript. Lastly, we particularly thank all of the Li-Fraumeni syndrome family members who contributed samples for the study.
Author contributions
DW and TJP designed and supervised the study. DW and TJP synthesized and interpreted data and drafted the manuscript and figures. DW performed bioinformatic analyses. PL helped with machine learning models. MT, EE, and JB performed validation studies. NWF, BL, and VS provided functional LFS studies and support. LO and HG provided clinical support and data abstraction. DM and RHK recruited patients, coordinated specimen collection, and synthesized and interpreted data. SD, RK, AV, and AS provided additional patient samples and support.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Data availability
The sequencing data generated in this study have been deposited in the European Genome-Phenome Archive (EGA) database under accession code EGAS00001006539. The datasets are available under restricted access in compliance with patient consent for data sharing; access can be obtained through approval by the University Health Network data access committee (contact person: Natalie Stickle, email: natalie.stickle@uhn.ca). The remaining data, including de-identified clinical data, are available within the Article, the Supplementary Information, the Zenodo database under accession code 10.5281/zenodo.7448380, or the Source Data file. Source data are provided with this paper.
Code availability
Custom code for the analyses and visualizations in this paper is available from the project GitHub repository https://github.com/pughlab/TGL49_CHARM_LFS_Fragmentomics or through Zenodo under accession number 10.5281/zenodo.12638137. Code to reproduce the analyses is available on GitHub at https://github.com/pughlab/fragmentomics or through Zenodo under accession number 10.5281/zenodo.12638261.
Competing interests
DW reports funding support from the Princess Margaret Cancer Foundation, the Canadian Institutes of Health Research, and the Children’s Tumor Foundation. EE reports funding support from the Canadian Cancer Society. RHK reports grants from the Princess Margaret Cancer Foundation, the Canadian Institutes of Health Research, the TD Ready Challenge, and the McLaughlin Centre for Molecular Medicine. DAM reports consultancy/advisory board roles for ymAbs Therapeutics, EUSA Pharma and Clarity Pharmaceuticals (compensated). TJP reports grants from the Terry Fox Research Institute, the Canadian Institutes of Health Research, the TD Ready Challenge, and the McLaughlin Centre at the University of Toronto during the conduct of the study; consultation for Illumina, AstraZeneca, Merck, Chrysalis Biomedical Advisors, SAGA Diagnostics, and the Canada Pension Plan Investment Board (compensated); and receives research support (institutional) from Roche/Genentech. No disclosures were reported by the other authors.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Raymond H. Kim, Email: Raymond.Kim@uhn.ca
David Malkin, Email: david.malkin@sickkids.ca
Trevor J. Pugh, Email: trevor.pugh@utoronto.ca
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-024-51529-w.
References
- 1. Vogelstein, B., Lane, D. & Levine, A. J. Surfing the p53 network. Nature 408, 307–310 (2000). 10.1038/35042675
- 2. Malkin, D. et al. Germ line p53 mutations in a familial syndrome of breast cancer, sarcomas, and other neoplasms. Science 250, 1233–1238 (1990). 10.1126/science.1978757
- 3. Guha, T. & Malkin, D. Inherited TP53 Mutations and the Li–Fraumeni Syndrome. Cold Spring Harb. Perspect. Med. 7, a026187 (2017). 10.1101/cshperspect.a026187
- 4. Mai, P. L. et al. Risks of first and subsequent cancers among TP53 mutation carriers in the National Cancer Institute Li-Fraumeni syndrome cohort: Cancer Risk in TP53 Mutation Carriers. Cancer 122, 3673–3681 (2016). 10.1002/cncr.30248
- 5. Malkin, D. Li-Fraumeni Syndrome. Genes Cancer 2, 475–484 (2011). 10.1177/1947601911413466
- 6. Kamihara, J., Rana, H. Q. & Garber, J. E. Germline TP53 mutations and the changing landscape of Li-Fraumeni syndrome. Hum. Mutat. 35, 654–662 (2014). 10.1002/humu.22559
- 7. Zhu, D. et al. Circulating cell-free DNA fragmentation is a stepwise and conserved process linked to apoptosis. BMC Biol. 21, 253 (2023). 10.1186/s12915-023-01752-6
- 8. Wan, J. C. M. et al. Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nat. Rev. Cancer 17, 223–238 (2017). 10.1038/nrc.2017.7
- 9. Paramathas, S., Guha, T., Pugh, T. J., Malkin, D. & Villani, A. Considerations for the use of circulating tumor DNA sequencing as a screening tool in cancer predisposition syndromes. Pediatr. Blood Cancer 67, e28758 (2020). 10.1002/pbc.28758
- 10. Snyder, M. W., Kircher, M., Hill, A. J., Daza, R. M. & Shendure, J. Cell-free DNA Comprises an In Vivo Nucleosome Footprint that Informs Its Tissues-Of-Origin. Cell 164, 57–68 (2016). 10.1016/j.cell.2015.11.050
- 11. Lee, D.-F. et al. Modeling Familial Cancer with Induced Pluripotent Stem Cells. Cell 161, 240–254 (2015). 10.1016/j.cell.2015.02.045
- 12. Samuel, N. et al. Genome-wide DNA methylation analysis reveals epigenetic dysregulation of microRNA-34A in TP53-associated cancer susceptibility. J. Clin. Oncol. 34, 3697–3704 (2016). 10.1200/JCO.2016.67.6940
- 13. Wong, D. et al. Early cancer detection in Li-Fraumeni syndrome with cell-free DNA. Cancer Discov. 10.1158/2159-8290.CD-23-0456 (2023).
- 14. Cristiano, S. et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature 570, 385–389 (2019). 10.1038/s41586-019-1272-6
- 15. Zhu, G. et al. Tissue-specific cell-free DNA degradation quantifies circulating tumor DNA burden. Nat. Commun. 12, 2229 (2021). 10.1038/s41467-021-22463-y
- 16. Szymanski, J. J. et al. Cell-free DNA ultra-low-pass whole genome sequencing to distinguish malignant peripheral nerve sheath tumor (MPNST) from its benign precursor lesion: A cross-sectional study. PLoS Med. 18, e1003734 (2021). 10.1371/journal.pmed.1003734
- 17. Almogy, G. et al. Cost-efficient whole genome-sequencing using novel mostly natural sequencing-by-synthesis chemistry and open fluidics platform. Preprint at bioRxiv 10.1101/2022.05.29.493900 (2022).
- 18. Beck, J., Urnovitz, H. B., Riggert, J., Clerici, M. & Schütz, E. Profile of the Circulating DNA in Apparently Healthy Individuals. Clin. Chem. 55, 730–738 (2009). 10.1373/clinchem.2008.113597
- 19. Chandrananda, D., Thorne, N. P. & Bahlo, M. High-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA. BMC Med. Genomics 8, 29 (2015). 10.1186/s12920-015-0107-z
- 20. Tiwari, B., Jones, A. E. & Abrams, J. M. Transposons, p53 and Genome Security. Trends Genet. 34, 846–855 (2018). 10.1016/j.tig.2018.08.003
- 21. Ulz, P. et al. Inferring expressed genes by whole-genome sequencing of plasma DNA. Nat. Genet. 48, 1273–1278 (2016). 10.1038/ng.3648
- 22. Trkova, M., Prochazkova, K., Krutilkova, V., Sumerauer, D. & Sedlacek, Z. Telomere length in peripheral blood cells of germline TP53 mutation carriers is shorter than that of normal individuals of corresponding age. Cancer 110, 694–702 (2007). 10.1002/cncr.22834
- 23. Tabori, U., Nanda, S., Druker, H., Lees, J. & Malkin, D. Younger age of cancer initiation is associated with shorter telomere length in Li-Fraumeni syndrome. Cancer Res. 67, 1415–1418 (2007). 10.1158/0008-5472.CAN-06-3682
- 24. Mouliere, F. et al. Enhanced detection of circulating tumor DNA by fragment size analysis. Sci. Transl. Med. 10, eaat4921 (2018). 10.1126/scitranslmed.aat4921
- 25. Ding, Z. et al. Estimating telomere length from whole genome sequence data. Nucleic Acids Res. 42, e75 (2014). 10.1093/nar/gku181
- 26. Feuerbach, L. et al. TelomereHunter – in silico estimation of telomere content and composition from cancer genomes. BMC Bioinforma. 20, 272 (2019). 10.1186/s12859-019-2851-0
- 27. Kato, S. et al. Understanding the function–structure and function–mutation relationships of p53 tumor suppressor protein by high-resolution missense mutation analysis. Proc. Natl Acad. Sci. USA 100, 8424–8429 (2003). 10.1073/pnas.1431692100
- 28. Giacomelli, A. O. et al. Mutational processes shape the landscape of TP53 mutations in human cancer. Nat. Genet. 50, 1381–1387 (2018). 10.1038/s41588-018-0204-y
- 29. Yuwono, N. L., Warton, K. & Ford, C. E. The influence of biological and lifestyle factors on circulating cell-free DNA in blood plasma. eLife 10, e69679 (2021). 10.7554/eLife.69679
- 30. Vanderstichele, A. et al. Nucleosome footprinting in plasma cell-free DNA for the pre-surgical diagnosis of ovarian cancer. npj Genom. Med. 7, 30 (2022). 10.1038/s41525-022-00300-5
- 31. Gaffney, D. J. et al. Controls of nucleosome positioning in the human genome. PLoS Genet. 8, e1003036 (2012). 10.1371/journal.pgen.1003036
- 32. Han, D. S. C. & Lo, Y. M. D. The nexus of cfDNA and nuclease biology. Trends Genet. 37, 758–770 (2021). 10.1016/j.tig.2021.04.005
- 33. Chan, R. W. Y. et al. Plasma DNA profile associated with DNASE1L3 gene mutations: clinical observations, relationships to nuclease substrate preference, and in vivo correction. Am. J. Hum. Genet. 107, 882–894 (2020). 10.1016/j.ajhg.2020.09.006
- 34. Penkert, J. et al. Genotype–phenotype associations within the Li-Fraumeni spectrum: a report from the German Registry. J. Hematol. Oncol. 15, 107 (2022). 10.1186/s13045-022-01332-1
- 35. Zack, T. I. et al. Pan-cancer patterns of somatic copy number alteration. Nat. Genet. 45, 1134–1140 (2013). 10.1038/ng.2760
- 36. Doebley, A.-L. et al. A framework for clinical cancer subtyping from nucleosome profiling of cell-free DNA. Nat. Commun. 13, 7475 (2022). 10.1038/s41467-022-35076-w
- 37. Yevshin, I., Sharipov, R., Kolmykov, S., Kondrakhin, Y. & Kolpakov, F. GTRD: a database on gene transcription regulation—2019 update. Nucleic Acids Res. 47, D100–D105 (2019). 10.1093/nar/gky1128
- 38. Ulz, P. et al. Inference of transcription factor binding from cell-free DNA enables tumor subtype prediction and early detection. Nat. Commun. 10, 4666 (2019). 10.1038/s41467-019-12714-4
- 39. Kim, T. H. et al. Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome. Cell 128, 1231–1245 (2007). 10.1016/j.cell.2006.12.048
- 40. Eisenberg, E. & Levanon, E. Y. Human housekeeping genes, revisited. Trends Genet. 29, 569–574 (2013). 10.1016/j.tig.2013.05.010
- 41. Satpathy, A. T. et al. Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat. Biotechnol. 37, 925–936 (2019). 10.1038/s41587-019-0206-z
- 42. Meuleman, W. et al. Index and biological spectrum of human DNase I hypersensitive sites. Nature 584, 244–251 (2020). 10.1038/s41586-020-2559-3
- 43. Kolmykov, S. et al. GTRD: an integrated view of transcription regulation. Nucleic Acids Res. 49, D104–D111 (2021). 10.1093/nar/gkaa1057
- 44. Donehower, L. A. et al. Integrated analysis of TP53 gene and pathway alterations in The Cancer Genome Atlas. Cell Rep. 28, 1370–1384.e5 (2019). 10.1016/j.celrep.2019.07.001
- 45. Villani, A. et al. The clinical utility of integrative genomics in childhood cancer extends beyond targetable mutations. Nat. Cancer 4, 203–221 (2022). 10.1038/s43018-022-00474-y
- 46. Frisch, S. M., Farris, J. C. & Pifer, P. M. Roles of grainyhead-like transcription factors in cancer. Oncogene 36, 6067–6073 (2017). 10.1038/onc.2017.178
- 47. Corces, M. R. et al. The chromatin accessibility landscape of primary human cancers. Science 362, eaav1898 (2018). 10.1126/science.aav1898
- 48. Villani, A. et al. Biochemical and imaging surveillance in germline TP53 mutation carriers with Li-Fraumeni syndrome: 11 year follow-up of a prospective observational study. Lancet Oncol. 17, 1295–1305 (2016). 10.1016/S1470-2045(16)30249-2
- 49. Kratz, C. P. et al. Cancer screening recommendations for individuals with Li-Fraumeni syndrome. Clin. Cancer Res. 23, e38–e45 (2017). 10.1158/1078-0432.CCR-17-0408
- 50. de Andrade, K. C. et al. Cancer incidence, patterns, and genotype–phenotype associations in individuals with pathogenic or likely pathogenic germline TP53 variants: an observational cohort study. Lancet Oncol. 22, 1787–1798 (2021). 10.1016/S1470-2045(21)00580-5
- 51. Rana, H. Q. et al. Genotype–phenotype associations among panel-based TP53+ subjects. Genet. Med. 21, 2478–2484 (2019). 10.1038/s41436-019-0541-y
- 52. Grimwood, J. et al. The DNA sequence and biology of human chromosome 19. Nature 428, 529–535 (2004). 10.1038/nature02399
- 53. Harris, R. A., Raveendran, M., Worley, K. C. & Rogers, J. Unusual sequence characteristics of human chromosome 19 are conserved across 11 nonhuman primates. BMC Evol. Biol. 20, 33 (2020). 10.1186/s12862-020-1595-9
- 54. Dhaka, B. & Sabarinathan, R. Differential chromatin accessibility landscape of gain-of-function mutant p53 tumours. BMC Cancer 21, 669 (2021). 10.1186/s12885-021-08362-x
- 55. Zhu, J. et al. Gain-of-function p53 mutants co-opt chromatin pathways to drive cancer growth. Nature 525, 206–211 (2015). 10.1038/nature15251
- 56. Pfister, N. T. et al. Mutant p53 cooperates with the SWI/SNF chromatin remodeling complex to regulate VEGFR2 in breast cancer cells. Genes Dev. 29, 1298–1315 (2015). 10.1101/gad.263202.115
- 57. Sun, K. et al. Size-tagged preferred ends in maternal plasma DNA shed light on the production mechanism and show utility in noninvasive prenatal testing. Proc. Natl Acad. Sci. USA 115, E5106–E5114 (2018).
- 58. Nishimura, M., Arimura, Y., Nozawa, K. & Kurumizaka, H. Linker DNA and histone contributions in nucleosome binding by p53. J. Biochem. 168, 669–675 (2020). 10.1093/jb/mvaa081
- 59. Yu, X. & Buck, M. J. Defining TP53 pioneering capabilities with competitive nucleosome binding assays. Genome Res. 29, 107–115 (2019). 10.1101/gr.234104.117
- 60. Pantziarka, P. Primed for cancer: Li Fraumeni Syndrome and the pre-cancerous niche. ecancermedicalscience 9, 541 (2015). 10.3332/ecancer.2015.541
- 61. Schwitalla, S. et al. Loss of p53 in enterocytes generates an inflammatory microenvironment enabling invasion and lymph node metastasis of carcinogen-induced colorectal tumors. Cancer Cell 23, 93–106 (2013). 10.1016/j.ccr.2012.11.014
- 62. Zhang, C. et al. Tumour-associated mutant p53 drives the Warburg effect. Nat. Commun. 4, 2935 (2013). 10.1038/ncomms3935
- 63. Wang, P.-Y. et al. Increased oxidative metabolism in the Li–Fraumeni syndrome. N. Engl. J. Med. 368, 1027–1032 (2013). 10.1056/NEJMoa1214091
- 64. Dameron, K. M., Volpert, O. V., Tainsky, M. A. & Bouck, N. Control of angiogenesis in fibroblasts by p53 regulation of thrombospondin-1. Science 265, 1582–1584 (1994). 10.1126/science.7521539
- 65. Huang, Y. et al. p53 regulates mesenchymal stem cell-mediated tumor suppression in a tumor microenvironment through immune modulation. Oncogene 33, 3830–3838 (2014). 10.1038/onc.2013.355
- 66. Budhraja, K. K. et al. Genome-wide analysis of aberrant position and sequence of plasma DNA fragment ends in patients with cancer. Sci. Transl. Med. 15, eabm6863 (2023). 10.1126/scitranslmed.abm6863
- 67. Mathios, D. et al. Detection and characterization of lung cancer using cell-free DNA fragmentomes. Nat. Commun. 12, 5060 (2021). 10.1038/s41467-021-24994-w
- 68. Doebley, A.-L. et al. A framework for clinical cancer subtyping from nucleosome profiling of cell-free DNA. Nat. Commun. 13, 7475 (2022). 10.1038/s41467-022-35076-w
- 69. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997 [q-bio] (2013).
- 70. Jiang, P. et al. Plasma DNA end motif profiling as a fragmentomic marker in cancer, pregnancy and transplantation. Cancer Discov. CD-19-0622 (2020).
- 71. Fischer, M. Census and evaluation of p53 target genes. Oncogene 36, 3943–3956 (2017). 10.1038/onc.2016.502
- 72. Kuhn, M. Building Predictive Models in R Using the caret Package. J. Stat. Softw. 28 (2008).
- 73. Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinforma. 12, 77 (2011). 10.1186/1471-2105-12-77
- 74. Gu, Z. Complex heatmap visualization. iMeta 1, e43 (2022). 10.1002/imt2.43