
This is a preprint.

It has not yet been peer reviewed by a journal.


[Preprint]. 2023 Jul 11:2023.07.10.548264. [Version 1] doi: 10.1101/2023.07.10.548264

Assigning mutational signatures to individual samples and individual somatic mutations with SigProfilerAssignment

Marcos Díaz-Gay 1,2,3,^, Raviteja Vangara 1,2,3,^, Mark Barnes 1,2,3, Xi Wang 1,2,3, S M Ashiqul Islam 1,2,3, Ian Vermes 4, Nithish Bharadhwaj Narasimman 1,2,3, Ting Yang 1,2,3, Zichen Jiang 1,2,3, Sarah Moody 5, Sergey Senkin 6, Paul Brennan 6, Michael R Stratton 5, Ludmil B Alexandrov 1,2,3,*
PMCID: PMC10369904  PMID: 37502962

Abstract

Analysis of mutational signatures is a powerful approach for understanding the mutagenic processes that have shaped the evolution of a cancer genome. Here we present SigProfilerAssignment, a desktop and an online computational framework for assigning all types of mutational signatures to individual samples. SigProfilerAssignment is the first tool that allows both analysis of copy-number signatures and probabilistic assignment of signatures to individual somatic mutations. As its computational engine, the tool uses a custom implementation of the forward stagewise algorithm for sparse regression and nonnegative least squares for numerical optimization. Analysis of 2,700 synthetic cancer genomes with and without noise demonstrates that SigProfilerAssignment outperforms four commonly used approaches for assigning mutational signatures. SigProfilerAssignment is freely available at https://github.com/AlexandrovLab/SigProfilerAssignment with a web implementation at https://cancer.sanger.ac.uk/signatures/assignment/.


Somatic mutations accumulate in the genomes of all cells of the human body throughout an individual’s lifetime1,2. These mutations arise from different endogenous and exogenous mutational processes, with each process generating a characteristic pattern of mutations, known as a mutational signature3. By leveraging the vast amounts of high-throughput DNA sequencing data generated over the last two decades, distinct mutational signatures have been elucidated from various cancer types4,5 and normal somatic tissues6–9. Sets of mutation-type specific reference signatures have been developed and deposited in the Catalogue of Somatic Mutations in Cancer (COSMIC) database4,10, including signatures of single base substitutions (SBSs), doublet base substitutions (DBSs), small insertions and deletions (IDs), and copy number alterations (CNs). In practice, mutational signatures have been widely utilized in research and clinical settings, including for detecting previously unappreciated cancer predispositions11,12, pathogenic classification of germline variants13, clinical management of cancer patients14, and identifying sensitivity to anti-cancer therapeutics15.

There are at least two distinct approaches for analyzing mutational signatures. De novo extraction is an unsupervised machine learning approach that allows identifying the patterns of known and previously unknown mutational signatures16. This type of analysis is predominately used for deriving reference signatures as it requires large cohorts, generally more than 100 samples and, usually, many thousands of samples. In contrast, refitting of mutational signatures is a numerical optimization approach that allows the assignment of known (in most cases, reference) signatures to an individual sample by quantifying the number of mutations attributed to each signature operative in that sample. While refitting cannot identify or quantify the activities of previously unknown mutational signatures, this approach is widely applied to small cohorts and for clinical samples where evaluations are almost exclusively performed for an individual cancer patient17.

In the past decade, multiple tools for refitting known mutational signatures were developed, including deconstructSigs17, MutationalPatterns18,19, sigLASSO20, and SignatureToolsLib5,21. The majority of these tools lack an online interface and provide support almost exclusively for substitution signatures. Further, most tools for refitting of known mutational signatures have never been compared and no existing tool supports probabilistically assigning mutational signatures to somatic mutations. To address these limitations, here, we present SigProfilerAssignment, a comprehensive computational tool for assigning mutational signatures to individual samples and individual somatic mutations (Fig. 1). In contrast to other tools, SigProfilerAssignment provides desktop and online support for all types of mutational signatures, including the COSMIC sets of reference SBS, DBS, ID, and CN signatures. In addition to COSMIC reference signatures, SigProfilerAssignment supports assignment of de novo extracted mutational signatures as well as of a user-provided set of custom mutational signatures. Our benchmarking based on 2,700 simulated cancer genomes demonstrates that SigProfilerAssignment outperforms other commonly used tools on simulation data with and without noise.

Figure 1. Assigning known mutational signatures to an individual sample and individual mutations with SigProfilerAssignment.


SigProfilerAssignment supports input data in a standard format (VCF, MAF, or text) and allows assigning a set of known signatures (e.g., ones from the COSMIC database) a) to an individual sample and b) probabilistically to an individual somatic mutation. Note that the probabilistic assignment of mutational signatures to an individual somatic mutation is only possible if a user provides a list of individual mutations (e.g., VCF file) for the examined sample instead of a mutational vector, as a mutational vector lacks information for individual mutations.

Given a set of known mutational signatures and a set of mutations in a cancer genome, both classified under the same mutational schema3,22, SigProfilerAssignment identifies the number of mutations caused by each signature in that cancer genome (Fig. 1a). Mathematically, a mutational schema can be represented as a finite alphabet Ξ of mutation types containing a total of ξ letters. Here, a mutational signature is defined as a probability mass function with domain the alphabet Ξ.

In vector notation, a mutational signature can be denoted as $s = [s_1, s_2, \ldots, s_\xi]^T$, where $s_k$, $1 \le k \le \xi$, is the probability for the mutational signature, $s$, to cause mutations of the type corresponding to the $k$th letter of the alphabet $\Xi$. Since a mutational signature is a probability mass function, $0 \le s_k \le 1$ and $\sum_{k=1}^{\xi} s_k = 1$. As such, a set of $n$ known mutational signatures can be expressed as a signature matrix, $S \in \mathbb{R}_{+}^{\xi \times n}$, where $S = [s_1, s_2, \ldots, s_n]$. Further, a set of mutations in a cancer genome can be defined as $v: \Xi \to \mathbb{N}_{+}^{\xi}$. In vector notation, a set of mutations in a cancer genome is $v = [v_1, v_2, \ldots, v_\xi]^T$, where $v_k$, $1 \le k \le \xi$, reflects the number of mutations in that cancer genome of the mutation type corresponding to the $k$th letter of the alphabet $\Xi$. SigProfilerAssignment takes as input a signature matrix, $S$, and a set of mutations, $v$, to output a column vector of activities $a = [a_1, a_2, \ldots, a_n]^T$, where $a \in \mathbb{N}_0^{n}$ and $a_t$, $1 \le t \le n$, corresponds to the number of somatic mutations attributed to the $t$th mutational signature. The underlying assumption of assigning mutational signatures is that the mutations within a sample can be approximated as a superposition of known mutational signatures and their activities:

$$v \approx S a \tag{1}$$

Thus, subject to $a \ge 0$, one needs to derive the vector $a$ that best fits the provided input data. To solve this optimization problem, SigProfilerAssignment uses a custom implementation of the forward stagewise algorithm23 and applies nonnegative least squares (NNLS) based on the Lawson-Hanson method24:

$$\min_{a \ge 0} \lVert v - S a \rVert_2^2 \tag{2}$$
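To make equations (1) and (2) concrete, the following minimal Python sketch evaluates the reconstruction $S a$ and the squared-error objective for two candidate activity vectors. The three-channel schema, the signature matrix, and the counts are hypothetical toy values chosen for illustration; they are not COSMIC signatures, and the helper names are not part of SigProfilerAssignment.

```python
def reconstruct(S, a):
    """Return the vector S a; S is a list of rows (one per mutation type)."""
    return [sum(s_kt * a_t for s_kt, a_t in zip(row, a)) for row in S]


def squared_error(v, S, a):
    """Objective of equation (2): squared L2 norm of v - S a."""
    return sum((v_k - r_k) ** 2 for v_k, r_k in zip(v, reconstruct(S, a)))


def relative_error(v, S, a):
    """Squared error scaled by the squared L2 norm of v."""
    return squared_error(v, S, a) / sum(v_k ** 2 for v_k in v)


# Hypothetical schema: 3 mutation types (rows), 2 signatures (columns).
# Each column sums to 1, as required for a probability mass function.
S = [[0.50, 0.125],
     [0.25, 0.125],
     [0.25, 0.750]]
v = [28, 18, 58]      # toy mutation counts, built as S a for a = [40, 64]

exact = [40, 64]      # activities that reproduce v exactly
swapped = [64, 40]    # same total burden, wrong split between signatures

print(squared_error(v, S, exact))
print(squared_error(v, S, swapped))
print(relative_error(v, S, swapped))
```

Because every signature column sums to one, the activities of an exact fit add up to the total number of mutations in the sample (here 104).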

The algorithm starts by computing a minimum relative error, $\epsilon_{\min} = \frac{\lVert v - S a \rVert_2^2}{\lVert v \rVert_2^2}$, by deriving the final nonnegative vector $a$ for the complete set of all reference signatures, $S$, using equation (2). This minimum error provides the best possible explanation of the data, but it also results in overfitting as all available signatures are utilized. Next, the tool uses steps for removing and adding signatures based on the backward and forward stepwise algorithms, respectively23.

First, signatures are removed by employing a backward stepwise algorithm23 (Algorithm 1). Specifically, each signature from the reference signature set, $S$, is removed in turn and the remaining signature set, $\hat{S}$, is attributed to the sample $v$ by applying equation (2). The increase in the relative error due to removing the $j$th signature from $S$ is calculated as $\epsilon_j = \frac{\lVert v - \hat{S} a \rVert_2^2}{\lVert v \rVert_2^2} - \epsilon_{\min}$. The signature with the least relative increase in error is removed from the signature set, $S$, provided that the increase is less than a specific threshold (default value of 0.01). After the removal of the signature with the least relative error increase, the minimum relative error, $\epsilon_{\min}$, and the set of signatures, $S$, are updated to reflect this removal. The removal step is repeated until all signatures satisfying the condition are removed from $S$.

The removal steps are followed by addition steps based on the forward stepwise algorithm23 (Algorithm 1). Specifically, each of the previously removed reference signatures is added back in turn to $S$ and the new signature set, $\hat{S}$, is fit to the sample $v$ by applying equation (2). The decrease in the relative error due to adding the $l$th signature to $S$ is calculated as $\epsilon_l = \epsilon_{\min} - \frac{\lVert v - \hat{S} a \rVert_2^2}{\lVert v \rVert_2^2}$. The signature with the maximum relative decrease in error is added back to the signature set, $S$, provided that the decrease is more than a specific threshold (default value of 0.05). After the addition of the signature with the largest relative error decrease, the minimum relative error, $\epsilon_{\min}$, and the set of signatures, $S$, are updated to reflect this addition. The addition step is repeated until all signatures satisfying the condition are added back to $S$. Lastly, the addition and removal steps are repeated until convergence, where no signature is added or removed from the list of signatures (Algorithm 1).

In addition to quantifying the activity of each mutational signature, SigProfilerAssignment also assigns known signatures to individual mutations (Fig. 1b) based on their specific mutational context:

$$p_{kt} = \frac{s_{kt}\, a_t}{[S a]_k} \tag{3}$$

where $p_{kt}$ represents the probability of a mutation corresponding to the $k$th letter of the alphabet $\Xi$ being caused by the $t$th signature in the sample; $s_{kt}$ is the probability of the $t$th signature to cause a mutation corresponding to the $k$th letter of the alphabet $\Xi$; $a_t$ is the number of mutations attributed to the $t$th mutational signature; and $[S a]_k$ is the value of the $k$th element of the vector obtained by the matrix multiplication of the signature matrix, $S$, and the derived signature activities, $a$.
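Equation (3) can be illustrated with a short Python sketch that converts a signature matrix and an activity vector into per-mutation-type probabilities. The matrix and activities below are hypothetical toy values, and returning zeros for a mutation type whose reconstructed count is zero is an assumption made here to avoid division by zero; it is not taken from the tool.

```python
def mutation_probabilities(S, a):
    """Return p[k][t] = S[k][t] * a[t] / [S a]_k for every mutation type k.

    S is a list of rows (one per mutation type); a holds one activity per
    signature. Rows whose reconstructed count [S a]_k is zero are returned
    as all zeros (an assumption of this sketch).
    """
    probs = []
    for row in S:
        weighted = [s_kt * a_t for s_kt, a_t in zip(row, a)]
        total = sum(weighted)                      # this is [S a]_k
        probs.append([w / total if total > 0 else 0.0 for w in weighted])
    return probs


# Hypothetical toy input: 3 mutation types, 2 signatures.
S = [[0.50, 0.125],
     [0.25, 0.125],
     [0.25, 0.750]]
a = [40, 64]

p = mutation_probabilities(S, a)
for k, row in enumerate(p):
    print(k, [round(x, 3) for x in row])
```

Each row of the result sums to one, so a mutation of a given type is distributed across the signatures in proportion to their expected contribution to that mutation type.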

To evaluate the performance of SigProfilerAssignment and four other commonly used tools for refitting mutational signatures5,17–21, we performed a comparative benchmarking using a previously generated independent synthetic dataset16 (Fig. 2). The dataset encompasses the SBS patterns of 2,700 simulated cancer genomes, corresponding to 300 tumors from nine different cancer types, generated using 21 different COSMIC reference signatures. To emulate a typical refitting of mutational signatures, the complete set of 79 COSMICv3.3 SBS signatures was used as input. The mutational signature activities obtained by each tool were compared against the ground truth activities used to synthetically generate these samples. Three different levels of random noise (0%, 5%, and 10%) were tested to assess the stability of the different algorithms in a real biological context. To evaluate the accuracy of the signature refitting, we calculated precision, sensitivity, and F1 score (Methods). In addition, we also examined the runtime and memory utilization of each tool.

Figure 2. Benchmarking of SigProfilerAssignment and four other tools for assigning mutational signatures.


Each tool was evaluated using 2,700 synthetic cancer genomes generated using 21 different COSMIC reference mutational signatures. All COSMICv3.3 signatures were used as the input set of known mutational signatures. a) Three different levels of non-systematic random noise (0%, 5%, and 10%) were used to evaluate the precision (x-axes), sensitivity (y-axes), and F1 scores (harmonic mean of precision and sensitivity; red dotted lines) of each tool. b) Computational benchmarking based on CPU elapsed time (x-axis; log-scaled) and maximum memory usage (y-axis) for each tool.

Our synthetic benchmarking revealed that SigProfilerAssignment outperforms all other approaches for the examined noise levels (Fig. 2a). For 10% random noise, only SigProfilerAssignment obtained an F1 score >0.90. In all cases, SigProfilerAssignment exhibited a high precision while showing an improved sensitivity compared to other approaches (Fig. 2a). In terms of computational performance, SigProfilerAssignment processed the 2,700 samples within 9.6 minutes (0.21 seconds per sample). Only the standard mode of MutationalPatterns generated results substantially faster (Fig. 2b). However, MutationalPatterns’ standard mode exhibited sub-optimal performance, with a significant drop in precision for all noise levels, likely due to overfitting of the input data (Fig. 2a)19. This issue has been addressed in the most recent version of MutationalPatterns with the addition of a strict mode18, albeit with a significant computational performance cost (Fig. 2b). Other approaches limit overfitting by implementing different penalties based on the L1 error (viz., sigLASSO)20 or the sum-squared error (viz., deconstructSigs)17, as well as post-hoc filters based on the percentage of the total number of mutations attributed to a given signature (viz., deconstructSigs and SignatureToolsLib)5,17. No significant memory requirements were observed for any of the tools (Fig. 2b).

Assigning mutational signatures to individual samples provides an opportunity to identify the processes responsible for somatic mutations on a sample-by-sample basis. Considering our synthetic benchmarking, SigProfilerAssignment stands out as the most precise and sensitive tool while maintaining high computational performance and bringing novel capabilities. To the best of our knowledge, SigProfilerAssignment represents the first computational tool for assigning signature probabilities to individual mutations, which can allow uncovering the mutational processes responsible for specific driver genomic alterations leading to tumor evolution. SigProfilerAssignment is also the first tool that supports assignment of the recently developed copy number signatures25, which are good predictors of clinical survival25,26.

In summary, SigProfilerAssignment provides a novel computational package and an accessible online interface to accurately assign known mutational signatures to an individual cancer and individual somatic mutations, thus enabling users to ascertain the mutational processes operative in a cancer genome.

Algorithm 1:

Assigning mutational signatures to samples with SigProfilerAssignment

Input: $v \in \mathbb{N}_{+}^{\xi \times 1}$ (a vector corresponding to a set of mutations in a sample) and $S \in \mathbb{R}_{+}^{\xi \times n}$ (a matrix corresponding to a set of n known mutational signatures)
Output: $a \in \mathbb{N}_{+}^{n \times 1}$ (the vector reflecting the activities of the n known signatures in sample v)

ϵmin, a = calcNNLS(v, S)
Sall = S
FLAG = True
While FLAG = True:
    ϵmin, S = removeSignatures(v, S, ϵmin)
    ϵmin, S = addSignatures(v, Sall, S, ϵmin)
    Set FLAG = False if S remains constant and there is no addition or removal of signatures
END While
ϵmin, a = calcNNLS(v, S)
Return a

FUNCTION removeSignatures(v, S, ϵmin)
    While True:
        For j in 1 to size(S, 2) do  // loop from 1 to the total number of signatures in S
            Ŝ = S[:, ≠j]  // remove the jth signature from S
            ϵ[j], a_j = calcNNLS(v, Ŝ)
        END For
        minIndex, minValue = min(ϵ)  // find the signature set with least relative error
        If (minValue − ϵmin ≤ 0.01)
            S = S[:, ≠minIndex]  // drop the signature whose removal costs the least
            ϵmin = minValue
        Else
            Return ϵmin, S
        END If
    END While
END removeSignatures

FUNCTION addSignatures(v, Sall, S, ϵmin)
    While True:
        For p in 1 to size(Sall, 2) do  // loop over the signatures in Sall that are not already in S
            Ŝ = [S, Sall[:, p]]  // add the pth signature from Sall
            ϵ[p], a_p = calcNNLS(v, Ŝ)
        END For
        minIndex, minValue = min(ϵ)  // find the signature set with least relative error
        If (ϵmin − minValue ≥ 0.05)
            S = [S, Sall[:, minIndex]]
            ϵmin = minValue
        Else
            Return ϵmin, S
        END If
    END While
END addSignatures

FUNCTION calcNNLS(v, S)
    a = nnls(S, v)  // calculating NNLS with the Lawson-Hanson method
    ϵ = ‖v − S a‖₂² / ‖v‖₂²  // computing relative error
    Return ϵ, a
END calcNNLS
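The logic of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the SigProfilerAssignment implementation: the NNLS step uses a simple cyclic coordinate descent as a stand-in for the Lawson-Hanson method, activities are left as floats rather than rounded to integer counts, removal stops when a single signature remains, and `max_rounds` is a safety bound added here. All function names, the three toy signatures, and the counts are hypothetical.

```python
def calc_nnls(v, columns, sweeps=500):
    """Fit v with the given signature columns subject to a >= 0.

    Stand-in solver (cyclic coordinate descent), not Lawson-Hanson.
    Returns (relative error, activities).
    """
    a = [0.0] * len(columns)
    resid = list(v)                                  # residual v - S a
    norms = [sum(x * x for x in col) for col in columns]
    for _ in range(sweeps):
        for t, col in enumerate(columns):
            if norms[t] == 0:
                continue
            grad = sum(x * r for x, r in zip(col, resid))
            new = max(0.0, a[t] + grad / norms[t])   # clip at zero
            step = new - a[t]
            if step:
                resid = [r - x * step for r, x in zip(resid, col)]
                a[t] = new
    eps = sum(r * r for r in resid) / sum(x * x for x in v)
    return eps, a


def fit(v, sigs, names):
    """Relative error and activities when only `names` are allowed."""
    return calc_nnls(v, [sigs[n] for n in names])


def remove_signatures(v, sigs, active, eps_min, threshold=0.01):
    """Backward step: drop signatures whose removal barely raises the error."""
    active = list(active)
    while len(active) > 1:
        trials = [(fit(v, sigs, [n for n in active if n != name])[0], name)
                  for name in active]
        best_eps, name = min(trials)
        if best_eps - eps_min > threshold:
            break
        active.remove(name)
        eps_min = best_eps
    return eps_min, active


def add_signatures(v, sigs, active, eps_min, threshold=0.05):
    """Forward step: re-add signatures that lower the error substantially."""
    active = list(active)
    while True:
        trials = [(fit(v, sigs, active + [name])[0], name)
                  for name in sigs if name not in active]
        if not trials:
            break
        best_eps, name = min(trials)
        if eps_min - best_eps < threshold:
            break
        active.append(name)
        eps_min = best_eps
    return eps_min, active


def assign(v, sigs, max_rounds=100):
    """Alternate removal and addition until the signature set stops changing."""
    active = list(sigs)
    eps_min, _ = fit(v, sigs, active)
    for _ in range(max_rounds):
        before = set(active)
        eps_min, active = remove_signatures(v, sigs, active, eps_min)
        eps_min, active = add_signatures(v, sigs, active, eps_min)
        if set(active) == before:
            break
    _, a = fit(v, sigs, active)
    return dict(zip(active, a))


# Hypothetical toy input: 4 mutation types, 3 signatures; v = 80*A + 40*B.
sigs = {"A": [0.75, 0.25, 0.0, 0.0],
        "B": [0.0, 0.0, 0.25, 0.75],
        "C": [0.25, 0.25, 0.25, 0.25]}
v = [60, 20, 10, 30]
result = assign(v, sigs)
print(result)
```

In this toy case the flat signature C is not needed to explain the sample, so the removal step discards it and the returned activities cover only A and B.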

ONLINE METHODS

Distribution and Usage

SigProfilerAssignment is distributed as a Python and R package (https://github.com/AlexandrovLab/SigProfilerAssignment and https://github.com/AlexandrovLab/SigProfilerAssignmentR) with support for most operating systems and extensive documentation at https://osf.io/mz79v/wiki/home/?view_only=5998aee28bd14ef6bda9189184a984bd. In addition, a user-friendly online interface is provided as part of the COSMIC Mutational Signatures website10 at https://cancer.sanger.ac.uk/signatures/assignment/. For compliance with EU and UK specific privacy regulations, the COSMIC website requires free registration prior to using SigProfilerAssignment. This ensures that all uploaded user data are maintained privately and purged properly.

Input data for both desktop and online versions can be provided as mutation calling and segmentation files, depending on the variant class, and are processed internally by SigProfilerMatrixGenerator22,27. The tool supports common formats for SBS, DBS, and ID somatic mutations, including the Variant Call Format (VCF), the Mutation Annotation Format (MAF), and simple text files. Multi-sample segmentation files obtained from ASCAT28, ABSOLUTE29, Sequenza30, FACETS31, Battenberg28, or PURPLE32 are supported for analysis of copy number signatures. In addition, SigProfilerAssignment can use standard mutational matrices, where rows correspond to mutational channels and columns to samples, extracted from the SigProfiler suite of tools16,22,33. Different sequencing assays (whole genome sequencing, whole exome sequencing, and targeted sequencing), species (human, mouse, and rat), genome builds (GRCh37/38, mm9/10, and rn6), and signatures (default COSMICv3.3 (Ref. 10), prior COSMIC versions, and custom signature databases) are supported.

The main output of SigProfilerAssignment includes the activity of each known mutational signature for each of the supplied samples, the reconstruction of the original dataset, and the probability of each individual mutation being caused by a specific signature. The latter is not provided when the input file is a mutational vector or mutational matrix as this input format lacks information about individual somatic mutations. Signature activities correspond to the specific numbers of mutations from the original catalog caused by a particular mutational process. Considering these activities, as well as the provided set of known mutational signatures, a reconstruction of the original mutational catalog for each sample is derived. Different accuracy metrics for this reconstruction are outputted by SigProfilerAssignment, including cosine similarity, Kullback–Leibler divergence, Pearson correlation, L1 relative error, and L2 relative error.
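As an illustration of the reconstruction accuracy metrics, the sketch below computes the cosine similarity and the L1 and L2 relative errors between an original and a reconstructed mutational catalog. The relative errors are defined here as the norm of the residual divided by the norm of the original catalog; this definition is an assumption of the sketch, as are the function names and the toy counts, and the exact formulas used by the tool may differ.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two mutational catalogs."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def relative_error(original, reconstructed, order=2):
    """Norm of (original - reconstructed) over norm of original (order 1 or 2)."""
    diffs = [abs(o - r) for o, r in zip(original, reconstructed)]
    if order == 1:
        return sum(diffs) / sum(abs(o) for o in original)
    residual_norm = math.sqrt(sum(d * d for d in diffs))
    return residual_norm / math.sqrt(sum(o * o for o in original))


# Hypothetical toy catalogs with three mutation types.
original = [28, 18, 58]
reconstructed = [30, 16, 58]

print(cosine_similarity(original, reconstructed))
print(relative_error(original, reconstructed, order=1))
print(relative_error(original, reconstructed, order=2))
```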

The signature assignment results are summarized using three independent visualizations: (i) a bar plot depicting the activities of all mutational signatures within a sample; (ii) a tumor mutational burden (TMB) signature plot showing the activities per mutational signature; and (iii) an individual reconstruction plot per sample, which includes the mutational profiles for both the original and the reconstructed input sample, different accuracy metrics, and the mutational profiles for each of the known mutational signatures assigned to that sample. For the online version of the tool, an interactive heatmap plot, including the signatures’ activities and the samples’ reconstruction accuracies is also provided. Raw data files containing activities, reconstruction metrics, and signature probabilities for individual mutations are generated by the desktop tool and can be downloaded from the online version.

Benchmarking of bioinformatics tools for refitting known mutational signatures

To evaluate the performance of tools for refitting known mutational signatures, we used a standard set of evaluation metrics and compared SigProfilerAssignment with another four commonly used approaches: deconstructSigs17, MutationalPatterns18,19, sigLASSO20, and SignatureToolsLib5,21. Specifically, each tool was applied to 2,700 previously simulated cancer genomes16, corresponding to 300 simulated tumors from nine different cancer types, including: bladder transitional cell carcinoma, esophageal adenocarcinoma, breast adenocarcinoma, lung squamous cell carcinoma, renal cell carcinoma, ovarian adenocarcinoma, osteosarcoma, cervical adenocarcinoma, and stomach adenocarcinoma. The cancer genomes of these samples were simulated using 21 different COSMIC reference signatures. To emulate a typical refitting of mutational signatures, each tool was applied by utilizing the complete set of 79 COSMICv3.3 SBS signatures. After assigning the signatures, the assignment of each signature to each sample was classified as either a true positive (TP), false positive (FP), or false negative (FN) result. A known signature was considered TP if at least one mutation was assigned to the signature by a particular tool and the ground truth activity of the signature was greater than zero. In contrast, a signature was classified as FP when it was assigned by a tool, but the ground truth activity was zero. Lastly, FN results were signatures with ground truth activities above zero that were not assigned any somatic mutation. These standard metrics allowed calculating the precision, sensitivity, and F1 score of each tool per sample, defined as:

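$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$F_1\ \text{score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$$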
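The per-sample classification and the three metrics above can be sketched in Python as follows. The function name, the example activities, and the convention of returning 0 when a denominator is zero are assumptions made for illustration; the signature labels are used only as dictionary keys.

```python
def refit_metrics(assigned, truth):
    """Return (precision, sensitivity, F1) for one sample.

    `assigned` and `truth` map signature names to numbers of mutations.
    A signature counts as assigned when at least one mutation is attributed
    to it, and as truly active when its ground truth activity is above zero.
    """
    names = set(assigned) | set(truth)
    tp = sum(1 for n in names if assigned.get(n, 0) >= 1 and truth.get(n, 0) > 0)
    fp = sum(1 for n in names if assigned.get(n, 0) >= 1 and truth.get(n, 0) == 0)
    fn = sum(1 for n in names if assigned.get(n, 0) < 1 and truth.get(n, 0) > 0)
    # Zero denominators are mapped to 0.0 (an assumption of this sketch).
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    total = precision + sensitivity
    f1 = 2 * precision * sensitivity / total if total else 0.0
    return precision, sensitivity, f1


# Hypothetical sample: three truly active signatures, two of them recovered.
truth = {"SBS1": 100, "SBS5": 200, "SBS13": 50}
assigned = {"SBS1": 90, "SBS5": 260, "SBS13": 0}

precision, sensitivity, f1 = refit_metrics(assigned, truth)
print(precision, sensitivity, f1)
```

Here two signatures are true positives and one is a false negative, so precision is 1, sensitivity is 2/3, and the F1 score is 0.8. Averaging these values over all samples of a noise level gives the summary figures described next.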

These metrics were calculated for each synthetically generated sample and, subsequently, averaged to obtain a final accuracy value for each random noise level (0%, 5%, and 10%). To benchmark the computational performance of the different bioinformatics tools, their CPU elapsed time and peak memory usage were monitored and averaged for the three noise levels.

SigProfilerAssignment v0.0.28 was run using default parameters. deconstructSigs17 v1.8.0 was used with default parameters as indicated in https://github.com/raerose01/deconstructSigs/. MutationalPatterns18 v3.0.1 was run with default parameters independently using its standard and strict modes, corresponding to the fit_to_signatures and fit_to_signatures_strict functions, respectively. The max_delta parameter was fixed to a default value of 0.004 for the strict mode, according to the authors’ instructions at https://bioconductor.org/packages/release/bioc/vignettes/MutationalPatterns/inst/doc/Introduction_to_MutationalPatterns.html. sigLASSO20 v1.1 was used with default parameters (no priors) following the instructions at https://github.com/gersteinlab/siglasso, albeit avoiding the generation of plots for the comparison of the computational performance. SignatureToolsLib5 v2.1.2 was run with global signatures using the Fit function and default parameters as indicated at https://github.com/Nik-Zainal-Group/signature.tools.lib.

ACKNOWLEDGEMENTS

This work was supported by Cancer Research UK Grand Challenge Award C98/A24032, as well as US National Institutes of Health grants R01ES030993-01A, R01ES032547, and R01CA269919, and a Packard Fellowship for Science and Engineering to LBA. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The computational analyses reported in this manuscript have utilized the Triton Shared Computing Cluster at the San Diego Supercomputer Center of UC San Diego.

Footnotes

COMPETING INTEREST

LBA is a compensated consultant and has equity interest in io9, LLC and Genome Insight. His spouse is an employee of Biotheranostics, Inc. LBA is an inventor of a US Patent 10,776,718 for source identification by non-negative matrix factorization, and he also declares U.S. provisional applications with serial numbers: 63/289,601; 63/269,033; 63/366,392; 63/412,835; 63/483,237; and 63/492,348. All other authors declare no competing interests.

CODE AVAILABILITY

SigProfilerAssignment is developed as a Python package and it is available under a permissive BSD 2-clause license at https://github.com/AlexandrovLab/SigProfilerAssignment and https://pypi.org/project/SigProfilerAssignment/. An R wrapper is also provided using the same license at https://github.com/AlexandrovLab/SigProfilerAssignmentR. SigProfilerAssignment provides support for most operating systems, including Windows, macOS, and Linux-based systems. An online version of the tool, requiring a free registration, is provided as part of the COSMIC Mutational Signatures website at https://cancer.sanger.ac.uk/signatures/assignment/.

DATA AVAILABILITY

All synthetic benchmarking data used in this article are available on FigShare at https://doi.org/10.6084/m9.figshare.20409430 and were originally generated as part of Ref. 16. They are publicly available under the Creative Commons Attribution 4.0 International license.

REFERENCES

1. Stratton M.R., Campbell P.J. & Futreal P.A. The cancer genome. Nature 458, 719–24 (2009).
2. Martincorena I. & Campbell P.J. Somatic mutation in cancer and normal cells. Science 349, 1483–9 (2015).
3. Alexandrov L.B., Nik-Zainal S., Wedge D.C., Campbell P.J. & Stratton M.R. Deciphering signatures of mutational processes operative in human cancer. Cell Rep 3, 246–59 (2013).
4. Alexandrov L.B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
5. Degasperi A. et al. Substitution mutational signatures in whole-genome–sequenced cancers in the UK population. Science 376 (2022).
6. Lee-Six H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).
7. Lawson A.R.J. et al. Extensive heterogeneity in somatic mutation and selection in the human bladder. Science 370, 75–82 (2020).
8. Olafsson S. et al. Somatic Evolution in Non-neoplastic IBD-Affected Colon. Cell 182, 672–684 (2020).
9. Yoshida K. et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 578, 266–272 (2020).
10. Tate J.G. et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res 47, D941–D947 (2019).
11. Georgeson P. et al. Evaluating the utility of tumour mutational signatures for identifying hereditary colorectal cancer and polyposis syndrome carriers. Gut 70, 2138–2149 (2021).
12. Grolleman J.E. et al. Mutational signature analysis reveals NTHL1 deficiency to cause a multi-tumor phenotype. Cancer Cell 35, 256–266 (2019).
13. Georgeson P. et al. Identifying colorectal cancer caused by biallelic MUTYH pathogenic variants using tumor mutational signatures. Nat Commun 13, 3254 (2022).
14. Davies H. et al. HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures. Nat Med 23, 517–525 (2017).
15. Levatić J., Salvadores M., Fuster-Tormo F. & Supek F. Mutational signatures are markers of drug sensitivity of cancer cells. Nat Commun 13, 2926 (2022).
16. Islam S.M.A. et al. Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Cell Genom 2, 100179 (2022).
17. Rosenthal R., McGranahan N., Herrero J., Taylor B.S. & Swanton C. DeconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol 17, 31 (2016).
18. Manders F. et al. MutationalPatterns: the one stop shop for the analysis of mutational processes. BMC Genomics 23, 134 (2022).
19. Blokzijl F., Janssen R., van Boxtel R. & Cuppen E. MutationalPatterns: comprehensive genome-wide analysis of mutational processes. Genome Med 10, 33 (2018).
20. Li S., Crawford F.W. & Gerstein M.B. Using sigLASSO to optimize cancer mutation signatures jointly with sampling likelihood. Nat Commun 11, 3575 (2020).
21. Degasperi A. et al. A practical framework and online tool for mutational signature analyses show inter-tissue variation and driver dependencies. Nat Cancer 1, 249–263 (2020).
22. Bergstrom E.N. et al. SigProfilerMatrixGenerator: a tool for visualizing and exploring patterns of small mutational events. BMC Genomics 20, 685 (2019).
23. Hastie T., Tibshirani R. & Friedman J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, 2009).
24. Lawson C.L. & Hanson R.J. Solving Least Squares Problems. Journal of the American Statistical Association 72, 930–931 (1977).
25. Steele C.D. et al. Signatures of copy number alterations in human cancer. Nature 606, 984–991 (2022).
26. Drews R.M. et al. A pan-cancer compendium of chromosomal instability. Nature 606, 976–983 (2022).
27. Khandekar A. et al. Visualizing and exploring patterns of large mutational events with SigProfilerMatrixGenerator. bioRxiv, 2023.02.03.527015 (2023).
28. Van Loo P. et al. Allele-specific copy number analysis of tumors. Proc Natl Acad Sci U S A 107, 16910–5 (2010).
29. Carter S.L. et al. Absolute quantification of somatic DNA alterations in human cancer. Nat Biotechnol 30, 413–21 (2012).
30. Favero F. et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann Oncol 26, 64–70 (2015).
31. Shen R. & Seshan V.E. FACETS: allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing. Nucleic Acids Res 44, e131 (2016).
32. Shale C. et al. Unscrambling cancer genomes via integrated analysis of structural variation and copy number. Cell Genom 2, 100112 (2022).
33. Bergstrom E.N., Kundu M., Tbeileh N. & Alexandrov L.B. Examining clustered somatic mutations with SigProfilerClusters. Bioinformatics 38, 3470–3473 (2022).



Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
