2024 Feb 15;56(3):541–552. doi: 10.1038/s41588-024-01659-0

Fig. 5. In silico validation quantifies the consistency between signature assignments and data.

a, Diagram of the data-driven approach for parameter optimization and in silico validation of post-processing steps after de novo signature discovery. See the main text for details. b, Signature assignments by MuSiCal (upper, H_s^MuSiCal) and the PCAWG Consortium (lower, H_s^PCAWG; ref. 5) for the PCAWG glioblastoma dataset, showing discrepancies in both the assigned signatures and the corresponding exposures. For simplicity, samples with MMRD or with hypermutation caused by temozolomide treatment are removed. c, Four de novo signatures discovered from the original data (left, W_data) are compared with those discovered from simulated data based on MuSiCal (middle, W_simul^MuSiCal) and PCAWG (right, W_simul^PCAWG) assignments. d, Cosine distances between W_data and W_simul^MuSiCal (left) are compared with those between W_data and W_simul^PCAWG (right), showing that MuSiCal assignments achieve better consistency between simulation and data in terms of de novo signatures. e, NNLS weights of de novo signatures for both simulations and data. De novo signatures from data (W_data) and from MuSiCal-derived simulations (W_simul^MuSiCal) are matched to the MuSiCal-assigned signatures; that is, SBS1, 2, 5, 8, 12, 13, 30 and 31. De novo signatures from PCAWG-derived simulations (W_simul^PCAWG) are matched to the PCAWG-assigned signatures; that is, SBS1, 5, 30 and 40. Standard NNLS is used and the obtained weights are plotted. f, Differences between the NNLS weights in e for the original data and those for MuSiCal- and PCAWG-derived simulations. Positive and negative differences are plotted separately, in opposite directions and cumulatively. The results indicate over-assignment of SBS40 and under-assignment of SBS8 by PCAWG, as well as much improved signature assignment by MuSiCal.
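
The comparisons in panels d to f can be illustrated with a minimal sketch in Python. It is not the MuSiCal implementation. The four-channel spectra, the names `sig_a` and `sig_b`, and all helper functions are hypothetical, and real SBS signatures have 96 channels. The NNLS solver is a simple cyclic coordinate-descent stand-in for a standard library routine, and the sign convention for the differences (simulation minus data) is an assumption, since the caption does not state it.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity between two spectra (as in panel d)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def nnls_coordinate_descent(signatures, target, n_iter=500):
    """Minimize ||sum_j x_j * signatures[j] - target||^2 subject to x_j >= 0.

    Cyclic coordinate descent; a stand-in for a library NNLS solver.
    """
    x = [0.0] * len(signatures)
    residual = list(target)  # target minus current fit (fit starts at zero)
    for _ in range(n_iter):
        for j, sig in enumerate(signatures):
            step = sum(s * r for s, r in zip(sig, residual)) / sum(s * s for s in sig)
            new_xj = max(0.0, x[j] + step)  # clip to keep the weight non-negative
            delta = new_xj - x[j]
            residual = [r - delta * s for r, s in zip(residual, sig)]
            x[j] = new_xj
    return x

def split_differences(weights_data, weights_simul):
    """Sum positive and negative (simulation - data) differences separately."""
    diffs = [ws - wd for wd, ws in zip(weights_data, weights_simul)]
    positive = sum(d for d in diffs if d > 0)
    negative = sum(d for d in diffs if d < 0)
    return positive, negative

# Hypothetical catalog signatures (toy 4-channel spectra, each summing to 1).
sig_a = [0.7, 0.1, 0.1, 0.1]
sig_b = [0.1, 0.1, 0.1, 0.7]
catalog = [sig_a, sig_b]

# Toy de novo signatures: one "from data", one "from simulation".
denovo_data = [0.6 * a + 0.4 * b for a, b in zip(sig_a, sig_b)]
denovo_simul = [0.8 * a + 0.2 * b for a, b in zip(sig_a, sig_b)]

distance = cosine_distance(denovo_data, denovo_simul)
weights_data = nnls_coordinate_descent(catalog, denovo_data)
weights_simul = nnls_coordinate_descent(catalog, denovo_simul)
positive, negative = split_differences(weights_data, weights_simul)

print("cosine distance:", distance)
print("NNLS weights (data):", weights_data)
print("NNLS weights (simulation):", weights_simul)
print("positive / negative differences:", positive, negative)
```

In this toy setup, a smaller cosine distance and smaller positive and negative totals would both indicate that the simulated data reproduce the de novo signatures of the original data more closely, which is the sense in which the figure compares the two sets of assignments.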