What the VAF? A guide to the interpretation of variant allele fraction, percent mosaicism, and copy number in cancer

Adam C Smith; Hubert Tsui; Sila Usta; Jose-Mario Capo-Chichi

doi:10.1186/s13039-025-00718-3

2025 Jul 8;18:13. doi: 10.1186/s13039-025-00718-3

PMCID: PMC12239336 PMID: 40629455

Abstract

The techniques used to identify structural variants (SVs) and copy number variants (CNVs) in genomes have seen significant development in the last decade. With the growing use of technologies including chromosomal microarray, genome sequencing and genome mapping in clinical cytogenetics laboratories, reporting the frequency of SVs and CNVs has increased the complexity of genomic results. In conventional testing (e.g. karyotype or FISH), individual cells are analyzed and abnormalities are reported directly as a proportion of the analyzed cells. For bulk genome assays, by contrast, structural and sequence changes are often reported as variant allele frequencies and fractional copy number states. The International System for Cytogenomic Nomenclature (ISCN) recommends converting these values into a “proportion of the sample”, which requires different calculations and underlying assumptions based on the data type. This review illustrates how the different methods of interpreting and reporting data are performed and identifies challenges in the conversion of these values to a proportion of the sample. We stress the need for careful interpretation of data with consideration for factors that may alter how proportions are reported, including overlapping SVs and CNVs or regions with acquired homozygosity. We also demonstrate, using validation data of SVs and CNVs tested by multiple techniques, how results are largely consistent across methodologies but can show dramatic differences in rare circumstances. This review focuses on illustrating many of the challenges with aligning reporting using different techniques and their underlying assumptions. As hematologic disease classifications start to incorporate numeric limits (e.g. VAF defining thresholds), it is important for laboratory geneticists, pathologists and clinicians to appreciate the differences in methodologies, potential pitfalls and the nuances when comparing bulk genome analyses to the more conventional single cell techniques.

Introduction

Detecting and quantifying variation in the genome is a core activity of modern genetic diagnostic laboratories. The detection of structural variants (SVs) in the genome has been achieved by a variety of techniques over the years. However, the only technique, even today, that can report on the complete structure of the chromosomes at the single cell level is the karyogram (an image of the metaphase chromosomes of a single nucleus). A karyotype is the written summary of the observed clonal structural and numerical changes in a collection of karyograms (usually 20). Chromosomal microarray (CMA) testing is also in widespread use for the clinical detection of copy number variants (CNVs) in the genome. However, CMA analysis is limited in that it does not resolve the underlying mechanism of the observed CNV. Furthermore, CMA cannot detect balanced structural rearrangements and only interrogates regions of the genome where unique sequence probes can be generated to avoid cross-hybridization with repetitive sequences. The development of massively parallel sequencing technologies, now collectively known as ‘next generation sequencing’ (NGS), has revolutionized the detection of small sequence variants (e.g. ranging from single to hundreds of base pairs). Long-read genome sequencing (lrGS) techniques have expanded the resolution of NGS to enable the detection of larger genetic changes (e.g. up to a thousand base pairs). In fact, the use of lrGS technologies with other mapping and single strand sequencing techniques made possible the first telomere-to-telomere (T2T) reference sequence, closing many of the gaps present in the next most recent genome assembly, GRCh38 [1]. Despite this significant achievement in reading through and assembling highly repetitive DNA sequences (e.g. rRNA sequences, centromere satellite DNA), the ability of lrGS to detect larger and/or complex SVs (> 1 kb in size) still faces technical challenges. Low sensitivity/specificity and a high false positive rate hinder its widespread use to detect large SVs, particularly in a clinical setting [2–4].

Genome mapping techniques, such as Optical Genome Mapping (OGM), were used in conjunction with sequencing techniques to assemble the T2T reference genome and have been shown to have good concordance with conventional techniques (e.g. karyotype) for the detection of SVs, with high sensitivity and relatively low false positive rates, making them amenable to clinical use [5–8].

Nomenclature standardization efforts carried out by the Human Genome Variation Society (HGVS) (https://hgvs-nomenclature.org/) and the International System for Cytogenomic Nomenclature (ISCN) (https://iscn.karger.com) committees have helped harmonize nomenclature for the description of structural and numerical variations using newer, genome-wide high-resolution techniques. However, each reporting system has certain nuances that pertain to the methodology applied. For example, only the ISCN nomenclature defines how the proportion of a given abnormality should be reported for SVs across multiple testing modalities. The ISCN nomenclature has been developed over many years with the goal of accurately describing structural and numerical changes of chromosomes. For this reason, and because of the single cell nature of conventional cytogenetic testing methods (i.e. karyotype and FISH), results have traditionally been reported as a proportion of affected cells (Fig. 1). This is intuitive and can be directly correlated with other clinicopathologic findings, such as the proportion of leukemic cells (e.g. blasts) within a sample as determined by morphological analysis and/or flow cytometry. In contrast, high resolution techniques to detect copy number and structural variation (CMA, OGM and NGS) are largely performed on bulk genome samples (mixtures of thousands to millions of cells). The data, depending on variant type (i.e. structural or numerical), may be reported as a variant allele fraction (VAF) or a copy number state (Fig. 1). Because they are not based on single cell analysis, these measures of genomic abnormalities cannot report the simple cell frequency that is used in conventional cytogenetic techniques such as karyotype and FISH. Therefore, for the purposes of reporting, abnormalities are described as their proportion within the sample. This has been recommended in an attempt to make reporting consistent across abnormality types and traditional testing modalities. It should be noted that while single cell (karyotype/FISH) and bulk genome results should be more or less concordant when performed on the same specimen, there are many issues that need to be considered when using bulk genome specimens (see section below on concordance between techniques). Regardless of the technology chosen to detect SVs, the nomenclature and interpretation of variants, especially across platforms and modalities, can be challenging. In addition, it is critical to consider that the cancer cell fraction of the submitted sample can potentially have the largest effect on the detection of any genomic abnormality. In hematologic malignancies, for example, background normal hematopoiesis, clonal hematopoiesis, clonal/subclonal hierarchies and the presence of more than one neoplastic process in the same specimen can make clinical interpretation of the “cancer cell fraction” a challenging endeavor. This review will attempt to clarify why the ISCN recommends the conversion of results to a “proportion of the sample” and will highlight many of the issues and challenges for users and stakeholders using high-resolution structural variation analysis on bulk genome samples.

Fig. 1 — Enumerating SVs and Copy Number Changes Based on Genomic Approach. Single cell-based technologies, such as karyotype and FISH (Cell Based Analysis Column), are single cell techniques that directly enumerate normal and abnormal nuclei. The fraction of abnormal cells is equivalent to the fractional proportion reported in the nomenclature. However, for bulk genome-based techniques this is not the case. For SVs reported with a variant allele fraction, the number of alleles (or reads/molecules) with an abnormality is the numerator over the total number of alleles (or reads/molecules). Since each diploid cell theoretically should have two alleles (for autosomes and individuals with two X chromosomes) the VAF value can (in most cases) be converted to a proportion of the sample by multiplying the VAF by 2 (see Table 3 for more details). CMA, Genome Mapping and NGS (coverage based) represent copy number data as an estimated copy number. Two copies are normal (assuming diploid), with 1.0 being a complete loss of one copy of the region with a copy number change or 3.0 being a complete gain. Fractional values in between whole numbers represent a copy number change that is not present in all cells. Again, this value is converted for reporting to represent a proportion of the sample as demonstrated in Tables 1 and 2. Created in BioRender. Usta, S. (2025) https://BioRender.com/py657pk

Classes of numeric abnormalities of chromosomes and copy number

This section will discuss the considerations for interpreting copy number abnormalities of different scales, from largest to smallest. Changes in genome ploidy refer to gains or losses of whole sets of chromosomes (or nearly whole sets). Aneuploidies refer to the gain(s) or loss(es) of one or more whole chromosomes within a nucleus. When chromosome aneuploidy reduces copy number below diploid, the situation is referred to as hypodiploidy (45 down to 35 chromosomes), and when aneuploidy increases chromosome number it is considered hyperdiploidy (47 to 57 chromosomes). For a full description of ploidy levels and naming conventions see Table 6 in the International System for Cytogenomic Nomenclature ([9], http://iscn.karger.com). Note that in certain diseases, such as Acute Myeloid Leukemia (AML), “hyperdiploid AML” is associated with very specific karyotypic findings: it is defined as the presence of at least three clonal trisomies, which may be accompanied by other structural and numerical chromosomal abnormalities [10], and not just a general hyperdiploid number of chromosomes. This avoids confusion with cases carrying a single aneuploidy, such as +8, which are technically hyperdiploid but are not associated with the good prognosis of cases with 3 or more chromosome gains without additional structural abnormalities. Finally, deletions and duplications refer to losses or gains of segments of chromosomes ranging from very small, submicroscopic segments to whole chromosome arms. Copy number, for bulk genomic techniques, is often computed by comparing the signal (i.e. probe strength, read depth) for any given data point against a reference sample or samples. Raw copy number data are often normalized (e.g. log₂ normalization) and then segmented (e.g. circular binary segmentation, wild binary segmentation), and a copy number estimate can be calculated across the informative regions. Since most human reference samples are diploid, these algorithms attempt to normalize the data to a diploid state, whatever the true ploidy of the sample is.
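
The relationship between a normalized log₂ ratio and an estimated copy number can be illustrated with a minimal Python sketch. It assumes a diploid reference and an idealized, noise-free signal; the function names are hypothetical and do not correspond to any vendor pipeline, and segmentation is not shown.

```python
import math


def copy_number_from_log2_ratio(log2_ratio, reference_copies=2.0):
    """Estimated copy number implied by log2(sample signal / reference signal).

    Assumption: the reference carries `reference_copies` copies of the region.
    """
    return reference_copies * 2.0 ** log2_ratio


def log2_ratio_from_copy_number(copy_number, reference_copies=2.0):
    """Inverse relation; undefined for a copy number of zero or below."""
    if copy_number <= 0:
        raise ValueError("copy number must be positive to take a logarithm")
    return math.log2(copy_number / reference_copies)


if __name__ == "__main__":
    # Illustrative segment means, not data from the article.
    for ratio in (-1.0, -0.62, 0.0, 0.58):
        estimate = copy_number_from_log2_ratio(ratio)
        print(f"log2 ratio {ratio:+.2f} -> estimated copy number {estimate:.2f}")
```

Because the ratio is taken against a diploid reference and centred on the sample's own average signal, a sample whose whole genome is uniformly triploid or tetraploid would still centre on a log₂ ratio of 0 in this sketch, which is one way to see why relative gains and losses are detected while absolute ploidy can be obscured.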

Ploidy

Numerical chromosome abnormalities can occur at varying scales. Most normal somatic cells have a diploid karyotype (46 chromosomes, 2n) made up of two sets of 23 chromosomes (i.e. 22 autosomes and a sex chromosome). However, in many tumours genome instability can lead to changes in ploidy. These changes can manifest as a catastrophic loss of chromosomes, leading to a near haploid state (~ 23 chromosomes, 1n), or as gains of whole sets of chromosomes resulting in triploidy (~ 69 chromosomes, 3n), tetraploidy (~ 92 chromosomes, 4n), etc. Genome instability can also generate changes in ploidy from clonal structural changes and/or losses or gains of chromosomes or chromosomal regions (deletions, duplications), leading to complex karyotypes/genomes. Detecting changes in ploidy by karyotype is relatively straightforward, as all visible chromosomes within a metaphase can be enumerated. For bulk genome techniques, however, this can become more complicated because copy number analysis algorithms normalize copy number data to a diploid state. This can mean that while relative gains and losses are detected, the actual ploidy level can be obscured. Ploidy can sometimes be inferred using bulk genome techniques. For example, when using single nucleotide polymorphism (SNP) data, the imbalance of allele frequencies in the SNP track or B-allele frequency may show a pattern consistent with more than the 3 allele states seen in a diploid sample (e.g. AA, AB, BB for 2n versus AAA, AAB, ABB, BBB for 3n), suggesting a change in ploidy. While karyotyping is generally the most straightforward technique for detecting ploidy in B-cell Acute Lymphoblastic Leukemia/Lymphoma (B-ALL), the pattern of allele frequencies can be quite helpful in differentiating hyperdiploidy from masked hypodiploidy. In the case of a hyperdiploid clone, the chromosomes with increased copy number should also show allele frequency distortion consistent with trisomy (AAA, AAB, ABB, BBB). However, in the case of masked hypodiploidy, loss of a homologous chromosome for several chromosome pairs results in a loss of heterozygosity for each chromosome that is reduced to one homologue. During the subsequent endoreduplication, the chromosome number is increased, leading to a mixture of chromosomes with 2 and 3 copies; however, the loss of heterozygosity (LOH) remains and can be seen as LOH for multiple chromosomes (see example in [11]). Nevertheless, in clinical scenarios where a straightforward explanation for the copy number and allele frequency distortion cannot be achieved with cytogenomic techniques, a limited karyotype study (e.g. 5 metaphases or counts only) may be useful. This is particularly important in disease entities where prognostic risk is affected by ploidy (e.g. B-ALL).
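
The allele-state patterns described above can be enumerated with a short Python sketch. It lists the idealized B-allele frequency clusters expected for a region present in a given number of copies, assuming a pure (non-mosaic) sample and no measurement noise; the function name and the `loh` flag are illustrative assumptions, not part of any analysis software.

```python
from fractions import Fraction


def expected_baf_clusters(total_copies, loh=False):
    """Map each possible genotype to its idealized B-allele frequency.

    total_copies: copies of the region in every cell (2 = disomy, 3 = trisomy).
    loh: if True, only one parental homologue is assumed to remain, so every
         SNP is homozygous and the heterozygous clusters disappear.
    """
    if total_copies < 1:
        raise ValueError("total_copies must be at least 1")
    b_counts = (0, total_copies) if loh else range(total_copies + 1)
    return {
        "A" * (total_copies - b) + "B" * b: Fraction(b, total_copies)
        for b in b_counts
    }


if __name__ == "__main__":
    for copies, loh in ((2, False), (3, False), (2, True)):
        clusters = expected_baf_clusters(copies, loh)
        summary = ", ".join(f"{g}={float(f):.2f}" for g, f in clusters.items())
        print(f"copies={copies} loh={loh}: {summary}")
```

In this idealized picture a trisomic chromosome in a hyperdiploid clone shows four clusters, whereas a two-copy chromosome that passed through a one-homologue state shows only the two homozygous clusters. In a real mosaic sample the clusters would shift toward the diploid positions in proportion to the normal cell content.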

Aneuploidy

Copy number abnormalities of whole chromosomes are easier to detect with bulk genome analysis when sufficient coverage is present. For example, OGM and shallow whole genome sequencing techniques approach the analytical sensitivity of karyotyping [12–14] and FISH assays [15], down to about 5–10% of the sample. Conversely, genome sequencing requires a very high read depth (> 200x), which is not routinely achieved in a clinical setting, to reach the same sensitivity, making this strategy less sensitive in practice [16].

Deletions and duplications

Detection of deletions and duplications is more straightforward using bulk genome techniques and a high concordance with conventional cytogenetic techniques is noted. Differences in the performance of these strategies can be observed depending on the size, breakpoints or resolution of the segment. For example, NGS panels typically target consensus coding DNA sequences and compared to OGM, will not ascertain a deletion or duplication extending outside of the targeted region (e.g. deep in the introns). Overall, CMA, OGM and NGS have a better resolution (depending on assay design) than FISH or karyotyping. The theoretical lower limit of karyotype resolution is approximately 5 Mb for high resolution banding (i.e. >600 bands). However, in practice, poor karyotype resolution (300–450 band level) is often seen in cancer specimens. Several factors such as the nature of the hematologic malignancy, the treatment regimen, the use of intercalating agents or the mitotic index can reduce the resolution of karyotypes from bone marrow specimens to an experimental resolution of 300–450 bands. In such conditions, the lower limit of karyotype resolution could be anywhere from 5 to 30 megabases. In comparison, using bulk genome techniques such as CMA, OGM and NGS that are not impacted by poor cell morphology, deletions and duplications can be detected down to hundreds of bases in size.

Bulk genome techniques rely on bioinformatic algorithms with various levels of stringency to detect SVs and CNVs of different size and complexity. For instance, using the recommended settings with OGM, smaller CNVs (e.g. <500 kb) are detected by the SV detection algorithm with a lower limit of detection of 5% VAF whereas larger CNVs detected by the copy number pipeline have a lower limit of detection at approximately 10% VAF. For SVs smaller than a whole chromosome arm, the CNV will likely be detected by both pipelines. In combining data from its SV and copy number detection pipelines, OGM is also able to deduce the plausible underlying mechanism of a CNV. Unlike OGM, CMA can detect a copy number change but is not able to provide the context in which such an abnormality might have occurred. As such, the structural context is relevant while analyzing a given abnormality as well as determining optimal approaches for follow-up or reflex testing. For instance, knowing whether a genomic segment is duplicated in tandem or inserted somewhere else in the genome can provide key information for interpretation and potential clinical impact.

Reporting copy number variants

The ISCN provides a system to describe karyotypically visible genome-wide structural and numerical chromosome changes. The ISCN also provides nomenclature recommendations for complementary platforms for SV and copy number analysis, including region-specific assays, CMA, OGM and sequencing-based technologies. Similarly, the HGVS also uses a hybrid nomenclature system with ISCN to describe large structural variation (https://hgvs-nomenclature.org/stable/recommendations/DNA/complex). The ISCN nomenclature for CMA, OGM and sequencing technologies recommends that all copy number and structural variation frequencies be converted to a “proportion of the sample” format, which is different from the direct cell-based enumeration that is reported for karyotype (albeit with potential culture/metaphase biases) and FISH assays. Therefore, depending on the technique and type of abnormality, the correct type of conversion needs to be applied.

For example, a karyotype may report a “monosomy 7” in 14 of 20 metaphases. It is therefore easy to ascertain that this abnormality is present in 70% of the cells analyzed. Copy number and numeric chromosome changes on bulk genome samples (e.g. microarray, OGM, NGS) are generally reported as an estimated copy number. In the case above, the estimated copy number for chromosome 7 would be 1.3, indicating that 70% of the sample has only one chromosome 7 (Table 1). For a single copy loss, each 0.1 decrement below 2.0 represents an increment of 10% in the proportion of the sample. The examples following Table 1 show how the monosomy 7 example (seen in 70% of cells) would be reported using ISCN nomenclature for each of five techniques.

Table 1.

Converting estimated copy number to proportion of the sample for autosomes and XX females

Estimated Copy Number	Interpretation	Proportion of the Sample
0	Nullisomy	100%
0.1	Mosaic Nullisomy	95%
0.2	Mosaic Nullisomy	90%
0.3	Mosaic Nullisomy	85%
0.4	Mosaic Nullisomy	80%
0.5	Mosaic Nullisomy	75%
0.6	Mosaic Nullisomy	70%
0.7	Mosaic Nullisomy	65%
0.8	Mosaic Nullisomy	60%
0.9	Mosaic Nullisomy	55%
1	Monosomy	100%
1.1	Mosaic Monosomy	90%
1.2	Mosaic Monosomy	80%
1.3	Mosaic Monosomy	70%
1.4	Mosaic Monosomy	60%
1.5	Mosaic Monosomy	50%
1.6	Mosaic Monosomy	40%
1.7	Mosaic Monosomy	30%
1.8	Mosaic Monosomy	20%
1.9	Mosaic Monosomy	10%
2	Normal Diploid	N/A
2.1	Mosaic Trisomy	10%
2.2	Mosaic Trisomy	20%
2.3	Mosaic Trisomy	30%
2.4	Mosaic Trisomy	40%
2.5	Mosaic Trisomy	50%
2.6	Mosaic Trisomy	60%
2.7	Mosaic Trisomy	70%
2.8	Mosaic Trisomy	80%
2.9	Mosaic Trisomy	90%
3	Trisomy	100%
3.1	Mosaic Tetrasomy	55%
3.2	Mosaic Tetrasomy	60%
3.3	Mosaic Tetrasomy	65%
3.4	Mosaic Tetrasomy	70%
3.5	Mosaic Tetrasomy	75%
3.6	Mosaic Tetrasomy	80%
3.7	Mosaic Tetrasomy	85%
3.8	Mosaic Tetrasomy	90%
3.9	Mosaic Tetrasomy	95%
4	Tetrasomy	100%

Karyotype: 45,XX,-7[14]/46,XX[6]

Interphase FISH: nuc ish(D7Z1,D7S522)x1[140/200]

Microarray: arr (7)x1[0.7]

OGM: ogm (7)x1[0.7]

Sequencing: seq (7)x1[0.7]

All the examples above indicate that loss of chromosome 7 is observed in about 70% of the sample.

Converting copy number abnormalities or aneuploidies to “proportion of the sample” can be done in a relatively straightforward fashion assuming that the chromosome in question is normally present in two copies (Table 1) or only one copy (Table 2). In Table 1, note the proportion changes when passing below an estimated copy number of 1 or above an estimated copy number of 3. In the former instance the assumption is that both chromosomes have been lost (nullisomy) in the abnormal cell line and in the latter the assumption is that two chromosomes have been gained (e.g. tetrasomy). However, since this is not a single cell analysis it is not possible to distinguish if there is a single clone with tetrasomy or two clones, one with trisomy and a subclone with tetrasomy. As is the case with all bulk genome assays, the loss of ‘single cellness’ requires some compromise with the nomenclature. Additionally, when reporting amplifications of regions, it is generally difficult to determine the proportion of the sample as the many additional copies of the amplified sequence are much greater than the normal diploid copy number making estimation of proportion of the sample challenging. In such cases reporting of these regions as amplified, without a proportion, should be acceptable. Correlation with other clinical findings may be of assistance when interpreting amplifications, such as cancer cell fraction or using the proportion of other observed genomic abnormalities when other information isn’t available. It is important to remember that a very highly amplified region may not appear as highly amplified in a sample with low cancer cell fraction and therefore these issues should be taken into consideration when reporting the presence of amplifications.
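
The conversions in Tables 1 and 2 can be written as a small Python function. This is a sketch that reproduces the table values under the same assumptions stated above (a single abnormal population, with nullisomy assumed below one copy and tetrasomy assumed above three copies); the function name and return format are illustrative.

```python
def proportion_from_copy_number(estimated_cn, normal_copies=2):
    """Return (interpretation, proportion of the sample).

    normal_copies=2 follows Table 1 (autosomes, X in XX individuals);
    normal_copies=1 follows Table 2 (X and Y in XY individuals).
    The proportion is None when the copy number equals the normal state.
    """
    if normal_copies == 2:
        if not 0 <= estimated_cn <= 4:
            raise ValueError("Table 1 covers estimated copy numbers 0 to 4")
        if estimated_cn == 2:
            return ("Normal Diploid", None)
        if estimated_cn < 1:
            # Below one copy: abnormal cells are assumed to have lost both copies.
            name, proportion = "Nullisomy", (2 - estimated_cn) / 2
        elif estimated_cn < 2:
            name, proportion = "Monosomy", 2 - estimated_cn
        elif estimated_cn <= 3:
            name, proportion = "Trisomy", estimated_cn - 2
        else:
            # Above three copies: abnormal cells are assumed to have gained two.
            name, proportion = "Tetrasomy", (estimated_cn - 2) / 2
    elif normal_copies == 1:
        if not 0 <= estimated_cn <= 2:
            raise ValueError("Table 2 covers estimated copy numbers 0 to 2")
        if estimated_cn == 1:
            return ("Normal", None)
        if estimated_cn < 1:
            name, proportion = "Loss of X or Y", 1 - estimated_cn
        else:
            name, proportion = "Gain of X or Y", estimated_cn - 1
    else:
        raise ValueError("normal_copies must be 1 or 2")

    proportion = round(proportion, 4)  # avoid floating-point noise such as 0.7000000001
    if proportion == 1:
        # Table 2 labels the complete loss of a single-copy chromosome as nullisomy.
        return ("Nullisomy" if estimated_cn == 0 else name, proportion)
    return ("Mosaic " + name, proportion)
```

For the monosomy 7 example, `proportion_from_copy_number(1.3)` gives a mosaic monosomy in 0.7 of the sample. The step in the formula at copy numbers 1 and 3 is where the assumed abnormal state changes, which is exactly the point at which a bulk assay cannot tell a single clone from a mixture of clones.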

Table 2.

Converting fractional copy number to proportion of the sample for XY chromosomes in males

Estimated Copy Number	Interpretation	Proportion of the Sample
0	Nullisomy	100%
0.1	Mosaic Loss of X or Y	90%
0.2	Mosaic Loss of X or Y	80%
0.3	Mosaic Loss of X or Y	70%
0.4	Mosaic Loss of X or Y	60%
0.5	Mosaic Loss of X or Y	50%
0.6	Mosaic Loss of X or Y	40%
0.7	Mosaic Loss of X or Y	30%
0.8	Mosaic Loss of X or Y	20%
0.9	Mosaic Loss of X or Y	10%
1	Normal	N/A
1.1	Mosaic Gain of X or Y	10%
1.2	Mosaic Gain of X or Y	20%
1.3	Mosaic Gain of X or Y	30%
1.4	Mosaic Gain of X or Y	40%
1.5	Mosaic Gain of X or Y	50%
1.6	Mosaic Gain of X or Y	60%
1.7	Mosaic Gain of X or Y	70%
1.8	Mosaic Gain of X or Y	80%
1.9	Mosaic Gain of X or Y	90%
2	Gain of X or Y	100%

Reporting considerations for structural abnormalities

Genome-wide SV detection was first made possible by karyotyping. SV and CNV detection by FISH is limited to targeted regions of interest; it is therefore not a whole genome detection method and will not be discussed further in detail. CMA is also unable to detect SVs and reports only the copy number state of non-repetitive genome sequences. Therefore, of the newer whole genome techniques, only genome mapping (e.g. OGM) and genome-wide NGS methods can detect SVs at genome scale.

OGM reports structural variants by Bayesian modeling based on relevant labels within the aligned molecules surrounding the SV breakpoint. The OGM SV VAF is inferred from differences in coverage of a set of reference labels given the reference genotype (Bionano Theory of Operation: Structural Variant Calling - https://bionano.com/wp-content/uploads/2024/01/CG-30110_Bionano-Solve-Theory-of-Operation-Structural-Variant-Calling.pdf). NGS uses an approach where VAF is determined from the ratio of sequencing reads indicating a given genotype over the total number of reads at the same position. In most cases, converting the VAF of an abnormality to the corresponding proportion of the sample is straightforward. For example, for autosomes, it is implied that an SV with VAF < 0.5 affects only one of the homologues. Therefore, the proportion of cells with the abnormality is estimated by multiplying the VAF value by 2 (Table 3), as there are two sets of autosomes in each cell. VAF conversion for the X and Y chromosomes is also detailed in Table 3. VAF calculation can be distorted by several biological and technical factors. Some possible mechanisms that might account for a VAF > 0.5 but < 1 include: (1) imprecise measurement of a true germline homozygous or heterozygous variant, (2) both alleles being affected, i.e. copy neutral loss of heterozygosity (CN-LOH) via a recombination event, leading to the same abnormality being present on both homologues, (3) co-occurring mechanisms in trans, such as an SNV on one allele and a deletion on the other homologue, or (4) complex ploidy. Using sequencing techniques (i.e. short read sequencing) it is often not possible to account for these underlying mechanisms that confound VAF calculation.

Table 3.

VAF conversion to proportion of the sample

	VAF 0 to 0.5	VAF > 0.5
Chromosomes present in 2 copies (1–22, XX females)	x2	x1
Chromosomes present as only one copy (X and Y in males)	x1	x1
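
The rule in Table 3 can be expressed as a minimal Python sketch. It assumes the simple read-ratio definition of VAF described above and applies only the multipliers listed in the table; the function names are illustrative and the sketch does not model any of the distorting mechanisms.

```python
def vaf_from_reads(variant_reads, total_reads):
    """VAF as the fraction of reads (or molecules) supporting the variant."""
    if total_reads <= 0 or not 0 <= variant_reads <= total_reads:
        raise ValueError("need 0 <= variant_reads <= total_reads and total_reads > 0")
    return variant_reads / total_reads


def proportion_from_vaf(vaf, normal_copies=2):
    """Convert a VAF to a proportion of the sample following Table 3.

    normal_copies=2: chromosomes 1-22 and X in XX individuals.
    normal_copies=1: X and Y in XY individuals.
    """
    if not 0 <= vaf <= 1:
        raise ValueError("VAF must lie between 0 and 1")
    if normal_copies not in (1, 2):
        raise ValueError("normal_copies must be 1 or 2")
    # x2 only when two copies are expected and one homologue is assumed affected.
    multiplier = 2 if (normal_copies == 2 and vaf <= 0.5) else 1
    return round(vaf * multiplier, 4)
```

Note the jump in this rule at a VAF of 0.5: a value of 0.5 converts to the whole sample, while a value just above it is taken at face value. That discontinuity comes from the change in assumption about how many homologues are affected, so values near the boundary call for the considerations discussed in the text rather than a mechanical conversion.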

In some situations, the VAF can be distorted by technical issues. Examples affecting OGM VAF estimation include: (1) false positive calls, (2) misalignment of molecules in complex genome regions (e.g. repetitive sequences in regions of high homology) and (3) artefactual low confidence calls from molecules aligning to genomic regions with low label density. Similar technical biases in VAF determination are also observed for sequencing techniques in highly repetitive or complex gene regions, resulting in poor sequencing reads or misalignment. The nature (e.g. high GC content regions are difficult to amplify), composition (e.g. a SNP interfering with proper binding of a primer) or size of the sequence to amplify (e.g. preferential amplification of smaller alleles) can skew VAF calculation. While OGM and NGS technologies are able to call SVs, careful evaluation of the data and an understanding of potential biases and limitations in VAF precision become more important as the complexity of the genomic data we evaluate increases.

CNVs, excluding aneuploidies of whole chromosomes, are both structural and copy number changes (e.g. in a deletion the chromosome breaks at two points, the intervening material is lost, and the two ends are fused together). While the SV portion is reported as a VAF, the corresponding CNV is reported as an estimated copy number. VAF and estimated copy number are typically determined by distinct analysis pipelines. VAF is widely used in reporting sequence variants and may seem more familiar to those routinely exposed to clinical sequencing reports; however, it can lead to misunderstanding of the proportion of an abnormality, especially when compared to any single cell-based analysis. Let us consider the case of a “copy number” abnormality, such as a small deletion at 17p involving TP53 with a VAF of 0.25, described as:

ogm[GRCh38] 17p13.1(7,568,421_7,787,490)x1

As discussed above, the corresponding proportion of this deletion in the sample is 50% (Table 3) and reported as:

ogm[GRCh38] 17p13.1(7,568,421_7,787,490)x1[0.5]

If we were to perform FISH and get a concordant result it would be written (Fig. 2A):

nuc ish(TP53×1,D17Z1×2)[100/200]

This nomenclature indicates one copy of the TP53 probe with two copies of the centromere of chromosome 17 (D17Z1) in 100 out of 200 nuclei analyzed, or 50%. Note that two clones, one with a single loss of TP53 in 50 cells and another with homozygous deletion of the TP53 region in 25 cells, as visualized by FISH, could give an apparently equivalent result to that reported by OGM (Fig. 2B):

nuc ish(TP53×1,D17Z1×2)[50/200]/(TP53×0,D17Z1×2)[25/200]

However, this assumes that both deletions on the chromosome 17 homologues were identical, because if they had different breakpoints they would be reported as two distinct structural variants by OGM, each with its own VAF.
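
The equivalence between the two FISH scenarios can be checked with a few lines of arithmetic. The Python sketch below computes the idealized bulk readout for a mixture of clones, assuming identical deletion breakpoints on every deleted allele, a diploid baseline and no technical bias; the function and key names are illustrative.

```python
def bulk_readout(clones, total_cells, normal_copies=2):
    """Idealized bulk measurements for a deletion present in one or more clones.

    clones: (cell_count, copies_remaining) pairs for each clone carrying the
            deletion; all other cells are assumed to retain `normal_copies`.
    """
    clones = list(clones)
    abnormal_cells = sum(count for count, _ in clones)
    if abnormal_cells > total_cells:
        raise ValueError("clone sizes exceed the total number of cells")
    deleted_alleles = sum(
        count * (normal_copies - remaining) for count, remaining in clones
    )
    total_alleles = total_cells * normal_copies
    return {
        "deletion_vaf": deleted_alleles / total_alleles,
        "estimated_copy_number": (total_alleles - deleted_alleles) / total_cells,
        "fraction_of_cells_affected": abnormal_cells / total_cells,
    }


if __name__ == "__main__":
    # One clone with a heterozygous TP53 deletion in 100 of 200 nuclei.
    print(bulk_readout([(100, 1)], total_cells=200))
    # Heterozygous deletion in 50 nuclei plus homozygous deletion in 25 nuclei.
    print(bulk_readout([(50, 1), (25, 0)], total_cells=200))
```

Both scenarios yield a deletion VAF of 0.25 and an estimated copy number of 1.5, which convert to a proportion of 0.5, yet the fraction of cells carrying any deletion is 0.5 in the first case and 0.375 in the second. The reported proportion of the sample is therefore a summary of allele burden and only equals the abnormal cell fraction when the single-clone assumption holds.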

Reporting considerations for single nucleotide variation (SNV)

At a cellular level, tumorigenesis can be orchestrated through a combination of SV abnormalities (e.g. cytogenetically visible and submicroscopic SVs and copy number changes) as well as smaller DNA variants that can impact the same gene (e.g. at the resolution of sequencing techniques). A clinically very relevant example is TP53-mutated myeloid malignancies, where SVs and/or SNVs can inactivate TP53 (e.g. by del(17p), -17 or mutation, Fig. 2C), leading to genomic instability and complex karyotypes [17, 18]. Such a combination of two alterations in TP53 is termed “biallelic” and conjures an image of both TP53 alleles being affected by mutations (Fig. 2D, left). There are important factors to consider when reporting TP53 SNVs in this context. While “biallelic” necessitates a trans configuration of the TP53 variants, short read DNA sequencing techniques are often limited in their ability to determine the phase of variants impacting TP53. Phasing refers to the ability to differentiate the two parental DNA strands across a larger genomic interval, usually by using single nucleotide polymorphisms to uniquely identify each strand, and is usually only feasible by long read sequencing. Further, DNA sequencing panels are not well suited to the detection of the larger SVs and/or CNVs that may overlap the TP53 gene region, such as copy neutral loss of heterozygosity, isochromosome 17q, or deletion 17p (Fig. 2D, middle, right). Note that other SV mechanisms occurring above the resolution of sequencing, such as the ones shown in Fig. 1 in Smith et al. [4], can also disrupt gene function. While TP53 is arguably the most well-known example of multiple genomic mechanisms leading to gene inactivation, it is not the only one, and it illustrates the potential for SNV, copy number and SV abnormalities to work together to disrupt gene function, not to mention the added complexity of other potential epigenetic and post-translational mechanisms.

Reporting considerations for post-allogeneic stem cell transplants

Another situation encountered in somatic analysis, especially in hematology patients, is genomic assessment after allogeneic bone marrow transplant. In this case there may be a mixture of donor and recipient cells in the sample that can lead to unusual results. In Fig. 3, the results of a sex-mismatched allogeneic bone marrow transplant are shown at time of relapse. The karyotype result shows several abnormal clones arising from the recipient marrow (Fig. 3A-C) and the OGM results show some residual Y chromosome presence that was not detected by karyotypic analysis (Fig. 3D). The result shown by OGM is a chimera, however, and can be reported using a similar example from the microarray section of the ISCN, pg. 132 example viii [9]. An equivalent example for OGM, with 90% of the sample being from the recipient (XX) and 10% from the donor (XY), is given following Fig. 3.

Fig. 3 — Sex Mismatched Considerations in the Post Transplant Scenario. In post allogeneic bone marrow transplant patients, a mixture of donor and recipient cells may be present. In this example of a female AML patient after an unrelated hematopoietic transplant from a male donor, a karyotype was performed and showed four abnormal clones originating from the recipient. The karyotype result is 47,XX,+22[2]/48,idem,+13[2]/49,idem,+8,+13[10]/45,XX,t(4;12)(q12;p13),t(5;17)(p13;p11.2),-7[6]. A) Karyogram showing the presumed stemline clone with trisomy 22. Chromosome abnormalities are indicated by red arrows. B) Karyogram showing the mainline clone (by karyotype), with trisomy 8, 13 and 22. C) Karyogram showing an unrelated abnormal clone with t(4;12)(GSX2::ETV6), a t(5;17) and monosomy 7. No residual metaphases from the donor were observed. D) OGM using Access v1.7 and Solve v3.7 shows all the abnormalities seen with karyotype (noted with red arrows) except for trisomy 22. Aneuploidy of smaller chromosomes has been demonstrated to have a slightly elevated lower limit of detection by OGM compared to larger chromosomes with prior versions of Solve. Processing the same molecule file through Solve 3.8 and then using the SNPFASST3 algorithm in VIA 7.1 (processing type: OGM BAM Multiscale– LAF, with aneusomy % ACF at 5.0; using reference: hg38_female_300x_maskedMSR_230420) showed detection of all three aneuploidies, with their estimated copy number in parentheses: +8 (2.05), +13 (2.07), and +22 (2.07). It should be noted that while the karyotype showed the clone with +8, +13 and +22 to be most abundant, based on the VAF and copy number data on OGM it is at a much lower proportion than the clone from panel C. OGM shows the presence of some residual donor cells, evidenced by B-allele fraction (BAF) values distorted outside of expected homozygous and heterozygous thresholds (panel E, lower track) and genome-wide allelic imbalance detection (panel E, purple shading). Further, an apparent imbalance between the X and Y chromosomes and the autosomes shows only low residual levels of Y (panel D, green arrows, and panel E). This is consistent with a sex-mismatched, post-transplant scenario with relapse, as the genome assembly is detecting sequences from both X and Y (donor is XY, recipient is XX). The BAF segment values and sex chromosome aneusomies represent imbalances which enumerate the relative proportion of donor and recipient cells in the sample.

ogm (X,1-22)x2[0.9]//(X,Y)x1,(1-22)x2[0.1]
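
The 90%/10% split in the string above can be related to the sex chromosome copy numbers that a bulk assay would report. The following is a minimal sketch, assuming a simple two-component mixture of XX and XY cells with no sex chromosome gains or losses in either population. The function names are illustrative and are not part of any OGM software.

```python
def expected_sex_chromosome_cn(xy_fraction):
    """Expected bulk copy number of X and Y for a mixture of XX and XY cells.

    xy_fraction is the proportion of XY cells; the remainder is assumed XX.
    """
    if not 0.0 <= xy_fraction <= 1.0:
        raise ValueError("xy_fraction must be between 0 and 1")
    xx_fraction = 1.0 - xy_fraction
    x_cn = 2.0 * xx_fraction + 1.0 * xy_fraction  # XX cells carry two X, XY cells one
    y_cn = 1.0 * xy_fraction                      # only XY cells carry a Y
    return x_cn, y_cn


def xy_fraction_from_cn(x_cn, y_cn):
    """Two independent estimates of the XY fraction: one from X, one from Y."""
    return 2.0 - x_cn, y_cn


x_cn, y_cn = expected_sex_chromosome_cn(0.10)
print(f"X copy number {x_cn:.2f}, Y copy number {y_cn:.2f}")
```

Under these assumptions a 10% XY component corresponds to an X copy number of about 1.9 and a Y copy number of about 0.1. Agreement between the two estimates returned by `xy_fraction_from_cn` is a useful consistency check, since a true loss of X or Y in one population would pull them apart.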

Additionally, the table-style ISCN reporting format may be used if multiple abnormalities are present, where an adequate text description helps clarify the observed abnormalities and the presence of sex-mismatched results. Clinically, this may be very important in supporting a relapsed primary hematological malignancy, the development of a new primary (perhaps treatment-related or prior chronic) hematologic neoplasm or, rarely, donor-derived leukemia.

Bias between techniques

An important consideration when comparing results from orthogonal assays comes from both human and technical factors that can lead to intrinsic biases. When comparing karyotype results to bulk genome assays it is important to consider several factors. Biologic factors, such as ‘metaphase bias’, can be seen even between karyotype (metaphase) and interphase FISH results. Some clones may be more represented or less represented in nuclei in metaphase rather than in interphase. In such cases it is possible to see drastically different frequencies of an abnormality depending on the assay. This difference can be seen particularly well in patients with dual disease, where a myeloid clone may be readily visible by karyotype analysis, but a less proliferative lymphoid clone may not appear without the use of mitogen stimulated cultures. This is also observed between karyotype and all bulk genome techniques (CMA, genome mapping and NGS) and while most orthogonal testing shows similar proportions for any given abnormality, sometimes the difference between a metaphase and interphase analysis can be quite dramatic. Another potential difference can be attributed to observer bias, commonly seen in B-cell acute lymphoblastic leukemia (B-ALL) where the detection of a normal karyotype is followed by FISH assays demonstrating what should be a visible abnormality or clone. B-ALL is known to often generate poor quality chromosomes and some of the worst quality metaphases will contain the abnormal clone. During karyotype analysis this clone can be missed, resulting in the reporting of a normal karyotype.

Karyotyping, in comparison to bulk genome techniques, evaluates a relatively small sampling of cells, from 10 to 50 metaphases in general. For a 20-metaphase analysis, the statistical power to exclude an abnormal clone (mosaicism) is about 87% at the 95% confidence limit [12], leaving the distinct possibility that in samples with a low cancer cell fraction an abnormal clone could be missed simply due to insufficient sampling. In addition, the ISCN stipulates that additional chromosomes and structural changes must be observed in a minimum of two metaphases and a loss of a chromosome must be seen in three metaphases. This serves to reduce the potential for false positives at the cost of sensitivity. Technical factors, such as hemodilution, can also reduce the cancer cell fraction in successive aspirations, and other biological considerations may also play a significant role in the detection of abnormal clones by karyotyping, such as how large the abnormality is (can it be easily seen?) and whether it results in a clearly visible change in banding pattern. There are many cryptic abnormalities in the hematologic malignancies that are very difficult, if not impossible, to see by karyotype alone. In this context, high resolution methods are less subject to observer bias due to the greatly increased number of molecules examined and the higher objectivity of bioinformatic algorithms for SV detection. However, it should be noted that appropriate depth of coverage is also required for high resolution methods in order to achieve equivalent sensitivity [16]. SNP microarray can detect CNVs with tumour burden in the 10–20% range [19]. OGM with > 300-400x genome coverage can also detect CNVs in this range and SVs reliably down to 10% [13]. The lower limit of detection for NGS will vary considerably depending on the approach.
For example, shallow whole genome sequencing at relatively low genome coverage (5x) will provide near equivalent detection sensitivity to microarray and OGM [14], although without SV detection. Genome sequencing looking for CNVs and SVs will be much more limited at standard sequencing depths (30x), with a tumour burden of 30–40% required for a true positive detection rate of 95% [16]. To detect abnormalities down to 10% tumour burden would require sequencing depths in excess of 124x, which is much higher than is routinely performed. These numbers only address the mathematical probability of detecting a CNV or SV at a given depth of coverage. They do not address issues that are especially prevalent in short read sequencing technologies, where SVs or CNVs in repetitive regions of the genome are difficult to align with short reads, leading to lower sensitivity and higher false positive rates [20].
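
The sampling limitation of a metaphase count described above can be illustrated with a simple binomial model. The sketch below is an illustration under stated assumptions, not the calculation used in [12] or by the ISCN: it treats each analyzed metaphase as an independent draw from the sample, ignores metaphase bias, and uses hypothetical function names.

```python
from math import comb


def prob_at_least_k(n_cells, clone_fraction, k):
    """Probability that a clone is seen in at least k of n_cells metaphases."""
    p_fewer = sum(
        comb(n_cells, i) * clone_fraction**i * (1 - clone_fraction) ** (n_cells - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


def smallest_fraction_excluded(n_cells, confidence=0.95):
    """Smallest clone fraction for which observing zero abnormal cells in
    n_cells has a probability no greater than 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_cells)


# 20 metaphases with no abnormal cell: clone fraction excluded at 95% confidence
print(round(smallest_fraction_excluded(20), 3))

# Chance that a clone making up 10% of dividing cells meets a two-cell
# (gain or structural change) or three-cell (loss) requirement in 20 metaphases
print(round(prob_at_least_k(20, 0.10, 2), 3))
print(round(prob_at_least_k(20, 0.10, 3), 3))
```

Under this model a normal 20-cell count excludes clones above roughly 14% of dividing cells at 95% confidence, and a clone at 10% would meet the two-cell requirement in roughly 61% of analyses and the three-cell requirement in roughly 32%.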

“Quantitative-ness” between techniques: a comparison of results between karyotype, FISH and optical genome mapping

During validation of new technologies, we often compare the new technique to the current standard or an orthogonal method. For cytogenetic laboratories, comparing karyotype and FISH to OGM is an obvious choice. However, as described above in the Bias section, differences can be observed. Over the course of our validation of OGM we performed parallel karyotype, FISH and OGM on over 30 samples collected between April 2021 and April 2023 at the Cancer Cytogenetics Laboratory at the University Health Network (Toronto, Canada). The point of presenting these data here is to demonstrate the overall concordance between these techniques but, most importantly, to point out some significant discrepancies and to discuss their causes and interpretation.

Table 4 shows seven individual cases of interstitial deletions evaluated by all 3 methods. The overwhelming majority of the samples show close correlation in terms of the reported proportion of the sample. One sample shows a low-level del(7q) reported by karyotype (10% of metaphases) but not supported by either FISH or OGM. The bands defined by karyotype were 7q21 to 7q32, so the deletion should include, and be detected by, the FISH probe targeted to band 7q31. This result produces a conundrum with a few possible interpretations. The abnormality may be present at a very low level in metaphases and thus not detected by either interphase or bulk DNA analysis. However, the lower limit of detection for a deletion by OGM should be 10% of the sample (a VAF of 5%), and for FISH it would be between 5 and 10% as well. It is also not possible to completely discard the possibility that the karyotype observation is an error.
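
The pairing of 10% of the sample with a VAF of 5% follows from a heterozygous event in an otherwise diploid region, where each abnormal cell carries the variant on one of its two alleles. A minimal sketch of that conversion is given here, assuming one variant and one reference allele per abnormal cell, two reference alleles per normal cell, and no overlapping CNV or acquired homozygosity. The function names are illustrative only.

```python
def vaf_to_sample_proportion(vaf):
    """Proportion of cells carrying a heterozygous variant, given its VAF."""
    if not 0.0 <= vaf <= 0.5:
        raise ValueError("a heterozygous variant in a diploid region has a VAF of 0 to 0.5")
    return 2.0 * vaf


def sample_proportion_to_vaf(proportion):
    """Expected VAF of a heterozygous variant present in a given proportion of cells."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    return proportion / 2.0


for vaf in (0.05, 0.10):
    print(f"VAF {vaf:.2f} -> {vaf_to_sample_proportion(vaf):.0%} of the sample")
```

When the assumptions do not hold (for example a homozygous variant, or a variant in a region of copy number gain), this factor of two no longer applies and the conversion has to account for the allele counts in each cell population.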

Table 4.

Interstitial deletions

Case	Dx	Abnormality	% Abnormality by Karyotype	% Abnormality by FISH	% Abnormality by OGM
1	AML	del(5q)	95	88	92
2	MDS	del(5q)	100	92	91
3	MPN	del(7q)	10	0	0
4	AML	del(7q)	33	27.5	25
5	MPN	del(11q)	100	92	89
6	MDS	del(20q)	15	16.5	0*
7	MPN	del(20q)	25	23	28


* The copy number algorithm did not detect a deletion of chromosome 20q; however, the structural variant pipeline detected a fusion between a proximal and a distal segment of chromosome 20q, supporting a deletion call at approximately 20% of the sample (10% VAF). This is at the lower limit of detection for copy number calls on OGM. OGM data generated on Solve version 3.7 and Access version 1.7.

Table 5 shows the results for 9 aneuploidies, with 2 occurring within the same sample. Again, the overall conclusion from the results is concordance between techniques. Interestingly, several of the karyotype studies showed results where the abnormality was seen in 100% of metaphases (-Y, -7, +8 and +8,+19). The corresponding FISH and OGM results fall below 100%, likely due to the increased sampling used by both FISH (200 nuclei) and OGM (~300x coverage) compared to karyotype analysis (~20 metaphases). One of the cases with -7 was not detected by karyotype, 45,XY,del(5)(q?13q?33),add(6)(p22),der(16)t(11;16)(q13;q22),-21[9], and based on the FISH results the clone with -7 is a subclone: nuc ish(D5S721/D5S23x2,EGR1x1)[184/200],(D7Z1,D7S522)x1[29/200],(5’KMT2A,3’KMT2A)x3(5’KMT2A con 3’KMT2Ax3)[100/200]. As can be seen from the FISH result, the -7 is only present in 14.5% of nuclei compared to the 5q deletion, which is present in 92% of nuclei. The karyotype was also sub-optimal as only 9 metaphases were analyzed. Therefore, a subclone of the mainline clone reported by karyotype likely exists but was not reported, as it was not observed in sufficient numbers to define a separate clone. Most interestingly, the +8 and +19 that appeared to belong to a single clone by karyotype appear, in fact, to belong to two clones when interpreting the FISH and OGM: one with sole +8 and a sideline clone with +8 and +19. The final discrepant result is the karyotype call for +14 that was not supported by either FISH or OGM. Again, this could be at the limit of detection for FISH and OGM; however, there was no support for an extra chromosome 14, even when looking for signal below the cut-off. Given that access to therapeutic agents in certain jurisdictions relies critically on disease subclassification (e.g. AML, myelodysplasia-related), this further emphasizes the importance of interdisciplinary communication and integrated (theragnostic) reporting.
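
The effect of sample size on a reported proportion can be made concrete with a confidence interval. The following sketch uses the Wilson score interval, which is chosen here purely for illustration and is not a method used in the validation described above. The count of 141/200 is an assumption that back-calculates the 70.5% FISH value for case 8 from a 200-nucleus score, and 20/20 stands in for a karyotype reported at 100%.

```python
from math import sqrt


def wilson_interval(abnormal, total, z=1.96):
    """Wilson score interval for a proportion (about 95% coverage for z = 1.96)."""
    if total <= 0 or not 0 <= abnormal <= total:
        raise ValueError("need 0 <= abnormal <= total and total > 0")
    p = abnormal / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


for label, abnormal, total in (("karyotype", 20, 20), ("FISH", 141, 200)):
    low, high = wilson_interval(abnormal, total)
    print(f"{label}: {abnormal}/{total} -> {low:.2f} to {high:.2f}")
```

Under this model an abnormality seen in 20 of 20 metaphases remains compatible with a true proportion as low as roughly 84%, while the 200-nucleus estimate is considerably tighter. Sampling alone does not account for metaphase bias, so large gaps between karyotype and bulk results still call for the biological explanations discussed in the Bias section.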

Table 5.

Aneuploidies

Case	Dx	Abnormality	% Abnormality by Karyotype	% Abnormality by FISH	% Abnormality by OGM
8	AML	-Y	100	70.5	57
9	AML	-7	100	96.5	90
2	MDS	-7	0	14.5	16
10	AML	+8	100	69.5	86
11	AML	+8,+19	100,100	85.5,78.5	96,66
12	B-ALL	+9	40	67	96
13	MDS	-11	55	61	63
14	AML-r	+14	9	0	0


OGM data generated on Solve version 3.7 and Access version 1.7

Table 6 shows the concordance for intra- and inter-chromosomal rearrangements. Again, the great majority of samples show highly similar proportions considering the different techniques used. The obvious drawback of karyotype analysis alone is that several of the abnormalities could not be confirmed without additional studies, i.e. FISH. For example, the MECOM::MBNL1 rearrangement was reported by karyotype as a del(3)(p13). However, the routine practice at the University Health Network Cytogenetics Laboratory is to investigate all abnormal chromosome 3 results in AML and MDS for MECOM rearrangements. Similarly, for the t(12;19)(ZNF384::TCF3) rearrangement, the involvement of chromosome 12 was not appreciated by karyotype as the breakpoint is very close to the end of 12p and therefore cryptic. FISH and OGM were able to confirm this rearrangement. Most interesting, however, is the case with the t(4;11)(KMT2A::AF4) translocation, where the KMT2A rearrangement was seen in the context of a karyotype with multiple clones: 46,XX,del(4)(q21)[7]/46,XX,del(3)(p13)[3]/46,XX,t(4;11)(q21;q23)[2]/46,XX[14]. The result shows that the t(4;11) was called only in a sideline clone at low proportion. However, FISH and OGM both showed the presence of the rearrangement in at least 85% of the sample. It is likely that the original stemline clone with the t(4;11) underwent additional clonal evolution and that the metaphases characterized as del(4)(q21) also carried the KMT2A::AF4 rearrangement, which was cryptic by karyotype analysis.
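
The 8% karyotype value for case 22 in Table 6 can be reproduced from the bracketed metaphase counts in the karyotype string. The sketch below is a simplified illustration that assumes each clone is separated by "/" and ends with a single [n] count, and that clones are written out in full (no idem or sl/sdl shorthand). The helper name is hypothetical.

```python
import re


def clone_proportions(karyotype):
    """Map each clone description to its share of the analyzed metaphases."""
    clones = []
    for part in karyotype.split("/"):
        match = re.fullmatch(r"\s*(.+?)\[(\d+)\]\s*", part)
        if match is None:
            raise ValueError(f"no metaphase count found in {part!r}")
        description = match.group(1).replace(" ", "")
        clones.append((description, int(match.group(2))))
    total = sum(count for _, count in clones)
    return {description: count / total for description, count in clones}


karyotype = (
    "46,XX,del(4)(q21)[7]/46,XX,del(3)(p13)[3]/"
    "46,XX,t(4;11)(q21;q23)[2]/46,XX[14]"
)
for description, share in clone_proportions(karyotype).items():
    print(f"{description}: {share:.0%}")
```

The t(4;11) clone accounts for 2 of 26 metaphases, or about 8%. If the del(4)(q21) metaphases also carry the rearrangement, as suggested above, the karyotype-derived proportion would rise to 9 of 26, which is still well below the FISH and OGM values.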

Table 6.

Intra- and Inter-chromosomal rearrangements

Case	Dx	Abnormality	% Abnormality by Karyotype	% Abnormality by FISH	% Abnormality by OGM
15	AML	inv(3q)(MECOM::GATA2)	74	41	74
16	AML	inv(3pq)(MECOM::SATB1)	86	62	80
17	AML	inv(3q)(MECOM::MBNL1)	74^a	71	46
18	AML	t(3;8)(MECOM::MYC)	100	59	88
19	AML	t(4;12)(ETV6::GSX2)	100	87 ~ 91^b	88
20	AML	t(4;12)(ETV6::GSX2)	100	83	92
21	MPAL	t(9;22)(BCR::ABL1)	80	87	88
22	B-ALL	t(4;11)(KMT2A::AF4)	8	85	> 99
23	AML	t(6;11)(KMT2A::AFDN)	100	94	> 99
24	AML	t(6;11)(KMT2A::AFDN)	82	89	> 99
10	AML	t(9;11)(KMT2A::MLLT3)	100	81.5	> 99
25	AML	t(9;11)(KMT2A::MLLT3)	100	95	94
26	AML	t(10;11)(KMT2A::TET1)	45	55	48
27	B-ALL	t(12;19)(ZNF384::TCF3)	62^a	87.5	> 99
28	MLN-TK	ins(14;8)(c14orf93::FGFR1)	100^a	54 ~ 66	88


^a Karyotype did not correctly identify the rearrangement, but did identify that the chromosome(s) were abnormal. The proportion of the abnormality was taken from the frequency of the abnormality identified. ^b Two separate break-apart probes were used and the range of the two is reported. Abbreviations: AML: acute myeloid leukemia (AML-r, relapsed), B-ALL: B-cell acute lymphoblastic leukemia, MDS: myelodysplastic syndrome, MPAL: mixed phenotype acute leukemia, MPN: myeloproliferative neoplasm, MLN-TK: myeloid or lymphoid neoplasm with tyrosine kinase rearrangement. OGM data generated on Solve version 3.7 and Access version 1.7.

In conclusion, this comparison highlights a remarkably high concordance between techniques in the detection of various SV classes. The outlying discrepancies, for the most part, illustrate the limitations of each technique (such as the detection of cryptic rearrangements by karyotype). Observations near the limit of detection should be carefully considered in the somatic context, and correlation with the fraction of cancer cells within a sample is good practice, when available. A low frequency abnormality in a sample with a low cancer cell fraction may indicate that it is the predominant clone. However, for an SV with low proportion in a sample with abundant cancer cells, one must consider carefully the clinical significance of such a finding.
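
One way to make the correlation with cancer cell fraction explicit is to express the abnormality as a share of the cancer cells rather than of the whole sample. The sketch below assumes that the abnormality is confined to the cancer cells and that an independent estimate of the cancer cell fraction (for example a blast percentage) is available. The function name and the example values are hypothetical.

```python
def share_of_cancer_cells(abnormal_fraction, cancer_cell_fraction):
    """Abnormal cells as a fraction of the cancer cells in the sample.

    The result is capped at 1.0 because measurement noise in either input
    can push the raw ratio slightly above 100%.
    """
    if not 0.0 <= abnormal_fraction <= 1.0:
        raise ValueError("abnormal_fraction must be between 0 and 1")
    if not 0.0 < cancer_cell_fraction <= 1.0:
        raise ValueError("cancer_cell_fraction must be greater than 0 and at most 1")
    return min(1.0, abnormal_fraction / cancer_cell_fraction)


# The same 10% abnormality read against two different cancer cell fractions
for cancer_cell_fraction in (0.12, 0.90):
    share = share_of_cancer_cells(0.10, cancer_cell_fraction)
    print(f"cancer cells {cancer_cell_fraction:.0%}: abnormality in {share:.0%} of them")
```

In the first case the abnormality would be present in most of the cancer cells and is plausibly the predominant clone; in the second it marks a minor subclone.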

Clinical impacts

Genomic information is fundamental to diagnosis, classification, prognosis and eligibility for targeted treatment of hematopoietic neoplasms, as seen in the newly updated classification systems, i.e. the 5th Ed. World Health Organization Classification of Tumours: Hematolymphoid Tumours (WHO HAEM-5) and the 2022 International Consensus Classification (ICC) of Myeloid Neoplasms and Acute Leukemias [21, 22]. Using AML as an example, risk stratification by the 2022 European LeukemiaNet [23] is entirely based on cytogenetic and molecular biomarkers, although prognostic risk varies with type of therapy. As such, an alternative genetic risk stratification applies to patients receiving less-intensive therapies [24]. Generally for AML, classification-defining subtypes are categorized by the presence or absence of a particular variant above the validated limit of detection of the assay, without a specific VAF requirement or proportion abnormal in the sample (e.g. NPM1 mutations, RUNX1::RUNX1T1 translocation, respectively). In the HAEM-5 Hematolymphoid Classification, certain AML subclasses with specific rearrangements (e.g. t(8;21)(RUNX1::RUNX1T1), t(15;17)(PML::RARA), inv(16)(MYH11::CBFB)) do not require an elevated blast count and are diagnostic of AML upon detection. Caution is advised in situations where genomic results and clinical context are discordant; ideally, this integration should also incorporate classical morphologic and immunophenotypic assessments. One should especially consider chromosomal mimics detected by karyotyping that appear as AML-defining rearrangements. A mimic is a rearrangement with similar breakpoints to a recurrent rearrangement but that does not involve the recurrently involved genes, a situation that occurs rarely but that can significantly change patient management [25].

For some diagnoses in the ICC, specific VAF thresholds are stipulated. For example, MDS with mutated TP53 is defined by combinations of TP53 variants (2 variants > 10% VAF or 1 variant > 50% VAF), concurrent 17p deletion and/or demonstration of copy neutral LOH at the TP53 locus (17p13.1). Another example, in which one VAF cutoff defines an overt hematologic neoplasm and different cutoffs define its precursor states, is the ICC requirements for the diagnosis of chronic myelomonocytic leukemia (CMML). Here, the presence of clonality in CMML is defined as abnormal cytogenetics and/or the presence of at least one myeloid neoplasm-associated mutation with an allele frequency of at least 10% [21, 26]. In contrast, the CMML-precursor states clonal monocytosis of undetermined significance (CMUS) and clonal cytopenia and monocytosis of undetermined significance (CCMUS) can be diagnosed with a lower VAF threshold of > 2% [21, 26]. Although high-resolution testing for SVs is not clinically widespread, emerging data demonstrate that SVs, such as deletions, insertions or inversions, can also overlap critical myeloid neoplasm-associated genes, leading to loss of function. As such, understanding how the proportion of these abnormalities is reported and how to apply these values to existing classification systems is of critical importance.

In other premalignant states such as the spectrum of clonal hematopoiesis (without monocytosis or cytopenia), clonal hematopoiesis of indeterminate potential (CHIP) is defined as a somatic mutation in a known driver gene at a VAF greater than or equal to 2% with normal blood counts (reviewed in [27]). Given the prevalence of CHIP, prognostic scoring systems have been developed to estimate the risk of progression to a hematologic neoplasm. The clonal hematopoiesis risk score (CHRS), for example, includes a partitioning variable of VAF greater than or equal to 20% as a risk factor [28]. From an SV perspective, mosaic chromosomal alterations are another form of clonal hematopoiesis, although the limit of detection depends on the methodology (e.g. 5–10% cell frequency when using SNP microarray) [29]. With routine use of higher resolution SV detecting technologies, one would expect the prevalence of SV-defined clonal hematopoiesis to increase. Additional research will be required, however, to determine how high-resolution SV results can be incorporated into current clinical risk stratification systems for different hematological neoplasms, especially where risk scores may influence treatment such as referral for allogeneic transplant [30]. Additionally, NGS panels and genome-wide SV detection methods will detect SNVs and SVs that may be associated with predisposition to hematologic malignancies or other cancers. A variant with a VAF near 0.5 is often used to trigger queries of germline predisposition. However, using VAF alone, without consideration of the many possible distortions in VAF (both technical and biologic) that may occur, will surely cause under-ascertainment of these variants [28].
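
One biologic distortion of this kind can be sketched numerically. The example below assumes a heterozygous germline variant in a sample where a fraction of cells has lost one copy of the locus, and it ignores technical effects such as capture or alignment bias. The function and parameter names are hypothetical.

```python
def germline_het_vaf(deleted_cell_fraction, deleted_allele):
    """Expected VAF of a heterozygous germline variant when a fraction of
    cells carries a single-copy deletion spanning the locus.

    deleted_allele is "reference" if the deletion removes the wild-type
    allele, or "variant" if it removes the allele carrying the variant.
    """
    if not 0.0 <= deleted_cell_fraction <= 1.0:
        raise ValueError("deleted_cell_fraction must be between 0 and 1")
    total_copies = 2.0 - deleted_cell_fraction  # average copies per cell
    if deleted_allele == "reference":
        variant_copies = 1.0
    elif deleted_allele == "variant":
        variant_copies = 1.0 - deleted_cell_fraction
    else:
        raise ValueError("deleted_allele must be 'reference' or 'variant'")
    return variant_copies / total_copies


for fraction in (0.0, 0.5, 0.8):
    ref_lost = germline_het_vaf(fraction, "reference")
    var_lost = germline_het_vaf(fraction, "variant")
    print(f"deletion in {fraction:.0%} of cells: VAF {ref_lost:.2f} or {var_lost:.2f}")
```

With half of the cells carrying the deletion, a germline heterozygous variant would be expected at a VAF of about 0.67 or 0.33 rather than 0.5, so a fixed window around 0.5 could miss it.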

Awareness of these clinical contexts is important for those reporting genomic results as well as for end users, since multi-disciplinary integration is required for accurate diagnosis, prognosis and therapeutic decision making, including access to targeted therapy and eligibility for clinical trials.

Conclusion

The use of new genomic technologies to diagnose somatic disease has already improved patient outcomes, especially where targeted therapies are available. While the concordance between orthogonal detection strategies for structural variants is extremely high, due to the differences in analysis (e.g. single cell vs. bulk genome) different numerical representations of the data are necessary. In most cases the differences will be small but can be significant and must be well understood and translated effectively for accurate diagnosis, risk stratification and therapy.

Clarification on certain abbreviations

Copy number variant (CNV)

This term has been variably defined in the literature, with a lower size limit ranging from 50 bp to 1 kb. It has also been used to describe normal variation seen in the population, sometimes not including pathogenic copy number abnormalities. In this manuscript, we define the term CNV in the most encompassing sense, including the largest possible size range and both populational and deleterious copy number changes. By using CNV, we maintain consistency with Structural Variants (SV) and Single Nucleotide Variants (SNV), all of which can include populational and deleterious variants.

Variant allele fraction (VAF)

This term refers to the proportion of a variant allele in a sample. The literature also commonly uses the term Variant Allele Frequency; however, we prefer the term Fraction for the proportion within a sample, whereas Frequency is more appropriate for situations where the sample is a population rather than an individual. For example, the minor allele frequency refers to the number of alleles within a population that have the less frequent variant.

Acknowledgements

We would like to acknowledge the hard work and support of the Technologists at the University Health Network Cancer Cytogenetics Laboratory, especially Ana Baptista, Olivia King, Shabnam Salehi-Rad and Kate Harris. We would also like to acknowledge Dr. Ben Clifford (Bionano, Inc) for his expert advice on OGM and review of the manuscript.

Author contributions

Conceptualization, ACS; writing of the manuscript, ACS, HT, JMCC; data preparation, ACS; data review, ACS, HT, SU, JMCC; illustrations and tables, ACS and SU; editing, ACS, HT, SU, JMCC. All authors reviewed the manuscript.

Funding

This research received no external funding.

Data availability

No datasets were generated or analysed during the current study.

Declarations

Institutional review board

Aggregated data that is presented in this study was part of Optical Genome Mapping Validation studies at the University Health Network and was approved by the University Health Network Research Ethics Board (CAPCR# 20-6121).

Informed consent

Patient consent was waived since residual or banked samples were used for the purpose of test development (consent waiver part of CAPCR# 20-6121).

Competing interests

ACS has a small personal financial interest in Bionano, Inc (<$1000).

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, et al. The complete sequence of a human genome. Science [Internet]. 2022;376(6588):44–53. Available from: https://www.science.org/doi/10.1126/science.abj6987
2. Mahmoud M, Huang Y, Garimella K, Audano PA, Wan W, Prasad N, et al. Utility of long-read sequencing for All of Us. bioRxiv [Internet]. 2023;2023.01.23.525236. Available from: https://www.biorxiv.org/content/10.1101/2023.01.23.525236v1
3. Wojcik MH, Reuter CM, Marwaha S, Mahmoud M, Duyzend MH, Barseghyan H, et al. Beyond the exome: What’s next in diagnostic testing for Mendelian conditions. Am J Hum Genet [Internet]. 2023;110(8):1229–48. Available from: http://www.ncbi.nlm.nih.gov/pubmed/37541186
4. Smith AC, Hoischen A, Raca G. Cytogenetics Is a Science, Not a Technique! Why Optical Genome Mapping Is So Important to Clinical Genetic Laboratories. Vol. 15, Cancers. Multidisciplinary Digital Publishing Institute (MDPI); 2023.
5. Iqbal MA, Broeckel U, Levy B, Skinner S, Sahajpal NS, Rodriguez V, et al. Multisite Assessment of Optical Genome Mapping for Analysis of Structural Variants in Constitutional Postnatal Cases. The Journal of Molecular Diagnostics [Internet]. 2023;25(3):175–88. Available from: https://linkinghub.elsevier.com/retrieve/pii/S1525157823000028
6. Levy B, Liu J, Iqbal MA, DuPont B, Sahajpal N, Ho M, et al. Multisite evaluation and validation of optical genome mapping for prenatal genetic testing. J Mol Diagn. 2024;26(10):906–16.
7. Neveling K, Mantere T, Vermeulen S, Oorsprong M, van Beek R, Kater-Baats E, et al. Next-generation cytogenetics: Comprehensive assessment of 52 hematological malignancy genomes by optical genome mapping. Am J Hum Genet [Internet]. 2021;108(8):1423–35. Available from: 10.1016/j.ajhg.2021.06.001
8. Mantere T, Neveling K, Pebrel-Richard C, Benoist M, van der Zande G, Kater-Baats E, et al. Optical genome mapping enables constitutional chromosomal aberration detection. Am J Hum Genet. 2021;108(8):1409–22.
9. Hastings RJ, Moore S, Chia N, editors. ISCN 2024. S. Karger AG; 2024.
10. Chilton L, Hills RK, Harrison CJ, Burnett AK, Grimwade D, Moorman AV. Hyperdiploidy with 49–65 chromosomes represents a heterogeneous cytogenetic subgroup of acute myeloid leukemia with differential outcome. Leukemia. 2014;28(2):321–8.
11. Levy B, Kanagal-Shamanna R, Sahajpal NS, Neveling K, Rack K, Dewaele B, et al. A framework for the clinical implementation of optical genome mapping in hematologic malignancies. Am J Hematol [Internet]. 2024; Available from: https://onlinelibrary.wiley.com/doi/10.1002/ajh.27175
12. Hook EB. Exclusion of chromosomal mosaicism: tables of 90%, 95% and 99% confidence limits and comments on use. Am J Hum Genet [Internet]. 1977;29(1):94–7. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1685228&tool=pmcentrez&rendertype=abstract
13. Sahajpal NS, Mondal AK, Tvrdik T, Hauenstein J, Shi H, Deeb KK, et al. Clinical Validation and Diagnostic Utility of Optical Genome Mapping for Enhanced Cytogenomic Analysis of Hematological Neoplasms. The Journal of Molecular Diagnostics [Internet]. 2022;(October):2022.03.14.22272363. Available from: https://linkinghub.elsevier.com/retrieve/pii/S1525157822002902
14. Chaubey A, Shenoy S, Mathur A, Ma Z, Valencia CA, Reddy Nallamilli BR, et al. Low-Pass genome sequencing: validation and diagnostic utility from 409 clinical cases of Low-Pass genome sequencing for the detection of copy number variants to replace constitutional microarray. J Mol Diagn. 2020;22(6):823–40.
15. Wiktor AE, Van Dyke DL, Stupca PJ, Ketterling RP, Thorland EC, Shearer BM, et al. Preclinical validation of fluorescence in situ hybridization assays for clinical practice. Genetics in Medicine [Internet]. 2006 Jan [cited 2011 Aug 11];8(1):16–23. Available from: http://content.wkhealth.com/linkback/openurl?sid=WKPTLP:landingpage&an=00125817-200601000-00003
16. Petrackova A, Vasinek M, Sedlarikova L, Dyskova T, Schneiderova P, Novosad T, et al. Standardization of sequencing coverage depth in NGS: recommendation for detection of clonal and subclonal mutations in Cancer diagnostics. Front Oncol. 2019;9(September):1–6.
17. Shah MV, Hung K, Baranwal A, Kutyna MM, Al-Kali A, Toop C, et al. Evidence-based risk stratification of myeloid neoplasms harboring TP53 mutations. Blood Adv. 2025.
18. Wei Q, Hu S, Loghavi S, Toruner GA, Ravandi-Kashani F, Tang Z, et al. Chromoanagenesis is frequently associated with highly complex karyotypes, extensive clonal heterogeneity, and treatment refractoriness in acute myeloid leukemia. Am J Hematol. 2025;100(3):417–26.
19. Schoumans J, Suela J, Hastings R, Muehlematter D, Rack K, van den Berg E, et al. Guidelines for genomic array analysis in acquired haematological neoplastic disorders. Genes Chromosomes Cancer [Internet]. 2016;55(5):480–91. Available from: http://www.ncbi.nlm.nih.gov/pubmed/26774012
20. Chaisson MJP, Sanders AD, Zhao X, Malhotra A, Porubsky D, Rausch T, et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat Commun [Internet]. 2019;10(1):1784. Available from: http://www.nature.com/articles/s41467-018-08148-z
21. Arber DA, Orazi A, Hasserjian RP, Borowitz MJ, Calvo KR, Kvasnicka HM, et al. International consensus classification of myeloid neoplasms and acute leukemias: integrating morphologic, clinical, and genomic data. Blood. 2022;140(11):1200–28.
22. Khoury JD, Solary E, Abla O, Akkari Y, Alaggio R, Apperley JF, et al. The 5th edition of the world health organization classification of haematolymphoid tumours: myeloid and histiocytic/dendritic neoplasms. Leukemia. 2022;36(7):1703–19.
23. Döhner H, Wei AH, Appelbaum FR, Craddock C, DiNardo CD, Dombret H, et al. Diagnosis and management of AML in adults: 2022 recommendations from an international expert panel on behalf of the ELN. Blood. 2022;140(12):1345–77.
24. Döhner H, DiNardo CD, Appelbaum FR, Craddock C, Dombret H, Ebert BL, et al. Genetic risk classification for adults with AML receiving less-intensive therapies: the 2024 ELN recommendations. Blood. 2024;144(21):2169–73.
25. Zhao M, Ryall S, Brody SJ, Harris AC, Cabral K, Brownstein C, et al. Integrative cytogenetic and molecular studies unmask chromosomal mimicry in hematologic malignancies. Blood Adv. 2025;9(5):1003–12.
26. Valent P, Orazi A, Savona MR, Patnaik MM, Onida F, van de Loosdrecht AA, et al. Proposed diagnostic criteria for classical chronic myelomonocytic leukemia (CMML), CMML variants and pre-CMML conditions. Haematologica. 2019;104(10):1935–49.
27. Weeks LD, Ebert BL. Causes and consequences of clonal hematopoiesis. Blood. 2023;142(26):2235–46.
28. Weeks LD, Niroula A, Neuberg D, Wong W, Lindsley RC, Luskin M, et al. Prediction of risk for myeloid malignancy in clonal hematopoiesis. NEJM Evid. 2023;2(5).
29. Laurie CC, Laurie CA, Rice K, Doheny KF, Zelnick LR, McHugh CP, et al. Detectable clonal mosaicism from birth to old age and its relationship to cancer. Nat Genet [Internet]. 2012 Jun [cited 2014 Jan 15];44(6):642–50. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3366033&tool=pmcentrez&rendertype=abstract
30. Yang H, Garcia-Manero G, Sasaki K, Montalban-Bravo G, Tang Z, Wei Y, et al. High-resolution structural variant profiling of myelodysplastic syndromes by optical genome mapping uncovers cryptic aberrations of prognostic and therapeutic significance. Leukemia [Internet]. 2022;36(9):2306–16. Available from: https://www.nature.com/articles/s41375-022-01652-8

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

No datasets were generated or analysed during the current study.
