The Journal of Molecular Diagnostics (JMD). 2021 Oct;23(10):1404–1413. doi: 10.1016/j.jmoldx.2021.07.012

Accurate Detection and Quantification of FLT3 Internal Tandem Duplications in Clinical Hybrid Capture Next-Generation Sequencing Data

Jack K Tung 1, Carlos J Suarez 1, Tsoyu Chiang 1, James L Zehnder 1, Henning Stehr 1
PMCID: PMC8527870  PMID: 34363960

Abstract

FLT3 internal tandem duplications (ITDs) are found in approximately one-third of patients with acute myeloid leukemia and have important prognostic and therapeutic implications that have supported their assessment in routine clinical practice. Conventional methods for assessing FLT3-ITD status and allele burden have been primarily limited to PCR fragment size analysis because of the inherent difficulty in detecting large ITD variants by next-generation sequencing (NGS). In this study, we assess the performance of publicly available bioinformatic tools for the detection and quantification of FLT3-ITDs in clinical hybridization-capture NGS data. We found that FLT3_ITD_ext had the highest overall accuracy for detecting FLT3-ITDs and was able to accurately quantify allele burden. Although all other tools evaluated were able to detect FLT3-ITDs reasonably well, allele burden was consistently underestimated. We were able to significantly improve quantification of FLT3-ITD allelic burden independent of the detection method by utilizing soft-clipped reads and/or ITD junctional sequences. In addition, we show that identifying mutant reads by previously identified junctional sequences further improves the sensitivity of detecting FLT3-ITDs in post-treatment samples. Our results demonstrate that FLT3-ITDs can be reliably detected in clinical NGS data using available bioinformatic tools. We further describe how accurate quantification of FLT3-ITD allele burden can be added on to existing clinical NGS pipelines for routine assessment of FLT3-ITD status in patients with acute myeloid leukemia.


FMS-like tyrosine kinase 3 (FLT3) is a receptor tyrosine kinase upstream of multiple cell signaling pathways (eg, RAS signaling pathway, JAK/STAT signaling pathway, and phosphatidylinositol 3-kinase pathway) that promote proliferation, impair differentiation, and increase cell survival in hematopoietic stem cells. Mutations in FLT3 have been associated with up to 30% of patients with acute myeloid leukemia (AML). In-frame internal tandem duplications (ITDs) are the most common type of mutation, accounting for up to 25% of FLT3 mutations. These mutations typically involve the juxtamembrane domain in exons 14 and 15 (70% of ITDs), which leads to loss of its autoinhibitory function and constitutive activation of the tyrosine kinase domain.1

The presence of FLT3-ITDs has been shown to confer an overall worse prognosis in AML patients.1, 2, 3, 4, 5 In addition, the allelic burden of FLT3-ITDs contributes to the risk stratification of AML patients, with high (≥0.5) allelic ratios (ARs) of mutant/wild-type alleles conferring worse prognosis than low (<0.5) ARs. The prognostic value of FLT3-ITDs has been incorporated into the clinical risk stratification guidelines outlined by the European LeukemiaNet and the National Comprehensive Cancer Network and has thus made assessment of FLT3-ITD status a routine component of the clinical workup for patients newly diagnosed with AML.6

The assessment for FLT3-ITDs has been largely achieved by PCR, followed by fragment size analysis with capillary electrophoresis (CE). The presence of insertions in exons 14 and 15 is indicated by additional peaks in the electropherogram, and the allelic ratio is determined by quantifying the area under the wild-type and mutant peaks. Although these methods have proved to be informative in elucidating the prognostic and therapeutic value of FLT3-ITD status, they face several inherent issues, including being semiquantitative in nature, being restricted in the types of mutations they can identify, and having limited sensitivity compared with other methods, such as next-generation sequencing (NGS) and digital droplet PCR.

Although it is inherently difficult to accurately detect and quantify FLT3-ITDs by NGS because of their heterogeneity in size and challenges they present to bioinformatic pipelines, FLT3 is commonly covered in many clinical NGS assays because of its high clinical relevance. We therefore evaluated the performance of various publicly available bioinformatic tools in detecting FLT3-ITDs in clinical NGS data and further demonstrate how the accuracy of allele burden quantification can be improved independent of the detection method using relatively easy-to-implement methods.

Materials and Methods

Patient Samples

A total of 100 specimens from 75 unique patients previously tested as part of routine clinical care at our institution by both CE and NGS from 2018 to 2021 were used for this study. After manual review of PCR fragment analysis results, 51 positive specimens, representing patients at all clinical stages (initial diagnosis, remission, and relapse), were identified. A total of 49 negative specimens by PCR fragment analysis were selected at random. The specimen types consisted of 77 bone marrow aspirates, 20 peripheral blood samples, 2 cerebrospinal fluid samples, and 1 formalin-fixed, paraffin-embedded tissue.

FLT3-ITD Fragment Size Analysis

A separate PCR sizing assay is performed with every NGS case at our institution to evaluate for FLT3-ITDs. Exons 14 and 15 of FLT3 were PCR amplified with fluorescently labeled primers and resolved by CE. Fragment size analysis was performed with GeneScan analysis software version 3.1 (Thermo Fisher Scientific, Waltham, MA) using manufacturer recommendations. The presence of ITDs was determined by assessing for the presence of any peaks larger than the 239-bp wild-type allele. The allelic ratio of any identified FLT3-ITD was calculated by dividing the sum of the area under any mutant peaks by the area under the wild-type peak.
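The allelic ratio calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the peak areas are hypothetical and are not taken from the GeneScan software or from this study's data.

```python
def ce_allelic_ratio(mutant_peak_areas, wild_type_area):
    """Allelic ratio = sum of mutant peak areas / wild-type peak area."""
    if wild_type_area <= 0:
        # Without a wild-type peak the ratio cannot be computed.
        return None
    return sum(mutant_peak_areas) / wild_type_area


# Hypothetical electropherogram: two mutant peaks and one wild-type peak.
ratio = ce_allelic_ratio([1200.0, 300.0], 6000.0)
print(ratio)  # 0.25
```

Returning None when no wild-type peak is present is a design choice of this sketch; it keeps such specimens distinguishable from specimens with a ratio of zero.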

Next-Generation Sequencing

Next-generation sequencing was performed using the Stanford Actionable Mutation Panel for Hematopoietic and Lymphoid Neoplasms assay. This targeted hybrid capture assay covers 164 genes recurrently altered in myeloid and lymphoid neoplasms and detects single-nucleotide variants, short insertions and deletions (indels), and select gene fusions at a minimum allele fraction of 5% for single-nucleotide variants and indels. DNA is first extracted (Qiagen LLC, Germantown, MD) and acoustically sheared (Covaris Inc., Woburn, MA). NGS libraries are then prepared using the Kapa library preparation kit (Kapa Biosystems, Wilmington, MA). Purified libraries are then enriched using custom-designed capture probes (Roche Nimblegen Inc., Pleasanton, CA) for hybrid selection of 164 genes important in hematological disorders. The enriched libraries are then sequenced on an Illumina (San Diego, CA) Miseq or Nextseq instrument to yield 150-bp paired-end reads.

The Stanford Actionable Mutation Panel for Hematopoietic and Lymphoid Neoplasms bioinformatic pipeline includes the following steps: demultiplexing Illumina raw FASTQ files, mapping reads to the hg19 reference genome with BWA-mem (Burrows Wheeler Aligner,7 version 0.7.9a), deduplication of reads using molecular barcodes, and variant calling. Separate BAM (Binary Alignment Map) files without base quality adjustment are generated for indel calling. Single-nucleotide variants and indels are called using SAMtools8 version 1.2 and VarScan9 version 2.3.6 and must have a minimum of five supporting variant reads, 100× depth of coverage, and >1% variant allele frequency.

Detection of FLT3-ITDs

BAM files were first filtered with SAMtools to only include reads (and their unmapped pairs) covering the FLT3 gene (chromosome 13:28577411-28674729) to facilitate downstream analysis.

getITD is a Python program that detects FLT3-ITDs after Needleman-Wunsch alignment.10 Paired-end FASTQ files were first generated from the filtered FLT3 BAM files using SAMtools and default settings. getITD version 1.2.3 was downloaded from GitHub (https://github.com/tjblaette/getitd, last accessed March 28, 2021) and installed with the required dependencies. The paired-end FASTQ files were analyzed with either default settings or settings modified for hybrid capture data (-require_indel_free_primers False; -min_read_copies 1; and -infer_sense_from_alignment True). The high confidence ITD output file was manually reviewed for any detected ITDs.

Pindel is a command-line program run in a Linux environment that utilizes a pattern-growth de novo alignment algorithm to identify large deletions, medium-sized insertions, inversions, tandem duplications, and other structural variants.11 Pindel was downloaded from GitHub (https://github.com/genome/pindel, last accessed March 28, 2021) and installed with the required dependencies. The filtered FLT3 BAM files were analyzed with Pindel using the chromosome 13 reference sequence, an insert size of 200 bp, and a defined region limited to exons 14 and 15 of FLT3 (chromosome 13:28608023-28608352). The small insertions output file was manually reviewed for any detected insertions. Any insertions with fewer than two supporting reads or <2 bp in length were considered potentially spurious and were excluded.

ITD Assembler (ITDA) is a command-line program run in a Linux environment that utilizes de novo assembly to detect ITDs in NGS data.12 ITDA was downloaded from SourceForge (https://sourceforge.net/projects/itdassembler, last accessed March 28, 2021) and installed with the required dependencies. The filtered FLT3 BAM files were analyzed with ITDA using the following parameters: c = 5 cutoff for filtering reads, pkmer = 15, max_bin = 85, and min_bin = 5. The output file was manually reviewed, and any ITDs with fewer than two supporting reads or shorter than 2 bp were considered potentially spurious and excluded.

FLT3_ITD_ext is a Perl program that performs local realignment and in silico extension of reads to identify FLT3-ITDs from paired-end NGS data.13 FLT3_ITD_ext version 1.1 was downloaded from GitHub (https://github.com/ht50/FLT3_ITD_ext, last accessed March 28, 2021) and installed with the required dependencies. The filtered FLT3 BAM files were analyzed with FLT3_ITD_ext using default settings, and the output VCF (Variant Call Format) file was manually reviewed. Any ITDs with a raw allelic ratio <0.01 were considered potentially spurious and were not counted unless the same ITD was identified in a previous sample.

Determination of FLT3-ITD Allelic Ratio from NGS Data

The FLT3-ITD AR was calculated from the allele fraction (AF; number of mutant ITD reads divided by total reads) with the following formula: $\mathrm{AR} = \mathrm{AF}/(1 - \mathrm{AF})$. The AF was provided as a direct output by getITD and FLT3_ITD_ext, whereas the AFs for Pindel and ITDA were calculated using the number of supporting reads as the number of mutant reads and the maximum coverage over the ITD region as the total number of reads.
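As a small worked example of the conversion from allele fraction to allelic ratio, the following Python sketch applies AR = AF/(1 − AF) and the 0.5 cutoff for high versus low allelic ratio used in this article. The function names and the read counts are hypothetical.

```python
def allelic_ratio(af):
    """AR = AF / (1 - AF), where AF is mutant reads / total reads."""
    if not 0 <= af < 1:
        raise ValueError("AF must be in [0, 1)")
    return af / (1 - af)


def classify(ar, cutoff=0.5):
    """High allelic ratio is AR >= cutoff; low is AR < cutoff."""
    return "high" if ar >= cutoff else "low"


# Hypothetical counts: 50 mutant reads out of 200 total reads.
af = 50 / 200
ar = allelic_ratio(af)
print(af, round(ar, 3), classify(ar))  # 0.25 0.333 low
```

Under this formula, the AR cutoff of 0.5 corresponds to an AF of one-third.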

The number of mutant ITD reads was also determined by detecting soft-clipped reads spanning the ITD region. A custom Awk script selected reads from the FLT3 BAM file that have >10 soft-clipped or inserted bases and either completely span the ITD region defined by Pindel or are clipped exactly at the beginning or end of that region. Soft-clipped or inserted bases were identified from the CIGAR (Compact Idiosyncratic Gapped Alignment Report) string of each read. An example Awk script that uses SAMtools to extract such reads from an input BAM file (inputbam.bam), given the genomic coordinates of the ITD region identified by Pindel (Pindelstart and Pindelend), is as follows:

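samtools view -h "inputbam.bam" | awk -v pinstart="$Pindelstart" -v pinend="$Pindelend" 'BEGIN{OFS="\t"}
/^@/ {print; next}   # keep SAM header lines
{
  # length of the last M (match) operation in the CIGAR string ($6)
  mlen = 0
  if (match($6, /[0-9]+M[^M]*$/)) mlen = substr($6, RSTART, RLENGTH) + 0
  clipped = ($6 ~ /[1-9][0-9]+[IS]/)   # soft clip or insertion of at least 10 bases
  atedge = ($4 == pinstart+1) || ($4 + mlen == pinend)   # clipped at an ITD boundary
  spans = ($4 <= pinstart+1) && ($4 + length($10) >= pinend-1)   # read covers the ITD region
  if (clipped && (atedge || spans)) print
}' | samtools view -b -o "outputbam.bam" -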
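The same selection rule can also be written as a Python sketch that operates on individual SAM text records, which may be easier to unit test than a shell one-liner. The function name, the example records, and the coordinate convention (a 0-based start and a 1-based end, mirroring the pinstart+1 and pinend comparisons) are assumptions made for illustration. Unlike the Awk example, this sketch computes the aligned end from all reference-consuming CIGAR operations rather than from the last M operation alone.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_candidate_itd_read(sam_line, itd_start, itd_end, min_bases=10):
    """Return True if a SAM record looks like an ITD-supporting read.

    A read qualifies when it has a soft clip or insertion longer than
    min_bases and is either clipped at an ITD boundary or spans the region.
    """
    fields = sam_line.rstrip("\n").split("\t")
    pos, cigar, seq = int(fields[3]), fields[5], fields[9]
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    has_clip = any(op in "IS" and n > min_bases for n, op in ops)
    # Reference bases consumed by the alignment (M, D, N, =, X operations).
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    at_start = pos == itd_start + 1
    at_end = pos + ref_len == itd_end
    spans = pos <= itd_start + 1 and pos + len(seq) >= itd_end - 1
    return has_clip and (at_start or at_end or spans)


# Hypothetical records: one read soft clipped at the ITD start, one unclipped.
clipped_read = "\t".join(["r1", "0", "13", "101", "60", "20S50M", "*", "0", "0", "A" * 70, "*"])
plain_read = "\t".join(["r2", "0", "13", "101", "60", "70M", "*", "0", "0", "A" * 70, "*"])
print(is_candidate_itd_read(clipped_read, 100, 130))  # True
print(is_candidate_itd_read(plain_read, 100, 130))    # False
```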

The number of mutant ITD reads was also determined by using the grep command to search the filtered FLT3 BAM file for reads containing the junctional sequence between the two copies of the duplicated region. The number of total reads (sum of mutant and wild-type reads) was similarly determined by using grep to count reads containing each of the two junctional sequences between the ITD and the flanking 5′ and 3′ reference sequences and averaging the two counts. This method of estimating the number of total reads was introduced by Tsai et al.13 These junctional sequences were defined by the ITD sequence identified by Pindel and the respective FLT3 reference sequence. A simplified example bash script that uses grep to count reads containing a 30-bp mutant junction sequence is as follows:

ITDseq="ACATTCCATTCTTACCAAACTCTAAATTTTCTCTTG"

FLT3refseq="CTTTCAGCATTTTGACGGCAACCTGGATTGAGACTCCTGTTTTGCTAATTCCATAAGCTGTTGCGTTCATCACTTTTCCAAAAGCACCTGATCCTAGTACCTTCCCTGCAAAGACAAATGGTGAGTACGTGCATTTTAAAGATTTTCCAATGGAAAAGAAATGCTGCAGAAACATTTGGCACATTCCATTCTTACCAAACTCTAAATTTTCTCTTGGAAACTCCCATTTGAGATCATATTCATATTCTCTGAAATCAACGTAGAAGTACTCATTATCTGAGGAGCCGGTCACCTGTACCATCTGTAGCTGGCTTTCATACCTAAATTGC"

# define the 30-bp mutant junction: last 15 bp of the ITD followed by its first 15 bp

jxn30bp="${ITDseq:(-15)}${ITDseq:0:15}"

# count the number of reads containing the mutant junction sequence

samtools view inputbam.bam | grep -c "$jxn30bp"
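The full junction-based estimate (mutant junction count divided by the average of the two flanking junction counts) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' script: the function names are hypothetical, the sequences are made-up toy sequences rather than FLT3, and whole alleles stand in for sequencing reads.

```python
def junction_sequences(ref, itd, k=15):
    """Return the (mutant, 5-prime, 3-prime) junction strings, each 2*k long."""
    start = ref.find(itd)
    if start < k or len(itd) < k or start + len(itd) + k > len(ref):
        raise ValueError("ITD not found in reference or flanks too short")
    end = start + len(itd)
    mutant = itd[-k:] + itd[:k]                # end of copy 1 + start of copy 2
    five_prime = ref[start - k:start] + itd[:k]
    three_prime = itd[-k:] + ref[end:end + k]
    return mutant, five_prime, three_prime


def junction_allele_fraction(read_seqs, ref, itd, k=15):
    """AF = reads with mutant junction / mean of reads with each flank junction."""
    mutant, five_prime, three_prime = junction_sequences(ref, itd, k)

    def count(junction):
        # Count reads containing the junction, like `grep -c`.
        return sum(junction in seq for seq in read_seqs)

    total = (count(five_prime) + count(three_prime)) / 2
    return count(mutant) / total if total else None


# Toy sequences (not FLT3): 20-bp flanks around a 30-bp duplicated segment.
flank5 = "GATTACAGATTCCAGGTACC"
itd = "TGCAAGTCCGATAGGCTTAACCGTATGCAT"
flank3 = "CCGGAATTGTCAGTCAAGCT"
wild_type = flank5 + itd + flank3
mutant_allele = flank5 + itd + itd + flank3

reads = [wild_type] * 3 + [mutant_allele]
print(junction_allele_fraction(reads, wild_type, itd))  # 0.25
```

Both flanking junctions occur once in wild-type and once in mutant molecules, which is why their average count serves as the total; the mutant junction occurs only where two copies of the duplicated segment abut.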

Results

Detection of FLT3-ITDs by CE

Exons 14 and 15 of FLT3 were amplified by PCR and resolved by CE to detect mutant ITDs. A total of 51 specimens were positive for an insertion in exon 14 or 15 of FLT3. These putative ITDs ranged in size from 6 to 174 bp, with an average of 51 bp (SD, 33 bp). Three of these specimens had more than one mutant ITD allele (one to three ITDs). In addition, the allelic ratio (determined by the area under the mutant peak divided by the area under the wild-type peak) was calculated for every mutant ITD and ranged from 0.01 to 4.92, with an average of 0.47 (SD, 0.75). One sample was not quantifiable because of the absence of a wild-type peak. Of the 51 specimens positive for an ITD, 15 were classified as high AR (≥0.5) and 35 were classified as low AR (<0.5). The distribution of ITD size versus allelic ratio is shown in Figure 1 and did not reveal any significant trends.

Figure 1.

Distribution of FLT3 internal tandem duplications (ITDs) and allelic ratios. ITD size (bp) and corresponding allelic ratio, determined by capillary electrophoresis, are plotted for all of the samples analyzed in the current study.

Detection of FLT3-ITDs by NGS

Four different bioinformatic tools were evaluated for their ability to detect FLT3-ITDs in our clinical NGS data. These tools were selected after a review of the literature and represent different approaches for detecting duplication events: getITD uses the Needleman-Wunsch alignment algorithm,10 Pindel uses a pattern-growth de novo alignment algorithm,11 ITDA utilizes a de novo assembly method,12 and FLT3_ITD_ext utilizes a local alignment and in silico extension algorithm.13 The detection of FLT3-ITDs by these tools compared with CE is shown in Table 1, and the performance metrics are summarized in Table 2.

Table 1.

Comparison of Bioinformatic Tools and CE for Detection of FLT3-ITDs

Variable | CE positive | CE negative | Total
getITD positive | 44 | 2 | 46
getITD negative | 7 | 47 | 54
Pindel positive | 48 | 4 | 52
Pindel negative | 3 | 45 | 48
ITDA positive | 43 | 3 | 46
ITDA negative | 8 | 46 | 54
FLT3_ITD_ext positive | 51 | 5 | 56
FLT3_ITD_ext negative | 0 | 44 | 44

CE, capillary electrophoresis; ITD, internal tandem duplication; ITDA, ITD Assembler.

Table 2.

Analytic Performance of getITD, Pindel, ITDA, and FLT3_ITD_ext for Detection of FLT3-ITDs

Variable | Sensitivity (95% CI), % | Specificity (95% CI), % | Accuracy (95% CI), %
getITD | 86.3 (73.7–94.3) | 95.9 (86.0–99.5) | 91.0 (83.6–95.8)
Pindel | 94.1 (83.8–98.8) | 91.8 (80.4–97.7) | 93.0 (86.1–97.1)
ITDA | 84.3 (71.4–93.0) | 93.9 (83.1–98.7) | 89.0 (81.2–94.4)
FLT3_ITD_ext | 100.0 (93.0–100.0) | 89.8 (77.8–96.6) | 95.0 (88.7–98.4)

Sensitivity, specificity, and accuracy were calculated using capillary electrophoresis as the gold standard.

ITD, internal tandem duplication; ITDA, ITD Assembler.
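The point estimates in Table 2 follow directly from the counts in Table 1, as the Python sketch below shows for getITD. The article does not state how the 95% CIs were computed; the sketch assumes exact (Clopper-Pearson) binomial intervals, obtained here by bisection using only the standard library. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes in n trials (assumed method)."""
    def solve(move_right):
        lo, hi = 0.0, 1.0
        for _ in range(60):  # bisection on p
            mid = (lo + hi) / 2
            if move_right(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: p at which P(X >= x) reaches alpha/2.
    lower = 0.0 if x == 0 else solve(lambda p: 1 - binom_cdf(x - 1, n, p) < alpha / 2)
    # Upper bound: p at which P(X <= x) falls to alpha/2.
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p) > alpha / 2)
    return lower, upper


def performance(tp, fp, fn, tn):
    """Return (successes, trials) for each metric, with CE as the reference."""
    return {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "accuracy": (tp + tn, tp + fp + fn + tn),
    }


# getITD counts from Table 1: TP=44, FP=2, FN=7, TN=47.
for name, (x, n) in performance(44, 2, 7, 47).items():
    low, high = exact_ci(x, n)
    print(f"{name}: {100 * x / n:.1f}% ({100 * low:.1f}-{100 * high:.1f})")
```

For a tool with 51 of 51 CE-positive specimens detected, the lower bound under this assumption is 0.025^(1/51), or approximately 93.0%, which agrees with the FLT3_ITD_ext sensitivity interval reported in Table 2.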

NGS was able to provide additional information that was not apparent on the CE assay. Seven of the FLT3-ITD–positive samples by CE were determined to have an intermediate insertion (1 to 7 bp) within the ITD. In addition, two other CE-positive samples were determined to be solely in-frame insertions (6- and 21-bp insertions).

Compared with the CE assay, getITD had a sensitivity of 86.3% (95% CI, 73.7%–94.3%), a specificity of 95.9% (95% CI, 86.0%–99.5%), and an accuracy of 91.0% (95% CI, 83.6%–95.8%). getITD was able to detect ITDs ranging from 15 to 93 bp in specimens with blast percentages of 1% to 97%, all specimens with multiple ITDs, and all specimens with an intermediate insertion within an ITD. getITD was not able to detect one of the in-frame insertions that was 6 bp in length.

Pindel had a sensitivity of 94.1% (95% CI, 83.8%–98.8%), a specificity of 91.8% (95% CI, 80.4%–97.7%), and an accuracy of 93.0% (95% CI, 86.1%–97.1%) compared with the CE assay. Pindel was able to detect ITDs ranging from 6 to 93 bp in specimens with blast percentages of 0.5% to 97%, all specimens with multiple ITDs, all specimens with an intermediate insertion within an ITD, and both specimens with in-frame insertions.

ITDA had a sensitivity of 84.3% (95% CI, 71.4%–93.0%), a specificity of 93.9% (95% CI, 83.1%–98.7%), and an accuracy of 89.0% (95% CI, 81.2%–94.4%) compared with the CE assay. ITDA was able to detect ITDs ranging from 15 to 92 bp in specimens with blast percentages ranging from 0.5% to 97% and six specimens with an intermediate insertion within an ITD. ITDA was not able to detect one specimen with an intermediate insertion of 6 bp and both specimens with an in-frame insertion.

FLT3_ITD_ext had a sensitivity of 100% (95% CI, 93.0%–100%), a specificity of 89.8% (95% CI, 77.8%–96.6%), and an accuracy of 95.0% (95% CI, 88.7%–98.4%) compared with the CE assay. FLT3_ITD_ext was able to detect ITDs ranging from 6 to 174 bp in specimens with blast percentages of 0.5% to 97%, all specimens with multiple ITDs, all specimens with an intermediate insertion within an ITD, and both specimens with in-frame insertions.

On further review of discordant results, six of the specimens that were called negative by CE were most likely false negatives because of supporting evidence by orthogonal pathologic findings (morphology, cytogenetics, or flow cytometry) or by another bioinformatic method (Table 3). All of these cases were post-treatment specimens, and multiple reads supporting the previously identified ITDs were detected by one or more bioinformatic tools in the NGS data. Of note, the previously identified ITD for one of the samples could only be detected by the grep method, described below. After correcting for these potential false negatives, getITD had a sensitivity of 80.7% (95% CI, 68.1%–90.0%), a specificity of 100% (95% CI, 91.8%–100%), and an accuracy of 89.0% (95% CI, 81.2%–94.4%); Pindel had a sensitivity of 93.0% (95% CI, 83.0%–98.1%), a specificity of 100% (95% CI, 91.8%–100%), and an accuracy of 96.0% (95% CI, 90.1%–98.9%); ITDA had a sensitivity of 80.7% (95% CI, 68.1%–90.0%), a specificity of 100% (95% CI, 91.8%–100%), and an accuracy of 89.0% (95% CI, 81.2%–94.4%); FLT3_ITD_ext had a sensitivity of 98.3% (95% CI, 90.6%–100%), a specificity of 100% (95% CI, 91.8%–100%), and an accuracy of 99.0% (95% CI, 94.6%–100%).

Table 3.

Summary of False-Negative CE Cases

Sample | Pretreatment ITD size, bp | PCR/CE result | Concurrent pathologic findings | getITD AF | Pindel AF | ITDA AF | FLT3_ITD_ext AF | Grep AF
1 | 36 | Equivocal peak | No morphologic, flow cytometric, or cytogenetic evidence of disease | 0.003 | 0.003 | 0.001 | 0.004 | 0.004
2 | 69 | Equivocal peak | Morphologic evidence of disease (0.5% blasts) and persistent abnormal karyotype | Not detected | 0.002 | 0.003 | 0.008 | 0.009
3 | 78 | Negative | Morphologic evidence of disease (5% blasts) and persistent abnormal karyotype | 0.001 | 0.003 | 0.003 | 0.006 | 0.009
4 | 6 | Negative | No morphologic, flow cytometric, or cytogenetic evidence of disease | Not detected | 0.004 | Not detected | 0.006 | NA
5 | 36 | Negative | No morphologic or cytogenetic evidence of disease but 0.04% abnormal blasts by MFC | Not detected | Not detected | Not detected | 0.001 | 0.001
6 | 21 | Negative | Morphologic evidence of disease and 8% abnormal blasts by MFC | Not detected | Not detected | Not detected | Not detected | 0.001

AF, allele fraction; CE, capillary electrophoresis; ITD, internal tandem duplication; ITDA, ITD Assembler; MFC, multiparameter flow cytometry; NA, not applicable.

Overall, FLT3_ITD_ext had the best performance in detecting FLT3-ITDs in our clinical NGS data. Other evaluated bioinformatic tools were also highly concordant (>89% overall accuracy) with the CE results. In addition, all evaluated tools were able to detect FLT3-ITDs in CE-negative cases, suggesting that NGS methods are generally more sensitive than CE. Indeed, although the lowest AF detectable by CE in our data was approximately 0.01, NGS was able to detect the 126-bp FLT3-ITD in the PL-21 cell line down to a theoretical AF of 0.006 in a separate dilution experiment.

Quantification of FLT3-ITD Allele Burden

The ability to accurately measure FLT3-ITD allele burden from NGS data was evaluated by comparing the allele fraction determined by each bioinformatic tool with the allele fraction determined by CE, the current gold standard for quantifying FLT3-ITD allele burden.

The correlation between the FLT3-ITD allele fraction determined by each bioinformatic tool and the allele fraction determined by CE is shown in Figure 2A and Supplemental Table S1. FLT3_ITD_ext showed the highest linear correlation, with an R² of 0.91 and a slope of 0.9. getITD, Pindel, and ITDA showed weak to moderate linear correlation, with Pindel having the highest R² of 0.70. These three tools also significantly underestimated the allelic ratio determined by CE, as indicated by the slopes of the regression line being less than one. In addition, these three tools showed significant negative proportional bias with increasing allelic ratios (Supplemental Figure S1). These negative biases directly translated into misclassification of allelic ratio status, with getITD correctly classifying 84%, Pindel correctly classifying 73%, and ITDA correctly classifying 70% of specimens determined by CE (Figure 2B).
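To make the slope and R² comparison concrete, the following Python sketch fits an ordinary least squares line to paired allele fractions. It assumes CE values on the x axis and tool values on the y axis, with an intercept term; the article does not specify these details, and the data shown are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least squares fit y = slope * x + intercept, plus R squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical allele fractions: the NGS tool reports half of the CE value.
ce_af = [0.10, 0.20, 0.30, 0.40]
ngs_af = [0.05, 0.10, 0.15, 0.20]
slope, intercept, r_squared = linear_fit(ce_af, ngs_af)
print(round(slope, 3), round(r_squared, 3))  # 0.5 1.0
```

In this toy case the correlation is perfect, yet the slope of 0.5 shows a systematic underestimation, which is why both statistics are reported.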

Figure 2.

Assessment of getITD, Pindel, Internal Tandem Duplication Assembler (ITDA), and FLT3_ITD_ext in quantifying FLT3-ITD allelic burden. A: Allele fraction (AF) quantification by getITD, Pindel, ITDA, and FLT3_ITD_ext compared with capillary electrophoresis (CE). B: Allelic ratio classification by each tool compared with CE.

FLT3-ITD Allele Fraction Quantification Is Improved Using Soft-Clipped Reads and ITD Junctional Sequences

Because multiple bioinformatic tools underestimated the FLT3-ITD allele fraction, we sought to improve allele burden quantification independent of the detection method. We utilized both the sequence and the genomic coordinates of the identified FLT3-ITDs to identify ITD junctional sequences and soft-clipped reads, respectively, as surrogates for mutant ITD reads.

Mutant reads that adequately span the ITD junction will only partially align to the reference sequence and will thus be soft clipped on either the 5′ or the 3′ end of the region that is duplicated (Figure 3A). We therefore sought to quantify mutant ITD reads by counting the number of soft-clipped reads spanning the ITD region identified by Pindel. As described fully in Materials and Methods, soft-clipped reads in the ITD region were identified using an Awk script interrogating the CIGAR string of the BAM files; the allele fraction was subsequently calculated using the number of wild-type reads determined from the total number of reads in the same region. The correlation of these allele fractions with those determined by CE is shown in Figure 3B. Although this method of identifying mutant reads also appeared to underestimate the allele fraction determined by CE, it was significantly improved, with a slope of its linear regression line equal to 0.57 and R² of 0.84. This improved linear correlation translated to an 81% overall accuracy of correctly classifying allelic burden determined by CE.

Figure 3.

Identifying mutant reads by soft-clipping or internal tandem duplication (ITD) junctional sequences. A: Schematic depicting various types of reads being utilized for calculating allelic ratio. Green boxed areas indicate duplicated regions. Blue and orange lines represent reads containing mutant junction sequence between the duplicated regions. These reads will be soft clipped at their 5′ or 3′ ends (dashed ends flanking green boxed areas) when they are aligned with the reference sequence, as shown in the example below (a 48-bp ITD is shown in Integrated Genomic Viewer). Black lines indicate reads spanning 5′ and 3′ boundaries of duplicated region. B: Allele fraction (AF), determined by soft-clipped reads compared with capillary electrophoresis (CE). C: Allele fraction determined by 30-bp ITD junctional sequences compared with CE. WT, wild type.

Both mutant and wild-type reads can be readily identified by their respective ITD junctional sequences (Figure 3A); mutant reads can be determined by the junctional sequence between the two copies of the duplicated sequence, whereas the total reads can be determined by the junctional sequences between the ITD and flanking sequences. As described fully in Materials and Methods, we calculated FLT3-ITD allele fractions by querying the BAM files with 30-bp junctional sequences using grep. The correlation of these calculated allele fractions with those determined by CE is shown in Figure 3C. This method of identifying mutant versus wild-type reads produced allele fractions that were highly correlated with those determined by CE, resulting in an R² of 0.89 and a linear slope of 0.92. This high correlation directly translated to correctly classifying 93% (14/15) of high allelic ratio samples and 85% (29/34) of low allelic ratio samples. The misclassified samples by NGS (mean AR, 0.53; SD, 0.16) were all near the 0.5 cutoff, with a mean AR of 0.44 (SD, 0.06) by CE.

Quantification of FLT3-ITD allele burden can therefore be improved by utilizing the presence of soft-clipped reads and/or identified ITD junctional sequences, independent of the detection method used. Although the grep method had an overall accuracy similar to that of FLT3_ITD_ext in correctly classifying FLT3-ITD allele burden (88% versus 86%), the grep method was better at classifying high allele burden samples (93% versus 60%), whereas FLT3_ITD_ext was better at classifying low allele burden samples (97% versus 85%).

Monitoring FLT3-ITDs as a Marker of Residual Disease by NGS

Because detection of FLT3-ITDs with NGS appeared to be more sensitive than detection by CE, we decided to interrogate all negative samples by CE that also had prior positive FLT3-ITD status in our cohort. Of these 19 samples, five were positive for their previously identified ITD junctional sequences using grep (as described previously in the Methods). All of these samples also had multiple mutations identified by prior NGS profiling, supporting the presence of residual disease. Figure 4 shows the allelic ratios determined by using the grep command on FLT3-ITD junctional sequences compared with CE over time, along with clinical correlates for one of these patients. The NGS data were positive for the previously identified FLT3-ITD at two time points after induction (months 2 and 4) at low allelic fractions despite no PCR, morphologic, or flow cytometric evidence of disease. These results suggest that detecting FLT3-ITDs by their junctional sequences is both highly sensitive and specific and may be useful for monitoring disease levels in some patients undergoing treatment.

Figure 4.

FLT3 internal tandem duplication (ITD) junctional sequences can be tracked over time as a measure of disease burden. A: Allelic ratio (AR), determined by capillary electrophoresis and ITD junctional sequences, is plotted over time for a patient who eventually relapsed after chemotherapy. B: Mutant reads containing ITD junctional sequences identified by grep are shown for the specimen at month 4. ITD sequence is highlighted in green. AML, acute myeloid leukemia; C1 to C6, chemotherapy cycles consisting of 5 days of decitabine and venetoclax; IND, induction chemotherapy consisting of 10 days of decitabine and midostaurin; NGS, next-generation sequencing; ref., reference.

Discussion

Detection of FLT3-ITDs in NGS data continues to be challenging because of limitations of standard clinical NGS pipelines. In this study, we evaluated the performance of several publicly available bioinformatic tools for detection of FLT3-ITDs in clinical hybrid capture NGS data and found that FLT3_ITD_ext had the best performance, identifying all FLT3-ITDs detected by fragment size analysis in addition to five presumed false negatives by CE. We further describe methods to improve the estimation of FLT3-ITD allelic ratio by quantifying mutant and wild-type reads using soft-clipped reads and junctional sequences. Querying junctional sequences was not only more sensitive at identifying mutant reads, but also the most accurate method for classifying FLT3-ITD allele burden using fragment size analysis as the gold standard, achieving a correct classification of FLT3-ITD status in 88% of evaluated cases.

Several groups have previously developed methods for detecting FLT3-ITDs in clinical NGS data.11, 12, 13, 14, 15, 16, 17 However, straightforward application of these methods can oftentimes be challenging if they are platform specific, proprietary, and/or suboptimal in regard to quantifying allelic burden. Indeed, the tools evaluated in this study were developed in the context of different sequencing platforms, and differences in performance may be linked to certain platform-specific features that did not generalize well. Clinical laboratories should therefore carefully evaluate available tools to determine which one is best suited for their particular workflow. Our results provide a select survey of publicly available tools that represent different approaches to FLT3-ITD detection. We found that all of the tools evaluated in our study were able to detect FLT3-ITDs reasonably well, achieving >89% accuracy compared with CE. However, all tools except for FLT3_ITD_ext had challenges with quantifying allele burden, with AR underestimation being a common problem that has been observed in other studies as well.13,15,18, 19, 20 These results are in agreement with a recent study that evaluated the performance of various computational tools in detecting FLT3-ITDs in hybrid capture data.21 We further found that quantification of FLT3-ITDs could be significantly improved by utilizing soft-clipping and junctional sequences to identify mutant reads, supporting similar approaches utilized by other groups.13,15 The methods presented herein can be easily added on to existing clinical workflows.

Although CE has become the clinical standard for detecting and quantifying FLT3-ITD allele burden, NGS has unique advantages that warrant its clinical assessment. First, NGS can achieve a lower limit of detection for single-nucleotide variants and indels than conventional CE methods. Dilution studies have shown that CE can detect allele frequencies down to 1% to 2%,3,22 whereas NGS can have detection limits down to 0.001% with the use of unique molecular barcodes and digital error suppression.23 Indeed, six samples that were falsely negative by CE were positive by NGS in our cohort. This higher degree of sensitivity may prove particularly useful for minimal/measurable residual disease applications, as FLT3-ITDs at low allele frequencies have been shown to be a useful prognostic biomarker in several studies.10,16,24,25 However, the sensitivity of detecting FLT3-ITDs can vary widely, depending on the method used. We demonstrated that using the grep command to search for mutant junctional sequences (an approach that has a theoretical sensitivity of a single read and has been used by others to identify and quantify alterations, such as rearrangements and insertions, with high specificity26, 27, 28, 29) was more sensitive at detecting FLT3-ITDs than the other bioinformatic tools evaluated in our study. Second, NGS offers a more comprehensive assessment of FLT3 status together with other relevant genes. In our cohort, we were able to identify seven patients with FLT3 point mutations that would not have been identified by CE. In addition, NGS methods can be readily applied to interrogate insertions and deletions in other genes. Third, NGS offers a more objective quantification of FLT3-ITD allele burden. Although CE is the current gold standard for quantifying FLT3-ITD allele burden, it is inherently a semiquantitative assay that relies on estimating the area under the fragment distribution curves. Furthermore, CE is susceptible to PCR bias, in which preferential amplification of shorter wild-type alleles can confound their quantification relative to longer mutant alleles.22 With limited PCR cycles and absolute quantification of allele frequencies, hybrid capture NGS offers a more unbiased and objective method of assessing FLT3-ITD status.
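The grep-style search for junctional sequences amounts to exact substring matching on either strand. The sketch below assumes exact matching, a read list already loaded into memory, and a locus depth supplied by the caller; the function names and toy sequences are hypothetical and are not real FLT3 sequence or the commands used in this study.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing A, C, G, T, or N."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def count_junction_reads(reads, junction):
    """Count reads that contain the junction sequence on either strand.

    Matching is exact, as with a plain grep; a single matching read is
    enough to be counted, which is why sensitivity is one read in theory.
    """
    fwd = junction.upper()
    rev = reverse_complement(fwd)
    return sum(1 for read in reads if fwd in read.upper() or rev in read.upper())


def junction_allele_fraction(mutant_reads, depth):
    """Mutant junction reads divided by total depth at the locus.

    Using total depth as the denominator is an assumption of this sketch.
    """
    return mutant_reads / depth if depth else 0.0


# Toy reads for illustration only.
toy_reads = [
    "GGACCGTTGACC",  # junction on the forward strand
    "TTTCAACGGTAA",  # junction on the reverse strand
    "GGGGCCCCAAAA",  # no junction
    "ACCGTAGATTGA",  # partial match only, not counted
]
n_mutant = count_junction_reads(toy_reads, "ACCGTTGA")
```

Because matching is exact, a sequencing error inside the queried junction would cause a read to be missed, so the length of the query trades specificity against tolerance of errors.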

There are certain limitations to the bioinformatic tools and methods we describe in this study. In particular, the maximum detectable ITD size varies by tool. For Pindel, the maximum detectable insert size is 20 bp less than the read length.11 For getITD, the maximum detectable ITD is 6 bp (minimum insert length) less than the read length.10 For ITDA, the maximum detectable ITD length is not necessarily limited by the read length, but only ITDs up to 80 bp have been detected by us and others.12 These size limitations were confirmed when the PL-21 cell line was evaluated, and only getITD and Pindel were able to detect the 126-bp FLT3-ITD it harbors. In contrast, only FLT3_ITD_ext was able to detect a 300-bp ITD present in the HD829 reference material (Horizon Discovery, Waterbeach, UK). Although large FLT3-ITDs >200 bp are rare,5 longer read lengths can be used to overcome this size limitation. Another limitation is that quantification of FLT3-ITDs by junctional reads is contingent on knowing the sequence of the duplicated region a priori. This method of quantification may therefore be limited in low-cellularity samples (ie, for minimal/measurable residual disease assessment) when the diagnostic specimen or another specimen with higher tumor burden is not available for analysis. A further important limitation is that assessment of FLT3-ITD status by NGS has a significantly longer turnaround time (typically 2 to 4 weeks) compared with CE (typically 1 to 2 days). Assessment of FLT3-ITD status by CE will therefore continue to be required for most clinical laboratories to support rapid therapeutic decision making and enrollment in clinical trials.
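The size limits stated above reduce to simple arithmetic on the read length. The following sketch encodes only the offsets given in the text; the function name is hypothetical, and the 150-bp read length in the example is an illustrative assumption rather than a value taken from this section.

```python
# Offsets from the read length, as stated in the text (bp).
READ_LENGTH_OFFSET_BP = {"pindel": 20, "getitd": 6}

# For ITDA this is the largest ITD observed so far, not an algorithmic limit.
ITDA_LARGEST_OBSERVED_BP = 80


def max_detectable_itd(tool, read_length):
    """Approximate largest ITD (bp) a tool can report for a given read length."""
    name = tool.lower()
    if name == "itda":
        return ITDA_LARGEST_OBSERVED_BP
    if name not in READ_LENGTH_OFFSET_BP:
        raise ValueError(f"no size limit recorded for tool: {tool}")
    return read_length - READ_LENGTH_OFFSET_BP[name]


# Hypothetical 150-bp reads, used only to show the arithmetic.
limits = {t: max_detectable_itd(t, 150) for t in ("Pindel", "getITD", "ITDA")}
```

With 150-bp reads this gives 130 bp for Pindel and 144 bp for getITD, which shows how directly the upper bound scales with read length.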

The methods we describe in this study can be easily integrated into most clinical NGS workflows. However, software parameters must still be optimized for the individual workflows and desired performance characteristics of each laboratory. The region of interest, number of supporting reads, and length of junctional sequence queried are examples of parameters that should be considered to maximize the performance of detection and quantification of FLT3-ITDs. For example, we chose to also examine the Pindel small insertions output so that cases with duplications harboring an intermediate insertion would not be missed. Indeed, this strategy helped identify the 171-bp ITD, which exceeded the theoretical detectable ITD size for Pindel, through its intermediate 15-bp insertion.
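As a rough illustration of how these parameters interact, the sketch below groups them in one place and builds a junctional query from a known duplicated sequence. It assumes the novel junction of a tandem duplication is the end of the first copy, any intermediate inserted bases, and the start of the second copy; the class, the function names, and the default values are hypothetical and would need to be tuned by each laboratory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ITDSearchParams:
    """Tunable parameters named in the text; the defaults are placeholders."""
    region: str                    # region of interest, e.g. "chrom:start-end"
    min_supporting_reads: int = 2  # reads required before an ITD is reported
    junction_flank: int = 15       # bases taken on each side of the junction


def junction_query(duplicated, insertion="", flank=15):
    """Build the sequence spanning the novel junction of a tandem duplication.

    The query is the last `flank` bases of the duplicated segment, then any
    intermediate insertion, then the first `flank` bases of the segment.
    If the segment is shorter than `flank`, the whole segment is used.
    """
    if flank < 1:
        raise ValueError("flank must be at least 1")
    dup = duplicated.upper()
    return dup[-flank:] + insertion.upper() + dup[:flank]


def is_supported(n_reads, params):
    """True when the junction read count meets the configured threshold."""
    return n_reads >= params.min_supporting_reads
```

A longer flank makes the query more specific but less tolerant of sequencing errors, and a higher read threshold lowers false positives at the cost of sensitivity in low-burden samples.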

In summary, FLT3-ITDs are well represented in clinical hybrid-capture NGS data and can be detected using various publicly available bioinformatic tools. The quantification of FLT3-ITD allele burden and the detection of ITD sequences identified in prior diagnostic specimens can be improved using bioinformatic markers, such as soft-clipping and mutant junctional sequences. Implementation of FLT3-ITD assessment in clinical NGS workflows will be necessary for future minimal/measurable residual disease studies in patients with AML.

Acknowledgments

We thank the members of the Stanford Molecular Pathology Laboratory for technical assistance.

Footnotes

Supported by National Cancer Institute grant 2P01 CA49605 (J.L.Z.) and Stanford University, Department of Pathology.

Disclosures: None declared.

Supplemental material for this article can be found at http://doi.org/10.1016/j.jmoldx.2021.07.012.

Supplemental Data

Supplemental Figure S1.

Difference plots illustrate negative proportional bias of bioinformatic tools in allele fraction (AF) calculation. Difference in AF determined by each bioinformatic tool from AF determined by capillary electrophoresis (CE) is plotted against the average AF determined by the two methods. ITDA, Internal Tandem Duplication Assembler.

Supplemental Table S1
mmc1.docx (22.5KB, docx)

References

  • 1. Patnaik M.M. The importance of FLT3 mutational analysis in acute myeloid leukemia. Leuk Lymphoma. 2018;59:2273–2286. doi: 10.1080/10428194.2017.1399312.
  • 2. Onecha E., Linares M., Rapado I., Ruiz-Heredia Y., Martinez-Sanchez P., Cedena T., Pratcorona M., Oteyza J.P., Herrera P., Barragan E., Montesinos P., Vela J.A.G., Magro E., Anguita E., Figuera A., Riaza R., Martinez-Barranco P., Sanchez-Vega B., Nomdedeu J., Gallardo M., Martinez-Lopez J., Ayala R. A novel deep targeted sequencing method for minimal residual disease monitoring in acute myeloid leukemia. Haematologica. 2019;104:288–296. doi: 10.3324/haematol.2018.194712.
  • 3. Kim Y., Lee G.D., Park J., Yoon J.H., Kim H.J., Min W.S., Kim M. Quantitative fragment analysis of FLT3-ITD efficiently identifying poor prognostic group with high mutant allele burden or long ITD length. Blood Cancer J. 2015;5:1–7. doi: 10.1038/bcj.2015.61.
  • 4. Yalniz F., Abou Dalle I., Kantarjian H., Borthakur G., Kadia T., Patel K., Loghavi S., Garcia-Manero G., Sasaki K., Daver N., DiNardo C., Pemmaraju N., Short N.J., Yilmaz M., Bose P., Naqvi K., Pierce S., Nogueras González G.M., Konopleva M., Andreeff M., Cortes J., Ravandi F. Prognostic significance of baseline FLT3-ITD mutant allele level in acute myeloid leukemia treated with intensive chemotherapy with/without sorafenib. Am J Hematol. 2019;94:984–991. doi: 10.1002/ajh.25553.
  • 5. Gale R.E., Green C., Allen C., Mead A.J., Burnett A.K., Hills R.K., Linch D.C. The impact of FLT3 internal tandem duplication mutant level, number, size, and interaction with NPM1 mutations in a large cohort of young adult patients with acute myeloid leukemia. Blood. 2008;111:2776–2784. doi: 10.1182/blood-2007-08-109090.
  • 6. Döhner H., Estey E., Grimwade D., Amadori S., Appelbaum F.R., Büchner T., Dombret H., Ebert B.L., Fenaux P., Larson R.A., Levine R.L., Lo-Coco F., Naoe T., Niederwieser D., Ossenkoppele G.J., Sanz M., Sierra J., Tallman M.S., Tien H.-F., Wei A.H., Löwenberg B., Bloomfield C.D. Diagnosis and management of AML in adults: 2017 ELN recommendations from an international expert panel. Blood. 2017;129:424–447. doi: 10.1182/blood-2016-08-733196.
  • 7. Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
  • 8. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R.; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
  • 9. Koboldt D.C., Zhang Q., Larson D.E., Shen D., McLellan M.D., Lin L., Miller C.A., Mardis E.R., Ding L., Wilson R.K. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 2012;22:568–576. doi: 10.1101/gr.129684.111.
  • 10. Blätte T.J., Schmalbrock L.K., Skambraks S., Lux S., Cocciardi S., Dolnik A., Döhner H., Döhner K., Bullinger L. getITD for FLT3-ITD-based MRD monitoring in AML. Leukemia. 2019;33:2535–2539. doi: 10.1038/s41375-019-0483-z.
  • 11. Ye K., Schulz M.H., Long Q., Apweiler R., Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics. 2009;25:2865–2871. doi: 10.1093/bioinformatics/btp394.
  • 12. Rustagi N., Hampton O.A., Li J., Xi L., Gibbs R.A., Plon S.E., Kimmel M., Wheeler D.A. ITD assembler: an algorithm for internal tandem duplication discovery from short-read sequencing data. BMC Bioinformatics. 2016;17:1–8. doi: 10.1186/s12859-016-1031-8.
  • 13. Tsai H.K., Brackett D., Szeto D., Frazier R., MacLeay A., Davineni P., Manning D., Garcia E., Lindeman N.I., Le L.P., Lennerz J.K., Gibson C.J., Lindsley R.C., Kim A.S., Nardi V. Targeted informatics for optimal FLT3-ITD detection, characterization, and quantification across multiple NGS platforms. J Mol Diagn. 2020;22:1162–1178. doi: 10.1016/j.jmoldx.2020.06.006.
  • 14. Au C.H., Wa A., Ho D.N., Chan T.L., Ma E.S.K. Clinical evaluation of panel testing by next-generation sequencing (NGS) for gene mutations in myeloid neoplasms. Diagn Pathol. 2016;11:1–12. doi: 10.1186/s13000-016-0456-8.
  • 15. He R., Devine D.J., Tu Z.J., Mai M., Chen D., Nguyen P.L., Oliveira J.L., Hoyer J.D., Reichard K.K., Ollila P.L., Al-Kali A., Tefferi A., Begna K.H., Patnaik M.M., Alkhateeb H., Viswanatha D.S. Hybridization capture-based next generation sequencing reliably detects FLT3 mutations and classifies FLT3-internal tandem duplication allelic ratio in acute myeloid leukemia: a comparative study to standard fragment analysis. Mod Pathol. 2020;33:334–343. doi: 10.1038/s41379-019-0359-9.
  • 16. Levis M.J., Perl A.E., Altman J.K., Gocke C.D., Bahceci E., Hill J., Liu C., Xie Z., Carson A.R., McClain V., Stenzel T.T., Miller J.E. A next-generation sequencing-based assay for minimal residual disease assessment in AML patients with FLT3-ITD mutations. Blood Adv. 2018;2:825–831. doi: 10.1182/bloodadvances.2018015925.
  • 17. Wang T.Y., Yang R. ScanITD: detecting internal tandem duplication with robust variant allele frequency estimation. Gigascience. 2020;9:1–11. doi: 10.1093/gigascience/giaa089.
  • 18. Spencer D.H., Abel H.J., Lockwood C.M., Payton J.E., Szankasi P., Kelley T.W., Kulkarni S., Pfeifer J.D., Duncavage E.J. Detection of FLT3 internal tandem duplication in targeted, short-read-length, next-generation sequencing data. J Mol Diagn. 2013;15:81–93. doi: 10.1016/j.jmoldx.2012.08.001.
  • 19. Kim B., Kim S.J., Lee S.T., Min Y.H., Choi J.R. FLT3 internal tandem duplication in patients with acute myeloid leukemia is readily detectable in a single next-generation sequencing assay using the pindel algorithm. Ann Lab Med. 2019;39:327–329. doi: 10.3343/alm.2019.39.3.327.
  • 20. Schranz K., Hubmann M., Harin E., Vosberg S., Herold T., Metzeler K.H., Rothenberg-Thurley M., Janke H., Bräundl K., Ksienzyk B., Batcha A.M.N., Schaaf S., Schneider S., Bohlander S.K., Görlich D., Berdel W.E., Wörmann B.J., Braess J., Krebs S., Hiddemann W., Mansmann U., Spiekermann K., Greif P.A. Clonal heterogeneity of FLT3-ITD detected by high-throughput amplicon sequencing correlates with adverse prognosis in acute myeloid leukemia. Oncotarget. 2018;9:30128–30145. doi: 10.18632/oncotarget.25729.
  • 21. Yuan D., He X., Han X., Yang C., Liu F., Zhang S., Luan H., Li R., He J., Duan X., Wang D., Zhou Q., Gao S., Niu B. Comprehensive review and evaluation of computational methods for identifying FLT3-internal tandem duplication in acute myeloid leukaemia. Brief Bioinform. 2021:bbab099. doi: 10.1093/bib/bbab099.
  • 22. Murphy K.M., Levis M., Hafez M.J., Geiger T., Cooper L.C., Smith B.D., Small D., Berg K.D. Detection of FLT3 internal tandem duplication and D835 mutations by a multiplex polymerase chain reaction and capillary electrophoresis assay. J Mol Diagn. 2003;5:96–102. doi: 10.1016/S1525-1578(10)60458-8.
  • 23. Young A.L., Wong T.N., Hughes A.E.O., Heath S.E., Ley T.J., Link D.C., Druley T.E. Quantifying ultra-rare pre-leukemic clones via targeted error-corrected sequencing. Leukemia. 2015;29:1608–1611. doi: 10.1038/leu.2015.17.
  • 24. Bibault J.-E., Figeac M., Hélevaut N., Rodriguez C., Quief S., Sebda S., Renneville A., Nibourel O., Rousselot P., Gruson B., Dombret H., Castaigne S., Preudhomme C. Next-generation sequencing of FLT3 internal tandem duplications for minimal residual disease monitoring in acute myeloid leukemia. Oncotarget. 2015;6:22812–22821. doi: 10.18632/oncotarget.4333.
  • 25. Thol F., Kölking B., Damm F., Reinhardt K., Klusmann J.H., Reinhardt D., von Neuhoff N., Brugman M.H., Schlegelberger B., Suerbaum S., Krauter J., Ganser A., Heuser M. Next-generation sequencing for minimal residual disease monitoring in acute myeloid leukemia patients with FLT3-ITD or NPM1 mutations. Genes Chromosomes Cancer. 2012;51:689–695. doi: 10.1002/gcc.21955.
  • 26. Panagopoulos I., Gorunova L., Bjerkehagen B., Heim S. The “grep” command but not FusionMap, FusionFinder or ChimeraScan captures the CIC-DUX4 fusion gene from whole transcriptome sequencing data on a small round cell tumor with t(4;19)(q35;q13). PLoS One. 2014;9:10–15. doi: 10.1371/journal.pone.0099439.
  • 27. Panagopoulos I., Gorunova L., Bjerkehagen B., Heim S. Novel KAT6B-KANSL1 fusion gene identified by RNA sequencing in retroperitoneal leiomyoma with t(10;17)(q22;q21). PLoS One. 2015;10:e0117010. doi: 10.1371/journal.pone.0117010.
  • 28. Bujakowska K.M., White J., Place E., Consugar M., Comander J. Efficient in silico identification of a common insertion in the MAK gene which causes retinitis pigmentosa. PLoS One. 2015;10:e0142614. doi: 10.1371/journal.pone.0142614.
  • 29. Tyburczy M.E., Dies K.A., Glass J., Camposano S., Chekaluk Y., Thorner A.R., Lin L., Krueger D., Franz D.N., Thiele E.A., Sahin M., Kwiatkowski D.J. Mosaic and intronic mutations in TSC1/TSC2 explain the majority of TSC patients with no mutation identified by conventional testing. PLoS Genet. 2015;11:e1005637. doi: 10.1371/journal.pgen.1005637.

Articles from The Journal of Molecular Diagnostics : JMD are provided here courtesy of American Society for Investigative Pathology
