To the Editor,
Submicroscopic levels of leukemic cells that persist after treatment are commonly designated as measurable residual disease (MRD). The last decade has witnessed a growing body of evidence proving the prognostic significance of MRD for both progression‐free and overall survival in chronic lymphocytic leukemia (CLL) [1, 2]. Moreover, MRD detection is now increasingly used to tailor treatment in accordance with the needs of the individual patient [3]. Currently accepted flow cytometry assays reach a detection limit of 10⁻⁴, but logically, MRD detection with higher sensitivity (e.g., 10⁻⁵) holds promise for further improved prediction.
The European Research Initiative on CLL (ERIC) has successfully developed a standardized 4‐color MRD flow assay featuring a fixed combination of markers, gates, and instructions for the application of gates with a sensitivity of 10⁻⁴ [4]. The more recent ERIC 8‐color MRD flow tube reportedly achieves a sensitivity of 10⁻⁵ [5], but lacks a precise description of the analysis strategy. Therefore, we assessed the reproducibility of the current benchmark ERIC 8‐color CLL MRD method (Figure 1A, Figure S1A, Table S1, see also supplemental materials and methods). A total of 99 samples from our dilution experiments were acquired and fully blinded. MRD levels were reported by four recognized experts with long‐standing experience in CLL MRD flow (including multicentric international trials performed at national MRD reference laboratories). MRD levels down to an expected MRD level of 10⁻⁴ were reproducibly and accurately reported by the experts (average agreement to expected: 92%). However, MRD levels between 10⁻⁴ and 10⁻⁵ from the dilution series were scored as expected in only 74% of cases. Importantly, 23% of normal donor samples were considered MRD positive, albeit usually at very low levels (mean reported level: 5.3 × 10⁻⁵; range: 7.3 × 10⁻⁶ to 1 × 10⁻⁴). Furthermore, the data suggested personal biases of individual experts (compare Figure 1A, left and right panels). Despite the described variability, we acknowledge that the accuracy of the ERIC 8‐color CLL MRD method at levels below 10⁻⁴ might be better than reported herein if the individual pre‐therapeutic immunophenotype is known. Conversely, we hypothesized that reproducible MRD assessments might be demanding even at levels above 10⁻⁴ for operators with less experience.
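The kind of scoring behind such agreement figures can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a sample counts as MRD positive when its level is at or above a chosen threshold (10⁻⁴ here) and that agreement is the fraction of samples whose reported and expected calls match. The function names and the example values are hypothetical and are not taken from the study data.

```python
from collections import Counter


def classify(reported, expected, threshold=1e-4):
    """Score one sample as 'TP', 'FP', 'FN' or 'TN'.

    Assumption (not from the source): a level >= threshold is MRD positive,
    and the expected level of the dilution series is the reference.
    """
    reported_pos = reported >= threshold
    expected_pos = expected >= threshold
    if reported_pos and expected_pos:
        return "TP"
    if reported_pos:
        return "FP"
    if expected_pos:
        return "FN"
    return "TN"


def agreement(pairs, threshold=1e-4):
    """Return the category counts and the fraction of concordant calls."""
    counts = Counter(classify(r, e, threshold) for r, e in pairs)
    total = sum(counts.values())
    return counts, (counts["TP"] + counts["TN"]) / total


# Hypothetical (reported, expected) pairs; an expected level of 0.0
# stands for a normal donor sample.
samples = [
    (1.2e-3, 1e-3),  # concordant positive
    (9e-5, 1e-4),    # reported just below the threshold
    (5.3e-5, 0.0),   # low-level call on a normal donor, below threshold
    (2e-4, 0.0),     # false positive on a normal donor
    (0.0, 1e-5),     # concordant negative at the 1e-4 threshold
]
counts, fraction = agreement(samples)
print(dict(counts), fraction)
```

Lowering `threshold` to `1e-5` in this sketch turns the low-level call on the normal donor into a false positive, which mirrors why accuracy is harder to maintain below 10⁻⁴.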
FIGURE 1.
Blinded ERIC 8‐color panel analysis and evaluation of the final EuroFlow 8‐color panel in combination with the expert‐independent analysis algorithm. (A) Dilution series data were fully blinded and manually analyzed by four experts. The reported MRD levels vs. the expected CLL levels from the dilution are shown here for two of the four aforementioned experts. The results for the two other experts are shown in Figure S1A. Each dot represents one sample. Red dots indicate samples with discordant results between the four experts. Blue lines reflect the linear regression line for positive MRD values. Spearman correlation coefficient (R) is given. Black numbers provide percentages of false positive, true positive, false negative, and true negative scores of the individual experts using the expected values of the dilution series as reference. Red numbers indicate how often the other three experts disagreed with a given expert on the groups of false positive, true positive, false negative, and true negative samples. The threshold according to the limit of detection (LOD) is symbolized by the full line (not to scale). The dotted lines indicate the MRD levels of 10⁻⁴ and 10⁻⁵. uMRD, undetectable measurable residual disease with respect to LOD. (B) Composition of the final EuroFlow 8‐color panel. (C) The final EuroFlow 8‐color panel version 6 was tested in 23 dilution series (dilution steps: 10⁻¹, 10⁻³, 10⁻⁴, 10⁻⁵, and 2 × 10⁻⁶), of which 20 were performed in parallel with the ERIC 8‐color reference approach and 18 in parallel with NGS‐based MRD detection. Our analysis algorithm was as follows: (1) Clustering was performed and normal B‐cell clusters were assigned to defined normal B‐cell subpopulations by AG&I in the 10‐dimensional space (illustrated here for the dimensions CD27 and CD38). The underlying database was constructed from 14 normal peripheral blood (PB) samples.
Clusters that were not recognized as resembling any normal B‐cell subpopulation were considered putative abnormal (CLL) B cells. (2) 2D‐Robust Curve (2D‐RC) or canonical correlation analysis (CCA), trained either with a selection of typical CLL cases (generic) or with the respective individual CLL phenotype (individual) of the CLL case, was used to definitively classify the putative abnormal B‐cell clusters as CLL. The example shows a 2D‐RC that was trained to separate CLL‐like (CD5+CD27+) normal B cells from the generic CLL phenotype. The 1.5SD outline of the CLL cases is used as classifier. The final MRD result is the mean of the MRD levels of both tubes. The result is MRD negative (uMRD) if at least one tube is below the LOD (20 events). (D) The identified MRD levels by 2D‐RC or CCA, trained either with a selection of typical CLL cases (generic) or with the respective individual CLL phenotype of the CLL case (individual), are shown vs. the expected MRD levels. The dotted lines indicate the MRD levels of 10⁻⁴ and 10⁻⁵. Blue lines reflect the linear regression lines for positive MRD values. Spearman correlation coefficients (R) are given. Numbers in the upper‐left (discrepant), lower‐left (concordant), and lower‐right (discrepant) quadrants indicate the percentage of MRD‐negative samples when assessed irrespective of threshold according to LOD (20 events per tube). The threshold according to LOD is symbolized by the full line (not to scale). The percentage of concordant MRD‐positive samples is shown in the upper‐right quadrant. (E) Correlations between MRD levels obtained with the EuroFlow 8‐color approach vs. the ERIC 8‐color flow approach are shown for real MRD samples with matching full‐blown and follow‐up samples (n = 13). Blue lines reflect the linear regression lines for positive MRD values. Spearman correlation coefficients (R) are provided. The red circle represents the sample with a phenotype shift from diagnosis to follow‐up.
That sample can be well quantified using our analytical strategy with 2D‐RC trained with the generic CLL phenotype (upper panel) but not with an identical strategy when the individual diagnostic phenotype of the patient was used to train the 2D‐RC plot (lower panel). The threshold according to LOD is symbolized by the full line (not to scale). The dotted lines indicate the MRD levels of 10⁻⁴ and 10⁻⁵.
For broad applicability outside of specialized, expert‐led centers, refined panels and fully standardized analysis strategies would be desirable for reproducible operator‐independent MRD detection at the 10⁻⁴ threshold or ideally even below. Therefore, the EuroFlow consortium developed an optimized 8‐color CLL MRD panel in six consecutive design‐validate‐redesign rounds, using false‐positivity rates as the read‐out for objective performance evaluation (Figure S1B,C, Tables S2 and S3). Our final EuroFlow 8‐color CLL MRD panel is shown in Figure 1B. In summary, the newly developed panel was more specific than the initial panel versions and at the same time more robust in the presence of state‐of‐the‐art therapies (Figures S2 and S3). The panel can be run on standard 8‐color flow cytometers and is suited to bulk lysis‐based sample preparation methods (Figure S4), which is a prerequisite for staining 10 million cells. These features make the new method a broadly available and cost‐effective tool for sensitive MRD assessments in CLL. An in‐depth description of the panel design steps can be found in the Supporting Information.
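Why large cell numbers matter can be made concrete with a short, hedged calculation in Python. It assumes that the lowest reportable level is simply the minimum number of CLL events (20, as given in the caption of Figure 1) divided by the number of cells evaluated, and that every stained cell is acquired and analyzable, which will not hold exactly in practice. The function names are hypothetical.

```python
import math


def detection_limit(min_events, evaluated_cells):
    """Lowest reportable level: minimum event count / evaluated cells."""
    if evaluated_cells <= 0:
        raise ValueError("evaluated_cells must be positive")
    return min_events / evaluated_cells


def cells_needed(min_events, one_in):
    """Cells required to see `min_events` at a level of 1 in `one_in` cells.

    Integer arithmetic avoids floating-point rounding for values like 1e-5.
    """
    return min_events * one_in


# 20 events among 10 million evaluated cells
limit = detection_limit(20, 10_000_000)

# Cells required for 20 events at levels of 1 in 10,000 and 1 in 100,000
needed_1e4 = cells_needed(20, 10_000)
needed_1e5 = cells_needed(20, 100_000)

print(limit, needed_1e4, needed_1e5)
assert math.isclose(limit, 2e-6)
```

Under these assumptions, a 10⁻⁵ level requires ten times as many evaluated cells as a 10⁻⁴ level (2 million vs. 200,000), and 10 million cells leave headroom for cell loss during preparation and acquisition.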
This novel panel was the basis for the development of an operator‐independent analytical strategy to obviate the inter‐operator variability observed for the ERIC 8‐color MRD flow approach. Benefiting from EuroFlow experience in multiple myeloma MRD flow [6], we decided to integrate a clustering approach (to generate clusters of cellular events that resemble each other in the 10‐dimensional immunophenotypic space) and dedicated databases (one for each tube of the panel) for automated gating and identification (AG&I) of all normal B‐cell populations. B‐cell clusters that did not match any normal B‐cell population were regarded as putative CLL cells (so‐called “different from normal” approach). The optimized AG&I approach on its own proved sensitive enough to detect MRD (i.e., it is a good screening method), but lacked sufficient specificity (Figure S5). Therefore, we introduced a second step to automatically categorize the clusters that AG&I had flagged as putative MRD events. This additional analytical step utilized the CLL leukemia‐associated immunophenotype (LAIP) to increase the specificity of cluster assignment. We derived the LAIP either from a collection of typical CLL cases (generic phenotype) or from the individual CLL immunophenotype of a particular patient. We evaluated two methods of dimension reduction of the 10‐dimensional CLL immunophenotype: canonical correlation analysis (CCA) and a two‐dimensional representation of robust Mahalanobis distance (2D‐RC). We conclude from the single tube analyses that the information obtained by either of the newly developed MRD tubes of the two‐tube panel is sufficient to construct an algorithm that allows for fully automated MRD diagnosis with a limit of detection of 10⁻⁴ and an acceptable correlation to expected (R = 0.95–0.97, Figure S5). A priori knowledge of the initial immunophenotype improved the accuracy of the automated analyses (correlation to expected: R = 0.99).
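The idea of accepting a putative cluster as CLL only when it lies close to a reference phenotype can be sketched in a few lines of Python. This is not the 2D‐RC method itself: it uses the ordinary (non‐robust) Mahalanobis distance with a sample mean and covariance, works directly on two hypothetical marker dimensions rather than on a reduction of the 10‐dimensional space, and borrows the 1.5 cutoff only by analogy with the 1.5SD outline mentioned in the caption of Figure 1. All names and numbers are illustrative assumptions.

```python
import math


def mean_cov_2d(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2D reference points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two reference points")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of `point` from a 2D reference distribution."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is not positive definite")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form with the closed-form inverse of the 2x2 covariance
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


def is_cll_like(centroid, mean, cov, cutoff=1.5):
    """Accept a cluster centroid if it lies within `cutoff` of the reference."""
    return mahalanobis_2d(centroid, mean, cov) <= cutoff


# Hypothetical reference phenotype in two arbitrary marker dimensions
reference = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1)]
mean, cov = mean_cov_2d(reference)

print(is_cll_like((1, 2), mean, cov))  # centroid near the reference
print(is_cll_like((4, 5), mean, cov))  # centroid far from the reference
```

Training the reference on a collection of typical cases corresponds to the generic variant, whereas training it on a single patient's diagnostic sample corresponds to the individual variant; a robust estimator of location and scatter would replace `mean_cov_2d` in a closer approximation.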
To fully utilize the information from the whole panel, we next combined the information from both tubes. Following approaches initially developed by the ERIC group [4], the final MRD level was calculated as the mean MRD level of the two tubes of the panel if at least 20 CLL events were identified in each of the two tubes; otherwise, the sample was classified as MRD negative (Figure 1C). We observed a high degree of correlation between identified and expected MRD levels when we quantitatively analyzed our results without considering specific MRD level thresholds (Figure 1D). The correlation with expected levels was higher for analyses that employed the particular individual CLL phenotype than for the approach that used a collection of CLL cases as reference (generic immunophenotype).
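The two‐tube rule described above can be written down as a minimal Python sketch. It assumes that the MRD level of a tube is the number of CLL events divided by the total number of cells evaluated in that tube; the function name, the tuple layout, and the example counts are hypothetical.

```python
import math


def combine_tubes(tube_a, tube_b, min_events=20):
    """Combine two tubes, each given as (cll_events, total_cells).

    Returns the mean MRD level of both tubes, or None (MRD negative, uMRD)
    if either tube has fewer than `min_events` CLL events.
    """
    for events, total in (tube_a, tube_b):
        if total <= 0 or events < 0:
            raise ValueError("invalid event or cell count")
    if tube_a[0] < min_events or tube_b[0] < min_events:
        return None
    levels = [events / total for events, total in (tube_a, tube_b)]
    return sum(levels) / len(levels)


# Both tubes reach 20 events: the result is the mean of 3e-5 and 5e-5
positive = combine_tubes((150, 5_000_000), (250, 5_000_000))

# One tube stays below 20 events: the sample is reported as MRD negative
negative = combine_tubes((150, 5_000_000), (12, 5_000_000))

print(positive, negative)
assert math.isclose(positive, 4e-5)
```

Requiring both tubes to reach the event threshold trades some sensitivity for specificity, since a spurious cluster in a single tube cannot produce a positive result on its own.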
Considering the official International Workshop on CLL (iwCLL) threshold of 10⁻⁴ for a positive MRD result, all automated approaches that incorporated the individual immunophenotype yielded sufficient agreement between identified and expected MRD (EuroFlow with 2D‐RC: 100%; EuroFlow with CCA: 99%; ERIC with 2D‐RC: 96%; Table S1). The EuroFlow 8‐color panel combined with cluster‐based individual analysis strategies showed a significantly better agreement to expected than the average manual result of the four experts who evaluated the ERIC 8‐color panel (2D‐RC: p = 0.0015; CCA: p = 0.01). An automated, 2D‐RC‐based analysis of the dilution series stained with the ERIC panel also improved on the average expert‐driven manual analysis of the same samples, but was inferior to the novel EuroFlow panel (p = 0.047). Thus, both the novel analysis strategy and the novel panel could improve accuracy at the 10⁻⁴ threshold.
With 97% agreement to expected, the generic analysis strategies developed for the EuroFlow 8‐color panel performed numerically better than the average expert rates based on the ERIC 8‐color panel (92%; difference not significant). A fully automated analysis could therefore replace an expert‐driven manual analysis with an MRD threshold of 10⁻⁴ even when the initial immunophenotype of the particular patient is not known.
We subsequently compared EuroFlow and ERIC panels for samples with expected MRD levels between 10⁻⁴ and 10⁻⁵ when an automated 2D‐RC driven analysis trained with the individual immunophenotypes was applied. Our results showed a significantly higher concordance for the EuroFlow panel (94%) as compared with the ERIC panel (70%, p = 0.001, Table S1), thus again indicating that the EuroFlow panel provides more information to distinguish CLL from benign B cells. While this investigation shows an improvement in the overall performance of the novel EuroFlow panel, automatic real‐life MRD assessments at a 10⁻⁵ sensitivity threshold would require knowledge of the initial phenotype of the specific patient.
We additionally demonstrated good correlations between the results obtained from our automated approach using the EuroFlow 8‐color panel and parallel assessments using the ERIC 8‐color panel and a novel NGS‐based MRD method (Figure S6).
Finally, we evaluated our approach in real MRD samples. Compared to the expert‐based manual analysis of the ERIC 8‐color tube, we found a strong correlation to our automated analysis based on the generic CLL immunophenotype (Figure 1E, upper diagram). When the initial individual immunophenotypes of the same patients were utilized to classify clusters from follow‐up samples after AG&I as CLL vs. benign, we observed a poorer correlation (Figure 1E, lower diagram) due to a single sample from a patient with TP53 mutation who was treated for 4 years with ibrutinib. This patient showed a significant immunophenotypic shift in the follow‐up sample (Figure S7) that precluded the identification of CLL cells using the automated algorithm trained with the initial patient‐specific immunophenotype.
We conclude that our novel MRD panel contains enough information to assess MRD in CLL down to the level of 10⁻⁵ if the initial CLL phenotype is known and as long as immunophenotypic shifts are unlikely. However, since immunophenotypic shifts that might affect our algorithm occur at an as yet unknown frequency, caution is warranted when the individual phenotype variant of the algorithm is applied. In contrast, the generic approach proved robust against immunophenotypic shifts and allows expert‐independent automatic MRD flow with the current iwCLL threshold of 10⁻⁴.
Author Contributions
R.E. centrally analyzed the raw flow data, performed the data analysis, contributed to the establishment of the operator‐independent algorithms, drafted the manuscript, and approved the final version of the manuscript. J.F.M. contributed to the panel design, acquired flow cytometry data, revised the manuscript, and approved the final version of the manuscript. J.S.V. performed the NGS‐based MRD analyses, acquired flow cytometry data, revised the manuscript, and approved the final version of the manuscript. M.R. acquired flow cytometry data, contributed to interpretation of the data, revised the manuscript, and approved the final version of the manuscript. P.J.H. performed the NGS‐based MRD analyses, contributed to interpretation of the data, revised the manuscript, and approved the final version of the manuscript. S.K. acquired flow cytometry data, contributed to interpretation of the data, revised the manuscript, and approved the final version of the manuscript. G.G. established the AG&I database for the final EuroFlow 8‐color CLL‐MRD panel, revised the manuscript, and approved the final version of the manuscript. R.F.R. contributed to the establishment of the operator‐independent algorithms, revised the manuscript, and approved the final version of the manuscript. Q.L. contributed to the establishment of the operator‐independent algorithms, revised the manuscript, and approved the final version of the manuscript. J.P. acquired flow cytometry data, revised the manuscript, and approved the final version of the manuscript. N.V. acquired flow cytometry data, revised the manuscript, and approved the final version of the manuscript. P.F. acquired flow cytometry data, revised the manuscript, and approved the final version of the manuscript. L.B. acquired flow cytometry data, revised the manuscript, and approved the final version of the manuscript. J.J.M.v.D. contributed to the design of the study and panels as well as to the interpretation of the data, revised the manuscript, and approved the final version of the manuscript. A.O. contributed to the design of the study and panels as well as to the interpretation of the data, revised the manuscript, and approved the final version of the manuscript. A.W.L. contributed to the panel design as well as to the interpretation of the data, revised the manuscript, and approved the final version of the manuscript. S.B. contributed to the design of the study and panels as well as to the interpretation of the data, drafted and revised the manuscript, and approved the final version of the manuscript.
Conflicts of Interest
Sebastian Böttcher: Research funding: Roche, Genentech, AbbVie, Celgene, Becton Dickinson, and Janssen‐Cilag; Honoraria: Roche, AbbVie, Novartis, Becton Dickinson, Janssen, Astra‐Zeneca, and Sanofi; Travel support: Janssen and BeiGene. Jacques J. M. van Dongen and Alberto Orfao: Scientific advisory agreement and educational services agreement with BD Biosciences, San José, CA, USA (fees for USAL‐CIC, Salamanca). Anton W. Langerak: Research Support from Roche‐Genentech, Gilead, and Janssen; speaker fee from Janssen and Gilead. The IGHV leader NGS MRD assay was applied with financial support from the EuroClonality consortium. Georgiana Grigore and Rafael Fluxa Rodriguez are employees of Becton Dickinson and were formerly employed by Cytognos SL, Salamanca, Spain. Matthias Ritgen: Advisory boards, honoraria, and travel support by Janssen, AbbVie, Roche, BeiGene, and AstraZeneca. Sebastian Böttcher, Robby Engelmann, Juan Flores‐Montero, and Alberto Orfao each report being one of the inventors on the EuroFlow‐owned patent P135960EP00 (Methods, reagents and kits for detecting minimal/measurable disease in chronic lymphocytic leukemia [CLL]) filed on October 12, 2023. The Infinicyt software is based on intellectual property (IP) of some EuroFlow laboratories (University of Salamanca, Spain) and the scientific input of other EuroFlow members. Potential royalties from the patent P135960EP00 will be paid to the EuroFlow Consortium. These royalties will be exclusively used for continuation of the EuroFlow collaboration and sustainability of the EuroFlow consortium. The other authors declare no conflicts of interest.
Supporting information
Data S1. Supporting Information.
Acknowledgments
We thank Patrick Brennan for proofreading the English language.
Alberto Orfao, Anton W. Langerak, and Sebastian Böttcher contributed equally to this study.
Contributor Information
Jacques J. M. van Dongen, Email: j.j.m.vandongen@eslho.org.
Sebastian Böttcher, Email: sebastian.boettcher@med.uni-rostock.de.
Data Availability Statement
All primary data and analysis R scripts are provided upon reasonable request to the corresponding authors.
References
- 1. Böttcher S., Ritgen M., Fischer K., et al., “Minimal Residual Disease Quantification Is an Independent Predictor of Progression‐Free and Overall Survival in Chronic Lymphocytic Leukemia: A Multivariate Analysis From the Randomized GCLLSG CLL8 Trial,” Journal of Clinical Oncology 30 (2012): 980–988.
- 2. “Hematologic Malignancies: Regulatory Considerations for Use of Minimal Residual Disease in Development of Drug and Biological Products for Treatment—Guidance for Industry,” accessed February 7, 2023, https://www.fda.gov/media/134605/download.
- 3. Munir T., Cairns D. A., Bloor A., et al., “Chronic Lymphocytic Leukemia Therapy Guided by Measurable Residual Disease,” New England Journal of Medicine 390 (2024): 326–337.
- 4. Rawstron A. C., Villamor N., Ritgen M., et al., “International Standardized Approach for Flow Cytometric Residual Disease Monitoring in Chronic Lymphocytic Leukaemia,” Leukemia 21 (2007): 956–964.
- 5. Rawstron A. C., Fazi C., Agathangelidis A., et al., “A Complementary Role of Multiparameter Flow Cytometry and High‐Throughput Sequencing for Minimal Residual Disease Detection in Chronic Lymphocytic Leukemia: An European Research Initiative on CLL Study,” Leukemia 30 (2016): 929–936.
- 6. Flores‐Montero J., Sanoja‐Flores L., Paiva B., et al., “Next Generation Flow for Highly Sensitive and Standardized Detection of Minimal Residual Disease in Multiple Myeloma,” Leukemia 31 (2017): 2094–2103.