Abstract
The cellular composition of the chronic myeloid leukemia (CML) bone marrow (BM) beyond granulocyte enrichment remains poorly understood. We analyzed 1548 routinely stained BM aspirate slides from 598 patients across seven sites using deep learning‐based image analysis to identify cytomorphological markers predictive of major molecular response. Erythroid precursor enrichment, monocyte nuclear lobulation, and low peripheral leukocyte count were associated with improved tyrosine kinase inhibitor (TKI) response. These features were validated both visually and computationally in two independent cohorts. We developed a Morphoclinical model integrating these image‐derived and clinical variables, outperforming (area under the receiver‐operating curve [AUROC] 0.76) the clinically used EUTOS long‐term survival score (AUROC 0.53) and BCR::ABL1 halving time (AUROC 0.61). Notably, poor‐risk patients treated with second‐generation TKIs achieved outcomes similar to favorable‐risk patients on imatinib. These results underline the overlooked prognostic value of BM cytomorphology to refine risk stratification and support more personalized frontline therapy in CML.
INTRODUCTION
Chronic myeloid leukemia (CML) is a hematological malignancy marked by the presence of the Philadelphia chromosome, the product of the t(9;22)(q34.1;q11.2) translocation that generates the BCR::ABL1 fusion gene. 1 Tyrosine kinase inhibitors (TKIs) targeting the BCR::ABL1 kinase have revolutionized the treatment of CML, transforming it from a fatal disease into a manageable chronic condition for most patients. 2 Currently, several TKIs are approved for first‐line use, including imatinib, dasatinib, nilotinib, and bosutinib. 3 Among these, the first‐generation TKI imatinib remains a commonly used cost‐effective option for initial therapy, particularly in patients with chronic phase (CP) CML. 3 However, second‐generation TKIs (2GTKIs) such as dasatinib, nilotinib, and bosutinib induce deep molecular responses faster and more often than imatinib. 4, 5, 6 While overall survival is comparable between first‐ and second‐generation TKIs, 5, 7 the ability of 2GTKIs to achieve deeper molecular responses earlier may support later achievement of treatment‐free remission (TFR). 4, 8
First‐line treatment selection for CML is primarily guided by the patient's prognostic risk at diagnosis, in particular the European Treatment and Outcome Study long‐term survival (ELTS) score. 5, 6 These risk assessments guide the selection of the initial TKI to optimize long‐term outcomes. In turn, decisions regarding continuation or modification of therapy are guided by treatment response. Achieving a deep molecular response early is a strong predictor of TFR, 4 a contemporary therapeutic goal in CML. Such a response increases the likelihood of successful TFR, 4, 8, 9, 10 which reduces the need for lifelong therapy and improves patients' quality of life. 11
Although the ELTS score identifies patients at higher risk of progression, 70% of CML‐related deaths are among low‐ to intermediate‐risk patients, highlighting the need to identify more accurate treatment biomarkers. 12 Despite being an inexpensive and routinely assessed resource in CML diagnosis, the bone marrow (BM) cell composition, maturation, and morphology have not been fully explored in conjunction with treatment response. Recently, we demonstrated in a multisite cohort that higher abundance of neutrophils and elliptical nuclear morphology of maturing granulocytes measured at diagnosis are associated with TKI discontinuation success. 13 Moreover, pretreatment BM biopsy fibrosis, adipocyte distribution, and megakaryocyte clustering have been suggested to predict TKI response. 14 While biopsies can better conserve the spatial structure of the BM, these have become less essential in the diagnosis of CML. Aspirates are less invasive, more patient‐friendly, and typically sufficient when combined with molecular testing, leading many clinical sites to prioritize them over biopsy samples.
Recent advances in computational pathology might transform the diagnosis of hematological malignancies. 15 Given that the ELTS score has been designed to predict prognosis and not TKI sensitivity, we hypothesize that the BM cytomorphological fingerprint could support the ELTS score in the selection of 2GTKI and improve the rate of deep molecular responses and the likelihood of achieving TFR. Here, we analyzed 1548 May‐Grünwald Giemsa (MGG) stained BM aspirate slides from seven different sites. We used deep learning‐based image analysis to study the clinical significance of cytomorphology in CML patients. Additionally, we developed a model to predict TKI sensitivity and evaluated its performance against existing scores.
METHODS
Patients
We included a total of 598 unique patients from seven clinical sites: Helsinki University Hospital (HUS; Helsinki, Finland, n = 399), Royal Adelaide Hospital (Adelaide, Australia, n = 143), St. Olavs Hospital (Trondheim, Norway, n = 9), Örebro University Hospital (Örebro, Sweden, n = 6), Uppsala University Hospital (Uppsala, Sweden, n = 8), Ome Medical Center (Tokyo, Japan, n = 12), and Saga University Hospital (Saga, Japan, n = 11; Figure 1A). The eligibility criteria at diagnosis were age over 18 years, major BCR::ABL1 transcript, and frontline treatment with a TKI. The dataset was composed of three cohorts based on disease phase and slide availability (Figure 1A and Table 1):
1. To study BM cell composition across CML patients and multiple time points, we included MGG‐stained BM aspirate slides at diagnosis and follow‐up of both blast‐phase and CP‐CML patients (“Full cohort,” Figure 1A).
2. To build, validate, and evaluate a multiparameter model, we included MGG‐stained BM aspirate slides at diagnosis and with available MMR data (“Modeling cohort,” Figure 1A).
3. In contrast to the other cohorts, we collected a separate “Cell differential cohort” composed of numerical cell differentials from samples assessed by morphologists in Adelaide, Australia (Figure 1A). The Cell differential cohort is therefore used to visually validate the biological findings identified by computational methods.
Figure 1.

Study overview and biomarker discovery. (A) Study design and datasets. The map indicates patient numbers per center included in the Modeling cohort. (B) Uniform manifold approximation and projection (UMAP) of imaging features of all CML samples labeled by disease stage, peripheral blood (PB) white blood cell (WBC) count, and bone marrow (BM) blast proportion. (C) Visualization of clinical and imaging variables based on their statistical association (Cox regression) in the training dataset with major molecular response (MMR) at 2 years. The 10 variables with the lowest P‐values are labeled. (D) Kaplan–Meier plots (log‐rank test) for cumulative MMR in the training and internal and external test sets stratified by PB WBC and (E) BM proerythroblast proportion. (F) Comparison of the time to MMR in the Cell differential cohort based on the PB WBC count and (G) BM erythroid precursor proportion. (H) Kaplan–Meier plots for cumulative MMR in the training and internal and external test sets stratified by the nuclei perimeter of BM monocytes. (I) Representative examples of monocytes with increasing nuclei perimeter size. Values under cell images reflect the nuclei perimeter size (μm). BP, blast phase; CP, chronic phase; Dg, diagnosis; ELTS, European Treatment and Outcome Study long‐term survival; Fu, follow‐up; HR, hazard ratio; ns, not significant.
Table 1.
Patient characteristics.
| Variable | N | Modeling cohort, N = 238ᵃ | Cell differential cohort, N = 148ᵃ | P‐valueᵇ |
|---|---|---|---|---|
| Gender | 386 | | | 0.2 |
| Female | | 101/238 (42%) | 73/148 (49%) | |
| Male | | 137/238 (58%) | 75/148 (51%) | |
| Age at diagnosis | 386 | 53.00 [43.00–64.91] | 53.79 [44.29–63.19] | 0.7 |
| First TKI | 385 | | | 0.085 |
| Bosutinib/ponatinib | | 6/238 (2.5%) | 0/147 (0%) | |
| Dasatinib | | 34/238 (14%) | 17/147 (12%) | |
| Imatinib | | 165/238 (69%) | 100/147 (68%) | |
| Nilotinib | | 33/238 (14%) | 30/147 (20%) | |
| (Missing) | | 0 | 1 | |
| MMR | 386 | 222/238 (93%) | 148/148 (100%) | 0.003 |
| MMR time | 383 | 0.67 [0.42–1.26] | 0.50 [0.33–0.98] | 0.009 |
| (Missing) | | 0 | 3 | |
| Response at 24 months | 383 | 200/238 (84%) | 133/145 (92%) | 0.044 |
| (Missing) | | 0 | 3 | |
| MR4.0 | 385 | 207/238 (87%) | 147/147 (100%) | <0.001 |
| (Missing) | | 0 | 1 | |
| MR4.0 time | 380 | 1.58 [0.82–3.04] | 1.36 [0.64–3.10] | 0.2 |
| (Missing) | | 1 | 5 | |
| PB WBC at diagnosis (×10⁹/L) | 369 | 64.05 [28.05–133.69] | 56.80 [27.90–122.50] | 0.4 |
| (Missing) | | 16 | 1 | |

Abbreviations: MMR, major molecular response; PB, peripheral blood; TKI, tyrosine kinase inhibitor; WBC, white blood cell.
ᵃ n/N (%); median [25%–75%].
ᵇ Pearson's chi‐squared test; Wilcoxon rank sum test.
In the Modeling and Cell differential cohorts, we included only newly diagnosed CP‐CML patients. A study/data permit was approved by institutional research boards at each clinical site before the study started. The study adhered to the Declaration of Helsinki.
We also included 636 age‐ and sex‐matched control individuals without active hematologic disease, for whom MGG‐stained BM aspirate slides were available. These controls comprised stem cell donors, patients with normal BM cytomorphology who were evaluated for persistent blood count abnormalities but remained disease‐free for ≥12 months of follow‐up, and individuals in deep molecular remission of a prior hematologic condition.
Data collection
Investigators or associated biobanks collected data from their respective clinical sites. We considered the following information:
1. Patient demographics at diagnosis: patient age and sex.
2. Laboratory values: peripheral blood (PB) white blood cell (WBC) and platelet counts; PB eosinophil, basophil, and lymphocyte counts (×10⁹/L) and their proportions (%); BM and PB blast proportions (%).
3. The ELTS risk score and its components.
4. TKI treatment: first‐line TKI and TKI generation.
5. TKI response: complete cytogenetic response (CCyR), time to CCyR, MMR, time to MMR, MR4.0, time to MR4.0, and BCR::ABL1 halving time.
Slide digitization
We digitized the MGG‐stained slides at two magnifications at HUS with West Medica's HemaVision Ultimate, an imaging system designed for scanning cytomorphological slides. First, we acquired whole‐slide images at 10× magnification. Then, we manually annotated representative regions of interest to be scanned at 100× magnification with oil immersion. The cohorts included both squash and wedge‐type slides representing technical variations observed in clinical laboratories. In squash BM slides, we sought areas proximal to connective tissue islets avoiding regions characterized as technically suboptimal, overcrowded with cells, or concentrated with dead cells. In wedge BM slides, we sought similar areas from the distal section of the slide.
Image analysis
We analyzed images with the Cellbytes application, an image analysis‐based medical software forming a comprehensive overview of the BM cell composition and associated cytomorphological landscape. All image analysis algorithms are based on convolutional neural networks or transformer‐based networks (Table 2).
Table 2.
Description of Cellbytes' image analysis algorithms.
| Image level | Task |
|---|---|
| 10× | Sample segmentation |
| 10× | Center section segmentation |
| 10× | Segmentation of nucleated hematopoietic cells, red blood cells, and connective tissue |
| 10× | Segmentation of lipid droplets |
| 10× | Segmentation of megakaryocytes |
| 10× | Removal of out‐of‐focus areas |
| 100× | Cell detection |
| 100× | Cell classification into 17 distinct classes |
| 100× | Classification of dysplastic erythroblasts: 1. nuclear:cytoplasmic asynchrony, 2. dysmorphic nucleus, 3. cytoplasmic fraying, and 4. normal |
| 100× | Classification of dysplastic megakaryocytes: 1. hypolobular nuclei, 2. separated nuclei, and 3. normal |
| 100× | Classification of dysplastic promyelocytes: 1. Auer rods, 2. dysmorphic nuclei, and 3. hypergranular cytoplasm |
| 100× | Pyknotic cells |
| 100× | Classification of vacuolization in blasts, erythroid cells, granulocytes, lymphocytes, megakaryocytes, monocytes, and plasma cells |
| 100× | Quantification of granularity in granulocytes |
| 100× | Cell morphometry at the cellular, nuclear, and cytoplasmic levels (e.g., size, circularity, and color) aggregated at the sample level by mean and median |
For image analysis at 100× magnification, we pre‐trained a Vision Transformer (ViT) 16 model using the Masked Siamese Networks (MSN) 17 framework. The purpose was to increase model generalization by developing a base model specialized for the subtle patterns that occur in blood cells rather than for the visual patterns found in images of animals, houses, or other general categories. We used YOLO‐based object detection for cell segmentation. 18 In total, we used 27 million single‐cell images from multiple clinical sites and digital scanners, surpassing previous attempts to train a ViT model and resulting in a robust feature extractor for downstream tasks such as cell classification and semantic segmentation for cell morphometry (Table 2). 19
BM sampling, sample processing, and imaging can induce various technical artifacts. 20 To mitigate these, we trained multiple algorithms to minimize bias (Table 2). First, we excluded 100× images where over 40% of all cells were either labeled as artifacts or represented technically pyknotic cells. Then, we removed individual cells, which were classified as pyknotic or were located at the image border and therefore challenging to classify. After these steps, we could reliably calculate sample‐level cell distributions.
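The two filtering steps described above can be sketched as follows. This is a minimal illustration and not the Cellbytes implementation. The `Cell` record, its field names, the label strings, and the choice to drop artifact‐labeled objects together with pyknotic cells are assumptions made for the example. Only the 40% image‐level threshold and the removal of pyknotic and border cells come from the text.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical labels for objects that should not count as classifiable cells.
BAD_LABELS = {"artifact", "pyknotic"}


@dataclass
class Cell:
    label: str            # predicted class, e.g. "neutrophil" (assumed naming)
    touches_border: bool  # True if the cell lies on the image border


def filter_image(cells, max_bad_fraction=0.40):
    """Return the usable cells of one 100x image, or [] if the image is excluded."""
    if not cells:
        return []
    bad = sum(c.label in BAD_LABELS for c in cells)
    # Image-level exclusion: more than 40% artifacts or pyknotic cells.
    if bad / len(cells) > max_bad_fraction:
        return []
    # Cell-level exclusion: drop bad-labeled cells and cells on the border.
    return [c for c in cells if c.label not in BAD_LABELS and not c.touches_border]


def sample_proportions(images):
    """Aggregate the retained cells of all images into sample-level proportions."""
    counts = Counter(c.label for cells in images for c in filter_image(cells))
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}
```

Applying the image‐level rule before the cell‐level rule means that a heavily degraded image contributes nothing to the sample‐level distribution, even if a few of its cells look classifiable.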
Next, while most cells were classifiable, some showed abnormal morphology due to technical factors. We annotated around 1000 cells per type, trained classifiers, and excluded low‐quality cells with distortions such as mechanical compression. Finally, we calculated cell morphometry (e.g., size and shape of the cell, its nucleus, and its cytoplasm) using the OpenCV library (e.g., perimeter with arcLength) and aggregated the results at the sample level by their median value (Table 2). 21
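As an illustration of the morphometry step, the sketch below computes the perimeter of a closed contour and aggregates it per sample by the median. It is written in plain Python so that it runs without dependencies. For a closed polygonal contour it computes the same quantity that OpenCV returns from `cv2.arcLength(contour, True)`. The contour representation (a list of `(x, y)` pixel coordinates) and the `um_per_pixel` scale factor are assumptions for the example, not details reported in the study.

```python
import math
from statistics import median


def closed_perimeter(contour):
    """Length of a closed polygonal contour given as a list of (x, y) points."""
    n = len(contour)
    # Sum the segment lengths, including the segment that closes the loop.
    return sum(math.dist(contour[i], contour[(i + 1) % n]) for i in range(n))


def sample_median_perimeter(contours, um_per_pixel=1.0):
    """Median perimeter over all contours of one sample, scaled to micrometers.

    um_per_pixel is a hypothetical calibration factor of the scanner.
    """
    return median(closed_perimeter(c) * um_per_pixel for c in contours)
```

The median is less sensitive than the mean to a few badly segmented nuclei, which is one plausible reason to prefer it for sample‐level aggregation.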
Model development
We developed a multiparameter score in the Modeling cohort to predict TKI response. We split the dataset into a training set, an internal test set, and an external test set. The training and internal test sets included randomly assigned patients from clinical sites located in Finland, Japan, Norway, and Sweden, whereas the external test set only included patients from Australia. Patients from Australia, Japan, Norway, and Sweden had participated in TKI discontinuation trials and therefore included more patients achieving MMR within 24 months. In contrast, patients from Finland represented a real‐world cohort diagnosed between 2009 and 2023.
We calculated the time to MMR as the difference between the MMR date and the TKI start date. We explored conventional, penalized, and decision‐tree‐based Cox regression models. Given the large number of variables in relation to the number of patients, penalized and decision‐tree‐based models overfitted despite extensive parameter‐tuning strategies (data not shown). Therefore, we developed a multiparameter Cox regression model in two phases. First, we performed univariate Cox proportional hazards regression on randomly selected 50% subsets of the training dataset. This process was repeated 10 times, each time using a different random half of the training set. For each iteration, we recorded the features that were statistically significant (P < 0.05). Second, only features that were significant in at least 4 of 10 iterations were selected for inclusion in the subsequent multivariate Cox regression model, which was built using forward selection. We used the ELTS score as the base model given its importance in stratifying patients by prognosis. The optimal cutoff for the risk score was defined with the Youden index in the internal test set. Model evaluation was performed in both the internal and external test sets.
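The repeated‐subsampling screen and the Youden cutoff can be sketched as below. This is an illustration under stated assumptions and not the study code, which was written in R. The callback `p_value_fn(feature, subset)` is a hypothetical stand‐in for a univariate Cox regression fitted on the given patient indices. The Youden function assumes a binary endpoint (for example MMR achieved by 24 months) and that higher scores indicate the positive class.

```python
import random


def stable_features(n_patients, feature_names, p_value_fn,
                    n_iter=10, frac=0.5, alpha=0.05, min_hits=4, seed=0):
    """Keep features that are significant in at least min_hits of n_iter random subsets."""
    rng = random.Random(seed)
    hits = {f: 0 for f in feature_names}
    indices = list(range(n_patients))
    for _ in range(n_iter):
        # Draw a random half of the training set without replacement.
        subset = rng.sample(indices, int(n_patients * frac))
        for f in feature_names:
            if p_value_fn(f, subset) < alpha:
                hits[f] += 1
    return [f for f in feature_names if hits[f] >= min_hits]


def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A patient is called positive when score >= cutoff. labels: 1 = event, 0 = no event.
    Both classes must be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(s >= cut and y == 1 for s, y in zip(scores, labels))
        fp = sum(s >= cut and y == 0 for s, y in zip(scores, labels))
        j = tp / pos - fp / neg
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j
```

Requiring significance in several random halves, rather than once in the full training set, filters out features whose association depends on a small number of patients.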
Statistical analysis
To compare a continuous variable between two groups, we used the Wilcoxon rank‐sum test (unpaired, two‐tailed). To assess MMR, we used Kaplan–Meier visualization with the log‐rank test and Cox regression. We adjusted P‐values with the Benjamini–Hochberg method when relevant.
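For reference, the Benjamini–Hochberg adjustment can be written in a few lines. The sketch below is a plain Python illustration of the standard step‐up procedure and is intended to give the same adjusted values as R's `p.adjust(p, method = "BH")`. The function name is chosen for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, the raw values 0.005, 0.01, 0.03, and 0.04 become 0.02, 0.02, 0.04, and 0.04 after adjustment.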
We performed uniform manifold approximation and projection (UMAP) on cytomorphology‐based cell proportions to study how these separated samples. We used z‐normalized cell proportions and default UMAP configurations with cosine distance.
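The preprocessing behind this projection can be illustrated as follows. The sketch z‐normalizes each cell‐type proportion across samples and computes the cosine distance between two samples, which is the dissimilarity UMAP would then operate on. It assumes that normalization is done per feature (column) with the sample standard deviation, as R's `scale()` does. The text does not state this, so treat it as an assumption.

```python
import math
from statistics import mean, stdev


def z_normalize(rows):
    """Z-score each column of a samples-by-features table (needs >= 2 samples)."""
    cols = list(zip(*rows))
    stats = [(mean(c), stdev(c)) for c in cols]
    # A constant column carries no information and is mapped to 0.
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
            for row in rows]


def cosine_distance(u, v):
    """1 - cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm
```

Cosine distance compares the pattern of over‐ and under‐represented cell types between samples rather than the magnitude of the deviations, so two samples with the same relative profile are close even if one is more extreme.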
We evaluated model performance with multiple metrics. We defined the ELTS high‐risk class as the benchmark as it is widely used to guide TKI selection in newly diagnosed CML patients. We considered the BCR::ABL1 halving time threshold (≥76 days vs. <76 days) as an additional benchmark, as this cutoff has been established as a predictor of high‐risk CML patients based on their TKI response. 22 We compared Cox regression models by assessing MMR accumulation at multiple time points. In addition, we compared the 24‐month classification accuracy with the area under the receiver‐operating curve (AUROC), which assesses global performance, and the area under the precision‐recall curve (PRAUC), which complements AUROC in imbalanced datasets. The AUROC of the multivariate model was compared to those of the ELTS high‐risk class and the BCR::ABL1 halving time with DeLong's test. We evaluated the contribution of each feature to the final model by performing a leave‐one‐feature‐out analysis using the internal test set. For each feature, the model was re‐fitted after excluding that feature, and the resulting change in the concordance index (C‐index) was recorded.
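The two discrimination metrics named above can be sketched directly from their definitions. The code is illustrative and not the R code used in the study. It assumes that a higher score means a higher probability of the event: for the C‐index, that is an earlier MMR. Tied scores count as half‐concordant, and tied event times are treated as non‐comparable for simplicity.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def c_index(times, events, scores):
    """Harrell-style concordance index for right-censored time-to-event data.

    A pair (i, j) is comparable when patient i had the event and times[i] < times[j].
    It is concordant when the earlier-event patient i has the higher score.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable
```

In a leave‐one‐feature‐out analysis, a function like `c_index` would be evaluated once for the full model and once for each refitted reduced model, and the difference recorded per feature.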
All statistical analyses and visualizations were conducted using RStudio and R 4.4.2 in the HUS Acamedic environment, a Microsoft Azure cloud‐based secure computing environment compliant with European data privacy regulations (GDPR and the Finnish National Act on the secondary use of health and social data).
RESULTS
Patient characteristics
In this study, we sought to better understand the BM cytomorphological fingerprint of CML patients from a total of seven clinical sites (Figure 1A). We used high‐resolution imaging of MGG‐stained BM aspirate samples and deep learning‐based image analysis (example images in Supporting Information S1: Figure 1 and Supporting Information S2: Figure 2). Patients were divided into three cohorts (Full cohort: n = 1167; Modeling cohort: n = 238; Cell differential cohort: n = 143, Figure 1A).
The Modeling and Cell differential cohorts were used for statistical analysis, model development and evaluation, and their characteristics are presented in Table 1. We observed no significant differences in age or gender distribution between them. Similarly, there were no significant differences in the risk scores or PB WBC counts (Table 1). In contrast, MMR rates and time to MMR differed between the cohorts. In the Modeling cohort, a total of 222 patients (93%) achieved MMR during follow‐up, compared to 148 patients (100%) in the Cell differential cohort. The median time to MMR was 0.67 years in the Modeling cohort and 0.50 years in the Cell differential cohort. The slightly faster MMR in the Cell differential cohort is likely due to a higher proportion of patients from TKI cessation trials.
To understand the dynamics between the BM and PB compartments, we correlated computationally defined hematopoietic cell proportions with blood counts. We observed varying correlations, underlining that BM and PB are distinct compartments (Supporting Information S3: Figure 3A–G). Next, we mapped latent representations of the BM cytomorphology using cell proportions of the Full cohort with UMAP. Samples diverged primarily according to disease stage and its associated PB WBC count, with blast‐phase CML samples forming a distinct cluster from CP‐CML samples (Supporting Information S3: Figure 3H). When focusing only on pretreatment CP‐CML samples, these were characterized by an abundance of neutrophils, metamyelocytes, and myelocytes as expected (Figure 1B). A considerable proportion of samples were also enriched with erythroblasts, implying distinct lineage‐commitment patterns from early myeloid progenitor and stem cells. A few samples contained marked levels of lymphocytes, promyelocytes, and blasts, highlighting heterogeneity in hematopoietic maturation and lineage.
WBC count, erythroid skewing, and monocytic morphology predict MMR probability
To identify key clinical and cytomorphological features associated with MMR, we analyzed essential clinical and almost 950 computationally derived cytomorphological variables covering cell size, shape, and texture using univariate Cox regression in the Modeling cohort training dataset (n = 142 patients). Among the most significant features (P < 0.05), elevated PB WBC, spleen size, and ELTS score were associated with a lower likelihood of MMR (Figure 1C). When examining BM cell types, a high proportion of proerythroblasts and immature erythroid cells (sum of proerythroblasts and erythroblasts) predicted a higher MMR rate (Figure 1C).
High proportion of proerythroblasts and low PB WBC were also associated with deeper response (MR4.0) but not CCyR (Supporting Information S4: Figure 4A,B). By examining the entire Modeling cohort (n = 238), both PB WBC and the proerythroblast proportion remained predictive of MMR (Figure 1D,E). We could validate both associations in the Cell differential cohort, where BM erythroid progenitors were visually confirmed by an experienced cytomorphologist with digital microscopy (Figure 1F,G). Using the median value as a cutoff, individuals with low BM erythroid precursor proportion (<8%) experienced significantly prolonged time to MMR compared to those with high BM erythroid precursor percentages (≥8%; Figure 1G).
We also identified numerous BM morphometric features in myeloid cells that predicted TKI sensitivity (Figure 1C). Megakaryocyte cell roundness and larger blast cell size were associated with a lower MMR rate (Figure 1C). In contrast, larger monocyte cell size, nuclei size, and nuclei perimeter favored higher MMR rates (Figure 1C). As the monocytic features shared high collinearity, we chose to focus on monocyte nuclei perimeter, which remained predictive also in the entire Modeling cohort (Figure 1H). When examining single‐cell images, low nuclei perimeter appeared to reflect both lower nuclear size and lobulation, possibly indicating a shift in monocyte maturation (Figure 1I). Collectively, these findings highlight the clinical utility of BM cellular composition and morphometry to stratify patients by their TKI sensitivity.
BM cytomorphology is partly reflected in clinical characteristics
To better understand the clinical significance of BM cytomorphology, we explored MMR biomarkers with patient demographics and established prognostic factors (Figure 2A). None of the newly described biomarkers were associated with patient age or gender. Instead, we observed elevated PB WBC counts in conjunction with high ELTS scores, PB blast proportion, enlarged spleen, and consequently higher use of first‐line 2GTKI (Figure 2A–D). In contrast, high PB lymphocyte proportion was associated with decreased PB WBC counts (Figure 2A). When examining the cytomorphological features associated with MMR, an elevated PB WBC count was associated with low proerythroblast proportion and lipid droplet concentration in the slide periphery (Figure 2E and Supporting Information S4: Figure 4C–E). We reasoned that lipid droplet concentration likely reflected lower hematopoietic expansion given its association with opposite clinical characteristics to those of PB WBC (Supporting Information S4: Figure 4C). Lastly, we observed that low BM proerythroblast proportion was associated with higher spleen size and elevated PB basophil counts (Figure 2A).
Figure 2.

Clinical association of novel biomarkers. (A) Balloon plot visualizing the association of emerging biomarkers (x‐axis) by dichotomized clinical variables (y‐axis; Wilcoxon signed‐rank test). The balloon color reflects the ratio of the studied biomarkers by clinical variables (high vs. low; female vs. male; yes vs. no). (B) Comparison of the peripheral blood (PB) white blood cell (WBC) count by European Treatment and Outcome Study long‐term survival (ELTS) risk class, (C) PB blast proportion, and (D) spleen size. The median value was used as a cutoff to distinguish between low‐ and high‐count groups. (E) Comparison of the bone marrow (BM) proerythroblast proportion by PB WBC count. (F) Balloon plot visualizing the association of PB WBC differentials and erythroblasts by the PB WBC count.
Ranked by the P‐value in univariate Cox regression, PB WBC emerged as the most consistent biomarker of MMR across the training dataset and was consistent with established prognostic variables. Therefore, we conducted Spearman correlation analysis to examine the relationship between PB WBC and WBC differential counts. We observed that high WBC was associated with a lower granulocytic, lymphocytic, and monocytic count (Figure 2F and Supporting Information S5: Figure 5). Although PB WBC counts correlated with lower BM erythroid precursors, they were linked to a leukoerythroblastic blood picture with elevated erythroblasts and blasts (Figure 2F and Supporting Information S5: Figure 5). Collectively, these findings suggest that the PB WBC count reflects not only a higher disease load but also a left shift and skewing toward myeloid lineages other than granulocytes. In addition, these observations imply that elevated PB WBC counts could promote the displacement of erythroblasts from the BM into the PB, accounting for their higher circulating levels.
Clinical and cytomorphological variables predict MMR more robustly than the ELTS score
Next, to study the synergy of clinical and cytomorphological variables, we developed a multiparameter Cox regression model predicting TKI sensitivity (Figure 3A). To avoid model overfitting, we carried forward only features that were repeatedly significant in univariate analyses (P < 0.05 in ≥4/10 iterations) into the multiparameter model, which was built with a forward selection approach. The final multiparameter Morphoclinical score consisted of a total of five statistically significant features. BM proerythroblast proportion and median monocyte nuclei perimeter increased, while PB WBC count, first‐line imatinib, and a high‐risk ELTS score decreased, the probability of achieving MMR (Figure 3B). To assess the contribution of each feature to the final Morphoclinical score's predictive performance, we performed a leave‐one‐feature‐out analysis (Figure 3B). The C‐index decreased with the removal of any single feature, demonstrating that all features in the final model contribute to its predictive ability. Excluding the ELTS score resulted in only a modest decrease in C‐index, confirming that the remaining features provide an independent prognostic signal. Consistent with the univariate analysis, in which PB WBC count exhibited the highest predictive value, its exclusion led to the largest reduction in model performance from the model C‐index of 0.67.
Figure 3.

Multiparameter model development and evaluation. (A) Overview of the model development. (B) Forest plot of the Morphoclinical model Cox hazard ratios and concordance index (C‐index) changes in leave‐one‐feature‐out analysis. (C) Kaplan–Meier plot (log‐rank test) for the cumulative major molecular response (MMR) in the internal and (D) external test sets stratified by the Morphoclinical model classification and its performance at 2 years. (E) Kaplan–Meier plot for the cumulative MMR in the training and internal and external test sets stratified by European Treatment and Outcome Study long‐term survival (ELTS) high‐risk class and its performance at 2 years. (F) Area under the receiver‐operating curve (AUROC) for the Morphoclinical model in the internal test set, the ELTS high‐risk class, and the BCR::ABL1 halving time (≥76 days). (G) The area under the precision‐recall curve (PRAUC) for the Morphoclinical model in the internal test set, the ELTS high‐risk class, and the BCR::ABL1 halving time (≥76 days). (H) Kaplan–Meier plot for the cumulative MMR in the internal and external test sets stratified by median Morphoclinical model score and tyrosine kinase inhibitor (TKI) generation. (I) Comparison of the monocyte nuclear perimeter in control subjects and chronic myeloid leukemia (CML) patients stratified into equally sized groups by monocyte nuclear perimeter. 2GTKI, second‐generation tyrosine kinase inhibitor; HR, hazard ratio; PB, peripheral blood; WBC, white blood cell.
After categorizing patients into low‐ and high‐risk groups, we observed that the high Morphoclinical score group achieved a 100% cumulative incidence of MMR at 24 months, compared to a 60% cumulative incidence for the low Morphoclinical score group in the internal test set (Figure 3C). We observed consistent performance in the training and external test sets (Figure 3D and Supporting Information S6: Figure 6A), where high Morphoclinical score patients demonstrated significantly faster achievement of MMR than low Morphoclinical score patients.
We compared the Morphoclinical score with the BCR::ABL1 halving time and the ELTS high‐risk group, which is the current clinical guideline for selecting 2GTKI. Stratification using the BCR::ABL1 halving time demonstrated excellent separation between the groups. However, as expected, few patients were classified in the long halving time group due to the high cutoff value of 76 days originally established to identify patients with poor TKI response (12 vs. 154, Supporting Information S6: Figure 6B). 14 In turn, the ELTS high‐risk group showed no significant difference in cumulative MMR at 24 months (Figure 3E). Finally, the Morphoclinical model demonstrated both higher AUROC and PRAUC at 24 months over the ELTS high‐risk group and BCR::ABL1 halving time (Figure 3F,G).
In a site‐stratified evaluation, overall Morphoclinical score performance was consistent across most contributing sites despite heterogeneity in sample preparation and staining (Supporting Information S7: Figure 7). Performance was lower in samples from Örebro, Sweden (Supporting Information S7: Figure 7C), which corresponded to a markedly small sample size, highlighting the impact of limited data on classification accuracy.
To further explore whether nuclear size parameters reflect monocyte maturation status, we examined correlations between nuclear morphology features and available PB monocyte percentages (Supporting Information S8: Figure 8A). PB monocyte data were limited, because these values were available only for HUS, Finland, and among the remaining patients only about one‐third had reportable PB monocyte percentages. We observed only weak and non‐significant correlations between nuclear size parameters and PB monocyte percentages. The only statistically significant relationship was the expected correlation between BM and PB monocyte percentages.
2GTKIs offset the lower response rate associated with low Morphoclinical score
We investigated the utility of the Morphoclinical score by treatment groups. High Morphoclinical scores in both first‐line imatinib and 2GTKI‐treated groups increased the probability and pace of achieving MMR (Figure 3H). Strikingly, administration of 2GTKI to patients with adverse Morphoclinical scores improved their likelihood of achieving MMR to the level of patients with a more favorable score and treated with imatinib.
To evaluate whether the Morphoclinical model can stratify MMR probability within established ELTS risk groups, we restratified ELTS categories (low, intermediate, and high) into high‐ and low‐score groups using the median Morphoclinical score. In the low ELTS risk category, patients with low Morphoclinical scores showed markedly poorer or comparable MMR outcomes relative to patients in the intermediate‐ or high‐ELTS groups with low Morphoclinical scores (Supporting Information S8: Figure 8B). However, the patient numbers were insufficient in the intermediate and high ELTS risk classes. Together, these findings indicate that the Morphoclinical model provides additional prognostic discrimination beyond ELTS classification.
Monocyte nuclei perimeter distinguishes high‐risk patients from controls
We compared monocyte nuclei perimeter values in CML patients to those in controls by stratifying patients into two groups based on the median monocyte nuclei perimeter value (≤38.7 vs. >38.7). The control cohort comprised 636 sex‐ and age‐matched individuals without active hematological disease (see the Patients section). Both patient groups differed significantly from controls. The distribution of monocyte nuclei perimeter in the lower group resembled that of controls, whereas the group with higher values was distinctly divergent (Figure 3I). In summary, the integration of BM cytomorphology with clinical variables over the current ELTS score could help to better identify patients benefiting from 2GTKI and enhance the achievement of MMR.
DISCUSSION
In this multi‐center study, we analyzed the clinical significance of BM cytomorphology in 598 CP‐CML patients. We identified erythroid precursors as well as monocyte nuclear size and lobulation as novel and independent predictors of MMR. Our findings demonstrate that a simple Morphoclinical model, integrating established prognostic factors, such as PB WBC, with novel cytomorphological features, could help to identify patients benefiting from first‐line 2GTKIs.
We collected data and BM aspiration samples from seven international clinical sites, incorporating technical variations in sample preparation and staining. Despite these differences, deep learning‐based image analysis consistently identified biologically meaningful cytomorphological features, emphasizing its potential for broader applicability. 23 Furthermore, the cytomorphological biomarkers were validated in two independent external datasets using both computational and visual analysis. The Morphoclinical model also demonstrated similar consistency in an external dataset. These findings underscore the reproducibility of our approach and highlight the potential of computational pathology to uncover novel markers of treatment sensitivity and prognosis in CML.
Another major advantage is the resemblance of the image analysis pipeline to the decision‐making process of a hematopathologist. Traditional cytomorphological assessments are inherently subjective, with inter‐observer variability influencing diagnostic and prognostic outcomes. Additionally, it is impractical for a human expert to numerically assess the morphology of thousands of individual cells within a BM sample. Machine learning‐driven solutions are required to overcome these limitations, offering an objective and transparent method to extract clinically relevant cytomorphological features and possibly provide novel insights into the pathobiology of CML.
CML is a myeloproliferative disorder characterized by the excessive production of mature granulocytes. We identified predictive features of MMR that suggest a potential selection process favoring particular cell lineages, which could influence TKI sensitivity. Notably, lineage differentiation has been demonstrated to impact drug sensitivity in acute myeloid leukemia. Blast maturation along the monocytic lineage increases resistance to venetoclax, whereas promyelocytic maturation is linked to sensitivity to ATRA‐ATO. 24 Additionally, erythroid differentiation has been associated with BCL‐XL inhibition sensitivity in erythroid leukemia. 25 A recent study explored a similar concept in CML, showing that single‐cell transcriptomic modules associated with erythroid fate correlated with optimal TKI responses. 26
However, the mechanisms driving the enrichment of erythroid precursors in patients with improved TKI response remain unclear. Potential contributing factors include mutations in other myeloid genes or variations in the BM microenvironment. Future studies are needed to investigate these mechanisms. Could erythroid commitment be increased with pharmacological interventions, and would this improve TKI sensitivity? Exploring complementary methods, such as image‐based analyses of PB smears, could offer a clearer view of granulocyte morphology and provide additional prognostic information.
Our study suggests that monocyte morphology may serve as a valuable indicator of MMR in CML. Specifically, a larger monocyte nuclear perimeter was associated with improved treatment outcomes, indicating a correlation or potential causality between monocyte maturation and TKI efficacy. The increase in nuclear perimeter may result from overall nuclear enlargement or enhanced monocyte nuclear lobulation during the differentiation process. 27 However, it remained unclear why patients less sensitive to TKI showed monocytic nuclear morphology more similar to that of control subjects. A deeper understanding of monocyte morphological changes could provide further insight into the mechanisms underlying TKI resistance in CML patients.
Consistent with previous studies, we found that elevated WBC counts were significantly associated with a lower cumulative incidence of treatment response. 5, 28, 29, 30, 31 Our results support prior research identifying this association for molecular milestones such as MMR, MR4, or MR4.5. 5 , 28 , 30 Earlier studies have used WBC cutoffs of 120–150 × 10⁹/L. Our findings suggest that a baseline WBC of 50 × 10⁹/L or higher would strongly favor starting a 2GTKI as first‐line TKI. Additionally, we observed associations between elevated WBC, leukoerythroblastic blood picture, splenomegaly, and high ELTS score, in line with prior research. 5 , 28 , 30 Elevated PB blast percentage may indicate disease progression or duration. 3 Therefore, patients with higher WBC at diagnosis may have a larger leukemic stem cell burden and require longer or more intensive treatment to achieve molecular responses. However, it is unclear whether the duration of CML is prolonged or whether other factors, such as mutations in myeloid driver genes, regulate PB WBC count. 32
Despite its strengths, our study has certain limitations. First, as the diagnosis of CML is usually straightforward, genetic sequencing is not clinically indicated. Therefore, we could not associate monocytic morphology or erythroid differentiation with mutation data. Second, the developed Morphoclinical model, particularly the calculation of cell nuclei size, relies on high‐quality BM imaging and digital analysis, which are not yet standard procedures in routine clinical practice. The current implementation requires 100× oil‐immersion imaging and manual region‐of‐interest selection. Although this approach is aligned with standard hematopathology workflows, it may introduce pathologist‐dependent variability and limit immediate scalability. Future work could explore the development of fully automated, whole‐slide compatible pipelines and evaluate performance across multiple microscopes. Finally, given the retrospective study setting, first‐line TKI generation was included as a covariate to isolate the contribution of morphology. However, this restricts the prospective applicability of the model intended to guide initial TKI selection, which will require validation before clinical use. As digital pathology and artificial intelligence‐driven tools become more integrated into hematology workflows, models of this kind could be implemented in clinical decision‐making, akin to how gene sequencing and analysis have become an essential part of diagnostics.
To our knowledge, this is the first study to perform a detailed computational analysis of BM cytomorphology in CML to predict TKI response. In a recent study, we highlighted the role of granulocyte abundance and morphology at diagnosis in predicting TKI discontinuation response. 13 Both studies highlight the potential of digital cytomorphological analysis to optimize frontline TKI treatment selection and possibly improve rates of treatment discontinuation, ultimately leading to improved cost‐effectiveness, quality of life, and better long‐term management of CML patients. 33 , 34 Our findings reinforce the notion that the BM microenvironment is central to fully understanding CML pathobiology, and novel approaches such as imaging can uncover missed mechanisms and even interventions to modulate TKI sensitivity. 35
AUTHOR CONTRIBUTIONS
Katariina Luukkainen: Conceptualization; data curation; visualization; writing—original draft; writing—review and editing; methodology; investigation; formal analysis; validation; software; resources. Mikko Purhonen: Data curation; writing—review and editing; software; resources; validation. Mikael Tatun: Data curation; writing—review and editing; software; resources; validation. Kevin Hung: Resources; writing—review and editing; data curation; validation. Oda Tafjord: Resources; writing—review and editing; data curation; validation. Henri Sundquist: Software; data curation; writing—review and editing; resources; validation. Stina Söderlund: Writing—review and editing; resources; data curation; validation. Shady Adnan‐Awad: Resources; writing—review and editing; conceptualization; methodology; validation. Anni Dohlen: Resources; writing—review and editing; data curation; validation. Johanna Heikkinen: Resources; writing—review and editing; data curation; validation. Perttu Koskenvesa: Writing—review and editing; resources; conceptualization; methodology; validation. Lotta Joutsi‐Korhonen: Writing—review and editing; resources; data curation; validation. Anna Lempiäinen: Resources; writing—review and editing; data curation; validation. Sanna Siitonen: Conceptualization; methodology; writing—review and editing; data curation; resources; validation. Satu Mustjoki: Conceptualization; methodology; writing—review and editing; resources; validation. Naranie Shanmuganathan: Writing—review and editing; data curation; resources; validation. Coral Bryce: Writing—review and editing; data curation; resources; validation. Signe Danielsson: Resources; writing—review and editing; data curation; validation. Henrik Hjorth‐Hansen: Resources; writing—review and editing; data curation; validation. Ulla Olsson‐Strömberg: Resources; writing—review and editing; data curation; validation. Takashi Kumagai: Resources; writing—review and editing; data curation; validation. 
Shinya Kimura: Resources; writing—review and editing; data curation; validation. David M. Ross: Resources; writing—review and editing; data curation; validation. Oscar Brück: Conceptualization; methodology; software; data curation; investigation; validation; formal analysis; supervision; funding acquisition; visualization; project administration; resources; writing—original draft; writing—review and editing.
CONFLICT OF INTEREST STATEMENT
O.B. declares research funding (Pfizer and Gilead Sciences), consultancy fees (AstraZeneca, Novartis, Sanofi, Astellas Pharma, GSK, and Amgen), and stock ownership (Cellbytes Ltd.). N.S. received honoraria from Novartis, Mallinckrodt, and Takeda. S.K. has received honoraria from Pfizer, Otsuka Pharmaceuticals, Novartis, and Bristol‐Myers Squibb, as well as research funding from Pfizer, Novartis, Bristol‐Myers Squibb, and Ohara Pharmaceuticals. T.K. has received honoraria (for speakers) from Bristol‐Myers Squibb, Novartis, Pfizer, and Otsuka Pharmaceuticals.
FUNDING
This study was supported by research grants from the Edward P. Evans Foundation, national‐level medical research funding, the European Hematology Association, Juselius Foundation, Finnish‐Swedish Medical Society (Finska Läkaresällskapet), K. Albin Johansson Foundation, Paulo Foundation, the Finnish Medical Research Foundation, the Signe och Ane Gyllenbergs Foundation, the Päivi and Sakari Sohlberg Foundation, the Finnish Cancer Foundation, the Research Council of Finland, the Helsinki University Hospital, and the University of Helsinki. Open access publishing facilitated by Helsingin yliopisto, as part of the Wiley ‐ FinELib agreement.
Supporting information
Supporting Information.
ACKNOWLEDGMENTS
The authors wish to thank the members of the Hematoscope Lab for discussion and comments. Some images have been created with Biorender.com.
DATA AVAILABILITY STATEMENT
Code is available at https://github.com/obruck/CML_morphoclinical_score. Due to the multisite nature of the dataset, a study permit is required at each participating site. Please contact the corresponding author for data access.
REFERENCES
- 1. Cerveira N, Bizarro S, Teixeira MR, Mariz JM. When to stop TKIs in patients with chronic myeloid leukemia and how to follow them subsequently. Curr Treat Options Oncol. 2021;22:49.
- 2. Saußele S, Richter J, Hochhaus A, Mahon F‐X. The concept of treatment‐free remission in chronic myeloid leukemia. Leukemia. 2016;30:1638‐1647.
- 3. Hochhaus A, Baccarani M, Silver RT, et al. European LeukemiaNet 2020 recommendations for treating chronic myeloid leukemia. Leukemia. 2020;34:966‐984.
- 4. Shanmuganathan N, Pagani IS, Ross DM, et al. Early BCR‐ABL1 kinetics are predictive of subsequent achievement of treatment‐free remission in chronic myeloid leukemia. Blood. 2021;137:1196‐1207.
- 5. Zhang X‐S, Gale RP, Huang X‐J, Jiang Q. Is the Sokal or EUTOS long‐term survival (ELTS) score a better predictor of responses and outcomes in persons with chronic myeloid leukemia receiving tyrosine‐kinase inhibitors? Leukemia. 2022;36:482‐491.
- 6. Cross NCP, White HE, Müller MC, Saglio G, Hochhaus A. Standardized definitions of molecular response in chronic myeloid leukemia. Leukemia. 2012;26:2172‐2175.
- 7. Höglund M, Sandin F, Hellström K, et al. Tyrosine kinase inhibitor usage, treatment outcome, and prognostic scores in CML: report from the population‐based Swedish CML registry. Blood. 2013;122:1284‐1292.
- 8. Branford S. Why is it critical to achieve a deep molecular response in chronic myeloid leukemia? Haematologica. 2020;105:2730‐2737.
- 9. Saifullah HH, Lucas CM. Treatment‐free remission in chronic myeloid leukemia: can we identify prognostic factors? Cancers. 2021;13:4175.
- 10. Okamoto Y, Hirano M, Morino K, et al. Early dynamics of chronic myeloid leukemia on nilotinib predicts deep molecular response. NPJ Syst Biol Appl. 2022;8:39.
- 11. Ross DM, Hughes TP. Treatment‐free remission in patients with chronic myeloid leukaemia. Nat Rev Clin Oncol. 2020;17:493‐503.
- 12. Zhang X, Liu B, Huang J, et al. A predictive model for therapy failure in patients with chronic myeloid leukemia receiving tyrosine kinase inhibitor therapy. Blood. 2024;144:1951‐1961.
- 13. Purhonen M, Tatun M, Luukkainen K, et al. Granulocyte abundance and maturation state at diagnosis predicts treatment‐free remission in CML. Leukemia. 2025;39:2968‐2977. doi:10.1038/s41375-025-02769-2
- 14. Murković M, Babarović E, Marijić B, Grohovac D, Hadžisejdić I. Association of pre‐treatment bone marrow morphology and achievement of BCR‐ABL1 transcript milestones in CML. Pathol Res Pract. 2023;246:154517.
- 15. Walter W, Pohlkamp C, Meggendorfer M, et al. Artificial intelligence in hematological diagnostics: game changer or gadget? Blood Rev. 2023;58:101019.
- 16. Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16 × 16 words: transformers for image recognition at scale. Preprint posted online June 3, 2021. doi:10.48550/arXiv.2010.11929
- 17. Assran M, Caron M, Misra I, et al. Masked siamese networks for label‐efficient learning. Preprint posted online April 14, 2022. doi:10.48550/arXiv.2204.07141
- 18. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: unified, real‐time object detection. Preprint posted online June 8, 2015. doi:10.48550/ARXIV.1506.02640
- 19. Koch V, Wagner SJ, Kazeminia S, et al. DinoBloom: a foundation model for generalizable cell embeddings in hematology. Preprint posted online April 7, 2024. doi:10.48550/arXiv.2404.05022
- 20. Lee SH, Erber WN, Porwit A, Tomonaga M, Peterson LC. ICSH guidelines for the standardization of bone marrow specimens and reports. Int J Lab Hematol. 2008;30:349‐364.
- 21. De Almeida JG, Gudgin E, Besser M, et al. Computational analysis of peripheral blood smears detects disease‐associated cytomorphologies. Nat Commun. 2023;14:4378.
- 22. Branford S, Yeung DT, Parker WT, et al. Prognosis for patients with CML and >10% BCR‐ABL1 after 3 months of imatinib depends on the rate of BCR‐ABL1 decline. Blood. 2014;124:511‐518.
- 23. Ma J, Xie R, Ayyadhury S, et al. The multimodality cell segmentation challenge: toward universal solutions. Nat Methods. 2024;21:1103‐1113.
- 24. Pei S, Pollyea DA, Gustafson A, et al. Monocytic subclones confer resistance to venetoclax‐based therapy in patients with acute myeloid leukemia. Cancer Discov. 2020;10:536‐551.
- 25. Kuusanmäki H, Dufva O, Vähä‐Koskela M, et al. Erythroid/megakaryocytic differentiation confers BCL‐XL dependency and venetoclax resistance in acute myeloid leukemia. Blood. 2023;141:1610‐1625.
- 26. Krishnan V, Schmidt F, Nawaz Z, et al. A single‐cell atlas identifies pretreatment features of primary imatinib resistance in chronic myeloid leukemia. Blood. 2023;141(22):2738‐2755. doi:10.1182/blood.2022017295
- 27. Goasguen JE, Bennett JM, Bain BJ, Vallespi T, Brunning R, Mufti GJ. Morphological evaluation of monocytes and their precursors. Haematologica. 2009;94:994‐997.
- 28. Zhang X, Gale RP, Li Z, Zhang M, Huang X, Jiang Q. Predictive scoring systems for molecular responses in persons with chronic phase chronic myeloid leukemia receiving initial imatinib therapy. Leukemia. 2022;36:2042‐2049.
- 29. Wu A, Yen R, Grasedieck S, et al. Identification of multivariable microRNA and clinical biomarker panels to predict imatinib response in chronic myeloid leukemia at diagnosis. Leukemia. 2023;37:2426‐2435.
- 30. Qin Y‐Z, Jiang Q, Jiang H, et al. Combination of white blood cell count at presentation with molecular response at 3 months better predicts deep molecular responses to imatinib in newly diagnosed chronic‐phase chronic myeloid leukemia patients. Medicine. 2016;95:e2486.
- 31. Zhang X‐S, Gale RP, Zhang M‐J, Huang X‐J, Jiang Q. A predictive scoring system for therapy‐failure in persons with chronic myeloid leukemia receiving initial imatinib therapy. Leukemia. 2022;36:1336‐1342.
- 32. Adnan Awad S, Brück O, Shanmuganathan N, et al. Epigenetic modifier gene mutations in chronic myeloid leukemia (CML) at diagnosis are associated with risk of relapse upon treatment discontinuation. Blood Cancer J. 2022;12:69.
- 33. Saussele S, Richter J, Guilhot J, et al. Discontinuation of tyrosine kinase inhibitor therapy in chronic myeloid leukaemia (EURO‐SKI): a prespecified interim analysis of a prospective, multicentre, non‐randomised, trial. Lancet Oncol. 2018;19:747‐757.
- 34. Schoenbeck KL, Atallah E, Lin L, et al. Patient‐reported functional outcomes in patients with chronic myeloid leukemia after stopping tyrosine kinase inhibitors. JNCI: J Natl Cancer Inst. 2022;114:160‐164.
- 35. Brück O, Blom S, Dufva O, et al. Immune cell contexture in the bone marrow tumor microenvironment impacts therapy response in CML. Leukemia. 2018;32:1643‐1656.
