Blood Advances. 2021 Aug 13;5(16):3066–3075. doi: 10.1182/bloodadvances.2020004055

A predictive algorithm using clinical and laboratory parameters may assist in ruling out and in diagnosing MDS

Howard S Oster 1,2, Simon Crouch 3, Alexandra Smith 3, Ge Yu 3, Bander Abu Shrkihe 1, Shoham Baruch 4, Albert Kolomansky 1,4, Jonathan Ben-Ezra 2,5, Shachar Naor 5, Pierre Fenaux 6, Argiris Symeonidis 7, Reinhard Stauder 8, Jaroslav Cermak 9, Guillermo Sanz 10, Eva Hellström-Lindberg 11, Luca Malcovati 12, Saskia Langemeijer 13, Ulrich Germing 14, Mette Skov Holm 15, Krzysztof Madry 16, Agnes Guerci-Bresler 17, Dominic Culligan 18, Laurence Sanhes 19, Juliet Mills 20, Ioannis Kotsianidis 21, Corine van Marrewijk 13, David Bowen 22, Theo de Witte 23, Moshe Mittelman 1,2
PMCID: PMC8405190  PMID: 34387647

Key Points

  • A BM examination is the gold standard for the diagnosis of MDS, but it is invasive and subjective.

  • A predictive algorithm/app using 10 readily available parameters from 1004 subjects was developed to help diagnose or rule out MDS.


Abstract

We present a noninvasive Web-based app to help exclude or diagnose myelodysplastic syndrome (MDS), a bone marrow (BM) disorder characterized by cytopenias and leukemic risk that is diagnosed by BM examination. A sample of 502 MDS patients from the European MDS (EUMDS) registry (n > 2600) was combined with 502 controls (all BM proven). Gradient-boosted models (GBMs) were used to predict or exclude MDS using demographic, clinical, and laboratory variables. The models were evaluated by the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity, and performance was validated using 100 times fivefold cross-validation. Model stability was assessed by refitting the model on different randomly chosen groups of 502 EUMDS cases. The AUC was 0.96 (95% confidence interval, 0.95-0.97). MDS was accurately predicted or excluded in 86% of patients with unexplained anemia. A GBM score (range, 0-1) of less than 0.68 (GBM < 0.68) yielded a negative predictive value of 0.94, that is, MDS was excluded. A score of GBM ≥ 0.82 yielded a positive predictive value of 0.88, that is, MDS was predicted. For the remaining patients (0.68 ≤ GBM < 0.82), the diagnosis is indeterminate. The discriminating variables were age, sex, hemoglobin, white blood cells, platelets, mean corpuscular volume, neutrophils, monocytes, glucose, and creatinine. A Web-based app was developed; physicians could use it to exclude or predict MDS noninvasively in most patients without a BM examination. Future work will add peripheral blood cytogenetics/genetics, EUMDS-based prospective validation, and prognostication.

Introduction

An important trend in modern medicine is the development of less-invasive diagnostic and therapeutic techniques that can replace invasive procedures while maintaining high accuracy and efficacy.1,2 In addition, patients expect to be involved in their care and to have their preferences considered.3 The use of digital systems in clinical practice enables data collection, computer analysis, machine learning, and the development of algorithms that were not possible in the past. These systems can improve diagnostic techniques and make them less invasive.4 Here, we propose a new paradigm to help in the diagnosis and exclusion of myelodysplastic syndromes (MDS).

MDS is a clonal bone marrow (BM) stem cell disorder, and the median age of onset is in the eighth decade of life.5-7 MDS is characterized by abnormal hematopoietic maturation and differentiation that leads to cytopenias, mainly symptomatic anemia, and the potential for leukemic transformation.7,8 The current gold standard for diagnosis is BM examination.8-10 Although considered a common and relatively straightforward procedure, BM examination is still invasive, painful, and occasionally associated with infectious and bleeding complications.9,10-12 It also depends on subjective interpretation of morphology. Many patients and their physicians prefer to avoid this examination. However, a missed or delayed diagnosis may allow the disease to progress and may prevent access to effective treatment. In some countries, it may also prevent the patient from receiving the social and financial benefits accorded to those diagnosed with MDS.13,14

We have developed an algorithm, based on demographic, clinical, and laboratory parameters, to help in the diagnosis or exclusion of MDS that would obviate the need for a BM examination in many patients. In our previous work, we introduced a formula that incorporated 6 clinical variables (age, sex, hemoglobin [Hb], mean corpuscular volume [MCV], white blood cells [WBCs], and platelets [PLTs]). Using a logistic regression model, we were able to classify patients into 1 of 3 categories: probable MDS (pMDS), probably not MDS (pnMDS), and indeterminate.15 In internal validation with a new set of patients, approximately 50% of the patients could be classified as either pMDS or pnMDS. We subsequently improved the model by increasing the number of studied individuals, adding more variables, and adopting a more appropriate method, the gradient-boosted model (GBM).16,17 Here, we present the refined GBM, built with more variables and many more patients, together with a Web app that could help a clinician diagnose, and especially rule out, MDS noninvasively, without BM examination, in ≈86% of patients.

Methods: patients and model development

Patients

For the model, 502 patients with a BM-based diagnosis of MDS were randomly selected from the European MDS (EUMDS) registry.5,6 The criteria for MDS diagnosis in the EUMDS registry have been published previously.8 To select controls, we reviewed consecutive reports from the BM registry of the Tel Aviv Sourasky Medical Center (TASMC).16,17 The control group included subjects aged 50 years and older who had undergone BM examination (BME) between January 2011 and December 2018 and whose BM was reported as normal. In most of these individuals, the indication for BME was the evaluation of an unexplained anemia; in some, it was staging of a lymphoproliferative disorder. Patients with BM involvement by a hematological or other disease, or with any degree of BM dysplasia, were not eligible as controls. The characteristics of the control group (n = 502) and of the MDS study group are described in Table 1.

Table 1.

Patient characteristics

Variable | MDS, mean (SD) or % | Controls, mean (SD) or % | P
Age, y | 72.5 (9.9) | 69.3 (9.8) | <.0001
Sex, M/F, % | 57/43 | 58/42 | .85
Hb, g/dL | 10.0 (1.9) | 11.2 (2.2) | <.0001
WBC, ×10^9/L | 5.1 (3.0) | 7.8 (5.4) | <.0001
Platelets, ×10^9/L | 205 (154) | 213 (140) | .38
MCV, fL | 97.1 (10.6) | 89.9 (9.4) | <.0001
Neutrophils, ×10^9/L | 3.0 (2.4) | 5.4 (4.7) | <.0001
Monocytes, ×10^9/L | 0.45 (0.41) | 0.65 (0.54) | <.0001
Glucose, mg/dL | 111.6 (39.0) | 117.0 (51.9) | .075
Creatinine, mg/dL | 1.0 (0.42) | 1.3 (1.04) | <.0001

M/F, male/female ratio; SD, standard deviation.

The institutional review board of the Tel Aviv Sourasky Medical Center approved this study, which was conducted in accordance with the Declaration of Helsinki.

Model development

The clinical and laboratory variables listed in Table 1 (age, sex, Hb, MCV, WBC, PLT, neutrophil and monocyte counts, and serum glucose and creatinine) were entered as explanatory variables into a logistic GBM,18,19 with case (MDS patient) or control (patient in whom MDS was excluded) status as the outcome, using the R package gbm.20 Most of the variables included in the model were selected from among those routinely measured in patients referred for BME, on the basis of their known association with MDS.8 The caret package21 was used to search for optimal model parameters and to estimate out-of-sample model performance using 10 times 10-fold cross-validation. The final model used an interaction depth of 5 and a shrinkage parameter of 0.001, and was constrained to have at least 10 observations at each terminal node. Because the caret training function requires a complete data set, missing values were imputed using bagged tree models for each variable (via the caret function preProcess). Imputation was not required for the final model, because gradient-boosted trees can handle missing data natively. Because data on MDS patients (cases) and controls were obtained from separate sources with different degrees of precision, all variables were rounded to a common precision. This ensured that the model was fitted to the values themselves and not to differences in their recorded precision. Because a GBM is fitted stochastically, we also examined the sensitivity of model performance to the choice of random number seed.
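The logistic GBM described above can be illustrated with a minimal, pure-Python sketch. This is not the authors' code and not the R gbm package: the function names (`fit_gbm`, `fit_stump`, `predict_proba`) are hypothetical, each tree is simplified to a single split (interaction depth 1, whereas the final model used depth 5), and no random subsampling is performed, so the sketch is deterministic (gbm, by default, fits each tree on a random subsample, which is presumably where the seed dependence mentioned above comes from). It does keep the 3 ingredients named in the text: logistic (Bernoulli) loss, a shrinkage factor applied to every tree, and a minimum number of observations per terminal node.

```python
import math
from typing import List, Optional, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X: Sequence[Sequence[float]], r: List[float], h: List[float],
              min_leaf: int) -> Optional[Stump]:
    """Single least-squares split of the residuals r; leaf values are
    Newton steps sum(r) / sum(h). Returns None if no split is allowed."""
    n, d = len(X), len(X[0])
    total = sum(r)
    best_gain, best = float("-inf"), None
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        left = 0.0
        for k in range(1, n):
            left += r[order[k - 1]]
            if k < min_leaf or n - k < min_leaf:
                continue  # enforce the minimum observations per terminal node
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between tied values
            gain = left ** 2 / k + (total - left) ** 2 / (n - k)
            if gain > best_gain:
                best_gain, best = gain, (j, (lo + hi) / 2.0)
    if best is None:
        return None
    j, thr = best

    def leaf_value(idx: List[int]) -> float:
        den = sum(h[i] for i in idx)
        return sum(r[i] for i in idx) / den if den > 0 else 0.0

    left_idx = [i for i in range(n) if X[i][j] <= thr]
    right_idx = [i for i in range(n) if X[i][j] > thr]
    return j, thr, leaf_value(left_idx), leaf_value(right_idx)


def fit_gbm(X, y, n_trees=100, shrinkage=0.1, min_leaf=10):
    """Gradient boosting for a 0/1 outcome (1 = case) under logistic loss."""
    p0 = sum(y) / len(y)
    if not 0 < p0 < 1:
        raise ValueError("both cases and controls are required")
    f0 = math.log(p0 / (1 - p0))  # initial log-odds
    F = [f0] * len(y)
    trees: List[Stump] = []
    for _ in range(n_trees):
        p = [sigmoid(f) for f in F]
        r = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of the log-loss
        h = [pi * (1 - pi) for pi in p]       # second derivative of the log-loss
        stump = fit_stump(X, r, h, min_leaf)
        if stump is None:
            break
        trees.append(stump)
        j, thr, lv, rv = stump
        F = [f + shrinkage * (lv if x[j] <= thr else rv) for f, x in zip(F, X)]
    return f0, shrinkage, trees


def predict_proba(model, x: Sequence[float]) -> float:
    """Predicted probability of case status for one feature vector."""
    f0, shrinkage, trees = model
    f = f0 + sum(shrinkage * (lv if x[j] <= thr else rv) for j, thr, lv, rv in trees)
    return sigmoid(f)
```

The default `shrinkage=0.1` keeps toy examples fast; a value as small as 0.001, as in the final model, needs many more trees but makes each tree's contribution smaller and the fit smoother.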

Positive predictive values (PPVs) and negative predictive values (NPVs) were calculated assuming a 20% prevalence of MDS within the population of patients to which the model would be applied in practice: that is, patients with unexplained anemia, in whom other causes of anemia have been excluded, who would likely undergo BM examination in clinical practice.22,23 We also examined a 2-threshold system in which the model is predictive of MDS diagnosis with high PPV above the upper threshold and predictive of MDS exclusion with high NPV below the lower threshold. We targeted a PPV of 90% for the upper threshold and NPV of 95% for the lower threshold. Finally, we repeated the analysis with pretest probabilities of 10% and 30% in addition to the main analysis with 20% probability of disease. All analyses were performed using the software package R, version 3.5.2.24
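The prevalence-weighted PPV/NPV and the 2-threshold search described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' procedure: case and control scores are converted to sensitivity and specificity at each candidate threshold, these are reweighted by an assumed prevalence (20% in the main analysis), the upper threshold is taken as the smallest grid value whose PPV reaches the target, and the lower threshold is taken as the largest grid value whose NPV reaches the target. The 0.01-step grid, the function names, and the toy scores in the usage note are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def ppv_npv_at(cases: Sequence[float], controls: Sequence[float], t: float,
               prevalence: float) -> Tuple[Optional[float], Optional[float]]:
    """PPV for 'score >= t' and NPV for 'score < t', reweighted to a given prevalence."""
    sens = sum(s >= t for s in cases) / len(cases)
    spec = sum(s < t for s in controls) / len(controls)
    tp, fn = prevalence * sens, prevalence * (1 - sens)
    tn, fp = (1 - prevalence) * spec, (1 - prevalence) * (1 - spec)
    ppv = tp / (tp + fp) if tp + fp > 0 else None  # None: no one scores >= t
    npv = tn / (tn + fn) if tn + fn > 0 else None  # None: no one scores < t
    return ppv, npv


def choose_thresholds(cases, controls, prevalence=0.2, target_ppv=0.90,
                      target_npv=0.95, grid=None):
    """Return (lower, upper): the largest t with NPV >= target_npv and the
    smallest t with PPV >= target_ppv (either may be None)."""
    grid = sorted(grid) if grid is not None else [i / 100 for i in range(101)]
    upper = None
    for t in grid:  # ascending: first t that meets the PPV target
        ppv, _ = ppv_npv_at(cases, controls, t, prevalence)
        if ppv is not None and ppv >= target_ppv:
            upper = t
            break
    lower = None
    for t in reversed(grid):  # descending: first t that meets the NPV target
        _, npv = ppv_npv_at(cases, controls, t, prevalence)
        if npv is not None and npv >= target_npv:
            lower = t
            break
    return lower, upper
```

For the toy scores `cases = [0.95, 0.9, 0.85, 0.7, 0.6]` and `controls = [0.1, 0.2, 0.3, 0.4, 0.8]`, `choose_thresholds(cases, controls, 0.2, 0.90, 0.95)` returns `(0.6, 0.81)`. Because PPV and NPV need not be monotonic in the threshold, a more conservative variant would require the target to hold at every threshold beyond the chosen one.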

Results

Figure 1 shows the distribution of GBM scores, stratified by known case/control status. The red bars on the right represent patients diagnosed with MDS (cases), and the green bars on the left represent patients in whom MDS was ruled out by BME (controls). The lavender region represents the overlap between cases and controls. There is excellent separation between patients with and without MDS. In this figure, case and control prevalence is assumed to be equal, to illustrate the score distributions most clearly; in practice, case prevalence is likely to be much lower (we have taken 20% as indicative in our calculations; see "Model development").

Figure 1.


GBM probability scores stratified by case (red) and control (green) status. The lavender region represents overlap between case and control patients. Threshold values of 0.68 (green vertical line) and 0.82 (red line) are indicated; above the red threshold value a patient is predicted to have MDS, below the green threshold, the patient is predicted not to have MDS. Between these 2 threshold values, no prediction is made. In this figure, case and control prevalence is assumed equal to illustrate the score distributions most clearly; in practice, case prevalence is likely to be much lower (we have taken 20% as indicative in our calculations).

The area under the receiver operating characteristic curve (AUC) for the model fit on the full training data was 0.96 (95% confidence interval [CI], 0.95-0.97) (Figure 2).
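The AUC has an interpretation that a short sketch makes concrete: it equals the probability that a randomly chosen case receives a higher score than a randomly chosen control (the Mann-Whitney statistic), with ties counted as one half. The function below is a hypothetical helper, not part of the published app; the text does not state how the 95% CI for the AUC was computed, so no interval is shown.

```python
from typing import Sequence


def auc_mann_whitney(case_scores: Sequence[float],
                     control_scores: Sequence[float]) -> float:
    """Empirical ROC AUC as P(case score > control score), ties counted as 1/2."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0   # correctly ordered pair
            elif c == k:
                wins += 0.5   # tie
    return wins / (len(case_scores) * len(control_scores))
```

For example, `auc_mann_whitney([0.9, 0.8, 0.4], [0.3, 0.5, 0.1])` gives 8/9 ≈ 0.889, because 8 of the 9 case-control pairs are correctly ordered. The double loop is O(mn), which is negligible for 502 cases and 502 controls.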

Figure 2.


Receiver operating characteristic curve for the fitted GBM. The AUC is 0.96 (95% CI, 0.95-0.97).

The relative influence of each of the 10 variables in the GBM18 is shown in Figure 3. The first 3 variables (in order of importance: MCV, serum creatinine, and neutrophil count) account for >55% of the total influence on the predictive model. Other hematologic and chemistry variables, including lactate dehydrogenase, bilirubin, and other routine laboratory parameters, were tested and found to contribute negligibly.

Figure 3.


Relative influence values of variables in the GBM. Creat, creatinine; Gluc, glucose; Mono, monocyte; Neut, neutrophil.

The model has a sensitivity of 88% and a specificity of 95%. Assuming a case (MDS) prevalence of 20% in the population of patients with unexplained anemia,23 a probability threshold of 0.68 was set to achieve an NPV of 0.95: any patient with a predicted GBM probability (GBMP) of <0.68 is classified as predicted not to have MDS. A probability threshold of 0.82 (ie, GBMP ≥ 0.82) classifies a subject as predicted to have MDS and was set to achieve a PPV of 0.90 (at which point the NPV is also 0.90). The upper and lower thresholds actually achieved a PPV of 88.4% and an NPV of 94.4%, respectively (Table 2). These 2 thresholds define 3 regions: (i) for GBMP ≥ 0.82, a patient is predicted to have probable MDS (pMDS; Figure 1, red vertical line, on the right); (ii) for GBMP < 0.68, a patient is predicted to probably not have MDS (pnMDS; Figure 1, green line); and (iii) for 0.68 ≤ GBMP < 0.82 (between the 2 lines), no prediction is made. Here, 5% of controls and 23% of MDS patients (14% of the entire group) lie in this no-prediction zone. For comparison, in our earlier logistic regression model, ≈50% of the patients fell into this region.15
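The 3-region rule can be written as a small function. The thresholds 0.68 and 0.82 and the labels pMDS, pnMDS, and indeterminate come from the text; the function name and the range check are our own illustrative choices, not part of the published app.

```python
def classify_gbm_score(gbmp: float, lower: float = 0.68, upper: float = 0.82) -> str:
    """Map a GBM probability (GBMP) to one of the 3 regions described in the text."""
    if not 0.0 <= gbmp <= 1.0:
        raise ValueError("GBM probability must lie in [0, 1]")
    if gbmp >= upper:
        return "pMDS"           # probable MDS
    if gbmp < lower:
        return "pnMDS"          # probably not MDS
    return "indeterminate"      # lower <= GBMP < upper: no prediction is made
```

Because cases and controls are equally sized (502 each), the overall share in the no-prediction zone is simply the mean of the 2 within-group shares: (5% + 23%) / 2 = 14%, matching the figure reported above.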

To determine the robustness of this model, we examined its predictive characteristics in a variety of situations. Although most patients evaluated for MDS have anemia, others have deficiencies in other cell lines or in multiple cell lines. Table 2 displays the PPV and NPV for patients with anemia, neutropenia, and thrombocytopenia, as well as with bicytopenia and pancytopenia. According to World Health Organization (WHO) criteria, approximately 90% of the MDS patients had anemia, ≈35% to 40% had neutropenia, thrombocytopenia, or bicytopenia, and ≈15% had pancytopenia. Using the more stringent cytopenia criteria of the International Prognostic Scoring System (IPSS), ≈50% of MDS patients were severely anemic, ≈20% to 25% were neutropenic, thrombocytopenic, or bicytopenic, and 5% were pancytopenic. In these subgroups, the PPV is lower, ranging from 72% to 90% (58% for severe pancytopenia), and the CIs are wider. Most important, however, the NPV and the lower limit of its 95% CI are above 90% in every subgroup. This underscores the current value of the model as an effective “rule out” predictor.

Table 2.

PPV and NPV for all patients and those with anemia, neutropenia, and thrombocytopenia, also demonstrated for patients with bicytopenia and pancytopenia

Group | MDS, n (%) | No MDS, n (%) | PPV, % [95% CI] | NPV, % [95% CI]
Total (all patients) | 502 (100) | 502 (100) | 88.4 [79.9, 93.6] | 94.3 [93.4, 95.2]
Cytopenia (WHO criteria)*
 Anemia | 454 (90.44) | 354 (70.52) | 85.0 [74.8, 91.6] | 94.8 [93.8, 95.7]
 Neutropenia | 178 (35.46) | 66 (13.20) | 73.4 [51.5, 87.8] | 97.4 [95.9, 98.4]
 Thrombocytopenia | 210 (41.83) | 174 (34.66) | 86.8 [67.9, 95.3] | 93.2 [91.6, 94.5]
 Bicytopenia | 184 (36.65) | 112 (22.40) | 82.2 [60.2, 93.5] | 93.3 [91.5, 94.8]
 Pancytopenia | 83 (16.53) | 31 (6.20) | 72.3 [40.4, 91.0] | 98.2 [95.7, 99.2]
Severe cytopenia (IPSS criteria)
 Anemia | 244 (48.61) | 151 (30.08) | 85.7 [69.4, 94.1] | 95.7 [94.3, 96.8]
 Neutropenia | 135 (26.89) | 48 (9.60) | 89.1 [54.0, 98.3] | 97.6 [95.7, 98.6]
 Thrombocytopenia | 124 (24.70) | 103 (20.52) | 90.0 [60.9, 96.3] | 93.8 [91.7, 95.4]
 Bicytopenia | 94 (18.73) | 41 (8.20) | 84.1 [48.7, 93.9] | 96.9 [94.6, 98.2]
 Pancytopenia | 25 (4.98) | 9 (1.80) | 57.5 [21.5, 79.8] | 97.5 [90.8, 99.4]

IPSS, International Prognostic Scoring System; WHO, World Health Organization.

*Cytopenia according to WHO criteria: anemia (hemoglobin, <12 g/dL in women; <13 g/dL in men), neutropenia (absolute neutrophil count, <1.8 × 10^9/L), and thrombocytopenia (platelets, <150 × 10^9/L).

Severe cytopenia according to IPSS criteria: anemia (hemoglobin, <10 g/dL), neutropenia (absolute neutrophil count, <1.5 × 10^9/L), and thrombocytopenia (platelets, <100 × 10^9/L).
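The WHO and IPSS cutoffs in the notes to Table 2 can be applied programmatically, as sketched below. The cutoffs are transcribed from those notes; the function names, the "F"/"M" sex coding, and the decision to label exactly 2 affected lineages as bicytopenia and 3 as pancytopenia are assumptions, because the table does not state whether its bicytopenia row excludes pancytopenic patients.

```python
from typing import Dict, FrozenSet

# Cutoffs transcribed from the notes to Table 2.
# Units: hemoglobin g/dL; neutrophils and platelets x 10^9/L.
WHO: Dict[str, float] = {"hb_F": 12.0, "hb_M": 13.0, "anc": 1.8, "plt": 150.0}
IPSS: Dict[str, float] = {"hb_F": 10.0, "hb_M": 10.0, "anc": 1.5, "plt": 100.0}


def cytopenias(sex: str, hb: float, anc: float, plt: float,
               criteria: Dict[str, float]) -> FrozenSet[str]:
    """Return the set of cytopenic lineages under the given criteria."""
    if sex not in ("F", "M"):
        raise ValueError("sex must be 'F' or 'M'")
    found = set()
    if hb < criteria["hb_" + sex]:
        found.add("anemia")
    if anc < criteria["anc"]:
        found.add("neutropenia")
    if plt < criteria["plt"]:
        found.add("thrombocytopenia")
    return frozenset(found)


def lineage_category(found: FrozenSet[str]) -> str:
    """Assumption: exactly 2 lineages = bicytopenia, 3 = pancytopenia."""
    names = {0: "none", 1: "single cytopenia", 2: "bicytopenia", 3: "pancytopenia"}
    return names[len(found)]
```

For example, a woman with Hb 11.5 g/dL, neutrophils 2.0 × 10^9/L, and platelets 140 × 10^9/L is bicytopenic (anemia and thrombocytopenia) by WHO criteria but has no severe cytopenia by IPSS criteria.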

Finally, we examined variation in pretest probability. We have assumed that the a priori prevalence of MDS in our patient population with unexplained anemia is ≈20%. Recognizing that this prevalence could vary according to age or other factors, we looked at the model’s performance with the full data set, also using 10% and 30% pretest probabilities.

Using an a priori prevalence of 10%, PPV = 77.2% (95% CI, 63.8%, 86.7%) and NPV = 97.4% (97.0%, 97.8%). With a 30% prevalence, PPV = 92.9% (87.2%, 96.2%) and NPV = 90.8% (89.3%, 92.1%).
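The shift in PPV and NPV with pretest probability follows from Bayes' theorem in odds form: post-test odds equal pretest odds multiplied by a likelihood ratio that depends only on the threshold (through its sensitivity and specificity), not on prevalence. The sketch below, with hypothetical function names, backs out that ratio from a PPV or NPV observed at one prevalence and reapplies it at another. Starting from the 20% values reported above (PPV 0.884, NPV 0.944), it reproduces the 10% and 30% figures just given to within rounding, which is consistent with holding each threshold's sensitivity and specificity fixed.

```python
def _odds(p: float) -> float:
    return p / (1.0 - p)


def rescale_ppv(ppv: float, prev_from: float, prev_to: float) -> float:
    """Re-express a PPV observed at prev_from at a new prevalence prev_to,
    holding the positive likelihood ratio sens / (1 - spec) fixed."""
    lr_pos = _odds(ppv) / _odds(prev_from)
    post = lr_pos * _odds(prev_to)
    return post / (1.0 + post)


def rescale_npv(npv: float, prev_from: float, prev_to: float) -> float:
    """Same idea for NPV: the odds of *no* disease scale with the
    prior odds of no disease, i.e. spec / (1 - sens) is held fixed."""
    factor = _odds(npv) / _odds(1.0 - prev_from)
    post = factor * _odds(1.0 - prev_to)
    return post / (1.0 + post)


for prev in (0.1, 0.3):
    print(f"prevalence {prev:.0%}: PPV {rescale_ppv(0.884, 0.2, prev):.1%}, "
          f"NPV {rescale_npv(0.944, 0.2, prev):.1%}")
# prevalence 10%: PPV 77.2%, NPV 97.4%
# prevalence 30%: PPV 92.9%, NPV 90.8%
```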

To evaluate and internally validate the model, 25 times repeated fivefold cross-validation was performed on the training data to estimate out-of-sample performance. Cross-validation was applied to the GBM fitting process with the shrinkage value and interaction depth held fixed. This yielded a cross-validated AUC of 0.88. For comparison, logistic regression achieved an AUC of 0.82 under similar repeated cross-validation. The model was found to be insensitive to the choice of random number seed used in GBM construction.
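Repeated k-fold cross-validation, as used above (5 folds, 25 repeats), can be sketched as follows. The sketch stratifies each fold by case/control status, which is our assumption rather than a detail stated in the text; the function name and the seed handling are hypothetical. Each repeat reshuffles the data, so every subject appears in exactly 1 test fold per repeat; performance (eg, AUC) would then be computed on each test fold and averaged over the 5 × 25 = 125 splits.

```python
import random
from typing import List, Sequence, Tuple


def repeated_stratified_kfold(labels: Sequence[int], k: int = 5, repeats: int = 25,
                              seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k * repeats (train_indices, test_indices) pairs, stratified by label."""
    rng = random.Random(seed)
    n = len(labels)
    splits = []
    for _ in range(repeats):
        folds: List[List[int]] = [[] for _ in range(k)]
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)                 # new random partition each repeat
            for pos, i in enumerate(idx):
                folds[pos % k].append(i)     # deal members of each class round-robin
        for test in folds:
            test_set = set(test)
            train = [i for i in range(n) if i not in test_set]
            splits.append((train, sorted(test)))
    return splits
```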

To translate this methodology into a practical tool for clinicians, we developed a Web-based predictor calculator (Figure 4). Figure 4A provides both the Web address and the quick response (QR) code. On entering the Web site, a window opens into which the values of the 10 variables are entered (Figure 4B). Figure 5 shows 3 examples with typical data for patients with pMDS (Figure 5A), pnMDS (Figure 5B), and an indeterminate diagnosis (Figure 5C). Figure 5 assumes a case prevalence of 20% (as opposed to Figure 1, where 50% was assumed).

Figure 4.


The Web-based app for the noninvasive diagnostic tool. (A) The quick response (QR) code and the full Web address allow entrance to the Web site. (B) Once in the site, the window opens for entering the values of the 10 variables and calculating the probability of having MDS. The variables: age, sex, Hb, MCV, WBC, neutrophil count, monocyte count, platelet count (Plt), serum creatinine, and serum glucose. F, female; M, male.

Figure 5.


Examples of the predictive app in practice. Values for a given patient are entered into the appropriate spaces, and the calculate button is pressed. A blue line indicates the probability of the patient having MDS. (A) Values are entered for a patient with pMDS. Note the position of the blue line in the red region. (B) Values for a patient who probably does not have MDS (pnMDS). (C) Patient with an indeterminate diagnosis. In this figure, a case prevalence of 20% is assumed (as opposed to Figure 1 where 50% was assumed).

In summary, assuming an MDS prevalence of ≈20% in the target population of patients with unexplained anemia, the model uses 10 simple parameters and has a sensitivity and specificity of 88% and 95%, respectively, with an NPV and PPV of 0.94 and 0.88, respectively. The model helps to exclude or diagnose MDS in 86% of the tested individuals.

Discussion

In 1959, B. J. Davis reported on the use of machine learning to improve diagnostic hematology.25 Today, digital and computational techniques are revolutionizing medicine. The ability to collect and analyze large amounts of data has allowed the development of predictive models for new diagnostic techniques.4,26 These are already being applied in several fields, such as imaging,27,28 nuclear medicine,29 and pathology.30 Digital tools can also improve monitoring, predict outcome and disease course, and assist in treatment. Examples of the broad potential of these tools include electrocardiographic imaging for monitoring arrhythmias from the body surface,31-33 a smart watch to detect atrial fibrillation,34 a computational algorithm that can predict septic shock,35 tools that can monitor and control hypertension,36 and the development of prostheses by 3-dimensional techniques.37

Less attention has been paid to another potential role of these tools: improving quality of life using less-invasive techniques, while maintaining high accuracy. Today, diagnostic procedures and treatments are assessed not only by their effect on morbidity and mortality, efficacy and toxicity, but also by their effect on quality of life, as well as parameters reported by the patients (patient-reported outcomes).38-44

Here, we propose a noninvasive tool that might, in some situations, obviate the need for a BME, the gold standard for the diagnosis of MDS.7,8 This approach may be appropriate as a predictive tool for the primary care physician evaluating anemic patients, especially those who may be reluctant to undergo a BME.

In clinical practice, we often encounter elderly patients with mildly symptomatic (especially macrocytic) anemia or pancytopenia, for whom the initial workup has excluded the common causes, such as iron, B12, or folate deficiencies, or hemolysis. These individuals have an unexplained anemia and a BME would be the next recommended diagnostic step. This is the patient population who might benefit from such a novel noninvasive diagnostic technique.

The app is based on an analysis of data collected from >1000 individuals, MDS patients and non-MDS controls, all BM proven. Several internal validations have confirmed the reliability of the predictive model. In practice, to help diagnose or exclude MDS with this model, one needs only to enter 10 readily available clinical parameters: the patient’s age and sex, blood counts, and routine blood chemistry values. The result is a graphic and a predictive conclusion (Figures 1 and 5): pMDS (the red area), pnMDS (green), or indeterminate (lavender). We found that, in this population of patients with unexplained anemia, ≈86% can be classified as either pMDS or pnMDS. For the remaining indeterminate group, the patient and the physician would need to discuss whether a BME should be performed to make a definitive diagnosis. Although a long delay in diagnosis can be detrimental, postponing the decision by only 3 to 4 months is usually harmless in this lower-risk population.

We examined the model in patients with neutropenia and thrombocytopenia as well as in those with bicytopenia and pancytopenia. The predictive model remained reliable, especially for excluding MDS, in all of these categories: the NPVs were all above 90% with relatively narrow 95% CIs, and the lower bounds of these CIs were all above 90% as well.

As expected, for prediction of MDS in these groups the accuracy is somewhat diminished, and the 95% CIs are widened. This is in large part owing to the small numbers of patients in these groups. It is likely that for patients with multiple cytopenias, a BM evaluation would be indicated, irrespective of the model prediction.

Most of the variables found to be relevant and introduced into the model (Table 1; Figures 3 and 4) were expected to have an impact and help in the diagnosis. The likelihood of MDS is expected to increase as Hb, WBC, neutrophil, and platelet counts decrease. The likelihood may also increase with age, whereas sex was expected to have little effect. These patterns were seen in the model (Figure 3). However, the impact of 2 variables, creatinine and glucose, was less expected. A possible explanation for the inverse relationship between creatinine and the likelihood of MDS is that a normal serum creatinine argues against the anemia associated with renal failure and thereby makes MDS more likely. The association between glucose and MDS requires further investigation. It is worth mentioning that impaired glucose metabolism in red blood cells,45 and involvement of glucose metabolism in erythropoiesis in MDS patients, have already been reported.46-48 MCV in diabetes has been investigated, but no definitive conclusions have been reached: although some studies reported a lower MCV,49,50 others suggested that hyperosmolarity is associated with an increased MCV.51 One should bear in mind that variables with high predictive value do not necessarily imply causality. These unexpected findings, however, highlight the power of such computer-based analyses, in which the data and machine learning draw our attention to biologic phenomena that had not been noticed previously.

The proposed predictive model has some limitations. Although it has a high potential to help in the diagnosis or exclusion of MDS, certain relevant information has not yet been integrated into the model, especially morphology, blast percentage, genetics, and cytogenetics. We and others have suggested that BM morphology is not only subjective, but may also be less important today than in the past.52,53 MDS is not the first hematologic disease diagnosed without a BME. Chronic lymphocytic leukemia is diagnosed using peripheral blood (PB) cytogenetics and flow cytometry,54 and polycythemia vera is diagnosed with the demonstration of JAK-2 mutation in PB.55 Although the BM blast percentage, cytogenetics, and mutational analysis would also not be available, these limitations could eventually be overcome by obtaining PB genetic information,56,57 by flow cytometry,58 and also by medical imaging.59,60

A recent study has demonstrated that specific morphologies may be associated with somatic mutations.61 Perhaps, conversely, specific genetic signatures reflect corresponding morphologic changes. Thus, such genetic mutational information, when obtained from PB, could be a complementary component on the way toward a noninvasive MDS diagnosis, avoiding BME. Other studies on using machine learning diagnostic models have recently been reported.62-64

Today, next-generation sequencing is available in many laboratories and helps in the diagnosis of MDS.8 However, it is still not standard in much of the world and is not a mandatory component of the diagnosis of MDS. Moreover, although myeloid mutations are increasingly seen with advancing age and are associated with a markedly greater incidence of MDS, their presence is not sufficient for diagnosis, because the vast majority of individuals with such genetic signatures do not have MDS.65-67 Although the exact role of myeloid mutations has not been fully determined, their increasing importance makes it very likely that incorporating such information into our model in the future will improve its predictive quality. In the meantime, such a predictive model might be applied by any physician in the community, without the need for mutation analysis.

At this time, the principal use of this method would be to help in ruling out MDS without a BME. A BME would be recommended for the indeterminate patients to make a diagnosis, and for those with pMDS, to obtain the morphologic and genetic information. Of course, a BME would also be necessary when the diagnoses of other diseases are under consideration. We envision that, in the future, as the methods for obtaining PB genetic information are perfected, our model would be used to make the diagnosis as well.

Another limitation of the proposed model relates to the control population and to the model’s generalizability. The predictive model and its thresholds were based on our MDS and control patients, under the assumption of a 20% prevalence of MDS in the population with unexplained anemia. Although there is a great deal of information on the prevalence of MDS in the general population, such information is scarce for our target population, and whether the prevalence is the same across regions of the world is also unclear. We therefore based our prevalence assumption on personal experience, the experience of colleagues, and the literature. Our own experience and that of our colleagues suggested an MDS prevalence of 10% to 30%; estimates and extrapolations from the literature gave similar results, and we chose 20% as the pretest probability for the model.22,23,68-70 The ideal control is a patient with unexplained anemia after a negative initial workup who has a normal BME. In reality, however, not all control patients fell into that category. Although all of them were at least 50 years old and had a normal BME, some had undergone the procedure as part of staging for a lymphoproliferative disorder. A control group consisting only of patients with unexplained anemia and a negative workup would probably yield a more accurate diagnostic model. We used our control group and assumed a 20% prevalence of MDS, knowing well that neither choice is perfect. We also do not know for certain whether any of our control patients had a suspicious myeloid mutation or eventually developed MDS. Some of them may also have had idiopathic or clonal cytopenia of undetermined significance (ICUS or CCUS), but their number would be small given the low prevalence of these conditions in the general population.

The control group reflects a real-world situation, but to determine the dependence of our method on the a priori prevalence, we checked its performance using 10% and 30% prevalence in addition to the 20%. We found that the NPV remains high, but that PPV is reduced with lower pretest probability. Our future work will perform a prospective external validation using new patient data (MDS and controls) from various centers in the EUMDS group and eventually branch out to other world locations. At least a portion of these data will include genetic information, allowing us to fine-tune the model and examine its robustness.

Because of these limitations, it would still be important for the physician to follow the patient and, if significant uncertainty persists over time, to consider performing a BME to make the definitive diagnosis.

Despite these limitations, the proposed model is a step toward a less-invasive method to help diagnose or exclude MDS in patients with unexplained anemia. Another group developed a basic MDS model with 4 variables using logistic regression; its AUC for predicting confirmed MDS was 0.67.22,71 In our earlier logistic regression MDS model with 6 variables, the AUC was 0.75,15 the NPV was 0.87, and the PPV was 0.65.35 These compare with our current gradient-boosted MDS model, in which the AUC, NPV, and PPV are 0.96, 0.94, and 0.88, respectively.

This MDS model has the potential to be more than a helpful diagnostic tool. In the future, it could also be tested for estimating prognosis (which currently requires a BME) and for following the GBM score as the disease progresses and as patients respond to therapy. More broadly, it may serve as an example of incorporating big data and machine learning into the diagnostic process of diseases in general, and could stimulate research that uses such databases to develop similar noninvasive predictive models for a variety of other diseases.

In summary, a Web-based computer app has been developed to help the physician primarily to exclude MDS, and also to predict the possibility of MDS, in a cytopenic individual without performing the invasive BME. The app is based on an analysis of data collected from >1000 individuals. Ten readily available clinical variables of the patient under evaluation are entered into the app to assess the probability that the patient has MDS. In the future, we plan to increase the number of variables (eg, red blood cell distribution width, whose relevance has recently been demonstrated72) to improve the predictive power of the model. Moreover, as planned by the EUMDS group, the model will be validated with independent prospective patient data, and its use as a prognostic tool, in addition to a diagnostic one, will be tested.

Acknowledgments

The authors thank Yocheved Akiva for assistance in preparing the manuscript and Nitzan Cohen Sagy for assistance as research coordinator. This work was carried out within the BM registry of the Tel Aviv Sourasky Medical Center (TASMC) and the EUMDS Registry. The authors acknowledge all patients whose data were contributed to these registries, as well as all local investigators and operational team members for their continuing contribution to the EUMDS registry. The EUMDS Registry is supported by an educational grant from Novartis Pharmacy B.V. Oncology Europe, Amgen Limited, Celgene International, Janssen Pharmaceutica, and Takeda Pharmaceuticals International.

Authorship

Contribution: H.S.O., M.M., and T.d.W. designed and performed the research, analyzed the data, and wrote the paper; S.C., A. Smith, and G.Y. contributed vital analytical tools and contributed to writing the paper; B.A.S., S.B., A.K., S.N., and J.B.-E. gathered the data; and P.F., A. Symeonidis, R.S., J.C., G.S., E.H.-L., L.M., S.L., U.G., M.S.H., K.M., A.G.-B., D.C., L.S., J.M., I.K., C.v.M., and D.B. were involved in study design and writing the paper.

Conflict-of-interest disclosure: The authors declare no competing financial interests for the work described in this manuscript. Potentially perceived conflicts of interest outside the submitted work are as follows. A. Smith received research funding from Novartis, Cilag-Janssen, and Boehringer Ingelheim. P.F. received research funding and/or honoraria from Aprea, Astex, Celgene Corporation, and Jazz Pharmaceuticals. A. Symeonidis received institutional research funding, honoraria and/or consulting fees from Abbvie, Amgen, Bristol-Myers Squibb, Celgene/GenesisPharma, Gilead, Janssen-Cilag, Merck Sharp & Dohme, Novartis, Pfizer, Roche, Sanofi/Genzyme, and Takeda. R.S. received research funding, honoraria and/or consulting fees from Celgene, Novartis, and Teva (Ratiopharm). E.H.-L. received research funding from Celgene. U.G. received research funding and/or honoraria from Amgen, Celgene, Jazz Pharmaceuticals, and Novartis. C.v.M., project manager of the EUMDS Registry, is funded from the EUMDS (educational grants from Novartis Pharmacy B.V. Oncology Europe, Amgen Limited, Celgene International, Janssen Pharmaceutica, and Takeda Pharmaceuticals International) and MDS-RIGHT (grant from EU’s Horizon 2020 program) project budgets. T.d.W. received research funding from Amgen, Celgene, Janssen, Novartis, and Takeda during the conduct of the study, as project coordinator EUMDS. M.M. received research funding and/or honoraria from Novartis. The remaining authors declare no competing financial interests.

Correspondence: Moshe Mittelman, Department of Medicine, Tel Aviv Sourasky Medical Center, 6 Weizmann St, Tel-Aviv 64239, Israel; e-mail: moshemt@tlvmc.gov.il; and Howard S. Oster, Department of Medicine, Tel Aviv Sourasky Medical Center, 6 Weizmann St, Tel-Aviv 64239, Israel; e-mail: howardo@tlvmc.gov.il.

References

  • 1. Newby DE, Adamson PD, Berry C, et al. SCOT-HEART Investigators. Coronary CT angiography and 5-year risk of myocardial infarction. N Engl J Med. 2018;379(10):924-933.
  • 2. Smith LN, Smith ML, Fletcher ME, Henderson AJ. A 3D machine vision method for non-invasive assessment of respiratory function. Int J Med Robot. 2016;12(2):179-188.
  • 3. Rotenstein LS, Huckman RS, Wagle NW. Making patients and doctors happier - the potential of patient-reported outcomes. N Engl J Med. 2017;377(14):1309-1312.
  • 4. Greene JA, Lea AS. Digital futures past - the long arc of big data in medicine. N Engl J Med. 2019;381(5):480-485.
  • 5. de Swart L, Crouch S, Hoeks M, et al. EUMDS Registry Participants. Impact of red blood cell transfusion dose density on progression-free survival in patients with lower-risk myelodysplastic syndromes. Haematologica. 2020;105(3):632-639.
  • 6. de Swart L, Smith A, Johnston TW, et al. Validation of the revised international prognostic scoring system (IPSS-R) in patients with lower-risk myelodysplastic syndromes: a report from the prospective European LeukaemiaNet MDS (EUMDS) registry. Br J Haematol. 2015;170(3):372-383.
  • 7. DeAngelo DJ, Stone RM. Myelodysplastic syndromes: biology and treatment. In: Hoffman R, Benz EJ Jr, Silberstein LE, Heslop H, Anastasi J, Weitz J, eds. Hematology: Basic Principles and Practice. Philadelphia, PA: Elsevier Health Sciences; 2013:882-903.
  • 8. Malcovati L, Hellström-Lindberg E, Bowen D, et al. European Leukemia Net. Diagnosis and treatment of primary myelodysplastic syndromes in adults: recommendations from the European LeukemiaNet. Blood. 2013;122(17):2943-2964.
  • 9. Mangi MH, Mufti GJ. Primary myelodysplastic syndromes: diagnostic and prognostic significance of immunohistochemical assessment of bone marrow biopsies. Blood. 1992;79(1):198-205.
  • 10. Saad ST, Vassallo J, Arruda VA, Lorand-Metze I. The role of bone marrow study in diagnosis and prognosis of myelodysplastic syndrome. Pathologica. 1994;86(1):47-51.
  • 11. Ríos A, Cañizo MC, Sanz MA, et al. Bone marrow biopsy in myelodysplastic syndromes: morphological characteristics and contribution to the study of prognostic factors. Br J Haematol. 1990;75(1):26-33.
  • 12. Tricot G, De Wolf-Peeters C, Vlietinck R, Verwilghen RL. Bone marrow histology in myelodysplastic syndromes. II. Prognostic value of abnormal localization of immature precursors in MDS. Br J Haematol. 1984;58(2):217-225.
  • 13. Social Security Administration. Disability Evaluation Under Social Security. 7.10 Disorders Of Bone Marrow Failure. Accessed 29 August 2019.
  • 14. Israel National Insurance Agency. Disability Level Establishment: Hematologic Disorders [in Hebrew]. Accessed 29 August 2019.
  • 15. Oster HS, Carmi G, Kolomansky A, et al. Is bone marrow examination always necessary to establish the diagnosis of myelodysplastic syndromes? A proposed non-invasive diagnostic model. Leuk Lymphoma. 2018;59(9):2227-2232.
  • 16. Oster HS, Abu Shrkihe B, Crouch S, et al. Can we diagnose MDS without bone marrow examination? A proposed EUMDS-based non-invasive diagnostic model [abstract]. Blood. 2017;130(suppl 1):2975.
  • 17. Oster HS, Crouch S, Smith A, et al. MDS diagnosis: many patients may not require bone marrow examination [abstract]. Blood. 2018;132(suppl 1):4357.
  • 18. Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Stat. 2001;29(5):1189-1232.
  • 19. Friedman JH. Stochastic gradient boosting. Comput Stat Data Anal. 2002;38(4):367-378.
  • 20. Greenwell B, Boehmke B, Cunningham J, GBM Developers. GBM: Generalized Boosted Regression Models. R package version 2.1.4. Accessed 16 June 2021.
  • 21. Kuhn M, Wing J, Weston S, et al. Caret: Classification and Regression Training. R package version 6.0-81. Accessed 16 June 2021.
  • 22. Buckstein R, Jang K, Friedlich J, et al. Estimating the prevalence of myelodysplastic syndromes in patients with unexplained cytopenias: a retrospective study of 322 bone marrows. Leuk Res. 2009;33(10):1313-1318.
  • 23. Guralnik JM, Eisenstaedt RS, Ferrucci L, Klein HG, Woodman RC. Prevalence of anemia in persons 65 years and older in the United States: evidence for a high rate of unexplained anemia. Blood. 2004;104(8):2263-2268.
  • 24. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2018.
  • 25. Davis BJ. The application of computers to clinical medical data (including machine demonstration). In: Proceedings of the 1st IBM Medical Symposium; 15-17 June 1959; Poughkeepsie, NY. 1959;179-185.
  • 26. Banerjee A, Mathew D, Rouane K. Using patient data for patients’ benefit [editorial]. BMJ. 2017;358:j4413.
  • 27. Lewis SJ, Gandomkar Z, Brennan PC. Artificial intelligence in medical imaging practice: looking to the future. J Med Radiat Sci. 2019;66(4):292-295.
  • 28. Sakai A, Onishi Y, Matsui M, et al. A method for the automated classification of benign and malignant masses on digital breast tomosynthesis images using machine learning and radiomic features. Radiol Phys Technol. 2019;13(1):27-36.
  • 29. Aktolun C. Artificial intelligence and radiomics in nuclear medicine: potentials and challenges. Eur J Nucl Med Mol Imaging. 2019;46(13):2731-2736.
  • 30. Serag A, Ion-Margineanu A, Qureshi H, et al. Translational AI and deep learning in diagnostic pathology. Front Med (Lausanne). 2019;6:185.
  • 31. Oster HS, Taccardi B, Lux RL, Ershler PR, Rudy Y. Noninvasive electrocardiographic imaging: reconstruction of epicardial potentials, electrograms, and isochrones and localization of single and multiple electrocardiac events. Circulation. 1997;96(3):1012-1024.
  • 32. Oster HS, Taccardi B, Lux RL, Ershler PR, Rudy Y. Electrocardiographic imaging: noninvasive characterization of intramural myocardial activation from inverse-reconstructed epicardial potentials and electrograms. Circulation. 1998;97(15):1496-1507.
  • 33. Rudy Y. Noninvasive electrocardiographic imaging of arrhythmogenic substrates in humans. Circ Res. 2013;112(5):863-874.
  • 34. Perez MV, Mahaffey KW, Hedlin H, et al. Apple Heart Study Investigators. Large-scale assessment of a smartwatch to identify atrial fibrillation. N Engl J Med. 2019;381(20):1909-1917.
  • 35. Yee CR, Narain NR, Akmaev VR, Vemulapalli V. A data-driven approach to predicting septic shock in the intensive care unit. Biomed Inform Insights. 2019;11:1178222619885147.
  • 36. Milani RV, Lavie CJ, Bober RM, Milani AR, Ventura HO. Improving hypertension control and patient engagement using digital tools. Am J Med. 2017;130(1):14-20.
  • 37. Alpert JS. Digital medicine: “O Brave New World”. Am J Med. 2017;130(3):243-244.
  • 38. Basch E, Deal AM, Kris MG, et al. Symptom monitoring with patient-reported outcomes during routine cancer treatment: a randomized controlled trial [published corrections appear in J Clin Oncol. 2016;34(18):2198 and J Clin Oncol. 2019;37(6):528]. J Clin Oncol. 2016;34(6):557-565.
  • 39. Donovan K, Sanson-Fisher RW, Redman S. Measuring quality of life in cancer patients. J Clin Oncol. 1989;7(7):959-968.
  • 40. Hsiao CJ, Dymek C, Kim B, Russell B. Advancing the use of patient-reported outcomes in practice: understanding challenges, opportunities, and the potential of health information technology. Qual Life Res. 2019;28(6):1575-1583.
  • 41. Nelson EC, Eftimovska E, Lind C, Hager A, Wasson JH, Lindblad S. Patient reported outcome measures in practice. BMJ. 2015;350:g7818.
  • 42. Schnipper LE, Davidson NE, Wollins DS, et al. Updating the American Society of Clinical Oncology value framework: revisions and reflections in response to comments received. J Clin Oncol. 2016;34(24):2925-2934.
  • 43. Stauder R, Lambert J, Desruol-Allardin S, et al. Patient-reported outcome measures in studies of myelodysplastic syndromes and acute myeloid leukemia: literature review and landscape analysis. Eur J Haematol. 2020;104(5):476-487.
  • 44. Stauder R, Yu G, Koinig KA, et al. Health-related quality of life in lower-risk MDS patients compared with age- and sex-matched reference populations: a European LeukemiaNet study. Leukemia. 2018;32(6):1380-1392.
  • 45. Cetto GL, Vettore L, De Matteis MC, Piga A, Perona G. Erythrocyte cation content, globin chain synthesis and glucose metabolism in dysmyelopoietic syndromes. Acta Haematol. 1982;68(2):124-130.
  • 46. Basiorka AA, McGraw KL, Abbas-Aghababazadeh F, et al. Assessment of ASC specks as a putative biomarker of pyroptosis in myelodysplastic syndromes: an observational cohort study. Lancet Haematol. 2018;5(9):e393-e402.
  • 47. Bouronikou E, Georgoulias P, Giannakoulas N, et al. Metabolism-related cytokine and hormone levels in the serum of patients with myelodysplastic syndromes. Acta Haematol. 2013;130(1):27-33.
  • 48. Hamoudeh E, Zeidan AM, Barbarotta L, Rosano N. The interactions between diabetes mellitus and myelodysplastic syndromes: current state of evidence and future directions. Curr Diabetes Rev. 2016;12(3):231-239.
  • 49. Kachekouche Y, Dali-Sahi M, Benmansour D, Dennouni-Medjati N. Hematological profile associated with type 2 diabetes mellitus. Diabetes Metab Syndr. 2018;12(3):309-312.
  • 50. Wu TJ, Chuang LM, Tai TY. Erythrocyte deformability in diabetes mellitus. Taiwan Yi Xue Hui Za Zhi. 1989;88(3):240-243.
  • 51. Cintra LT, da Silva Facundo AC, Prieto AK, et al. Blood profile and histology in oral infections associated with diabetes. J Endod. 2014;40(8):1139-1144.
  • 52. Calvo X, Arenillas L, Luño E, et al. Enumerating bone marrow blasts from nonerythroid cellularity improves outcome prediction in myelodysplastic syndromes and permits a better definition of the intermediate risk category of the Revised International Prognostic Scoring System (IPSS-R). Am J Hematol. 2017;92(7):614-621.
  • 53. Greenbaum U, Joffe E, Filanovsky K, et al. Can bone marrow cellularity help in predicting prognosis in myelodysplastic syndromes? Eur J Haematol. 2018;101(4):502-507.
  • 54. Hallek M. Chronic lymphocytic leukemia: 2020 update on diagnosis, risk stratification and treatment. Am J Hematol. 2019;94(11):1266-1287.
  • 55. Spivak JL. How I treat polycythemia vera. Blood. 2019;134(4):341-352.
  • 56. Bejar R, Papaemmanuil E, Haferlach T, et al. Somatic mutations in MDS patients are associated with clinical features and predict prognosis independent of the IPSS-R: analysis of combined datasets from the International Working Group for Prognosis in MDS-Molecular Committee [abstract]. Blood. 2015;126(23):907.
  • 57. Papaemmanuil E, Gerstung M, Malcovati L, et al. Chronic Myeloid Disorders Working Group of the International Cancer Genome Consortium. Clinical and biological implications of driver mutations in myelodysplastic syndromes. Blood. 2013;122(22):3616-3627.
  • 58. Duetz C, Westers TM, van de Loosdrecht AA. Clinical implication of multi-parameter flow cytometry in myelodysplastic syndromes. Pathobiology. 2019;86(1):14-23.
  • 59. Agool A, Schot BW, Jager PL, Vellenga E. 18F-FLT PET in hematologic disorders: a novel technique to analyze the bone marrow compartment. J Nucl Med. 2006;47(10):1592-1598.
  • 60. Depaoli L, Davini O, Foggetti MD, et al. Evaluation of bone marrow cellularity by magnetic resonance imaging in patients with myelodysplastic syndrome. Eur J Haematol. 1992;49(2):105-107.
  • 61. Nagata Y, Zhao R, Awada H, et al. Machine learning demonstrates that somatic mutations imprint invariant morphologic features in myelodysplastic syndromes. Blood. 2020;136(20):2249-2262.
  • 62. Baer C, Stengel A, Kern W, Haferlach C, Haferlach T. The potential of molecular genetic analysis for diagnostic and prognostic decision making in clonal cytopenia of undetermined significance (CCUS) and MDS - a study on 576 patients [abstract]. Blood. 2020;136(suppl):30-31.
  • 63. Goll JB, Jensen TL, Lindsley RC, et al. Targeted sequencing of 7 genes can help reduce pathologic misclassification of MDS [abstract]. Blood. 2020;136(suppl 1):32-33.
  • 64. Radakovich N, Meggendorfer M, Malcovati L, et al. A personalized clinical-decision tool to improve the diagnostic accuracy of myelodysplastic syndromes [abstract]. Blood. 2020;136(suppl 1):33-35.
  • 65. Genovese G, Kähler AK, Handsaker RE, et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N Engl J Med. 2014;371(26):2477-2487.
  • 66. Jaiswal S, Fontanillas P, Flannick J, et al. Age-related clonal hematopoiesis associated with adverse outcomes. N Engl J Med. 2014;371(26):2488-2498.
  • 67. Malcovati L, Gallì A, Travaglino E, et al. Clinical significance of somatic mutation in unexplained blood cytopenia. Blood. 2017;129(25):3371-3378.
  • 68. Girelli D, Marchi G, Camaschella C. Anemia in the elderly. HemaSphere. 2018;2(3):e40.
  • 69. Goodnough LT, Schrier SL. Evaluation and management of anemia in the elderly. Am J Hematol. 2014;89(1):88-96.
  • 70. Pang WW, Schrier SL. Anemia in the elderly. Curr Opin Hematol. 2012;19(3):133-140.
  • 71. Rauw J, Wells RA, Chesney A, Reis M, Zhang L, Buckstein R. Validation of a scoring system to establish the probability of myelodysplastic syndrome in patients with unexplained cytopenias or macrocytosis. Leuk Res. 2011;35(10):1335-1338.
  • 72. Abelson S, Collord G, Ng SWK, et al. Prediction of acute myeloid leukaemia risk in healthy individuals. Nature. 2018;559(7714):400-404.

Articles from Blood Advances are provided here courtesy of The American Society of Hematology