Haemolytic anaemias arise when red blood cell (RBC) integrity is compromised, eventually resulting in premature clearance or lysis and leading to anaemia when these effects cannot be sufficiently compensated by the capacity of the bone marrow to produce new cells. 1 Hereditary anaemia occurs as a consequence of genetic mutation 2 (e.g. affecting membrane complex or cytoskeletal proteins, haemoglobin or metabolic enzymes), and diagnosing affected patients is a complex process since, given the wide variety of possible genetic causes, multiple examinations must be performed and an unambiguous result is usually reached only after DNA sequencing. 3 Furthermore, phenotypic severity can vary widely not just among individuals with different mutations but also among individuals suffering from the same mutation, thereby complicating diagnosis. 4
While molecular diagnoses have become increasingly easy, cheap and fast to perform in recent years, constraints on their use still exist, 5 and phenotype‐based diagnostic methods remain an important option. Ektacytometry is a standard diagnostic platform for RBC disorders 6 , 7 but only provides cell population‐based data and requires a trained expert for data interpretation. Single‐cell rheoscopy can provide additional information, at the cost of higher complexity; however, analysis of such data could potentially be facilitated by the use of machine learning (ML; automated, algorithm‐based systems that generate data‐driven predictions 8 ).
We present here a preliminary framework for automated rheoscopy‐based diagnosis of several types of hereditary haemolytic anaemia samples (Fig 1A) that requires low sample volumes and is efficient, rapid and expandable.
Fig 1.

Different hereditary rare anaemias display distinct area and deformability profiles. (A) Design of the method for automatic sample classification. Whole blood is collected by the clinician, and a sample is obtained and processed using an Automated Rheoscope and Cell Analyzer (ARCA). Images acquired are subjected to computational analysis to determine cross‐sectional area and deformability of at least 1000 individual cells, and the resulting datasets are then classified through trained computational models, achieving a diagnosis in less than 30 min. (B) Contour plots of cross‐sectional area plotted against the deformability index (as measured by dividing cell length by cell width), visualizing the probability distribution of erythrocytes (RBCs), cultured reticulocytes (reticulocytes) and erythrocytes treated with an anti‐Glycophorin A antibody (BRIC256, International Blood Group Reference Laboratory) before analysis to induce membrane stiffening (BRIC256 RBCs). The control erythrocyte and cultured reticulocyte data shown in this panel were previously reported in Moura et al.9 A minimum of 1000 cells were analysed per sample. All samples were analysed using the ARCA. (C) Contour plots of cross‐sectional area plotted against the deformability index (as measured by dividing cell length by cell width), visualizing the probability distribution of patient samples overlaid to allow for comparison with healthy controls. A minimum of 1000 cells were analysed per blood sample. All samples were analysed using the ARCA. The samples are listed from left to right: Top row: healthy controls (n = 6), hereditary spherocytosis patients (n = 13), congenital dyserythropoietic anaemia II patients (n = 9). Bottom row: pyruvate kinase deficiency patients (n = 6), dehydrated stomatocytosis type 1 or hereditary xerocytosis patients (n = 10), dehydrated stomatocytosis type 2 or Gardos xerocytosis patients (n = 3).
Materials and methods
Peripheral blood donor and patient samples
Healthy control donor and diagnosed patient samples were collected according to procedures approved by the research ethics committee and in accordance with the Declaration of Helsinki. In all, 47 blood samples were analysed at the University of Bristol (United Kingdom) following shipment from clinics in Milan (Italy) or Utrecht (the Netherlands): 6 controls, 13 hereditary spherocytosis (HS) patients, 9 congenital dyserythropoietic anaemia type II (CDAII) patients, 6 pyruvate kinase deficiency (PKD) patients, 10 hereditary xerocytosis (HX; also termed dehydrated hereditary stomatocytosis [DHS] type 1) patients and 3 Gardos xerocytosis (GX; also termed DHS type 2) patients. A further 26 samples (11 controls, 7 HS patients and 8 hereditary elliptocytosis [HE] patients) were analysed at Sanquin (Amsterdam, the Netherlands).
Automated Rheoscope and Cell Analyser
A 1 µl volume of whole blood was diluted in 200 µl of a polyvinylpyrrolidone solution (viscosity 28·1 mPa·s). Samples were assessed in an Automated Rheoscope and Cell Analyser (ARCA) according to published protocols. 9 At least 1000 cells per sample were analysed, yielding the deformability index (DI) and cross‐sectional area (area) of each cell.
Computational analysis
A Python script was developed for statistical analysis, data visualisation and automatic dataset classification (see Data Availability Statement). The full datasets used for training purposes were sampled and randomised into testing (500 cells) and training (remainder) datasets. DI and area were normalised by the maximum measurable values (a DI of 3.3 for Bristol and 5.0 for Sanquin, and an area of 140 μm²), and the training datasets were repeatedly subjected to random sampling to generate 10,000 subsets of 500 cells each, followed by calculation of the average and standard deviation of the DI and area. Each sample category was then assigned a unique identifier. Classifiers were generated with the scikit‐learn package, 10 trained with the generated subsets and tested with the initial testing subsets. Classification of unseen datasets was performed by selecting the mode of the machine‐selected identifiers after 10,000 classifications.
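The sampling, normalisation and mode‐based classification steps described above can be illustrated with a minimal sketch. This is not the published script (which is available from the repository named in the Data Availability Statement); it uses only the Python standard library, and it assumes that the average and standard deviation are computed per resampled subset, that an unseen sample is resampled in the same way before voting, and that the share of votes won by the modal label can serve as a rough certainty. All function names, the synthetic cells and the placeholder rule `toy_predict` (standing in for a trained scikit‐learn classifier) are hypothetical.

```python
import random
import statistics
from collections import Counter

DI_MAX = 3.3      # maximum measurable DI quoted for the Bristol device
AREA_MAX = 140.0  # maximum measurable area (µm²)


def split_cells(cells, n_test=500, seed=0):
    """Shuffle (DI, area) pairs; hold out n_test cells without replacement."""
    rng = random.Random(seed)
    shuffled = list(cells)
    rng.shuffle(shuffled)
    return shuffled[:n_test], shuffled[n_test:]


def summarise(subset, di_max=DI_MAX, area_max=AREA_MAX):
    """Mean and SD of normalised DI and area for one subset of cells."""
    di = [d / di_max for d, _ in subset]
    area = [a / area_max for _, a in subset]
    return (statistics.fmean(di), statistics.stdev(di),
            statistics.fmean(area), statistics.stdev(area))


def augment(cells, n_subsets=10_000, subset_size=500, seed=0):
    """Draw subsets with replacement and summarise each as one feature row."""
    rng = random.Random(seed)
    return [summarise(rng.choices(cells, k=subset_size))
            for _ in range(n_subsets)]


def classify_by_mode(cells, predict, n_votes=10_000, subset_size=500, seed=0):
    """Classify resampled subsets; return the modal label and its vote share."""
    rows = augment(cells, n_votes, subset_size, seed)
    votes = Counter(predict(row) for row in rows)
    label, count = votes.most_common(1)[0]
    return label, count / n_votes


# Synthetic cells for illustration only (not ARCA measurements).
rng = random.Random(1)
cells = [(rng.uniform(1.0, 3.0), rng.uniform(40.0, 120.0)) for _ in range(1200)]
test_cells, train_cells = split_cells(cells)
training_rows = augment(train_cells, n_subsets=200)


def toy_predict(row):
    """Placeholder rule standing in for a trained scikit-learn classifier."""
    return "high DI" if row[0] > 0.5 else "low DI"


label, vote_share = classify_by_mode(test_cells, toy_predict, n_votes=200)
print(label, vote_share)
```

In this sketch each feature row holds four values (mean DI, SD of DI, mean area, SD of area), so a real classifier would be fitted on `training_rows` together with the category identifiers; the subset counts are reduced from 10,000 to 200 purely to keep the example fast.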
Results and discussion
We have demonstrated in previous work that automated rheoscopy‐based analyses can elucidate differences arising from reticulocyte maturation 9 as well as loss of cellular stability. 11 A particularly interesting observation from the same work was the fact that combining the single‐cell deformability index (DI) and cross‐sectional area measurements provides a novel metric (Fig 1B) which to date has not been examined in the context of disease diagnosis.
Therefore, we evaluated whole blood samples from diagnosed anaemic patients of varied aetiologies (HS, CDAII, PKD, HX and GX) against healthy donors using the proposed methodology (Fig 1C). Crucially, despite these diseases being frequently misdiagnosed due to overlapping clinical or morphological phenotypes, 12 , 13 we observed them to display unique rheoscopy “fingerprints” upon visualisation.
Machine‐learning algorithms were next explored to automate the classification of ARCA data and thus facilitate the processing of larger numbers of samples. A flow chart outlining the procedure used is displayed in Fig 2A.
Fig 2.

Machine‐learning‐based classification of automated rheoscopy datasets provides accurate diagnoses for unseen samples. (A) Flow diagram outlining the procedure for ARCA‐based data visualisation and automated sample classification. The sample is first analysed to produce a raw data table. These data are then reorganised into a Python pandas (“panel data”) data frame for ease of processing. If visualisation is required, samples from a given sample type are stochastically equalised in cell number, joined and subjected to kernel density estimation to estimate the probability density functions of analysed features (e.g. cross‐sectional area, deformability index, cell angle) and then visualized through contour plots or scatter plots. Data to be used for machine learning undergo feature extraction (removal of all non‐essential information) and a subsection is sampled randomly (without replacement) for creation of a testing set. The remaining data then undergo augmentation by generation of a series of randomly sampled datasets (with replacement, 10,000×) which will be used for training a supervised machine‐learning algorithm. After training, a predictive model (i.e. classifier) is generated which is first tested with the previously generated testing set. Upon satisfactory results with the testing set, the classifier can then generate predictions for new unseen data. The final results consist of a sample label (or classification) and the certainty of that classification. (B) Comparison of the overall prediction accuracy of multiple supervised machine‐learning algorithms in ARCA‐based automated sample diagnosis as a function of the number of datasets per condition used for classifier training (from no datasets used, which should result in a random diagnosis, to a maximum of six datasets), comparing the samples analysed at the University of Bristol (except Gardos xerocytosis samples, which were too few to analyse).
Prediction accuracy is coloured on a percentage scale from red (0%) to blue (100%). The best‐performing algorithm per no. of datasets is bolded in the accuracy matrix. The graph displays the average prediction accuracy of all algorithms (blue). Error bars = ± standard deviation (SD). The prediction accuracies of the best‐performing algorithms are plotted in green, while the prediction accuracies of the worst‐performing algorithms are plotted in red. (C) Prediction accuracy of the best performing algorithm in (B). The samples used consist of healthy controls, congenital dyserythropoietic anaemia II patients (CDAII), hereditary spherocytosis patients (HS), hereditary xerocytosis patients (HX) and pyruvate kinase deficiency patients (PKD). Rows identify real samples provided, whilst columns identify the algorithm's prediction of the provided samples' identity. The blue diagonal indicates samples that were correctly diagnosed (true positives). Red cells in the surrounding matrix indicate incorrect diagnoses (i.e. two HS samples were misdiagnosed as CDAII and one HX sample was misdiagnosed as HS). Accuracy is provided as a percentage of the true positives within the total number of samples and is coloured on a percentage scale from red (0%) to blue (100%). Average accuracy is provided as an average of the accuracies for all sample types. Data for all other algorithms and sample numbers tested are provided in Figs S1–S7. (D) Comparison of the overall prediction accuracy of multiple supervised machine‐learning algorithms in ARCA‐based automated sample diagnosis as a function of the number of datasets used for training, comparing samples from healthy controls, hereditary spherocytosis patients and hereditary elliptocytosis patients analysed at Sanquin. The graph displays the average prediction accuracy of all algorithms (blue). Error bars = ±SD. 
The prediction accuracies of the best‐performing algorithms are plotted in green, while the prediction accuracies of the worst‐performing algorithms are plotted in red.
To provide sufficient information for training an ML classifier, the data were augmented through random sampling, vastly extending the number of new datasets with similar characteristics. We then tested the trained classifiers on a combination of fully unseen data and the testing sets generated before augmentation. A full summary of the prediction accuracies achieved, including the best‐performing classifiers, is provided in Fig 2B; the best‐performing algorithm correctly identified sample datasets with 92% accuracy (Fig 2C). We note that the GX samples were excluded because the sample number was too low for classifier training.
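For readers wishing to reproduce the accuracy figures from a confusion matrix, the legend to Fig 2C defines per‐class accuracy as the percentage of true positives within the total number of samples of that type, and average accuracy as the mean of those per‐class values. The following is a minimal sketch of that calculation; the function names, class labels and counts are hypothetical and purely illustrative, and they are not the study's results.

```python
def per_class_accuracy(confusion, labels):
    """Percentage of correctly labelled samples in each row (true class).

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    result = {}
    for i, label in enumerate(labels):
        total = sum(confusion[i])
        result[label] = 100.0 * confusion[i][i] / total if total else float("nan")
    return result


def average_accuracy(confusion, labels):
    """Unweighted mean of the per-class accuracies."""
    accuracies = per_class_accuracy(confusion, labels)
    return sum(accuracies.values()) / len(accuracies)


# Illustrative counts only (rows = true class, columns = predicted class).
labels = ["class A", "class B", "class C"]
confusion = [
    [4, 0, 0],
    [1, 3, 0],
    [0, 0, 2],
]

by_class = per_class_accuracy(confusion, labels)
overall = average_accuracy(confusion, labels)
print(by_class, round(overall, 1))
```

Because the average is unweighted, a class with few samples influences the final figure as much as a class with many, which is worth bearing in mind when sample numbers differ between conditions.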
For further verification, the classifiers were retrained on additional samples (11 controls, 7 HS patients and 8 HE patients) obtained on a second ARCA device in an independent laboratory and using different acquisition settings. Again, we observed increasing classification accuracy up to the use of six training datasets (at which point the classifier likely overfits these data), as per Fig 2D, achieving a final prediction accuracy for multiclass classification that is comparable to that offered by osmotic gradient ektacytometry when classifying HS samples alone. 14 Importantly, the best‐performing algorithms used here achieve complete differentiation between controls and diseased patients and accurately identify a variety of disorders, potentially allowing for the rapid preliminary identification or discrimination of more elusive diseases 15 (such as CDAII and PKD) without time‐consuming laboratory assays or molecular testing methods. Furthermore, the ability to continuously incorporate data from new samples, or to extend the method to haematological conditions beyond those characterised in this study, may ultimately allow a large number of samples to be diagnosed in a relatively short period using minimal sample volumes. In conclusion, the method described in this work represents an exciting step towards improved diagnosis of haemolytic anaemias.
Funding information
PLM was funded by the European Union (H2020‐MSCA‐ITN‐2015, “RELEVANCE”, Grant agreement number 675117). MAER is supported by Eurostars grant estar18105 and by an unrestricted grant provided by RR Mechatronics. PB was funded by the Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Grant no. 2019 175/02, 2019. AMT and TJS were funded by an NHS Blood and Transplant (NHSBT) R&D grant (WP15‐05) and the National Institute for Health Research Blood and Transplant Research Unit (NIHR BTRU) in Red Cell Products (IS‐BTU‐1214‐10032). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.
Author contributions
PLM acquired data, prepared figures and developed the Python code for dataset analysis and classification. JGGD and GJS provided essential ARCA equipment and image analysis software. MAER and RvW diagnosed HX patients and provided blood samples. MV and RvZ diagnosed HS and HE patients and performed initial ARCA analysis. EF and PB diagnosed HS, CDAII, HX and PKD patients and provided blood samples. PLM, AMT and TJS conceived and designed experiments and wrote the manuscript. TJS and AMT contributed equally to conception and supervision of the work. All authors read and edited the manuscript.
Conflicts of interest
The authors declare no competing financial interests.
Acknowledgements
The authors would like to thank the donors, patients and their family members for their willingness to participate in this research. The authors thank Mr. Ario Sadafi (Technische Universität München, Munich, Germany) for helpful discussions regarding feature extraction for machine‐learning algorithm development.
Contributor Information
Ashley M. Toye, Email: ash.m.toye@bristol.ac.uk.
Timothy J. Satchwell, Email: t.satchwell@bristol.ac.uk.
Data Availability Statement
All raw ARCA datasets obtained during this study, Python scripts generated for dataset analysis, classifier training and sample classification and the confusion matrices generated for classifier evaluation have been made publicly available through the following GitHub repository: https://github.com/pedrolmoura/ARCA-ML.
References
- 1. Ucar K. Clinical presentation and management of hemolytic anemias. Oncology (Williston Park). 2002;16:163–70.
- 2. Risinger M, Emberesh M, Kalfa TA. Rare hereditary hemolytic anemias: diagnostic approach and considerations in management. Hematol Oncol Clin North Am. 2019;33:373–92.
- 3. Kim Y, Park J, Kim M. Diagnostic approaches for inherited hemolytic anemia in the genetic era. Blood Res. 2017;52:84–94.
- 4. Glogowska E, Schneider ER, Maksimova Y, Schulz VP, Lezon‐Geyda K, Wu J, et al. Novel mechanisms of PIEZO1 dysfunction in hereditary xerocytosis. Blood. 2017;130:1845–56.
- 5. Di Resta C, Galbiati S, Carrera P, Ferrari M. Next‐generation sequencing approach for the diagnosis of human diseases: open challenges and new opportunities. EJIFCC. 2018;29:4–14.
- 6. Da Costa L, Suner L, Galimand J, Bonnel A, Pascreau T, Couque N, et al. Diagnostic tool for red blood cell membrane disorders: Assessment of a new generation ektacytometer. Blood Cells Mol Dis. 2016;56:9–22.
- 7. Johnson RM, Ravindranath Y. Osmotic scan ektacytometry in clinical diagnosis. J Pediatr Hematol Oncol. 1996;18:122–9.
- 8. Nichols JA, Herbert Chan HW, Baker MAB. Machine learning: applications of artificial intelligence to imaging and diagnosis. Biophys Rev. 2019;11:111–8.
- 9. Moura PL, Hawley BR, Mankelow TJ, Griffiths RE, Dobbe JGG, Streekstra GJ, et al. Non‐muscle myosin II drives vesicle loss during human reticulocyte maturation. Haematologica. 2018;103:1997–2007.
- 10. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit‐learn: machine learning in Python. J Machine Learn Res. 2011;12:2825–30.
- 11. Moura PL, Hawley BR, Dobbe JGG, Streekstra GJ, Rab MAE, Bianchi P, et al. PIEZO1 gain‐of‐function mutations delay reticulocyte maturation in hereditary xerocytosis. Haematologica. 2020;105(6):e268–e271.
- 12. Danise P, Amendola G, Nobili B, Perrotta S, Miraglia Del Giudice E, Matarese SM, et al. Flow‐cytometric analysis of erythrocytes and reticulocytes in congenital dyserythropoietic anaemia type II (CDA II): value in differential diagnosis with hereditary spherocytosis. Clin Lab Haematol. 2001;23:7–13.
- 13. Fermo E, Vercellati C, Marcello AP, Zaninoni A, van Wijk R, Mirra N, et al. Hereditary Xerocytosis due to Mutations in PIEZO1 Gene Associated with Heterozygous Pyruvate Kinase Deficiency and Beta‐Thalassemia Trait in Two Unrelated Families. Case Rep Hematol. 2017;2017:2769570.
- 14. Llaudet‐Planas E, Vives‐Corrons JL, Rizzuto V, Gomez‐Ramirez P, Sevilla Navarro J, Coll Sibina MT, et al. Osmotic gradient ektacytometry: A valuable screening test for hereditary spherocytosis and other red blood cell membrane disorders. Int J Lab Hematol. 2018;40:94–102.
- 15. Zaninoni A, Fermo E, Vercellati C, Consonni D, Marcello AP, Zanella A, et al. Use of Laser Assisted Optical Rotational Cell Analyzer (LoRRca MaxSis) in the Diagnosis of RBC Membrane Disorders, Enzyme Defects, and Congenital Dyserythropoietic Anemias: A Monocentric Study on 202 Patients. Front Physiol. 2018;9:451.
