Indian Journal of Hematology & Blood Transfusion. 2020 Oct 27;37(3):453–457. doi: 10.1007/s12288-020-01373-x

Role of Red Cell Indices in Screening for Beta Thalassemia Trait: an Assessment of the Individual Indices and Application of Machine Learning Algorithm

Aarzoo Jahan 1, Garima Singh 1, Ruchika Gupta 2, Namrata Sarin 1, Sompal Singh 1
PMCID: PMC8239087  PMID: 34267466

Abstract

Antenatal screening for beta thalassemia trait (BTT) followed by counseling of couples is an efficient way of thalassemia control. Since high performance liquid chromatography (HPLC) is costly, other cost-effective screening methods need to be devised for this purpose. The present study aimed to evaluate the utility of red cell indices and machine learning algorithms, including an artificial neural network (ANN), in the detection of BTT among antenatal women. This cross-sectional study included all antenatal women undergoing thalassemia screening at a tertiary care hospital. Complete blood count followed by HPLC was performed. Receiver operating characteristic (ROC) curve analysis was performed to obtain the optimal cutoff for each of the indices, and the test characteristics for detection of BTT were determined. Machine learning algorithms, including the C4.5 and Naïve Bayes (NB) classifiers and a back-propagation type ANN using the red cell indices, were designed and tested. Over a period of 15 months, 3947 patients underwent thalassemia screening. BTT was diagnosed in 5.98% of women on the basis of HPLC. Among the individual indices, ROC analysis yielded the maximum accuracy of 63.8%, with sensitivity and specificity of 66.2% and 63.7%, respectively, for mean corpuscular hemoglobin concentration (MCHC). The C4.5 and NB classifiers had accuracies of 88.56% and 82.49%, respectively, while the ANN had an overall accuracy of 85.95%, sensitivity of 83.81%, and specificity of 88.10% in detection of BTT. The present study highlights that none of the red cell parameters is useful on its own for screening for BTT. However, an ANN combining all the red cell indices had an appreciable sensitivity and specificity for this purpose. Further refinements of the neural network can provide an appropriate tool for use in peripheral settings for thalassemia screening.

Electronic supplementary material

The online version of this article (10.1007/s12288-020-01373-x) contains supplementary material, which is available to authorized users.

Keywords: Red cell indices, Beta thalassemia trait, Machine learning algorithm

Introduction

Anemia is a major health problem which affects around 24.8% of the world population [1]. Thalassemias, an important cause of anemia, are the commonest genetic disorders of hemoglobin, characterized by absent or decreased synthesis of one or more types of the constituent polypeptide chains, leading to a reduction in levels of normal hemoglobin [2, 3]. The spectrum of thalassemia varies from the asymptomatic carrier state to thalassemia major, which requires regular blood transfusions with extensive medical care and leads to lower quality of life and life expectancy as compared to the general population [4]. The severity of this disease, along with the attendant high cost and high prevalence in certain regions of India, justifies the role of thalassemia screening. The screening activities focus on identification of couples at risk of giving birth to children with thalassemia major, i.e., detecting the asymptomatic carriers among pregnant women during their antenatal checkups. The prevalence of beta-thalassemia trait (BTT) varies from 1 to 17% of the population in different regions of India, with an average rate of 3.3% [5]. The gold standard for diagnosis of BTT, high performance liquid chromatography (HPLC), is expensive and has limited availability [6]. Hence, algorithms that utilize a combination of red cell indices obtained from a hematology cell counter are being continually explored for screening of BTT.

Machine learning algorithms offer a wide range of methods for pattern recognition in medical data in order to classify the observations into meaningful diagnostic or therapeutic categories [7]. Various techniques such as support vector machine, multilayer perceptron, C4.5 decision tree, Naïve Bayes (NB) classifier and artificial neural network (ANN) have been attempted for differentiation of iron deficiency anemia (IDA) and BTT [8, 9]. However, the majority of such studies included cases diagnosed as IDA or BTT using iron studies and HPLC or hemoglobin electrophoresis, respectively. Real-life BTT screening usually involves evaluating individuals without knowledge of confirmatory tests for other causes of anemia. A recent study developed a scoring system using red cell indices for detection of BTT and demonstrated a sensitivity of 100% with specificity of 79.25%–91.74%. The authors utilized C4.5 and NB classifiers, which showed accuracies of 95.27% and 93.83%, respectively [7]. In that study, though IDA was included in the normal category, other co-existing disorders like megaloblastic anemia were not accounted for.

The present study was designed to determine the appropriateness of various RBC indices as screening tests for thalassemia trait in pregnant women using receiver operating characteristic (ROC) curve analysis. At the same time, machine learning algorithms using RBC indices, in the form of a C4.5 classifier, a Naïve Bayes classifier and an artificial neural network, were also developed for detecting BTT patients during screening. Since the iron, B12 and folate status of an individual undergoing BTT screening is not known in many instances, the present study included all antenatal samples irrespective of the values of these parameters.

Materials and Methods

The present study was a retrospective cross-sectional analysis of samples of pregnant women who underwent thalassemia screening at our institute over a period of 15 months from January 2019 to March 2020. The data comprised the complete blood count (CBC) and hemoglobin HPLC reports of each sample. The CBC was performed using a 5-part automated differential cell counter (XT2000i, Sysmex Corporation, Japan) and HPLC was done using beta Thalassemia VARIANT II equipment (Bio-Rad Laboratories Inc, USA) as per the manufacturer’s instructions.

All samples with HbA2 levels ≥ 3.5% and < 7.1% were labeled as BTT, and those with HbA2 levels < 3.5% were labeled as non-BTT study subjects. Samples with HPLC reports of other β-globin chain variants or those with HbA2 levels > 7.1% were excluded from further analysis.
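The labeling rule above can be written as a small function. The following is a minimal Python sketch, not the authors' code (the study used R). The function name `label_sample` and the `other_variant` flag are hypothetical, and because the text does not say how a value of exactly 7.1% was handled, the sketch excludes it by assumption.

```python
from typing import Optional

BTT_LOWER = 3.5  # HbA2 (%) at or above which a sample is labeled BTT
BTT_UPPER = 7.1  # HbA2 (%) above which a sample is excluded


def label_sample(hba2_percent: float, other_variant: bool = False) -> Optional[str]:
    """Return 'BTT', 'non-BTT', or None when the sample is excluded."""
    if other_variant:
        # Other beta-globin chain variants on HPLC are excluded.
        return None
    if hba2_percent < BTT_LOWER:
        return "non-BTT"
    if hba2_percent < BTT_UPPER:
        return "BTT"
    # Assumption: exactly 7.1% is treated like values above 7.1% (excluded).
    return None
```

For example, `label_sample(4.2)` gives `'BTT'`, while `label_sample(2.8)` gives `'non-BTT'`.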

Receiver operating characteristic (ROC) curve analysis was performed to obtain the area under the curve (AUC) for each red cell parameter of the CBC. The point on the curve where sensitivity and specificity were almost equal was taken as the optimal cutoff for a particular CBC parameter. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and accuracy were noted at the optimal cutoff.
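One way to locate a cutoff where sensitivity and specificity are nearly equal is to scan every observed value as a candidate threshold and keep the one with the smallest gap between the two. The sketch below is an illustration in Python under stated assumptions: the function name `balanced_cutoff` is hypothetical, the MCV values are invented toy data rather than study data, and a lower value is assumed to point toward BTT.

```python
def balanced_cutoff(values, labels, low_is_positive=True):
    """Return (cutoff, sensitivity, specificity) minimizing |sens - spec|.

    labels: 1 for BTT, 0 for non-BTT. Both classes must be present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for cutoff in sorted(set(values)):
        tp = tn = 0
        for value, label in zip(values, labels):
            predicted = (value <= cutoff) if low_is_positive else (value >= cutoff)
            if predicted and label == 1:
                tp += 1
            elif not predicted and label == 0:
                tn += 1
        sens = tp / positives
        spec = tn / negatives
        gap = abs(sens - spec)
        # Keep the first cutoff with the smallest gap.
        if best is None or gap < best[0]:
            best = (gap, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Toy MCV values (fL), invented for illustration only.
mcv = [62, 65, 68, 80, 66, 78, 82, 85]
is_btt = [1, 1, 1, 1, 0, 0, 0, 0]
result = balanced_cutoff(mcv, is_btt)
```

On this toy data the scan settles on a cutoff of 68 fL, where both sensitivity and specificity are 0.75.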

For the machine learning algorithms, 210 non-BTT subjects and 210 BTT subjects were included. An equal number of subjects was taken from each group to reduce bias. The C4.5 decision tree and Naïve Bayes (NB) classifiers were applied to this data. A back-propagation neural network was constructed with seven input layer neurons (hemoglobin, RBC count, MCV, MCH, MCHC, PCV and RDW-CV), two hidden layers (with 6 and 3 neurons, respectively) and an output layer of two neurons. The diagnostic role of the C4.5 decision tree, NB classifier and artificial neural network was also assessed for detection of thalassemia trait patients. All statistical procedures were performed using R and R Studio [10, 11].
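The layer sizes described above (7 inputs, hidden layers of 6 and 3 neurons, 2 outputs) can be illustrated with a short forward pass. This is a sketch in Python, not the authors' R implementation: the sigmoid activation, one bias per neuron, the random untrained weights and the scaled input vector are all assumptions made for illustration, and no training step is shown.

```python
import math
import random

LAYER_SIZES = [7, 6, 3, 2]  # inputs, hidden 1, hidden 2, outputs (from the text)


def init_network(sizes, seed=0):
    """Create random (untrained) weights and zero biases for each layer."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def count_parameters(layers):
    """Total number of weights plus biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


def forward(layers, inputs):
    """Propagate one feature vector through the network (sigmoid assumed)."""
    activations = list(inputs)
    for weights, biases in layers:
        next_activations = []
        for row, bias in zip(weights, biases):
            z = sum(w * a for w, a in zip(row, activations)) + bias
            next_activations.append(1.0 / (1.0 + math.exp(-z)))
        activations = next_activations
    return activations


network = init_network(LAYER_SIZES)
# Hypothetical, already-scaled features in the order:
# hemoglobin, RBC count, MCV, MCH, MCHC, PCV, RDW-CV.
example = [0.4, 0.7, 0.2, 0.3, 0.5, 0.4, 0.6]
outputs = forward(network, example)
n_params = count_parameters(network)
```

Under the one-bias-per-neuron assumption, this architecture has (7×6+6) + (6×3+3) + (3×2+2) = 77 trainable parameters, which is small relative to the 420 subjects used.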

The distribution of the red cell indices was tested using the Shapiro–Wilk test. Variables that did not show a normal distribution were analyzed using nonparametric tests. A p value of < 0.05 was taken as statistically significant.

Results

Over the period of 15 months (January 2019–March 2020), 3947 blood samples of antenatal women were received for beta-thalassemia screening. The majority of cases (3664, 92.83%) had normal HPLC findings. BTT was diagnosed in 236 cases (5.98%). Twenty-two cases (0.56%) were heterozygous for HbE, one was homozygous for HbE and 16 cases (0.41%) were heterozygous for HbD Punjab. Eight cases (0.20%) were double heterozygous for beta-thalassemia and HbE. These 47 cases were excluded from further analysis.
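The case counts above can be cross-checked with simple arithmetic. The short Python snippet below uses only the numbers reported in this paragraph; the dictionary keys are descriptive labels chosen for illustration.

```python
counts = {
    "normal HPLC": 3664,
    "BTT": 236,
    "HbE heterozygous": 22,
    "HbE homozygous": 1,
    "HbD Punjab heterozygous": 16,
    "beta-thalassemia/HbE double heterozygous": 8,
}

total = sum(counts.values())
# Everything other than normal and BTT was excluded from further analysis.
excluded = total - counts["normal HPLC"] - counts["BTT"]
percentages = {name: round(100 * n / total, 2) for name, n in counts.items()}
```

The categories sum to the 3947 samples received, the excluded group comes to 47 cases, and the rounded percentages match those quoted in the text.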

The comparative values of red cell parameters and HPLC findings of BTT and non-BTT subjects have been tabulated in Supplementary Table 1. The ROC curve characteristics of the red cell parameters, along with their respective cutoffs and sensitivity, specificity, positive predictive and negative predictive values, are included in the Supplementary file.

C4.5 Classifier and NB Classifier

The C4.5 classifier correctly classified 88.56% of BTT and non-BTT subjects, while the NB classifier accurately classified 82.49% of cases.

Artificial Neural Network

Training of the neural network required 95,625 iteration steps. The results are shown in Fig. 1. The network had an overall accuracy of 85.95% (95% CI 82.26–89.13%), sensitivity of 83.81%, specificity of 88.10%, PPV of 87.56% and NPV of 84.47%.
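The relationship between these five metrics can be illustrated with a confusion matrix. The cell counts below are not reported in the text; they are back-calculated here under the assumption that the metrics were computed over all 210 BTT and 210 non-BTT subjects, and they are the counts that reproduce every reported percentage. The variable names are hypothetical.

```python
# Assumed confusion-matrix counts (inferred, not stated in the article).
tp, fn = 176, 34   # BTT subjects classified as BTT / as non-BTT
tn, fp = 185, 25   # non-BTT subjects classified as non-BTT / as BTT

metrics = {
    "sensitivity": 100 * tp / (tp + fn),
    "specificity": 100 * tn / (tn + fp),
    "accuracy": 100 * (tp + tn) / (tp + fn + tn + fp),
    "ppv": 100 * tp / (tp + fp),
    "npv": 100 * tn / (tn + fn),
}
rounded = {name: round(value, 2) for name, value in metrics.items()}
```

Note that the PPV and NPV obtained on a balanced 1:1 sample would not carry over directly to a screening population in which BTT prevalence is about 6%, since both values depend on prevalence.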

Fig. 1. Structure of the trained artificial neural network. Neurons on the extreme right side (1 and 0) are output neurons

Discussion

β-thalassemia is a highly prevalent autosomal recessive disease leading to health problems for the affected individual as well as a socioeconomic burden on society [12]. The present retrospective study at a tertiary level hospital in Delhi found β-thalassemia trait to be the most common hemoglobinopathy, comprising 5.98% of all antenatal women undergoing thalassemia screening. Our results are similar to previous studies showing a prevalence of β-thalassemia trait ranging from 4.6 to 8.9% in India [13–15]. The age range of our study population was similar to that of a previous study from India by Baliyan et al., which reported a mean age of 25 years [16]. The median values of various red cell indices were found to be similar to those previously reported in the literature (Supplementary file).

Given the importance of BTT detection and the low sensitivity and specificity of red cell indices in differentiating BTT from non-BTT subjects, interest has been generated in utilizing machine learning algorithms to this end. In a study by Ayyildiz et al., the sensitivity and specificity of machine learning algorithms were more than 90% [17]. Amendolia et al., in 2002, used a support vector machine (SVM) in the first layer and K-nearest neighbour in the next layer to distinguish BTT from normal subjects. However, only 27 BTT cases were identified in this study and hence, the numbers of BTT in the training and test sets were too low. The authors also excluded students with IDA from their training and test sets [8]. Barnhart-Magen et al. devised a large number of ANNs and selected the best of these for differentiation of BTT from controls, myelodysplastic syndrome and IDA. An ANN using three red cell parameters gave a sensitivity of 90.2%, while an ANN using six parameters had a sensitivity of 89.7% [18]. However, this study also included IDA as a separate group rather than with non-BTT subjects. A recent study by Das et al. from India used C4.5, NB classifier and ANN for BTT detection. In their study with normal, BTT, IDA and combined BTT and IDA patients, the C4.5 and NB classifiers gave accuracies of 95.27% and 93.83%, respectively [7]. In the present study, the accuracies of the C4.5 and NB classifiers were 88.56% and 82.49%, respectively. The lower accuracy in our study is attributable to the composition of the non-BTT group. Considering the real-life situation of BTT screening being done in individuals for whom other etiologies of anemia may not have been ruled out, we did not exclude IDA, myelodysplastic syndromes or other such disorders from our dataset. We also analyzed the RBC parameters using a back-propagation ANN algorithm and obtained a sensitivity and specificity of 83.81% and 88.10%, respectively.

There have been a number of studies of ANN-based models for differentiation of iron deficiency anemia and BTT in various study groups [19, 20]. A study including couples undergoing premarital testing developed a back-propagation algorithm with a sensitivity of 92%, specificity of 94%, and an accuracy of 93.9% [21]. However, the proportion of individuals with BTT was not mentioned by the authors in their report. Antenatal screening for thalassemia is being undertaken in our country, and hence we developed an ANN using the data from antenatal patients at our hospital. In the present study, the ANN had a good sensitivity and specificity that were almost comparable with those of the machine learning algorithms analyzed in previous studies [17, 22].

In conclusion, the present study reiterates the significant differences in CBC parameters between beta-thalassemia trait and normal subjects. However, none of the CBC parameters by itself was found suitable for identification of BTT with an appreciable sensitivity and specificity. Machine learning algorithms, such as the C4.5 classifier and an artificial neural network including all the CBC parameters, were found to have a good sensitivity and specificity in detecting BTT. Further work on this aspect may provide clinicians and laboratory personnel with a useful tool to screen for BTT and help in reducing the burden of thalassemia disease.

Funding

The study did not receive any funding from public or private agencies.

Compliance with Ethical Standards

Conflict of interest

The authors confirm that there are no conflicts of interest to declare.

Ethics Approval

Ethical approval was waived by the local Ethics Committee of University A in view of the retrospective nature of the study and all the procedures being performed were part of the routine care.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Kaur K. Anemia ‘a silent killer’ among women in India: present scenario. Eur J Zool Res. 2014;3:32–36.
  • 2. Weatherall DJ, Clegg JB. Thalassemia-a global public health problem. Nat Med. 1996;2:847–849. doi: 10.1038/nm0896-847.
  • 3. Firkin F, Chesterman C, Penington D, Rush B, editors. de Gruchy’s Clinical Haematology in Medical Practice. 5th ed. New York: Springer; 1996. p. 137–171.
  • 4. Koren A, Profeta L, Zalman L, et al. Prevention of beta Thalassemia in Northern Israel a cost-benefit analysis. Mediterr J Hematol Infect Dis. 2014;6:e2014012. doi: 10.4084/mjhid.2014.012.
  • 5. Madan N, Sharma S, Sood SK, Colah R, Bhatia LH. Frequency of β-thalassemia trait and other hemoglobinopathies in northern and western India. Indian J Hum Genet. 2010;16(1):16–25. doi: 10.4103/0971-6866.64941.
  • 6. Martin A, Thompson AA. Thalassemias. Pediatr Clin North Am. 2013;60:1383–1391. doi: 10.1016/j.pcl.2013.08.008.
  • 7. Das R, Datta S, Kaviraj A, et al. A decision support scheme for beta thalassemia and HbE carrier screening. J Adv Res. 2020;24(1):183–190. doi: 10.1016/j.jare.2020.04.005.
  • 8. Amendolia SR, Brunetti A, Carta P, et al. A real-time classification system of thalassemic pathologies based on artificial neural networks. Med Decis Making. 2002;22:18–26. doi: 10.1177/0272989X0202200102.
  • 9. Setsirichok D, Piroonratana T, Wongseree W, et al. Classification of complete blood count and haemoglobin typing data by a C4.5 decision tree, a naïve Bayes classifier and a multilayer perceptron for thalassemia screening. Biomed Signal Process Control. 2012;7:202–212. doi: 10.1016/j.bspc.2011.03.007.
  • 10. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2013. URL https://www.R-project.org/. Accessed on July 02, 2020.
  • 11. RStudio Team. RStudio: Integrated Development for R. RStudio, Inc., Boston, MA. 2015. URL https://www.rstudio.com/. Accessed on July 02, 2020.
  • 12. Mendiratta SL, Bajaj S, Popli S, Singh S. Screening of women in the antenatal period for thalassemia carrier status: comparison of NESTROFT, red cell indices, and HPLC analysis. J Fetal Med. 2015;2:21–25. doi: 10.1007/s40556-015-0036-0.
  • 13. Sachdev R, Dam AR, Tyagi G. Detection of Hb variants and hemoglobinopathies in Indian population using HPLC: report of 2600 cases. Indian J Pathol Microbiol. 2010;53:57–62. doi: 10.4103/0377-4929.59185.
  • 14. Khera R, Singh T, Khurana N, Gupta N, Dubey AP. HPLC in characterization of hemoglobin profile in thalassemia syndromes and hemoglobinopathies: a clinic-hematological correlation. Indian J Hematol Blood Transfus. 2015;31:110–115. doi: 10.1007/s12288-014-0409-x.
  • 15. Mondal SK, Mandal S. Prevalence of thalassemia and hemoglobinopathy in eastern India: a 10-years high-performance liquid chromatography study of 119,336 cases. Asian J Transfus Sci. 2016;10:105–110. doi: 10.4103/0973-6247.175424.
  • 16. Baliyan M, Kumar M, Nangia A, Parakh N. Can RBC indices be used as screening test for beta thalassemia in Indian antenatal women? J Obstet Gynecol India. 2019;69:495–500. doi: 10.1007/s13224-019-01220-8.
  • 17. Ayyıldız H, Tuncer SA. Determination of the effect of red blood cell parameters in the discrimination of iron deficiency anemia and beta thalassemia via neighborhood component analysis feature selection-based machine learning. Chemometr Intell Lab Syst. 2020;196:103886. doi: 10.1016/j.chemolab.2019.103886.
  • 18. Barnhart-Magen G, Gotlib V, Marilus R, Einav Y. Differential diagnostics of thalassemia minor by artificial neural networks model. J Clin Lab Anal. 2013;27:481–486. doi: 10.1002/jcla.21631.
  • 19. Laengsri V, Shoombuatong W, Adirojananon W, Nantasenamat C, Prachayasittikul V, Nuchnoi P. ThalPred: a web-based prediction tool for discriminating thalassemia trait and iron deficiency anemia. BMC Med Inform Decis Mak. 2019;19:212. doi: 10.1186/s12911-019-0929-2.
  • 20. Kabootarizadeh L, Jamshidnezhad A, Koohmareh Z. Differential diagnosis of iron-deficiency anemia from β-thalassemia trait using an intelligent model in comparison with discriminant indexes. Acta Inf Med. 2019;27:78–84. doi: 10.5455/aim.2019.27.78-84.
  • 21. Hosseini Eshpala R, Langarizadeh M, Kamkar Haghighi M, Banafsheh T. Designing an expert system for differential diagnosis of β-thalassemia minor and iron-deficiency anemia using neural network. Horm Med J. 2016;20:1–9.
  • 22. Bordbar E, Taghipour M, Zucconi BE. Reliability of different rbc indices and formulas in discriminating between β- thalassemia minor and other microcytic hypochromic cases. Mediterr J Hematol Infect Dis. 2015;7:e2015022. doi: 10.4084/mjhid.2015.022.
