Abstract
Objective
Children with Henoch-Schönlein purpura (HSP) may develop renal insufficiency, which seriously threatens their health. This study aimed to construct a prediction model for Henoch-Schönlein purpura nephritis (HSPN).
Methods
A total of 240 children with HSP treated in the Departments of Dermatology and Pediatrics of our hospital were enrolled. General information, clinical symptoms, and laboratory examination indicators were collected for feature selection, and a prediction model was built with the XGBoost algorithm.
Results
Based on the input features, the ten most important features identified by the XGBoost model were urinary N-acetyl-β-D-glucosaminidase, urinary retinol-binding protein, IgA, age, recurrence of purpura, purpura area, abdominal pain, 24-h urinary protein quantification, percentage of neutrophils, and serum albumin. The areas under the receiver operating characteristic curve in the training set (0.895, 95% CI: 0.827-0.963) and the test set (0.870, 95% CI: 0.799-0.941) were similar.
Conclusion
An XGBoost-based model can predict HSP renal damage from children's clinical data, which may reduce the harm that invasive examinations cause patients.
1. Introduction
Henoch-Schönlein purpura (HSP) is one of the most common systemic vasculitides of childhood and a common allergic vascular disease. It mainly affects the skin, kidneys, intestines, and joints [1]. The number of children with HSP has increased markedly in recent years, and some studies report an annual incidence of 160-191 cases per million children [2]. In HSP, exposure to various sensitizing substances increases capillary fragility and permeability, causing inflammation or bleeding in the skin, joints, and bowel [1, 3]. HSP nephritis (HSPN) can progress to renal fibrosis [4], and HSPN is the most common secondary glomerular disease in children [5, 6].
Epidemiological data suggest that 30% to 50% of patients with HSP develop HSPN [7]. Although most patients with HSPN have a good prognosis, 1%-3% of children progress to renal insufficiency or end-stage renal failure, which seriously threatens their health [8]. Early and accurate diagnosis of HSPN is therefore crucial for prognosis and individualized treatment. Kidney biopsy is the gold standard for diagnosing HSPN, but it is invasive and difficult for parents and children to accept, so some patients already have severe kidney disease by the time of diagnosis [9].
Machine learning can build prediction models from clinical data and evaluate their predictive performance [10–12], and its application in medicine has grown steadily in recent years [13–16]. To our knowledge, few studies have used machine learning to predict HSPN. This study therefore constructed a machine learning model that predicts renal damage in HSP from clinical data, providing a new method for the efficient diagnosis of HSPN in children first diagnosed with HSP in the dermatology department.
2. Methods
2.1. General Information
A total of 240 children with HSP treated in the Departments of Dermatology and Pediatrics of our hospital from October 2019 to December 2021 were enrolled, of whom 153 developed HSPN. According to the European League Against Rheumatism (EULAR) criteria [17], HSP is diagnosed by a palpable rash (mandatory) together with at least one of four features: abdominal pain, arthritis or arthralgia, renal involvement, and histopathology showing IgA deposition. Renal involvement was defined clinically as hematuria, proteinuria, or abnormal renal function, such as increased serum creatinine (SCr) or decreased estimated glomerular filtration rate (eGFR), within 6 months of HSP onset. eGFR was calculated with the Schwartz formula for children aged ≤16 years [18] and with the CKD-EPI formula for those older than 16 years [19]. An eGFR <90 ml/(min·1.73 m2) was considered renal insufficiency.
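The age-dependent eGFR rule above can be sketched in Python as follows. The text does not state which Schwartz or CKD-EPI variant was used, so this sketch assumes the bedside Schwartz equation (k = 0.413, height in cm) and the 2009 CKD-EPI creatinine equation with its race coefficient omitted, with SCr in mg/dL. All function names are hypothetical.

```python
from typing import Optional


def schwartz_bedside(height_cm: float, scr_mg_dl: float) -> float:
    """Bedside Schwartz eGFR, ml/(min·1.73 m2); k = 0.413 is an assumed variant."""
    return 0.413 * height_cm / scr_mg_dl


def ckd_epi_2009(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """2009 CKD-EPI creatinine equation (race coefficient omitted; an assumption)."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age_years
    return egfr * 1.018 if female else egfr


def estimate_egfr(age_years: float, scr_mg_dl: float, female: bool,
                  height_cm: Optional[float] = None) -> float:
    """Age rule from the text: Schwartz at <=16 years, CKD-EPI above 16 years."""
    if age_years <= 16:
        if height_cm is None:
            raise ValueError("the Schwartz formula needs height in cm")
        return schwartz_bedside(height_cm, scr_mg_dl)
    return ckd_epi_2009(scr_mg_dl, age_years, female)


def is_renal_insufficiency(egfr: float) -> bool:
    """Threshold from the text: eGFR < 90 ml/(min·1.73 m2)."""
    return egfr < 90.0
```

For example, a 10-year-old who is 130 cm tall with SCr 0.5 mg/dL gets 0.413 × 130 / 0.5 ≈ 107.4, which is above the 90 threshold.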
2.2. Predicted Index
The candidate predictors comprised general information, clinical symptoms, and laboratory indicators. General information included sex, age, and season of onset. Clinical signs and symptoms included joint swelling, abdominal pain and gastrointestinal bleeding, purpura of the upper-body skin, and recurrence of purpura. Laboratory indicators included routine blood tests, routine urine tests, and biochemical tests.
2.3. Machine Learning
This study used XGBoost, an ensemble learning algorithm built on classification and regression trees [20]. XGBoost is highly scalable and computationally fast. Under the same environment and conditions, it has been reported to run more than 10 times faster than similar algorithms [21]. The detection workflow is shown in Figure 1.
Figure 1.

Flowchart of XGBoost detection.
XGBoost is an ensemble learning algorithm based on gradient boosting. It achieves accurate classification by iteratively combining weak classifiers [22]. The model is an additive expression consisting of K base models:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F} \tag{1}$$
where $f_k$ is the $k$th base model (a regression tree), $\mathcal{F}$ is the space of regression trees, and $\hat{y}_i$ is the predicted value of the $i$th sample $x_i$.
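Equation (1) can be illustrated with a small Python sketch that sums the outputs of two hand-written trees. The feature names `nag` and `rbp`, the thresholds, and the leaf scores are hypothetical and are not taken from the fitted model. For a binary outcome such as HSPN, we assume a logistic link maps the summed margin to a probability.

```python
import math
from typing import Callable, Dict, List

Sample = Dict[str, float]
Tree = Callable[[Sample], float]


# Two hypothetical depth-1 trees; feature names, thresholds and leaf scores are invented.
def tree_1(x: Sample) -> float:
    return 0.8 if x["nag"] > 20.0 else -0.4


def tree_2(x: Sample) -> float:
    return 0.5 if x["rbp"] > 0.7 else -0.3


def predict_margin(trees: List[Tree], x: Sample, base_score: float = 0.0) -> float:
    """Equation (1): sum the scores f_k(x_i) of all K trees (plus an optional offset)."""
    return base_score + sum(f(x) for f in trees)


def predict_proba(trees: List[Tree], x: Sample, base_score: float = 0.0) -> float:
    """Assumed logistic link: map the additive margin to a probability of HSPN."""
    return 1.0 / (1.0 + math.exp(-predict_margin(trees, x, base_score)))
```

For a sample with `nag = 25` and `rbp = 0.5`, the margin is 0.8 + (-0.3) = 0.5, and the probability is about 0.62.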
A model's prediction accuracy is jointly determined by its bias and variance. The bias is reflected in the loss function, while the regularization term Ω penalizes model complexity to control variance. The objective function therefore combines the loss function with the regularization term:
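$$Obj = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k) \tag{2}$$
where $l$ is a differentiable loss function measuring the difference between the label $y_i$ and the prediction $\hat{y}_i$, and $n$ is the number of samples.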
At iteration $t$, the prediction is updated as $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$. Expanding the loss to second order with a Taylor series, the objective function can be written as
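$$Obj^{(t)} \approx \sum_{i=1}^{n}\left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \tag{3}$$
where $g_i = \partial_{\hat{y}^{(t-1)}} l\left(y_i, \hat{y}^{(t-1)}\right)$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} l\left(y_i, \hat{y}^{(t-1)}\right)$ are the first and second derivatives of the loss with respect to the prediction of the previous $t-1$ trees.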
Each classification and regression tree (CART) is defined as $f_t(x) = w_{q(x)}$, where $x$ is an input sample, $q(x)$ maps the sample to the leaf node it falls into, and $w_{q(x)}$ is the value (score) of that leaf. Thus $w_{q(x)}$ is the predicted value for each sample. The regularization term of the objective function can be defined as
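$$\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{4}$$
where $T$ is the number of leaf nodes, $w_j$ is the score of leaf $j$, and $\gamma$ and $\lambda$ are regularization hyperparameters.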
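When the tree structure is fixed, Equation (3) is quadratic in each leaf score. Substituting $f_t(x)=w_{q(x)}$ and Equation (4) therefore gives the standard closed-form leaf weight $w_j^* = -G_j/(H_j+\lambda)$ and the structure score $-\frac{1}{2}\sum_j G_j^2/(H_j+\lambda) + \gamma T$, where $G_j$ and $H_j$ are the sums of $g_i$ and $h_i$ over the samples in leaf $j$.

The Python sketch below computes these quantities. It assumes the logistic loss for a binary HSPN label, for which $g_i = p_i - y_i$ and $h_i = p_i(1-p_i)$. The toy labels and margins are illustrative and are not study data.

```python
import math
from typing import List, Sequence, Tuple


def logistic_grad_hess(y: Sequence[int], margin: Sequence[float]) -> Tuple[List[float], List[float]]:
    """First (g_i) and second (h_i) derivatives of the logistic loss w.r.t. the previous margin."""
    g, h = [], []
    for yi, mi in zip(y, margin):
        p = 1.0 / (1.0 + math.exp(-mi))
        g.append(p - yi)
        h.append(p * (1.0 - p))
    return g, h


def leaf_weight(g: Sequence[float], h: Sequence[float], lam: float) -> float:
    """Optimal leaf score w_j* = -G_j / (H_j + lambda)."""
    return -sum(g) / (sum(h) + lam)


def structure_score(leaves: Sequence[Tuple[Sequence[float], Sequence[float]]],
                    lam: float, gamma: float) -> float:
    """Minimized objective: -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T (lower is better)."""
    total = 0.0
    for g, h in leaves:
        G, H = sum(g), sum(h)
        total += G * G / (H + lam)
    return -0.5 * total + gamma * len(leaves)
```

With labels `[1, 1, 0, 0]` and all margins 0, splitting the two positives from the two negatives (λ = 1, γ = 0) gives leaf weights of about ±0.667. It also lowers the structure score from 0 to about -0.667, which is why the tree would make that split.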
Gradient boosting generates a sequence of regression trees during training. Each leaf node holds a real-valued score, and the sum of the scores from all trees is the final predicted value. The accuracy of the algorithm was evaluated with 5-fold cross-validation. The data set was divided into five parts, and in each round four parts were used as the training set and the remaining part as the test set. The accuracy of each round was recorded, and the mean accuracy over the five rounds was taken as the estimate of the algorithm's accuracy. The modeling process is shown in Figure 2.
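The 5-fold procedure described above can be sketched in dependency-free Python. This sketch assumes random, non-stratified assignment of samples to folds; the paper does not say whether its folds were stratified. `fit` and `predict` are hypothetical placeholders. The majority-class model stands in for XGBoost only to keep the example self-contained.

```python
import random
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and deal them round-robin into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_accuracy(X: Sequence, y: Sequence[int], fit: Callable, predict: Callable,
                             k: int = 5, seed: int = 0) -> float:
    """Train on k-1 folds, test on the held-out fold, and average the k accuracies."""
    folds = k_fold_indices(len(y), k, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [j for j in range(len(y)) if j not in held_out]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        preds = predict(model, [X[j] for j in test_idx])
        correct = sum(int(p == y[j]) for p, j in zip(preds, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Hypothetical stand-in model: always predicts the majority class of the training labels.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, X_test):
    return [model] * len(X_test)
```

With the cohort's class counts (153 HSPN, 87 without renal damage) and dummy features, each fold holds 48 children. The majority-class baseline scores 153/240 ≈ 0.64, which is a useful floor against which to compare a real model's cross-validated accuracy.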
Figure 2.

Modeling flowchart of XGBoost algorithm.
2.4. Statistical Analysis
Categorical data were compared using the χ2 test. Continuous data were expressed as mean ± standard deviation and compared using the t test. P < 0.05 was considered statistically significant.
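The two tests named above can be sketched with Python's standard library. The sketch assumes a 2×2 table with Pearson's χ2 and no continuity correction, so df = 1 and P = erfc(√(χ2/2)). It also assumes the pooled-variance Student t statistic. The paper does not state whether a continuity correction was applied. In practice, `scipy.stats.chi2_contingency` (which applies Yates' correction to 2×2 tables by default) and `scipy.stats.ttest_ind` return both statistics and P values.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square for the table [[a, b], [c, d]] without continuity correction."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p


def student_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student t statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * stdev(x) ** 2 + (n2 - 1) * stdev(y) ** 2) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2
```

For example, the table [[20, 10], [10, 20]] gives χ2 ≈ 6.67 and P ≈ 0.0098. Converting a t statistic to a P value requires the t distribution, which is not in the standard library.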
3. Results
3.1. General Information
Among the 240 children with HSP, 126 were male and 114 were female. Age at onset ranged from 2 to 16 years (mean 9.03 ± 2.68 years). Onset occurred in winter in 62 cases (25.8%). In addition, 102 cases (42.5%) had painful joint swelling, 128 cases (53.3%) had abdominal pain and gastrointestinal bleeding, and 38 cases (15.8%) had upper-body skin purpura. There were 153 children with HSPN and 87 without renal damage. The training and test sets did not differ significantly in gender, season of onset, painful joint swelling, abdominal pain, upper-body purpura, or HSPN (P > 0.05; Figure 3).
Figure 3.

Comparison of clinical data between the training set and test set. There were no significant differences between training set and test set in gender (a), painful swelling of joint (b), season of onset (c), abdominal pain (d), purpura of upper body (e), and HSPN (f).
3.2. Selection of Predictive Features
Among the general information items, gender was not significantly associated with HSPN (P = 0.758). All clinical symptom indicators were statistically significant (P < 0.05) (Table 1).
Table 1.
Test results of general information and clinical symptom indexes.
| Indexes | χ2 | P |
|---|---|---|
| General information | | |
| Gender | 0.038 | 0.758 |
| Age | 385.874 | <0.001 |
| Season of onset | 8.983 | 0.009 |
| Clinical symptoms | | |
| Abdominal pain | 4.213 | 0.023 |
| Painful swelling of joint | 7.896 | 0.014 |
| Purpura area | 238.705 | <0.001 |
| Recurrence of purpura | 7.942 | 0.012 |
Laboratory indexes were then compared between children with and without HSPN. Platelet count, C-reactive protein, total cholesterol, IgM, and D-dimer did not differ significantly (P > 0.05) (Table 2). In contrast, white cell count, percentage of neutrophils, percentage of eosinophils, serum albumin, serum creatinine, IgG, IgA, IgE, hospitalization time, urinary retinol-binding protein (RBP), urinary N-acetyl-β-D-glucosaminidase (NAG), and 24-h urinary protein quantification were significantly associated with HSPN (P < 0.05).
Table 2.
Test results of biochemical laboratory indexes.
| Indexes | t | P |
|---|---|---|
| White cell count | -4.763 | 0.019 |
| Percentage of neutrophils | 1.684 | 0.041 |
| Percentage of eosinophil | 3.980 | 0.027 |
| Platelet count | -1.415 | 0.185 |
| C-reactive protein | 0.970 | 0.384 |
| Serum albumin | -4.542 | 0.021 |
| Serum creatinine | -3.425 | 0.032 |
| Total cholesterol | -0.468 | 0.571 |
| D-dimer | -1.538 | 0.214 |
| IgG | -3.978 | 0.030 |
| IgA | -2.342 | 0.034 |
| IgM | 0.795 | 0.436 |
| IgE | 3.012 | 0.028 |
| Hospitalization time | 2.031 | 0.037 |
| Urinary retinol-binding protein | -5.784 | 0.014 |
| Urine N-acetyl-β-D-glucosaminidase | 3.869 | 0.028 |
| 24-h urinary protein quantification | 4.825 | 0.024 |
3.3. Prediction Results of XGBoost Algorithm Model
The XGBoost model automatically computes an importance score for each input feature. The ten most important features output by the model were (Figure 4): NAG, RBP, IgA, age, recurrence of purpura, purpura area, abdominal pain, 24-h urinary protein quantification, percentage of neutrophils, and serum albumin.
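Figure 4 does not state which importance measure was used. One common choice in XGBoost is total gain: for each feature, the sum of the loss reductions achieved by every split on that feature. The dependency-free sketch below assumes hypothetical split records of the form `(feature, gain)` collected from the fitted trees. In the xgboost Python package, `Booster.get_score(importance_type="total_gain")` returns comparable per-feature totals.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_features_by_total_gain(splits: Iterable[Tuple[str, float]],
                                top_k: int = 10) -> List[Tuple[str, float]]:
    """Sum the gain of every split per feature and return the top_k features, largest first."""
    total: Dict[str, float] = defaultdict(float)
    for feature, gain in splits:
        total[feature] += gain
    ranked = sorted(total.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]
```

Other importance types, such as split count ("weight") or average gain, can produce a different ordering. The ranking should therefore be read together with the importance type used.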
Figure 4.

Ranking of important features of XGBoost model output.
3.4. Performance Evaluation of Model Prediction
In the training set, the area under the receiver operating characteristic (ROC) curve of the XGBoost model was 0.895 (95% CI: 0.827-0.963). In the test set, it was 0.870 (95% CI: 0.799-0.941), indicating good discriminative ability. The ROC curves of the XGBoost model are shown in Figure 5.
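The AUC values above can be computed without plotting the curve. The AUC equals the probability that a randomly chosen HSPN case receives a higher predicted score than a randomly chosen non-HSPN case (the Mann-Whitney formulation). The sketch below assumes labels coded 1 for HSPN and 0 otherwise. The paper does not say how its 95% CIs were obtained, so no interval is computed here.

```python
from typing import Sequence


def auc_mann_whitney(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a positive > score of a negative); ties count as one half."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is O(n_pos × n_neg), which is negligible for a cohort of 240. A rank-based version would scale better to larger data sets.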
Figure 5.

The receiver operating characteristic curves of the XGBoost model. (a) Training set. (b) Test set.
4. Discussion
HSP is a systemic vasculitis that mainly involves the capillaries and small blood vessels of the skin, joints, gastrointestinal tract, and kidneys, with marked IgA deposition [23]. It is more common in children. More than 90% of HSPN cases have been reported to occur in children and adolescents, and HSPN is the leading cause of secondary nephropathy in children. Kidney biopsy, however, is invasive and difficult for parents and children to accept [9]. We therefore used an XGBoost prediction model to predict HSPN from general information, clinical symptoms, and laboratory test indicators.
XGBoost automatically assigns an importance score to each attribute, which allows effective feature filtering. We screened general information, clinical symptoms, and laboratory test indicators, and the top three indicators in the XGBoost model were NAG, RBP, and IgA. Our results are consistent with those of Karadağ et al. [24], who proposed that vascular endothelial injury is an essential link in the pathogenesis of HSP. Possible mechanisms are as follows. (1) Allergic reactions increase vascular wall permeability, and plasma extravasation raises blood concentration and slows blood flow. In this hyperviscous state, immune complexes deposit more readily, further damaging the vascular endothelium and promoting platelet adhesion and aggregation. (2) Inflammation damages the vascular endothelium. The damaged endothelium enhances procoagulant activity, stimulates the release of platelet-activating factor, and further promotes platelet activation and adhesion.
IgA is the main component of the body's mucosal defense system. It is widely distributed in milk, saliva, and the mucosal secretions of the gastrointestinal, respiratory, and urogenital tracts. It therefore forms a first line of defense against infection, especially in the respiratory and intestinal tracts, and it is also an important indicator in our prediction model. NAG is a lysosomal enzyme of the renal tubular epithelium that is normally present in urine only at very low levels. When tubular cells are damaged, large amounts of NAG are released into the urine and urinary NAG levels rise. RBP ranked second among the features in our prediction model. Liu et al. reported that urinary RBP has important predictive value for delayed renal involvement in children with HSP [25].
The XGBoost algorithm used to build our model can effectively reduce overfitting. It also automatically learns a default branch direction for missing values, which improves the algorithm's efficiency [26, 27]. These properties support broader application of the model. In addition, the areas under the curve in the training set (0.895, 95% CI: 0.827-0.963) and the test set (0.870, 95% CI: 0.799-0.941) were similar, indicating good discrimination. The XGBoost-based prediction model can therefore provide a new method for diagnosing HSPN in children first diagnosed with HSP in the dermatology department.
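The default-direction mechanism mentioned above can be sketched as follows. This is a simplified illustration of XGBoost's sparsity-aware split finding: for a candidate split, samples with a missing value (`None`) are sent first to the left child and then to the right child, and the direction with the larger gain is kept. The sketch assumes a single fixed threshold with values below it going left. It uses the gain formula implied by Equations (3) and (4), with gradients `g` and Hessians `h` as defined there. The example values are toy numbers, not study data.

```python
from typing import Optional, Sequence, Tuple


def split_gain(GL: float, HL: float, GR: float, HR: float,
               lam: float = 1.0, gamma: float = 0.0) -> float:
    """Gain of splitting a node: 1/2 [GL^2/(HL+lam) + GR^2/(HR+lam) - G^2/(H+lam)] - gamma."""
    def score(G: float, H: float) -> float:
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


def best_default_direction(values: Sequence[Optional[float]], g: Sequence[float],
                           h: Sequence[float], threshold: float,
                           lam: float = 1.0, gamma: float = 0.0) -> Tuple[str, float]:
    """Try sending missing values left and right; keep the direction with the larger gain."""
    GL = HL = GR = HR = Gm = Hm = 0.0
    for v, gi, hi in zip(values, g, h):
        if v is None:            # missing feature value
            Gm += gi
            Hm += hi
        elif v < threshold:      # present and below threshold -> left child
            GL += gi
            HL += hi
        else:                    # present and >= threshold -> right child
            GR += gi
            HR += hi
    gain_left = split_gain(GL + Gm, HL + Hm, GR, HR, lam, gamma)
    gain_right = split_gain(GL, HL, GR + Gm, HR + Hm, lam, gamma)
    if gain_left >= gain_right:
        return "left", gain_left
    return "right", gain_right
```

In a toy node where the child with the missing value resembles the low-value (left) group in its gradient, sending it left yields the larger gain. That direction is then stored as the branch's default for missing values at prediction time.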
Our study has several limitations. First, it was a single-center retrospective study with a small sample size and no external validation. Second, owing to limited data sources, the candidate predictors were not comprehensive despite their number, and potentially predictive variables may have been omitted. This may also limit the advantages of the XGBoost algorithm. Future studies will enlarge the sample size and expand the set of predictive indicators.
5. Conclusion
Using the XGBoost prediction model, HSP renal damage can be preliminarily predicted from clinical test data in dermatology outpatient practice. This can reduce the harm that invasive examinations cause children. The model provides a new approach to assessing the prognosis of children with Henoch-Schönlein purpura at their first dermatology visit. In future work, we will address these shortcomings, guided by clinical needs, to better support clinical application.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare no conflict of interest.
References
1. Hetland L. E., Susrud K. S., Lindahl K. H., Bygum A. Henoch-Schönlein purpura: a literature review. Acta Dermato-Venereologica. 2017;97(10):1160–1166. doi: 10.2340/00015555-2733.
2. Mossberg M., Segelmark M., Kahn R., Englund M., Mohammad A. J. Epidemiology of primary systemic vasculitis in children: a population-based study from southern Sweden. Scandinavian Journal of Rheumatology. 2018;47(4):295–302. doi: 10.1080/03009742.2017.1412497.
3. Audemard-Verger A., Pillebout E., Guillevin L., Thervet E., Terrier B. IgA vasculitis (Henoch-Shonlein purpura) in adults: diagnostic and therapeutic aspects. Autoimmunity Reviews. 2015;14(7):579–585. doi: 10.1016/j.autrev.2015.02.003.
4. Leung A. K. C., Barankin B., Leong K. F. Henoch-Schönlein purpura in children: an updated review. Current Pediatric Reviews. 2020;16(4):265–276. doi: 10.2174/1573396316666200508104708.
5. Nie S., He W., Huang T., et al. The spectrum of biopsy-proven glomerular diseases among children in China. Clinical Journal of the American Society of Nephrology. 2018;13(7):1047–1054. doi: 10.2215/CJN.11461017.
6. López-Mejías R., Castañeda S., Genre F., et al. Genetics of immunoglobulin-A vasculitis (Henoch-Schonlein purpura): an updated review. Autoimmunity Reviews. 2018;17(3):301–315. doi: 10.1016/j.autrev.2017.11.024.
7. Audemard-Verger A., Terrier B., Dechartres A., et al. Characteristics and management of IgA vasculitis (Henoch-Schönlein) in adults: data from 260 patients included in a French multicenter retrospective survey. Arthritis & Rheumatology. 2017;69(9):1862–1870. doi: 10.1002/art.40178.
8. Shi D., Chan H., Yang X., et al. Risk factors associated with IgA vasculitis with nephritis (Henoch-Schönlein purpura nephritis) progressing to unfavorable outcomes: a meta-analysis. PLoS One. 2019;14(10):e0223218. doi: 10.1371/journal.pone.0223218.
9. Yang Y. H., Yu H. H., Chiang B. L. The diagnosis and classification of Henoch-Schonlein purpura: an updated review. Autoimmunity Reviews. 2014;13(4-5):355–358. doi: 10.1016/j.autrev.2014.01.031.
10. Currie G., Hawk K. E., Rohren E., Vial A., Klein R. Machine learning and deep learning in medical imaging: intelligent imaging. Journal of Medical Imaging and Radiation Sciences. 2019;50(4):477–487. doi: 10.1016/j.jmir.2019.09.005.
11. Lan K., Fong S., Liu L.-S., et al. A clustering based variable sub-window approach using particle swarm optimisation for biomedical sensor data monitoring. Enterprise Information Systems. 2021;15(1):15–35. doi: 10.1080/17517575.2019.1597388.
12. Deb S., Tian Z., Fong S., Wong R., Millham R., Wong K. Elephant search algorithm applied to data clustering. Soft Computing. 2018;22(8):6035–6046.
13. Peiffer-Smadja N., Rawson T. M., Ahmad R., et al. Machine learning for clinical decision support in infectious diseases: a narrative review of current applications. Clinical Microbiology and Infection. 2020;26(5):584–595. doi: 10.1016/j.cmi.2019.09.009.
14. Awan S. E., Sohel F., Sanfilippo F. M., Bennamoun M., Dwivedi G. Machine learning in heart failure. Current Opinion in Cardiology. 2018;33(2):190–195. doi: 10.1097/HCO.0000000000000491.
15. Li J., Fong S., Wong R. K., Millham R., Wong K. Elitist binary wolf search algorithm for heuristic feature selection in high-dimensional bioinformatics datasets. Scientific Reports. 2017;7(1):1–14. doi: 10.1038/s41598-017-04037-5.
16. Shi J., Ye Y., Zhu D., Su L., Huang Y., Huang J. Automatic segmentation of cardiac magnetic resonance images based on multi-input fusion network. Computer Methods and Programs in Biomedicine. 2021;209:106323. doi: 10.1016/j.cmpb.2021.106323.
17. Ozen S., Pistorio A., Iusan S. M., et al. EULAR/PRINTO/PRES criteria for Henoch-Schonlein purpura, childhood polyarteritis nodosa, childhood wegener granulomatosis and childhood takayasu arteritis: Ankara 2008. Part ii: final classification criteria. Annals of the Rheumatic Diseases. 2010;69(5):798–806. doi: 10.1136/ard.2009.116657.
18. Mian A. N., Schwartz G. J. Measurement and estimation of glomerular filtration rate in children. Advances in Chronic Kidney Disease. 2017;24(6):348–356. doi: 10.1053/j.ackd.2017.09.011.
19. Chi X. H., Li G. P., Wang Q. S., et al. CKD-EPI creatinine-cystatin c glomerular filtration rate estimation equation seems more suitable for chinese patients with chronic kidney disease than other equations. BMC Nephrology. 2017;18(1):1–7. doi: 10.1186/s12882-017-0637-z.
20. Davagdorj K., Pham V. H., Theera-Umpon N., Ryu K. H. Xgboost-based framework for smoking-induced noncommunicable disease prediction. International Journal of Environmental Research and Public Health. 2020;17(18):6513. doi: 10.3390/ijerph17186513.
21. Yu B., Qiu W., Chen C., et al. Submito-XGBoost: predicting protein submitochondrial localization by fusing multiple feature information and extreme gradient boosting. Bioinformatics. 2020;36(4):1074–1081. doi: 10.1093/bioinformatics/btz734.
22. Li Y., Li M., Li C., Liu Z. Forest aboveground biomass estimation using Landsat 8 and Sentinel-1A data with machine learning algorithms. Scientific Reports. 2020;10(1):1–12. doi: 10.1038/s41598-020-67024-3.
23. Reamy B. V., Servey J. T., Williams P. M. Henoch-Schönlein purpura (IgA vasculitis): rapid evidence review. American Family Physician. 2020;102(4):229–233.
24. Karadağ Ş. G., Çakmak F., Çil B., et al. The relevance of practical laboratory markers in predicting gastrointestinal and renal involvement in children with henoch-schönlein purpura. Postgraduate Medicine. 2021;133(3):272–277. doi: 10.1080/00325481.2020.1807161.
25. Liu H., Cui W., Liu H., Zhang C. Predicative value of urinary protein biomarkers on delayed renal involvement in children with henoch-schönlein purpura. Science China. Life Sciences. 2019;62(12):1594–1596. doi: 10.1007/s11427-018-9544-0.
26. Le N. Q. K., Do D. T., Chiu F. Y., Yapp E. K. Y., Yeh H. Y., Chen C. Y. Xgboost improves classification of mgmt promoter methylation status in idh1 wildtype glioblastoma. Journal of Personalized Medicine. 2020;10(3):128. doi: 10.3390/jpm10030128.
27. Liu P., Fu B., Yang S. X., Deng L., Zhong X., Zheng H. Optimizing survival analysis of xgboost for ties to predict disease progression of breast cancer. IEEE Transactions on Biomedical Engineering. 2021;68(1):148–160. doi: 10.1109/TBME.2020.2993278.
