PLOS One. 2022 Sep 16;17(9):e0272709. doi: 10.1371/journal.pone.0272709

Developing a machine learning-based tool to extend the usability of the NICHD BPD Outcome Estimator to the Asian population

Monalisa Patel 1, Japmeet Sandhu 2, Fu-Sheng Chou 3,*
Editor: Jose Palma 4
PMCID: PMC9480997  PMID: 36112600

Abstract

The NICHD BPD Outcome Estimator uses clinical and demographic data to stratify respiratory outcomes of extremely preterm infants by risk. However, the Estimator does not have an option in its pull-down menu for infants of Asian descent. We hypothesize that respiratory outcomes in extreme prematurity among various racial/ethnic groups are interconnected and therefore the Estimator can still be used to predict outcomes in infants of Asian descent. Our goal was to apply a machine learning approach to assess whether outcome prediction for infants of Asian descent is possible with information hidden in the prediction results obtained using the White, Black, and Hispanic racial/ethnic groups as surrogates. We used the three racial/ethnic options in the Estimator to obtain the probabilities of BPD outcomes for each severity category. We then combined the probability results and developed three respiratory outcome prediction models, one for each of three postmenstrual ages (PMA), using a random forest algorithm. Model performance was satisfactory, with areas under the receiver operating characteristic curve of 0.934, 0.850, and 0.757 for respiratory outcomes at PMA 36, 37, and 40 weeks, respectively, in the testing data set. This study suggested an interrelationship among racial/ethnic groups for respiratory outcomes among extremely preterm infants and showed the feasibility of extending the use of the Estimator to the Asian population.

Introduction

Bronchopulmonary dysplasia (BPD) is a multifactorial clinical syndrome of lung injury that disrupts alveolarization and microvascular development [1]. Most infants with BPD have a prolonged need for respiratory support and supplemental oxygen. Despite advances in neonatology, the incidence of BPD has not declined significantly, especially in extremely low gestational age infants. While these infants may eventually be weaned from respiratory support as lung damage improves with adequate nutrition and growth over time, the diagnosis of BPD per se is also associated with an increased risk of long-term cardiopulmonary and neurodevelopmental impairment [2].

Preterm infants are at risk for mechanical, oxidant, and inflammatory injury because of lung underdevelopment and insufficient quantities of biochemical protectants, such as surfactant, antioxidants, and protease inhibitors [3]. The diagnosis of BPD is typically based on the NICHD 2001 workshop consensus. That is, a preterm infant must require supplemental oxygen for the first 28 postnatal days. The severity of BPD is further classified at 36 weeks postmenstrual age (PMA) based on the continuous need for respiratory support and the degree of O2 supplementation that is needed. Severe BPD is defined as receiving supplemental O2 of 30% or more, moderate BPD is defined as receiving 21–29% supplemental O2, and mild BPD is defined as breathing room air by 36 weeks PMA [4].
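The severity rule summarized above can be written as a small function. The following Python sketch encodes only the simplified thresholds stated in this paragraph; the function name, the argument names, and the use of a fractional FiO2 (0.21 for room air) are illustrative assumptions, and the full consensus definition contains further detail that is not modeled here.

```python
def classify_bpd_severity(oxygen_days: int, fio2_at_36_weeks: float) -> str:
    """Classify BPD severity using the simplified rule from the text.

    oxygen_days: postnatal days on supplemental oxygen (hypothetical input).
    fio2_at_36_weeks: fraction of inspired oxygen at 36 weeks PMA,
        where 0.21 means room air (hypothetical input).
    """
    if not 0.21 <= fio2_at_36_weeks <= 1.0:
        raise ValueError("FiO2 must be between 0.21 and 1.0")
    # The text requires supplemental oxygen for the first 28 postnatal days.
    if oxygen_days < 28:
        return "No BPD"
    # Severe: supplemental O2 of 30% or more at 36 weeks PMA.
    if fio2_at_36_weeks >= 0.30:
        return "Severe"
    # Moderate: supplemental O2 above room air but below 30%.
    if fio2_at_36_weeks > 0.21:
        return "Moderate"
    # Mild: breathing room air by 36 weeks PMA.
    return "Mild"
```

For example, `classify_bpd_severity(40, 0.25)` falls in the moderate band under these assumptions.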

Various interventions, including protective ventilation strategies, maintaining optimal oxygen saturation goals, timely surfactant administration, and the use of antenatal corticosteroids, have been shown to reduce BPD or risk factors associated with BPD [5]. Caffeine therapy for apnea of prematurity has also been linked to a decreased risk of BPD [6]. Vitamin A supplementation may provide a modest benefit in oxygen dependency at 36 weeks PMA, likely related to its role in promoting distal alveolar development [7, 8]. Postnatal corticosteroids have been used in clinical practice to shorten invasive mechanical ventilation, a significant risk factor for BPD. However, their role in BPD prevention remains controversial. Moreover, postnatal corticosteroid use has been linked to risks of growth restriction, cardiac hypertrophy, and cerebral palsy [9, 10].

The use of these interventions is currently based largely on the protocols of individual NICUs. In practice, neonatal providers would benefit from an objective risk estimation tool that identifies the infants most likely to benefit from preventive therapies [11]. The BPD Outcome Estimator (the Estimator) is one of the available tools endorsed by the National Institute of Child Health and Human Development (NICHD) [12, 13]. The Estimator has been used to guide corticosteroid use and to counsel families, although it can be confounded by the varying rates of mortality among different racial/ethnic groups and the racial disparities in preterm birth [14–16]. In our institutional experience, one additional obstacle to the use of the Estimator was that it could not be applied to infants of Asian descent. At this time, because of this limitation, Asian families are being counseled with data that originally applied only to neonates of White, Black, or Hispanic descent. Therefore, we sought an approach to extend the usability of the Estimator to Asian extremely preterm infants. We hypothesize that respiratory outcomes in extreme prematurity among various racial/ethnic groups are interconnected and therefore the Estimator can be safely used to predict outcomes in infants of Asian descent. To test this hypothesis, we entered demographic and respiratory data for Asian infants into the Estimator three times, selecting White, Black, and Hispanic in turn as the race/ethnicity, to obtain three sets of probabilities for each BPD severity category for the same set of infants. These probability scores were then used as input to a machine learning algorithm for respiratory outcome prediction.

Methods

Patient population and data collection

We conducted a retrospective study at the Loma Linda University Children’s Hospital (LLUCH) and Riverside University Health System (RUHS) neonatal intensive care units (NICUs). The Loma Linda University Health human research protection programs institutional review board (IRB) (#520338) and the Riverside University Health System IRB (#1689889) both approved the study and waived the requirement for consent given the retrospective nature of the study. Only the corresponding author (F.-S. C.) had access to the identifiable information and the raw data. The LLUCH NICU is a level 4 NICU and the RUHS NICU is a level 3 NICU; both are attended by the same group of neonatologists and have used the same electronic medical record system since 2013. We included infants with a gestational age (GA) of 30 weeks or less at birth, a birth weight of less than 1,250 grams, and a race/ethnicity designation of Asian on the admission face sheet in the electronic medical records from 2013 until 2020. A database search for patients who met the inclusion criteria was performed by a data architect at LLU. Demographic, respiratory support, medication, and comorbidity data were extracted either directly from the database or manually via chart review. Infants who did not meet the inclusion criteria for gestational age and birth weight, as well as infants without a documented Asian race, were excluded. If the birth weight fell outside the range allowed by the Estimator given the other demographic data, we used the closest allowed weight to obtain probability scores.

BPD risk estimation and primary outcomes

Demographic and respiratory data, including GA, birth weight, sex, postnatal day, ventilator type, and FiO2 on days of life 1, 3, 7, 14, 21, and 28, of the identified patients of Asian descent were entered into the Estimator to obtain probabilities of BPD for each severity category using White, Black, and Hispanic as surrogate race/ethnicity [12]. If more than one ventilator type was used within a day, we used the type with the longest duration that day and the highest FiO2 recorded for that ventilator type. Three sets of probability scores were generated from each set of input data, one for each race/ethnicity entry. If the birth weight fell outside the range the Estimator required given the other parameters, the closest weight indicated by the Estimator was used for calculation. For ventilator type, non-invasive positive pressure ventilation (NIPPV) was entered as “CPAP”, and both high- and low-flow nasal cannulae were entered as “Cannula”. Respiratory support needs at 36, 37, and 40 weeks PMA were used as outcomes for model development, and one model was developed for each PMA (three models in total) using the three sets of probability scores. For the 36-week PMA outcome, we classified the patients into two groups: one with no respiratory support need or with low-flow nasal cannula (LFNC) use, and the other with higher-level respiratory support needs. For the outcomes at 37 and 40 weeks PMA, we classified the patients as either with or without respiratory support needs. A patient discharged without respiratory support before the indicated PMA was classified into the group without support needs; a patient discharged on respiratory support before reaching the indicated PMA was classified into the group with support needs. None of the patients included in the study had a tracheostomy upon discharge, and none were discharged before 36 weeks PMA on LFNC. Similar to the analytic approach used in the BPD Outcome Estimator, we did not take repeated measures for each patient into consideration. In other words, each set of input data across the various days of life for one patient was treated as an independent set of input data.
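The data-preparation rules in this subsection can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the tuple layout of a respiratory episode, and the support labels ("none", "LFNC", "HFNC", and so on) are hypothetical. Carrying the support status at discharge forward to the target PMA follows the text; applying the 36-week grouping to an infant discharged on LFNC before 36 weeks is an assumption, since the text notes that this case did not occur.

```python
# Mapping of recorded ventilator types to Estimator menu entries, as stated in
# the text. Types not mentioned in the text are passed through unchanged.
ESTIMATOR_VENT_TYPE = {"NIPPV": "CPAP", "HFNC": "Cannula", "LFNC": "Cannula"}


def daily_estimator_input(episodes):
    """Pick the ventilator type and FiO2 to enter for one day of life.

    episodes: list of (vent_type, hours, fio2) tuples recorded that day
    (hypothetical layout). Returns (estimator_vent_type, fio2).
    """
    hours_by_type = {}
    for vent_type, hours, _ in episodes:
        hours_by_type[vent_type] = hours_by_type.get(vent_type, 0.0) + hours
    # Ventilator type used for the longest total duration within the day.
    main_type = max(hours_by_type, key=hours_by_type.get)
    # Highest FiO2 recorded while on that ventilator type.
    highest_fio2 = max(fio2 for vt, _, fio2 in episodes if vt == main_type)
    return ESTIMATOR_VENT_TYPE.get(main_type, main_type), highest_fio2


def respiratory_outcome(pma_weeks, support):
    """Return 1 for the positive outcome group and 0 for the negative group.

    support: support mode at the target PMA, or at discharge if the infant
    was discharged before reaching that PMA.
    """
    if pma_weeks == 36:
        # No support and LFNC form the negative group at 36 weeks.
        return 0 if support in ("none", "LFNC") else 1
    if pma_weeks in (37, 40):
        # Any respiratory support is the positive outcome at 37 and 40 weeks.
        return 0 if support == "none" else 1
    raise ValueError("outcomes are defined only at 36, 37, and 40 weeks PMA")
```

For example, an infant on LFNC is in the negative group at 36 weeks but in the positive group at 37 and 40 weeks.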

Machine learning-based model development

Supervised machine learning was performed in this study. Specifically, the three sets of risk stratification probability results for the five severity categories (Death, Severe, Moderate, Mild, and No BPD) obtained from the three racial/ethnic surrogate groups were included as features (15 independent variables in total); respiratory support needs at 36, 37, or 40 weeks PMA, classified as described above, were the outcome for supervised learning. Assuming that the probability scores for the five severity categories for each surrogate race/ethnicity already contained the information carried by the demographic and respiratory support data, we did not incorporate the raw demographic and respiratory support data in model development. Because the severity categories are intercorrelated (e.g., the probability in the Death or Severe categories correlates negatively with the probability in the Mild or No BPD categories), non-linear algorithms were chosen. We tested random forest (RF) and kernel (radial) support vector machine (SVM) algorithms, as well as an ensemble algorithm that incorporated both RF and SVM.
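To make the feature layout concrete, the sketch below flattens the Estimator output for the three surrogate race/ethnicity entries into one 15-element feature row. It is an illustration only: the dictionary layout, the feature names, and the probability values are hypothetical and are not taken from the study data.

```python
SURROGATES = ("White", "Black", "Hispanic")
CATEGORIES = ("Death", "Severe", "Moderate", "Mild", "No BPD")


def build_feature_row(estimator_output):
    """Flatten 3 surrogates x 5 severity categories into 15 named features.

    estimator_output: {surrogate: {category: probability}} for one set of
    input data (one infant on one day of life).
    """
    row = {}
    for surrogate in SURROGATES:
        for category in CATEGORIES:
            row[f"{surrogate}_{category}"] = estimator_output[surrogate][category]
    return row


# Hypothetical probabilities for one set of input data (not study data).
example_output = {
    "White": {"Death": 0.10, "Severe": 0.20, "Moderate": 0.25, "Mild": 0.30, "No BPD": 0.15},
    "Black": {"Death": 0.12, "Severe": 0.15, "Moderate": 0.22, "Mild": 0.31, "No BPD": 0.20},
    "Hispanic": {"Death": 0.11, "Severe": 0.18, "Moderate": 0.24, "Mild": 0.30, "No BPD": 0.17},
}
feature_row = build_feature_row(example_output)
```

Each of the 168 sets of input data would yield one such row, paired with the outcome label for the PMA being modeled.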

Data were randomly split into two data sets in a 2:1 ratio, with the larger set used for training and the smaller set for testing. Ten-fold cross-validation repeated 10 times was performed during training. For hyperparameter tuning, mtry was held constant at 4; minimal node sizes of 1, 3, 5, and 10 were used, and both the Gini index and extra trees splitting rules were tried for the final model selection.
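The resampling scheme and tuning grid described above can be sketched without any modeling library. The Python below is a hedged illustration of the bookkeeping only (it fits no model); the function names and seeds are hypothetical, and the plain random split shown here is an assumption about how the 2:1 partition was drawn. A plain 2:1 split of 168 rows gives 112 and 56 rows, whereas the Results report 114 and 54, so the authors' partitioning function evidently rounds or stratifies differently.

```python
import itertools
import random


def split_two_to_one(n_rows, seed=0):
    """Randomly split row indices 2:1 into training and testing sets."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = (2 * n_rows) // 3
    return indices[:n_train], indices[n_train:]


def repeated_kfold(indices, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, fit_rows, held_out_rows) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        # Deal the shuffled rows into k folds of near-equal size.
        folds = [shuffled[i::k] for i in range(k)]
        for held_out in range(k):
            fit_rows = [row for i, fold in enumerate(folds)
                        if i != held_out for row in fold]
            yield repeat, held_out, fit_rows, folds[held_out]


# Tuning grid from the text: mtry fixed at 4, four minimal node sizes,
# two splitting rules, giving 8 candidate settings.
TUNING_GRID = [
    {"mtry": 4, "min_node_size": size, "splitrule": rule}
    for size, rule in itertools.product((1, 3, 5, 10), ("gini", "extratrees"))
]

train_rows, test_rows = split_two_to_one(168)
resamples = list(repeated_kfold(train_rows))
```

Each of the 8 grid settings would be evaluated on all 100 resamples, and the best-performing setting refit on the full training set.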

For model performance assessment, we reported sensitivity, specificity, positive and negative predictive values, overall predictive accuracy, and the kappa score. The F1 score, calculated as the harmonic mean of sensitivity and positive predictive value, was also reported. A predicted probability of 50% or higher was considered a positive prediction. Additionally, the area under the receiver operating characteristic curve (ROC AUC) was reported.
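The metrics listed above all follow from a 2 × 2 confusion matrix. The Python sketch below shows the arithmetic under the 50% threshold; the function names are hypothetical, and the counts in the example are illustrative rather than reported by the study (the paper does not give confusion matrices). They were chosen so that the rounded values coincide with the 36-week testing column of Table 3.

```python
def confusion_counts(probabilities, labels, threshold=0.5):
    """Count TP, FP, FN, TN; a probability >= threshold is a positive call."""
    tp = fp = fn = tn = 0
    for prob, label in zip(probabilities, labels):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def performance_metrics(tp, fp, fn, tn):
    """Compute the reported metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / n
    # Cohen's kappa: agreement beyond what is expected by chance.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    # F1: harmonic mean of sensitivity and positive predictive value.
    f1 = 2 * sensitivity * ppv / (sensitivity + ppv)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "accuracy": accuracy,
            "kappa": kappa, "f1": f1}


# Illustrative counts (not reported in the paper): 54 test rows in total.
metrics = performance_metrics(tp=21, fp=2, fn=5, tn=26)
```

With these counts, sensitivity is 21/26, specificity is 26/28, and accuracy is 47/54.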

The study was performed in R 4.0.5 and RStudio 1.4 using the caret and caretEnsemble packages [17, 18]. Code is available upon request. We followed the TRIPOD guidelines checklist for prediction model development and validation [19].

Results

A total of 829 preterm infants with a birth GA equal to or less than 30 weeks and a birth weight of less than 1,250 grams were identified in the clinical database. Three infants did not have race/ethnicity information. Forty-one infants were identified as Asian, of whom 38 had complete respiratory data in the first 28 days of life. Demographic information is summarized in Table 1. The median GA was 26 weeks 3 days, with a range of 23 weeks 3 days to 30 weeks 6 days. The mean birth weight was 902 grams. Approximately 30% of the cohort were female. Among these newborns, 7 died and 2 were transferred out before reaching 36 weeks PMA. The remaining 29 infants had a total of 168 sets of input data available for model development and testing. Table 2 summarizes their characteristics, including support modes at 36, 37, and 40 weeks. At 36 weeks’ corrected gestational age (CGA), 27.6% (n = 8) of patients did not require any support, 27.6% (n = 8) required respiratory support with a low-flow nasal cannula, and the remainder (44.8%, n = 13) required higher support. At 37 and 40 weeks’ CGA, 34.5% (n = 10) and 51.7% (n = 15) of patients did not require any support, respectively [20].

Table 1. Demographic characterization.

| Characteristic | All infants identified as Asian (n = 38) |
| --- | --- |
| Gestational age at birth, median | 26 weeks 3 days |
| Gestational age at birth, range | 23 weeks 3 days to 30 weeks 6 days |
| Birth weight (grams), mean ± SD | 902 ± 200 |
| Sex: female | 12 (31.6%) |
| Sex: male | 26 (68.4%) |
| Birth place: inborn | 31 (81.6%) |
| Birth place: outborn | 7 (18.4%) |
| Disposition: in-hospital mortality | 7 (18.4%) |
| Disposition: transferred before 36 weeks | 2 (5.3%) |
| Disposition: remainder (included in the study) | 29 (76.3%) |

Table 2. Characteristics of infants included in model development and testing.

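| Characteristic | All infants included in the model development and testing (n = 29) |
| --- | --- |
| Steroid exposure: antenatal, ≥ 1 dose | 29 (100%) |
| Steroid exposure: postnatal, for BPD prevention | 5 (17.2%) |
| Surfactant: none | 8 (27.6%) |
| Surfactant: ≤ 2 doses | 18 (62.1%) |
| Surfactant: ≥ 3 doses | 3 (10.3%) |
| Comorbidity of prematurity: intraventricular hemorrhage, any grade | 6 (20.7%) |
| Comorbidity of prematurity: intraventricular hemorrhage, Grade 3 or 4 | 3 (10.3%) |
| Comorbidity of prematurity: medical therapy for patent ductus arteriosus | 17 (58.6%) |
| Comorbidity of prematurity: invasive therapy for patent ductus arteriosus* | 8 (27.6%) |
| Comorbidity of prematurity: spontaneous intestinal perforation | 1 (3.4%) |
| Comorbidity of prematurity: necrotizing enterocolitis, Stage 2 or 3 | 2 (6.9%) |
| Respiratory support at PMA 36 weeks: no support | 8 (27.6%) |
| Respiratory support at PMA 36 weeks: low flow nasal cannula | 8 (27.6%) |
| Respiratory support at PMA 36 weeks: other ventilatory support | 13 (44.8%) |
| Respiratory support at PMA 37 weeks: no support | 10 (34.5%) |
| Respiratory support at PMA 37 weeks: low flow nasal cannula | 7 (24.1%) |
| Respiratory support at PMA 37 weeks: other ventilatory support | 12 (41.4%) |
| Respiratory support at PMA 40 weeks: no support | 15 (51.7%) |
| Respiratory support at PMA 40 weeks: low flow nasal cannula | 9 (31.0%) |
| Respiratory support at PMA 40 weeks: other non-invasive ventilatory support | 5 (17.3%) |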

*including surgical ligation or coil embolization.

Inborn if same birth and discharge hospitals; outborn if different birth and discharge hospitals.

BPD: bronchopulmonary dysplasia.

A total of 114 sets of input data were used for training, and the remaining 54 sets of input data were reserved for testing. The performance metrics of the RF-based models are listed in Table 3. Specifically, on the testing data set, the F1 scores were 0.857, 0.906, and 0.746, and the AUCs were 0.934, 0.850, and 0.757 for predictions at 36, 37, and 40 weeks PMA, respectively (the corresponding training AUCs were 0.959, 0.974, and 0.956). Overall, the models for 36- and 37-week PMA respiratory outcome predictions had better generalizability than the model for predicting the 40-week outcome. A comprehensive list of performance metrics for all three modeling approaches, namely SVM, RF, and ensemble, is available in S1 Table.

Table 3. Model performance measures for the indicated outcomes using a random forest algorithm.

Input (all models): 15 probability scores (5 severity categories generated by the BPD Estimator for each of the three surrogate race/ethnicity entries)

| PMA | 36 weeks | 37 weeks | 40 weeks |
| --- | --- | --- | --- |
| Positive outcome | HFNC, CPAP, invasive respiratory support | Any respiratory support | Any respiratory support |
| Negative outcome | LFNC, no support | No support | No support |

| Measure | 36 weeks, training | 36 weeks, testing | 37 weeks, training | 37 weeks, testing | 40 weeks, training | 40 weeks, testing |
| --- | --- | --- | --- | --- | --- | --- |
| Sensitivity | 0.843 | 0.808 | 0.873 | 0.850 | 0.821 | 0.647 |
| Specificity | 0.905 | 0.929 | 0.886 | 0.929 | 0.879 | 0.850 |
| PPV | 0.878 | 0.913 | 0.945 | 0.971 | 0.868 | 0.880 |
| NPV | 0.877 | 0.839 | 0.756 | 0.684 | 0.836 | 0.586 |
| Accuracy (95% CI*) | 0.877 (0.803–0.931) | 0.870 (0.751–0.946) | 0.877 (0.803–0.931) | 0.870 (0.751–0.946) | 0.851 (0.772–0.911) | 0.722 (0.584–0.835) |
| Kappa | 0.751 | 0.739 | 0.725 | 0.698 | 0.701 | 0.455 |
| F1 score | 0.860 | 0.857 | 0.908 | 0.906 | 0.844 | 0.746 |
| ROCAUC (95% CI**) | 0.959 (0.926–0.991) | 0.934 (0.869–1.000) | 0.974 (0.951–0.997) | 0.850 (0.725–0.975) | 0.956 (0.923–0.989) | 0.757 (0.625–0.890) |

PMA: postmenstrual age; HFNC: high-flow nasal cannula; CPAP: continuous positive airway pressure; LFNC: low-flow nasal cannula; PPV: positive predictive value; NPV: negative predictive value; ROCAUC: area under the receiver operating characteristic curve; CI: confidence interval.

*Binomial proportion confidence interval.

**Calculated by the DeLong method.
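As a worked illustration of the binomial proportion confidence interval in the first footnote, the Python sketch below computes an exact (Clopper-Pearson) interval by bisection on the binomial distribution. That the reported intervals are of the Clopper-Pearson type is an assumption, and the function names are hypothetical. The count of 47 correct predictions is inferred from the reported testing accuracy of 0.870 on 54 sets of input data (47/54 ≈ 0.870); it is not stated in the paper.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(successes, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a proportion."""
    def bisect(go_right):
        # go_right(p) is True while p is still below the bound we seek.
        low, high = 0.0, 1.0
        for _ in range(100):
            mid = (low + high) / 2
            if go_right(mid):
                low = mid
            else:
                high = mid
        return (low + high) / 2

    # Lower bound: smallest p with P(X >= successes | p) = alpha / 2.
    lower = 0.0 if successes == 0 else bisect(
        lambda p: 1 - binomial_cdf(successes - 1, n, p) < alpha / 2)
    # Upper bound: largest p with P(X <= successes | p) = alpha / 2.
    upper = 1.0 if successes == n else bisect(
        lambda p: binomial_cdf(successes, n, p) > alpha / 2)
    return lower, upper


# Inferred count: 47 correct predictions out of 54 testing rows.
ci_low, ci_high = clopper_pearson(47, 54)
```

Under these assumptions the interval comes out at approximately 0.751 to 0.946, in line with the interval shown for the 36- and 37-week testing accuracy in Table 3.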

Discussion

A recent report using the 2016 national pediatric hospitalization dataset, the Kids’ Inpatient Database, showed that Asians, Pacific Islanders, and Native Americans as a group had the lowest rate of extreme prematurity when compared to non-Hispanic white, non-Hispanic black, and Hispanic groups. Paradoxically, the report also showed that Asian/Pacific Islander/Native American premature newborns were associated with the highest hospitalization cost, which could be due to a higher rate of morbidities [21]. BPD is one of the major comorbidities of extreme prematurity that may lengthen neonatal hospitalization due to prolonged cardiopulmonary instability. In this report, we developed a machine learning-based solution to address one major limitation of the NICHD BPD Outcome Estimator: it cannot be applied to the Asian population [12]. Respiratory outcome prediction for Asian infants was achieved by using the output from the Estimator with each of the three race/ethnicity surrogates while keeping the level of respiratory support, birth weight, and day of life information the same. Unlike the original Estimator, the models developed in this study were designed to predict two-level categorical respiratory outcomes at various PMAs, as we were not able to make predictions for each risk category due to a relatively small sample size. Additionally, the models are not generalizable to Pacific Islanders and Native Americans because our cohort did not include infants from these two racial/ethnic groups.

We chose to use dichotomized outcome measures for simplicity and easy application in the clinical setting. We classified LFNC and no respiratory support as one outcome group at 36 weeks PMA because, from a physiological and developmental standpoint, most of these infants are still developing their oral skills at this age; being able to tolerate LFNC or no support indicates that they will most likely be allowed to receive oral skill training with breast or bottle feeding. The use of LFNC may be due to emerging respiratory reserve and/or residual resolving pulmonary hypertension associated with BPD. At 37 and 40 weeks, we dichotomized the outcome as with or without support needs. Based on the ROCAUC and the F1 scores, the model for predicting respiratory outcomes at 40 weeks PMA showed relatively poor generalizability, although the AUC for the testing data set was still comparable to that of the original model underlying the BPD Outcome Estimator.

In this project, we used a well-established decision tree-based ML technique to extend the usability of the Estimator to the Asian population. While our cohort size was small, an advantage of this study was the ability to set aside a portion of the input data for model testing. Similar to the statistical model behind the Estimator, our model did not take repeated measurements for each infant into consideration. We also chose a dichotomized outcome rather than risk stratification, primarily because of the small sample size and for clinical applicability, as estimates at distinct PMAs may be easier to comprehend and communicate in clinical settings.

We developed a web app to demonstrate these models. The web app is accessible at https://neostat.shinyapps.io/BPD_Asian/. It requires the probability data for each risk category obtained using White, Black, and Hispanic as race/ethnicity. The prediction output is the probability of having the indicated positive outcome at each PMA. Please note that the prediction models used in the web app were based on data and outcomes from our practice and should not be considered generalizable, as they have not gone through rigorous prospective validation; this is a major limitation of this study. The web app is not currently intended to guide clinical decision-making. Readers taking care of Asian extremely preterm infants may develop local models based on their own institutional data.

In conclusion, this study suggested an interrelationship between racial/ethnic groups for respiratory outcomes among extremely preterm infants and showed the feasibility of extending the use of the Estimator to the Asian population.

Supporting information

S1 Dataset

(CSV)

S1 Table. Model performance comparisons.

(DOCX)

Acknowledgments

The authors would like to thank Dr. John B. C. Tan, PhD of Huckleberry Labs for critically reviewing the manuscript.

Data Availability

All relevant data are within the paper and its Supporting information files.

Funding Statement

The authors received no specific funding for this work.

References

  1. Thébaud B, Goss KN, Laughon M, Whitsett JA, Abman SH, Steinhorn RH, et al. Bronchopulmonary dysplasia. Nat Rev Dis Primers. 2019;5: 78. doi: 10.1038/s41572-019-0127-7
  2. Keller RL, Ballard RA. 48 - Bronchopulmonary Dysplasia. In: Gleason CA, Juul SE, editors. Avery’s Diseases of the Newborn (Tenth Edition). Philadelphia: Elsevier; 2018. pp. 678–694.e6.
  3. Gisondo CM, Donn SM. Bronchopulmonary dysplasia: An overview. Res Rep Neonatol. 2020;10: 67–79.
  4. Ahlfeld SK. Respiratory tract disorders. Nelson Textbook of Pediatrics. 21st ed. Philadelphia, PA: Elsevier; 2020.
  5. Michael Z, Spyropoulos F, Ghanta S, Christou H. Bronchopulmonary Dysplasia: An Update of Current Pharmacologic Therapies and New Approaches. Clin Med Insights Pediatr. 2018;12: 1179556518817322. doi: 10.1177/1179556518817322
  6. Schmidt B, Roberts RS, Davis P, Doyle LW, Barrington KJ, Ohlsson A, et al. Caffeine therapy for apnea of prematurity. N Engl J Med. 2006;354: 2112–2121. doi: 10.1056/NEJMoa054065
  7. Albertine KH, Dahl MJ, Gonzales LW, Wang Z-M, Metcalfe D, Hyde DM, et al. Chronic lung disease in preterm lambs: effect of daily vitamin A treatment on alveolarization. Am J Physiol Lung Cell Mol Physiol. 2010;299: L59–72. doi: 10.1152/ajplung.00380.2009
  8. Araki S, Kato S, Namba F, Ota E. Vitamin A to prevent bronchopulmonary dysplasia in extremely low birth weight infants: a systematic review and meta-analysis. PLoS One. 2018;13: e0207730. doi: 10.1371/journal.pone.0207730
  9. Doyle LW, Cheong JL, Ehrenkranz RA, Halliday HL. Early (< 8 days) systemic postnatal corticosteroids for prevention of bronchopulmonary dysplasia in preterm infants. Cochrane Database of Systematic Reviews. 2017. doi: 10.1002/14651858.cd001146.pub5
  10. Htun ZT, Schulz EV, Desai RK, Marasch JL, McPherson CC, Mastrandrea LD, et al. Postnatal steroid management in preterm infants with evolving bronchopulmonary dysplasia. J Perinatol. 2021;41: 1783–1796. doi: 10.1038/s41372-021-01083-w
  11. Onland W, Debray TP, Laughon MM, Miedema M, Cools F, Askie LM, et al. Clinical prediction models for bronchopulmonary dysplasia: a systematic review and external validation study. BMC Pediatr. 2013;13: 207. doi: 10.1186/1471-2431-13-207
  12. Laughon MM, Langer JC, Bose CL, Smith PB, Ambalavanan N, Kennedy KA, et al. Prediction of bronchopulmonary dysplasia by postnatal age in extremely premature infants. Am J Respir Crit Care Med. 2011;183: 1715–1722. doi: 10.1164/rccm.201101-0055OC
  13. Laughon MM, Langer JC, Bose CL, Smith PB, Ambalavanan N, Kennedy KA, et al. NICHD Neonatal Research Network Neonatal BPD Outcome Estimator. 2011 [cited 10 May 2021]. https://neonatal.rti.org/index.cfm
  14. Cuna A, Liu C, Govindarajan S, Queen M, Dai H, Truog WE. Usefulness of an Online Risk Estimator for Bronchopulmonary Dysplasia in Predicting Corticosteroid Treatment in Infants Born Preterm. J Pediatr. 2018;197: 23–28.e2. doi: 10.1016/j.jpeds.2018.01.065
  15. Whitehead HV, McPherson CC, Vesoulis ZA, Cohlan BA, Rao R, Warner BB, et al. The Challenge of Risk Stratification of Infants Born Preterm in the Setting of Competing and Disparate Healthcare Outcomes. J Pediatr. 2020;223: 194–196. doi: 10.1016/j.jpeds.2020.04.043
  16. Hansen TP, Oschman A, Pallotto EK, Palmer R, Younger D, Cuna A. Using quality improvement to implement consensus guidelines for postnatal steroid treatment of preterm infants with developing bronchopulmonary dysplasia. J Perinatol. 2021;41: 891–897. doi: 10.1038/s41372-020-00862-1
  17. Ensembles of Caret Models [R package caretEnsemble version 2.0.1]. 2019 [cited 11 May 2021]. https://cran.r-project.org/package=caretEnsemble
  18. Kuhn M. Classification and Regression Training [R package caret version 6.0-86]. 2020 [cited 2 May 2021]. https://CRAN.R-project.org/package=caret
  19. Patzer RE, Kaji AH, Fong Y. TRIPOD Reporting Guidelines for Diagnostic and Prognostic Studies. JAMA Surg. 2021;156: 675–676. doi: 10.1001/jamasurg.2021.0537
  20. Jensen EA, Dysart K, Gantz MG, McDonald S, Bamat NA, Keszler M, et al. The Diagnosis of Bronchopulmonary Dysplasia in Very Preterm Infants. An Evidence-based Approach. Am J Respir Crit Care Med. 2019;200: 751–759. doi: 10.1164/rccm.201812-2348OC
  21. Chou F-S. Assessment of social factors influencing hospitalization cost of US preterm newborns, 2016. J Matern Fetal Neonatal Med. 2020; 1–9. doi: 10.1080/14767058.2020.1776252

Decision Letter 0

Jianhong Zhou

9 Dec 2021

PONE-D-21-16900

Developing a machine learning-based tool to extend the usability of the NICHD BPD Outcome Estimator to the Asian population

PLOS ONE

Dear Dr. Chou,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jan 22 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Jianhong Zhou

Associate Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please include your full ethics statement in the manuscript Methods.

3. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

4. Please include your full ethics statement in the ‘Methods’ section of your manuscript file. In your statement, please include the full name of the IRB or ethics committee who approved or waived your study, as well as whether or not you obtained informed written or verbal consent. If consent was waived for your study, please include this information in your statement as well.

5. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The article "Developing a machine learning-based tool to extend the usability of the NICHD Outcome Calculator to the Asian population," by Sandhu et al is an interesting application of the BPD Estimator to a local population of infants of Asian descent. Given disparities in outcomes based on racial demographics in infants with BPD, this idea is novel and may be useful in counseling families of Asian descent who have premature infants at risk for BPD. The developed model seems to have good predictive capability at 36 and 37 weeks', but there are some general concerns the authors should address.

1) Abstract: The objective of the study seems to be to apply the Estimator to an Asian demographic and evaluate its performance. However, the hypothesis and goal are unclear and should be revised.

2) Introduction: In general, when making statements about the incidence of BPD or inflammation as an etiology of BPD etc, references should be stated. Many BPD experts would argue about inflammation as being the most common cause of BPD, as it is largely considered to be of multi-factorial origin. Additionally, the authors claim the most significant intervention in preventing mortality and the incidence of BPD are systemic steroids. Although there is some evidence to support this claim, there was also a large NICHD study (Watterburg et al) that does not support this claim. On the whole, the authors should focus the introduction more on what is known about predictors of BPD in an Asian population, and the overall incidence in this group, to support their hypothesis more clearly.

Methods: There should be a statement regarding approval by an Institutional Review Board.

The methods are a little unclear. It seems that the Asian patient population was entered into the BPD Estimator using surrogate races, and that the prediction of the calculator was then compared to the observed outcomes at 36, 37 and 40 weeks'? The authors should clarify the approach some in the Methods section. How Asian descent was determined is an important factor in this study.

Results: The validity of the model is enhanced by having a training and testing set. It could be further strengthened by testing it on an independent data set outside the institution if that is available. It may be interesting to evaluate the performance of the model based on the surrogate race/ethnicity used as input. For example, was there one race/ethnicity that is already in the calculator that performed well in the Asian descent population?

Discussion: The tool included in the Discussion is useful. The authors should address how to apply this in a clinical setting. They should also include confidence limits around the point estimate.

Reviewer #2: This interesting study assessed whether outcome prediction for Asian preterm born infants is possible using information hidden in prediction results for the other major racial groups in the NICHD BPD Outcome Estimator. Prediction of BPD development is an important topic, and investigating whether the risk can be predicted in infants of a descent other than those currently offered in the Estimator is interesting.

After reading the manuscript, I have some additional questions and comments.

1. In the methods section the authors described a retrospective study with prospective data collection, although at the end they describe that retrospective data collection was approved by the institutional review board. So it is unclear whether the data were collected prospectively or retrospectively. And were any exclusion criteria used for this study?

2. In the methods section the authors describe the machine learning-based model development. First of all, it is good that the authors have performed internal validation (train and test set) as part of model development; however, no information is presented on whether the authors performed shrinkage of the predictor weights or regression coefficients. In addition, it is not clear which specific variables were used for model development. Are only the five severity categories used? Or also the demographic data on GA, BW, respiratory support, etc.? And if these demographic data are used, from which timepoint are they?

3. In the methods section no information is presented about the use of any guidelines for the development of prediction models. And no information is presented about how the authors dealt with missing data.

4. In the results section the authors described the inclusion of 842 premature newborns with a gestational age equal or less to 30 weeks and 6 days, although the inclusion criteria for this study described in the methods section, stated GA 30 weeks or less.

5. There is no information about how many infants developed BPD in the study cohort.

6. Table 3 missed information about which variables are included in the model.

7. In the results section information about the comparison of the outcomes from the machine-based learning models and the outcomes from the NICHD BPD Outcome Estimator is missing. In my opinion this is necessary to underlie the conclusion that extending the use of the Estimator to Asian population is feasible. In addition, no information is presented regarding the final predictor weights?

8. It can be suggested to add a calibration plot of model performances from both the train and test model in the results section.

9. From the discussion, no clear conclusion is presented. Only in the Abstract, a clear conclusion is presented.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Michelle Romijn

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Sep 16;17(9):e0272709. doi: 10.1371/journal.pone.0272709.r002

Author response to Decision Letter 0


20 Jan 2022

Reviewer #1: The article "Developing a machine learning-based tool to extend the usability of the NICHD Outcome Calculator to the Asian population," by Sandhu et al is an interesting application of the BPD Estimator to a local population of infants of Asian descent. Given disparities in outcomes based on racial demographics in infants with BPD, this idea is novel and may be useful in counseling families of Asian descent who have premature infants at risk for BPD. The developed model seems to have good predictive capability at 36 and 37 weeks', but there are some general concerns the authors should address.

1) Abstract: The objective of the study seems to be to apply the Estimator to an Asian demographic and evaluate its performance. However, the hypothesis and goal are unclear and should be revised.

Response: We thank the reviewer for pointing out the deficiency in the abstract. We have revised it to clarify the hypothesis and goal of the study. Please refer to the revised manuscript.

2) Introduction: In general, when making statements about the incidence of BPD or inflammation as an etiology of BPD etc, references should be stated. Many BPD experts would argue about inflammation as being the most common cause of BPD, as it is largely considered to be of multi-factorial origin.

Response: We thank the reviewer for pointing out that BPD is a multifactorial disease, with which we agree. We have updated this part of the introduction (Paragraphs 1 & 2) and have also included supporting references.

Additionally, the authors claim the most significant intervention in preventing mortality and the incidence of BPD are systemic steroids. Although there is some evidence to support this claim, there was also a large NICHD study (Watterburg et al) that does not support this claim. On the whole, the authors should focus the introduction more on what is known about predictors of BPD in an Asian population, and the overall incidence in this group, to support their hypothesis more clearly.

Response: We have revised the Introduction section to put more emphasis on the lack of a prediction tool for the Asian population. Please refer to Page 5 of the revised manuscript.

3) Methods: There should be a statement regarding approval by an Institutional Review Board.

Response: We thank the reviewer for pointing out the deficiency. We have added the statement to the Methods section. Please refer to Page 6 of the revised manuscript.

The methods are a little unclear. It seems that the Asian patient population was entered into the BPD Estimator using surrogate races, and that the prediction of the calculator was then compared to the observed outcomes at 36, 37 and 40 weeks'? The authors should clarify the approach some in the Methods section. How Asian descent was determined is an important factor in this study.

Response: The reviewer’s understanding of the methodology was correct. We agree that we could have been clearer in describing our methods. Asian descent was identified from the facesheet available in the electronic medical records. We have edited the Methods section to provide more clarity. Please refer to Pages 6 & 7 of the revised manuscript.

4) Results: The validity of the model is enhanced by having a training and testing set. It could be further strengthened by testing it on an independent data set outside the institution if that is available. It may be interesting to evaluate the performance of the model based on the surrogate race/ethnicity used as input. For example, was there one race/ethnicity that is already in the calculator that performed well in the Asian descent population?

Response: Publication of this project will strengthen our case for obtaining funding and access to a multi-center de-identified research data warehouse to upscale the study. We did not specifically examine the data to answer the question posed by the reviewer. We felt that, since race/ethnicity is a social determinant, it would be very difficult to interpret the findings and to provide a sound biological explanation if one specific race/ethnicity performed well in predicting BPD outcome for Asian infants. Please forgive us for not being able to provide a more satisfying answer given the sensitive nature of the race/ethnicity topic.

Discussion: The tool included in the Discussion is useful. The authors should address how to apply this in a clinical setting. They should also include confidence limits around the point estimate.

Response: We added a disclaimer to the web app and also revised the Discussion paragraph to emphasize that the web app developed for this project is meant for demonstration purposes and should not be used to guide clinical decision-making due to the lack of external validation. We added the 95% confidence intervals for accuracy. To calculate the receiver operating characteristic area under the curve (ROCAUC), we went back and used the predicted probabilities (instead of the AUC based on one single cutoff value as shown previously). The generalizability for the CGA 40-week respiratory outcome is still much poorer than that for CGA 36 and 37 weeks. Please refer to the revised Table 3.
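The difference between an AUC taken from a single cutoff and one computed from the predicted probabilities can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the toy labels and scores are hypothetical, and the probability-based AUC is written here as the fraction of positive/negative pairs that are ranked correctly (ties count one half), which is one standard way to compute it.

```python
def auc_from_probabilities(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(pos) * len(neg))


def auc_from_single_cutoff(labels, scores, cutoff=0.5):
    """Area under the one-point ROC curve: (sensitivity + specificity) / 2."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    return (tp / n_pos + tn / n_neg) / 2.0


# Toy data, invented for illustration only (not study data).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.60, 0.35, 0.80]

auc_prob = auc_from_probabilities(toy_labels, toy_scores)
auc_cut = auc_from_single_cutoff(toy_labels, toy_scores, cutoff=0.5)
print(auc_prob, auc_cut)
```

On this toy input the two numbers differ because the single-cutoff version discards the ranking information that lies on either side of the threshold.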

Reviewer #2: This interesting study assessed whether outcome prediction for Asian preterm born infants is possible using information hidden in prediction results for the other major racial groups in the NICHD BPD Outcome Estimator. Prediction of BPD development is an important topic, and investigating whether the risk can be predicted in infants of a descent other than those currently offered in the Estimator is interesting.

After reading the manuscript, I have some additional questions and comments.

1. In the methods section the authors described a retrospective study with prospective data collection, although at the end they describe that retrospective data collection was approved by the institutional review board. So it is unclear whether the data was collected prospectively or retrospectively?

Response: Thank you for pointing this out. We conducted a retrospective study. Data were collected in retrospect. We have updated the manuscript.

Were there any exclusion criteria used for this study?

Response: The exclusion criteria were those that did not meet GA and birth weight criteria, and those that were not designated as Asian descent on the facesheet. This was a single-center pilot study so we wanted to be as inclusive as possible. We have updated and clarified our inclusion and exclusion criteria in our methods.

2. In the methods section the authors describe the machine learning-based model development. First of all, it is good that the authors have performed internal validation (train and test set) as part of model development; however, no information is presented on whether the authors performed shrinkage of the predictor weights or regression coefficients.

Response: We thank the reviewer for the comment. We believe the reviewer is referring to the elastic net or other related linear algorithms, where coefficient weights were assessed by the machine. The random forest is a decision tree-based algorithm, and the support vector machine is a distance-based approach in a hyperspace. Assessment of shrinkage of the predictor weights, in our understanding, is not applicable to these algorithms.

In addition, it is not clear which specific variables were used for model development. Are only the five severity categories used? Or also the demographic data on GA, BW, respiratory support, etc.? And if these demographic data are used, from which timepoint are they?

Response: The assumption for the model development is that the five probability scores (one per severity category) for each surrogate race/ethnicity assignment already incorporate the demographic and respiratory support information. Therefore, only probability scores (a total of 15, from 5 severity categories with 3 surrogate races/ethnicities) were used as features for model development. We have revised the manuscript for better clarity. Please refer to Page 7 of the revised manuscript.
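As a rough illustration of how 5 severity categories and 3 surrogate race/ethnicity assignments yield 15 features per infant, the following Python sketch flattens one set of Estimator outputs into a single row. The severity labels, the dictionary layout, the helper names, and the probabilities are all assumptions made for this example; only the surrogate groups (White, Black, Hispanic) and the 5 x 3 count are stated in the correspondence.

```python
from itertools import product

# Surrogate groups named in the abstract; lower-cased keys are an assumption.
SURROGATES = ("white", "black", "hispanic")
# Placeholder labels: the letter only says "five severity categories".
SEVERITIES = ("none", "mild", "moderate", "severe", "death")


def feature_names():
    """Column names in a fixed surrogate-major order, e.g. 'white_none'."""
    return [f"{s}_{c}" for s, c in product(SURROGATES, SEVERITIES)]


def build_features(estimator_output):
    """Flatten {surrogate: {severity: probability}} into a 15-value row."""
    row = []
    for surrogate, severity in product(SURROGATES, SEVERITIES):
        p = estimator_output[surrogate][severity]
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {surrogate}/{severity}={p}")
        row.append(p)
    return row


# Invented probabilities for one hypothetical infant (not study data).
example = {
    "white":    {"none": 0.30, "mild": 0.30, "moderate": 0.20, "severe": 0.10, "death": 0.10},
    "black":    {"none": 0.35, "mild": 0.30, "moderate": 0.15, "severe": 0.10, "death": 0.10},
    "hispanic": {"none": 0.25, "mild": 0.30, "moderate": 0.20, "severe": 0.15, "death": 0.10},
}

row = build_features(example)
print(dict(zip(feature_names(), row)))
```

A row built this way could be passed to any classifier; the correspondence names a random forest, but no model is fitted in this sketch.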

3. In the methods section no information is presented about the usage of any guidelines for development of prediction models?

Response: Thank you for pointing this out. We followed the TRIPOD guidelines checklist for prediction model development and validation. We have updated Methods to include this detail.

And no information is presented about how the authors dealt with missing data?

Response: The probability scores generated from the BPD Estimator were used as predictors for model development; therefore, there were no missing data. If the birth weight fell outside the range allowed in the Estimator based on the other demographic data (but still within 500-1,250 g), we used the closest allowed weight. Babies who did not meet our inclusion criteria for gestational age and babies who did not have documented Asian race were excluded.
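The "closest weight" rule amounts to clamping the birth weight to whatever range the Estimator accepts for that infant. A minimal Python sketch follows; the function name and the 600-1,000 g allowed range in the usage lines are hypothetical, since the letter does not list the Estimator's per-infant limits, while the 500-1,250 g bound is the study range quoted in the response.

```python
STUDY_MIN_G = 500    # study-wide birth weight bounds quoted in the response
STUDY_MAX_G = 1250


def closest_allowed_weight(birth_weight_g, allowed_min_g, allowed_max_g):
    """Clamp a birth weight to the range the Estimator accepts for this infant.

    allowed_min_g / allowed_max_g are hypothetical inputs standing in for the
    limits the Estimator applies given the other demographic data.
    """
    if allowed_min_g > allowed_max_g:
        raise ValueError("allowed range is empty")
    if not STUDY_MIN_G <= birth_weight_g <= STUDY_MAX_G:
        raise ValueError("birth weight outside the 500-1,250 g study range")
    return min(max(birth_weight_g, allowed_min_g), allowed_max_g)


# Hypothetical allowed range of 600-1,000 g for some combination of inputs.
too_heavy = closest_allowed_weight(1100, 600, 1000)
too_light = closest_allowed_weight(550, 600, 1000)
in_range = closest_allowed_weight(800, 600, 1000)
print(too_heavy, too_light, in_range)
```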

4. In the results section the authors described the inclusion of 842 premature newborns with a gestational age equal or less to 30 weeks and 6 days, although the inclusion criteria for this study described in the methods section, stated GA 30 weeks or less.

Response: Thank you for pointing this out. We have corrected our description in Methods and Results to consistently state GA 30 weeks or less.

5. There is no information about how many infants developed BPD in the study cohort.

Response: We have updated our Results section to further clarify these details. BPD is generally diagnosed when an infant requires continued respiratory support at 36 weeks CGA. We provided a breakdown of the types of respiratory support at 36, 37, and 40 weeks CGA by partially following the BPD grading system of Jensen et al (2019). Please refer to Table 2.

6. Table 3 missed information about which variables are included in the model.

Response: We only used the probability scores as input. We have updated the table. Please refer to Page 14 of the revised manuscript.

7. In the results section information about the comparison of the outcomes from the machine-based learning models and the outcomes from the NICHD BPD Outcome Estimator is missing. In my opinion this is necessary to underlie the conclusion that extending the use of the Estimator to Asian population is feasible.

Response: It is not possible to compare outcomes from the Estimator to the machine-based learning models as the NICHD BPD Outcome Estimator does not apply to infants of Asian descent, hence we developed this pilot project to address this deficiency of the Estimator.

In addition, no information is presented regarding the final predictor weights?

Response: In this single-center project with a relatively small sample size, we felt that presenting variable importance scores to the readers would create confusion, and the relative importance of the probability scores cannot be explained in biological terms. We would like to request permission from the reviewer to not present predictor weights unless a reader requests the information in written correspondence after publication.

8. It can be suggested to add a calibration plot of model performances from both the train and test model in the results section.

Response: It is our understanding that calibration curves require sufficiently large samples, hence we didn't find it particularly useful for this single-center pilot study.
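For readers unfamiliar with the calibration check the reviewer suggested, the Python sketch below shows the table that underlies such a plot: predictions are grouped into equal-width probability bins, and the mean predicted probability in each bin is compared with the observed event rate. It is a generic illustration, not something the authors ran; the function name, bin count, and toy data are assumptions.

```python
def calibration_table(labels, probs, n_bins=5):
    """Return (bin_index, n, mean_predicted, observed_rate) for non-empty bins."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    table = []
    for i, members in enumerate(bins):
        if not members:
            continue  # empty bins carry no information
        n = len(members)
        mean_pred = sum(p for _, p in members) / n
        observed = sum(y for y, _ in members) / n
        table.append((i, n, mean_pred, observed))
    return table


# Toy data, invented for illustration only (not study data).
toy_labels = [0, 0, 1, 1, 1]
toy_probs = [0.10, 0.15, 0.55, 0.90, 0.95]

table = calibration_table(toy_labels, toy_probs, n_bins=5)
for bin_index, n, mean_pred, observed in table:
    print(bin_index, n, round(mean_pred, 3), round(observed, 3))
```

The per-bin counts make the sample-size concern concrete: with few infants in a bin, the observed rate in that bin is a noisy estimate.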

9. From the discussion, no clear conclusion is presented. Only in the Abstract, a clear conclusion is presented.

Response: We added a conclusion statement to the end of Discussion. Please refer to the revised manuscript.

Attachment

Submitted filename: Response to reviewers.docx

Decision Letter 1

Jose Palma

2 May 2022

PONE-D-21-16900R1

Developing a machine learning-based tool to extend the usability of the NICHD BPD Outcome Estimator to the Asian population

PLOS ONE

Dear Dr. Chou,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 16 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Jose Palma, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have done a good job at addressing most Reviewer comments. However, there are still a few minor points that need to be addressed.

Abstract:

1. The sentence that the Estimator does not apply to infants of Asian descent should be modified to say that is not known if it can be applied (I think this is the hypothesis)?

2. Is the hypothesis that the respiratory outcomes are interconnected between respiratory groups, or that the Estimator can be safely used to predict outcomes in infants of Asian descent? I think it is the latter? There is a grammatical error in this sentence.

Introduction

1. Neonatology on Page 4 does not need to be capitalized.

2. There are grammatical errors in the Introduction.

3. The hypothesis needs to be clarified (as stated for the Abstract).

Results

1. It may be important to know how many infants had missing race information on the facesheet as that may introduce some sampling bias.

2. On page 10, the word “had” should be removed from the sentence “At 37 and 40 weeks, CGA …”

Discussion

1. The national pediatric hospitalization dataset should be capitalized if it is a specific dataset that is being named.

2. The statement “The use of LFNC indicates…residual resolving pulmonary hypertension associated with BPD” is misleading. Level of respiratory support alone is not used to determine PH risk.

3. The authors should carefully proofread the manuscript and correct grammatical errors prior to acceptance.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Sep 16;17(9):e0272709. doi: 10.1371/journal.pone.0272709.r004

Author response to Decision Letter 1


5 May 2022

Abstract:

1. The sentence that the Estimator does not apply to infants of Asian descent should be modified to say that is not known if it can be applied (I think this is the hypothesis)?

Sorry for the confusion. The Estimator cannot be applied to infants of Asian descent because of the limited choices in the Race/Ethnicity pull-down menu. We changed the sentence to “However, the estimator does not have an option in its pull-down menu for infants of Asian descent.” Please refer to the revised manuscript.

2. Is the hypothesis that the respiratory outcomes are interconnected between respiratory groups, or that the Estimator can be safely used to predict outcomes in infants of Asian descent? I think it is the latter? There is a grammatical error in this sentence.

The grammatical error has been corrected. Thank you for pointing this out.

Acknowledging that prediction model development projects are not traditional hypothesis-driven research studies, we tried to rephrase the presumption in the model design as a hypothesis but did not manage to do so well. Thank you for wording it so elegantly for us. We hope you don’t mind us using both of your statements in our revised manuscript.

Introduction

1. Neonatology on Page 4 does not need to be capitalized.

Corrected.

2. There are grammatical errors in the Introduction.

We have revised parts of the Introduction section and have had the grammar checked by a native English speaker who is also a biomedical researcher. Please kindly review the revised manuscript and let us know if more work is still needed.

3. The hypothesis needs to be clarified (as stated for the Abstract).

Updated in the revised manuscript.

Results

1. It may be important to know how many infants had missing race information on the facesheet as that may introduce some sampling bias.

This has been updated in the revised manuscript.

2. On page 10, the word “had” should be removed from the sentence “At 37 and 40 weeks, CGA …”

Corrected. Thank you for pointing this out.

Discussion

1. The national pediatric hospitalization dataset should be capitalized if it is a specific dataset that is being named.

The name of the database is the Kids’ Inpatient Database. It has been added to the revised manuscript and capitalized.

2. The statement “The use of LFNC indicates…residual resolving pulmonary hypertension associated with BPD” is misleading. Level of respiratory support alone is not used to determine PH risk.

We changed “indicates” to “may be associated with” in the revised manuscript. Please let us know whether this wording is adequate.

3. The authors should carefully proofread the manuscript and correct grammatical errors prior to acceptance.

We have sought a medical writing expert to assist with proofreading. Please kindly review the revised manuscript and advise whether more work is still needed.

Decision Letter 2

Jose Palma

26 Jul 2022

Developing a machine learning-based tool to extend the usability of the NICHD BPD Outcome Estimator to the Asian population

PONE-D-21-16900R2

Dear Dr. Chou,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Jose Palma, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: The issues raised by the reviewer have been adequately addressed and the article is now ready for publication.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

**********

Acceptance letter

Jose Palma

8 Sep 2022

PONE-D-21-16900R2

Developing a machine learning-based tool to extend the usability of the NICHD BPD Outcome Estimator to the Asian population

Dear Dr. Chou:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Mr. Jose Palma

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 Dataset

    (CSV)

    S1 Table. Model performance comparisons.

    (DOCX)

    Attachment

    Submitted filename: Response to reviewers.docx

    Data Availability Statement

    All relevant data are within the paper and its Supporting information files.


