PLOS One. 2020 Jun 30;15(6):e0235017. doi: 10.1371/journal.pone.0235017

Machine learning prediction of combat basic training injury from 3D body shape images

Steven Morse 1, Kevin Talty 1, Patrick Kuiper 1, Michael Scioletti 1, Steven B Heymsfield 2, Richard L Atkinson 3, Diana M Thomas 1,*
Editor: Ulas Bagci
PMCID: PMC7326186  PMID: 32603356

Abstract

Introduction

Athletes and military personnel are both at risk of disabling injuries due to extreme physical activity. A method to predict which individuals might be more susceptible to injury would be valuable, especially in the military where basic recruits may be discharged from service due to injury. We postulate that certain body characteristics may be used to predict risk of injury with physical activity.

Methods

US Army basic training recruits between the ages of 17 and 21 (N = 17,680, 28% female) were scanned for uniform fitting using the 3D body imaging scanner, Human Solutions of North America at Fort Jackson, SC. From the 3D body imaging scans, a database consisting of 161 anthropometric measurements per basic training recruit was used to predict the probability of discharge from the US Army due to injury. Predictions were made using logistic regression, random forest, and artificial neural network (ANN) models. Model comparison was done using the area under the curve (AUC) of a ROC curve.

Results

The ANN model outperformed the two other models (ANN AUC = 0.70 [0.68, 0.72]; logistic regression AUC = 0.67 [0.62, 0.72]; random forest AUC = 0.65 [0.61, 0.70]).

Conclusions

Body shape profiles generated from three-dimensional body scan imaging of military personnel predicted dischargeable physical injury. The ANN model can be programmed into the scanner to deliver instantaneous predictions of risk, which may provide an opportunity to intervene to prevent injury.

Introduction

The United States Army (US Army) anticipates basic training recruits (BTRs) will be injured during training. Most of these injuries will heal with rest, but some are more severe, leading to expensive treatments and medical separation from the US Army [1]. Injuries, such as stress fractures [2, 3], can result in discharges from the US Army basic combat training. Femoral neck injuries are also observed during basic training and can often result in discharge [4].

These injuries, some of which are situational and others of which are stress-related (or “overuse injuries” [5]), have high medical costs and can impact the future quality of life of the BTR. Additionally, under the US Army’s medical policy, the US Army may be financially responsible for the injured BTR’s long-term care. Thus, identifying BTRs at increased risk for dischargeable injury prior to training is critical [6–8].

Low physical fitness, low and high body mass index (BMI), and anthropometry are some of the well-established predictors of Army-related injuries [6, 9–14]. Injury risk categories defined by high or low BMI alone, however, would flag too many false positives to serve as a feasible screening mechanism. Screening BTRs with additional manually obtained measurements is not currently feasible, in part due to the added burden of measuring predictive model input variables and then delivering model predictions quickly and efficiently for the high volume of BTRs who continuously arrive at basic training sites like Fort Jackson, SC.

Recently, for the purpose of personalized uniform fitting, Fort Jackson adopted a 3D whole body scanner that provides 161 anthropometric measurements of the body [15–17]. The device images each individual recruit within seconds using laser technology and multiple cameras. The raw image data is automatically transformed into reproducible and highly precise body posture, length, and circumference measurements. In this study we utilized Fort Jackson 3D body image data, which included records of individuals who were eventually separated from the service due to injury, to develop machine learning models that identify body shape characteristics correlated with injury. The best of these models was programmed into the body scanner, which allows for automatic deployment of risk predictions during uniform sizing without additional burden to the current process. With the advancement of 3D body image scanning technology, our approach can be extended to other sports that observe similar injuries.

Methods

Study oversight and ethics

The models were developed as a secondary analysis of a large de-identified dataset. The protocol for this study was determined as not constituting human subjects research by the United States Military Academy Institutional Review Board.

Study design

We tested the hypothesis that BTRs who are discharged due to injury can be identified from body shape measurements obtained from a 3D body image scanner. We developed logistic regression, random forest, and artificial neural network models [18] that use 161 different individual anthropometric measurements as inputs and output a probability of sustaining a dischargeable injury.

For all model development and analysis, we used the Python programming language (Python Software Foundation. Python Language Reference, version 2.7, https://www.python.org/).

Human Solutions 3D body image scans

Three-dimensional scans were performed on BTRs using the 3D body imaging scanner, Human Solutions of North America (Mooresville, NC) Vitus Smart XXL 3D. Each scan took less than 20 seconds to capture a body surface image from which the machine’s software automatically delivers 161 different measurements. Previous research has demonstrated an increased accuracy of these automated results in comparison with manually obtained measurements [15, 17].

Participants

US Army BTRs (N = 17,680, 27.5% female) were scanned for uniform fitting using the 3D body imaging scanner, Human Solutions of North America (Mooresville, NC) Vitus Smart XXL 3D body scanner at Fort Jackson, SC from February 1, 2017 to October 27, 2017.

The data from these scans were paired with the discharge status of each individual BTR who sustained an injury during the 11-week basic training program (N = 147). This consolidated dataset was de-identified, and data on the nature or location of the injury were not accessible for this study.

Fort Jackson currently trains 50% of all individuals entering the US Army’s basic training and 60% of all females entering the US Army each year [19]. Although age was not available in the database, basic training recruits are generally between 17 and 21 years of age. Individual race/ethnicity information was also not available; however, a 2010 report on overall basic training recruit demographics published total recruit percentages of 63.6% white, 18.9% black, 4.8% Asian, 0.8% Native American, and 11.9% Hispanic [20].

The protocol for this study was determined as not constituting human subjects research by the United States Military Academy Institutional Review Board (#18–020).

Data preparation

BTRs were provided protocols for scanning; however, recruits were not supervised for correct positioning while inside the scanner. As a result, some records were missing measurements or contained physiologically implausible measurements, and inspection of the images for such records confirmed that those BTRs did not follow the positioning protocol. To account for this measurement error, we removed any record containing five or more missing measurements (2,481 records). To account for implausible measurements, we removed any record whose paired measurements on the left and right side of the body (e.g., left and right leg) differed by more than 2 standard deviations (2,214 records).
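The two filtering rules above can be sketched with pandas on synthetic data. The column names, number of columns, and simulated errors below are illustrative stand-ins; the real records carry 161 scanner measurements.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
cols = ["left_leg", "right_leg", "left_arm", "right_arm", "waist", "chest"]
# Hypothetical toy records standing in for the 161-measurement scans
df = pd.DataFrame(rng.normal(80, 3, size=(500, 6)), columns=cols)
df.iloc[:10] = np.nan                 # simulate badly positioned scans: all fields missing
df.loc[490:, "left_leg"] += 50        # simulate implausible left/right asymmetry

# Rule 1: drop records with five or more missing measurements
keep_missing = df.isna().sum(axis=1) < 5

# Rule 2: drop records whose paired left/right measurements differ
# by more than 2 standard deviations of the paired difference
diff = df["left_leg"] - df["right_leg"]
keep_paired = (diff.abs() <= 2 * diff.std()).fillna(False)

clean = df[keep_missing & keep_paired]
```

The cleaned frame drops both the all-missing rows and the asymmetric rows, mirroring the two-stage reduction described above.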

The final reference database contained 12,985 (25.1% female) records with 97 of these participants sustaining an injury that resulted in medical separation (0.7%). A workflow diagram similar to the cross-industry standard process for data mining protocol [21] describing the data preparation process appears in Fig 1.

Fig 1. Work-flow diagram describing the data preparation to model evaluation process.

Balancing the injured and non-injured data

The ratio of individuals that were injury free to individuals that sustained a medical separation injury in our dataset was approximately 133:1. With such an imbalanced ratio of negative to positive outcomes, a constant model that predicts zero injury would be over 99% accurate. In order to derive meaningful models that predict which individuals are at higher risk for medical separation due to injury we need to (1) amplify the signal from the injured class in order for our models to learn the relevant features, and (2) use measures other than accuracy for evaluating out-of-sample performance.

To address the first concern, we oversampled the injured cohort with replacement in the training dataset until the ratio was 1:1. This enhances the training process's ability to learn features associated with the injured cohort observations. The balancing process employed is equivalent, under certain conditions, to setting different penalties on misclassification of the injured vs. non-injured cohorts, sometimes termed class weighting. We chose to train all models on the oversampled data because class weighting in a neural network trained with stochastic gradient descent can affect the effective learning rate [22]. To address the second concern, evaluating out-of-sample performance, we used a ROC curve, discussed later.
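A minimal sketch of this balancing step, on made-up toy arrays mimicking the roughly 133:1 imbalance (array sizes and feature count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1340, 5))
y = np.zeros(1340, dtype=int)
y[:10] = 1                          # ~133:1 negative-to-positive ratio

pos = np.flatnonzero(y == 1)
neg = np.flatnonzero(y == 0)
# Oversample the injured (positive) class with replacement to a 1:1 ratio
boot = rng.choice(pos, size=len(neg), replace=True)
idx = rng.permutation(np.concatenate([neg, boot]))
X_bal, y_bal = X[idx], y[idx]
```

After resampling, the two classes contribute equally to every gradient step, which is the signal amplification the paragraph above describes.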

Feature engineering

We explored several techniques for reducing the dimensionality and collinearity of the dataset and adding meaningful structure. We first reduced the dimension and collinearity of the data by replacing each paired body measurement (e.g. left and right arm length) with its average value. We then performed a k-means clustering [18] of the entire dataset, retaining each record’s cluster assignment as an additional explanatory variable using what is referred to as a one-hot encoding scheme [22]. Each observation’s assigned cluster membership was included as a feature in the predictive models.
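The pair-averaging and cluster-membership features can be sketched as follows. The feature counts here are illustrative (the real data has roughly 35 left/right pairs); the one-hot encoding of k-means labels matches the scheme described above.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Illustrative paired and unpaired measurements
left = rng.normal(80, 3, size=(300, 2))
right = left + rng.normal(0, 0.5, size=(300, 2))
unpaired = rng.normal(90, 8, size=(300, 3))

averaged = (left + right) / 2.0           # replace each pair with its average
X = np.hstack([averaged, unpaired])       # 5 features instead of 7

# k-means cluster membership, one-hot encoded, appended as extra features
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)
onehot = np.eye(10)[km.labels_]
X_aug = np.hstack([X, onehot])
```

Each record thus gains exactly one active indicator column marking its cluster, which the downstream models can weight like any other feature.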

We also evaluated the use of Principal Component Analysis [18, 22] to reduce the dimensionality of our data prior to model training, which we expected to further improve the quality of our models due to the large amounts of collinearity in the data, although we ultimately discarded this approach, as described in Results.

Predictive models

We evaluated three machine learning predictive models: logistic regression, random forests, and neural networks [18]. We used each model’s out-of-sample prediction performance on injuries resulting in medical discharge as a basis for model comparison. Specifically, we trained and tested each model using a stratified k-fold cross-validation scheme, described in more detail below, and evaluated performance using each model’s mean area under the curve (AUC) of a receiver operating characteristic (ROC) curve. We selected ROC curves to evaluate performance instead of precision-recall curves because they do not rely on the assumption that the out of sample baseline probabilities will be the same as in sample probabilities.
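The motivation for ranking models by AUC rather than raw accuracy can be seen on a toy imbalanced problem (synthetic labels and scores, not our data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
y = np.zeros(2000, dtype=int)
y[:20] = 1                                  # ~1% positives, as in our cohort
scores = rng.random(2000) + 0.6 * y         # a moderately informative model

# A constant "never injured" predictor is 99% accurate but useless
acc_constant = (y == 0).mean()
auc = roc_auc_score(y, scores)              # rank-based, insensitive to base rate
```

Accuracy rewards the degenerate constant model, while AUC reflects how well the scores separate the rare positive class from the negatives, independent of the baseline probability.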

Model selection and comparison

There are two sets of parameters determined during the training process: tuning parameters (sometimes termed hyperparameters [22]) which are set before training, and all other parameters (sometimes termed weights or coefficients [22]) learned during training. For example, the logistic regression model consists of a single hyperparameter controlling regularization penalty of model complexity (in our case the L2-norm of the model parameters) and the regression coefficients, one for each feature, including a bias term. We refer to the selection of hyperparameters as model selection, and the evaluation of different models’ out-of-sample predictive performance as model comparison.

Because of the scarcity of injured outcomes, we did not reserve an independent test data set for model evaluation. We instead used average cross-validation scores to measure out-of-sample prediction accuracy for both model selection and comparison.

For model selection, we used randomized search over a grid of possible hyperparameters, with stratified 3-fold cross-validation at each iteration. For each model we began by creating a grid of possible hyperparameters, then iteratively selected from this grid at random. At each iteration, we evaluated the average out-of-sample performance of the model under the current set of hyperparameters using k-fold cross-validation. Specifically, the entire dataset was partitioned randomly into three sets, or folds [22], each containing an approximately equal number of injured individuals (“stratification”). We oversampled (with replacement) the injured records in each fold to create a 1:1 ratio of positive and negative outcomes. We used a small number of folds to retain a reasonable number of injured records per fold and to minimize the erratic performance that would result from oversampling 5–10 records hundreds of times. Two of the folds were then used to train the model, with the remaining fold used to test it, and this process was repeated three times. The ROC AUC was retained for each test fold, along with the confusion matrix corresponding to the threshold closest to the optimal outcome of zero false positives and a true positive rate of one. We repeated this 3-fold stratified cross-validation procedure twice for each set of hyperparameters and computed the average ROC AUC over all six runs. After completing 100 iterations of this procedure, we selected the set of hyperparameters with the best average validation score.
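The search procedure can be sketched with scikit-learn. This is a simplified variant under stated assumptions: synthetic data, a logistic regression with only the L2 penalty strength as hyperparameter, four random draws instead of 100, and oversampling applied to the training split only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(4)
X = rng.normal(size=(600, 8))
y = (X[:, 0] + rng.normal(0, 2, 600) > 2.2).astype(int)  # rare-ish positives

def oversample(Xtr, ytr, rng):
    """Resample the positive class with replacement to a 1:1 ratio."""
    pos, neg = np.flatnonzero(ytr == 1), np.flatnonzero(ytr == 0)
    boot = rng.choice(pos, size=len(neg), replace=True)
    idx = np.concatenate([neg, boot])
    return Xtr[idx], ytr[idx]

def cv_auc(C, X, y, rng, n_splits=3, repeats=2):
    """Mean ROC AUC over repeated stratified 3-fold CV with oversampling."""
    aucs = []
    for rep in range(repeats):
        skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=rep)
        for tr, te in skf.split(X, y):
            Xb, yb = oversample(X[tr], y[tr], rng)
            model = LogisticRegression(C=C, max_iter=1000).fit(Xb, yb)
            aucs.append(roc_auc_score(y[te], model.predict_proba(X[te])[:, 1]))
    return float(np.mean(aucs))

# Randomized search over a hyperparameter grid (here: the L2 penalty strength C)
grid = 10.0 ** np.arange(-3, 3)
tried = rng.choice(grid, size=4, replace=False)
best_C = max(tried, key=lambda C: cv_auc(C, X, y, rng))
```

Oversampling inside the cross-validation loop, rather than once up front, keeps duplicated positives from leaking between the training and test folds.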

For model comparison, we simply compared each model’s average AUC under the cross-validation scheme outlined above, using the optimal set of hyperparameters. We did not take into account any qualitative or quantitative aspects of the models apart from this predictive ability out-of-sample. As a baseline model, we used BMI and gender in a logistic regression model, using the same model selection and comparison methodology as above.

We used different but standard approaches to investigate feature importance in each model. For logistic regression, we examined the standardized regression coefficients of each feature. For the random forests, we compared the “variable importance” resulting from a comparison of the number of decision trees in which the variable appears, normalized by the associated node impurity decrease. For the neural network, we compared the normalized weights of the input layer.
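The three importance measures can be extracted as sketched below, on synthetic data where feature 2 carries the signal. The scikit-learn attribute names are the library's standard ones; the standardization step is what makes the logistic coefficients comparable across features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
X = rng.normal(size=(400, 6))
y = (X[:, 2] + rng.normal(0, 1, 400) > 0).astype(int)   # feature 2 is the signal
Xs = StandardScaler().fit_transform(X)   # standardize so coefficients are comparable

lr = LogisticRegression(max_iter=1000).fit(Xs, y)
lr_importance = np.abs(lr.coef_[0])                     # standardized coefficients

rf = RandomForestClassifier(n_estimators=30, random_state=0).fit(Xs, y)
rf_importance = rf.feature_importances_                 # impurity-decrease importance

nn = MLPClassifier(hidden_layer_sizes=(12,), activation="tanh",
                   alpha=1.0, max_iter=2000, random_state=0).fit(Xs, y)
nn_importance = np.abs(nn.coefs_[0]).mean(axis=1)       # mean |first-layer weight|
```

All three vectors rank the informative feature highest here, but as the Discussion notes, these measures are not directly comparable across model families.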

Results

Participants

Age and race were not available in the database; however, the majority of BTRs are between 17 and 22 years of age. Participant characteristics in the injured and non-injured classes appear in Table 1. Reported are the breakdowns of the male and female cohorts in the original dataset and in the reference data used for analysis, with observations removed for cases with five or more missing measurements or three or more implausible measurements (i.e., left/right measurements differing by more than 2 standard deviations). In both the male and female cohorts, those discharged from the service were slightly heavier than those who were not.

Table 1. Description of participant characteristics.

The characteristics are provided for the original dataset obtained from the scanner, the reduced dataset after eliminating observations with five or more missing measurements, and the final reference dataset after removing observations with three or more implausible measurements.

Data | N (% female) | Injured | BMI (kg/m2)
Original | 17,680 (27.5%) | Males: 147; Females: 74 | Males: 25.08 ± 3.82; Females: 23.73 ± 2.93; Injured: 25.59 ± 3.99
< 5 missing measurements | 15,199 (25.4%) | Males: 125; Females: 63 | Males: 25.12 ± 3.83; Females: 23.65 ± 2.96; Injured: 25.62 ± 3.97
< 5 missing & < 3 implausible measurements | 12,985 (25.1%) | Males: 97; Females: 51 | Males: 25.46 ± 3.70; Females: 23.68 ± 2.80; Injured: 24.31 ± 3.58

Data is reported as mean ± SD.

Model results

Feature engineering

The dimension reduction scheme of averaging paired measurements reduced the total features from 161 to 126, consisting of 125 body measurements and gender. We retained 10 clusters by directly inspecting how much of the cumulative variance was explained by the addition of each cluster.

Use of principal components for feature extraction resulted in reduced out-of-sample scores in all models, so we ultimately discarded this in favor of the simpler related approach of averaging paired measurements.

Predictive models

The AUCs for all final models were above 0.50 (Table 2), with logistic regression AUC = 0.67 [± 0.06], random forest AUC = 0.65 [± 0.05], and neural network AUC = 0.70 [± 0.02] (Fig 2). The neural network outperformed the other models and yielded a smaller variance in AUC (Fig 2D). All models outperformed a baseline model using only BMI and gender, which achieved an AUC = 0.61 [± 0.05].

Table 2. Model AUC, 95% confidence interval and influential variables.
Model AUC 95% CI Highest weighted model variables
Logistic Regression 0.67 [0.62, 0.72] Head circumference, torso length
Random Forest 0.65 [0.61, 0.70] Leg length, Torso length, ankle circumference
Neural Network 0.70 [0.68, 0.72] Torso length

Influential variables identified using standardized coefficients (logistic regression), node impurity decrease (Random Forest), or mean first layer absolute weight (neural network).

Fig 2. ROC curves for logistic regression (Panel A), random forest (Panel B) and neural network (Panel C) models over repeated, stratified, 3-fold cross-validation. The solid curve represents the mean ROC curve. AUCs for each model were 0.67 ± 0.05, 0.65 ± 0.04, and 0.70 ± 0.02, respectively. Panel D plots the AUC for each model ± 95% confidence interval.

For the neural network, we also report the confusion matrix resulting from summing across all three folds for one iteration of cross-validation (see S1 Table). The overall true positive rate is 69% and false positive rate is 35%, using an average threshold of 0.45.

The hyperparameters and model architectures selected through cross-validation are as follows. For logistic regression, we selected an L2-penalty with strong regularization. For the random forest, we selected 30 base estimators with a maximum of 12 features in use per tree. For the neural network, we selected one hidden layer of 12 neurons, with tanh activation, and strong L2-regularization.
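The selected configurations can be expressed with scikit-learn equivalents. The paper reports "strong L2-regularization" without exact constants, so the C and alpha values below are illustrative stand-ins, not the fitted values.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Logistic regression: L2 penalty with strong regularization (small C)
logreg = LogisticRegression(penalty="l2", C=0.01, max_iter=1000)

# Random forest: 30 base estimators, at most 12 features considered per split
forest = RandomForestClassifier(n_estimators=30, max_features=12)

# Neural network: one hidden layer of 12 neurons, tanh activation,
# strong L2 regularization (large alpha)
net = MLPClassifier(hidden_layer_sizes=(12,), activation="tanh",
                    alpha=1.0, max_iter=2000)
```

Each object is ready to be fit on the cleaned, oversampled feature matrix with the usual `.fit(X, y)` / `.predict_proba(X)` calls.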

A summary of the variables most important to each model’s predictions, as described in Methods, is given in Table 2. We note that torso length appears in all three models and that all models appear to rely on non-standard measurements available through the laser scanner, but we leave further interpretation for the Discussion.

Discussion

Here, for the first time, we utilize body measurements obtained by a 3D body image scanner to predict injuries of BTRs that result in discharge from the US Army during basic combat training. The model correctly classified Soldiers at risk for dischargeable injuries with a true positive rate of 69% and incorrectly classified Soldiers at a false positive rate of 35% (S1 Table). The algorithm was programmed into the 3D body image scanner and can be used at Fort Jackson to identify at-risk BTRs in real time during the 20-second scan for uniform fitting, offering an opportunity to identify Soldiers at risk for discharge due to injury.

Our work builds upon existing studies that identified risk factors for injury such as gender, low fitness prior to entering basic training, and high or low BMI [23–26]. The findings presented here extend these results by leveraging the predictive accuracy of machine learning techniques and by using improved anthropometric measurements that can be accessed during a 20-second body scan. These advancements lead to personalized predictions that require no measurements beyond those already being routinely collected at the site. These findings also strongly suggest that machine learning models using results from 3D body image scanning could be used to predict injuries in other sports [27, 28] or clinical health outcomes [29].

Model interpretation

The complexity of machine learning models often enables greater predictive ability, but it also increases the difficulty of interpreting the model itself; for example, quantifying the importance of different input features to the model as in Table 2. Logistic regression is a classic technique with well-accepted measures for rigorously quantifying variable importance. Random forests and neural networks have more inscrutable inner structure, making interpretation more difficult. Although we present estimates of each model’s influential variables using standard techniques in the field, these calculations by no means imply that only these variables are important, that they are important in the same way, or that we can draw any biomechanical conclusions from them.

Study limitations

While this study predicted all injuries that resulted in discharge from basic training, data on injury site or type of injury (e.g. situational vs. stress-related) was not available for analysis. With this type of additional information, the physiological mechanisms underlying the injury could be explored further. We expect the correlation between body measurements and injury is greater in stress-related cases, and therefore limiting the data to this subset would increase the predictive ability of the models to detect stress-related injuries.

A potential explanation for our findings is that a BTR arrives at basic training with a subclinical pre-existing injury that only presents itself during physical activity. We were unable to investigate this due to limited information; however, including 3D body scan images in the analysis performed in the epidemiology study conducted at the request of Fort Jackson by the U.S. Army Center for Health Promotion and Preventive Medicine [6] could potentially reveal this mechanistic insight.

Another study limitation involves concerns with the measurement records of BTRs who were not positioned correctly in the scanner. We have access to a second database of Human Solutions measurements collected at Lackland Air Force Base. The measurements from Lackland Air Force Base did not include concomitant outcomes on injury and therefore could not be applied to modeling injury from body shape data. However, the scans at Lackland Air Force Base were obtained under supervision, and in the 64,000 scans performed at Lackland, there was not a single record containing missing or implausible measurements. This discrepancy suggests that, to preserve scan quality, scanner protocols need to be carefully explained and individuals should be supervised during their scan.

Finally, we relied on the pre-programmed anthropometric measurements integrated in Human Solutions. For future work, the raw data imaged by Human Solutions could be applied to develop additional body site measurements specific to stress-related injuries as model inputs.

Our findings can also be triangulated with earlier results that identified risk factors [3]. These earlier results found that BTRs who had low or high BMI, smoked, or were female were at higher risk for injury. A combined classification approach that leverages these earlier results together with the body-scan-derived model stands to improve predictive accuracy beyond what we have found here. The model presented here and such a classification algorithm could be programmed directly into the scanner to identify at-risk BTRs automatically while they are being scanned for uniform sizing. Those flagged at risk could be referred to the base athletic trainers, who could then tailor training protocols to build strength and stamina in such BTRs.

Conclusions

Artificial intelligence models that predict outcomes integrated with new technologies like the 3D body scanner provide new scalable and efficient opportunities to minimize injuries. Commanding officers now have a tool that will help them increase the readiness of their units.

Supporting information

S1 Table. Confusion matrix for the neural network model; the true positive rate (TPR) is 69.3% and the false positive rate (FPR) is 35.2%.

These values are based on the average across all cross-validation test folds using the threshold yielding the optimal TPR and FPR; note that they differ slightly from the TPR and FPR reported in the paper, which are based on the optimal TPR/FPR of the average ROC curves.

(DOCX)

Acknowledgments

The authors have no conflicts of interest to declare. The authors have not received funding for this work. DMT and MS developed this study. SM designed the models, performed the analysis, and wrote the first draft of the study. KT, PK and MS performed additional data analysis. SBH and RLA reviewed the analysis and wrote portions of the manuscript. All authors reviewed and revised multiple manuscript drafts.

We would like to thank LTG (ret) Mark Hertling for bringing our attention to this problem. We would also like to thank Robert Bona from Human Solutions, LTC Jason Pieri, MAJ Brian Kriesel, and the Fort Jackson team for their support and assistance in compiling the data. We also appreciate the feedback given by an anonymous reviewer, which greatly improved the manuscript. The views expressed in this work are those of the authors and do not reflect the official policy or position of the United States Military Academy, Department of the Army, or the Department of Defense. The results of the study are presented clearly, honestly, and without fabrication, falsification, or inappropriate data manipulation. The results of the present study do not constitute endorsement by ACSM.

Data Availability

The data needs to be requested and authorized by release from the United States Army. Requests can be made to the Commanding General at Fort Jackson. Fort Jackson has a contact email which will lead to the CG: usarmy.jackson.93-sig-bde.mbx.atzj-pao@mail.mil Author Diana Thomas will serve as an additional point of contact to facilitate data access. You can reach Dr. Thomas at diana.thomas@westpoint.edu.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1.Swedler DI, Knapik JJ, Williams KW, Grier TL, Jones BH. Risk factors for medical discharge from United States Army Basic Combat Training. Mil Med. 2011;176(10):1104–10. Epub 2011/12/02. 10.7205/milmed-d-10-00451 . [DOI] [PubMed] [Google Scholar]
  • 2.Jones BH, Cowan DN, Knapik JJ. Exercise, training and injuries. Sports Med. 1994;18(3):202–14. Epub 1994/09/01. 10.2165/00007256-199418030-00005 . [DOI] [PubMed] [Google Scholar]
  • 3.Knapik JJ, Canham-Chervak M, Hauret K, Hoedebecke E, Laurin MJ, Cuthie J. Discharges during U.S. Army basic training: injury rates and risk factors. Mil Med. 2001;166(7):641–7. Epub 2001/07/27. . [PubMed] [Google Scholar]
  • 4.Rohena-Quinquilla IR, Rohena-Quinquilla FJ, Scully WF, Evanson JRL. Femoral Neck Stress Injuries: Analysis of 156 Cases in a U.S. Military Population and Proposal of a New MRI Classification System. AJR Am J Roentgenol. 2018;210(3):601–7. Epub 2018/01/18. 10.2214/AJR.17.18639 . [DOI] [PubMed] [Google Scholar]
  • 5.Scott SJ, Feltwell DN, Knapik JJ, Barkley CB, Hauret KG, Bullock SH, et al. A multiple intervention strategy for reducing femoral neck stress injuries and other serious overuse injuries in U.S. Army Basic Combat Training. Mil Med. 2012;177(9):1081–9. Epub 2012/10/03. 10.7205/milmed-d-12-00085 . [DOI] [PubMed] [Google Scholar]
  • 6.Knapik JJ, Cuthie J, Canham M, Hewitson W, Laurin M, Nee M, et al. Injury incidence, injury risk factors, and physical fitness of U.S. Army basic trainees at FT Jackson, South Carolina. Aberdeen Proving Ground, MD: U.S. Army Center for Health Promotion and Preventive Medicine; 1997. [Google Scholar]
  • 7.Knapik JJ. Tools to Assess and Reduce Injury Risk (Part 2). J Spec Oper Med. 17(4):104–8. Epub 2017/12/20. . [DOI] [PubMed] [Google Scholar]
  • 8.Knapik JJ. Tools to Assess and Reduce Injury Risk (Part 1). J Spec Oper Med. 17(3):116–9. Epub 2017/09/15. . [DOI] [PubMed] [Google Scholar]
  • 9.Jones BH, Hauret KG, Dye SK, Hauschild VD, Rossi SP, Richardson MD, et al. Impact of physical fitness and body composition on injury risk among active young adults: A study of Army trainees. J Sci Med Sport. 2017;20 Suppl 4:S17–S22. Epub 2017/10/11. 10.1016/j.jsams.2017.09.015 . [DOI] [PubMed] [Google Scholar]
  • 10.Harwood GE, Rayson MP, Nevill AM. Fitness, performance, and risk of injury in British Army officer cadets. Mil Med. 1999;164(6):428–34. Epub 1999/06/23. . [PubMed] [Google Scholar]
  • 11.Marriott BM, Grumstrup-Scott J. Body Composition and Physical Performance: Applications For the Military Services. Washington (DC)1990. [PubMed] [Google Scholar]
  • 12.Knapik JJ, Reynolds K, Hoedebecke KL. Stress Fractures: Etiology, Epidemiology, Diagnosis, Treatment, and Prevention. J Spec Oper Med. 17(2):120–30. Epub 2017/06/10. . [DOI] [PubMed] [Google Scholar]
  • 13.Knapik J, Reynolds K, Staab J, Vogel JA, Jones B. Injuries associated with strenuous road marching. Mil Med. 1992;157(2):64–7. Epub 1992/02/01. . [PubMed] [Google Scholar]
  • 14.DeFroda SF, Cameron KL, Posner M, Kriz PK, Owens BD. Bone Stress Injuries in the Military: Diagnosis, Management, and Prevention. Am J Orthop (Belle Mead NJ). 2017;46(4):176–83. Epub 2017/09/01. . [PubMed] [Google Scholar]
  • 15.Soileau L, Bautista D, Johnson C, Gao C, Zhang K, Li X, et al. Automated anthropometric phenotyping with novel Kinect-based three-dimensional imaging method: comparison with a reference laser imaging system. Eur J Clin Nutr. 2016;70(4):475–81. Epub 2015/09/17. 10.1038/ejcn.2015.132 . [DOI] [PubMed] [Google Scholar]
  • 16.Koepke N, Zwahlen M, Wells JC, Bender N, Henneberg M, Ruhli FJ, et al. Comparison of 3D laser-based photonic scans and manual anthropometric measurements of body size and shape in a validation study of 123 young Swiss men. PeerJ. 2017;5:e2980 Epub 2017/03/16. 10.7717/peerj.2980 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Kuehnapfel A, Ahnert P, Loeffler M, Broda A, Scholz M. Reliability of 3D laser-based anthropometry and comparison with classical anthropometry. Sci Rep. 2016;6:26672 Epub 2016/05/27. 10.1038/srep26672 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.DeGregory KW, Kuiper P, DeSilvio T, Pleuss JD, Miller R, Roginski JW, et al. A review of machine learning in obesity. Obes Rev. 2018;19(5):668–85. Epub 2018/02/10. 10.1111/obr.12667 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.US Army Training Center, Fort Jackson. http://jackson.armylive.dodlive.mil/about/.
  • 20.Military Recruitment 2010: National Priorities Project; 2011 [June 30, 2019]. https://www.nationalpriorities.org/analysis/2011/military-recruitment-2010/.
  • 21.Rivo E, de la Fuente J, Rivo A, Garcia-Fontan E, Canizares MA, Gil P. Cross-industry standard process for data mining is applicable to the lung cancer surgery domain, improving decision making as well as knowledge and quality management. Clin Transl Oncol. 2012;14(1):73–9. Epub 2012/01/21. 10.1007/s12094-012-0764-8 . [DOI] [PubMed] [Google Scholar]
  • 22.Bishop CM. Pattern recognition and machine learning. New York: Springer; 2006. xx, 738 p. p. [Google Scholar]
  • 23.Hauret KG, Knapik JJ, Lange JL, Heckel HA, Coval DL, Duplessis DH. Outcomes of Fort Jackson’s Physical Training and Rehabilitation Program in army basic combat training: return to training, graduation, and 2-year retention. Mil Med. 2004;169(7):562–7. Epub 2004/08/05. 10.7205/milmed.169.7.562 . [DOI] [PubMed] [Google Scholar]
  • 24.Knapik J, Montain SJ, McGraw S, Grier T, Ely M, Jones BH. Stress fracture risk factors in basic combat training. Int J Sports Med. 2012;33(11):940–6. Epub 2012/07/24. 10.1055/s-0032-1311583 . [DOI] [PubMed] [Google Scholar]
  • 25.Knapik JJ, Canham-Chervak M, Hauret K, Laurin MJ, Hoedebecke E, Craig S, et al. Seasonal variations in injury rates during US Army Basic Combat Training. Ann Occup Hyg. 2002;46(1):15–23. Epub 2002/05/15. 10.1093/annhyg/mef013 . [DOI] [PubMed] [Google Scholar]
  • 26.Knapik JJ, Sharp MA, Canham-Chervak M, Hauret K, Patton JF, Jones BH. Risk factors for training-related injuries among men and women in basic combat training. Med Sci Sports Exerc. 2001;33(6):946–54. Epub 2001/06/19. 10.1097/00005768-200106000-00014 . [DOI] [PubMed] [Google Scholar]
  • 27.Wilder RP, Sethi S. Overuse injuries: tendinopathies, stress fractures, compartment syndrome, and shin splints. Clin Sports Med. 2004;23(1):55–81, vi. Epub 2004/04/06. 10.1016/S0278-5919(03)00085-1 . [DOI] [PubMed] [Google Scholar]
  • 28.Micheo W. Musculoskeletal, sports, and occupational medicine. New York: Demos Medical; 2011. 272 p. [Google Scholar]
  • 29.Loffler-Wirth H, Willscher E, Ahnert P, Wirkner K, Engel C, Loeffler M, et al. Novel Anthropometry Based on 3D-Bodyscans Applied to a Large Population Based Cohort. PLoS One. 2016;11(7):e0159887 Epub 2016/07/29. 10.1371/journal.pone.0159887 . [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

Ulas Bagci

1 May 2020

PONE-D-20-08676

Machine learning prediction of combat basic training injury from 3D body shape images

PLOS ONE

Dear Dr. Thomas,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by Jun 15 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Ulas Bagci, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments:

The paper has some merits, and the reviewers have consensus on this.

There are, however, several concerns regarding the study design and the specific research questions.

I recommend the authors prepare a response letter addressing those questions, along with a revised manuscript, for further consideration.

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

3. Please ensure that you refer to Figure 3 in your text as, if accepted, production will need this reference to link the reader to the figure.

4. Please include a caption for figure 3.

5. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: * Very cool motivation for the study. The data is preexisting from 3D body scans for uniform fitting so there is not additional cost (from a real-world application point of view) to employ such a method. Overall I like this study but there are a few areas of missing details and baselines which should be addressed before acceptance.

* The authors make a good point about the applicability of body measurements possibly having a high correlation with "overuse injuries", and BMI is known to be a poor metric. However, there is likely little to no correlation with other sports and military injuries such as the authors mentioned femoral neck injuries. In hockey and likely other areas these injuries are highly situational and not about being "out of shape". It would be wise for the authors to separate any sort of repetitive stress injuries from these more situational injuries, the correlation will likely be much higher, but it seems such data was not available to the authors.

* "...total recruit percentages of 18.9% black, 4.8% Asian, 0.8% Native American, and 11.9% Hispanic (22)" This doesn't add to 100%. Please update.

* The authors state that 97 subjects had injuries resulting in separation, but the confusion matrix only shows 33 injuries?

* The description of the ANN model used is not included. Is it a MLP? How many layers?

* A correlation with BMI and injuries would be a nice baseline to justify these 3D scanners are superior to a basic measurement like that. The reviewer thinks such a comparison is pretty crucial to be added. It seems this work was done in previous studies and currently no comparison with previous work is provided. This would provide a nice connection with previous studies.

* With such limited data, why did the authors do 3-fold cross validation? Something like 10-fold is more common. It gives more training data. The authors don't need to redo all the experiments, but it might lead to superior performance (more training data + more powerful ANN can be used to get better performance).

* Possible suggestions on future work: 1) Don't rely on the identified body measurements, directly use the 3D scans and methods from the computer vision community to possibly identify better features than those which are optimal for uniform measurements. 2) The false positive rate is a big concern for applicability. The authors should think of ways to address this to have any chance of real-world application.

Reviewer #2: The authors propose to predict the risk probability of injury due to basic combat training. The authors use the 3D body shape images captured by a device to extract 161 features to describe the subject. Then reduce feature dimension to 126 by averaging and then clustering them. To model these features, the authors use logistic regression, random forest, and a neural network. The NN performs better than the other modeling methods with AUC 0.70. This is an application paper.

Questions and Comments:

Q1

Line 27-30

What is the basis for postulating that certain body characteristics (as captured in the 3D body shape images) can be used to predict the risk of injury due to physical activity?

Q2

Line 40-42

How can you be certain that this is due to the correlation? Do you have features extracted from the 3D body shape images captured before the physical activity to compare?

Q3

Line 48-50

What purpose does it serve? This is a technical paper.

Q4

Line 176

Explain the choice of the hyperparameters used in logistic regression and random forest. The authors note that NN performs better. However, there is no architecture or NN design choices. Please provide.

Q5

Line 213 - 218

Did you apply both averaging and k-means clustering to reduce the feature dimension from 161 to 125, or just averaging? Please explain clearly how you obtained the final features.

Q6

Please provide an ablation study on the NN choice. I suggest the authors use transfer learning to improve performance.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Jun 30;15(6):e0235017. doi: 10.1371/journal.pone.0235017.r002

Author response to Decision Letter 0


13 May 2020

PONE-D-20-08676

Machine learning prediction of combat basic training injury from 3D body shape images

Response to Reviewers

When submitting your revision, we need you to address these additional requirements.

Editor Comments:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Response: We have renamed our files as indicated in the URLs and revised the title page as directed.

2. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

Response: Data sharing must be approved by the Fort Jackson leadership, which requires a memo to be sent to the Chief of Staff of the base. These are Army requirements. We will help anyone who is interested in accessing the data. The Chief of Staff changes every 2-3 years, so the contact details will change, but the Data Availability statement can direct interested parties to contact the current Chief of Staff at Fort Jackson to receive command approval.

3. Please ensure that you refer to Figure 3 in your text as, if accepted, production will need this reference to link the reader to the figure.

Response: There are only two Figures. This was an oversight on our part.

4. Please include a caption for figure 3.

5. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information

Reviewer #1:

Comment 1: Very cool motivation for the study. The data is preexisting from 3D body scans for uniform fitting so there is not additional cost (from a real-world application point of view) to employ such a method. Overall I like this study but there are a few areas of missing details and baselines which should be addressed before acceptance.

Response: We thank the reviewer for their comment. We also found this project interesting and enjoyable.

Comment 2: The authors make a good point about the applicability of body measurements possibly having a high correlation with "overuse injuries", and BMI is known to be a poor metric. However, there is likely little to no correlation with other sports and military injuries such as the authors mentioned femoral neck injuries. In hockey and likely other areas these injuries are highly situational and not about being "out of shape". It would be wise for the authors to separate any sort of repetitive stress injuries from these more situational injuries, the correlation will likely be much higher, but it seems such data was not available to the authors.

Response: The reviewer is correct: our intent was to demonstrate the predictive ability of body measurements for injury in a population of relatively sedentary individuals, and we agree it is more reasonable to expect correlation with stress-related injuries than with situational injuries. Moreover, because recruits are not typically elite athletes, injuries in hockey players and other elite athletes are very different. Unfortunately, our data did not indicate which injuries were overuse injuries and which were situational. We clarified this in the introduction by removing suggestions that our work will be transferable to other sports.

Revised Paragraphs in the Introduction:

The United States Army (US Army) anticipates basic training recruits (BTRs) will be injured during training. Most of these injuries will heal with rest, but some are more severe, leading to expensive treatments and medical separation from the US Army (1). Injuries, such as stress fractures (2, 3), can result in discharges from the US Army basic combat training. Femoral neck injuries are also observed during basic training and can often result in discharge (4).

These injuries, some of which are situational and others of which are stress-related (or “overuse injuries” (5)), have high medical costs and can impact future quality of life for the BTR. Additionally, under the US Army’s medical policy, the US Army may be financially responsible for the injured BTR’s long-term care. Thus, identifying BTRs at increased risk for dischargeable injury prior to training is critical (6-8).

Revised Paragraph in the Discussion:

Study Limitations

While this study predicted all injuries that resulted in discharge from basic training, data on injury site or type of injury (e.g. situational vs. stress-related) was not available for analysis. With this type of additional information, the physiological mechanisms underlying the injury could be explored further. In particular, we expect the correlation between body measurements and injury is greater in stress-related cases, and therefore limiting the data to this subset would increase the predictive ability of the models to detect stress-related injuries.

Comment 3: "...total recruit percentages of 18.9% black, 4.8% Asian, 0.8% Native American, and 11.9% Hispanic (22)" This doesn't add to 100%. Please update.

Response: The remaining 63.6% of recruits are white, we have updated the paper to reflect this and the revision appears below for the reviewer’s convenience:

Individual race/ethnicity information was also not available, however, a 2010 report on overall basic training recruit demographics published total recruit percentages of 63.6% white, 18.9% black, 4.8% Asian, 0.8% Native American, and 11.9% Hispanic (22).

Comment 4: The authors state that 97 subjects had injuries resulting in separation, but the confusion matrix only shows 33 injuries?

Response: The confusion matrix only depicted the result in one test fold. We agree this is not the full picture and have amended it to show the total results over all test folds.

The amended confusion matrix and caption appears below:

Table S1. Confusion matrix for the neural network model, with true positive rate (TPR) of 69.1% and false positive rate (FPR) of 34.9%. This is based on the totals across all cross-validation test folds, using the threshold yielding the optimal TPR and FPR.

                     Predicted
                     Non-injured   Injured
Actual  Non-injured  8,384         4,504
        Injured      30            67
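As a quick arithmetic check, the reported rates follow directly from the counts in Table S1 above (a minimal sketch; variable names are ours):

```python
# Totals from the confusion matrix (Table S1), summed over all test folds.
tp, fn = 67, 30        # injured recruits: correctly vs. incorrectly classified
fp, tn = 4504, 8384    # non-injured recruits: incorrectly vs. correctly classified

tpr = tp / (tp + fn)   # true positive rate (sensitivity)
fpr = fp / (fp + tn)   # false positive rate (1 - specificity)

print(f"TPR = {tpr:.1%}, FPR = {fpr:.1%}")  # TPR = 69.1%, FPR = 34.9%
```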

Comment 5: The description of the ANN model used is not included. Is it a MLP? How many layers?

Response: The ANN selected through cross-validation was a MLP with a single hidden layer of 12 neurons and tanh activation. We used the scikit-learn implementation of the ADAM algorithm for training. We have included this description in the manuscript which has been pasted below.

For the neural network, we selected one hidden layer of 12 neurons, with tanh activation, and strong L2-regularization.
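This architecture can be sketched in scikit-learn as follows (the regularization strength `alpha` and the synthetic feature matrix are illustrative assumptions, not values from the paper; the 161 anthropometric features are not reproduced here):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Illustrative stand-in for the anthropometric feature matrix, with a
# class imbalance loosely resembling the sparse injury outcome.
X, y = make_classification(n_samples=500, n_features=20, weights=[0.9, 0.1],
                           random_state=0)

# One hidden layer of 12 neurons, tanh activation, ADAM training, and
# strong L2 regularization (alpha=1.0 is an assumed value).
ann = MLPClassifier(hidden_layer_sizes=(12,), activation="tanh",
                    solver="adam", alpha=1.0, max_iter=2000, random_state=0)
ann.fit(X, y)

# predict_proba gives the estimated risk of the positive (injured) class.
risk = ann.predict_proba(X)[:, 1]
```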

Comment 6: A correlation with BMI and injuries would be a nice baseline to justify these 3D scanners are superior to a basic measurement like that. The reviewer thinks such a comparison is pretty crucial to be added. It seems this work was done in previous studies and currently no comparison with previous work is provided. This would provide a nice connection with previous studies.

Response: We have included a more detailed explanation of our motivation in the introduction. We were indeed motivated by exactly such BMI studies; our interest in this topic originated after hearing a presentation on BMI and injuries at Fort Jackson by LTG (ret) Mark Hertling. We have also included the results of a baseline model using logistic regression with only gender and BMI in the manuscript and in Figure 2D.

The revision to the introduction appears below:

Low physical fitness, low and high body mass index (BMI), and anthropometry are among the well-established predictors of Army-related injuries (8, 11-16). While BMI-defined categories, for example, demonstrated risk, the number of false positives identified by high or low BMI alone would be too high for a feasible screening mechanism. Screening BTRs with additional manually obtained measurements is not currently feasible, in part due to the added burdens of measuring predictive model input variables and then delivering model predictions quickly and efficiently for the high volume of BTRs that continuously arrive at basic training sites like Fort Jackson, SC.

The revision to the methods on the baseline BMI model comparison:

As a baseline model, we used BMI and gender in a logistic regression model, using the same model selection and comparison methodology as above.
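Such a two-feature baseline can be sketched as follows (a minimal illustration with synthetic data; the BMI, gender, and outcome values are randomly generated stand-ins, not study data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 600
bmi = rng.normal(25, 4, n)          # illustrative BMI values
gender = rng.integers(0, 2, n)      # illustrative 0/1 gender indicator
injured = rng.random(n) < 0.05      # illustrative sparse injury outcome

# Baseline model: logistic regression on BMI and gender only,
# evaluated by mean ROC AUC over 3 cross-validation folds.
X = np.column_stack([bmi, gender])
auc = cross_val_score(LogisticRegression(), X, injured,
                      cv=3, scoring="roc_auc").mean()
```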

The revision to the results on the BMI baseline model in comparison to the 3D body image data models:

Predictive models

The AUCs for all final models were above 0.50 (Table 2), with logistic regression AUC = 0.67 [+/- 0.06], random forest AUC = 0.65 [+/- 0.05], and neural network AUC = 0.70 [+/- 0.02] (Figure 2). The neural network outperformed the other models and yielded a smaller variance in AUC (Figure 2D). All models outperformed a baseline model using only BMI and gender, which achieved an AUC = 0.61 [+/- 0.05].

Figure 2D:

Comment 7: With such limited data, why did the authors do 3-fold cross validation? Something like 10-fold is more common. It gives more training data. The authors don't need to redo all the experiments, but it might lead to superior performance (more training data + more powerful ANN can be used to get better performance).

Response: We agree 5- or 10-fold cross-validation would be preferable, but we chose 3-fold due to the extreme sparseness of positive cases. We added wording to clarify this choice in the methods:

Revision to Methods:

For model selection, we used randomized search over a grid of possible hyperparameters, with stratified 3-fold cross-validation at each iteration. For each model we began by creating a grid of possible hyperparameters, then iteratively selecting from this grid at random. At each iteration, we evaluated the average out-of-sample performance of the model using the current set of hyperparameters using k-fold cross-validation. Specifically, the entire dataset was partitioned randomly into three sets, or folds (24), each containing an approximately equal number of injured individuals. We oversampled (with replacement) the injured records in each fold to create a 1:1 ratio of positive and negative outcomes (“stratification”). We used a small number of folds in order to retain a reasonable number of injured records per fold and minimize the erratic performance that would result from oversampling 5-10 records hundreds of times.

Comment 8: Possible suggestions on future work: 1) Don't rely on the identified body measurements, directly use the 3D scans and methods from the computer vision community to possibly identify better features than those which are optimal for uniform measurements. 2) The false positive rate is a big concern for applicability. The authors should think of ways to address this to have any chance of real-world application.

Response: We agree with the reviewer. Human Solutions is less willing to let us use the raw data; however, we have begun to work with Styku. This company shares all of its internal data with us, and our prediction models have already performed better using these data. Also, the Human Solutions device is not portable.

Off the record, we programmed the model into Human Solutions and applied it to the next round of BTRs; the model identified only 11 BTRs at risk. Those BTRs were given stabilizing exercises, and that round of training ended with no discharge injuries. We cannot publish this because the Army has not and will not likely approve it as a study, but the Battalion Commander was reluctant to sit on the information the model yielded. He felt he needed to protect his Soldiers.

Since we cannot discuss this exercise in a publication, we instead presented it in the discussion as future work:

The model presented here and a classification algorithm could be programmed directly into the scanner to identify at-risk BTRs automatically while they are being scanned for uniform sizing. Those flagged as at risk could be referred to the base athletic trainers, who could then design specific preventative training protocols.

The suggestion by the reviewer to work with the raw data was included in the study limitations:

Finally, we relied on the pre-programmed anthropometric measurements integrated in Human Solutions. For future work, the raw data imaged by Human Solutions could be applied to develop additional body site measurements specific to stress-related injuries as model inputs.

Reviewer #2:

The authors propose to predict the risk probability of injury due to basic combat training. The authors use the 3D body shape images captured by a device to extract 161 features to describe the subject. Then reduce feature dimension to 126 by averaging and then clustering them. To model these features, the authors use logistic regression, random forest, and a neural network. The NN performs better than the other modeling methods with AUC 0.70. This is an application paper.

Questions and Comments:

Q1

Line 27-30

What is the basis for postulating that certain body characteristics (as captured in the 3D body shape images) can be used to predict the risk of injury due to physical activity?

Response: This is a great question and was also asked by the other reviewer. Our interest in this topic originated after hearing a presentation on BMI and injuries at Fort Jackson by LTG (ret) Mark Hertling. BMI is an anthropometric measurement, and our team asked LTG (ret) Hertling if we could take a portable device to Fort Jackson to scan Soldiers. We were surprised to find out that Fort Jackson already had a scanner, which is where this research took off.

We have now included a more detailed explanation of our motivation in the introduction. We were indeed motivated by exactly such BMI studies. As per the request of Reviewer 1, we closed the loop by including the results of a baseline model using logistic regression with only gender and BMI for comparison in the manuscript and in Figure 2D.

The revision to the introduction appears below:

Low physical fitness, low and high body mass index (BMI), and anthropometry are among the well-established predictors of Army-related injuries (8, 11-16). While BMI-defined categories, for example, demonstrated risk, the number of false positives identified by high or low BMI alone would be too high for a feasible screening mechanism. Screening BTRs with additional manually obtained measurements is not currently feasible, in part due to the added burdens of measuring predictive model input variables and then delivering model predictions quickly and efficiently for the high volume of BTRs that continuously arrive at basic training sites like Fort Jackson, SC.

The revision to the methods on the baseline BMI model comparison:

As a baseline model, we used BMI and gender in a logistic regression model, using the same model selection and comparison methodology as above.

The revision to the results on the BMI baseline model in comparison to the 3D body image data models:

Predictive models

The AUCs for all final models were above 0.50 (Table 2), with logistic regression AUC = 0.67 [+/- 0.06], random forest AUC = 0.65 [+/- 0.05], and neural network AUC = 0.70 [+/- 0.02] (Figure 2). The neural network outperformed the other models and yielded a smaller variance in AUC (Figure 2D). All models outperformed a baseline model using only BMI and gender, which achieved an AUC = 0.61 [+/- 0.05].

Figure 2D:

Q2

Line 40-42

How can you be certain that this is due to the correlation? Do you have features extracted from the 3D body shape images captured before the physical activity to compare?

Response: Correlated is not the right word, because we are not using statistical inference. A better word would be "predicts". We have revised these lines in the manuscript, and the revision appears below:

Body shape profiles generated from a three-dimensional body scanning imaging in military personnel predicted dischargeable physical injury. The ANN model can be programmed into the scanner to deliver instantaneous predictions of risk, which may provide an opportunity to intervene to prevent injury.

Q3

Line 48-50

What purpose does it serve? This is a technical paper.

Response: We agree and we have removed it.

Q4

Line 176

Explain the choice of the hyperparameters used in logistic regression and random forest. The authors note that NN performs better. However, there is no architecture or NN design choices. Please provide.

Response: Hyperparameters for all three models were chosen through stratified cross-validation. We have now included much more detailed explanations of these choices for each model. The revision appears below:

Model Selection and Comparison

There are two sets of parameters determined during the training process: tuning parameters (sometimes termed hyperparameters(24)) which are set before training, and all other parameters (sometimes termed weights or coefficients (24)) learned during training. For example, the logistic regression model consists of a single hyperparameter controlling regularization penalty of model complexity (in our case the L2-norm of the model parameters) and the regression coefficients, one for each feature, including a bias term. We refer to the selection of hyperparameters as model selection, and the evaluation of different models’ out-of-sample predictive performance as model comparison.

Because of the scarcity of injured outcomes, we did not reserve an independent test data set for model evaluation. We instead used average cross-validation scores to measure out-of-sample prediction accuracy for both model selection and comparison.

For model selection, we used randomized search over a grid of possible hyperparameters, with stratified 3-fold cross-validation at each iteration. For each model we began by creating a grid of possible hyperparameters, then iteratively sampled from this grid at random. At each iteration, we evaluated the average out-of-sample performance of the model under the current set of hyperparameters using k-fold cross-validation. Specifically, the entire dataset was partitioned randomly into three sets, or folds (24), each containing an approximately equal number of injured individuals. We oversampled (with replacement) the injured records in each fold to create a 1:1 ratio of positive and negative outcomes (“stratification”). We used a small number of folds in order to retain a reasonable number of injured records per fold and to minimize the erratic performance that would result from oversampling 5-10 records hundreds of times. Two of the folds were then used to train the model, with the remaining fold used to test it, and this process was repeated three times. The ROC AUC was retained for each test fold, along with the confusion matrix corresponding to the threshold closest to the optimal outcome of zero false positives and a true positive rate of one. We repeated this 3-fold stratified cross-validation procedure twice for each set of hyperparameters and computed the average ROC AUC over all six runs. After completing 100 iterations of this procedure, we selected the set of hyperparameters with the best average validation score.
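The core of the procedure above can be sketched as follows. This is an illustrative simplification (synthetic data, a single random forest rather than the full randomized hyperparameter search), not the authors' actual code: stratified 3-fold cross-validation with the rare positive class oversampled (with replacement) to a 1:1 ratio in the training folds, scored by ROC AUC.

```python
# Sketch of stratified 3-fold CV with oversampling of rare positives,
# assuming synthetic data with ~5% positive outcomes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 10))
y = (rng.random(300) < 0.05).astype(int)   # rare positive outcome
y[:6] = 1                                  # guarantee enough positives

aucs = []
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]
    # Oversample the positive (injured) records to a 1:1 class ratio.
    pos = np.flatnonzero(y_tr == 1)
    neg = np.flatnonzero(y_tr == 0)
    extra = rng.choice(pos, size=len(neg) - len(pos), replace=True)
    keep = np.concatenate([np.arange(len(y_tr)), extra])
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X_tr[keep], y_tr[keep])
    # Score each held-out fold by ROC AUC.
    aucs.append(roc_auc_score(y[test_idx],
                              clf.predict_proba(X[test_idx])[:, 1]))
print(round(float(np.mean(aucs)), 3))
```

In the full procedure this inner loop would be repeated twice per candidate hyperparameter set, over 100 randomly sampled sets, keeping the set with the best average validation AUC.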

For model comparison, we simply compared each model’s average AUC under the cross-validation scheme outlined above, using the optimal set of hyperparameters. We did not take into account any qualitative or quantitative aspects of the models apart from this predictive ability out-of-sample.

Q5

Line 213 - 218

Did you apply both averaging and k-means clustering to reduce the feature dimension from 161 to 125 or just used averaging? Please explain clearly how did you get the final features.

Response: We first averaged paired measurements to reduce the feature dimension to 126 (including gender), then appended the results of k-means clustering (with k = 10 clusters) as one-hot features, increasing the feature dimension to 136. We have clarified these portions of data preparation in two places in the methods. These paragraphs are pasted below for the reviewer’s convenience.

Data Preparation

BTRs were provided protocols for scanning; however, recruits were not supervised for correct positioning within the scanner. As a result, some records were missing measurements or contained physiologically implausible measurements. Images from records with missing or physiologically implausible measurements confirmed that the BTR’s position within the scanner did not follow the proper protocol. To eliminate records that did not follow scanner protocols, we removed any record containing five or more missing measurements (2,481 records). We also removed any record whose paired measurements on the left and right sides of the body (e.g. left and right leg) differed by more than 2 standard deviations (2,214 records).
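The two cleaning rules can be sketched as below. This is a hypothetical toy example, not the authors' pipeline: the column names are invented, the table has only three measurements (so the missing-value threshold is lowered from five to three to trigger on this toy data), and the 2-standard-deviation check is applied to a single left/right pair.

```python
# Hypothetical sketch of the two cleaning rules described above.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "left_leg":  [80.0, 81.0, 79.0, 82.0, 80.5, 78.0, 90.0, np.nan],
    "right_leg": [80.5, 80.6, 79.3, 81.8, 80.4, 78.2, 50.0, np.nan],
    "waist":     [70.0, 72.0, 69.0, 74.0, 71.0, 68.0, 75.0, np.nan],
})

# Rule 1: drop records with too many missing measurements (the paper
# used >= 5 of 161 measurements; >= 3 here for this 3-column toy table).
df = df[df.isna().sum(axis=1) < 3]

# Rule 2: drop records whose paired left/right measurements differ by
# more than 2 standard deviations of that difference.
diff = df["left_leg"] - df["right_leg"]
df = df[(diff - diff.mean()).abs() <= 2 * diff.std()]
print(len(df))  # 6: one all-missing record and one implausible pair removed
```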

The final reference database contained 12,985 (25.1% female) records with 97 participants sustaining an injury that resulted in medical separation (0.7%). A workflow diagram similar to the cross-industry standard process for data mining protocol (23) describing the data preparation process appears in Figure 1.

Feature Engineering

We explored several techniques for reducing the dimensionality and collinearity of the dataset and adding meaningful structure. We first reduced the dimension and collinearity of the data by replacing each paired body measurement (e.g. left and right arm length) with its average value. We then performed a k-means clustering (20) of the entire dataset and retained each record’s cluster assignment as an additional explanatory variable, encoded with what is referred to as a one-hot encoding scheme (24), so that each observation’s assigned cluster membership entered the predictive models as a feature.
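The two feature-engineering steps can be illustrated as follows. The feature names are hypothetical, and the toy example uses k = 3 clusters on two features rather than the paper's k = 10 on 126 features:

```python
# Sketch: average paired left/right measurements, then append a one-hot
# encoding of each record's k-means cluster assignment.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
left_arm = rng.normal(60, 3, size=50)
right_arm = left_arm + rng.normal(0, 0.5, size=50)  # nearly collinear pair
waist = rng.normal(75, 5, size=50)

# Step 1: replace each paired measurement with its average.
arm = (left_arm + right_arm) / 2.0
X = np.column_stack([arm, waist])                   # reduced feature matrix

# Step 2: k-means cluster labels, one-hot encoded as extra features.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
one_hot = np.eye(3)[labels]                         # shape (50, 3)

X_aug = np.hstack([X, one_hot])                     # 2 + 3 = 5 features
print(X_aug.shape)                                  # (50, 5)
```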

We also evaluated the use of Principal Component Analysis (20, 24) to reduce the dimensionality of our data prior to model training, which we expected to further improve the quality of our models given the large amount of collinearity in the data; however, we ultimately discarded this approach, as described in Results.

Q6

Please provide an ablation study on the NN choice. I suggest the authors use transfer learning to improve performance.

Response: We thank the reviewer for this suggestion and agree that ablation studies are extremely important for deep learning architectures, but since our NN is quite small by comparison (a single hidden layer of 12 neurons) and heavily regularized during training, we do not feel it would be a fruitful topic of investigation. Similarly, while we agree that transfer learning is a powerful method for increasing performance in certain contexts, we do not have access to an appropriate pre-training dataset in this study. It is possible that pre-training on a simulated dataset, or on a different target, may aid in weight initialization or feature extraction, but we believe it is more appropriate to leave this to future work.
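An architecture matching the description above (one hidden layer of 12 neurons, heavy regularization) could be sketched as below. The framework, regularization strength, and feature count here are assumptions for illustration; the authors' exact implementation is not specified in this exchange.

```python
# Hedged sketch of a single-hidden-layer, heavily regularized neural
# network of the size described, trained on synthetic data.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 136))            # 136 engineered features
y = (rng.random(200) < 0.5).astype(int)

nn = MLPClassifier(hidden_layer_sizes=(12,),  # one hidden layer, 12 units
                   alpha=1.0,                 # strong L2 regularization
                   max_iter=500,
                   random_state=0)
nn.fit(X, y)

# Weight matrices: input->hidden and hidden->output.
print([w.shape for w in nn.coefs_])        # [(136, 12), (12, 1)]
```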

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Ulas Bagci

8 Jun 2020

Machine learning prediction of combat basic training injury from 3D body shape images

PONE-D-20-08676R1

Dear Dr. Thomas,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Ulas Bagci, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

A successful rebuttal period; reviewers found the article to be an important application paper.

Please note that reviewers also mentioned that authors need to clearly mention in the article that the data will be available upon request with appropriate contact email/address.

Please proofread as well.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)

Reviewer #2: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: (No Response)

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for addressing all of my concerns. Congratulations on your revisions and your great work.

Reviewer #2: Thank you for addressing my questions and comments. One final note on the data availability. The authors note that they will help "anyone" who is interested in accessing the data. And also, state that the data needs to be requested and "authorized" from the United States Army. I request the authors to address who can request, share, and for what purpose.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Acceptance letter

Ulas Bagci

17 Jun 2020

PONE-D-20-08676R1

Machine learning prediction of combat basic training injury from 3D body shape images

Dear Dr. Thomas:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Ulas Bagci

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Confusion matrix for the neural network model, with a true positive rate (TPR) of 69.3% and a false positive rate (FPR) of 35.2%.

    These values are based on the average across all cross-validation test folds, using the threshold yielding the optimal TPR and FPR; note that they differ slightly from the TPR and FPR reported in the paper, which are based on the optimal TPR/FPR of the averaged ROC curves.

    (DOCX)

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    The data must be requested from, and release authorized by, the United States Army. Requests can be made to the Commanding General at Fort Jackson. Fort Jackson has a contact email which will lead to the CG: usarmy.jackson.93-sig-bde.mbx.atzj-pao@mail.mil. Author Diana Thomas will serve as an additional point of contact to facilitate data access. You can reach Dr. Thomas at diana.thomas@westpoint.edu.


    Articles from PLoS ONE are provided here courtesy of PLOS
