PLOS Digital Health. 2026 Apr 17;5(4):e0001369. doi: 10.1371/journal.pdig.0001369

Machine Learning-Based Pattern Recognition of Risk Factors for Low Back Pain among Adolescent Cricket Players in Dhaka City

Marzana Afrooj Ria 1,*, Tasrima Trisha Ratna 2, Shudeshna Chakraborttye Purba 3, Rubal Kar 4, Mohoshina Karim 5, Md Osman Ali 6, Erfat Jaren Chaity 7, Shahadath Hossen 6, Joynal Abedin Imran 1, Shahriar Hasan 8
Editor: Cleva Villanueva 9
PMCID: PMC13089699  PMID: 41996343

Abstract

Low back pain (LBP) is common among adolescent cricketers, often due to repetitive lumbar stress. This study investigated LBP among 450 adolescent cricketers in Dhaka City, Bangladesh, considering sociodemographic characteristics, game-related activities, preventive practices, and LBP-related history. Four ML algorithms were applied to LBP severity classification: K-Nearest Neighbors, Random Forest, Logistic Regression, and Support Vector Machine (SVM). LBP severity was categorized into three classes (no pain, mild pain, and moderate pain) because there were insufficient data for a severe pain category. Of the models considered, the SVM with a sigmoid kernel performed best, with a test accuracy of 87.6%, precision of 90%, recall of 87.6%, and F1-score of 87.1%. Regression analysis was also applied to identify the predictors of LBP. Key correlates included female gender (adjusted odds ratio [AOR] = 2.52), higher educational attainment (e.g., undergraduate: AOR = 5.38), elevated family income (e.g., > 60,000 BDT: AOR = 4.36), longer weekly practice duration (>20 hours: higher prevalence of 81.7%), inconsistent warm-up (often/sometimes: AOR = 12.48-14.07) and cool-down practices (sometimes: AOR = 2.86), and prior LBP history (AOR = 6.92), all significantly associated with increased LBP risk (p < 0.05). The findings show the importance of early intervention and prevention protocols for minimizing LBP occurrence among junior cricket players. In short, this work demonstrates the effectiveness of ML and regression models for identifying patterns of sports injury risk, informing data-driven prevention and management protocols, and providing a foundation for future studies on this subject. Limitations include the exclusion of a severe pain category due to insufficient data, which reduces the model's capacity to triage urgent cases requiring immediate intervention.

Author summary

Low back pain (LBP) is a common issue among young cricket players, especially adolescents, due to the repetitive stress on their spines from activities like bowling and batting. In Dhaka, Bangladesh, where cricket is hugely popular, many teens face this problem, but little is known about the specific risk factors in this setting, and tools to predict and prevent it are limited. We surveyed 450 adolescent cricketers aged 11–19 from local clubs, collecting data on their backgrounds, playing habits, and training routines. Using machine learning techniques and statistical analysis, we identified key risks and created a model to classify pain levels as none, mild, or moderate. Our best model achieved about 88% accuracy in predicting pain severity. Factors like being female, higher education or family income levels, longer practice hours, inconsistent warm-ups or cool-downs, and a history of previous LBP significantly increased the risk. These insights highlight the need for better training practices to protect young players. By applying simple data tools, coaches and health workers in low-resource areas like Dhaka can spot at-risk teens early, design personalized prevention plans, and reduce long-term injuries—ultimately helping more kids enjoy cricket safely.

Introduction

Low back pain (LBP) in adolescent athletes, particularly cricketers, is a growing concern due to repetitive spinal stress during development [1]. Globally, lifetime prevalence of LBP ranges from 35–85% among adolescents, with higher incidence in females [2,3]. In Bangladesh, where cricket is a national passion, studies report high musculoskeletal pain prevalence (~80%) among adolescent cricketers, with the lower back commonly affected; however, specific LBP data for adolescent cricketers are scarce, highlighting a critical research gap [4–6]. Under-reporting is prevalent, as adolescents often normalize pain due to cultural factors, limited healthcare access, or fear of impacting sports participation [3,7].

LBP etiology is multifactorial, encompassing biomechanical stresses from cricket-specific movements like hyperextension, rotation, and lateral flexion, which increase risks for immature spines and injuries such as spondylolysis [6,8,9]. Genetic predispositions also play a role, with heritability estimates of 30–50% from twin studies and associations with genes linked to disc degeneration [10,11].

Impacts include reduced physical functioning, emotional well-being, strained relationships, and elevated risks for chronic pain into adulthood [1].

Early detection is essential for effective prevention, enabling personalized training and rehabilitation programs [1]. Traditional manual protocols lack capacity to identify complex patterns and exhibit poor predictive validity, especially for nonlinear relationships in heterogeneous populations [12,13].

In contrast, machine learning (ML) provides a data-driven approach, processing large datasets on training, biomechanics, and psychosocial factors to stratify risks and forecast LBP probability [14]. Yet, no predictive ML models exist for screening LBP in adolescent cricketers in Bangladesh.

This study addresses this gap using ML and regression analysis to identify key risk factors in Bangladeshi adolescent cricketers. Findings will guide health professionals, coaches, and managers in optimizing prevention and management strategies, serving as a model for ML applications in sports injury research across regions.

Materials & methods

Study population

For this study, data from 450 adolescent cricket players were collected using a purposive sampling technique across several cricket clubs (BKSP, Lt. Sheikh Jamal cricket Academy, Kola Bagan Krira chakra, Khelaghar Cricket Academy and City club) in Dhaka City, with the primary focus on identifying the risk factors associated with the development of LBP. To ensure a balanced dataset for robust machine learning evaluation, this purposive sampling incorporated predefined quotas for each pain severity category. Data collection was conducted from 25 February 2024 to 28 November 2024. Inclusion criteria were adolescents aged 11–19 years actively participating in cricket training or matches for at least 5 hours per week, with no recent (past 6 months) major injuries unrelated to LBP. Exclusion criteria were congenital spinal deformities, recent surgical interventions, or unwillingness to participate.

Ethical consideration

This study adhered to the ethical principles of the Declaration of Helsinki and was approved by the Institutional Review Board (IRB) of the National Institute of Traumatology and Orthopaedic Rehabilitation (NITOR/PT/93/lRB/2024/05). Participants were fully informed of the study’s purpose, methodology, and procedures. The participants had the right to withdraw from the interview completely at any time. Confidentiality and anonymity were ensured. No physical specimens were collected.

Outcome variables

Severity was self-reported by participants using the Numeric Pain Rating Scale (NPRS), where pain intensity is rated from 0 (no pain) to 10 (worst imaginable pain) [15]. Categories were defined as no pain (0), mild (1–3), and moderate (4–6); the severe category (7–10) was excluded due to insufficient cases. To facilitate data collection and classification, LBP severity was coded into three classes: Class 0 (no pain, no LBP), Class 1 (mild pain), and Class 2 (moderate pain). Data collection was prospectively managed and concluded for each category once a predefined quota of 150 valid instances was reached. This resulted in an inherently balanced distribution of exactly 150 participants categorized as Class 0 (No pain), 150 as Class 1 (Mild pain), and 150 as Class 2 (Moderate pain).
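The NPRS cut-offs described above can be expressed as a small mapping function. This is only an illustrative sketch: the function name `nprs_to_class` and the use of `None` for the excluded severe range are assumptions made here, not part of the study's code.

```python
from typing import Optional


def nprs_to_class(score: int) -> Optional[int]:
    """Map an NPRS score (0-10) to the severity class used in this study.

    Returns None for the severe range (7-10), which the study excluded.
    """
    if not 0 <= score <= 10:
        raise ValueError("NPRS scores must lie between 0 and 10")
    if score == 0:
        return 0  # Class 0: no pain
    if score <= 3:
        return 1  # Class 1: mild pain (1-3)
    if score <= 6:
        return 2  # Class 2: moderate pain (4-6)
    return None  # severe pain (7-10), excluded from the dataset


labels = [nprs_to_class(s) for s in (0, 2, 5, 8)]
```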

Explanatory variables

The dataset included a variety of features capturing the demographic, physical, and training-related characteristics of adolescent cricket players. Data were collected through structured surveys and physical assessments, and were organized into the following categories:

  • Sociodemographic factors: Age, gender, education, monthly family income, and body mass index.

  • Games-related factors: Playing position, playing experience, and duration of practice per week.

  • Preventive measures factors: Warm-up before sports activity, duration of warm-up, cooldown after sports activity, and duration of cool down.

Players’ ages and heights ranged from 11 to 19 years and from 120 to 180.3 cm, respectively. This information provides insight into their physical stature, which may influence biomechanics during cricketing activities. Similarly, players’ weight can significantly affect the strain on the musculoskeletal system during high-intensity activities like cricket. In this study, players’ weights ranged from 20 to 80 kilograms.

The players’ Body Mass Index (BMI) varied between 8.53 and 29.81. This measure was split into three categories according to WHO: underweight (BMI < 18.5), normal (BMI 18.5 - 24.9), and overweight (BMI ≥ 25), with implications for the overall risk of injury to the player [16]. The educational level was divided into four categories: J.S.C./8th grade, S.S.C./O-levels, H.S.C./A-levels, and undergraduate. This variable is important as education may reflect underlying socio-economic factors influencing players’ overall health and physical activity.

Playing position was another relevant factor, with the categories right-hand batsman, left-hand batsman, spin hand bowler, pace hand bowler, wicket-keeper, and all-rounder. These roles dictate the type of biomechanical stress players experience; bowlers, particularly pace bowlers, face a higher LBP risk due to repetitive movements [17,18]. Playing experience was classified into three categories: 1–3 years, 4–6 years, and 7–10 years. Weekly practice duration was likewise classified into three categories: < 10 hours, 10–20 hours, and >20 hours.

Training habits were captured through the frequency and duration of warm-up and cooldown. Warm-up before and cooldown after the sports activity were categorized into ‘always’, ‘often’ and ‘sometimes’. Warm-up duration and cooldown duration were each categorized into <10 minutes, 10–15 minutes, and >15 minutes. These factors are critical as appropriate warm-up and cooldown routines are known to reduce the risk of injury and improve muscle recovery. Lastly, past history of LBP was documented as either “Present” or “Absent”.

In summary, the machine learning model for LBP severity classification was developed using an integrated set of attributes, including sociodemographic factors, game-related factors, preventive, and LBP related factors. The model’s performance was evaluated using 10-fold cross-validation. No formal feature selection was applied, as all variables were selected based on domain knowledge and prior literature relevance. Categorical variables were encoded using a hybrid approach in which ordinal and binary features such as Age category, Sex, and Education level were transformed using label encoding to preserve their inherent order, while nominal features were transformed using one-hot encoding to maintain interpretability and ensure compatibility with machine learning algorithms including SVM, as implemented in scikit-learn. This approach offers technical advantages by reducing feature dimensionality, preserving meaningful ordinal relationships, preventing the introduction of spurious ordinal assumptions for nominal variables, and improving both computational efficiency and predictive performance across diverse algorithms.
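The hybrid encoding idea can be sketched as follows. Note the swapped-in technique: the paper reports label encoding, but scikit-learn's `LabelEncoder` orders categories alphabetically, so this sketch uses `OrdinalEncoder` with an explicitly supplied category order to preserve the ranking. The column layout and the toy rows are hypothetical.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy rows: [education, warm-up frequency, playing position]
X_raw = np.array(
    [
        ["J.S.C", "Always", "Pace bowler"],
        ["S.S.C", "Often", "Wicket keeper"],
        ["H.S.C", "Sometimes", "All rounder"],
        ["Undergraduate", "Always", "Pace bowler"],
    ],
    dtype=object,
)

# Explicit category order for the ordinal columns (assumed ordering)
ordinal_levels = [
    ["J.S.C", "S.S.C", "H.S.C", "Undergraduate"],
    ["Always", "Often", "Sometimes"],
]

encoder = ColumnTransformer(
    [
        ("ordinal", OrdinalEncoder(categories=ordinal_levels), [0, 1]),
        ("nominal", OneHotEncoder(handle_unknown="ignore"), [2]),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Result: 2 ordinal columns followed by 3 one-hot columns for position
X_encoded = encoder.fit_transform(X_raw)
```

Ordinal columns stay as a single integer-valued feature each, while the nominal position column expands to one indicator per category, so no artificial ordering is imposed on playing positions.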

Selection of algorithms

Popular machine learning-based classification models include Logistic Regression (LR), K-Nearest Neighbors (KNN), Random Forest (RF), and Support Vector Machine (SVM), whereas deep learning-based classification models include Convolutional Neural Networks (CNN) and Deep Neural Networks (DNN). The machine learning-based models were chosen in this study because of their effectiveness in handling structured data with a small number of samples. Deep learning models need a large volume of data to automatically extract and learn intricate feature representations efficiently [19].

This study evaluates the performance of four machine learning classifiers: SVM, RF, KNN, and LR for the classification of LBP severity among adolescent cricket players. The dataset was balanced, consisting of 150 instances in each of the three classes: No pain, Mild pain, and Moderate pain. Of the above-mentioned machine learning algorithms, KNN is a non-parametric model whereas RF and SVM have good classification properties and are suitable for non-linear decision boundary modelling.

Logistic Regression

LR is a widely used statistical method for binary and multiclass classification in medical research [20]. It models the probability of class membership using the logistic function, which is a type of sigmoid function, and assumes a linear relationship between the independent variables and the log-odds of the outcome. It is efficient, interpretable, and works best when the data have a linear decision boundary. It is sensitive to multicollinearity and may underperform in complex or nonlinear datasets unless extended with interaction terms or regularization [21]. In this study, logistic regression served as a baseline model. Its simplicity allowed clear insights into individual predictors of low back pain, but it was less effective in capturing nonlinear associations than tree-based or kernel-based models.

K-Nearest neighbours

KNN is a non-parametric, instance-based learning algorithm commonly used in both classification and regression tasks, particularly within structured medical data contexts due to its simplicity and effectiveness in handling health-related patterns [22]. It predicts outcomes by identifying the k closest data points to a given instance based on a distance metric, typically Euclidean distance:

$$\text{Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

KNN performs well on small to moderately sized datasets with low to medium dimensionality and balanced class distributions. It is particularly useful when the data exhibits local patterns or clustering. However, its performance can degrade in high-dimensional or noisy datasets, and it becomes computationally expensive at prediction time with large datasets because, lacking a training phase, it must compare each new instance against all stored samples.
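As a minimal sketch of the KNN rule described above, the following computes Euclidean distances to every stored sample and takes a majority vote among the k nearest. The toy points, the labels, and the helper name `knn_predict` are hypothetical and are not drawn from the study data.

```python
from collections import Counter

import numpy as np


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2)))


def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest neighbours."""
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]  # indices of the k closest samples
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Two well-separated hypothetical clusters labelled 0 and 2
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 2, 2, 2])

pred_low = knn_predict(X_train, y_train, np.array([0.5, 0.5]))
pred_high = knn_predict(X_train, y_train, np.array([5.5, 5.5]))
```

Every prediction scans all stored samples, which is why the cost grows with dataset size.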

In this study, KNN was applied during the model evaluation phase to explore pattern similarity among adolescent cricket players.

Random forest

Random Forest (RF) is an ensemble learning algorithm [23] that builds multiple decision trees and combines their outputs to improve classification accuracy and reduce overfitting. Each tree is trained on a random subset of the data and features, and final predictions are made by majority voting for classification or averaging for regression. It is well-suited for handling datasets with nonlinear relationships, mixed data types, and moderate levels of noise. It provides good performance even without extensive parameter tuning and offers feature importance scores, aiding interpretability. However, its complexity can increase with many trees, leading to slower predictions and less transparency compared to simpler models.

In this study, Random Forest demonstrated strong classification performance in identifying low back pain patterns, particularly due to its robustness against overfitting and ability to model interactions between risk factors.

Support Vector Machine (SVM)

SVM is a supervised machine learning algorithm that performs well in multiclass classification even when trained on relatively small data sets. It is a human-interpretable model with good generalization capacity on small data samples and is less susceptible to overfitting than ensemble-based models such as RF [24–26]. SVM generates optimal separating decision boundaries, called hyperplanes, in a high-dimensional feature space to classify data into their respective classes. Each hyperplane is positioned so that its distance to the nearest data points of any class, known as the margin, is as large as possible. The data points closest to the hyperplane are called support vectors. Maximizing this margin is intended to improve the model’s ability to generalize to new data [24].

Several mathematical functions known as kernels are employed to deal with non-linearly separable data, including the linear, polynomial, radial basis function (RBF), and sigmoid kernels. These kernels enable the mapping of the original data to a higher-dimensional feature space, where it can become linearly separable. The mathematical representation of these kernels is provided in Table 1, and a representation of SVM-based decision boundaries for these kernels is shown in Fig 1.

Table 1. SVM kernel functions and their mathematical expressions.

Kernel Type | Expression
Linear | $x_i^T x_j$
Radial basis function (RBF) | $\exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$
Polynomial | $(x_i^T x_j + c)^p$
Sigmoid | $\tanh(x_i^T x_j + c)$
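The kernel expressions in Table 1 can be written directly in NumPy. This is a sketch of the formulas only, not of how scikit-learn evaluates them internally; the function names and default parameter values are chosen here for illustration.

```python
import numpy as np


def linear_kernel(xi, xj):
    """Linear kernel: inner product of the two vectors."""
    return float(xi @ xj)


def rbf_kernel(xi, xj, sigma=1.0):
    """RBF kernel: exp(-||xi - xj||^2 / (2 * sigma^2))."""
    return float(np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2)))


def polynomial_kernel(xi, xj, c=1.0, p=2):
    """Polynomial kernel: (xi . xj + c)^p."""
    return float((xi @ xj + c) ** p)


def sigmoid_kernel(xi, xj, c=0.0):
    """Sigmoid kernel: tanh(xi . xj + c)."""
    return float(np.tanh(xi @ xj + c))


xi = np.array([1.0, 2.0])
xj = np.array([3.0, 4.0])
values = {
    "linear": linear_kernel(xi, xj),
    "rbf": rbf_kernel(xi, xj),
    "polynomial": polynomial_kernel(xi, xj),
    "sigmoid": sigmoid_kernel(xi, xj),
}
```

scikit-learn parameterizes these slightly differently: its sigmoid kernel is tanh(gamma * <x, x'> + coef0) and its RBF kernel is exp(-gamma * ||x - x'||^2), so gamma corresponds to 1/(2σ²) in the RBF row of Table 1.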

Fig 1. Decision boundaries of SVM classifiers with different kernel functions: (a) linear, (b) RBF, (c) polynomial, and (d) sigmoid.


In this work, several approaches were undertaken to minimize overfitting and improve generalization. For instance, ten-fold cross-validation was applied to ensure balanced evaluation across the dataset. The dataset itself was balanced with equal class representation to avoid bias. Additionally, hyperparameters for each model were optimized using grid search to identify the most effective configurations. These methods collectively helped reduce variance and ensure more reliable classification performance.

Performance metrics

To ensure reliable model evaluation given the limited dataset size, ten-fold cross-validation (CV-10) was used. The classifier’s effectiveness was assessed using commonly used evaluation metrics, including accuracy, precision, recall, and F1-score, which are also necessary to comprehensively analyze performance in the presence of class imbalance. These metrics are calculated based on the number of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) predictions, as defined below:

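$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$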

All performance metrics were computed as averages across the folds, providing stable estimates of each model’s generalization performance.
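For a three-class problem these formulas are applied per class in a one-vs-rest fashion. The sketch below uses a confusion matrix reconstructed from the counts reported for the external validation cohort in the Results (11 of 11 no-pain cases correct, 5 of 17 mild cases predicted as moderate, 3 of 14 moderate cases predicted as mild); the matrix layout and the variable names are assumptions made for illustration.

```python
import numpy as np

# Rows = true class, columns = predicted class (0 = none, 1 = mild, 2 = moderate)
cm = np.array([
    [11, 0, 0],
    [0, 12, 5],
    [0, 3, 11],
])

tp = np.diag(cm).astype(float)  # correctly predicted cases per class
fp = cm.sum(axis=0) - tp        # predicted as the class but truly another
fn = cm.sum(axis=1) - tp        # truly the class but predicted as another

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Overall accuracy: correct predictions divided by all predictions
accuracy = float(np.trace(cm) / cm.sum())

# Weighted F1: per-class F1 averaged with class sizes as weights
support = cm.sum(axis=1)
weighted_f1 = float(np.sum(f1 * support) / support.sum())
```

With these counts the per-class values agree with the external validation figures reported in Tables 4 and 5, for example a precision of 80.0% and recall of 70.6% for Class 1.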

External validation cohort

To assess the robustness and clinical generalizability of the proposed SVM model, an external validation was conducted using an independent dataset of n = 42 adolescent cricketers collected between 4 and 31 January 2025 from Cricketers Club, Noakhali. This cohort was distinct from the training population. The demographics included 37 males (88.1%) and 5 females (11.9%), with ages ranging from 13 to 19 years (mean age skewed toward 17–19 years). Pain severity was classified using the same criteria as the primary study: No pain (n = 11), Mild (n = 17), and Moderate (n = 14). The pre-trained SVM classifier (Sigmoid kernel, C = 1, gamma = 0.01) was applied to this unseen data without any re-training or parameter tuning.

Experimental setup

All experiments were conducted using Python 3.13.9 with scikit-learn 1.8.0. Initially, categorical features were converted into numeric labels using LabelEncoder, and subsequently, all features were standardized with StandardScaler to ensure zero mean and unit variance, allowing fair treatment of all features by the models. Hyperparameters for each model were then optimized using exhaustive grid search combined with stratified k-fold cross-validation, targeting the weighted F1-score to balance performance across all classes. Importantly, during each fold, hyperparameter combinations were fitted exclusively on the training subset, while the validation subset was kept completely unseen, thereby preventing any data leakage. Once the optimal hyperparameters were identified, the final SVM model was retrained on the full training dataset for evaluation. Stratified 10-fold cross-validation was employed to preserve the class distribution in each fold, which helped reduce the risk of biased performance estimates. Four ML algorithms (KNN, RF, LR, and SVM) were evaluated using their best-found configurations. Among them, the SVM classifier, configured with a sigmoid kernel, C = 1, and gamma = 0.01 without class weighting, demonstrated the highest overall performance on the balanced dataset and was therefore selected for detailed analysis.
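A tuning setup of this kind might look like the sketch below. It is not the authors' script: the data are synthetic (generated with `make_classification`), the parameter grid is a small hypothetical one, and placing `StandardScaler` inside a `Pipeline` is a design choice made here so that scaling statistics are learned only from each training fold.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the survey data: 450 samples, 3 classes
X, y = make_classification(
    n_samples=450, n_features=12, n_informative=6,
    n_classes=3, random_state=0,
)

# Scaling lives inside the pipeline so it is refit on every training fold
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("svm", SVC(kernel="sigmoid")),
])

# Hypothetical grid; the paper does not list the values it searched
param_grid = {
    "svm__C": [0.1, 1, 10],
    "svm__gamma": [0.001, 0.01, 0.1],
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, scoring="f1_weighted", cv=cv)
search.fit(X, y)

best_params = search.best_params_  # best C and gamma on the synthetic data
best_score = search.best_score_    # mean weighted F1 across the 10 folds
```

Because `GridSearchCV` refits the whole pipeline on each training split, the held-out fold never influences the scaler or the SVM during tuning.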

Result

The results were analyzed using two types of analytical methods: ML-based classification models (KNN, RF, LR, and SVM) and a binary logistic regression model.

Among the classifiers, SVM achieved the highest overall performance. After extensive grid search-based hyperparameter tuning, the best configuration was found with a sigmoid kernel, C = 1, and gamma = 0.01. As shown in Table 2, SVM attained a test accuracy of 87.6% and a macro-averaged F1 score of 87.1%, outperforming the other models. RF and LR followed closely, with F1 scores of 85.4% and 85.9%, and test accuracies of 85.6% and 86.0%, respectively. KNN, although achieving 100% training accuracy, had the lowest generalization ability, with a test accuracy of 80.4% and an F1 score of 78.9%, indicating overfitting.

Table 2. Overall Algorithm Performance Comparison.

Algorithm Train Acc (%) Test Acc (%) Precision (%) Recall (%) F1 Score (%)
KNN 100.0 80.4 83.7 80.4 78.9
RF 96.7 85.6 86.3 85.6 85.4
LR 89.8 86.0 86.6 86.0 85.9
SVM 88.4 87.6 90.0 87.6 87.1

To further assess the performance of the SVM classifier, class-wise results were evaluated and are summarized in Table 3. Class 0 (No pain) achieved perfect classification, with 100% precision, recall, and F1-score. Class 1 (Mild pain) was also well classified, with an F1 score of 84.0%. In contrast, Class 2 (Moderate pain) exhibited more classification difficulty, with a precision of 95.5% but a lower recall of 66.0%, resulting in an F1 score of 77.4%. These results indicate that while the SVM model performs well overall, it faces significant challenges in accurately identifying moderate pain cases (Class 2). Clinically, the low recall for Class 2 implies potential under-detection of moderate LBP, which could delay targeted interventions, increasing risks for progression to severe pain or chronicity; this underscores the need for enhanced features (e.g., biomechanical data) in future models to improve triage accuracy.

Table 3. Class-wise Performance Metrics.

Class Train Acc (%) Test Acc (%) Precision (%) Recall (%) F1 Score (%)
0 100.0 100.0 100.0 100.0 100.0
1 97.4 96.7 74.5 96.7 84.0
2 67.9 66.0 95.5 66.0 77.4

Visual performance metrics are shown in Fig 2. In Fig 2(a), training and test accuracies across all folds remain closely aligned, with training accuracy around 88–89% and test accuracy between 81–88%, confirming strong generalization and minimal overfitting. Fig 2(b) displays class-wise accuracy across folds. Class 0 achieved perfect accuracy in all folds, while Class 1 remained stable. Class 2 varied more significantly, with fold-wise accuracy ranging from 45% to 80%, underscoring its classification complexity.

Fig 2. Performance of the SVM classifier with sigmoid kernel (C = 1, gamma = 0.01) for LBP classification: (a) train and test accuracy across folds, (b) class-wise test accuracy, (c) macro-average ROC curve (AUC = 0.952), and (d) normalized confusion matrix.


Fig 2(c) presents the macro-averaged ROC curve, where SVM achieved an AUC of 0.952, reflecting excellent discrimination among all three classes. The normalized confusion matrix in Fig 2(d) shows 100% correct classification for Class 0, 97% for Class 1, and 66% for Class 2. Misclassification of 34% of Class 2 samples as mild pain illustrates the overlap in features between moderate and mild pain cases and signals a need for enhanced feature differentiation.

External validation performance

The SVM model demonstrated strong generalizability on the external cohort (n = 42), achieving an overall accuracy of 81.0% and a weighted F1-score of 0.81. The performance on the unseen data is summarized in Table 4. Consistent with the internal validation, the model achieved perfect classification for Class 0 (No Pain) with 100% precision and recall. For symptomatic cases, the model successfully identified 79% of Moderate Pain (Class 2) cases, which is an improvement over the 66% recall observed during internal cross-validation. However, some overlap persisted, with 21% of moderate cases (3/14) misclassified as mild, and 29% of mild cases (5/17) misclassified as moderate.

Table 4. Comparative performance of the SVM model during Internal Cross-Validation vs. External Validation.

Group | Performance Metric | Internal Validation (CV-10) | External Validation (n = 42) | Δ (Change)
Overall | Accuracy (%) | 87.6 | 81.0 | -6.6
Overall | Macro-Averaged F1-score (%) | 87.1 | 83.0 | -4.1
Class 0 (No Pain) | Precision (%) | 100.0 | 100.0 | 0.0
Class 0 (No Pain) | Recall (%) | 100.0 | 100.0 | 0.0
Class 0 (No Pain) | F1-score (%) | 100.0 | 100.0 | 0.0
Class 1 (Mild Pain) | Precision (%) | 74.5 | 80.0 | +5.5
Class 1 (Mild Pain) | Recall (%) | 96.7 | 71.0 | -25.7
Class 1 (Mild Pain) | F1-score (%) | 84.0 | 75.0 | -9.0
Class 2 (Moderate Pain) | Precision (%) | 95.5 | 69.0 | -26.5
Class 2 (Moderate Pain) | Recall (%) | 66.0 | 79.0 | +13.0
Class 2 (Moderate Pain) | F1-score (%) | 77.4 | 73.0 | -4.4

The performance of the SVM model on the independent external validation dataset (n = 42) is further summarized in Table 5. To account for uncertainty due to the relatively small sample size, 95% confidence intervals (CIs) were estimated using bootstrap resampling with 10,000 iterations, where the lower and upper bounds of each interval represent the 2.5th and 97.5th percentiles of the bootstrap distribution, respectively. Overall, the model achieved a weighted F1-score of 81.0% with 95% CI [68.8%, 92.8%], demonstrating reliable performance on unseen data. Class 0 (No Pain) was classified perfectly (100% across all metrics), whereas Class 2 (Moderate Pain) exhibited the lowest precision at 68.8% with 95% CI [45.0%, 91.7%], reflecting some overlap in features between mild and moderate pain. The normalized confusion matrix in Fig 3 further illustrates these patterns, with 100% of Class 0 (No Pain), 71% of Class 1 (Mild Pain), and 79% of Class 2 (Moderate Pain) cases correctly classified, confirming that the model performs best for the no-pain group and shows some misclassification between moderate and mild pain levels. These results indicate that the SVM model can distinguish between pain levels with quantified uncertainty, even in a small external cohort.

Table 5. Performance of the SVM model on the external validation dataset with 95% bootstrap confidence intervals.

Performance Metric Precision (%) Recall (%) F1-score (%)
Overall (Weighted) 81.5 [69.7, 93.0] 81.0 [69.0, 92.9] 81.0 [68.8, 92.8]
Class 0 (No Pain) 100.0 [100.0, 100.0] 100.0 [100.0, 100.0] 100.0 [100.0, 100.0]
Class 1 (Mild Pain) 80.0 [57.1, 100.0] 70.6 [47.1, 92.3] 75.0 [54.5, 89.7]
Class 2 (Moderate Pain) 68.8 [45.0, 91.7] 78.6 [54.5, 100.0] 73.3 [52.2, 88.9]
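The percentile bootstrap behind these intervals can be sketched as follows. The label vectors are rebuilt from the external validation counts reported in the text; the random seed and the hand-written `weighted_f1` helper are assumptions made here, so the resulting bounds should be close to, but not necessarily identical with, those in Table 5.

```python
import numpy as np

# Confusion matrix rebuilt from the reported external validation counts
# (rows = true class, columns = predicted class)
cm = np.array([[11, 0, 0], [0, 12, 5], [0, 3, 11]])
pairs = [(t, p) for t in range(3) for p in range(3) for _ in range(cm[t, p])]
y_true = np.array([t for t, _ in pairs])
y_pred = np.array([p for _, p in pairs])


def weighted_f1(y_true, y_pred, n_classes=3):
    """Weighted F1 from label vectors; classes absent from a resample get weight 0."""
    counts = np.bincount(y_true * n_classes + y_pred, minlength=n_classes ** 2)
    m = counts.reshape(n_classes, n_classes)
    tp = np.diag(m).astype(float)
    support = m.sum(axis=1)          # TP + FN per class
    denom = support + m.sum(axis=0)  # 2*TP + FP + FN per class
    f1 = np.divide(2 * tp, denom, out=np.zeros(n_classes), where=denom > 0)
    return float(np.sum(f1 * support) / support.sum())


rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
n = len(y_true)
scores = np.empty(10_000)
for b in range(scores.size):
    idx = rng.integers(0, n, size=n)  # resample cases with replacement
    scores[b] = weighted_f1(y_true[idx], y_pred[idx])

point = weighted_f1(y_true, y_pred)
lower, upper = np.percentile(scores, [2.5, 97.5])
```

The wide interval reflects the cohort size: with only 42 cases, a handful of resampled misclassifications shifts the score by several percentage points.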

Fig 3. Normalized confusion matrix of the SVM model on the external validation dataset.


Sociodemographic, games and preventive measures related characteristics of the participants

This study revealed that several sociodemographic characteristics are significantly associated with the prevalence of low back pain (LBP). A strong association exists for both age (p < 0.001) and educational level (p < 0.001), showing that LBP becomes more common as players get older and advance in their schooling. Monthly family income is also a significant factor (p < 0.001), where participants from the lowest income group reported substantially less LBP than those from higher-income families. Body Mass Index (BMI) showed a significant association (p = 0.009) (Table 6).

Table 6. Distribution of sociodemographic characteristics of the participants (n = 450).

Characteristics | Low back pain present, n (%) (n = 300) | Low back pain absent, n (%) (n = 150) | Chi-square | P-value
Age | | | |
11-13 years | 25 (44.6) | 31 (55.4) | 19.05 | 0.000a
14-16 years | 148 (65.2) | 79 (34.8) | |
17-19 years | 127 (76) | 40 (24) | |
Gender | | | |
Male | 195 (64.3) | 110 (35.7) | 2.49 | 0.132b
Female | 102 (71.8) | 40 (28.2) | |
Education | | | |
J.S.C/8th grade | 27 (42.2) | 37 (57.8) | 50.03 | 0.000a
S.S.C/O-levels | 77 (55) | 63 (45) | |
H.S.C/A-levels | 83 (72.2) | 32 (27.8) | |
Under graduation | 113 (86.3) | 18 (13.7) | |
Monthly family income (BDT)c | | | |
Below 20000 BDT | 31 (36.9) | 53 (63.1) | 42.82 | 0.000a
20000-40000 BDT | 131 (70.4) | 55 (29.6) | |
41000-60000 BDT | 95 (77.2) | 28 (22.8) | |
>60000 BDT | 43 (75.4) | 14 (24.6) | |
Body Mass Index | | | |
Underweight | 67 (58.1) | 49 (41.9) | 9.39 | 0.009a
Normal | 218 (71.2) | 88 (28.8) | |
Overweight | 14 (51.9) | 13 (48.1) | |

aPearson Chi-square; bFisher’s exact correction test used for having 20% or more expected frequencies less than 5; c1 US dollar = 121.85 BDT (in May 2025)
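As a worked illustration of the Pearson chi-square test used in Table 6, the sketch below recomputes the statistic for the age rows from the tabulated counts. It is written in plain Python; the closed-form p-value exp(-x/2) holds only for 2 degrees of freedom, which is the case for this 3 × 2 table, and the function name is a placeholder.

```python
import math

# Age group counts from Table 6: [LBP present, LBP absent]
observed = [
    [25, 31],    # 11-13 years
    [148, 79],   # 14-16 years
    [127, 40],   # 17-19 years
]


def pearson_chi_square(table):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


stat, dof = pearson_chi_square(observed)

# For dof == 2 the chi-square survival function is exactly exp(-x / 2)
p_value = math.exp(-stat / 2) if dof == 2 else None
```

The recomputed statistic should land close to the reported 19.05, with a p-value far below 0.001.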

Table 7 revealed a highly significant association between LBP and both playing experience (p < 0.001) and the duration of practice per week (p < 0.001). A clear dose-response relationship is visible for both factors. LBP prevalence rises steadily with more years of playing, from 56.4% for those with 1–3 years of experience to 82.5% for those with 7–10 years. Similarly, LBP prevalence increases with more intense weekly practice, from 47.1% for those training less than 10 hours to 81.7% for those training more than 20 hours.

Table 7. Distribution of games related characteristics of the participants (n = 450).

Characteristics | Low back pain present, n (%) (n = 300) | Low back pain absent, n (%) (n = 150) | Chi-square | P-value
Playing position | | | |
Right hand batsman | 118 (66.7) | 59 (33.3) | 4.42 | 0.490a
Left hand batsman | 24 (60) | 16 (40) | |
Spin hand bowler | 23 (56.1) | 18 (43.9) | |
Pace hand bowler | 22 (73.3) | 8 (26.7) | |
Wicket keeper | 20 (74.1) | 7 (25.9) | |
All rounder | 93 (68.9) | 42 (31.1) | |
Playing experience | | | |
1-3 years | 119 (56.4) | 92 (43.6) | 20.37 | 0.000a
4-6 years | 134 (73.6) | 48 (26.4) | |
7-10 years | 47 (82.5) | 10 (17.5) | |
Duration of practice per week | | | |
<10 hours | 24 (47.1) | 27 (52.9) | 27.47 | 0.000a
10-20 hours | 151 (61.4) | 95 (38.6) | |
>20 hours | 125 (81.7) | 28 (18.3) | |

aPearson Chi-square; bFisher’s exact correction test used for having 20% or more expected frequencies less than 5

Table 8 shows that both the frequency of warming up (p < 0.001) and its duration (p < 0.001) were significantly associated with LBP. Players who always warmed up had a much lower prevalence of LBP (58.7%) than those who did so “Often” or “Sometimes” (96.3-97%). For duration, the group warming up for 10–15 minutes reported the highest rate of LBP (73.7%). Likewise, the frequency of cooling down after activity was highly significant (p < 0.001), with those who “Always” cooled down showing a markedly lower LBP prevalence (56.1%) than less consistent players. There was also a significant association between the duration of cool-down and LBP (p < 0.001). According to the counts in Table 8, LBP prevalence was highest among players who cooled down for less than 10 minutes (88.9%), lower among those who cooled down for 10–15 minutes (69.7%), and lowest among those who cooled down for more than 15 minutes (49.0%).

Table 8. Distribution of preventive-measure-related characteristics of the participants (n = 450).

| Characteristics | LBP present, n (%) (n = 300) | LBP absent, n (%) (n = 150) | Chi-square | P-value |
|---|---|---|---|---|
| Warm-up before sports activity | | | 48.58 | <0.001a |
| Always | 209 (58.7) | 147 (41.3) | | |
| Often | 26 (96.3) | 1 (3.7) | | |
| Sometimes | 65 (97.0) | 2 (3.0) | | |
| Duration of warm-up | | | 19.08 | <0.001a |
| <10 minutes | 77 (52.7) | 69 (47.3) | | |
| 10-15 minutes | 210 (73.7) | 75 (26.3) | | |
| >15 minutes | 13 (68.4) | 6 (31.6) | | |
| Cool-down after sports activity | | | 45.41 | <0.001a |
| Always | 169 (56.1) | 132 (43.9) | | |
| Often | 16 (84.2) | 3 (15.8) | | |
| Sometimes | 115 (88.5) | 15 (11.5) | | |
| Duration of cool-down | | | 23.38 | <0.001a |
| <10 minutes | 32 (88.9) | 4 (11.1) | | |
| 10-15 minutes | 219 (69.7) | 95 (30.3) | | |
| >15 minutes | 49 (49.0) | 51 (51.0) | | |

aPearson Chi-square; bFisher’s exact correction test used for having 20% or more expected frequencies less than 5
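Because each row of Table 8 reports both a "present" and an "absent" percentage, it is easy to read the wrong column. As a hedged illustration (the helper `prevalence` and the dictionary layout are assumptions made for this sketch, not part of the study's code), the LBP prevalence per cool-down duration can be recomputed from the counts:

```python
def prevalence(present, absent):
    """Percentage of players with LBP in a category, rounded to one decimal place."""
    return round(100 * present / (present + absent), 1)

# Cool-down duration rows of Table 8: (LBP present, LBP absent).
cool_down_duration = {
    "<10 minutes": (32, 4),
    "10-15 minutes": (219, 95),
    ">15 minutes": (49, 51),
}

prevalences = {label: prevalence(*counts) for label, counts in cool_down_duration.items()}
lowest = min(prevalences, key=prevalences.get)

for label, pct in prevalences.items():
    print(f"{label}: {pct}% with LBP")
print(f"Lowest prevalence: {lowest}")
```

On these counts, prevalence decreases monotonically with longer cool-down, and the lowest value belongs to the group cooling down for more than 15 minutes.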

Logistic regression to predict risk factors of LBP

Table 9 presents the logistic regression analysis, which revealed several key risk factors for low back pain (LBP) among adolescent cricketers. In the adjusted model, inconsistent preventive habits were the strongest predictors. Specifically, cricketers who warmed up only sometimes (AOR = 14.07, 95% CI: 2.78-71.13, p = 0.001) or often (AOR = 12.48, 95% CI: 1.29-120.22, p = 0.029) had substantially higher odds of experiencing LBP than those who always did, although the wide confidence intervals reflect the very small number of pain-free players in these two groups (Table 8). Furthermore, cooling down only sometimes nearly tripled the odds of LBP (AOR = 2.86, 95% CI: 1.34-6.09, p = 0.006). A previous history of LBP also emerged as a major risk factor, increasing the odds of a current episode nearly sevenfold (AOR = 6.92, 95% CI: 3.98-12.02, p = 0.010). Other significant independent predictors included female gender (AOR = 2.52, 95% CI: 1.25-5.07, p = 0.010), higher educational attainment, such as the undergraduate level (AOR = 5.38, 95% CI: 1.83-15.76, p = 0.002), and higher family income brackets (AOR = 3.12 to 4.69).

Table 9. Logistic regression for sociodemographic and play-related risk factors of low back pain among adolescent cricketers (n = 450).

Outcome: low back pain (absent vs present).

| Variables | Category | Unadjusted OR (95% CI) | p-value | Adjusted OR (95% CI) | p-value |
|---|---|---|---|---|---|
| Age | 17-19 years | Reference | | Reference | |
| | 14-16 years | 0.59 (0.37-0.92) | 0.021 | 1.12 (0.60-2.09) | 0.710 |
| | 11-13 years | 0.25 (0.13-0.47) | <0.001 | 1.52 (0.55-4.21) | 0.418 |
| Gender | Male | Reference | | Reference | |
| | Female | 1.41 (0.91-2.18) | 0.115 | 2.52 (1.25-5.07) | 0.010 |
| Education | J.S.C | Reference | | Reference | |
| | S.S.C/O-levels | 1.67 (0.92-3.04) | 0.091 | 1.85 (0.79-4.33) | 0.154 |
| | H.S.C/A-levels | 3.55 (1.87-6.75) | <0.001 | 3.75 (1.47-9.57) | 0.006 |
| | Undergraduate | 8.60 (4.26-17.36) | <0.001 | 5.38 (1.83-15.76) | 0.002 |
| Monthly family income (BDT) | <20000 BDT | Reference | | Reference | |
| | 20000-40000 BDT | 4.07 (2.36-7.01) | <0.001 | 3.12 (1.47-6.63) | 0.003 |
| | 41000-60000 BDT | 5.80 (3.14-10.69) | <0.001 | 4.69 (1.99-11.04) | <0.001 |
| | >60000 BDT | 5.25 (2.48-11.09) | <0.001 | 4.36 (1.58-11.99) | 0.004 |
| Practice duration per week (hours) | <10 hours | Reference | | Reference | |
| | 10-20 hours | 1.78 (0.97-3.28) | 0.060 | 1.21 (0.55-2.63) | 0.623 |
| | >20 hours | 5.02 (2.52-9.97) | <0.001 | 1.70 (0.68-4.22) | 0.249 |
| Warm-up before the sports activity | Always | Reference | | Reference | |
| | Often | 18.28 (2.45-136.26) | 0.005 | 12.48 (1.29-120.22) | 0.029 |
| | Sometimes | 22.85 (5.50-94.83) | <0.001 | 14.07 (2.78-71.13) | 0.001 |
| Warm-up duration | >15 minutes | Reference | | Reference | |
| | 10-15 minutes | 2.50 (1.65-3.81) | <0.001 | 0.93 (0.53-1.65) | 0.825 |
| | <10 minutes | 1.94 (0.69-5.38) | 0.203 | 0.84 (0.22-3.13) | 0.800 |
| Cool-down after the sports activity | Always | Reference | | Reference | |
| | Often | 4.1 (1.18-14.59) | 0.026 | 2.21 (0.47-10.31) | 0.313 |
| | Sometimes | 5.98 (3.33-10.74) | <0.001 | 2.86 (1.34-6.09) | 0.006 |
| Cool-down duration | >15 minutes | Reference | | Reference | |
| | 10-15 minutes | 2.39 (1.51-3.80) | <0.001 | 1.63 (0.88-2.98) | 0.114 |
| | <10 minutes | 8.32 (2.74-25.28) | <0.001 | 2.65 (0.65-10.78) | 0.173 |
| Previous history of LBP | Absent | Reference | | Reference | |
| | Present | 7.52 (4.85-11.67) | <0.001 | 6.92 (3.98-12.02) | 0.010 |
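The unadjusted odds ratios in Table 9 can be approximately reproduced from the 2 x 2 counts in Table 8. The sketch below assumes a standard Wald (Woolf) interval on the log-odds scale; the paper does not state which interval method its software used, and the function name `odds_ratio_ci` is hypothetical.

```python
import math

def odds_ratio_ci(exposed_cases, exposed_noncases, ref_cases, ref_noncases, z=1.96):
    """Unadjusted odds ratio with a Wald (Woolf) confidence interval.

    Assumes all four cell counts are non-zero.
    """
    odds_ratio = (exposed_cases / exposed_noncases) / (ref_cases / ref_noncases)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(
        1 / exposed_cases + 1 / exposed_noncases + 1 / ref_cases + 1 / ref_noncases
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper

# Table 8 counts: "Sometimes" warm-up (65 with LBP, 2 without)
# versus the reference group "Always" (209 with LBP, 147 without).
or_sometimes, lower_sometimes, upper_sometimes = odds_ratio_ci(65, 2, 209, 147)

print(f"OR = {or_sometimes:.2f} (95% CI: {lower_sometimes:.2f}-{upper_sometimes:.2f})")
```

The result is close to the tabulated unadjusted estimate of 22.85 (5.50-94.83); small differences in the last digit are consistent with rounding. The very wide interval follows from the two small cells (only 2 pain-free players in the "Sometimes" group).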

To assess model fit, the Hosmer-Lemeshow goodness-of-fit test was applied and indicated that the logistic regression model adequately fit the observed outcome. To check for instability due to correlated predictors, variance inflation factors (VIF) were calculated for the regression variables; none indicated notable multicollinearity (all VIF < 5).
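To make the VIF criterion concrete: for a model with exactly two predictors, the VIF of each equals 1 / (1 - r^2), where r is their Pearson correlation. The sketch below uses hypothetical toy values, not study data, and the function names are illustrative assumptions; with more than two predictors, each VIF would instead come from regressing one predictor on all the others.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model contains exactly two of them."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

# Hypothetical toy values for two correlated predictors (NOT taken from the study).
toy_practice_hours = [1, 2, 3, 4, 5]
toy_experience_years = [2, 1, 4, 3, 5]

vif = vif_two_predictors(toy_practice_hours, toy_experience_years)
print(f"r = {pearson_r(toy_practice_hours, toy_experience_years):.2f}, VIF = {vif:.2f}")
```

In this toy case a fairly strong correlation (r = 0.8) still gives a VIF of about 2.78, below the threshold of 5 used in the text.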

Discussion

This study utilized a combination of machine learning (ML) classification models and binary logistic regression to analyze risk factors associated with low back pain (LBP) in adolescent cricketers in Dhaka, Bangladesh. The SVM model with a sigmoid kernel demonstrated superior performance, achieving an overall test accuracy of 87.6% and a macro-averaged F1-score of 87.1%, outperforming KNN, RF, and LR. Key risk factors identified through regression included inconsistent warm-up and cool-down practices, prior LBP history, female gender, higher educational attainment, and elevated family income, aligning with multifactorial LBP etiology.

The SVM's strong performance for Class 0 (no pain) and Class 1 (mild pain) underscores its utility in distinguishing lower-severity cases, consistent with prior ML applications in LBP classification, where accuracies range from 83% to 92% in general populations [27]. However, Class 2 (moderate pain) showed lower recall (66%) and F1-score (77.4%), indicating difficulty in capturing moderate cases because of feature overlap with mild pain, as evidenced by the confusion matrix. Clinically, this low recall implies a risk of under-detection (false negatives), potentially delaying interventions such as targeted physiotherapy or training adjustments. Such delays could allow progression to severe or chronic LBP, which is critical in adolescents, for whom early management prevents long-term disability. External validation supported the model's clinical utility, with an accuracy of 81.0% on unseen data. This modest decrease from the internal test accuracy (87.6%) is expected when a model moves from its development data to new data and suggests reasonable stability. Importantly, the external validation addressed a key concern regarding the detection of moderate pain. While internal testing showed a recall of 66% for Class 2, the external validation achieved a recall of 79%, suggesting the model remains sensitive to clinically significant pain levels in a different sample. In the confusion matrix of the external data, errors occurred only between adjacent classes (mild vs. moderate); no symptomatic players were misclassified as pain-free (Class 0), so no player with pain in the validation cohort would have been erroneously cleared to play.
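The reported Class 2 recall and F1-score together imply a precision, since F1 = 2PR / (P + R). The following is a small worked sketch of that arithmetic; it assumes the published values are rounded, so the result is approximate, and the function name `implied_precision` is hypothetical.

```python
def implied_precision(recall, f1):
    """Solve F1 = 2 * P * R / (P + R) for the precision P."""
    return f1 * recall / (2 * recall - f1)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported internal-test values for Class 2 (moderate pain).
class2_recall = 0.66
class2_f1 = 0.774

class2_precision = implied_precision(class2_recall, class2_f1)
print(f"Implied Class 2 precision: {class2_precision:.3f}")
```

Under these assumptions the implied precision is roughly 0.94, which would mean that predictions of moderate pain were usually correct, while about one in three truly moderate cases was assigned to another class.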

Among the sociodemographic factors, female gender was an independent risk factor (AOR = 2.52, p = 0.010), aligning with general epidemiological trends in which females often report higher rates of LBP, potentially due to anatomical, hormonal, or biomechanical differences. A large French occupational health study likewise found female gender to be associated with an increased risk of musculoskeletal disorder-related work unfitness (OR = 2.09) [28].

This study found a strong and statistically significant association between education level and low back pain (LBP) (P < 0.001). Participants at the undergraduate level had a higher LBP prevalence (86.3%) than those with less education. Likewise, a Chinese investigation involving 15,743 people identified 1.24 times greater odds of LBP among those with higher education levels [29].

In this study, players who practiced more than 20 hours per week had the highest prevalence of LBP (81.7%), while those who practiced less than 10 hours per week had the lowest (47.1%). Similarly, adolescent basketball players practicing more than 6.6 hours per week exhibited a 43.8% LBP prevalence [30], and young weightlifters in long-term training developed lumbar disc degeneration within 4 years, progressing to herniation in 33% by year 5 [31].

A strong association was found between warming up before sports activity and LBP (P < 0.001), with those who always warmed up having a lower prevalence of LBP (58.7%, Table 8). Similarly, a meta-analysis of warm-up intervention programs (WIPs) found a 36% reduction in sports injuries when accounting for hours of risk exposure [32].

A strong association was also observed between cool-down after sports activity and LBP (P < 0.001), with those who always performed a cool-down having a lower prevalence of LBP (56.1%). In related work, a randomized crossover study comparing aqua- and land-based cool-down exercises found that both had similar recovery effects on muscle soreness and performance-based parameters [33].

Previous history of LBP was a consistent and strong predictor of current LBP: participants with a prior history had higher odds of presenting with LBP (unadjusted OR = 7.52; AOR = 6.92). A comprehensive systematic review investigating the risk of LBP recurrence found that “a history of previous episodes of LBP prior to the most recent episode was the only factor that consistently predicted recurrence of LBP” [34]. Another systematic review of intrinsic factors related to LBP in cricket fast bowlers identified previous injury as a factor known to influence load tolerance and LBP injury [17].

The application of machine learning here is a useful innovation for sports medicine. Traditional risk factor analyses struggle to account for complex, non-linear interdependencies among a large number of variables, such as sociodemographic, game-related, injury-prevention, and LBP-related factors.

Conclusion

The aim of this study was to perform a risk assessment and classification of LBP among adolescent cricketers in Dhaka City, Bangladesh. This was an effort to fill an existing knowledge gap by using ML techniques to derive key risk factors of LBP in this population. A wide range of variables was considered, including sociodemographic, game-related, preventive-measure, and LBP-related variables, to evaluate model performance. A regression model was used to identify LBP risk factors, while an SVM model was identified as the most suitable ML classifier for LBP severity. LBP was classified into three levels, i.e., no pain, mild pain, and moderate pain; the severe pain category was excluded because of the limited number of severe cases. Despite the overall satisfactory performance, the results show that identifying Class 2 (moderate pain) remains challenging. This may be due to overlapping clinical features between mild and moderate pain that limit separability. The findings provide useful insight for health professionals, coaches, and sport organizations in designing effective interventions to prevent and treat LBP in young players.

Limitation

This study has several limitations. The exclusion of a severe pain category due to insufficient data reduces the model's depth and rigor, limiting its capacity to triage urgent cases involving high biomechanical stress or chronic progression. Purposive sampling from specific Dhaka clubs introduces selection bias, potentially restricting generalizability to broader populations like rural or non-club-based players, while reliance on self-reported data may cause recall or social desirability bias, especially among adolescents normalizing pain amid cultural influences. While external validation was performed, the validation cohort was smaller than the training set and predominantly male, which may limit inferences regarding female athletes in the validation phase.

Supporting information

S1 Appendix. Informed consent form.

The written consent form was administered to the participants.

(DOCX)

pdig.0001369.s001.docx (17.2KB, docx)
S2 Appendix. Questionnaire.

The included questions comprised the questionnaire administered to the participants.

(DOCX)

pdig.0001369.s002.docx (59.3KB, docx)
S3 Appendix. Data.

This file contains primary research data.

(CSV)

pdig.0001369.s003.csv (30.7KB, csv)
S4 Appendix. Data.

This file contains research data for external validation.

(CSV)

Acknowledgments

The authors would like to thank coaches and players of BKSP, Lt. Sheikh Jamal cricket Academy, Kola Bagan Krira chakra, Khelaghar Cricket Academy, and City club for all assistance and cooperation throughout the study.

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. Gera A, Pereira PC, Eapen C. Low back pain in adolescent athletes; evaluation and rehabilitation. J Exerc Sci Physiother. 2015;11(2):76.
  • 2. Kikuchi R, Hirano T, Watanabe K, Sano A, Sato T, Ito T, et al. Gender differences in the prevalence of low back pain associated with sports activities in children and adolescents: a six-year annual survey of a birth cohort in Niigata City, Japan. BMC Musculoskelet Disord. 2019;20(1):327. doi: 10.1186/s12891-019-2707-9
  • 3. Wall J, Meehan WP 3rd, Trompeter K, Gissane C, Mockler D, van Dyk N, et al. Incidence, prevalence and risk factors for low back pain in adolescent athletes: a systematic review and meta-analysis. Br J Sports Med. 2022;56(22):1299–306. doi: 10.1136/bjsports-2021-104749
  • 4. Faruk MO, Begum N, Hossain K, Rahman MR, Rahman MS, Hossain S. Risk Factors Associated With Low Back Pain in Bangladesh: A Cross-Sectional Study Conducted in 2023. Health Sci Rep. 2025;8(8):e71151. doi: 10.1002/hsr2.71151
  • 5. Noorbhai M, Essack F, Thwala S, Ellapen T, Van Heerden J. Prevalence of cricket-related musculoskeletal pain among adolescent cricketers in KwaZulu-Natal. S Afr J Sports Med. 2012;24(1). doi: 10.17159/2078-516x/2012/v24i1a352
  • 6. Rashaduzzaman M, Kamrujjaman M, Islam MA, Ahmed S, Al Azad S. An experimental analysis of different point specific musculoskeletal pain among selected adolescent-club cricketers in Dhaka City. Eur J Clin Exp Med. 2020;17(4):308–14. doi: 10.15584/ejcem.2019.4.4
  • 7. Sany SA. Low back pain and associated risk factors among medical students in Bangladesh: A cross-sectional study. 2021. doi: 10.17632/MFKY2JTTWP.3
  • 8. Bayne H, Elliott B, Campbell A, Alderson J. Lumbar load in adolescent fast bowlers: A prospective injury study. J Sci Med Sport. 2016;19(2):117–22. doi: 10.1016/j.jsams.2015.02.011
  • 9. Orchard JW, Saw R, Kountouris A, Redrup D, Farhart P, Sims K. Management of lumbar bone stress injury in cricket fast bowlers and other athletes. S Afr J Sports Med. 2023;35(1):v35i1a15172. doi: 10.17159/2078-516X/2023/v35i1a15172
  • 10. Battié MC, Videman T, Levalahti E, Gill K, Kaprio J. Heritability of low back pain and the role of disc degeneration. Pain. 2007;131(3):272–80. doi: 10.1016/j.pain.2007.01.010
  • 11. Livshits G, Popham M, Malkin I, Sambrook PN, Macgregor AJ, Spector T, et al. Lumbar disc degeneration and genetic factors are the main risk factors for low back pain in women: the UK Twin Spine Study. Ann Rheum Dis. 2011;70(10):1740–5. doi: 10.1136/ard.2010.137836
  • 12. Karran EL, McAuley JH, Traeger AC, Hillier SL, Grabherr L, Russek LN, et al. Can screening instruments accurately determine poor outcome risk in adults with recent onset low back pain? A systematic review and meta-analysis. BMC Med. 2017;15(1):13. doi: 10.1186/s12916-016-0774-4
  • 13. Trompeter K, Fett D, Platen P. Prevalence of Back Pain in Sports: A Systematic Review of the Literature. Sports Med. 2017;47(6):1183–207. doi: 10.1007/s40279-016-0645-3
  • 14. Abdollahi M, Ashouri S, Abedi M, Azadeh-Fard N, Parnianpour M, Khalaf K. Using a motion sensor to categorize nonspecific low back pain patients: A machine learning approach. Sensors. 2020;20(12):3600. doi: 10.3390/s20123600
  • 15. Karcioglu O, Topacoglu H, Dikme O, Dikme O. A systematic review of the pain scales in adults: Which to use? Am J Emerg Med. 2018;36(4):707–14. doi: 10.1016/j.ajem.2018.01.008
  • 16. de Onis M, Onyango AW, Borghi E, Siyam A, Nishida C, Siekmann J. Development of a WHO growth reference for school-aged children and adolescents. Bull World Health Organ. 2007;85(9):660–7. doi: 10.2471/blt.07.043497
  • 17. Farhart P, Beakley D, Diwan A, Duffield R, Rodriguez EP, Chamoli U, et al. Intrinsic variables associated with low back pain and lumbar spine injury in fast bowlers in cricket: a systematic review. BMC Sports Sci Med Rehabil. 2023;15(1):114. doi: 10.1186/s13102-023-00732-1
  • 18. Senington B, Lee RY, Williams JM. Biomechanical risk factors of lower back pain in cricket fast bowlers using inertial measurement units: a prospective and retrospective investigation. BMJ Open Sport Exerc Med. 2020;6(1):e000818. doi: 10.1136/bmjsem-2020-000818
  • 19. Hossen S, Ali MO, Rashid MA, Higa H. Development of proning pose classification system for selfcare of COVID-19 patients. Prog Eng Sci. 2025;2(2):100064.
  • 20. Panda NR. A review on logistic regression in medical research. Natl J Community Med. 2022;13(4):265–70. doi: 10.55489/njcm.134202222
  • 21. Hosmer DW, Lemeshow S, Sturdivant RX. Applied logistic regression. 3rd ed. Wiley. 2013. doi: 10.1002/9781118548387
  • 22. Xing W, Bei Y. Medical Health Big Data Classification Based on KNN Classification Algorithm. IEEE Access. 2020;8:28808–19. doi: 10.1109/access.2019.2955754
  • 23. Apao NJ, Feliscuzo LS, Romana C, Tagaro J. Multiclass classification using random forest algorithm to prognosticate the level of activity of patients with stroke. Int J Sci Technol Res. 2020;9(4):1233–40.
  • 24. Abe S. Support Vector Machines for Pattern Classification. London: Springer London. 2010. doi: 10.1007/978-1-84996-098-4
  • 25. Mohsen S, Elkaseer A, Scholz SG. Human Activity Recognition Using K-Nearest Neighbor Machine Learning Algorithm. Smart Innovation, Systems and Technologies. Springer Singapore. 2021. 304–13. doi: 10.1007/978-981-16-6128-0_29
  • 26. Xu L, Yang W, Cao Y, Li Q. Human activity recognition based on random forests. In: 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), 2017. 548–53. doi: 10.1109/fskd.2017.8393329
  • 27. Tagliaferri SD, Angelova M, Zhao X, Owen PJ, Miller CT, Wilkin T, et al. Artificial intelligence to improve back pain outcomes and lessons learnt from clinical classification approaches: three systematic reviews. NPJ Digit Med. 2020;3:93. doi: 10.1038/s41746-020-0303-x
  • 28. Pucci C, Dell’Omo M, Lesage FX, Murgia N, Annesi-Maesano I. Musculoskeletal disorders as a work-related risk factor of unfitness for work: a cross-sectional study of 1,327,540 workers in Occitania, France. Occup Med. 2024;74(Supplement_1):0–0. doi: 10.1093/occmed/kqae023.0487
  • 29. Jiang X, Wang R, Bai YW, Tang L, Xing WY, Chen N, et al. Prevalence and risk factors of low back pain in middle-aged and older adult in China: a cross-sectional study. Arch Public Health. 2025;83(1):207. doi: 10.1186/s13690-025-01695-0
  • 30. Arend M, Toomsalu L, Kaasik P. High prevalence of ankle, knee and low back problems in highly trained adolescent basketball players at the beginning of their competitive season. Pap Anthropol. 2024;33(1):72–84. doi: 10.12697/poa.2024.33.1.04
  • 31. Yoshimizu R, Nakase J, Yoshioka K, Shimozaki K, Asai K, Kimura M, et al. Incidence and temporal changes in lumbar degeneration and low back pain in child and adolescent weightlifters: A prospective 5-year cohort study. PLoS One. 2022;17(6):e0270046. doi: 10.1371/journal.pone.0270046
  • 32. Ding L, Luo J, Smith DM, Mackey M, Fu H, Davis M. Effectiveness of warm-up intervention programs to prevent sports injuries among children and adolescents: A systematic review and meta-analysis. Int J Environ Res Public Health. 2022;19(10):6336. doi: 10.3390/ijerph19106336
  • 33. Chin ECY, Lai S, Tsang SF, Chung HN, Wong YL, Sran N. Comparing the Effects of Aqua- and Land-Based Active Cooldown Exercises on Muscle Soreness and Sport Performance: A Randomized Crossover Study. Int J Sports Physiol Perform. 2024;19(12):1381–90. doi: 10.1123/ijspp.2024-0020
  • 34. da Silva T, Mills K, Brown BT, Herbert RD, Maher CG, Hancock MJ. Risk of Recurrence of Low Back Pain: A Systematic Review. J Orthop Sports Phys Ther. 2017;47(5):305–13. doi: 10.2519/jospt.2017.7415
PLOS Digit Health. doi: 10.1371/journal.pdig.0001369.r001

Decision Letter 0

Cleva Villanueva

16 Sep 2025

Response to Reviewers'. This file does not need to include responses to any formatting updates and technical items listed in the 'Journal Requirements' section below.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Cleva Villanueva, M.D., Ph.D.
Academic Editor
PLOS Digital Health

Leo Anthony Celi
Editor-in-Chief
PLOS Digital Health
orcid.org/0000-0001-6712-6626

Journal Requirements:

1. Please provide an Author Summary. This should appear in your manuscript between the Abstract (if applicable) and the Introduction, and should be 150–200 words long. The aim should be to make your findings accessible to a wide audience that includes both scientists and non-scientists. Sample summaries can be found on our website under Submission Guidelines:

https://journals.plos.org/digitalhealth/s/submission-guidelines#loc-parts-of-a-submission

2. We note that there is identifying data in the Supporting Information file < Data.csv>. Due to the inclusion of these potentially identifying data, we have removed this file from your file inventory. Prior to sharing human research participant data, authors should consult with an ethics committee to ensure data are shared in accordance with participant consent and all applicable local laws.

Data sharing should never compromise participant privacy. It is therefore not appropriate to publicly share personally identifiable data on human research participants. The following are examples of data that should not be shared:

-Name, initials, physical address

-Ages more specific than whole numbers

-Internet protocol (IP) address

-Specific dates (birth dates, death dates, examination dates, etc.)

-Contact information such as phone number or email address

-Location data

-ID numbers that seem specific (long numbers, include initials, titled “Hospital ID”) rather than random (small numbers in numerical order)

Data that are not directly identifying may also be inappropriate to share, as in combination they can become identifying. For example, data collected from a small group of participants, vulnerable populations, or private groups should not be shared if they involve indirect identifiers (such as sex, ethnicity, location, etc.) that may risk the identification of study participants.

Additional guidance on preparing raw data for publication can be found in our Data Policy (https://journals.plos.org/plosone/s/data-availability#loc-human-research-participant-data-and-other-sensitive-data) and in the following article: http://www.bmj.com/content/340/bmj.c181.long.

Please remove or anonymize all personal information (<specific identifying information in file to be removed>), ensure that the data shared are in accordance with participant consent, and re-upload a fully anonymized data set. Please note that spreadsheet columns with personal information must be removed and not hidden as all hidden columns will appear in the published file.

Additional Editor Comments (if provided):

Reviewer #1: Abstract: Needs clearer statement of feature correlates and explicit mention of study limitations. Absence of the severe pain category reduces the model’s capacity to triage urgent cases.

Introduction: Should include local statistics on LBP in Bangladesh cricketers, acknowledge genetic predisposition as a factor, and cite comparative studies (traditional vs. ML approaches). Needs references to existing ML work on LBP and discussion of ML’s achievements in other contexts.

Methods: Adequate sample size (n=450) and sound justification for regression models. However, purposive sampling may cause bias, reliance on self-reported data is limiting, no feature selection analysis mentioned, unclear categorical variable encoding, and absence of external validation.

Results: Well presented, but tables could be shorter. Authors should discuss the clinical implications of low recall in Class 2 (66%).

Discussion: Strong comparative analysis, but requires deeper discussion on clinical significance of Class 2 low recall and external validation for generalizability.

Overall: A relevant study with strong potential, but would be significantly strengthened by external validation, refined methodology, and clearer justification in introduction.

Reviewer #2: Introduction: Too long and should be condensed to fewer than 400 words without omitting important content.

Methods: Missing inclusion/exclusion criteria (e.g., training duration per week required for eligibility). Avoid reporting model performance in the Methods section—this belongs in Results.

Results: First paragraph should instead describe participant demographics. Authors should follow a standardized checklist (e.g., STROBE) for reporting cross-sectional studies.

Formatting: Inconsistent numerical reporting (percentages in text, decimals in tables) should be standardized.

Overall: Main issues relate to manuscript organization, clarity, and adherence to reporting standards.

Reviewer #3: Overall Premise: Interesting use of ML for a low-cost solution, but key conceptual issues remain.

Main Concerns:

Lack of clarity on how athletes were classified into “no, mild, and moderate LBP” despite existing literature.

No discussion on potential collinearity of variables, which may compromise model stability.

Risk factors identified are already well-established in prior studies—unclear what the novel contribution is.

Practical application of the algorithm in real-world settings is not addressed.

Introduction: Some references inappropriate (e.g., citation of qualitative/lived experience study as demographic evidence). Authors should use more robust sources (large cohort studies, meta-analyses, RCTs). Questionable logical connections in cited arguments.

Methods: Lack of clarity on who categorized LBP severity, which standard was used, and what reference supports it.

Reviewers' Comments:

Reviewer's Responses to Questions

Comments to the Author

1. Does this manuscript meet PLOS Digital Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Partly

Reviewer #2: No

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

**********

Reviewer #1: Review of Machine Learning-Based Pattern Recognition of Risk Factors for Low Back Pain among Adolescent Cricket Players in Dhaka City

Abstract

A Relevant study with clear focus

The abstract section will be more helpful if the feature correlates were clearly stated.

The study acknowledged the absence of the fourth class (Severe pain category) impacting the depth and rigor of the study.

(The model developed by this study may have reduced capacity to triage LBP and determine cricket players who need urgent intervention especially the severe pain category). The circumstances is understandable albeit.

The limitations of this study should be summarized in the abstract.

Introduction

The authors should add compelling local statistics demonstrating the severity of LBP amongst cricketers in Bangladesh. Or is there a lack of studies exploring this? If so, that may be a solid justification for this study.

The study acknowledged the strong possibility that LBP is under-reported in Bangladesh.

The authors did not acknowledge that genetic predisposition also accounts for LBP; it is not all mechanical.

The study would benefit from citing comparative studies in which traditional methods have shown limited capacity, which would strengthen the justification for using machine learning in this study.

“There are no known predictive models with ML to be used in screening LBP in adolescent cricketers in Dhaka”: other studies on LBP among cricketers that use ML for predictive analysis should be cited to justify the need to carry out this study in Bangladesh.

Also, the study should report, with statistics, how well ML has performed in predicting the occurrence of LBP amongst cricketers elsewhere. What has been achieved so far? Has it reduced the occurrence?

Materials and Method:

n = 450 seems adequate for this study

Ethical consideration clearly stated

Purposive sampling may introduce selection bias

The study relied heavily on self-reported data, and there is no mention of feature selection analysis.

How were the categorical variables encoded?

The study could have benefited from an external validation

The method section provided a good justification for the use of the regression models and dropping the deep learning model considering the small sample size.

The inclusion of the regression models for comparative analysis makes this study more robust

The use of a balanced dataset, cross validation and grid search reduces overfitting which makes the study more robust.

Results.

Clear presentation of performance metrics of the models employed for the study

Table can be more abridged

For class 2, the SVM classifier achieved a low recall of 66% and an F1 value of 77%. Can the authors address the clinical implication of this?

The study has a visual appeal (Visualization of results)

Discussion

This chapter features a robust analysis of the performance metrics and good comparative analysis of the study with other studies on LBP.

The authors should have a more robust discussion on the clinical significance of the class 2 low recall value and what steps to take to mitigate it.

There is a need for the authors to conduct an external validation of their model to assess generalizability and performance on unseen data.

Overall: A very germane study that would be more impactful with an external validation and more refined methodology.

Reviewer #2: - The Introduction is too long and can be summarized without deleting important points. Please rewrite the Introduction in fewer than 400 words.

- Please report the inclusion and exclusion criteria; for example, what was the minimum training duration per week needed to enter the study?

- Please avoid reporting the performance of different models in the Methods section. For example: "Although it offered interpretable baseline results, it was less effective than more robust classifiers like SVM in capturing complex relationships between low back pain and its contributing factors". This information belongs in the Results section.

- The first paragraph of the Results should report the demographics of the included subjects. Please use a standard checklist such as the "STROBE Statement—Checklist of items that should be included in reports of cross-sectional studies".

- Please report the numbers in one format. In this manuscript, numbers are reported as percentages in the text but as decimals in the tables.

Reviewer #3: The authors present an interesting premise: using machine learning to provide a low-cost solution for assessing low back pain (LBP) risk among young cricket athletes in Dhaka. Specific cricket fast bowling techniques are well known to increase the risk of neural arch fracture because of the biomechanical load on the spine. My main concern is that the authors did not discuss how they classified these athletes into no, mild, and moderate LBP, even though multiple reviews have been done on this subject. Another main issue is the lack of discussion of how some of the variables used in this study might be highly correlated, which is known to cause instability in a predictive model. Last but not least, the risk factors identified in this study have already been reported in prior studies, so what is the added value of this study? How would the algorithm be used in a real-life setting?

Below are more detailed comments on this paper:

Intro

- Line 5: up to 30% of adolescents affected. The cited reference 2 is not a demographic study but a qualitative study of young athletes' lived experience, so it is not appropriate for this statement.

- Ref 4 is a narrative review, ref 7 is a case report, and ref 8 is a scoping review. Can you provide more objective sources of information, such as large demographic studies, network meta-analyses, or randomized controlled trials?

- Line 34: How does argument 6 support the argument about the scarcity of biomechanical assessment in Dhaka?

Methods

- Lines 60-64: Who categorized the LBP severity, based on what standard, and what reference supports the use of this standard?

- Lines 75-99 provide justifications for different types of participant classification, from height and weight all the way to playing position, with no reference on why these are important relative to CLBP.

- Lines 126-7: “it was less effective in capturing nonlinear associations compared to tree-based or kernel-based models.” Any reference for this?

- Overall, please cite proper references as the basis of your methodology.

Results

- Some of these factors are highly correlated with each other, such as age and level of education within the recruited demographics, assuming none of the recruited participants dropped out at the elementary level. Using highly correlated variables to build a predictive model can make the model unstable. Why is there no assessment of the correlation between variables?

Discussion

Lines 292-296: your results may appear to contradict reference 19, but reference 19 refers to physical education, which not all participants received, while your education level may be strongly correlated with age. Receiving education is not the same as receiving physical education.
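The correlation concern raised above (for example, age against level of education) can be checked with a few lines of code. The following is a minimal sketch rather than the authors' analysis: the function names, the ordinal coding of education, and the toy values are all assumptions made for illustration. It computes the Pearson correlation between two predictors and the variance inflation factor (VIF) for the special case where only those two predictors enter the model, where VIF = 1 / (1 - r^2).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant variable has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(x, y):
    """VIF of either predictor when only these two are in the model."""
    r = pearson_r(x, y)
    if abs(r) >= 1.0:
        return math.inf
    return 1.0 / (1.0 - r * r)


# Hypothetical toy values, NOT taken from the study data.
age = [12, 13, 14, 15, 16, 17, 18, 19]
education_level = [1, 1, 2, 2, 3, 3, 4, 4]  # assumed ordinal coding

r = pearson_r(age, education_level)
vif = vif_two_predictors(age, education_level)
print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic collinearity. With more than two predictors, each VIF would instead come from regressing that predictor on all the others.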

**********

If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Hamidreza Ashayeri

Reviewer #3: No

**********

Figure resubmission:

While revising your submission, we strongly recommend that you use PLOS’s NAAS tool (https://ngplosjournals.pagemajik.ai/artanalysis) to test your figure files. NAAS can convert your figure files to the TIFF file type and meet basic requirements (such as print size, resolution), or provide you with a report on issues that do not meet our requirements and that NAAS cannot fix.

Reproducibility:

To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLOS Digit Health. doi: 10.1371/journal.pdig.0001369.r003

Decision Letter 1

Cleva Villanueva

26 Dec 2025

'Response to Reviewers'. This file does not need to include responses to any formatting updates and technical items listed in the 'Journal Requirements' section below.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Cleva Villanueva, M.D., Ph.D.

Academic Editor

PLOS Digital Health

Leo Anthony Celi

Editor-in-Chief

PLOS Digital Health

orcid.org/0000-0001-6712-6626

Journal Requirements:

If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

Additional Editor Comments:

We appreciate the effort made to revise the manuscript and to address several of the reviewers’ comments. The revised version shows improvements; however, an important methodological concern remains unresolved.

Specifically, the manuscript lacks external validation of the proposed model. While the authors state that they do not currently have the resources to perform external validation and mention this as future work, external validation is a key requirement to ensure the reproducibility, robustness, and generalizability of the results. Even a limited or temporary external validation (e.g., using an independent dataset, a temporal split, or data from a different setting) would substantially strengthen the manuscript.

Without external validation, the conclusions remain insufficiently supported for publication at this stage. We therefore encourage the authors to address this issue directly by providing an appropriate form of external validation or, alternatively, by clearly justifying why such validation cannot be performed and revising the scope and claims of the manuscript accordingly.

Reviewers' Comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: No

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

Reviewer #1: Thanks for the corrections added but it will be nice if this model was externally validated. This seems like the only missing piece in the puzzle.

Reviewer #2: I'm grateful for the chance to review the revised version of this manuscript. The authors have addressed the reviewers' comments, and I appreciate their dedication.

They have included essential statistical analyses, such as multicollinearity testing, which enhances the value of their work. They have also applied fundamental revisions in the main text, especially in the discussion section, improving the content of their manuscript.

Their work may prove beneficial for the application of artificial intelligence in low-resource areas.

**********

If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Hamidreza Ashayeri

**********

PLOS Digit Health. doi: 10.1371/journal.pdig.0001369.r005

Decision Letter 2

Cleva Villanueva

25 Feb 2026

'Response to Reviewers'. This file does not need to include responses to any formatting updates and technical items listed in the 'Journal Requirements' section below.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Cleva Villanueva, M.D., Ph.D.

Academic Editor

PLOS Digital Health

Leo Anthony Celi

Editor-in-Chief

PLOS Digital Health

orcid.org/0000-0001-6712-6626

Journal Requirements:

If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

Additional Editor Comments (if provided):

Dear Authors,

Your manuscript has significantly improved, and the revisions have addressed the major methodological concerns raised in the previous round of review. In particular, the addition of an independent external validation cohort (n = 42) substantially strengthens the study and directly resolves the key gap identified earlier. The revised manuscript is now very close to being publication-ready.

Thank you for the thorough and thoughtful revision.

There are, however, a few remaining points that should be addressed prior to acceptance:

1. Reporting uncertainty in external validation

Please include measures of uncertainty for the performance metrics obtained from the external cohort (e.g., 95% confidence intervals via bootstrap methods or exact binomial CIs for class recalls). Given the relatively small sample size (n = 42), performance estimates may be unstable. Reporting confidence intervals will ensure appropriate calibration of the claims. Additionally, consider including the external confusion matrix in the main text or Supplementary Materials.

2. Clarification of hyperparameter tuning and preprocessing

Please clarify how grid search and preprocessing were conducted relative to cross-validation. If hyperparameters were tuned on the full dataset prior to cross-validation, reported performance may be optimistic. A brief statement confirming that tuning occurred within training folds (e.g., nested cross-validation or equivalent) would resolve this concern.

3. Dataset balancing procedure

The manuscript states that the dataset is balanced (150 samples per class), but the method and stage at which balancing was performed are not fully explained. Please specify whether under-sampling, over-sampling, or class weighting was used. If feasible, consider including a brief sensitivity analysis using the original class distribution (or class weights) to demonstrate robustness.

4. Data Availability statement

There appears to be inconsistent language regarding whether the dataset is fully available within the manuscript/Supplementary Information or “available upon request.” PLOS requires that the underlying data necessary to replicate the results be publicly accessible without restriction, except in rare and justified cases. Please ensure that the final Data Availability statement is fully compliant and consistent throughout the manuscript.
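To make the request in point 1 concrete, an exact binomial (Clopper-Pearson) interval for a class recall can be computed with the standard library alone. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, and the counts in the example (12 of 14 cases of one class correctly identified) are invented to show the call and are not results from the external cohort.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")

    def solve(f):
        # Bisection for a function that increases in p, with f(0) < 0 < f(1).
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve(
        lambda p: alpha / 2 - binom_cdf(k, n, p)
    )
    return lower, upper


# Hypothetical counts for one class, NOT taken from the manuscript.
correct, total = 12, 14
recall_ci = clopper_pearson(correct, total)
print(f"recall = {correct / total:.3f}, 95% CI = ({recall_ci[0]:.3f}, {recall_ci[1]:.3f})")
```

With per-class counts this small, the interval is wide, which is the reason for reporting it alongside the point estimate.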

Reviewers' Comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #1: All comments have been addressed

Reviewer #4: (No Response)

**********

Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

Reviewer #4: No

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #4: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #4: Yes

**********

Reviewer #1: All concerns have been addressed.

Reviewer #4: Thank you for the thorough revision. The addition of an independent external validation cohort (n = 42) substantially strengthens the work and directly addresses the key methodological gap raised previously. The revised manuscript is much closer to being publication-ready.

Key remaining points to address:

1. Please add uncertainty around performance on the external cohort (for example 95% CIs via bootstrap or exact binomial CIs for class recalls). With n = 42, estimates can be unstable, and reporting CIs will make the claims appropriately calibrated. Consider also showing the external confusion matrix in the main text or supplement.

2. Clarify exactly how grid search and preprocessing were conducted relative to cross-validation. If hyperparameters were tuned on the full dataset before CV, reported CV performance may be optimistic. A brief statement confirming tuning within training folds (nested CV or equivalent) would resolve this.

3. The text states the dataset is balanced (150 per class) but does not fully explain how balancing was achieved and at what step. Please specify whether you used under-sampling, over-sampling, or weighting, and consider a short sensitivity analysis using the original class distribution (or class weights) to show robustness.

4. The manuscript contains potentially conflicting language about data being in the manuscript/SI vs “available on request.” PLOS requires the underlying data needed to replicate results to be publicly available without restriction except rare justified cases, so please ensure the final Data Availability statement is fully compliant and consistent throughout.
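Point 2 above, tuning within training folds, can be sketched as nested cross-validation: an inner loop picks the hyperparameter using only the outer-training data, and the outer test fold is touched once, for scoring. The sketch below is an assumption-laden illustration, not the study's pipeline. It swaps in a toy one-dimensional k-nearest-neighbours classifier and synthetic data in place of the SVM and the questionnaire data, and all function names are hypothetical.

```python
import random


def k_fold_indices(n, k, seed):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train, query, k):
    """Majority vote among the k nearest 1-D training points (x, label)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(sorted(votes), key=votes.get)


def accuracy(train, test, k):
    """Fraction of test points whose label is predicted correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def split(data, held):
    """Return (rows outside the held-out fold, rows inside it)."""
    held_set = set(held)
    inside = [data[i] for i in held]
    outside = [row for i, row in enumerate(data) if i not in held_set]
    return outside, inside


def tune_k(train, candidates, n_inner, seed):
    """Choose k by inner cross-validation on the outer-training data only."""
    folds = k_fold_indices(len(train), n_inner, seed)
    best_k, best_score = None, -1.0
    for k in candidates:
        scores = [accuracy(*split(train, held), k) for held in folds]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_k, best_score = k, mean_score
    return best_k


def nested_cv(data, candidates, n_outer=5, n_inner=3, seed=0):
    """Outer-fold accuracies; each outer test fold is never seen during tuning."""
    outer_scores = []
    for fold_no, held in enumerate(k_fold_indices(len(data), n_outer, seed)):
        outer_train, outer_test = split(data, held)
        best_k = tune_k(outer_train, candidates, n_inner, seed + fold_no + 1)
        outer_scores.append(accuracy(outer_train, outer_test, best_k))
    return outer_scores


# Synthetic two-class data, NOT the study data.
rng = random.Random(42)
data = [(rng.gauss(0, 1), 0) for _ in range(30)]
data += [(rng.gauss(3, 1), 1) for _ in range(30)]

scores = nested_cv(data, candidates=[1, 3, 5])
print("outer-fold accuracies:", [round(s, 2) for s in scores])
```

Any preprocessing step whose parameters are learned from data (scaling, encoding, resampling for class balance) would be fitted inside the same training folds, for the same reason.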

**********

If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #4: No

**********

PLOS Digit Health. doi: 10.1371/journal.pdig.0001369.r007

Decision Letter 3

Cleva Villanueva

30 Mar 2026

Machine Learning-Based Pattern Recognition of Risk Factors for Low Back Pain among Adolescent Cricket Players in Dhaka City

PDIG-D-25-00567R3

Dear Marzana Afrooj Ria,

We are pleased to inform you that your manuscript 'Machine Learning-Based Pattern Recognition of Risk Factors for Low Back Pain among Adolescent Cricket Players in Dhaka City' has been provisionally accepted for publication in PLOS Digital Health.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email from a member of our team.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact digitalhealth@plos.org.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Digital Health.

Best regards,

Cleva Villanueva, M.D., Ph.D.

Academic Editor

PLOS Digital Health

***********************************************************

Additional Editor Comments (if provided):

After carefully reviewing the original manuscript, the revised versions, and the reviewers’ comments, the editor has decided to accept the manuscript. The authors have adequately addressed all reviewer comments, and the manuscript fulfills all the requirements for publication in PLOS Digital Health.

Reviewer Comments (if any, and for reference):

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. Informed consent form.

    The written consent form was administered to the participants.

    (DOCX)

    pdig.0001369.s001.docx (17.2KB, docx)
    S2 Appendix. Questionnaire.

    The included questions comprised the questionnaire administered to the participants.

    (DOCX)

    pdig.0001369.s002.docx (59.3KB, docx)
    S3 Appendix. Data.

    This file contains primary research data.

    (CSV)

    pdig.0001369.s003.csv (30.7KB, csv)
    S4 Appendix. Data.

    This file contains research data for external validation.

    (CSV)

    Attachment

    Submitted filename: Rebuttal Letter.docx

    pdig.0001369.s006.docx (39.6KB, docx)
    Attachment

    Submitted filename: Rebuttal_Letter_auresp_2.docx

    pdig.0001369.s007.docx (26.1KB, docx)
    Attachment

    Submitted filename: Rebuttal_Letter_auresp_3.docx

    pdig.0001369.s008.docx (26.7KB, docx)

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting Information files.


    Articles from PLOS Digital Health are provided here courtesy of PLOS