Abstract
Background
As obesity presents a growing public health challenge, demand for personalized fitness solutions has increased. This study evaluates GPT-4's effectiveness as a virtual fitness coach for creating personalized plans.
Method
A 24-year-old Chinese female's data was used by GPT-4 and three professional coaches to develop 16-week fitness plans. Experts evaluated these plans on personalization, effectiveness, comprehensiveness, and safety. Statistical analyses were performed.
Results
GPT-4 excelled in personalization (M = 12.80, SD = 0.84) compared to coaches (M = 11.53, SD = 0.46), However, the difference wasn't big enough to be considered statistically significant (p > 0.05). In terms of effectiveness (GPT-4: M = 12.60, Coaches: M = 12.80), safety (GPT-4: M = 12.20, Coaches: M = 12.33), and comprehensiveness (GPT-4: M = 12.00, Coaches: M = 12.13), coaches slightly outperformed GPT-4, But again, these differences didn't reach statistical significance (p > 0.05).
Conclusions
GPT-4 shows promise as a virtual fitness coach but cannot fully replace human coaches due to technological limitations. Future research should explore enhancing AI models' applicability in sports and their collaboration with coaches for optimal personalized fitness solutions.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12889-025-22739-8.
Keywords: GPT- 4, Virtual fitness coach, Personalized fitness solutions, Technological limitations, Health sciences
Introduction
Obesity has become a significant public health challenge in contemporary society [46, 47]. Research predicts that by 2035, approximately 1.9 billion people worldwide will be facing obesity issues [50], potentially making it one of the most significant health threats of the twenty-first century [26, 50]. On the other hand, obesity has been proven to be detrimental to both physical and psychological health [59]. The World Health Organization warns that the rising rates of obesity will increase the risks of various diseases, including heart disease, diabetes, and cancer [1, 6, 24].
Currently, there are a variety of treatment options for obesity including pharmacological interventions [4] and surgical procedures [35, 48]. However, these methods are considered to carry associated health risks and may not provide long-term control of obesity [12, 13]. In this context, behavioral therapy is the most advocated approach, primarily involving dietary adjustments and increased physical activity [53, 66]. To date, countless studies have demonstrated that physical exercise plays a crucial role in controlling weight and improving health [39, 44]. However, in a wide range of physical activities and exercises, we are more concerned with maximizing the benefits of exercise while minimizing the risks. Substantial research has indeed demonstrated that improper training methods can lead to injury. Considering the differences among individuals, personalized scientific planning is crucial. Buford [7] posits that personalized exercise programs are key in addressing obesity and maintaining a healthy lifestyle. This customized approach takes into account individual characteristics and needs, thereby avoiding the negative impacts that may arise from non-customized exercise plans [7].
For the average individual, there is a lack of knowledge in training science, making it challenging to design appropriate plans for themselves. Although fitness coaches can offer professional advice, their high costs and the inconsistency in professional quality make widespread accessibility difficult to achieve. As highlighted in Galiuto's [17] study, patients with heart disease require personalized exercise guidance. In such cases, the involvement of professional coaches is critically important. However, due to cost and accessibility issues, such services are not widely available [17]. Especially in developing countries like China, the demand brought by the massive population does not match the relatively underdeveloped community sports service resources, which further reduces the opportunities for the general public to obtain high-quality fitness guidance [45].
Against this backdrop, as an emerging natural language processing tool, GPT (especially the latest version, GPT- 4) demonstrates substantial potential [2]. It shows promise in providing cost-effective, user-friendly personalized health and fitness advice. Research by Hirak Mazumdar, K. Bhattarai, and others suggests that personalized health services are becoming a reality [3, 38]. This could potentially transform traditional fitness models and drive innovation in the fitness industry. Current research underscores the potential application of GPT in the field of exercise science. For instance, Wang M et al. [55] summarized the performance of Chat GPT across various domains in their research, stating that its performance in the field of exercise science [51] leaves much to be desired. However, there is a notable lack of in-depth research into the development and evaluation of personalized training programs. Vogel's study [54] explored the ability of GPT to identify individual preferences for exercise plans, but did not thoroughly evaluate the actual effectiveness of these plans [54]. Although the research by Washif J et al. confirmed that GPT- 4 has improved in designing personalized resistance training programs compared to its predecessors, this improvement was only observed in comparisons between artificial intelligence [60]. Our study seeks to conduct a comparative case analysis of 16-week fitness programs developed by artificial intelligence (GPT- 4) and human coaches, aiming to preliminarily evaluate the potential role and efficacy of GPT- 4 in mitigating or addressing the uneven distribution of traditional fitness coaching resources.
Methods
Ethical considerations
This study was approved by the Ethics Review Committee of Southwest University, China (Approval Code: SWU-PE- 20240311; Approval Date: March 11, 2024), adhering to the Declaration of Helsinki. Participants were informed of procedures, risks, benefits, confidentiality, and withdrawal rights via Mandarin Chinese interviews. Written informed consent was obtained, ensuring voluntary participation and compliance with ethical standards. No compensation was provided. Data were de-identified, with personal information securely stored and accessible only to researchers.
Test samples
The rising incidence of obesity among young populations has attracted widespread attention [43]. Considering the exploratory nature of this study and resource constraints, we adopted a case study approach to conduct an in-depth analysis of GPT- 4’s capabilities in fitness plan generation.
Participants in this study were recruited from the Weight Loss and Fitness Club at X University, a student-initiated and self-managed group driven by shared interests and goals, aimed at promoting weight management and healthy lifestyles. To ensure the scientific rigor and representativeness of the sample, participant selection adhered to strict inclusion and exclusion criteria.
Inclusion criteria required participants to be aged 18–25 years, aligning with the demographic most affected by rising obesity trends in young adulthood [22]. A body mass index (BMI) range of 24.0–26.0 kg/m2was specified, corresponding to the overweight category as defined by the World Health Organization for Asian populations, which is associated with increased health risks [52]. A body fat percentage ≥ 28% was required, reflecting thresholds linked to elevated metabolic risk in young females [18]. Participants were also required to exhibit insufficient physical activity, defined as engaging in less than 150 min of moderate-intensity exercise per week, consistent with global physical activity guidelines [8]. Finally, eligibility was contingent upon clearance through the Physical Activity Readiness Questionnaire (PAR-Q), ensuring no contraindications to exercise [58].
Exclusion criteria included pregnancy, diagnosed metabolic disorders such as diabetes or thyroid disease, and use of medications known to affect body weight, such as hormonal or antidepressant drugs.
From the Weight Loss and Fitness Club database, which included 30 members, we used stratified sampling to identify 15 eligible candidates. Following professional evaluation by two sports medicine physicians, we applied purposive sampling to select a 24-year-old female participant with mild overweight and a sedentary lifestyle as the study subject. This sampling approach was designed to maximize the representativeness of the sample for urban sedentary female populations while ensuring that her anthropometric parameters and lifestyle characteristics aligned with those of the target population [28]. The baseline anthropometric and physiological data for this participant are as follows (Table 1): body mass index (BMI) 25.1 kg/m2, body fat percentage 30.2%, and resting heart rate 74 bpm. These values were consistent with the mean (± standard deviation) from preliminary surveys of women in the same university cohort (BMI: 24.8 ± 1.2 kg/m2; body fat percentage: 29.7 ± 2.1%; resting heart rate: 76 ± 3 bpm), indicating that the participant's baseline characteristics fell within the typical range (± 1 SD) for this population. The participant, a 24-year-old female with mild overweight (defined as BMI 25.0–27.0 kg/m2) and a sedentary lifestyle (self-reported physical activity < 150 min/week, consistent with WHO guidelines), was selected to represent a subgroup commonly observed in urban academic settings.
Table 1.
Basic information of test sample
Parameter | Test Value | Parameter | Test Value |
---|---|---|---|
Height | 159 cm | BMI | 25.1 |
Weight | 63.5 kg | Body Fat Percentage | 30.24% |
Standard Weight (Reference) | 49 KG | ||
Waist Circumference | 86 CM | Waist-to-Hip Ratio | 0.86 |
Hip Circumference | 100 CM | ||
Degree of Obesity (%) | 13.5% | Basal Metabolic Rate | 1701 kcal |
Resting Heart Rate | 74 bpm | Obesity Type | Overweight |
Based on the screening results from the Physical Activity Readiness Questionnaire (PAR-Q) [10], an exercise prescription was subsequently formulated
While a single participant cannot fully represent the broader population, case studies are recognized as a valid methodology for preliminary investigations of novel technologies [64]. This approach allowed us to rigorously evaluate GPT- 4’s technical feasibility and iterative refinement process, laying a foundation for future large-scale studies.
Study design
The aim of this case study is to compare the differences between GPT- 4 and fitness coaches in formulating personalized fitness plans through expert evaluations, and to assess the potential of GPT- 4 as a virtual fitness coach.
Selection of experts
The selection of experts was based on their extensive experience and specialized knowledge in sports, fitness coaching, and related medical disciplines, adhering to the following criteria: possession of nationally or internationally recognized professional certifications, such as those from the China Bodybuilding Association, the American Council on Exercise, the American College of Sports Medicine, the National Academy of Sports Medicine, or equivalent qualifications; at least five years of practical experience focused on designing fitness programs, managing weight, or implementing health interventions; and no prior involvement in the development or testing of GPT- 4 or other AI models to ensure independence and avoid potential conflicts of interest.
To assess exercise prescriptions, we adopted a collaborative approach, integrating the expertise of physicians, exercise scientists, and fitness coaches [68]. In 2024. For this evaluation, we invited senior experts XXH and ZJ from the China Fitness and Bodybuilding Association, trainers PGF and SXB from the School of Physical Education at Chongqing University of Posts and Telecommunications, and ZZW from Huawei's Health Technology Department (Table 2). These experts not only hold professional fitness certifications but also possess years of experience in fitness and weight loss coaching.
Table 2.
Basic information of experts
Name (Abbreviation) | Affiliation | Field of Work | Years of Experience | Expertise Domains |
---|---|---|---|---|
XXH | China Fitness and Bodybuilding Association | Sports Medicine | 15 | Chronic disease rehabilitation through exercise; injury prevention in athletes |
ZJ | China Fitness and Bodybuilding Association | Sports Medicine | 12 | Sports nutrition integration; post-surgical fitness programming |
SXB | Chongqing University of Posts and Telecommunications | Fitness and Bodybuilding | 8 | Body composition optimization; weight management for young adults |
PGF | Chongqing University of Posts and Telecommunications | Equipment-based Fitness | 10 | Resistance training program design; biomechanics of equipment-based exercises |
ZZW | Huawei Health Technology Department | Health Science | 6 | Wearable technology integration; data-driven personalized health interventions |
Application of GPT
In this study, we employed the initial version of GPT- 4, which was released by OpenAI on March 14, 2023. Although it is not the latest model from OpenAI, it has been extensively validated and tested, demonstrating exceptional performance across various fields.To better verify the specific performance of GPT- 4 during weight loss, we adopted a comprehensive strategy. We have adopted a comprehensive strategy. Firstly, based on detailed body data of a 24-year-old female, including age, height, weight, BMI, and body fat percentage, we utilized the GPT- 4 model to generate a 16-week training program. This duration is considered ideal as it ensures effective training outcomes while effectively preventing overtraining and the accumulation of fatigue [5, 25]. The program is designed to meet her needs for muscle gain and fat loss, while also improving cardiovascular function and reducing heart rate, with the ultimate goal of enhancing overall physical fitness. Subsequently, we further instructed GPT- 4 to enhance the original program by adjusting parameters such as training intensity, rest intervals, and exercise cadence. This was done to ensure the specificity and scientific validity of the program [19, 60].
Given that GPT- 4 tends to stabilize after generating 15 consecutive question-and-answer pairs [23]. We have focused our analysis on a single comprehensive training plan generated by GPT- 4 (n = 1). This plan is meticulously crafted based on the insights and adjustments derived from 15 interactions between GPT- 4 and the user. We will compare this plan with the exercise plans provided by three professional coaches (n = 3), each bringing their unique expertise to the table. Our goal is to evaluate the practicality and accuracy of these plans in fitness guidance, ensuring that our assessment is both fair and scientifically sound.
Here are the specific prompts we provided to GPT (Table 3):
Table 3.
Prompt content
Sequence | Content |
---|---|
PROMPT1 | You are a seasoned fitness expert and an ACE-certified personal trainer. Your task is to generate a 16-week weight loss training plan based on the personal physical metrics I provide, aiming to enhance physical fitness, reduce excess body fat, improve cardiovascular function, and lower heart rate. Here are the specific personal details: Female; 24 years old; Height: 159 cm; BMI: 25.1; Weight: 63.5 kg; Body fat percentage: 30.24%; Reference standard weight: 49 kg; Waist circumference: 86 cm; Waist-to-hip ratio: 0.86; Hip circumference: 100 cm; Obesity degree: 13.5%; Basal metabolic rate: 1701 kcal; Resting heart rate: 74 bpm; Obesity type: Overweight |
PROMPT2 | Due to high academic pressure, I have little time for exercise and do not restrict my diet or calorie intake. Currently, I can only train about three times a week. Please personalize the plan accordingly. If you need more detailed information, let me know in advance |
PROMPT3 | Please use the NASM OPT model combined with appropriate aerobic exercises and adjust according to the FITT principles to provide a training plan that can improve my exercise performance. Also, specify the precautions during the workout process |
PROMPT4 | Please include the following information in the training plan: load intensity (percentage), rest intervals, and exercise tempo |
PROMPT5 | Briefly explain the rationale for the recommended plan details or variables in bullet points |
PROMPT6 | Detail the workout content for each day in a concise manner and output it in an EXCEL table format |
The more detailed the questions posed to GPT, the higher the accuracy of the responses received [23]. Continuous follow-up questions were made to refine and perfect the training plan. The above only showcases the main prompts
Selection and consultation of fitness coaches
Given that the study participants were women residing in Beibei District, Chongqing, and to ensure that the findings are highly relevant and applicable to the general population in similar urban settings, the research team carefully selected three top-rated fitness coaches from the top three gyms in Beibei District. This selection was based on user reviews from Dianping, a leading Chinese online platform for merchant evaluations, as well as nationally recognized certifications. The selection criteria for these fitness coaches were as follows: a minimum rating of 4.8 out of 5 on Dianping; completion of at least 100 client cases in the past year, with a client retention rate exceeding 80%. During the study, to ensure comparability with the input parameters of GPT- 4, the fitness coaches were required to meticulously design training plans based on predefined key parameters, such as training frequency and fitness goals, and were strictly prohibited from requesting any additional information from the participants.
Collection of feedback from the research subject
To comprehensively evaluate the effectiveness of the fitness plan generated by GPT- 4, we also incorporated the collection of feedback from the research subject. This involved conducting semi-structured interviews with the participant during the implementation of the plan to obtain her personal experience and satisfaction with the fitness plan. Additionally, through the semi-structured interviews, we were able to gain in-depth insights into the participant's feelings, experiences, and acceptance of the fitness plan. These qualitative data provided us with valuable first-hand information, which is helpful in understanding the actual impact of the fitness plan and the personal needs of the participant.
Evaluation criteria and procedure
We employed a single-blind evaluation method, where experts were unaware of whether the plans were generated by GPT or real fitness coaches. The evaluation criteria were primarily based on the FITT model (Frequency, Intensity, Time, Type) [10, 11], a well-established framework for assessing exercise programs, and were structured around the following four key dimensions:
Personalization: This dimension examines whether the plan adequately considers the participant's age, gender, current fitness level, existing health issues, personal fitness goals, and exercise preferences. The aim is to evaluate whether the plan can effectively adapt and adjust to individual characteristics.
Effectiveness: This dimension focuses on the plan's functionality in achieving the predetermined fitness goals, such as improving cardiovascular health, increasing muscle strength, and enhancing flexibility. It assesses whether the frequency, intensity, time, and type of exercises are scientifically designed to meet individual goals and whether the plan includes a progressive strategy to achieve these goals.
Safety: The core of this dimension is whether the plan adheres to safety guidelines and best practices. It evaluates whether the plan considers the individual's physical limitations, injury risks, or special medical conditions and whether it provides correct exercise guidance and techniques to prevent injuries during the workout process.
Comprehensiveness: This dimension addresses whether the plan holistically meets the individual's primary fitness goals and overall mental and physical health. It examines whether the plan integrates cardiovascular training, strength training, flexibility training, and neuromotor training. Additionally, it considers factors related to a healthy lifestyle, such as stress management, sleep, and nutrition.
Each dimension includes four sub-criteria, scored on a four-point Likert scale [56] (1 = Unacceptable, 2 = Acceptable, 3 = Satisfactory, 4 = Very Satisfactory). This scale design intentionally omits a neutral midpoint to encourage evaluators to provide definitive judgments, thereby reducing ambiguity in expert assessments. Similar approaches have been validated in prior studies evaluating AI-generated fitness plans, where forced-choice scales improved discriminative validity by minimizing central tendency bias. Each sub-criterion has a maximum score of 4 points, resulting in a maximum score of 16 points per dimension. Consequently, the total score for the entire evaluation system is 64 points (Table 4).
Table 4.
Evaluation criteria dimension
Dimension | Sub-criteria | Highly Satisfied | Satisfied | Acceptable | Unacceptable |
---|---|---|---|---|---|
Personalization | (1) Does it match the individual's age, gender, and current fitness level? | ||||
(2) Does it take into account the individual's medical history, any existing injuries or limitations, and personal fitness goals? | |||||
(3) Does it exhibit flexibility to adapt to changes in personal circumstances or preferences over time? | |||||
(4) Does it consider the individual's exercise preferences and obstacles to sustaining exercise? | |||||
Effectiveness | (1) Can it address the specific physiological adaptability required to achieve personal fitness goals (e.g., improving cardiovascular health, muscle strength, and flexibility)? | ||||
(2) Analyzes the variables of the exercise plan such as frequency, intensity, time, and type in relation to personal fitness goals and current capabilities | |||||
(3) Assesses the progressive plan of the exercise regime, ensuring a gradual and appropriate increase in workout intensity, duration, and complexity | |||||
(4) Evaluates whether objective assessments (such as fitness tests, body composition analysis) are incorporated to track and monitor individual progress | |||||
Safety | (1) Does it comply with recognized safety guidelines and recommended best practices for exercise? | ||||
(2) Does it take into consideration the individual's current physical limitations, injury risks or medical conditions, and whether the exercise selection and modification are appropriate? | |||||
(3) Includes correct exercise technique guidance and monitoring to prevent injuries | |||||
(4) Considers exercise intensity, duration, and recovery periods to ensure the individual's safety and health | |||||
Comprehensiveness | (1) Does it not only meet the individual's primary fitness goals but also cater to their overall physical and mental health? | ||||
(2) Incorporates a balanced approach to fitness, including cardiovascular, muscle, flexibility, and neuromotor training components | |||||
(3) Includes stress management techniques such as relaxation, mindfulness, or recovery activities | |||||
(4) Considers personal lifestyle factors (such as sleep, nutrition) and their integration with these health aspects |
Data analysis
We confirmed the normality of the data using the Shapiro–Wilk test and the Kolmogorov–Smirnov test. The scoring results are presented as means and standard deviations. Furthermore, we conducted inferential statistical analysis using an independent samples t-test to compare the effectiveness of the two types of fitness plans and to explore the potential capabilities of GPT- 4 in the role of a fitness coach. All data analyses were performed using the statistical software SPSS (version 26), with the significance level set at 0.05.
Results
This study compared the fitness plans generated by GPT- 4 with those created by three professional fitness coaches using both descriptive and inferential statistical analyses. The evaluation focused on four key dimensions: personalization, effectiveness, safety, and comprehensiveness.
Descriptive statistics
We used descriptive analysis revealed performance differences between GPT- 4 and human coaches. GPT- 4 showed higher personalization scores (M = 12.80, SD = 0.84) compared to coaches (M = 11.53, SD = 0.46), while coaches outperformed GPT- 4 slightly in effectiveness (ΔM = 0.20), safety (ΔM = 0.13), and comprehensiveness (ΔM = 0.13). However, these differences were not statistically significant (all p > 0.05), suggesting comparable overall performance between the two types of fitness plans.
Inferential statistics
Independent samples t-tests corroborated the descriptive findings (Table 5).
Table 5.
Results of independent samples t-test
Dimension | GPT- 4 Mean | Coaches Mean | t-value | df | p-value | 95% Confidence Interval |
---|---|---|---|---|---|---|
Personalization | 12.80 | 11.53 | − 2.38 | 2 | .141 | [− 3.56, 1.03] |
Effectiveness | 12.60 | 12.80 | 0.87 | 2 | .478 | [− 0.79, 1.19] |
Comprehensiveness | 12.20 | 12.33 | 0.14 | 2 | .899 | [− 3.88, 4.14] |
Safety | 12.00 | 12.13 | 0.16 | 2 | .885 | [− 3.36, 3.62] |
These findings confirm that, under the current sample and evaluation criteria, GPT- 4 and human coaches'fitness plans perform similarly across all evaluated dimensions.
These findings suggest that (Fig. 1), without considering ongoing feedback and adjustments to the training plans, GPT- 4 has potential in creating personalized exercise prescriptions. Nonetheless, its plans remain comparable to those by professional coaches in effectiveness, safety, and comprehensiveness, reflecting similar overall performance.
Fig. 1.
Comparison of scores between the two groups of training plans
Subjective information from the participant's feedback
Interview results show that after consistently following the workout routine, the participant expressed satisfaction with the overall design of the fitness plan generated by GPT- 4. She noted that GPT- 4 was able to create a personalized fitness program based on her provided physiological characteristics, accompanied by relevant precautions. The plan was easy to understand and implement, offering detailed step-by-step guidance, which effectively supported the consistency and continuity of her workouts. However, she also expressed some concerns, primarily related to the lack of real-time feedback, such as uncertainty about the accuracy of her movements and whether the exercise intensity met expectations. Additionally, she mentioned feeling bored and unmotivated when exercising alone. She further pointed out that having a real coach present to provide real-time guidance would not only boost her confidence but also enhance her sense of safety during exercise, an aspect that GPT- 4 currently cannot address.
This feedback highlights the potential of GPT- 4 in personalized fitness guidance, particularly in terms of customization and usability, but it also underscores the significant value of human coaches in providing immediate feedback and dynamic adjustments. This suggests that combining artificial intelligence technology with professional human guidance may offer users a more comprehensive and effective fitness experience.
Discussion
This study adopted a case study approach to preliminarily explore the differences between artificial intelligence (GPT- 4) and human fitness coaches in formulating weight loss plans for individuals with obesity. The results indicate that GPT- 4 exhibits a certain advantage in the design of personalized plans, yet it shows no significant differences compared to human coaches in terms of effectiveness, safety, and comprehensiveness. Interview findings reveal that the participant expressed a high level of approval for both types of plans, perceiving the GPT- 4-generated plan as more structured and easier to implement. However, limitations were evident in its capacity for real-time feedback and personalized adjustments, particularly its difficulty in providing dynamic support tailored to individual emotions and immediate guidance. The preliminary exploration of this study suggests that artificial intelligence holds potential in the personalization of fitness plans, potentially alleviating, to some extent, the issue of unequal access to fitness resources. Nevertheless, in aspects such as real-time monitoring, emotional interaction, and individualized guidance, it remains unable to replace human coaches.
Performance of GPT- 4 generated plans
In our evaluation, the primary advantage of GPT- 4 lies in its robust personalization capabilities, attributable to its deep learning algorithms and extensive data processing capacity [57]. However, the performance of GPT- 4 in meeting personalized needs did not fully achieve statistical significance, which could be attributed to the small sample size and limitations during model training [30]. GPT- 4 demonstrated a substantial mean difference in personalization (ΔM = 1.27), an independent-samples t-test indicated that this difference did not reach statistical significance (t = − 2.38, p = 0.141). This may be attributed to the limited statistical power resulting from the small sample size [9]. However, the effect size for personalization, as measured by Cohen’s d (1.87), suggests a potentially meaningful practical difference based on conventional effect size classifications. This finding aligns with Wang et al. [55], who reported the advantages of large language models in personalized services, though further validation with larger samples is warranted. Additionally, there were no significant differences found in effectiveness, comprehensiveness, and safety when compared to human coaches, indicating that GPT- 4 can provide fitness guidance of a quality comparable to that of professional fitness coaches in these critical dimensions. This study further supports the findings of Washif et al., who noted that Chat GPT can quickly generate detailed resistance training prescriptions [60]. However, Wang et al. [55] pointed out that the application of Chat GPT in the field of exercise science is not entirely satisfactory [55]. The divergence in these viewpoints may be due to several factors. First, Chat GPT is based on a large language model (LLM) driven by the GPT architecture, which processes natural language through neural networks and can be trained on large-scale multilingual text data to generate human-like responses [30]. The rapid pace of updates and iterations means that the language models used in different studies may vary. Second, the prompts used in the studies differ. Properly crafted prompts can significantly enhance the response quality of large language models (LLMs) [62]. deeplearning.ai has launched an online course which suggests some basic guidelines for developing prompts [42]. Guohao Li confirmed the efficacy of these prompt resources [34].
Accessibility and convenience of GPT- 4 reduces the cost of weight loss
In China, a developing country [21], the vast demand for high-quality fitness guidance is mismatched by the limited public sports and fitness services, making it difficult for the general population to access personalized fitness guidance [45]. To quantify the cost of personalized fitness guidance, we conducted an on-site survey across 10 commercial fitness centers in Beibei District, Chongqing [67]. Data collection included direct inquiries about pricing for one-on-one training sessions focused on weight loss and body shaping. The average cost per session was calculated as 240 yuan (approximately$37 USD), with prices ranging from 200 to 300 yuan depending on trainer qualifications and facility tier. This represents a significant expense, particularly in cities with relatively limited sports resources. Consequently, the widespread implementation of effective and personalized health interventions across society continues to face numerous challenges [37]. However, the advent of artificial intelligence (AI) technology introduces new strategies in the health promotion realm [32]. GPT- 4, demonstrating exceptional performance across various domains, emerges as a preferred option due to its convenience and accessibility [32]. The training plans and dietary recommendations generated by GPT- 4, based on users'detailed information, can rival the guidance provided by professional fitness coaches, offering an efficient and cost-effective approach to fitness. While the current study has concentrated on static plan generation, it's important to recognize that GPT- 4's open API facilitates integration with real-time monitoring systems—think wearable devices or gym equipment sensors. This capability allows for plans to be dynamically adjusted based on immediate user feedback. For instance, as highlighted by research into AI-driven fitness technologies, numerous fitness centers have already adopted AI-powered monitoring systems to track exercise performance and provide instant feedback. This clearly demonstrates the potential for incorporating real-time feedback. Although implementing such technology does entail additional costs, it nonetheless offers a promising avenue for enhancing AI adaptability in practical scenarios. Looking ahead, future endeavors should prioritize merging GPT- 4's generative strengths with real-time data streams to effectively bridge the divide between personalized planning and dynamic execution. The advantages of using AI models like GPT- 4 not only lower the cost of personalized fitness guidance but also significantly enhance its accessibility, especially in areas with limited fitness resources. This AI-driven approach aligns with the global trend of leveraging technology to overcome barriers in medical and health services [65]. Future research should focus on assessing the efficacy of GPT- 4 in delivering personalized health and fitness interventions that seamlessly integrate into people's daily lives. This not only addresses the accessibility issues of fitness guidance but also plays a significant role in combating obesity and promoting healthier lifestyles worldwide.
While GPT- 4 excels in generating personalized fitness plans, it cannot provide real-time corrective feedback on exercise techniques or emotional support, which are critical for adherence and safety. Participant feedback highlights a potential risk: concerns about the accuracy of their movements—such as fears of injury due to improper posture—may be linked to the relatively lower expert-rated safety score of the AI-generated fitness plan (ΔM = 0.13). Although this numerical difference does not reach statistical significance, qualitative analysis suggests that the absence of real-time corrective feedback could compromise the plan’s practical safety. This observation aligns with findings in the field of health technology, where users’ perceived risks of AI-driven systems often stem from limitations in interactive design [20], and such subjective concerns may affect real-world safety before measurable statistical differences emerge. These limitations may affect the practical application of the fitness plans, particularly for users with medical conditions or limited exercise experience.
Limitations
This study has several key limitations. The single-participant case study design fundamentally restricts the generalizability of the findings [23]. Although all participating fitness experts were professionally certified and demonstrated high levels of expertise, expanding the sample posed practical difficulties. Each additional participant required the expert team to review roughly twenty individualized plans, substantially increasing the workload and placing considerable strain on time and coordination, particularly under limited funding conditions. Moreover, the sample focused exclusively on young women with mild obesity, excluding other important demographic groups such as individuals of different ages, sexes, those with severe obesity, or those affected by metabolic syndrome [16]. This limits the comprehensiveness of the conclusions. Future studies should aim to expand both the cohort size and its diversity, prioritizing the inclusion of participants with varied gender, age, and obesity profiles to validate broader applicability [40]. To rigorously assess the impact of AI-generated interventions, key anthropometric indicators will be systematically measured [29], along with essential blood biomarkers [61]. Adherence data will also be continuously collected through a dedicated mobile application to ensure a comprehensive evaluation of the program’s effectiveness across diverse populations.
Secondly, this study primarily examines short-term outcomes over a 16-week period, which limits our understanding of the sustained impact of the intervention. To address this, a 12-month longitudinal follow-up is planned, with comprehensive evaluations conducted every three months. These will include body composition measurement using dual-energy x-ray absorptiometry [15], cardiopulmonary fitness assessment through maximal oxygen uptake testing [36], and psychological well-being evaluation using the SF- 36 scale [27]. Long-term changes in weight, metabolic indicators such as glucose and cholesterol, and other health outcomes will be analyzed using repeated-measures ANOVA. This extended observation period is expected to provide a clearer understanding of the long-term effectiveness and sustainability of the AI-guided fitness intervention, addressing the limitations associated with short-term assessment.
Finally, GPT- 4’s current limitations in delivering real-time feedback on exercise technique and providing emotional support remain notable challenges [31]. To enhance system responsiveness, future iterations will incorporate wearable technologies and human oversight [32, 33]. However, the cost of scaling such integrated systems presents a significant barrier to broader adoption [63]. With sufficient funding, we plan to integrate devices capable of capturing physiological and motion data with high accuracy [41]. This data will be streamed via API to GPT- 4, enabling dynamic adjustments to exercise intensity and providing corrective feedback on form [41], enabling real-time feedback and adaptive exercise modifications via API connectivity with GPT- 4. In parallel, a human-in-the-loop framework will be trialed, where certified coaches review AI-generated plans and provide bi-weekly support to address psychological needs. The effectiveness of this hybrid model will be evaluated through metrics such as injury rates, exercise efficiency, user satisfaction [14], and overall cost-effectiveness compared to traditional coaching models [49].
Conclusion
In regions where access to qualified fitness professionals is scarce, GPT- 4 has emerged as a highly promising virtual fitness coach. Our study shows that GPT- 4 can generate fitness plans that are as effective, safe, and comprehensive as those created by human coaches. Notably, GPT- 4 excels in personalization, offering tailored guidance that can be accessed through basic internet connections. This capability can help address the shortage of sports resources and fitness experts in underserved regions. It enables individuals in these areas to access affordable and effective fitness plans more easily, offering a practical solution to improve their access to quality fitness guidance.
In summary, our study highlights the potential of GPT- 4 to enhance fitness guidance in resource-limited settings, where access to professional expertise may be constrained. Future research will expand the sample size to include participants with varying levels of obesity, age groups, and health conditions, aiming to evaluate the adaptability and applicability of GPT- 4 as a virtual fitness coach more comprehensively. Additionally, future AI-driven fitness guidance could integrate real-time data from wearable devices to enable adaptive plan modifications, addressing the gap between static AI recommendations and real-time user feedback. This study provides an initial exploration of AI's role in personalized health promotion, and we anticipate further advancements in this field.
Supplementary Information
Acknowledgements
We would like to thank all of the researchers who kindly provided us with the data necessary to complete this study.
Clinical trial related statement
Clinical trial number: not applicable.
This study is a case analysis and does not involve clinical trial protocols. All participant data were collected with informed consent under non-clinical research guidelines.
Authors’ contributions
GCL, HSL, YQS, YL,SJJ and GDZ designed this study. GCL and HSL jointly collected training programs. YQS and HSL analyzed data. GCL, YQS and HSL wrote the first draft of the manuscript. All authors contributed to the final manuscript.
Funding
This work was supported by the Horizontal Scientific Research Project of Southwest University in the year of 2023 (No. 2308017).
The Chongqing Municipal Education Commission supported this study; Humanities and Social Sciences Research Project (23SKGH105)
Data availability
Our data are kept in the Institute of Sports Science of Southwest University. If you need the original data, you can contact the correspondent.
Declarations
Ethics approval and consent to participate
This study received approval and oversight from the Ethics Review Committee of the College of Physical Education at Southwest University (SWU-PE- 20240311). Prior to the commencement of the survey, subjects were briefed on the overarching research theme, although specific research questions were not disclosed. Informed consent forms were signed by participants before initiating the survey. All procedures were performed in accordance with the relevant guidelines and regulations.
Consent for publication
All participants have provided written consent.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Almeida LG, Dera A, Murphy J, Santosa S. Improvements in cardiorespiratory fitness, muscle strength and body composition to modest weight loss are similar in those with adult-versus childhood-onset obesity. Clin Obes. 2024;14(1):e12623. [DOI] [PubMed] [Google Scholar]
- 2.Arslan S. Exploring the potential of Chat GPT in personalized obesity treatment. Ann Biomed Eng. 2023;51(9):1887–8. [DOI] [PubMed] [Google Scholar]
- 3.Bhattarai K, Oh IY, Sierra JM, Payne PR, Abrams ZB, Lai AM. Leveraging GPT-4 for identifying clinical phenotypes in electronic health records: a performance comparison between GPT-4, GPT-3.5-turbo and spaCy’s rule-based & machine learning-based methods. 2023. [DOI] [PMC free article] [PubMed]
- 4.Blüher M. Efficacy and safety of the weight-loss drug rimonabant. Lancet. 2008;371(9612):555–6. [DOI] [PubMed] [Google Scholar]
- 5.Bompa TO, Buzzichelli C. Periodization-: theory and methodology of training: Human kinetics. 2019.
- 6.Bouchard C, Depres JP, Tremblay A. Exercise and obesity. Obes Res. 1993;1(2):133–47. [DOI] [PubMed] [Google Scholar]
- 7.Buford TW, Roberts MD, Church TS. Toward exercise as personalized medicine. J Sports Med. 2013;43:157–65. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Burtscher J, Millet GP, Burtscher M. Pushing the limits of strength training. Am J Prev Med. 2023;64(1):145–6. [DOI] [PubMed] [Google Scholar]
- 9.Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. New York, NY: Routledge; 2013. [Google Scholar]
- 10.Crookham J. A guide to exercise prescription. Prim Care: Clin Off Pract. 2013;40(4):801–20. [DOI] [PubMed] [Google Scholar]
- 11.Deng S. Exercise Physiology. 3rd ed. Beijing, China: Higher Education Press; 2015. [Google Scholar]
- 12.Dergaa I, Saad HB, El Omri A, Glenn J, Clark C, Washif J, et al. Using artificial intelligence for exercise prescription in personalised health promotion: a critical evaluation of OpenAI’s GPT-4 model. Biol Sport. 2024;41(2):221–41. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Derickson M, Phillips C, Barron M, Kuckelman J, Martin M, DeBarros M. Panniculectomy after bariatric surgical weight loss: analysis of complications and modifiable risk factors. Am J Surg. 2018;215(5):887–90. [DOI] [PubMed] [Google Scholar]
- 14.Donnelly JE, Blair SN, Jakicic JM, Manore MM, Rankin JW, Smith BK, American College of Sports Medicine Position Stand. Appropriate physical activity intervention strategies for weight loss and prevention of weight regain for adults. Med Sci Sports Exerc. 2009;41(2):459–71. [DOI] [PubMed] [Google Scholar]
- 15.Dorgan JF, Ryan AS, LeBlanc ES, Van Horn L, Magder LS, Snetselaar LG, et al. A comparison of associations of body mass index and dual-energy x-ray absorptiometry measured percentage fat and total fat with global serum metabolites in young women. Obesity (Silver Spring). 2023;31(2):525–36. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Douketis J, Macie C, Thabane L, Williamson D. Systematic review of long-term weight loss studies in obese adults: clinical significance and applicability to clinical practice. Int J Obes. 2005;29(10):1153–67. [DOI] [PubMed] [Google Scholar]
- 17.Galiuto L, Fedele E, Vitale E, Lucini D. Personalized exercise prescription for heart patients. Curr Sports Med Rep. 2019;18(11):380–1. [DOI] [PubMed] [Google Scholar]
- 18.Gallagher D, Heymsfield SB, Heo M, Jebb SA, Murgatroyd PR, Sakamoto Y. Healthy percentage body fat ranges: an approach for developing guidelines based on body mass index. Am J Clin Nutr. 2000;72(3):694–701. [DOI] [PubMed] [Google Scholar]
- 19.Geerling W, Mateer GD, Wooten J, Damodaran NJAAS. Is ChatGPT smarter than a student in principles of economics. 2023:4356034.
- 20.Goh E, Gallo RJ, Strong E, Weng, Y, Kerman H, Freed JA, et al. GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial. 2025:1–6. [DOI] [PMC free article] [PubMed]
- 21.Gu J, Humphrey J, Messner D. Global governance and developing countries: the implications of the rise of China. World Dev. 2008;36(2):274–92. [Google Scholar]
- 22.Guthold R, Stevens GA, Riley LM, Bull FC. Global trends in insufficient physical activity among adolescents: a pooled analysis of 298 population-based surveys with 1· 6 million participants. Lancet Child Adolesc Health. 2020;4(1):23–35. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Hadi MU, Qureshi R, Shah A, Irfan M, Zafar A, Shaikh MB, et al. A survey on large language models: applications, challenges, limitations, and practical usage. 2023.
- 24.Iqbal RK, Masood F, Ikram S. How obesity affects our health. Natl J Health Sci. 2019;4(3):113–8. [Google Scholar]
- 25.Issurin VB. New horizons for the methodology and physiology of training periodization. Sports Med. 2010;40:189–206. [DOI] [PubMed] [Google Scholar]
- 26.James WP. Obesity: a global public health challenge. Clin Chem. 2018;64(1):24–9. [DOI] [PubMed] [Google Scholar]
- 27.Kaenmuang P, Ratruakorn D, Geater SL. The relationship between generic health-related quality of life by the 36-Item Short Form Health Survey questionnaire (SF-36) and pulmonary function test. In European Respiratory Society International Congress 2023. Lausanne, Switzerland: European Respiratory Society; 2023.
- 28.Kaleta D, Kalucka S, Szatko F, Makowiec-Dąbrowska T. Prevalence and correlates of physical inactivity during leisure-time and commuting among beneficiaries of government welfare assistance in Poland. Int J Environ Res Public health. 2017;14(10):1126. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Kamarudin A, Tengah R, Raysid N, Jusoh N. Relationship between body mass index, waist circumference, fat mass and fat percentage as a measurement of obesity among Universiti Pendidikan Sultan Idris students. J Fundam Appl Sci. 2017;9(6S):1161–72. [Google Scholar]
- 30.Kojima T, Gu SS, Reid M, Matsuo Y, Iwasawa Y. Large language models are zero-shot reasoners. Adv Neural Inform Process Syst. 2022;35:22199–213. [Google Scholar]
- 31.LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44. [DOI] [PubMed] [Google Scholar]
- 32.Lee JC, Lin R. The continuous usage of artificial intelligence (AI)-powered mobile fitness applications: the goal-setting theory perspective. Ind Manag Data Syst. 2023;123(6):1840–60. [Google Scholar]
- 33.Lee P, Bubeck S, Petro J. Benefits, limits, and risks of GPT-4 as an AI Chatbot for medicine. Reply. New England J Med. 2023;388(25):2400–2400. [DOI] [PubMed] [Google Scholar]
- 34.Li G, Hammoud H, Itani H, Khizbullin D, Ghanem B. CAMEL: Communicative agents for cognitive exploration of large language model societies. Adv Neural Inform Processing Syst. 2023;36:51991–2008. [Google Scholar]
- 35.Lim GB. Weight loss from surgery or drug therapy reduces blood pressure. Nat Rev Cardiol. 2024;21(4):218–218. [DOI] [PubMed] [Google Scholar]
- 36.Loprinzi PD. Estimated cardiorespiratory fitness assessment as a patient vital sign. Paper presented at the Mayo Clinic Proceedings. 2018. [DOI] [PubMed]
- 37.Mauro M, Taylor V, Wharton S, Sharma AM. Barriers to obesity treatment. Eur J Internal Med. 2008;19(3):173–80. [DOI] [PubMed] [Google Scholar]
- 38.Mazumdar H, Chakraborty C, Sathvik M, Panigrahi PKJIJOB, Informatics H. GPTFX: a novel GPT-3 based framework for mental health detection and explanations. 2023. [DOI] [PubMed]
- 39.McInnis KJ. Exercise and obesity. Exerc Obes. 2000;11(2):111–6. [DOI] [PubMed] [Google Scholar]
- 40.Mehmet S, Pelin A, Gökmen K, Akan B, İdris K. Fitness Sporcularında Akut Yorgunluğun Denge Performansı üzerine etkisi var mıdır? Akdeniz Spor Bilimleri dergisi. 2023. 10.38021/asbid.1212280.
- 41.Mukhopadhyay SC. Wearable sensors for human activity monitoring: a review. IEEE Sensors J. 2014;15(3):1321–30. [Google Scholar]
- 42.DeepLearning.AI. ChatGPT Prompt Engineering for Developers. https://learn.deeplearning.ai/. Accessed 16 Apr 2025.
- 43.Ng M, Fleming T, Robinson M, Thomson B, Graetz N, Margono C, et al. Global, regional, and national prevalence of overweight and obesity in children and adults during 1980–2013: a systematic analysis for the Global Burden of Disease Study 2013. Lancet. 2014;384(9945):766–81. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Petridou A, Siopi A, Mougios VJM. Exercise in the management of obesity. 2019;92:163–9. [DOI] [PubMed] [Google Scholar]
- 45.Ren P, Liu Z. Efficiency evaluation of China’s public sports services: a three-stage DEA model. Int J Environ Res Public Health. 2021;18(20):10597. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Sabin JA, Marini M, Nosek BA. Implicit and explicit anti-fat bias among a large sample of medical doctors by BMI, race/ethnicity and gender. PloS one. 2012;7(11):e48448. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Schwartz MB, Chambliss HON, Brownell KD, Blair SN, Billington C. Weight bias among health professionals specializing in obesity. Obes Res. 2003;11(9):1033–9. [DOI] [PubMed] [Google Scholar]
- 48.Slomski A. Weight loss is still substantial a decade after bariatric surgery. JAMA. 2022;328(5):415–415. [DOI] [PubMed] [Google Scholar]
- 49.Spieker EA, Pyzocha N. Economic impact of obesity. Prim Care: Clin Off Pract. 2016;43(1):83–95. [DOI] [PubMed] [Google Scholar]
- 50.Štempeľová I, Takáč O, Hudáková H. Obesity as a 21st century pandemic. J Adv Nat Sci Eng Res. 2023;7(6):449–54. [Google Scholar]
- 51.Szabo A. ChatGPT a breakthrough in science and education: can it fail a test? 2023.
- 52.Tan KJTl. Appropriate body-mass index for Asian populations and its implications for policy and intervention strategies. 2004. [DOI] [PubMed]
- 53.Villareal DT, Chode S, Parimi N, Sinacore DR, Hilton T, Armamento-Villareal R, et al. Weight loss, exercise, or both and physical function in obese older adults. New England J Med. 2011;364(13):1218–29. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Vogel T, Brechat PH, Leprêtre PM, Kaltenbach G, Berthel M, Lonsdorfer J. Health benefits of physical activity in older patients: a review. Int J Clin Pract. 2009;63(2):303–20. [DOI] [PubMed] [Google Scholar]
- 55.Wang M, Wang M, Xu X, Yang L, Cai D, Yin M. Unleashing ChatGPT’s power: a case study on optimizing information retrieval in flipped classrooms via prompt engineering. IEEE Trans Learn Technol. 2023a;17:629–41. [Google Scholar]
- 56.Wang M, Wang M, Xu X, Yang L, Cai D, Yin MJITOLT. Unleashing ChatGPT's power: a case study on optimizing information retrieval in flipped classrooms via prompt engineering. 2023b.
- 57.Wang X, Wei J, Schuurmans D, Le Q, Chi E, Narang S. et al. Self-consistency improves chain of thought reasoning in language models. 2022.
- 58.Warburton DE, Jamnik VK, Bredin SS, Gledhill N. The physical activity readiness questionnaire for everyone (PAR-Q+) and electronic physical activity readiness medical examination (ePARmed-X+). Health Fitness J Can. 2011;4(2):3–17. [Google Scholar]
- 59.Wardle J, Cooke L. The impact of obesity on psychological well-being. Best Pract Res Clin Endocrinol Metab. 2005;19(3):421–40. [DOI] [PubMed] [Google Scholar]
- 60.Washif J, Pagaduan J, James C, Dergaa I, Beaven C. Artificial intelligence in sport: exploring the potential of using ChatGPT in resistance training prescription. Biol Sport. 2024;41(2):209–20. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.WLGUBN D, WVTDG B. Relationship of serum cholesterol and triglycerides levels with the body mass index in a group of healthy undergraduates. J Health Sci Innov Res. 2022;3(1):1–7. [Google Scholar]
- 62.Xu B, Yang A, Lin J, Wang Q, Zhou C, Zhang Y, Mao ZJAPA. Expertprompting: instructing large language models to be distinguished experts. 2023.
- 63.Yanyan Z, A Iahad, N, Yusof, AFJC, Applications, FTOA. Artificial Intelligence (AI)-enabled mobile fitness apps and goal attainment: systematic literature review. 2025:167–182.
- 64.Yin RK. Case Study Research and Applications: Design and Methods. 6th ed. Thousand Oaks, CA: Sage Publications; 2017. [Google Scholar]
- 65.Zakerabasali S, Ayyoubzadeh SM, Baniasadi T, Yazdani A, Abhari S. Mobile health technology and healthcare providers: systemic barriers to adoption. Healthcare Inform Res. 2021;27(4):267–78. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Zhi Y. Diet control and mineral intake help obese students lose weight and prevent some diseases. International conference on biotechnology, life science. 2022.
- 67.Zhou C, Wang Q. Effect of the characteristic town policy on sewage treatment in mountainous areas: Evidence from Chongqing. Heliyon. 9(12):e22830. [DOI] [PMC free article] [PubMed]
- 68.Zhu W, Geng W, Huang L, Qin X, Chen Z, Yan H. Who could and should give exercise prescription: physicians, exercise and health scientists, fitness trainers, or ChatGPT? J Sport Health Sci. 2024;13(3):368. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
Our data are kept in the Institute of Sports Science of Southwest University. If you need the original data, you can contact the correspondent.